Introduction

Systemic lupus erythematosus (SLE) is an autoimmune disease characterized by dysregulated interferon responses and loss of tolerance to self-antigens. Deposition of immune complexes in tissues results in local and systemic inflammation, often progressing to organ dysfunction and failure. Recent genome-wide association studies (GWAS) in SLE subjects of European ancestry have identified several new and confirmed risk loci.1, 2, 3, 4 Our group reported association of genetic variants in the region of TNFAIP3 with human SLE and identified two independent genetic risk effects.2 One effect, identified by rs6920220, is located approximately 185 kb upstream of TNFAIP3 and is also associated with risk for rheumatoid arthritis (RA).2, 5, 6, 7 The second effect, spanning the TNFAIP3 locus, is comprised of a low-frequency haplotype (2% in European ancestry) marked by rs10499197, rs5029939, rs7749323 and the nonsynonymous variant rs2230926.2, 8

TNFAIP3 encodes A20, a zinc-finger protein required for efficient termination of the nuclear factor (NF)-κB signaling axis downstream of TNFR, TLR, IL1R and NOD2.9, 10, 11, 12, 13 A20 is a unique dual-function ubiquitin-editing enzyme that catalyzes the deubiquitylation of several NF-κB pathway proteins, including TRAF6, RIP1, RIP2 and IKKγ/NEMO9, 11, 12, 14, 15 through an amino-terminal ovarian tumor domain. The carboxy-terminal zinc finger domain of A20 functions as an E3 ubiquitin ligase catalyzing K48-linked ubiquitylation of substrate proteins, which targets them for proteasome degradation.12 The importance of A20 in attenuating NF-κB is evident in mice engineered to lack expression of A20, which die in the first 6 weeks of life due to uncontrolled systemic organ inflammation.11

To further characterize and determine the magnitude of the association signals in the region of TNFAIP3, we performed a meta-analysis of existing genotype data from two independent case–control datasets and one trio family dataset. In addition, we tested for evidence of genetic association in silico by imputing genotypes from the Phase II HapMap over a 5 Mb window spanning TNFAIP3, using our previously published GWAS dataset as the source of observed genotypes. We then tested whether the TNFAIP3 SLE risk haplotype, defined by risk and non-risk alleles at rs5029939, was associated with specific SLE subphenotypes.

Results

Meta-analysis was performed with genotype data from four single nucleotide polymorphisms (SNPs) in a sample set comprising 1453 independent SLE cases, 3381 independent control subjects and 713 independent trio families. The results of the meta-analysis demonstrated significant association approaching or exceeding criteria for genome-wide significance (P<1 × 10−8) for all SNPs (Table 1). SNP rs5029939, located in the second intron of TNFAIP3 and originally identified in our GWAS, produced a convincing meta-analysis P value of 1.67 × 10−14 and a combined odds ratio (OR) of 2.09 (95% confidence interval (CI): 1.68–2.60) in the case–control datasets (Table 1). In our previous study we reported an OR equal to 2.28 (95% CI: 1.71–3.06) for marker rs5029939.2 Note that while the meta-analysis OR at rs5029939 decreased to 2.09, the 95% CI around this OR narrowed, indicating an improvement in the precision of the estimate, a primary goal of meta-analysis. In SLE cases of European ancestry, HLA-DR3 and HLA-DR2 alleles are the only risk alleles to consistently demonstrate ORs near 2.0 or higher.16 These results suggest that the genetic effect size marked by rs5029939 in the TNFAIP3 gene is similar to that of the HLA and larger than any of the recently identified SLE risk genes including IRF5,17 STAT4,18 BLK3 and ITGAM.19 Given the relatively low frequency of the risk allele at rs5029939, this effect aligns with the common disease, rare variant hypothesis of complex genetic disease.
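The gain in precision can be checked from the two reported intervals. The following Python sketch is illustrative only: it assumes that both intervals are symmetric Wald intervals on the log-odds scale (an assumption, not stated in the text) and recovers an approximate standard error of log(OR) from each. The function name `log_or_se` is hypothetical.

```python
import math

def log_or_se(lower, upper, z=1.96):
    """Approximate SE of log(OR) from a 95% CI.

    Assumes a symmetric Wald interval on the log-odds scale.
    """
    return (math.log(upper) - math.log(lower)) / (2 * z)

# Intervals as reported in the text for rs5029939.
se_previous = log_or_se(1.71, 3.06)  # previous study, OR = 2.28
se_meta = log_or_se(1.68, 2.60)      # meta-analysis, OR = 2.09

print(f"previous study SE(log OR): {se_previous:.3f}")
print(f"meta-analysis SE(log OR):  {se_meta:.3f}")
```

Under this assumption the meta-analysis standard error is the smaller of the two, which is what the narrower interval implies.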

Table 1 Study-specific and meta-analysis association results for four SNPs in the region of TNFAIP3

Genotypes were imputed to determine the contribution of untyped variants to the genetic association in the region of TNFAIP3 and to further define the boundaries of the SLE risk haplotype. Imputation was performed over a 5 Mb (135–140 Mb) interval centered on TNFAIP3 from marker rs4896151 to marker rs1977772 on chromosome 6q using our previously published GWAS dataset as the source of observed genotypes and the Phase II HapMap as the source of imputed genotypes.20 In addition to TNFAIP3, this interval contains at least 20 genes, some with a possible role in the immune system such as interleukin 20 receptor alpha (IL20RA), interleukin 22 receptor alpha (IL22RA), interferon gamma receptor 1 (IFNGR1) and mitogen-activated protein kinase kinase kinase 5 (MAP3K5). Also included was the region upstream of TNFAIP3 associated with risk for RA6, 7 and the region downstream of TNFAIP3 near PERP, recently reported to be associated with SLE.8 Imputation expanded the number of SNPs in the 5 Mb region from 390 observed SNPs to 3670 total SNPs. Following exclusion of imputed SNPs based on quality control measures (information scores <0.7 and/or 2 proxy SNPs used to impute any given SNP (NPRX)), 2497 SNPs remained in the final imputed dataset (Figure 1).

Figure 1

Results of imputation across a 5 Mb Region Centered on TNFAIP3. (a) Results showing full 5 Mb imputation interval. Imputed single nucleotide polymorphisms (SNPs) are in red and observed SNPs in blue. Locations of genes flanking TNFAIP3 are indicated at the top. (b) Expanded view of region surrounding TNFAIP3. Eleven imputed SNPs that demonstrated association with systemic lupus erythematosus (SLE; P value <1 × 10−4) are represented as red triangles. The observed SNPs, rs10499197, rs5029939 and rs7749323 that demonstrate significant association are represented as blue diamonds.

The strongest association signals were detected in the vicinity of TNFAIP3 (Figure 1) with both observed and imputed SNPs. No other region in the 5 Mb interval reached significance at P<1.0 × 10−4, including variants in the region 185 kb upstream associated with RA and SLE or the region near PERP (Supplementary Table 1). In contrast, 11 imputed SNPs spanning the TNFAIP3 locus demonstrated association with SLE at P<1.0 × 10−4 (Table 2). Imputation accuracy for all 11 SNPs was greater than 99%. The concordance rates between observed genotypes and imputed genotypes for the three observed SNPs (rs10499197, rs5029939, rs7749323) exceeded 99%, indicating robust imputation over this region. SNP rs5029939 was the most statistically significant variant among all observed and imputed SNPs (Table 2). The exon 3 missense SNP, rs2230926, was not imputed as it was not present in either the GWAS or HapMap datasets; however, rs5029939 is in strong linkage disequilibrium (LD; r2=0.99) with it and may indeed incorporate the effect at rs2230926.2

Table 2 Eleven imputed SNPs and three observed SNPs are associated with SLE in the region of TNFAIP3

Imputation also defined the length of the associated TNFAIP3 risk haplotype. Before imputation, association with SNPs on the 3′ end extended as far as rs7749323; following imputation, additional SNPs extended the risk haplotype approximately 12 kb downstream to marker rs6932056, resulting in a risk segment approximately 109 kb in length.

The distribution of allele frequencies and OR for the imputed SNPs was consistent with the presence of more than one haplotype. Therefore, we evaluated the haplotypic and LD relationships for the observed and imputed SNPs listed in Table 2. Three haplotypes with frequency greater than 1% were identified (Figure 2). Conditional logistic regression analysis implemented in PLINK21 was used to determine if the haplotypes contributed independent genetic risk for SLE using haplotype 1 as the reference haplotype. The omnibus likelihood ratio test (LRT) yielded a P value=4.0 × 10−4, consistent with the fact that variants in the region of TNFAIP3 influence risk for SLE. The analysis demonstrated an independent effect for haplotype 3 (LRT P=1.0 × 10−4) but not haplotype 2 (LRT P=0.55; Figure 2). Conditioning on haplotype 3 in comparison to the reference haplotype resulted in no evidence of association (LRT P=0.42). In contrast, significant evidence of association was seen when the reference haplotype was conditioned on haplotype 2 (LRT P=9.7 × 10−5; Figure 2). These results support the conclusion that genetic variation carried on haplotype 3 is responsible for the association with SLE. As was seen in the meta-analysis, SNP alleles carried exclusively on haplotype 3 produced ORs of approximately 2.0.

Figure 2

Conditional haplotype analyses for the imputed TNFAIP3 risk haplotype. Three haplotypes are shown with frequencies >1%. Imputed single nucleotide polymorphisms (SNPs) are in black font and observed SNPs are in blue font. Linkage disequilibrium (LD) relationships (r2) are shown in the figure below the table with black diamonds corresponding to high LD (r2=0.75–1.0) and gray diamonds corresponding to low (r2<0.5). LRT, likelihood ratio test.

Clinical data were available for 1351 female SLE cases of European descent and were used to define SLE subphenotypes based on revised American College of Rheumatology (ACR) criteria (malar rash, discoid rash, photosensitivity, oral ulcers, arthritis, serositis, nephritis, neurologic disorder, hematologic disorder, antinuclear antibody and immunologic disorder)22, 23 and presence of anti-Ro/SSA and anti-La/SSB autoantibodies (Table 3). Case subsets were compared to a group of 1172 female control subjects without a personal or family history of autoimmune disease. For comparison of anti-Ro/SSA and anti-La/SSB antibodies, the control group consisted of 348 subjects that were negative for these autoantibodies by serologic testing. Association analysis was performed by comparing the frequencies of the risk (C) and non-risk (G) genotypes at rs5029939, which tags the SLE risk haplotype. There were 144 SLE cases (frequency=0.057) and 71 control subjects (frequency=0.031) that carried the CG genotype. Frequencies for the CC genotype in cases and control subjects were low (CC cases=0.006, CC control subjects=0), precluding analysis of the CC genotype.

Table 3 Association of alleles at rs5029939 with SLE subphenotypes compared to healthy controls

Of the subphenotypes evaluated, nephritis and hematologic disorder demonstrated lower P values and higher attributable risk and OR compared to the SLE phenotype, even though only 28 and 56% of the cases, respectively, were used in the analyses of these phenotypes (Table 3). Note that the analysis of the SLE phenotype without the nephritis cases or the hematologic cases resulted in an increase in the P value from 3.75 × 10−5 to 0.0012 and 0.01, respectively (Table 4). Excluding both the nephritis and hematologic subphenotypes from the SLE phenotype resulted in a nonsignificant association with the CG genotype of rs5029939 (P value=0.053). Taken together, these results suggest that SLE patients with the CG genotype at rs5029939 are over twofold more likely to develop lupus nephritis and/or hematologic manifestations compared to SLE patients with the GG genotype.

Table 4 Conditional analysis of clinical traits

We then performed an analysis of SLE cases only, stratified by SLE subphenotypes. This analysis failed to produce any statistically significant associations for any of the subphenotypes (Supplementary Table 2). This likely reflects the reduced statistical power that results from the smaller sample sizes available when using only SLE cases. Considering the nephritis subphenotype, for example, power analyses suggest that we would need approximately 560 cases (SLE with nephritis) to detect an effect size similar to the case–control results, whereas our data included 379 lupus nephritis subjects (Supplementary Table 2).

Next we evaluated if clusters of clinical subphenotypes were associated with the risk allele at rs5029939. To define the clusters we used a principal components approach, which produced five clusters from 10 of the 11 ACR criteria evaluated, the first three of which explained 56.5% of the total variation. Antinuclear antibodies were present in 98% of the case subjects, which precluded clustering of this subphenotype. Overall, the subphenotypes within each of the five clusters (cluster 1 – malar rash, photosensitivity and oral ulcers; cluster 2 – renal, immunologic and hematologic manifestations; cluster 3 – arthritis and serositis; cluster 4 – neurologic disorder; cluster 5 – discoid rash) were moderately correlated (0.33–0.57), but no correlation was observed with variables outside their respective clusters (Supplementary Table 3). A component score was then estimated for each cluster and each case subject using the principal components derived from the clustering procedure, generating five new covariates for each case subject. Logistic regression analysis was performed using rs5029939 (omitting homozygous risk individuals) as the dependent variable and SLE and the cluster component score as independent variables (Table 5). Only the results for the first three clusters are reported, as clusters 4 and 5 explained only 2.2% of the total variation and subphenotypes within these clusters were not associated with rs5029939. In line with our analysis using individual subphenotypes, cluster 2 (renal, hematologic and immunologic manifestations) demonstrated a better fit to the model when compared to cluster 1, cluster 3 or SLE (Table 5). Importantly, when SLE was adjusted for cluster 2, association with rs5029939 was not significant (Wald P=0.81), yet association with SLE remained when adjusting for cluster 1 (P=0.04) or cluster 3 (P=0.06). These results suggest the association between rs5029939 and subjects with renal, hematologic and immunologic manifestations is not due to confounding with SLE but rather represents a subphenotype-specific genetic effect.

Table 5 Logistic regression of rs5029939 with SLE and cluster component scores

Discussion

Genome-wide association scans in human SLE have been successful in identifying novel risk loci.1, 2, 3, 4 Our group recently identified association between SLE and variants in the region of TNFAIP3, the gene encoding the ubiquitin-modifying enzyme A20.2 Genetic association in the region of TNFAIP3 has also been described for other autoimmune diseases including RA and Crohn's disease.5, 6, 7 For SLE and RA, TNFAIP3 is a genetically complex locus. The region 185 kb upstream of TNFAIP3 confers risk for both RA and SLE.2, 5, 6, 7, 8 A nearby independent effect that confers protection for RA has been inconsistently observed in SLE.2, 5, 6, 7, 8 Directly surrounding TNFAIP3, we previously reported an independent haplotype associated with SLE defined by a highly correlated (r2>0.98) set of SNPs (rs10499197, rs5029939, rs2230926 and rs7749323), an effect that has been replicated in an independent SLE cohort.2, 8 Whether this haplotype confers risk for RA is unknown. Finally, another association was reported in the region near PERP located 240 kb downstream of TNFAIP3;8 this effect awaits replication in either SLE or RA.

The meta-analysis and imputation results presented here further support the association between SLE and variants within and flanking the TNFAIP3 gene. Specifically, the evidence for association was strengthened for four SNPs through meta-analysis of 1453 SLE cases, 3381 control subjects and 713 independent SLE trio families. The strongest association was located at marker rs5029939 in intron 2 of TNFAIP3. Marker rs5029939 is a proxy for rs2230926, which results in a phenylalanine to cysteine substitution at position 127 of A20. Preliminary data suggest that the 127C allele may be less efficient in attenuating NF-κB signaling;8 however, additional work is necessary to determine if this effect is seen in cells that carry the specific genotypes. Most importantly, the meta-analysis improves our confidence in the estimate of the OR for the rs5029939 risk allele compared with our previous study. The OR for this marker and others in LD with it is approximately 2.0, thus approaching the magnitude seen only in HLA-DR3 and DR2 alleles in SLE patients of European ancestry.

Our imputation results provide support for 11 new SNPs that, together with the 3 observed SNPs from our GWAS, form a 109 kb SLE risk haplotype. All the SNPs on this haplotype are highly correlated; thus, it is not possible with the current dataset to determine if the functional allele is among the 14 identified SNPs or remains undiscovered. Preliminary bioinformatic analysis (not shown) does not support an obvious functional role for any of the observed or imputed SNPs with the exception of rs2230926 described above.

Apart from the association at rs2230926, which our data support through proxy SNP rs5029939, our imputation fails to support the other associations with SLE described by Musone et al.8 In the region 185 kb upstream of TNFAIP3, Musone et al. reported association with two protective variants, rs13192841 and rs12527282.8 These SNPs are proxies for rs10499194, which was associated with a protective effect in RA;7 an effect that was absent in our previously published SLE study.2 Similarly, the results from our imputation failed to reveal evidence for association at any of these markers (Supplementary Table 1). Therefore, we believe the presence of a protective association 185 kb upstream of TNFAIP3 is still in question for SLE. The previously reported association in the region of PERP marked by rs6922466 was also not significant in our imputation analysis (Supplementary Table 1). We acknowledge that our imputed dataset is likely underpowered to detect an association that was not detected in our observed GWAS dataset. In addition, locus and/or genetic heterogeneity in these regions may lead to association signals that are not reproducibly observed in independent SLE sample collections. Genotyping these and additional variants in larger SLE cohorts will be necessary to further characterize these associations.

Our analysis of SLE clinical subphenotypes shows that subjects heterozygous for the TNFAIP3 risk (CG) genotype at rs5029939 were over two times more likely to experience lupus nephritis (OR=2.3) and/or hematologic manifestations (OR=2.06) than GG homozygotes. This observation, combined with the lack of association when the SLE cases with nephritis or hematologic manifestations were removed from the analysis, suggests that the SLE risk haplotype at TNFAIP3 influences the development of these SLE subphenotypes. To more precisely determine if the TNFAIP3 risk haplotype directly influences risk for developing nephritis and SLE-associated hematologic disorders, an analysis of a larger number of SLE case subjects stratified by presence or absence of nephritis and hematologic manifestations would be needed. The current case-only dataset was underpowered for this type of analysis and did not produce any statistically significant findings (Supplementary Table 2). We estimate that validating an association between lupus nephritis and the CG genotype at rs5029939 with an OR of 2.3, as reported here, would require approximately 560 SLE cases with nephritis when using a case-only approach.

In summary, genetic variation in the region of TNFAIP3 has been shown to influence risk for human SLE. Our results support a potent genetic effect (OR of approximately 2) located on a 109 kb DNA segment in the region of TNFAIP3. Lupus nephritis and hematologic disorders are among the most severe manifestations observed in the clinical management of SLE patients. Our observation that the TNFAIP3 risk haplotype may influence the development of these complications suggests that TNFAIP3 plays an important role in SLE pathogenesis. Further characterization of the role of TNFAIP3 in SLE will be aided by the identification of the precise functional variant(s) responsible for the association with human SLE.

Materials and methods

Datasets

For the meta-analysis in the region of TNFAIP3, we used genotype data from subjects of self-described European ancestry from our GWAS study of 431 SLE cases, 2155 control subjects and 740 trio families2 and an independent case–control dataset, referred to as BE2. The BE2 dataset was comprised of 1313 SLE cases and 1226 control subjects selected from the Lupus Family Registry and Repository (LFRR) and the University of Minnesota SLE collection.24 All datasets were evaluated for subjects genotyped in more than one study. We removed 291 subjects from the BE2 set resulting in 1453 independent SLE cases and 3381 independent control subjects for meta-analysis.

Genotyping

Genotyping methods for the GWAS and Trio sample sets have been described previously.2 Genotyping in the BE2 study was performed on the BeadXpress Platform (Illumina) using the GoldenGate Chemistry at OMRF. SNPs were discarded if they failed to pass any of the following quality control metrics: a minimum Gencall score of 0.4, 10% Gencall score >0.8, call rate >90% and Hardy–Weinberg proportions greater than 0.01. In addition, we manually evaluated each SNP to ensure that the cluster characteristics were robust.
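As an illustration of the call rate and Hardy–Weinberg filters, the Python sketch below applies a one-degree-of-freedom chi-square test to genotype counts. It is an assumption that the Hardy–Weinberg threshold refers to a test P value, and the text does not say which test (chi-square or exact) was used; the function names are hypothetical and the Gencall criteria are not modeled.

```python
import math

def hwe_chi2_p(n_aa, n_ab, n_bb):
    """P value of a 1-df chi-square test for Hardy-Weinberg proportions."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of a chi-square variable with 1 df.
    return math.erfc(math.sqrt(chi2 / 2))

def passes_qc(call_rate, hwe_p):
    """Keep a SNP only if call rate > 90% and HWE P > 0.01 (assumed reading)."""
    return call_rate > 0.90 and hwe_p > 0.01
```

For example, `passes_qc(0.98, hwe_chi2_p(25, 50, 25))` keeps a SNP whose genotype counts match Hardy–Weinberg expectations exactly.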

Meta-analysis and imputation methods

SNPs were chosen for inclusion in the meta-analysis if they were genotyped in a minimum of two datasets and were significantly associated with SLE in at least one dataset. PLINK was used to merge the genotype data from our GWAS and BE2 studies. Meta-analysis statistics were then generated using the Cochran–Mantel–Haenszel (CMH) method implemented in SAS v. 9.1 (SAS Institute Inc., Cary, NC, USA). For the four SNPs genotyped in the trio families, the family-based P values were combined with the CMH-derived case–control P values using Fisher's method.25
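The two combining steps can be sketched in Python as follows. This is not the SAS code used in the study; it is a minimal illustration of the standard Mantel–Haenszel pooled odds ratio and of Fisher's method, with hypothetical function names and made-up inputs in the usage lines.

```python
import math

def mantel_haenszel_or(tables):
    """Pooled odds ratio across strata (datasets).

    Each table is (a, b, c, d): risk and non-risk allele counts in cases
    (a, b) and in controls (c, d).
    """
    num = sum(a * d / (a + b + c + d) for a, b, c, d in tables)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in tables)
    return num / den

def fisher_combined_p(p_values):
    """Fisher's method: -2 * sum(ln p) is chi-square with 2k df under the null."""
    k = len(p_values)
    half_stat = -sum(math.log(p) for p in p_values)  # test statistic / 2
    # Closed-form chi-square survival function for even df (2k).
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half_stat / i
        total += term
    return math.exp(-half_stat) * total

# Hypothetical inputs, for illustration only.
print(mantel_haenszel_or([(10, 90, 5, 95), (20, 180, 12, 188)]))
print(fisher_combined_p([0.05, 0.05]))
```

Fisher's method treats the combined P values as independent, which matches the use of non-overlapping case–control and trio samples described above.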

Imputation was performed by merging the GWAS genotype data from the 5 Mb interval flanking TNFAIP3 with HapMap Phase II data from the same region using PLINK.21 This process generated a list of SNPs for which differences in strand orientation prohibited further merging of the data. The strand orientation of these SNPs was ‘flipped’ in the HapMap genotype file to match the strand orientation for the GWAS data file. Strand differences at SNPs with A/T or G/C alleles cannot be detected by PLINK, so these SNPs were strand corrected manually. Once the merged dataset was assembled, we imputed the genotype data using the ‘proxy_impute’ PLINK command. As a quality control measure, SNPs with an information score <0.7 and/or NPRX 2 (n=1173 SNPs) were removed, resulting in 2497 imputed SNPs. As an independent test for the quality of the imputation, we also used the IMPUTE package to generate an imputed dataset, with similar results (data not shown).26
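The strand logic described above can be illustrated with a short Python sketch. It is not PLINK's implementation; the function names and return labels are hypothetical, and alleles are assumed to be coded as single uppercase bases.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def flip(alleles):
    """Return the complementary-strand alleles, e.g. ('A', 'G') -> ('T', 'C')."""
    return tuple(COMPLEMENT[a] for a in alleles)

def is_strand_ambiguous(alleles):
    """A/T and C/G SNPs read the same on both strands."""
    return set(alleles) == set(flip(alleles))

def classify(gwas_alleles, hapmap_alleles):
    """Decide how a HapMap SNP should be handled before merging."""
    if is_strand_ambiguous(gwas_alleles):
        return "manual"       # strand cannot be inferred from alleles alone
    if set(gwas_alleles) == set(hapmap_alleles):
        return "same_strand"
    if set(gwas_alleles) == set(flip(hapmap_alleles)):
        return "flip"         # flip the HapMap record to match the GWAS file
    return "mismatch"         # allele codes disagree even after flipping

print(classify(("A", "G"), ("T", "C")))  # flip
print(classify(("A", "T"), ("A", "T")))  # manual
```

An A/G SNP reported as T/C in the other file is recognizably on the opposite strand, whereas an A/T SNP carries the same allele codes on either strand, which is why such SNPs need manual review.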

SLE subphenotype analysis

SLE subphenotypes included the 11 criteria used for classification from the ACR, as revised,22, 23 and the presence of anti-Ro/SSA and anti-La/SSB autoantibodies. For the 11 ACR criteria, cases were given a score of 0, 1, 2 or 3 based on the degree of confidence that a particular manifestation was present via review of the medical records. Cases denoted with 3 were considered positive for the manifestation, whereas individuals given a 0 were considered negative. Cases receiving 1 or 2 were given a missing value for the condition. The presence of autoantibodies (anti-Ro, anti-La and antinuclear antibodies) was evaluated at the CLIA-approved OMRF Clinical Immunology Laboratory, Oklahoma City, OK, USA. Control subjects were assumed to be normal, healthy individuals and did not have a family history of SLE. Association of rs5029939 genotypes with clinical subphenotypes was assessed using SAS to fit a generalized linear model assuming a binomial distribution for the dependent variable and using the logit and probit link functions to estimate OR and attributable risk, respectively. Probability values were not corrected for multiple testing.
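The scoring rule for the ACR criteria can be written as a small Python function. This is a sketch of the rule as described, not the study's code; the function name and the use of `None` for a missing value are illustrative choices.

```python
def manifestation_status(score):
    """Map a chart-review confidence score (0-3) to an analysis status.

    Returns True (positive), False (negative) or None (treated as missing).
    """
    if score == 3:
        return True
    if score == 0:
        return False
    if score in (1, 2):
        return None
    raise ValueError(f"unexpected score: {score}")

# Hypothetical scores for one criterion across five cases.
scores = [3, 0, 2, 1, 3]
print([manifestation_status(s) for s in scores])
```

Treating intermediate scores as missing means that each subphenotype analysis uses only the cases with a clear positive or negative determination.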

SLE criteria were grouped using the VARCLUS procedure in SAS, an oblique principal component analysis procedure that divides a group of variables into a set of distinct clusters. Within each cluster the VARCLUS procedure attempts to maximize the variation in the data accounted for by the fewest principal components, thereby grouping correlated variables within a cluster and separating uncorrelated variables into distinct clusters. Individuals included in the VARCLUS cluster analysis were all SLE cases. Cases were scored on a scale of 0–3 as described above and a component score for each cluster was estimated using the scoring coefficient from the cluster analysis obtained using the SCORE procedure in SAS. Therefore, five new variables were created for each observation, with each new variable representing a cluster. Regression analysis was performed using the LOGISTIC procedure in SAS, specifying the rs5029939 genotype (omitting homozygous risk individuals due to low sample size) as the dependent variable and SLE and cluster component score as independent variables.