Main

Systemic lupus erythematosus (SLE) is a potentially deadly systemic illness, and is sometimes considered a model for systemic humoral autoimmune diseases. It is characterized by autoantibody production leading to tissue injury through the formation and deposition of autoantibody–autoantigen immune complexes. Severity, risk and clinical expression of SLE vary by race, geography and sex, with a prevalence that is higher in women and some non-European-derived populations (reviewed in Refs 1, 2, 3).

SLE is an unusually heterogeneous disease, with various combinations of four of the eleven clinical criteria required for case classification4,5. A high sibling risk ratio (8–29), high heritability (>66%) and higher concordance rates between monozygotic twins (20–40%) relative to dizygotic twins and other full siblings (2–5%) all suggest that SLE has a complex genetic basis6,7.

Here we explore how innovations in genotyping have advanced our understanding of the genetic basis of SLE through their application to fine mapping and genome-wide association (GWA) studies, and how they are informing our understanding of disease pathophysiology. The number of convincingly established genetic associations with SLE has increased sharply over the last few years. We now have both genetic evidence that corroborates existing theories and new insights about biological pathways that contribute to the pathophysiology of SLE. This increased genetic understanding provides the opportunity to investigate potential new therapeutic strategies and to improve diagnostic and prognostic tests for the disease.

High-density genotyping

In the 1990s, SLE risk genes were investigated using genome-wide family-based linkage studies, which led to the identification of FCGR2A, FCGR3A and PDCD1 as candidates8,9,10 (Table 1). However, linkage studies were limited in their ability to identify causal alleles because of a lack of dense marker sets, which hindered comprehensive fine mapping efforts. In addition, these studies typically have low power to map variants of small phenotypic effect size. In recent years, increased knowledge of the structure of the human genome through efforts such as the International HapMap project, together with technological developments that allow efficient and inexpensive high-throughput genotyping, has resulted in the availability of custom-designed dense marker sets11,12. The linkage markers used in the 1990s had an average between-marker distance of 10 centiMorgans; in the current dense marker sets these distances can be reduced to <20 base pairs. Combined with the recruitment of large DNA collections from patients with SLE and controls, these marker sets have enabled extensive fine mapping in candidate regions, which in several cases has led to the identification of causal variants (Table 1).

Table 1 Systemic lupus erythematosus (SLE) candidate genes identified or confirmed in recent studies

The last couple of years have also seen the application of the GWA study design, with its ability to screen hundreds of thousands of SNPs across the genome without previous knowledge of candidate regions or genes. To date, results from five GWA studies in SLE have been reported13,14,15,16,17, which have identified and robustly replicated several novel loci (ITGAM, BLK, BANK1, KIAA1542, PXK and TNFAIP3) (Table 1), confirmed association at a number of other previously implicated loci, and generated a large second tier of candidate loci (p values of 10⁻⁵ to 10⁻⁶) for further study. Before 2007, there were nine confirmed SLE susceptibility loci. With the progress made in the past 2 years by the utilization of high-density genotyping capabilities, there are now more than 20 loci identified that show robust association to SLE.
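The distinction between robustly associated loci and a second tier of candidates can be illustrated with a small sketch. It assumes a Bonferroni correction at a family-wise error rate of 0.05 and a hypothetical panel of 500,000 SNPs; the text does not state which significance threshold the five studies applied, so the function names, the panel size and the tier boundary of 10⁻⁵ are illustrative assumptions only.

```python
def bonferroni_threshold(n_tests: int, alpha: float = 0.05) -> float:
    """Per-SNP p-value threshold that keeps the family-wise error rate at alpha."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


def classify_signal(p_value: float, n_tests: int, tier2_upper: float = 1e-5) -> str:
    """Sort an association signal into illustrative tiers.

    tier2_upper is an assumed upper bound for the second tier of candidates;
    it is not a value prescribed by any of the cited studies.
    """
    if p_value <= bonferroni_threshold(n_tests):
        return "genome-wide significant"
    if p_value <= tier2_upper:
        return "second tier"
    return "not prioritized"


if __name__ == "__main__":
    n_snps = 500_000  # hypothetical panel size
    for p in (3e-9, 4e-6, 1e-3):  # hypothetical p values
        print(p, classify_signal(p, n_snps))
```

With half a million tests the Bonferroni threshold is about 10⁻⁷, which is why signals in the 10⁻⁵ to 10⁻⁶ range are treated as candidates that need replication rather than as established loci.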

Biological pathways involved in SLE

The genetic associations identified to date indicate that many different pathways, processes and cell types are involved in generating the SLE phenotype (Fig. 1). Although this interpretation is biased by our prior beliefs and application of parsimony when assigning roles to the variants identified, these findings have reinforced our pre-existing understanding of SLE pathophysiology obtained from immunochemistry, animal studies and other diseases, and have refined our understanding to the molecular level. Most of these genes are involved in three types of biological process: immune complex processing; Toll-like receptor (TLR) function and type I interferon production; and immune signal transduction in lymphocytes.

Figure 1: Pathways that contain established candidate systemic lupus erythematosus (SLE) susceptibility loci.

Genes involved in each pathway are indicated. BANK1, B-cell scaffold protein with ankyrin repeats 1; BLK, B lymphoid tyrosine kinase; C1Q, complement component 1, q subcomponent; C2, complement component 2; C4, complement component 4; CRP, C-reactive protein, pentraxin-related; FcGR2A, Fc fragment of IgG, low affinity IIa, receptor (CD32); FcGR3A, Fc fragment of IgG, low affinity IIIa, receptor (CD16a); HLA-DR, major histocompatibility complex, class II, DR; IFN, interferon; IRAK1, interleukin 1 receptor-associated kinase 1; IRF5, interferon regulatory factor 5; ITGAM, integrin, alpha M; MECP2, methyl CpG binding protein 2; PDCD1, programmed cell death 1; PTPN22, protein tyrosine phosphatase, non-receptor type 22; PXK, PX domain containing serine/threonine kinase; STAT, signal transducer and activator of transcription; TLR, Toll-like receptor; TNFAIP3, tumour necrosis factor-α induced protein 3; TNFSF4, tumour necrosis factor superfamily, member 4; TREX1, three prime repair exonuclease 1; XKR6, XK, Kell blood group complex subunit-related family, member 6.

First, defects in apoptotic cell clearance, processing and presentation to lymphocytes — processes that are mediated by antigen-presenting cells — have been implicated in the development of SLE. Alleles at certain loci for which association with SLE has been identified or confirmed (for example, HLA-DR genes, CRP and genes encoding Fc fragment receptors) might affect the way that the encoded proteins react with immune complexes, providing molecular support for immune complex processing as an important theme in SLE pathogenesis (Fig. 2a). This suggestion is bolstered by the low levels of proteins involved in the complement cascade in the circulation of patients with active SLE, and by the association of SLE with the absence of complement proteins as a consequence of homozygous null alleles at any one of a number of classic complement pathway loci. ITGAM, also known as CD11b or complement receptor 3 (CR3), is the newest member of this pathway to be convincingly associated with SLE, and was identified simultaneously in two GWA studies and in an independent positional cloning study14,15,18. ITGAM, which encodes the α-chain of the αMβ2 integrin, is an integrin adhesion molecule that binds not only the complement cleavage fragment of C3b, but also a myriad of other possible ligands that are relevant to SLE. A strong candidate non-synonymous polymorphism, H77R, has been identified and seems to explain the entire association effect18 — this should help to differentiate between the possible ligands. H77R seems to cause significant structural changes to the ligand-binding domain of αMβ2 (Ref. 18). In addition, alloantibodies that are reactive against this polymorphism block the αMβ2-dependent adhesion of neutrophils to endothelial cells19.

Figure 2: Pathways in which identified systemic lupus erythematosus (SLE) risk alleles operate.

Note that these are proposed models and it is possible, even likely, that the mechanisms by which these genes confer risk for SLE involve additional pathways. a | Phagocytosis. A presumed environmental trigger (for example, UV irradiation or viral infection) or dysregulated apoptosis leads to activation of antigen-presenting cells. These cells phagocytose self-antigen that is coated by opsonin molecules (for example, C3b), which are bound by their receptors (for example, ITGAM and ITGB2), leading to subsequent antigen-presenting cell activation and presentation of self-antigen to host lymphocytes. This pathway potentially plays a part in disease initiation and perpetuation. In terms of initiation, antigen-presenting cell hyperactivation leads to loss of self-tolerance. In terms of perpetuation, when immune complexes are not cleared, this leads to the production of autoantibodies. b | Type I interferon production. Recent data suggest that TREX1 digests cytosolic DNA and prevents activation of a cell-intrinsic type I interferon (notably, interferon-α) response pathway38. Similarly, activation of Toll-like receptors (TLR7, TLR8, and TLR9) on ligand recognition (CpG DNA or ssRNA) leads to the production of type I interferon by immune cells, particularly plasmacytoid dendritic cells, and the interferon responsive gene expression signature that is observed in SLE serum. c | Immune signal transduction. Various stages in the life cycle of lymphocytes are important for the development of the autoreactive B cell clones, which produce the pathological autoantibodies observed in SLE. Here we focus on activation events. Self-antigen recognition by B cells starts at the B cell receptor (membrane immunoglobulin M, IgM), where the balance of positive signals (B cell receptor crosslinking) and negative signals (FCGR2B ligation) is transduced by intracellular kinases, such as BLK and BANK1, leading to B cell activation. A similar process, leading to T cell activation, occurs after uptake of the self-antigen and presentation on a class II major histocompatibility complex (MHC) molecule, such as HLA-DR, to a CD4+ T lymphocyte, which subsequently provides 'help' to B lymphocytes. It should be noted that autoreactive clones must avoid deletion before activation events can lead to florid autoimmunity. BANK1, B-cell scaffold protein with ankyrin repeats 1; BCR, B cell receptor; BLK, B lymphoid tyrosine kinase; C1Q, complement component 1, subcomponent q; C2, complement component 2; C3, complement component 3; C3a, C3 cleavage product a; C3b, C3 cleavage product b; C4, complement component 4; CD274, programmed cell death 1 ligand 1 precursor; CD4, CD4 molecule; CRP, C-reactive protein; FCGR, Fc fragment of IgG receptor; HLA-DR, major histocompatibility complex, class II, DR; IFN, interferon; IgG, immunoglobulin G; IRAK1, interleukin 1 receptor associated kinase-1; IRF, interferon regulatory factor; ITGAM, integrin, alpha M; ITGB2, integrin, beta 2; MHC2, CD74 molecule, major histocompatibility complex, class II invariant chain; PDCD1, programmed cell death 1; PDCD1LG2, programmed cell death 1 ligand 2 precursor; PTPN22, protein tyrosine phosphatase, non-receptor type 22; STAT, signal transducer and activator of transcription; TCR, T cell receptor; TLR, Toll-like receptor; TNFAIP3, tumour necrosis factor-α induced protein 3; TNFRSF4, tumour necrosis factor receptor superfamily, member 4; TNFSF4, tumour necrosis factor superfamily, member 4; TREX1, three prime repair exonuclease 1.

Second, interferons have been implicated in SLE pathophysiology since the 1970s20, and this has been supported by a range of more recent studies21,22. Type I interferon production is induced by immune complexes containing self-antigens and nucleic acids, which signal through TLR7 and TLR9 (Fig. 2b). Several SLE genes that were recently identified through candidate gene and GWA studies (IRAK1, TREX1, IRF5 and TNFAIP3, for example14,15,17,23,24,25,26,27,28) encode components of pathways upstream and downstream of type I interferon production. Understanding how these genes are involved in SLE aetiology will be vital as the overproduction of type I interferon can promote the expression of proinflammatory cytokines and chemokines, the maturation of dendritic cells, the activation of autoreactive B and T cells, the production of autoantibodies, and loss of self-tolerance29. The identification of specific genes involved in the type I interferon pathway promises to add to our understanding of SLE pathophysiology in two ways: they provide evidence that will be useful in determining the cells and pathways that drive SLE-relevant interferon production, and they should help to narrow down which of the responses to type I interferon are involved in SLE.

Third, signal transduction in immune cells, especially in B and T cells, is another pathway that has been shown to contain multiple SLE susceptibility genes (Fig. 2c). The activation of B cells through antigen-mediated crosslinking of the B cell receptor (surface immunoglobulin M) and subsequent interaction of autoreactive B cell clones with T helper 2 cells leads to loss of self-tolerance and autoimmunity. B and T cells have long been known to be involved in SLE pathogenesis, and signal transduction pathways involving these cells have been previously implicated. For example, PTPN22 is a selective phosphatase that modulates signal transduction in T cells, and represents a case in which a causal variant has been identified that contributes to disease susceptibility. The known R620W (1858C to T) risk allele is a gain-of-function variant with increased catalytic activity compared with the non-risk variant, and is thought to be a more potent suppressor of T cell receptor signalling30,31. This polymorphism is more common in northern Europeans (8–15%) than in southern Europeans (2–10%), and is almost absent in Asian and African populations32.

Another PTPN22 polymorphism, the loss-of-function mutation R263Q, which lies in the catalytic domain of the encoded phosphatase and leads to reduced phosphatase activity, has recently been identified33. Recent GWA studies for SLE have also identified new associations with other genes in B and T cell signalling pathways (for example, BANK1 and BLK), renewing investigators' attention to the mediation of B and T cell responses. BANK1 is thought to alter B cell activation to increase SLE risk, whereas BLK is thought to influence B cell tolerance and may affect mature B cell function15,16. Studies to uncover the exact function of BLK and BANK1 in SLE are currently underway, and they have the potential to provide new knowledge about the molecular pathways that affect B cell responses when exposed to antigen.

Finally, potentially the most informative results of candidate and GWA studies concern those loci that have no obvious connection to pathways that have been previously implicated in SLE (for example, PXK, XKR6 and KIAA1542). Elucidating the pathophysiological mechanisms underlying the association at these loci will be difficult. A striking example of this is XKR6, a member of a novel family of PDZ conserved binding motif-containing proteins that shares homology with the Caenorhabditis elegans gene ced-8, which is implicated in regulating the timing of apoptosis34. XKR6 contains an intronic microRNA gene, mir-598, which is highly expressed in human peripheral blood mononuclear cells, especially activated B cells35. Identifying the causal variant in this case is likely to be a complex undertaking: the signal observed at XKR6 might be due to mutations affecting the function of XKR6 or of miR-598. Alternatively, the signal may be a proxy for an association with another gene in this region: a polymorphic inversion under apparent selection pressure on chromosome 8p23 encompasses the XKR6, C8orf12 and BLK genes, all of which have been implicated in SLE risk in GWA studies36. All of these scenarios are plausible and fit with our current understanding of SLE pathophysiology, and they may not be mutually exclusive.

Associations with SLE in the MHC region

The discovery that the major histocompatibility complex (MHC) region confers risk of SLE marked the inception of genetic studies of this disease. However, the unusually complex linkage disequilibrium structure of this locus, which extends 7.2 Mb (Ref. 14) across >400 genes in European-derived subjects, has hindered efforts to dissect the variants responsible for the considerable risk that this region confers in SLE. A recent meta-analysis of the results from the past 30 years of research found that the most consistent human leukocyte antigen (HLA) associations with SLE for MHC class II alleles in European populations were HLA-DR3 and HLA-DR2 (Ref. 37). However, in the recently published GWA results from European-derived women, the greatest association with SLE was with the MSH5 gene, which is found in the MHC class III region14. Further study is required to determine whether MSH5, or one of its close neighbours, is an SLE risk factor that is independent of the HLA-DR genes that have been so frequently associated with SLE. The structures of MHC haplotypes differ between populations, and evaluation in non-Europeans has revealed that other alleles (for example, HLA-DR4) confer susceptibility to SLE in these populations37.

Although this region is one of the most extensively studied regions of the human genome, the precise contribution to the overall genetic risk of SLE remains to be determined. Therefore, studies performed in much larger cohorts that evaluate the entire MHC locus rather than specific regions, and that are inclusive of non-Europeans, have the potential to increase our understanding of SLE pathogenesis.

Genetic models of SLE risk

Given the large number of SLE susceptibility loci now known, important questions can be addressed. How many more loci are likely to be implicated in SLE pathogenesis? And what is their impact in terms of contribution to risk? Available evidence suggests that the genetic risk for SLE is derived from variation in many (perhaps as many as 100) genes, each of modest effect size (odds ratios 1.15 to 2.4) (see Table 1 for the 20 currently established SLE genes). Therefore, the genetic architecture of SLE more closely resembles that of Crohn's disease, which has >30 susceptibility loci. This contrasts with rheumatoid arthritis, an autoimmune disease with which SLE shares common elements of pathophysiology and some susceptibility loci (TNFAIP3, STAT4 and PTPN22) and for which GWA studies have yielded fewer genes (<10).
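As an illustration of what a modest effect size means in practice, the following sketch computes an allelic odds ratio and an approximate 95% confidence interval (Woolf's logit method) from a 2 × 2 table of allele counts. The counts are hypothetical and are not taken from Table 1 or from any cited study; the function name and its parameters are likewise assumptions made for this example.

```python
import math


def allelic_odds_ratio(case_risk, case_other, control_risk, control_other, z=1.96):
    """Return (odds ratio, lower bound, upper bound) for a 2 x 2 allele-count table.

    The interval uses Woolf's logit method and assumes all four counts are positive.
    """
    counts = (case_risk, case_other, control_risk, control_other)
    if min(counts) <= 0:
        raise ValueError("all allele counts must be positive")
    odds_ratio = (case_risk * control_other) / (case_other * control_risk)
    # Standard error of log(OR) is the square root of the summed reciprocal counts.
    se_log_or = math.sqrt(sum(1.0 / c for c in counts))
    log_or = math.log(odds_ratio)
    return odds_ratio, math.exp(log_or - z * se_log_or), math.exp(log_or + z * se_log_or)


if __name__ == "__main__":
    # Hypothetical allele counts: 1,000 cases and 1,000 controls (2,000 alleles each).
    estimate, lower, upper = allelic_odds_ratio(600, 1400, 500, 1500)
    print(f"OR = {estimate:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

With these invented counts the odds ratio falls near the lower end of the 1.15 to 2.4 range quoted above, and the width of the interval shows why large case and control collections are needed to establish effects of this size.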

Interestingly, none of the identified associations in SLE exhibits evidence of epistasis. A stepwise multiple logistic regression analysis suggests that variants in PXK, IRF5, KIAA1542, ITGAM and the HLA region genes act independently14. When considered jointly, these variants explain 15% of the SLE sibling risk ratio of 8 to 29 and are strongly predictive of SLE, with a sensitivity and specificity comparable to those of some clinical tests14,38. However, this estimate of 15% needs independent replication as it is likely to be an overestimate, biased by the fact that it is based on the influence of these variants in the samples in which the associations were initially discovered. As the number of robust genetic associations grows and their modes of inheritance are described, this new knowledge is likely to find application in new diagnostic and management strategies for SLE. In particular, the unusual clinical heterogeneity of SLE coupled with its clear genetic diversity argues for genetic tests that would classify the disease into subtypes, which might guide preventive and therapeutic strategies.
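One common way to express the share of a sibling risk ratio that a set of independent loci accounts for is sketched below. It is not necessarily the calculation used in Ref. 14. It assumes a multiplicative allelic model at each locus, treats the odds ratio as an approximation of the allelic relative risk, and combines loci on the logarithmic scale, which is consistent with the absence of epistasis noted above. The allele frequencies and relative risks in the usage example are hypothetical.

```python
import math


def locus_sibling_risk_ratio(risk_allele_freq: float, allelic_relative_risk: float) -> float:
    """Sibling risk ratio contributed by one biallelic locus under a multiplicative model.

    With allele frequency p, q = 1 - p and allelic relative risk g:
        w = p * q * (g - 1)^2 / (p * g + q)^2
        lambda_s = (1 + w / 2)^2
    """
    p = risk_allele_freq
    if not 0.0 < p < 1.0:
        raise ValueError("risk_allele_freq must lie strictly between 0 and 1")
    q = 1.0 - p
    g = allelic_relative_risk
    w = p * q * (g - 1.0) ** 2 / (p * g + q) ** 2
    return (1.0 + w / 2.0) ** 2


def fraction_of_sibling_risk_explained(loci, total_lambda_s: float) -> float:
    """Share of log(total lambda_s) explained by independent loci.

    loci is an iterable of (risk_allele_freq, allelic_relative_risk) pairs.
    """
    explained = sum(math.log(locus_sibling_risk_ratio(p, g)) for p, g in loci)
    return explained / math.log(total_lambda_s)


if __name__ == "__main__":
    # Hypothetical loci: (risk allele frequency, allelic relative risk).
    example_loci = [(0.10, 1.8), (0.20, 1.5), (0.35, 1.3), (0.15, 2.4)]
    for total in (8.0, 29.0):  # the range of sibling risk ratios quoted in the text
        share = fraction_of_sibling_risk_explained(example_loci, total)
        print(f"lambda_s = {total:>4}: {share:.1%} explained")
```

Because the calculation divides by the logarithm of the overall sibling risk ratio, the same set of loci explains a smaller share when the ratio is taken as 29 than when it is taken as 8, which is one reason such percentages should be read together with the assumed overall ratio.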

Challenges and future directions

The past few years have seen tremendous success in the identification of SLE susceptibility genes, with at least 20 robustly associated loci that contribute to disease risk. However, it is likely that many more remain to be discovered. In terms of future studies, the GWA design has its limitations. First, because GWA studies rely on tagging common haplotype blocks, association signals from these studies are more likely to identify a marker in strong linkage disequilibrium with a causal variant than they are to identify the actual causal variant. Second, they are unlikely to have the power to detect association at some SLE susceptibility loci that have already been robustly replicated for association with SLE. For example, the FCGR gene cluster on chromosome 1q23.3 has multiple ancient (as well as modern) gene duplications and rearrangements. Consequently, marker coverage at these loci is sparse. Similarly, because GWA studies assay common variants, rare risk variants (such as those described in TREX1 and variants of the complement component genes C2, C4 and C1Q) are unlikely to be detected by a GWA study. Indeed, GWA studies are generally underpowered to detect recessive modes of inheritance unless the risk allele is common. Furthermore, GWA studies have so far only been carried out for SNPs. However, there is increasing evidence that other types of common genetic variation (for example, copy number variants) contribute to complex disease, some of which have only recently been included in GWA study genotyping panels.
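The dependence of GWA signals on linkage disequilibrium between a genotyped marker and an untyped causal variant can be made concrete with the squared correlation r². The sketch below computes r² from haplotype and allele frequencies and applies the standard approximation that a study of N individuals typed at the marker has roughly the power of N × r² individuals typed at the causal variant itself. All frequencies and the sample size are hypothetical, and the function names are assumptions made for this example.

```python
def ld_r_squared(hap_freq_ab: float, freq_a: float, freq_b: float) -> float:
    """Squared correlation between alleles A (marker) and B (causal variant).

    hap_freq_ab is the frequency of the haplotype carrying both A and B;
    freq_a and freq_b are the two allele frequencies.
    """
    for f in (freq_a, freq_b):
        if not 0.0 < f < 1.0:
            raise ValueError("allele frequencies must lie strictly between 0 and 1")
    d = hap_freq_ab - freq_a * freq_b  # linkage disequilibrium coefficient D
    return d * d / (freq_a * (1.0 - freq_a) * freq_b * (1.0 - freq_b))


def effective_sample_size(n_individuals: int, r_squared: float) -> float:
    """Approximate sample size 'seen' at the causal variant through the marker."""
    return n_individuals * r_squared


if __name__ == "__main__":
    # Hypothetical frequencies for a marker and a causal variant in partial LD.
    r2 = ld_r_squared(hap_freq_ab=0.15, freq_a=0.30, freq_b=0.20)
    print(f"r^2 = {r2:.3f}")
    print(f"effective N = {effective_sample_size(2000, r2):.0f} of 2000")
```

Where marker coverage is sparse, as in duplicated regions such as the FCGR cluster, no genotyped SNP may reach a high r² with the causal variant, so the effective sample size and hence the power drop accordingly.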

Each newly identified association presents new challenges. Finding the causal variants, understanding how they affect disease pathophysiology and dissecting their contribution to SLE risk remain major undertakings. For some genes, the effect sizes or risk allele frequencies may be so small that larger collections of patients with SLE are needed to identify a sufficient number of patients with the responsible risk allele for subsequent functional studies. Studies to evaluate the molecular differences in gene regulation or function that are due to the supposed causative genetic risk variants (for example, protein expression level and cellular function differences between cases and controls) are needed to explore the possible mechanisms through which the causal variant generates disease risk. Even when the gene has an obvious potential to explain pathogenesis and to be a component of the mechanism of disease, some inferences concerning function may be flawed because of cryptic relationships that are still unknown.

It is important to note that many of the associations detected to date have been in European populations. Although some of these genes also associate with SLE in non-European populations (for example, FCGR2A, IRF5 and ITGAM), SLE associations are mostly unexplored in African, Asian and Hispanic ancestries, mainly because previous studies were underpowered owing to limited sample collections. Furthermore, meta-analyses are needed to improve power and capitalize on existing results. Finally, understanding how the implicated genes interact with the environment (for example, Epstein–Barr virus antigens and smoking) will be an important goal that has so far not been tackled.