INTRODUCTION

Erythropoietic protoporphyria (EPP [MIM 177000]) results from pathogenic variants in ferrochelatase (FECH), the last enzyme in the heme biosynthetic pathway, causing accumulation of the light-sensitive molecule protoporphyrin IX (PPIX) in erythrocytes and secondarily in the plasma and biliary system.1 Although ~4% of EPP patients have two rare pathogenic FECH variants, the classical molecular alteration present in ~96% of EPP patients is a rare pathogenic FECH variant in trans of a common intronic FECH variant c.315-48T>C (rs2272783, historically called IVS3-48T>C), which is known to increase the use of an aberrant splice site. This combination decreases FECH enzymatic activity to ~35%.2 Although light tolerance varies among individuals, EPP is fully penetrant after considering the contribution of the hypomorphic c.315-48T>C variant.2,3,4 Between 2% and 10% of patients with EPP symptoms instead have X-linked protoporphyria (XLPP [MIM 300752]), in which the EPP phenotype and biochemical changes result from gain-of-function variants in delta-aminolevulinate synthase-2 (ALAS2), the first enzyme of erythroid heme biosynthesis.5,6,7,8 Clinically, patients with all forms of protoporphyria experience severe lifelong painful cutaneous photosensitivity.1 While patients may also experience iron deficiency anemia and gallstones, the most life-threatening complication is rapidly progressive cholestatic liver failure, which is often fatal unless a liver transplant is performed.1,9

Based on the incidence of EPP in Europe, the prevalence of EPP was estimated to be 0.00092%.10 The calculated prevalence ranged from zero in Poland to 0.00277% in Norway.10 The prevalence in the UK was estimated to be 0.00254%.10 It is unclear to what extent these differences represent actual disparities in disease prevalence or simply differences in the likelihood of reaching a diagnosis in the respective countries. As patients generally experience a delay in diagnosis of more than a decade, the rate of EPP diagnosis may not be an adequate representation of the true disease incidence and prevalence.10,11,12,13 No study to date has estimated the prevalence of EPP using large genetic data sets.

A better understanding of EPP prevalence, if truly underdiagnosed, could encourage efforts to decrease the barriers that patients currently face prior to diagnosis and to develop novel therapeutics. Because a new treatment for EPP called afamelanotide has recently been FDA approved, these efforts are increasingly critical to bring this therapy to individuals who could benefit from it.14,15

MATERIALS AND METHODS

Ethics statement

All clinical data were de-identified, requiring only data use agreements. This analysis obtained exemption from the Partners Healthcare institutional review board (IRB) and was performed in accordance with relevant guidelines and regulations.

Pathogenic FECH variants in the UK Biobank

The UK Biobank is a data set of 500,953 individuals aged 40–69, including both clinical and genetic data.16 Genotypes were assessed for all 500,953 participants using the UK Biobank Affymetrix single-nucleotide polymorphism (SNP) array (Table 1). All of the variants in the array were directly genotyped and not determined by imputation. Exome sequencing was performed on 49,960 of the participants, but evidence for large deletions was not assessed. The functional equivalent (FE) exome format was analyzed in this study, and the FECH gene was not within the UK Biobank regions that experienced FE exome data quality concerns.

Table 1 The frequency of pathogenic FECH variants that result in EPP when in trans of c.315-48T>C.

The American College of Medical Genetics and Genomics (ACMG) and the Association for Molecular Pathology (AMP) recommendations were applied for the determination of variant pathogenicity, as described in the Supplemental methods.17,18,19 The estimation of the genetic prevalence of EPP was performed as described in the Supplemental methods, and code availability is addressed in the Supplemental methods.

The Porphyrias Consortium data set

The Porphyrias Consortium is a consortium of six university sites in the United States studying porphyria and additional satellite sites active in porphyria research.20 The Consortium’s Longitudinal Study of the Porphyrias (clinicaltrials.gov identifier: NCT01561157) collects genetic and clinical data on patients at these sites. The FECH gene and ALAS2 exon 11 were sequenced in all of the EPP patients, as previously described.7 Notably, all XLPP-causing ALAS2 pathogenic variants are in exon 11.5 Gene dosage analysis to evaluate for large deletions was performed on many but not all of the participants.

The Partners Biobank

The Partners Biobank is an ongoing research project that collects clinical, biochemical, and genetic data from patients seen at Massachusetts General Hospital, Brigham and Women’s Hospital, and other Partners Institutions. It currently contains clinical and biochemical data from 100,818 patients and genetic data in the form of a SNP array for 36,422 patients.16

The Women’s Genome Health Study

The Women’s Genome Health Study (WGHS) is a data set of genomic and biochemical data evaluating >25,000 healthy American women, with follow-up for over 23 years for major health events.21 Variants in FECH were identified in a subset of 22,618 WGHS participants with both European ancestry and genotype data available from the Exome Chip v1.1.

Clinical and biochemical associations with FECH variants in the UK Biobank

For the evaluation of associations between pathogenic FECH variants and EPP-related traits, the variant collapsing method to construct gene-based tests of association was used, as described in the Supplemental methods.22

Statistics

To test for categorical effects, values in each FECH variant group were compared with those with no FECH variants by t-test in a linear regression model. The linear effect for 0, 1, and 2 c.315-48T>C variants was also determined by linear regression. Meta-analysis was performed by inverse variance weighting (IVW). To evaluate for associations with hemoglobin <12.5 g/dL and with various diagnoses, logistic regression models were fit using glm in R v3.5.3. All analyses were adjusted for the top ten genetic principal components. P values for each of the comparisons are listed in Supplemental Table 3. Confidence intervals for allele frequencies and the genetic prevalence of EPP were calculated using nonparametric bootstrapping. Further details are provided in the Supplemental methods.
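As an illustration of the inverse variance weighting step, the following is a minimal sketch of a fixed-effect IVW pooled estimate. It assumes each cohort contributes one regression coefficient and its standard error; the function name `ivw_meta` and the numeric inputs are hypothetical and are not values from this study, whose actual procedure is described in the Supplemental methods.

```python
import math


def ivw_meta(estimates, std_errors):
    """Fixed-effect inverse-variance-weighted pooled estimate and its standard error."""
    # Each cohort is weighted by the reciprocal of its sampling variance.
    weights = [1.0 / se ** 2 for se in std_errors]
    total_weight = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, estimates)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return pooled, pooled_se


# Hypothetical per-cohort effects on MCV per c.315-48T>C allele (illustrative only).
betas = [-0.10, -0.06]
ses = [0.02, 0.08]
pooled, pooled_se = ivw_meta(betas, ses)
print(f"pooled effect = {pooled:.4f}, SE = {pooled_se:.4f}")
```

Because the weights are reciprocal variances, the larger and more precise cohort dominates the pooled estimate, which is why a small data set can be combined with a large one without distorting the result.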

RESULTS

Frequency of pathogenic FECH variants

Among the 49,960 exome sequences available in the UK Biobank, 54 individuals had FECH variants that are pathogenic for EPP when in trans of the c.315-48T>C variant, with pathogenicity determined according to the ACMG criteria (Table 1, Table 2, Supplemental Table 1).17 Furthermore, among the 38,841 unrelated individuals of European ancestry, 0.118% (95% confidence interval [CI]: 0.0849–0.152%) of individuals (46 total) had one variant that is pathogenic for EPP when in trans of c.315-48T>C. The allele frequency of c.315-48T>C in this population was 0.0439. Consequently, the estimated genetic prevalence of EPP in individuals of European descent in the UK was 0.0052% (95% CI: 0.0036%, 0.0068%; see Supplemental methods).4 There were no statistically significant differences between the allele frequencies in the UK Biobank, gnomAD, Partners Biobank, and WGHS data sets, suggesting that the UK Biobank is not an outlier in terms of the frequencies of pathogenic FECH variants.
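The point estimate above can be reproduced with a simplified calculation, sketched here under stated assumptions: a carrier of one rare pathogenic FECH variant develops EPP when the other chromosome carries c.315-48T>C, and the two are inherited independently. The function name `genetic_prevalence` is hypothetical, and the authors' full method (including the bootstrap confidence intervals) is the one given in the Supplemental methods.

```python
def genetic_prevalence(carrier_fraction, hypomorph_allele_freq):
    """Approximate fraction of people carrying a rare pathogenic variant
    in trans of the hypomorphic allele, assuming independent inheritance."""
    return carrier_fraction * hypomorph_allele_freq


carriers = 46          # unrelated European-ancestry individuals with one pathogenic variant
sequenced = 38_841     # unrelated European-ancestry individuals with exome data
q = 0.0439             # allele frequency of c.315-48T>C in this population

carrier_fraction = carriers / sequenced
prevalence = genetic_prevalence(carrier_fraction, q)

print(f"carrier fraction:   {carrier_fraction:.3%}")
print(f"genetic prevalence: {prevalence:.4%}")
```

Under these assumptions, the product of the carrier fraction and the c.315-48T>C allele frequency rounds to the reported 0.0052%.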

Table 2 The prevalence of EPP and the frequency of pathogenic FECH variants among unrelated individuals of European descent.

Calculating EPP prevalence based on exomic variant frequency alone would lead to an underestimate of EPP prevalence for two reasons. First, it has been described that a significant portion of pathogenic FECH variants found in trans of c.315-48T>C are large deletions (10% in the UK EPP data set), which are not assessed in the UK Biobank and are difficult to detect (Fig. 1).8,23 Second, the analysis is predicated on having observed any particular variant in a well-characterized EPP patient; however, 4.1% of known EPP patients in the Porphyrias Consortium data set and a published UK EPP data set have private missense or splice region variants not found in other published EPP patient data sets. In the absence of an EPP diagnosis, those variants would not have been identified as pathogenic. Notably, the UK Biobank contains a number of FECH variants of uncertain significance, including a total of 37 missense variants in 143 individuals that are rare (minor allele frequency <0.0001) and predicted to be deleterious by SIFT and probably damaging by PolyPhen. These analytical biases were corrected for as described in the Supplemental methods, a correction achieved by comparing the frequencies of variant types in the UK Biobank, the Porphyrias Consortium data set, and in a previously published data set of EPP patients in the UK (Fig. 1).8 Including this estimate of unidentified pathogenic FECH variants, the calculated EPP genetic prevalence increased to 0.0059% (95% CI: 0.0042–0.0076%), which suggests that EPP may be 2.3 (95% CI: 1.7–3.0) times more common than previously thought in the UK (detailed calculations provided in Supplemental methods).
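The fold difference quoted above follows from dividing the corrected genetic prevalence by the diagnosis-based UK estimate of 0.00254%. The sketch below is illustrative only: it assumes the diagnosis-based figure is treated as a fixed denominator with no uncertainty of its own, and the helper name `fold_difference` is hypothetical.

```python
def fold_difference(genetic_pct, diagnosed_pct):
    """Ratio of genetically estimated prevalence to diagnosis-based prevalence."""
    return genetic_pct / diagnosed_pct


diagnosed_uk = 0.00254                       # %, diagnosis-based UK estimate
point, low, high = 0.0059, 0.0042, 0.0076    # %, corrected genetic prevalence and 95% CI

for label, value in (("estimate", point), ("lower bound", low), ("upper bound", high)):
    print(label, round(fold_difference(value, diagnosed_uk), 1))
```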

Fig. 1: Pathogenic FECH variants in the UK Biobank and the Porphyrias Consortium data set, alongside published erythropoietic protoporphyria (EPP) patient data from the UK.

The FECH variants in the UK Biobank exome data and the Porphyrias Consortium data sets are displayed alongside published EPP epidemiologic data from the UK. Variants are displayed according to their predicted consequence, regardless of whether or not a more important consequence has also been determined in vitro. Large deletions are not reported in the UK Biobank. Each color depicts a different variant. No variant is duplicated between the seven predicted consequences. Both the c.315-48T>C allele and variants that are only pathogenic when inherited with another rare variant have been excluded from the above figure. The Porphyrias Consortium data set subset possessing both c.315-48T>C and one rare variant, n = 213; the same subset in the published UK EPP patient data set, n = 179; UK Biobank subset of those with 1 rare pathogenic variant, n = 54.

This estimate may still be an underestimate because it is only the estimate for EPP patients with a rare pathogenic variant in trans of c.315-48T>C and does not account for the possibility of two rare variants causing EPP, which is described in 2–4% of cases.7,8 The UK Biobank contains nine variants that have previously been found in at least one EPP patient, but only in trans of another rare allele. Only three of these met criteria for pathogenic/likely pathogenic according to the ACMG criteria (Supplemental Tables 1, 2).24 Additional enzymatic data for the six other variants would not change the EPP prevalence determination in this study because if pathogenic, it is still not clear if these variants can cause EPP in trans of c.315-48T>C or if all combinations of these rare variants can cause EPP. Notably, there were no ALAS2 variants pathogenic for XLPP in the data set, nor were there ALAS2 exon 11 truncation variants that would be expected to cause XLPP.

Given the small number of rare pathogenic variants in the data set, only the phase of the p.(Pro334Leu) and c.315-48T>C variants in the exome sequences could be computationally predicted, identifying a trans orientation in two of three individuals possessing both of these variants. Therefore, these two individuals are expected to have EPP. This is the same number of individuals predicted to have EPP in the exome data (Table 2).

In the UK Biobank, only three individuals carried an International Statistical Classification of Diseases and Related Health Problems, Tenth Revision (ICD-10) code diagnosis of “hereditary erythropoietic porphyria,” which includes EPP, XLPP, and congenital erythropoietic porphyria (CEP [MIM 263700]) (Table 3). If all have EPP, the estimated prevalence of EPP among unrelated individuals of European descent in the UK Biobank is 7.5 (95% CI: 5.3–9.6) times greater than that of diagnosed EPP in the same population (0.00079%). CEP is extremely rare with an estimated prevalence of <0.0001%; consequently, a diagnosis of CEP is far less likely than a diagnosis of EPP or XLPP.25 While none of these three participants had identified pathogenic FECH or ALAS2 variants other than c.315-48T>C in FECH, few pathogenic loci were assessed as none of these individuals received exome sequencing.

Table 3 EPP-related clinical and laboratory findings according to FECH variant category.

The hypomorphic c.315-48T>C allele distribution among major ethnicities in the UK Biobank

Similar to previously published data, people of Chinese descent in the UK Biobank have the highest prevalence of c.315-48T>C, with 42.1% heterozygous and 8.8% homozygous individuals (Fig. 2).2 In contrast, in people with British ancestry, the percentages of c.315-48T>C heterozygotes and homozygotes are 8.3% and 0.2%, respectively. After stratification for ethnicity, the occurrence of c.315-48T>C was found to comply with Hardy–Weinberg equilibrium (data not shown). Due to the limited sample sizes available for non-European ethnic groups in the UK Biobank, there was insufficient information to assess for population differences in the frequencies of rare pathogenic FECH variants. However, differences in c.315-48T>C frequency were significant among ethnicities (P < 2.2e-16).
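The consistency of these genotype fractions with Hardy–Weinberg equilibrium can be illustrated with a short calculation. This is a minimal sketch that compares observed fractions with the expectations 2pq and q², not the formal test used in the study; the function name `hwe_expected` is hypothetical.

```python
def hwe_expected(het_fraction, hom_fraction):
    """Return the minor allele frequency q and the heterozygote and homozygote
    fractions expected under Hardy-Weinberg equilibrium (2pq and q^2)."""
    q = hom_fraction + het_fraction / 2   # each heterozygote carries one copy
    p = 1 - q
    return q, 2 * p * q, q * q


for group, het, hom in (("Chinese", 0.421, 0.088), ("British", 0.083, 0.002)):
    q, exp_het, exp_hom = hwe_expected(het, hom)
    print(f"{group}: q = {q:.4f}, expected het = {exp_het:.3f}, expected hom = {exp_hom:.4f}")
```

For both groups the expected heterozygote and homozygote fractions fall close to the observed percentages, and the allele frequency derived for the British group is close to the 0.0439 reported for unrelated individuals of European ancestry.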

Fig. 2: The c.315-48T>C allele among major ethnicities in the UK Biobank.

The fractions of individuals with 0, 1, or 2 c.315-48T>C variants in the UK Biobank are depicted according to the major documented ethnicities. The numbers of individuals with each ethnicity are listed along with the minor allele frequency. Each ethnicity category with n > 1000 was included. These ethnicities are listed exactly as they were in the UK Biobank data set. MAF minor allele frequency.

Clinical and biochemical associations with FECH variants in the UK Biobank

Among unrelated individuals of European ancestry in the UK Biobank, the c.315-48T>C variant was significantly associated with decreased erythrocyte mean corpuscular volume (MCV) (P = 3.84e-5, Table 3). When the MCVs of those heterozygous or homozygous for c.315-48T>C were compared with those with no FECH variants separately, each comparison revealed a statistically significant decrease in MCV (P = 0.0005, P = 0.009, respectively, Table 3). In those who were compound heterozygotes for c.315-48T>C and a rare pathogenic FECH variant, MCV was significantly decreased compared with those with no FECH variants, despite there being only 13 compound heterozygotes among individuals of European ancestry (P = 0.009, Table 3). However, none of these individuals had a diagnosis of EPP or an ICD-10 diagnosis code associated with photosensitivity. Additional computational corrections for variants in the SNP array that are associated with iron deficiency, namely rs4820268 in transmembrane protease serine 6 (TMPRSS6) and rs3811647 in transferrin (TF), did not affect these associations (data not shown).26 Common FECH variants that are genetically linked with c.315-48T>C, including IVS1-23C>T, were also analyzed for associations with MCV, and none of these associations were significant after correcting for c.315-48T>C (data not shown).

Regarding other biochemical and clinical features, hemoglobin was lower in homozygotes for c.315-48T>C compared with those with no FECH variants (P = 0.017, Table 3). Furthermore, in c.315-48T>C homozygotes there was a 1.4-fold increased prevalence of individuals with hemoglobin <12.5 g/dL (P = 0.003). There was no statistically significant association between the presence of a FECH variant and the diagnosis of anemia, gallstones, or liver disease (Table 3). There was no significant association between c.315-48T>C and MCV or between c.315-48T>C and hemoglobin in the much smaller Partners Biobank data set, a data set of individuals presenting for clinical care within the Partners Healthcare System (Table 3). The number of unrelated individuals of European descent in the Partners Biobank was 6.8% that of the UK Biobank (25,696 versus 379,390). Additionally, the hemoglobin of individuals of European descent in the Partners Biobank was lower than in the UK Biobank data set of healthy volunteers (P < 2.2e-16). In a second subset of 63,670 unrelated individuals of European descent in the UK Biobank, a group who were excluded from the first subset of 379,390 individuals due to genetic relatedness, there was again a significant association between MCV and c.315-48T>C (Table 3).

A meta-analysis of the Partners Biobank and two unrelated ethnic populations within the UK Biobank confirmed a significant linear association between MCV and c.315-48T>C (Table 3). In the meta-analysis, homozygotes for c.315-48T>C similarly had significantly lower MCV and hemoglobin and higher prevalence of hemoglobin <12.5 g/dL versus those with no FECH variants (Table 3). The second subset of unrelated individuals of European descent in the UK Biobank could not be included in this meta-analysis, as all of the individuals in the second subset were genetically related to those in the first.

DISCUSSION

The frequency of pathogenic FECH variants in the UK Biobank provides evidence that EPP is underdiagnosed. Based on allele frequencies of c.315-48T>C and rare pathogenic FECH variants in the UK Biobank, the estimated genetic prevalence of EPP in individuals of European ancestry in the UK is 0.0052% (95% CI: 0.0036%, 0.0068%), and this estimate increases to 0.0059% (95% CI: 0.0042–0.0076%) when correcting for pathogenic variants that are not reported in the UK Biobank exomic sequences and for variants that cannot be identified as pathogenic due to insufficient clinical data. This calculated estimate is 2.3 (95% CI: 1.7–3.0) times higher than the estimate in the UK, which is based on the rate of diagnosis of EPP.10 Furthermore, the estimated genetic prevalence of EPP among unrelated individuals of European descent in the UK Biobank is 7.5 (95% CI: 5.3–9.6) times higher than the prevalence of diagnosed EPP in the same population (0.00079%). The significant decrement in MCV in the 13 compound heterozygotes identified in Table 3, which was not present in individuals with one rare pathogenic FECH variant, suggests that some of these individuals likely have undiagnosed EPP, or possibly subclinical EPP. In addition to shedding light on the prevalence of EPP, this study also demonstrates a novel association between the c.315-48T>C variant and both MCV and hemoglobin.

This study’s calculation of EPP prevalence is predicated on complete penetrance of the disorder, after accounting for the role of c.315-48T>C. Decreased penetrance in EPP cannot be excluded, but is probably rare, if present at all, as it has not previously been reported. Among more than 155 family cohorts of EPP patients that have been published in the literature, no occurrence of a nonpenetrant disease-associated genotype has been reported.2,27,28 Only one case of subclinical EPP has been described in the literature, in an individual with p.(Pro334Leu) in trans of c.315-48T>C. This individual had anemia and elevated PPIX, but no reported photosensitivity (Supplemental Table 1).29 Although none of the 13 individuals with both a pathogenic variant and c.315-48T>C in this study had a diagnosis code associated with photosensitivity, it may be expected that undiagnosed adult EPP patients would no longer seek medical attention for their symptoms, as even diagnosed EPP patients report negative experiences telling physicians about their symptoms.13 Furthermore, because there may be no visible skin changes despite severe pain, physicians may not diagnose it as a form of photosensitivity.30 In EPP, there are likely environmental or genetic factors affecting a patient’s degree of photosensitivity that have yet to be identified, which could also influence the likelihood of diagnosis. Future studies to better understand the variable expressivity in EPP could pave the way for new treatments.

This study provides evidence that EPP is underdiagnosed, which should encourage efforts to decrease the many barriers that EPP patients face in their attempt to find a diagnosis. First, because few physicians know about EPP and because of the minimal visible skin changes in some patients despite severe pain, there exists a large barrier to a physician’s consideration of EPP in the differential diagnosis, even among specialists.30 This could be remedied through solutions such as routine genomic sequencing and electronic medical record clinician decision support tools, as well as increased physician awareness of the disease. A new treatment for EPP called afamelanotide was approved in Europe a few years ago and recently approved by the US FDA; consequently, efforts to identify patients are increasingly important to bring this and future new therapies to individuals who could benefit from them.14,15 A second barrier in the United States is in the laboratory diagnosis of EPP, which must include measurement of erythrocyte total and metal-free protoporphyrin. Some major commercial labs, such as Quest and LabCorp, provide a “free erythrocyte protoporphyrin” test, which is actually a zinc protoporphyrin test.31 Because zinc protoporphyrin is often normal in EPP, the diagnosis may be missed.2,31 Because EPP has been considered rare, there has to date been insufficient motivation to address this problem.

Only two studies have described a role for the hypomorphic c.315-48T>C allele in patients outside of classical EPP, studies that described an incomplete EPP phenotype in four individuals homozygous for this allele.32,33 However, the presence of a deletion or cryptic intronic variant was not excluded; cryptic intronic variants were recently discovered in four individuals homozygous for c.315-48T>C and with the EPP phenotype.27 Interestingly, based on the computationally assessed pattern of growth of c.315-48T>C in Asian populations over time, there is evidence for positive selection for the variant in this population.2 In other genetic diseases, such as sickle cell anemia (MIM 603903), a protective effect of pathogenic variants has been demonstrated for particular clinical outcomes, accounting for the persistence of the variants in the population, so a similar selective advantage of c.315-48T>C is possible, perhaps related to anemia.34 Notably, in vitro studies revealed that red cells from EPP patients were resistant to malaria, an effect that was secondary to FECH deficiency and not PPIX accumulation, as this resistance to malaria was not present in red cells from individuals with XLPP.34

No previous study has demonstrated a significant association between the c.315-48T>C allele and either MCV or hemoglobin. A decrement in MCV and hemoglobin was not reproduced in individuals with one rare disease-causing allele, likely due to the small sample size. In the Partners Biobank, no significant association was detected between MCV and c.315-48T>C, possibly again an effect of data set size and decreased statistical power. Population differences between the two data sets could provide another explanation: the Partners Biobank preferentially includes less healthy subjects, who have lower MCV and hemoglobin as well as more interpatient variability in these hematological characteristics due to a variety of medical conditions. Nonetheless, the collective evaluation of these data sets by a meta-analysis strengthens the evidence for an association between c.315-48T>C and both MCV and hemoglobin (Table 3).

The mechanism of reduced MCV and hemoglobin in individuals with c.315-48T>C is uncertain. There is a poorly defined clinical association of EPP with iron deficiency, which is seen in 20–50% of patients despite the appropriate regulation of both hepcidin and iron absorption.1,35,36 This iron deficiency is unexpected because although both iron and metal-free PPIX are substrates of FECH, only metal-free PPIX accumulates. Unfortunately, iron deficiency could not be tested in this study because serum iron parameters were not measured in the UK Biobank cohort. Apart from systemic iron deficiency, a possible explanation for reduced MCV is through a decrease in Mitoferrin-1 (MFRN1), which transports iron into the mitochondria for use by FECH. One study has reported a strong correlation between FECH activity and the expression of the FECH-complexed mitochondrial protein MFRN1 among individuals with EPP, XLPP, and healthy controls; consequently, decreased MFRN1 could have a role in the observed association between c.315-48T>C and MCV.37,38 Alternatively, it is possible that the variant causes microcytosis through another mechanism, such as a slight reduction in heme synthesis, which could result in decreased hemoglobin synthesis through heme-dependent regulatory pathways such as heme-regulated elongation initiation factor kinase 1A (EIFK1A, also known as heme-regulated inhibitor [HRI]). Future in vitro studies should investigate and confirm the role of FECH deficiency in microcytosis mechanistically. The association between c.315-48T>C and both hemoglobin and MCV suggests that slight FECH deficiency predisposes to anemia without engendering an overt EPP phenotype. The extent to which FECH, along with other clinical or genetic factors, may predispose to clinically important anemia outside of EPP should be the topic of future investigations.

This study has several limitations. Because erythrocyte metal-free protoporphyrin levels could not be measured to determine whether or not participants have EPP, decreased penetrance cannot be excluded. Regarding the nine variants that have been observed in EPP in trans of another rare variant but not c.315-48T>C, further clinical studies to ascertain which combinations of these variants can result in EPP may provide a better understanding of EPP prevalence. In addition, the corrections for large deletions, likely cryptic variants, and private missense variants are predictions, and thus subject to error. Furthermore, this study had limited power to detect associations with EPP-related traits due to the small number of rare pathogenic FECH variants. Regarding the association between c.315-48T>C and both MCV and hemoglobin, ideally this would have been corroborated using another data set, and preferably one measuring iron. However, there were no other data sets available of a similar size for comparison that include both the genetic and biochemical data of relatively healthy individuals. An effect of unknown confounders or variants genetically linked with c.315-48T>C is possible. Additionally, because the UK Biobank evaluates disproportionately healthy participants in the UK, our observations may not apply equally to clinical features in other populations.

A strength of this study is that it evaluates the prevalence of pathogenic FECH variants in a large exomic data set and uses these data to estimate EPP prevalence. Because of the evidence for EPP underdiagnosis found in this study, increased efforts to decrease barriers to diagnosis are essential, especially now that an effective therapy has been FDA approved. As this study primarily analyzed individuals of European descent in the UK, further research is needed to evaluate EPP prevalence and underdiagnosis in other ethnicities.39 Furthermore, this study suggests a role for c.315-48T>C in erythrocytes outside of classical EPP, which should be further evaluated. EPP is a life-altering condition that limits quality of life with risk of hepatic complications that can be fatal. Although a new therapy has been developed for EPP, which represents a significant advancement in the field, the clinical impact of any treatment will be muted if a large percentage of EPP patients remain undiagnosed.