Introduction

The cohesin complex mediates sister chromatid cohesion and ensures accurate chromosome segregation, recombination-mediated DNA repair, and genomic stability during DNA replication and cell division. Accumulating evidence suggests that cohesin is also involved in chromosomal looping and genome architecture and in transcriptional regulation.1,2,3

Cohesin is a multisubunit protein complex composed of evolutionarily conserved core components encoded by SMC1A (MIM *300040), SMC3 (MIM *606062), RAD21 (MIM *606462), and either STAG1 (MIM *604358) or STAG2 (MIM *300826), depending on the chromosomal location. Direct interactions among SMC1A, SMC3, and RAD21 form a tripartite ring that entraps the replicated chromatin during sister chromatid cohesion (Fig. 1a). STAG1/2 are core structural components of functional cohesin and are critical for loading cohesin onto chromatin during mitosis.1,2

Fig. 1

Cohesin complex and its underlying genetic variants. a Schematic diagram of the cohesin complex. The components are represented by shapes of different colors labeled with protein names. b Comparison of genic distributions between our clinical exome cohort and two phenotype-driven cohorts of clinically diagnosed Cornelia de Lange syndrome (CdLS) patients (from ref. 19 and from the Baylor-Hopkins Center for Mendelian Genomics [BHCMG], respectively). Y-axis, proportion of molecular diagnoses provided by variants in each gene; x-axis, genes; black, patients without CdLS listed as a differential diagnosis; dark gray, patients with CdLS as one of the differential diagnoses; gray, CdLS cohort from ref. 19; light gray, CdLS cohort from BHCMG. c Comparison of genic variant frequencies between the COSMIC and ExAC cohorts. Filled circles compare the frequencies of putative loss-of-function (LoF) variants between COSMIC and ExAC; open circles compare the frequencies of missense variants between COSMIC and ExAC. Y-axis, ratio between the frequencies of genic variants (missense or putative LoF) in COSMIC and ExAC; x-axis, genes.

In addition to the aforementioned structural components, cohesin also interacts with the regulatory factors of the cohesion cycle, including proteins encoded by NIPBL (MIM *608667), MAU2 (MIM *614560), PDS5A (MIM *613200) or PDS5B (MIM *605333), WAPL (MIM *610754), HDAC8 (MIM *300269), ESCO1 (MIM *609674), and ESCO2 (MIM *609353), to facilitate cohesin dynamics and function on chromatin (Fig. 1a).1,2

Precise orchestration of cohesin’s structural components and regulatory factors ensures faithful progression of the cohesion cycle (Fig. 1a). Defects in the structural or regulatory components of cohesin lead to various multisystem malformation syndromes described as “cohesinopathies”, which share clinical findings such as distinctive facial features, growth retardation, developmental delay/intellectual disability (DD/ID), and limb abnormalities. Clinically, the most recognizable cohesinopathy is classic Cornelia de Lange syndrome (CdLS, MIM #122470), the majority of cases of which are explained by single-nucleotide and insertion/deletion variants (SNVs/indels) and exonic copy-number variants (CNVs) resulting in loss-of-function (LoF) alleles of NIPBL.4,5,6 Traditional phenotype-driven studies that included the mild end of the CdLS spectrum led to the discovery of SMC1A, SMC3, RAD21, and HDAC8 (MIM #300590, #610759, #614701, and #300882) as additional cohesinopathy genes.4,5,6,7,8,9,10,11 The resulting CdLS phenotype depends largely on the affected gene and the type of pathogenic variant (PV).12 Although mild forms of CdLS present with less striking phenotypes and are more difficult to recognize clinically than the classic form, they are being identified in an increasing number of patients with cohesinopathies.

Here, we used a genotype-driven approach to investigate the allelic series of genes encoding cohesin components in a large cohort of patients (N = 10,698) with a variety of unselected clinical presentations who were referred for clinical exome sequencing (CES). We identified pathogenic or likely pathogenic variants in known CdLS genes (NIPBL, SMC1A, SMC3, RAD21, and HDAC8) in patients mostly without a clinical diagnosis of CdLS, representing a cohort at the mild end of the clinical presentation of cohesinopathies. By applying the same genotype-first approach to the CES cohort, we further established STAG1 and STAG2 as new cohesinopathy genes with variants that act by a putative LoF mechanism, corroborating recent reports of patients with developmental disorders carrying PVs in these two genes.13,14,15 Additional studies of patients who had chromosome microarray analysis (CMA, N = 63,127) also identified deletion CNVs affecting STAG1 and STAG2, which further supports the human disease association of these two genes via a LoF mechanism. We also provide evidence supporting the candidacy of PDS5A and WAPL as cohesinopathy disease genes. Our findings emphasize the utility of CES to provide molecular diagnoses for disorders with extensive genetic and phenotypic heterogeneity, uncover the potential molecular etiologies of previously undiagnosed patients, and identify novel candidate cohesinopathy disease genes that may expand the genotype/phenotype characterization of cohesinopathies.

Materials and methods

Samples

The study was conducted through a collaborative effort between Baylor Genetics (BG) and the Baylor-Hopkins Center for Mendelian Genomics (BHCMG) and was approved by the Institutional Review Board of Baylor College of Medicine. Consent for publication of photographs was obtained. See the Supplemental Appendix for detailed descriptions of the BG and BHCMG samples. Selected patients with STAG1, STAG2, or PDS5A variants were enrolled, after informed consent was obtained, for further phenotypic characterization based on the clinical notes submitted with the CES order.

CES and variant interpretation

CES was performed as previously described.16,17 Variants were classified and interpreted according to clinical standards based on the American College of Medical Genetics and Genomics variant interpretation guidelines.18 Details of the CES experimental procedures and sample-wise quality control (QC) metrics are provided in Table S1. The possibility of mosaic variants in known CdLS genes19 was carefully evaluated: a variant was considered mosaic only if its variant-read to total-read ratio was below 30% and confirmatory Sanger sequencing demonstrated a comparable mosaic fraction.
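The two-part mosaicism rule above can be expressed as a small filter. This is a minimal sketch, assuming per-variant read counts are already available; the helper names are hypothetical, and the numeric `tolerance` standing in for a "comparable" Sanger fraction is an assumption, since the text does not define it numerically.

```python
def variant_allele_fraction(alt_reads: int, total_reads: int) -> float:
    """Fraction of sequencing reads that support the variant allele."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return alt_reads / total_reads


def is_mosaic_call(alt_reads, total_reads, sanger_fraction=None,
                   threshold=0.30, tolerance=0.10):
    """Hypothetical helper: call a variant mosaic only if the NGS read ratio
    is below `threshold` AND a Sanger-derived fraction is available and lies
    within `tolerance` of it (`tolerance` is an assumed stand-in for
    'comparable')."""
    vaf = variant_allele_fraction(alt_reads, total_reads)
    if vaf >= threshold:
        return False          # read ratio too high to be called mosaic
    if sanger_fraction is None:
        return False          # no orthogonal confirmation, so no mosaic call
    return abs(sanger_fraction - vaf) <= tolerance


print(is_mosaic_call(12, 80, sanger_fraction=0.18))  # True: ratio 0.15, confirmed
print(is_mosaic_call(40, 80, sanger_fraction=0.50))  # False: ratio 0.50
```

Requiring both conditions keeps a low read ratio caused by poor coverage or alignment artifacts from being reported as mosaicism without orthogonal support.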

The variants identified in this study have been deposited to ClinVar (accession numbers SCV000747051-SCV000747088 and SCV000747090-SCV000747093).

Chromosome microarray analysis

The experimental design and data analysis of chromosome microarray analysis (CMA) were performed according to previously described procedures.20

X-chromosome inactivation assay

X-chromosome inactivation (XCI) studies were performed for the patient samples with STAG2 variants based on the protocol described by Allen et al.21 with modifications. Please see Supplemental Appendix for detailed protocols.

Estimation of pathogenic variant prevalence in somatic cancer samples

Data from the COSMIC (http://cancer.sanger.ac.uk/cosmic/download) and ExAC (Exome Aggregation Consortium, http://exac.broadinstitute.org/)22 databases were used for this calculation. The normalized PV abundance per gene in cancer samples was calculated as the ratio of the PV frequency in COSMIC to that in ExAC (y-axis in Fig. 1c). See the Supplemental Appendix for details.
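A minimal sketch of the normalized abundance ratio follows. It assumes that each frequency is the number of samples carrying a given variant class in a gene divided by the number of samples screened; the exact definitions are in the Supplemental Appendix, and the class name, function name, and all counts below are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class VariantTally:
    """Hypothetical container for one gene and one variant class
    (e.g., putative LoF or missense) in one database."""
    carriers: int   # samples carrying >=1 such variant in the gene
    screened: int   # samples in the denominator

    @property
    def frequency(self) -> float:
        if self.screened <= 0:
            raise ValueError("screened must be positive")
        return self.carriers / self.screened


def normalized_abundance(cosmic: VariantTally, exac: VariantTally) -> float:
    """COSMIC frequency divided by ExAC frequency; values above 1 indicate
    that the variant class is more frequent in tumors than in the
    population reference."""
    if exac.carriers == 0:
        raise ValueError("ExAC frequency is zero; ratio is undefined")
    return cosmic.frequency / exac.frequency


# Illustrative counts only (not taken from COSMIC or ExAC).
ratio = normalized_abundance(VariantTally(50, 10_000), VariantTally(6, 60_000))
print(round(ratio, 6))
```

With these made-up counts the ratio is 50; computing it separately for putative LoF and missense tallies gives the two kinds of points plotted per gene.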

Results

Variants of established CdLS genes in the CES cohort

Based on a genotype-driven selection approach, we identified 33 patients with pathogenic or likely pathogenic variants in the well-recognized CdLS genes from the CES cohort. Those variants include heterozygous or hemizygous SNVs/indels in NIPBL (N = 5), SMC1A (N = 14, X-linked), SMC3 (N = 4), RAD21 (N = 2), and HDAC8 (N = 8, X-linked) (Table 1). Genic variant distribution was calculated to show the per-gene contribution to molecular diagnosis among the five known CdLS genes (Fig. 1b). Of the 33 variants, 29 occurred de novo in the proband, 3 were inherited from a parent, and 1 was of unknown inheritance (not maternally inherited, paternal sample not available, Table 1). Among the inherited variants, one variant in SMC1A was inherited from a symptomatic mother with a milder phenotype, demonstrating variable clinical presentation for X-linked dominant disorders; two variants in RAD21 were inherited from symptomatic parents with milder phenotypes, documenting variable expressivity of defects in RAD21.
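The per-gene contribution shown in Fig. 1b is a simple proportion of molecular diagnoses. A minimal sketch using the CES counts given above (the function name is hypothetical):

```python
from fractions import Fraction

# Positive findings per known CdLS gene in the CES cohort, as listed in Table 1.
ces_counts = {"NIPBL": 5, "SMC1A": 14, "SMC3": 4, "RAD21": 2, "HDAC8": 8}


def genic_distribution(counts):
    """Return each gene's share of all molecular diagnoses as an exact fraction."""
    total = sum(counts.values())
    return {gene: Fraction(n, total) for gene, n in counts.items()}


for gene, share in genic_distribution(ces_counts).items():
    print(f"{gene}: {float(share):.1%}")
```

With these counts, SMC1A accounts for about 42% and NIPBL for about 15% of the 33 diagnoses.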

Table 1 Summary of variants in the known Cornelia de Lange syndrome (CdLS) genes identified by Baylor Genetics clinical exome sequencing

The CdLS patients in this cohort may be enriched for atypical or mild CdLS phenotypes, because patients with a classic CdLS presentation are more likely to be referred for single-gene or panel testing than for CES. We retrospectively examined the clinical notes submitted by the referring clinicians for their differential diagnoses prior to CES. CdLS was not included in the initial differential diagnosis for 60% (3/5) of patients with a positive NIPBL finding, 93% (13/14) of those with SMC1A variants, 75% (3/4) of those with SMC3 variants, and all of those with RAD21 or HDAC8 variants (Table 1, Fig. 1b). These observations support previous hypotheses that pathogenic variants in NIPBL correlate better with classic CdLS, whereas SMC1A and SMC3 pathogenic variants may contribute to milder CdLS features, and the phenotypes caused by pathogenic variants in RAD21 and HDAC8 are more variable and sometimes include atypical CdLS features.12

For comparison with the genic distribution in our CES cohort, we analyzed data from a phenotype-driven cohort of CdLS patients.19 We also examined the genic variant distribution in an independent phenotype-driven CdLS cohort (N = 41) from BHCMG, in which pathogenic or likely pathogenic variants were identified in NIPBL (N = 12), SMC1A (N = 6), SMC3 (N = 2), and HDAC8 (N = 1) (Table S2). The genic variant distribution of the BHCMG CdLS cohort was overall comparable with that of the phenotype-driven cohort from ref. 19; however, both deviated substantially from our CES cohort (Fig. 1b). The proportion of patients with NIPBL pathogenic variants in our cohort was significantly lower than in the two phenotype-driven cohorts (chi-squared test, p < 0.001 for both comparisons). The proportions of patients with SMC1A pathogenic variants in our cohort and in the BHCMG cohort were significantly higher than in the other CdLS cohorts (chi-squared test, p < 0.02 for both), suggesting that the BHCMG cohort also includes mild/atypical CdLS presentations. Therefore, the mutational spectrum of known CdLS genes in the CES cohort differs distinctly from, and offers an alternative perspective to, that of phenotype-driven CdLS cohorts, in which patients tend to present with classic phenotypes.11
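The cohort comparisons above reduce to 2 × 2 contingency tables (diagnoses in a given gene versus all other genes, in two cohorts). The following pure-Python sketch of a Pearson chi-squared test is for illustration only: the text does not state whether a continuity correction was applied, the counts are hypothetical, and with small expected counts Fisher's exact test is often preferred.

```python
import math
from typing import Tuple


def chi2_2x2(a: int, b: int, c: int, d: int) -> Tuple[float, float]:
    """Pearson chi-squared test without continuity correction for
        [[a, b],   # cohort 1: diagnoses in gene X, diagnoses in other genes
         [c, d]]   # cohort 2: same layout
    Returns (statistic, p-value) for 1 degree of freedom."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("a row or column total is zero")
    stat = n * (a * d - b * c) ** 2 / denom
    # For 1 df, P(X > x) = erfc(sqrt(x / 2)) because X is a squared standard normal.
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Hypothetical counts, for illustration only.
stat, p = chi2_2x2(4, 26, 18, 12)
print(f"chi2 = {stat:.2f}, p = {p:.2g}")
```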

Interestingly, 6/33 (18%) of the patients with positive findings in known CdLS genes carry a secondary diagnosis (Table 1), a higher fraction than the ~5% of positive cases with dual diagnoses observed across the entire CES cohort.23 This is not unexpected, because the predicted extent of multilocus diagnosis can be as high as 14% under a Poisson distribution model.23 The high frequency of dual diagnoses, and the resulting blended phenotypes, may add to the complexity of the patients’ phenotypes and further obscure the underlying molecular causes, making clinical diagnosis challenging without objective molecular testing.
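The reasoning that compares an observed dual-diagnosis rate with a Poisson expectation can be sketched as follows. This is one simple formulation, assumed here for illustration and not necessarily the exact model of ref. 23: if the number of diagnostic loci per patient is Poisson-distributed with mean λ, the expected share of solved cases with two or more diagnoses is P(≥2)/P(≥1). The λ values are arbitrary.

```python
import math


def multilocus_fraction(lam: float) -> float:
    """Assuming the number of diagnostic loci per patient is Poisson(lam),
    return P(>=2 loci | >=1 locus): the expected share of solved cases
    that carry more than one molecular diagnosis."""
    if lam <= 0:
        raise ValueError("lam must be positive")
    p0 = math.exp(-lam)       # P(0 loci)
    p1 = lam * p0             # P(exactly 1 locus)
    return (1 - p0 - p1) / (1 - p0)


observed = 6 / 33  # dual diagnoses among positive known-CdLS-gene cases (Table 1)
print(f"observed: {observed:.1%}")
for lam in (0.1, 0.3, 0.5):   # arbitrary illustrative means
    print(f"lambda = {lam}: expected {multilocus_fraction(lam):.1%}")
```

Because the conditional fraction rises with λ, a subgroup enriched for patients with complex presentations can plausibly show a higher dual-diagnosis rate than the cohort average.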

Candidate disease genes in the cohesin structural and regulatory components

STAG1, STAG2, PDS5A, PDS5B, WAPL, and MAU2 encode factors that interact closely with NIPBL, SMC3, SMC1A, RAD21, and HDAC8 in the cohesin pathway and may therefore contribute further to the locus heterogeneity of cohesinopathies. According to the ExAC database, NIPBL, SMC3, SMC1A, and RAD21 have probability of LoF intolerance (pLI) scores of 1.00, and HDAC8 has a pLI of 0.92. Similarly, STAG1, STAG2, PDS5A, PDS5B, WAPL, and MAU2 all have pLI scores of 1.00, suggesting that they are intolerant to LoF variants (Table S3). In our CES cohort, we identified putative LoF (truncating/splicing) or de novo missense variants in STAG1 (N = 3), STAG2 (N = 2), PDS5A (N = 2), and WAPL (N = 1). Through collaboration with the Deciphering Developmental Disorders (DDD) study and BHCMG, three additional de novo variants in STAG2 were identified.
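The LoF-intolerance screen can be written as a threshold on pLI. The sketch uses the scores quoted above (rounded as in the text) and the conventional ExAC cut-off of pLI ≥ 0.9; that cut-off is an assumption here, because the text does not state which threshold was applied.

```python
# pLI scores as quoted in the text (rounded); Table S3 has the full values.
pli = {
    "NIPBL": 1.00, "SMC3": 1.00, "SMC1A": 1.00, "RAD21": 1.00, "HDAC8": 0.92,
    "STAG1": 1.00, "STAG2": 1.00, "PDS5A": 1.00, "PDS5B": 1.00,
    "WAPL": 1.00, "MAU2": 1.00,
}

# Assumed cut-off: pLI >= 0.9 is the conventional ExAC threshold for
# "extremely LoF intolerant" genes.
LOF_INTOLERANT_CUTOFF = 0.9


def lof_intolerant(scores, cutoff=LOF_INTOLERANT_CUTOFF):
    """Return the genes whose pLI meets or exceeds the cut-off, sorted by name."""
    return sorted(gene for gene, score in scores.items() if score >= cutoff)


print(lof_intolerant(pli))
```

All eleven genes pass at 0.9; raising the cut-off to 0.95 would drop only HDAC8.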

De novo heterozygous SNVs/indels in STAG1 (NM_005862.2), one frameshift variant (c.2009_2012del [p.N670Ifs*25]) and one missense variant (c.1129C>T [p.R377C]), were identified in patients 1 and 2, respectively (Fig. 2a). Both patients shared clinical findings that included DD/ID, hypotonia, seizures, mild dysmorphic features, and skeletal abnormalities (Table 2, Table S4). In addition, a heterozygous de novo missense SNV in STAG1, c.253G>A (p.V85I), was identified in patient 3 (Fig. 2a) together with a heterozygous de novo c.1720-2A>G SNV in ASXL1 (Bohring–Opitz syndrome; MIM #605039), which was observed twice in ExAC, once potentially as a mosaic variant. Patient 3 presented with global developmental delay, dysmorphic facial features, seizures, optic atrophy, mild hypotonia, skin hypopigmentation, hirsutism, possible autism spectrum disorder, and structural brain abnormalities (Table 2, Table S4). The concurrent de novo variants in STAG1 and ASXL1 may constitute a dual molecular diagnosis for this patient.

Fig. 2

The variants in STAG1 and STAG2. a Single-nucleotide variants (SNVs)/indels in STAG1. b SNVs/indels and one copy-number variant (CNV) deletion in STAG2. For panels a and b, the white segment represents the full-length protein and the black segments represent protein domains; missense variants are annotated above the segment and putative loss-of-function (LoF) variants (including the CNV deletion in STAG2) below it; variants colored in red are reported in the current study. The boxed variant (p.A638Vfs*10) in panel b is reported as a research variant. c Diagram showing the CNV deletions overlapping STAG1 reported in DECIPHER and in the current study. The red segments represent the deletions, which are divided into two groups: DECIPHER and Current Study. The bottom panel shows the genes in the region, with STAG1 highlighted in red. d Photographs showing the front and side facial profiles of patients 8 and 9 with de novo variants in STAG2. The patient numbers and variants are listed under the photographs.

Table 2 Genotypes and phenotypes of patients with SNVs/indels in STAG1, STAG2, and PDS5A identified in current study

De novo heterozygous/hemizygous SNVs/indels in STAG2 (X-linked, NM_006603.4), comprising two stopgain variants, two missense variants, and one frameshift variant, were identified in four females (patient 7, c.418C>T [p.Q140*]; patient 8, c.1605T>A [p.C535*]; patient 9, c.1811G>A [p.R604Q]; patient 10, c.1658_1660delinsT [p.K553Ifs*6]) and one male (patient 11 [hemizygous], c.476A>G [p.Y159C]) (Fig. 2b). These patients shared clinical findings of DD/ID, hypotonia, microcephaly, dysmorphic features, and skeletal abnormalities (Table 2, Table S4). Skewed XCI was observed in patient 8, whereas the XCI result was noninformative for patient 7 because she was homozygous for the marker used in the XCI assay (data not shown). In our study, truncating variants were identified in 3/4 female patients but in no male patients. Although based on a limited number of patients, this observation is consistent with the hypothesis that truncating variants in X-linked genes may have a more severe pathogenic effect in males than in females.
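How an XCI readout is summarized, and why a homozygous marker is noninformative, can be illustrated with a common calculation used with methylation-sensitive digestion assays of the kind described by Allen et al. This is an assumption-laden sketch: the study's modified protocol is in the Supplemental Appendix, the peak-area inputs and function names are hypothetical, and the 80% skewing cut-off is a commonly used convention rather than a value stated here.

```python
from typing import Optional


def xci_skew(u1: float, u2: float, d1: float, d2: float) -> Optional[float]:
    """Percentage of cells with the same X inactivated (50 = random, 100 = fully
    skewed), from allele peak areas before (u1, u2) and after (d1, d2) digestion
    with a methylation-sensitive enzyme. Only methylated (inactive-X) alleles
    survive digestion, so d/u estimates each allele's inactive fraction.
    Returns None when the marker is noninformative (one allele, homozygous)."""
    if u1 <= 0 or u2 <= 0:
        return None
    r1, r2 = d1 / u1, d2 / u2       # normalize for unequal allele amplification
    if r1 + r2 == 0:
        raise ValueError("no digestion-resistant signal")
    share = r1 / (r1 + r2)
    return 100 * max(share, 1 - share)


def classify(skew: Optional[float], cutoff: float = 80.0) -> str:
    """Assumed convention: >= 80% inactivation of one X is called skewed."""
    if skew is None:
        return "noninformative"
    return "skewed" if skew >= cutoff else "random"


# Hypothetical peak areas, for illustration only.
print(classify(xci_skew(1000, 1000, 900, 100)))  # one allele mostly inactive
print(classify(xci_skew(1500, 0, 700, 0)))       # single allele: homozygous marker
```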

One heterozygous SNV in PDS5A (NM_001100399.1), c.2275G>T (p.E759*), was identified in patient 13, who had severe developmental delay, marked hypotonia, failure to thrive, dysmorphic features, hyperextensible knees, eye anomalies, and skeletal abnormalities (Table 2, Table S4). Interestingly, this patient also had a concurrent heterozygous de novo SNV, c.3325A>T (p.K1109*), in ASXL3 (Bainbridge–Ropers syndrome, MIM #615485), which presumably explains the major phenotypes. This PDS5A variant is predicted to introduce a premature stop codon in the longer transcript (NM_001100399.1) but does not affect the shorter transcript (NM_001100400.1), suggesting that its effect may be mild. However, the roles of the different PDS5A isoforms in the cohesin complex are not well established in the literature. Notably, the father of patient 13, who shares the PDS5A p.E759* variant, has a speech impediment. Although the pathogenicity of the PDS5A p.E759* variant remains to be investigated, it may modulate the patient’s phenotype and constitute a dual diagnosis together with the ASXL3 variant. In addition, a heterozygous de novo SNV (c.654+5G>C) in PDS5A was identified in another patient with a neurodevelopmental disorder. This intronic PDS5A variant was predicted to affect splicing of the major messenger RNA (mRNA) transcript of PDS5A by prediction programs including SpliceSiteFinder-like and MaxEntScan (http://www.interactive-biosoftware.com/doc/alamut-visual/2.6/splicing.html).

Finally, one de novo heterozygous SNV in WAPL (NM_015045.3), c.2192G>A (p.R731H), was identified in one patient with neurodevelopmental disorders. This observation corroborates a previous report in which a partial duplication involving WAPL was identified in a patient from a phenotype-driven CdLS cohort,24 providing further evidence for WAPL as a candidate disease gene.

None of the variants in STAG1, STAG2, PDS5A, and WAPL described above was observed in control population databases, including ExAC and ESP5400 (National Heart, Lung, and Blood Institute [NHLBI] Exome Sequencing Project, http://evs.gs.washington.edu/EVS/). The predicted deleterious effects of the de novo missense SNVs identified in this study were supported by multiple prediction algorithms (Table S5).

We identified CNV deletions affecting STAG1 and STAG2 in our clinical CMA cohort, supporting LoF as the presumed disease-contributing mechanism; no putative LoF CNVs involving PDS5A, PDS5B, WAPL, or MAU2 were identified. In total, we identified three CNV deletions affecting STAG1 (two de novo, one of unknown inheritance) in patients with developmental disorders (Fig. 2c, Table S6). Six CNV deletions overlapping STAG1 have been reported in the literature, the smallest two being intragenic (exons 2–5 and exons 13–18, respectively).13 Moreover, the DECIPHER database (https://decipher.sanger.ac.uk/)25 lists eight individuals with neurodevelopmental disorders harboring relatively small (<5 Mb) deletions affecting STAG1 (Fig. 2c, Table S6). These STAG1-overlapping deletions in affected patients strongly suggest that haploinsufficiency is the disease-contributing mechanism for STAG1. In addition, a 33.9-kb CNV deletion of unknown inheritance encompassing exons 15–32 of STAG2 (predicted to result in an in-frame deletion, p.L473_L1198del) was identified in patient 12, who had dysmorphic features, microcephaly, and seizures (Fig. 2b, Table S6). This female patient showed skewed XCI, consistent with the observation in patient 8.
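Whether a multi-exon deletion such as the STAG2 exons 15–32 deletion preserves the reading frame can be checked by summing the coding lengths of the removed exons. The sketch below assumes that whole coding exons are removed; the exon lengths are hypothetical placeholders, not the real STAG2 annotation, and the helper names are illustrative. The protein-level annotation p.L473_L1198del itself implies 726 removed residues.

```python
from typing import Dict, Iterable


def removed_coding_length(exon_cds_lengths: Dict[int, int], deleted: Iterable[int]) -> int:
    """Total coding nucleotides in the deleted exons (whole-exon deletion assumed)."""
    return sum(exon_cds_lengths[exon] for exon in deleted)


def frame_effect(removed_nt: int) -> str:
    """A deletion preserves the reading frame iff it removes a multiple of 3 nt."""
    return "in-frame" if removed_nt % 3 == 0 else "frameshift"


# Hypothetical coding lengths per exon; NOT the real STAG2 annotation.
toy_exons = {1: 120, 2: 95, 3: 88, 4: 150}
print(frame_effect(removed_coding_length(toy_exons, range(2, 4))))  # exons 2-3
print(frame_effect(removed_coding_length(toy_exons, [2])))          # exon 2 only

# Residues implied by the annotation p.L473_L1198del (inclusive range).
residues = 1198 - 473 + 1
print(residues, residues * 3, frame_effect(residues * 3))
```

The 33.9-kb genomic size includes introns, so the frame check depends only on the coding portion of the deleted exons.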

Patients with STAG1 and STAG2 variants have phenotypes overlapping the CdLS spectrum

We evaluated the clinical phenotypes of patients 1–2 (STAG1) and patients 7–11 (STAG2). Patient 3 (STAG1) was excluded from this evaluation because the concurrent de novo ASXL1 variant could confound the phenotype attributable to STAG1 alone.

The patients described in this paper presented for genetic evaluation because of developmental delay and/or congenital anomalies, but without classic distinctive facial features or a recognizable pattern of malformations suggestive of a particular syndrome such as CdLS (Fig. 2d). The most common features among these patients with STAG1 and STAG2 variants were DD/ID, behavioral problems, hypotonia, seizures, microcephaly, failure to thrive, short stature, mild dysmorphic features, and 2–3 toe syndactyly (Table 2).

Clinical profiling suggested many overlapping features with CdLS, which include DD/ID, growth failure including short stature and microcephaly, hearing loss, synophrys, micrognathia, limb anomalies, and hypoplastic male genitalia. Some other less common features of CdLS, such as cutis marmorata, myopia, congenital diaphragmatic hernia (CDH), and renal anomalies, among others, were also observed in several of these patients. A more detailed characterization is described in Table 2 and Table S4.

Among the distinctive craniofacial features present in over 95% of patients with a clinical diagnosis of CdLS,11 our patients collectively had microbrachycephaly, low-set ears, synophrys, long curly eyelashes, broad nasal bridge, anteverted nares, long and smooth philtrum, thin upper lip, and micrognathia; however, these features were not present concurrently in any single patient. Interestingly, although microcephaly is one of the most characteristic features of CdLS, only 4/7 patients (one with a STAG1 variant and three with STAG2 variants) had microcephaly. Although the numbers are small, microcephaly was more frequent in patients with a STAG2 variant (3/5) than in those with a STAG1 variant (1/2). In contrast to CdLS, in which mild to severe limb anomalies are common and often help establish a clinical diagnosis, the patients in this study had common but more subtle findings in their extremities, such as fifth finger clinodactyly and syndactyly. Skeletal anomalies, including scoliosis (3/7), vertebral anomalies (3/7), and rib fusion (2/7), were observed only in patients with STAG2 variants. Even though these skeletal anomalies can occur in patients with classic CdLS, vertebral and rib anomalies would be considered rare or atypical features of CdLS.

In patients with either STAG1 or STAG2 variants, DD/ID and mild dysmorphic features were consistently observed, in line with previous reports13,14,15 (Table 2). Despite the small cohort size, patients with STAG2 variants appear to have more multisystem congenital anomalies, such as CDH, congenital heart disease, and vertebral anomalies. Growth failure was also observed, apparently more often postnatally than prenatally. Patients with a STAG2 variant appear to have more severe growth failure, especially in weight and length, than those with STAG1 variants.

Although STAG1 and STAG2 have been implicated in cancers because of their function in the cohesin pathway and the chromosomal segregation defects observed in deficient cell lines (e.g., STAG2 as an indicator for myeloid neoplasms), no tumors have been observed in our patients or in the previously reported patients with developmental disorders caused by constitutional pathogenic variants in STAG1 and STAG2 (refs. 13,14,15). Moreover, no obvious increase in cancer risk has been reported in patients with other cohesinopathies caused by defects in genes such as NIPBL, SMC1A, and SMC3 (ref. 1). Consistent with this, chromosome analysis of one patient (patient 7) revealed no evidence of chromosomal segregation defects (data not shown).

Discussion

In this study, we applied a genotype-driven approach to decipher the genetic causes of cohesinopathy from a CES perspective. We describe a series of disease-contributing variants in known cohesinopathy genes, and also provide molecular evidence supporting the candidacy of recently described or new disease genes.

NIPBL defects are underrepresented in this cohort, likely owing to ascertainment bias associated with their more clinically recognizable presentation. The scarcity of putative LoF variants in certain cohesin genes, including PDS5B and MAU2, in this cohort may indicate that LoF variants in these genes exert strong pathogenic effects on early development that are incompatible with life. Alternatively, the lack of evidence supporting the pathogenicity of variants in PDS5B and MAU2 could reflect the limitations of interpreting missense variants from proband-only CES. HDAC8 and SMC1A are the only two well-studied X-linked genes among the cohesin components. They seem to be relatively spared from strong selection during human development, possibly because XCI in females protects pathogenic alleles in the gene pool. Consistent with this, variants in these two genes are highly represented in the CES cohort compared with cohorts assembled by phenotypic characterization (Fig. 1b).

Patients harboring STAG1 or STAG2 variants appear to share many clinical features of the well-described CdLS phenotype. The affected patients in our cohort appear to be as developmentally and intellectually impaired as those with CdLS; however, their growth, craniofacial, and musculoskeletal features are less severe than those typical of CdLS. Overall, only one patient (patient 3 [STAG1]) fulfills the diagnostic criteria for CdLS on the basis of characteristic facial features.26 Note that the concurrent de novo variant in ASXL1 may contribute substantially to the differential diagnosis of CdLS for patient 3 (Table S7). Although the clinical information currently available to us might not be sufficient for a diagnosis of CdLS or another cohesinopathy, a “CdLS-like” syndrome is beginning to emerge. The STAG1/STAG2-related disorders seem to lie at the mild end of the CdLS spectrum, which makes clinical diagnosis for these two genes more challenging for physicians. Recognizing this constellation of clinical features might help end the diagnostic odyssey earlier, and this case series may raise awareness. Given these challenges, comprehensive genomic analysis, such as CES, should be offered to efficiently provide a molecular diagnosis for these cohesinopathy conditions.

Notably, the LoF PDS5A variant (patient 13) was inherited from a father with a speech impediment. Although the phenotypic consequence of this variant remains unclear (as discussed in Results), its potential contribution cannot be completely ruled out. Unfortunately, samples from the paternal grandparents or other relatives were not available for testing. Defects in the cohesin complex, as demonstrated for the CdLS genes, are likely to be detrimental to proper organismal development, although milder phenotypic consequences have also been observed.11 Among the 10,698 individuals in our CES cohort, two distinct novel pathogenic variants in RAD21 and one novel pathogenic variant in SMC1A (X-linked) were identified in three unrelated patients with neurodevelopmental disorders, all inherited from affected parents with milder phenotypes (Table 1). Moreover, transmission of a pathogenic variant between generations has been reported for STAG1 (ref. 13). Therefore, given the reported variable expressivity of cohesin defects, it is plausible that reproductive potential, genetic transmission, and phenotypic severity depend on various factors, including the affected component, the PV type, the inheritance mode (e.g., X-linked or autosomal dominant), and the downstream pathways disrupted by defects in a particular component. Additional genotype–phenotype correlation studies are warranted to further delineate the spectrum of cohesinopathies.

The mutational landscape of cohesin genes in somatic cancers may offer an alternative view of the contribution of these genes to biological processes, under minimal selection compared with that imposed during early human development. Among cancer samples in the COSMIC database that were subjected to genome-wide screening, truncating variants were observed in all cohesin genes. While missense variants showed no substantive difference between cohesin genes, putative LoF variants in STAG2 were highly overrepresented in the somatic cancer cohort (Fig. 1c). LoF variants in STAG2 have been significantly associated with several cancers,27,28 suggesting a pleiotropic effect of STAG2, possibly with strong involvement in tumorigenesis. Interestingly, we observed a patient with a mosaic STAG2 LoF variant in the CES cohort. This patient did not have neurodevelopmental problems but instead presented with a hematological malignancy; we therefore did not consider the STAG2 defect in this patient to be causal for a cohesinopathy. Consequently, caution should be taken when interpreting variants in cohesin genes, considering the possibility that they arose as somatic changes after the critical period of early human development.

Accumulating evidence suggests that cohesin contributes to the topological organization of the genome, regulates DNA replication, and facilitates long-range transcriptional regulation.2,29,30 In addition, cohesin interacts with other transcriptional machinery and chromatin remodeling complexes to recognize specific genomic loci and regulate gene transcription, placing these complexes in shared transcriptional regulatory pathways.30,31,32,33 Notably, genes encoding components of the chromatin remodeling and transcription regulation machineries, such as ANKRD11, AFF4, KMT2A, TAF1, and TAF6, have been associated with phenotypes reminiscent of CdLS.3,19,34,35,36 Such findings extend the molecular mechanisms underlying cohesinopathies to transcriptional regulation. Interestingly, gene expression studies of patients with elevated dosage of STAG2 revealed a dysregulated transcriptome and pinpointed altered expression of developmentally important genes.37 The versatility of cohesin in cohesion and transcriptional regulation therefore warrants further investigation of its downstream effectors.

In summary, the genotype-first approach focusing on a specific pathway enabled us to investigate patients with nonclassic cohesinopathy phenotypes; this approach also allowed us to discover patients with variants in new or recently reported disease genes, namely STAG1, STAG2, and potentially PDS5A and WAPL, which may further expand the genetic heterogeneity underlying cohesinopathies. Future studies of cellular phenotypes, with regard to functional studies of DNA repair and transcriptome analysis, are warranted to further elucidate the mechanistic consequences due to defects in specific cohesin components, which may shed light on precision medicine efforts targeting distinct molecular pathways.