Main

The utility of the CRISPR–Cas9 system for gene therapy in humans has been recognized and extensively investigated1. Initial concerns about off-target activity have been addressed by the development of sensitive detection methods, as well as modified Cas9 enzymes and improved delivery protocols that limit this type of damage2,3,4,5,6,7,8,9,10,11,12. The vast majority of on-target DNA repair outcomes after Cas9 cutting in a variety of cell types are thought to be insertions and deletions (indels) of less than 20 bp13,14,15. Although indels a few hundred nucleotides in size were also observed in experiments using Cas9 or other nucleases, they were reported to be rare16,17,18. Consequently, Cas9 has been assumed to be reasonably specific, and the first approved clinical trials using Cas9-edited cells are underway (clinicaltrials.gov: NCT03081715, NCT03398967, NCT03166878, NCT02793856, NCT03044743, NCT03164135).

Studies using paired gRNAs to induce localized deletions also reported generation of more complex genotypes, such as inversions, endogenous and exogenous DNA insertions, and larger-than-expected deletions19,20,21,22,23. Single gRNAs were shown to induce deletions of up to 600 bp in mouse zygotes24. Deletions of up to 1.5 kb in a haploid cancer cell line potentially induced by single gRNAs have been described, but since the guides were directed to a small part of the genome and provided as a pool, the possibility of rare double-cutting events cannot be excluded25. Furthermore, the analysis of the alleles generated using both single and paired gRNAs has in most studies relied on amplification of short regions (<1 kb) around the target and potential off-target sites, limiting the scope of assessment. Lesions non-contiguous with the cleavage site, such as those reported in yeast upon I-SceI nuclease cutting, would also be missed by such short-range assessments26,27,28. Finally, cancer cell lines, whose genome and DNA repair mechanisms are abnormal, were often used in the context of studying Cas9-induced lesions, making extrapolations to normal tissues and cells problematic.

We speculate that current assessments may have missed a substantial proportion of potential genotypes generated by on-target Cas9 cutting and repair, some of which may have potential pathogenic consequences following somatic editing of large populations of mitotically active cells.

We first comprehensively explored allelic diversity induced by Cas9 at the X-linked PigA locus, which is hemizygous in male embryonic stem (ES) cells. In contrast to cancer-derived cell lines, ES cells have a normal karyotype and intact DNA repair mechanisms, which makes them more representative of a normal somatic cell. Although mouse ES cells and embryonic fibroblasts differ in their use of DNA repair pathways, it is not known how they compare to other somatic cells29. We introduced Cas9 and gRNA constructs targeting intronic and exonic sites of PigA into JM8 mouse ES cells using PiggyBac transposition. Cells with both constructs were selected and subsequently stained with FLAER reagent to quantify the proportion of PigA-deficient cells (Fig. 1a,b). Single gRNAs targeting exons 2 to 4 yielded very high rates of PigA loss (59–97%). Notably, single gRNAs targeting intronic sites also yielded PigA-deficient cells at appreciable frequencies. Ten different guides located 263–520 bp from the nearest exon caused 8–20% PigA loss, whereas two guides greater than 2 kb away induced 5–7% loss (Fig. 1c and Supplementary Table 1). We obtained similar results with transient expression using electroporation or lipofection of ribonucleoprotein complexes (RNP), indicating that these observations were not a consequence of PiggyBac transposition, delivery method, antibiotic selection or cellular response to transfected plasmid DNA (Supplementary Fig. 1). Lower knockout efficiency using exonic guides correlated with slower editing dynamics when delivered by PiggyBac transposition (data not shown).

Figure 1: Frequency of PigA loss upon editing with exonic and intronic gRNAs in mouse ES cells.

(a) Experimental design. Cells were transfected with separate PiggyBac transposons carrying gRNA and Cas9 genes and selected for stable transposition. PigA-negative cells (green) were sorted, single-cell clones isolated, the region around the cut site amplified, sequenced and mapped to the reference genome. (b) Examples of PigA editing revealed by FLAER staining, for two gRNAs and one control. Numbers on the x-axis identify individual gRNAs (Supplementary Table 1). (c) Frequency of PigA loss caused by Cas9 with intronic and exonic gRNAs (Supplementary Table 1; N = 6 biologically independent cell cultures). Each circle represents one cell culture. NC: negative control, a guide targeting Cd9. Thick bars represent exons, hollow ones indicate UTRs.

To understand what genetic changes underlie the generation of PigA-deficient cells, we amplified a 5.7-kb region around exon 2 from pools of cells edited with three selected gRNAs introduced by PiggyBac transposition, and sequenced the PCR products using the PacBio platform. We observed a depletion of read coverage on a kilobase-scale around the cut sites, consistent with the presence of large deletions (Fig. 2a). Cells edited with intronic guides and sorted for loss of PigA generally exhibited loss of the adjacent exon. If intronic regulatory sequences were present around the exon, the DNA of cells sorted for retention of PigA expression would be wild type or contain small indels around the cut site. However, the most frequent lesions in these cells were deletions extending many kilobases up- or downstream, away from the exon. We conclude that, in most cases, loss of PigA expression was likely caused by loss of the exon, rather than damage to intronic regulatory elements.

Figure 2: Analysis of the PigA locus edited with selected gRNAs.

(a) Coverage of PacBio reads at the PigA locus. The locus was PCR-amplified from a pool of cells sorted for PigA expression (or from the unsorted population), and the resulting products were sequenced using the PacBio platform. The right panel depicts a 100-bp region centered at the cut site. NC: negative-control gRNA, ex: exonic gRNA (#56), 5′: 5′ intronic gRNA (#15), 3′: 3′ intronic gRNA (#10). The cut site of the gRNA (between 3rd and 4th nucleotide from the PAM sequence) is indicated with a vertical black bar. Genomic position is given with respect to the GRCm38 reference genome. N = 1. (b–d) Examples of alleles. The bottom diagram line of each panel represents the PigA reference allele around exon 2, the diagram line immediately above shows the structure of the sequenced allele. (b) The top diagram line shows the genomic Hmgn1 gene structure; note the scale differs from that of PigA gene. (c) Exonic lesion non-contiguous with the cut site. (d) Inversion of a region containing the exon. Black horizontal line: direct reference match; orange bar: inversion; blue bar: insertion from another part of the genome; black arrowhead: gRNA target site. Gray and orange shadows represent, respectively, direct and inverted match between the reference and the sequenced allele. Lack of shadow at the reference locus represents a deletion in the sequenced allele.

Clustering of PacBio reads yielded 183 unique, edited, high-quality alleles derived from three different gRNAs. These alleles ranged from simple deletions and insertions to complex rearrangements (Fig. 2a,b, Supplementary Table 2 and Supplementary Data 1). One of the alleles contained an insertion with a perfect match to four consecutive exons derived from the Hmgn1 gene (Fig. 2b). We speculate this represents a de novo insertion from the spliced and reverse-transcribed RNA, rather than from one of the pseudogenized forms of Hmgn1, as the pseudogenes diverge in sequence from the observed insertion.

To fully characterize a variety of edited PigA loci, we isolated single-cell clones. The PigA loci around the gRNA target site were amplified using PCR primer pairs positioned progressively further apart (up to 16 kb), until amplicons were generated. These were sequenced using conventional Sanger sequencing technology (Supplementary Fig. 2a). This strategy allowed us to recover an allele in most cases (133/141, 94%; Supplementary Tables 2 and 3 and Supplementary Data 1).
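The primer-walking strategy described above can be illustrated with a minimal sketch. It is a toy model and not the authors' procedure: the function name, the coordinates and the assumption that an amplicon forms whenever both primer-binding sites are intact and the product is no longer than 16 kb are all hypothetical simplifications.

```python
from typing import Optional, Sequence, Tuple

Interval = Tuple[int, int]  # half-open [start, end) in locus coordinates


def overlaps(a: Interval, b: Interval) -> bool:
    """Return True if two half-open intervals share at least one base."""
    return a[0] < b[1] and b[0] < a[1]


def first_amplifying_pair(
    primer_pairs: Sequence[Tuple[Interval, Interval]],
    deletion: Interval,
    max_product: int = 16_000,  # assumed upper limit, taken from the 16-kb span in the text
) -> Optional[Tuple[int, int]]:
    """Try primer pairs from narrowest to widest.

    Returns (pair index, expected product size) for the first pair whose
    binding sites both survive the deletion, or None if no pair amplifies.
    """
    for index, (forward, reverse) in enumerate(primer_pairs):
        if overlaps(forward, deletion) or overlaps(reverse, deletion):
            continue  # a primer-binding site is deleted, so no product forms
        wild_type_size = reverse[1] - forward[0]
        # Number of deleted bases that fall inside this amplicon.
        lost = max(0, min(reverse[1], deletion[1]) - max(forward[0], deletion[0]))
        product = wild_type_size - lost
        if 0 < product <= max_product:
            return index, product
    return None


# Hypothetical primer pairs placed progressively further from a cut site at 8,000.
PAIRS = [
    ((7700, 7720), (8300, 8320)),
    ((7000, 7020), (9000, 9020)),
    ((5000, 5020), (11000, 11020)),
    ((0, 20), (15980, 16000)),
]

print(first_amplifying_pair(PAIRS, (7990, 8010)))  # small indel-sized deletion
print(first_amplifying_pair(PAIRS, (7500, 9500)))  # 2-kb deletion
```

In this model a small deletion is recovered by the narrowest pair, whereas a 2-kb deletion removes the binding sites of the two inner pairs and is only recovered by a wider pair, which is why short-range amplification alone would fail to report it.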

Simple deletions overlapping both the cut site and the exon were found in almost three-quarters (69/93) of PigA-deficient alleles generated by single, intronic gRNAs (Supplementary Fig. 2b,c). The deletions varied in size, the largest spanning 9.5 kb. The remaining events were deletions combined with large insertions or more complex, multiple-lesion alleles. We obtained similar results using electroporation of RNP (Supplementary Fig. 1b). To assess the frequency of large deletions without strong selection for that outcome, we used an exonic gRNA causing 97% PigA loss. Although two-thirds of alleles (32/48) from PigA-deficient cells had indels <50 bp, as expected, >20% (10/48) had deletions >250 bp, extending up to 6 kb (Supplementary Fig. 2d). Because the deletions generated with the exonic gRNA were bidirectional, this is consistent with the average frequency of generating PigA-deficient cells with intronic guides positioned 263–520 bp from an exon (12%).

Notably, 23 of 133 recovered alleles contained additional lesions (single-nucleotide polymorphisms (SNPs), indels, large deletions and insertions) that were non-contiguous with the lesion at the cut site. In 13 out of 23 cases, the only exonic lesion detected was non-contiguous with the cut site (Fig. 2c). Furthermore, we observed alleles in which the intronic gRNA caused an inversion of a region containing the exon (Fig. 2d). Had the assessment been limited to the immediate vicinity of the cleavage site, such alleles would have been misclassified as wild type, and their phenotypic consequences would have been underestimated.

Insertions were present in 35 out of 133 recovered alleles. We could not find convincing local mapping for insertions shorter than 7 bp (13 alleles), which we speculate to be mostly non-templated nucleotides. Most of the other insertions consisted of sequence that mapped to the PigA locus and encompassed inversions and duplications ranging from 11 bp to 2.5 kb (17 alleles; Fig. 2c,d and Supplementary Fig. 2c). The remaining five alleles contained DNA sequences that mapped to other parts of the mouse genome, such as interspersed repeats, or to exogenous, transfected sequences.

Six alleles did not contain lesions overlapping the nearest exon. Three of these were also wild type around the cut sites and are likely to contain lesions in other exons or larger rearrangements. The remaining three alleles contained only intronic lesions, which may interfere with splicing. In eight cases, it was not possible to recover any product with exon-spanning primers (Supplementary Fig. 3a, black primer pairs). To understand this class of events, we performed additional PCRs targeting each end of the PigA locus (Supplementary Fig. 3a, gray primer pairs). In three cases, just one end or neither end of the locus could be amplified, suggesting a larger deletion. In the remaining five cases, both ends were amplified. Since no product connecting the two ends could be obtained, these are likely to be translocations, inversions or large insertions (Supplementary Table 4).

To understand the diversity of potential deletion outcomes, we have repeated our original experiment in biological quadruplicate using the 5′ intronic gRNA. Cells with large deletions were enriched by sorting for PigA-negative cells and deletion fingerprints were generated by PCR. Each biological replicate differed substantially, despite a large number of unique deletion events sampled, indicating that the diversity of potential deletion outcomes is vast (Supplementary Fig. 4 and Supplementary Note).

Given that PigA is mono-allelic in the XY ES cells used in this study, we wished to exclude the possibility that the observations reflect some peculiarity of the lack of a homolog. The autosomal Cd9 locus was selected for this purpose as it is non-essential in ES cells and its protein product can be readily detected by cell surface staining. An exonic guide yielded 88% Cd9 loss, while 5′ and 3′ intronic guides generated 4.2% and 5.4% Cd9 loss, respectively (Fig. 3a,b and Supplementary Table 1a). Taking into account a 1.6% background of Cd9low cells in the untransfected condition, we estimate the true proportion of Cd9 loss due to intronic cutting to be between 2.6% and 3.8%. This is consistent with results at the PigA locus, assuming both Cd9 alleles have to be destroyed to prevent Cd9 expression.
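The arithmetic behind this estimate can be written out as a short sketch. The percentages are those quoted in the text; the back-calculation of a per-allele rate assumes that the two Cd9 alleles are lost independently, which is an illustrative assumption and not a claim made by the study.

```python
import math

BACKGROUND_PCT = 1.6  # Cd9-low cells in the untransfected condition (from the text)
OBSERVED_PCT = {"5' intronic": 4.2, "3' intronic": 5.4}  # Cd9 loss (from the text)


def corrected_loss(observed_pct: float, background_pct: float) -> float:
    """Subtract the staining background from the observed loss."""
    return round(observed_pct - background_pct, 1)


def implied_per_allele_rate(biallelic_pct: float) -> float:
    """Per-allele loss rate implied by a biallelic loss rate.

    ASSUMPTION: both alleles are hit independently, so P(both lost) = p ** 2.
    """
    return math.sqrt(biallelic_pct / 100) * 100


corrected = {guide: corrected_loss(pct, BACKGROUND_PCT) for guide, pct in OBSERVED_PCT.items()}
per_allele = {guide: implied_per_allele_rate(pct) for guide, pct in corrected.items()}

for guide in OBSERVED_PCT:
    print(f"{guide}: corrected {corrected[guide]}%, implied per-allele {per_allele[guide]:.1f}%")
```

Under the independence assumption, the implied per-allele rates are roughly 16% and 19.5%, which fall within the 8–20% range reported for intronic guides at the hemizygous PigA locus.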

Figure 3: Analysis of Cas9 editing at the autosomal Cd9 locus in mouse ES cells.

Experimental setup is analogous to the PigA experiment in Figure 1a. A mouse ES cell line derived from an F1 cross between CAST and BL6 was used. (a) Positions of primer pairs and gRNAs (Supplementary Tables 1 and 6). Genomic position is given with respect to the GRCm38 reference genome. (b) Examples of Cd9 editing revealed by antibody staining, for two gRNAs and one control (Supplementary Table 1; N = 7 biologically independent cell cultures). (c) PacBio alleles derived from Cd9-positive, mixed (bimodal) and Cd9-negative, individually sequenced single-cell clones, displayed as a pileup. Display conventions as in Figure 2. N = 1. (d) Recombinant alleles. Two of the sequenced single-cell clones contained alleles indicative of a cross-over event between the homologous chromosomes. Red vertical bars in CAST allele (gray bar) indicate positions of sequence divergence from the BL6 reference genome (black bar), dotted black line indicates missing sequence (deletion), thin black line indicates an intron. LOH: loss of heterozygosity.

To describe the genetic events underlying Cd9 loss, we isolated single-cell clones edited with the 3′ intronic guide, ascertained their expression status by flow cytometry and sequenced the area around the cut site using PacBio and Sanger technologies. The largest deletion spanned 5.5 kb. A pileup of 185 resolved alleles derived from 93 single-cell clones shows a clear enrichment for deletions overlapping the exon in clones negative for Cd9 compared to positive clones and ones exhibiting a mixture of Cd9-positive and Cd9-negative cells (Fig. 3c). The bimodal expression pattern of some of the clones may be the result of a mixed clone or a protracted repair event that was resolved during clone outgrowth. The haplosufficient nature of the Cd9 gene is demonstrated by the fact that we could detect at least one allele with an intact exon in all but one of the 66 Cd9-positive and mixed clones. Similarly, only one of the 27 Cd9-negative clones had an intact exon, this exception presumably harboring other undetected lesions. We have further confirmed by PCR genotyping that large deletions are a common outcome in single-cell clones edited at the Cd9 locus using additional intronic and exonic guides (Supplementary Table 5 and Supplementary Note).

The experiment at the Cd9 locus was performed in mouse ES cells derived from an F1 cross between Mus musculus (BL6) and Mus musculus castaneus (CAST) mouse strains, which allowed us to distinguish the homologous chromosomes. In no case was the repair outcome identical between homologs within a clone, despite 15 alleles reoccurring between clones. This result is consistent with the great diversity of outcomes at the PigA locus. Just over half of the edited clones (52 out of 93) contained precisely one CAST and one BL6 allele, as expected. Notably, in 18 clones only one allele was detected, potentially due to translocations, very large deletions, insertions or inversions, monosomy or loss of heterozygosity (LOH), either local or chromosome-wide. Twenty-one clones contained an abnormal number of alleles, which could have resulted from a mixed clone, large duplication, repair events happening during clone outgrowth or aneuploidy induced by Cas9 cutting. Finally, two clones contained recombinant BL6-CAST alleles (Fig. 3d). In one case, an LOH event distal to the breakpoints converted part of the CAST allele to BL6. In another case, the BL6-CAST crossover boundary did not coincide with the breakpoint. We conclude that the creation of these alleles likely involved interhomolog strand invasion, as they cannot be explained by a simple rejoining of the resected ends of two broken chromosomes.
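How strain-specific sequence differences can separate BL6, CAST and recombinant alleles is sketched below. This is a hypothetical illustration and not the analysis used in the study: the function name and input format are invented, and real data would additionally need tolerance for sequencing errors and for informative positions removed by deletions.

```python
from typing import List, Tuple


def classify_allele(calls: List[Tuple[int, str]]) -> Tuple[str, List[Tuple[int, int]]]:
    """Classify one sequenced allele from strain calls at informative positions.

    calls: (position, strain) pairs, where strain is 'BL6' or 'CAST' and each
    position is a site at which the two strains differ.
    Returns (label, switch_intervals); each switch interval is the pair of
    neighbouring informative positions between which the strain call changes.
    """
    ordered = sorted(calls)
    if not ordered:
        return "uninformative", []
    switches = [
        (left_pos, right_pos)
        for (left_pos, left_strain), (right_pos, right_strain) in zip(ordered, ordered[1:])
        if left_strain != right_strain
    ]
    if not switches:
        return ordered[0][1], []  # every informative site agrees on one strain
    return "recombinant", switches


# Hypothetical alleles with three informative positions each.
print(classify_allele([(100, "BL6"), (250, "BL6"), (900, "BL6")]))
print(classify_allele([(100, "CAST"), (250, "CAST"), (900, "BL6")]))
```

A switch interval that does not contain the deletion breakpoint corresponds to the situation described above, in which the BL6-CAST crossover boundary does not coincide with the breakpoint.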

To investigate whether the observed on-target extensive DNA repair-associated damage is an intrinsic property of undifferentiated mouse ES cells, we examined the consequence of editing in a human differentiated cell line. An immortalized human female retinal pigment epithelial cell line (RPE1) was used. Although this is a female cell line, X-inactivation renders it functionally hemizygous at the PIGA locus. Editing PIGA with single exonic and intronic gRNAs delivered with PiggyBac vectors resulted in a loss of PIGA at frequencies comparable to those observed in mouse ES cells (Fig. 4a,b). PCR genotyping and Sanger sequencing of 41 PIGA-deficient single-cell clones edited with intronic gRNAs revealed large deletions, insertions, inversions and non-contiguous lesions overlapping the exon (Fig. 4c–e). In some clones only one small, intronic indel allele was detected, which we interpret as an inconsequential edit of the inactive chromosome, coupled with a loss-of-function lesion on the active X-chromosome; the lesion would inactivate one or both primer binding sites.

Figure 4: Frequency of PIGA loss upon editing with exonic and intronic gRNAs and structure of the recovered alleles in human RPE1 cells.

Cas9-expressing cells were transfected with PiggyBac transposons carrying a gRNA and selected for stable transposition. PIGA-negative cells were sorted, single-cell clones isolated, and the region around the cut site amplified, sequenced and mapped to the reference genome. (a) Examples of PIGA editing revealed by FLAER staining, for two gRNAs and one control. (b) Frequency of PIGA loss caused by Cas9 with intronic and exonic gRNAs (Supplementary Table 1; N = 3 biologically independent cell cultures). Position of the primers with the largest span (6 kb) is indicated. (c–e) Recovered alleles. 5′ intronic guide #275 (c), 5′ intronic guide #274 (d), 3′ intronic guide #276 (e). The position of the gRNA is shown as a vertical line intersecting with the PIGA gene structure. Pure insertions and deletions of <50 bp are indicated with orange and black circles, respectively. Combined insertion/deletion events of <50 bp and SNPs ('indels') are indicated with a red circle. Black lines represent deletions >50 bp. Orange bars indicate size of the >50-bp insertions (but not their map position). They are centered on the insertion locus or on the associated deletion. Thin, horizontal, dashed line separates clones.

Similar results were obtained in lineage-negative cells from the bone marrow of mice homozygous for a Cas9-GFP cassette at the Rosa26 locus. Progenitor cells enriched by removal of differentiated cells on magnetic columns were electroporated with a crRNA:trRNA complex against the GFP locus, and GFP-negative single-cell clones were isolated and genotyped around the cut site with three different primer pairs spanning up to 3.6 kb. At least one large deletion product between 100 bp and 3 kb in size was detected in 35 out of 96 clones (Supplementary Fig. 5a,b). We verified eight deletion products by Sanger sequencing across the deletion junction (Supplementary Fig. 6a). Only wild-type-size products were detected in the remaining clones and none of the 96 control clones exhibited any deletion bands (Supplementary Table 5, “progenitor” experiment).

The editing in this study was conducted at actively transcribed loci in normal ES cells and progenitor cells, both with intact DNA repair processes, as well as in an immortalized, differentiated human cell line; each is a surrogate for various clinical editing applications. We show that extensive on-target genomic damage is a common outcome at all loci and in all cell lines tested. Moreover, the genetic consequences observed are not limited to the target locus, as events such as loss-of-heterozygosity will uncover recessive alleles, whereas translocations, inversions and deletions will elicit long-range transcriptional consequences. Given that a target locus would presumably be transcriptionally active, mutations that juxtapose this to one of the hundreds of cancer-driver genes may initiate neoplasia. In the clinical context of editing many billions of cells, the multitude of different mutations generated makes it likely that one or more edited cells in each protocol would be endowed with an important pathogenic lesion. Such lesions may constitute a first carcinogenic 'hit' in stem cells and progenitors, which have a long replicative lifespan and may become neoplastic with time. Such a circumstance would be similar to the activation of LMO2 by pro-viral insertion in some of the early gene-therapy trials, which led to cancer in some of the treated patients30. Results reported here also illustrate a need to thoroughly examine the genome when editing is conducted ex vivo. As genetic damage is frequent, extensive and undetectable by the short-range PCR assays that are commonly used, comprehensive genomic analysis is warranted to identify cells with normal genomes before patient administration.

Methods

Mouse ES cell culture and transfection.

gRNA-expression vectors contain a U6 promoter with an “F+E” scaffold31 and a PGK-Puro-2A-BFP cassette, flanked by PiggyBac repeats. The Cas9-expression vector contains a Cas9-Blast cassette expressed from a short EF1α promoter in a pKLV backbone13,32. CAST/BL6 (CB9; a gift from Prof A. Fergusson-Smith), AB2.2 mCherry/GFP reporter (a gift from X. Gao and P. Liu) or JM8.A3 mouse ES cells33,34 were cultured in M15 media (high-glucose DMEM, with 15% FCS, beta-mercaptoethanol and L-glutamine) on STO-neo-LIF-puro (SNLP) feeder cells.

Complexes of lipofectamine LTX (2.5 μl), plus reagent (0.5 μl), 200 ng hyperactive PiggyBac transposase35, 100 ng of the PiggyBac Cas9-Blast plasmid and 50 ng of the PiggyBac gRNA-Puro plasmid were prepared in 50 μl OptiMEM following manufacturer's instructions. Cells were trypsinized, washed in M15, resuspended in M15+LIF and seeded onto a gelatinized 24-well plate, containing the lipofectamine DNA complexes, at 3 × 10⁵ cells per well. From day 2, M15+LIF media containing puromycin (3 μg/ml) and blasticidin (10 μg/ml) was used. The same setup was used for the RPE1 cell line, except the Cas9-Blast plasmid was omitted. A similar setup was used for lipofection of RNP complexes with 20 pmol of both hybridized crRNA:trRNA (Sigma) and EnGen Cas9 NLS (NEB). Neon Transfection System (Thermo Fisher Scientific; 1,600 V/10 ms/3 pulses) was used for electroporation of 1.5 × 10⁵ cells in buffer R with 6 pmol each of crRNA:trRNA, electroporation enhancer (IDT) and Cas9 protein or 9 pmol each of crRNA:trRNA and Cas9 protein. Around 3 × 10⁵ cells were collected on day 14 (or day 17, in case of the RPE1 cells), stained in PBS+0.1% BSA for 30 min at room temperature with 1 μg/ml FLAER reagent (Cedarlane) or anti-Cd9-PE antibody (cat. 124805, Biolegend), washed twice and analyzed using a Cytoflex flow cytometer. For single-cell cloning and PacBio experiments, cells were transfected in six-well plates with five times more cells and reagents, expanded onto 10-cm dishes and sorted by fluorescence-activated cell sorting for loss of FLAER or Cd9 staining on day 14 using MoFlo XDP (Beckman Coulter). Single-cell clones were isolated and grown in 96-well plates. DNA was extracted by proteinase K digestion followed by ethanol precipitation. PCR reactions were conducted using primers in Supplementary Table 6 and LongAmp polymerase (NEB) following manufacturer's instructions.

Bioinformatics.

Primers were designed using Primer3-BLAST (Supplementary Table 6). Guide RNAs were designed using Benchling and CRISPRscan36. Alignment of Sanger-sequenced PCR products was performed using BLAT (v 36) and converted into BAM format using a customized script from T. Marschall (https://github.com/ALLBio/allbiotc2/tree/master/synthetic-benchmark). Mixed traces were resolved using PolyPeakParser37. Analysis of PacBio data was performed using the command-line version of SMRT-Link software (pbtranscript 1.0.1.TAG-1470). For the PigA locus pileup, circular consensus sequences were called with at least one full pass and a minimum predicted accuracy of 0.9. Individual PigA and Cd9 alleles were reconstructed by following the “Running Iso Seq using SMRTLink” tutorial on GitHub, except that the “--targeted_isoseq” option was used at the clustering step. Resulting alleles were mapped to the reference genome using bwa mem (v 0.7.17-r1188). In the case of the PigA locus, mapped reads were further clustered using a custom script. Genome coverage was calculated with “bedtools genomecov -dz” (v 2.27.1) using circular consensus sequences (PigA locus) or reconstructed alleles (Cd9 locus). All downstream analysis was performed using custom R (v 3.3.2) and bash scripts and visualized with the ggplot2 package. Flow cytometric data were processed with FlowJo (v 10.4.1).
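The kind of per-base coverage calculation named above can be illustrated with a minimal, self-contained sketch. It is not the authors' script and does not call bedtools; it only mimics the idea of reporting depth at zero-based positions with non-zero coverage. The function name is hypothetical, and a read carrying a large deletion is assumed to be represented as two separate aligned blocks.

```python
from typing import Dict, Iterable, Tuple


def depth_nonzero(blocks: Iterable[Tuple[int, int]], length: int) -> Dict[int, int]:
    """Per-base depth from aligned blocks given as 0-based, half-open intervals.

    Only positions with non-zero depth are returned.
    """
    diff = [0] * (length + 1)  # difference array: +1 at block start, -1 at block end
    for start, end in blocks:
        start, end = max(0, start), min(length, end)
        if start < end:
            diff[start] += 1
            diff[end] -= 1
    result: Dict[int, int] = {}
    depth = 0
    for position in range(length):
        depth += diff[position]  # running sum gives the depth at this position
        if depth:
            result[position] = depth
    return result


# Hypothetical 10-bp locus: one intact read, and one read with bases 4-6 deleted,
# modelled as two aligned blocks (assumption about how the gap is represented).
coverage = depth_nonzero([(0, 10), (0, 4), (7, 10)], length=10)
print(coverage)
```

In this toy example the depth drops from 2 to 1 across the deleted bases, which is the type of coverage depletion around a cut site that the pileups summarize.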

Mouse bone marrow cell culture and transfection.

Bone marrow cells from a homozygous C57BL/6 CAS9-EGFP knock-in mouse38 were isolated by flushing tibias and femurs in HBSS (Life Technologies) supplemented with 2% FBS and 10 mM HEPES (Sigma). Lineage-negative cells were isolated using Direct Lineage Cell Depletion Kit Mouse (Miltenyi Biotec) and cultured in X-Vivo (Lonza) with 2% FBS, 50 ng/ml stem cell factor, 50 ng/ml thrombopoietin, 10 ng/ml IL-6 (Peprotech). After culturing for 3 h, 1 × 10⁵ cells were electroporated (1,550 V/20 ms/1 pulse) in buffer T with 44 pmol of preassembled crRNA:trRNA duplex (guide #311, Supplementary Table 1; IDT) using the Neon Transfection System. GFP-negative cells were sorted 4 d after the electroporation and plated into Methocult M3434 media (6,000 cells per 3 ml, StemCell Technologies). Seven days later, single colonies were picked into 25 μl of direct PCR lysis buffer (Peqlab) with 1 μg/ml proteinase K and analyzed by PCR (Supplementary Fig. 5 and Supplementary Table 5).

Life Sciences Reporting Summary.

Further information on experimental design is available in the Nature Research Reporting Summary linked to this article.

Data availability.

PacBio sequencing data are accessible at the European Nucleotide Archive under accession numbers ERS2396492 (PigA) and ERS2396493 (Cd9). Barcoding information is in Supplementary Data 2. Correspondence and requests for materials, additional data and code should be addressed to A.B. (abradley@sanger.ac.uk).
