Main

We investigated the bacterial populations that caused the cholera epidemic in Yemen by sequencing 42 V. cholerae O1 serotype Ogawa isolates that were recovered during this epidemic. Thirty-nine of these isolates were collected from patients with cholera who lived in three different governorates of Yemen (Fig. 1a, b). They span both waves of the epidemic, having been collected between 5 October 2016 and 31 August 2017. The three remaining isolates were collected from patients from a temporary refugee centre on the Saudi Arabia–Yemen border on 30 August 2017 (Fig. 1b). We also sequenced 74 7PET isolates from South Asia, the Middle East, and Eastern and Central Africa (Extended Data Fig. 1 and Supplementary Table 1). We placed these new isolates in the context of a global collection of 1,087 7PET genomic sequences2,3,4 (Supplementary Table 1) and constructed a maximum-likelihood phylogeny of 1,203 genomes, using 9,986 single-nucleotide variants (SNVs) that were evenly distributed across the non-repetitive, non-recombinant core genome (Fig. 2a).

Fig. 1: Geographical location of the sequenced V. cholerae O1 El Tor isolates and number of reported cholera cases.

a, Aggregate number of suspected cholera cases per week in Yemen until 31 December 2017 (http://yemeneoc.org/bi/), showing the two epidemic waves. The dates of the isolates sequenced in this study are shown under the epidemic curve. b, Geographical location of the 42 V. cholerae O1 El Tor isolates from Yemen. The three isolates collected in Saudi Arabia (denoted by the asterisk) were obtained from Yemeni refugees from Hajjah District and are considered to be ‘Yemeni isolates’ throughout the manuscript. The number of cases per governorate is indicated according to a previous study1. The governorate map of Yemen was created using QGIS version 2.16 (https://qgis.org) and the shape file was approved for use by the UN Office for the Coordination of Humanitarian Affairs (OCHA), OCHA Yemen country office (https://data.humdata.org/dataset/yemen-admin-boundaries). The small inlay map was created using QGIS version 2.16 with the Natural Earth base map version 4.0.0 (https://www.naturalearthdata.com).

Fig. 2: Phylogenetic relatedness of the V. cholerae O1 El Tor isolates from the 2016–2017 epidemic in Yemen.

a, Maximum-likelihood phylogeny of 1,203 genomic sequences. M66 was used as the outgroup. The scale bar denotes substitutions per variable site (SNVs). Branches are coloured according to geographical location, inferred by stochastic mapping of the geographical origin of each isolate onto the tree. The inferred introduction events into Africa are indicated by the letter ‘T’. The sublineage labelled AME (Asia/Middle East) contains the most recent Middle Eastern isolates. b, Maximum clade credibility tree produced with BEAST for a subset of 81 representative isolates of the distal part of genomic wave 3 (that is, those with the ctxB7 allele). Geographical location of the isolates is indicated in the same colours as in a. Selected nodes supported by posterior probability values ≥0.8 are shown. Acquisition of the polymyxin susceptibility-associated non-synonymous SNV in VC1320 (vprA) is indicated. c, The geographical distribution of selected 7PET sublineages. An asterisk denotes data from a previous study4. The date ranges shown for introductions are the 95% credible interval estimate of the most recent common ancestor in years. Dashed lines and the grey area in T13 indicate that the sublineage was detected in East Africa before its appearance in Yemen. This does not represent a precise route of transmission. The maps were created with Tableau Desktop version 10.1.5 using the base map from © OpenStreetMap contributors (https://www.openstreetmap.org), available under an Open Database License.

We also detected a strong temporal signal, making it possible to estimate time-scaled phylogenies (Fig. 2b and Extended Data Figs. 2–4), which showed that the epidemic in Yemen originated from a recently emerged 7PET wave 3 clade5, which contains the cholera toxin subunit B gene variant ctxB7 (Fig. 2). All of the isolates from Yemen clustered together (median pairwise SNV difference of 3 (range, 0–13)), confirming that the two epidemiological waves that were observed during the epidemic in Yemen, which had very different clinical attack rates1, were produced by a single clone rather than arising from two separate introductions. We estimated the date of the most recent common ancestor of the isolates from Yemen to January 2016 (95% Bayesian credible interval, September 2015 to June 2016) (Fig. 2b, Extended Data Fig. 3 and Extended Data Table 1). Our phylogenetic analysis shows that the isolates from Yemen are different from those that have been circulating in the Middle East over the last decade, such as those isolated in Iraq in 2007 and 2015, and in Iran from 2012 to 2015 (Fig. 2a). These isolates from the Middle East also belong to 7PET wave 3, but are attributed to different sublineages on the phylogenetic tree and were imported from South Asia on two separate occasions. The isolates from Yemen are most closely related to isolates that were collected from outbreaks in Eastern Africa (Kenya, Tanzania3 and Uganda4) from 2015 to 2016 (Fig. 2). Collectively, these isolates belong to a new sublineage (T13), which corresponds to the most recent, newly identified introduction of 7PET into East Africa. All of these T13 isolates are different from those previously recovered in West or East Africa (sublineages T12 and T10, respectively) (Fig. 2). Our data suggest that this 7PET wave 3 clade, which contains all isolates with the ctxB7 allele, first emerged in South Asia in the early 2000s (Fig. 2b), consistent with the first detection of ctxB7 isolates in Kolkata, India in 20066. This ctxB7 clade has been exported to areas outside Asia on at least three separate occasions: West Africa (T12 introduction event)2 in 2008 (estimates with 95% credible interval), Haiti in 20107,8 and East Africa (T13 introduction event)4 between 2013 and 2014 (estimates with 95% credible interval) (Fig. 2b, Extended Data Fig. 3 and Extended Data Table 1).
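A summary such as the median pairwise SNV difference and its range can be illustrated with a minimal Python sketch. This is not the pipeline used in the study: the function names and the toy four-isolate alignment are hypothetical, and the sketch assumes that only positions with an unambiguous base (A, C, G or T) in both sequences are compared.

```python
from itertools import combinations
from statistics import median


def snv_distance(seq_a, seq_b):
    """Count aligned positions where both sequences carry an unambiguous, differing base."""
    valid = set("ACGT")
    return sum(
        1
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a in valid and b in valid and a != b
    )


def pairwise_summary(sequences):
    """Return (median, minimum, maximum) over all pairwise SNV distances."""
    distances = [
        snv_distance(sequences[x], sequences[y])
        for x, y in combinations(sorted(sequences), 2)
    ]
    return median(distances), min(distances), max(distances)


# Hypothetical toy alignment (assumption: equal-length, already aligned sequences).
toy_alignment = {
    "isolate_1": "ACGTACGTAC",
    "isolate_2": "ACGTACGTAT",
    "isolate_3": "ACGAACGTAT",
    "isolate_4": "ACGTACGTAC",
}
summary = pairwise_summary(toy_alignment)
print(summary)
```

On real data the input would be the masked core-genome alignment rather than a toy dictionary, and a dedicated tool would usually be preferable for more than a thousand genomes.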

In addition to the ctxB7 allele, all of the analysed isolates from Yemen had the following genomic features (Table 1): (1) the toxin-coregulated pilus subunit A gene variant tcpACIRS101; (2) a deletion (ΔVC0495–VC0512) within Vibrio seventh pandemic island II (VSP-II); and (3) an SXT/R391 integrative conjugative element (ICE) called ICEVchInd5/ICEVchBan5, which is associated with multiple-drug resistance.

Table 1 Characteristics of the 2016–2017 cholera epidemic strain from Yemen

Consistent with the genomic evidence, all of the isolates from Yemen have a similar narrow phenotype of antimicrobial drug resistance to nalidixic acid, the vibriostatic agent O/129 and nitrofurantoin (Table 1). Mutations in the DNA gyrase gene gyrA that resulted in an S83I amino acid substitution and mutations in the topoisomerase IV gene parC that resulted in an S85L substitution explain the resistance of the isolates from Yemen to nalidixic acid and their decreased susceptibility to ciprofloxacin. An approximately 10-kb deletion in ICE variable region III resulted in the loss of four genes that encode resistance to streptomycin (strA and strB), chloramphenicol (floR) and sulfonamides (sul2). The fifth gene of this region, which encodes resistance to the vibriostatic agent O/129 (dfrA1), is present in the isolates from Yemen. This deletion is not unique, as similar deletions that encompass the strA, strB, floR and sul2 genes, flanked by transposase genes, have independently arisen several times in 7PET wave 3 isolates2,9. The resistance of V. cholerae to nitrofurans is due to the loss of expression of a reductase enzyme that converts the drug into its active form10. By combining phenotypic and genotypic data, we found lesions in the VC0715 and VCA0637 genes of nitrofuran-resistant isolates (Extended Data Table 2). VC0715 and VCA0637 encode orthologues of the NfsA (52% amino acid identity) and NfsB (58% amino acid identity) proteins of Escherichia coli K12 (GenBank accession number NC_000913), respectively. In E. coli, disruption of the nitroreductases that are encoded by these genes confers nitrofuran resistance11. In all 7PET wave 3 isolates, including the isolates from Yemen, the observed mutations in VC0715 led to an R169C amino acid substitution and the mutation in VCA0637 introduced a premature stop codon (Q5Stop) that probably abolishes protein function.

The isolates from Yemen were also susceptible to polymyxins. This is an important finding, because resistance to polymyxin B has been used as a marker of the V. cholerae O1 El Tor biotype since the beginning of the seventh cholera pandemic in 196112,13. Unlike the El Tor biotype, the classical biotype (responsible for the six previous pandemics)14 is susceptible to polymyxin B. Polymyxin resistance is conferred by changes to the lipid A domain of the surface lipopolysaccharide, thereby altering its charge12,13. The vprA (VC1320) gene, disruption of which is known to restore susceptibility to polymyxin in 7PET isolates, is required for expression of the almEFG operon that encodes the genes that are required for the glycine modification of lipid A12. A specific non-synonymous SNV in vprA genes (predicted to result in a D89N substitution, Extended Data Fig. 5) was present in 97% (63 out of 65) of polymyxin B-susceptible isolates (Extended Data Table 2), including all of the isolates from Yemen. The first polymyxin-susceptible 7PET isolates with this VprA D89N substitution in our dataset were identified in South Asia in 2012 (Fig. 2b), consistent with microbiological data from Kolkata, India, where polymyxin B-susceptible V. cholerae O1 isolates emerged in 2012 and replaced polymyxin-resistant strains after 201415.

7PET isolates from the ctxB7 clade have been associated with the two largest cholera epidemics in recent history. In addition to the current Yemeni epidemic, the introduction of this sublineage into Haiti in 2010, in the wake of a devastating earthquake, resulted in one million cases and almost 10,000 deaths by 201716,17. These two major events highlight the threat that cholera continues to pose to public health in vulnerable populations. The United Nations (UN) estimates that 16 out of 29 million people in Yemen lack access to clean water and basic sanitation because of the destruction of public and health infrastructures during the years of civil conflict18. The complexity of the situation in Yemen before the epidemic was set against a backdrop of large acute watery diarrhoea or cholera outbreaks across the Horn of Africa (Extended Data Fig. 1), which serves as a major hub of migration into Yemen19,20. This region, which links Asia to Africa at the southern entrance of the Red Sea, has long been a crossroads of trade and communication routes. Several importations of 7PET cholera from Asia into the Horn of Africa are likely to have followed this route, such as T3 in 19702.

The available genomic data for the historical and current importations of the 7PET sublineage into Africa are not consistent with a local origin, but instead highlight the importance of human-mediated spread of the epidemic 7PET lineage from South Asia. An inability to obtain samples from countries in this region hampered our efforts to reconstruct the routes of transmission in East Africa before the appearance of this strain in Yemen more precisely.

In summary, a single recent 7PET sublineage with an unusual antimicrobial resistance phenotype is responsible for the cholera epidemic in Yemen. Our study illustrates the key role of genomic microbial surveillance and cross-border collaborations in understanding the global spread of cholera, the evolution of virulence and determinants of antibiotic resistance.

Methods

Data reporting

No statistical methods were used to predetermine sample size. The experiments were not randomized and the investigators were not blinded to allocation during experiments and outcome assessment.

Bacterial isolates

The 116 7PET isolates sequenced in this study are listed in Supplementary Table 1 and originated from the collections of the French National Reference Centre for Vibrios and Cholera, Institut Pasteur, Paris, France (n = 6); the Central Public Health Laboratory of Baghdad, Iraq (n = 11); the Ministry of Health of South Sudan (n = 14); the Pasteur Institute of Iran (n = 4); the Maharishi Valmiki Infectious Diseases Hospital, Delhi, India (n = 29); the Central Public Health Laboratory of Sana’a, Yemen (n = 39); Amref Health Africa, Kenya (n = 1); the Kenya Medical Research Institute (n = 9); and the Ministry of Health of Saudi Arabia (n = 3). The isolates were characterized by standard biochemical, culture and serotyping methods21.

Antibiotic susceptibility testing

Antibiotic susceptibility was determined by disc diffusion on Mueller–Hinton agar, in accordance with the guidelines of the Antibiogram Committee of the French Society for Microbiology22. The following antimicrobial drugs (Bio-Rad) were tested: ampicillin, cefalotin, cefotaxime, streptomycin, chloramphenicol, erythromycin, azithromycin, sulfonamides, trimethoprim-sulfamethoxazole, vibriostatic agent O/129, tetracycline, doxycycline, minocycline, nalidixic acid, norfloxacin, ofloxacin, pefloxacin, ciprofloxacin, nitrofurantoin, polymyxin B and colistin (polymyxin E). E. coli CIP 76.24 (ATCC 25922) was used as a control. The minimum inhibitory concentrations (MICs) of nalidixic acid and ciprofloxacin were determined by Etest (bioMérieux). The MICs of colistin and polymyxin B were determined with custom-produced Sensititre microtitre plates (ThermoFisher Scientific) and MIC test strips (Liofilchem), respectively, on 34 isolates chosen on the basis of resistance phenotype, year and country of isolation.

Total DNA extraction

Total DNA was extracted with the Wizard Genomic DNA Kit (Promega), the Maxwell 16-cell DNA purification kit (Promega) or the DNeasy Blood & Tissue Kit (Qiagen) in accordance with the manufacturer’s recommendations.

Whole-genome sequencing

High-throughput genome sequencing was carried out at the genomics platform of Institut Pasteur (n = 107) or at the Wellcome Sanger Institute (n = 9) on Illumina platforms generating 92–295-bp paired-end reads, yielding a mean of 117-fold coverage (minimum 13.5-fold, maximum 639-fold). Short-read sequence data were submitted to the European Nucleotide Archive (ENA) (http://www.ebi.ac.uk/ena), under study accession numbers PRJEB24611 and ERP021285 and the genome accession numbers are provided in Supplementary Table 1.

Genomic sequence analyses

The genomic sequences were processed and analysed as previously described2. In brief, for each sample, sequence reads were mapped against reference genome V. cholerae O1 El Tor N16961 (GenBank accession numbers LT907989 and LT907990) using SMALT version 0.7.4 (http://www.sanger.ac.uk/science/tools/smalt-0) to produce a BAM file. Variants were detected with samtools mpileup23 version 0.1.19 with parameters ‘-d 1000 -DsugBf’ and bcftools23 version 0.1.19 to produce a BCF file of all variant sites. The bcftools variant quality score had to be greater than 50 (quality > 50) and mapping quality greater than 30 (map quality > 30). The majority base call was required to be present in at least 75% of reads mapping to the base (ratio ≥ 0.75) and the minimum mapping depth required was four reads, at least two of which had to map to each strand (depth ≥ 4, depth strand ≥ 2). A pseudogenome for each sample was constructed by substituting the base call at each site (variant and non-variant) in the BCF file in the reference genome. While this paper was under review, another paper4 was published that included three genome sequences from Ugandan isolates that belonged to the T13 sublineage. These three genome sequences were available as contig files and were added to the alignment with Snippy version 4.1.0 (https://github.com/tseemann/snippy), using the ‘--ctgs’ flag to call SNVs between the contigs and the reference genome. Short reads were assembled with SPAdes24 version 3.8.2 and annotated with Prokka25 version 1.5.
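The site-level thresholds described above can be written as a single predicate. The following Python sketch is illustrative only and is not taken from the Sanger pipeline: the SiteCall record and its field names are hypothetical, and it assumes that the per-strand depths and the majority-base fraction have already been extracted for each site.

```python
from dataclasses import dataclass


@dataclass
class SiteCall:
    """Hypothetical per-site summary; field names are assumptions for illustration."""
    variant_quality: float    # bcftools variant quality score
    mapping_quality: float    # mapping quality at the site
    depth: int                # total reads mapping to the base
    forward_depth: int        # reads on the forward strand
    reverse_depth: int        # reads on the reverse strand
    majority_fraction: float  # fraction of reads supporting the majority base


def passes_filters(site):
    """Apply the thresholds stated in the text to one site."""
    return (
        site.variant_quality > 50           # quality > 50
        and site.mapping_quality > 30       # map quality > 30
        and site.majority_fraction >= 0.75  # ratio >= 0.75
        and site.depth >= 4                 # depth >= 4
        and site.forward_depth >= 2         # depth strand >= 2 (forward)
        and site.reverse_depth >= 2         # depth strand >= 2 (reverse)
    )


# Hypothetical example sites.
good_site = SiteCall(99.0, 60.0, 40, 21, 19, 0.98)
strand_biased_site = SiteCall(99.0, 60.0, 40, 39, 1, 0.98)
print(passes_filters(good_site), passes_filters(strand_biased_site))
```

How a failing site is represented in the pseudogenome is not stated in the text, so the sketch stops at the pass or fail decision.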

The code for the Sanger Institute pipelines used in this study is available at https://github.com/sanger-pathogens/vr-codebase.

Phylogenetic analysis

Repetitive (insertion sequences and the TLC-RS1-CTX region) and recombinogenic (VSP-II) regions were masked from the alignment2. Putative recombinogenic regions were detected and masked with Gubbins26 version 1.4.10. A maximum-likelihood phylogenetic tree was built from an alignment of 9,986 chromosomal SNVs, with RAxML27 version 8.2.8 under the GTR model with 100 bootstraps.
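Masking can be pictured as overwriting the alignment columns of each excluded region with a placeholder character. The Python sketch below rests on stated assumptions: the function name, the use of ‘N’ as the mask character and the example coordinates are all hypothetical, and coordinates are taken to be 1-based and inclusive. It does not reproduce the behaviour of Gubbins or of the original masking scripts.

```python
def mask_regions(sequence, regions, mask_char="N"):
    """Return a copy of sequence with each 1-based inclusive (start, end) region masked."""
    bases = list(sequence)
    for start, end in regions:
        if start < 1 or end > len(bases) or start > end:
            raise ValueError(f"invalid region: {start}-{end}")
        for index in range(start - 1, end):
            bases[index] = mask_char
    return "".join(bases)


# Hypothetical coordinates standing in for repetitive or recombinogenic regions.
masked = mask_regions("ACGTACGTAC", [(3, 5), (9, 10)])
print(masked)  # ACNNNCGTNN
```

Applying the same regions to every sequence in the alignment keeps the columns consistent, so that masked positions contribute no SNVs to the phylogeny.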

BEAST28 version 1.10.1 was used to estimate time-resolved phylogenies for a spatially and temporally representative subset of 81 7PET isolates under the GTR nucleotide substitution model. We tested a combination of molecular clock and tree prior models to identify the best fit (Extended Data Table 1). Both path and stepping-stone sampling showed the best fit to be an uncorrelated relaxed clock (lognormal distribution of rates) model with a Bayesian skyline coalescent tree prior. Priors were kept at default values, with the exception of the ‘constant.popSize’ value, which was set to a lognormal distribution (initial value = 1, mu = 1, sigma = 10) under the constant population coalescence tree prior. The choice of model had little influence on the dating of key nodes in this analysis (Extended Data Table 1). For each model, we ran three independent Markov chain Monte Carlo chains over 50 million steps, sampling every 2,000 steps. We used a burn-in of 5 million steps for each chain and then combined chains, resampling every 10,000 steps. The effective sample size for all estimated parameters was greater than 200. We tested for an adequate temporal signal, using TempEst29 version 1.5, by calculating the linear regression between the root-to-tip distance and isolation date for each sample. We also performed 20 date-randomization tests with the R package TipDatingBeast30 to assess the mean rate under the uncorrelated lognormal relaxed molecular clock (ucld.mean parameter).
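The temporal-signal check amounts to an ordinary least-squares regression of root-to-tip distance on isolation date. The Python sketch below shows that calculation under stated assumptions: it is not TempEst's implementation, the function name is hypothetical, and the toy dates and distances are invented and perfectly linear purely to make the arithmetic easy to follow.

```python
def root_to_tip_regression(dates, distances):
    """Least-squares fit of root-to-tip distance against isolation date.

    Returns (slope, intercept, r_squared, x_intercept). The slope is a crude
    rate estimate and the x-intercept a crude estimate of the root date.
    """
    n = len(dates)
    if n != len(distances) or n < 3:
        raise ValueError("need at least three paired observations")
    mean_x = sum(dates) / n
    mean_y = sum(distances) / n
    sxx = sum((x - mean_x) ** 2 for x in dates)
    syy = sum((y - mean_y) ** 2 for y in distances)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dates, distances))
    if sxx == 0 or syy == 0:
        raise ValueError("dates and distances must both vary")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy * sxy) / (sxx * syy)
    x_intercept = -intercept / slope if slope != 0 else None
    return slope, intercept, r_squared, x_intercept


# Hypothetical toy data: decimal isolation years and root-to-tip distances.
toy_dates = [2010.0, 2012.0, 2014.0, 2016.0]
toy_distances = [10.0, 14.0, 18.0, 22.0]
slope, intercept, r_squared, x_intercept = root_to_tip_regression(toy_dates, toy_distances)
print(slope, r_squared, x_intercept)
```

A positive slope with a reasonable coefficient of determination suggests that sampling dates carry information about divergence; a date-randomization test then asks whether the rate estimated from the real dates falls outside the range obtained when the dates are shuffled among the tips.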

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.