Introduction

Mitochondria are involved in energy metabolism, apoptosis, aging, disease and oxidative phosphorylation1,2,3. Arthropod mitochondrial genomes (mtDNA) are generally circular, duplex molecules of 14–19 kb in length4,5,6. Insect mitochondrial genomes contain a remarkably conserved set of 37 genes, comprising 22 transfer RNA (tRNA) genes, two ribosomal RNA (rRNA) genes and 13 protein-coding genes (PCGs), together with a control region (CR), also known as the A + T-rich non-coding region6,7. Mitochondrial genomes have been widely used in phylogenetic studies, in comparative and evolutionary genomics of insects, and as molecular markers in population genetics and evolution6,8,9,10.

Liriomyza chinensis belongs to the subfamily Phytomyzinae, family Agromyzidae, order Diptera, and causes significant damage to Allium spp.11. The damage caused by L. chinensis on onions is very similar to that caused by other Liriomyza spp.: the mining of leaves by larvae and the puncturing of foliage by females for feeding and oviposition reduce photosynthesis, which lowers crop quality and yield12,13,14,15. L. chinensis has become a serious pest of onions in many countries and regions, especially in East Asia11,16,17. The taxonomic status of L. chinensis was controversial for decades, but its classification is now settled. Kato considered L. chinensis a sub-species of Dizygomyza cepae (also known as L. cepae)18, and Hering reported that larvae of L. chinensis and L. cepae had identical spiracle structures. The male genitalia of L. chinensis are similar to those of L. cepae, and both species have characteristically pale wings and a solid black scutellum19; these characteristics differ from the typical form of Liriomyza, represented by L. nietzkei11. Subsequent speciation produced L. chinensis in China, Japan and Malaysia, and L. cepae in western Europe, which are reproductively isolated from L. nietzkei. Consequently, Spencer raised L. chinensis from sub-species to species rank and assigned it to the genus Liriomyza11.

The systematics of the Agromyzidae is poorly understood because the flies are small and morphologically homogeneous. In a molecular phylogenetic study, Scheffer et al. investigated the relationships among genera within the Agromyzidae using parsimony and Bayesian analyses of the mitochondrial COI gene, the nuclear ribosomal 28S gene, and the single-copy nuclear CAD gene20. However, studies of phylogenetic relationships within the genus Liriomyza based on whole mitochondrial genomes remain limited. Improved sequencing technology has yielded complete or near-complete mitogenomes for five Agromyzids, namely L. sativae21, L. trifolii22,23, L. bryoniae23, L. huidobrensis24, and Chromatomyia horticola25, which provide a basis for studying the phylogeny of Agromyzid species.

The mtDNA sequence of L. chinensis has not been previously reported and would be valuable in clarifying the taxonomic issues described above. In this paper, we report the complete mitochondrial genome of L. chinensis and provide a thorough description of its structural features. The L. chinensis mitogenome was compared with mtDNA sequences of five other related species to better understand taxonomy and phylogeny within the Agromyzidae.

Results and Discussion

Genome organization

The complete mitochondrial genome of L. chinensis is a circular 16,175 bp molecule (GenBank accession no. MG252777). It includes 37 mitochondrial genes (13 PCGs, 22 tRNA genes and two rRNA genes) and a large non-coding region (control region) (Fig. 1). The gene order in the L. chinensis mitochondrial genome is identical to that of D. melanogaster26, which is the classic arrangement for Diptera. There are 23 genes located on the J-strand (nine PCGs and 14 tRNAs) and 14 genes on the N-strand (four PCGs, eight tRNAs and two rRNAs). Sixteen intergenic spacers were identified with a total length of 64 bp; these ranged in size from 1 to 19 bp, with the longest intergenic spacer located between tRNAGlu and tRNAPhe. There were nine gene overlaps in the mitochondrial genome of L. chinensis; the longest overlap was 8 bp, between tRNATrp and tRNACys (Table 1).
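
The spacer and overlap sizes reported here follow directly from the annotated gene coordinates. The following is a minimal sketch in Python of that bookkeeping, assuming 1-based inclusive coordinates on a linearized map; the function name `junctions` and the gene names and positions in `example` are hypothetical placeholders, not the values in Table 1, and the wrap-around junction of the circular molecule is ignored.

```python
from typing import List, Tuple

Gene = Tuple[str, int, int]  # (name, start, end), 1-based inclusive


def junctions(genes: List[Gene]) -> List[Tuple[str, str, int]]:
    """Return (left gene, right gene, gap) for each adjacent gene pair.

    gap > 0: intergenic spacer of that many bp
    gap < 0: overlap of abs(gap) bp
    gap == 0: the two genes abut
    """
    ordered = sorted(genes, key=lambda g: g[1])
    out = []
    for (left, _, left_end), (right, right_start, _) in zip(ordered, ordered[1:]):
        out.append((left, right, right_start - left_end - 1))
    return out


# Hypothetical coordinates, for illustration only.
example = [
    ("geneA", 1, 66),
    ("geneB", 67, 135),
    ("geneC", 128, 190),
    ("geneD", 210, 280),
]
result = junctions(example)
spacers = [j for j in result if j[2] > 0]
overlaps = [j for j in result if j[2] < 0]
```

In this toy input, geneB and geneC overlap by 8 bp and geneC and geneD are separated by a 19 bp spacer; summing the positive gaps gives the total spacer length.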

Figure 1

Map of mitochondrial genome of L. chinensis. Genes outside the map are transcribed in a clockwise direction (J-strand), whereas those inside the map are transcribed counterclockwise (N-strand). The second circle shows the GC content and the third shows the GC skew. GC content and GC skew are plotted as the deviation from the average value of the entire sequence.

Table 1 Annotation of the mitochondrial genome of L. chinensis.

The mitochondrial genome length of L. chinensis was similar to that of other members of the Agromyzidae. Exceptions were L. sativae and C. horticola; the former contains a short A + T region21, and the latter lacks an A + T region due to incomplete sequencing25. The gene order is identical to the ancestral trnI-trnQ-trnM arrangement. There were no gene rearrangements in the six Agromyzidae mitogenomes, which indicates that the mitochondrial gene order is highly conserved in Agromyzidae. Furthermore, the lengths and positions of intergenic spacers were also highly conserved.

Nucleotide Composition

The nucleotide composition of the L. chinensis mtDNA showed an obvious bias toward A and T. The A + T content of the whole genome was 78.3% (A = 41.3%, T = 37.0%, G = 8.9%, C = 12.8%). The A + T content of the PCGs, tRNAs, rRNAs, and the control region each exceeded 75%, and the control region had the highest A + T content (89.4%) (Table 2). Strand bias in nucleotide composition is a universal phenomenon in metazoan mitochondrial genomes and can be assessed by comparing AT- and GC-skews4,6,27. The Agromyzid mtDNAs showed a positive AT-skew and a negative GC-skew over the entire genome (Table 3). The PCGs, tRNAs and rRNAs of the six Agromyzid mtDNAs show relatively consistent A + T content and AT-skew (Table 3). The A + T-rich region of L. chinensis exhibited a lower A + T content (89.4%) than those of the other congeneric mitogenomes. The A + T bias has been attributed to asymmetric mutation and selection pressure during replication and transcription28. We looked for relationships between A + T content and phylogeny, but no clear patterns were evident. The A + T nucleotide bias is relevant to the study of replication, transcription and rearrangement of the mitochondrial genome.
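
The skew statistics can be reproduced from base composition with the formulas given in Materials and Methods, AT-skew = [A−T]/[A + T] and GC-skew = [G−C]/[G + C]. A minimal sketch in Python follows; the function names are illustrative, and the `reported` dictionary simply holds the rounded whole-genome percentages quoted in the text, so results may differ slightly from the values in Table 3.

```python
from collections import Counter
from typing import Dict


def composition(seq: str) -> Dict[str, float]:
    """Percentage of A, C, G and T in a sequence; other symbols are ignored."""
    counts = Counter(b for b in seq.upper() if b in "ACGT")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return {b: 100.0 * counts[b] / total for b in "ACGT"}


def at_skew(comp: Dict[str, float]) -> float:
    """AT-skew = (A - T) / (A + T)."""
    return (comp["A"] - comp["T"]) / (comp["A"] + comp["T"])


def gc_skew(comp: Dict[str, float]) -> float:
    """GC-skew = (G - C) / (G + C)."""
    return (comp["G"] - comp["C"]) / (comp["G"] + comp["C"])


# Rounded whole-genome percentages quoted in the text.
reported = {"A": 41.3, "T": 37.0, "G": 8.9, "C": 12.8}
genome_at_skew = at_skew(reported)
genome_gc_skew = gc_skew(reported)
```

With these inputs the AT-skew is positive (about 0.055) and the GC-skew is negative (about −0.18), in line with the signs described for the whole genome.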

Table 2 Nucleotide composition of the L. chinensis mitogenome in different regions.
Table 3 Nucleotide composition in regions of Agromyzidae mitogenomes.

Protein-coding genes

The nucleotide bias was also reflected in the 13 PCGs, which had an average A + T content of 76.1% (Table 2). The A + T content of the third codon position (81.4%) was higher than that of the first (74.7%) and second (72.3%) codon positions (Table 2), which may suggest that higher mutation rates and increased A + T content are related and depend on relaxed selection at the third codon position21,29.
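
Codon-position A + T content of this kind can be computed by slicing a coding sequence at every third base. The sketch below is a minimal Python illustration, assuming an in-frame coding sequence with start and stop codons already removed; the function name and the toy sequence are hypothetical and are not taken from the L. chinensis data.

```python
from typing import List


def at_by_codon_position(cds: str) -> List[float]:
    """Return the A + T percentage at codon positions 1, 2 and 3."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    if usable == 0:
        raise ValueError("sequence is shorter than one codon")
    result = []
    for pos in range(3):
        bases = cds[pos:usable:3]
        at = sum(1 for b in bases if b in "AT")
        result.append(100.0 * at / len(bases))
    return result


# Toy sequence (codons ATA, GCT, TTA), for illustration only.
first, second, third = at_by_codon_position("ATAGCTTTA")
```

In the toy sequence every third position is A or T, so the third-position value is 100%, while the first and second positions are each two out of three.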

Eleven PCGs of L. chinensis initiate with ATN (five with ATT and six with ATG). However, ND1 started with GTG and COI with the special quadruplet start codon ATCA (Table 1), which agrees with L. trifolii and L. sativae but differs from other Agromyzidae species10,21. These special start codons are converted into typical initiation codons by RNA editing during transcription30, which can reduce the intergenic spacer and avoid gene overlap31. COI genes generally use nonstandard and varied start codons in insects. Among the six Agromyzids, all five Liriomyza species used ATCA as a special quadruplet start codon21,23, whereas C. horticola used TTG as a nonstandard start codon25. Consequently, the use of a nonstandard initiation codon in the COI gene was not unexpected; it was assigned on the basis of the translated amino acid sequence and subsequent sequence alignments32. Eight PCGs used the typical termination codons TAA or TAG (TAG in ND1), whereas ND2, ND5, ND4, and CYTB used an incomplete stop codon consisting of a single T (Table 1). ND4L used TA as a termination signal, which has been reported in other Agromyzids10. Incomplete termination codons are common in metazoan mitochondrial genomes. It has been proposed that they are completed to TAA by polyadenylation of the 3′ end of the mRNA transcript, which produces a complete stop codon for termination of translation33.

The relative synonymous codon usage (RSCU) values of the L. chinensis mitogenome were calculated, and the RSCU for the six Agromyzidae is shown in Fig. 2. The frequent use of NNA and NNU codons indicated a preference for A or T at the third codon position of PCGs. All possible codons are present in the PCGs of L. chinensis and the other four Liriomyza spp., whereas GCG was not found in C. horticola. Previous research indicates that codons with high G and C content are generally not favored, a pattern associated with low GC content that has also been found in other insects, such as moths34,35, a stonefly36 and a whitefly37.
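
RSCU is the observed count of a codon divided by the count expected if all synonymous codons for that amino acid were used equally. The sketch below is a minimal Python illustration, not the MEGA procedure used in this study. It assumes the invertebrate mitochondrial genetic code (NCBI translation table 5), which the text does not name explicitly, and it pools all codons of one amino acid into a single family, whereas published figures often split Leu and Ser into two families each.

```python
from collections import Counter
from itertools import product
from typing import Dict

BASES = "TCAG"
# Assumption: NCBI translation table 5 (invertebrate mitochondrial code),
# listed in TCAG order with the third base varying fastest.
AAS = "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSSSSVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AAS)}


def rscu(cds: str) -> Dict[str, float]:
    """RSCU per sense codon: count * family size / total count in family."""
    cds = cds.upper().replace("U", "T")
    usable = len(cds) - len(cds) % 3
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    counts = Counter(c for c in codons if CODE.get(c, "*") != "*")
    families: Dict[str, list] = {}
    for codon, aa in CODE.items():
        if aa != "*":
            families.setdefault(aa, []).append(codon)
    values = {}
    for members in families.values():
        total = sum(counts[m] for m in members)
        for m in members:
            values[m] = counts[m] * len(members) / total if total else 0.0
    return values


# Toy input: three leucine codons (TTA, TTA, TTG), for illustration only.
toy = rscu("TTATTATTG")
```

An RSCU above 1 marks a codon used more often than expected under equal usage; in the toy input, TTA accounts for two of three leucine codons across a six-codon family.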

Figure 2

The mitochondrial genome relative synonymous codon usage (RSCU) across six Agromyzidae flies. Codon families are provided on the X axis.

The rates of nonsynonymous (Ka) and synonymous substitutions (Ks) and the Ka/Ks ratio were calculated for all PCGs in the six Agromyzidae mtDNA genomes using D. melanogaster as a reference sequence (Fig. 3). All Ka values were less than Ks values; consequently, the Ka/Ks ratios were less than 1 (Fig. 3), indicating the likelihood of purifying selection in these species38. Agromyzidae species generally show relatively consistent evolutionary rates, which may be related to their relatively constant habitat as larval leafminers39,40.

Figure 3

Evolutionary rates of the mitochondrial genomes of Agromyzidae flies. The number of nonsynonymous substitutions per nonsynonymous site (Ka), the number of synonymous substitutions per synonymous site (Ks), and the Ka/Ks ratio are given for each Agromyzidae mitochondrial genome, using that of D. melanogaster as a reference sequence.

tRNA genes

The tRNA genes of all six Agromyzidae species had an A + T content exceeding 76%. Twenty-two complete tRNAs were identified in the L. chinensis mtDNA, of which 20 were detected using tRNAscan-SE. tRNAArg and tRNASer(AGN) could not be detected by the software and were instead identified by comparison with published Agromyzidae mitochondrial genomes (Fig. 4). The typical number of tRNA genes in the mtDNA of Agromyzids is 22, but L. trifolii and L. bryoniae each have two additional tRNA genes in the A + T-rich region near the 12S rRNA10. Because the anticodons of these four additional tRNAs were atypical, Yang et al. suggested that they were generated by gene duplication and can fold into tRNA-like secondary structures but are nonfunctional10. All L. chinensis tRNAs folded into the typical clover-leaf structure except for tRNASer(AGN), which lacked the dihydrouridine (DHU) arm (Fig. 4); its DHU arm formed a large loop instead of the conserved stem-and-loop structure. Atypical numbers and structures of tRNAs have been reported in other insects8,38,41,42,43. The factors that led to these truncated tRNAs remain unknown. Truncation may result from generalized evolutionary pressure for size reduction in mitogenomes6,44, but such an explanation requires the existence of compensatory mechanisms. Thus, the abnormal structure of tRNASer(AGN) warrants further study. The L. chinensis tRNAs ranged from 63 (tRNAArg(R)) to 72 (tRNAVal(V)) nucleotides in length; tRNA length usually depends on the sizes of the variable loop and D-loop45. In the secondary structures of the L. chinensis tRNAs, there were 19 mismatched base pairs, comprising 14 G-U and five U-U pairs, located in the amino acid acceptor, TψC, and anticodon arms. A total of 27 mismatched base pairs were reported in the tRNAs of L. sativae; these comprised 21 G-U, four U-U, one A-A and one A-C pair, located in the AA (8 bp), DHU (10 bp), AC (5 bp) and TψC (4 bp) arms21.

Figure 4

Inferred secondary structures of tRNAs from the L. chinensis mitogenome. The tRNAs are labelled with the abbreviations of their corresponding amino acids. Structural elements in tRNA arms and loops are illustrated as for trnV.

rRNA genes

The boundaries of the rRNA genes were identified by sequence alignment with published Dipteran sequences. The L. chinensis mtDNA contained the 16S and 12S rRNA genes, which mapped between tRNALeu(CUN) and tRNAVal and between tRNAVal and the control region, respectively (Fig. 1). The 16S and 12S rRNA genes are 1323 and 785 bp long with A + T contents of 83.3% and 81.0%, respectively. The two rRNAs mapped to the same locations as described for other Agromyzidae mitogenomes. Both the 16S and 12S rRNAs have been widely used for population genetics, molecular phylogeny and species identification46,47. However, the rRNA sequences of Agromyzids are relatively conserved23 and thus would not provide much insight into population genetics. Nevertheless, the secondary structure of rRNA genes in Liriomyza spp. may contain useful information. For example, the 12S rRNA of L. huidobrensis showed more variability in the sequence and structure of the H51-H100 region than that of L. trifolii; thus, this region may be a potential marker for identification of Liriomyza spp.24.

Control Region (A + T-rich region)

The control region of the L. chinensis mtDNA is located between the 12S rRNA and tRNAIle genes (Fig. 1); it consists of 1367 nucleotides and has the highest A + T content (89.4%) in the mitogenome (Table 1). This region varies greatly in length among insects, ranging from 70 bp to 13 kb48,49, and it accounts for most of the variation in mtDNA size. The control regions of L. trifolii, L. bryoniae, L. huidobrensis, and L. sativae have a high A + T content (>90%), also map between the 12S rRNA and tRNAIle genes, and are 1338, 1354, 1416, and 741 bp in length, respectively21,22. The A + T-rich region is the fastest-evolving region in the mitochondrial genome50, and a comparative analysis of mtDNA sequences in Drosophila shows that divergence in the control region is substantial in most species51. However, five conserved structural elements have been found in the control region of many insects: a poly-T stretch, a [TA(A)]n-like stretch, a highly conserved stem-and-loop structure, a pair of sequences immediately flanking the stem-and-loop structure with reiterated TATA and G(A)nT consensus sequences, and a G + A-rich stretch downstream of the secondary structure49. We identified several conserved structural elements in the control region of the L. chinensis mtDNA, including one poly-T stretch, two (TA)n stretches and one poly-A stretch (Fig. 5). The A + T-rich region is the largest noncoding region in mtDNA and is associated with replication and transcription, which is why it is named the control region6,52. It is highly variable in both content and size owing to insertions and deletions, variation in the copy number of tandem repeats, and extensive changes in the length of a variable domain50,53,54. Studies have shown that the A + T-rich region harbors sufficient polymorphism to be a suitable marker for population genetics and for phylogenetic reconstruction of closely related taxa55,56.

Figure 5

Predicted structural elements in the control region of L. chinensis. The genes flanking the control region, 12S rRNA and tRNAIle(I), are represented by gray boxes; red-shaded rectangles indicate A + T-rich regions; purple/green boxes indicate conserved poly-A/T structures; yellow boxes indicate sequence blocks conserved with other leafminer species; and blue boxes indicate (TA)n stretches identified using Tandem Repeats Finder.

Phylogenetic analysis

We performed phylogenetic analysis using the nucleotide sequences of the 13 PCGs from six Agromyzid mitochondrial genomes; D. melanogaster served as the outgroup. The topologies of the two phylogenetic trees constructed separately by maximum likelihood (ML) and Bayesian inference (BI) were very similar. L. sativae grouped with L. trifolii, L. huidobrensis grouped with L. bryoniae, and L. chinensis was placed between the other Liriomyza spp. and Chromatomyia (Fig. 6). The analyses indicated that L. trifolii, L. sativae, L. huidobrensis and L. bryoniae are closely related; however, it was difficult to determine which Liriomyza sp. was most closely related to L. chinensis (Fig. 6).

Figure 6

Inferred phylogenetic relationships among Agromyzidae based on nucleotide sequences of 13 protein-coding genes using Bayesian inference (BI) and maximum likelihood (ML). Numbers at each node indicate support: ML bootstrap percentages (first value) and Bayesian posterior probabilities (second value). D. melanogaster was used as an outgroup59. The scale bar indicates the number of substitutions per site.

Interspecific divergence ranged from 8.5% (L. sativae and L. trifolii) to 20.8% (C. horticola and L. chinensis) (Table 4). The genetic distance between DNA sequences is an important characteristic for classification and identification57, and a 2% genetic distance was previously proposed as the threshold between species58. In this study, the genetic distances far exceeded 2%, which is consistent with the classification of the six Agromyzids as distinct species. Genetic distances among Liriomyza spp. were relatively small, whereas those between C. horticola and the Liriomyza species were large. In general, the pattern of genetic distances was consistent with the phylogenetic tree. Over the course of long-term evolution, the highly invasive polyphagous species (such as L. trifolii, L. sativae, L. huidobrensis and L. bryoniae) have occupied similar host niches and experienced similar environmental stress, so the genetic distances among them were small. The convergence of environmental variation and ecological factors can influence speciation59; however, the genetic distance between L. chinensis and the other Liriomyza species was relatively large, which probably resulted from the substantial differences in the food preferences of L. chinensis60.
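
The divergence values are Kimura 2-parameter (K2P) distances, which correct the observed proportions of transitions (P) and transversions (Q) as d = −½ ln[(1 − 2P − Q)√(1 − 2Q)]. The following is a minimal Python sketch of the pairwise calculation for two pre-aligned sequences; it is an illustration, not the MEGA implementation used in this study, and the function name and toy sequences are hypothetical. Sites with gaps or ambiguous bases are skipped.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq1: str, seq2: str) -> float:
    """Kimura 2-parameter distance between two aligned sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [
        (a, b)
        for a, b in zip(seq1.upper(), seq2.upper())
        if a in "ACGT" and b in "ACGT"
    ]
    if not pairs:
        raise ValueError("no comparable sites")
    n = len(pairs)
    diffs = [(a, b) for a, b in pairs if a != b]
    # A transition stays within purines or within pyrimidines.
    transitions = sum(
        1 for a, b in diffs if {a, b} <= PURINES or {a, b} <= PYRIMIDINES
    )
    p = transitions / n
    q = (len(diffs) - transitions) / n
    # math.log raises ValueError when the sequences are saturated.
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))


# Toy alignments, for illustration only.
d_transition = k2p_distance("AAAAAAAAAA", "GAAAAAAAAA")
d_transversion = k2p_distance("AAAAAAAAAA", "CAAAAAAAAA")
```

A distance of 0.02 under this model corresponds to the 2% threshold mentioned above.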

Table 4 Interspecies average divergence of Agromyzidae based on the Kimura-2-parameters model.

Based on our data, the genus Liriomyza has relatively conserved mitochondrial genomes and phylogenetic relationships, which confirms the assignment of L. chinensis to the genus Liriomyza and provides a useful supplement to traditional taxonomic classification.

Materials and Methods

Sample and DNA extraction

Specimens of L. chinensis were collected from onions at Laiwu (36.12°N, 117.04°E) in Shandong, China. All specimens were preserved in 100% ethanol and stored at -20 °C until DNA extraction. Genomic DNA was extracted using the AxyPrep™ Multisource Genomic DNA Kit (Axygen, California, USA) and then used for PCR.

PCR amplification and sequencing

The mitochondrial genome of L. chinensis was amplified from extracted genomic DNA as short, overlapping PCR fragments (<1.2 kb). Twenty-five universal primer pairs specific for Diptera mtDNA61 were designed using Primer Premier 5.0 software (Supplementary Table S1). Conditions for PCR amplification were as follows: initial denaturation for 5 min at 94 °C, followed by 35 cycles of denaturation for 1 min at 94 °C, annealing for 1 min at 45–55 °C and elongation for 1.5 min at 72 °C, and a final extension step at 72 °C for 10 min. The PCR products were analyzed by 1.0% agarose gel electrophoresis and purified with an Axygen DNA Gel Extraction Kit (Axygen Biotechnology, Hangzhou, China). All amplified products were sequenced in both directions. If the sequencing result was bimodal, the fragments were cloned into pGEM-T Easy and re-sequenced after cloning.

Sequence Assembly, Annotation and Analysis

Protein-coding genes (PCGs) and rRNA genes in the L. chinensis mtDNA were identified by comparative analysis with other Agromyzidae family members. PCGs were aligned using Clustal X version 2.062 and the boundaries of individual genes were confirmed with ORF finder (https://www.ncbi.nlm.nih.gov/orffinder/). The mitogenomic map was depicted with CG View Server (http://stothard.afns.ualberta.ca/cgview_server/), and PCG nucleotide sequences (lacking start and termination codons) were translated using MEGA v. 6.063. Both the A + T content and codon usage were calculated using MEGA v. 6.0. Skew analysis was carried out with formulas AT-skew = [A−T]/[A + T] and GC-skew = [G−C]/[G + C]64. The software package DnaSP v. 5.1065 was used to calculate synonymous substitution (Ks) and nonsynonymous substitution rates (Ka). Most tRNAs were recognized by tRNAscan-SE v. 1.21 (http://lowelab.ucsc.edu/tRNAscan-SE/), and tRNAs that could not be identified using tRNAscan-SE were confirmed by sequence comparison with other Dipteran insects66. The tandem repeats in the putative control region were analyzed with Tandem Repeats Finder (http://tandem.bu.edu/trf/trf.advanced.submit.html).

Phylogenetic Analysis

Phylogenetic analyses were based on the nucleotide sequences of the 13 PCGs from L. chinensis and from the mitogenomes of five other Agromyzidae available from GenBank: L. sativae, L. trifolii, L. bryoniae, L. huidobrensis, and C. horticola (GenBank accession nos. HQ333260.1, GU327644.1, JN570504.1, JN570505.1 and KR047789, respectively). The mitogenome of Drosophila melanogaster (U37541.1) was used as the outgroup26. The nucleotide sequences of the 13 PCGs were initially aligned with Clustal X, translated into amino acids using default settings, and then analyzed with MEGA v. 6.0. Alignments of individual genes were concatenated using default settings, and stop codons were excluded. Phylogenetic analyses were conducted using maximum likelihood (ML) in MEGA v. 6.0 and Bayesian inference (BI) in MrBayes v. 3.1.267. The ML tree was inferred with 1000 bootstrap replicates. BI analyses were run for 1,000,000 generations with four chains (one cold chain and three heated chains), and the first 10,000 generations were discarded as burn-in. The confidence values of the BI tree were expressed as Bayesian posterior probabilities in percentages. Interspecific genetic divergence was also calculated in MEGA v. 6.0 using the Kimura 2-parameter model68.
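
The concatenation step can be pictured as joining per-gene alignments, taxon by taxon, in a fixed gene order. The sketch below is a minimal Python illustration under stated assumptions, not the procedure of the software used here: each gene alignment is a hypothetical mapping from taxon name to an aligned sequence with stop codons already removed, and the function name, gene labels and taxon labels are placeholders.

```python
from typing import Dict, List


def concatenate(
    alignments: Dict[str, Dict[str, str]], gene_order: List[str]
) -> Dict[str, str]:
    """Join per-gene alignments into one sequence per taxon."""
    taxa = sorted(alignments[gene_order[0]])
    parts: Dict[str, List[str]] = {t: [] for t in taxa}
    for gene in gene_order:
        aln = alignments[gene]
        if sorted(aln) != taxa:
            raise ValueError(f"{gene}: taxon set differs from the first gene")
        if len({len(s) for s in aln.values()}) != 1:
            raise ValueError(f"{gene}: sequences are not equally long")
        for t in taxa:
            parts[t].append(aln[t])
    return {t: "".join(p) for t, p in parts.items()}


# Toy alignments with placeholder names, for illustration only.
toy_alignments = {
    "gene1": {"taxonA": "ATGAAA", "taxonB": "ATAAAG"},
    "gene2": {"taxonA": "CCT", "taxonB": "CTT"},
}
supermatrix = concatenate(toy_alignments, ["gene1", "gene2"])
```

Checking that every gene has the same taxon set and equal-length rows before joining keeps columns of the concatenated matrix homologous across taxa.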

Accession codes

Sequence data used in this study were deposited in GenBank (accession number MG252777).