Introduction

Tobacco bushy top disease (TBTD) is an important tobacco disease caused by multiple causal agents. TBTD was first reported in Zimbabwe, on the basis of its symptoms and mode of vector transmission1. In China, TBTD has been reported to be caused by tobacco bushy top virus (TBTV)2, tobacco vein distorting virus (TVDV)3, tobacco bushy top virus satellite RNA (TBTVsatRNA)4 and tobacco vein distorting virus associated RNA (TVDVaRNA)5. The occurrence of TBTD in Yunnan Province, China, was first recorded in 19936. TBTD was later found in multiple tobacco-growing regions of Yunnan Province, such as Chuxiong, Baoshan and Dali7. The overall incidences of naturally infected tobacco plants were 91%, 32.5%, 35.7%, 100%, 42.5% and 100% from 1993 to 1998, respectively. By 2001, the total infected area was approximately 51,300 ha and harvest losses were estimated to exceed US$33 million7. This devastating disease has become a limiting factor in tobacco production in the province.

In 2002, TBTV and TVDV were detected in diseased tobacco plants and aphid vectors by RT-PCR using degenerate primers for umbraviruses or poleroviruses8. In two other studies, TBTVsatRNA and TVDVaRNA were identified in tobacco plants with TBTD symptoms in China through double-stranded RNA (dsRNA) analysis and sequencing4,5. Tobacco bushy top virus is a member of the genus Umbravirus, family Tombusviridae, and Tobacco vein distorting virus is a member of the genus Polerovirus, family Solemoviridae9. A recent study found TBTD in several tobacco species in Ethiopia; the diseased plants were infected with Ethiopian tobacco bushy top virus (ETBTV, a new member of the genus Umbravirus), Potato leafroll virus (a member of the genus Polerovirus, family Solemoviridae) and an Ethiopian satellite RNA (ETBTVsatRNA)10. To date, how TBTV, TVDV, TVDVaRNA and TBTVsatRNA co-infect tobacco and other plants in the field, and whether TBTD-affected plants harbor other unidentified viruses, remain unknown.

Umbraviruses (members of the genus Umbravirus) are positive-sense single-stranded RNA viruses. Umbravirus RNA lacks a 5′ cap and a 3′ poly(A) tail and does not encode a capsid protein (CP)9; umbraviruses therefore do not form typical virions. For successful aphid transmission, umbraviruses require a helper virus, usually a polerovirus or an enamovirus. For instance, aphid transmission of groundnut rosette virus (GRV) depends on the presence of groundnut rosette assistor virus (GRAV, a polerovirus) and groundnut rosette virus satellite RNA (GRVsatRNA)11. Aphid transmission of pea enation mosaic virus 2 (PEMV-2) requires the presence of pea enation mosaic virus 1 (PEMV-1, an enamovirus)12,13.

Poleroviruses (members of the genus Polerovirus) are also positive-sense single-stranded RNA viruses, with genomes ranging from 5.6 to 6.0 kilobases (kb). Polerovirus RNA lacks both a 3′ poly(A) tail and a 3′ tRNA-like structure. The family Solemoviridae comprises four genera, Polerovirus, Polemovirus, Sobemovirus and Enamovirus, which contained 26, 1, 20 and 5 species, respectively. The polerovirus genomic RNA contains seven ORFs. Three ORFs (ORF0, ORF1 and ORF1-ORF2) are translated from the 5′ half of the genomic RNA, whereas ORF3a, ORF3, ORF4 and ORF3-ORF5 are expressed from a subgenomic RNA transcribed from the 3′ half of the genomic RNA. Most poleroviruses have a limited natural host range, mainly within the family Solanaceae, with some in the families Amaranthaceae and Cruciferae14. Some poleroviruses, however, have a broad host range; for example, beet western yellows virus (BWYV) infects more than 150 plant species in over 20 families9.

A recent study showed that many plants without typical TBTD symptoms were infected with one to three TBTD causal agents, whereas tobacco plants infected with all four causal agents developed TBTD symptoms15, indicating that TBTD symptom development is a complex process associated with all four reported agents. Earlier studies of TBTD in China focused mainly on tobacco in Yunnan Province. In recent years, one or more TBTD causal agents have been identified in other plant species and in other regions of China16,17,18. To further investigate the TBTD causal agents in China, we performed high-throughput sequencing (HTS) on two tobacco plants showing typical TBTD symptoms. This analysis showed that, in addition to the four known agents, both plants were infected with two new poleroviruses, tobacco polerovirus 1 (TPV1) and tobacco polerovirus 2 (TPV2). We then tested 1713 samples from 29 plant species in 11 provinces/autonomous regions of China by RT-PCR using primers specific for TBTV, TVDV, TVDVaRNA or TBTVsatRNA. These results should improve our understanding of TBTD and of the potential risk of TBTD outbreaks in many crop species, and should aid the development of effective management strategies for this disease.

Materials and methods

Field samples collection

Two tobacco plants (referred to as YBSh and YKMPL) showing TBTD symptoms were collected from tobacco fields in Baoshan and Kunming, Yunnan Province, China, in 2015 and 2016, respectively, and were maintained in an insect-proof greenhouse. To survey the occurrence of TBTV, TVDV, TBTVsatRNA and TVDVaRNA in the field, 1550 leaf samples with virus-like symptoms were collected from plants belonging to 29 species in 13 families in Yunnan Province from 2013 to 2018. In addition, 65 pepper plants with virus-like symptoms were collected from Guizhou, Liaoning, Henan, Hainan, Shandong, Zhejiang and Hubei Provinces and the Tibet and Inner Mongolia Autonomous Regions, China, and 83 tomato plants with virus-like symptoms were collected from Liaoning, Henan, Hainan, Shandong, Shaanxi and Zhejiang Provinces and the Tibet and Inner Mongolia Autonomous Regions. Eleven crofton weed leaf samples were also collected from Guizhou, and one purple perilla and three dahlia samples from Liaoning (Supplementary Table 1). Voucher IDs of the plants that are not cultivated on a commercial scale are given in Table 1.

Table 1 Sample IDs of crofton weed, purple perilla and dahlia.

High-throughput sequencing and data analyses

To identify the viruses infecting the two TBTD-symptomatic field tobacco samples, YBSh and YKMPL leaves were collected, snap-frozen in liquid nitrogen and temporarily stored at −80 °C. The two samples were sent to Biomarker Technologies (Beijing, China) for RNA-Seq by high-throughput sequencing (HTS). Ribosomal RNA was depleted with the Epicentre Ribo-Zero™ kit, and the libraries were sequenced on the Illumina HiSeq X-ten platform with 150 bp paired-end reads (PE150; Illumina, San Diego, CA, USA). Sequence data were analyzed using CLC Genomics Workbench 9.5 (QIAGEN, Hilden, Germany) as described19. Reads that did not map to the reference tobacco genome were assembled de novo with Trinity. The resulting contigs were used as queries in BLAST searches, and contigs that did not match sequences already in the databases were retained as candidate genomic fragments of novel viruses.
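The contig triage step described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes BLAST results were exported in the standard twelve-column tabular format (`-outfmt 6`), and the function names, the 200 bp length cut-off and the 90% identity threshold used to call a contig "unidentified" are hypothetical parameters chosen for illustration.

```python
"""Hypothetical triage of de novo contigs using tabular BLAST (-outfmt 6) output."""

# Default -outfmt 6 columns:
# qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore


def best_hits(blast_lines):
    """Return {contig: (subject, pident, bitscore)}, keeping the top-bitscore hit."""
    best = {}
    for line in blast_lines:
        cols = line.rstrip("\n").split("\t")
        if line.startswith("#") or len(cols) < 12:
            continue  # skip comment or malformed lines
        query, subject = cols[0], cols[1]
        pident, bitscore = float(cols[2]), float(cols[11])
        if query not in best or bitscore > best[query][2]:
            best[query] = (subject, pident, bitscore)
    return best


def candidate_novel_contigs(contig_lengths, blast_lines, min_len=200, max_pident=90.0):
    """Contigs of at least min_len bp with no hit, or whose best hit is below max_pident %.

    Both thresholds are hypothetical illustration values.
    """
    hits = best_hits(blast_lines)
    candidates = []
    for contig, length in contig_lengths.items():
        if length < min_len:
            continue  # too short to keep
        hit = hits.get(contig)
        if hit is None or hit[1] < max_pident:
            candidates.append(contig)
    return sorted(candidates)


# Hypothetical toy input (not data from this study).
lengths = {"contig_1": 5000, "contig_2": 150, "contig_3": 800, "contig_4": 1200}
blast = [
    "contig_1\tknown_virus_A\t99.5\t4900\t20\t1\t1\t4900\t1\t4900\t0.0\t9000",
    "contig_3\tknown_virus_B\t72.0\t780\t200\t5\t10\t790\t5\t785\t1e-50\t300",
]
print(candidate_novel_contigs(lengths, blast))  # ['contig_3', 'contig_4']
```

Here contig_1 is discarded as a close match to a known sequence, contig_2 as too short, and the weakly matching contig_3 and the unmatched contig_4 are kept for further inspection.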

Full genome amplification and sequencing of the viruses in samples YBSh and YKMPL

The sequence gaps between the aligned contigs were filled by RT-PCR using virus-specific primers. The 5′- and 3′-end sequences of TBTV, TVDV, TBTVsatRNA and TVDVaRNA were determined by the rapid amplification of cDNA ends (RACE) technique using SMARTer RACE 5′/3′ Kit (Clontech, USA).

The viral genome sequences were assembled using the DNASTAR 7.0 package (DNASTAR Inc., Madison, WI, USA) and submitted to GenBank (NCBI). To characterize the two newly identified viruses, their ORFs were predicted with ORFfinder (https://www.ncbi.nlm.nih.gov/orffinder/). Pairwise comparisons were performed with EMBOSS Needle (http://www.ebi.ac.uk/Tools/psa/emboss_needle/nucleotide.html). The phylogenetic relationships between the two new viruses and other known poleroviruses were determined with MEGA 5.0. The sequences were all linearized at the start of the RdRp gene and aligned in MEGA 5.0, and the alignments were used to infer neighbor-joining trees in MEGA 5.0 with the p-distance model and 1000 bootstrap replicates, as described20.
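The ORF prediction step can be illustrated with a simplified forward-strand scan. This is a hypothetical sketch, not NCBI ORFfinder's implementation: it assumes the standard genetic code (stop codons UAA, UAG and UGA), searches only the three forward frames of the positive-sense genome, and by default recognizes only AUG starts, so a non-AUG start such as the ATA codon reported later for TPV1 ORF3a is found only if it is passed explicitly. The function name and the 75 nt minimum length are illustrative choices.

```python
"""Hypothetical forward-strand ORF scan for a positive-sense RNA genome."""

STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code, DNA alphabet


def find_orfs(seq, min_nt=75, start_codons=("ATG",)):
    """Return (frame, start, end) tuples, 0-based with end exclusive.

    Each ORF runs from the first start codon after the previous in-frame stop
    up to and including the next in-frame stop codon. Nested start codons are
    not reported separately, and ORFs lacking a stop codon are ignored.
    """
    seq = seq.upper().replace("U", "T")  # accept RNA or DNA input
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon in start_codons:
                start = i
            elif start is not None and codon in STOP_CODONS:
                end = i + 3  # include the stop codon
                if end - start >= min_nt:
                    orfs.append((frame, start, end))
                start = None
    return sorted(orfs, key=lambda orf: orf[1])


toy = "CCAUGAAAUUUGGGUAACC"  # hypothetical toy sequence
print(find_orfs(toy, min_nt=9))  # [(2, 2, 17)]
```

Passing `start_codons=("ATG", "ATA")` would also report ORFs beginning with ATA, at the cost of many more spurious short ORFs, which is why real tools treat alternative starts as an option.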

Virus detection and sequence confirmation of the two new poleroviruses in field tobacco samples

To detect the two newly identified poleroviruses in field-collected samples, total RNA was extracted from 817 tobacco leaf samples using TRIpure Reagent (Bioteke, Beijing, China) for reverse transcription-polymerase chain reaction (RT-PCR). RT-PCR was performed with primers specific to the two new poleroviruses (Supplementary Table 2) and the PrimeScript™ One-Step RT-PCR Kit Ver. 2 (TaKaRa Biotechnology, Dalian, China) according to the manufacturer's instructions. Positive RT-PCR products were gel-purified and cloned individually into the pMD19-T vector (TaKaRa). The resulting plasmids were sequenced by BGI (Guangzhou, China), and the viral sequences were assembled using the DNASTAR 7.0 package (DNASTAR Inc., Madison, WI, USA).

Detection of TBTV, TVDV, TBTVsatRNA and TVDVaRNA infections in the field-collected samples

To determine the occurrence of TBTV, TVDV, TBTVsatRNA and TVDVaRNA in the field samples, the four agents were detected simultaneously by multiplex RT-PCR with virus-specific primers (Supplementary Table 2), as previously described15.

Identification and preservation of plant samples

Species were identified with the help of Dr. Yunheng Ji (Kunming Institute of Botany, Chinese Academy of Sciences). All samples were stored in the Virus Laboratory (State Key Laboratory for Conservation and Utilization of Bio-Resources in Yunnan, Yunnan Agricultural University).

Ethical guidelines

All protocols involving plants adhered to relevant ethical guidelines.

Results

Symptomatology of TBTD and viruses detected by HTS in the two TBTD-affected tobacco plants

The TBTD symptoms most frequently observed on flue-cured tobacco (Nicotiana tabacum) in the field include small leaves, irregular necrotic lesions on leaves, yellowing or chlorosis, internode shortening and stunting (Fig. 1). Tobacco plants infected at an early stage became chlorotic, were severely stunted and failed to flower (Fig. 1A, B), so fully infected plants were unmarketable. Plants infected late developed proliferation of lateral branches, small leaves, foliar yellowing or chlorosis and stunting, but flowering was not impaired (Fig. 1C, D); only the lower, uninfected leaves of these plants were marketable.

Figure 1

Symptoms of TBTD-affected flue-cured tobacco cultivar K326 in the field. Symptoms of early infection on a tobacco plant, including chlorosis, internode shortening and severe stunting (A). An early-infected plant (red arrow) and healthy-looking tobacco plants (B). Late-infection symptoms on tobacco plants, showing proliferation of lateral branches, small leaves, foliar yellowing or chlorosis and stunting (C). A late-infected plant (left) and a healthy-looking tobacco plant (right) (D).

To further identify the viruses infecting TBTD-symptomatic field tobacco plants, the two TBTD-symptomatic plants YBSh and YKMPL were subjected to HTS RNA-Seq. After removal of low-quality reads, 36,744,321 and 33,356,805 clean reads were obtained from YBSh and YKMPL, respectively. The reads were generated on the Illumina HiSeq X-ten platform (PE150) and analyzed with CLC Genomics Workbench 9.5 (QIAGEN, Hilden, Germany) as described19. De novo assembly generated 56,964 (YBSh) and 52,916 (YKMPL) contigs longer than 200 bp. In sample YBSh, 122,028 reads were associated with TBTVsatRNA, 22,370 with TVDVaRNA, 1775 with TBTV, 15,430 with TVDV, 6397 with TPV1 and 8594 with TPV2. In sample YKMPL, 116,779 reads were associated with TBTVsatRNA, 1124 with TVDVaRNA, 14,388 with TBTV, 2923 with TVDV, 277 with TPV1 and 339 with TPV2. The resulting contigs were subjected to BLASTx and BLASTn searches against the NCBI databases, which revealed that both YBSh and YKMPL were infected with six viruses.
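Because the two libraries differ in depth, the raw read counts above are easier to compare after scaling to reads per million clean reads (RPM). The sketch below only rescales the counts reported in this section; RPM normalization is our illustrative choice and was not part of the original analysis, and it does not correct for differences in genome length between the agents.

```python
"""Rescale the reported virus-associated read counts to reads per million (RPM).

RPM is an illustrative normalization, not part of the original analysis.
"""

CLEAN_READS = {"YBSh": 36_744_321, "YKMPL": 33_356_805}

VIRUS_READS = {
    "YBSh": {"TBTVsatRNA": 122_028, "TVDVaRNA": 22_370, "TBTV": 1_775,
             "TVDV": 15_430, "TPV1": 6_397, "TPV2": 8_594},
    "YKMPL": {"TBTVsatRNA": 116_779, "TVDVaRNA": 1_124, "TBTV": 14_388,
              "TVDV": 2_923, "TPV1": 277, "TPV2": 339},
}


def reads_per_million(count, total):
    """Reads assigned to one agent per million clean reads in the library."""
    return count * 1_000_000 / total


rpm = {
    sample: {agent: reads_per_million(n, CLEAN_READS[sample]) for agent, n in counts.items()}
    for sample, counts in VIRUS_READS.items()
}

for agent in VIRUS_READS["YBSh"]:
    print(f"{agent:<11} YBSh {rpm['YBSh'][agent]:9.1f}   YKMPL {rpm['YKMPL'][agent]:9.1f}")
```

After scaling, TBTVsatRNA remains the most abundant agent in both libraries (roughly 3300 to 3500 RPM), while TBTV is relatively far more abundant in YKMPL and TVDVaRNA in YBSh.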

Both the YBSh and YKMPL tobacco samples were thus co-infected with six viral agents: TBTV, TBTVsatRNA, TVDV, TVDVaRNA and two novel poleroviruses (the isolates from the two samples were designated YBSh and YKMPL, respectively). Based on sequence alignments, we tentatively named the two new poleroviruses tobacco polerovirus 1 (TPV1) and tobacco polerovirus 2 (TPV2). To validate the HTS results, total RNA isolated from YBSh, YKMPL and a healthy tobacco plant was analyzed by RT-PCR. TPV1 and TPV2 detection primers were designed based on the virus contigs identified by HTS, while the primers and multiplex one-step RT-PCR used to detect TBTV, TBTVsatRNA, TVDV and TVDVaRNA were as described previously15. PCR products representing all six viruses were present in the YBSh and YKMPL samples but not in the healthy plant sample (Figure S1A–C).

Characterization of the TBTV, TVDV, TVDVaRNA and TBTVsatRNA of YBSh and YKMPL isolates

To obtain the full-length genome sequences of the TBTV, TVDV, TVDVaRNA and TBTVsatRNA isolates from YBSh and YKMPL, an overlapping-amplicon cloning strategy was used in a series of sequential RT-PCRs with virus-specific primers designed from the HTS-derived sequences. The primers, amplification strategies (positions in the viral genomes), amplicon sizes and sequencing chemistry are provided in the supplementary materials (Fig. S2; Table S3). At least three clones of each amplicon were sequenced on both strands using M13 forward and reverse primers and, where necessary, specific sequencing primers. The full-length genomes of both TBTV isolates were 4152 nucleotides (nt) long (GenBank accession numbers: TBTV-YBSh, MW579556; TBTV-YKMPL, MW579557). TBTV-YBSh shared 97.0% nt sequence identity with TBTV-YKMPL. Compared with other TBTV isolates available in GenBank, TBTV-YBSh shared 94.7% (TBTV-MD-II, KM067277) to 98.7% (TBTV-MD-I, KM016225) nt sequence identity, and TBTV-YKMPL shared 85.6% (TBTV-YDHo, KX216406) to 98.8% (TBTV-MD-I). The full-length genomes of the two TVDV isolates were 5920 nt long (GenBank accession numbers: TVDV-YBSh, MW579560; TVDV-YKMPL, MW579561). TVDV-YBSh shared 99.5% nt sequence identity with TVDV-YKMPL, and TVDV-YBSh and TVDV-YKMPL shared 97.6% and 97.4% nt sequence identity, respectively, with TVDV (EF529624). The full-length TVDVaRNA-YBSh and TVDVaRNA-YKMPL sequences were 2971 nt long (GenBank accession numbers: TVDVaRNA-YBSh, MW579562; TVDVaRNA-YKMPL, MW579563). TVDVaRNA-YBSh shared 98.3% nt sequence identity with TVDVaRNA-YKMPL, and TVDVaRNA-YBSh and TVDVaRNA-YKMPL shared 95.8% and 94.2% nt sequence identity, respectively, with TVDVaRNA (EF529625). The full-length TBTVsatRNA-YBSh and TBTVsatRNA-YKMPL sequences were 824 nt long (GenBank accession numbers: TBTVsatRNA-YBSh, MW579558; TBTVsatRNA-YKMPL, MW579559). TBTVsatRNA-YBSh shared 90.8% nt sequence identity with TBTVsatRNA-YKMPL, and TBTVsatRNA-YBSh and TBTVsatRNA-YKMPL shared 98.5% (TBTVsatRNA Longling, KU997687) and 89.0% (TBTVsatRNA YN4, AM238656) nt sequence identity with other TBTVsatRNA isolates available in GenBank. These results indicate that these four agents are isolates of the TBTD causal agents reported previously.
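The pairwise identities above were computed with EMBOSS Needle. As a rough, hypothetical illustration of what such a value means, the sketch below performs a Needleman–Wunsch global alignment and reports identity as identical columns divided by the alignment length including gap columns, which is how we understand Needle to report identity. Needle itself uses a substitution matrix and affine gap penalties; the simple match/mismatch scores and linear gap penalty here are our assumptions, so alignments and percentages on real genomes may differ from Needle's output.

```python
"""Toy global alignment and percent identity (simplified; not EMBOSS Needle)."""


def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Globally align a and b with a linear gap penalty; return the aligned strings."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def sub(x, y):
        return match if x == y else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back from the bottom-right cell, preferring diagonal moves.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(aln_a, aln_b):
    """Identical non-gap columns as a percentage of the full alignment length."""
    same = sum(1 for x, y in zip(aln_a, aln_b) if x == y and x != "-")
    return 100.0 * same / len(aln_a)


aln = needleman_wunsch("ACGTACGT", "ACGACGT")  # hypothetical toy sequences
print(aln, f"{percent_identity(*aln):.1f}%")
```

For this toy pair, the single deleted base yields seven identical columns out of an eight-column alignment, that is, 87.5% identity; counting gap columns in the denominator is what makes indels lower the reported identity.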

Sequence analysis and genome organization of TPV1 and TPV2

Two new poleroviruses, TPV1 and TPV2, were found by HTS in both the YBSh and YKMPL field samples. The nearly full-length genome sequences of isolates TPV1-YBSh (GenBank accession number: MW579552), TPV1-YKMPL (MW579553), TPV2-YBSh (MW579554) and TPV2-YKMPL (MW579555) were confirmed to be 5722, 5725, 5907 and 5912 nt long, respectively, by a series of sequential RT-PCRs and the SMARTer® RACE 5′/3′ Kit (Clontech Laboratories, Inc., USA) with virus-specific primers based on the HTS data, followed by Sanger sequencing. Pairwise comparison of the nearly complete sequences showed that TPV1-YBSh shared 99.7% nt identity with TPV1-YKMPL, and TPV2-YBSh shared 98.9% nt identity with TPV2-YKMPL. The genomic sequences of TPV1-YBSh and TPV2-YBSh were therefore used in subsequent sequence analyses. The genomic nucleotide sequence identity between TPV1 and TPV2 is 54.2%, suggesting that they are two distinct species. BLAST searches indicated that TPV1 and TPV2 had the highest nt sequence identities with known poleroviruses. The genome organizations of TPV1 and TPV2 were predicted using ORFfinder (https://www.ncbi.nlm.nih.gov/orffinder). Compared with PLRV, the genome organizations of TPV1 and TPV2 are typical of poleroviruses, and both contain seven ORFs: ORF0, ORF1, ORF1-ORF2, ORF3a, ORF3, ORF4 and ORF3-ORF5 (Fig. 2). TPV1 ORF0 is 744 nt long and encodes a 28 kDa P0 protein. ORF1 is 1911 nt long and encodes a 69.4 kDa P1 protein, and the ORF1-ORF2 frame is 3179 nt long and encodes a 118 kDa P1–P2 protein through a −1 ribosomal frameshift. The intergenic region (IR) between ORF2 and ORF3a is 81 nt long. ORF3a has an ATA initiation codon, and its 3′-terminal 40 nt overlap the 5′ terminus of ORF3. ORF3a encodes a 5.1 kDa P3a protein. ORF3 is 609 nt long and encodes a 22.4 kDa P3 protein. ORF4 is 552 nt long, lies entirely within ORF3 and encodes a 17 kDa P4 protein. ORF5 has no initiation codon for independent translation; ORF3 and ORF5 are predicted to form an ORF3-ORF5 frame that encodes a 74 kDa read-through protein (RTP) through read-through of the ORF3 stop codon. The 3′ untranslated region (UTR) of TPV1 is 145 nt long. Analysis of the nearly full-length genome sequence of TPV2 showed that, in contrast to TPV1, TPV2 ORF0 is 765 nt long and encodes a 28 kDa P0 protein. ORF1 contains 1869 nt and encodes a 68.5 kDa P1 protein, and the ORF1-ORF2 frame contains 3323 nt and encodes a 115.8 kDa P1–P2 frameshift protein. The IR between ORF2 and ORF3a is 83 nt long. The 3′-terminal 20 nt of ORF3a overlap the 5′ terminus of ORF3. ORF3a contains 138 nt and encodes a 5.1 kDa P3a protein. ORF3 contains 621 nt and encodes a 22.8 kDa P3 protein. ORF4 contains 471 nt and encodes a 20.4 kDa P4 protein. The ORF3-ORF5 frame encodes an 80.3 kDa read-through protein, and the 3′ UTR of TPV2 is 200 nt long (Fig. 2).
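The protein sizes above can be cross-checked roughly against ORF lengths. This hypothetical sketch assumes that each reported ORF length includes the stop codon and uses an average residue mass of about 110 Da, a common rule of thumb; exact masses depend on amino acid composition and should be computed from the deduced sequences. ORF0 and ORF3 of each virus are used as examples.

```python
"""Rough protein mass estimate from ORF length (rule of thumb, not an exact mass)."""

AVG_RESIDUE_DA = 110.0  # assumed average amino-acid residue mass in daltons


def estimated_kda(orf_nt, includes_stop=True):
    """Approximate mass (kDa) of the protein encoded by an ORF of orf_nt nucleotides."""
    if orf_nt % 3:
        raise ValueError("ORF length must be a multiple of 3")
    residues = orf_nt // 3 - (1 if includes_stop else 0)  # drop the stop codon
    return residues * AVG_RESIDUE_DA / 1000


# ORF lengths (nt) reported in the text, with the published sizes for comparison.
examples = {
    "TPV1 ORF0": (744, 28.0),
    "TPV1 ORF3": (609, 22.4),
    "TPV2 ORF0": (765, 28.0),
    "TPV2 ORF3": (621, 22.8),
}
for name, (nt, reported) in examples.items():
    print(f"{name}: ~{estimated_kda(nt):.1f} kDa estimated vs {reported} kDa reported")
```

The estimates (about 27.2, 22.2, 27.9 and 22.7 kDa) fall within roughly 1 kDa of the reported sizes, which is the level of agreement expected from an average-residue approximation.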

Figure 2

Genome structures of TPV1, TPV2 and PLRV.

Sequence comparison of TPV1 and TPV2 with other poleroviruses

To further characterize TPV1 and TPV2, their nearly full-length sequences were compared with the corresponding regions of poleroviruses available in GenBank. TPV1 shared 50.4% (sugarcane yellow leaf virus, ScYLV, AF157029) to 79.1% (tobacco virus 2, TV2, KY038943) nt sequence identity with 19 poleroviruses (Table 2). The nucleotide and amino acid sequences of individual ORFs of TPV1 were then compared with those of 24 selected poleroviruses. At the nt level, TPV1 shared 49.2% (TPV2) to 97.3% (TV2, KY038943) identity with other poleroviruses in ORF0, 29.6% (maize yellow dwarf virus-RMV, MYDV-RMV, KC921392) to 96.4% (TV2) in ORF1, 57.4% (suakwa aphid-borne yellows virus, SABYV, JQ700308; beet chlorosis virus, BChV, AF352024) to 97.0% (TV2) in ORF1-ORF2, 56.7% (ScYLV, AF157029) to 95.2% (turnip yellows virus, TuYV, NC_003743) in ORF3, 50.6% (pepper vein yellows virus 1, PeVYV-1, NC_015050) to 91.3% (TuYV) in ORF4, and 50.0% (ScYLV) to 78.1% (beet mild yellowing virus, BMYV, X83110) in ORF3-ORF5. At the aa level, TPV1 shared 16.5% (ScYLV) to 97.6% (TV2) identity in ORF0, 26.2% (BChV) to 96.1% (TV2) in ORF1, 40.1% (SABYV) to 98.1% (TV2) in ORF1-ORF2, 43.0% (ScYLV) to 94.1% (TuYV) in ORF3, 29.2% (ScYLV) to 88.0% (TuYV) in ORF4, and 31.3% (ScYLV) to 77.7% (beet western yellows virus, BWYV, AF473561) in ORF3-ORF5 (Table 3).

Table 2 Nucleotide sequence comparisons using the nearly full-length genomic sequences of TPV1 with that of other poleroviruses.
Table 3 Nucleotide and amino acid sequence comparisons of different ORFs of TPV1 with those of other poleroviruses.

Notably, TPV1 ORF5 differed most from those of other poleroviruses, whereas TPV1 ORF3 was the most conserved at both the nt and aa levels. The aa sequence identities between the TPV1 P4 or ORF3-ORF5 read-through protein and those of the other 19 poleroviruses are all below 90%. TPV1 shared its highest nt and aa sequence identities (over 96%) with TV2 in ORF0, ORF1 and ORF1-ORF2, but only 54.0–65.9% nt and 38.5–62.6% aa identity with TV2 in ORF3, ORF4 and ORF3-ORF5. Excluding TV2, the highest aa sequence identities between TPV1 and other poleroviruses were 54.7% (ORF0), 67.0% (ORF1), 71.7% (ORF1-ORF2), 94.1% (ORF3), 88% (ORF4) and 77.7% (ORF3-ORF5). Table 3 thus shows high identities between TPV1 and TV2 in the 5′-proximal ORFs and between TPV1 and TuYV in the 3′-proximal ORFs, suggesting that recombination events may have occurred among TPV1, TV2 and TuYV. These values satisfy the current species demarcation criterion for the family Solemoviridae9 (less than 90% aa sequence identity in any gene product), indicating that TPV1 should be a novel species in the genus Polerovirus.
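The species-demarcation reasoning can be made explicit with a small check. The sketch assumes the criterion, as applied in this paper, that less than 90% aa sequence identity in any gene product indicates a distinct species, and it uses the highest aa identity of each TPV1 product listed in this section (P0, P1 and P1–P2 with TV2; P3 and P4 with TuYV; read-through protein with BWYV). If even the best match for a product falls below the threshold, that product is below the threshold against every compared virus. The function name is hypothetical.

```python
"""Check TPV1 against an assumed <90% aa identity species-demarcation threshold."""

THRESHOLD = 90.0  # assumed cut-off: below 90% aa identity in any gene product

# Highest aa identity of each TPV1 product with any compared polerovirus (from the text).
max_aa_identity = {
    "P0": 97.6,           # TV2
    "P1": 96.1,           # TV2
    "P1-P2": 98.1,        # TV2
    "P3": 94.1,           # TuYV
    "P4": 88.0,           # TuYV
    "P3-P5 (RTP)": 77.7,  # BWYV
}


def divergent_products(identities, threshold=THRESHOLD):
    """Gene products whose best match to any compared species is below the threshold."""
    return [product for product, value in identities.items() if value < threshold]


divergent = divergent_products(max_aa_identity)
print(divergent)  # ['P4', 'P3-P5 (RTP)']
print("meets the criterion" if divergent else "does not meet the criterion")
```

Using the best match per product is a conservative shortcut: a full pairwise check against each species can only find the same or more divergent products.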

TPV2 shared 50.2% (SABYV, JQ700308) to 70.4% (TVDV, EF529624) nt sequence identity with 19 poleroviruses (Table 4). The nucleotide and amino acid sequences of individual ORFs of TPV2 were compared with those of 23 selected poleroviruses. TPV2 shared 43.3% (ScYLV, AF157029) to 64.1% (PYLCV, HM439608) nt sequence identity with other poleroviruses in ORF0, 41.9% (PLRV, NC_001747) to 67.6% (PYLCV, HM439608) in ORF1, 50.1% (SABYV, JQ700308 and CYDV-RPV, L25299) to 71.0% (PeVYV-1, NC_015050) in ORF1-ORF2, 55.7% (ScYLV, AF157029) to 92.3% (PeVYV-1, NC_015050) in ORF3, 50.5% (CtLRV, AY695933) to 93.4% (PeVYV-1, NC_015050) in ORF4, and 50.0% (MYDV-RMV, KC921392) to 72.0% (TVDV, EF529624) in ORF3-ORF5 (Table 5). Table 5 shows high identities between TPV2 and PeVYV-1 in the 3′-proximal ORFs, suggesting that recombination events may have occurred between TPV2 and PeVYV-1. These values are also consistent with the current species demarcation criteria for the family Solemoviridae9, suggesting that TPV2 is a distinct member of the genus Polerovirus.

Table 4 Nucleotide sequence comparisons using the near full genomic sequence of TPV2 with that of other poleroviruses.
Table 5 Nucleotide and amino acid sequence comparisons of different ORFs of TPV2 with those of other poleroviruses.

To determine the phylogenetic relationships among TPV1, TPV2 and thirty representative viruses in the four genera of the family Solemoviridae, a phylogenetic tree was constructed from their RdRp nt sequences with MEGA 5.0. TPV1 and TPV2 clustered with 56 members of the family Solemoviridae (Fig. 3). TPV1 is closely related to TV2 and potato leafroll virus (PLRV), whereas TPV2 is closely related to TVDV, PeVYV-1 and PeVYV-2. The phylogenetic analysis further supports the conclusion that TPV1 and TPV2 are two distinct new poleroviruses.
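For readers unfamiliar with the method, the sketch below shows what a neighbor-joining analysis on p-distances computes. It is an illustrative re-implementation of the standard Saitou–Nei algorithm, not MEGA's code, and it omits bootstrapping. Handling alignment columns that contain a gap in either sequence by pairwise deletion is our assumption (MEGA offers several gap-handling options), and the four toy sequences are hypothetical.

```python
"""p-distance matrix and neighbor-joining (Saitou and Nei) on aligned sequences."""

from itertools import combinations


def p_distance(s1, s2):
    """Proportion of differing sites, skipping columns with a gap in either sequence."""
    if len(s1) != len(s2):
        raise ValueError("sequences must be aligned to the same length")
    compared = diffs = 0
    for x, y in zip(s1.upper(), s2.upper()):
        if x == "-" or y == "-":
            continue  # pairwise deletion of gapped columns (assumption)
        compared += 1
        diffs += x != y
    if compared == 0:
        raise ValueError("no comparable sites")
    return diffs / compared


def neighbor_joining(labels, dist):
    """Build an unrooted NJ tree from a symmetric distance matrix; return Newick text."""
    if len(labels) < 3:
        raise ValueError("need at least three taxa")
    # d[x][y] holds the current distance between active nodes; nodes are Newick strings.
    d = {a: {b: dist[i][j] for j, b in enumerate(labels)} for i, a in enumerate(labels)}
    active = list(labels)
    while len(active) > 3:
        n = len(active)
        r = {x: sum(d[x][y] for y in active if y != x) for x in active}
        # Join the pair minimising Q(i, j) = (n - 2) d(i, j) - r(i) - r(j).
        i, j = min(combinations(active, 2),
                   key=lambda p: (n - 2) * d[p[0]][p[1]] - r[p[0]] - r[p[1]])
        li = d[i][j] / 2 + (r[i] - r[j]) / (2 * (n - 2))
        lj = d[i][j] - li
        u = f"({i}:{li:.4g},{j}:{lj:.4g})"
        d[u] = {}
        for k in active:
            if k not in (i, j):
                d[u][k] = d[k][u] = (d[i][k] + d[j][k] - d[i][j]) / 2
        active = [k for k in active if k not in (i, j)] + [u]
    # Resolve the last three nodes as a star using the three-point formula.
    x, y, z = active
    lx = (d[x][y] + d[x][z] - d[y][z]) / 2
    ly = (d[x][y] + d[y][z] - d[x][z]) / 2
    lz = (d[x][z] + d[y][z] - d[x][y]) / 2
    return f"({x}:{lx:.4g},{y}:{ly:.4g},{z}:{lz:.4g});"


# Hypothetical toy alignment (not sequences from this study).
aligned = {
    "virus_A": "ACGTACGTAC",
    "virus_B": "ACGTACGTTC",
    "virus_C": "ACGAACCTTC",
    "virus_D": "TCGAACCATC",
}
names = list(aligned)
matrix = [[p_distance(aligned[a], aligned[b]) for b in names] for a in names]
print(neighbor_joining(names, matrix))
```

The Q criterion corrects raw distances for each node's average divergence, which is why neighbor-joining, unlike simple clustering, does not assume a constant evolutionary rate across lineages.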

Figure 3

Neighbor-joining (NJ) tree of the RdRp nt sequences of TPV1, TPV2 and other viruses in the family Solemoviridae. The tree is based on nt sequence alignments. Sequences were aligned with Clustal W and the NJ tree was constructed with MEGA 5.0. The scale bar indicates genetic distance. *Two novel poleroviruses.

Survey of the TPV1 and TPV2 infections in field tobacco plants

To assess the occurrence of TPV1 and TPV2 in the field, 244 leaf samples were randomly selected from the 817 tobacco field samples with virus-like symptoms collected in Yunnan Province from 2013 to 2018 and tested for TPV1 and TPV2 by RT-PCR. Eight samples were positive for TPV1 but not TPV2 (detection rate 3.28%), and 59 were positive for TPV2 but not TPV1 (24.18%) (Table 6). In addition, 13 samples were infected with both TPV1 and TPV2 (5.33%). The overall detection rate of TPV1 and/or TPV2 was 32.79%, suggesting that TPV1 and TPV2 were common in tobacco during these years.
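The detection rates above follow directly from the counts in Table 6; the short check below reproduces them, with the overall rate based on the 80 samples (8 + 59 + 13) positive for TPV1 and/or TPV2. The category labels are our own shorthand.

```python
"""Reproduce the TPV1/TPV2 detection rates from the reported counts."""

TESTED = 244
counts = {"TPV1 only": 8, "TPV2 only": 59, "TPV1 + TPV2": 13}


def rate(positive, tested=TESTED):
    """Detection rate as a percentage rounded to two decimals."""
    return round(100 * positive / tested, 2)


for category, n in counts.items():
    print(f"{category}: {n}/{TESTED} = {rate(n)}%")

positive = sum(counts.values())
print(f"TPV1 and/or TPV2: {positive}/{TESTED} = {rate(positive)}%")
```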

Table 6 RT-PCR detection results for TPV1 and TPV2 from 244 field tobacco samples.

Thirty-three samples infected with TPV1, TPV2 or both were then selected and tested for TBTV, TVDV, TBTVsatRNA and TVDVaRNA by RT-PCR with virus-specific primers. In every case, TPV1 and TPV2 co-infected field plants with two to four TBTD causal agents (Table 7). For example, five samples were co-infected with all six viruses, and 11 samples with five different viruses. No sample was infected with TPV1 or TPV2 alone, and TPV1 or TPV2 always co-occurred with both TVDV and TVDVaRNA. We speculate that TPV1 and TPV2 may act synergistically with the causal agents of TBTD; the interactions among TPV1, TPV2 and the TBTD causal agents warrant further study.

Table 7 RT-PCR detection of TPV1, TPV2 and the four TBTD causal viruses in 33 field collected tobacco samples.

Survey of the TBTV, TVDV, TBTVsatRNA and TVDVaRNA infections in different field plants

To survey the occurrence of TBTV, TVDV, TBTVsatRNA and TVDVaRNA in field plants, 817 tobacco leaf samples and 733 leaf samples from plants belonging to 29 plant species were collected in Yunnan Province from 2013 to 2018 (Supplementary Table 1). In addition, 65 pepper leaf samples from nine provinces/autonomous regions and 83 tomato leaf samples from eight provinces/autonomous regions of China were collected. All sampled plants showed virus-like symptoms. Eleven crofton weed leaf samples were also collected from Guizhou, and one purple perilla and three dahlia samples from Liaoning. These samples were tested for TBTV, TVDV, TBTVsatRNA and TVDVaRNA by RT-PCR with virus-specific primers as described by Liu et al. (2014). Twenty-two plant species in 12 families, namely Asteraceae (crofton weed, dahlia and sticktight), Solanaceae (black nightshade, potato, pepper, tomato and tobacco), Fabaceae (broad bean, pea and kidney bean), Brassicaceae (Brassica pekinensis, radish and oilseed rape), Cucurbitaceae (pumpkin), Caricaceae (papaya), Poaceae (wheat), Araceae (Amorphophallus konjac), Araliaceae (sanqi), Dioscoreaceae (yam), Liliaceae (garlic) and Amaranthaceae (alligator weed), were infected with at least one of the four assayed agents (Table 8). Of these 12 families, Fabaceae, Brassicaceae, Cucurbitaceae, Caricaceae, Poaceae, Araceae, Araliaceae, Dioscoreaceae, Liliaceae and Amaranthaceae had not previously been reported as hosts of TBTV, TVDV, TBTVsatRNA or TVDVaRNA. Sticktight, broad bean, pea, oilseed rape, pumpkin, tomato, crofton weed and black nightshade plants were found, for the first time, to be infected with all four assayed agents (Table 8). The presence of TBTD causal agents was also confirmed for the first time in Guizhou, Hainan, Henan, Liaoning, Inner Mongolia and Tibet, in addition to Yunnan.
These results suggest that TBTD causal agents are widely distributed in China and have spread to a broader range of host plants.

Table 8 Detection of the four TBTD causal viruses in 29 species of plants collected from 11 different provinces or autonomous regions in China.

In this study, 663 of the 1713 tested leaf samples were infected with TBTV, TVDV, TBTVsatRNA and/or TVDVaRNA, an average detection rate of 38.70% (Table 9). Among the four assayed agents, the infection rate of TVDV was the highest (37.5%) and that of TBTVsatRNA the lowest (8.1%) (Fig. 4). Co-infection with two to four TBTD causal agents was common in the field: 364 samples were co-infected with two agents (TBTV + TVDV, TBTV + TBTVsatRNA or TVDV + TVDVaRNA; 54.9% of the 663 positive samples), 51 samples with three agents (TBTV + TVDV + TVDVaRNA or TBTV + TVDV + TBTVsatRNA; 7.7%) and 86 samples with all four agents (13.0%). The TBTD causal agents thus occurred mainly as two-agent combinations, with three- or four-agent co-infections less common. In addition, 156 samples were singly infected with TVDV (9.11% of all tested samples) and 6 were singly infected with TBTV (0.35%), whereas no single TBTVsatRNA or TVDVaRNA infection was detected. TVDV was found in most agent combinations, indicating that TVDV plays an important role in the occurrence of TBTD in the field. Twenty-one samples were infected with TBTV or TBTVsatRNA in the absence of TVDV, suggesting that other viral agents may assist TBTV and TBTVsatRNA in completing vector transmission.
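The co-infection breakdown can be reproduced from the reported counts, and the same tally can be applied to per-sample RT-PCR results. In the sketch, the counts are taken from this section; the function names and the five example samples are hypothetical, and each sample is represented simply as the set of agents detected in it.

```python
"""Tally co-infection patterns of the four TBTD agents (illustrative sketch)."""

from collections import Counter


def tally_by_agent_count(samples):
    """Count positive samples by the number of agents detected in each (hypothetical helper)."""
    return Counter(len(agents) for agents in samples if agents)


def tbtv_without_tvdv(samples):
    """Samples carrying TBTV or TBTVsatRNA but no TVDV (hypothetical helper)."""
    return [s for s in samples if s & {"TBTV", "TBTVsatRNA"} and "TVDV" not in s]


# Reported summary for the 1713 tested samples.
TESTED = 1713
by_count = {1: 156 + 6, 2: 364, 3: 51, 4: 86}  # single = 156 TVDV + 6 TBTV
positive = sum(by_count.values())
print(f"positive: {positive}/{TESTED} = {100 * positive / TESTED:.2f}%")
for k in (2, 3, 4):
    print(f"{k} agents: {by_count[k]} ({100 * by_count[k] / positive:.1f}% of positives)")

# Hypothetical per-sample detections showing how the tally would be applied.
example = [{"TVDV"}, {"TBTV", "TVDV"}, set(), {"TBTV", "TBTVsatRNA"},
           {"TBTV", "TVDV", "TBTVsatRNA", "TVDVaRNA"}]
print(tally_by_agent_count(example))
print(tbtv_without_tvdv(example))
```

With the reported counts, the positives sum to 663 (38.70% of 1713), and the two-, three- and four-agent classes come to 54.9%, 7.7% and 13.0% of positives, matching the values above.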

Table 9 Detection of the four TBTD causal viruses in field collected samples.
Figure 4

Detection rate of TBTV, TVDV, TBTVsatRNA and TVDVaRNA in field collected samples.

Discussion

Several viral agents have been reported to cause TBTD in different countries. For example, an early study suggested that TBTD in Zimbabwe was caused by co-infection with TVDV and TBTV1. Later studies showed that tobacco plants with TBTD-like symptoms in Yunnan Province, China, were infected with TBTV, TVDV, TBTVsatRNA and TVDVaRNA, and the full genomes of TBTV, TVDV and TVDVaRNA were determined2,3,5. In 2014, Abraham and colleagues reported that in Ethiopia, tobacco plants with TBTD-like symptoms were infected with ETBTV, ETBTVsatRNA and PLRV10. Because the complete nucleotide sequence identity between ETBTV and TBTV is only about 56.4%, and that between ETBTVsatRNA and TBTVsatRNA is only about 35.6%, ETBTV and ETBTVsatRNA are now considered new virus species. The nucleotide sequence identity between TBTV Zimbabwe isolate A2 and ETBTV is about 94.9%, so these two viruses are now considered two isolates of ETBTV. Whether TBTV and TVDV co-infect tobacco in Zimbabwe and Ethiopia remains unclear. Based on current knowledge, TBTD in China and TBTD in Africa are two different diseases caused by different viruses. Although six different viruses have now been found in TBTD-symptomatic plants in China, how these viruses co-infect field crops and whether additional unidentified viruses are associated with this disease in the field are still unknown. With recent advances in HTS and bioinformatics, the discovery of new TBTD causal or associated agents has become possible21,22. In virus-infected plants, large numbers of virus-derived siRNAs (vsiRNAs) are generated along with the viral genomic RNAs, and these vsiRNAs can be identified and assembled into virus contigs or even full-length viral genomes23,24,25. This technology can also help identify new viruses associated with TBTD.

In this study, HTS was used to analyze two tobacco samples showing typical TBTD-like symptoms from two different locations. From the assembled sequences, we determined the near-full-length sequences of two new poleroviruses, TPV1 and TPV2. Sequence alignment showed that TPV1 shares the highest nucleotide sequence identity (79.1%) with TV2. The deduced amino acid sequences of the TPV1 P4 protein and the read-through protein (P3-P5) share less than 90% identity with those of known viruses in the genus Polerovirus. Sequence alignment also showed that TPV2 shares the highest nucleotide sequence identity (70.4%) with TVDV. The predicted amino acid sequences of all TPV2 proteins except P3 share less than 90% identity with those of viruses in the family Solemoviridae. Therefore, we conclude that TPV1 and TPV2 are two novel poleroviruses.
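The novelty argument above rests on a pairwise identity cut-off (less than 90% amino acid identity for at least one protein). As an illustration only, the Python sketch below computes percent identity from two sequences that have already been aligned by an external tool and flags a candidate when any protein falls below 90%. The function names, the gap convention (columns where both sequences are gapped are skipped) and the "any protein" rule are our assumptions for this sketch, not the exact procedure or software used in this study.

```python
from typing import Mapping


def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent identity between two pre-aligned sequences ('-' marks a gap).

    Assumed convention: columns where both sequences carry a gap are skipped;
    a column counts as a match only when both residues are identical and not
    gaps. Alignment tools differ in how they handle gaps and denominators.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" and b == "-":
            continue  # gap in both sequences: not an informative column
        compared += 1
        if a == b:
            matches += 1  # identical residue (gaps never reach this branch)
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared


def flags_possible_new_species(identities: Mapping[str, float],
                               threshold: float = 90.0) -> bool:
    """Hypothetical helper: True if any protein's identity to its closest
    known relative is below the threshold (in percent)."""
    return any(value < threshold for value in identities.values())
```

For example, `percent_identity("MKV-LA", "MKVQLA")` returns about 83.3, because five of the six compared columns match. Keeping the alignment step outside the function makes the sketch independent of which aligner produced the alignment.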

The symptoms of Ethiopian tobacco bushy top disease are similar to those of TBTD in China, and that disease is likewise caused by a combination of polerovirus and umbravirus agents10. A recent study showed that, besides PLRV, cowpea polerovirus 1 (genus Polerovirus) can also assist the vector transmission of ETBTV10,26, and our group has also found that TBTV can be aphid-transmitted with the assistance of barley yellow dwarf virus GAV (genus Luteovirus; unpublished data). It is therefore possible that other poleroviruses can also assist the aphid transmission of TBTV. Our survey also revealed that 0.35% and 0.88% of field samples were infected with TBTV or TBTV + TBTVsatRNA, respectively, without TVDV co-infection (Table 9). This suggests that a polerovirus other than TVDV may help TBTV accomplish its aphid transmission, and that TPV1 and/or TPV2 are potential helper viruses for TBTV aphid transmission in nature. Phylogenetic analysis showed that TPV1 is closely related to PLRV and TV2, and TPV2 is closely related to TVDV. Because TPV1 and TPV2 were often found in co-infection with one or more of the four known TBTD causal viruses, we speculate that both might play important roles in the induction of TBTD-like symptoms and/or in the TBTD disease cycle. However, whether TPV1 and TPV2 are responsible for TBTD in China remains unknown.

Earlier studies on TBTD were conducted mainly on tobacco plants in Yunnan Province, China. In recent years, several reports have indicated that TVDV and TBTV can also infect other plant species, including pepper, dahlia, tomato and crofton weed16,17,18. In this study, we found that, in addition to tobacco, a total of 21 plant species in 12 families can be infected with at least one of the TBTD causal viruses. In addition, we found TVDV + TVDVaRNA + TBTV + TBTVsatRNA co-infection in crofton weed in Guizhou Province, in tomato plants in Hainan Province, and in tobacco, sticktight, broad bean, pea, oilseed rape, pumpkin, tomato and black nightshade plants in Yunnan Province. This finding indicates that the TBTD causal viruses can infect weeds and many other cash crops, forming a large virus reservoir in nature. Notably, many plants co-infected with these four causal viruses did not show typical TBTD-like symptoms. Taken together, we conclude that the causal agents of TBTD are a group of different viruses; that these viruses have much broader host ranges and distributions than previously reported; and that the infected plants can serve as key overwintering or intermediate hosts for the six causal viruses. These three points should be considered when developing an effective control strategy for TBTD in the field.