Identification and estimation of sequence variation dynamics of Tomato Leaf curl Palampur virus and betasatellite complex infecting a new weed host

Abstract Leaf samples from the symptomatic Solanum nigrum plants showing curling on leaf margins, leaf chlorosis, leaf crumpling and stunting in growth were collected during spring 2018. Subsequently, the standard procedures of total genomic DNA isolation, enrichment via rolling circle amplification, polymerase chain reaction, cloning and complete genome sequencing were performed. Pairwise nucleotide (nt) sequence comparison and phylogenetic studies of the inferred genome sequences confirmed the identity of a bipartite begomovirus Tomato leaf curl Palampur virus (ToLCPalV) in the symptomatic plant samples sharing 97.6 and 95.5% nt identity with ToLCPalV isolates identified from Pakistan and India. In the phylogenetic analysis, ToLCPalV grouped with ToLCPalV isolates previously described from Pakistan and India. Additional attempts to identify any putative DNA-satellites in S. nigrum plant samples revealed an associated betasatellite component, cotton leaf curl Multan betasatellite (CLCuMuB). This is the first identification of cotton leaf curl disease-associated betasatellite with ToLCPalV in this region. The nt diversity analysis showed that ToLCPalV populations have a high mutation rate and evolve independently under strong purifying selection, but not via recombination.


Introduction
Geminiviruses (family Geminiviridae) have ~2.5-5.2 kb small genomes that are replicated through rolling-circle mode of replication. Geminiviruses are a diverse group with more than 520 species categorized into fourteen genera Becurto-, Begomo-, Capula-, Citloda-, Curto-, Eragro-, Grablo-, Maldo-, Mastre-, Mulcrile-, Opun-, Topile-, Topocu-and Turncurto-virus, respectively [1]. To date, ~445 species have been assigned in the exclusively diverse Begomovirus genera with a wide range of dicotyledonous group of host plants. Likewise, most of the other geminiviruses, begomoviruses also rely upon the arthropod insects as vectors for their successful transmission [2]. The Bemisia tabaci whitefly is the exclusive mode of vectoring for begomoviruses to facilitate their plant-to-plant transmission and long-distance spread into the other plant hosts and the adjoining agro-ecological regions. The genomic architecture of begomoviruses is either monopartite (a single component of ~2.8 kb) or bipartite genome with two equally sized components (~2.5-2.7 kb each), respectively denoted as DNA-A and DNA-B [1]. Nevertheless, both types of begomoviruses have circular, single-stranded (css) DNA genomes. The DNA-A of bipartite begomoviruses is a structural and functional homologue of the monopartite begomovirus genome, encoding peculiar five-to-six open reading frames (ORFs) that are exclusively transcribed in virion-sense or complementary-sense orientation and are separated by a bi-directional promoter sequence [3]. The pre-coat protein (Pre-CP) and coat protein (CP) are encoded on the virion-sense strand and correspond to AV1 and AV2 in bipartite begomoviruses, respectively, while the replication-associated protein (Rep), transcription activator protein (TrAP), replication enhancer protein (REn) and C4 protein are encoded on the complementary-sense strand and correspond to AC1 to AC4 genes in bipartite begomoviruses, respectively. The DNA-B is particularly important for local (intracellular) and systemic or long-distance movement (intercellular) through two genes: nuclear shuttle protein (NSP; corresponding to BV1) and movement protein (MP; corresponding to BC1) in their respective opposite orientations. DNA-A and DNA-B are not alike but only share a short stretch of sequences (~200 nucleotides) in the intergenic region (IR) encompassing the origin of replication (ori), and nonanucleotide sequences (TAATATTAC) with T 7 -A 8 cleavage site to commence replication [4]. Two exclusive groups of begomoviruses are Old World (native to Africa, Asia and Europe) and New World begomoviruses (Native to the Americas). Most of the bipartite begomoviruses are widely spread in the New World and are characterised by lacking the AV2/V2 gene. A dominant majority of the begomoviruses with monopartite genome prevails in the Old World, where some begomoviruses with a bipartite genome have also been reported [5].
Meanwhile, the major monopartite begomoviruses found in the Old World have even complex aetiology with association of sub-genomic DNA-satellite components. These sub-genomic components have peculiar genetic architecture with corresponding cssDNA genomes (ca 1350 nt) encoding only a single protein. These are alphasatellites (sub-family Geminialphasatellitinae, family Alphasatellitidae) [6] and betasatellites (genus Betasatellite, family Tolecusatellitidae). Nevertheless, a third sub-genomic component deltasatellites (genus Deltasatellite, family Tolecusatelllitidae) with cssDNA genome is also reported [7]. Monopartite begomoviruses are predominantly reported infecting the same host plants in association with alphasatellites and betasatellites (both or either component). The role of alphasatellites is still debatable, however; some of them are known to exacerbate or intensify in planta symptoms and/or obstruct begomovirus-betasatellite in planta accumulation [8]. Besides, a limited number of alphasatellites can also suppress the host plants' gene silencing mechanism at the post-transcription level. On the other hand, betasatellites are in association with the helper begomovirus genome for their encapsidation, replication and insect-based transmission. In return, these molecules help the begomovirus to establish successful pathogenic interaction with the plant host by overcoming the host plant defense mechanism [9,10]. Unlikely, deltasatellites lack any functional ORF and are known for their reliance on the helper virus components to perform their major functions [7].
Solanum nigrum (black nightshade) is a widely distributed weed in different global agro-ecological regions. It is also traditionally cultivated for its therapeutic properties to cure diarrhoea, eye inflammations, jaundice, ulcer and oxidative stresses [11,12]. Moreover, S. nigrum produces antiviral phytochemicals to inhibit chronic infections from Hepatitis C virus and SARS-CoV-2 [12,13]. Besides, it is a potential host for many viruses including begomoviruses, othotospoviruses, potyviruses and tobraviruses as a wild reservoir source to produce new viral recombinants [11,[14][15][16][17]. Among geminiviruses, tomato leaf curl New Delhi virus (ToLCNDV) [15], tomato yellow leaf curl virus [18], tomato leaf curl Sardinia virus [14,18], nightshade curly top virus [19], solanum leaf curl Lakshmangarh virus [20], tomato severe rugos virus [21] and tomato yellow leaf curl Axarquia virus [14] have been identified. Thus, the presence of S. nigrum in the agro-ecological niches may contribute to enrich the genetic diversity of begomoviruses to produce new epidemics [22]. More importantly, these wild reservoirs can help in spatio-temporal distribution of the viral mutants into the cultivated crop plants through insect vectors, which may result in a positive selection of new better fit viral variants in a particular area.
The present article describes the full-length genomes of ToLCPalV and CLCuMuB, as well as the dynamics of their sequence variation in a weed. In Pakistan, several weeds have been known to host monopartite begomoviruses; however, this is the first identification of ToLCPalV being associated with CLCuMuB from the S. nigrum weed, indicating the spread of virus to a new host in Pakistan.

Collection of samples and initial screening
S. nigrum plants in a field that were showing typical begomoviral-like disease symptoms such as vein thickening, curling and stunting, were collected ( Figure 1). Symptomatic leaves from three plants (OD1, OD2 and OD3) along with asymptomatic plants were collected from Lahore, Pakistan during 2017-2018. FavorPrep Plant Genomic DNA Extraction Mini Kit (FAPGK 001) was used following the provided protocol to extract total genomic DNA from the symptomatic and symptom-free S. nigrum plants. Degenerate universal primers were used for initial detection of any begomovirus genome using AC1048/ AV494 primers [23] through polymerase chain reaction (PCR) as described earlier [24]. The resultant PCR amplicons were purified with GeneJet Gel Extraction kit (ThermoFisher Scientific) and cloned into a pJET/1.2 blunt-end PCR cloning vector available in the CloneJet PCR cloning kit (ThermoFisher Scientific). Sanger sequencing technique was performed at First Base Laboratories Sdn Bhd, Malaysia, to completely sequence the amplified products cloned into the recombinant plasmids.

Cloning and sequencing of rolling circle amplification products
Rolling circle amplification (RCA) was performed to amplify the complete begomovirus genome/es and any associated DNA-satellites. The enriched RCA products were diluted (10x) to be used as templates in PCR to directly amplify the complete DNA-A, DNA-B and DNA-satellites using primers Begomo-F/ Begomo-R, TLCVB-F/TLCVB-R and β01/β02, respectively ( Table 1). The presence or absence of any additional DNA-satellite genomes were also confirmed with the degenerate primers for alphasatellites (DNA101/DNA102) [28]. All PCR amplicons were cloned and subsequently sequenced in their entirety, as described above.

Begomovirus sequence annotations and phylogenetic relationships
Initially, BLASTn analysis was performed on the partial CP sequences to infer the highest nucleotide (nt) identities (https://blast.ncbi.nlm.nih.gov/Blast). The complete begomoviral genomic components obtained from the sequencing were assembled and the complete genomes were analysed further using DNASTAR software (Madison Wisconsin Inc., USA). After primary BLASTn analysis in NCBI, the most identical sequences for each component were separately retrieved for nucleotide (nt) sequence comparison. All the sequences were aligned in MEGA11 software using the Muscle algorithm [29]. The alignments were further used to calculate the pairwise nucleotide sequence identities through MUSCLE in Sequence Demarcation Tool (SDTv1.2) [30]. To determine the evolutionary relatedness of all the genomic components, genetic distances were estimated and phylogenetic trees were inferred in MEGA-11 software using bootstrap method (1000 replicates). Additionally, MEGA11 was employed to infer the best-fit nucleotide  substitution model, pairwise genetic differences at non-synonymous (dN), synonymous (dS) positions and Tajima's neutrality test. The ORF finder tool was used to predict potential ORFs in the ToLCPalV and CLCuMuB genomes (http://www.ncbi.nlm.nih.gov/gorf/ gorf.html).

Estimation of nucleotide diversity, divergence and substitution rate
The nucleotide diversity (π) of the retrieved sequences along with our inferred sequences was determined in DnaSP v.6.12 software [31] as has been described earlier [32]. In all datasets, the statistically significant differences in the mean nucleotide diversity were estimated by computing 95% bootstrap confidence intervals. Various parameters of nucleotide diversity viz. total number of polymorphic sites (S), insertions and deletions (InDels), haplotype diversity (Hd), average number of nucleotide differences between sequences (k), total number of mutations (θ-Eta) and Watterson's estimate of the population mutation rate based on the total number of segregating sites (θ W ).

Recombination analysis and detection of positive and negative selection
To detect potential recombination events in the genomes of ToLCPalV and CLCuMuB, two distinct approaches were used: an online tool called the genetic algorithm for recombination detection (GARD; http://datamonkey.org/) and the recombination detection program (RDP v.5.0) [33]. The criteria demonstrated by [32] was followed to accept any potential recombination event. Potential coding sites evolving either under a negative or positive pressure in the AV2, CP, Rep, TrAP, REn, AC4, NSP, MP of ToLCPalV and βC1 of CLCuMuB were estimated using single-likelihood ancestor counting (SLAC) and Fast, Unconstrained Bayesian AppRoximation (FUBAR) (www.datamonkey.org).

Estimation of nucleotide substitution rate
We estimated the rate of nucleotide substitution/site/ year and mutations in the ORFs encoded by ToLCPalV and CLCuMuB by employing the Bayesian Markov Chain Carlo method (MCMC) in BEAST v.1.8.4 with 10 7 chain length [34].
All datasets were individually analysed using both relaxed and strict molecular clocks (uncorrelated lognormal), respectively. Output of BEAST was visualized in Tracer v 1.7.2 [35] to identify the best-fit clock and their respective effective sample sizes (≥200).

Assessment of sequences and phylogenetic relationships
The three symptomatic S. nigrum plants (OD1-OD3) were successfully shown to contain the predicted begomovirus genome, whereas the non-symptomatic ones remained negative for the presence of any begomovirus genome (Figure 1). The BLASTn analysis of the PCR amplicons generated with the degenerate primer pair (AC1048/AV494) showed that the CP sequences from three plants OD1-OD3 were highly identical to ToLCPalV sequences. To further confirm the identity of the isolated begomovirus species, abutting primer pairs were used for amplifying the complete genomic components DNA-A and DNA-B of ToLCPalV (Table 1). To determine the complete ToLCPalV genome sequence, two full-length clones (OD3-1 and OD3-2) from the plant OD3 were sequenced in their entirety. The full-length sequence of the clone OD3-1 was 2756 nt in length (accession # OM022935), sharing highest nucleotide sequence identity at 97.6% with the ToLCPalV isolates (LT556072, AM884015, HG934859 and Ky564205) identified from Pakistan and India. In the phylogenetic dendrogram, OD3-1 grouped well with other ToLCPalV isolates (bootstrap value 100) (Figure 2A). Meanwhile, the clone OD3-2 was 2737 nt in length and was deposited in the GenBank to obtain an accession number (OM022936). It shared the highest nucleotide sequence identity at 95.5% with KC456162 infecting tomato crop in India [36]. In the phylogenetic dendrogram it grouped well (bootstrap value 99%) with other ToLCPalV isolates reported from Pakistan and India ( Figure 2B). Thus, the isolate OD3-1 is a new isolate of ToLCPalV species identified from a S. nigrum plant in Pakistan. One complete betasatellite clone OD3-3 (1373 nt) was also completely sequenced from OD3 plant and deposited under the accession number OM022937 in the GenBank database. The complete betasatellite isolate shared highest nucleotide sequence identity at 99.8% with CLCuMuB (LN831964) identified from cotton in Pakistan. Moreover, in the phylogenetic dendrogram, the isolate OD3-2 and the CLCuMuB isolate (LN831964) shared the same clade with other CLCuMuB isolates reported from Pakistan and India ( Figure 2C). Notably, alphasatellite could not be amplified from the plant samples.  (Table 2).

Estimation of genetic variations in ToLCPalV and CLCuMuB genomes
Moreover, the average number of segregating sites (Ѳw) was significantly greater in NSP (0.101) and ToLCPalV DNA-B (0.099), compared to ToLCPalV DNA-A and CLCuMuB, revealing the occurrence of more polymorphic sites in these isolates ( Figure 3B). On the other hand, the Rep and AC4 had a significantly higher segregation rate of 0.082 and 0.079, respectively. Whereas βC1 and TrAP, as well as CP and MP, had nearly identical levels of segregating sites ( Figure 3B). The AC4 (-2.28), TrAP (-2.14) and REn (-2.11), all had extremely negative Tajima's D values, followed by βC1 (-2.02) and Rep (-2.11) ( Figure 3C). Likewise, highly negative Fu and Li's D values were inferred for AC4 (-5.52), βC1 (-5.12) and Rep (-4.48), trailed by TrAP and REn ( Figure 3D). For AV2, both D test values were found to be in positive numbers, indicating presence of least polymorphic sites ( Figure 3C and D).
The average nucleotide diversity (π) for ToLCPalV and CLCuMuB revealed that the highest average  pairwise nucleotide diversities were found in ToLCPalV DNA-B (0.103), MP (0.103) and NSP (0.082) ( Figure 3A). For ToLCPalV DNA-A, the virion-sense ORFs AV2 and CP had a higher nucleotide diversity of 0.063 and 0.068, respectively, than the complementary sense ORFs. Captivatingly, the genetic diversity of CLCuMuB (0.049) was almost double than its encoded ORF, βC1, (0.025). Overall, less genetic diversity was observed in CLCuMuB than in ToLCPalV ( Figure 3A).

Estimation of negative selection pressure
The possible role of negative selection pressure on each component was inferred by comparing the dN/ dS ratio for genes encoded by ToLCPalV DNA-A, ToLCPalV DNA-B and CLCuMuB, respectively ( Table 3). The average dN/dS ratio for all genes, except REn and AC4, remained less than 1, implying that most of the observed genomic variation is primarily due to negative selection ( Table 3). The lowest values were observed for MP (0.082) and CP (0.1), while the highest observed value was 1.00 for REn and AC4 (Table 3). Likewise, MP (117/51) and NSP (107/56) had the highest number of sequences with negative selective sites detected by single-likelihood ancestor counting (SLAC) and Fast, Unconstrained Bayesian AppRoximation (FUBAR), followed by CP (62/101). On the other hand, Rep displayed a pattern of 47/86 negatively selected positions. FUBAR also detected a few positive selection sites in AV2 (5), βC1 (4), REn [37] and Rep [25] (Table 3).

Nucleotide substitution rate estimation
The rate of nucleotide substitution in ToLCPalV DNA-A, ToLCPalV DNA-B, CLCuMuB and their encoded ORFs was determined using the different parameters listed in Tables 4-6. The mean nucleotide substitution rate for ToLCPalV DNA-B and its encoded ORFs (MP and NSP) was significantly higher (7.54 × 10 4 ) than those for ToLCPalV DNA-A and CLCuMuB, indicating swift evolution. To further validate the role of mutation in the selection process, we determined the mutation rate at each of the three codon positions (CoP). Our data demonstrated that ToLCPalV DNA-A and CLCuMuB both had a higher mutation rate at CoP 1, while ToLCPalV DNA-B had a higher mutation rate at CoP 2 ( Table 6). Except for AC4, all the ORFs encoded by ToLCPalV DNA-A exhibited a higher mutation rate at CoP 3. MP and NSP of ToLCPalV DNA-B had a higher mutation rate at codon 3 and 2, respectively ( Table 6).  abbreviations used are highest posterior density (hpD), codon position (cp).

Discussion
Phyto-pathogenic viruses directly or indirectly affect agricultural productivity worldwide. Similar to the other Southeast Asian countries, a dominant majority of almost all the taxonomic groups of plant viruses prevails in Pakistan. This can be one of the major reasons for low agricultural productivity in the country.
Begomoviruses are of prime importance in the agro-ecological zones in Pakistan, where these viruses infect important crops, weeds and ornamental plants [38]. S. nigrum is a herbaceous weed widely spread in temperate and tropical climates in Asia, the Americas and Europe. It can be often seen to grow around farmer's fields, wastelands and countryside. Besides, it has vital commercial and pharmaceutical applications in Southeast Asian countries [11,12]. However, it is also known to support the spread of a wide range of crop-infecting begomoviruses and serves as a recombination vessel to produce viral recombinants [11,16]. ToLCPalV is a bipartite begomovirus (genus Begomovirus, family Geminiviridae) found infecting bitter gourd, chili pepper, cucumber, muskmelon, pumpkin, Rumex spp., tomato and watermelon in Pakistan and India [26,37,[39][40][41]. Moreover, it has also been found to infect common beans, melon, cucumber, watermelon and pumpkin in Iran [42,43]. ToLCPalV has also shown association with pepper leaf curl betasatellite in India in the field-grown pumpkin plants [40]. Moreover, it has been shown to synergistically co-infect muskmelon with zucchini yellow mosaic virus, a potyvirus found in Pakistan [26]. Although ToLCNDV is also a widely spread bipartite begomovirus in Pakistan, ToLCPalV is a relatively new introduction in the territory of monopartite begomoviruses [26].
In 2018, S. nigrum plants showing curling on leaf margins, leaf chlorosis, growth stunting and whole leaf crumpling were observed. The complete nucleotide sequence analysis found that the clones OD3-1 and OD3-2 shared 97.6 and 95.5% sequence identities with the DNA-A and DNA-B of ToLCPalV isolates and thus represent new members of ToLCPalV species. Besides, ToLCPalV isolates were also found to be associated with a betasatellite isolate of CLCuMuB, which shared maximum (99.3%) nucleotide sequence identity with CLCuMuB. Despite the fact that Pakistan is a major hotspot for co-circulation of DNA-satellites (41),] could not identify any association between ToLCPalV and the DNA-satellites. Thus, they speculate that ToLCPalV was unable to capture any DNA-satellite in this region. However, our study contradicts this speculation. Previously, among bipartite begomoviruses ToLCNDV was a dominant member in this region, but it seems that ToLCPalV has been widening its host range by satellite capturing and infecting the non-cultivated host species quite often. In previous studies, CLCuMuB has been shown to assist the begomovirus in pathogenicity and suppression of post-transcriptional gene silencing in host plants [44,45]. CLCuMuB isolates have been previously identified from many plant families including Malvaceae, Cucurbitaceae and Solanaceae in Pakistan and India [46,47], and their compatible interactions with geographically distinct begomoviruses have been shown [48]. The consequences of the adaptation of CLCuMuB to another bipartite begomovirus can be threatening to the cultivation of agro-economically important crops in the Indian subcontinent.
DNA viruses adapt and evolve rapidly, primarily due to small genome-associated changes (mutation and nucleotide substitution) and, more frequently, because of genomic recombinations [49][50][51]. Begomoviruses are highly destructive plant pathogens, which evolve rapidly under diverse cropping systems and alternate hosts, especially non-host reservoirs like weeds [52]. As with other phytoviruses, emergence of novel begomoviruses is attributed to recombination, recurrent mutations and satellite captures [52][53][54]. In the present study, we used ToLCPalV and CLCuMuB to determine the extent of existing genomic variations in their genomes. Remarkably, our results demonstrated that ToLCPalV mainly evolves under strong negative pressure, and no events of recombination were detected (data not shown). Surprisingly, the ToLCPalV DNA-B component showed more genetic diversity when compared to ToLCPalV DNA-A and CLCuMuB. In addition, across all the analysed datasets, the average number of segregating sites (Ѳw) and negative Tajima's D values indicated that these populations are diversifying to varying degrees under purifying selection. The observed negative Tajima's D value accounted for the high prevalence of low-frequency alleles [55]. It is critical to note that we included complete genomic sequences from a variety of plant species demonstrating that the existing genomic diversity exists through negative selection. Interestingly, while it is widely believed that DNA-A components play a significant role in begomovirus evolution, our data indicate that DNA-B can also play a critical role and may be assisting the virus in evolving rapidly under strong negative pressure.

Conclusions
The present study identified and analysed a bipartite begomovirus along with a cotton-infecting betasatellite, CLCuMuB, infecting S. nigrum in Pakistan. Nucleotide diversity and sequence variation dynamics analyses revealed that ToLCPalV populations have a high mutation rate and evolve independently under strong purifying selection, but not via recombination. The association of swiftly evolving ToLCPalV with a widely distributed CLCuMuB may pose a threat to the agro-economically important crops in Pakistan.

Data availability statement
The data provided in this manuscript can be accessed as accession numbers OM022935, OM022936, and OM022937 in the NCBI GenBank database (https://www.ncbi.nlm. nih.gov/).

Disclosure statement
No potential conflict of interest was reported by the authors.