Two different Xylella fastidiosa strains circulating in Italy: phylogenetic and evolutionary analyses

ABSTRACT Xylella fastidiosa, a bacterial species infecting a broad range plants, includes five subspecies, fastidiosa, multiplex, pauca, mulberry and sandyi. In Europe, Xylella was isolated in olive trees in southern Italy (Apulia region) during the year 2013. The aim of the present study was to apply phylogenetic and evolutionary analysis to trace the possible origin and way of the entrance of Xylella fastidiosa in Italy. All the genomes available for Xylella fastidiosa spp were downloaded from NCBI. A phylogeographic analysis was performed using BEAST. X. fastidiosa strains belonging to X. fastidiosa subsp. pauca and subsp. sandyi have been reported to infect olive trees and coffee plants, respectively. The phylogeographic analysis also revealed and confirmed these two different ways of provenience X. fastidiosa subsp. pauca from Costa Rica and X. fastidiosa subsp sandyi from California Phylogeny have been an important tool to validate and support the recent hypothesis for X. fastidiosa pauca provenience.

The transmission is by specific insect (leafhoppers) but the molecular mechanism involved in plant-bacteria interaction is not well defined (Brlansky et al. 1983). This bacterium is usually transmitted to the new host plants during insect vectors feeding and colonizes the xylem, a water transport network of vessels. Bacterial cells form a biofilm occluding the xylem vessels, blocking water transport and causing water stress symptoms (Almeida and Purcell 2003).
The Xylella fastidiosa CVC strain 9a5c was the first plant pathogenic bacterium whose genome, consisting of a 2,679,305 bp circular chromosome, was completely sequenced (Simpson et al. 2000). X. fastidiosa includes five taxonomically recognized subspecies (Schaad et al. 2004;Schuenzel et al. 2005;Randall et al. 2009;Bull et al. 2012;Bull et al. 2014;Nunney et al. 2014) isolated in different geographic areas: subsp. fastidiosa in Central America, subsp. multiplex in Northern America, subsp. pauca in Southern and subsp. sandyi in the southern regions of USA (Schuenzel et al. 2005;Nunney et al. 2010). In Europe, Xylella was isolated in olive trees in southern Italy (Apulia region) during the year 2013. This evidence shows the ability of spreading at long distance and the consequent susceptibility of the Mediterranean areas to the infection (Saponari et al. 2014;Giampetruzzi et al. 2015aGiampetruzzi et al. , 2015b.
Xylella fastidiosa causing disease in a broad variety of plants with agricultural relevance is responsible for worldwide economic losses. Comparative genome sequencing of bacteria is a powerful means of detecting sequence diversity among closely related, but distinct populations. The aim of the present study was to apply phylogenetic and evolutionary analysis to trace the possible origin and way of the entrance of Xylella fastidiosa in Italy.

Genome sequences dataset
All the genomes available for Xylella fastidiosa spp were downloaded from NCBI (https://www.ncbi.nlm.nih.gov/), and it was built a dataset of 28 sequences. This dataset included complete genomes from USA (California, Texas, Florida, Maryland and Virginia) n = 14, Argentina n = 1, France n = 3, Italy n = 2, Costa Rica n = 3, Brazil n = 5.

Genome alignment and SNPs extraction
Sequences were aligned with Progressive Mauve (Darling et al. 2010).
Recombination was evaluated using Gubbins (Genealogies Unbiased By recomBinations In Nucleotide Sequences) on the genome alignment. Gubbins is an algorithm that iteratively identifies loci containing elevated densities of base substitutions while concurrently constructing a phylogeny based on the putative point mutations outside of these regions.
Simulations demonstrate the algorithm generates highly accurate reconstructions under realistic models of shortterm bacterial evolution diversification of sequences through both point mutation and recombination and can be run in only a few hours on alignments of hundreds of bacterial genome sequences (Croucher et al. 2014).
The high-quality single-nucleotide polymorphisms (hqSNPs) were extracted from Gubbins as non-recombinant sites.

Bayesian phylogenetic analysis
The HKY + I+G nucleotide substitution model was chosen as a best-fitting model by using the hierarchical likelihood ratio test (Model test, implemented in PAUP*4) for all evolutionary analyses performed. Phylogenetic signal was assessed by likelihood mapping using TreePuzzle (Schmidt et al. 2003). A transitions/transversions vs. divergence graph as well as the Xia's test of substitution saturation was implemented in DAMBE (Xia and Xie, 2001). Maximum-likelihood (ML) phylogenetic analysis was performed using RAxML (Stamatakis 2014).
The core genome was extracted analyzing the genome using Prokka (Seemann 2014). This step was performed to evaluate the accordance with the non-recombinant hqSNPs.
A pairwise genetic distance between the two Italian strains was performed using MEGA7 (Tamura et al. 2013).
The presence of sufficient temporal signal (measurable evolution between sampling times), required for molecular clock calibration within this framework, was evaluated using linear regression analysis of the relationship of rootto-tip genetic distance (using trees inferred by maximum likelihood) and sampling time for individual partitioned sequences using TempEst v1.5 (available from http://tree. bio.ed.ac.uk/software/tempest/) (Rambaut et al. 2016).
The evolutionary rate of the Xylella fastidiosa spp SNPs alignment was estimated by calibrating a molecular clock using known sequences sampling times with the Bayesian Markov Chain Monte Carlo (MCMC) method implemented in BEAST v. 1.8.2 (http://beast.bio.ed.ac.uk) (Drummond et al. 2005, Drummond andRambaut, 2007). In order to investigate the demographic history, independent MCMC runs were carried out enforcing both a strict and relaxed clock with an uncorrelated log normal rate distribution and one of the following coalescent priors: constant population size, exponential growth, non-parametric smooth skyride plot Gaussian Markov Random Field (GMRF) and non-parametric Bayesian skyline plot (Drummond et al. 2002, Drummond et al. 2005Minin et al. 2008) with ascertainment bias correction. Marginal-likelihoods estimates for each demographic model were obtained using path sampling and stepping stone analyses (Baele and Lemey 2008;Baele et al. 2013aBaele et al. , 2013b. Uncertainty in the estimates was indicated by 95% highest posterior density (95% HPD) intervals, and the bestfitting model for each data set was by calculating the Bayes Factors (BF) (Kass and Raftery 1995;Baele et al. 2013a). In practice, any two models can be compared to evaluate the strength of evidence against the null hypothesis (H0), defined as the one with the lower marginal likelihood: lnBF <2 indicates no evidence against H0; 2-6, weak evidence; 6-10: strong evidence, and >10 very strong evidence. Chains were conducted for at least 100 × 106 generations and sampled every 10,000 steps for each molecular clock model. Convergence of the MCMC was assessed by calculating the effective sample size (ESS) for each parameter. Only parameter estimates with ESS's of >250 were accepted. The maximum clade credibility (MCC) tree was obtained from the trees posterior distributions, after a 10% burn-in, with the Tree-Annotator software v 1.8.2, included in the Beast package) (Drummond et al. 2005;Drummond and Rambaut 2007). Statistical support for specific monophyletic clades was assessed by calculating the posterior probability (p > 0.90).
The continuous-time Markov Chain process over discrete sampling locations implemented in BEAST) (Drummond and Rambaut, 2007) was used for the phylogeography inference, by using the Bayesian Stochastic Search Variable Selection (BSSVS) model, which allows the diffusion rates to be zero with a positive prior probability. Locations considered were the different country locations. Comparison of the posterior and prior probabilities of the individual rates being zero provided a formal BF for testing the significance of the linkage between locations. The MCC tree with the phylogeographic reconstruction was selected from the posterior tree distribution after a 10% burn-in using the Tree Annotator.

Results
The recombination analysis performed with Gubbins on the complete genome alignment showed evidence of recombination events. Gubbins built a new genome alignment excluding the recombinant position. This SNPs alignment was of 115,893 SNPs.
The phylogenetic signal analysis using a transition/transversion vs. divergence graph and the Xia's test (p < 0.00) did not show evidence for substitution saturation. This indicated that enough signal for phylogenetic inference existed (Supplementary Figure S1 and Supplementary Table S1). Likelihood mapping analysis reported 0.0% of star-like signal (phylogenetic noise) indicating overall that the data set contained an excellent phylogenetic signal for reliable inference (Supplementary Figure S2).
The maximum likelihood phylogenetic tree showed five main clades (I, II, III, IV and V) all statistically supported with bootstrap values ≥ 99%, as in Figure 1. The five subspecies of Xylella fastidiosa (Xylella fastidiosa pauca, Xylella fastidiosa multiplex, Xylella fastidiosa mulberry, Xylella fastidiosa sandyi and Xylella fastidiosa fastidiosa) are included separately in clade I, II, III, IV and V, respectively. The two unique Italian isolates of Xylella fastidiosa pauca and Xylella fastidiosa sandyi are found within clade I and IV (marked in bold in Figure 1). In clade I, the Italian isolate clustered with sequences from Costa Rica, whereas the French isolate represents the out-group for this cluster. The other Italian sequence cluster with a sequence from California (clade IV). Clade II included only sequences from the USA. In clade III, there are only two sequences from Maryland. In clade V, there are only sequences from the USA but one from France.
The same topology and classification were found in the ML tree using the core genome alignment (data not shown), suggesting that the possible recombination sites were not in the core genome alignment.
The SNPs difference between the two Italian isolates is 59,413 SNPs.
The clock-like signal detected with TempEst showed a more diffuse regression plot (R = 0.2), but appears to be suitable for phylogenetic molecular clock analysis in BEAST or other programs.
The non-parametric smooth skyride plot GMRF with a relaxed molecular clock was selected as the most appropriate demographic model, to describe the probable evolutionary history of Xylella fastidiosa spp. Molecular clock calibration estimated the evolutionary rate of the Xylella fastidiosa spp SNPs alignment at 5.36 × 10 −4 substitutions site per year (95% HPD 4.29 × 10 −4 -6.56 × 10 −4 ). Figure 2 showed the phylogeographic analysis performed on the X. fastidiosa spp SNPs alignment. The same topology of the ML has been highlighted. The X. fastidiosa sandyi Italian strain isolated from Coffea Arabica in 2015, probably originated from California (posterior probability = 1), as shown in the zoom on the right of Figure 2. The X. fastidiosa pauca Italian strain isolated from olive trees in 2014, probably originated from Costa Rica (posterior probability= 1).

Discussion
The introduction of X. fastidiosa spp, since 2013, in Italy caused an agricultural disaster with an economic careening in terms of oil production in south Italy.
1. X. fastidiosa strains belonging to X. fastidiosa subsp. pauca and subsp. sandyi have been reported to infect olive trees and coffee plants, respectively (Paradela-Filho et al. 1997;Montero-Astúa et al. 2008). In spite of no evidence of X. sandyi infection on Italian species of agronomic interest having been reported, the importation of coffee plants from the Americas could represent a potential reservoir to spread the bacteria in countries where it did not exist before, causing epidemic and genetic diversity producing recombinant forms. As confirmed by phylogeographic analysis, the Xylella fastidiosa pauca and sandyi subspecies infecting the Italian country probably come from Costa Rica and California, respectively, using ornamental coffee plants as traveling vector. These two genetically different subspecies were intercepted and isolated in Northern Italy (sandyi) and directly in an olive plant in Southern Italy.
By the epidemiological point of view, the management of X. fastidiosa spp can be difficult sometimes because plants show disease symptoms only after months or years, whereas epidemics can spread fast causing devastating effects. In addition, epidemics can spread easily also because a great range of insects can act as vectors so as different plant species (Perilla-Henao and Casteel 2016). Molecular epidemiological studies leaded to important consideration about the introduction of X. fastidiosa subsp. pauca and subsp. sandyi in Italy. Phylogenetic analysis revealed two different epidemic introductions of these two bacteria strains probably in different calendar time. These two subspecies were also found different in terms of genetic distance having 59,413 SNPs of difference. This difference in genetic distance leads the two genomes to be in different clusters in the tree as expected, underlining the hypothesis of two different introductions probably for two different way using different or similar traveling vectors. The phylogeographic analysis also revealed and confirmed (Giampetruzzi et al. 2017) these two different ways of provenience X. fastidiosa subsp. Pauca from Costa Rica (posterior probability 100%) and X. fastidiosa subsp sandyi from California (posterior probability 100%). Phylogeny has been an important tool to validate and support the recent hypothesis for X. fastidiosa pauca provenience (Giampetruzzi et al. 2017) as well as the geographic origin of X. fastidiosa sandyi. These studies are useful to preserve the ecology and the environmental heritage representing an important cultural and economic resource.

Disclosure statement
No potential conflict of interest was reported by the author(s).