The prevalence of CRF55_01B among HIV-1 strain and its connection with traffic development in China

ABSTRACT CRF55_01B is a relatively “young” HIV strain. At present, we do not know much about its transmission characteristics in China. So, to describe the transmission characteristics of CRF55_01B strain among provinces and HIV infected people, and to analyze the reasons for its rapid epidemic in China, a total of 1237 subjects infected with CRF55_01B from 31 provinces spanning a period of 12 years from 2007 to 2018 were enrolled in this study. By constructing a molecular network and Bayesian correlation analysis, we found that CRF55_01B increased exponentially from 2005 to 2009 after its origin in Shenzhen, and increased rapidly after 2010. CRF55_01B began to spread to other provinces in 2007. After 2010, the strain showed a trend of rapid spread and epidemic from Guangdong-Shenzhen to other provinces in China. Guangdong, Shenzhen, Hunan, Beijing, Guangxi, Hubei, Jiangxi, Guizhou, Hebei, Anhui, Shanghai, Shandong, Henan, and Yunnan were the key provinces of CRF55_01B transmission. CRF55_01B, although originating from men who sex with men (MSM), was transmitted among heterosexuals in 2010. Males in heterosexuals played a crucial role in the transmission and diffusion of this strain. We also revealed that CRF55_01B might spread rapidly along with the rapid development of the Beijing-Guangzhou and Beijing-Kowloon railways. This study suggests that if we detect the spread of MSMs in time through molecular monitoring in the early stage of the epidemic, it can help us control the epidemic early and prevent its spread, which is of great significance to China's national prevention and control of HIV-1.


Introduction
In the 35 years of the epidemic in China, HIV-1 has formed many Circulating Recombinant Forms (CRFs) strains [1][2][3]. CRF55_01B of HIV-1 strain has attracted much attention in recent years as it is the first CRF01_AE and B subtype recombinant strain in China. It is also the first CRF identified in China in this century since CRF07_BC and CRF08_BC were discovered last century. CRF55_01B was first reported in Changsha, Hunan province, and Dongguan, Guangdong Province in 2013 [4] and has been detected in many cities in China since it was identified [5].
CRF55_01B was first discovered in men who sex with men (MSM). Previous studies have shown that CRF55_01B originated around 2000-2003 and mainly spread in MSMs, and then rapidly spread throughout the country [5][6][7]. CRF55_01B stood out among CRF01_AE and B subtype recombinant strains (CRF59_01B [8], CRF67_01B, CRF68_01B [9]) discovered in the same year, as well as many new CRFs (CRF79_0107 [10], CRF80_0107 [11], etc.) found in MSMs and became the fifth most prevalent strain of HIV-1 in China [12]. Therefore, this study will focus on the analysis of how CRF55_01B rapidly spread to the whole country in a short time and the characteristics of transmission among provinces and infected people.
the HIV sequence database of the Los Alamos National Laboratories (LANL) and the National Center for AIDS/STD Control and Prevention of China (NCAIDS). RaxmlGUI v2.0.0 [13] was used to construct the RaxML phylogenetic tree for subtype identification and eliminate duplicates and sequences without province and sampling year information. The demographic data missing was mainly in the sequences of Guangdong. Since there were too many unknown demographic data sequences in the data set, two data sets were established. Data set A contained all sequences for the molecular network analysis. Data set B was another database built according to the priority order of region, province, sampling year, risk, sex, and age for Bayesian analysis (Table S1).

Phylogenetic analysis and HIV-1 molecular network construction
A molecular network was constructed using HIV-TRACE (Transmission Cluster Engine) [14]. We aligned HIV pol sequences to an HXB2 reference sequence and calculated pairwise genetic distances under the Tamura-Nei 93 model [15]. The ambiguous nucleotides of all sequences were less than 5%. Each individual in the molecular network was represented by a node, and we linked nodes to each other if their pairwise genetic distance was up to 0.5% substitutions per site based on the recommended genetic distance threshold by the Centers for Disease Control and Prevention (CDC) in the United States (US) [16].

Time-scaled phylogenetic tree reconstruction using BEAST
To reconstruct the temporal and spatial dynamics of CRF55_01B strain in various provinces, we performed a Bayesian discrete phylogeographic approach to estimate the rate of evolution and the time to the most recent common ancestor (tMRCA) using Markov chain Monte Carlo (MCMC) runs of 200 million generations with BEAST v1.8.4 [17] under a Bayesian Skygrid demographic model [18]. The final data set was analyzed using a general time-reversible (GTR) nucleotide substitution model [19] specifying a gamma distribution as a prior on each relative substitution rate and a relaxed uncorrelated lognormal (UCLN) molecular clock model to infer the timescale of HIV-1 evolution with a gamma distribution prior on the mean clock rate (shape = 0.001, scale = 1000) [20,21]. The Bayesian MCMC output was analyzed using Tracer v1.6 [22]. The Effective Sample Size (ESS) values for estimates were more than 200. Using LogCombiner (in BEAST package), we subsampled the posterior distribution of phylogenetic trees to generate an empirical distribution of 2000 trees representative of the posterior sample. The first 10%-30% of the states from each run were discarded as burn-in. Trees were summarized as maximum clade credibility (MCC) trees using TreeAnnotator (in BEAST package) and then visualized in FigTree v1. 4.4 (http://tree.bio.ed.ac.uk/software/figtree). SpreaD3 v0.9.7.1 [23] was used to draw the CRF55_01B propagation roadmap.

Discrete phylogeographic analyses and spatial structure
To provide an adequate description of the process of viral dissemination, we use a Bayesian stochastic search variable selection (BSSVS) [24] procedure. We expected to analyze the relationship between transmission risk groups (Risk), risk and sex (Risk-Sex), and risk and age (Risk-Age). We also used a robust counting (Markov jumps) [25] approach to count the expected number of virus lineage movements. Statistical support was measured using Bayes factors (BF) [24] and summarized using SpreaD3 [23].

Discrete trait and a Bayesian Tip-association Significance Testing (BaTS)
A Bayesian Tip-association Significance Testing (BaTS) provided a method by which the degree to which traits seen in a phylogeny are associated with ancestry are correlated [26]. To evaluate phylogenetic correlations between provinces, we estimated the phylogenetically based Association Index (AI) statistic, Parsimony Score (PS) statistic, and Monophyletic Clade (MC) statistics for each discrete-trait using BaTS v0.9 beta.
The AI and PS statistics tested the association between provinces and tree topology, considering the level of uncertainty in the phylogenetic reconstruction. The MC index tested which traits (provinces) were associated with phylogeny. The observed mean and its associated 95% confidence intervals (Upper and Lower CI) were obtained by analyzing trees sampled during the Bayesian phylogenetic reconstruction. The null mean and associated confidence intervals were obtained after randomly distributing the phylogeny traits (100 replicas). The significance level was the P-value for the statistical hypothesis test for equality between the index observed and expected under no association [27].

Statistical analysis
Cochran-Armitage was used to analyze the variation trend of transmission risk groups over time. According to whether they belong to a molecular network, a comparison of demographic characteristics was based on chi-square tests. Fisher test was used when the number of cells was less than 5. Wilcoxon ranksum test was used for Beijing-Guangzhou and Beijing-Kowloon railways analysis. P < 0.05 was statistically significant. All statistical tests were performed using R v4.0.2.

Demographic characteristics of CRF55_01B
A total of 1237 (734 from LANL database and 503 from NCAIDS database) sequences sampling time spaned from 2007 to 2018 were obtained for the subsequent analysis. The data set covers 31 provinces mainly distributed in Guangdong, especially in Shenzhen (Shenzhen, belongs to Guangdong) followed by the neighboring Guangxi in south China, Beijing in North China, and Hunan in central China. To increase the analysis's resolution, as the origin of CRF55_01B, and due to a large number of sequences, Shenzhen was listed separately in all the following analyses. MSMs are the main transmission route of the research subjects in (Table 1). Although CRF55_01B was first found in MSMs, it gradually spreads to heterosexuals. By using the chi-square trend test, it was found that heterosexuals showed an increasing trend in three periods: 2007-2012, 2013-2015, and 2016-2018 (Table 2).

Molecular network analysis of CRF55_01B
Under the threshold of 0.5% genetic distance, 60.5% (748/1237) sequences (nodes) from a total of 98 clusters were enrolled in the molecular network. The largest cluster consists of 46.3% (346/748) nodes, including 20 provinces. The molecular network diagram of CRF55_01B is shown in Figure S1. Whether infected people belong to the molecular network is related to risk, age, sampling year, region, and province (Table S2).

Phylogenetic and geographic analysis of CRF55_01B
To statistically quantify the degree of diversification achieved by CRF55_01B, a BaTS was carried out. Phylogeny showed evidence of geographic association of CRF55_01B assessed by AI and PS (P = 0.00) in 31 provinces. Besides, the analysis of the MC index showed that the association was statistically supported (P < 0.05) for Guangdong, Shenzhen, Jiangsu, Guangxi, Shandong, Hunan, Yunnan, Beijing, Anhui, Shaanxi, Hubei, Shanghai, Guizhou, Xinjiang, Zhejiang, Tibet, Jiangxi, Hebei, Chongqing, Fujian. This result suggests at least some historical circulation of CRF55_01B in these provinces (Table S9).

Correlation between the dissemination of CRF55_01B and the Beijing-Guangzhou/ Beijing-Kowloon railways
The propagation roadmap of CRF55_01B (Figure 2) showed that the Guangdong-Beijing line was particularly significant. Therefore, we think of the Beijing-Guangzhou and Beijing-Kowloon railways. To verify the hypothesis that CRF55_01B spread along with the development of the Beijing-Guangzhou and Beijing-Kowloon railways, we added the provinces' comparative analysis on the Beijing-Guangzhou and Beijing-Kowloon railways belonging to the molecular cluster and those not belonging to the molecular cluster. CRF55_01B was found to be more easily accessible to the molecular network in the provinces on the Beijing-Guangzhou and Beijing-Kowloon railways (P < 0.0001). The mean counts of CRF55_01B import provinces were compared between located on the Beijing-Guangzhou and Beijing-Kowloon railways and located on others. The results demonstrated the provinces with more CRF55_01B inputs were mostly located on the Beijing-Guangzhou and Beijing-Kowloon railways (P < 0.0001) ( Figure 5).

Discussion
Using more representative national sequences (1237 sequences, 2007-2018), our study systematically analyzed the origin and spread of CRF55_01B and reconstructed its epidemic history. The tMRCA of CRF55_01B was around 2003.0 (95% HPD interval: 2001. 2-2004.6). The estimate was similar to previous studies [5][6][7]28], but the 95% HPD interval was narrower, possibly caused by our more complete data set.
This study proved that Guangdong-Shenzhen in South China was the source of infection for the rapid spread of CRF55_01B to other provinces in China. CRF55_01B first spread to other cities in Guangdong after its origin in Shenzhen, gradually spread from Guangdong to other provinces around 2007, and then spread to the whole country after 2010.
After reconstructing the epidemic history of CRF55_01B, we found that the spread and diffusion of CRF55_01B closely related to traffic development. In 2007, the sixth-speed increase of China railway (including the Beijing-Guangzhou and Beijing-Kowloon railways) marked China's entry into the ranks of the world's advanced railways. Guangdong-Shenzhen is a major economic province in China, with GDP ranking among the top in the country and rapid development. After the acceleration of the Beijing-Guangzhou railway in 2007, passengers of the Beijing-Guangzhou railway increased significantly. The Beijing-Guangzhou railway is the busiest railway line in China and the most important north-south railway transportation artery in China, running through Beijing, Hebei, Henan, Hubei, Hunan, and Guangdong provinces. It is the railway with the most connections through provincial capitals and other railways in China. Since 2010, the completion of high-speed rail networks in the Yangtze River Delta, Pearl River Delta, and other regions, as well as the Beijing-Guangzhou high-speed railway, the world's longest in operation, has brought Guangdong closer to other provinces. The Beijing-Kowloon railway, which connects not only the north and south but also the central and eastern regions, has become a busy railway in China. It passes through Beijing, Hebei, Shangdong, Anhui, Henan, Hubei, Jiangxi, Guangdong, and Shenzhen. The southern end of the Beijing-Guangzhou Railway and the Beijing-Kowloon Railway is Shenzhen. The provinces where CRF55_01B mainly spread happened to be the provinces through which the Beijing-Guangzhou and Beijing-Kowloon railways passed. Therefore, CRF55_01B may spread rapidly along with the rapid development of the Beijing-Guangzhou and Beijing-Kowloon railways.
CRF55_01B, as a strain of MSM origin, is more likely to spread between large cities and across provinces like other MSM strains [29,30]. At the same time, we have also noticed that CRF55_01B is gradually spreading in railway lines related to the Beijing-Guangzhou and Beijing-Kowloon railways, such as Guangxi, Yunnan, Guizhou, and other places in the west of the railway, and Fujian, Jiangsu, Zhejiang, Shanghai, and other places in the east of the railway.
After CRF55_01B spread to other provinces, the occurrence of CRF55_01B in each province also varies. This study found that CRF55_01B had some historical circulation in 19 provinces (Table S9). Compared with other studies [7,12], this study obtained more information about the spread of CRF55_01B in other provinces.
Although CRF55_01 was first detected in MSMs, it quickly spread to heterosexuals. We found that heterosexuals showed an increasing trend in three periods (P < 0.0001), Table 2. It is worth noting that this is probably due to the greatly reduced "Unknown" population during 2016-2018. In this study, both the molecular network analysis and Bayesian correlation analysis showed a very close relationship between MSMs and heterosexuals. CRF55_01B has an obvious trend of spread from MSMs to heterosexuals, mainly from young MSMs to young heterosexuals males ( Figure 4). The transmission route of HIV-1 in China changed significantly from blood transfusion transmission to sexual transmission between 2007 and 2009 [31,32]. During this period, CRF55_01B also experienced a transmission transition from MSMs to heterosexuals. Therefore, CRF55_01B is still mainly concentrated in MSMs at present, but the number of heterosexually transmitted infections is increasing. This is consistent with the MSMs transmission mode in China, which is transmitted among young people, and males in heterosexuals play a crucial role in the transmission. This mode of transmission may be related to the fact that there are many non-disclosed MSMs in China. Because of social factors, Chinese MSMs are partially self-reported as heterosexuals and bisexuals, and they may have both homosexual and heterosexual behaviours at the same time. Other studies have also proved the existence of such a transmission relationship among Chinese MSMs [33,34]. CRF55_01B is a relatively "young" HIV strain, but it is not spreading at all slowly. From 2013 to 2018, CRF55_01B has become the fifth largest strain China's the HIV-1 composition ratio within 5 years. It has been found in all China provinces and formed transmission clusters in more than half of the provinces. The transmission of CRF55_01B may be related to the development of transportation and technology. CRF55_01B was not found in the Los Alamos National Laboratories (LANL) HIV sequence database in any other country except China. It is also an HIV strain circulating in China. The CRF55_01B is transmitted from MSMs to heterosexuals, so it is necessary to be aware of the reverse transmission of heterosexuals into MSMs. Besides, other studies have shown that CRF55_01B may have a higher transmission risk than CRF01_AE and CRF07_BC [35]. Therefore, the prevention and control of CRF55_01B need to be concerned. As a representative of newly discovered strains of HIV-1 in China, especially MSM strain, its transmission characteristics just reflect the epidemic status of MSM strains in China. The analysis of CRF55_01B suggests that if we detect the spread of MSMs in time through molecular monitoring in the early stage of the epidemic, it will help us control the epidemic early and prevent its spread, which is of great significance to China's national prevention and control of HIV-1. Additionally, this strain originated in developed cities and had convenient transportation, which closely related to the development of China's railways. It is suggested that the spread of HIV-1 closely related to socio-economic development and more attention was needed.