Identification of two novel HIV-1 circulating recombinant forms of CRF111_01C and CRF116_0108 in southwestern Yunnan, China

ABSTRACT Yunnan, the region hardest hit by HIV/AIDS in China, is also an area with the most abundant HIV-1 genetic diversity. A large number of novel HIV-1 circulating recombinant forms (CRFs) and unique recombinants were identified among injection drug users in Yunnan; however, few were found among sexual contacts. Here, we obtained 15 near full-length genome sequences (NFLGs) from HIV-1 seropositive heterosexual contacts in Yunnan who received antiretroviral therapy during the period from 2014 to 2016. Phylogenetic analysis showed that six NFLGs belonged to CRF01_AE (n = 3) and CRF106_cpx (n = 3), and the other nine sequences were novel inter-subtype recombinants. Of the recombinants, two novel CRFs (CRF111_01 C (n = 4) and CRF116_0108 (n = 4)) and one CRF106_cpx variant (n = 1) were identified. CRF111_01 C had a CRF01_AE backbone with seven subtype C fragments inserted into the gag, pol, vif, env, nef and 3ʹLTR regions. CRF116_0108 had a CRF08_BC backbone with a CRF01_AE fragment inserted into the pol, tat, rev, vif, vpr, vpu and env regions. Phylogeographic analyses estimated that CRF111_01 C and CRF116_0108 originated approximately 1995.7–1998.6 and 1991.7–1993.7, respectively. These identifications of two novel HIV-1 CRFs highlighted the importance of continuous surveillance in heterosexual contacts and other high-risk groups in this region and the surrounding regions.


Introduction
The human immunodeficiency virus (HIV) pandemic continues to be a major global public health issue. At the end of 2019, approximately 38.0 million people globally were living with HIV/AIDS, 1.7 million people became newly infected with HIV-1 and around 690,000 people died of AIDS-related illnesses worldwide, according to the Joint United Nations Programme on HIV/AIDS. In China, heterosexual transmission is the most predominant route for HIV-1 infection [1]. In 2019, 73.7% of new HIV infections were estimated to be caused by heterosexual transmission. Furthermore, HIV-1 is easily transmitted from high-risk groups (e.g., injection drug users (IDUs) and female sex workers) to their husband/wife and/or sexual partners [2].
HIV is divided into type 1 (HIV-1) and type 2 (HIV-2), and HIV-1 is further divided into four groups (M, O, N and P) [3]. The group M is the world's major epidemic pathogen of HIV/AIDS, and was further classified into ten subtypes (A, B, C, D, F, G, H, J, K and L) and a series of circulating recombinant forms (CRFs) and unique recombinant forms (URFs) [4]. Despite many advances in the diagnosis and antiretroviral treatment of HIV-1, the high diversity and recombination complexity of HIV-1 have long been major concern issues, challenging the HIV-1 surveillance, diagnosis, antiretroviral therapy, and vaccine development [5,6]. HIV-1 inter-subtype recombination has become more diversified over time [7]. As a result, 110 distinct CRFs and a large number of URFs have been documented globally due to the co-circulation of multiple HIV-1 subtypes so far [7,8]. There are currently 47 CRFs formed by recombination among subtypes B, C and CRF01_AE, 35 of which were first identified in China and neighboring Southeast Asian countries (Figure 1), such as Myanmar [9], Laos [10] and Thailand [11][12][13]. We previously reported an extremely high proportion of HIV-1 recombinants among IDUs in the China-Myanmar border region and identified four new CRFs (CRF82_cpx, CRF83_cpx, CRF87_cpx and CRF88_BC) [9,[14][15][16]. In addition, a complicated HIV-1 CRF01_AE/B/C recombinant isolated from a longdistance truck driver in northern Myanmar was identified [17]. More than ten HIV-1 subtypes/CRFs and a series of inter-subtype recombinants were found among the newly diagnosed HIV-1 infected individuals in Yunnan, and the recombinants accounted for 86.0% of total infections [18].
These studies indicated the ongoing generation of complex HIV-1 recombinants among subtypes B, C and CRF01_AE in northern Myanmar as well as in Yunnan [14,15,18,19].
The HIV-1 epidemic in China has been driven by CRF01_AE, CRF07_BC and CRF08_BC in recent years, which caused the co-circulation of these CRFs with previously dominant subtypes (e.g., B and C). As a consequence, some second-generation recombination forms (2 nd -CRFs) are generated from these CRFs and subtypes. In particular, an increasing number of the 2 nd -CRFs by CRFs were identified, including CRF79_0107 [20], CRF80_0107 [21], CRF102_0107 [22], CRF104_0107 [23], CRF105_0108 [24], and CRF109_0107 [25]. A recent systematic review and meta-analysis that focuses on the heterosexual contacts in China showed a significantly higher prevalence of CRF01_AE and CRF08_BC in heterosexual groups in the southern provinces and southwest provinces, respectively [1]. In particular, HIV-1 CRF01_AE and CRF08_BC were two main circulating HIV-1 strains in IDUs and the heterosexual contacts in Yunnan for years [18,26]. However, only one 2 nd -CRF105_0108 was identified among sexual contacts (four homosexual contacts and one heterosexual contact) in Sichuan [24]. Furthermore, the first report of a novel CRF100_01 C consisting of CRF01_AE and C was identified among heterosexual contacts [27]. Whether there are new CRFs and new 2 nd -CRFs to be formed by these CRFs and subtypes among the heterosexuals remains less investigated. In this study, we reported two newly identified CRFs, CRF111_01 C and CRF116_0108, among the heterosexual group in Yunnan, China.

Study samples
A survey was conducted among HIV-1 infected individuals who received antiretroviral therapy at Cangyuan Wa Autonomous County People's Hospital between 2014 and 2016 ( Figure 1). The original study protocol was reviewed and approved by the Ethics Committee of Kunming Institute of Zoology, Chinese Academy of Sciences (approval number: SWYX-2013023; date: 6 September 2013). All participants provided written informed consent.
Phylogenetic tree analysis was conducted by MEGA 7.0 using the maximum-likelihood (ML) method under the general time reversible model with a gamma-distributed model of among site rate variation using four rate categories (GTR+I + G) nucleotide substitution model with 1,000 bootstrap replications; the best nucleotide substitution model was selected by using the Akaike information criterion implemented in jModelTest 2.1.7. Furthermore, all the study sequences were further analyzed for the presence of recombination, and possible recombination breakpoints were identified by using the bootscanning approach as implemented in SimPlot v3.5.1 software, as described previously [9]. To confirm the recombination breakpoints of novel HIV-1 CRFs, ML trees were reconstructed with each sub-region using the same phylogenetic analysis. The genomic maps of the HIV-1 recombinant forms were visualized with the Recombinant HIV-1 Drawing Tool.

Next-generation sequencing
To explore whether secondary recombination occurs in 14YN252, we targeted amplification of the recombination fragment (HXB2:6024-7381) from 14YN251 and 14YN252. In brief, the first round of PCR products of 14YN251 and 14YN252 fragment C were used as templates, and then the second round of PCR was performed using primers N1F (5ʹ-ATCARAHTCYTRTAYCAAAGCAGTAAGTA-3ʹ) and ED33 (5ʹ-TTACAGTAGAAAAATTCCCCTC-3ʹ) to amplify the 1357 bp sequence, including the HIV-1 partial tat-rev-vpu-env gene. PCR products were then transported to Seqhealth Technology Co., Ltd. (Wuhan, China) for DNA quality testing, library preparation and next-generation sequencing (NGS). Library preparation was performed with a VAHTS Universal DNA Library Prep Kit for Illumina V3 (Cat. # ND607, Vazyme Biotech Co., Ltd, Nanjing, CN). DNA libraries were then paired-end sequenced (2 × 150) with a sequencing depth of 1 G using the Illumina HiSeq X Ten platform (Illumina Inc., San Diego, CA, USA). The raw sequence reads were filtered for low-quality reads using Kneaddata with default settings, and the parameter "trimmomatic-option" was 'SLIDINGWINDOW: 4:20 MINLEN: 100ʹ [29]. The clean reads were subsequently de-novo assembled using Megahit [30]. To calculate the coverage, the clean reads were aligned to assembled contigs using bowtie2 to obtain the information of reads mapping; then, Checkm was used to calculate the number of mapped reads and coverage [31,32]. Finally, all assembled contigs were aligned to reference genomes using Bowtie2 to select object-related contigs.

Molecular clock signal and Bayesian phylogenetic analyses
To explore the origin time and phylogenetic relationships of the novel HIV-1 CRFs, sequence datasets including two and three different genomic regions with different subtypeorigin were selected from CRF111_01 C and CRF116_0108, respectively, and subjected to Bayesian phylogenetic analyses using BEAST v.1.10.4. Available subtype C, CRF01_AE and CRF08_BC sequences with known sampling years from China, Myanmar, Thailand, Vietnam, India, Central African Republic, South Africa, Brazil and Ethiopia were retrieved from the Los Alamos HIV Database (http://www.hiv.lanl.gov) and also included in the Bayesian phylogenetic analyses. The best substitution model (GTR+I + G) for each sequence dataset was inferred using Jmodeltest v2.1.7. The temporal signal of each dataset was evaluated by calculating the correlation coefficient between the root-to-tip divergence and sampling date with TempEst v1.5.1 [33]. The coefficient of correlation between root-to-tip divergence versus sampling date (Table S2) supports the presence of a temporal signal and the use of a molecular clock analysis in BEAST. We tested different molecular clock model (strict and uncorrelated lognormal relaxed molecular clock) and tree prior (constant size, exponential growth and Bayesian skyline coalescent) combinations using the marginal likelihoods calculated by path sampling/stepping-stone sampling [34]. All analyses were run for 100 million generations, with sampling every 10,000 generations. Then, we sampled 20 path steps with a chain length of 5 million, with power posteriors determined according to evenly spaced quantiles of a Beta (0.3, 1.0) distribution. An uncorrelated lognormal relaxed clock model with Bayesian skyline or exponential growth tree prior was identified as the best combination of models (Table S3). The maximum clade credibility (MCC) trees were inferred under the optimal BEAST model combination and best fit nucleotide substitution model using

Data availability
All HIV-1 sequences generated in this study were deposited in the GenBank database (accession numbers MT624743-MT624757) and NCBI SRA database (accession number: PRJNA723921).

Social-demographic characteristics of the study samples
We obtained 15 NFLGs from HIV-1 positive samples collected from heterosexual contacts during 2014-2016. The social-demographic information of 15 individuals with available NFLGs, including sample years, gender, age, ethnic group, education level, and marriage status, is summarized in Table 1.

Identification of the HIV-1 CRF111_01 C and CRF116_0108
A near full-length genome phylogenetic tree divided 8 of 15 NFLGs into two clades I (16YN29, 16YN604, 16YN652 and 16YN732) and II (14YN263, 14YN264, 16YN253 and 16YN424). Phylogenetic analysis showed that the two clades did not cluster with any clusters formed by known HIV-1 subtypes/CRFs, implying that the 8 sequences were likely to be new subtypes/CRFs (Figure 2). In the phylogenetic tree, clade I was located outside the cluster of CRF01_AE (Figure 2). Bootscan analyses indicated that the sequences in clade II included seven CRF01_AE fragments and seven C fragments separated by 13 recombination breakpoints ( Figure S1A). Because these sequences had the same recombination pattern, they were defined as a new CRF111_01 C ( Figure S1A Figure S1B). Sub-region ML trees based on each segment of CRF111_01 C further confirmed the results of recombination analysis ( Figure S2).
Four sequences in clade II were near the reference sequences of CRF105_0108 ( Figure 2). Bootscan analyses showed that these sequences were composed of CRF01_AE and CRF08_BC, with two recombination breakpoints ( Figure S3A). Because these sequences shared the same recombination pattern, they were defined as a new CRF116_0108 ( Figure S3A). The genomic structure of the CRF116_0108 was as follows: I CRF08_BC (826-4683 nt), II CRF01_AE (4684-8727 nt), and III CRF08_BC (8728-9521 nt) ( Figure S3B). Sub-region ML trees based on each segment of CRF116_0108 confirmed the results of recombination analysis ( Figure S4).

The identification of a CRF106_cpx variant
14YN251 and 14YN252 were regular sexual partners, and two NFLGs from them closely clustered together, suggesting direct transmission of the virus between them. Because 14YN251 self-reported being infected earlier than 14YN252, the virus was more likely transmitted from 14YN251 to 14YN252. Interestingly, however, we found that the recombination pattern of 14YN252 was slightly different from that of 14YN251 and other CRF106_cpx strains ( Figure S5B), suggesting the occurrence of further virus recombination in 14YN252. To trace the occurrence of secondary recombination in 14YN252, we conducted nextgeneration sequencing of a specific recombination fragment (HXB2:6024-7381) that carries distinct recombination breakpoints and distinguishes 14YN251 and 14YN252 (Figure 3). Three and two contig sequences were obtained from 14YN251 and 14YN252, respectively ( Figure 3). In 14YN251, two different contig sequences that correspond to the targeted fragment were obtained, indicating the co-circulation of at least two HIV-1 variants. The most predominant contig sequence had a very high read abundance (2,258,654) and shared a consistent CRF106_cpx recombination pattern with the sequence obtained by Sanger sequencing. Another contig sequence with very low read abundance represents a minor variant and belongs to subtype C. In 14YN252, only one contig sequence corresponded to the targeted fragment and shared the same recombination pattern with the sequence (CRF106_cpx-related recombinant, or CRF106_cpx variant) obtained by Sanger sequencing, indicating that the secondary generation recombination occurred earlier before sampling, and the CRF106_cpx-related recombinant completely replaced CRF106_cpx as the dominant strain. The two other contig sequences were outside the targeted region, and might be derived from HIV-1 proviral DNA.

Discussion
Multiplex HIV-1 subtypes/CRFs are co-circulating in the China-Myanmar border area [14,15,19,35,36]. Accompanied by the decrease in the prevalence of pure HIV-1 group M subtypes (such as B and C), the prevalence of various recombinant forms is progressively increasing. In recent years, a large number of HIV-1 inter-subtype recombinants were identified among IDUs in this area, and the vast majority of them were complex recombinants formed by subtypes of B, C and CRF01_AE, including some newly identified CRFs, CRF82_cpx, CRF83_cpx and CRF87_cpx [9,15,16]. However, few CRFs were identified among the heterosexuals, the major high-risk group for HIV-1  infection in Yunnan and having a high prevalence of both CRF01_AE and CRF08_BC. This study identified two novel CRFs, named CRF111_01 C and CRF116_0108 from the heterosexuals in Yunnan. Although the biological, transmission and pathogenic features remain to be evaluated, the identification of the two novel CRFs has important implications for HIV-1 evolution, transmission, diagnosis and treatment [5,6]. The proportion of heterosexual transmission of HIV-1 has rapidly increased in recent years. However, HIV-1 transmission between regular sexual partners is often ignored [37]. In 2011, the proportion of heterosexual transmission of HIV-1 in China was estimated to be 46.5%, of which about a quarter was sexual transmission between regular sexual partners [38]. Occasional sex with other sexual partners and other high-risk factors for HIV-1 transmission increase the risk of HIV-1 transmission within a family. A survey conducted among blood transfusion recipients in Hebei showed that the rate of intra-familial transmission of HIV-1 was up to 30.7% (81/264), of which regular sexual partners' transmission and mother-to-child transmission reached 20.9% (53/254) and 28.1% (46/ 164), respectively [39]. Currently, although some interventions have been employed to decrease HIV-1 transmission among the pre-marital population and to avoid mother-to-child transmission, the measures to reduce HIV-1 intra-familial transmission among high-risk regular sexual partners are very limited, especially in some areas of Yunnan Province, China, which led to an increased prevalence of HIV-1 among this group.
Lincang Prefecture, bordering the northeastern regions of Myanmar, is a "hot spot" region for HIV-1 recombination [27,40]. A recent study reported that the vast majority (93.8%) of HIV-1 positive individuals initiating antiretroviral therapy in Lincang Prefecture were the heterosexuals. CRF08_BC was the most predominant HIV-1 genotype (70.2%, 226/322) among them, followed by some URFs (10.6%, 34/322) [40]. In this study, we first identified two new HIV-1 CRFs (CRF111_01 C and CRF116_0108) from the heterosexuals in Cangyuan County of Lincang Prefecture according to 15 NFLGs and detected co-circulation of the two new CRFs (26.7% both) with CRF106_cpx (26.7%, including a CRF106_cpx variant) and CRF01_AE (20.0%). In spite that no CRF08_BC and other URFs were found in this cohort, our results clearly supported an on-going increase of HIV-1 genetic diversity among the heterosexuals in this region. On the other hand, the NGS data suggested that at least the individual 14YN251 was co-infected by CRF106_cpx and subtype C, implying the presence of sexual activity outside marriage with occasional sexual partners. Given the high potential of HIV-1 recombination, co-circulation of four CRFs among the heterosexuals with regular and occasional sexual partners will inevitably result in the generation of new 2 nd -CRFs and even 3 rd -CRFs formed by these CRFs.
Yunnan is a hotspot for HIV-1 inter-subtype recombination, mainly due to the co-circulation of HIV-1 subtypes B, C and CRF01_AE among IDUs [19]. A large number of URFs were detected among IDUs in Yunnan, and some of them were later identified as CRFs, including CRF57_BC, CRF87_cpx, CRF88_BC, CRF96_cpx, CRF101_01B and CRF110_BC. The molecular epidemiology of HIV-1 recombinants in Yunnan was characterized by three main features. First, the vast majority of URFs and CRFs were generated by the recombination between B and C [19], as well as among B, C and CRF01_AE [16,35,36], but very few URFs and CRFs were derived from the recombination between subtypes C and CRF01_AE. Second, the vast majority of URFs and CRFs originated and were mainly circulating among IDUs, but few from the heterosexuals. Third, accompanied by the co-circulation of CRF01_AE, CRF07_BC and CRF08_BC, six 2 nd -CRFs (CRF79_0107, CRF80_0107, CRF102_0107, CRF104_0107, CRF105_0108, and CRF109_0107) were generated [20][21][22][23][24][25]. The vast majority of the 2 nd -CRFs were derived from CRF01_AE and CRF07_BC, but only one (CRF105_0108) that was first identified in Sichuan (outside Yunnan) was formed by CRF01_AE and CRF08_BC [24]. In this study, the identification of one new CRF (CRF111_01 C: 26.7%) derived from CRF01_AE and C, and one new 2 nd -CRF (CRF116_0108: 26.7%) derived from CRF01_AE and CRF08_BC, as well as the finding of the newly identified CRF106_cpx (26.7%) among 15 heterosexuals, might imply a changing trend of HIV-1 molecular epidemiology among the heterosexuals, at least in Lincang Prefecture of Yunnan, which has important epidemiological implications.
Among CRFs identified in China, only three (CRF07_BC, CRF08_BC and CRF55_01B) caused the epidemic since their origin, while the others were just associated with sporadic infections. CRF07_BC and CRF08_BC were the earliest identified CRFs in China, both of which originated among IDUs in Yunnan in early 1990s and later spread to heterosexuals and homosexuals and other regions of China [41]. CRF07_BC has a nationwide prevalence. It remained the most predominant HIV-1 strain among IDUs and was becoming the most predominant HIV-1 among men who have sex with men (MSM) [42], while the prevalence of CRF08_BC was mainly restricted to some regions (e.g. Yunnan and Guangxi) [18,40,43].
CRF55_01B originated among MSM in about 2001 [44] and showed an increasing prevalence trend in recent years. CRF111_01 C and CRF116_0108 were estimated to originate in about 1995. 7-1998.6 and 1991.7-1993.7, respectively, implying a long history. The origin time of CRF111_01 C and CRF116_0108 was slightly later than those of CRF07_BC and CRF08_BC, but obviously earlier than that of CRF55_01B, and other CRFs (e.g., CRF62_BC) [41,44,45]. The main reason for why CRF111_01 C and CRF116_0108 were not found for a long time might be that their prevalence was mainly restricted to the heterosexuals in Cangyuan County, and this cohort was previously ignored. However, as evidenced by the prevalence and spread of CRF07_BC and CRF01_AE, HIV-1 was easily transmitted among different high-risk groups (e.g., IDUs, the heterosexuals and MSM) and in different regions. Therefore, CRF111_01 C and CRF116_0108 were not excluded to circulate in other high-risk groups (e.g., IDUs and/or MSM) and the surrounding regions. A large-scale molecular epidemiological investigation should be performed to see whether there are more infections with CRF111_01 C and CRF116_0108 among the heterosexuals and other high-risk groups in Cangyuan and the surrounding regions. Furthermore, a long-time monitoring should also be encouraged to track the transmission of the two new CRFs, and to further evaluate their transmission risk from the heterosexuals to other highrisk groups and from Cangyuan to other regions.
There are two limitations in our study. The first one is the relatively small sample size. Despite the small sample size, we identified two new HIV-1 CRFs (CRF111_01 C and CRF116_0108) according to the criterion of a new HIV-1 CRF and found the cocirculation of four CRFs (CRF01_AE, CRF106_cpx, CRF111_01 C and CRF116_0108) in the heterosexuals in Cangyuan County of Lincang Prefecture. The second is that the samples were collected during 2014-2016, which might not reflect the current prevalence pattern of HIV-1 among the heterosexuals. As mentioned above, whether the two new CRFs have caused a wide epidemic in the heterosexuals and even other high-risk groups (e.g. IDUs) in this region and/or surrounding regions needs to be concerned.
In summary, we identified two novel HIV-1 CRFs (CRF111_01 C and CRF116_0108) from the heterosexual contacts in Cangyuan County of Yunnan Province, and estimated that the origin times of CRF111_01 C and CRF116_0108 were in around 1995. 7-1998.6 and 1991.7-1993.7, respectively. Furthermore, we detected the co-circulation of four CRFs in this cohort. Given the on-going increase of HIV-1 genetic diversity in Yunnan province and the increasing prevalence trend of some newly identified CRFs (e.g. CRF07_BC, CRF08_BC and CRF55_01B) in China, the identification of two novel CRFs has important epidemiological implications, which highlights the importance of continuous surveillance.