Detection of human parvovirus 4 DNA in the patients with acute encephalitis syndrome during seasonal outbreaks of the disease in Gorakhpur, India

ABSTRACT Seasonal outbreaks of acute encephalitis syndrome (AES) at Gorakhpur, India have been recognized since 2006. So far, the causative agent has not been identified. Next generation sequencing identified human parvovirus 4 (HPARV4) sequences in a CSF/plasma pool. These sequences showed the highest identity with sequences identified earlier in similar patients from south India. Real-time PCR detected HPARV4 DNA in 20/78 (25.6%) CSF and 6/31 (19.3%) plasma samples of AES patients. Phylogenetic analysis classified three almost complete genomes and 24 partial NS1 sequences as genotype 2A. The observed association of HPARV4 with AES needs further evaluation. ELISAs for the detection of IgM and IgG antibodies against scrub typhus (Orientia tsutsugamushi, OT) showed ∼70% IgM/IgG positivity, suggestive of an etiologic association. Prospective, comprehensive studies are needed to confirm the association of these agents, singly or in combination, with AES in the Gorakhpur region.


Introduction
Outbreaks of Japanese encephalitis (JE) were reported from the Gorakhpur district and adjoining areas of Uttar Pradesh state, India, during 1978-2005 [1], leading to the introduction of JE vaccine [2]. Following vaccination, JE accounted for <10% of AES cases admitted to the BRD Medical College, the only tertiary care hospital catering to AES cases in Gorakhpur division [2]. During 2007-2014, a variable but high proportion of AES cases (41.6-61%) had unknown etiology [3][4][5]. Over 2000 AES patients are admitted annually to this hospital, with a high case fatality ratio (20-30%) [6]. With the dramatic rise in non-JE cases, vigorous attempts were made by the National Institute of Virology, Pune, India to identify the agent by in vivo (suckling mice) or in vitro (several cell lines, PCR) amplification, without success (unpublished observations). In view of the failure of classical and molecular techniques to identify the agent, in 2011 we decided to employ next generation sequencing (NGS), which does not require prior sequence information, to explore the possibility of detecting the agent(s) in clinical specimens. At the time of this study, enteroviruses were reported in 21.3% of the patients [7]. Subsequently, the association/role of scrub typhus (Orientia tsutsugamushi, OT) in causing AES outbreaks was extensively studied [8][9][10][11]. We present our NGS data and reanalyse the findings in relation to the current perspective.

Patient characteristics
For HPARV4, we investigated 78 paediatric patients (64 in 2011 and 14 in 2012) from whom a CSF sample was available (Table 1). These included 46 children (58.9%) below the age of five, 21 children (26.9%) aged 5-10 years and 11 children (14.1%) aged 11-15 years. All the patients presented with fever of 2-13 days' duration (8.3 ± 0.27 days). Most of the patients were undernourished: 31% were below −3 SD weight-for-age and 66% were between −2 SD and −3 SD (WHO growth standards 2006). Table 1 presents the most common presenting symptoms. In addition, 10% of the children had non-palpable, dark, irregular macular rashes on the trunk. ECG abnormalities such as low-voltage QRS complexes, tachycardia, abnormal elevation of the ST segment and inversion of T-waves were recorded in 10% of the children. Radiological evidence of cardiomegaly was observed in 14% of the children. The CSF was clear; 50% of the samples showed pleocytosis and raised CSF protein (40-100 mg/dL in 62 children); CSF sugar was normal or more than 60% of the blood sugar. Leucocytosis, mainly polymorphonuclear, was seen in 60% of the patients. Additional patients investigated for OT antibodies were similar, except that their samples were collected 1-7 days after the onset of clinical symptoms (3.7 ± 0.32 days).

Identification of viral RNA/DNA sequences employing NGS platform
As the use of total nucleic acids for the detection of RNA or DNA viruses yielded unsatisfactory results for both categories, different samples were used for RNA and DNA detection (Table 2). The samples contained a large number of bacterial and fungal contigs that were also detected in control samples and hence were not investigated further (Supplementary Table 1). The detection of Tobacco mosaic virus was surprising (Table 2). We did not detect enterovirus sequences.

Detection of Human Parvovirus 4 DNA
The first run employing CSF/serum pools identified 12 contigs of human parvovirus 4 when different tools were used for analysis (Tables 2 and 3). These were closely related to the sequences reported from the CSF of two AES patients from south India [12]. Further, when 45 reads ≥ 100 bp were mapped to the reference PARV4 genome (Accession No. HQ593530, 5205 nt), an almost complete genome was covered, with 8 gaps of 17, 19, 46, 65, 217, 289, 316 and 338 nt. In the pooled tissue sample (GKP-14), five contigs of human parvovirus B19 and one contig of PARV4 were identified. Use of two barcoded sets of four CSF samples each did not yield any data, probably because of the lower amount of DNA per sample.
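As a rough check on the extent of coverage, the gap lengths reported above can be summed and subtracted from the reference length. The Python sketch below performs only this arithmetic; it assumes that the eight gaps do not overlap and lie wholly within the 5205-nt reference, and the function and variable names are illustrative rather than part of the analysis pipeline.

```python
# Reference length and gap sizes as reported in the text above
REFERENCE_LENGTH = 5205  # nt, accession HQ593530
GAP_LENGTHS = [17, 19, 46, 65, 217, 289, 316, 338]  # nt


def covered_fraction(reference_length, gap_lengths):
    """Return the fraction of the reference that does not fall in any gap.

    Assumes non-overlapping gaps (an assumption, not stated in the text).
    """
    uncovered = sum(gap_lengths)
    return (reference_length - uncovered) / reference_length


uncovered_nt = sum(GAP_LENGTHS)
fraction = covered_fraction(REFERENCE_LENGTH, GAP_LENGTHS)
print(f"{uncovered_nt} nt in gaps; {fraction:.1%} of the reference covered")
```

Under these assumptions the gaps total 1307 nt, so the mapped reads span roughly three quarters of the reference.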

HPARV4 DNA positivity and phylogenetic analysis
Real-time PCR (detection limit 10 copies/reaction) detected HPARV4 DNA in 20/78 (25.6%) CSF and 6/31 (19.3%) serum samples (10^2-10^10 copies/ml). HPARV4 positivity was comparable in the two years (18/64, 28.1% in 2011 and 2/14, 14.3% in 2012), though the sample size in 2012 was small. HPARV4 DNA was not detected in serum samples from healthy controls from either Gorakhpur or Pune. Almost full genome sequences were generated for three positives with high viral load (accession numbers KJ541119-KJ541121), while partial NS1 was sequenced for an additional 24 positives (18 CSF, 6 sera; accession numbers KJ541122-KJ541145). Phylogenetic analysis (Figure 1) classified all the sequences as genotype 2. Within genotype 2, the Indian sequences from Gorakhpur (this study) and south India [12] formed distinct clusters, the percent nucleotide identity being 98.1 ± 0.005%. Genotype 2 was further subdivided into two groups with a percent nucleotide difference of 3.8 ± 0.005%. The intra-genotypic difference was 7.4-8.9%. The phylogenetic pattern employing the partial NS1 region was identical to that of the full genome-based analysis (Figure 2).
Figure 1. Phylogenetic analysis of almost full genomes sequenced during this study (accession numbers KJ541119, KJ541120 and KJ541121). Full genome sequences available in the GenBank database are denoted by the respective accession numbers. Percent bootstrap support is indicated at each node. Genotypes are designated by brackets. Solid circles denote sequences obtained during the present study. Scale bar indicates nucleotide substitutions/site.

Testing for IgM and IgG antibodies against Orientia tsutsugamushi
In view of the high positivity of these antibodies among patients investigated during subsequent years [8][9][10][11], we tested the available serum samples for both antibodies (Table 4). These included 80 patients from 2011 and 12 patients from 2012. The higher IgM positivity in patients than in controls was striking (p < .0001). As depicted in Figure 3, the patient population was characterized by high OD values for both markers, while 88-95% of the control children were susceptible to OT, as evidenced by the absence of IgG antibodies. Of the 31 serum samples tested earlier for HPARV4, 14 could be subjected to ELISA testing for OT antibodies. These included two HPARV4 DNA positives, of which one was positive for both IgM and IgG antibodies and the other for IgG antibodies alone.

Discussion
Our data reconfirm the utility of NGS platforms in the identification of unknown agents. We tried both RNA and DNA isolation-based detection of the suspected agent. However, RNA-based detection could not identify sequences of a possible agent. In contrast, BLAST analysis of the contigs generated by the first DNA run identified HPARV4 sequences that were closest to the earlier sequences generated from CSF samples of paediatric encephalitis patients with unknown aetiology from south India [12]. These results encouraged us to evaluate HPARV4 as the possible agent, and further NGS runs with additional samples were not carried out. We used real-time PCR [13] with a sensitivity of 10 copies/reaction for the detection of viral DNA and could detect the virus in a substantial proportion of the CSF samples screened. Importantly, very high viral loads (10^7-10^9 copies/ml) were detected in a few CSF samples. None of the 25 healthy controls had circulating HPARV4 DNA, suggesting low prevalence of this virus in the general population. Of note, HPARV4 from Gorakhpur and south India belong to genotype 2 and form a distinct cluster (Figure 1); strains from the US and Germany (except one) form a separate cluster, whereas all the African strains belong to genotype 3 [14]. Considering the intra-genotypic nucleotide difference of 7.4-8.9%, the two branches of genotype 2, which differ by 3.8 ± 0.005%, may be considered distinct sub-genotypes. It would be worthwhile to explore whether genotypes influence the epidemiology and/or outcome of PARV4 infection.
Human parvovirus 4 of the family Parvoviridae was discovered in 2005 in a plasma sample of an adult HIV-positive patient from the US with an undiagnosed acute infection [15]. The epidemiology and disease potential of PARV4 are largely unknown. In contrast to the parenteral route of transmission in the US and certain European countries [15][16][17][18] and Taiwan [19], non-parenteral transmission routes were shown in African countries [20][21][22][23]. The patients investigated during this study were not at high risk of parenteral transmission, suggesting alternate transmission mechanism(s). The epidemiology of HPARV4 in India is not understood and needs special attention. Detection of HPARV4 in the CSF samples of AES cases from southern India [12] and Gorakhpur (northern India, this study) with 98.1 ± 0.005% nucleotide identity suggests an association of this virus with encephalitis in Indian children. Additional studies, including serology, are needed to assess the contribution, if any, of this virus in causing encephalitis.
In view of a recent report [10] of an etiologic association of Orientia tsutsugamushi with AES in Gorakhpur, we reanalysed the NGS sequence data. One contig each was detected in two RNA runs (Rickettsia tsutsugamushi, strain Kawasaki, gene for 16S rRNA, length 373 and 241 nt). However, the presence of just two contigs was not enough for us to identify this pathogen as the possible etiologic agent. Interestingly, OT serology led to striking findings. The kit used for OT serology does not provide a cut-off value and asks users to determine it on the basis of initial testing of the normal population and confirmed OT cases. In the absence of serum samples from confirmed OT cases, we tested 92 apparently healthy children of the same age group, sampled during the non-AES period, for OT antibodies (Table 4 and Figure 3). It was clear that circulation of OT was not very high in the healthy children, while almost 70% of the patients exhibited IgM antibodies at high levels. A similar pattern was seen with IgG antibodies as well. An association of OT infection with AES was apparent.
We would like to point out two observations here. First, although sampled during the acute phase, the majority of the patients circulated high levels of both IgM and IgG antibodies. Even patients bled on the 1st or 2nd day after the onset of clinical symptoms showed a similar pattern. Those negative for IgM antibodies (n = 16) also circulated high IgG levels. High IgG antibody levels during the acute phase were previously shown to indicate secondary infections [24]. Second, one would expect increased seropositivity to OT (IgG positivity), as annual seasonal outbreaks have been occurring since 2006. However, the exposure of the paediatric population to OT remains small.
In conclusion, the detection of HPARV4-DNA in the CSF samples of ∼25% of the AES patients from Gorakhpur with 98.1% nucleotide identity with sequences obtained from similar cases from south

Material and methods
This study was approved by the "Human Ethics Committees" of the National Institute of Virology, Pune and the tertiary care hospital, BRD Medical College, Gorakhpur.

Patients and clinical specimens
An AES case was defined as a person with acute (<15 days) onset of fever and a change in mental status (including symptoms such as confusion, disorientation, coma, or inability to talk) AND/OR new onset of seizures (excluding simple febrile seizures) [25]. On admission, a detailed history was obtained from the parents and a blood sample was taken. Whenever possible, CSF was collected. IgM-anti-JEV negative patients (n = 78, 2011-2012) conforming to the criteria of AES and from whom a CSF sample was collected were investigated. For anti-OT antibody testing, patients from whom no CSF sample was available were also included, because no CSF and only a small number of the serum samples (n = 21) tested for HPARV4 remained available, owing to small quantities and repeated use.
For HPARV4, we included two groups of apparently healthy control children (age 5-8 years): (1) from Gorakhpur (n = 25) and (2) from Pune, western India, where AES outbreaks are not reported. For OT, serum samples from 92 healthy children (<5 years = 42; 5-10 years = 50) were examined. These samples had been collected earlier for a vaccine immunogenicity study during a non-epidemic period and stored at −20°C until tested. Twenty-two acute-phase specimens were used for NGS experiments employing the Ion Torrent Personal Genome Machine (PGM, Life Technologies, USA). Different samples were used for RNA and DNA sequencing.

RNA-sequencing
Total RNA was isolated employing Ultrasense Viral RNA kit (Qiagen, Germany) and concentrated using Ribominus Concentration module (Life Technologies, USA). Ribosomal RNA was depleted using Ribominus Eukaryotic kit v2 (Life Technologies, USA). Purified mRNA was fragmented using NEBNext® Magnesium RNA Fragmentation Module (New England Biolabs, USA). After adaptor ligation of a library of 200 bp size, cDNA was synthesized and amplified using Ion total RNA sequencing kit v2 (Life Technologies, USA) as per the manufacturer's instructions. The cDNA library was quantified and its size distribution analysed with the High Sensitivity DNA chip kit on an Agilent Bioanalyzer 2100. Emulsion PCR was carried out using the Ion One-Touch™ 200 Template Kit v2 (Life Technologies, USA) according to the manufacturer's instructions. Sequencing of the cDNA libraries was carried out on 316/318 chips using the Ion Torrent PGM system and the Ion Sequencing 200 kit (Life Technologies, USA) according to the supplier's instructions. After sequencing, low quality reads and polyclonal sequences were filtered by the PGM software.

Bioinformatics analysis
Sequencing data size varied from 157 Mb to 2.5 Gb.

Real-time PCR for the detection of PARV4 DNA
The presence and quantity of HPARV4 DNA in CSF/plasma/serum samples were evaluated using a real-time PCR assay. The primers and probe used for this assay were: forward primer 5′-CTAAGGAAACTGTTGGTGATATTGCT-3′, reverse primer 5′-GGCTCTCCTGCGGAATAAGC-3′ and probe 5′-(FAM) TGTTCAACTTTCTCAGGTCCTACCGCCC-3′ [13]. Amplification reactions were performed using TaqMan 2X Universal PCR master mix (Invitrogen, USA) as per the manufacturer's instructions on the Applied Biosystems 7300 platform. A standard curve was generated using 10-fold serial dilutions of plasmid DNA containing the 103-bp ORF2 product. Plasmid DNA concentration was determined with a NanoDrop ND-1000 spectrophotometer (Thermo Scientific, USA).
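The standard-curve quantitation described above can be illustrated with a minimal Python sketch. It fits the usual linear relation Ct = slope × log10(copies) + intercept to a dilution series and inverts it for an unknown sample. The Ct values below are hypothetical placeholders, not data from this study, and the function names are illustrative; conversion from copies/reaction to copies/ml depends on extraction and input volumes that are not modelled here.

```python
import math


def fit_standard_curve(copies, ct_values):
    """Least-squares fit of Ct = slope * log10(copies) + intercept."""
    xs = [math.log10(c) for c in copies]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ct_values) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ct_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def copies_from_ct(ct, slope, intercept):
    """Invert the standard curve to estimate copies per reaction."""
    return 10 ** ((ct - intercept) / slope)


def amplification_efficiency(slope):
    """Efficiency implied by the slope (1.0 corresponds to perfect doubling)."""
    return 10 ** (-1 / slope) - 1


# Hypothetical 10-fold plasmid dilution series (copies/reaction) and Ct values
standard_copies = [1e1, 1e2, 1e3, 1e4, 1e5, 1e6]
standard_ct = [36.6, 33.3, 30.0, 26.7, 23.4, 20.1]

slope, intercept = fit_standard_curve(standard_copies, standard_ct)
sample_copies = copies_from_ct(28.0, slope, intercept)  # hypothetical sample Ct
print(f"slope={slope:.2f}, intercept={intercept:.2f}, "
      f"efficiency={amplification_efficiency(slope):.2f}, "
      f"sample={sample_copies:.0f} copies/reaction")
```

A slope close to −3.3 corresponds to near-100% amplification efficiency, which is a common acceptance check before a curve is used for quantitation.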
TACCTTCTTCCCACCATACT-3′ (internal reverse) primer sets were used with 35 cycles of one minute each of 94°C, 51°C/53°C and 72°C for first/nested runs. The PCR products of predicted molecular size were gel eluted (QIAquick gel extraction kit; Qiagen, Hilden, Germany) and sequenced using the BigDye Terminator cycle sequencing Ready Reaction Kit (Applied Biosystems, USA) and an automatic sequencer (ABI Prism 3100 Genetic Analyzer; Applied Biosystems). All PCR products were sequenced in both directions. Phylogenetic analysis was carried out using MEGA software (version 5.2.2) [28]. For the construction of phylogenetic trees (full genomes, partial NS1), the Neighbor-Joining algorithm and the Kimura 2-parameter distance model were utilized. The reliability of the analysis was evaluated by a bootstrap test with 1000 replications.
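For readers unfamiliar with the Kimura 2-parameter model, the pairwise distance it assigns to two aligned sequences is d = −½ ln(1 − 2P − Q) − ¼ ln(1 − 2Q), where P and Q are the proportions of sites showing transitions and transversions, respectively. The Python sketch below computes this distance for a pair of pre-aligned sequences. It is an illustration of the formula only, not a reproduction of the MEGA analysis; skipping gapped or ambiguous sites pairwise is an assumption, and the toy sequences are hypothetical.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq1, seq2):
    """Kimura 2-parameter distance between two aligned nucleotide sequences.

    Sites where either sequence has a gap or ambiguous base are skipped
    (pairwise deletion; an assumption for this sketch).
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = compared = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue
        compared += 1
        if a == b:
            continue
        pair = {a, b}
        if pair <= PURINES or pair <= PYRIMIDINES:
            transitions += 1  # A<->G or C<->T
        else:
            transversions += 1  # purine <-> pyrimidine
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared
    q = transversions / compared
    # math.log raises ValueError if the sequences are too divergent
    # for the model (argument <= 0).
    return -0.5 * math.log(1 - 2 * p - q) - 0.25 * math.log(1 - 2 * q)


# Hypothetical toy alignment: one transition (C->T) in ten sites
print(round(k2p_distance("ACGTACGTAC", "ACGTACGTAT"), 4))
```

A matrix of such pairwise distances is the input to the Neighbor-Joining algorithm; bootstrap support is then obtained by resampling alignment columns and rebuilding the tree.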

Testing for IgM and IgG antibodies against Orientia tsutsugamushi
In view of the high positivity of these antibodies among patients from subsequent years [8][9][10][11], ELISAs (Scrub Typhus Detect IgM and IgG ELISAs, InBios International, Inc., Seattle, USA) were used to screen the available samples according to the manufacturer's instructions. These included serum samples from patients and controls (n = 92 each). CSF samples were not available. A cut-off value of 0.5 was used to discriminate reactive and nonreactive specimens [29].

Statistical analysis
Chi square test was performed for group comparisons. A p value < .05 was considered significant.
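A group comparison of this kind can be sketched in Python as a Pearson chi-square test on a 2 × 2 table (positive/negative by patient/control). The counts used below are hypothetical placeholders chosen only to illustrate the calculation; they are not the values in Table 4. The absence of a continuity correction is also an assumption, since the text does not state which variant was applied.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test (no continuity correction) for [[a, b], [c, d]].

    Returns the statistic and the two-sided p value for 1 degree of freedom.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column must have a non-zero total")
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts (NOT from Table 4): rows are patients and controls,
# columns are antibody-positive and antibody-negative, 92 children per group.
chi2, p_value = chi_square_2x2(64, 28, 8, 84)
print(f"chi-square = {chi2:.2f}, significant at .05: {p_value < 0.05}")
```

With small expected cell counts, Fisher's exact test would normally be preferred over the chi-square approximation.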