Identification of characteristic TRB V usage in HBV-associated HCC by using differential expression profiling analysis

Liver cancer is one of the most common cancers worldwide. CDR3 sequencing-based immune repertoire can be closely associated with cancer prognosis and development. Identifying the specific interaction between the TCR and cellular antigens is important for developing novel immunotherapeutic approaches for the treatment of cancer. The rearranged TCRβ loci amplified using Vβ- and Jβ-specific primers by multi-PCR and sequenced using high-throughput sequencing (HTS) in liver cancers were compared with those of T cells from healthy adult peripheral blood and from adjacent liver tissue. The T-cell repertoires within each tumor show strong similarity to one another but are distinct from those of the circulating T-cell repertoire. In addition, our results demonstrate that there are significant differences in the T-cell repertoires of HCC (hepatocellular carcinoma), ICC (intrahepatic cholangiocarcinoma), and MHC (mixed hepatocellular and cholangiocellular carcinoma). Furthermore, we found that the highly expanded clone (HEC) ratio in blood samples from liver cancer patients differed significantly from those in the blood of healthy adults and hepatitis patients (p < 0.001). The above results suggest that comparison of the T-cell repertoires of tissue and blood could be used to distinguish liver cancer patients from healthy adults and from hepatitis patients. In the future, the diversity of CDR3 sequences in liver cancer may prove to be a useful and novel biomarker for detecting aggressive tumors with high invasive or metastatic capacity.


Introduction
Primary liver cancer is the sixth most frequent cancer globally and the second leading cause of cancer deaths. In 2012, nearly 782,000 new cancer cases (50% in China alone) occurred worldwide and were responsible for 746,000 deaths. 1 The most frequent liver cancer, accounting for approximately 75% of all primary liver cancers, is hepatocellular carcinoma (HCC), which originates in hepatocytes. Liver cancer can also originate in other structures of the liver, such as the bile duct, blood vessels, and immune cells. Cancers of the bile duct (cholangiocarcinoma and cholangiocellular cystadenocarcinoma) account for approximately 6% of primary liver cancers. 2,3 HCC is the fifth most frequent cancer and the third most frequent cause of cancer mortality worldwide. 4 Most cases of HCC are secondary to either viral hepatitis infection or cirrhosis. [5][6][7] Approximately 5-10% of individuals infected with hepatitis B virus (HBV) become chronic carriers, and approximately 30% of these individuals acquire chronic liver disease that can lead to HCC. 5,8 Cancerous cells often express aberrant peptides that are presented on the surface of cells and can be bound by T-cell receptors (TCRs) on the surface of T lymphocytes, the primary mediators of the cellular adaptive immune response. Healthy adults have approximately 2.5 × 10^8 distinct TCRs in the peripheral blood, allowing highly specific immune responses to a diverse range of foreign antigens. 9 In the peripheral blood, more than 90% of T cells are αβ T cells. 10 The TCRs expressed by αβ T cells are heterodimeric proteins that include two polypeptide chains, α and β. Each polypeptide has variable (V), joining (J) and constant (C) regions; β chains also have diversity (D) regions. 11 In TCRs, the highly variable complementarity-determining region 3 (CDR3) is the primary source of antigen-specific recognition; its diversity is generated by rearrangement of the V, D, and J gene segments.
The random insertion of non-germline-encoded nucleotides at the junctions of these rearranged segments provides additional diversity. 12 Because the diversity of the TCR repertoire mirrors that of the human immune system, analysis of CDR3 diversity within the TCR is crucial for understanding the basic molecular mechanisms of adaptive immunity in health and disease. 10,13 TCRβ sequencing may shed light on mechanisms of cancer immunity. The diversification and selection dynamics of TCR repertoires in healthy individuals and in those with infection, autoimmunity, immunodeficiency or cancer remain poorly understood but have important clinical implications. This method has been applied to a variety of cancers, including ovarian carcinoma, renal cell carcinoma, and cutaneous T-cell lymphoma. [14][15][16] Researchers are currently attempting to identify biomarkers or prognostic factors in the T-cell receptor repertoire to facilitate the early detection, treatment and prognosis of tumors.
In this work, we aimed to analyze the T-cell receptor repertoire in peripheral blood samples and tissues from liver cancer and hepatitis B patients using HTS of the TCRβ CDR3 region. Our work focused on producing a comprehensive, unrestricted TCR immunogenetic characterization from the blood and tissue of healthy individuals, as well as those of hepatitis B and liver cancer patients.

Results
Analysis of the profile of TCRβ CDR3 in human cells using next-generation sequencing
To study the profile of the T-cell receptor β chain in human cells, primers were designed for multiplex PCR at the TRB V/D/J loci to amplify the CDR3 fragment at the RNA level (Fig. S1). To distinguish the sequences between samples, a 6-bp barcode sequence was added to the 5′ end of each primer. The total length of all PCR products (PCR product + barcode sequence) was approximately 250 bp (Fig. S2). The PCR products were purified using magnetic beads. The enriched products were used for library construction and then sequenced at single-base resolution using the Illumina HiSeq2000 platform. To obtain sufficient coverage of the targeted regions for further analysis, 20 samples were pooled into a single lane. More detailed information on the 160 samples is shown in Table S1. In each sequencing run, we obtained approximately 300 M reads (average read length of 100 bp) per lane. Using the primer panel, we amplified all CDR3 sequences in 160 samples obtained from healthy adults and from patients with hepatitis, liver cancer, and colon cancer (CC). We obtained a total of 1.038 G raw reads representing 173,367 gigabase pairs from 160 samples. To retain the high-quality reads for further analysis, we filtered sequence reads (Fig. 1, Part 1) according to four strict criteria (described in detail in the Materials and Methods section). We then merged the high-quality paired reads using COPE and FqMerger (BGI) and designated the results as contigs. 17 The contigs were aligned to the reference TCR Vβ/Dβ/Jβ gene sequences using BLAST (Fig. 1, Part 2). To ensure high accuracy of the results, the initial alignment results were realigned to the database to find precise V/D/J genes (Fig. 1, Part 3). A subset of these alignment results was then filtered (the criteria used in this step are described in the Materials and Methods section).
Finally, we obtained a total of 495,708,702 contigs of TCR CDR3 from 160 samples (Table S2). An average of 3,098,179 contigs was generated per sample. In the end, we identified 48 Vβ and 13 Jβ segments, of which the 48 Vβ segments were merged into 23 Vβ segments. These results were used to analyze the Vβ and Jβ usage of CDR3 amino acid clonotypes in each sample (Fig. 2, Fig. S3), and the values of Vβ and Jβ segment usage were semi-quantitative.
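As a rough illustration of how segment usage values of this kind can be tabulated, the sketch below computes the relative usage of each V gene in one sample and then collapses it to a coarser level. The input layout (a list of `(v_gene, contig_count)` pairs), the function names, and the family-prefix merge rule are all assumptions made for the example; the actual 48-to-23 merge used in this work is defined in Table S5 and may differ.

```python
from collections import Counter


def v_usage(clonotypes):
    """Return the relative usage of each V gene as a fraction of all contigs.

    `clonotypes` is an iterable of (v_gene, contig_count) pairs; this input
    layout is hypothetical and chosen only for the example.
    """
    totals = Counter()
    for v_gene, count in clonotypes:
        totals[v_gene] += count
    grand_total = sum(totals.values())
    return {v: n / grand_total for v, n in totals.items()}


def merge_by_family(usage):
    """Collapse gene-level usage to family level (assumed merge rule).

    'TRBV6-4' and 'TRBV6-5' both become 'TRBV6'. The merge in the paper
    follows Table S5, which may not match this simple prefix rule.
    """
    merged = Counter()
    for v_gene, fraction in usage.items():
        merged[v_gene.split("-")[0]] += fraction
    return dict(merged)


# Toy counts for illustration only (not data from the study).
sample = [("TRBV6-4", 30), ("TRBV6-5", 10), ("TRBV18", 40), ("TRBV4-1", 20)]
usage = v_usage(sample)
merged_usage = merge_by_family(usage)
```

Because the values are fractions of each sample's own contig total, they can be compared across samples that were sequenced to different depths.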

Differential expression profiling of TCRβ CDR3 in liver cancers
To identify the differential expression profile of CDR3 in various samples at the mRNA level, we selected all clones in each sample with a frequency of more than 0.1% for further analysis and defined these clones as HECs. The ratio of HECs to total clones was used to compare CDR3 differential expression across samples.
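The HEC definition above can be made concrete with a minimal sketch. It assumes each sample is available as a mapping from CDR3 clonotype to contig count (a hypothetical layout), and it returns two readings of "ratio of HECs to total clones", one weighted by contigs and one by distinct clonotypes, since the text does not state which was used for each figure.

```python
def hec_ratios(clone_counts, threshold=0.001):
    """Compute two HEC ratios for one sample.

    `clone_counts` maps a CDR3 clonotype to its contig count (hypothetical
    input layout). A clone counts as an HEC when its frequency exceeds
    `threshold` (0.1% in the text). Returns (read_ratio, unique_ratio):
      read_ratio   - fraction of all contigs that belong to HECs
      unique_ratio - fraction of distinct clonotypes that are HECs
    """
    total = sum(clone_counts.values())
    if total == 0:
        raise ValueError("sample has no contigs")
    hec_counts = [n for n in clone_counts.values() if n / total > threshold]
    read_ratio = sum(hec_counts) / total
    unique_ratio = len(hec_counts) / len(clone_counts)
    return read_ratio, unique_ratio


# Toy sample: two expanded clones plus 980 singletons (10,000 contigs total).
clone_counts = {"CASSLGQGYEQYF": 9000, "CASSPGTGELFF": 20}
clone_counts.update({f"singleton_{i}": 1 for i in range(980)})
read_ratio, unique_ratio = hec_ratios(clone_counts)
```

In this toy sample the two expanded clones account for 9,020 of 10,000 contigs but only 2 of 982 distinct clonotypes, which shows how far the two readings can diverge.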
We first assessed the relationship of the HEC ratio for CDR3 or unique CDR3 with the age and gender in the samples from liver cancer patients. Our results indicated that there was no significant difference among these groups (Fig. 3).
We then compared the HEC ratio of blood and tissue samples from healthy adults, hepatitis B patients and liver cancer patients. Basic information on the healthy individuals and the patients with hepatitis B or CC is summarized in Table S3. Interestingly, the results indicated that the HEC ratio in blood samples from liver cancer patients was significantly different from the HEC ratio in blood samples from healthy adults (p < 0.001) (Fig. 4A). We also observed significant differences between healthy adults and hepatitis B patients (p < 0.001) and between hepatitis B patients and liver cancer patients (p < 0.01) (Fig. 4A). In addition, we compared the HEC ratio in healthy adults, hepatitis B patients and liver cancer patients at the tissue level. We found significant differences between each pair of groups (Fig. 4B). The above results suggest that comparisons of HEC ratios in both tissue and blood can be used to distinguish liver cancer patients from healthy adults.
We also analyzed the consistency of TCRβ CDR3 between tumor tissues and adjacent tissues of three types of liver cancers using the Pearson correlation coefficient (Fig. 4B subgraph). Clinicopathologic information on the 29 patients with liver cancer is summarized in Table S4. The two-sided t-test was then used to compare this consistency among the three types of liver cancer: HCC, ICC, and MHC. For HCC, the tumor tissues and adjacent tissues displayed low consistency in the TCRβ CDR3 sequences, possibly indicating the low degree of malignancy of this tumor, whereas for ICC, the consistency was much higher (Fig. 4B subgraph), which may indicate that the cancer cells had already metastasized.

Differential expression in liver cancer patients and colon cancer (CC) patients
To determine whether we could distinguish CC patients from healthy adults in a similar fashion, we compared the HEC ratios of CC patients with those of healthy adults and liver cancer patients at the tissue level. There was a significant difference between CC patients and healthy adults (p < 0.05) (Fig. 4C). Interestingly, there was also a significant difference between CC patients and liver cancer patients.

Principal component analysis for differential Vβ and Jβ genes in HCC patients and in healthy adults
To find the best factor for distinguishing HCC patients from healthy adults, we compared V-J usage, V usage and merged V usage in HCC blood and healthy blood. The 48 Vβ segments could be merged into 23 Vβ subclasses (Table S5). The 23 Vβ subclasses were used for merged V subclass usage, while the 48 Vβ segments were used for V-J recombination usage and V usage. Ten of the 20 healthy cases and 10 of the 20 HCCs were randomly selected as the training set, and the other 10 healthy samples and 10 HCCs were used as the testing set. The results revealed that blood samples from healthy adults and HCC patients can be clearly separated using the three types of indexes (Fig. 5A-C). Compared with VJ and Vmerge, V usage displayed a higher AUC (area under the ROC curve) value of 0.92 (Fig. 5D-F) in the testing set by ROC curve analysis and was superior at distinguishing HCC patients from healthy adults. To validate this conclusion, we also used linear discriminant analysis (LDA) for classification (Fig. S4). The resulting data strongly suggest that V usage may be a potential classifier for HCC.
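For readers less familiar with the AUC, one way to read it is as the probability that a randomly chosen HCC sample receives a higher classifier score than a randomly chosen healthy sample. The sketch below computes it in that pairwise (Mann-Whitney) form. The function name and the scores are made up for illustration; this is not the ROC routine used in the study.

```python
def roc_auc(positive_scores, negative_scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair. This is the Mann-Whitney form of the
    area under the ROC curve; it is quadratic in sample size, which is fine
    for a testing set of 10 + 10 samples.
    """
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))


# Made-up classifier scores for a small testing set (not real data).
hcc_scores = [2.1, 1.7, 0.9, 1.4, 0.2]
healthy_scores = [0.1, 0.5, -0.3, 1.0, -0.8]
auc = roc_auc(hcc_scores, healthy_scores)
```

With these toy scores, 22 of the 25 HCC-healthy pairs are ordered correctly, giving an AUC of 0.88; a value of 0.5 would correspond to a classifier that ranks pairs no better than chance.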

TCR high-throughput sequencing technology shows obvious advantages
In this work, we coupled HTS with semi-quantitative multiplex PCR amplification of TCRβ CDR3 sequences in mRNA to characterize the basic properties of T cells in hepatitis B and liver cancer patients and to compare those properties with the properties of T cells from adjacent liver tissue or from the peripheral blood of healthy adults. TCR CDR3 diversity has been estimated previously via spectratyping. 18,19 This technique is inexpensive and rapid but ignores the actual sequence content of the CDR3 regions. Traditional Sanger sequencing can be used to determine CDR3 sequence identity, 20 but the cost of this sequencing and the constraints on the number of cells that can reasonably be analyzed make the approach impracticable for assessing CDR3 diversity. 21 However, HTS technologies have enabled millions of TCR clonotypes to be identified. [22][23][24] The use of ultra-deep TCR-sequencing technology reveals the clonal composition of T-cell populations and has proven helpful in efforts to better understand the immune response in patients with liver cancer.

HECs of the T-cell repertoire may be potential biomarkers for HCC
In this work, we determined the contribution of HECs with frequencies of over 0.1% to the total T-cell repertoire, whereas Klarenbeek et al. determined the contribution of HECs occurring at frequencies that exceeded 0.5%. These authors observed that 84% of the clones were of low frequency (<0.1% of total TCR analyzed); above this value, the distribution decreased quickly, which it approaches at a clonal frequency of 0.4-0.5%. 25 Through data analysis and comparison, we selected 0.1% as the standard of HECs in our current workflow, which proved to be an effective choice. Comparison of the T-cell repertoires of both tissue and blood could distinguish liver cancer patients from healthy adults or hepatitis patients based on the HEC ratio.
[Figure 4 caption, partial] (B) Variance analysis of the HEC ratios in hepatic tissue from healthy adults, hepatitis B patients and liver cancer patients. The subgraph shows a comparison of the consistency between various types of hepatocellular carcinoma and adjacent tissues. The Pearson correlation is shown on the y axis, and the type of liver cancer is indicated on the x axis. (C) Variance analysis of the HEC ratio in tissues of healthy adults, liver cancer patients, and CC patients. *p < 0.05, **p < 0.01, ***p < 0.001 according to two-tailed t-tests.
Cancer is a disease for which early diagnosis would be immediately beneficial. Early cancer detection using cancer biomarkers is an exciting field. However, the complex response of the body to disease makes it difficult to characterize this response using only a few biomarkers. A recent study indicated that some complex heterogeneous diseases could be distinguished from other cancers and from conditions of health using immune markers, a finding that demonstrates the potential power of the immunosignature approach in the accurate, simultaneous classification of disease. 26 In our research, we characterized the basic properties of T cells across multiple dimensions, including Vβ-Jβ combination usage. We compared peripheral blood samples from HCC patients with samples from healthy adults according to the total Vβ-Jβ usage analysis and found no significant differences. However, TRBV18, TRBV4-1, TRBV4-2, and TRBV6-9 displayed higher AUC values, suggesting that usage of these V genes may be a potential classifier to distinguish HCC patients from healthy adults. This conclusion was confirmed by LDA.
When tumor tissues and adjacent tissues were compared, significant differences were found in TRBV6-4TRBJ1-1, TRBV6-4TRBJ2-2, TRBV6-4TRBJ2-3, and TRBV6-5TRBJ1-6. The observed usage bias of Vβ and Jβ is likely due to a combination of proximity effects such as recombination signal sequence compatibilities that influence initial TCR development, thymic selection and immune challenges that modify the representation of selected clones in the extant repertoires. 27 Cancers are themselves heterogeneous, and individual response to disease, at a molecular level, can vary considerably. Here, we identified some specific TRBV and TRBJ combinations that distinguish the TCR repertoires of liver cancer patients and healthy adults. These specific TRBV and TRBJ combinations may offer new biomarkers.
Additionally, we determined the contribution of HECs occurring with a frequency of over 0.1% to the total T-cell repertoire. Whether the comparison among liver cancer, hepatitis and healthy adults was performed using blood or liver tissue, liver cancer and hepatitis B patients showed clear differences from healthy adults at the HEC level. And comparison of the HEC ratios in blood could distinguish liver cancer or hepatitis B patients from healthy controls. This provides a possible basis for non-invasive detection of liver cancer. Furthermore, the differences in observations in peripheral blood and tissues between hepatitis and liver cancer patients deserve our attention. We observed significant differences between patients with hepatitis B and liver cancer in blood (p < 0.01) and in tissues (p < 0.001), and these differences indicate that comparison of TCRB CDR3 in tissues is superior to comparison in the blood for the identification of hepatitis B. However, comparing the HEC ratio in blood was able to distinguish liver cancer and hepatitis B patients, thus

Differences in TCR repertoire characteristics in disease and health
We used pairwise comparison to analyze samples of the same type and calculated the proportion of CDR3 amino acid sequences they shared (Fig. S5-S6). We compared the proportion of shared clones in blood samples of healthy adults, hepatitis B and liver cancer patients (Fig. S5). The TCRβ repertoire of healthy adults is more diverse than that of patients with disease. Comparison of blood and tissues of hepatitis B patients suggests that fewer distinct immune cell clones were found in tissue than in blood with the same initial amount of RNA (200 ng).
The shared unique clones of all blood samples of HCC and CC patients were also analyzed. Of 82,119 clones, 51.36% (42,178/82,119) derived from healthy adults, while only 48.64% (39,941/82,119) were shared between HCCs and CCs (data not shown). Pairwise analysis showed that the number of shared clones ranged from 617 to 3247 among HCCs (Fig. S6A), 187 to 2473 among CCs (Fig. S6B), and 556 to 2200 among healthy adults (Fig. S6C), suggesting that CDR3 clones are strongly heterogeneous between individuals.
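The pairwise comparison described above amounts to intersecting the sets of CDR3 amino acid sequences of every two samples. A minimal sketch follows, assuming each sample is held as a set of sequences keyed by a sample name; the names, sequences, and function are hypothetical.

```python
from itertools import combinations


def shared_clone_counts(samples):
    """Count CDR3 amino acid clonotypes shared by each pair of samples.

    `samples` maps a sample name to the set of its CDR3 amino acid
    sequences (hypothetical layout). Returns {(name_a, name_b): n_shared}
    with names in sorted order.
    """
    return {
        (a, b): len(samples[a] & samples[b])
        for a, b in combinations(sorted(samples), 2)
    }


# Toy sequences for illustration only.
samples = {
    "HCC_1": {"CASSLGQGYEQYF", "CASSPGTGELFF", "CASSQDRGNTEAFF"},
    "HCC_2": {"CASSLGQGYEQYF", "CASSPGTGELFF", "CASSYSGANVLTF"},
    "HCC_3": {"CASSLGQGYEQYF", "CASRDSNQPQHF"},
}
pairwise = shared_clone_counts(samples)
```

Dividing each count by the size of the smaller (or the union of both) repertoires would turn it into a shared proportion; which denominator was used for Fig. S5 is not stated in the text.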
T-cell recognition of a particular antigen requires presentation by an HLA molecule, and differences in HLA types may influence TCR repertoires. Warren et al. inferred that HLA-matched individuals may display increased TCRβ CDR3 repertoire overlap, suggesting an influence of HLA type on T-cell repertoire features. 28 However, other deep profiling studies of unrelated subjects or monozygotic twins suggest that repertoire overlap between individuals is generally independent of HLA type. [29][30] In the current study, we did not determine the HLA type of the individuals who participated. In the future, we will perform HLA typing and TCR sequencing simultaneously and study the relationship between HLA and TCR repertoires.
The ability to mount a protective immune response depends on the diversity of T cells, and the aging process threatens this diversity. [31][32][33] A recent study indicates that TCRβ CDR3 diversity declines throughout life. 34 The authors of that study directly quantified and compared T-cell repertoire diversity in samples obtained from 39 healthy donors 6-90 years of age using an advanced deep TCRβ sequencing approach. In our study, we observed no significant change in TCR diversity with age. One likely reason for this discrepancy in the findings is intrinsic differences in the samples used for comparison; in our case, the samples were obtained from liver cancer patients, in whom TCRβ diversity had already decreased. Thus, the increase in the proportion of expanded clones that occurs with age was hidden by the presence of disease.

Tumor-infiltrating lymphocytes in cancer
The infiltration of human tumors by T cells is a common phenomenon, and the nature of these intratumoral T-cell populations can predict the course of disease. 35 In this research, we analyzed the consistency of the TCR repertoires of tumor tissues and adjacent tissues in HCC, ICC and MHC patients. In HCC and ICC patients, the consistency of tumor tissues and adjacent tissues was significantly different (p < 0.01), a finding that may be associated with the range of malignancy observed in these tumors and that emphasizes the reliability of our detection method and demonstrates its potential power in the classification of liver cancer. Two patients, referred to as 'a' and 'b' here, for whom the consistency values were far from the average values of their respective groups, caught our attention. Patient 'a' presented a poorly differentiated HCC that was classified as grade III according to Edmondson-Steiner's classification, 36 whereas ICC patient 'b' presented a highly differentiated mucinous adenocarcinoma according to histopathological examination. These observations suggest that our results are consistent with diagnoses made by histopathological examination.
Recent evidence suggests that in colorectal and ovarian carcinoma patients, the presence or absence of tumor-infiltrating lymphocytes (TILs) provides a strong prognostic marker for survival independent of current staging methods. 37,38 The importance of TILs in the prognosis of melanoma patients has also been known for many years. 39 It has been postulated that biomarkers could be developed to capture the TIL response for both cancer prognosis and the prediction of therapeutic response. Many groups are testing various types of TIL measurements as predictive biomarkers for immunotherapeutic responses. 40,41 T cells directed toward cancer cells can significantly impact clinical outcome, as shown by a number of studies that have found a correlation between increased numbers of TILs and improved survival in a variety of tumor types. 42
In conclusion, high-throughput analyses of the TCR repertoire of a tissue can be performed using a deep sequencing platform, and these analyses help provide a better understanding of the immune response in liver cancer patients in whom the multi-faceted T-cell response is comparable to that found in healthy volunteers and HBV-infected patients. Further studies are needed to clarify the functional basis of TCRβ CDR3 clonalities underlying the persistence and/or eradication of cancer cells. The diversity of CDR3 sequences in liver cancer tissue may be a novel biomarker for detecting aggressive tumors with high invasive or metastatic capacity.

Materials and Methods
Patients and sample collection
Peripheral blood of healthy adults and patients was collected in PAXgene blood RNA collection tubes. All HCC tissue specimens were obtained from patients who underwent surgical resection for their tumors and who provided informed consent prior to liver surgery. The primary tumor specimens were immediately frozen at −80 °C until RNA extraction. Specimens (approximately 1 cm³) of the tumor and adjacent liver tissue were collected from each patient, and the diagnosis of HCC was confirmed through pathological examination. This project and protocols involving human and animal tissues were approved by the ethics committee of the Chinese National Human Genome Center.

RNA extraction
The PAXgene Blood RNA System (Becton, Dickinson and Company, USA) was used for blood collection. Total RNA was extracted from blood samples using a nucleic acid purification kit (PAXgene Blood RNA Kit) according to the manufacturer's instructions. To reduce the risk of genomic DNA contamination, 1-2 μg RNA was incubated with 2 U DNase I (Invitrogen, Carlsbad, CA, USA), 1 μl DNase buffer and 0.4 μl RNaseOut for 15 min at room temperature. The RNA concentration of the sample was determined using spectrophotometry, and the total RNA integrity was examined by visualization of the 28S and 18S RNA transcripts on a 1.2% agarose gel. The quality of RNA was evaluated using an Agilent 2100 Bioanalyzer (Agilent Technologies, USA).

Library construction
In this study, we used the HTBI primers and Arm-PCR from iRepertoire to construct the libraries in two rounds, PCR1 and PCR2, inclusively and semi-quantitatively. During the first round (PCR1), only 15 cycles were used to amplify CDR3 fragments using specific primers against each V and J gene. In the second round (PCR2), PCR was performed using universal primers.
PCR1: RNA reverse transcription and amplification of the T-cell receptor β CDR3 using the HTBI primers (Huntsville, AL, USA) was carried out using Qiagen OneStep RT-PCR. The first round of PCR was performed using 200 ng of total RNA mixed with 4 μl random iRepertoire primers, 5 μl 5× buffer, 1 μl dNTP mix, 0.25 μl RNasin (40 U/μl), and 1 μl enzyme mix, with nuclease-free water added to reach a total volume of 25 μl. After mixing and centrifugation, the reactions were transferred to a thermal cycler that carried out the following program: one cycle of 50 °C for 40 min; one cycle of 95 °C for 15 min; 15 cycles of denaturation at 94 °C for 30 s, annealing at 60 °C for 40 min, and extension for 30 s at 72 °C; 10 cycles of denaturation at 94 °C for 30 s, annealing and extension at 72 °C for 2 min; and a final extension at 72 °C for 10 min. The samples were then held at 4 °C.
PCR2: A 2 μl sample of the PCR1 product was used as template for a second step of amplification following the addition of 5 μl communal primers, 25 μl Multiplex MM prepared using the Multiplex PCR Kit (Hilden, Nordrhein-Westfalen, Germany), and 18 μl nuclease-free water to reach a total volume of 50 μl. The reactions were then transferred to a thermal cycler that carried out the following program: one cycle of 95 °C for 15 min; 40 cycles of denaturation at 94 °C for 30 s, annealing at 55 °C for 30 s and extension at 72 °C for 30 s; and final extension at 72 °C for 5 min. The samples were then held at 4 °C. Size selection was used to purify 250-bp PCR products on magnetic beads (Agencourt No. A63882, Beckman, Beverly, MA, USA). After gel purification, the PCR product was subjected to HTS using the Illumina HiSeq2000 platform.

Sequencing using the Illumina HiSeq2000
We used the same workflow described elsewhere 43 to perform cluster generation, template hybridization, isothermal amplification, linearization, blocking and denaturation and hybridization of the sequencing primers. Paired-end sequencing of samples was carried out with a read length of 100 bp using the Illumina HiSeq2000 platform.

Analysis of Illumina sequence data
We amplified all CDR3 sequences present in 160 samples obtained from healthy adults and patients with hepatitis, liver cancer and CC. To reduce the impact of sequencing errors, we first filtered the sequence reads according to four strict criteria, removing the following: (1) reads contaminated by the adapter sequence; (2) reads with more than 5% uncalled bases (N); (3) reads with an average Phred-type Q-score <15; and (4) PE reads with low-quality base readings (Q-score <10) at the ends of reads or short reads (Reads1 length <60 bp; Reads2 length <50 bp). We then merged the high-quality paired reads using COPE and FqMerger (BGI, Shenzhen, China), designating the results as contigs. The contigs were subsequently aligned to reference TCR Vβ/Dβ/Jβ gene sequences (http://www.imgt.org/download/GENE-DB/) using BLAST. We referenced directory sets of sequences containing the human TRB V-REGION, D-REGION, and J-REGION alleles (http://www.imgt.org/). The TCRβ CDR3 regions were identified within the sequencing reads according to the definition established by the International ImMunoGeneTics (IMGT) collaboration. 44 Finally, we filtered the alignment results to remove the following: (1) low-frequency contigs for which the number of supporting reads was less than 2; (2) contigs that failed to match V or J reference sequences; (3) contigs merged onto the Vβ and Jβ in the opposite direction; (4) contigs containing stop codons; and (5) contigs for which the length of the CDR3 sequence was not a multiple of 3. In the end, we identified 48 Vβ and 13 Jβ segments, of which the 48 Vβ segments were merged into 23 Vβ segments. These results were used to analyze the Vβ and Jβ usage of the CDR3 amino acid clonotypes in each sample.
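The two filtering stages described above can be sketched as simple predicates. This is a minimal illustration, not the pipeline used in the study: the function names, the dictionary keys describing a contig, and the reading of criterion (3) as "V and J hits in a consistent orientation" are assumptions, and adapter screening and end trimming (read criteria 1 and 4) are left out.

```python
STOP = "*"  # assumed symbol for a stop codon in the translated CDR3


def read_passes_qc(seq, quals, max_n_frac=0.05, min_mean_q=15):
    """Apply read criteria (2) and (3): N content and mean Phred score.

    `seq` is the base string and `quals` a list of integer Phred scores.
    Reads with more than 5% N or a mean Q-score below 15 are rejected.
    """
    if not seq or len(seq) != len(quals):
        return False
    if seq.upper().count("N") / len(seq) > max_n_frac:
        return False
    return sum(quals) / len(quals) >= min_mean_q


def contig_passes_filter(contig):
    """Apply the five contig filters to one aligned contig.

    `contig` is a dict with hypothetical keys: 'support' (number of
    supporting reads), 'v_gene' and 'j_gene' (None if unaligned),
    'same_strand' (V and J hit in a consistent orientation), 'cdr3_nt'
    and 'cdr3_aa'.
    """
    if contig["support"] < 2:
        return False
    if contig["v_gene"] is None or contig["j_gene"] is None:
        return False
    if not contig["same_strand"]:
        return False
    if STOP in contig["cdr3_aa"]:
        return False
    return len(contig["cdr3_nt"]) % 3 == 0


# Toy contig for illustration only.
good = {
    "support": 5,
    "v_gene": "TRBV18",
    "j_gene": "TRBJ2-7",
    "same_strand": True,
    "cdr3_nt": "TGTGCCAGCAGCTTAGGT",
    "cdr3_aa": "CASSLG",
}
out_of_frame = {**good, "cdr3_nt": good["cdr3_nt"][:-1]}
```

Keeping each criterion as a separate early return makes it easy to count how many reads or contigs each rule removes, which is useful when checking that a threshold behaves as intended.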

Statistical analysis
Clones with a frequency that exceeded 0.1% were considered to be HECs. The Pearson correlation coefficient (r) was used to measure the linear correlation between pairs of variables. Comparisons between groups were performed using two-tailed t-tests. Two-sided p values <0.05 were considered statistically significant.
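For reference, the two statistics named above can be written out in a few lines. The sketch below computes the Pearson correlation coefficient and a two-sample t statistic. The pooled-variance (Student) form is an assumption, since the text does not say whether equal variances were assumed, and converting the statistic to a p value requires the t distribution, which is omitted here.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def t_statistic(a, b):
    """Two-sample t statistic with pooled variance (assumed variant).

    Returns (t, degrees_of_freedom).
    """
    n_a, n_b = len(a), len(b)
    mean_a, mean_b = sum(a) / n_a, sum(b) / n_b
    ss_a = sum((v - mean_a) ** 2 for v in a)
    ss_b = sum((v - mean_b) ** 2 for v in b)
    df = n_a + n_b - 2
    pooled_var = (ss_a + ss_b) / df
    std_err = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean_a - mean_b) / std_err, df


# Toy values for illustration only.
r = pearson_r([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
t, df = t_statistic([1.0, 2.0, 3.0], [2.0, 3.0, 4.0])
```

In practice a library routine such as `scipy.stats.ttest_ind` would return the two-sided p value directly.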

Principal component analysis
We performed principal component analysis using the fast.prcomp function in the R package 'gmodels'. Ten of the 20 healthy cases and 10 of the 20 HCCs were randomly selected as the training set, and the other 10 healthy samples and 10 HCCs were used as the testing set. We first calculated the relative abundance of VJ (and of V and merged V) in each sample and selected differential genes with p values less than 0.01 (t-test) in the training set for PCA to determine the coefficients of the linear combinations that define the first and second principal components. We then calculated the values of PC1 and PC2 for the samples in the testing set using the coefficients determined from the training set. The PC1 values of the testing set were used for ROC curve estimation to assess the generalization of the classifier.
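The key point of this procedure is that the principal component loadings are fitted on the training set only and then applied unchanged to the testing set. The sketch below illustrates that idea in plain Python. It estimates the first principal component by power iteration rather than with fast.prcomp, so it is a stand-in for the R routine, not a reproduction of it, and the function names and toy data are hypothetical.

```python
import math


def first_pc(train_rows, n_iter=200):
    """Estimate the first principal component by power iteration.

    `train_rows` is a list of equal-length feature lists, one per sample.
    Returns (means, loadings): the per-feature training means and the
    unit-length loading vector of PC1. Assumes the starting vector of ones
    is not orthogonal to PC1 and that the features are not all constant.
    """
    n, d = len(train_rows), len(train_rows[0])
    means = [sum(row[j] for row in train_rows) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in train_rows]
    # Sample covariance matrix of the centered training data.
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    vec = [1.0] * d
    for _ in range(n_iter):
        nxt = [sum(cov[i][j] * vec[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(v * v for v in nxt))
        vec = [v / norm for v in nxt]
    return means, vec


def project(rows, means, loadings):
    """Score samples on PC1 using training-set means and loadings."""
    return [sum((row[j] - means[j]) * loadings[j] for j in range(len(means)))
            for row in rows]


# Toy two-feature data lying close to the line y = 2x (not real usage data).
train = [[1.0, 2.0], [2.0, 4.1], [3.0, 5.9], [4.0, 8.0]]
test = [[5.0, 10.0], [0.0, 0.0]]
means, loadings = first_pc(train)
test_scores = project(test, means, loadings)
```

Centering the testing samples with the training means, rather than their own, keeps the testing set from influencing the fitted component, which is what makes the resulting ROC curve a fair estimate of generalization.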

Linear discriminant analysis
LDA was used to evaluate the separability of the two subject cohorts (HCC and control) in the testing set using V usage. Ten of the 20 healthy cases and 10 of the 20 HCCs were randomly selected as the training set, and the other 10 healthy samples and 10 HCCs were used as the testing set. We first calculated the relative abundance of each V gene in each sample and selected differential V genes (p < 0.01) in the training set to construct a model using the LDA module in R. We then used the predict function to calculate the corresponding LD1 values of the testing set for validation.
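To make the LD1 value concrete, the sketch below fits a two-class Fisher discriminant on two features and scores new samples along the fitted direction. It is a simplified stand-in for the R LDA module, restricted to two features so that the matrix inverse can be written in closed form; the function names and the usage values are made up.

```python
def fisher_lda_2d(class0, class1):
    """Fit a two-class Fisher discriminant on two features.

    Returns (w, threshold): the direction w = Sw^-1 (m1 - m0), where Sw is
    the within-class scatter matrix, and the midpoint of the projected
    class means. Samples scoring above the threshold fall on the class1 side.
    """
    def mean(rows):
        return [sum(r[j] for r in rows) / len(rows) for j in range(2)]

    def scatter(rows, m):
        s = [[0.0, 0.0], [0.0, 0.0]]
        for r in rows:
            d = [r[0] - m[0], r[1] - m[1]]
            for i in range(2):
                for j in range(2):
                    s[i][j] += d[i] * d[j]
        return s

    m0, m1 = mean(class0), mean(class1)
    s0, s1 = scatter(class0, m0), scatter(class1, m1)
    sw = [[s0[i][j] + s1[i][j] for j in range(2)] for i in range(2)]
    det = sw[0][0] * sw[1][1] - sw[0][1] * sw[1][0]
    if det == 0:
        raise ValueError("within-class scatter matrix is singular")
    inv = [[sw[1][1] / det, -sw[0][1] / det],
           [-sw[1][0] / det, sw[0][0] / det]]
    diff = [m1[0] - m0[0], m1[1] - m0[1]]
    w = [inv[0][0] * diff[0] + inv[0][1] * diff[1],
         inv[1][0] * diff[0] + inv[1][1] * diff[1]]
    threshold = 0.5 * sum(w[j] * (m0[j] + m1[j]) for j in range(2))
    return w, threshold


def ld1(row, w):
    """Project one sample onto the discriminant direction."""
    return w[0] * row[0] + w[1] * row[1]


# Made-up relative usage of two V genes per training sample (not real data).
healthy_train = [[0.10, 0.05], [0.12, 0.04], [0.09, 0.06], [0.11, 0.05]]
hcc_train = [[0.05, 0.10], [0.06, 0.12], [0.04, 0.09], [0.05, 0.11]]
w, threshold = fisher_lda_2d(healthy_train, hcc_train)
hcc_like = ld1([0.05, 0.10], w)
healthy_like = ld1([0.11, 0.05], w)
```

As in the PCA step, the direction and threshold come from the training set alone, and held-out samples are only projected onto them.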

Disclosure of Potential Conflicts of Interest
No potential conflicts of interest were disclosed.