Identification and validation of core genes in tumor-educated platelets for human gastrointestinal tumor diagnosis using network-based transcriptomic analysis

Abstract Gastrointestinal (GI) tumors have increasing incidence worldwide with their underlying mechanisms still not being fully understood. The use of tumor-educated platelets (TEPs) in liquid biopsy is a newly-emerged blood-based cancer diagnostic method. Herein, we aimed to investigate the genomic changes of TEPs in GI tumor development and their potential functions using network-based meta-analysis combined with bioinformatic methods. We used a total of three eligible RNA-seq datasets, which were integrated using multiple meta-analysis methods on the NetworkAnalyst website, and identified 775 DEGs (differentially expressed genes; 51 up-regulated and 724 down-regulated genes) in GI tumor relative to healthy control (HC) samples. These TEP DEGs were mostly enriched in bone marrow-derived cell types and associated with gene ontology (GO) of “carcinoma” and could affect pathways of “Integrated Cancer Pathway” and “Generic transcription pathway” respectively for highly and lowly expressed DEGs. Combined network-based meta-analysis and protein–protein interaction (PPI) analysis identified cyclin dependent kinase 1 (CDK1) and heat shock protein family A (Hsp70) member 5 (HSPA5) to be the hub genes with the highest degree centrality (DC), being up-regulated and down-regulated in TEPs, respectively. GO and Kyoto Encyclopedia of Genes and Genomes (KEGG) results showed that the hub genes were primarily related to cell cycle and division, nucleobase-containing compound and carbohydrate transport, and endoplasmic reticulum unfolded protein response. Additionally, the nomogram model suggested that the two-gene signature owns extraordinary predictive power for GI tumor diagnosis. Further, the two-gene signature was demonstrated to have potential value for metastatic GI tumor diagnosis. The expression levels of CDK1 and HSPA5 in clinical platelet samples were verified to be consistent with the results from bioinformatic analysis. This study identified a two-gene signature encompassing CDK1 and HSPA5 that can be used as a biomarker for GI tumor diagnosis and maybe even cancer-associated thrombosis (CAT)-related prognosis. Plain Language Summary What is the context? Gastrointestinal (GI) tumors are now responsible for the majority of cancer-related mortalities worldwide.In the majority of cases of cancer, curative treatments are not recommended at the time of diagnosis. In this case, early screening and diagnosis is very important for overall tumor prognosis. Liquid biopsy emerged as a newly introduced minimally invasive approach for cancer diagnosis by detecting blood analytes as tumor-educated platelets (TEPs). Compared to tissue-based biopsies, liquid biopsies are less invasive, easy to access, convenient for serial tracking and better in eliminating intratumoral spatial heterogeneity. In recent years, specific gene signatures have been identified for cancer diagnosis, prognosis and prediction based on gene profiling data of TEPs. However, most of these studies were performed on the independent platelet profile datasets published on the Gene Expression Omnibus (GEO) database, which may harbor enormous heterogeneity. Additionally, few study revealed TEP mRNA functions and roles in GI tumors. Therefore, there’s the need of using an integrated method to re-analyze these data, so we can gain new insights for GI tumor diagnosis. What is new? Herein, through network-based RNA-seq meta-analysis, we identified the CDK1-HSPA5 signature in TEPs that has the potential as a biomarker for diagnosing GI tumors. This is the first time, to our knowledge, that a shared transcriptional signature of tumor-educated platelets has been identified in human GI tumor patients based on meta-analysis. Additionally, we found the two-gene signature has potential value for metastatic GI tumor diagnosis. We also demonstrated that HSPA5 may have different roles in blood and tumor cells, so its expression deregulation in distinct types of tissue may have opposing diagnostic and prognostic values. What is the impact? Our work provides a novel biomarker for platelet-based GI tumor prediction and diagnosis, which may also be used as novel targets for thrombosis prevention during cancer development in the future.


Introduction
Nowadays, the global burden of cancer continues to be one of the greatest public health problems. 1Cancers of the digestive tract, for example, colorectal cancer (CRC), are now responsible for the majority of cancer-related mortalities. 2The median age at diagnosis for GI tumors has been reported to be 71 years, 3 which could dramatically affect the patient's survival.In the majority of cases of cancer, curative treatments are not recommended at the time of diagnosis. 4][7] Liquid biopsy is a newly-emerged minimally invasive approach for cancer screening, diagnosis, and follow-up disease monitoringthat involves the isolation and analysis of circulating tumor markers from peripheral blood. 8Compared to tissue-based biopsies, liquid biopsies are less invasive, easy to access, convenient for serial tracking and better in eliminating intratumoral spatial heterogeneity. 9,104][15] Platelets "educated" by cancer undergo mRNA and protein profile modification and changes in their number and size, thus being reported to be a promising blood-based biomarker of cancer.0][21][22] Apart from applying powerful algorithms to a single platelet profile dataset, whether they are original [23][24][25][26] or publicly available, [27][28][29] for biomarker screening, with the help of integrative bioinformatic methods, gene-panels with diagnostic potential have also been generated by integrating already published datasets.However, these genes were selected by a direct overlap of different datasets, 13,30 which may harbor enormous heterogeneity.Additionally, few studies revealed TEP mRNA functions and roles in GI tumors.Therefore, there's the need of using an integrated method to re-analyze these data, so we can gain new insights for GI tumor diagnosis.
Compared to directly intersecting RNA-seq data from different data repositories to obtain DEGs, combination of multiple datasets can increase sample size, therefore increasing the reliability and generalizability of results."Meta-analysis" is the process of combining results from independent but related studies using statisticaltechniques. 31Using transcriptome data from multiple studies, a meta-analysis can reveal robust molecular signatures, improve reproducibility, or discover more reliable biomarkers.NetworkAnalyst (http:// www.networkanalyst.ca) is a comprehensive on-line tool that allows users to perform common and complex gene expression metaanalyses, which include data integration, statistical meta-analysis, PPI network, and enrichment analysis, via a standard web browser. 324][35][36] Notably, as mature human platelets are anuclear, what we are referring to as gene "expression" or differential "expression" here is actually the "splicing" process of pre-mRNAs in platelets. 20,37hree RNA-seq datasets for tumor-educated platelets in human GI tumors that meet the criteria for this investigation have been chosen.This is the first time, to our knowledge, that a shared transcriptional signature of tumor-educated platelets has been identified in human GI tumor patients and HCs based on metaanalysis.Our results identified a two-gene signature for plateletbased GI tumor prediction and diagnosis and suggested novel targets for thrombosis prevention during cancer development.

Search strategy and selection of RNA-seq data
From the Gene Expression Omnibus (GEO) database (http://www.ncbi.nlm.nih.gov/geo/),gene expression information was retrieved by searching for "tumor" AND "platelets" AND "Homo sapiens" AND "series," and BioProject database (https://www.ncbi.nlm.nih.gov/bioproject/) by searching for "tumor" (All Fields) AND "platelets" (All Fields).The datasets were then selected for inclusion using the following criteria: (a) the thrombocytes must be derived from human blood samples; (b) both tumor patients and healthy individuals must be included in each study; (c) the kind of tumor must be gastrointestinal tumor; (d) each group must contain at least three samples; and (e) must contain read count data from bulk RNA sequencing.The following analysis was performed on a total of three studies (GSE68086, PRJNA737596, and GSE160252; Table I) that satisfied the inclusion requirements.All the retrieved datasets were high-throughput sequencing gene expression datasets.

Data extraction and processing
The processed files of clean reads were downloaded from all included studies.We checked the sample names from the files and patient information tables provided by the authors, making sure that the final 322 samples we obtained were non-duplicate.Before meta-analysis, all datasets were respectively uploaded to the "Gene Expression Tables" module on the webpage of NetworkAnalyst.Firstly, all gene IDs were converted into the Entrez ID format.Individual datasets went through variance filtering, abundance filtering, and unannotated genes were filtered out.Then, datasets were normalized by log2 transformation, and the normalized files were downloaded for follow-up analysis.The Limma method was used to analyze the differential expression of individual datasets.A specific comparison was made between cancer group and HCs.Using the false discovery rate (FDR), the P values were adjusted.Log2 fold change of 1.0 and a cutoff P value of .05 were used to determine statistical significance.Volcano plots were produced and downloaded from NetworkAnalyst for every dataset to better visualize the DEGs.The intersection of the obtained significant genes from three independent studies was visualized by Venn diagram using Sangerbox online tools (http://sangerbox.com/AllTools?tool_id=9736766).

Batch effect adjustment and meta-analysis
We conducted batch effect adjustment and meta-analysis using NetworkAnalyst 3.0 (https://www.networkanalyst.ca/NetworkAnalyst/home.xhtml).All normalized datasets were uploaded to the "Multiple Gene Expression Tables" module on NetworkAnalyst, then processed, annotated, and checked for distribution and normalization to guarantee that the class names and data format are consistent across all datasets.Afterwards, batch effects were eliminated using the tried-and-true ComBat methods on the data.To evaluate the effectiveness of batch effect removal, the sample clustering patterns with and without batch effect modification were compared using principal component analysis (PCA).After a dataset integrity check, Fisher's method and a random effect model (REM) were selected for meta-analysis (combined p < .05 was considered to be significant).The intersecting genes of these two methods were visualized using a Venn diagram drawn by Adobe Illustrator 2020.Heatmaps for the top 10 up-and downregulated genes were downloaded from NetworkAnalyst for visualization.

PPI network analysis
PPI network-based analysis was performed on the webpage of NetworkAnalyst using STRING Interactome.We used zero-order network which only keeps seed proteins that directly interact with each other to avoid overall presentation of the structure.Using the approach of DC, hub genes were screened based on the core subnetwork constructed, other than which smaller subnetworks with nodes >10 were also mapped.From the core subnetwork, we used the "Module Explorer" tool to extract densely connected modules within interaction networks that contain hub genes, based on the random walk algorithm WalkTrap, through which we can learn the potential cancer-associated phenotypes our hub genes are functionally relevant to. 38

Functional annotation and pathway enrichment analysis
Functional enrichment analysis of the DEGs screened from metaanalysis was performed on the Enrichr website (https://maayanlab.cloud/Enrichr/).The GO biological process (BP) enrichment analysis of modules and subnetworks was conducted using the "Function Explorer" tool on NetworkAnalyst after the construction of the PPI network.To further examine the probable roles of the identified hub genes and better visualize the results, GO enrichment and KEGG pathway analysis were carried out using Cytoscape (v3.9.0) plug-ins biological networks gene ontology program (BiNGO) 39 and ClueGO. 40P < .001was considered significant when using BiNGO to analyze GO BP, cellular component (CC), and molecular function (MF).P < .05 was considered significant when using BiNGO to analyze GO.

Construction and validation of a nomogram model for tumor diagnosis
By using "rms" package in R, we establish a nomogram model based on the normalized data from GSE68086 to predict the occurrence of GI tumors."Points" indicates the score of the corresponding variable."Total Points" indicates the cumulative score for all the variables.The decision curves for the three prediction models are shown in decision curve analysis (DCA).X-axis indicates the threshold probabilities and Y-axis indicates the net benefit.The "Number high risk" curve indicates the number of samples which are classified as positive by the model at each threshold probability; the "Number high risk with event" curve is the number of true positives at each threshold probability.Then calibration curve was used to assess the predictive power of the nomogram model.At last, the clinical value of the model was evaluated using decision curve analysis and clinical impact curve.

Ethics statement
This study was approved by the Ethics Committee of the Second Xiangya Hospital (2018(Yan149)).

Platelet RNA extraction and quantitative real-time PCR (qRT-PCR)
A total of 7 asymptomatic healthy volunteers, 10 symptomatic patients, and 15 CRC patients were recruited for this study.The healthy controls were chosen from the general public and had no history of or symptoms of serious illnesses.The symptomatic

Statistical analysis
The meta-analysis was carried out and the data was examined using Fisher's method and REM-based combined ES method (p < .05)via NetworkAnalyst.To conduct the statistical study, GraphPad Prism version 7 software was employed.In the study of the association between hub genes and cancer metastasis, expression data of each gene in GSE68086 were divided into three groups (HC: healthy control, n = 55

RNA-sequencing datasets included in the meta-analysis
A total of three datasets from GEO and BioProject databases were included in the meta-analysis, which were GSE68086, PRJNA737596, and GSE160252.After data extraction and normalization according to the "Methods" section above, we identified a total of 322 samples encompassing 235 gastrointestinal tumor and 87 normal samples (Table I).Among the disease samples, there are 176 from CRC, 46 from PAAD, and 13 from HBC.The RNA-sequencing of these samples was performed on the Illumina platform and all the samples were from blood thrombocytes.

Overlap of differentially expressed genes among datasets
Firstly, we performed a comprehensive analysis across datasets by calculating the DEGs for each dataset (Figure 1a-c) and examining the overlap of the significant outcomes.Twenty genes are significantly associated with GI tumor among the three datasets (Figure 1d and Table S1).Most of these genes have been implicated to be involved in tumor development.
However, the exact functions of genes such as OSGEPL1, P4HTM, PUSL1, and TMTC4 are still not clear.Notably, we observed that according to the log2FC values in Table S1, ALAS2, CCDC102B, CCDC141, CCDC152, CD8A, CHI3L1, GGT1, GZMH, HPD, LCK, MRPL39, OSGEPL1, and TMTC4 showed inconsistent changing trends among datasets.Of note, inconsistence is not likely caused by joining CRC, PAAD and HBC groups together, for through additional testing, it still exists when we analyzed the data separately.Therefore, in face of such controversy, meta-analysis should be carried out to draw comprehensive conclusions.

Batch effect adjustment and meta-analysis of the eligible datasets
Before performing the meta-analysis, we made a correction for the batch effect using the ComBat option on the NetworkAnalyst website, a web interface for integrative meta-analysis.Prior to using the batch effect correction, the PCA reveals that the three datasets are clearly disassociated (Figure S1A).However, following the Combat approach, the samples become evenly distributed, which enhances the accuracy of the findings (Figure S1B).Then, the three datasets were integrated and analyzed using the "Multiple Gene Expression Tables " option on the NetworkAnalyst website.The merged data of this meta-analysis is listed in Table S2. Figure 2a depicts the whole meta-analysis procedure used in this investigation.The "Fisher's method" and "Random effect model" were used to produce more consistent and accurate results by taking into account both the direction and magnitude of changes in gene expression.By employing "Fisher's method" and "Random effect model" meta-analysis methods, we identified 2421 and 775 DEGs, respectively (Combined p < .05 was considered to be significant).Among them, 775 genes were jointly uncovered by both strategies, which is shown in the Venn diagram (Figure 2b).Of these 775 DEGs, 51 genes are significantly up-regulated and 724 are significantly down-regulated in the platelets of tumor patients as compared to the controls.A complete list of the  DEGs can be seen in Table S3.Among them, the gene with the greatest upregulation in the three datasets is CD96, while the one with the greatest downregulation is KLRB1.CD96, together with TIGIT and co-stimulatory receptor CD226, are newly discovered immune checkpoints expressed on natural killer (NK) cells and CD8+ T cells and hold great potential as targets for cancer immunotherapy. 41,42A heatmap visualization of the top 10 up-and down-regulated genes across the different studies is displayed in Figure 2c.

Functional enrichment analysis
To further study the potential functions of these DEGs, we carried out enrichment analysis on the Enrichr website.First, we studied the cell types and ontologies enriched by the 775 DEGs.Our result shows that the DEGs most significantly enriched in Myeloid cells in Placenta (p = 3.61e −03 ; Figure 3a).Notably, among the enriched cell types are a number of myeloid cells, which confirms the obtained DEGs are of blood origin.The DEGs are mostly enriched in the ontology of carcinomas (p = 6.0e −11 ; Figure 3b).Then, we studied pathway enrichment of the up-regulated and downregulated DEGs, respectively.For the 51 up-regulated DEGs, the mostly enriched pathway is Integrated cancer pathway (p = 9.65e −05 ).For the 724 down-regulated DEGs, the mostly enriched pathway is Generic transcription pathway (p = 4.21e −08 ).

Identification of hub genes using protein-protein interaction network analysis
Using NetworkAnalyst, we looked for hub genes that may serve as biomarkers for GI tumors.STRING interactome database was selected to conduct the following analysis. 43Genes or proteins engaged in the same biological processes are frequently found in individual modules. 44Module identification, also referred to as Core genes in tumor-educated platelets of gastrointestinal tumor 7 community detection or graph clustering, is a method that has been widely applied in network studies.Therefore, identifying modules is important to gain insights into the biological processes of carcinogenesis. 45To better visualize the results, we conducted "zero order" protein-protein interactions (PPI) network analysis on the 775 DEGs which yields 64 nodes and 99 connection edges in the core subnetwork.In order to more clearly visualize this network, we organized the layout using a force atlas (Figure 4a).Betweenness (BC) and DC were shown in Table S4.Hub genes in the network were ranked by degree.We screened out two hub genes that had the degree of connectivity > 10 in the network, which were CDK1 (upregulated, BC = 1206.58;DC = 11) and HSPA5 (downregulated, BC = 579.08;DC = 11).
Using the module explorer tool, we then extracted the most important modules within the network (p < .05).The largest subnetwork is composed of five major modules (Table S5).The modules encompassing CDK1 and HSPA5 are presented in Figure S2.These modules were individual protein complexes, and it is likely that they interacted with one another to carry out biological processes as a whole.The genes in module 1 which contains HSPA5 are enriched for processes related to "nucleobase-containing compound and carbohydrate transport " and "endoplasmic reticulum unfolded protein response" according to the GO study of these modules using the "Function Explorer" tool.The genes in modules 2 which contains CDK1 are enriched for "cell cycle and division" (Table S6).
Further, after mapping the biggest PPI subnetwork (nodes 64, edges 99, seeds 64), we mapped the other three smaller subnetworks (nodes>10) (Figure 4b-d).Similarly, we performed GO analysis on these subnetworks using the "Function Explorer" tool, through which we found that the genes in subnetwork 2, subnetwork 3 and subnetwork 4 were mainly enriched for "ribosome biogenesis," "positive regulation of multicellular organismal process," and "insulin receptor signaling pathway," respectively (Table S6).

Study of the hub gene functions
Next, we applied GO analysis and KEGG pathway enrichment to the 64 hub genes in the core subnetwork to further study their functions.The GO results from the Cytoscape plug-in BiNGO demonstrate that the 64 hub genes are mainly enriched for the BP of "cellular response to stimulus" (p = 5.05e −16 ; Figure 5b), for the CC of "nuclear part" (p = 1.03e −16 ; Figure 5a), for the MF of "protein binding" (p = 6.22e −09 ; Figure 5c).According to the KEGG pathway enrichment results from the Cytoscape plug-in ClueGO, the 64 hub genes are mostly enriched in the pathway of "Nucleocytoplasmic transport" (Figure 6a-b).

Construction and assessment of a nomogram model gastrointestinal tumor diagnosis
A nomogram is a graphical representation of a mathematical model established based on multi-factor regression analysis, in which biologic or clinical variables are jointly evaluated to predict a specific endpoint for a given individual. 46"Rms" package was used to establish a nomogram model for GI tumor diagnosis based on the two hub genes (CDK1 and HSPA5; Figure 7a).Then, the calibration curve was generated to evaluate the predictive power of this model.The calibration curve indicates that the actual tumor risk is highly consistent with the predicted risk, and the error between them is very small, suggesting the nomogram model owns high accuracy in predicting tumor (Figure 7b).DCA is a method for evaluating the clinical benefit of different predictive models.In DCA, the "nomogram" curve is higher than the gray lines, the "CDK1" curve, and the "HSPA5" curve, suggesting that the nomogram model has the largest net income under different threshold probabilities and better clinical benefit than the other models (Figure 7c).The clinical impact curve (CIC) was drawn on the foundation of the DCA curve to assess and more clearly depict the clinical impacts of the nomogram model.When the high risk threshold is greater than 0.8, The "Number high risk" curve coincides well with the "Number high risk with event" curve, indicating that the nomogram model has a low false-positive rate and a great predictive performance (Figure 7d).These results indicate that TEP CDK1 and HSPA5 hold great potential as individualized risk factors to be considered in clinical decision making regarding the prediction of GI tumors.

Association of the two-hub-gene signature with cancer metastasis
Next, to investigate if the gene signature is associated with the metastatic status of cancer, we used the data from GSE68086 to examine the expression profiles of CDK1 and HASPA5 in groups of patients with different metastatic status of GI tumor (Figure 8a).In GI tumors, there is a strong correlation between the expression levels of CDK1, HASPA5 and cancer metastasis.CDK1 expression shows gradual increase and HASPA5 expression shows gradual decrease from HCs to localized cancer to metastatic cancer.We noticed that both genes demonstrate significant expression difference between HCs and metastatic tumor patients.However, this gene signature could not potently distinguish groups of HC and localized GI tumor or localized and metastatic GI tumor.A ROC analysis was also performed to further investigate the diagnostic performance of the two genes.The result of diagnostic analysis demonstrates that both CDK1 (AUC = 0.7096, p < .0001)and HSPA5 (AUC = 0.741, p < .0001)have diagnostic values for metastatic GI tumor (Figure 8c), whereas only HSPA5 (AUC = 0.7229, p = .0028)has a diagnostic value for localized GI tumor (Figure 8b).

Validation of the two-gene signature in clinical samples using qRT-PCR
To confirm the reliability of this two-gene signature identified using public datasets, we collected the platelets from 7 asymptomatic healthy volunteers, 10 symptomatic patients, and 15 CRC patients to further test the expression of CDK1 and HSPA5.Their relative expression levels were detected using qRT-PCR.Collectively, the result shows that both CDK1 and HSPA5 have differential expression in TEPs between cancer group and asymptomatic control group, which is in consistence with our bioinformatic analysis; however, they are not differentially expressed in TEPs between cancer group and symptomatic GI disease group (Figure 9).

Discussion
This work identifies possible diagnostic biomarkers for GI malignancies utilizing a network-based integrated transcriptomic analysis, which may also help in understanding the molecular pathways and malignant traits of tumors in the primary site.To The ontology and cell type enrichment results demonstrate that the TEP DEGs are mainly bone marrow-derived and closely related to carcinoma, which strongly verifies the validity of our integrative study.In relation to pathways, the enrichment results reveal that the up-regulated DEGs are primarily enriched in "Integrated cancer pathway," and the down-regulated DEGs are primarily enriched in "Generic transcription pathway," both of which are closely related to cancer.Based on the above results, these TEP DEGs are reliable sources for us to carry on next-step analyses.
Next, the association between the proteins encoded by these TEP DEGs was investigated using PPI network analysis.We find that among the TEP DEGs of the core subnetwork, CDK1 and HSPA5 have the highest degree values.Thereby, this two-gene diagnostic signature may be used for GI cancer detection.Cyclindependent kinase (CDK1), also known as cell division control protein 2 (CDC2), is essential for directing the cell cycle in all cell types. 47Heat Shock Protein Family A (Hsp70) Member 5 (HSPA5), also named Glucose-Regulated Protein 78 (GRP78) or immunoglobulin heavy chain binding protein (BiP), is a chaperone heat shock protein that expresses in all eukaryotes on the membrane of Endoplasmic Reticulum (ER). 48GRP78 is overexpressed on the membranes of many cancer cells, which makes GI malignancies like PAAD and CRC more aggressive. 49Although according to previous studies, 50 HSPA5 is aberrantly activated in tumor tissues compared to normal control, however, its expression changes in TEPs seems to be the opposite as revealed by our study.Therefore, here we discover a novel two-gene signature specifically in platelets for GI tumor diagnosis.
Thrombocytosis happened in tumor patients is often termed as cancer-associated thrombosis (CAT).CAT is often observed in gastrointestinal, lung, breast, and ovarian cancers, 17 and has been associated with increased platelet counts in the stages whether before 51 and during 52,53 cancer development. 54The activities of CDKs, which are crucial for proliferation and development, are dysregulated in a number of cancer types.A recent study implicated an important role of CDK2 and CDK4 in rapid platelet generation following stress. 55Therefore, the elevated CDK1 expression in tumor platelets observed by our study may be an indicator of stimulated hematopoiesis and increased platelet number in cancer patients, and can therefore be associated with increased risk of CAT.We also find that HSPA5 is involved in protein processing in endoplasmic reticulum pathway as well as in multiple metabolic processes.7][58] Therefore, we hypothesize that the two-gene signature may not only serve as a biomarker for GI tumor diagnosis, but also be an indicator of CAT, and participate in CAT development by inducing changes in platelet numbers or the overall activation status.
We then utilized the two TEP biomarkers CDK1 and HSPA5 to construct a nomogram model for GI tumor prediction.It was found that this simple model owns great predictive power and accuracy.This is the first time that a study shows CDK1 and HSPA5 in platelets can work together as variables toward cancer prediction.
The onset of cancer metastasis is a complex yet poorly understood process that accounts for the majority of cancer-related fatalities. 59Regarding the relationship between platelets and the metastatic status of GI tumors, our study shows that HSPA5 and CDK1 in platelets may be more indicative and predictive for the metastatic stage of GI tumors.Moreover, recent studies showed that platelets not only can be a biomarker for tumor metastasis prediction, but also participate in tumor metastatic activities through "educating" tumor cells. 60,61However, although it can be true for a gene as CDK1 to have consistent roles both in platelets and the tumor bulk because it is highly expressed in most tumor tissues at both sites comparing to normal control, 62 things may be different for a gene as HSPA5 who shows contrasting expression patterns in blood and tumor cells, so its expression deregulation in distinct types of tissue may have opposing diagnostic and prognostic values. 63,64Based on the above evidence, we hypothesized that this two-gene signature has a diagnostic value for GI tumor metastasis.Whether it can be a prognostic biomarker for GI tumors by affecting the incidence of CAT or metastasis still needs exploration.
When trying to detect the expression levels of CDK1 and HSPA5 in clinical samples, apart from platelet samples from healthy volunteers and CRC patients, we also included those from patients with symptomatic GI diseases.Similar to what has been revealed by previous studies, 21,22 platelet mRNA profiles could be very different regarding healthy people and patients with non-cancerous diseases.These results strongly indicate that the presence of symptomatic diseases may also affect the RNA splicing in platelets, just as cancer does.It also gives us a hint that the gene signature may hold the potential for large-scale cancer screening in the general population, where people are mostly healthy, but may not be able to discriminate benign and cancerous GI diseases, as is the condition with hospital admissions.Additionally, the symptomatic control group here includes a rather broad range of GI diseases.Given that under inflammatory conditions, platelets can be Core genes in tumor-educated platelets of gastrointestinal tumor 11 dramatically activated and secrete a lots of inflammatory 65 the inflammatory condition of a disease may also make a difference in platelet splicing.Therefore, setting more subgroups within the symptomatic control group based on their inflammatory conditions is and studying of their platelet profiles are needed in the future.Some limitations of this study need to be addressed.Firstly, heterogeneity remained and could affect the analysis even after we rigorously evaluated the data quality and corrected for batch effects.Secondly, due to a lack of clinical information from the public datasets, the effects of the hub genes on GI tumor prognosis still needs to be further studied and verified.

Figure 1 .
Figure 1.Significant DEGs identified by studies of human GI tumor TEPs.(a-c) Volcano plots produced by NetworkAnalyst illustrating DEGs identified by each study.(d) Shown is a Venn diagram produced to determine the overlap of significant DEGs identified in three independent studies.TEPs: tumor-educated platelets; GI tumors: gastrointestinal tumors; DEGs: differentially expressed genes.

Figure 2 . 5 Figure 3 .
Figure 2. Workflow of the network-based transcriptomic meta-analysis.(a) Flowchart of the meta-analysis.(b) Venn diagram of DEGs identified from the meta-analysis using Fisher's method and random effect model.(c) Heatmaps of the top 10 up-and downregulated DEGs identified by meta-analysis among data sets.DEGs: differentially expressed genes.

Figure 4 .
Figure 4. Network-based analysis of the hub genes on NetworkAnalyst.(a) Force atlas layout of platelet DEGs in GI tumors relative to HCs in the core PPI subnetwork using "zero-order" method in the meta-analysis.The red color shows the up-regulated hub genes in GI tumor platelets.The blue color shows the down-regulated hub genes in GI tumor platelets.(b-d) PPI subnetwork of the other three smaller subnetworks with nodes >10.DEGs: differentially expressed genes; GI tumors: gastrointestinal tumors; HCs: healthy controls; PPI: protein-protein interaction.

Figure 5 .
Figure 5. Visualization of GO terms among hub genes using Cytoscape plug-in BiNGO.Shown is a BiNGO analysis depicting (a) BP, (b) CC, and (c) MF of hub genes in GI tumors relative to healthy controls.The size of the nodes is proportional to the number of genes in the test set that are annotated to that node.The colors of the nodes from yellow to dark orange represent increased statistical significance.GO: gene ontology; BiNGO: the biological network gene ontology tool; BP: biological processes; CC: cellular component; MF: molecular function; GI tumors: gastrointestinal tumors.

Figure 6 .
Figure 6.Visualization of KEGG pathways among hub genes using Cytoscape plug-in ClueGO.Shown is a KEGG analysis depicting hub genes in TEPs of GI tumors relative to healthy controls.KEGG: Kyoto Encyclopedia of Genes and Genomes; TEPs: tumor-educated platelets; GI tumors: gastrointestinal tumors.

Figure 7 .
Figure 7. Construction and assessment of a nomogram model based on GSE68086 for GI tumor diagnosis.(a) Nomogram for tumor occurrence prediction.(b) Calibration curve for model's predictive power assessment.(c) DCA curve for model's clinical value evaluation.(d) Clinical impact curve of the nomogram model based on DCA.GI tumors: gastrointestinal tumors; DCA: decision curve analysis.

Figure 8 .
Figure 8. Diagnostic value of the two-gene signature for GI tumor diagnosis based on GSE68086.(a) Differential expression analysis of CDK1 and HASPA5 according to the metastatic status of patients with GI tumors.HC: healthy control, n = 55; N: localized cancer, n = 21; Y: metastatic cancer, n = 69 (**p < .01,and ****p < .0001).ROC analysis of sensitivity and specificity of CDK1 and HASPA5 in predicting the diagnosis of (b) localized or (c) metastatic GI tumors.GI tumors: gastrointestinal tumors; ROC: receiving operating characteristic.
Unpaired t test results were used to assess statistical significance.Receiving operating characteristic (ROC) curve was applied to examine the diagnostic value of different hub genes.For analysis of the qRT-PCR data, Mann-Whitney test and unpaired t test were used to assess statistical significance respectively for CDK1 and HSPA5.p < .05 was used to determine if differences were significant.Significant values were denoted by the symbols *p < .05,**p < .01,***p < .001,and ****p < .0001.
; N: localized cancer, n = 21; Y: metastatic cancer, n = 69) and then measured.All data were reported as the mean ± SEM.