Combining human platelet proteomes and transcriptomes: possibilities and challenges

Abstract
The anucleate human platelets contain a broad pattern of mRNAs and other RNA transcripts. The high quantitative similarity of mRNAs in megakaryocytes and platelets from different sources points to a common origin, and suggests a random redistribution of mRNA species upon proplatelet formation. A comparison of the classified platelet transcriptome (17.6k transcripts) with the identified platelet proteome (5.2k proteins) indicates an under-representation of: (i) nuclear but not other organellar proteins; (ii) membrane receptors and channels with low transcript levels; (iii) transcription/translation proteins; and (iv) so far uncharacterized proteins. In this review, we discuss the technical, normalization and database-dependent possibilities and challenges in arriving at a complete, genome-wide platelet transcriptome and proteome. Such a reference transcriptome and proteome can serve to further elucidate intra-subject and inter-subject differences in platelets in health and disease. Applications may also lie in aiding genetic diagnostics.
Plain Language Summary
Blood platelets contain thousands of types of mRNAs and proteins. The mRNA composition is similar to that of megakaryocytes, from which platelets are derived, suggesting a random redistribution upon platelet formation. First attempts to identify all classified platelet proteins from mass spectral analysis used the genome-wide information of all mRNA types. This analysis revealed that the proteins so far absent from platelets are mainly those located in the megakaryocyte nucleus, or those with low mRNA levels or low copy numbers. In this review, we discuss the possibilities for arriving at a subject-dependent identification of the platelet protein and mRNA composition. Future applications may aid genetic diagnostics.


Introduction
Platelets are the smallest blood cells; they are anucleate and are mostly released from megakaryocytes in the bone marrow. 1 Although anucleate, platelets contain an unexpectedly large transcriptome, including messenger ribonucleic acids (mRNAs), long noncoding RNAs (lncRNAs), microRNAs (miRNAs), pre-mRNAs and circular RNAs (cRNAs). 2 In the current decade, one of the major scientific promises for cell biology in general and for platelet biology in particular is thought to be provided by multi-omics analyses including these RNA species. 3 Following a multi-omics approach, the detailed knowledge of an individual's (epi)genome, combined with information on the megakaryocyte, the platelet protein profiles (proteomes) with posttranslational modifications, as well as the platelet metabolome and lipidome (lipid profile with molecular species of phospholipids), should provide a very precise idea of what platelets are and what platelets can do in terms of structure and function. On the other hand, recent developments in high-resolution mass spectrometry and in (c)DNA sequencing have generated so much molecular data that it becomes difficult to understand what a single transcript, modified protein or other molecule is doing in detail. Furthermore, the thousands of molecular differences obtained between any set of platelet samples make it difficult to separate the relevant changes from confounders and artifacts. There is therefore a tendency to simplify the complex inter-sample differences into affected "biological pathways", 4 often with an unsurprising outcome.
6,7 We can thus expect that combined omics datasets are becoming more and more important for obtaining molecular biomarkers of platelets in health and disease and for finding new therapeutic targets. 3,8
In the following sections, we discuss current procedures to obtain "general," complete and comparable human platelet proteomes and transcriptomes. We also touch on the potential and limitations of the omics approaches.

Unravelling the complexity of the platelet proteome
According to neXtProt, the curated knowledgebase of human proteins and their functions in disease, the annotated human genome and transcriptome predict the presence of over 20 380 protein-coding genes, although with the caveat that many of the predicted proteins are not or scarcely detected in biological samples. 9 In the Human Proteome Project, which aims to generate a strict blueprint of the human proteome, about 18 000 gene-linked protein entries have been confirmed at the highest confidence level (protein evidence 1, PE1), 10 whereas additional proteins at PE2-4 levels are only predicted from transcripts or from their presence in gene models. 11 This level of protein identification is reflected as an annotation score in the UniProtKB database, which is used for the lookup of proteins from peptides identified in mass spectrometry analyses. The latter database contains two sections, which are referred to as "UniProtKB/Swiss-Prot" (reviewed, manually annotated) and "UniProtKB/TrEMBL" (unreviewed, automatically annotated). 12 The first section includes common (human) genetic variants and global information on the intracellular location and function of an annotated protein per gene.
Due to the complexity of the platelet proteome, as a consequence of variations in protein abundance, isoform presence and posttranslational modifications (PTMs), its complete description remains a challenge. A problem is that, in contrast to the stable genome, in a platelet or megakaryocyte the transcript profiles and PTM changes such as phosphorylation, acetylation, proteolysis and glycosylation are highly dynamic, depending on the cellular state and environment. 6,13 In addition, the complex protein kinase- and phosphatase-driven signaling networks impede our understanding of the dynamic activation state of a platelet. 14
The early global proteomic analyses suggested that at least part of the platelet proteome is in general stable across healthy human subjects. 6,15 More recent papers confirm this, and also show that, for instance, in patients with ST-segment-elevation myocardial infarction (STEMI) only a limited set of proteins is modified. 16 This overall stability is clearly linked to the anucleate structure of platelets, which minimizes the processes of gene transcription and ribosomal translation. 17 A question in this respect is whether the current knowledge of the platelet transcriptome can be exploited to better understand the complexity of the platelet proteome.

Towards a comprehensive genome-wide platelet transcriptome
In the past decade, the Blueprint epigenome consortium has generated genome-wide datasets of the gene transcripts in human hematopoietic stem cells and all major differentiated cells, including megakaryocytes and platelets. 18,19 This effort relied on high-throughput and high-depth RNA sequencing (RNA-Seq, via cDNA), in order to identify products of all known gene variants. Based on this genome-wide RNA-Seq dataset, we performed a listing and annotation of the obtained RNA species, resulting in quantitative information on not only 14 800 protein-coding genes, but also 43 000 pseudogenes and RNA genes. 20 Interestingly, most of these pseudogenes and RNA genes appeared to be present at only very low levels in megakaryocytes and platelets. We further noticed a high quantitative similarity of mRNA levels in megakaryocytes and platelets (R = 0.75), which suggested a random redistribution of the megakaryocytic mRNA pools upon proplatelet formation. 20 This contrasted with the absence of certain (nuclear) proteins in the platelet proteome.
In general, the RNA-Seq outcome relies on the list of reference transcripts used, in other words on the definition of a gene. Current transcriptome lists are commonly based on the Ensembl knowledgebase, which collects so-called primary assembly genes. These assembly genes, based on HGNC gene symbols, are assigned identifiers of the form ENSG00000XXXXXX. 21 As an easy-access tool to check for primary assembly genes, including protein-encoding genes, pseudogenes and RNA genes, the website GeneCards is available, which mostly cross-refers to the Ensembl gene annotations. 22 For the majority of protein-expressing genes, links to the corresponding UniProtKB/Swiss-Prot pages are also available, which provide general information on the structure, intracellular location and global function of the corresponding proteins. However, per HGNC gene, UniProtKB often contains several reviewed entries, so that the link between gene, mRNA and protein is not always unique. Several online databases can link obtained lists of transcripts or proteins to biological and disease-related pathways, including Reactome 4 and KEGG. 23 For inherited diseases, the compendium site OMIM links human HGNC genes to genetic phenotypes. 24
In order to provide easily accessible insight into the meaning of the long lists of transcript values obtained by RNA-Seq analysis, we recently developed a procedure for the assignment of any (annotated) transcript and protein to a protein function class in platelets (Figure 1). The developed classification was hierarchical, according to a decision tree that is primarily based on UniProtKB information. 20 The decision rules are based on the (likely) primary intracellular location and function of the protein. The priority order of assignment for megakaryocytes and platelets is from "central" to "peripheral": nucleus → mitochondria → endoplasmic reticulum and Golgi apparatus → cell → other cellular vesicles (lysosomes, peroxisomes, endosomes, secretory vesicles) → (plasma) membrane interactions → cytoskeleton structures → cytosolic protein types. Note that according to this scheme, extracellular proteins are classified as secretory proteins, whether or not they are released by platelets or are present in the blood plasma through endothelial or liver secretion.
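The priority rule described above can be illustrated with a minimal Python sketch. It is not the published decision tree: the compartment labels are simplified, hypothetical keywords, the ambiguous "cell" step is omitted, and the real classification draws on UniProtKB annotations and yields the function classes of Figure 1. The sketch only shows the idea that a protein with several annotated locations is assigned to the most "central" one.

```python
# Priority order from "central" to "peripheral", simplified from the text.
# The labels below are illustrative assumptions, not the published class names.
PRIORITY = [
    "nucleus",
    "mitochondria",
    "er_golgi",
    "vesicle",
    "membrane",
    "cytoskeleton",
    "cytosol",
]


def assign_compartment(annotations):
    """Return the highest-priority compartment present in the annotations.

    `annotations` is any iterable of location keywords for one protein.
    Proteins without a recognized keyword fall into a catch-all group.
    """
    found = set(annotations)
    for compartment in PRIORITY:
        if compartment in found:
            return compartment
    return "uncharacterized"


# A protein annotated as both cytosolic and mitochondrial is assigned
# to the more "central" compartment.
print(assign_compartment({"cytosol", "mitochondria"}))
```

Because the list is scanned in a fixed order, every protein receives exactly one assignment, which is what makes class counts comparable between the transcriptome and the proteome.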

Choices in platelet proteome analysis
In a recent review, we surveyed the workflows of 67 published proteomic studies on human platelets in health and disease, regarding platelet purity, sample size, sample separation, protein digestion method, and the type of labeling or enrichment. 25 The vast majority of studies referred to "human platelets" without attention to intra-individual differences between pools of platelets or to inter-individual diversity (genetic or acquired), and focused more on the "real" platelet global proteome and sub-proteomes. Half of the studies used isolated washed platelets with reasonable purity, while the other studies investigated platelet releasates, vesicles or immuno-affinity fractions.
The workflow for bottom-up proteomic analyses (i.e., without gel separation) encompasses sample preparation of protein lysates via trichloroacetic acid and/or acetone protein precipitation, combined with in-solution trypsin digestion. 25 Some papers use filter-aided sample preparation (FASP) before trypsin digestion. Such bottom-up analyses give a high number of identified proteins, although the use of trypsin is seen as a limitation, since this protease can cleave in an imprecise way and may over- or underdigest specific peptide sequences. If required, additional sample treatment steps can include an enrichment for phosphorylated peptides, which for instance is performed by metal oxide affinity chromatography using titanium dioxide (TiO2) beads, in combination with hydrophilic interaction liquid chromatography (HILIC). 26 In addition, digested protein samples can be prepared for the assessment of N-terminal cleavage, ubiquitination, acetylation or glycopeptide sites. 3 The comparison of samples is possible by stable isotope labeling, for instance using isobaric tags for relative and absolute quantification (iTRAQ) or using tandem mass tags (TMT). 27
In order to achieve a full platelet proteome, attempts have been made to combine data from multiple workflows. Caution is needed here, since the way of sample processing and the complexity of mass spectrometric analysis introduce many sources of variability between datasets. 28 For human platelets, one large-scale proteomic study has estimated copy numbers per platelet of about 3,700 proteins, but this study dates back more than a decade. 15 With the recent availability of more advanced mass spectrometers and more sophisticated acquisition techniques, i.e. from data-dependent to data-independent acquisition (DDA to DIA) in combination with parallel accumulation-serial fragmentation (PASEF), a further completion of the platelet proteome is to be expected, even including low-abundance proteins. A refinement of the workflow includes the transition from lengthy FASP approaches (filtering for molecular size) to suspension traps (S-traps) that employ a three-dimensional porous filter to trap proteins and deplete interfering substances. 29 Progress in enrichment procedures includes the shift from laborious work with TiO2 beads to the use of immobilized metal affinity chromatography (IMAC) and high-throughput liquid handling systems in 96-well format. 30
However, some limitations remain. Platelets and platelet proteins are highly sensitive to changes in the microenvironment during isolation, and react to factors such as temperature, pH, divalent cations, clotting and shear stress. This makes the preanalytical preparation of platelet samples a critical issue. An issue complicating the isolation of platelets is their small volume (9-11 fL) and mass, when compared to erythrocytes (80-100 fL). Accordingly, care needs to be taken to minimize the contamination of a platelet preparation with erythrocytes and leukocytes. On the other hand, plasma proteins present in the platelet open canalicular system can be considered as an integral part of platelets, implying that the full platelet proteome includes such proteins. Trypsin is used as the gold standard for protein cleavage, but its activity needs to be precisely controlled to limit inter-sample variation. Other challenges are the identification of (very) low-abundance proteins and of insoluble membrane proteins, the separation of sequence-related proteins, and the reproducible analysis of highly complex mass spectra.
With average proteomic studies obtaining 2,000-5,000 human platelet proteins, 25 the question arises whether we should strive to first complete the platelet proteome or to first obtain a smaller high-precision proteome. 33 Accordingly, with some exceptions, it is often unclear if and how a given phospho-site is regulated upon platelet activation or inhibition. On the other hand, certain phosphoproteomics studies provide valuable information on altered platelet functions in disorders like severe obesity 34 or the genetic Scott syndrome. 35 The aim raised five years ago to use mass spectrometry of platelet PTMs for diagnostic purposes 8 is hence still ahead of us. A possibility to improve the quantification of targeted proteins is the use of covalent chemical probes, which are small molecules with known selectivity, thus allowing easy mass-spectrometric identification. 36,37 Taken together, efforts such as proteomic standardization to produce reproducible datasets, high-throughput analyses, and targeted analyses of protein (PTM) subsets seem to be the promising approaches to compare (activated) platelets from cohorts of patients with bleeding, thrombotic or other platelet-related complications.

Choices in platelet transcriptome analysis
For quantitative information on the mRNA profile in (human) platelets, it is also necessary to collect well-purified platelet samples, thus eliminating the transcripts from RNA-rich leukocytes and even from erythrocytes. The common procedure is to isolate and wash platelets (leaving out buffy coats), and subject these to a leukocyte depletion protocol. Checking the purity of a leukocyte-depleted platelet preparation is preferably done by microscopy, since blood cell counters (Sysmex) or flow cytometry are too insensitive to determine the required leukocyte contamination fractions of <0.1%. Obviously, the isolated platelets are to be collected and lysed in an RNA-protecting medium. Unless there is specific interest in the highly abundant ribosomal RNAs, it is advised to deplete such RNA forms from the platelet lysates. An exception is when investigators want to preserve the ribosomes for so-called ribosomal footprint profiling, in which the mRNA species attached to ribosomes are analyzed, as an indication of the transcripts undergoing active translation. 38
Given the high amount of sequence information obtained by RNA-Seq, expert data processing is needed for selecting the required biological or clinical information. The RNA-Seq data per transcript (from protein-encoding genes, pseudogenes or RNA genes) are usually expressed as fractions, compared to the total number of sequence reads. Three calculation procedures are common. 39 (i) RPKM (reads per kilobase of transcript per million mapped reads), which represents the number of reads mapped to a certain transcript divided by the total number of mapped reads in the library (the sequencing depth), taking into account the transcript length. (ii) FPKM (fragments per kilobase of transcript per million mapped fragments), whose calculation is closely related to RPKM, except that pairs of reads per transcript are counted instead of single reads. (iii) TPM (transcripts per million), i.e. the reads mapped to a given transcript, normalized for transcript length, and expressed as a fraction of the length-normalized reads mapped to all transcripts, scaled to one million. It is important to consider that RPKM, FPKM and TPM values give information on the relative abundance of a transcript among all sequenced transcripts, and therefore depend on the composition of all RNAs in a given sample. 39 If platelet samples differ too much from each other (isolation protocol, purity), the obtained RPKM or TPM values cannot directly be compared and need to be re-normalized. As a consequence, the comparison of different studies on platelet transcriptomes is not a straightforward procedure. Nowadays, genome-wide transcriptomes, also of platelets, provide very large lists of gene transcripts, up to 75 000. 20
In the last few years, the interest in megakaryocyte and platelet transcriptomics for discoveries in human health and disease has been growing rapidly. 40 Most studies so far compare the transcripts in platelets under two or three conditions, with only incidental attention to the stability and variability of the transcriptome within individuals. 41 A remaining challenge is to obtain a complete "reference" platelet transcriptome from well-purified platelets of an "average" subject or mouse. Yet, developments in the field are fast, with many platelet and megakaryocyte groups regularly analyzing large transcriptome datasets. 42 For instance, Armstrong et al. revealed in aging mice the conservation of certain mRNAs, along with granule proteins. 43 In addition, Shen et al. identified platelet transcriptome markers for the prediction of chronic myeloproliferative neoplasms. 44
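The three normalizations described above can be made concrete with a short Python sketch. It uses the standard textbook formulas; the function names and the toy read counts are hypothetical and serve only to illustrate why the values depend on the composition of the whole sample. FPKM follows the same formula as RPKM, with fragment (read-pair) counts in place of read counts.

```python
def rpkm(counts, lengths_bp):
    """Reads per kilobase of transcript per million mapped reads.

    For paired-end data, passing fragment counts gives FPKM instead.
    """
    total = sum(counts)
    return [c * 1e9 / (total * length) for c, length in zip(counts, lengths_bp)]


def tpm(counts, lengths_bp):
    """Transcripts per million: length-normalize first, then scale to 1e6."""
    rates = [c / length for c, length in zip(counts, lengths_bp)]
    total_rate = sum(rates)
    return [r * 1e6 / total_rate for r in rates]


# Toy example (invented numbers): three transcripts in a pure sample.
counts = [100, 200, 700]
lengths = [1000, 2000, 1000]
pure = tpm(counts, lengths)

# The same sample with one extra, highly abundant contaminating transcript:
# the TPM of the original three transcripts drops although their read
# counts are unchanged.
contaminated = tpm(counts + [9000], lengths + [1000])

print(pure[0], contaminated[0])
```

The drop in the second value illustrates why samples that differ in purity or isolation protocol need re-normalization before their RPKM, FPKM or TPM values are compared.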

A compared genome-wide human platelet transcriptome and proteome
It has been estimated that the proteomes of nucleated cells contain in the order of 10 000 proteins. 45,46 Also for platelets an achievable proteome of 10 000 proteins is predicted, 20 in spite of the fact that individual studies rarely report more than 5,500 proteins. 25
The Blueprint consortium has been able to compare the (epi)genomes and transcript patterns of all major human hematopoietic cells and lineages, including CD34-derived megakaryocytes and platelets. 18,19 Based on the obtained large datasets, encompassing 57 800 transcripts (mRNAs, long noncoding RNAs, microRNAs, and RNAs derived from pseudogenes), it became possible to list the genome-wide transcript profiles from pooled human platelets and megakaryocytes derived from healthy individuals. 20 To facilitate understanding of the relative expression levels, the transcripts were expressed as log2(FPKM + 1) values (meaning that absent transcripts had a value of zero). Interestingly, for all 57 800 transcripts (R = 0.85) and for the 14 800 protein-coding transcripts (R = 0.75), a high similarity of the platelet and megakaryocyte transcriptomes was seen. Taking into account the fact that the cells were obtained from different (pools of) subjects, this high similarity pointed to a random redistribution of megakaryocytic mRNAs and other RNAs upon proplatelet shedding.
For a better understanding of the functions of all these transcripts and corresponding proteins, the classification scheme of Figure 1 was used. In addition, information was taken from six studies to establish a combined human platelet proteome, encompassing 5,200 unique proteins, including 3,700 with estimated copy numbers. 20 Our quantitative analysis of the combined human platelet transcriptome (log2(FPKM + 1)) and proteome (copy number) showed a relatively low correlation of the transcript and protein levels (R = 0.25). On the other hand, it appeared that high protein copy numbers were always accompanied by high transcript levels, but not vice versa. Accordingly, in megakaryocytes high transcription of a gene appears to be a prerequisite for high translation.
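As a minimal sketch of the two calculations used in this section, the Python snippet below applies the log2(FPKM + 1) transform and computes a Pearson correlation coefficient. The helper names and input values are invented for illustration, and the choice of Pearson's R is an assumption; the exact correlation measure and software used in the cited analysis are not restated here.

```python
import math


def log2_fpkm_plus1(fpkm_values):
    """Transform FPKM values so that an absent transcript (FPKM = 0) maps to 0."""
    return [math.log2(v + 1.0) for v in fpkm_values]


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


# Invented FPKM values for the same five transcripts in two cell types.
platelet = log2_fpkm_plus1([0.0, 1.0, 3.0, 15.0, 120.0])
megakaryocyte = log2_fpkm_plus1([0.0, 3.0, 1.0, 31.0, 60.0])

print(round(pearson_r(platelet, megakaryocyte), 2))
```

The "+ 1" offset keeps the transform defined for absent transcripts and compresses the very wide dynamic range of expression values before transcripts are compared between samples.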
By combining the information on transcript level and identification in the proteome, for 20 245 classified genes (17,654 in platelets and 16 900 in megakaryocytes) we could also determine which transcripts corresponded to protein products in the identified platelet proteome. Here, the threshold for relevant transcripts was set at a low level of log2(FPKM + 1) > 0.20. The function class analysis showed an under-representation in the proteome of mainly four classes: C10 membrane receptors & channels, C13 other nuclear proteins, C20 transcription & translation, and C21 uncharacterized & other proteins (Figure 2a-d). This held for analysis of the combined platelet/megakaryocyte transcriptome, as well as for the separate platelet and megakaryocyte transcriptomes (Table S1). By modeling approaches, we recognized three restraining factors limiting the identification of certain proteins in platelets by mass spectrometry. These restraining factors were: (i) (peri)nuclear localization in the megakaryocyte; (ii) low transcription levels; and (iii) low translation levels. 20 This unequal appearance of proteins was confirmed by analysis of the combined proteome of platelets from 30 healthy subjects, revealing about 1,000 additional proteins in the main function classes.
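The comparison of relevant transcripts with the identified proteome can be sketched as a simple per-class count. The Python example below is an illustration only: the gene names and values are placeholders, and the input layout is an assumption rather than the format of the published dataset. Only the threshold of 0.20 is taken from the text.

```python
from collections import Counter

THRESHOLD = 0.20  # relevance cut-off for log2(FPKM + 1), as stated in the text


def proteome_coverage(transcripts, proteome_genes):
    """Count relevant transcripts per function class, split by proteome presence.

    `transcripts` is an iterable of (gene, function_class, log2_fpkm_plus1).
    Returns two Counters: classes of transcripts whose protein was identified,
    and classes of transcripts without an identified protein.
    """
    detected, missing = Counter(), Counter()
    for gene, function_class, level in transcripts:
        if level <= THRESHOLD:
            continue  # below the relevance threshold
        if gene in proteome_genes:
            detected[function_class] += 1
        else:
            missing[function_class] += 1
    return detected, missing


# Placeholder data, not taken from the published dataset.
toy_transcripts = [
    ("GENE_A", "C13", 3.1),
    ("GENE_B", "C13", 0.1),
    ("GENE_C", "C10", 1.2),
    ("GENE_D", "C10", 2.0),
]
toy_proteome = {"GENE_B", "GENE_D"}

detected, missing = proteome_coverage(toy_transcripts, toy_proteome)
print(dict(detected), dict(missing))
```

A class with many entries in `missing` relative to `detected` would be under-represented in the proteome in the sense used above.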
The still "missing" part of the platelet proteome includes, for instance, transcript products that are often absent in other human tissues, such as the many olfactory and taste receptors and zinc finger proteins. 10,47 The still improving mass spectrometry approaches for protein identification will certainly help in finding many of the remaining, likely low-abundance proteins in platelets.
Interestingly, our analysis indicated that several proteins present in the combined platelet proteome lack the corresponding transcripts (Figure 2 and Table S1). These proteome-only proteins were mostly secretory (= plasma) proteins (C17) such as fibrinogen, likely present in the platelet open canalicular system or endocytosed. They furthermore included a set of uncharacterized proteins (C22) and proteins of the intermediate cytoskeleton (C2). Of the latter, keratins are indeed common contaminants in a typical proteomic workflow. Distribution profiles of the relevant transcripts (level >0.20) in both platelets and megakaryocytes furthermore indicated high numbers of RNA genes and pseudogenes (Figure 2e-f). The majority of the RNA genes and pseudogenes were linked to protein-encoding genes (i.e. same locations on the chromosome), thus suggesting a considerable amount of transcriptional "noise."
Considering the role of platelets in hemostasis and thrombosis, from the multi-omics analysis we could list 124 genes that are associated with thrombocythemia or thrombophilia, of which the products in the platelet proteome and transcriptome were present at relatively high expression levels, 20 i.e. for mRNAs log2(FPKM + 1) of 4.58 ± 3.70, and for protein copy numbers of 22 800 ± 73,000 (mean ± SD). This underlines the view of others that platelet omics analyses are starting to provide relevant information regarding platelet functions in health and disease. 40

Multi-omics perspectives
In a given platelet, the various RNAs are subjected to rapid turnover and degradation. 17,48,49 There is also limited evidence that subject age and gender can affect the platelet transcriptome. 50 Accordingly, the heterogeneity of transcripts between platelets of an individual and between individuals is becoming a point of attention. To specify such differences, consistent quantification and normalization approaches will be required. Essentially the same requirements also apply to the comparison of platelet proteomes. 51 Once resolved, it is likely that the inter-platelet mRNA differences reflect variability in the megakaryocytic transcription and translation processes rather than differences in platelet functions.
An important application of the completed and normalized transcriptomes and proteomes of a subject's platelets will be in genetic diagnostics. The omics analyses can assess how the allele-specific expression of certain mRNAs or proteins is linked to genetic variance, for instance in heterozygous carriers of mutations linked to platelet bleeding disorders. Transcriptome and proteome analyses can reveal why and how (compound) heterozygous mutations are not always associated with an altered platelet phenotype. The same applies to somatic mutations, which can also influence platelet traits. 52 An example is the assessment of the mRNA- or protein-based expression of JAK2 mutations in platelets from patients with polycythemia vera. Finally, even more in-depth insights can be obtained from single-cell proteome and transcriptome analyses, using techniques that can also be used to understand the diversity of single platelets. 53
Overall, in this review we discussed procedures to obtain a complete human platelet proteome based on the quantitative comparisons of genome-wide platelet and megakaryocyte transcriptomes. The availability of such a reference transcriptome and proteome of human platelets can serve to further elucidate important intra-subject and inter-subject differences between platelets in health and disease. Other applications may lie in aiding genetic diagnostics.

Figure 1. Classification scheme for the assignment of platelet transcripts and proteins to 21 function classes. Class assignments are based on the major subcellular localization of a protein and the assumed function according to UniProtKB. (a), Protein function class numbering in alphabetical order. (b), Hierarchical decision tree for classification. Modified from Ref. 20.

Figure 2. Relevant transcripts in platelets and identified proteins. Shown are the numbers of relevant transcripts, with levels of log2(FPKM + 1) > 0.20, per protein function class, whose products were present or absent in the composed platelet proteome. Class descriptions C01-C21 are as in Figure 1. (a), All classified transcripts >0.20; (b), Transcripts >0.20 only present in the transcriptome; (c), Transcripts >0.20 also present in the proteome. (d), Low-level or absent transcripts, but proteins identified in the proteome. (e, f), Distribution of transcripts in platelets (e) or megakaryocytes (f) with levels of >0.20 or <0.20 with presence (yes) or absence (no) in the platelet proteome. Data are derived from Ref. 20.