Shaping the future of oral cancer diagnosis: advances in salivary proteomics

ABSTRACT Introduction Saliva has gained increasing attention in the quest for disease biomarkers. Because it is a biological fluid that can be collected in an easy, painless, and safe way, it has been increasingly studied for the identification of oral cancer biomarkers. This is particularly important because oral cancer is often diagnosed at late stages with a poor prognosis. Areas covered The review addresses the evolution of the experimental approaches used in salivary proteomics studies of oral cancer over the years and outlines the advantages and pitfalls of each one. In addition, it examines the current landscape of oral cancer biomarker discovery and translation, focusing on salivary proteomic studies. This discussion is based on an extensive literature search (PubMed, Scopus and Google Scholar). Expert opinion The introduction of mass spectrometry has revolutionized the study of salivary proteomics. In the future, the focus will be on refining existing methods and introducing powerful experimental techniques such as mass spectrometry with selected reaction monitoring, which, despite their effectiveness, are still underutilized due to their high cost. In addition, conducting studies with larger cohorts and establishing standardized protocols for salivary proteomics are key challenges that need to be addressed in the coming years.


Introduction to saliva as a liquid biopsy source
Oral cancer is a subtype of head and neck cancer that starts in the squamous cells of the inner mucosa and is one of the most common malignant cancers worldwide. When diagnosed in advanced stages, oral cancer has a poor prognosis and is a very mutilating type of cancer [1]. The main risk factors associated with the development of oral cancer are age, male gender, alcohol consumption and smoking. Because oral cancer is frequently diagnosed at a late stage, there is a crucial need to identify biomarkers that enable earlier detection, allow personalized prognosis prediction, and anticipate the patient's response to treatment [2].
As powerful analytical techniques have evolved, saliva has gained increasing recognition as a valuable source of liquid biopsy for the identification of oral cancer biomarkers. This fluid is mainly produced by the three major salivary glands (parotid, submandibular and sublingual), alongside the minor salivary glands. A healthy individual typically produces around 500-2500 mL of saliva per day, composed of 97 to 99% water and a small percentage of lipids, proteins, and inorganic substances [3]. Saliva mainly contains four groups of secretory proteins: proline-rich proteins, statherins, histatins and cystatins. Saliva collection is a noninvasive and safe procedure, enabling the collection of multiple samples with minimal infection risk. In the context of oral cancer, saliva has the advantage of being in close proximity to tumor cells, whose presence may be reflected in its molecular composition [4,5]. However, the salivary molecular profile is very susceptible to several factors, such as circadian rhythm, salivary flow rate, type of saliva, genetic polymorphisms, and the clinical and epidemiological characteristics of the patient [6][7][8][9]. After collection, hydrolases are activated, increasing the molecular complexity of this body fluid. For instance, proteases degrade proteins into peptides, which makes the analysis of the salivary peptidome extremely important for biomarker discovery, since peptides are more than just protein degradation products. Integrating the biological and molecular functions of salivary peptides and proteins is pivotal for gaining a deeper understanding of the disease mechanisms underlying the development and progression of oral cancer [10]. However, the lack of standardized protocols for sample collection, processing, and proteome characterization and validation has hampered the effective translation of salivary biomarkers to the clinical setting [5,11,12]. This review aims to critically analyze the main proteomics approaches employed in the identification of salivary biomarkers for oral cancer, highlighting their advantages and limitations.
To gain a comprehensive understanding of the knowledge surrounding salivary proteomics in the context of oral cancer, a bibliometric analysis was conducted using VOSviewer. For this, the search formula [saliva and protein and (oral cancer) or oscc or (oral squamous cell carcinoma)] was used in Scopus, and 822 articles were obtained. After excluding literature reviews and articles in languages other than English, 650 articles were included in the bibliometric analysis, most of them published from 2010 onward. Co-occurrence analysis was performed, and a minimum of 2 citations for each keyword of interest was established as a filter. Through manual curation, the most relevant of the 2450 keywords were selected, resulting in the networks presented in Supplementary Figure S1A and Supplementary Figure S1B. These networks show that, once studies using mass spectrometry began, there was a significant increase in the number of salivary proteins identified as oral cancer biomarkers. The first mass spectrometer was developed by J.J. Thomson in 1912 (Supplementary Figure S1C). However, due to the high costs associated with this technique, its widespread use has only occurred in recent decades. This evolution has had a transformative impact on proteomics, particularly in the analysis of proteins in saliva samples, where analytes are present at low concentrations. The subsequent sections provide an overview of the key advancements in discovering oral cancer biomarkers through salivary proteomics, emphasizing both the challenges encountered and the notable achievements within this research domain.
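The co-occurrence filtering described above can be illustrated with a minimal sketch. It is not the VOSviewer implementation: the function name, the toy keyword lists, and the reading of the threshold as a minimum number of articles in which a keyword appears are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def cooccurrence(keyword_lists, min_occurrences=2):
    """Count keyword pairs that appear together in the same article.

    Keywords found in fewer than `min_occurrences` articles are dropped
    before pairs are counted (assumed meaning of the threshold).
    """
    # Number of articles in which each keyword appears.
    occurrences = Counter(kw for kws in keyword_lists for kw in set(kws))
    kept = {kw for kw, n in occurrences.items() if n >= min_occurrences}

    pairs = Counter()
    for kws in keyword_lists:
        selected = sorted(set(kws) & kept)  # sorted so each pair has one key
        pairs.update(combinations(selected, 2))
    return pairs


# Hypothetical keyword lists for three articles.
articles = [
    ["saliva", "oscc", "mass spectrometry"],
    ["saliva", "oscc", "elisa"],
    ["saliva", "mass spectrometry"],
]
links = cooccurrence(articles, min_occurrences=2)
```

In this toy example, "elisa" appears in only one article and is therefore excluded from the network, while the remaining pairs carry a link weight equal to the number of articles in which they co-occur.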

Workflow for biomarkers discovery and clinical implementation
Biomarkers are defined as characteristics that can be indicators of both physiological and pathogenic processes or can be used to assess the biological response to a particular exposure or intervention. Ideally, salivary biomarkers should be sensitive, specific, and objectively quantifiable in saliva samples. Quantification must be reliable and reproducible [13]. To enhance the translation of salivary biomarkers into clinical practice, several steps should be taken in a systematic way. The main steps for identifying and developing salivary biomarkers are shown in Figure 1. The first step is biomarker discovery. Once the biomarkers of interest have been selected, a proof of concept is required to optimize the conditions for their identification and quantification and to ensure that the results are reliable and reproducible. The biomarker is then validated in a group of selected patients. If the results are sensitive, specific, reliable, and robust, there is potential for them to be translated into clinical practice [13][14][15].
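As a brief illustration of what "sensitive" and "specific" mean for a candidate biomarker in the validation step, the sketch below computes both metrics from a confusion matrix. The patient counts are hypothetical and do not come from any study cited here.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity) from confusion-matrix counts.

    tp/fn: patients with oral cancer who test positive/negative.
    tn/fp: controls who test negative/positive.
    """
    sensitivity = tp / (tp + fn)  # fraction of cancer cases detected
    specificity = tn / (tn + fp)  # fraction of controls correctly excluded
    return sensitivity, specificity


# Hypothetical validation cohort: 50 patients and 100 controls.
sens, spec = sensitivity_specificity(tp=45, fn=5, tn=80, fp=20)
```

With these illustrative counts, the biomarker would detect 90% of cases while correctly excluding 80% of controls.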

Article highlights
• The establishment of harmonized protocols for consistent research results.
• Saliva is becoming increasingly important in medical research due to its simple, painless, and safe collection method.
• Saliva is a source of biomarkers for oral cavity cancer, a malignancy that has a high mortality rate when detected late.
• Early detection of oral cavity cancer can significantly improve prognosis and contribute to personalized treatment.
• A thorough overview of the improvements needed in study design for such research is provided.

Optimizing saliva sample handling and standardization strategies
Standardizing the collection, processing, and storage of saliva samples is critical to ensure reliable and consistent results in proteomics applications. Before starting the collection, it is important to establish guidelines regarding the collection method, type of saliva, fasting duration, and whether to rinse the oral cavity immediately before collection. There are several methods of saliva collection, namely for unstimulated saliva, stimulated whole saliva and gland-specific saliva. Among these, unstimulated saliva is generally considered the most suitable for salivary proteomics studies aimed at identifying oral cancer biomarkers. In this case, saliva is collected from patients in a resting state using methods such as passive drooling, spitting/suction or absorbent materials. These methods are noninvasive, easy to implement and enable the collection of large sample volumes. Several devices have been developed to support saliva collection (Figure 2). Passive drooling involves minimal oral movement, which minimizes the stimulation of salivary secretion and leaves the salivary flow rate unaffected. This method of saliva collection was validated by Khurshid et al. [16]. In the spitting/suction method, the patient spits the saliva into a collection device. In the absorbent method, an absorbent material is placed on the floor of the mouth. The results obtained with these two methods may be influenced by the sampling procedure and, in the case of the absorbent method, biomarkers may adsorb to the absorbent material [14,15]. The collection of stimulated whole saliva involves stimulating salivary secretion prior to collection. This stimulation can be achieved through masticatory or gustatory stimuli (chewing gum, wax blocks, cotton swabs, citric acid). The choice of the stimulant material may affect the composition of salivary proteins identified in oral cancer patients, leading to significant variations in study outcomes [14].
Several methods are available for studying proteins specific to individual salivary glands (parotid, submandibular/sublingual, or minor glands). For investigating proteins secreted by the parotid gland, the Lashley cup or cannulation methods can be used [13][14][15]. The Lashley cup method is noninvasive and involves attaching a device to the mucosa of the inner cheek using a vacuum, allowing saliva to be collected into a tube [17,18]. The cannulation method, on the other hand, requires the placement of a tube at the level of Stensen's duct for selective saliva collection; however, it is an invasive approach. For submandibular and sublingual saliva collection, customized collectors or the suction method can be employed [14]. In the case of the minor glands, the available methods include customized collectors, the suction method, and filter paper. Customized collectors allow selective collection but often require specialized personnel to collect the saliva samples. The suction method is noninvasive but may result in higher variability among the collected samples. Filter paper, while noninvasive, typically yields smaller sample volumes, and the filter paper itself can influence the results [14].
In addition to the collection method, there are other variables that should be considered when collecting saliva. Age, gender, circadian rhythm, recent food intake or smoking, hydration level, and oral cavity condition are some of the factors that may influence the composition of saliva [6][7][8][9]. Before collection, the patient should refrain from eating and smoking for 30 to 60 min and should rinse the oral cavity with water for about 60 seconds. Several devices are available for saliva collection, such as Salivette, Saliva Collection System (SCS), Orapette, SuperSAL and VersiSAL [9].
After saliva collection, there is another set of variables that need to be considered, such as storage temperature, proteolysis, and radiation. In an attempt to create a protocol for the standardized handling of saliva samples, Chevalier et al. compared the stability of salivary proteins under different conditions using one- and two-dimensional electrophoresis approaches. They found that saliva samples should preferably be collected in the morning, 2 h after eating and following a rinse of the oral cavity with water [19]. Before being stored, samples should be placed on ice with protease inhibitor and centrifuged, and the supernatant should be stored at −80°C [18,19]. This protocol ensures the preservation of sample integrity. Salivary proteins have very limited stability at room temperature, about 1 h [19]. Therefore, it is important to place the samples in a container with ice immediately upon collection. The addition of protease inhibitors is debatable, since there are already collection tubes that come with a solution that stabilizes the proteins at the time of collection [20][21][22][23]. While protease inhibitors may prevent protein degradation, they may introduce complexity into proteomics analysis by interacting with proteins other than proteases. For long-term storage, typically up to approximately 5 years, saliva samples can be safely maintained at −80°C [24]. This freezing temperature effectively inhibits the degradation of the sample contents, allowing for future analyses. Another important issue in processing saliva samples is centrifugation, which should ideally be done right after sample collection. As an alternative, filtration can be performed, but it tends to be more time consuming and may result in some protein loss through the filters [25].
Figure 2. Main characteristics of saliva as a biofluid and types of devices for saliva collection. The image was created with BioRender.

Methods for protein extraction and separation
There are several methods for protein extraction and separation. The first step consists of lysing and solubilizing the saliva samples. This step can be done using either detergents [sodium dodecyl sulfate (SDS) or sodium deoxycholate (SDC)] or chaotropic agents [urea or guanidine hydrochloride (GndHCl)]. Subsequently, the saliva samples can be processed using different methods, namely in-solution digestion (ISD), filter-aided sample preparation (FASP), solid-phase-enhanced sample preparation (SP3) and protein aggregation capture (PAC). Depending on the method and type of proteomics analysis, proteins may or may not be digested. The most commonly used enzyme for digesting proteins present in saliva samples is trypsin [26].
Protein separation generally relies on electrophoresis and/or chromatography-based approaches. In terms of electrophoresis, one-dimensional SDS-polyacrylamide gel electrophoresis (1-DE), two-dimensional SDS-polyacrylamide gel electrophoresis (2-DE), and two-dimensional difference gel electrophoresis (2D-DIGE) may be used. Briefly, 1-DE separates proteins based on their molecular weight. Proteins migrate along the polyacrylamide gel in response to the electrical field according to their molecular weight and, after gel staining, bands can be observed. The pore size depends on the concentration of acrylamide and bisacrylamide. For the proteome characterization of biological fluids such as saliva, which contain hundreds of proteins, this is not the best approach for separating the proteins present in a complex mixture [27,28]. 2-DE was developed by O'Farrell and was one of the major advances in proteomics studies. This electrophoretic approach comprises two steps for the separation of proteins present in complex mixtures. First, proteins are separated according to their isoelectric point and then according to their molecular weight. Since two proteins rarely share both the same isoelectric point and the same molecular weight, 2-DE overcame the low resolution observed with 1-DE, making it possible to identify a high number of proteins even when only a small amount of sample is available. The proteins that are best identified using 2-DE are those with a molecular weight between 20 and 220 kDa and an isoelectric point between 3 and 9. It is a more expensive approach with poor gel-to-gel reproducibility, but it is robust and offers high resolution [28]. 2D-DIGE is a variation of 2-DE developed by Minden and colleagues that consists of labeling proteins with fluorescent cyanine probes, which allow the identification and quantification of proteins without affecting their molecular weight and isoelectric point. This method solves the problem of gel-to-gel variation by enabling the use of multiple samples in a single gel. Its sensitivity is better than that of the other types of electrophoresis due to the use of fluorescent dyes. However, it is time consuming, and the operator needs experience with the technique to obtain good results. It is not the best technique for separating proteins with very low or very high isoelectric points [29]. Thus, one-dimensional electrophoresis methods are suitable for protein separation in simple samples, but when it comes to separating and identifying proteins present in a complex mixture, two-dimensional approaches should be preferred.
As an alternative or complement to 2-DE separation, high-performance liquid chromatography (HPLC) is an excellent option due to the reproducibility of its results and the possibility of coupling it with mass spectrometry (MS). There are several types of liquid chromatography, namely reversed-phase, ion-exchange, size-exclusion, and affinity chromatography [30]. Reversed-phase liquid chromatography (RPLC) is the most widely used type of chromatography due to its compatibility with the various MS methods. RPLC allows the separation of compounds with hydrophobic properties. While in normal-phase liquid chromatography the stationary phase is polar and the mobile phase is nonpolar, in RPLC it is the other way around. RPLC has the advantages of being less toxic and more economical, and it can perform as well as aqueous normal-phase chromatography with a smaller sample volume. When samples contain very large numbers of proteins and peptides, in the hundreds of thousands, one-dimensional approaches are often not enough due to insufficient peak capacity. In this case, multidimensional approaches (ion-exchange chromatography-RPLC, affinity chromatography-RPLC, size-exclusion-RPLC) are often necessary [30]. An overview of the proteomics workflow is displayed in Figure 3, highlighting the most frequently employed experimental approaches.

Methods for identification of proteins
MS has been increasingly used for large-scale protein characterization as it is a high-throughput technique. The principle is the formation of ions in the gas phase that are characterized in terms of mass-to-charge ratio (m/z) and relative abundance. In classical electron ionization, a beam of high-energy electrons hits the molecules and fragments them; in proteomics, peptide ions are more commonly fragmented in tandem MS, for example by collision with an inert gas. From the analysis of the m/z of the fragments and their relative abundance, it is possible to obtain information about the amino acid sequence and, consequently, about the proteins present in the saliva samples. For salivary biomarker discovery, there are two strategies: top-down and bottom-up. In the top-down approach, intact proteins are analyzed, while in the bottom-up approach, the peptides that result from the digestion of proteins by a specific protease such as trypsin are analyzed. In the bottom-up approach, the study is usually complemented by HPLC, as digested proteins generate a very complex mixture of peptides. These peptides are separated according to their degree of hydrophobicity, with the most hydrophilic peptides being eluted the fastest.
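The digestion step of the bottom-up approach can be sketched in silico. The example below applies the commonly used simplified trypsin rule (cleavage after lysine or arginine, except when the next residue is proline); the function name and the short test sequence are illustrative assumptions, and missed cleavages are ignored.

```python
def tryptic_digest(sequence):
    """Split a protein sequence into peptides using a simplified trypsin rule.

    Cleaves C-terminal to K or R unless the following residue is P.
    Missed cleavages are not modeled in this sketch.
    """
    peptides = []
    start = 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])  # C-terminal peptide
    return peptides


# Hypothetical sequence, chosen only to exercise the rule.
peptides = tryptic_digest("MKWVTFRPLLKSA")
```

Here the arginine followed by proline is not cleaved, so the sequence yields the three peptides MK, WVTFRPLLK and SA.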
After the sample peptides have passed through the chromatographic column and been separated based on hydrophobicity, they undergo ionization, followed by separation according to their m/z ratio and subsequent detection. The most important ionization methods include electrospray ionization (ESI), matrix-assisted laser desorption/ionization (MALDI), and surface-enhanced laser desorption/ionization (SELDI), although the latter two are less commonly employed in salivary proteomics. After ionization, the generated ions pass, under vacuum, through a mass analyzer such as time-of-flight (TOF), Orbitrap, ion trap, or quadrupole, which uses electric fields to determine the m/z ratio and the intensity of the signals accurately.
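To make the m/z ratio concrete, the following sketch computes the m/z of a positively charged, protonated peptide ion from its neutral monoisotopic mass. The peptide mass used is a hypothetical round number, and only protonation (no other adducts) is assumed.

```python
PROTON_MASS = 1.007276  # Da, mass of a proton


def mass_to_charge(neutral_mass, charge):
    """Return m/z for an ion carrying `charge` protons: (M + z * m_p) / z."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return (neutral_mass + charge * PROTON_MASS) / charge


# Hypothetical peptide with a neutral mass of 1000 Da.
mz_singly = mass_to_charge(1000.0, 1)
mz_doubly = mass_to_charge(1000.0, 2)
```

The same peptide therefore appears at roughly 1001.007 when singly charged and at roughly 501.007 when doubly charged, which is why multiply charged ions generated by ESI fall within the range of common mass analyzers.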
ESI is a technique in which ions are generated by applying a high voltage to a liquid to create an aerosol. It is particularly suitable for analyzing biomolecules such as peptides and proteins, making it a cornerstone of proteomics. ESI is compatible with liquid chromatography, which enables seamless integration into LC-MS/MS workflows. The advantage of this method is that it can handle a wide range of molecular weights and provides a gentle ionization process that preserves non-covalent interactions. However, its sensitivity to sample preparation and matrix effects can be seen as a limitation. The application of ESI to the identification of salivary biomarkers is facilitated by its versatility in analyzing complex mixtures, and it provides insights into the proteomic profile of saliva samples [31]. In MALDI-TOF-MS, the analyte molecules are embedded in a matrix that can absorb ultraviolet light. When this matrix-analyte mixture is irradiated with a laser beam, the matrix absorbs the energy and supports the desorption and ionization of the analyte molecules into the gas phase. The ions are then separated based on their m/z value in the TOF mass analyzer. This technique is well suited to the analysis of proteins with high molecular weight, typically greater than 100 kDa. MALDI-TOF-MS is valued for its sensitivity, rapid analysis, cost-effectiveness and minimal requirements for sample pre-treatment and sample volume. However, there are some limitations to consider. One of these is the limited mass range of the analyzer, which may not be able to accommodate extremely high molecular weights. In addition, MALDI-TOF-MS can be sensitive to contaminants, which can affect the reproducibility of results. Finally, the sample preparation protocol can vary significantly depending on the specific characteristics and properties of the analyte, which may require different approaches for optimal results [32]. Selected reaction monitoring (SRM) is a quantification method used in MS that focuses on the accurate measurement of specific proteins in complex samples. SRM involves the precise selection of an ion (MS1) that corresponds to the target protein and is identified by its characteristic m/z. This selected ion is then subjected to fragmentation, and the resulting fragment ions are further selected (MS2) according to their own m/z values. This targeted approach enables the detailed analysis and quantification of proteins by tracking specific ion transitions, facilitating a highly selective and sensitive assessment of protein abundance in biological samples [33]. The application of SRM-MS to the analysis of saliva from patients with lymph node metastases has shown significant down-regulation of proteins such as CSTB, LTA4H, PGK1, COL6A1, ITGAV and NDRG1, highlighting the utility of this approach in uncovering critical molecular insights into disease progression and metastatic potential [34].
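The idea of tracking a specific ion transition can be sketched as a two-stage filter: first on the precursor m/z (MS1), then on the fragment m/z (MS2). The data layout, tolerance, peptide label and all m/z and intensity values below are hypothetical and serve only to illustrate the selection logic, not any instrument software.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transition:
    peptide: str          # label of the targeted peptide (hypothetical)
    precursor_mz: float   # ion selected at the MS1 stage
    fragment_mz: float    # fragment ion selected at the MS2 stage


def within(value, target, tolerance):
    """True if `value` lies within `tolerance` of `target`."""
    return abs(value - target) <= tolerance


def transition_signal(transition, scans, tolerance=0.5):
    """Sum the fragment intensities that match one precursor/fragment pair.

    Each scan is a tuple: (precursor_mz, [(fragment_mz, intensity), ...]).
    """
    total = 0.0
    for precursor_mz, fragments in scans:
        if not within(precursor_mz, transition.precursor_mz, tolerance):
            continue  # precursor not selected at MS1
        for fragment_mz, intensity in fragments:
            if within(fragment_mz, transition.fragment_mz, tolerance):
                total += intensity  # fragment selected at MS2
    return total


scans = [
    (500.3, [(600.4, 1000.0), (700.2, 50.0)]),
    (500.3, [(600.4, 1500.0)]),
    (650.8, [(600.4, 9999.0)]),  # different precursor, ignored
]
target = Transition("EXAMPLE_PEPTIDE", precursor_mz=500.3, fragment_mz=600.4)
signal = transition_signal(target, scans)
```

Only fragments that pass both filters contribute to the signal, which is what gives the approach its selectivity in complex samples.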
Data-dependent acquisition (DDA) and data-independent acquisition (DIA) are the main discovery platforms, with DDA being the most widely used. These platforms differ primarily in their approach to isolating precursor ions for fragmentation and subsequent acquisition of MS2 spectra. DDA essentially involves the selection, accumulation and fragmentation of precursor ions based on the real-time analysis of signals from an initial MS1 survey scan. DIA, on the other hand, systematically works through a predetermined range of precursor ion isolation windows. Within these windows, all precursor ions are fragmented simultaneously, eliminating the need to select precursor ions in real time [35]. DIA tends to generate more complex MS2 spectra and multiplexed chromatograms, which can lead to a reduction in selectivity for individual precursor ions. This complexity necessitates the use of advanced computational tools specifically designed for the interpretation of DIA data [36]. Over the past twenty years, DIA methods have evolved significantly, with numerous acquisition strategies developed and applied to various MS instrument platforms. This evolution has significantly pushed the boundaries of what can be achieved in terms of sensitivity, specificity, reproducibility and analytical throughput using DIA techniques.
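The difference between the two acquisition schemes can be sketched as two selection rules. The example assumes a "top-N by intensity" criterion for DDA, which is one common heuristic rather than the only one, and fixed-width windows for DIA; the peak list and window settings are hypothetical.

```python
def dda_select(ms1_peaks, top_n=3):
    """DDA sketch: pick the top_n most intense precursors from a survey scan.

    ms1_peaks is a list of (mz, intensity) tuples.
    """
    ranked = sorted(ms1_peaks, key=lambda peak: peak[1], reverse=True)
    return [mz for mz, _ in ranked[:top_n]]


def dia_windows(mz_start, mz_end, width):
    """DIA sketch: tile a precursor m/z range with fixed-width windows."""
    windows = []
    low = mz_start
    while low < mz_end:
        windows.append((low, min(low + width, mz_end)))
        low += width
    return windows


# Hypothetical MS1 survey scan.
ms1 = [(400.2, 1e5), (523.7, 8e5), (610.1, 3e4), (745.9, 5e5)]
selected = dda_select(ms1, top_n=2)   # depends on the observed signals
windows = dia_windows(400, 500, 25)   # independent of the observed signals
```

The DDA selection changes from scan to scan with the measured intensities, whereas the DIA windows are fixed in advance and every precursor falling inside a window is fragmented together.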
DDA and DIA can both be combined with isotope labeling for protein quantification. MS-based protein quantification comprises label-free and label-based methods. For peptide identification, the results are cross-referenced with databases using software such as SEQUEST, OMSSA and Andromeda [37][38][39].
Label-free MS quantification comprises two approaches: spectral counting and measurement of the precursor signal intensity as a proxy for protein expression. In the first case, the number of spectra obtained for each peptide in the different saliva samples is counted and the results of all peptides are integrated. Precursor signal intensity is the most widely used approach and consists of extracting peptide signals at the MS1 level by applying a high-resolution scan. The main techniques used with this type of quantification are liquid chromatography-tandem mass spectrometry (LC-MS/MS), matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF-MS) and selected reaction monitoring (SRM).
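Spectral counting, as described above, can be sketched as two steps: counting the spectra assigned to each peptide and then integrating the peptide counts per protein. The peptide and protein identifiers and the sample contents are placeholders, and normalization (for example by protein length) is deliberately left out.

```python
from collections import Counter, defaultdict


def spectral_count(identified_spectra, peptide_to_protein):
    """Return spectra per protein for one sample.

    identified_spectra: one peptide label per identified MS2 spectrum.
    peptide_to_protein: mapping from peptide label to its parent protein.
    """
    per_peptide = Counter(identified_spectra)   # spectra per peptide
    per_protein = defaultdict(int)
    for peptide, count in per_peptide.items():  # integrate over peptides
        per_protein[peptide_to_protein[peptide]] += count
    return dict(per_protein)


# Placeholder identifiers, not real peptides or proteins.
peptide_to_protein = {"PEP_A": "PROT_1", "PEP_B": "PROT_1", "PEP_C": "PROT_2"}
control_sample = ["PEP_A", "PEP_A", "PEP_B", "PEP_C"]
patient_sample = ["PEP_A", "PEP_C", "PEP_C", "PEP_C"]

control_counts = spectral_count(control_sample, peptide_to_protein)
patient_counts = spectral_count(patient_sample, peptide_to_protein)
```

Comparing the per-protein counts between samples then gives a rough, relative indication of which proteins differ in abundance.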
The application of targeted quantification techniques such as Multiple Reaction Monitoring (MRM), Consecutive Reaction Monitoring (CRM) and Parallel Reaction Monitoring (PRM) has revolutionized the field of proteomics by providing highly reproducible, sensitive and specific methods for the quantification of proteins [40,41]. MRM is particularly characterized by its precision in measuring targeted peptides in complex biological samples, making it indispensable for the validation of potential biomarkers in saliva. In studies using MRM, key proteins associated with oral cancer have been successfully identified and quantified, demonstrating the potential of this technique for clinical diagnostics [42,43]. For an introduction to targeted quantification techniques and their impact on biomarker discovery, reference [35] provides a comprehensive overview that is accessible to nonspecialists.
The use of stable isotope labeling techniques, specifically Isobaric Tags for Relative and Absolute Quantitation (iTRAQ), Tandem Mass Tags (TMT), Isotope-Coded Affinity Tags (ICAT), and Stable Isotope Labeling by Amino acids in Cell Culture (SILAC), has greatly enhanced the ability to simultaneously quantify protein expression in multiple samples [44]. For example, iTRAQ has helped to reveal differential protein expression in saliva samples from oral cancer patients, providing new insights into the proteomic landscape of the disease [45,46]. Similarly, TMT labeling has expanded the scope of quantifiable proteins in saliva, enabling the identification of more than 1400 proteins in certain studies, which reflects its sensitivity and throughput [47]. The evolution of TMT from a 2-plex to an 18-plex system, including the latest TMTpro 16-plex system enhanced with NeuCode isotopes [48], illustrates the technological advances in improving analytical throughput and efficiency. These developments have paved the way for comprehensive proteomic profiling of saliva, revealing potential biomarkers for oral cancer and other diseases in unprecedented depth and specificity [49]. Such advances underscore the critical role of MS in advancing diagnostic and therapeutic research and promise a future in which saliva-based diagnostics could become a routine part of clinical practice.
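Relative quantification with isobaric tags can be illustrated by expressing the reporter signal of each labeled sample relative to a reference channel. This is a deliberately minimal sketch: the channel names and intensities are hypothetical, and corrections applied in practice (such as normalization across channels or isotope impurity correction) are not modeled.

```python
def channel_ratios(reporter_intensities, reference):
    """Return each channel's reporter intensity relative to `reference`.

    reporter_intensities: mapping from channel (sample) name to the
    reporter ion intensity measured for one peptide.
    """
    reference_intensity = reporter_intensities[reference]
    if reference_intensity <= 0:
        raise ValueError("reference channel has no signal")
    return {
        channel: intensity / reference_intensity
        for channel, intensity in reporter_intensities.items()
    }


# Hypothetical reporter intensities for one peptide in a 3-channel experiment.
intensities = {"control": 2000.0, "patient_1": 5000.0, "patient_2": 1000.0}
ratios = channel_ratios(intensities, reference="control")
```

In this toy example the peptide would appear 2.5-fold higher in the first patient sample and 2-fold lower in the second, relative to the control.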

Methods for salivary protein verification and validation
After the identification of salivary biomarkers, a validation step is necessary so that these biomarkers can be translated to clinical practice. Enzyme-linked immunosorbent assay (ELISA) and western blotting have traditionally been used for the validation and verification of data retrieved from MS-based proteomics [35,50]. There are several types of ELISA tests, namely direct, indirect, sandwich, and competitive ELISAs [51]. Overall, ELISA is a very sensitive technique with good reproducibility and specificity when well-characterized and validated antibodies are used. However, it requires a considerable sample volume and does not allow antibody specificity to be verified [51]. Western blot enables the determination of the molecular weight of the target protein, confirming the identity of a biomarker by detecting its size and ensuring, to a certain extent, the specificity of antibody detection [52]. However, western blot is more time-consuming and inherently semiquantitative, though it requires a relatively smaller sample volume [53,54]. Nevertheless, compared to MS, which typically offers unambiguous protein identification via unique peptides at a given false discovery rate, techniques like ELISA, western blot, Olink, or SomaScan (mentioned below) rely on the specificity of affinity reagents, which may be influenced, for example, by posttranslational modifications [55]. This limitation underscores the importance of cautious interpretation when employing affinity-based techniques for protein analysis. Immunohistochemistry is also important for validating, in tumor tissue samples, the biomarkers found in saliva samples from oral cancer patients. However, in addition to the limitations mentioned for affinity-based techniques, some biomarkers found in saliva samples from oral cancer patients do not originate from tumor tissue.
Recent advances in proteomics have introduced novel approaches for biomarker searching and validation. Immunoassays such as ELISA only allow one analyte to be measured at a time. The application of multiplex assays for proteome profiling has made it possible to extract information on multiple analytes present in the same sample in a single analysis. This is a much more efficient method than ELISA, as it saves time, costs, and material. There are two types of multiplex assays, namely planar microarrays (protein chips) and suspension arrays (microparticle or bead microarrays) [56]. Protein microarrays are similar to sandwich immunoassays, with the difference that several proteins can be processed in the same analysis. There are three types of microarrays: analytical microarrays, functional protein microarrays and reverse phase protein microarrays (RPPA). In analytical microarrays, antibodies are immobilized on a matrix. The protein of interest binds to the primary antibody, and then a secondary antibody coupled to a fluorophore binds to the protein, causing light emission. In functional microarrays, proteins are immobilized on the array and their biochemical properties can be studied. In RPPA, cells are isolated and lysed. The resulting lysate is immobilized on a matrix, and antibodies are added that bind to a given protein of interest, with light emission. RPPA is widely used to study posttranslational modifications [57,58]. However, specificity may be an issue, thus requiring optimization of the sample volume to be used. In the case of bead-based arrays, each bead binds a specific capture antibody and emits a specific fluorescence intensity. In this way, several proteins can be quantified in the same sample using different beads. The capture antibody binds to the protein of interest, and then a fluorochrome-conjugated antibody is added to detect the proteins using flow cytometry. The intensity of the fluorescence emission is proportional to the amount of protein present in the sample [56]. Table 1 shows that most of the studies using multiplex assays for the identification of salivary biomarkers evaluate cytokines.
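Because the fluorescence intensity is described as proportional to the amount of protein, an unknown concentration can be estimated from a standard curve. The sketch below fits a straight line to hypothetical standards by least squares and inverts it; the concentrations, fluorescence values and units are assumptions, and real assays often rely on nonlinear curve fits instead.

```python
def fit_line(x, y):
    """Least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def concentration_from_signal(signal, slope, intercept):
    """Invert the standard curve to estimate a concentration."""
    return (signal - intercept) / slope


# Hypothetical standards: concentration (pg/mL) vs. fluorescence intensity.
standard_conc = [0.0, 10.0, 50.0, 100.0]
standard_signal = [20.0, 120.0, 520.0, 1020.0]

slope, intercept = fit_line(standard_conc, standard_signal)
estimated = concentration_from_signal(320.0, slope, intercept)
```

With these illustrative standards, a sample reading of 320 fluorescence units corresponds to an estimated 30 pg/mL, after subtracting the background captured by the intercept.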
Proximity extension assays (PEA) combine the principles of sandwich ELISA with the precision of DNA-based readout methods such as quantitative PCR or next-generation sequencing (NGS). The result is a powerful tool for liquid biopsy detection with a broad dynamic range. PEA uses pairs of antibodies labeled with DNA oligos that hybridize after binding to the target molecule, enabling PCR amplification and precise quantification of proteins. PEA has a wide range of applications, from the identification of prognostic biomarkers in colorectal cancer to the profiling of different cancer types, with high sensitivity and specificity. Despite its power, the highly multiplexed variant of PEA faces challenges in library preparation and NGS, requiring careful validation due to potential bias and variation in large-scale studies [145,146]. With the advance of bioinformatics in recent years, there has been a need to extract as much information as possible from a given sample. Olink technology harnesses the power of PEA, employing pairs of antibodies tethered to DNA oligonucleotides. These antibody pairs selectively bind to target proteins within a sample. A standout feature of Olink technology is its capacity to measure numerous proteins concurrently in a single sample, facilitating a thorough analysis of protein profiles. This high-throughput capability makes Olink assays exceptionally valuable for biomarker discovery and validation in clinical research contexts using saliva samples [147].
Table 1. Salivary biomarkers for oral cancer and the main techniques used in each study. Legend: Ø, without recruited patients; DNTB, dinitrothiocyanobenzene; ECLIA, electrochemiluminescence immunoassay; FFE, free flow electrophoresis; ELISA, enzyme-linked immunosorbent assay; IHC, immunohistochemistry; LC-MS/MS, liquid chromatography-tandem mass spectrometry; MALDI, matrix-assisted laser desorption/ionization; MRM, multiple-reaction monitoring mass spectrometry; NA, not available; OPSCC, oropharyngeal squamous cell carcinoma; OPMD, oral potentially malignant disorders; OSCC, oral squamous cell carcinoma; PRM, parallel reaction monitoring; SCE chromatography, strong cation exchange chromatography; SDS-PAGE, sodium dodecyl-sulfate polyacrylamide gel electrophoresis; SELDI, surface-enhanced laser desorption/ionization; SRM, selected reaction monitoring; TOF, time of flight mass spectrometry; TSA, trichostatin

Aptamers, i.e. short strands of DNA, RNA or peptides, fold into unique tertiary structures that can bind to target proteins with high specificity and affinity. The slow off-rate modified aptamer (SOMAmer) assay SOMAscan, a notable application of this technology, uses these aptamers with photocleavable linkers and fluorescent markers to capture proteins and quantify their abundance in samples [148]. Aptamers offer advantages over antibodies, such as higher affinity, specificity, and easier synthesis, which facilitates scaling up for high-throughput applications. This has enabled the simultaneous profiling of over 7000 proteins [114,149,150]. In clinical diagnostics, aptamer-based assays have shown great promise. For example, stool-based profiling with aptamers has identified characteristic protein patterns for the diagnosis of colorectal cancer. Similarly, in non-small cell lung cancer, aptamer-based studies have identified several protein biomarkers that have led to the development of a clinically useful biomarker panel for early detection.

Harmonization of techniques used in salivary proteomics
MS-based approaches have shown growing efficacy in the analysis of salivary analytes. However, several preanalytical factors, namely the methodology and conditions of saliva collection, as well as the intrinsic quality of the collected biological fluid, influence the results obtained. Thus, the creation of standard operating procedures (SOPs) for each step of a salivary proteomics study is crucial to ensure reliable and consistent results. Furthermore, harmonization of the techniques used will increase reproducibility, reduce technical noise, and help overcome the problem of studies with small sample sizes. Another important step is the introduction of quality control steps in proteomics studies. Bourmaud et al. developed a simple internal quality control procedure that has been tested on plasma samples analyzed by LC-MS/MS and can be easily incorporated into proteomics studies with saliva [149]. This procedure consists of introducing a mixture of exogenous proteins, followed by the addition of isotopically labeled peptides to the reference samples, to assess the performance of sample handling and of the MS technique. It provides a necessary system suitability test before starting the actual sample analysis and allows continuous monitoring of instrument performance and sample preparation, yielding more comparable, robust, and reproducible results [145,150]. The main advantage is that, in addition to ensuring optimal conditions during the proteomics study, deviations or malfunctions can be corrected in real time. In this way, the creation of SOPs to control preanalytical factors, together with the implementation of internal quality control procedures, will ensure the generation of reliable and consistent data for advancing the field of salivary proteomics in oral cancer research and beyond. A case in point is the work of Voß et al., who created HarmonizR, a data harmonization tool specifically tailored for tissue analysis from oral cancer patients [146]. In this study, data from several proteomic LC-MS/MS datasets from online repositories were harmonized. The datasets derived from different tissue preservation techniques, LC-MS/MS instrumentation setups and quantification techniques. This work forms the cornerstone for integrating data across various datasets within a specific domain, enabling large-scale data analysis. This type of strategy can be extended to salivary proteomics studies, offering a fresh perspective on existing datasets with saliva samples from oral cancer patients that would otherwise be challenging to analyze in an integrated way. Nevertheless, not all laboratories and research groups have the same resources and budget. There must therefore always be a balance between cost and performance when deciding on techniques, and this consideration should be incorporated into protocols to ensure standardization across research laboratories, regardless of their individual conditions.
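To make the idea of cross-dataset harmonization concrete, the following minimal sketch aligns log2 protein intensities from different batches by median centering while leaving missing values in place. This is a deliberately simple stand-in and not the method implemented in HarmonizR; the function name `median_center`, the batch labels, and the intensity values are all hypothetical.

```python
from statistics import median


def median_center(batches):
    """Shift every batch so that its median equals the pooled median.

    batches: dict mapping a batch name to a list of log2 intensities,
    where None marks a missing value. Missing values stay missing.
    """
    # Pool all observed (non-missing) values to define a common reference.
    observed = [v for values in batches.values() for v in values if v is not None]
    global_median = median(observed)

    centered = {}
    for name, values in batches.items():
        batch_median = median(v for v in values if v is not None)
        shift = global_median - batch_median
        centered[name] = [None if v is None else v + shift for v in values]
    return centered


# Hypothetical log2 intensities from two acquisition batches.
example = {
    "batch_A": [20.0, 21.0, 22.0],
    "batch_B": [24.0, None, 26.0],
}
harmonized = median_center(example)
print(harmonized)
```

A single additive shift per batch only removes a global offset. Protein-specific batch effects and structured missingness, which are common when public datasets are combined, need dedicated tools.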

Limitations of salivary proteomics: the importance of small and large cohort studies
The development of increasingly effective strategies for the early detection of oral cancer, and for predicting prognosis and response to treatment tailored to each patient, is only possible if the results of scientific research are generalizable and reliable. Exploratory studies are usually conducted first, at an early stage. However, studies with small sample sizes often have several limitations that hinder this goal. The main limitations include low statistical power, overfitting and high variance, bias and confounding variables, less generalizable results, and a high influence of outliers. Determining the sample size is one of the essential steps in study design. A small sample size, together with high variability and a small effect size, can decrease the statistical power of the study and jeopardize the integrity of the results. It is recommended that the statistical power should be at least 80%. Pushing power well beyond 80% makes interpretation more challenging, as the study may become sensitive enough to reveal effects that are statistically significant but of little practical significance. Such oversensitivity can lead to overfitting, a statistical problem in which a model is so finely tuned to the training data that it cannot be generalized to new, unseen data. This phenomenon can lead to overly optimistic results in initial studies, but nonreproducible results in subsequent studies [151].
In the case of proteomics studies with two experimental groups, the effect size is calculated from the difference in protein abundance between the two sample groups. The main issue is that many of the proteomic studies aimed at identifying salivary biomarkers for oral cancer test many proteins, each with its own effect size and variance. Nevertheless, exploratory studies with small sample sizes are important because they allow a mean variance value to be estimated, which is needed to determine the appropriate sample size for further validation of proteins in larger cohorts. In addition, small salivary proteomic studies are more likely to have biases and confounding variables. For example, sampling bias may occur if the sample is not representative of the population, distorting the results. Confounding variables may not be evenly distributed across groups in small studies, leading to misleading associations. Due to the limited number of participants, results obtained in salivary proteomics studies with small sample sizes may not be representative of the population, which lowers external validity and limits the generalizability of the results.
Finally, when sample sizes are small, any abnormal value may disproportionately affect the results, leading to erroneous or inconclusive findings. Thus, exploratory studies are important to optimize the study design, but must be complemented with large cohort studies so that the proteins of interest can be translated into clinical practice [145,150,152-154]. Large cohort studies allow for more robust, accurate and reproducible results. This type of study represents the cornerstone for the validation of salivary biomarkers for oral cancer and their subsequent implementation in clinical practice. Compared with small salivary proteomic studies, large cohort studies offer greater statistical power (higher probability of detecting a true effect), better representativeness (results more generalizable to the population), less influence of outliers (more accurate estimates of parameters and relationships), detection of subtle effects that are often missed in small salivary proteomics studies, better handling of confounding variables (more accurate estimate of the true effect of the variables of interest) and long-term follow-up (monitoring of the progression of oral cancer over time). Thus, despite the higher cost and logistical burden, large cohort studies offer significant advantages by improving the validity, reliability, and generalizability of research findings [152,153]. They are essential for progress in areas such as salivary proteomics for the detection and treatment of oral cancer.
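The relationship between effect size, statistical power and sample size discussed in this section can be illustrated with a short calculation. The sketch below uses the standard normal-approximation formula for comparing two group means, n per group = 2((z_{1-α/2} + z_{power}) / d)², where d is the standardized effect size. The Bonferroni adjustment for testing many proteins at once is one simple and conservative option, included here as an assumption rather than a recommendation from the cited studies. The function name `samples_per_group` and all input values are hypothetical.

```python
from math import ceil
from statistics import NormalDist


def samples_per_group(effect_size, alpha=0.05, power=0.80, n_tests=1):
    """Approximate sample size per group for a two-sided, two-group comparison.

    effect_size: standardized difference between group means (Cohen's d),
                 e.g. estimated from the variance seen in a pilot study.
    n_tests:     number of proteins tested; alpha is divided by this value
                 (Bonferroni correction, an illustrative assumption).
    """
    if effect_size <= 0:
        raise ValueError("effect_size must be positive")
    adjusted_alpha = alpha / n_tests
    standard_normal = NormalDist()
    z_alpha = standard_normal.inv_cdf(1 - adjusted_alpha / 2)
    z_power = standard_normal.inv_cdf(power)
    return ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)


# Hypothetical scenarios: one protein versus a panel of 500 proteins.
for d in (1.0, 0.8, 0.5):
    print(d, samples_per_group(d), samples_per_group(d, n_tests=500))
```

Under these assumptions, a single protein with d = 0.5 already needs about 63 participants per group for 80% power, and correcting for hundreds of proteins raises the requirement further. An exact calculation based on the t distribution gives slightly larger numbers than this approximation.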

Verification of salivary biomarkers through tumor tissue analysis
Immunohistochemistry (IHC) on tumor tissue samples may be used to verify whether biomarkers found by proteomics in the saliva of oral cancer patients potentially originate in HNSCC. IHC involves the specific binding of proteins of interest within HNSCC tissue to antibodies conjugated to an enzyme (peroxidase or alkaline phosphatase) or a fluorophore. The main disadvantages are related to the quantification of results and the susceptibility to human error, as the analysis is subjective [155,156]. These studies are extremely important because they allow us to verify whether the proteins identified in the saliva of patients with oral cancer reflect tumor biology. The composition of the salivary proteome varies according to the physiological state of the patient, and many diseases can influence the protein profile in saliva. By comparing salivary biomarkers with tissue IHC findings, valuable information about the factors contributing to HNSCC-induced salivary proteome remodeling can be extracted. This approach may significantly enhance our understanding of HNSCC pathobiology. In recent years, the field of pathomics has emerged. Digital pathomics takes advantage of deep learning and machine learning algorithms to extract information from high-resolution whole-slide images of tissue sections, generating data on various phenotypic characteristics of this type of sample. Combining proteomics with pathomics is an extremely powerful approach for translating salivary biomarkers. For example, Bankhead et al. created QuPath, a digital pathology software that allows the integration of histopathology data with genomics and medical imaging data to predict response to immunotherapy treatments, and that has been shown to perform very well on unimodal measures including tumor mutational burden and PD-L1 score [157,158].

Validation of biomarkers in patients with potentially malignant lesions
Normando et al. performed a comprehensive meta-analysis showing an association between a set of proteins and the malignant transformation of oral lesions [159]. The inclusion of individuals with potentially malignant oral lesions in oral cancer proteomics studies provides a better understanding of the salivary proteome across various stages of oral cancer development. In addition, this validation step has the potential to uncover biomarkers capable of predicting the risk of malignant transformation of oral lesions.
The process of malignant transformation in the context of oral cancer comprises several phases of cell differentiation. For each stage of disease, there are genetic, epigenetic, environmental and tumor microenvironment factors that facilitate or promote this differentiation; these are schematized in Figure 4.
The use of saliva as a liquid biopsy and source of potential biomarkers for the diagnosis of oral potentially malignant disorders (OPMD) is also extremely important. Some meta-analyses support the potential use of salivary biomarkers in OPMD. Arroyo et al. demonstrated that salivary carcinoembryonic antigen (CEA) and the soluble fragment of cytokeratin 19 (CYFRA 21) can be useful for the differential diagnosis between OPMD and OSCC, with CYFRA 21 also capable of differentiating OPMD from healthy individuals [160]. Salivary levels of the cytokines IL-8, IL-6, TNF-α and IL-1β are significantly increased in patients with OSCC compared to OPMD. The cytokines IL-6 and TNF-α are, in turn, found at significantly higher levels in OPMD compared to healthy individuals. This panel of cytokines is useful in the screening of both OSCC and OPMD [161,162]. A meta-analysis performed by Velázquez et al. showed that salivary LDH is useful in the diagnosis of OSCC and OPMD. However, more studies are still needed to define the cutoff values for the use of this biomarker [163].

Salivary protein biomarkers in oral cancer: current status
Several proteins have been identified and validated in recent years as oral cancer biomarkers, with particular interest in cytokines, growth factors, matrix metalloproteinases and acute phase proteins [164,165]. Nonetheless, many of the salivary biomarkers of oral cancer that have been proposed do not originate directly from the tumor. This is an issue that should be addressed in the design of future proteomics studies. However, the fact that these biomarkers have not been identified in OSCC tumor tissue does not mean that they cannot be good OSCC biomarkers. Their role in the pathophysiology of OSCC may not yet be known, and this could be an opportunity to expand our knowledge of OSCC. Among the salivary biomarkers proposed for oral cancer, as shown in Table 1, the two most studied are IL-6 and IL-8. TNF-α, MMP9, CYFRA 21.1 and MMP1 have also been extensively studied in recent years. A meta-analysis performed by Benito-Ramal et al. showed that salivary IL-6, IL-8 and TNF-α are suitable for use in the diagnosis and prognosis of oral cancer [166]. Hema Shree et al. and AlAli et al. concluded that MMP9 and CYFRA 21.1 are specific and sensitive biomarkers of oral cancer [167,168]. However, they report the need for studies with larger patient cohorts. On ClinicalTrials.gov there are three registered clinical trials aiming to validate salivary proteins for oral cancer. NCT05049408 aimed to validate the MMP1 protein as a diagnostic biomarker for oral cancer. About 1100 patients were recruited (269 with oral cancer, 518 with oral premalignant diseases and 313 healthy controls), from whom saliva samples were collected and analyzed by ELISA for MMP1 identification. In this study, MMP1 showed a sensitivity of 69.5% and a specificity of 95% in detecting oral cancer at an earlier stage [60]. In addition, it was shown to be a relevant biomarker for monitoring disease progression and recurrence, as well as for predicting neck lymph node metastasis. The other two clinical trials have no published results. However, NCT03148665 has already been completed and aimed to validate CD44 in the diagnosis of oral cancer using OncAlert as a collection method. NCT03529604 has no information on its stage but involves the validation of three biomarkers, of which two are proteins, SCCA and TROP2. In the NCT03529604 trial, 100 patients were recruited, and the analysis of samples for protein identification involves the use of techniques such as ELISA and liquid chromatography.
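Sensitivity and specificity values such as those reported for MMP1 do not by themselves indicate how often a positive saliva test corresponds to true disease, because that also depends on how common the disease is in the tested population. The sketch below applies Bayes' rule to convert sensitivity and specificity into positive and negative predictive values. The prevalence values are purely hypothetical and were not reported in the trial; only the 69.5% sensitivity and 95% specificity come from the text above.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a test applied at a given disease prevalence."""
    true_pos = sensitivity * prevalence
    false_neg = (1 - sensitivity) * prevalence
    true_neg = specificity * (1 - prevalence)
    false_pos = (1 - specificity) * (1 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Sensitivity and specificity as reported for salivary MMP1;
# the prevalence values are hypothetical screening scenarios.
for prevalence in (0.01, 0.10, 0.50):
    ppv, npv = predictive_values(0.695, 0.95, prevalence)
    print(f"prevalence={prevalence:.2f}  PPV={ppv:.3f}  NPV={npv:.3f}")
```

Under these assumptions, the same assay gives a much lower positive predictive value in a low-prevalence screening setting than in a high-risk clinic population, which is one reason why validation cohorts should reflect the intended use.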

Conclusion
The identification of salivary biomarkers in oral cancer is a very promising area that may change the paradigm of this disease. Saliva as a liquid biopsy source has many advantages in the identification of oral cancer biomarkers. Collection is noninvasive and easy, allows repeated sampling, and the fluid is in close contact with the main structures involved in oral cancer. Several salivary biomarkers have been validated for the diagnosis and prognosis of oral cancer. The techniques used in proteomics are increasingly high-throughput. The advent of MS has revolutionized proteomics studies and, today, LC-MS/MS stands as the most widely employed MS variant. It is anticipated that, as the field progresses, researchers will move toward increasingly sophisticated techniques that offer improvements in sensitivity, specificity, reproducibility and overall robustness of study results. Such optimization is expected to increase the likelihood that salivary biomarkers will be successfully translated into clinical practice. As these biomarkers are increasingly incorporated into routine diagnostic and prognostic protocols, patients will benefit from personalized and timely interventions. The integration of reliable, noninvasive biomarker-based tests into the clinical workflow may ultimately lead to better outcomes for patients with oral cancer and represent a paradigm shift in the management of this difficult disease.

Expert opinion
The potential of saliva as a liquid biopsy source for identifying biomarkers of oral cancer is undisputed. It has ushered in a new era in oral cancer diagnostics by providing a noninvasive, easily accessible, and patient-friendly medium for disease detection and monitoring. With the introduction and refinement of various proteomic techniques, the proteomics field has made remarkable progress over the years. Despite these promising advances, the field is not yet mature, and the potential of saliva in oral cancer diagnostics has yet to be fully realized. Technological advances, particularly in MS, have greatly improved the landscape of salivary proteomics. Selected reaction monitoring/multiple reaction monitoring (SRM/MRM) MS techniques have proven to be valuable tools in this field. Compared to the DIA- and DDA-MS approaches, MRM-MS has the highest quantitative accuracy, allowing relative protein quantification. Its main disadvantages are that it only detects and quantifies up to 100 proteins per analysis and that it requires the synthesis of 'heavy peptides' as internal quantitative standards. MRM-MS is not used as a protein discovery strategy on its own, but very promising studies are beginning to emerge that combine DIA-MS with MRM for biomarker discovery and validation. In this way, MRM-MS can also be incorporated into proteomic discovery studies. However, it is imperative to overcome the high costs and accessibility issues of MRM-MS technology to make it available to a broader range of laboratories. Reducing costs and increasing the efficiency of MS platforms, standardizing the experimental approaches used in proteomics studies, creating biobanks, and establishing collaborative projects will lead to an expansion of clinical cohort sizes. This will make it possible to overcome issues relating to the statistical power of proteomics studies, enabling translation of the protein candidates to clinical practice. Multi-omics approaches are extremely attractive because they can integrate several levels of information, namely genomics, proteomics, epigenomics, transcriptomics and post-translational modifications. Nevertheless, large amounts of data imply the need for superior computing power and better data analysis software. These advancements will bring us one step closer to translating research findings into meaningful clinical applications that can significantly improve oral cancer diagnosis and patient outcomes.
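The role of 'heavy peptides' as internal standards in MRM-MS, mentioned above, can be illustrated with the usual stable isotope dilution arithmetic: the amount of the endogenous (light) peptide is estimated from the light-to-heavy peak area ratio multiplied by the known spiked amount of the heavy standard. The sketch below is a minimal illustration under the assumption that light and heavy peptides respond identically in the instrument. The function name `estimate_light_amount`, the choice of the median to summarize transitions, and the peak areas are hypothetical.

```python
from statistics import median


def estimate_light_amount(transitions, heavy_spiked_fmol):
    """Estimate the endogenous peptide amount from light/heavy peak areas.

    transitions: list of (light_area, heavy_area) pairs, one pair per
                 monitored transition of the same peptide.
    heavy_spiked_fmol: known amount of heavy-labeled standard added.
    """
    ratios = []
    for light_area, heavy_area in transitions:
        if heavy_area <= 0:
            raise ValueError("heavy peak area must be positive")
        ratios.append(light_area / heavy_area)
    # The median ratio limits the influence of a single interfered transition.
    return median(ratios) * heavy_spiked_fmol


# Hypothetical peak areas for three transitions of one peptide.
areas = [(2.0e5, 1.0e5), (4.1e5, 2.0e5), (1.9e5, 1.0e5)]
print(estimate_light_amount(areas, heavy_spiked_fmol=50.0))
```

Because every target peptide needs its own synthesized heavy standard, the cost of such assays scales with panel size, which is consistent with the cost barrier discussed in this section.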
Over the next five years, the field of salivary proteomics in oral cancer diagnostics is poised for a transformative evolution that will change the way we diagnose and track this disease. Although sophisticated techniques such as MS with SRM/MRM have shown immense potential, their high cost has been a barrier to widespread use. As more laboratories adopt these methods, costs are expected to decrease significantly due to economies of scale. This reduction in cost will not only democratize access to cutting-edge technologies, but also pave the way for their wider use in salivary proteomics research. Targeted approaches, such as those based on aptamers, are expected to expand the diagnostic arsenal available to medical professionals. At the same time, the proteomics research landscape will be redefined by the emergence of international consortia dedicated to standardizing research protocols. These collaborative networks will focus on developing universally accepted best practices spanning every phase of a proteomics study, from sample collection and processing to data analysis and reporting. This harmonization of methods across laboratories will not only reduce inconsistencies resulting from different techniques but will also improve the reproducibility and comparability of research results.
In parallel, we expect to see an increase in collaborative studies with large, diverse patient cohorts. The consolidation of resources and expertise will enable research initiatives on an unprecedented scale, providing more robust data and improving the statistical power and validity of results. Most importantly, this collaborative approach will facilitate the comprehensive validation of potential salivary biomarkers in diverse populations and clinical contexts, an indispensable step for their subsequent clinical application.
Finally, artificial intelligence (AI) models incorporating machine learning (ML) and deep learning (DL) strategies present promising approaches for salivary protein analysis, particularly with large datasets from MS-based proteomics. Capable of handling unstructured data and autonomously extracting high-quality features, these AI models enhance the accuracy of data analysis, thereby advancing biomarker discovery in oral cancer research. The ongoing evolution of technology suggests that the seamless integration of salivary biomarker discovery and AI-driven analytics holds significant potential to revolutionize oral cancer management, paving the way for targeted and individualized care. Overall, we are entering an exceptionally vibrant and dynamic era of research and innovation in salivary proteomics for the diagnosis of oral cancer. With greater financial access to cutting-edge technologies, the development of standardized methods, and an increasingly collaborative research ecosystem, breakthrough discoveries are within reach. However, it is important to recognize that this optimistic outlook requires a sustained, collaborative effort by all stakeholders, including researchers, clinicians, and policymakers. Only by confronting the remaining obstacles and seizing the opportunities that present themselves can we hope to fully realize this transformative vision for the future of oral cancer diagnosis and treatment.

Figure 1. Steps from biomarker discovery to clinical implementation. The image was created in BioRender.

Figure 3. Proteomics workflow and most used experimental approaches in proteomic studies using saliva as liquid biopsy for oral cancer.This image was created in BioRender.

Figure 4. Temporal evolution of salivary biomarkers and the techniques used in their identification. VOSviewer network of the most cited salivary biomarkers and techniques used in proteomics for biomarker discovery in oral cancer (a). VOSviewer overlay showing the temporal distribution of the most cited salivary biomarkers and techniques used in proteomics for biomarker discovery in oral cancer (b). Chronological discovery of the most relevant techniques used in proteomics studies using saliva as a source of liquid biopsy for the discovery of oral cancer biomarkers (c).