5´XP sRNA-seq: efficient identification of transcripts with and without 5´ phosphorylation reveals evolutionary conserved small RNA

ABSTRACT Small RNA (sRNA) sequencing has been critical for our understanding of many cellular processes, including gene regulation. Nonetheless, the varying biochemical properties of sRNA, such as 5´ nucleotide modifications, make many sRNA subspecies incompatible with common protocols for sRNA sequencing. Here we describe 5XP-seq that outlines a novel strategy that captures a more complete picture of sRNA. By tagging 5´P sRNA during library preparation, 5XP-seq combines an open approach that includes all types of 5ʹ-terminal modifications (5´X), with a selective approach for 5-phosphorylated sRNA (5´P). We show that 5XP-seq not only enriches phosphorylated miRNA and piRNA but successfully discriminates these sRNA from all other sRNA species. We further demonstrate the importance of this strategy by successful inter-species validation of sRNAs that would have otherwise failed, including human to insect translation of several tRNA (tRFs) and rRNA (rRFs) fragments. By combining 5´ insensitive library strategies with 5´ sensitive tagging, we have successfully tackled an intrinsic bias in modern sRNA sequencing that will help us reveal the true complexity and the evolutionary significance of the sRNA world.


Introduction
High-throughput RNA-sequencing techniques have revolutionized our understanding of various small noncoding RNA (sRNA) species, typically with an arbitrarily defined length of less than 200 nucleotides. Numerous studies have provided insight into the indispensable functions of sRNAs in regulating cellular processes, such as cell development, differentiation and proliferation under physiologically normal and pathological conditions [1,2]. In addition to wellcharacterized micro (miRNA), small interfering (siRNA) and PIWI-interacting RNAs (piRNA) [3,4], new sRNA species deriving from transfer (tRNA), ribosomal (rRNA) [5], small nucleolar (snoRNA), vault (V RNA) and Y RNA [6] exists in many cell types including the transcriptionally dormant sperm [7,8]. These fragments are not just by-products of pervasive transcription or random RNA degradation but are often precisely engineered to exert specialized regulatory functions. For example, specific fragments generated from tRNAs (tsRNA) have been implicated in translational control and cell growth [9]. The regulatory repertoire of sRNA in general, and tsRNA specifically, is expanded by the addition of small molecular modifications including methylations and pseudouridylation. Such modifications affect RNA stability, three-dimensional structure, cellular location and interactions with proteins [10][11][12].
A major limitation when exploring the sRNA world is the biased enrichment of different sRNA species introduced during library preparation that are caused by diverse chemical modifications at the 5´-and 3´-ends of RNA [13][14][15]. Hydroxyl (OH) or phosphate (P) groups can be found either in the 3´-or 5´-end depending on the processing pathway, while 2´,3´-cyclic phosphate (cP) and 2´-O methylation are only present on the 3´-end [16,17]. In addition, sRNAs can have multiple types of 5´-caps such as mGpppC, 7mGpppG, GpppG, GpppA, and 7mGpppA in human cells [18] or NAD+ [19] and 3′-dephospho-CoA, succinyl-dephospho-CoA and acetyl-dephospho-CoA in bacteria [20].
Most high-throughput sequencing methods require the ligation of sequencing-compatible adapters first to a hydroxylgroup at the 3´-end (3´-OH) and then to a phosphate group at the 5´-end (5´-P) of the sRNA molecule. These chemical reactions therefore select for sRNA containing these very specific terminal modifications, that are subsequently PCR amplified and finally detected during sequencing.
Here, we describe 5´XP sRNA-seq, a novel sequencing method with a unique tagging system that allows the detection of both sR NA with a 5´-P (typically miRNA and piRNA) and sRNA with alternative 5´ terminal groups. We show that this method increases specificity towards miRNA and piRNA when compared to a commonly used strategy, while simultaneously allowing for additional fragments to be detected in the same library. Overall, we show that 5´XP sRNA-seq is an adequate strategy for tackling critical biases commonly prevalent in standard nextgeneration sRNA library preparations.

RNA extraction from Drosophila embryos
RNA was isolated from Drosophila melanogaster (W1118) embryos 0.5-2.5 h of age. Embryos were dechorionated in pools of 50 embryos using sodium hypochlorite 3.5% (RECTAPUR) and washed extensively with RNase free H2O. RNA was isolated using the miRNeasy Micro kit (Qiagen) according to the manufacturer's instructions. In brief, 500 µl Qiazol (Qiagen) was added to the embryos and homogenized for 2 min at 40 Hz using a Tissue Lyser LT (Qiagen) and 5 mm stainless steel beads (Qiagen). Phase separation was done by mixing 100 µl of Chloroform followed by centrifugation at 12000xg for 15 min in 4°C. RNA was then collected by columns, washed and eluted in 14 µl nuclease free H2O. RNA integrity was confirmed by Bioanalyzer (Agilent) and concentrations were determined by Nanodrop (ThermoFisher).

5´XP sRNA-seq library preparation
A side-by-side comparison of the workflows of the different methods used in this study is available in Supplementary  Fig. 1.
The 5´XP sRNA-seq method includes the following steps: (i) Ligation of 3´ RNA adapter -18.5 h -proceed immediately (over-night) (ii) Ligation of 5´ RNA tag -2.5 h -proceed immediately (iii) cDNA synthesis -2.5 h -pause point (iv) Ligation of 5´ cDNA adapter -3.0 h -pause point (v) Barcoding & Amplification -1.5 h -pause point (vi) Size selection ->12 h -pause point (over-night) (vii) Pooling barcoded samples -3 h -pause point In detail, the 3´ adapter was first ligated using 0.5 µl of 3´SR Adapter for Illumina (NEB-kit) added to 30 ng of input RNA diluted in 3.0 µl nuclease free H 2 O. Samples were heated to 70°C for 2 min and immediately cooled on ice. A mix of 5 µl 3 ´Ligation Reaction Buffer 2x (NEB-kit) and 1.5 µl of 3 ´Ligation Enzyme Mix (NEB-kit) were added and incubated for 18 h at 16°C over-night. A 2S rRNA blocking oligo was added simultaneously to hybridization of the reverse transcription primer by adding a mix of 0.5 µl of First 2SrRNA block oligo, 0.5 µl of SR RT primer (NEB-kit) and 1.75 µl of nuclease-free water to each sample followed by incubation at 90°C for 30 s, 65°C for 5 min, 37°C for 15 min, and finally 25° C for 15 min. 5´-tagging was then performed by adding 0.5 µl 5P-tag RNA oligo (denatured at 70°C for 2 min), 0.5 µl 5 ´Ligation Reaction Buffer (NEB-kit) and 1.25 µl of 5´Ligation Enzyme Mix (NEB-kit) followed by a 1 h incubation at 25°C. To synthesis cDNA, a mix of 4 µl First strand Synthesis Reaction Buffer (NEB-kit), 0.5 µl Murine RNase Inhibitor (NEB-kit) and 0.5 µl ProtoScript II RT (NEB-kit) was added to each sample and incubated first at 50°C for 1 h, and then at 70°C for 15 min. Enzymatic inactivation and hydrolysis of RNA were done by incubating samples at 95°C for 3 min with 17.5 µl nuclease-free water, 10 µl 0.5 M EDTA and 2.5 µl 5 M NaOH. After room temperature equilibration pH was lowered by adding 1 µl 5 M HCl. Samples were then cleaned using Oligo Clean & Concentrator kit (Zymo Research) according to manufacturer's recommendation but with an extra wash in 250 µl DNA wash buffer (Zymo Research) and cDNA elution in 11 µl of nuclease-free water. Ligation of pre-adenylated R1R adapter with 2S RNA blocking was done by adding 10 µl of cDNA, 2 µl 10x NEBuffer 1 (New England Biolabs), 2 µl 50 mM MnCl2, 2 µl Thermostable 5´App ligase (New England Biolabs), 4 µl 10 µM pre-adenylated R1R Adapter, and 0.5 µl 10 µM Second LNA 2SrRNA block oligo, followed by 2 h incubation at 65°C. Ligated cDNA samples were cleaned and eluted in 24 µl Elution Buffer (Qiagen) using Oligo Clean & Concentrator (Zymo Research) according to the manufacturer's instructions. Clean samples were amplified by adding 25 µl 2x Phusion High-Fidelity PCR master mix (ThermoFisher), 1 µl 10 µM Illumina Multiplex Primer, 1 µl 10 µM Illumina barcode (NEB-kit) to 23 µl of cDNA, and incubated in a thermocycler at 98°C for 5 s, followed by 15 cycles of 98°C for 5 s, 60°C for 10 s, 72°C for 30 s, ending with 72°C for 1 min. Amplified libraries were cleaned using Agencourt AMPure XP (Beckman Coulter), and size selected for 130-200 nt fragments on a pre-casted 6% polyacrylamide Novex TBE gel (Invitrogen). Gel extraction was done using Gel breaker tubes (IST Engineering Inc) in DNA Gel Elution Buffer (NEB-kit). Disintegrated gels were incubated at 37°C for 1 h on a shaker, quickly frozen for 15 min at −80°C, followed by another incubation for 1 h. Remaining gel debris were removed by Spin-X 0.45 µm centrifuge tubes. The libraries were precipitated overnight at −80°C by adding 1 µl of GlycoBlue (Invitrogen) as co-precipitant, 0.1 times the volume of Acetate 3 M (pH 5.5), and three times the volume of 100% ethanol. Library concentrations were estimated using QuantiFluor ONE ds DNAsystem on a Quantus fluorometer (Promega). Pooled libraries were sequenced on NextSeq 500 with NextSeq 500/550 High Output Kit v2.5, 75 cycles (Illumina).

5´P sRNA-seq library preparation
Library preparation was done with NEBNext Multiplex Small RNA Library prep kit for Illumina (New England Biolabs) according to the manufactures instructions except for downscaling all samples to half volume and using 30 ng of input RNA (100 ng recommended) with an appropriate 1:2 dilution of all adapters. This corresponds to the 5XP-seq preparation as above, but with the following changes: In Step [ii.], 5´ RNA ligation was performed using 0.5 µl 5´ SR Adapater (NEB-kit) instead of the 5P RNA oligo.
In Steps [iii-iv.], clean up of cDNA, ligation of the 5´ DNA adapter and the Second LNA 2SrRNA oligo block was not performed. The cDNA reaction mix was immediately amplified according to kit protocol. In

5´X sRNA-seq library preparation
5X-seq library preparation was identical to 5XP-seq library preparation, except for the following adjustments: In Steps [ii-iii.], 5´ RNA ligation of the 5P-tag RNA oligo and the First 2SrRNA block was not performed. The cDNA reaction mix was therefore compensated by adding 0.5 µl 5´Ligation Reaction Buffer 10x (NEB-kit) and 1.75 µl nuclease-free H 2 O.

Quality control and pre-processing
Raw fastq data files have been deposited in Sequence Read Archive under the accession number PRJNA658107. All libraries passed Illumina's default quality control. Demultiplexed fastq files were downloaded from BaseSpace using BaseMount (Illumina) and lanes were merged for each sample. For all library-types the 3´ adapters were trimmed using Cutadapt 2.3 [23] with the following input: -a AGATCGGAAGAGCACACGTCTGAACTCCAGTCA CAT -discard-untrimmed -nextseq-trim = 20 -O 5 -m 5. This specifically trims nextseq type sequences between 5-70 nt with at least 5 nt of the 3´ adapter present in the 3´ end and discarding all other sequences. For 5XP-seq libraries, we reloaded the trimmed sequences into Cutadapt using the following input: -g TGGCAA CGATC -m 5 -untrimmed-output. This saves sequences with or without the 5´-tag in separate fastq output files. All trimmed fastq files were quality-filtered using fastq_quality_filter -q 20 -p 80 -v available in the FASTX Toolkit 0.0.14 (https://github.com/agor don/fastx_toolkit), which only retain sequences with PHRED score >20 in more than 80% of nucleotides. The integrity of trimmed and quality filtered fastq files were further verified using FastQC 0.11.9 (https://github.com/s-andrews/FastQC).

Sequence counting, filtering and annotation
Data were summarized over counts of unique sequences across all samples. Tools for such sequence-based approaches are availablesuch as Sports and MintMap [24,25] -and may be contrasted against feature-based counting approaches, where sequences are counted over genomic features -such as the genomic coordinates of miRNA and piRNA. One benefit of unique sequence-based approaches is that the sequence of the original read is maintained during counting, which is often lost due to initial mismatch allowance in feature-based approaches. Here we counted unique sequences in trimmed fastq files using customized scripts in R 3.4.4 [26]. For parallel and efficient large data processing, foreach 1.5.0 [27] and the data table [28] packages were used. Reading and processing fastq-files were done using the ShortRead package [29]. Prior to normalization, noise was reduced by only including sequences with at least 5 counts in 100% of samples within a given method (5P-seq, 5X-seq, 5XP-seq, etc.). This dataset contained 330 971 unique small-RNA sequences with mean total counts per sample of 8 993 458. Normalized counts in reads per million (rpm) were generated by dividing the individual sequence counts (reads) with the total sequence counts for each sample. For some analysis that targeted highly abundant sRNAs, we further reduced the data by only including sequences with at least 10 rpm in 100% of the samples within a given method included in the analysis. Sfiome critical values of the datasets are available in Table 1.
Sequence annotation was performed by mapping unique sequences against small-RNA sequence reference databases using bowtie 1.2.2 [30]. This was done in cycles, where each cycle allowed one additional mismatch, from 0 to 3 mismatches. A sequence was only destined for another bowtie annotation cycle if it failed to align to any of the databases in the previous cycle. Fasta reference sequence files were attained from the following databases: miRNA = miRbase and Ensembl, tRNA = GtRNA db (nuclear) and Ensembl (mitochondrial), rRNA = Ensembl (both nuclear and mitochondrial), piRNA = pirBase, Other sncRNA = Ensembl. We used dm6 versions across all databases. To resolve multimapping issues when sequences matched multiple databases within the same mismatch category, we applied the following hierarchy miRNA > Mt_tRNA > tRNA > MT_rRNA > rRNA > piRNA > other sncRNA. A sequence mapped to miRNA with 1 mismatch and piRNA with 0 mismatch, would therefore annotate as piRNA, while if both had 0 mismatch it would annotate as miRNA.

Statistical methods and visualization
All statistical analysis was done in R 3.4.4 [26]. Data visualization was primarily done using the ggplot2 package in R [31] and finalized using Inkscape 0.92. For fitting the saturation plots we used non-linear least square regression with an asymptotic self-starter (r functions: nls and SSasymp). For hierarchical clustering and correlation plots, we used the corrplot package [32]. For tRNA/rRNA coverage we used custom scripts wrapped around the vmatchPattern function in the Biostrings package [33]. A version of this script has been published [7] and are available here: https://github.com/ Danis102/Natt_et_al_2019_Human_Sperm_Rapid_Response_ to_Diet/blob/master/S1_Text.R. Reference sequences were obtained from GtRNAdb (tRNA) and Ensembl (rRNA). GtRNAdb ss-files were used to map tRNA loops.

5´P, 5´X and 5´XP sRNA-seq employ distinct library preparation strategies
5´-P and 3´-OH dependent ligations amplify some sRNA species, including piRNA and miRNA, but will exclude other sRNA species without a 5´-P that may have biological functions. We refer to this ligation strategy as 5´-P dependent sRNA-seq (from here on: 5P-seq). This should not be mistaken for 5´-P dependent methods recently developed for long RNA sequencing [34][35][36]. It has been shown that the 5´P-ligation is sensitive to secondary structures of the RNA, which further restricts capturing a broad range of sRNA species (for review see [15]). To include highly structured RNA species without 5´-P ends, strategies have been developed in which the 5´ adapter is not ligated to the RNA but is added later to the cDNA [37,38]. We here refer to such methods as 5´ inclusive sRNA-seq (5X-seq). Building on these two approaches we developed 5´XP sRNA-seq (5XP-seq). In this method, the first adapter is ligated to the 3´ end and an 11-nt long oligo is ligated to the 5´ end of sRNA, which allows efficient tagging of RNAs with 5´-P. Then, after reverse transcription of the RNA ligation product, a sequencing-compatible adapter is ligated to the corresponding 3´ end of all cDNA (Fig. 1, Supplementary Fig. 1). Through this approach sRNA with 5´-P and 5´-X can be sequenced in the same library and then bioinformatically discerned based on the presence or absence of the 5´ oligonucleotide tag.

5P and 5X sRNA-seq generate libraries with substantially different sRNA content
To better understand how 5P and 5X sRNA-seq vary in their enrichment for different RNA species, we first compared libraries generated using a popular commercial 5P-seq kit (NEBNext Multiplex Small RNA Library prep kit for Illumina; New England Biolabs) and an adaptation of published 5X-seq protocols [21,[37][38][39]. For comparative reasons, we modified the 5X-seq protocol to work with the reverse transcriptase of the 5P-seq kit. We extracted total RNA from a pool of 50 Drosophila embryos, 0.5-2.5 hours old, which was used as input for all sequencing library preparations. Each preparation used only 30 ng of total RNA that equals to one third of the RNA extracted from one single drosophila embryo, rendering the protocols highly efficient. We adopted sequence-based counting of the reads, for which Library stats for the 5P-and 5X-type libraries using single-end sequencing for 75 cycles are found in Table 1. Data from 5P-seq libraries were in accordance with our previous observations using this kit. To better validate the 5X-seq protocol, we generated a second batch of 5X-seq using independent embryos (Batch B).
Focusing on intermediately-to-strongly expressed sRNA (>10 cpm), 5P-seq resulted in more alignments against miRNA and piRNA, and less alignments against rRNA and tRNA, when compared to 5X-seq (Fig. 2). This result was similar for both cumulative counts ( Fig. 2A) and relative fold changes (Fig. 2B), indicating a general trend across most sRNA. This difference is expected since miRNA and piRNA are commonly 5´ phosphorylated and should be preferably enriched by 5P-seq, while tRNA and rRNA show more diversity in the 5´-end and should preferably be caught by 5X-seq. When not allowing mismatches in the annotation 5Xseq annotated better to the references than 5P-seq (light grey Fig.  2A,Fig. 2B). The advantage for applying sequence-based counting is that mismatches in the alignments between sequenced reads and small-RNA reference sequences only affect the annotation without changing the raw count table, which is often the case for featurebased counting. Allowing up to three mismatches when annotating sequences against small-RNA references removed nearly all sequences with no annotation in 5P-seq (light grey Fig. 2C,Fig.  2D). Since libraries were generated from the same RNA pool, this indicates that 5P-seq may have suffered from misreading during cDNA-synthesis possibly by interference from RNA-modifica tions. Removing sequences mapping to rRNAs, however, also removed this difference (Supplementary Fig. 2).
While the majority of sRNA identified by 5X-seq were also present in 5P-seq libraries, some sRNAs were completely absent in one or the other library, which resulted in a more than 20-fold change ( Fig. 2B and Fig. 2D; Supplementary Table 1). This indicates substantial differences in the composition of sRNA species identified by the two different library strategies. Therefore, studies that have applied only 5P-based sRNA-seq strategies may have misinterpreted the proportions of small RNA present in their samples.

5XP sRNA-seq distinguishes piRNA and miRNA
To preserve sequence diversity given the observed biases described above, in parallel to the 5P and 5X-seq libraries, we generated 5XP-seq libraries using the same two pools of RNA from Drosophila embryos (Batch A and B). Table 1 summarizes the general properties of 5XP-seq after being sorted into the two sub-libraries, either containing (5P-tag) and not containing (5X-notag) the 5´ oligo tag. In theory, 5Ptag and 5X-notag of 5XP-seq would correspond to the 5P-seq and 5X-seq libraries, respectively. Similar to the difference observed between 5P-and 5X-seq, 5XP-seq had more specificity towards miRNA and piRNA in the 5P-tag sub-libraries, than in the 5X-notag sub-libraries (Table 1). In fact, sRNA diversity was highly similar between 5X-notag sub-libraries and regular 5X libraries prepared without tagging 5´-P (Table 1). In support, expression levels of 5X-seq and 5X-notag cohered strongly, both by correlation and unsupervised clustering (Fig. 3A).
In contrast, we determined higher specificity (~2-fold) of miRNA and piRNA species in 5P-tag sub-libraries when compared to regular 5P-seq libraries (Table 1). This indicates that ligating a short oligonucleotide tag at the 5´-terminal -as performed in 5XP-seq -captures 5´-P sRNAs more efficiently when compared to the long 5´ adapter ligation used in regular 5P-seq. The principle steps of 5′XP sRNA-seq. Small-RNA sequencing using 5′XP sRNA-seq generates two sub-libraries, either sensitive (5P-tag) or insensitive (5Xnotag) to a phosphate (P) in the 5′-terminal of the original RNA. This is done by tagging RNA fragments with 5′-P using a sequence-specific oligo that identifies the 5′-P in the downstream bioinformatic analysis.
Nonetheless, the 5P-tag sub-libraries only contributed 10% to 20% of the total library sizes in 5XP-seq (Table 1) and correlated poorly with regular 5P-seq (Fig. 3A). This led to two possible explanations. Firstly, 5´-P sRNA may constitute a smaller proportion of the total sRNA population. Secondly, 5XP-seq may suffer from inefficient ligation of the 5´ oligo tag. In support of the former, a saturation analysis showed that the 5P-tag sub-libraries were equally saturated to the other libraries. Thus, irrespective of library-type, increasing the sequencing depths would result in a little gain in identifying new sRNA specimens from the sRNA population targeted by each method (Supplementary Fig. 3).
To further test the hypothesis that the 5P-tag sub-library of 5XP-seq targets a smaller and possibly more specific population of 5´-P sRNAs, we characterized each library type in more details. Size distribution of different sRNA species confirmed a sharp peak at 22 nt in 5P-tag sub-libraries, while the other library strategies showed more diversity (Fig. 3B). The peak at 22nt is the typical size of miRNA. However, we detected that many sequences of 22nt also mapped to rRNAs. Therefore, we extracted miRNA and piRNA that are known 5´-P subspecies for further comparison. Both the 5P-tag sub-libraries and regular 5P libraries where enriched with a sharp 22-23 nt miRNA peak, and a broader 22-29 nt piRNA peak (left panel Fig. 3C). However, the 5P-tag sub-libraries of 5XP-seq had fewer non-canonical piRNA fragments that were shorter than 21 nt and showed a higher proportion of canonical piRNA with uridine bias at their first nucleotide (middle panel Fig. 3C; [40]). Interestingly, the first nucleotide uridine bias was also observed in miRNA (right panel Fig. 3C), which could indicate a new subspecies of miRNA in early drosophila embryos.
A combination of a lower proportion of unannotated small RNA (light grey in Figs. 3B and Figs. 2A), as well as a smaller target population of sRNA ( Supplementary Fig. 3) and more canonical piRNA profiles (Fig. 3C), altogether indicated that the 5P-tag sub-library of 5XP-seq enriches for 5´-P RNA better than regular 5P-seq. The low correlation in the expression profiles observed between these two strategies (Fig. 3A) was also rescued (Fig. 3D) after noise from less studied subspecies of sRNA was reduced by analysing miRNAs only, which represent the most well-characterized 5´-P sRNA species.

Library strategies are defined by opposite terminal enrichments of small rRNA
Loss of peak integrity of rRNA subunits is often used as an indicator of RNA degradation in studies of long RNA, often referred to as the RNA integrity number (RIN). While a similar measurement is lacking for sRNA, all our libraries showed very sharp peaks indicating specifically processed rRNA with high sRNA integrity (Fig. 4A,Fig. 4B).
The 5P-tag sub-libraries in 5XP-seq were dominated by a peak centred at 22 nt (Fig. 3B), which primarily contained an rRNA fragment (rRF) originating from the 5´-end of 18S rRNA subunit (Fig. 4A). While this rRF is not classified as a miRNA in mirBase, recent findings in Zebrafish show that it may interact with Argonaute proteins in a miRNA-like matter [5,41]. Low detection of this rRF in regular 5P-seq (Fig. 4B) would have been overlooked in other studies. Nonetheless, unlike 5X-type libraries that showed very similar rRF profiles, 5P-type libraries were at first dissimilar ( Fig. 4A.B). A more detailed view on the rRNA subunits revealed, however, that most peaks were present in both the 5P-tag and regular 5P libraries but varied in detection level (Fig. 4C,Fig. 4E). Together this indicates that the 5´ RNA ligation strategy severely affects the enrichment of specific rRF, which could make crossstudy validation challenging.
We also noticed a 5´ bias in 5P-type and a 3´ bias in 5X-type libraries (Fig. 4A,Fig. 4B), and therefore classified each fragment by their 5´ start or 3´ end in a mature rRNA subunit. This confirmed that rRFs generated from the 5´ terminals were enriched in 5P-seq, while rRFs deriving from the 3´ terminals were enriched in 5X-seq (Fig. 4F). Importantly, 5XP-seq contained rRFs originating from both the 5´ and 3´ terminal but kept them separated in the 5P-tag and 5X-notag sub-libraries, respectively. These differences indicate that 5P-type libraries lose sRNA primarily at the 3´ end, possibly as a consequence of interference with RNA-modifications during reverse transcription. Interestingly, 5´ rRF bias has recently been reported in humans using independent 5P-seq kits [8,42].

Coverage of small tRNA fragments strongly depends on library preparation strategy
Like rRFs, tRNA-derived sRNA (tRFs) can also originate from the 5´ or 3´ terminals. In addition, internal (i´) fragments, which neither start nor end in the terminals of full-length tRNA, have been described [8,25]. We used these classifications as an independent confirmation of the differences in 5´ and 3´ affinities between  the 5P-and 5X-type libraries that we observed in sRNA derived from rRNA. As expected, 5P-seq had a clear 5´ preference, while 5X-seq progressively increased towards the 3´ terminal (Fig. 5A). Similar to the rRFs, 5XP-seq contained tRFs originating from both the 5´ and 3´ terminal and successfully separated them into the two sub-libraries.
In addition to 5´, i´ and 3´ stratification, we classified each tRF according to the isodecoder of the mature tRNA in order to better understand the possible biases in the tRF coverage introduced by different library strategies. This enables an overview of tRF subtypes such as tRNA-derived stress-induced small RNA (tiRNA) that are generated by angiogenin dependent anticodon cleavage of specific isodecoders resulting in tRNA halves [43], as well as the recently discovered nuclear internal T-loop tRFs (nitRNA) [7]. As expected, 5X-notag sub-libraries generated primarily i´ and 3´ fragments that were strongly replicated by regular 5X-seq, but poorly replicated by 5P-seq (Fig. 5B). Again, while the most covered tRFs in the 5P-tag sub-libraries of 5XP-seq showed similarities to regular 5P-seq, we found more variability on the isodecoder levels (Fig. 5C). In highly abundant isodecoders, there was also an enrichment of sRNA species derived from the 5´ end of tRNAs but this was less pronounced in less abundant isodecoders.
In line with the rRNA analysis, more in-depth analysis of a selection of tRNA isodecoders showed that most tRFs were detected by each method but detection levels varied greatly across methods. For example, the 5P-tag sub-library was superior in detecting both the 5´and 3´ halves generated by angiogeninmediated cleavage of AspGUC, which generates a known tiRNA (Fig. 5D). In another confirmed tiRNA originating from GlyGCC, the 5´-half was detected in both 5P-tag and regular 5P libraries (Fig. 5E). At the 3´-end, a short fragment generated by T-loop cleavage was detected in the 5P-tag, 5X-notag and regular 5X libraries but to a much lesser degree in regular 5P libraries (Fig.  5E). LysCUU has previously been shown to generate many different tRFs [7,44]. This was confirmed by 5X-notag, regular 5X and regular 5P, but to a lesser degree in 5P-tag libraries (Fig. 5F). Interestingly, primarily 5X-notag and regular 5X libraries detected a weakly expressed long T-loop internal tRFs comparable to the nitRNA that was recently detected in human sperm [7]. Similarly, 5X-notag and regular 5X, but not 5P-tag and regular 5P, detected a short nitRNA in ArgCCU generated by cleavage in the T-and anticodon-loops that was almost identical to the one previously reported in human sperm [7].
Together these findings show that the choice between 5Psensitive and 5P-insensitive methods is critical when studying tRFs using sequencing. Thus, using a method that conserves the diversity of both phosphorylated and non-phosphorylated sRNA -like 5XP sRNA-seq -is highly desirable.

Discussion
Here, we presented a novel method -5XP sRNA-seq -that sequences sRNA with and without 5´ phosphorylation (5´-P) using the same library. Our protocol was optimized for low starting material which makes it attractive for use with precious samples. By ligating a short 5´-P oligonucleotide tag, we demonstrated that 5´-P sRNAs can be separated from other sRNAs in the downstream analysis. We reported multiple examples in which a one-sided approach would have had severe consequences for the interpretation and crossvalidation of sRNA experiments. We also provided evidence that ligating a short oligonucleotide tag enhances the recovery of 5´-P sRNAs, such as piRNA and miRNA.
It has become increasingly clear that the 5´ terminal of RNA is subject to diverse modifications. As this is the start site for transcription, it means that if the first molecule is a nucleotide, it will initially have a triphosphate. If transcribed by RNA polymerase II, to become, for example, mRNA, a cap consisting of modified guanine (G) nucleotide is first added to the initial nucleotide during transcription. Primary piRNAs are also transcribed by RNA polymerase II and receive a similar cap. These 5 ´ caps regulate the stability and determine the downstream fate of the transcript. The process of generating sRNAs from longer precursors by nuclease digestion will result in either -P or -OH 5 ´ terminals, depending on the nuclease. Intriguingly, it was recently shown that sRNAs are capped as well, indicating that not all caps are added during transcription but rather after the cleavage to shorter transcripts [18,45]. In addition to the guanine-based caps, recent discoveries show that the sRNA may have other types of 5´-caps.
First reported in bacteria, and suggested to be an alternative to the eukaryotic m 7 G cap, 5´ NAD caps have now been demonstrated in several species including humans (see review: [46]). Interestingly, NAD + , as well as NADH and dpCoA, can be incorporated into RNA during transcription initiation, thus serving as non-canonical initiating nucleotides [47]. In human cells, the amount of RNA with 5´-NAD caps change in response to shifting cellular NAD concentrations [48], while data from Arabidopsis show enrichment of NAD-caps for transcripts involved in redox responses [46]. These two findings, together with the fact that NAD+/NADH is one of the most important intracellular redox pairs, suggest that NAD-capping of RNA may regulate gene expression as a function of the cell's redox state. Moreover, members of the NUDIX hydrolase superfamily have been shown to remove not only 5´-NAD but also 5´-FAD and 5´-CoA, indicating that metabolite-containing 5´ caps might be a more widespread phenomenon than previously thought [49]. Whether NAD or other metabolic 5´-caps are used to regulate sRNA, however, is yet to be discovered.
These recent findings demonstrate the urge to develop new sequencing methods that not only include 5´ modifications other than phosphate but also correctly identify each modification in downstream analyses. By using RNA ligases with specificity towards different 5´ terminals together with unique oligonucleotide tags, 5´XP sRNA-seq is the first step towards a more holistic sequencing approach. By tagging 5´-P in this way, our data suggest that 5´-P sRNAs constitute only 10-20% of the total pool of sRNA in Drosophila embryos. Therefore, up to 90% of sRNA species may have previously escaped our attention since most commercial kits are based on 5´-P dependent adapter ligation. Using 5´XP sRNA-seq on RNA from Drosophila embryos, we discovered several new sRNA species without 5´-P, including cross-species validation of a zebrafish 5´-rRF from the 18S rRNA subunit [41], and nitRNA derived from internal tRNA T-loop cleavage, previously described only in human sperm [7].
5´XP sRNA-seq bare some limitations that need to be solved in future research. Firstly, while we clearly show a positive impact on sRNA diversity with an inclusive strategy for 5´ adapter ligation, the 3´ adapter ligation is still prone to bias [15]. There are 3´ modifications that may cause similar impacts on sRNA coverage as we have described for the 5´ terminal. This involves for example 2ʹ,3´-cyclic phosphate (cP) [16] and 2´O methylation (2´O-Me) [17]. It must be emphasized, however, that having an all-inclusive strategy also at the 3´ terminal would dramatically increase the noise from partly degraded RNA generated, for example, by RNase A digestion [50]. Since 5´XP sRNA-seq depends on 3´-OH for 3´ adaptor ligation, this ensures that many degradation products are excluded. In addition, we have prolonged the 3´ adapter ligation step at low temperature, which according to the 3´ ligase manufacturer should enhance the ligation to 2´O-Me RNA found on, for example, animal piRNA and plant miRNA [17]. Such lowtemperature ligation comes, however, with a trade-off against another possible bias. At lower temperatures, the adaptor may cofold with target RNAs, which can either inhibit or enhance the ligation process [51,52]. This kind of bias has previously been solved by using a pool of adaptors that differs in sequence composition or contains random nucleotides at a defined rate [14,53]. While this should be explored in future protocols for 5´XP sRNAseq, the adaptor ligation at the 5´-terminal is less affected by cofolding since it is performed on cDNA at a higher temperature, which would limit the formation of three-dimensional structures.
Nonetheless, ligation of the 5´P tag is carried out on RNA at low temperature. Introducing random nucleotides in such a short oligo (11 nt) would severely interfere with the bioinformatic division of the 5´P tag and 5´X notag sub-libraries. Future research should therefore explore smaller sets of 5´P tag oligos with unrelated sequence compositions to tackle this problem. One must bear in mind, however, that some ligases also show preferences against specific nucleotides in the terminals of their targets [39]. We have tried to design the 5´P tag for optimal performance with the NEBNext 5´ Ligation Mix. Thus, introducing a pool of 5 ´P tags may affect ligation both negatively and positively.

Conclusion
By combining 5´ insensitive library strategies with 5´ sensitive tagging, we have demonstrated an innovative strategy for solving an intrinsic bias in modern sRNA sequencing. Our results represent an important step towards a new generation in sRNA sequencing that can explore the complete world of sRNAs in single low-input experiments. Future technologies will be aimed at further expanding the number of specific terminal RNA modifications that can be identified by genome-wide sequencing approaches. We anticipate discovering a rich repertoire of functional RNA modifications that will greatly expand our understanding of how genomes are regulated.