NET-prism enables RNA polymerase-dedicated transcriptional interrogation at nucleotide resolution

ABSTRACT The advent of quantitative approaches that interrogate transcription at single-nucleotide resolution has provided new insight into transcriptional regulation. However, little is known at such high resolution about how transcription factors directly influence RNA Pol II pausing and directionality. To map the impact of transcription and elongation factors on transcription dynamics genome-wide at base-pair resolution, we developed an adapted NET-seq protocol called NET-prism (Native Elongating Transcription by Polymerase-Regulated Immunoprecipitants in the Mammalian genome). Application of NET-prism to elongation factors (Spt6, Ssrp1), a splicing factor (Sf1) and components of the pre-initiation complex (PIC) (TFIID and Mediator) reveals their distinct influence on transcription dynamics with regard to directionality and pausing over promoters, splice sites and enhancers/super-enhancers. NET-prism will be broadly applicable, as it exposes transcription factor- and Pol II-dependent topographic specificity and thus a new degree of regulatory complexity during gene expression.


Introduction
Transcription is a highly dynamic process that comprises three stages: initiation, elongation and termination. Initiation involves recruitment of RNA Polymerase II (Pol II) to the promoter, followed by release of Pol II into productive elongation. Termination occurs when the RNA transcript is processed and Pol II is released from the chromatin template [1,2]. Transitions between these stages are facilitated by a compendium of regulatory processes, including phosphorylation of the Pol II C-terminal domain (CTD) and recruitment of factors that regulate Pol II activity [3][4][5].
Approaches that precisely map the position of RNA Pol II at high resolution have provided deeper insight into transcriptional regulatory mechanisms [6][7][8][9][10][11]. For example, the human NET-seq protocol quantitatively purifies Pol II in the presence of a strong Pol II inhibitor, thereby avoiding the use of an antibody [9]. Although this approach successfully maps the 3ʹ end of nascent RNA to reveal the strand-specific position of Pol II at single-nucleotide resolution, it does not distinguish between different Pol II variants or specific protein-dependent interactions. A related protocol, mammalian NET-seq (mNET-seq), uses an immunoprecipitation step to capture the nascent RNA produced by different CTD-phosphorylated forms of Pol II [12]. Immunoprecipitation has a potential second benefit: in principle, it would allow transcription factor-RNA Pol II interactions to be interrogated genome-wide, quantitatively, at nucleotide resolution and with strand specificity, none of which is possible with conventional ChIP-seq. In S. cerevisiae, such an approach, coined TEF-seq, was developed to interrogate the Paf1-RNA Pol II interaction [7] and provided new insight into the requirements for Paf1 in gene expression.
Here, we sought to develop a mammalian counterpart to TEF-seq that includes an immunoprecipitation step for RNA Pol II-associated factors, while remaining efficient enough to capture sufficient nascent RNA for NET-seq-type library preparation. A detailed protocol is outlined in Figure 1(a) and in step-by-step form in the Supplementary Information ('NET-prism protocol').

Extraction conditions of NET-prism allow for co-purification of RNA Pol II with known associated complex members
Similar to the original yeast NET-seq and TEF-seq protocols [6,7], we relied on a strong Pol II inhibitor (α-amanitin) to prevent run-on of the polymerase during all lysis steps, and on DNase I to solubilize chromatin. We optimized DNase I treatment in the absence and presence of urea, which proved necessary for efficient solubilization. We found that 100 U DNase I and 50 mM urea were sufficient to release a large fraction of engaged RNA Pol II from chromatin (Supplementary Figure 1). We then asked whether known and new co-factors of RNA Pol II could be immunoprecipitated under these conditions, and examined the total Pol II protein interactome by mass spectrometry, using the same extraction conditions as NET-prism, to identify such factors in a native chromatin state.
We identified both positive (Supt5, Supt6, FACT, Paf1) and negative (NELF) elongation factors, as well as splicing factors (Srsf5, Srsf6, Sf1) and TFIID components (Taf10, Taf15), as significantly enriched with Pol II under NET-prism conditions (Figure 1(b) and Supplementary Table 1); this list can guide follow-up experiments. We also confirmed some of these interactions by immunoprecipitation followed by western blotting (Figure 1(c)).

NET-prism captures unique transcriptional patterns of RNA Pol II-associated factors
We selected one transcription factor, TFIID (antibody raised against TBP), and two elongation factors, Spt6 and Ssrp1 (a subunit of the FACT heterodimer), to validate the NET-prism approach and to interrogate the impact of these factors on RNA Pol II activity. We also performed an IP for Mediator (Med14) as a negative control, since it did not display a significant association with Pol II under the conditions used to solubilise chromatin (Figure 1(c)). The data were highly reproducible between biological replicates (Supplementary Figure 2A) and correlated to different degrees with total RNA Pol II over promoter regions (Supplementary Figure 2B, NET-seq/prism), indicating that different TFs establish unique patterns of RNA Pol II stalling. Indeed, aligned and averaged NET-prism profiles over the TSS reveal additional regulatory complexity during transcriptional initiation and elongation, suggesting that TF binding specificity directly affects RNA Pol II initiation and elongation dynamics. IPs for the elongation factors Spt6 and Ssrp1 show strong and broad enrichment of Pol II, in agreement with ChIP-seq densities for both factors [13]. By contrast, TFIID-bound RNA Pol II displays a sharp signal centred on the TSS, whereas the Mediator (Med14) IP yields no nascent RNA, as there is minimal interaction between Mediator and RNA Pol II under the conditions used (Figure 2(a)). Similar RNA Pol II patterns were confirmed at the single-gene level (Figure 2(b)). To test more systematically how different transcription factors influence RNA Pol II initiation and elongation, we asked whether different NET-prism libraries provide improved resolution of RNA Pol II distribution patterns.
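The aligned and averaged TSS profiles described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes per-base nascent RNA 3ʹ-end counts are already held in memory as NumPy arrays keyed by (chromosome, strand), and the function name `tss_metagene` and the default window sizes are hypothetical.

```python
import numpy as np

def tss_metagene(coverage, tss_list, upstream=500, downstream=1500):
    """Average strand-specific NET-seq/prism signal around TSSs.

    coverage: dict mapping (chrom, strand) -> 1D array of per-base 3'-end counts
    tss_list: iterable of (chrom, tss_position, gene_strand), 0-based positions
    Returns (sense_profile, antisense_profile), each of length
    upstream + downstream + 1 and oriented upstream -> downstream of the gene.
    """
    width = upstream + downstream + 1
    sense_rows, anti_rows = [], []
    for chrom, tss, strand in tss_list:
        anti = "-" if strand == "+" else "+"
        if strand == "+":
            lo, hi = tss - upstream, tss + downstream + 1
        else:  # on the minus strand, "upstream" means higher coordinates
            lo, hi = tss - downstream, tss + upstream + 1
        track = coverage[(chrom, strand)]
        if lo < 0 or hi > len(track):
            continue  # skip windows running off the chromosome
        s = track[lo:hi]
        a = coverage[(chrom, anti)][lo:hi]
        if strand == "-":
            s, a = s[::-1], a[::-1]  # flip so every window reads 5' -> 3' of the gene
        sense_rows.append(s)
        anti_rows.append(a)
    if not sense_rows:
        return np.zeros(width), np.zeros(width)
    return np.mean(sense_rows, axis=0), np.mean(anti_rows, axis=0)
```

Sense and antisense tracks are kept separate so that divergent (antisense) transcription upstream of the TSS remains visible in the averaged profile.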
We calculated the travelling ratio (TR) in the sense direction, defined as the density of RNA Pol II over the promoter (−30 to +250 bp around the TSS) relative to that over the gene body (+300 bp downstream of the TSS to 200 bp upstream of the TES). All NET-prism libraries exhibited different TRs, indicating different pause-release dynamics of Pol II when bound by different TFs (Figure 2(c)). These data suggest that NET-prism can resolve the mechanistic and dynamic interplay between transcription factors and active RNA Pol II.
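A minimal sketch of the sense-strand travelling ratio defined above, assuming that "density" means counts per bp over each region (coordinates inclusive) and that per-base sense-strand 3ʹ-end counts are available as a NumPy array; the function name and argument layout are hypothetical.

```python
import numpy as np

def travelling_ratio(sense_counts, tss, tes, strand,
                     promoter=(-30, 250), body_start=300, body_end=-200):
    """Promoter density divided by gene-body density (sense strand only).

    sense_counts: 1D array of per-base sense-strand 3'-end counts for one chromosome
    tss, tes: 0-based TSS and TES coordinates
    promoter: offsets relative to the TSS in transcript direction (-30..+250 bp)
    body_start: offset from the TSS where the gene body starts (+300 bp)
    body_end: offset from the TES where the gene body ends (-200 bp, i.e. 200 bp upstream)
    """
    if abs(tes - tss) < body_start - body_end:
        raise ValueError("gene too short for the gene-body window")

    def to_genomic(offset, anchor):
        # transcript-relative offset -> genomic coordinate
        return anchor + offset if strand == "+" else anchor - offset

    def density(a, b):
        lo, hi = sorted((a, b))
        return sense_counts[lo:hi + 1].sum() / (hi - lo + 1)  # counts per bp

    prom = density(to_genomic(promoter[0], tss), to_genomic(promoter[1], tss))
    body = density(to_genomic(body_start, tss), to_genomic(body_end, tes))
    return prom / body if body > 0 else float("inf")
```

Higher values indicate that Pol II accumulates near the promoter relative to the gene body, which is how differences in pause release between libraries would show up.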
Interestingly, because all three factors examined here (Spt6, Ssrp1 and TBP, against which the TFIID antibody was raised) also interact with RNA Pol I and III, we asked whether nascent transcripts from these two polymerases would also be recovered. Indeed, all three IPs also pull down nascent transcripts generated by RNA Pol I and Pol III (Supplementary Figure 2C), suggesting that NET-prism might be an ideal tool for investigating all three RNA polymerases.

Sequential NET-prism confirms that nascent RNA stems from direct interaction between active RNA Pol II and Ssrp1
Using western blotting, we showed that NET-prism conditions allow co-purification of RNA Pol II in IPs against transcription factors (Figure 1(c)). However, at least one of these factors, Ssrp1, has been reported to bind RNA [14][15][16]. We therefore tested more rigorously whether the recovered nascent RNA is specifically associated with RNA Pol II rather than bound directly by Ssrp1. To address this, we performed a sequential IP as part of NET-prism, as outlined in Figure 3(a).
RNA Pol II was first immunoprecipitated using an anti-CTD antibody and then competitively eluted with an excess of CTD peptide (the same peptide used to generate the α-CTD antibody). The eluate served as input for a second IP with an anti-Ssrp1 antibody to capture exclusively Ssrp1-bound Pol II complexes, and the isolated nascent RNA was used for library generation. Importantly, single and sequential IPs were highly similar both in metagene profiles (Figure 3(b)) and at the single-gene level (Figure 3(c)), strongly suggesting that NET-prism captures nascent RNA bound by RNA polymerases rather than RNA bound directly by TFs.

NET-prism reveals high resolution Pol II pausing at intron-exon boundaries
Transcriptional elongation rates can affect splicing outcomes, suggesting that transcription and splicing are tightly coupled [17,18]. Data generated by human NET-seq, mNET-seq and PRO-seq are consistent with this kinetic model of splicing regulation [9,11,12]. While mNET-seq has already implicated different RNA Pol II variants in distinct roles during splicing [12], it is not known whether transcription (elongation) factors facilitate RNA Pol II pausing at splice sites. We also reasoned that NET-prism might be an ideal tool to dissect splicing factor-RNA Pol II interactions at splice sites. We therefore generated an additional NET-prism library for Splicing factor 1 (Sf1) and included it in our analysis of splicing dynamics. Because splicing intermediates are known NET-seq contaminants owing to their 3ʹ-OH groups [9], we removed them to avoid bias. We then assessed total RNA Pol II (NET-seq [19]) and NET-prism data for Ssrp1, Spt6, TFIID and Sf1 over intron-exon boundaries. As in human cells [9], total RNA Pol II in mouse ES cells showed increased pausing at exon boundaries (Figure 4(a), Total Pol II). Among the NET-prism datasets, only Sf1 exhibited similar pausing at exon boundaries (Figure 3(a) & Supplementary Figure 4A), and components of the PIC were not associated with Pol II pausing over splice sites (Supplementary Figure 4B). Notably, NET-prism libraries displayed higher Pol II density over exons than over introns, suggesting that transcriptional elongation is slower across exons in mouse ES cells (Figure 4(b,c)). Moreover, the Sf1-Pol II interaction clearly marks exons, indicating the specificity of the approach. Taken together, these results support the kinetic model of transcription-splicing coupling.
Our data, in combination with previously published results, therefore suggest that Pol II variants and elongation factors contribute differently to the coupling of transcription and splicing, and that NET-prism might represent an ideal tool to address this at high resolution.
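The exon-versus-intron density comparison can be sketched as below. It assumes splicing intermediates have already been removed, that introns are simply the gaps between consecutive annotated exons of one transcript, and that density is counts per bp; `exon_intron_density` is a hypothetical helper, not part of the published analysis.

```python
import numpy as np

def exon_intron_density(counts, exons):
    """Mean per-base NET-prism density over the exons and introns of one gene.

    counts: 1D array of per-base sense-strand 3'-end counts
    exons: list of (start, end) 0-based, end-exclusive exon intervals
    Returns (exon_density, intron_density) in counts per bp.
    """
    exons = sorted(exons)
    # introns = gaps between consecutive exons
    introns = [(prev_end, next_start)
               for (_, prev_end), (next_start, _) in zip(exons, exons[1:])
               if next_start > prev_end]

    def density(intervals):
        total = sum(counts[s:e].sum() for s, e in intervals)
        length = sum(e - s for s, e in intervals)
        return total / length if length else float("nan")

    return density(exons), density(introns)
```

Density is orientation-independent, so the same function works for genes on either strand as long as `counts` comes from the sense track.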

NET-prism reveals diverse transcriptional dynamics at enhancers
Enhancers and super-enhancers play a prominent role in controlling gene expression programs essential for cell identity across many mammalian cell types [20][21][22]. Enhancer RNAs (eRNAs) are produced bidirectionally, governed by distinctive patterns of chromatin accessibility [23], but it is not well characterised whether the same rules of initiation and elongation apply at enhancers as at promoters. We therefore extended our analysis to high-resolution Pol II stalling at distal enhancers and super-enhancers using NET-prism. The highest correlations were observed between total Pol II and Ssrp1, for both distal enhancers and super-enhancers (Figure 5(a)). Total Pol II and TFs exhibited significantly higher ChIP-seq density over super-enhancers than over distal enhancers. Concomitantly, NET-prism confirmed increased transcriptional activity over super-enhancers, suggesting that TF density is proportional to the degree of Pol II recruitment (Figure 5(b)). Strikingly, both metaplot profiling (Supplementary Figure 5) and single-enhancer interrogation (Figure 5(c)) of NET-prism signal exposed distinctive topographic Pol II stalling: Ssrp1 displayed patterns resembling transcriptional initiation, whereas Spt6 showed a trail reminiscent of transcriptional elongation. Moreover, the TFIID-associated signal also supports, to some degree, transcriptional initiation at enhancers (Figure 5(c)).
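The correlation method behind Figure 5(a) is not specified in the text. The sketch below assumes Pearson correlation of log2(count + 1) per-enhancer signal between libraries, which is one common choice; both function names are hypothetical.

```python
import numpy as np

def library_correlations(signal):
    """Pairwise Pearson correlation of log2(count + 1) signal between libraries.

    signal: dict mapping library name -> per-enhancer counts (same enhancer order)
    Returns (sorted library names, correlation matrix).
    """
    names = sorted(signal)
    mat = np.vstack([np.asarray(signal[n], dtype=float) for n in names])
    return names, np.corrcoef(np.log2(mat + 1))

def most_correlated_pair(names, corr):
    """Return (name_a, name_b, r) for the highest off-diagonal correlation."""
    n = len(names)
    best = max(((i, j) for i in range(n) for j in range(i + 1, n)),
               key=lambda ij: corr[ij])
    return names[best[0]], names[best[1]], corr[best]
```

Running the same functions separately on distal-enhancer and super-enhancer counts would give one matrix per enhancer class.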

Discussion
Here, we have developed a new approach to accurately assess transcriptional topography at high resolution. NET-prism allows direct, strand-specific investigation of the transcriptional landscape at single-nucleotide resolution for any protein of interest in complex with RNA Pol II. Its robustness enables deeper insight into the interplay of transcriptional mechanisms conferred by different Pol II variants and by proteins bound to Pol II. The comprehensive Pol II-protein interactome provided here (Supplementary Table 1) facilitates the choice of protein of interest when applying NET-prism. In addition, given suitable RNA polymerase inhibitors and antibodies, NET-prism can be extended to specifically interrogate nascent transcription by RNA Pol I or Pol III.
We propose that NET-prism will be an ideal tool to investigate interactions of transcription and elongation factors with actively transcribing RNA polymerase at single-nucleotide resolution and with strand specificity. An analogous approach, TEF-seq, a variant of the yeast NET-seq protocol [6], was previously developed in yeast [7] and revealed distinctive Pol II patterns when bound by different elongation factors (Paf1, Spt6, Spt16). Similarly, NET-prism exposes a distinct Pol II signal for every immunoprecipitated TF, implying different dynamics for TF-associated RNA Pol II.
Moreover, our study yields a global picture of how transcriptional elongation is affected at splice sites, and NET-prism might shed light on unresolved questions surrounding splicing catalysis. The idea that transcriptional elongation influences alternative splicing arises from two models: the recruitment model (differential recruitment of splicing factors) and the kinetic model (Pol II pausing determines the timing with which splice sites are presented) [5,18]. Similar to other high-resolution approaches [9,11,12], we show that splicing is associated with higher Pol II density over exons and strong pauses at both the 3ʹ and 5ʹ splice sites (SS), consistent with the kinetic model.
It is important to recognise, though, that NET-prism, like ChIP-seq, relies heavily on the quality of the antibody used. Antibody cross-reactivity might result in non-specific binding and thus in artefactual RNA Pol II stalling patterns. The choice of a highly specific antibody for the protein of interest is therefore essential to obtain unique RNA Pol II footprints.
As with human NET-seq [9], we expect NET-prism to be straightforward to adapt to any higher eukaryotic cell type. The IP step makes NET-prism practical for studying a range of Pol II-associated factors, improving our understanding of transcriptional elongation and its connection to transcript fate. Combining NET-prism with a high-resolution ChIP-seq technique, such as ChIP-nexus [24], could illuminate how in vivo binding of transcription or splicing factors correlates with transcriptional activity across different cell states and conditions. NET-prism should therefore become a valuable tool for unravelling transcriptional regulatory complexity.

Nuclear extraction and DNase treatment
A detailed protocol is available in the Supplementary Information ('NET-prism protocol'). Briefly, 10⁸ ES cells were used for each IP. It is important to split the cells into five batches of 2 × 10⁷ cells each for nuclei extraction. All extraction steps are performed on ice to avoid degradation of nascent RNA. Each batch of 2 × 10⁷ cells was treated with 200 μl of cytoplasmic lysis buffer (0.15% (vol/vol) NP-40, 10 mM Tris-HCl (pH 7.0), 150 mM NaCl, 25 µM α-amanitin (Epichem), 10 U RNasin Ribonuclease inhibitor (Promega) and 1× protease inhibitor mix (Thermo)) for 5 min on ice. The lysate was layered on 500 μl of sucrose buffer (10 mM Tris-HCl (pH 7.0), 150 mM NaCl, 25% (wt/vol) sucrose, 25 µM α-amanitin, 20 U RNasin Ribonuclease inhibitor and 1× protease inhibitor mix) and spun for 5 min at 16,000g (4°C). The supernatant was carefully removed, and nuclei were resuspended in 100 μl of DNase digestion buffer (1× DNase buffer (NEB), 25 µM α-amanitin, 20 U RNasin Ribonuclease inhibitor and 1× protease inhibitor mix) and treated with 100 U of DNase I (NEB) for 20 min on ice. Nuclei must be fully resuspended in the DNase digestion buffer. Nuclei that fail to resuspend indicate overly harsh cytoplasmic lysis; in this case, reduce the volume of cytoplasmic lysis buffer.
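For planning, the per-batch amounts above scale linearly with the number of IPs. This hypothetical helper only restates the quantities given in the text (five batches of 2 × 10⁷ cells per IP) and deliberately adds no pipetting excess or dead volume, which in practice you would add on top.

```python
import math

# Per-batch amounts from the text; one batch = 2 x 10^7 ES cells
CELLS_PER_BATCH = 2e7
PER_BATCH = {
    "cytoplasmic lysis buffer (ul)": 200,
    "sucrose buffer (ul)": 500,
    "DNase digestion buffer (ul)": 100,
    "DNase I (U)": 100,
}

def plan_extraction(n_ips, cells_per_ip=1e8):
    """Return the total number of extraction batches and total reagent amounts."""
    batches_per_ip = math.ceil(cells_per_ip / CELLS_PER_BATCH)  # 5 for 10^8 cells
    n_batches = n_ips * batches_per_ip
    totals = {reagent: amount * n_batches for reagent, amount in PER_BATCH.items()}
    return n_batches, totals
```

For example, two IPs correspond to ten batches, i.e. 2 ml of cytoplasmic lysis buffer and 1000 U of DNase I in total.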

Chromatin Immunoprecipitation (IP) and nascent RNA extraction
A detailed protocol is available in the Supplementary Information ('NET-prism protocol'). Briefly, the combined supernatants from the previous step were diluted to a final 1/10 dilution in IP buffer and incubated for 2 hours at 4°C. For the sequential IP, a total Pol II antibody was used for 2 hours, followed by elution twice with 100 μl of 2.5 mM CTD peptide (synthesized by Peptide Specialty Laboratories, Heidelberg, Germany; identical to Abcam ab17564) for 30 min. The eluate was then incubated with Ssrp1 antibody-coated beads for an additional 2 hours. Beads were washed 4 times with 1 ml of IP buffer, and 700 μl of Qiazol (Qiagen) was added directly to the beads, followed by 140 μl of chloroform. Samples were centrifuged and the upper (aqueous) phase was ethanol precipitated (0.3 M NaOAc, 2 μl GlycoBlue). The concentration and size of nascent RNA were assessed by Nanodrop and TapeStation 2200, respectively. An IP from 10⁸ ES cells usually yields ~200-1000 ng of nascent RNA. Assessing RNA size is important for determining the fragmentation time during library preparation.

NET-prism library preparation
Two biological replicates were processed for each IP and library preparation. NET-prism libraries were prepared similarly to the human NET-seq protocol [25], with a few modifications. The random barcode was ligated overnight at 16°C to maximise ligation efficiency. Alkaline fragmentation of the ligated nascent RNA depends on the size of the RNA fragments obtained from each IP. IPs for Pol II S5ph, Pol II S2ph, Ssrp1 and Spt6 yielded large RNA fragments, so the ligated nascent RNA was fragmented until all RNA transcripts were within ~35-200 nucleotides. IPs for TFIID and Mediator yielded fragments < 200 nt, so no fragmentation was performed. Maximum recovery of ligated RNA and cDNA was achieved from 15% and 10% TBE-Urea gels (Invitrogen), respectively, by adding RNA recovery buffer (Zymo Research, R1070-1-10) to the excised gel slices and incubating at 70°C (1500 rpm) for 15 min. The gel slurry was passed through a Zymo-Spin IV Column (Zymo Research, C1007-50) and precipitated for subsequent library preparation steps. cDNA containing the 3ʹ-end sequences of a subset of mature, over-represented snRNAs, snoRNAs and rRNAs was specifically depleted using biotinylated DNA oligos (Mylonas et al.). Oligo-depleted circularised cDNA was amplified by PCR (9-12 cycles), and the double-stranded DNA was run on an 8% TBE gel. The final NET-seq library, running at ~150 bp, was excised and purified using the Zymoclean Gel DNA Recovery Kit (Zymo Research). Sample purity and concentration were assessed on a 2200 TapeStation, and libraries were sequenced on an Illumina HiSeq 2500 platform (Supplementary Table 2).

NET-prism analysis
All NET-prism fastq files were processed using custom Python scripts (https://github.com/BradnerLab/netseq) for alignment to the mm10 genome and for removal of PCR duplicates and reads arising from RT bias. Reads mapping exactly to the last nucleotide of each intron and exon (splicing intermediates) were also removed. The final NET-prism BAM files were converted to bigwig (1 bp bin), separated by strand and normalized to 1× sequencing depth using deepTools [26] (v 2.4) with '--Offset 1', in order to record the position of the 5ʹ end of the sequencing read, which corresponds to the 3ʹ end of the nascent RNA. NET-seq/prism tags with the same or opposite orientation as the TSS were assigned as 'sense' and 'antisense' tags, respectively. Promoter-proximal regions were carefully selected to ensure minimal contamination from transcription arising from other transcription units: genes overlapping another gene within 2.5 kb upstream of the TSS were removed from the analysis. For the NET-seq/prism metaplots, genes underwent several rounds of k-means clustering to filter regions; within a 2 kb window around the TSS, rows displaying very high Pol II occupancy within a < 100 bp region were removed, as they represent non-annotated short non-coding RNAs. For Figure 1(b), genes with an RPKM > 1 for Total Pol II (n = 6,107) were used for metaplot profiling. Average Pol II occupancy profiles were visualised using R (v 3.3.0).
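The read-level steps above (3ʹ-end assignment, PCR-duplicate collapsing and splicing-intermediate removal) can be sketched as follows. This is not the BradnerLab/netseq code. It assumes, as in NET-seq, that the read is the reverse complement of the nascent RNA 3ʹ end, so the RNA lies on the strand opposite the alignment; it uses a hypothetical `Alignment` record instead of a BAM reader, treats reads sharing the same 3ʹ end and random barcode as PCR duplicates, and does not model the RT-bias filter.

```python
from collections import namedtuple

# Hypothetical minimal alignment record (0-based, end-exclusive); a real
# pipeline would read these fields from a BAM file.
Alignment = namedtuple("Alignment", "chrom strand start end barcode")

def rna_3prime_end(aln):
    """Return (chrom, position, rna_strand) of the nascent RNA 3' end.

    Assumes the read's 5' end marks the RNA 3' end and that the RNA is on the
    strand opposite to the read alignment.
    """
    if aln.strand == "+":
        return aln.chrom, aln.start, "-"
    return aln.chrom, aln.end - 1, "+"

def process_reads(alignments, splice_site_ends):
    """Count nascent RNA 3' ends after duplicate and splicing-intermediate removal.

    splice_site_ends: set of (chrom, position, strand) marking the last nucleotide
    of each intron and exon on the transcript strand (hypothetical input).
    Returns a dict (chrom, position, strand) -> count.
    """
    seen = set()
    counts = {}
    for aln in alignments:
        site = rna_3prime_end(aln)
        key = site + (aln.barcode,)
        if key in seen:
            continue  # same 3' end and same random barcode: PCR duplicate
        seen.add(key)
        if site in splice_site_ends:
            continue  # 3' end at an exon/intron end: likely splicing intermediate
        counts[site] = counts.get(site, 0) + 1
    return counts
```

Keying the counts by RNA strand, rather than read strand, means that the resulting tracks can be assigned directly as 'sense' or 'antisense' relative to each TSS.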

Travelling ratio & termination index
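The travelling ratio is calculated via:

$$\mathrm{TR} = \frac{N_{\mathrm{promoter}} / L_{\mathrm{promoter}}}{N_{\mathrm{gene\ body}} / L_{\mathrm{gene\ body}}}$$

where N is the number of sense-strand NET-seq/prism 3ʹ-end counts and L is the length in bp of the promoter region (−30 to +250 bp around the TSS) and of the gene-body region (+300 bp downstream of the TSS to 200 bp upstream of the TES).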

ChIP-seq data processing
All ChIP-seq fastq files were aligned to the mm10 genome using Bowtie2 (v 2.2.6) with default parameters [27]. All BAM files were converted to bigwig (10 bp bin) and normalised to 1× sequencing depth using deepTools (v 2.4) [26]. Duplicate reads were removed. Blacklisted mm9 coordinates were converted to mm10 using the UCSC LiftOver tool and removed from the analysis. Average binding profiles were visualised using R (v 3.3.0).
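Removing blacklisted regions amounts to an interval-overlap test. The sketch below assumes the blacklist has already been lifted over to mm10, uses 0-based, end-exclusive coordinates, and merges overlapping entries so that a single binary search per query suffices; both function names are hypothetical.

```python
import bisect

def build_blacklist_index(regions):
    """Index blacklist regions per chromosome as sorted, merged (starts, ends) lists.

    regions: iterable of (chrom, start, end), 0-based, end-exclusive.
    """
    index = {}
    for chrom, start, end in sorted(regions):
        starts, ends = index.setdefault(chrom, ([], []))
        if starts and start <= ends[-1]:
            ends[-1] = max(ends[-1], end)  # merge overlapping or adjacent regions
        else:
            starts.append(start)
            ends.append(end)
    return index

def in_blacklist(index, chrom, start, end):
    """True if the interval [start, end) overlaps any blacklisted region on chrom."""
    if chrom not in index:
        return False
    starts, ends = index[chrom]
    i = bisect.bisect_left(starts, end) - 1  # last merged region starting before `end`
    return i >= 0 and ends[i] > start
```

Because merged regions do not overlap, only the last region starting before the query end can overlap the query, which is why one lookup is enough.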

Mass spectrometry sample preparation
Independent ES cell cultures were grown in 10 cm dishes. Per IP, 20 × 10⁷ cells were extracted and lysed, and nuclei were treated with DNase I as described above. The supernatant was incubated for 2 hours with a total Pol II antibody (ab817, Abcam) or IgG (Cell Signalling) at 4°C. In total, four samples were prepared for each IP (total Pol II, IgG). After thorough washing of the beads with IP buffer, samples were digested overnight at 37°C in Tris pH 8.8 with 300 ng Trypsin Gold (Promega). Peptides were desalted using StageTips [28] and dried. The peptides were resuspended in 0.1% formic acid and analysed by liquid chromatography-tandem mass spectrometry (LC-MS/MS).

LC-MS/MS analysis
Peptides were separated on a 25 cm, 75 μm internal diameter PicoFrit analytical column (New Objective) packed with 1.9 μm ReproSil-Pur 120 C18-AQ media (Dr. Maisch) using an EASY-nLC 1200 (Thermo Fisher Scientific). The column was maintained at 50°C. Buffers A and B were 0.1% formic acid in water and 0.1% formic acid in 80% acetonitrile, respectively. Peptides were separated on a segmented gradient from 6% to 31% buffer B over 45 min and from 31% to 50% buffer B over 5 min at 200 nl/min. Eluting peptides were analyzed on a Q Exactive HF mass spectrometer (Thermo Fisher Scientific). Peptide precursor m/z measurements were carried out at 60,000 resolution in the 300 to 1800 m/z range. The ten most intense precursors with charge states from 2 to 7 were selected for HCD fragmentation using 25% normalized collision energy. The m/z values of the peptide fragments were measured at a resolution of 30,000 using a minimum AGC target of 8e3 and a 55 ms maximum injection time. After fragmentation, precursors were placed on a dynamic exclusion list for 45 s.
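The segmented gradient can be written as a piecewise-linear function of time. This sketch assumes the gradient starts at t = 0 and ignores any loading, washing or re-equilibration steps not described above. Because buffer B contains 80% acetonitrile and buffer A none, the effective acetonitrile fraction is 0.8 × %B.

```python
def percent_b(t_min):
    """%B at time t (min) along the gradient: 6->31% B over 45 min, then 31->50% B over 5 min."""
    segments = [(0.0, 45.0, 6.0, 31.0), (45.0, 50.0, 31.0, 50.0)]
    for t0, t1, b0, b1 in segments:
        if t0 <= t_min <= t1:
            return b0 + (b1 - b0) * (t_min - t0) / (t1 - t0)
    raise ValueError("time outside the separating gradient")

def percent_acetonitrile(t_min, acn_in_b=80.0):
    """Effective % acetonitrile: buffer B is 80% acetonitrile, buffer A contains none."""
    return percent_b(t_min) * acn_in_b / 100.0

def gradient_volume_ul(flow_nl_per_min=200.0, minutes=50.0):
    """Total mobile-phase volume delivered during the gradient, in microlitres."""
    return flow_nl_per_min * minutes / 1000.0
```

At the end of the 45 min segment the column sees 31% B, i.e. about 24.8% acetonitrile, and the full 50 min gradient delivers about 10 µl at 200 nl/min.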

Protein identification and quantification
The raw data were analyzed with MaxQuant version 1.6.0.13 [29] using the integrated Andromeda search engine [30]. Peptide fragmentation spectra were searched against the canonical and isoform sequences of the mouse reference proteome (proteome ID UP000000589, downloaded December 2017 from UniProt). Methionine oxidation and protein N-terminal acetylation were set as variable modifications, and cysteine carbamidomethylation as a fixed modification. The digestion parameters were set to 'specific' and 'Trypsin/P'. The minimum number of peptides and razor peptides for protein identification was 1, and the minimum number of unique peptides was 0. Protein identification was performed at a peptide-spectrum match and protein false discovery rate of 0.01. The 'second peptide' option was enabled. Identifications were transferred between raw files using the 'Match between runs' option. Label-free quantification (LFQ) [31] was performed with an LFQ minimum ratio count of 2. LFQ intensities were filtered for at least three valid values in at least one group, and missing values were imputed from a normal distribution with a width of 0.3 and a downshift of 1.8. The median of the log2 LFQ intensities for the RNA Pol II IPs was used to impute missing values in the IgG IPs. Differential abundance analysis was performed using limma [32] (Supplementary Table 1).
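The valid-value filter and downshifted imputation can be sketched in NumPy. This assumes the Perseus-style convention in which width and downshift are expressed in units of each sample's standard deviation (the text does not name the tool), uses the population standard deviation, and omits both the median-based IgG imputation and the limma step, which runs in R; function names are hypothetical.

```python
import numpy as np

def filter_valid(log_lfq, groups, min_valid=3):
    """Keep proteins with >= min_valid non-missing values in at least one group.

    log_lfq: 2D array (proteins x samples) of log2 LFQ intensities, NaN = missing
    groups: list of column-index lists, e.g. [[0, 1, 2, 3], [4, 5, 6, 7]]
    Returns (filtered matrix, boolean keep mask).
    """
    valid = ~np.isnan(log_lfq)
    keep = np.zeros(log_lfq.shape[0], dtype=bool)
    for cols in groups:
        keep |= valid[:, cols].sum(axis=1) >= min_valid
    return log_lfq[keep], keep

def impute_downshift(log_lfq, width=0.3, downshift=1.8, seed=0):
    """Replace NaNs per sample with draws from N(mean - downshift*sd, (width*sd)^2).

    mean and sd are computed from the observed values of that sample (column).
    The input array is not modified.
    """
    rng = np.random.default_rng(seed)
    out = log_lfq.copy()
    for j in range(out.shape[1]):
        col = out[:, j]  # view into `out`, so assignments below update it
        missing = np.isnan(col)
        observed = col[~missing]
        mu, sd = observed.mean(), observed.std()
        col[missing] = rng.normal(mu - downshift * sd, width * sd, missing.sum())
    return out
```

Shifting the draw below the observed distribution encodes the assumption that values are missing mainly because the protein was below the detection limit, as expected for many proteins in IgG controls.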

Enhancers and super-enhancers
BED files containing typical enhancer and super-enhancer coordinates in mESCs were downloaded from Whyte et al. [22]. Distal enhancers were defined as regions not overlapping any annotated gene within a 2000 bp window. Only distal enhancers with an RPKM > 1 for Pol II were kept for subsequent analyses.
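The enhancer selection can be sketched as below. The text does not define the 2000 bp window precisely; this sketch assumes each enhancer is padded by 2000 bp on both sides before testing for overlap with gene annotations. It uses the standard RPKM definition (reads × 10⁹ / (region length in bp × total mapped reads)) and takes per-enhancer Pol II counts as a precomputed, hypothetical input.

```python
def rpkm(read_count, region_length_bp, total_mapped_reads):
    """Reads per kilobase of region per million mapped reads."""
    return read_count * 1e9 / (region_length_bp * total_mapped_reads)

def distal_enhancers(enhancers, genes, window=2000):
    """Keep enhancers with no annotated gene within `window` bp on either side.

    enhancers, genes: lists of (chrom, start, end), 0-based, end-exclusive.
    """
    kept = []
    for chrom, start, end in enhancers:
        near_gene = any(g_chrom == chrom and g_start < end + window and g_end > start - window
                        for g_chrom, g_start, g_end in genes)
        if not near_gene:
            kept.append((chrom, start, end))
    return kept

def select_active_distal(enhancers, genes, pol2_counts, total_mapped_reads, min_rpkm=1.0):
    """Distal enhancers whose Pol II RPKM exceeds min_rpkm.

    pol2_counts: dict mapping enhancer tuple -> Pol II read count (hypothetical input).
    """
    return [e for e in distal_enhancers(enhancers, genes)
            if rpkm(pol2_counts.get(e, 0), e[2] - e[1], total_mapped_reads) > min_rpkm]
```

The gene scan is linear per enhancer, which is fine for a sketch; for genome-wide annotations a sorted or interval-tree index would be faster.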