Non-coding RNA polymerases that silence transposable elements and reprogram gene expression in plants

ABSTRACT Multisubunit RNA polymerase (Pol) complexes are the core machinery for gene expression in eukaryotes. The enzymes Pol I, Pol II and Pol III transcribe distinct subsets of nuclear genes. This family of nuclear RNA polymerases expanded in terrestrial plants by the duplication of Pol II subunit genes. Two Pol II-related enzymes, Pol IV and Pol V, are highly specialized in the production of regulatory, non-coding RNAs. Pol IV and Pol V are the central players of RNA-directed DNA methylation (RdDM), an RNA interference pathway that represses transposable elements (TEs) and selected genes. Genetic and biochemical analyses of Pol IV/V subunits are now revealing how these enzymes evolved from ancestral Pol II to sustain non-coding RNA biogenesis in silent chromatin. Intriguingly, Pol IV-RdDM regulates genes that influence flowering time, reproductive development, stress responses and plant–pathogen interactions. Pol IV target genes vary among closely related taxa, indicating that these regulatory circuits are often species-specific. Data from crops like maize, rice, tomato and Brassica rapa suggest that dynamic repositioning of TEs, accompanied by Pol IV targeting to TE-proximal genes, leads to the reprogramming of plant gene expression over short evolutionary timescales.


Transposable element silencing: DNA methylation meets RNA interference
Early DNA association studies revealed that eukaryotic genomes are full of repetitive sequences, hinting that most chromosomal DNA does not code for proteins [1,2]. Advances in molecular genetics and DNA sequencing revealed that most non-coding DNA consists of transposable elements (TEs), mobile genetic parasites that excise or copy themselves to then insert elsewhere in the genome. TEs represent 40% of the human genome, 20% of the Arabidopsis thaliana genome and ~80% of crop genomes such as Zea mays (maize), Hordeum vulgare (barley) and Triticum aestivum (wheat) [3][4][5][6]. In the course of evolution, TEs can generate useful genetic diversity [7,8], but on shorter timescales TE insertions cause deleterious mutations and genomic instability [9,10].
Animal and plant cells express elaborate molecular surveillance systems to recognize and silence TEs. A common mechanism of TE silencing involves dimethylation of histone H3 at lysine 9 (H3K9me2) along with methylation of cytosines in DNA, a chromatin state typically refractory to RNA polymerase II (Pol II) transcription [5]. Noncoding RNAs guide such repressive chromatin marks to specific TE targets. In metazoans, chromatin-level TE surveillance is driven by ~26-32 nt Piwi-interacting RNAs (piRNAs), though many animal taxa do not methylate their DNA [11]. In plants, the enzymatic machinery for piRNAs is absent, but an analogous pathway mediated by ~24 nt small interfering RNAs (siRNAs) triggers RNA-directed DNA methylation (RdDM). RdDM is a functionally specialized, nuclear RNA interference pathway that evolved in terrestrial plants.
Plants express enzymes that methylate cytosines in three sequence contexts referred to as CG, CHG, and CHH sites (where H is A, C, or T). The enzyme METHYLTRANSFERASE 1 (MET1) is a maintenance methyltransferase that copies CG methylation from parent to daughter strands during plant DNA replication [12], like its mammalian ortholog, DNMT1. The plant CHROMOMETHYLASES, CMT2 and CMT3, maintain DNA methylation by "reading" histone methylation marks and catalyzing CHH or CHG methylation in adjacent DNA [13][14][15]. Finally, de novo methylation is mediated by either DOMAINS REARRANGED METHYLTRANSFERASE (DRM; CG, CHG and CHH sites) or DNA METHYLTRANSFERASE 3 enzymes in plants (DNMT3; CG and CHH sites). In the bryophyte Physcomitrella patens, PpDNMT3b is the major de novo methyltransferase at CG and CHH sites, with PpDRMs playing only a minor role [16]. By contrast, DRM activity is crucial for de novo methylation in flowering plants, because DNMT3 is absent in these species [16][17][18]. A. thaliana has two known DRM genes, DRM1 and DRM2; the double mutant drm1 drm2 abolishes RNA-directed DNA methylation [19]. The higher expression of DRM2 compared to DRM1, and the fact that drm2 single mutants recapitulate the late-flowering phenotype of drm1 drm2 double mutants, suggest that DRM2 is the key de novo methyltransferase in A. thaliana [20].
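The three sequence contexts defined above can be made concrete with a minimal sketch. The function name `cytosine_context` and the "undetermined" label are illustrative assumptions, not terms from the cited literature. The sketch assumes a single 5'-to-3' strand containing only A, C, G and T.

```python
def cytosine_context(seq: str, i: int) -> str:
    """Classify the cytosine at 0-based index i as CG, CHG or CHH.

    H stands for A, C or T, i.e. any base other than G.
    Assumes seq is one 5'->3' strand made only of A, C, G, T.
    """
    seq = seq.upper()
    if seq[i] != "C":
        raise ValueError("position i is not a cytosine")
    downstream = seq[i + 1 : i + 3]  # up to two bases 3' of the cytosine
    if len(downstream) >= 1 and downstream[0] == "G":
        return "CG"
    if len(downstream) < 2:
        return "undetermined"  # too close to the end of the sequence
    if downstream[1] == "G":
        return "CHG"
    return "CHH"


if __name__ == "__main__":
    for sequence, index in [("ACGT", 1), ("ACAGT", 1), ("ACTTA", 1)]:
        print(sequence, index, cytosine_context(sequence, index))
```

Methylation on the opposite strand would be assessed by applying the same function to the reverse complement.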
Four pioneering studies in 2005 reported the discovery of Pol IV and Pol V as a key specialized transcription machinery for RdDM [21][22][23][24]. Pol IV and Pol V are enzymes that assemble from unique combinations of Pol II-like subunits that evolved ~470 million years ago in the terrestrial plant lineage [25][26][27][28]. Pol IV and Pol V transcription activities converge to ensure that DRM2 methylates appropriate targets. Pol II mostly transcribes genes in pursuit of mRNA biogenesis. By contrast, Pol IV and Pol V transcribe TE loci, intergenic repeats and the promoter regions of certain genes. The consensus in the field is that Pol IV synthesizes precursors for siRNAs that guide RdDM [29,30], whereas Pol V transcribes loci into non-coding scaffold RNAs that are critical for target recognition [31].
In this review, we first describe how Pol IV and Pol V orchestrate TE surveillance, which is a central function of RdDM in plants. Then, we survey what is known about Pol IV-specific subunits, their internal domain structure, unique protein partners and emerging findings about what brings the Pol IV pathway together in the nucleus. Finally, we present a survey of novel biological functions of Pol IV-RdDM that have been discovered in recent years.

Pol IV and Pol V non-coding transcripts guide de novo DNA methylation
A combination of genetic and biochemical experiments has shown that Pol IV non-coding RNA transcripts initiate siRNA biogenesis for RdDM (Figure 1a) [29,32,33,34,35]. The Pol IV complex is physically coupled to the enzyme RNA-DEPENDENT RNA POLYMERASE 2 (RDR2), one of six different RDRs expressed in A. thaliana. The Pol IV-RDR2 partnership is one key difference between the RdDM pathway and other functionally distinct small RNA pathways [30,35,36]. This protein-protein interaction enables channeling of Pol IV primary transcripts to RDR2 for double-stranded RNA (dsRNA) synthesis in vivo and in vitro (Figure 1a) [29,33,35].
The Pol IV primary transcripts and RDR2's dsRNA products are not detectable by northern blotting of RNA from wild-type plants, nor are they seen in conventional RNA-seq [29,32]. This suggests that RDR2 products are very efficiently processed into 24 nt siRNAs by DICER-LIKE 3 (DCL3). Consistent with this, Pol IV-RDR2 products accumulate in vivo when Dicer processing is disrupted in dcl3 single mutant or in dcl2 dcl3 dcl4 triple mutant plants [29,32,34]. The RDR2 products are relatively short (~26-45 nt) both in vivo and in vitro, and have a 3' overhang of 1-2 nontemplated nucleotides, attributable to RDR2's terminal transferase activity [29,33]. These properties of Pol IV-RDR2 products have the logical consequence that DCL3 can dice each such dsRNA substrate only once to generate a single 24 nt siRNA duplex [29,32].
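The "diced only once" argument is simple arithmetic, which the following sketch spells out. It is a simplification that counts how many non-overlapping 24 nt duplexes fit into a precursor of a given length; the function name `max_dicing_events` is hypothetical, and overhangs and end-measuring details of DCL3 are ignored.

```python
SIRNA_LENGTH = 24  # nt, length of the siRNA duplexes produced by DCL3


def max_dicing_events(dsrna_length: int, sirna_length: int = SIRNA_LENGTH) -> int:
    """Number of non-overlapping siRNA-length duplexes that fit in a dsRNA."""
    return dsrna_length // sirna_length


# Precursor lengths reported for Pol IV-RDR2 products: ~26-45 nt.
counts = {n: max_dicing_events(n) for n in range(26, 46)}

if __name__ == "__main__":
    print(sorted(set(counts.values())))
```

Under this simplification, a precursor would need to reach 48 nt before a second 24 nt duplex could be produced.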
The short length of Pol IV-dependent RNAs is likely due to their unusual termination mechanism. Unlike termination by other RNA polymerases, Pol IV transcript termination does not rely upon specific signal sequences. Pol IV termination could, instead, be primarily determined by the geometry of its transcription bubble, because Pol IV is ineffective at displacing non-template DNA in transcription assays. According to this model, Pol IV transcribes single-stranded DNA in a conventional bubble (~18-25 bases for eukaryotic Pol II [37]), but then encounters base-paired DNA at the bubble's edge and only extends a further ~12-18 nt before terminating to release ~30-43 nt transcripts, as was observed in vitro [33], in close agreement with in vivo RNA-seq data [29].
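The predicted transcript length in this model is the sum of two intervals, which can be checked with a short sketch. The variable and function names are illustrative; the numbers are the approximate ranges quoted above.

```python
def add_ranges(a: tuple[int, int], b: tuple[int, int]) -> tuple[int, int]:
    """Add two (min, max) intervals end to end."""
    return (a[0] + b[0], a[1] + b[1])


bubble = (18, 25)      # nt transcribed on single-stranded DNA in the bubble
extension = (12, 18)   # nt added after meeting base-paired DNA at the bubble edge

transcript = add_ranges(bubble, extension)

if __name__ == "__main__":
    print(f"predicted transcript length: {transcript[0]}-{transcript[1]} nt")
```

The result, 30-43 nt, falls within the ~26-45 nt range reported for Pol IV-RDR2 products.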
After RDR2 synthesizes dsRNAs from Pol IV primary transcripts, DCL3 dices these dsRNAs into 24 nt siRNAs, the enzyme HEN1 catalyzes 2'-O-methylation at siRNA 3' ends, and the siRNAs are loaded onto ARGONAUTE4 (AGO4) (Figure 1a). In plants, the gene families encoding RNA interference factors have diversified: four to five genes encode DCL proteins and over 10 genes encode AGO proteins. The mechanism governing the specific function of DCL3 in processing dsRNA products of RDR2 is not clear, but a preference of DCL3 to dice short, 30-50 nt dsRNAs with a 5'-terminal adenine has been reported from in vitro assays [38]. The mature 24 nt siRNAs are primarily loaded onto AGO4 [39] and to lesser extents onto AGO6 and AGO9 [40][41][42]. The specificity of Pol IV-RDR2 derived 24 nt siRNAs for AGO4 is not fully understood, but the affinity of AGO4 for a 5'-terminal adenine corresponds to features of Pol IV-RDR2 derived siRNAs [43].
During the RdDM effector phase, siRNA sequence-specific DNA methylation depends on Pol V transcription of the target locus (Figure 1a) [31]. Pol V transcribes chromosomal loci into long non-coding RNAs that more closely match the DNA template sequence than Pol IV transcripts [29,44,45,46]. Nascent Pol V transcripts are scaffolds to which AGO4-siRNA complexes physically associate; logically, this could occur via basepairing of the AGO4-loaded siRNA guide to complementary transcripts [47]. In addition, the Pol V largest subunit (NRPE1) possesses a carboxy-terminal domain (CTD) that interacts with AGO4 [48]; the Pol V partner protein, SPT5L, also interacts with AGO4 [49]. The NRPE1 CTD and SPT5L, combined, provide independent functions that consolidate AGO4-Pol V association [50,51]. Formation of the AGO4-Pol V-SPT5L complex is thought to attract DRM2, which catalyzes DNA methylation [17].
How cycles of AGO4-siRNA-transcript tethering are coupled to DRM2 methylation is unclear, but it could involve cotranscriptional cleavage of Pol V transcripts by AGO4's slicer activity [44,52]. For a subset of targets, DRM2 methylation also depends on AGO4 interaction with three RNA binding proteins (IDN2, IDNL1 and IDNL2; not depicted in Figure 1) [53][54][55].
Figure 1 (caption, partially recovered). The Pol IV-RdDM pathway is initiated by recruitment of Pol IV to silent chromatin; in distal chromosomal regions this typically occurs via the dimethylated Histone 3 Lysine 9 (H3K9me2) reader, SHH1, which interacts with Pol IV through the chromatin remodelers CLSY1 or CLSY2. In pericentromeric regions, CLSY3 and CLSY4 are required for Pol IV recruitment and may interact with these DNA regions, directly or indirectly, through an as yet unidentified DNA methylation reader. (c) Pol V is recruited to chromosomal targets by a dedicated machinery, mostly different from the factors required for Pol IV transcription. SUVH2 and SUVH9 are SET and RING-associated (SRA) domain proteins thought to recruit Pol V to regions of methylated DNA. The DDR complex (DRD1, DMS3 and RDM1; not detailed here) serves as a bridge complex that mediates Pol V transcription at many, if not all, RdDM targets. Pol V interactions with the target DNA and chromatin are further consolidated by MORC6.
The partnership between Pol IV and Pol V steps creates an RdDM positive feedback loop [21,23] by amplifying the silent chromatin marks required for the recruitment of each RNA polymerase [56,57]. At most targets, the synthesis of high levels of 24 nt siRNAs depends on this full RdDM cycle, including the Pol V-AGO4-DRM2 effector step [23,52,58,59]. The importance of Pol IV-Pol V cooperation was directly tested by the artificial targeting of Pol IV and Pol V to the same locus [60]. Because Pol IV and Pol V are both needed for robust RdDM, a potential consequence of this cooperative mechanism is the prevention of ectopic silencing linked to DNA methylation spreading. Pol V transcription is prominent at the edges of RdDM targets, such as TE boundaries, which limits the action of the RdDM pathway while repressing Pol II transcription [45,61]. Together, these findings all illustrate the importance of recruiting Pol IV and Pol V to appropriate genomic loci.
Several factors have been identified that may recruit Pol IV to chromosomal targets (Figure 1b). Mass spectrometry found that CLASSY proteins, a subfamily of SWI2/SNF2-like ATPases related to chromatin remodelers, copurify with the Pol IV complex in A. thaliana and maize [36,62]. The biochemical activity of CLSY proteins has not been elucidated, but genetic screens have isolated clsy mutations that disrupt gene silencing in both these plant species [63][64][65]. The four A. thaliana proteins, CLSY1 through CLSY4, facilitate the association of Pol IV at about 90% of loci that give rise to 24 nt siRNAs [66]. Intriguingly, CLSY1 and CLSY2 mainly facilitate Pol IV association at distal loci in the A. thaliana chromosome arms, while CLSY3 and CLSY4 assure this function in dense pericentromeric heterochromatin [66].
CLSY1 and CLSY2 may provide protein-protein interactions that bridge Pol IV to its key partner protein, SAWADEE HOMEODOMAIN HOMOLOG 1 (SHH1) [36,66]. SHH1 would read repressive H3K9me2 marks at targets via its SAWADEE domain, recruiting CLSY1/2 and Pol IV to silent chromatin (Figure 1b, left-hand diagram) [56,67]. Supporting this model, SHH1's SAWADEE domain selectively interacts with H3K9me2 and unmethylated H3K4 on peptide arrays, the 24 nt siRNA clusters requiring SHH1 overlap with those requiring CLSY1/2, and Pol IV complex copurification with SHH1 depends on CLSY1/2 [56,66]. By contrast, biogenesis of 24 nt siRNA clusters at CLSY3 and CLSY4-dependent loci does not correlate with a reduction in H3K9me2 in mutants implicated in H3K9 methylation. Mutants defective in CG methylation do cause a loss of CLSY3 and CLSY4-dependent 24 nt siRNA clusters, though, suggesting that a DNA methylation reader is directly or indirectly involved [66]. No epigenetic readers for CLSY3/4-dependent guidance of Pol IV to pericentromeric regions have yet been identified (Figure 1b, right-hand diagram).
At the downstream effector step, two SU(VAR)3-9 homolog class proteins, SUVH2 and SUVH9, appear to recruit Pol V to genomic regions marked by DNA methylation (Figure 1c). SUVH2 and SUVH9 are histone methyltransferase-like proteins that have lost their intrinsic methyltransferase activity, but that can bind methylated DNA via a conserved SET and RING-ASSOCIATED (SRA) domain [57]. Furthermore, SUVH2 and SUVH9 interact with the microrchidia adenosine triphosphatase proteins MORC1 and MORC6 to assist Pol V recruitment to chromatin [68]. MORC6 mediates heterochromatin condensation at certain loci, thereby contributing to the silencing effects of RdDM independently of DNA methylation [69][70][71]. Interaction of SUVH2 and SUVH9 with Pol V occurs via the DDR complex [68].
The DDR complex consists of three proteins, DEFECTIVE IN RNA-DIRECTED DNA METHYLATION 1 (DRD1), DEFECTIVE IN MERISTEM SILENCING 3 (DMS3) and RNA-DIRECTED DNA METHYLATION 1 (RDM1), which are essential for Pol V recruitment and transcription in vivo [31,72,73,74]. The DDR complex core is an RDM1 dimer with plant-specific protein folds but enigmatic biochemical features [74][75][76]. This RDM1 dimer serves as a bridge to recruit two DMS3 dimers, which are proteins homologous to hinge domain regions of cohesin and condensin ATPases [77]. Finally, a putative SWI2/SNF2 chromatin remodeler, DRD1, is recruited, resulting in an ordering of the coiled-coil helix of the DMS3 dimers [74,78]. The SWI2/SNF2 ATPase domain of DRD1 could plausibly allow it to interact with chromatin, but how the DDR complex gets recruited to Pol V or SUVH2/9 is unknown.

More than a sum of Pol II parts: unique Pol IV and Pol V subunits
Pol IV and Pol V have evolved from Pol II [25,28,35]. Consequently, these plant-specific enzymes are composed of 12 subunits, comparable to Pol II (Figure 2a). However, Pol IV and Pol V contain distinct catalytic subunits, interact with a unique set of recruitment factors, target mostly non-genic loci, and generate products with novel biological functions. Logically, these peculiarities of Pol IV and Pol V must be reflected in their protein structure. Phylogenetic and structure-function analyses have probed the composition of these specialized RNA polymerases to determine how the plant non-coding RNA transcription machinery governs RdDM and genome surveillance.
The architecture of eukaryotic RNA polymerases has been deeply investigated in yeast. Pol II was the first such enzyme solved at atomic resolution via X-ray crystallography [79][80][81]. High-resolution structures are now available for Pol I and Pol III as well [82][83][84]. Pol I, Pol II and Pol III have two core catalytic subunits, with a total of 14, 12 and 17 subunits, respectively [85]. Decades of intense study have identified structural elements that are conserved across these RNA polymerases, and often also in archaeal and bacterial RNA polymerases [86,87]. The RNA polymerase complex has a crab-claw shape, with the clamp and jaw structures allowing opening and closing of the primary channel [88,89]. Two highly conserved metal-binding sites (Metal A and Metal B) chelate Mg2+ ions necessary for DNA-templated base addition, which proceeds 5' to 3' using ribonucleotide triphosphates as substrates. Formation of the Metal A site requires three aspartates of the largest subunit (e.g., NRPB1 in Pol II) arranged in a conserved DFDGD motif (Figure 2b) [79,90]. Other conserved structural elements, including the fork loop(s), rudder, wall, trigger loop and bridge helix, enable the basic mechanism of transcription. In addition, all polymerases have a protruding "stalk" structure, composed of two peripheral subunits, which promotes the formation of an open complex and increases processivity [86,91].
Pol IV and Pol V subunit composition resembles Pol II: each enzyme is composed of 12 subunits, about half of which are encoded by the same genes (Figure 2a) [25]. The plant RNA polymerase subunits are named NUCLEAR RNA POLYMERASE x (NRPx1 to NRPx12) proteins. In this nomenclature, the RNA polymerase complex is indicated by x's position in the Latin alphabet (A for Pol I, B for Pol II, C for Pol III, D for Pol IV and E for Pol V). NRPB1 and NRPB2 are Pol II's two largest subunits, together forming the catalytic core. Likewise, NRPD1 and NRPD2 form the Pol IV core, and NRPE1 and NRPE2 form the Pol V core. In A. thaliana, the 2nd subunits of Pol IV and Pol V are encoded by a single gene, NRPD/E2, whose gene product can assemble to form either Pol IV (NRPD1 + NRPD/E2) or Pol V (NRPE1 + NRPD/E2) (Figure 2a). Similarly, the 4th and 7th subunit heterodimer of the Pol IV stalk is distinct from the heterodimer associated with Pol II. Again, a particular subunit combination (NRPD/E4 + NRPD/E7) is inferred to form the stalk that is functional in either Pol IV or Pol V, but not in Pol II [25,62]. The 5th subunit of Pol V is specialized (NRPE5), whereas Pol IV competes with Pol II for the same 5th subunit (NRPB/D5). Finally, certain subunits are common to all these RNA polymerases: NRPB/D/E3, NRPB/D/E6, NRPB/D/E8, NRPB/D/E10, NRPB/D/E11 and NRPB/D/E12 subunits are each encoded by common genes and can assemble with Pol II, Pol IV or Pol V (Figure 2a). Based on the many mutually orthologous subunits in Pol IV, Pol V and Pol II, it is hypothesized that the general structure and assembly of the core enzymes is evolutionarily conserved [25,62]. The discovery of common assembly factors for Pol IV, Pol V and Pol II, called MINIYO (IYO) and QUATRE QUART 2 (QQT2), supports this hypothesis [92].
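The NRPx nomenclature can be decoded mechanically, as in the sketch below. The helper `parse_subunit` and its regular expression are illustrative assumptions rather than an established tool. It accepts names such as NRPE5 or NRPB/D/E10 and returns the polymerases that use the subunit together with the subunit number.

```python
import re

# Letter code from the nomenclature: position in the alphabet names the enzyme.
POLYMERASE_BY_LETTER = {
    "A": "Pol I",
    "B": "Pol II",
    "C": "Pol III",
    "D": "Pol IV",
    "E": "Pol V",
}

# "NRP", then one or more slash-separated letters, then the subunit number.
_NAME = re.compile(r"^NRP([A-E](?:/[A-E])*)(\d+)$")


def parse_subunit(name: str) -> tuple[list[str], int]:
    """Return (polymerases sharing the subunit, subunit number)."""
    match = _NAME.match(name.replace(" ", ""))
    if match is None:
        raise ValueError(f"not an NRPx subunit name: {name!r}")
    letters = match.group(1).split("/")
    return [POLYMERASE_BY_LETTER[letter] for letter in letters], int(match.group(2))


if __name__ == "__main__":
    for subunit in ["NRPD1", "NRPD/E2", "NRPB/D5", "NRPE5", "NRPB/D/E10"]:
        print(subunit, parse_subunit(subunit))
```

For example, NRPD/E2 decodes to the 2nd subunit shared by Pol IV and Pol V, matching the single-gene arrangement described for A. thaliana.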
Despite being expressed from different genes, the largest subunits of Pol II (NRPB1), Pol IV (NRPD1) and Pol V (NRPE1) have similar primary structures (Figure 2b). In these largest subunits, the eight domains (A to H) that are conserved in Pol I, Pol II and Pol III [93,94] are also found in NRPD1 and NRPE1, including the aspartate triad (DFDGD motif) located in the D domain that forms the Metal A binding site essential for the catalytic activity of all RNA polymerases [23,35]. The difference between the largest subunits is mainly positioned at their carboxyterminal domains (CTDs). The Pol II CTD, containing ~25 to 52 tandem copies of a conserved heptad peptide (34 repeats in A. thaliana), is known to be involved in Pol II recruitment and is subject to phosphorylation important for different transcriptional steps (activation, elongation, termination) [95,96].
By contrast, Pol V's CTD is intrinsically disordered and contains varying numbers of glycine and tryptophan (WG/GW) motifs (17 repeats in A. thaliana), known as "AGO-hooks" that stabilize the interaction of AGO4-clade proteins with Pol V [48,97,98]. Additionally, the DEFECTIVE CHLOROPLASTS AND LEAVES (DeCL) domain in the NRPE1 CTD is important for Pol V transcription in vivo [99]. In Pol V, the DeCL domain and adjacent glutamine-serine (QS) repeats mediate binding of a 3'->5' exoribonuclease, RRP6L1 [99]. The trimming action of RRP6L1 on Pol V transcripts could potentially lead to a pausing of Pol V in chromatin, necessary for robust RdDM [99,100]. A DeCL domain is also present in the CTD of the Pol IV subunit NRPD1. Pol IV mutants missing this DeCL domain display reduced transcription activity, resulting in an ~80% loss of Pol IV-dependent siRNA production and corresponding quantitative losses in RdDM across the entire genome [101]. The exact role of each DeCL domain in their distinct Pol IV and Pol V enzyme contexts will be an interesting avenue for investigation in coming years.
We recently discovered that Pol IV harbors a novel amino acid motif in its NRPD1 N-terminus, which is absent in Pol II but highly conserved in Pol IV [102]. This motif is composed of a C[KR]YC box followed by a 5-10 amino acid spacer, then by a YPx[MV][KR]F[KR] box (Figure 2b). A point mutation in the motif caused a loss in 24 nt siRNA biogenesis, disrupted de novo DNA methylation and reactivated TE loci in A. thaliana [102]. Beyond its critical role in genome surveillance and DNA methylation patterning, the precise function of the motif is not fully understood. Residual 24 nt siRNAs accumulate at TE extremities and other hotspots in the epigenomic landscape of the C[KR]YC-box mutant, suggesting that these could be sites of RdDM initiation. One attractive model is that the Pol IV-specific motif in NRPD1's N-terminus governs the mechanism of silent chromatin amplification as RdDM spreads across a locus in WT plants [102].
While having gained novel protein motifs and domains, plant NRPD1 and NRPE1 subunits have also shed structures that are highly conserved in eukaryotic NRPB1. In the NRPB1 G domain, there is a structural element called the trigger loop (Figure 2b) [103] that is significantly modified in NRPD1 and NRPE1 [35,104]. The trigger loop is not essential for Pol II in vitro transcription activity, but its deletion from NRPB1 causes reduced transcription fidelity [105,106]. The absence in the NRPD1 G domain of otherwise conserved amino acids is a plausible explanation for the high error rate of Pol IV transcription [35,46]. The trigger loop is the direct target of the fungal toxin α-amanitin, a Pol II inhibitor, explaining Pol IV's lack of sensitivity to this drug [35,105].
Substantial progress has been made in the 15 years since the discovery of Pol IV and Pol V, and a concrete molecular understanding has emerged. These plant non-coding RNA polymerases have unique subunit combinations, functional domains, specialized motifs and other features that distinguish them from Pol II and from each other. Yet, the domains in Pol IV that mediate its assembly with the partner enzyme RDR2, or that assure Pol IV recruitment via SHH1 and CLSY, remain unknown. The structures in Pol V that mediate its specific association with the DDR complex and SUVH2/SUVH9 also need to be identified. Furthermore, much remains to be discovered about the Pol IV and Pol V transcription cycles: their precise requirements for recruitment, transcription initiation, elongation and termination. In the future, a comprehensive structure-function analysis of unique domains in Pol IV and Pol V will be highly informative, especially in relation to specific protein-protein interactions that allow the assembly of unique subunit combinations with their specialized partners.
Looking at three different species of the Brassicaceae family, RdDM deficiency causes reproductive defects of varying severity. In pol IV mutants of C. rubella, both female and male gametes show abnormalities consistent with the high rate of seed abortion [119], whereas only female reproductive cells are disrupted in pol IV mutants of B. rapa [118]. The few viable seeds in pol IV mutants of C. rubella and B. rapa are abnormally small. Despite this reduced seed size in pol IV mutants of A. thaliana, however, fertility is not significantly impaired [118,131]. The diversity of pol IV mutant phenotypes suggests that the genomic targets causing these defects may differ between species.
At least two molecular mechanisms could explain the pleiotropy of deficiency phenotypes in pol IV mutants. A first possibility is that disrupting Pol IV-RdDM derepresses silent chromatin leading to TE insertion mutations in developmental and stress regulatory genes. However, MET1-dependent maintenance methylation and associated heterochromatin are often sufficient to silence TEs under standard growth conditions. Outside of stress conditions and chemical treatments, TE mobilization is rarely observed in pol IV single mutants [132][133][134][135][136]. By contrast, novel TE insertions are frequent in plants lacking DNA methylation maintenance (e.g., ddm1 or met1 null mutants), and especially in plants defective in both RdDM and maintenance pathways [137][138][139][140]. The possibility exists, of course, that the balance between CG methylation maintenance and Pol IV-RdDM differs among plant species, explaining the severe defects caused by pol IV mutants in species other than A. thaliana.
A more likely explanation of why pol IV mutants affect reproduction and stress responses in species-specific ways is that RdDM targets gene promoter-proximal TEs. Because of the variation in TE distribution in plant genomes, pol IV mutations in different taxa could trigger pleiotropic phenotypes due to misexpression of different sets of genes [137][138][139][140]. Promoter-proximal TEs provoke various regulatory outcomes, but the most common is transcriptional silencing of a gene promoter because of RdDM at the adjacent TE (Figure 3b). In the Arabidopsis genus, for example, the FLOWERING WAGENINGEN (FWA) gene promoter contains direct repeats reminiscent of a SINE TE, which attracts the RdDM machinery [21,131,141,142,143,144,145]. FWA is a maternally imprinted gene that encodes a repressor of flowering [146]. Consequently, FWA gene activation in vegetative tissues of RdDM-deficient plants, such as pol IV null mutants, causes late flowering (Figure 3b).
Comparable mechanisms explain the phenotypic consequences of RdDM deficiency in rice (Figure 3c). In pol IV null mutants of rice (Os nrpd1a Os nrpd1b), loss of siRNAs from the miniature inverted-repeat TEs (MITEs) flanking a microRNA precursor gene is linked to miR156 overaccumulation [112]. Ectopic miR156 can then target the mRNA of IDEAL PLANT ARCHITECTURE 1 (IPA1), a key repressor of tillering [147]. Deregulation of RdDM affecting this miR156-IPA1 developmental pathway is thus thought to cause increased tillering in Os nrpd1a Os nrpd1b mutants (Figure 3c). Similarly, because of the MITEs in genes important for phytohormone biosynthesis, gibberellin and brassinosteroid levels are perturbed in Os dcl3 and Os rdr2 knockdown lines, leading to stunted plant growth [113]. In maize, genome-wide association studies for drought tolerance revealed that RdDM targets a MITE insertion located in the Zm NAC111 gene promoter [124]. The resulting silencing of Zm NAC111 causes reduced drought tolerance in temperate maize. These studies show that RdDM can silence TE-proximal genes in cis, modifying economically important traits in cereals.
RdDM also acts in trans, as was documented using viroid, virus and transgene-induced systems in the 1990s [148][149][150], leading to the discovery of the Pol V effector machinery [22,77,78]. Such in trans RdDM is known to modify the expression of natural plant genes with consequences for disease resistance. In rice, MITE polymorphisms in an intron of the WRKY45 gene correlate with susceptibility to Xanthomonas oryzae pv. oryzae (Xoo) bacterial infection and attenuate resistance to the fungus Magnaporthe oryzae (Figure 3d) [129]. Zhang and colleagues showed that MITE siRNAs derived from an intron in the WRKY45-1 allele associated with Xoo susceptibility can target a homologous MITE sequence in the unrelated STI gene (Figure 3d). Pol IV involvement was not directly tested, but DNA methylation of the STI locus requires Os RDR2 and Os DCL3 functions. RdDM suppression of STI expression leads to a crippled defense against Xoo infection in rice subspecies with the WRKY45-1 locus harboring this MITE insertion.
In maize, Pol IV-dependent siRNAs can target genes in a process called paramutation [151]. Paramutation is an interaction between different alleles of the same locus that results in non-Mendelian inheritance: after a cross, the silent "paramutagenic" allele triggers heritable silencing of the other, "paramutable" allele. This in trans effect results in the inheritance of two silent alleles without changing the DNA sequence of either [152]. A famous case is the booster1 (b1) locus, which encodes a transcription factor that promotes anthocyanin biosynthesis. Maize plants expressing b1 display purple coloration (Figure 3e). There is an enhancer consisting of seven tandem repeats situated ~100 kb upstream of b1, which likely forms a complex secondary structure to activate b1. This enhancer comes in two allelic forms: the paramutable B-I allele and the paramutagenic B' allele. Pol IV transcription, RDR2 production of dsRNA and subsequent siRNA biogenesis seem to be critical for paramutation, because genetic lesions in the largest subunit of maize Pol IV (MOP3/RMR6) [104,153], in the second largest subunit of maize Pol IV and Pol V (MOP2/RMR7) [115,116], in maize RDR2 (MOP1) [117,121] or in maize CLSY (RMR1) [154] each disrupt B-I to B'* paramutation. The best model to explain these results is that Pol IV-dependent siRNAs from the B' paramutagenic allele target DNA methylation and silencing of this B' allele in cis, preventing b1 locus activation (Figure 3e, green arrows). The same B'-derived siRNAs would induce in trans DNA methylation at the paramutable B-I allele in heterozygous plants (Figure 3e, blue arrows). This DNA methylation is heritable, with all F2 progeny of the B'*/B' heterozygote showing the same green phenotype as the original B'/B' parent and heterozygous F1 generation. Why RdDM does not continuously target the tandem-repeat enhancer natively present in B-I/B-I parent plants is unresolved. Pol IV-dependent paramutation has also been observed for other loci that regulate pigment biosynthesis in maize [155], with analogous phenomena occasionally reported in other plant species [156].
Figure 3 (caption, partially recovered). ... [208]. Highlighted in green are the species in which genes encoding RdDM players are present. Red indicates that no evidence for genes encoding RdDM factors has been reported. (b) Tandem repeats similar to transposable elements (TEs) in the FWA gene promoter allow Pol IV and RdDM to repress FWA expression for flowering time regulation in the Arabidopsis genus. (c) Insertion of Miniature Inverted-repeat Transposable Elements (MITEs) near gene loci can regulate gene expression in Oryza sativa (rice). Pol IV represses a miRNA precursor gene, OsMIR156j, in wild-type rice. Transcriptional silencing of OsMIR156j is disrupted in Os nrpd1a Os nrpd1b mutant plants, causing miR156 to overaccumulate and target the mRNA of Os IPA1, which ultimately leads to increased tillering. (d) Plant siRNAs can also target and transcriptionally repress genes in trans, for example to regulate innate immunity in rice. Expression of the Os STI gene leads to Xanthomonas oryzae pv. oryzae (Xoo) resistance. In Xoo-susceptible plants harboring the WRKY45-1 allele, an intronic MITE triggers production of siRNAs via Pol IV-RdDM that guide DNA methylation to a homologous MITE sequence in an intron of the STI gene and silence it. In Xoo-resistant plants harboring the WRKY45-2 allele, the intronic MITE and the resultant silencing of STI are missing.
Together, these examples show that distinct genes are silenced by Pol IV-RdDM in different plant species. In analogy to miRNAs, Pol IV can modulate temporal and spatial patterns of gene expression, with essential functions in reproductive development. The regulation of flowering by FWA silencing occurs in multiple species [144]. More typically, however, RdDM targets do not show synteny or sequence similarity across species. Comparison of Pol IV-dependent siRNA clusters in A. thaliana to those inferred to exist in A. lyrata (based on 24 nt siRNA profiling) revealed only limited conservation [157]. More recently, comparison of gene expression in wild-type and pol IV mutant plants from two Brassicaceae species (A. thaliana and C. rubella) found only negligible overlap in RdDM-targeted genes [119]. Unlike the many cases of ancient plant miRNA-target mRNA pairs, Pol IV-dependent siRNAs rarely target genes in an evolutionarily conserved manner across species [158,159]. Instead, the dynamic reprogramming of gene expression via RdDM, observed under stress conditions [122,160] and during key phases in reproduction [161][162][163][164], may allow plants to evolve and adapt to challenging conditions in a stochastic manner over shorter timescales.

Conclusions and future directions
Pol IV and Pol V are plant-specific RNA polymerases that evolved from an ancestral Pol II into enzymes specialized in generating different non-coding RNAs for RdDM. The most evident biological function of Pol IV-RdDM is TE silencing (Figure 1). In addition, the balancing of paternal-maternal imprinting appears to be an evolutionarily conserved function of RdDM, as reviewed elsewhere [165,166]. Plant species have also co-opted RdDM, in different ways, to regulate the expression of TE-proximal genes in key biological pathways (Figure 3) [112,113,116,118,119,120]. Whether Pol IV and Pol V are regulated in a spatio-temporal manner to optimize distinct RdDM functions in TE silencing, imprinting, developmental control and stress response is an open question (Figure 4a).
Protein components of RdDM are not constitutively expressed in A. thaliana, showing higher expression in shoot apical meristem cells [167] and during late embryogenesis [161]. In addition, abiotic and biotic cues from the environment influence the expression and stability of RdDM proteins. For instance, heat shock dramatically decreases the expression of several RdDM pathway components, including NRPD1 and NRPE1 [160]. Moreover, Rice grassy stunt virus hijacks the host plant's ubiquitin-proteasome system to degrade Os NRPD1A (Pol IV) proteins and modify plant development (Figure 4a) [168]. Similarly, the cell cycle regulatory anaphase-promoting complex (APC) can mediate the degradation of DMS3, a core component of the DDR complex needed for Pol V function [169]. However, the expression patterns of RdDM players do not necessarily correspond to locations of siRNA action, because siRNAs can act non-cell-autonomously; differential regulation of siRNA translocation could thus play a role in RdDM control [170][171][172][173][174][175]. Further tissue-specific investigations of RdDM will be needed to understand how Pol IV and Pol V transcription are coupled in time and space to ensure appropriate deposition of DNA methylation during development, or in response to environmental stress conditions.
In addition to global changes in the accumulation of RdDM players induced by stress or developmental cues, local spatio-temporal changes in the activity of RdDM components seem likely. For instance, during UV-C light-induced DNA damage, an abrupt increase in Pol IV-dependent siRNA levels is observed at damaged sites [123], suggesting that Pol IV is recruited to these sites upon damage. It is not yet known whether differential expression of Pol IV recruitment factors or local changes in chromatin status help explain Pol IV function at sites of DNA damage (Figure 4b). The local activity of RdDM is further reinforced or antagonized by alternative epigenetic pathways. Best understood in this respect are the DNA methylation maintenance pathways, which often target the same types of loci as those subject to RdDM (Figure 4b) [176]. In addition, histone deacetylation by HDA6 [177] and histone demethylation by JMJ14 [178,179] are critical for TE silencing and Pol IV function at subsets of RdDM targets.
Changes in partially redundant pathways might explain the variable importance of RdDM in different regulatory and chromatin contexts. For example, MET1, DDM1 and HDA6-dependent pathways counterbalance Pol IV to regulate the intensity of RdDM in ribosomal RNA gene tandem repeats [180][181][182]. Chromatin remodeling factors [183,184] have the potential to affect the recruitment of RdDM, while DNA demethylases [127,185,186,187,188,189,190] and Pol II transcription factors [135] are known to antagonize the RdDM silencing mechanism (Figure 4b). Future genetic analyses and artificial co-targeting strategies [60] will help dissect the complex interplay between the multiple layers of chromatin modification and non-coding RNA regulation in plants.
Figure 4 (caption, partial). [...] Pol IV assembly and turnover are likely governed by the specific subunits and functional domains that mediate Pol IV's interactions with SHH1, CLSYs, RDR2 and as yet unknown, specialized regulatory proteins. (b) Initial recruitment of Pol IV to specific sites in the genome may require factors other than SHH1 and CLSY proteins, which remain to be discovered. Another important process that controls the intensity of Pol IV-RdDM is the balance between CG/CHG methylation maintenance (involving MET1, CMT3 and HDA6 proteins) and active 5-methylcytosine removal by plant glycosylase lyases (ROS1 and DME). (c) In addition to the canonical RdDM pathway involving Pol IV-RDR2-DCL3 and 24 nt siRNAs loaded onto AGO4 (bold arrows), other pathways can trigger de novo DNA methylation in plants (thin arrows). The alternatives include Pol II-RDR6 production of dsRNAs that are diced into 21-22 nt siRNAs, or Pol IV-RDR2 production of dsRNAs that are diced by the alternate enzymes DCL2 and DCL4 into 22 and 21 nt siRNAs, which tend to associate with different effectors, such as AGO1 and AGO2, to guide DNA methylation.

An alternative explanation for the variation in regulatory function of the RdDM pathway across plant species might lie in recent duplications and subfunctionalizations of RdDM protein families. Several of the Pol IV and Pol V subunits, as well as other proteins of the RdDM pathway, are encoded by a variable number of gene paralogs across species. Most notable are duplications of the largest and second largest Pol IV/V-like subunits in cereal monocots compared to eudicot plants [62,191]. Although the molecular and biological functions of these extra RNA polymerase subunits remain unknown, they might reinforce or even extend the RdDM machinery in species like rice, barley and maize.
In addition, molecular pathways have been recently described in A. thaliana that use varying combinations of the core Pol IV-RdDM components. For instance, Pol IV was found to mediate the biogenesis of 21 nt siRNAs at DNA double-strand breaks [192] or at photodamage-induced lesions [123]. Similar to Pol IV transcripts during RdDM, the siRNA precursors at DNA damage sites are RDR2-dependent, but the downstream processing step requires DCL4 rather than DCL3 [123,192]. Resulting 21 nt siRNAs are loaded onto AGO1 in the case of photodamage [123] or onto AGO2 in the case of DNA double-strand breaks (Figure 4c) [192]. These siRNA-AGO complexes could facilitate DNA damage recognition [123,192] and prevent excessive alterations of the DNA methylation landscape upon photodamage [193].
In fact, siRNAs derived from Pol II transcription also trigger DNA methylation via the Pol V effector machinery. RDR6, an enzyme paralogous to RDR2, uses Pol II transcripts from TEs or transgenes to synthesize dsRNAs that are diced into 21 and 22 nt siRNAs by DCL4 and DCL2, respectively [138,194,195]. Genetic experiments suggest that these siRNAs guide AGO1 (or other AGOs) to sites of Pol V transcription for non-canonical RdDM (Figure 4c) [196]. Moreover, endogenous Pol II transcripts that fold into hairpin RNAs or miRNA precursors are at times processed by DCL2, DCL3 or DCL4, again leading to non-canonical RdDM [196][197][198][199][200]. Intriguingly, Pol II and Pol V can dynamically modify chromatin topology in response to the hormone auxin, via synthesis of APOLO long non-coding RNA in A. thaliana [201]. To what extent the above-mentioned alternative Pol II/Pol IV/Pol V functions are evolutionarily conserved is unknown, but DNA repair proteins like DDB2 and silencing factors like RDR6 have orthologs throughout terrestrial plants [202][203][204][205]. The discovery of diverse non-canonical siRNA pathways thus opens exciting new avenues for exploring Pol IV and Pol V transcription in model and crop species.
Both the canonical Pol IV-RdDM pathway (Figure 4c, bold arrows) and emerging variant pathways (Figure 4c, thin arrows) are likely to exploit specific subunit variants or structural changes in plant non-coding RNA polymerases. The largest subunits of Pol IV and Pol V, and their common stalk domains, each possess unique amino acid sequences and highly divergent domains (Figure 2) [23,25,108,206,207]. Theoretically, these unique subunit features must combine to account for the distinctive biochemical activities of Pol IV, of its partners SHH1, CLSY and RDR2, as well as of the related Pol V complex (Figure 4c) [33,35]. As described above, researchers have begun exploring non-catalytic domains and isolating hypomorphic mutations in novel residues to test their contribution to the non-coding RNA specialization of Pol IV and Pol V [48,50,99,101,102]. Further structure-function analyses will be needed to fully understand the subunit assemblies of Pol IV and Pol V, and how their activities are regulated during plant growth and development. Finally, advances in protein purification and cryo-electron microscopy will, no doubt, one day reveal the precise structures of Pol IV and Pol V and give exquisite insights into their functional specialization.

Acknowledgments
BR and LF conceived and wrote the manuscript. LF, BR and TB designed and mounted the figures, and collectively edited the manuscript text. Marcel Böhrer provided valuable critical comments and helped correct the manuscript. The authors wish to apologize to any colleagues, in this highly productive scientific field, whose work we could not comprehensively cite or discuss due to space constraints. This work was supported by the LabEx consortia ANR-10-LABX-0036_NETRNA and ANR-17-EURE-0023 from French Investissements d'Avenir funds and by the French Agence Nationale de la Recherche grant ANR-17-CE20-0004-01.

Disclosure statement
No potential conflict of interest was reported by the authors.