LncRNAs: genetic and epigenetic effects in plants

Abstract Long non-coding RNAs (lncRNAs) transcribed from the eukaryotic genome play important roles in essential biological processes and in transcriptional and post-transcriptional gene regulation. LncRNAs act both in the nucleus and in the cytoplasm, mostly in association with chromatin in the nucleus. LncRNAs appear to be important regulators of gene expression and genome stability. This review outlines the major types of plant lncRNAs and their genetic and epigenetic effects, with a focus on plant lncRNA instances, and discusses recent advances in our understanding of their mechanisms of action.


Introduction
One of the greatest challenges in modern-day genomics is to fully reveal the functions of genes and genetic elements. Recent approaches, including whole-genome and RNA-sequencing (RNA-seq) studies and many biochemical techniques such as native RNA immunoprecipitation (nRIP), cross-linking and immunoprecipitation (CLIP), RNA immunoprecipitation, the RNA pull-down assay, chromatin isolation by RNA purification (ChIRP), capture hybridization analysis of RNA targets (CHART) and RNA antisense purification (RAP), have revealed that the transcription pattern in eukaryotes is much more complex than previously appreciated [1][2][3][4][5][6][7][8][9]. In particular, the human genome is pervasively transcribed, but less than 2% of these outputs are putatively functional RNAs that encode proteins [10]. The remaining transcripts are mostly uncharacterized and make up the 'dark matter' of the genome. Long non-coding RNAs (lncRNAs) have been revealed to emerge from these dark regions of the genome [11]. For a long time, these transcripts remained unknown; because they lack protein-coding potential, they were considered functionless and were regarded as transcriptional noise (reviewed in [12]). However, over the past ten years, thousands of novel non-coding RNAs (ncRNAs), which contain no open reading frame (ORF), have been identified in many organisms, including humans, animals and plants, by using computational and experimental approaches, indicating that they are actually functional molecules and key players in many biological processes [11][12][13][14][15].
LncRNAs are categorized according to various properties, including transcript length, sequence and structure conservation, genomic location, and the functions they exert on DNA or RNA [32,33]. Based on length, the typical threshold value is 200 nt. Non-coding RNAs shorter than 200 nt are termed small ncRNAs, and those longer than 200 nt are classified as lncRNAs [34]. However, the most common criterion for discriminating lncRNAs is their genomic location. LncRNAs are grouped into six categories [32]: (a) Sense lncRNAs are transcribed from the sense strand of protein-coding genes; they overlap the transcripts of the coding gene or comprise a coding gene within an intron on the same strand. (b) Sense intronic lncRNAs lie within the introns of a coding gene and have no exon-exon overlap. (c) Antisense lncRNAs are transcribed from the antisense strand and intersect any exon of a protein-coding locus on the opposite strand. (d) Long intervening non-coding RNAs (lincRNAs), with a length >200 nt, are also called long 'intergenic' non-coding RNAs; they do not overlap protein-coding exons and reside within the genomic interval between two genes [35]. (e) Bidirectional lncRNAs are expressed within 1 kb of promoters in the opposite direction from the neighbouring protein-coding gene. (f) Enhancer lncRNAs (elncRNAs or eRNAs), which are generally <2 kb, are transcribed from enhancer regions of the genome and might contribute to enhancer function [36,37].
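The length and location criteria above can be sketched as a small rule-based classifier. This is only an illustrative simplification, not an established annotation pipeline: the `Gene` and `Transcript` classes, the `classify` function and all coordinates are hypothetical, a transcript of exactly 200 nt is treated as an lncRNA by assumption, any same-strand exon overlap is labelled 'sense', and enhancer lncRNAs are omitted because they would require a separate enhancer annotation.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Gene:
    """Hypothetical protein-coding gene model (1-based, inclusive coordinates)."""
    start: int
    end: int
    strand: str                      # "+" or "-"
    exons: List[Tuple[int, int]]     # (start, end) of each exon


@dataclass
class Transcript:
    """Hypothetical non-coding transcript to be classified."""
    start: int
    end: int
    strand: str
    length: int                      # spliced length in nt


def overlaps(a_start: int, a_end: int, b_start: int, b_end: int) -> bool:
    """True if two closed intervals share at least one position."""
    return a_start <= b_end and b_start <= a_end


def tss(start: int, end: int, strand: str) -> int:
    """Transcription start site: left end on '+', right end on '-'."""
    return start if strand == "+" else end


def classify(t: Transcript, gene: Gene,
             min_len: int = 200, promoter_window: int = 1000) -> str:
    """Assign a location-based category relative to one neighbouring gene."""
    if t.length < min_len:
        return "small ncRNA"
    same_strand = t.strand == gene.strand
    if overlaps(t.start, t.end, gene.start, gene.end):
        hits_exon = any(overlaps(t.start, t.end, s, e) for s, e in gene.exons)
        if same_strand:
            return "sense" if hits_exon else "sense intronic"
        # Opposite-strand transcripts confined to introns are not covered
        # by the six categories listed in the text.
        return "antisense" if hits_exon else "unclassified"
    lnc_tss = tss(t.start, t.end, t.strand)
    gene_tss = tss(gene.start, gene.end, gene.strand)
    if not same_strand and abs(lnc_tss - gene_tss) <= promoter_window:
        return "bidirectional"
    return "intergenic (lincRNA)"


# Toy gene on the '+' strand with three exons (made-up coordinates).
gene = Gene(5000, 9000, "+", [(5000, 5500), (7000, 7400), (8500, 9000)])
print(classify(Transcript(6000, 6500, "+", 500), gene))   # within an intron
print(classify(Transcript(4000, 4800, "-", 800), gene))   # divergent from promoter
```

In practice such rules would be applied against a full genome annotation rather than a single gene, and the category boundaries differ somewhat between annotation projects.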
The majority of lncRNAs in eukaryotes are transcribed by RNA polymerase II (Pol II), as are mRNAs; these are regarded as stable, 'typical' or classic lncRNAs [38]. Studies have demonstrated that, unlike mRNAs, lncRNAs are found primarily within the nucleus. Most of them are also expressed at very low levels, in specific patterns and in diverse tissue types [34,39]. However, some lncRNAs are transcribed by RNA polymerase III (Pol III), which was previously believed to transcribe only tRNA and 5S rRNA [40]. Additionally, lncRNAs are transcribed by RNA polymerase IV (Pol IV) and V (Pol V) in plants [41,42]. The lncRNAs transcribed by plant-specific Pol V are involved in RNA-directed DNA methylation (RdDM), a plant-specific de novo DNA methylation mechanism that requires lncRNA. In this process, siRNAs incorporated into ARGONAUTE (AGO), which binds to chromatin, pair with lncRNAs transcribed by Pol V, and this pairing facilitates the recruitment of AGO to particular genomic loci [43,44].

Historical development of lncRNA research
In the 1990s and early 2000s, before gene knockdown techniques became widely available, the functions of lncRNAs were studied by using classical gene targeting strategies and genetic approaches. Xist and Tsix are two overlapping antisense-transcribed lncRNAs, which are involved in X chromosome inactivation in female mammals [45][46][47]. Additionally, in Drosophila, the functionally redundant roX1 and roX2, which regulate sex chromosome dosage compensation, were first described in the late 1990s [48]. Moreover, the Gtl2, Airn and Kcnq1ot1 lncRNAs have been shown to be transcribed from imprinted genomic loci [49][50][51]. These studies suggested the concept of chromatin-associated lncRNAs that regulate epigenetic gene expression, serving as the 'Rosetta Stone' for recent analyses of lncRNAs, as reviewed by Nakagawa [52]. The advent of the gene knockdown technique and high-throughput genome-wide gene expression analyses facilitated the study of candidate lncRNAs with differential expression patterns in particular tissues or cells.
In plants, the lncRNAs GmENOD40 (soybean), MtENOD40 (Medicago truncatula), TPS11 (tomato), OsPI1 (rice) and LDMAR (rice), which are associated with diverse biological processes, were among the first to be identified and to have their physiological functions demonstrated [53][54][55][56]. For instance, ENOD40 lncRNAs, which are considered 'riboregulators' that reside in the cytoplasm and are associated with growth control and differentiation, have been identified in different plant species such as Glycine max, Medicago truncatula, Medicago sativa, etc. [53]. An increasing number of plant lncRNA studies have contributed insight into the diverse biological roles of lncRNAs in different plant species. This review focuses on the genetic and epigenetic effects of lncRNAs.

Genetic effects of lncRNAs
The functions of lncRNAs are often deduced from their subcellular localization, i.e. in the nucleus, the nucleolus, the cytoplasm or in some cellular compartments. Our previous fluorescence in situ hybridization (FISH) study in barley confirmed that lncRNAs are located in the nucleus and/or the cytoplasm, as shown in Figure 1 [unpublished data]. Other studies indicate that some lncRNAs may act as ribozymes or riboswitches at the RNA level [57]. However, lncRNAs mostly function as ribonucleoprotein particles (RNPs). In turn, RNPs may contain different numbers of RNA partners, including snRNA, mRNA, telomerase RNA, etc. [58]. Many DNA-binding proteins involved in eukaryotic transcription are also able to bind lncRNAs in vitro and in vivo [59,60].
LncRNAs function in either cis or trans through sequence complementarity or homology with RNAs or DNA, and/or through their structure, by forming molecular frames and scaffolds for the assembly of macromolecular complexes. Additionally, lncRNAs perform significant roles in gene regulation at different levels, such as regulating DNA-binding activity by phosphorylation. Some lncRNAs regulate gene expression via diverse post-transcriptional mechanisms, although the regulation of gene expression is mostly controlled at the transcriptional level [61]. For instance, lncRNAs can act as decoys that inhibit the influence of regulatory proteins. In Arabidopsis, some lncRNAs have been demonstrated to interact with miRNAs as competitors by mimicking their targets. This type of lncRNA serves as an miRNA target mimic that inhibits miRNA activity. The best-known decoy lncRNA for miRNA target mimicry in Arabidopsis is the INDUCED BY PHOSPHATE STARVATION1 (IPS1) lncRNA, which competes for and binds miR399 to inhibit the degradation of PHO2 (PHOSPHATE2) [62]. Another Arabidopsis decoy lncRNA is Alternative splicing competitor lncRNA (ASCO-lncRNA), which binds the alternative splicing regulators, nuclear speckle RNA-binding proteins, to regulate plant development by changing alternative splicing patterns [63]. Recently, Cho and Paszkowski [64] discovered the retrotransposon-derived transcript MIKKI, which comprises multiple introns and has low coding potential. The fourth intron of MIKKI originates from an independent family of retrotransposons; thus, the splicing of this intron produces a binding site for miR171 at the exon-exon junction [64]. However, miR171 does not cleave MIKKI transcripts, because the miR171-binding site of MIKKI is not perfectly complementary to its cognate miRNA, there being two mismatches at the cleavage positions.
The cleavage activity of miRNA is attenuated by such mismatches, which are considered a hallmark of miRNA target mimicry [62,64-66]. Some MIKKI knock-out rice plants lacking the target-mimicking sequence show higher miR171 levels, whereas over-expression of MIKKI is associated with down-regulation of miR171 in rice roots [64,67].
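The role of mismatches at the cleavage positions can be illustrated with a toy pairing check. This is a minimal sketch under stated assumptions, not the method used in the cited studies: the miRNA sequence below is invented (it is not miR171 or miR399), only Watson-Crick pairs are scored (G:U wobble pairs and bulges such as the one in IPS1 are not modelled), the cleavage positions are taken to be miRNA nucleotides 10 and 11 counted from the 5' end, and all function names are hypothetical.

```python
# Watson-Crick partners for RNA bases.
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    """Return the perfectly complementary site, written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def mismatch_positions(mirna: str, site: str) -> list:
    """1-based miRNA positions (from the 5' end) that are not
    Watson-Crick paired with the antiparallel target site."""
    if len(mirna) != len(site):
        raise ValueError("this sketch assumes an ungapped, equal-length duplex")
    partner = site[::-1]  # miRNA position i pairs with site[len - 1 - i]
    return [i + 1 for i, (m, s) in enumerate(zip(mirna, partner))
            if COMPLEMENT[m] != s]


def introduce_mismatch(mirna: str, site: str, position: int) -> str:
    """Replace the site base opposite a given miRNA position with the
    miRNA's own base, which can never form a Watson-Crick pair with it."""
    idx = len(site) - position
    return site[:idx] + mirna[position - 1] + site[idx + 1:]


def looks_like_target_mimic(mirna: str, site: str,
                            cleavage_positions=(10, 11)) -> bool:
    """Flag sites that carry a mismatch at an assumed cleavage position."""
    return any(p in cleavage_positions for p in mismatch_positions(mirna, site))


# Invented 21-nt miRNA used purely for illustration.
toy_mirna = "AUGCAUGCAUGCAUGCAUGCA"
cleavable_site = reverse_complement(toy_mirna)
mimic_site = introduce_mismatch(toy_mirna, cleavable_site, 10)
mimic_site = introduce_mismatch(toy_mirna, mimic_site, 11)

print(mismatch_positions(toy_mirna, cleavable_site))
print(mismatch_positions(toy_mirna, mimic_site))
print(looks_like_target_mimic(toy_mirna, mimic_site))
```

A fully complementary site reports no mismatches and would be expected to be cleaved, whereas the modified site still pairs along most of its length but is flagged as a candidate mimic.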
LncRNAs can directly regulate Pol II by promoting the phosphorylation of transcription factors (TFs). Through this phosphorylation, lncRNAs regulate the DNA-binding activity of TFs [68]. Several eukaryotic lncRNAs regulate transcription initiation and elongation by controlling RNA Pol II pausing, by transcriptional interference and by acting as scaffolds that recruit chromatin remodelers (reviewed in [61]). They also affect chromatin topology and nuclear organization [69]. Wang et al. [70] demonstrated that the trans-acting lncRNA HID1 interacts with the chromatin of PIF3, which encodes a TF, and represses its transcription in Arabidopsis. Recently, CDF5 LONG NONCODING RNA (FLORE), a circadian-regulated lncRNA that is a natural antisense transcript (NAT) of CDF5, was identified and functionally characterized by Henriques et al. [71]. In Arabidopsis, the antiphasic expression of FLORE and CDF5 reflects mutual inhibition in a similar way to frq/qrf in fungi, a pair comprising a gene that encodes a central oscillator component and its antisense lncRNA, which is proposed as a modulator of core clock function. Moreover, Henriques et al. [71] showed that the CDF5/FLORE NAT pair forms an additional circadian regulatory module with conserved (mutual inhibition) and unique (function in trans) features. The unique features of the CDF5/FLORE NAT pair provide the ability to fine-tune its own circadian oscillation and, consequently, to adjust the onset of flowering to favourable environmental conditions. Additionally, elncRNAs, which are clearly distinguishable from the canonical lncRNAs, perform important roles in gene activation, and they function as cis-acting scaffolds to recruit co-activator complexes. ElncRNAs regulate chromatin topology by mediating chromosome looping between enhancer and promoter regions (Figure 2) [72,73]. Some studies have demonstrated that at least some elncRNAs play key roles in target gene expression.
For example, the knockdown of several elncRNAs resulted in decreased expression of nearby target genes [74][75][76]. In a plasmid-based reporter system, the artificial tethering of elncRNAs upstream of a minimal promoter enhanced reporter gene expression, and these results were consistent with the proposed role of elncRNAs in transcriptional activation. However, the critical determinants of the activating function of elncRNAs, including sequence- or strand-specificity, are not clear [74,77].

Epigenetic effects of lncRNAs
LncRNAs in plants can function during tissue development, sexual reproduction and in response to external stimuli such as drought, salinity, heat stress and infections [9,78-81]. Transcriptional and post-transcriptional regulation of gene expression by lncRNAs has only recently begun to be recognized, and the molecular mechanisms underlying this type of regulation are only now being revealed. Studies have indicated that lncRNAs can play important roles in gene regulation at different levels (reviewed in [61]).
As mentioned above, some lncRNAs can act as decoys that prevent regulatory proteins from acting on DNA or RNA by mimicking their targets. When this mechanism is used to inhibit miRNA activity, it is called 'target mimicry'. In Arabidopsis, lncRNAs including IPS1 and the alternative splicing competitor ASCO-lncRNA have been demonstrated to act as competitors that mimic the targets of regulatory molecules. IPS1 lncRNA acts as a decoy that protects PHO2: miR399 normally targets the PHO2 mRNA for degradation, but IPS1 binds miR399, pairing with it with a three-nucleotide bulge, and thereby interferes with the binding of ath-miR399 to PHO2. In contrast, ASCO-lncRNA comprises an alternative splicing regulatory module that binds the regulators of alternative splicing, nuclear speckle RNA-binding proteins (NSRs). ASCO-lncRNA hijacks these nuclear alternative splicing regulators to alter alternative splicing patterns and generate alternative splice isoforms during development (Figure 2) [63].
Chromatin-bound lncRNAs are considered scaffolds that associate with chromatin-modifying enzymes and are involved in recruiting chromatin regulatory proteins to particular genomic DNA locations. LncRNA scaffolds enable the cooperative assembly of chromatin-modifying complexes and recruit them in either an sRNA-dependent or an sRNA-independent manner. Some findings indicate that lncRNA scaffolds can recruit chromatin-modifying complexes independently of sRNAs. However, how protein complexes recognize, join or target lncRNAs still remains unclear. In plants, lncRNAs interact with chromatin-modifying complexes through the lncRNA-mediated chromatin modification (lncR2Epi) regulation pathway [82]. COLD ASSISTED INTRONIC NONCODING RNA (COLDAIR) and COOLAIR are involved in regulating the expression of FLOWERING LOCUS C (FLC) in cold-stressed Arabidopsis. These are the best-known plant lncRNAs associated with the lncR2Epi regulation pathway [26,83]. FLC is a MADS-box TF which represses flowering under cold temperature [84]. On the one hand, COLDAIR physically associates with PRC2 to promote H3K27me3 accumulation at FLC [25]. On the other hand, COOLAIR mediates the repression by reducing H3K36me3 or H3K4me2 at FLC during vernalization [28,82,85].
As previously outlined by Chekanova [61], plant lncRNAs can also participate in epigenetic silencing via the RdDM silencing pathway, which generally functions by silencing repetitive sequences. This pathway typically requires the plant-specific RNA polymerases Pol IV and Pol V [41], with some involvement of Pol II [100]. Recent findings indicate that RdDM is a plant-specific de novo DNA methylation mechanism that requires lncRNA as a scaffold to define target genomic loci [100]. As reviewed by Matzke and Mosher [101], some lncRNAs transcribed by Pol IV produce 24-nt siRNAs, while a group of lncRNAs generated by Pol V act as scaffold RNAs that are recognized by the siRNA-AGO complex through sequence complementarity. Illustrating the complexity of siRNA biogenesis in plants, Pol IV produces most siRNAs, while Pol V, and to a lesser extent Pol II, generate the templates for siRNAs [102][103][104][105]. Owing to their very low abundance or stability, discrimination of lncRNAs generated by Pol IV and Pol V is difficult. To date, several Pol V-transcribed scaffold lncRNAs have only been shown to be non-polyadenylated and either tri-phosphorylated or capped at the 5' ends [41,61]. In Arabidopsis, Li et al. [106] identified P4RNAs as Pol IV/RDR2-dependent transcripts that are non-polyadenylated, correspond to both DNA strands and are derived from thousands of loci. Interestingly, the findings demonstrated that the 5' ends of P4RNAs carry a monophosphate instead of a 5' triphosphate or a cap structure [61,106]. Recently, Au et al. [107] obtained sequences of RdDM-associated lncRNAs by nuclear RNA immunoprecipitation against ARGONAUTE 4 (AGO4), a key component of RdDM that binds specifically to the lncRNA. They compared these lncRNAs with gene expression data of RdDM mutants and identified novel RdDM target genes. Interestingly, a large proportion of these target genes were normally activated by RdDM.
These RdDM-activated genes were found to be more enriched for gene body lncRNA than the RdDM-repressed genes, suggesting that RdDM, or AGO4, may play a role in maintaining or activating gene expression by directing gene body chromatin modification that prevents cryptic transcription.
In our laboratory, we performed physiological, molecular and fluorescence in situ hybridization (FISH) studies to reveal the effects of lncRNAs on barley under salt stress conditions during the germination period. In particular, expression analysis demonstrated that the expression levels of the barley lncRNAs AK363461 and AK370506, as well as those of the lncRNAs CNT0018772 (maize) and CNT0031477 (rice), were altered under 150 mmol/L salinity stress [81,108]. Salinity also affected the expression levels of AK370814, the sense lncRNA of the barley gene AK372815, which is related to the vitamin B6 salvage pathway.
The relative expression levels of the sense lncRNA AK370814 tended to be up-regulated, whereas those of the AK372815 gene tended to be down-regulated under salt stress conditions, indicating a relationship between the protein-coding gene and its own sense lncRNA through alternative splicing [109]. In FISH analysis, the AK363461 and AK370506 probes were labelled with tetramethylrhodamine-dUTP (TRITC). Under a confocal microscope, the AK363461 and AK370506 probes revealed four signals and one signal, respectively. To our knowledge, this was the first report to demonstrate barley lncRNA localisations at the prophase stage in the nucleus and also to present a protocol for visualization of barley lncRNAs [108].

Bioinformatics approaches
The improvement of next generation sequencing (NGS) technologies and the large number of sequencing projects have generated enormous volumes of biological data [110]. Experimental (NGS technologies) and computational screenings generate fragments of transcripts; the transcript sequences are then mapped to a reference genome and the identified transcribed units are defined as 'novel' RNAs [111][112][113]. To distinguish between coding and non-coding sequences, the most important and popular criteria are similarity to known coding sequences and codon-frequency statistics used to estimate coding potential [114].
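As a much simpler stand-in for the similarity and codon-frequency criteria described above, coding potential is often pre-screened by the length of the longest open reading frame. The sketch below illustrates that idea only; it is not one of the published tools, the function names are hypothetical, the 100-codon ORF cut-off is an assumed heuristic, and only the three forward frames of a stranded transcript are scanned.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf_codons(seq: str) -> int:
    """Length in codons (start included, stop excluded) of the longest
    complete ATG...stop ORF in the three forward reading frames."""
    seq = seq.upper().replace("U", "T")   # accept RNA or DNA input
    best = 0
    for frame in range(3):
        current = None                    # codons counted since the open ATG
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if current is None:
                if codon == "ATG":
                    current = 1
            elif codon in STOP_CODONS:
                best = max(best, current)
                current = None
            else:
                current += 1
    return best


def is_candidate_lncrna(seq: str, min_len: int = 200,
                        max_orf_codons: int = 100) -> bool:
    """Heuristic filter: long enough, and no ORF reaching the assumed cut-off."""
    return len(seq) >= min_len and longest_orf_codons(seq) < max_orf_codons


# Toy sequences (not real transcripts).
print(longest_orf_codons("CCATGAAATTTGGGTAACC"))
print(is_candidate_lncrna("A" * 250))
print(is_candidate_lncrna("ATG" + "GCT" * 120 + "TAA"))
```

A filter of this kind only narrows the candidate list; transcripts that pass it would still need the similarity searches and codon statistics mentioned above before being called non-coding.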

Conclusions
Compelling evidence supports the involvement of lncRNAs in controlling the epigenetic state of particular genes, in transcriptional regulation, in alternative splicing and in sub-nuclear compartments. However, our knowledge of the regulatory and functional roles of lncRNAs is still in its infancy, research being hindered by their diversity, low expression level and sequence divergence. To better understand their genetic and epigenetic functions, researchers are employing the latest advancements in computational technologies. Our recent studies demonstrated that lncRNAs were also associated with the responses to salt stress in barley, as well as drought in Arabidopsis, and heat and biotic stress in wheat. It is hoped that accumulating data about the expression patterns, tissue distribution, subcellular localization, abundance and splicing patterns of lncRNAs under particular conditions will help to unravel the functional roles of lncRNAs at the genetic and epigenetic levels.

Disclosure statement
No potential conflict of interest was reported by the authors.