Comparative analyses of phytochelatin synthase (PCS) genes in higher plants

Abstract Plants employ various defence strategies to ameliorate the effects of heavy metal exposures, leading to re-establishment of metal homeostasis. One of the strategies includes the biosynthesis of main heavy metal detoxifying peptides phytochelatins (PCs) by phytochelatin synthase (PCS). In the present study, 14 PCS homologues were identified in the genomes of 10 selected plants. The size of these PCSs was 452–545 amino acid residues, with characteristic phytochelatin and phytochelatin_C domains. The N-terminal site of the proteins is highly conserved, whereas the C-terminal site is less conserved. Further, the present study also identified fully conserved Cys residues involved in heavy metal binding reported earlier. In addition, other preserved cysteines, with minor substitutions Cys(C)→Ser(S) or Tyr(Y) or Trp(W), were also identified in the PCS sequences that might be associated with metal binding. The reported catalytic triad residues from Arabidopsis, Cys56, His162 and Asp180, are all conserved at the respective sites of PCSs. A clear monocot/dicot separation was revealed by phylogenetic analysis and was further corroborated by the exon–intron organisations of the PCS genes. Moreover, gene ontology terms, co-expression network, cis-regulatory motif and miRNA analyses indicated that the complex as well as dynamic regulation of PCSs has significant involvement in different metabolic pathways associated with signalling, defence, stress and phytohormone, in addition to metal detoxification. Moreover, variations in protein structure are suggested to confer the functional divergence in PCS proteins.


Introduction
Beginning from the mid-20th century, the anthropogenic heavy metal pollution, from traffic, metal industries and mining, has been reported to pose serious threats to all living organisms [1]. Further, plants are exposed to heavy metals, particularly in contaminated environments [2]. However, plants utilize their defence strategies to ameliorate the effects of these adversities. This in turn helps plants to re-establish their homeostasis and progress with their stages of development [3]. Some strategies constitute the formation of complexes with organic molecules to reduce heavy metal availability [4]. These organic molecules include organic acids, malate, citrate, low molecular weight protein, metallothionein (MT), low molecular weight peptides, phytochelatins (PCs) and glutathione (GSH) [4,5]. Phytochelatins are heavy metal-binding peptides in plants with a (c-Glu-Cys)n-Gly structure (n ¼ 2-11) [6]. However, in some plants, the C-terminal Gly can be replaced by serine as (c-Glu-Cys)n-Ser, glutamine as (c-Glu-Cys)n-Gln, glutamate as (c-Glu-Cys)n-Glu and alanine as (-c-Glu-Cys)n-b-Ala [7]. They are enzymatically synthesized from GSH by phytochelatin synthase (PCS) (Figure 1). PCS proteins demonstrate high similarity in their N-terminal domains, whereas their C-terminal domains are less conserved. The N-terminal core domain has been reported to confer the PCS activity, whereas C-terminus ensures improved protein stability, higher PCS activity and broader heavy metal spectrum [6,9,10].
PCS is constitutively expressed; however, heavy metals are major inducers for its activity [11]. PCs are easily induced when exposed to heavy metal ions such as cadmium (Cd), nickel (Ni), copper (Cu), zinc (Zn), silver (Ag), mercury (Hg), lead (Pb) and arsenic (As). Actually, Cd is one of the most effective activators for PC [9]. It was first proposed that the variable C-terminal site could bind the heavy metals via conserved Cys residues and translocate them to the catalytic N-terminal domain [12]. Also, Cd-binding assays revealed five conserved Cys residues in the N-terminus of AtPCS1, involved in the Cd 2þ binding, which is essential for PCS activity [13]. Later, another model suggested that the protein does not directly bind to Cd 2þ but instead forms a PCn-heavy metal thiolate complex as an acceptor molecule [11]. Site-directed mutagenesis demonstrated three conserved residues (Cys56, His162 and Asp180) in the N-terminal site and their substitutions resulted in the complete loss of AtPSC1 activity [6,11,14].
PCS genes have been identified from various plant species and some microorganisms. These include Triticum aestivum [15], Oryza sativa [16], Arabidopsis thaliana [17], Brassica juncea [18], A. halleri [19], Thlaspi caerulescens [11], Schizosaccharomyces pombe [20] and Caenorhabditis elegans [21]. Multiple earlier studies have demonstrated the role of PCS in heavy metal detoxification (reviewed in [5]). For instance, AtPCS1 shows slightly higher expression under Cd exposure during the early stages of development [22]. A similar expression pattern is reported for TaPCS1 in the roots of 4-day-old wheat seedlings under Cd treatment [20]. One-week Cd applications in adult plants cause a significant rise in AtPCS1 expression [23]. Transgenic Indian mustard (B. juncea) with moderate AtPCS1 expression has indicated significantly higher tolerance to Cd and Zn stresses. However, they showed lesser accumulation than wild types in both shoot as well as root tissues [24]. In the model legume Lotus japonicus, alternatively spliced variants of PCSs, LjPCS1-3 show variable responses towards Cd treatments [25]. The overexpressed PCS (NtPCS1) cause elevation in the tolerance to Cd and arsenite in transgenic tobacco plants [26]. In addition to metal detoxification activity, new functions have been also reported for both the PCS and the PC [19]. PCs could be involved in Zn-sequestration [27] and long-distance transport of Cd 2þ [28]. The catabolism of GS conjugates in plants has been linked to the peptidase as a second enzymatic activity of PCS [29,30]. Further, another study suggests that AtPCS1 is involved in the plant innate immunity [31].
The prospect of preventing plant-based heavy metal toxicity and utilising plants for the bioremediation of heavy metal polluted sites is an emerging issue [32]. Thus, exploration and understanding of the molecular basis of major genes involved in heavy metal detoxification were the primary tasks of the present study. In this sense, this work attempted to understand the PCS gene, which is responsible for the biosynthesis of one of the main heavy metal detoxifying peptides-PCs-in plants. Moreover, potential PCS genes were identified at genome-wide scale in 10 different plant species and were comparatively investigated at the primary, secondary and tertiary level.

Sequence analyses of PCSs
Two known Arabidopsis PCSs, PCS1 (Q9S7Z3) and PCS2 (Q9ZWB7), were obtained from the UniProt [33] database to be exploited as reference. These references were queried against the proteome datasets of 10 selected higher plants in the plant genomic resource Phytozome v11 [34,35] database. The studied species included A. thaliana, Populus trichocarpa, Solanum lycopersicum, Medicago truncatula, Cucumis sativus, Phaseolus vulgaris, Brachypodium distachyon, Sorghum  [8]). GSH is synthesized by two sequential enzymatic reactions by cEC synthetase and GSH synthetase from precursors, glutamine, cysteine and glycine. Subsequently, PC synthase catalyses the biosynthesis of PCs from GSH. PCs are related to glutathione (GSH) with a (c-Glu-Cys)n-Gly structure. However, in some plants, C-terminal Gly can be replaced by serine as (c-Glu-Cys)n-Ser, glutamine as (c-Glu-Cys)n-Gln, glutamate as (c-Glu-Cys)n-Glu and alanine as (-c-Glu-Cys)n-b-Ala. Multiple sequence alignment of PCS proteins in 10 plant species. Sequences are aligned by ClustalW, and identical and similar residues are shaded as black and grey, respectively, with a very strict (100%) threshold. Highlighted red and yellow columns show, respectively, the strictly conserved Cys residues and catalytic triad (Cys56, His162 and Asp180 in Arabidopsis [6]. Highlighted green columns also show the conserved Cys residues but some amino substitutions are observed such as Cys(C)!Ser(S)/Tyr(Y)/Trp(W). Blue and purple lines below sequences indicate the approximate locations of Phytochelatin (PF05023) and Phytochelatin_C (PF09328) domains, respectively. bicolor, Zea mays and Oryza sativa. Subsequently, the sequences were retrieved by application of a Hidden Markov Model search for protein domains verification by Pfam [36,37] (Figure 2). The physicochemical properties of the PCS proteins were determined by using the ProtParam tool [38]. The CELLO server was utilized further to predict the subcellular localisations [39]. The MEME tool [40] was exploited to perform a  . Block diagram representation of the most conserved 10 motifs in PCS protein sequences. Motifs were more uniformly distributed in the N-terminal site of the enzyme. Motif 1 is indicated with red, motif 2 with cyan, motif 3 with light-green, motif 4 with purple, motif 5 with gold, motif 6 with green, motif 7 with blue, motif 8 with pink, motif 9 with orange and motif 10 with yellow.
search for conserved protein motifs [41] with the following parameters: max number of motifs to find, 10; and min/max width of motifs, 6-50 ( Figure 3). ClustalW was used for alignment of protein sequences [42]. Further, the phylogenetic tree was generated by using MEGA 7 [43] with the maximum-likelihood (ML) method that was based on the JTT matrix-based model [44] for 1000 bootstraps [45]. Initial tree(s) for the heuristic search were generated automatically by application of Neighbour-Join and BioNJ algorithms to a matrix of pairwise distances. The distances were estimated using a JTT model. This was followed by selection of topology with the help of superior log likelihood value. The analysis involved 14 amino acid sequences. All positions containing gaps and missing data were eliminated. There were a total of 438 positions in the final dataset. Gene ontology (GO) terms such as cellular component, molecular function and biological process were retrieved based on the predefined 'Templates' using InterMine interface in the Phytozome database ( Figure 4).

Gene structure and promoter analyses
Exon/intron structures of PCS genes were extracted from Phytozome v11 database. From the transcription start site (TSS), 1000 bp upstream regions of PCSs were retrieved from Phytozome, where the database has an internal option to allow retrieving up-or down-stream regions of a given gene. Further, they were delivered to the PlantCARE database for promoter analysis [46,47]. PCS coding sequences were scanned for putative miRNA targets in the psRNATarget database [48,49]. We employed a maximum expectation score of 5 for higher prediction coverage.

Co-expression network of Arabidopsis PCS
A co-expression network analysis was performed in order to understand the PCS genes in a complex cell environment at the molecular level. The network was constructed for gene AtPCS1 using ATTED-II server v8 [50,51]. KEGG (Kyoto Encyclopedia of Genes and Genomes) data of connected genes were extracted from the gene network. ATTED-II is a co-expression database for plant species using multiple co-expression data sets and network analysis tools [51,52]. For Arabidopsis, the database included 20,836 genes from microarray (15,275 samples) and 25,296 genes from RNA-seq (1401 samples) data.

Homology modelling of PCSs
Three-dimensional (3D) models were predicted using the Phyre 2 server [53,54] at intensive mode. The crystal structure of Alr0975 (2BTW, structures of transferase from Nostoc sp. PCC 7120) was used in homology modelling with resolution of 2.0 Å. The model quality was checked by Ramachandran plot analysis [55]. We used the VADAR 1.8 server [56] for secondary structure analysis [57]. Structural overlap percentage was calculated using the CLICK server based on the alpha-carbon superposition of models [58,59].

Sequence analysis of PCSs
Using two known Arabidopsis AtPCS1 and AtPCS2 sequences as references, 15 putative PCS homologues were identified in the genomes of 10 selected plant species with high e À161 value (Table 1). Genomewide search revealed that A. thaliana, P. trichocarpa, M. truncatula and S. bicolor harbour two PCS homologues in their genomes, whereas S. lycopersicum, C. sativus, P. vulgaris, B. distachyon, Z. mays and O. sativa only contain a single PCS gene. All PCS proteins were characterized with both Phytochelatin (PF05023) and Phytochelatin_C (PF09328) domains ( Figure 2). They are 452-545-residue proteins with molecular weight in the range of 51.55-60.38 kDa and pI values in the range of 5.29-6.60. The transcripts of PCS genes included 6-10 exons ( Figure 5A-B). Interestingly all grass species possessed six exons, suggesting that PCS gene structures might be well conserved in the monocots during the evolutionary history of the PCS genes. In variations of exon-intron organisations, three main mechanisms have been proposed, including exon-intron gain/loss, exonisation/pseudoexonisation and insertion/deletion [60]. In addition, alternative splicing is also considered to be another important factor responsible for tissue localisation differences or stress responses [61]. A recent study reported alternatively spliced transcripts of rice OsPCS2 gene to be involved in alleviation of Cd and As stresses in a tissue-specific way [62]. In another recent work, four rice OsPCS1 transcript variants and one OsPCS2 were isolated and their expression in PCS-deficient Arabidopsis and Schizosaccharomyces pombe mutants revealed PCS activity in response to Cd and As in the longest PCS1 variant, OsPCS1full [63]. In this regard, it was quite possible that the evolutionary forces could attribute exon-intron variations between monocot and dicot species. Conserved motif/domain analysis PCS proteins were multiple-aligned to figure out the potential conserved residues or domains that were involved in the enzyme activity and stability. The strict shading of the identical and similar residues on the alignment revealed highly conserved residues ( Figure 2). Further, PCS proteins have been reported to have high similarity in their N-terminal domains, whereas C-terminal domains remained less conserved as confirmed in earlier studies. The N-terminal core domain is also reported to confer the PCS activity, whereas the C-terminus provides improved protein stability, higher PCS activity and broader heavy metal spectrum [9,10]. In addition, conserved Cys residues at N-(five in AtPCS1) and C-terminal regions have been observed to be involved at the sites of the heavy metal (e.g. Cd 2þ ) binding [12,13]. Moreover, studies from site-directed mutagenesis revealed three conserved residues, Cys56, His162 and Asp180, as a catalytic triad at the N-terminal site of the enzyme. Substitution of these residues caused the complete loss of AtPSC1 activity [6,11,14]. In light of the above facts, alignment analysis showed that the N-terminal site of the studied PCS proteins was highly conserved, whereas the C-terminal site turned out to be less conserved ( Figure 2). Six columns of fully conserved Cys residues including five in the N-terminus and one in the C-terminus (red highlights on alignment) were identified in the aligned sequences. In addition, other near full columns of Cys residues but with some substitutions Cys(C)!Ser(S) or Tyr(Y) or Trp(W) were also present in the PCS sequences (green highlights on alignment). These minor substitutions at Cys residues might be associated with the metal-binding properties of sequences, considering the roles of conserved Cys residues during heavy metal binding [12,13].
Moreover, reported catalytic triad residues (Cys56, His162 and Asp180 in Arabidopsis) were also fully conserved in the corresponding sites of all sequences (yellow highlights on alignment). The approximate locations of the PCS domains such as phytochelatin (PF05023) and phytochelatin_C (PF09328) also indicated the sequences given below with the blue and purple lines, respectively, where the catalytic triad localized in the phytochelatin domain. Thus, inferring from the above-given studies, the presence of the catalytic triad and the conserved Cys residues implied that they may be somehow involved in PCS activity and heavy metal binding. Furthermore, most conserved 10 motif sequences in PCS proteins were analysed using the MEME tool (  DNPKTVISGTVV' and motif 10 of 'DSLTYIAASVCCQ GAEMLTGN'. Motifs 1-6 were related with the phytochelatin domain (PF05023) structure, motifs 9-10 with the phytochelatin_C domain (PF09328), whereas motifs 7-8 did not relate to the domain structures. Particularly, motifs 1 and 2 were of significance due to the localisation of the catalytic triad in these motifs. The patterns of distributed motifs along sequences were also relatively more uniform at the N-terminal site. This also corroborated the earlier reports [9,10]. In addition, all 10 motifs were also available in all the PCSs and were distributed along the sequences. This confirmed that the PCSs were well conserved in higher plants.

GO annotations of PCSs
GO terms, 'cellular component', 'molecular function' and 'biological process' provide the controlled vocabulary for genes or gene products. These are quite essential for the understanding of the diverse molecular functions of proteins [64]. Herein, enrichment of the PCSs with GO terms was performed using the InterMine interface of the Phytozome database (  . Cis-regulatory elements distribution at þ 1000 bp upstream regions of the 14 PCS genes. Many regulatory elements were revealed in promoter sites of PCS genes but they were broadly categorized as cis-acting elements (dark blue segment), light responsive elements (red segment), hormone-responsive elements (green segment), tissue-specific elements (purple segment) and stress-responsive and other elements (light blue segment).
substance' (GO:0046685), 'response to cadmium ion' (GO:0046686), 'defence response to fungus' (GO:0050832) and 'defence response by callose deposition in cell wall' (GO:0052544). However, the categories of 'response to metal ion' and 'phytochelatin biosynthetic process' as biological processes were common in the all the PCS sequences. As for the 'cellular components', only some sequences were inferred with terms 'chloroplast' (GO:0009507) and/or 'cytosol' (GO:0005829).
Overall, the presence of terms related to phytochelatin biosynthesis and metal-binding activities in the studied PCSs was reasonably understandable. However, ontology terms associated with different functions, such as signalling, defence, stress and phytohormone, indicated that PCSs could also have some other functions in addition to heavy metal detoxification. This was in agreement with some recent studies, suggesting novel functions for PCS and PC [19,65,66]. For example, PCs could be involved in Zn-sequestration [27] and long distance Cd 2þ transport [28]. The catabolism of GS conjugates has recently been linked to the peptidase as a second enzymatic activity of PCS [29,30]. In another study, AtPCS1 was suggested to be involved in the plant innate immunity [31]. All these observations implicated that PCSs could also be involved in various other metabolic processes. Therefore, further elucidation in terms of better understanding of their roles in metal homeostasis/detoxification is awaited.

Phylogenetic analysis of PCS proteins
Another major objective of the present study was to comparatively analyse the phylogenetic distribution of PCS proteins. A phylogenetic tree was constructed by MEGA 7 with the ML method ( Figure 5A). The tree demonstrated two major clusters as groups A and B, and was based on the clustering topology. Group A was further subdivided into two subgroups, A1 and A2. Group A (A1 and A2) only contained dicot species, whereas group B had monocots. Subgroup A1 included the sequences of both Arabidopsis PCSs, poplar (Potri.014G195800.1), tomato (Solyc09g072620.2.1) and Medicago (Medtr7g097190.1), whereas subgroup A2 had poplar (Potri.014G195900.1), Medicago (Medtr7g097200.1), common bean and cucumber sequences. However, all monocots including Brachypodium, Sorghum, maize and rice sequences were clustered in group B. Thus, a clear monocot/dicot separation was apparent in the analysed sequences. This divergence was also obvious in the exon-intron organisations of the PCS genes where all monocots included a fixed number of six exons ( Figure 5B). These variations between monocots and dicots might also be the results of exon-intron gains or losses during the time of monocot/dicot divergence. Ramos et al. [25] reported that phylogenetic tree with 20 PCS sequences showed separate clades for cyanobacteria, yeasts, nematodes, ferns and higher plants, as well as with divergence of monocots and dicots. Further, monocot-dicot divergence of 27 PCS protein sequences from bacteria and eukaryotes was identified with a higher bootstrap value. Moreover, AtPCS1 and AtPCS2 were grouped together [29]. In addition, PCS homologues of the same species, Medicago and poplar, were distributed in separate clusters. One variant was observed in subgroup A1 and another in subgroup A2. This suggested that in addition to metal homeostasis/detoxification, PCSs could also participate in different metabolic processes. Inferred PCS GO terms associated with signalling, defence, stress and phytohormones as well as some reports [19,27,28,30,31] support this suggestion.

Co-expression analysis of PCS genes
To understand the interactions of PCS genes in the complex cell environment, a co-expression network analysis was performed. The network was constructed for the AtPCS1 (At5g44070) gene using the microarray and RNA-seq datasets in ATTED-II ( Figure 6). Six genes directly connected with AtPCS1 on the network were identified, including syntaxin of plants 121 (At3g11820), S-adenosyl-L-methionine-dependent methyltransferases superfamily protein (At1g55450), HOPZ-ACTIVATED RESISTANCE 1 (At3g50950), seventransmembrane MLO family protein (At1g11310), calcium-dependent protein kinase 4 (At4g09570) and Ypt/Rab-GAP domain of gyp1p superfamily protein (At3g49350). The syntaxin family proteins played important roles in vesicle sorting, docking and fusion in the secretory pathway [67]. The vesicle-trafficking protein SYP121 (SYR1/PEN1) was associated with ion channel control at the plasma membrane of stomatal guard cells [68]. Plant S-adenosyl-L-methioninedependent methyltransferases (SAM-Mtases) were the major enzymes in the phenylpropanoid, flavonoid and many other metabolic pathways [69]. Plant resistance (R) proteins involved in the plant defence against potential pathogens, e.g. Arabidopsis R protein HOPZ-ACTIVATED RESISTANCE 1 (ZAR1; At3g50950) [70]. The seven-transmembrane domain proteins were homologues of barley mildew resistance locus o (MLO) and  were localized in the plasma membrane [71]. The MLO gene family has been reported to be crucial in the resistance to the powdery mildew disease [72]. Calcium-dependent protein kinases have been reported to be present in all plants. They are confirmed in an earlier study to play important roles in metabolism, osmosis, hormone response and stress signalling pathways [73]. Ypt/Rab proteins modulate the vesicular protein transport in all eukaryotic cells [74]. In addition, based on KEGG pathway data, AtPCS1 is connected to the plant-pathogen interaction pathway with two genes (At4g09570 and At4g26090). The primary interaction partners of AtPCS1 were associated with various pathways including defence, stress, signalling, secondary metabolite production and vesicle trafficking, indicating that PCS proteins might also participate in different metabolic processes, in addition to their metal detoxification activities. Some reports from recent studies also corroborated this, but further studies under various perturbations are essential to gain further insights into the PCS proteins.

Promoter site analyses
The promoter sequence analysis was performed to obtain insights about the regulation of identified PCS genes. The 1000 bp upstream flanks of PCSs from the transcription start site (TSS) were supplied to the PlantCARE database for cis-regulatory element analysis (Figure 7). A number of cis-regulatory elements were revealed in the promoter sites of PCS genes and were broadly categorized as cis-acting elements, light responsive elements, hormone-responsive elements, tissue-specific elements, stress-responsive elements and other elements. Ten different types of cis-acting elements were present in PCS promoters including CAAT-box, TATA-box, 5UTR Py-rich stretch, TA-rich region, AT-rich element, AT-rich sequence, Box III, CCAAT-box, OBP-1 site and A-box. However, all PCS genes shared CAAT-and TATA-box motifs. The light responsive elements formed the largest number of cisregulatory motifs in the PCS promoters with 27 members. The seed germination, seedling photomorphogenesis, shade-avoidance and photoperiod responses are some light-regulated processes. In addition, recent genomic studies also revealed that light induces massive reprogramming in the plant transcriptome with involvement of various transcription factors [75]. Therefore, this provided some clues about the presence of a high number of light responsive motifs in the promoter regions of the genes. Eleven types of identified hormone-responsive elements included CGTCA-motif, TGACG-motif, TCA-element, ERE, TGAelement, AuxRR-core, ABRE, motif IIb, P-box, TATC-box and GARE-motif. They were associated with plant hormones, methyl jasmonate, salicylic acid, ethylene, auxin, abscisic acid and gibberellin. Phytohormones have been reported to be involved in the regulation of numerous metabolic processes, e.g. seed germination, flowering, senescence, stress response and fruit ripening [76]. Therefore, it appeared that PCS genes might be modulated by various internal and external factors. Five types of tissue-specific elements were identified in the PCS promoters including CCGTCC-box, CAT-box, GCN4_motif, Skn-1_motif and as-2-box. These regulatory elements were associated with meristem, endosperm and shoot tissues. In addition, PCS promoters also included stress-responsive elements related with drought, fungal elicitor, low temperature, wound and cold. Heavy metals in plants could cause oxidative stress, leading to cellular damage via lipid peroxidation, protein oxidation and DNA damage. PCs are one of the major molecules chelating heavy metals for detoxification [77]. Herein, the existence of various types and number of stress-responsive elements in PCS promoters indicated that PCS genes might also play important roles in stress conditions. Taken together, the presence of diverse cis-regulatory elements connected with various metabolic processes confirmed the complex as well as dynamic regulation of PCS genes in the plant genomes.

Predicted miRNA-PCS targets
MicroRNAs (miRNAs) are non-coding endogenous RNAs, often consisting of 21-25 nucleotides. They function either by suppressing the translation of a target gene or by degrading the target mRNAs post-transcriptionally. miRNAs play important roles in many metabolic activities such as morphogenesis, signal transduction, various biotic/abiotic stresses, cell development and differentiation [78,79]. In the present study, a total of 53 miRNA types were identified for the adopted search parameters (described in Materials and methods  [81]. In a different study, miR396, 397, 398 and 408 were identified as related to Cd exposure in B. napus [82]. In radish (Raphanus sativus), Cd-responsive conserved microRNAs such as miR156, 159, 160, 164, 167, 169, 395, 397 and their target genes were identified in Cd-free (CK) and Cd-treated (Cd200) libraries from radish roots. These authors also proposed that target genes of Cd-responsive miRNAs were phytochelatin synthase 1 and iron and ABC transporter proteins [83]. In two soybean genotypes (Huaxia3 and Zhonghuang24), gma-miR397a was observed to be a cadmium tolerance-associated miRNA [84]. Huang et al. [85] reported that the expression levels of bnamiR393 in leaves, bna-miR156a, bna-miR167a/c in roots and leaves, and bna-miR164b and bna-miR394a/ b/c in all tissues of rapeseed (Brassica napus) under Cd exposure showed increase. The above-reported miRNAs from Cd treatments were one of the most effective activators for PC [9]. Thereby, PCS were also identified for targeting some PCS sequences: Arabidopsis (AT1G03980.1) was targeted by ath-miR156ai-5p with cleavage, Solanum was targeted by sly-miR156a-c with cleavage, Medicago (Medtr7g097190.1) targeted by mtr-miR156g-5p and mtr-miR160a,b,d,e with cleavage, and by mtr-miR164a-c with translation, Populus (Potri.014G195900.1) targeted by ptc-miR395a with translation, Brachypodium targeted by bdi-miR395d-3p and bdi-miR395p-3p with cleavage, Sorghum (Sobic.003G024200.1) targeted by sbi-miR397-5p with cleavage and Sobic.003G024300.1 targeted by sbi-miR159b with translation, Zea targeted by zma-miR169n-3p with translation and Oryza targeted by osa-miR167h-3p with cleavage. The above observations suggested that the regulation of PCS genes in heavy metal detoxification is a very dynamic process. Moreover, it is transcriptionally controlled by variable miRNAs either by suppressing the translation of the target gene or by degrading the target mRNAs. Also, various miRNA types implied a highly complex regulation of metal homeostasis at a species-dependent level [86,87].

Secondary and tertiary structures of PCSs
Finally, PCSs were investigated at secondary and tertiary structural levels to gain further insights about the functional forms of proteins. It is well documented that divergence of protein structures was observed less in comparison to sequence divergence. This indicated that selective constraints favoured to preserve the protein structure [88]. In addition, many studies show relation of the protein structural information to the amino acid replacement process, thereby resulting in protein divergence or evolution. For example, Goldman et al. [89] reported that there is a significant relationship between secondary structure environment and amino acid replacement process. Another study introduced an evolutionary model combining the protein secondary structure and amino acid replacement [88]. In another study on protein structural information, transmembrane proteins were employed for evolutionary inference [90]. In the present study, the secondary structures of PCS proteins ranged from 29 to 41% for a-helices, from 6 to 14% for b-sheets, from 46 to 59% for coils and from 19 to 27% for turns (Table 3). Although the above secondary structure entities did not show much divergence, some variations were also present. Considering the reports from earlier studies, as well as the findings of this work, PCSs were suggested to play different metabolic roles in the plants. Therefore, it is quite reasonable to claim that these variations in secondary structures might account for the functional diversities of PCS proteins. 1'aa' stands for amino acids and indicates the number of amino acid residues in the given secondary structure. '%' stands for percentage and indicates the percentage of residues in the given secondary structure. Ã The percentage of secondary structure elements was calculated by using the VADAR 1.8 server.
Moreover, using the Phyre2 server at intensive mode for all 14 PCS sequences, we generated 3D models of the PCS proteins in the present study. Template '2BTW' from structures of transferase from Nostoc sp. PCC 7120 with resolution of 2.0 Å was used in homology modelling (Figure 8). The qualities of generated models were satisfactorily verified by Ramachandran plot analysis where !90% of residues were in the allowed region. The models of Arabidopsis AtPCS1 (AT5G44070.1) and 2 (AT1G03980.1) were then superimposed with models of other PCSs on the basis of the alpha carbon superposition in order to figure out the degree of conservancy, thereby protein evolution. The superposed structural overlaps ranged from 41.44 to 56.91% for AtPCS1 and 45.58 to 60.84% for AtPCS2. The highest structural overlap (60.84%) was between dicot AtPCS2 and monocot ZmPCS, whereas the lowest value (41.44%) was noticed between dicots AtPCS1 and MtPCS (Medtr7g097200.1). Interestingly, contrary to the general acceptance that suggested divergence of protein structures occurs much less rapidly in comparison to divergence of protein sequences [88], in the present work, the 3D structures of PCS proteins were relatively less conserved when compared with primary protein sequences (refer to 'sequence analysis' section). So, it could be speculated that these variations in protein folding state/conformation might be somehow related to the diverged protein functions, however further experimental analysis is required to confirm this.

Conclusions
The present study identified 14 putative PCS homologues in the genomes of 10 selected plant species (A. thaliana, P. trichocarpa, S. lycopersicum, M. truncatula, C. sativus, P. vulgaris, B. distachyon, S. bicolor, Z. mays and O. sativa) responsible for the biosynthesis of one of the main groups of heavy-metal detoxifying peptides phytochelatins (PCs) in plants. PCS proteins showed variations in protein length and some differences in the C-terminal site, indicating functional diversities of PCSs in plants. In addition, vital residues such as Cys56, His162 and Asp180 are well conserved in PCSs, and PCSs were found to be related to many metabolic pathways, proving the importance of PCS genes in plant metabolism as well as metal detoxification. Thus, this study was an initial, theoretical step that can provide a basis for future studies into the PCS genes in higher plants.

Disclosure statement
The authors declare that they have no conflict of interest.