Genome-wide identification, phylogeny and expression analysis of the bZIP gene family in Alfalfa (Medicago sativa)

Abstract Among eukaryotic transcription factors, the basic leucine zipper (bZIP) transcription factor is the most widely distributed and the most conserved protein. Alfalfa (Medicago sativa) is the largest forage crop in the world. However, few studies have evaluated the bZIP family in Alfalfa. In this study, bZIP genes in Alfalfa were identified, with 57 MsbZIPs being distributed into 11 sub-families in the phylogenetic tree. Motif and gene structure analysis revealed that the eleven subfamilies had similar motifs and gene structures, 15 cis-acting elements in MsbZIPs, including eight elements in hormones and four in plant stress resistance; 57 MsbZIPs were localized on 8 chromosomes, among which chr3 and chr4 had more collinearities, and MsbZIPs had more collinearity with the soybean. Tissue specific expression revealed that MsbZIPs were highly expressed in leaves and roots. These findings formed a foundation for further studies on functional characteristics, evolution and biological functions of bZIP transcription factors in alfalfa.


Introduction
In plants, transcription factors play a key role in regulating many important biological processes [1]. It is essential to understand the functional characteristics of transcription factors in order to study their transcriptional regulatory networks and biological processes [2]. As one of the largest transcription factor families in plants, bZIPs have many different classifications. The bZIP protein has a conserved domain that is composed of 40-60 amino acids, which contains a basic DNA binding domain. This protein can bind specific DNA sequences through a fixed N-x7-R/K structure, and a leucine zipper dimer domain that is tightly bound to the basic region. There is one leucine in the seventh position of every seventh amino acid and others at the third and fourth positions of hydrophobic residues. The leucine zipper forms an amphiphilic α helix, which can affect the dimerization of bZIP proteins before binding with DNA [3]. In addition to the bZIP domain, the transcription factor family also contains other domains, such as R/KxxS/T and S/txxd/E domains, which are the phosphorylation sites of Ca 2+ independent protein kinase and casein kinase II [4]. In addition, the proline rich region, glutamine rich region and acid domain also played important roles in transcriptional activation of bZIP genes [5].
In plants, the bZIP gene family is involved in a variety of biological processes, including seed maturation and germination, flower development, light signal and responses to biological and abiotic stresses, hormonal responses as well as abscisic acid (ABA) signalling. Chen et al. [6] reported that OSbZIP74, a specifically expressed gene in rice (Oryza sativa L.), was expressed during stamen development and its mutations can result in male sterility. The AtbZIP18 gene in Arabidopsis interacted with AtbZIP34 as well as AtbZIP52 and was involved in pollen development [7]. The bZIP protein was also involved in plant root growth and development. AtbZIP29 regulated the expression of genes associated with cell wall formation in root tip meristematic cells, affecting root growth and development [8]. ZmbZIP4 in Maize (Zea mays L) could directly bind to promoters of genes associated with root development, and promote their transcription [9]. Cell division in Arabidopsis leaves was inhibited by the bZIP gene, AtORG3, which suppresses leaf enlargement [10]. The OsbZP48 gene in rice inhibits OsKO2 expression, and the overexpression of OsbZIP48 inhibited the synthesis of gibberellin, resulting in plant dwarfing [11]. There are 131 bZIP gene family members in soybean (Glycine max L), with more than one third of them being involved in at least a defence response to ABA, salt, drought and cold stress [12]. Overexpressed BnbZIP2 (Boehmeria nivea) in transgenic Arabidopsis plants enhances their sensitivity to drought, and their tolerance to salt stress, when compared to wild-type plants [13]. In rice, the ABF1 gene is involved in drought responses by regulating the ABA pathways. Its overexpression has been shown to significantly enhance drought tolerance in transgenic rice. The ABF1 gene can directly regulate the expression of OsbZIP23, OsbZIP46 and OsbZIP72 of the bZIP family [14]. Many bZIP family genes, including ABP9 in maize [15], ABF3 in Arabidopsis [16], OsbZIP23 in rice [17], and TabZIP60 in wheat [18] have been reported to improve the tolerance of transgenic plants under abiotic stress, especially in Arabidopsis and rice.
Alfalfa (Medicago sativa) is an autotetraploid, cross-pollinated leguminous plant. It is a high-quality leguminous forage with high nutritional values that can be a source of proteins, cellulose, minerals, vitamins and other nutrients for communal animals [19]. It also has the potential as a sustainable cellulose feedstock in ethanol production [20]. In addition, alfalfa can improve soil quality, promote biodiversity and improve soil fertility through symbiotic nitrogen fixation. Therefore, alfalfa can be used in ecological restoration. In this study, 57 bZIP members of Alfalfa (Medicago sativa) were identified according to the reference-genome sequence, and their locations, evolutionary relationships, gene structures, motifs and collinearity evaluated. Our findings elucidate the functions of the bZIP gene family in Alfalfa, and could form the basis for further studies.

MsbZIP member identification and phylogenetic analysis
The HMMER software (http://hmmer.janelia.org/) and the Pfam protein family database (http://pfam.sanger. ac.uk/) [21] were used to identify candidate bZIP proteins at the bZIP domain (PF00170). The protein annotation file was retrieved from the Esembl plants website (http://plants.ensembl.org/index). Subsequently, P3DB (http://www.p3db.org/) and ExPASy Proteomics Server (http://prosite.expasy.org/) were used to verify the integrity of the bZIP domain in candidate genes. Each bZIP member was named according to their precise position on the chromosomes.bZIP protein sequences of other species such as Arabidopsis and maize (Zea mays L.) were searched using the Pfam of bZIP domain from the database of Esembl plants. These bZIP members were introduced into MEGA X for multiple sequence alignments. The LG + G model which MEGA predicted was used in Maximun Likelihoodphy (ML); 1000 replicates were used to produce bootstrap values. Expression levels of the bZIP members were obtained from the Phytozome database (https://phytozome.jgi.doe.gov/pz/portal.html).

Analyses of gene structures, conserved motifs and promoter
The GSDS platform (http://gsds.cbi.pku.edu.cn/) was used [22] to show and analyze the exon/intron structure of MsbZIP genes. Gene wise [23] was used to determine correspondence on coordinates in DNA (comprising both intron and exon) versus protein sequences. Subsequently, in-house Perl scripts were used to transform bZIP domain coordinates in protein sequences to that in gene sequences. The intron splicing phase in the hinge and basic regions of bZIP domains from all MsbZIP genes were characterized and separated into various types. The MEME tool (http://meme.nbcr.net/ meme/) [24] was used to detect additional motifs outside the bZIP domain of protein sequences. Motifs containing 10-50 amino acids and E-value <1e −20 were characterized and compared among the MsbZIP genes to identify group-specific or group-conserved signatures. Then, they were numbered based on the order in which they occurred in the protein sequences.

Collinearity analysis of MsbZIPs
Mapping of the MsbZIP genes on the chromosome was done as per the chromosomal location given in the Esembl plants. Multiple Collinearity Scan toolkit (MCScanX) was used to analyze gene duplication events based on default parameters [25]. Circos (version 0.69) (http://circos.ca/) was used to generate visualization as previously described [26].

Plant materials and processing methods
A local cultivar, WL319, was provided by Daqing Branch of Heilongjiang Academy of Sciences. This cultivar was adapted to the saline soil of Daqing. Uniform sized and well matured seeds were first selected, washed using 0.5% NaClO solution for 3 min, washed 8-10 times using sterile water, placed on a clean workbeach, dried, and planted in plastic pots with sterile vermiculite in an artificial climate box. Growth and cultivation conditions of alfalfa seedlings were: light intensity, 150 μmol photons/(m 2 ·s); 10 h light/14 h dark photoperiod, while the temperature was continuously maintained at 28 ± 2 °C in the greenhouse. After expansion of the first true leaf, the full plant was uprooted and washed 8-10 times using sterile water to thoroughly remove vermiculite. Leaves, stems and root tissues of Alfalfa seedlings were obtained as samples, and each sample weighed about 0.5 g. Total RNA was extracted using the TRIzol reagent (QingKe, Beijing, China). After extraction, the qualities of the RNA samples were checked by NanoDrop spectrophometry. Quantitative real-time polymerase chain reaction (qRT-PCR) primers were designed by Premier 5.0 (Supplemental Table S1). The actin-gene of Alfalfa served as the internal control gene. RNA was transcribed to cDNA after which qRT-PCR was performed using the 2 × ChamQ Universal SYBR qPCR Mix (Vazyme biotech, China). The qRT-PCR was performed in triplicate. Relative gene expression was determined using the formula below [27]: Relative expression = 2 ΔCt , ΔCt = Ct (MsbZIPs) -Ct (actin-gene)

MsbZIP family members
A total of 67 protein sequences with bZIP domains were identified by HMM profile analysis from the Alfalfa (Medicago sativa) genome. Among them, 57 members were assigned to the Alfalfa bZIP family by SMART analyses, and were accordingly named MsbZIP01~MsbZIP57 (Supplemental Table S2). The 57 MsbZIP members were located on the 8 chromosomes (Chrs) of Alfalfa. Most of the MsbZIPs were located on Chr 3 (11), while only 3 MsbZIPs were located on Chr 6. Protein lengths of MsbZIPs were between 96 and 758, while their isoelectric points were from 5.15 to 10.38.

Phylogenetic analysis of MsbZIPs
A phylogenetic tree for the 57 bZIP protein sequences of Alfalfa (Figure 1) was established using the Maximun Likelihoodphy in MEGA software (MEGAX). The LG + G model was forecasted as the optimal model by MEGA. Based on the phylogenetic tree, the predicted bZIP gene family was clustered into 11 subgroups in Arabidopsis [1], which were I-XI. Subgroup I had 15 members (the most), while subgroup V, VII and X had one member each (Supplemental Table S3).

Motif and gene structure of MsbZIPs
The MEME software was used to analyze the motifs of 57 MsbZIPs. Symbols of the motifs and their composition are shown in the MsbZIP family tree (Figure  2(a) and (b)), while the information regarding motif sequences is shown in Supplemental Figure S1. Basically, all MsbZIPs subgroups had motif 1 and motif 2, only subfamily VI had motif 3 and motif 5, and only subfamily VIII had motif 10. This confirmed the close association between the same subgroups.
Gene structures of exons and introns of the 57 MsbZIPs were then determined. Intron numbers in MsbZIPs ranged from 0 to 15 (Figure 2(c)). In comparison, the number of introns in subfamily I was relatively small, while there were more introns in subfamilies VIII, X, and XI.

Cis-regulatory elements of MsbZIPs
Cis-regulatory elements in the MsbZIPs promoters according to the Alfalfa reference genome database, and the prediction results of cis-acting elements are shown in Supplemental Table S4. Twelve elements associated with hormone formation and stress responses were revealed ( Figure 3). Red represents hormone-related elements, suggesting that the MsbZIPs are involved in hormonal responses associated with plant growth. Blue represents abiotic stress response associated elements, which indicates that some MsbZIPs are involved in plant abiotic stress responses.

Location and collinearity of MsbZIPs
The location of MsbZIPs on the Alfalfa genome is shown in Figure 4. The genes marked in red exhibited interspecies collinearity. Among them, Chr 3 and Chr 4 had the most collinear genes. Ch 6 only had one gene, MsbZIP39, which exhibited collinearity. MsbZIPs on the two chromosomes (Chr 2 and Ch 8) did not exhibit collinearity.
Collinearity analysis of MsbZIPs in soybean is shown in Figure 5. A total of 67 MsbZIPs gene pairs exhibited collinearity. MsbZIPs genes that exhibited collinearity were distributed on 19 chromosomes, with only Chr 9 lacking an MsbZIP's collinearity gene, indicating that alfalfa and soybean have a close relationship.

Phylogenetic analysis of bZIP in different species
To determine the function and role of bZIP in alfalfa, the evolutionary relationships of the three species of model plant, Arabidopsis, monocotyledonous plant, maize (Zea mays L.) and dicotyledonous plant alfalfa were analyzed. The members of bZIPs of these three plant species were divided into 11 sub-families. The motif (Supplemental Figure S2) and gene structure of each sub-family were relatively similar (Figure 6), probably because bZIPs have similar functions in different species.

Tissue-specific expression analysis of MsbZIPs
Tissue-specific expression levels of MsbZIPs were determined by high-throughput sequencing in the Phytozome database, including the leaves, nodules and roots (Figure 7(a)). The MsbZIPs were expressed in various tissues, and differences in expression were relatively large. Seedling alfalfa was selected to elucidate on the functions of MsbZIPs in Medicago sativa. Six MsbZIP genes were randomly selected, and the expression levels of MsbZIP genes in roots, stems and leaves were analyzed by qRT-PCR. As shown in Figure  7(b), all of the 8 selected MsbZIPs were expressed in roots, stems and leaves, and some of their expression levels were significantly different. Some MsbZIPs such as MsbZIP05, MsbZIP13 and MsbZIP38 exhibited higher expression levels in the roots compared to leaves and stems, while the expression levels of some MsbZIPs such as MsbZIP09, MsbZIP24, MsbZIP28, and MsbZIP44 in leaves were higher than in the stems and roots. Most of the MsbZIPs exhibited lower expression levels in stems.

Characterization of MsbZIP family members
Alfalfa (Medicago sativa) is an important livestock grass with a high output and excellent quality whose demand is on the increase. To scientifically grow alfalfa and improve its quality and yield, it is particularly important to study its genes. Most of its gene families, including NAC [28], ERF [29], DOF [30], MYB [31], HD-ZIP [32] and CML [33] have been analyzed. However, to our knowledge, the bZIP gene family in alfalfa has not been analyzed yet, although as an important transcription factor family, bZIP has been reported in many plants, such in Arabidopsis [1], rice [34], sesame (Sesamum indicum) [35], strawberry (Fragaria vesca), watermelon (Citrullus lanatus) [36], and pepper (Capsicum annuum L) [37]. In this study, MsbZIPs were identified through the reference genome of alfalfa (Medicago sativa), and a total of 57 members were found to be distributed on 8 chromosomes.

Analysis of MsbZIPs
In the phylogenetic tree, MsbZIPs were predicted and analyzed by the optimized model, LG + G, after which they were divided into 11 sub-families. Similar results were reported in Arabidopsis [3]; Ustilaginoidea virens [38], Wild sweet potato (Ipomoea trifid) [39], Radish (Raphanus sativus L.) [40] and Carthamus tinctorius [41]. Each subfamily exhibited similar motifs and gene structures, corresponding to the findings reported in Arabidopsis, maize and alfalfa. It was also found that MsbZIPs have many cis-acting elements associated with hormonal and stress responses. There were some bZIP genes that have been reported to be associated with hormones and environmental stress. VvABF2, a bZIP transcription factor in grape, enhances osmotic stress through ABA by being overexpressed in Arabidopsis [42]; CsbZIP18, a tea plant gene, is a negative regulator of ABA signaling and cold stress [43], while MdbZIP44 in apple responds to ABA, which activates anthocyanin biosynthesis [44]. MsbZIPs exhibited some collinearity genes with soybean. These collinear genes in soybean could be used as reference genes to study the functions of MsbZIPs.

Tissue-specific expression of MsbZIPs
The bZIP gene family members exhibited specific expression levels in 7 tissues of Carthamus tinctorius L [41]. StbZIPs (bZIP members in Solanum tuberosum) were found to be differently expressed in tissues [45], bZIP transcription factor members in sweet potato (Ipomoea trifida) were expressed in a tissue-specific manner [39], while bZIP members in grapevine (Vitis vinifera) [46], pepper (Capsicum annuum L.) [37] and switchgrass [47] were differently expressed in different tissues. IbABF4, as an ABF gene encoding bZIP domain, is specifically expressed in leaves, petioles, stems and roots, with the highest expression in roots [48]. Baloglu et al. [49] found that 4 CsbZIPs were highly expressed in leaves, flowers and root tissues, while ZmbZIP72, a bZIP transcription factor, is differentially expressed in various maize organs [50]. In this study, MsbZIPs exhibited tissue-specific expression levels in the seedling stage of alfalfa. Four MsbZIPs exhibited elevated expression levels in roots, while four MsbZIPs exhibited elevated expression levels in the leaves. Therefore, MsbZIPs exhibited tissue-specific expression levels.

Conclusions
bZIP genes are important in plant growth and stress responses, however, the bZIP gene family in alfalfa (Medicago sativa) has not been studied. In this study, a reference genome was used to analyze MsbZIPs. It was found that MsbZIPs exhibit tissue-specific expression levels. These findings provide the basis for breeding of alfalfa as well as verification and analysis of bZIP genes in alfalfa.

Data availability
All data that support the findings reported in this study are available from the corresponding author upon reasonable request.

Disclosure statement
No potential conflict of interest was reported by the authors.

Funding
This study was financially supported by two projects: