QTL mapping of soybean node numbers on the main stem and meta-analysis for mining candidate genes

ABSTRACT The node number on the main stem (NNMS) of soybean (Glycine max) is closely related to seed yield. The aim of this study was to identify important loci affecting soybean NNMS using meta-analysis based on a reference physical map. Twenty-nine NNMS-related QTLs were mapped across 8 years with a recombinant inbred line (RIL) population. Fifty-four NNMS QTLs, collected from the SoyBase database and from this research, were each projected onto the soybean physical map. Through meta-analysis, 11 consensus soybean NNMS QTLs were obtained and located on LG D1b, LG C2, LG B1, LG F, LG L and LG I. The physical interval of the consensus QTLs ranged from 0.02 to 4.22 Mb, each consensus QTL integrated 2 to 4 original QTLs, and the mean R² values ranged from 6.90% to 27.40%. Furthermore, 488 candidate genes were located in these consensus QTLs, 6 of which may be related to NNMS. These results may help lay a foundation for fine mapping of QTLs/genes related to NNMS that can be used to help breed high-yield soybean cultivars.


Introduction
Soybean (Glycine max) is one of the most important crops in the world for its use as a source of both edible oil and vegetable protein [1,2]. Breeding high-yield soybean cultivars is an ongoing aim of plant breeders. The number of nodes is a key agronomic trait that influences soybean seed yield and a quantitative trait that is closely related to plant height [3,4]. In the development of the soybean genetic map, many quantitative trait loci (QTLs) underlying the node number on the main stem (NNMS) have been identified in different genetic backgrounds and environments and with different statistical methods [3,5-10]. However, these QTLs play only a minor role in soybean breeding programmes because the distribution of the QTLs varies among populations, their confidence intervals (CIs) are too long and their LOD scores are too low [11]. Therefore, a statistical approach is needed to confirm whether a cluster of original QTLs detected in different backgrounds corresponds to the same locus.
Meta-analysis combines results from different sources and analyses them statistically within a single study [12,13]. Meta-analyses have been widely applied in evolution and genetics research to narrow down the CIs in order to refine and confirm the integration of QTLs via mathematical models [14]. A large number of studies have used the meta-analysis method in plant QTL analysis, including flowering time in maize [15], drought-related traits in rice [16], Fusarium head blight resistance in wheat [17] and abiotic stress tolerance in barley [18]. Several studies using meta-analyses have been performed for QTL location in soybean. Guo et al. [19] applied meta-analysis to the integration of 62 QTLs associated with resistance to soybean cyst nematode (SCN), discussing the relationship between QTLs and SCN races for the first time in soybean. Furthermore, meta-QTLs for soybean oil [20], soybean protein [21], soybean delayed canopy wilting [22], soybean protein and oil contents [23] and soybean plant height [24] have previously been analysed. On this basis, a meta-analysis approach was adopted to detect real QTLs for NNMS in the current study.
Several genes related to stem node number have been identified in different crops. The MADS-box gene ZmMADS3 affects maize stem node number and is expressed most highly in the uppermost node [25]. A new MT gene was strongly expressed in the stem node of rice [26]. The dt1 gene has been found to have a large effect on decreasing both plant height and the number of nodes in soybean [27]. Now that the soybean genome has been sequenced, access to the physical map provides opportunities for fine mapping and cloning [28]. To our knowledge, no study to date has reported the use of meta-analysis based on the physical map to identify NNMS QTLs/genes in soybean. Thus, meta-QTL analysis will provide valuable information to determine the molecular basis of key soybean agronomic traits. In this study, NNMS QTLs were mapped using data obtained over 8 years from a recombinant inbred line (RIL) population based on a high-density genetic map constructed by Specific Length Amplified Fragment Sequencing (SLAF-seq) [29]. By combining the QTL results from this research with 25 previously published QTLs with physical positions, an integrated NNMS QTL map was constructed based on the Williams 82 physical map for the first time. Meta-analysis and gene mining were performed to refine the QTLs and screen the candidate genes. This research will lay the foundation for marker-assisted selection and improve the precision of fine gene mapping in soybean breeding programmes.

NNMS QTL mapping
The population of 147 recombinant inbred lines (RILs) was derived from a cross involving the American semi-dwarf variety Charleston [30].
Composite interval mapping (CIM) and inclusive composite interval mapping (ICIM) were used to analyse 8 years of RIL data with a minimum logarithm of odds (LOD) score of 2.5 [31].
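For reference, the LOD (logarithm of odds) score used as the detection threshold can be written as below. The symbols L_1 (likelihood of the data with a QTL at the tested position) and L_0 (likelihood with no QTL) are notation assumed here for illustration and do not appear in the original text.

```latex
\mathrm{LOD} = \log_{10}\frac{L_1}{L_0},
\qquad
\mathrm{LOD} \ge 2.5 \;\Longleftrightarrow\; \frac{L_1}{L_0} \ge 10^{2.5} \approx 316
```

Under this definition, a threshold of 2.5 means that the data are at least about 316 times more likely under the model with a QTL than under the model without one.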

Integration of mapping information for NNMS QTLs
Using the SoyBase database (http://www.soybase.org), 37 NNMS QTLs were screened by marker physical position, leaving 25 NNMS QTLs available for meta-analysis. In total, 54 NNMS QTLs, comprising 25 QTLs from SoyBase and 29 QTLs from this research, were integrated for meta-analysis (Supplemental Figure S1). Several types of population were used, including RIL, F2 and F2:3, along with different mapping methods (Supplemental Table S1). The information recorded for each NNMS QTL detected in a single population in a given environment, including trait, name, chromosome, marker and phenotypic variance explained (R²), was used for QTL integration. For each QTL, the R² (explained phenotypic variance) and the CI (confidence interval) were the key parameters. In this research, the CI was expressed as a physical map distance. Projection onto the reference map was performed according to the molecular markers flanking the original QTLs; if a QTL had no markers that could be matched to the physical map, that QTL was rejected.
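The screening rule described above (keep a QTL only if both flanking markers have a physical position) can be sketched as follows. This is a minimal illustration, not the pipeline used in the study: the record layout, the field names and the example values are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class QTL:
    """Hypothetical record for one original QTL."""
    name: str
    chromosome: str
    source: str                 # e.g. "SoyBase" or "this study"
    left_mb: Optional[float]    # physical position of one flanking marker, None if unmatched
    right_mb: Optional[float]   # physical position of the other flanking marker
    r2: float                   # explained phenotypic variance (%)


def project(qtls: List[QTL]) -> List[QTL]:
    """Keep QTLs whose flanking markers both map to the physical map,
    and order each interval so that left_mb <= right_mb."""
    kept = []
    for q in qtls:
        if q.left_mb is None or q.right_mb is None:
            continue  # no usable marker on the physical map: reject the QTL
        lo, hi = sorted((q.left_mb, q.right_mb))
        kept.append(QTL(q.name, q.chromosome, q.source, lo, hi, q.r2))
    return kept


# Hypothetical example records (values invented for illustration only).
qtls = [
    QTL("qA", "Gm11", "SoyBase", 10.2, 9.8, 8.5),
    QTL("qB", "Gm11", "SoyBase", None, 12.0, 6.1),
    QTL("qC", "Gm06", "this study", 39.7, 42.7, 23.3),
]
projected = project(qtls)
print([q.name for q in projected])
```

In this toy input, the record with a missing marker position is dropped and the remaining intervals are normalised so that later steps can compare them directly.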

Results and discussion
The phenotypic information in the RIL population across 8 years is shown in Table 1.
The difference in NNMS between the maternal and paternal parents was significant. The NNMS data of the RIL population displayed an approximately normal distribution, typical of a quantitatively inherited trait, and were well suited to QTL mapping (Figure 1). Previous studies have analysed the genetic variation underlying NNMS [5,6]. However, low-resolution genetic maps with low detection accuracy were routinely applied in most of these published studies. Therefore, a high-density soybean genetic map constructed by SLAF-seq, which included 5,308 markers on 20 linkage groups, spanned 2,655.68 cM and had an average distance of 0.5 cM between adjacent markers, was used in the present study [29]. Based on the phenotype of the RIL population cultivated in Harbin over 8 years, 29 QTLs were detected by both CIM and ICIM. Among these QTLs, two, one, three and eight were identified in 5, 4, 3 and 2 years, respectively. Thus, 14 of the 29 QTLs were detected in two or more years, indicating that the high-density soybean genetic map constructed by SLAF-seq plays an important role in detecting QTLs for NNMS in soybean.

Meta-analysis for collected NNMS QTLs
By using meta-analysis, 11 consensus soybean NNMS QTLs were obtained and located on LG D1b, LG C2, LG B1, LG F, LG L and LG I. The physical interval of the consensus QTLs ranged from 0.02 to 4.22 Mb, each consensus QTL integrated 2 to 4 original QTLs, and the mean R² values ranged from 6.90% to 27.40% (Table 3). On LG C2, the consensus QTL merged four QTLs from three populations with a CI of 39.65-42.74 Mb and an R² value of 23.25%. As Figure 2 shows, QTLs were integrated on Gm11 and one consensus QTL was obtained from four original QTLs. The minimum CI of the NNMS consensus QTLs in this study was 0.02 Mb, which is much shorter than those reported in previous studies. Moreover, bioinformatics analysis on the basis of the physical positions of these consensus QTLs makes it easier to find candidate genes and promotes progress from QTL to quantitative trait gene (QTG). The QTL meta-analysis approach applied in this study both validated the QTLs and refined their CIs, and also provided a new way of identifying consensus QTLs on a reference map by comparing different physical positions. Many similar studies using meta-analysis have identified candidate genes, such as plant height genes in soybean, protein and oil content genes in soybean, grain size genes in rice and abiotic stress tolerance genes in barley [18,23,24,33].
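To illustrate why a consensus QTL can have a shorter CI than any of the original QTLs it merges, the sketch below uses a simple inverse-variance weighted average of QTL positions. The text does not state which meta-analysis model or software was used, so this is an assumed, simplified stand-in: it treats each CI as a 95% interval around a normally distributed position estimate and assumes that all input QTLs reflect a single underlying locus. The function name and the example intervals are hypothetical.

```python
import math

Z95 = 1.96  # two-sided 95% normal quantile


def consensus(intervals):
    """Combine QTL confidence intervals (start_mb, end_mb) into one consensus QTL.

    Assumption: each interval is a 95% CI of a normally distributed position,
    and all intervals describe the same locus (fixed-effect model).
    Returns (position, ci_start, ci_end) in Mb.
    """
    positions, weights = [], []
    for start, end in intervals:
        if end <= start:
            raise ValueError("interval end must be greater than start")
        sd = (end - start) / (2 * Z95)     # standard deviation implied by the CI
        positions.append((start + end) / 2)
        weights.append(1.0 / sd ** 2)      # inverse-variance weight
    total = sum(weights)
    pos = sum(w * p for w, p in zip(weights, positions)) / total
    half_width = Z95 * math.sqrt(1.0 / total)
    return pos, pos - half_width, pos + half_width


# Hypothetical original QTL intervals on one chromosome (Mb).
example = [(39.0, 43.5), (39.6, 42.8), (40.1, 44.0), (38.5, 42.0)]
pos, lo, hi = consensus(example)
print(f"consensus position {pos:.2f} Mb, CI {lo:.2f}-{hi:.2f} Mb")
```

Because the weights add up, the combined variance is smaller than that of any single input, so the consensus CI narrows as more concordant QTLs are merged. Dedicated meta-analysis tools additionally choose how many underlying loci best explain a cluster of QTLs, which this sketch does not attempt.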

Gene mining from NNMS consensus QTL intervals
To date, only a few genes related to NNMS have been identified in plants, including ZmMADS3 in maize [25], MT in rice [26] and dt1 in soybean [27]. Identifying the genes underlying the QTLs therefore remains a challenging task [34]. This may be because the low detection power of mapping populations makes it hard to reveal real QTLs with small effects and because not all QTLs for NNMS were integrated.
In this study, candidate genes were screened from the 11 consensus QTLs according to their physical positions. In total, 488 candidate genes were identified and annotated using SoyBase (http://www.soybase.org) (Supplemental Table S2). Among these genes, six may be related to NNMS (Table 4).
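Screening candidate genes by physical position amounts to an interval-overlap query, which can be sketched as below. The gene identifiers, coordinates and record types are hypothetical placeholders, and the choice to count partially overlapping genes is an assumption; a real analysis would read gene models from the genome annotation rather than from a hard-coded list.

```python
from typing import List, NamedTuple


class Interval(NamedTuple):
    chromosome: str
    start_bp: int
    end_bp: int


class Gene(NamedTuple):
    gene_id: str
    chromosome: str
    start_bp: int
    end_bp: int


def genes_in_interval(genes: List[Gene], interval: Interval) -> List[str]:
    """Return IDs of genes that overlap the interval on the same chromosome.

    Assumption: a gene counts as a candidate if any part of it overlaps
    the consensus QTL interval (coordinates are inclusive).
    """
    return [
        g.gene_id
        for g in genes
        if g.chromosome == interval.chromosome
        and g.start_bp <= interval.end_bp
        and g.end_bp >= interval.start_bp
    ]


# Hypothetical gene models and consensus QTL interval (invented coordinates).
genes = [
    Gene("gene_a", "Gm13", 1_000, 2_000),
    Gene("gene_b", "Gm13", 4_900, 5_600),
    Gene("gene_c", "Gm13", 9_000, 9_500),
    Gene("gene_d", "Gm06", 5_100, 5_300),
]
interval = Interval("Gm13", 5_000, 8_000)
hits = genes_in_interval(genes, interval)
print(hits)
```

Running the same query once per consensus QTL and concatenating the results would give the full candidate list, which could then be filtered by functional annotation.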
Glyma.13G052700 was annotated as a K-box region and MADS-box transcription factor family protein. MADS-box genes encode important transcription factors with essential functions during organ differentiation in plants [35]. MADS-box gene expression in the stem has been reported in previous studies [36,37], and expression of the barley MADS-box gene BM1 was recently reported in stem nodes [38]. Notably, ectopic expression of the MADS-box gene ZmMADS3 reduced plant height through a reduced number of nodes in some transgenic maize plants [25]. Glyma.06G243500 was annotated as an auxin-responsive GH3 family protein. GH3 proteins, which can produce indole-3-acetic acid (IAA) conjugates, are encoded by early auxin-responsive genes [39]; stem elongation has previously been shown to be induced by exogenous IAA in intact light-grown pea seedlings [40]. Glyma.13G221400 was annotated as auxin response factor 8, which plays an important role in cell division and cell elongation [41]. Glyma.11G087300 and Glyma.20G014300 were annotated as auxin efflux carrier family proteins; pin1b (polar-localised auxin efflux protein) mutants show increased stem elongation in Brachypodium [42]. Glyma.13G052900 was annotated as brassinosteroid-6-oxidase 2; brassinosteroids are plant steroid hormones that promote stem elongation in a variety of plants such as soybean and rice [43,44]. These observations suggest an indirect relationship between these genes and NNMS.

Conclusions
In the present study, 29 NNMS-related QTLs were first mapped across 8 years within a recombinant inbred line population based on a high-density soybean genetic map constructed by SLAF-seq. Using meta-analysis, 11 consensus QTLs for soybean NNMS were obtained from 54 original QTLs detected in previous studies and in this research, and were found to be located on LG D1b, LG C2, LG B1, LG F, LG L and LG I. Six candidate genes that may be related to NNMS were screened from the 11 consensus QTLs. Of these six genes, Glyma.13G052700 could be a key factor related to NNMS in soybean. Overall, this study refined the positions of NNMS QTLs and identified candidate genes underlying the consensus QTLs. Further studies on the molecular cloning and characterization of the candidate genes identified in the present study will be vital.
Disclosure statement