Identification of major QTLs associated with agronomical traits and candidate gene mining in soybean

Abstract Soybean is valued in foods, animal feed and some industrial applications, and agronomic traits are important targets for soybean breeding. In this study, two sets of recombinant inbred line (RIL) populations were used to map yield and quality trait quantitative trait loci (QTLs) across three environments. In total, we detected 37 QTLs in two RIL populations grown in three locations. In the 3613 RIL population, three QTLs related to quality traits and 13 related to yield traits were identified among the 16 QTLs. In the 6013 population, we identified six QTLs related to quality traits and 15 related to yield traits. Among these QTLs, some genomic regions showed pleiotropic effects on yield and quality traits. We further identified candidate genes from these regions. Our results lay the foundation for producing high-yielding soybean with excellent quality traits via molecular breeding.


Introduction
Soybean [Glycine max (L.) Merr.] is one of the most important legume crops worldwide, with extensive acreage in the USA, Brazil, China and other countries [1]. Soybean has several beneficial properties, including a 20% oil content, 40% protein content and high fatty acid levels [2]. Therefore, soybean represents an important source of raw materials for the production of plant protein and vegetable oil products, as well as industrial and pharmaceutical products [3]. Although soybean breeders focus on increasing yield traits and improving quality traits, these quantitative traits are controlled by multiple genes, and the underlying genetic mechanisms are relatively complex [4]. Traditional breeding methods based on phenotypic screening have several disadvantages, such as the large amount of labour and materials required, low efficiency and long operating cycles. Therefore, the use of molecular marker technology to construct soybean genetic linkage maps has become a growing area of study. This method, which involves detecting the degree of linkage between molecular markers and related quantitative trait loci (QTLs) underlying useful traits, can greatly facilitate soybean breeding efforts [5].
Molecular genetic maps of soybean have been continuously improving, as they are required for QTL mapping of soybean quality-and yield-related traits. In the late 1980s, Apuya et al. [4] constructed the first molecular genetic linkage map of soybean, containing 11 restriction fragment length polymorphism (RFLP) markers. In 1990, Keim et al. [5] constructed a second soybean linkage map, which covered 1200 cM of the soybean genome with 130 RFLP markers. A map was constructed with 490 markers covering 3000 cM of the soybean genome. However, the lack of integration of conventional markers limited the use of these of molecular marker maps for QTL localisation. In 1995, Shoemaker and Specht [6] used an F 2 segregating population obtained via hybridisation of the near-isogenic soybean lines Clark and Harsoy and constructed the first soybean 'consensus linkage map'. Cregan et al. [7] integrated the genetic linkage maps of three populations and obtained a common map consisting of 26 classical and 10 isozyme loci along with 79 random amplified polymorphic DNA (RAPD), 689 RFLP and 606 simple sequence repeat (SSR) loci. Song et al. [8] generated a genetic linkage map by integrating five soybean genetic maps, including those for two F2 populations ('A81-56022' Â 'PI468916' and 'Clark' Â 'Harosoy') and three recombinant inbred line (RIL) populations ('Minsoy' Â 'Noir1', 'Minsoy' Â 'Archer', a n d 'Noir1' Â 'Archer'). In the past 10 years, as genome sequencing efforts have advanced, many high-density genetic maps have been constructed for soybean [9][10][11].
To date, many QTLs underlying agronomic traits in soybean have been mapped and deposited in the SoyBase database (www.soybase.org), including 322 seed oil QTLs, 240 seed protein QTLs, 318 seed weight QTLs, 255 plant height QTLs, 24 seed weight per plant QTLs and 48 pod number QTLs. However, to date, only a few QTLs affecting multiple agronomic traits have been mapped in crops. Gahlaut et al. [12] performed QTL mapping of wheat double haploid populations planted in four different locations in India for 3 years and identified essential genomic regions carrying multiple QTLs for different traits on the 5A and 7A linkage groups. The locus associated with lodging in maize identified by Larsson et al. [13] coincides with the positions of previously identified QTLs for stem strength. Using two F 2:3 populations of maize under drought stress, Zhao et al. [14] identified QTLs controlling multiple traits at several locations. Wu et al. [15] integrated QTLs from soybean at different growth stages and identified 10 QTLs controlling multiple growth stages in five linkage groups (C2, D1a, D1b, F and J). Hacisalihoglu et al. [16] performed QTL mapping of a soybean RIL population grown in environments with different levels of P availability, and found that QTLs controlling seed quality and volume on Gm06 shared overlapping regions. The use of these types of loci could help improve multiple agronomic traits in soybean, thereby increasing the yield and quality of this important crop.
In this study, we genotyped two sets of RIL populations and investigated their yield and quality traits in three environments. The aims of this study were to (1) construct a genetic map of two RILs, (2) map the main-effect QTLs of agronomic traits and (3) identify the key loci that affect multiple agronomic traits in soybean. The results lay the foundation for producing high-yielding soybean with excellent quality traits via molecular breeding.

Construction of RIL populations and experimental design
Three soybean species G. max (from Northeast Agriculture University Soybean Institute) with clearly different morphological traits were employed as the parents for mapping. Two RIL populations were constructed using Dongnong L13 as the female parent and Heihe 36 and Heinong 60 as the male parents. An RIL mapping population was obtained by multipleseed descent using single-seed descent until the F 5 generation, and two F 6 RIL populations were constructed: RIL3613 and RIL6013 [17].

Field experiment
On 10 May 2016, soybean seeds were planted in the experimental fields of Northeast Agricultural University in Harbin (E128 N45 ), S'cheng (E125 N45 ) and Acheng (E126 N45 ), China. Experimental treatments were applied in a randomised block design, and the planting density was 220,000 plants/hm 2 . The plants were arranged in a randomised experimental threerow plot (5 m long and 0.67 m wide) at a spacing of 13.89 cm via ridge cultivation.

Phenotypic measurements
At the maturity stage, five individual plants were randomly selected from the middle row to measure quality and yield traits. A FOSS Infratec TM 1241 Grain Analyser was used to analyse seed protein and oil content, and post-harvest surveys were performed to investigate yield-related traits

Genetic map construction and QTL mapping
Genetic maps have been constructed previously [17]. The inclusive composite interval mapping (ICIM) method was employed to analyse 1-year multilocation RIL data using IciMapping software (http://www. isbreeding.net/software/). The model was used with a mapping step of 2.0 cM, a LOD significance threshold determined by 1000 permutation tests with a minimum value of 2.5, and significance set at the p ¼ .05 level. The positive additive effect values were derived from the female parent, 'Dongnong L13', and the negative additive effect values were derived from the male parents, 'Heihe 36 and Heinong 60'.

Gene mining involving consensus QTL intervals
Several candidate genes related to protein and oil accumulation and plant growth and development were selected based on genomic information from the Williams 82 Assembly two Genomic Sequence (Wm82.a2) and physical locations of the QTL intervals.

Linkage map and traits of two RIL populations grown in three environments
Several studies have highlighted the importance of plant height and plant weight to soybean yield. Lee et al. [18] used an RIL population to construct a highdensity single-nucleotide polymorphism (SNP) map, and mapped QTLs associated with seven plant height traits in soybean via composite interval mapping in three environments. Liang et al. [19] located two QTLs for plant weight in an RIL population derived from a cross between soybean lines BD2 and BX10. However, the RIL populations utilised in these studies were derived from two parents, and the linkage analysis involved only two alleles at the same locus. Due to the use of populations with limited genetic backgrounds, the low molecular marker polymorphism of such single populations leads to the identification of fewer markers for genetic map integration and low saturation, which limits the utility of these maps [20].
Here, we used three soybean species differing in morphological traits to generate two RIL populations, RIL3613 and RIL6013 [17], so that we could identify QTLs affecting yield-related traits. The RIL populations used in this study were designed by four-way hybridisation [20][21][22]. The use of populations produced from four parental double crosses increased not only the number of polymorphic markers, but also the density of the genetic map. The total length of the RIL 3613 genetic map was 2848.56 cM, including 150 SSRs. The average distance between markers was 18.99 cM, the length of a single linkage group ranged from 1.15 to 283.42 cM, and the number of markers ranged from 2 to 13. The RIL 6013 genetic map was 1886.8 cM, with an average length between markers of 13.77 cM, including 137 pairs of SSR primers. The length of a single linkage group ranged from 19.68 to 163.67 cM, and the number of markers ranged from 4 to 11 [17].  In recent years, most studies focused on QTL mapping of soybean quality and yield traits have been based on single-environment analyses, which makes the mapping results difficult to confirm. We reasoned that results from plants grown in different environments could be compared to increase the accuracy of the analysis. Phenotypic data for the two RIL populations grown in three environments are therefore provided in Table 1, Figures 1 and 2. Thirty-seven QTLs were detected in the two RIL populations from three environments. Table 1 lists information about these populations, including the population name, trait ID, trait name, chromosome, position, left marker, right marker, LOD, phenotypic variability explained (PVE), additive effect, left confidence interval (CI), right CI and environment.
The high or low LOD values obtained in previous studies are another important factor limiting the reproducibility of those results; when LOD values are too high, QTLs might not be detected and data will be lost. By contrast, using an excessively low LOD value may result in a false positive for the detected QTL. In this study, we used multiyear and multipoint data for joint analysis, which not only increased the detection intensity of QTLs, but also greatly improved the accuracy of the detected locations and effects of the QTLs.

QTL mapping analysis of the RIL populations
The chromosome number was 17 in both populations. Chromosomes Gm04, Gm06 and Gm07 had 37 left markers but only 21 right markers, indicating that the QTLs in these populations contained overlapping regions, a phenomenon that deserves further investigation. The minimum LOD value was only 2.508, which was detected in the 3613 RIL population and is related to grain weight per plant. The maximum LOD value was 20.8606, which was detected in the 6013 RIL population and is associated with seed set one per pod. Eight QTLs (all from RIL population 6013) had PVE values of 0. The highest PVE value was also obtained from this population, which reached 13.0708%. The positive and negative additive effect values were derived from the female parent and male parent, respectively. The highest absolute additive effect value was 98.3181. The highest absolute additive effect values were all detected from RIL population 3613, and were all associated with grain number of a single plant; these QTLs had additive effects: À98.3181 À 43.1966  We detected 37 QTLs in the two RIL populations from three locations (Table 1). Sixteen QTLs were detected in RIL population 3613, and 21 were detected in RIL population 6013. In all three locations, the number of QTLs ranged from 22 to 32, and the number of traits ranged from 11 to 12, with 12 traits identified only in A'cheng City. These QTLs were related to various soybean traits, including protein, oil and other yield traits, and were distributed on all 17 chromosomes except Gm04, Gm06 and Gm07. Among these QTLs, five were associated with protein content, four with oil content, and the remaining QTLs with 10 yield traits. These yield-related QTLs were mainly located on chromosomes Gm01 and Gm16. The QTLs for both quality and yield traits were mainly located on Gm16, a key soybean chromosome. These QTLs were related to 12 types of traits. The largest number of QTLs (24) was associated with seed set per pod and the smallest number (4) was associated with 100-seed weight and oil content.
Warrington et al. [23] previously located four QTLs related to protein content based on an RIL population. Wang et al. [24] combined the positioning results from two RIL populations and conducted joint analysis, identifying 12 seed yield QTLs, 16 oil-related QTLs and 11 QTLs controlling protein content. Zhang et al. [25] employed an RIL population derived from a cross between Bogao and Nannong94-156 to locate 10 QTLs that regulate the number of soybean pods. Here, we identified nine QTLs for protein and oil content, found on seven chromosomes: Gm11, Gm12, Gm15, Gm16, Gm18, Gm19 and Gm20. The QTLs related to protein content that were mined from the 3613 population all had negative additive effects, but the protein content-related QTLs mined from the other RIL population all had positive additive effects. The protein content QTLs had additive effect values ranging from À0.7244 to 2.6519.
We identified four QTLs related to oil only in the 6013 population, with additive effects of À0.0592, À0.0156, 0.0351 and À0.0648. One of these QTLs (Gm15; 39.16-56.7 cM) associated with the soybean oil index is located in the same segment as a previously identified oil QTL [24,[37][38][39][40], indicating that there is a high probability that this region contains one or several important genes that control oil traits.
The QTL related to 100-seed weight mined from the 3613 population had an additive effect value of À0.5026 and the highest LOD value (6.7912), indicating that it is located in a concentrated area of QTLs associated with 100-seed weight. Number of grains per pod was the most common yield trait among the QTLs identified from the two RIL populations, especially seed set four per pod, involving 24 QTLs. Among the QTLs related to seed set four per pod, two QTLs were detected, with additive effects of À4.2785 and À4.2216 and relatively large LOD values of 16.8139 and 15.0703, respectively, indicating that a concentrated area of QTLs is associated with seed set four per pod. A number of QTLs associated with pod number at the same position on chromosome Gm11 (satt197-sat_123) were also found to be related to pod numbers in previous studies [40]. The QTLs related to plant height had the highest additive values and LOD values.

Identification of QTLs with pleiotropic effects
A total of 32 QTLs were found to control multiple traits (i.e. QTLs with pleiotropic effects), whereas only five were associated with a single trait, including seed set one per pod, seed set four per pod and plant height. In the 3613 population, the QTLs for 11 traits are distributed on 11 chromosomes, with most QTLs associated with seed set four per pod. In the 6013 population, the QTLs for 12 traits are distributed on 13 chromosomes, with most QTLs also related to seed set four per pod. QTLs associated with more than four traits were designated as multi-effect QTLs in this study.
In the 6013 RIL population, a QTL associated with markers sat_394 and Sat_366 is related to nine traits, including one quality trait and eight yield traits, making it the QTL associated with the most traits. This QTL has negative additive effects, with a maximum PVE value of 0. The multi-effect QTL related to the second highest number of traits was that associated with markers satt237 and Sat_166, which is related to seven traits, including pod number per plant, grain weight per plant and grain number of a single plant, with a PVE of 0. The QTL associated with markers satt197 and sat_123 is related to six yield traits, with a maximum LOD value of 20.8606. The QTL associated with markers Sat_413 and Sat_160 is related to six traits, including all seed pod numbers, plant height and 100-seed weight. The QTL related to seed pod number had negative additive effects and large LOD values of 13.1038, 14.2334 and 16.8139. The QTL associated with markers Sat_341 and Sat_221 was related to five yield traits and has a large LOD value of 18.7469, and the QTL related to the pod number per plant and seed set two per pod had a PVE of 0. The QTL associated with markers Sat_366 and sat_228 was also related to quality and yield traits, with positive additive effects for the quality trait and negative additive effects for the other yield traits. The QTLs associated with marker Sat_366 are all related to quality traits.
In the 3613 RIL population, we identified two major QTLs that were only related to yield traits. One QTL, with markers Sat_200 and Satt353, was related to five yield traits, and a QTL associated with grain number of a single plant was detected with a maximum additive effect of À98.3181. Another QTL associated with markers Satt424 and Sat_294 was related to five yield traits, with a large LOD value of 13.0507.
In summary, we detected 37 QTLs in two RIL populations from three locations. In the 3613 RIL population, three QTLs related to quality traits and 13 QTLs related to yield traits were identified among the 16 QTLs. In the 6013 population, we identified six QTLs related to quality traits and 15 related to yield traits. Seeds per pod was the main yield trait associated with QTLs in both populations, especially seed set four per pod. We identified one essential chromosome, Gm16, harbouring one major QTL, containing sat_394 as the left marker and Sat_366 as the right marker, related to quality and yield traits.

Candidate gene mining for multiple-effect QTLs
We used three types of QTLs for candidate gene annotation, including QTLs related to quality traits and yield traits, QTLs related to multiple quality traits and QTLs related to multiple yield traits. We identified two QTL intervals related to quality traits and yield traits and six QTL intervals associated only with yield traits; no QTL intervals were related only to quality traits. Based on information about the markers obtained using the SoyBase online program, in the eight consensus QTL intervals, 1215 candidate genes were screened and annotated using the GO and GO Annotations and KEGG databases. Among these, 344, 396, 218, 52 and 205 genes were located in consensus QTL regions on GM01, GM8, GM11, GM12 and GM16, respectively ( Table 2). The following enriched GO categories were identified: GO:0006807 (nitrogen compound metabolic process), GO:0034641 (cellular nitrogen compound metabolic process), GO:0006886 (intracellular protein transport), GO:0008565 (protein transporter activity) and GO:0015031 (protein transport). Among the enriched KEGG pathways, Ko04075 Based on pathway and functional annotation, we ultimately identified 36 candidate genes (Table 3). Glyma.01G052800 was annotated as a vesicle transport v-SNARE family protein gene, which functions in intracellular vesicle transport processes in protein storage vacuoles in plants [42,43]. Glyma.11G118600, Glyma.08G222400 and Glyma.08G207300 were annotated as Coatomer with beta subunit and gamma-2 subunit genes, which function in intracellular biosynthetic protein transport mediated by non-clathrincoated vesicles [44] and might be related to protein accumulation. Among the candidate genes, 13 were annotated as SAUR-like auxin-responsive protein family genes; SAUR proteins are involved in cell division and auxin biosynthesis [45]. Glyma.08G188300, Glyma.01G004500 and Glyma.01G183500 were annotated as sucrose non-fermenting-related protein kinase (SnRK) genes, which are closely related to sugar signalling, biotic and abiotic stress responses, seed germination and seedling growth [46,47].
Glyma.11G180900 was annotated as encoding nitrilase 4, which converts indole-3-acetonitrile to auxin, as revealed in Arabidopsis thaliana [49]. Glyma.08G203100 was annotated as encoding a phytochrome-associated protein related to flowering [50]. Glyma.01G103500 was annotated as encoding auxin response factor 9; ARF9 is expressed during early fruit development and is auxin responsive [51]. Six candidate genes were annotated as encoding Adaptin family proteins or Adaptin family proteins with gamma subunits, which are involved in the formation of intracellular transport vesicles and the selection of cargo for incorporation into the vesicles [52]. Glyma.01G179300 was annotated as Clathrin heavy chain, which functions in transport from the Golgi to vacuoles [53]. Glyma.11G215500 was annotated as glutamine synthase, a highly regulated enzyme at the core of nitrogen metabolism [54]. Glyma.16G082400 was annotated as the Sec23/Sec24 protein transport family protein; the Sec23 subunit is essential for linking COPII vesicle formation to anterograde transport events [55]. Finally, Glyma.12G018500 was annotated as exportin 1A, which is essential for various developmental pathways in plants [56].

Conclusions
We identified 37 QTLs for yield traits using two RIL populations grown in three environments. Some of these QTLs corresponded to those previously reported in other studies. Notably, most of the QTLs identified in this study are clustered in seven locations in the genome (Gm01, Gm03, Gm08, Gm10, Gm11, Gm12 and Gm16). These seven regions contain multiple QTLs controlling different traits. Additional studies on the multi-effect QTLs at these seven positions should be conducted in the future.