The addition of the submergence-tolerance gene Sub1 into the high-yielding rice variety MR219 and analysis of its BC2F3 population in terms of yield and yield-contributing characters to select advanced lines as a variety

ABSTRACT A cross was made between MR219 (high yielding but submergence intolerant) and Swarna-Sub1 (submergence tolerant) to produce a submergence-tolerant rice variety using the marker-assisted backcrossing (MABC) method, with the aim of protecting farmers of low-lying land from flash floods during rains. Knowledge of yield and yield-contributing factors plays a vital role in the selection of a variety. This experiment was designed to determine the genetic diversity among recently produced lines of the BC2F3 population and to compare all the lines with MR219 to find the best one. Agronomical, yield and yield-contributing data were recorded, and genotypic and phenotypic coefficients of variation, variance components and heritability were estimated. Introgression of the target gene, Sub1, was carried out using a tightly linked marker, and background recovery was measured using simple sequence repeat (SSR) markers in different generations. The observed recurrent parent genome (RPG) recovery in the BC2F2 generation was 95.37%, which indicates a high level of similarity between the recurrent parent (MR219) and the resulting lines. Thirty newly developed lines of the BC2F3 population, resulting from the backcross of MR219 and Swarna-Sub1, were planted with four replications in a randomised complete block design (RCBD). The newly developed lines were grouped into four clusters based on traits, using a UPGMA dendrogram and cluster analysis, to select the 10 best plants. This study will help future researchers to select the best plants of a breeding programme after introgression of a gene, based on phenotypic performance, to develop new varieties.


Introduction
Flash floods can result in rice yield losses of up to 100%, depending on different factors of submergence-prone ecosystems. An economic loss of up to US$1 billion has been estimated in South and South-East Asia due to yield loss caused by flash floods. [1–3] In a survey of rice breeders' concerns in South and South-East Asia [4], 51% of respondents rated flash flooding as one of the three most important abiotic constraints on rice yields (the others being drought and salinity). Traditional varieties adapted to these submergence-prone environments are, however, low yielding due to their poor tillering ability, long droopy leaves, susceptibility to lodging and poor grain quality. Improved varieties that combine high yield attributes with submergence tolerance are needed. Most rice cultivars cannot survive if the plants are completely submerged for more than 7 days. [5] The existing modern varieties are not well adapted to submergence.
Submergence tolerance is controlled by a single gene, Sub1, which accounts for almost 70% of the phenotypic variation. It was mapped [6–8] in the cultivar FR13A and can be divided into three major sub-genes, Sub1A, Sub1B and Sub1C. When Sub1A was overexpressed and Sub1C down-regulated in a japonica cultivar to monitor the effect, the cultivar was found to be tolerant. [9] Marker-assisted backcrossing (MABC) is the most appropriate method for incorporating a desired quantitative trait locus (QTL) into the genetic background of a mega variety. [10,11] The MABC approach has a clear advantage over conventional breeding, because it allows the ideal genotype to be developed in a short period of time, which cannot be done through conventional breeding. [12] It is necessary to determine the morphological characters of a newly developed genotype to get a preliminary assessment of the whole population, which could determine its future use. [13] Variation among genotypes must be determined, and grain physical or morphological properties should be considered a vital factor for measuring it. [14] To determine the genetic diversity and variation among genotypes, researchers have generally used morphological and physiological characteristics. [13] The overall success of a breeding programme can be determined by the yield and yield-contributing characters of the newly developed genotypes. Thirty newly developed homozygous submergence-tolerant lines were studied to determine their yield and yield-contributing characters. The present study was designed to determine the genetic variation among the newly developed lines using quantitative traits and to estimate the broad-sense heritability and genetic advance of the studied lines.

Plant materials
The Malaysian rice variety MR219, which is intolerant to submergence, was crossed with a submergence-tolerant variety, Swarna-Sub1, to produce a new variety that is both tolerant and high yielding. Swarna-Sub1 was used as the donor of Sub1, the gene responsible for the tolerance. Thirty newly developed homozygous lines of UPM3-BC2F2-34 were tested for yield and yield-contributing characters. The whole experiment was conducted under normal (non-submerged) field conditions. The submergence tolerance test was conducted on the BC2F2 population.

DNA extraction
Fresh leaves from 2-week-old seedlings were collected and DNA was extracted using the modified cetyltrimethylammonium bromide (CTAB) method. [15] DNA was quantified by electrophoresis on a 1% agarose gel at 80 V for 30 min, and visualization was done on a Molecular Imager® Gel Doc™ XRC System (BIO RAD, CA, USA). DNA samples were diluted with Tris-EDTA (TE) buffer to a final concentration of 50 ng and kept at −20 °C for further use.

Molecular marker analysis
Tightly linked markers specific to the Sub1 gene were tested to find polymorphism between the parents. For background selection, a total of 385 simple sequence repeat (SSR) markers, excluding the foreground markers and widely spread across the whole genome, were screened between the two parents.

Foreground selection
Primer RM8300 [10] was used as the tightly linked marker for foreground selection, because of its clear codominant nature and its capability of producing easily scorable bands. Distance from Sub1, codominance and the capability of producing scorable bands were the main criteria for choosing markers.

Background selection
Molecular markers that were not linked to Sub1, were polymorphic between the two parents and covered all chromosomes, including chromosome 9 (the carrier of Sub1), were used for background selection to determine the recovery of the recipient genome. Evenly spaced SSR markers were selected for every chromosome. Microsatellite markers that revealed fixed (homozygous) alleles at non-target loci in one generation were not screened in the next backcross generation. In the BC1F1 and BC2F1 generations, 66 polymorphic markers were used for the background survey. An additional 19 SSR markers were tested in the BC2F2 generation to estimate the amount of recipient genome (Supplementary material).
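As a rough illustration of how recipient genome recovery can be estimated from background marker scores, the following Python sketch applies a commonly used estimate, RPG% = (A + 0.5H)/N × 100, where A is the number of markers homozygous for the recurrent parent allele, H the number of heterozygous markers and N the total number of markers scored. The text does not state which formula was used in this study, so the formula, the function name `rpg_recovery` and the example scores are all assumptions made for illustration.

```python
def rpg_recovery(scores):
    """Return the percentage of recurrent parent genome from marker scores.

    Each score is 'A' (homozygous for the recurrent parent allele),
    'H' (heterozygous) or 'B' (homozygous for the donor allele).
    Assumed formula (not stated in the paper): (A + 0.5 * H) / N * 100.
    """
    if not scores:
        raise ValueError("at least one marker score is required")
    counts = {"A": 0, "H": 0, "B": 0}
    for score in scores:
        if score not in counts:
            raise ValueError(f"unknown marker score: {score!r}")
        counts[score] += 1
    return 100.0 * (counts["A"] + 0.5 * counts["H"]) / len(scores)


# Hypothetical scores for 66 background markers in one backcross plant.
example_scores = ["A"] * 60 + ["H"] * 4 + ["B"] * 2
example_rpg = rpg_recovery(example_scores)
print(f"RPG recovery: {example_rpg:.2f}%")
```

Counting a heterozygous marker as half recovered reflects the fact that one of its two alleles already comes from the recurrent parent.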

Experimental design and management practices
The experiment was laid out in a randomised complete block design (RCBD) with four replications. MR219 was used as the control variety and one seedling per hill was planted. Complete N–P–K–S fertilizer was applied. Urea at 100 kg/ha was applied at both 20 and 40 days after transplanting (DAT). Molluscicide was used just after transplanting as an essential protective measure for the newly transplanted seedlings against attack by golden snails. Special care was taken for protection from rodents and insects. Other cultural management practices were carried out as required.

Data collection
Phenotypic data were recorded for 11 agronomical attributes related to grain quality and yield: plant height, number of tillers per hill, panicles per hill, panicle length, days to maturity, percentage of filled grain, grain length, grain width, days to 50% flowering, yield per hill and 1000 seed weight. The methods of measurement used in this experiment are shown in Table 1.

Statistical analysis
The analysis of variance was conducted with the help of SAS 9.2. The mean differences were judged with Tukey's test at a 5% level of significance.
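The analysis in this study was run in SAS 9.2; purely as an illustration of where the genotype and error mean squares of an RCBD come from, the following Python sketch computes them by hand. The function name `rcbd_mean_squares` and the toy data are hypothetical and are not taken from the experiment.

```python
def rcbd_mean_squares(data):
    """Return (MSG, MSE) for an RCBD; data[i][j] = genotype i in block j."""
    g = len(data)          # number of genotypes (lines)
    r = len(data[0])       # number of replications (blocks)
    grand = sum(sum(row) for row in data) / (g * r)
    geno_means = [sum(row) / r for row in data]
    block_means = [sum(data[i][j] for i in range(g)) / g for j in range(r)]

    ss_total = sum((x - grand) ** 2 for row in data for x in row)
    ss_geno = r * sum((m - grand) ** 2 for m in geno_means)
    ss_block = g * sum((m - grand) ** 2 for m in block_means)
    ss_error = ss_total - ss_geno - ss_block

    msg = ss_geno / (g - 1)                # mean square of genotypes
    mse = ss_error / ((g - 1) * (r - 1))   # mean square of the error
    return msg, mse


# Hypothetical toy data: 2 genotypes x 2 blocks.
toy_data = [[10.0, 12.0], [14.0, 18.0]]
msg, mse = rcbd_mean_squares(toy_data)
print(f"MSG = {msg:.2f}, MSE = {mse:.2f}")
```

The ratio MSG/MSE is the F statistic for differences among lines, and the two mean squares are also the inputs to the variance components described next.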
Genetic parameters were estimated to determine genetic variation among genotypes and to assess genetic and environmental effects on various traits. The parameters were calculated using the following equations. [16–18]
Genotypic variance: $\sigma^2_g = (MSG - MSE)/r$, where MSG is the mean square of genotypes, MSE is the mean square of the error, and r is the number of replications.
Phenotypic variance: $\sigma^2_p = \sigma^2_g + \sigma^2_e$, where $\sigma^2_g$ is the genotypic variance and $\sigma^2_e$ is the mean square of the error.
Error variance: $\sigma^2_e = MSE$, where MSE is the mean square of the error.
Phenotypic and genotypic coefficient of variation: The estimates of the phenotypic (PCV) and genotypic (GCV) coefficients of variation were obtained as explained by Choudhary and Singh [19] as follows: $PCV = \frac{\sqrt{\sigma^2_p}}{\bar{X}} \times 100$ and $GCV = \frac{\sqrt{\sigma^2_g}}{\bar{X}} \times 100$, where $\sigma^2_p$ is the phenotypic variance, $\sigma^2_g$ is the genotypic variance and $\bar{X}$ is the mean of the trait.
Heritability estimate: $h^2_B = \sigma^2_g / \sigma^2_p$, where $\sigma^2_g$ is the genotypic variance and $\sigma^2_p$ is the phenotypic variance.
Expected genetic advance (GA): $GA\,(\%) = K \times \frac{\sqrt{\sigma^2_p}}{\bar{X}} \times h^2_B \times 100$, where K is a constant, $\sqrt{\sigma^2_p}/\bar{X}$ is the phenotypic standard deviation relative to the mean, $h^2_B$ is the heritability and $\bar{X}$ is the mean of the trait.
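The parameters defined above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' SAS workflow: the function name `genetic_parameters`, the input values and the default K = 2.06 (the value commonly used for 5% selection intensity) are assumptions, since the text only describes K as a constant.

```python
import math


def genetic_parameters(msg, mse, r, mean, k=2.06):
    """Estimate variance components, PCV, GCV, heritability and GA (%).

    msg, mse: mean squares of genotypes and error from the ANOVA
    r: number of replications; mean: trait mean
    k: selection constant (2.06 is an assumed default, not from the paper)
    """
    # Genotypic variance; a negative estimate is set to zero (assumption).
    var_g = max((msg - mse) / r, 0.0)
    var_e = mse                      # error variance
    var_p = var_g + var_e            # phenotypic variance
    h2 = var_g / var_p               # broad-sense heritability
    return {
        "var_g": var_g,
        "var_p": var_p,
        "PCV": math.sqrt(var_p) / mean * 100,
        "GCV": math.sqrt(var_g) / mean * 100,
        "h2": h2,
        "GA_pct": k * math.sqrt(var_p) / mean * h2 * 100,
    }


# Hypothetical inputs: MSG = 25, MSE = 1, four replications, trait mean 20.
params = genetic_parameters(msg=25.0, mse=1.0, r=4, mean=20.0)
for name, value in params.items():
    print(f"{name}: {value:.3f}")
```

Because the phenotypic variance is the genotypic variance plus the error variance, PCV can never be smaller than GCV; the gap between them reflects the environmental contribution.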

Cluster and principal component analysis
In this study, the analysis was done using NTSYS-pc software (version 2.1) based on Jaccard's similarity coefficient. The UPGMA algorithm and SAHN clustering were applied to determine genetic relationships among the genotypes. Principal component analysis (PCA) of the 30 lines and MR219 was performed using the EIGEN and PROJ modules of NTSYS-pc and Minitab software (version 15).

Results and discussion
Foreground selection

Background selection
Initially, in the BC1F1 generation, background selection was carried out with 66 primers. The percentage of recovery of background markers that were heterozygous for the parents ranged from 65.55% to 77.8% in the selected plants. A double recombinant was found in the BC2F1 generation from the population of P12-2. Double recombinants were confirmed from the best plants of the BC1F1 population and they were also tested with the remaining background primers. Plants 2–49 had recipient allele recovery for all the chromosomes except chromosomes 3 and 4, without considering foreground markers. The maximum percentage of recurrent parent recovery was 95.4%, on plant numbers 2–49. A graphical map constructed using all polymorphic markers is shown in Figure 2.

Recorded data analysis
The ANOVA table for morphological traits (Table 2) showed significant differences among lines for all of the traits under study, except grain length, grain width and 1000 seed weight. Means are presented in Table 3; because the interaction effects were non-significant, only the main effect is explained.

Plant height
Plant height differed significantly among the lines, although most lines did not differ significantly from MR219. L1 was the tallest (108.28 cm), and L23 and L3 (107.62 cm) were the closest to L1. L22 was the shortest, with a height of 89.1 cm (Table 3).

Days to maturity
A significant difference was found among the maturity periods of the lines, although they did not differ significantly from MR219. L25 showed the longest period to maturity (124.4 days), followed by L12 (119.6 days) and L13 (119.4 days), and the shortest period was shown by L28 (112 days). The rest of the lines showed values different from those of MR219 (Table 3).

Number of tillers per hill
The number of tillers per hill differed significantly between the lines, and L28 differed significantly from MR219. L7 (26.4) showed the highest value, followed by L24 (25.8). The minimum number of tillers was observed in L28 (15). The remaining lines had different values, which were statistically similar to that of MR219 (Table 3).

Panicles per hill
L28 (20.8) showed a significantly different value from MR219. L7 (35.2) showed the highest value, followed by L4 (33.8) and L14 (33). The lowest value was observed in L28. The remaining lines had different values, but most of them were not significantly different from the recipient parent MR219 (Table 3).

Percentage of filled grains
The percentage of filled grains is an important criterion for the selection of a variety. Although there were significant differences among the lines, most of them were statistically similar to MR219. L29 (78.6%) had the highest value, followed by L20 (76.2%) and L2 (74.8%). The lowest value was observed in L30 (58.2%) (Table 3).

Grain length
No statistically significant differences in grain length were observed between the lines and MR219, which indicates that this character is controlled by genetic factors and that the genetic properties of the newly developed lines were similar to those of MR219 (Table 3).

Grain width
No statistically significant differences in grain width were observed between the lines and MR219. This character is controlled genetically, and it can also be concluded that the genetic properties of the newly developed lines were similar to those of MR219.

1000 seed weight (g)
There was no significant difference observed between the lines.
In this study, significant variation in traits among the lines, and also in comparison with MR219, was observed. Ullah et al. [20] noticed significant differences among the traits while studying 10 Boro rice varieties of Bangladesh, considering 10 morphological traits. Highly significant variation in traits among 40 rice accessions was also observed in another study. [21] In similar studies, researchers observed highly significant variation between the morphological traits of rice. [22,23] Abarshahr et al. [24] reported highly significant variation in 30 genotypes of rice, and the same kind of result was observed by Chandra et al. [25] while studying 19 and 14 quantitative traits of 57 accessions of upland rice.
Tiller number is an important feature for the selection of lines, as a decrease in tiller number will eventually result in a decrease in total yield. The highest tiller number, recorded in L7 (26.40), and the lowest, in L28 (15), support this finding. For yield per hill, the highest value was recorded in L7 (72.78) and the lowest in L28 (59.02). The highest and lowest values resulted from differences in tiller number per hill and in the percentage of filled grain. Similar research by previous researchers reached similar conclusions. [26,27]

Phenotypic coefficient of variation (PCV), genetic coefficient of variation (GCV) and the estimation of genotypic heritability
Genotypic variation, phenotypic variation and heritability were calculated and are shown in Table 4. Different types of variation were observed among the quantitative traits. For PCV, the highest value was shown by number of tillers/hill (22.20%), followed by panicles/hill (17.56%) and panicle length (12.84%). The lowest value was shown by days to maturity (2.87%). GCV followed the same order: number of tillers/hill (19.45%) had the highest value, followed by panicles/hill (14.84%) and panicle length (11.05%). The lowest value was shown by days to maturity (2.45%).
The extent of variability for all traits under study needs to be measured for the development of a crop through a breeding programme, as all the traits showed highly significant variation between the lines. Whether a trait is transferred from parent to offspring depends on its heritability, which plays a vital role in the selection process. [28] Plant breeders design breeding programmes based on observations of GCV, PCV, genetic advance (GA) and heritability.
When the variability of traits was examined in terms of PCV and GCV, some traits, i.e. plant height, days to maturity, number of panicles per hill, total number of tillers per hill, number of filled grains per hill, yield per hill and days to 50% flowering, showed higher PCV than GCV. This suggests an influence of the environment on the expression of such traits. Iftekharuddaula et al. [29] reported higher PCV than GCV for panicles per square metre and yield per plant. Kavitha and Reddy [30] attributed a higher PCV than GCV to a higher magnitude of interaction of genotypes with the environment.
Some traits, such as plant height, days to maturity, number of panicles per hill, total number of tillers per hill, number of filled grains per hill, yield per hill and days to 50% flowering, showed high PCV and GCV, which is similar to the studies of Pandey and Anurag [31] and Habib et al. [32], who observed high PCV and GCV values for number of filled grains per panicle and yield per hill. Akhtar et al. [33] and Zahid et al. [34] observed high GCV and PCV values for tillers per hill. In aromatic rice genotypes, high PCV and GCV were also recorded. [35] High values of PCV and GCV were also noticed for traits such as number of unfilled grains per panicle and number of spikelets. [36]
In the case of heritability, several traits, i.e. plant height, days to maturity, tiller number per hill, panicles per hill, percentage of filled grains and days to 50% flowering, showed more than 70% heritability (Table 4). Yield per hill showed a heritability of 82%. High heritability indicates a greater chance of genetic transfer of the traits of interest to the next generation. It can be concluded that these traits are easily inherited in the studied population, as they have high heritability and were little affected by environmental factors.
Iftekharuddaula et al. [29] reported high heritability for days to maturity, number of filled grains per panicle and 1000 grain weight. In this study, we observed high heritability for yield per hill, total number of tillers per hill, panicles per hill, panicle length, number of filled grains per panicle and days to maturity. Akhtar et al. [33] and Habib et al. [32] also reported similar results for days to maturity and the number of filled grains per panicle. A high heritability value for yield per hill was also reported by Pandey and Anurag [31]. Ghosh and Sharma [36] reported high heritability for total number of spikelets per panicle, yield per hill, days to 50% flowering, flag leaf length and 100 grain weight. Plant height was noticed as a moderately heritable trait.
Genetic advance measures the difference between the mean genotypic value of the selected population and that of the original population from which it was selected. In this study, number of panicles per hill, number of filled grains, total number of tillers per hill and yield per hill showed high genetic advance. Pandey and Anurag [31] also reported high genetic advance for the total number of spikelets.
PCV and GCV, together with heritability, give an indication of the genetic advance expected from selection. [16] High heritability estimates based on phenotypic performance have proven useful in the selection of superior genotypes. However, genetic advance considered together with heritability is more effective in predicting the effect of selecting the most superior individuals. [18] In this study, yield per hill, total number of spikelets per panicle and number of filled grains per panicle had high heritability along with high genetic advance. Chaudhury and Das [37] reported similar findings for number of filled grains per panicle and yield per hill.
Heritability and genetic advance showed high values for the number of filled grains per panicle, number of spikelets per panicle and yield per hill, which indicates the prevalence of other genes for the expression of the traits. In segregating generations, selection through these characters would be effective. [38]
Some traits, i.e. number of filled grains per panicle, number of spikelets per panicle and yield per hill, showed higher GCV, PCV, GA (%) and heritability. This indicates very low environmental influence over those traits and that they are transmissible through breeding programmes, so selection on the basis of those characters would be effective.

Cluster analysis
Euclidean distances between the selected homozygous individuals and the recurrent parent MR219 were calculated from the standardized tabulated data. Using these values, a standard UPGMA dendrogram was constructed (Figure 3). The thirty lines and MR219 were grouped into four clusters based on 11 traits at a dissimilarity coefficient of 1.59. The cut-off point was 1.48 and it was set for the convenience of discussion.
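The clustering in this study was carried out in NTSYS-pc; as an illustration of the same idea (standardize the traits, compute Euclidean distances, then merge groups by UPGMA until a cut-off is reached), the following pure-Python sketch uses hypothetical trait values and a hypothetical cut-off. The function names and data are assumptions and do not reproduce the authors' data or software.

```python
import math


def standardize(rows):
    """Scale every trait (column) to mean 0 and sample SD 1."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    sds = [math.sqrt(sum((x - m) ** 2 for x in c) / (len(c) - 1))
           for c, m in zip(cols, means)]  # assumes no trait is constant
    return [[(x - m) / s for x, m, s in zip(row, means, sds)] for row in rows]


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def upgma_clusters(points, labels, cutoff):
    """Average-linkage (UPGMA) merging until the closest pair exceeds cutoff."""
    clusters = [[i] for i in range(len(points))]

    def avg_dist(c1, c2):
        # UPGMA distance: mean of all pairwise distances between members.
        total = sum(euclidean(points[i], points[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        dist, a, b = min((avg_dist(clusters[a], clusters[b]), a, b)
                         for a in range(len(clusters))
                         for b in range(a + 1, len(clusters)))
        if dist > cutoff:
            break
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
    return [sorted(labels[i] for i in c) for c in clusters]


# Hypothetical data: plant height (cm) and tillers per hill for four entries.
labels = ["L1", "L2", "L3", "MR219"]
traits = [[100.0, 20.0], [101.0, 21.0], [90.0, 15.0], [91.0, 16.0]]
groups = upgma_clusters(standardize(traits), labels, cutoff=1.0)
print(sorted(groups))
```

Standardizing first keeps traits measured on large scales, such as plant height, from dominating the distance; raising or lowering the cut-off changes how many clusters are reported.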
In the cluster analysis, most lines were grouped with the recurrent parent MR219, and the rest of the lines, being more dissimilar, were organized into different groups (Table 4). Based on morphological traits, all lines and MR219 were grouped into four clusters at a dissimilarity coefficient of approximately 1.59, which indicates the level of diversity among the traits. This result shows the effectiveness of quantitative traits for grouping rice genotypes. Ahmadikhah et al. [39] grouped 58 rice varieties into four clusters considering 18 morphological traits, where the genetic distance was 0.75. The four groups were named A, B, C and D, where group A consisted of one member, group B of 14, group C of 20 and group D of 23 members.
Veasey et al. [40] classified 20 rice populations into 10 clusters considering 20 traits, where the largest group was the last (seven members) and the smallest were groups number one, two and seven with only one member.

Principal component analysis
PCA helps to verify the cluster analysis results about the different groups found in the dendrogram. Three-dimensional (3D) plots obtained from PCA are often useful for a proper understanding of the studied population. In multivariate analyses such as cluster analysis and PCA, agreement between the two results increases confidence in them. In the PCA, similar genotypes were found together in the same group (Figure 4). PCA gave the same type of result as the cluster analysis (Table 5), as also seen in the 3D plots shown in Figure 4. The first four principal components together accounted for 65.3% of the variation in the measured traits (Table 6). Eigenvector analysis suggested that 21.8%, 18.5%, 13.7% and 11.3% of the total variation of the studied traits could be explained by the first four principal components, respectively.
According to the results, PC1 explained 21.8% of the total variation (Table 6). Seven of the 11 morphological traits under study were negatively associated with PC1 and the remaining traits positively; the negatively associated traits included 1000 seed weight (−0.030). With the decrease of positively correlated traits and the increase of negatively correlated traits, the first PC increases. This indicated that, if one trait among the negatively correlated traits increases, it will influence the others, so that they would increase as well. PC1 could be used as a standard for the measurement of the quality of the positively correlated traits. Tiller number (−0.509) was the trait most strongly correlated with PC1.
18.5% of the total variation was explained by PC2. Of the 11 traits in the study, seven were negatively correlated and the remaining traits were positively correlated. The negatively correlated traits were days to maturity (−0.187), plant height (−0.183), panicles per hill (−0.027), filled grain (−0.511), grain length (−0.019), grain width (−0.12) and 1000 seed weight (−0.0527). The positively correlated traits were tiller number per hill (0.113), panicle length (0.105), yield per hill (0.187) and days to 50% flowering (0.561). For PC2, we can conclude that its value decreases with the increase in the values of traits which are positively correlated, but increases with the negatively correlated traits. From the correlation values, it can be concluded that filled grain% (−0.51) was the trait most strongly correlated with PC2 (Table 6).
PC3 explained 13.7% of the variation and four traits were found to be positively correlated: panicle length (0.111), grain length (0.588), filled grain% (0.199) and days to maturity (0.129). The negatively correlated traits were plant height (−0.273), tiller number per hill (−0.1844), grain width (−0.607), panicle number per hill (−0.260), yield per hill (−0.034), 1000 seed weight (−0.159), days to 50% flowering (−0.115) and filled grain (−0.018). With the increase of negatively correlated traits and the decrease of positively correlated traits, PC3 increases. Panicle length was most strongly correlated with PC3, with a value of 0.111 (Table 6).
11.3% of the total variation was explained by the fourth PC, which consisted of six negatively and five positively correlated traits. Yield per hill (0.099), panicles per hill (0.159), grain length (0.102), grain width (0.127) and 1000 seed weight (0.20) were the positively correlated traits (Table 6). PCA was performed to clarify the morphological differences through the first few eigenvectors, which can explain the overall diversity among the genotypes. The first four principal components explained 65.3% of the total variation: 21.8% was explained by PC1, 18.5% by PC2, 13.7% by PC3 and 11.3% by PC4. The first 10 principal components accounted for 67% of the total variation. Caldo et al. [41] implied a strong correlation among the traits. Lasalita-Zapico et al. [42] reported a total variation of approximately 82.7% among 32 upland rice varieties, where PC1 explained 66.9% of the variation and PC2 explained 15.87% of the variation.
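The cumulative figure quoted above follows directly from the per-component percentages. The short Python sketch below simply accumulates the four values reported in the text; the variable names are illustrative only.

```python
# Percentage of total variation explained by each PC, as reported in the text.
explained = {"PC1": 21.8, "PC2": 18.5, "PC3": 13.7, "PC4": 11.3}

cumulative = []
running_total = 0.0
for name, pct in explained.items():
    running_total += pct
    cumulative.append((name, round(running_total, 1)))

for name, total in cumulative:
    print(f"{name}: cumulative {total}%")
```

The running totals are 21.8%, 40.3%, 54.0% and 65.3%, so roughly a third of the variation in the 11 traits is left to the remaining components.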

Conclusions
Submergence-tolerant lines were developed by crossing MR219 with the submergence-tolerant donor Swarna-Sub1 through MABC, using MR219 as the recurrent parent. Sub1, the gene responsible for submergence tolerance, was successfully introgressed into the genetic background of the recipient parent and its presence was confirmed. Among the morphological characters, yield per hill, total number of tillers per hill and the number of filled grains had high heritability and genetic advance. All of the above were considered as general criteria for selection. It is very useful for researchers to select a crop on the basis of economically important traits with very little environmental influence. All the studied lines showed homozygosity similar to that of the standard variety. The values of grain length, grain width, yield per hill, grain weight, filled grains and panicle length were higher than or equal to those of MR219. Few of the newly developed lines performed lower than MR219, which indicates a tendency towards MR219. Some better plants were identified based on the measured traits through cluster analysis and could be selected for future research. On the basis of the assessed traits and cluster analysis, the plants L7, L13, L24, L12, L26, L10, L30, L14, L27 and L9 could potentially be used as parents that produce higher yields, submergence tolerance and good morphological traits.

Disclosure statement
No potential conflict of interest was reported by the authors.