DNA molecular markers in plant breeding: current status and recent advancements in genomic selection and genome editing

ABSTRACT With the development of molecular marker technology in the 1980s, the fate of plant breeding has changed. Different types of molecular markers have been developed and advancement in sequencing technologies has geared crop improvement. To explore the knowledge about molecular markers, several reviews have been published in the last three decades; however, all these reviews were meant for researchers with advanced knowledge of molecular genetics. This review is intended to be a synopsis of recent developments in molecular markers and their applications in plant breeding and is devoted to early researchers with a little or no knowledge of molecular markers. The progress made in molecular plant breeding, genetics, genomic selection and genome editing has contributed to a more comprehensive understanding of molecular markers and provided deeper insights into the diversity available for crops and greatly complemented breeding stratagems. Genotyping-by-sequencing and association mapping based on next-generation sequencing technologies have facilitated the identification of novel genetic markers for complex and unstructured populations. Altogether, the history, the types of markers, their application in plant sciences and breeding, and some recent advancements in genomic selection and genome editing are discussed.


Introduction
Information about the genetic variations present within and between various plant populations and their structure and level can play a beneficial role in the efficient utilization of plants [1]. The evolutionary background, process of gene flow, mating system and population density are important factors used in the detection of structure and level of these variations [2]. To investigate the diversity and other important characteristics, different types of agronomic and morphological parameters have been used successfully. During the last three decades, the world has witnessed a rapid increase in the knowledge about the plant genome sequences and the physiological and molecular role of various plant genes, which has revolutionized the molecular genetics and its efficiency in plant breeding programmes.

Genetic markers
Genetic markers are important developments in the field of plant breeding [3]. The genetic marker is a gene or DNA sequence with a known chromosome location controlling a particular gene or trait. Genetic markers are closely related with the target gene and they act as sign or flags [4]. Genetic markers are broadly grouped into two categories: classical markers and DNA/molecular markers. Morphological, cytological and biochemical markers are types of classical markers and some examples of DNA markers are restriction fragment length polymorphism (RFLP), amplified fragment length polymorphism (AFLP), simple sequence repeats (SSRs), single-nucleotide polymorphism (SNP) and diversity arrays technology (DArT) markers [5].

Morphological markers
Morphological markers can visually distinguish qualities like seed structure, flower colour, growth habit and other important agronomic traits. Morphological markers are easy to use, with no requirement for specific instruments. They do not require any specialized biochemical and molecular technique. Breeders have used such type of markers successfully in the breeding programmes for various crops. Main disadvantages of morphological markers are: they are limited in number, influenced by the plant growth stages and various environmental factors [6]. Since ancient times, humans have successfully used various morphological markers to investigate the variation for utilization in plant breeding [7].

Cytological markers
Markers that are related with variations present in the numbers, banding patterns, size, shape, order and position of chromosomes are known as cytological markers. These variations reveal differences in the distributions of euchromatin and heterochromatin. For example, G bands are produced by Giemsa stain, Q bands are produced by quinacrine hydrochloride and R bands are the reversed G bands. These chromosome landmarks can be used in the differentiation of normal and mutated chromosomes. Such markers can also be used in the identification of linkage groups and in physical mapping [5].

Biochemical markers
Biochemical markers, or isozymes, are multi-molecular forms of enzymes which are coded by various genes, but have the same functions [8]. They are allelic variations of enzymes and thus gene and genotypic frequencies can be estimated with biochemical markers. Biochemical markers have been successfully applied in the detection of genetic diversity, population structure, gene flow and population subdivision [9]. They are co-dominant, easy to use and cost effective. However, they are less in number; they detect less polymorphism and they are affected by various extraction methodologies, plant tissues and different plant growth stages [10].

Molecular markers/DNA markers
Molecular markers are nucleotide sequences and can be investigated through the polymorphism present between the nucleotide sequences of different individuals. Insertion, deletion, point mutations duplication and translocation are basis of these polymorphisms; however, they do not necessarily affect the activity of genes. An ideal DNA marker should be co-dominant, evenly distributed throughout genome, highly reproducible and having ability to detect higher level of polymorphism [10].

Classification of molecular markers
Molecular markers are classified into various groups on the basis of: (1) mode of gene action (co-dominant or dominant markers); (2) method of detection (hybridization-based molecular markers or polymerase chain reaction (PCR)based markers); (3) mode of transmission (paternal organelle inheritance, maternal organelle inheritance, bi-parental nuclear inheritance or maternal nuclear inheritance) [11].
Different types of DNA molecular markers have been developed and successfully applied in genetics and breeding activities in various agricultural crops. The following section provides some brief information related with molecular markers based on their method of detection. Comparisons of the important characteristics of most commonly used molecular markers are given in Table 1. Hybridization-based markers (RFLP) RFLP was the first molecular marker technique and the only marker system based on hybridization. Individuals of same species exhibit polymorphism as a result of insertion/deletions (known as InDels), point mutations, translocations, duplications and inversions. Isolation of pure DNA is the first step in the RFLP methodology. This DNA is mixed with restriction enzymes which are isolated from bacteria and these enzymes are used to cut DNA at particular loci (known as recognition sites). This results in a huge number of fragments with different length. Agarose or polyacrylamide gel electrophoresis (PAGE) is applied for the separation of these fragments by producing a series of bands. Each band represents a fragment having different lengths. Base-pair deletions, mutations, inversions, translocations and transpositions are the main causes for the variation resulting in the RFLP pattern. These variations lead to the gain or loss of recognition sites, resulting in fragments of various length and polymorphism. The restriction enzymes will not cut the fragment if a single base-pair variation occurs in the recognition site. However, if this point mutation occurs in one chromosome but not the other, it is called heterozygous for the marker, as both bands are present [12].

PCR-based markers
The PCR technique was developed by Cary Mullis in 1983, as a technique which could amplify a small quantity of DNA without the application of any living organisms [13]. Denaturation, annealing and extension are the most important steps involved in PCR reactions. For more information about PCR and its protocol, see the article of Joshi and Deshpande [14].

PCR primers
The primer is a small part of DNA or RNA from which synthesis of DNA starts. The efficiency of a primer plays a vital role in the sensitivity and efficiency of PCR [15]. The primer efficiency depends on the following main factors: (1) primer-template duplex association and dissociation during the annealing step and the extension temperature; (2) stability of the duplex to mismatched nucleotides; (3) efficiency of polymerase in the identification and extension of mismatched duplex. Primer length, GC%, melting and annealing temperature, 3' end specificity and 5' end stability are important features playing an important role in the efficiency of a primer. For a successful PCR, designing of a primer is a most crucial parameter. If all things are balanced except a primer, it will lead to no/false working of the PCR protocol. Primer length is also critical for a successful PCR and normally primers of 18-30 nucleotides in length are considered the best primers. Melting temperatures (T m ) in the range of 52-58 C provide good results. The GC content is the most important factor affecting the efficiency of a primer; 45%-60% is optimum GC% for a good primer [16].
Randomly amplified polymorphic DNA (RAPD) This technique was developed by Williams et al. [17] and Welsh and Mcclelland [18] independently. Amplification of genomic DNA is achieved by PCR using single, short (10 nucleotide) and random primer. During PCR, amplification takes place when two hybridization sites are similar to each other and in opposite direction. These amplified fragments are totally dependent on the length and size of both the target genome and the primer [5].
The selected primer should have minimum 40% GC content, as a primer having less than 40% GC content will probably not withstand the annealing temperature (72 C) where DNA elongation occurs by DNA polymerase [17]. For the visualization, the PCR product is then separated in agarose gel stained with ethidium bromide [19]. Polymorphism present either at or between primer binding sites can be detected in the electrophoresis by confirming the presence or absence of specific bands [5]. The quantity and quality of DNA, PCR buffer, magnesium chloride concentration, annealing temperature and Taq DNA (type of DNA polymerase) are some important factors affecting the reproducibility of randomly amplified polymorphic DNA (RAPD) markers [20].

AFLP
The limitations present in the RAPD and RFLP technique were overcome through the development of AFLP markers [21]. AFLP markers combine the RFLP and PCR technology, in which digestion of DNA is done and then PCR is performed [22]. AFLP markers are cost effective and there is no need of prior sequence information. In AFLP, both good-quality and partly degraded DNA can be used; however, this DNA should not contain any restriction enzymes or PCR inhibitors. For more information, see previous studies [23,24]. In AFLP, two restriction enzymes (a frequent cutter and a rare cutter) are used for the cutting of DNA. Each end of the resulting fragments is ligated with the oligonucleotides. Oligonucleotides are short nucleic acid fragments used for the ligation in PCR [12]. One end is specific for the rare cutter (6-bp recognition site) and the other one, for the frequent cutter (3-bp recognition site). This will lead to the amplification of only those fragments which have been cut by these cutters. For the development of primers, known sequences of adapters are used. Adapters are actually short, enzyme specific DNA sequences generally used for fishing an unknown DNA sequence [21]. After performing PCR, visualization is done in either agarose gel or polyacrylamide gel stained with AgNO 3 or by autoradiography [12].

SSRs or microsatellites
Microsatellites [25] are also called as SSRs; [26], short tandem repeats and simple sequence length polymorphisms [27]. SSRs are tandem repeat motifs of 1-6 nucleotides that are present abundantly in the genome of various taxa [28]. Microsatellites can be mononucleotide (A), dinucleotide (GT), trinucleotide (ATT), tetranucleotide (ATCG), pentanucleotide (TAATC) and hexanucleotide (TGTGCA) [29]. Microsatellites are distributed in the genome; however, they are also present in the chloroplast [30] and mitochondria [31]. Studies have also confirmed the presence of SSRs in protein-coding genes and expressed sequence tags (ESTs) [32]. SSRs represent the lesser repetition per locus with higher polymorphism level [33]. This high polymorphism level is due to the occurrence of various numbers of repeats in microsatellite regions and can be detected with ease by PCR [34]. Occurrence of SSRs may be due to slippage of single-strand DNA, recombination of double-strand DNA, transfer of mobile elements (retrotransposons) and mismatches. Common motifs present in SSRs are Mono: A, T; Di: AT, GA; Tri: AGG; Tetra: AAAC. Mainly the sequences which are flanking the SSRs are conserved and are used in the development of primers. Development of a genomic library and sequencing a segment of the studied genome will result in the development of these primers. For more information related to the development of SSRs, see the review by Kalia et al. [34]. The development of SSR markers involves the development of an SSR library and then detection of specific microsatellites. After this, the detection of favourable regions for primer designing is done and then PCR is performed. Interpretation and evaluation of banding patterns are performed and assessment of PCR products is performed for investigation of polymorphism [35]. SSR markers are considered a marker of choice, as they are co-dominant, with high reproducibility and greater genome abundance, and they can be used efficiently in plant mapping studies [26,34].

Chloroplast microsatellites (cpSSRs)
It is difficult to detect enough sequence variations due to lesser mutation rates that characterize the chloroplast genome. Contrary to this, cpSSRs provide higher polymorphism levels with easily genotyping, which has made them a very handful and popular marker for population genetic studies [30]. cpSSRs typically contain mononucleotide motifs which are repeated 8-15 times. The polymorphism level in cpSSRs is quite changing across species and loci. Two important features distinguishing the cpSSRs from nuclear microsatellites are (i) chloroplasts are inherited uniparentlly and (ii) the chloroplast chromosome is a non-recombinant molecule due to which all cpSSRs loci are linked [36]. CpSSRs have been successfully applied in agriculture and basic plant sciences [37] Mitochondrial microsatellites Plant mitochondrial DNA (mtDNA) is very dynamic, with the largest and the least gene density among eukaryotes. Its size ranges between 200 and 2500 kb and consists of different repeated elements and introns [38]. Plant mtDNA is larger as compared to the animal mtDNA and is characterized by molecular heterogeneity seen as groups of circular chromosomes which differ in size and abundance. The evolution rate of mtDNA markers is slow, which is regrettable for plant population biologists; these markers have very limited application in population genetics. mtDNA markers exhibit many limitations like the fact that they represent only a single locus; uncertainty in genealogical analysis can be increased due to increased probability of missing links in mitochondrial haplotypes and underestimation of genetic diversity [39].

RAMP (Randomly amplified microsatellite polymorphisms)
Microsatellite markers exhibit greater level of polymorphism with the drawback of being labour intensive. While RAPD markers are cost effective as compared to microsatellites, their level of polymorphism detection is low as compared to that of microsatellite markers. To overcome the imperfection of these two methods, randomly amplified microsatellite polymorphisms (RAMP) markers were developed [40]. This marker system involves an SSR primer which is utilized for the amplification of genomic DNA in the absence or presence of RAPD primers. SSR primers are radiolabeled consisting of a '5' anchor and '3' repeats. The resulting products are resolved using submarine agarose electrophoresis [41]. The melting temperature of this marker system is maintained 10-15 C higher for the anchored primers as compared to the RAPD ones, which helps in the efficient annealing of the anchored primer [40]. RAMP markers are cost effective, reflect higher polymorphism and have wide distribution in the genome. They have been successfully applied in various plants for molecular characterization [40,41].
Sequence-related amplified polymorphism (SRAP) Li and Quiros [42] developed this method mainly for the amplification of open reading frames (ORFs). This marker system is based on amplification using two primers. The primers used for this marker system are 17-18 nucleotides long. They use the CCGG sequence in the forward primer and AATT in the reverse primer, and the annealing temperature in the first five cycles is set at 35 C during PCR. The reaming 35 cycles are run at 50 C annealing temperature. The PCR amplified product is then loaded on gel electrophoresis and DNA bands are visualized through autoradiography. Sequence-related amplified polymorphisms (SRAPs) are dominant in nature and DNA fragments are scored by the presence or absence of a band. This is a simple and efficient marker system which is widely used in a range of fields, including map construction, genomic and cDNA fingerprinting [41]. SRAP is a dominant marker system which has been successfully applied to investigate the genetic variations in different taxa [43].
Inter simple sequence repeat (ISSR) This technique was developed by Zietkiewicz et al. [44]. It is based on amplification of DNA segments located in between two identical but oppositely oriented microsatellite repeat regions, at a distance which allows amplification. Primers used in this technique are also known as microsatellite and they might be di-, tri-and tetra-or penta-nucleotide repeats. Normally long primers having a size of 15-30 bases are used in this technique. The primers used in Inter simple sequence repeat (ISSR) may be unanchored [45] or more typically they are anchored at the 3 0 or 5 0 end having 1 to 4 degenerate bases, which are extended into the flanking sequences. ISSR allows the successful usage of high annealing temperature (about 45-60 o C); the amplified products are 200-2000 bp long and can be visualized through agarose or PAGE [46,47]. Segregating by simple Mendelian laws of inheritance, they are characterized as dominant markers [44,48]; however, they can also be used in the development of co-dominant markers [49]. ISSRs are simple, easy to understand as compared to RAPD and there is no need of prior knowledge of DNA sequences [50,51]. However, they are dominant markers and they have less reproducibility with homology of co-migrating amplification products [11].

Retrotransposons
Transposons are mobile genetic elements capable of changing their locations in the genome. Transposons elements were discovered in maize almost 60 years ago [52,53]. There are two classes of transposable elements.
Class I known as retro-elements, such as retrotransposons. Retrotransposons may be short interspersed nuclear elements or long interspersed nuclear elements and they are the mRNA-encoded element. In this class, a new copy of transposon is produced after each transposition event; however, the original copy remains intact at the donor site. Class II contains DNA transposons and their locations change by the cut-and-paste method in the genome [53]. Retrotransposons are an important class of repetitive DNA constituting 40%-60% of the entire plant genome [53,54]. Retrotransposons belong to class I of transposon elements and they transpose through an RNA intermediate, which is not present in class II transposable elements [52]. Retrotransposons are grouped into two subclasses on the basis of their structure and transposition cycle. Long terminal repeats (LTRs) retrotransposons (LINE; long interspersed nuclear elements) and non-LTR retrotransposons (SINE; short interspersed nuclear elements). These two subclasses can be differentiated based on the presence or absence of LTRs at their ends [55]. LTR retrotransposons are widely distributed in the plant genome and in many crop plants, nearly 40%-70% of their DNA contains LTR retrotransposons [56,57]. On the basis of integration, target site duplications of 4-6 bp are often produced by LTR retrotransposons. LTR retrotransposons contain ORFs, POL and GAG, as they are widely distributed within plant genomes [58]. LTR retrotransposons are further divided into Ty1/copia and Ty3/gypsy retrotransposons on the basis of encoded gene order [59]. Class II of transposable elements is further divided into terminal inverted repeat (TIR) and non-TIR subclasses [60]. As transposon elements have great abundance and wide dispersion in the genome, they are an ideal source for the development of molecular markers [61]. The following are some important retrotransposon-based molecular markers.

IRAP
Inter-retrotransposon amplified polymorphism (IRAP) is a retrotransposon-type marker developed by Kalendar et al. [62]. Sequences present between two adjacent LTR retrotransposons are amplified by the IRAP system through the application of primers which are complementary to the LTR sequence 3' end. The orientation of these LTR sequences can be (1) tail-tail, (2) head-head and (3) head-tail [63]. Identical sequences are present in different strands that are separated by small inter-genic distances in headto-head arrangement and are transcribed away from each other. However, in those with tail-to-tail orientation, the arrangement is opposite to the head-to-head one and they are transcribed towards each other. Both 5' and 3' primers are used for head-to-tail LTRs, while a single primer is used for those with head-to-head or tail-to-tail arrangement. Agarose gel is used to resolve the IRAP product [64]. A single IRAP reaction can produce many amplicons having different sizes ranging between 300 and 3000 bp [65].

REMAP
Retrotransposon microsatellite amplification polymorphisms (REMAP) is an important retrotransposon-based marker commonly used to analyse the genetic diversity. The REMAP protocol is similar to IRAP; however, in REMAP, SSRs (microsatellites) are used in conjunction with specific primers of LTR during PCR [62,64]. During REMAP PCR, those primers are selected for microsatellite loci which contain a repeat motif anchored nucleotide at the 3' end aiming to avoid the primer slippage between individual SSR motifs [59].

Retrotransposon-based insertion polymorphism (RBIP)
This technique was developed by Flavell et al. [66]. In it, the presence or absence of retrotransposon sequences is investigated, which can be used as molecular marker.
In this technique, DNA amplification is achieved through a primer having 3' and 5' end regions flanking the retrotransposons insertion site. Detection of the presence of insertion is achieved through the development of primer from LTR. Sequence information about the regions flanking the retrotransposon insertion sites is needed in this technique and it results in the typing of a single locus as compared to other retrotransposon-based markers [65]. Agarose gel electrophoresis is used for the detection of polymorphism [67]. Tagged microarray marker, which is based upon fluorescent microarray marker scoring, is used for high-throughput retrotransposon-based insertion polymorphism (RBIP) analysis [68].

Inter-primer binding site (iPBS)
The requirement for prior knowledge about the sequence of LTR is a big problem while using all retrotransposonbased markers. To obtain such information, cloning and sequencing of LTR is performed. To solve this problem, primer binding sites (PBSs) of retrotransposons are used in this technique. A tRNA complement is present in all LTR retrotransposons and retroviruses. PBSs are their binding sites adjacent to the 5' LTR and are highly conserved. Reverse transcription starts when the tRNA binds its 3' terminal sequences with the primer binding site. The role of primers is to bind in this area and amplification of diverse sequences is performed. Mostly retrotransposons are mixed, inverted, nested or truncated in the chromosomal sequences and their amplification can be achieved by using a conservative PBS primer. LTRs present in fragments having retrotransposons as their internal part are present with the other retrotransposons and result in close occurrence of PBS sequences with each other. PBS is a universal method, as they occur in all LTR-based retrotransposon sequences [69]. Recently, inter-primer binding site (iPBS) markers have emerged as the most important and universal method for the identification of genetic diversity and relationships in various plants [61,69,70].

Cleaved amplified polymorphic sequences (CAPS)
Cleaved amplified polymorphic sequence markers (CAPS) originally named as the PCR-RFLP markers due to combination of RFLP and PCR [71]. In this technique, target DNA is amplified using PCR and then its digestion is performed with restriction enzymes [72,73]. Agarose gel or acrylamide gel is used for the visualization of CAPS products. The primers used in this technique are developed from sequence information present in a databank of genomics or cloned RAPD bands or cDNA sequences. CAPS markers are versatile and the possibility to find DNA polymorphism can be increased by combining CAPS with single-strand conformational polymorphism, SCAR, AFLP or RAPD [67]. CAPS markers are co-dominant markers and have been used in genotyping, map-based cloning and molecular identification studies [74,75].

SCAR (Sequence-characterized amplified regions)
Sequence-characterized amplified region (SCAR) markers were first developed in 1993 by Paran and Michelmore in lettuce for downy mildew resistance genes [76]. SCAR markers are more specific and more reproducible as compared to RAPD [77]. SCAR markers are co-dominant and mono-locus markers and are mostly applied for physical mapping [78]. The procedure for the development of SCAR markers includes purification of PCR fragments followed by designing of SCAR primer [76][77][78][79]. Polymorphic bands are detected by using agarose gel and then the nucleotide sequence of the selected fragment of DNA is investigated. Analysis of the sequence of this polymorphic DNA is made by comparing it with the known DNA sequences available at the NCBI (National Center for Biotechnology Information) database for sequence uniqueness. Then this nucleotide sequence of polymorphic DNA is utilized for the synthesis of specifics SCAR primers [78].

Sequence-based markers
Sequencing is a technique in which nucleotide bases and their order is identified along the DNA strand [80], and molecular markers which are based on the identification of a particular sequence of DNA in a pool of unknown DNA are known as sequence-based markers. The development of this technology resulted from the fact that hybridization-based markers are less reliable and polymorphic. The advent of the sequencing technologies like next-generation sequencing (NGS) and genotyping by sequencing (GBS) revolutionized the plant breeding through development of SNPs resulting in high polymorphism [81]. Various types of sequencing technologies have been developed so far and have been reviewed briefly by Heather and Chain [82]. However, some important sequencing methods are also described as follows.

Sanger method of sequencing
The plus-and-minus method was the first method of DNA sequencing. It was used by Sanger and Coulson [83]. The basic principle of this method is that singlestranded DNA molecules which show length differentiation of a single nucleotide can be separated from each other with the help of PAGE. This method is also known as Sangers's dideoxy sequencing method, as it uses modified bases known as dideoxy nucleotides [82]. During the early studies, bacteriophage T4 DNA polymerase and DNA polymerase I from Escherichia coli were used in this method [84,85]. The resultant polymerization products were loaded onto acrylamide gels and resolved through ionophoresis. However, this method contains a lot of limitations; hence, after two years, Sanger and his team introduced a new technique for sequencing in which oligonucleotides were sequenced by polymerization by enzymes [86]. This method facilitates the maximum measurement of variations. It has high reproducibility and requires a low quantity of DNA. However, this method is costly, time consuming, with low genome coverage and detects less polymorphism below the species level [80].

Pyrosequencing
Pyrosequencing is a synthesis principle-based sequencing technique [87] in which phosphate is identified during the synthesis of DNA [88]. In this method, the primer used for sequencing is hybridized with a single-stranded DNA template which is biotin-labelled and is combined with specific enzymes [89]. Deoxynucleoside triphosphates are added separately in the reaction mixture using four cycles. The reaction begins with the polymerization of nucleic acid where PPi (pyrophosphate), which is inorganic in nature, released. As the nucleotides are added, the reaction is accompanied by continuous release of inorganic phosphate and this released PPi is in equal amount to the incorporated nucleotide. Initially, the activity of DNA polymerase was monitored by this technique. Solid phase sequencing [90] and liquid phase sequencing [91] are two types of pyrosequencing.

Next-generation sequencing (NGS)
The development in the sequencing techniques increased the demand for extensive throughput sequencing at a low cost. This demand led to the development of NGS and currently this technique produces millions of sequences. This technique has the ability to produce several hundreds of millions to several hundreds of billions of DNA bases per run [92]. Many organizations have developed this technique successfully and they provide their services commercially, such as Illumina MiSeq and HiSeq 2500 [93], Roche 454 FLX Titanium [94] and Ion Torrent PGM [95]. These NGSs resulted in low prices with covering whole genome more precisely [96]. Similar methodology is used in all NGS techniques for the preparation of template DNA, where fragments of DNA are randomly sheared and ligated at both ends with universal adapters. This sequencing is performed in constant channel and one or more nucleotides are incorporated, resulting in the release of a signal that is detected by a sequencer [97]. Some advantages of NGS are (1) NGS is more accurate to older sequencing methods and (2) low in cost with high throughput. (3) Recently, this technique is being used in whole genome sequencing in order to investigate the maximum numbers of SNPs and for consideration of diversity present within the species, construction of linkage/halophyte maps and in genome-wide association studies (GWAS) [98]. (4) Sequencing of older DNA samples is also performed by NGS and this technique has strengthened the field of metagenomics [99].
Genotyping by sequencing (GBS) GBS is a simple and multitudinous technique successfully used nowadays. This technique was developed in the Buckler lab under the Illumina NGS platform. Modernization in the NGS technique has lowered the sequencing costs, assuring the successful application of GBS for large genome species having great magnitude of diversity [98]. On the basis of ion PGM system usage, there are two types of GBS techniques: (1) Digestion of restriction enzyme: this method is mainly used in marker-assisted selection (MAS) programmes for the identification of new markers and here no particular SNPs are identified.
In this method, prior to the ligation of adapters, DNA is digested with one or two specific restriction enzymes.
(2) Multiplex enrichment PCR: In this technique, specific PCR primers are selected for the amplification of points of interest. In contrast to the digestion in the restriction enzyme method, a complete set of SNPs are identified for a genome section. GBS was basically developed to investigate the high-resolution association in maize and now it is used for many other species having a complex genome. Main advantages of GBS are (I) less cost as compared to the other techniques, which made GBS a novel technique for the identification of SNPs in different species and crops. (II) This technique provides satisfactory results in the characterization of germplasm, population studies and breeding of diverse crops [100]. (III) GBS produces a great magnitude of SNPs which are used in genotyping and genetic analysis [101]. (IV) It lowers the handling of samples and (V) includes less PCR and purification sets [81].
On the basis of the sequencing techniques outlined above, the following sequence-based markers have been developed.

Single-nucleotide polymorphism (SNP)
Single base-pair changes present in the genome sequence of an individual are known as SNPs. SNPs may be transitions (C/T or G/A) or transversions (C/G, A/T, C/A or T/G) on the basis of the nucleotides substitution. Normally, in mRNA, single base changes are present, including SNPs that are insertion/deletions (InDel) in a single base. A single-nucleotide base is the smallest unit of inheritance and SNP can provide the simplest and maximum number of markers. SNPs are present in abundance in plants and animals and the SNP frequency in plants ranges between 1 SNP in every 100-300 bp [102]. SNPs are widely distributed within the genome and can be found in coding or non-coding regions of genes or between two genes (intergenic region) with different frequencies [102]. A large number of methods for SNP genotyping have been developed based on different techniques of allelic discrimination and detection platforms. Among these, RLFP (SNP-RFLP) is the simplest and easiest method and the CAPS marker technique also can be applied in the SNP detection. If binding sites for restriction enzymes are present on one allele, while other alleles have no binding site, their digestion will result in fragments of various length. Identification of SNPs is achieved through the analysis of sequence data stored in databases. Different kinds of SNPs genotyping assays have been developed based on different molecular mechanisms. Among them, primer extension, invasive cleavage, oligonucleotide ligation and allele-specific hybridization are most important [103]. Various recent high-throughput genotyping methods like NGS, GBS and chip-based NGS, allele-specific PCR makes SNPs as the most attractive markers for genotyping [67].

Diversity array technology (DArT Seq)
It is a technique that provides a great opportunity for the genotyping of polymorphic loci (in several hundreds to several thousands), which are distributed over the genome. It is highly reproducible microarray hybridization technology. There is no need of previous sequence information for the detection of loci for a trait of interest [104,105]. The most important benefit of this technique is that it is highly throughput and very economical. To discover polymorphic markers by this technology, a singlereaction assay can genotype several thousand genomic loci. As little as 50-100 ng genomic DNA is sufficient for the genotyping purpose. For the scoring and discovery of markers, an identical platform is utilized. After the discovery of a marker, there is no need of specific assays for genotyping, except starting polymorphic markers assembly into an array of a single genotype. These polymorphic markers within the genotyping arrays are commonly used for genotyping [106]. The advantages and disadvantages of different genetic markers are described in Table 2.

Evolution and phylogeny
In the past, initial studies related to evolution were totally dependent on the geographical and morphological changes among the organisms. Advancements in the techniques of molecular biology offer extended information related to the genetic structure [107]. For the reconstruction of a genetic map, in order to get full information about the phylogeny and evolution, molecular markers are being used on a large scale nowadays [108][109][110][111]. Molecular studies related to phylogeny are largely dependent on chloroplast genome sequence data due to their simple and stable genetic nature, making them ideal markers in the evaluation of plant phylogeny [112,113].

Investigation of heterosis
Heterosis describes the greater performance of progeny (F 1 ) over the mean of the two crossed parents. If the effect in F 1 is greater than that in its parents, such heterosis is known as positive heterosis; while where the effect in F 1 is lower than in its parents, such type of heterosis is known as negative heterosis [114]. Various studies have been conducted by using molecular markers in various crop plants such as wheat [115], maize [116] and rape seed [117], to investigate the genetic diversity and heterosis. Molecular markers like SSRs have been used in the investigation of diversity and heterosis in rice [118]. Recently, SSR markers were applied in order to investigate the heterotic groups and patterns in rice [119]. Some studies have used transcriptome analysis to analyse the genes involved in heterosis [120,121].

Identification of haploid/diploid plants and cultivars genotyping
Haploids are plants having a single set of gametophytic chromosomes and diploids are plants with two copies of each homologous chromosome [122]. These haploid/ double-haploid (DH) plants are very important because they are used as a mapping population for quantitative trait loci (QTL) mapping [123] and in other breeding and genetic studies. DH plants are very important in the integration of physical and genetic maps and thus allow the accurate detection of candidate genes of interest [124,125]. The R1-nj (Navajo) anthocyanin colour marker has been successfully applied for the identification of haploids [126]. Similarly, SSR and SNP markers have been applied to detect DH and genotypes of isogenic lines and hybrids [127][128][129].

Genetic diversity assessment
Recent advancements in molecular markers and genome sequencing offer great opportunity to investigate the genetic diversity in a very big germplasm [111,130,131]. Genetic diversity assessment is very helpful in the study of the evolution of plants and their comparative genomics, helping to understand the structure of different populations [132][133][134]. Genetic markers have been successfully applied in the determination of genetic diversity and the classification of genetic material [135][136][137]. DArT markers and SNPs markers are the most commonly used markers for the determination of genetic diversity in various crops [138].

Utilization of molecular markers in backcrossing for a gene of interest
Introgression is a technique in which some genes of interest are transferred from plant genetic resources (PGR) to crop varieties. In this technique, some desired traits are selected from exotic germplasm and transferred into crop plants by backcrossing [139]. MAS has played an important role in the usage of wild genes and their transfer into crop plants. Many genes of interest from wild plants have been transferred in nearly all economically important cultivated plants. Mainly SSR markers are used for this purpose. Two SSR markers have been successfully used to transfer the Lgc-1 locus related to low gluten level in japonica rice with 93-97% selection efficiency [140]. Barley yellow mosaic virus is an important disease in barley and rym4 and rym5 are genes incurring resistance to this disease and variety of markers have been developed for the selection of these genes [141].

Genetic mapping
Genetic mapping employs methods for identification of the locus of a gene as well as for determination of the distance between two genes. Gene mapping is considered as the major area of research in which molecular markers are used today. The principle of genetic mapping is chromosomal recombination during meiosis which results in the segregation of genes. Markers present close to the gene of interest on the same chromosome are known as linked markers.

QTL mapping
Most agricultural traits of economic interest are polygenic and quantitative in nature and are controlled by many genes on the same/different chromosome. The chromosomal regions having genes for these quantitative traits are referred to as QTL. QTL mapping is a method in which molecular markers are utilized to locate the genes that affect the traits of interest. Such traits are divided into two groups: one is quantitative and the second one is qualitative traits. Discontinuous variations can be shown by qualitative traits, while continuous variation occurs in quantitative traits. For QTL study, molecular markers are very important and considered as an ideal tool for the purpose; they can be used for MAS as well [142]. Some important steps in QTL mapping include the selection of two diverse parents having allelic variations that affect the trait of interest. After the phenotyping of the mapping population, polymorphic markers are used to obtain the genetic data. Then the genetic map is constructed and some statistical programs are applied to identify molecular markers linked with the trait of interest [4]. The QTL mapping methodology is described in Figure 1.

QTL mapping populations
For QTL mapping, two diverse parents should be selected and they should be diverse enough to exhibit an adequate level of polymorphism. Near-isogenic lines (NILs), DHs, backcrosses (BCs), recombinant inbred lines (RILs) and F 2 populations can be used as the mapping population [143]. Practically 50-250 individuals are selected in a mapping population but for high-resolution and fine mapping, a larger size of the mapping population is required [144,145].

Selection of markers for QTL mapping
Different types of markers like RFLP, AFLP, ISSR, SSR, ESTs, DArT and SNPs have been commonly used for the construction of linkage maps in several plants [11]. Normally, for genetic mapping studies, 100-200 markers have been used for linkage maps construction [144]. The marker number, however, varies according to the studies and directly depends on the species genome size, as larger genome sized species require a larger number of markers. However, with the advent of NGS, several thousands of DNA markers are now utilized for high-resolution genetic mapping [145,146].

Genetic/linkage map construction
The linkage map is a road map that describes the position and relative genetic distance between markers Figure 1. QTL mapping methodology. [143]. QTL mapping is based on marker segregation via chromosome recombination during meiosis, in which those markers which are tightly linked with each other will be transferred together more commonly during recombination as compared to those which are away from each other. This recombination frequency is used to calculate the recombination fractions. Through the segregation analysis, the actual distance and relative order of markers can be calculated [4]. Odds ratios (the ratio of linkage versus no linkage) are used for the calculation of the linkage between markers. This value is called LOD, or logarithm of odds [147]. For the construction of linkage maps, LOD values of >3 are considered ideal [4].

Factors affecting the QTL detection
The detection of QTLs in a segregating population is affected by several factors. Among these; genetic properties of QTL, environmental factors, experimental errors in phenotyping and size of population are main factors affecting the QTL detection [146]. The environment directly affects the expression of quantitative traits and when some experiments are conducted on the same sites for various seasons, it helps to detect the effects of environments on the QTL having influence on the traits of interest [156]. The population size directly influences the QTL mapping studies. A larger sized population results in the more precise mapping and also facilitates the detection of the QTLs with less pronounced effects [148,157]. The experimental errors include the errors arising from imprecise phenotyping and genotyping. Nonaccurate phenotypic data and errors in genotypic data influence the distance between markers [158]. Several QTLs have been described in the literature for different traits of interest but most of the QTLs are false due to errors in phenotyping at multi locations and involvement of different factors in field experiments.

QTL validation
After the QTL detection, it is necessary to validate that particular QTL. For this purpose, diverse populations will be developed by crossing different parents in order to check the presence of a particular QTL in other populations with different genetic background. NILs are commonly used for the confirmation and validation of QTL [4]. NILs have been used to precisely evaluate the effect of different pollen sterility loci in rice [129]. Confirmation of QTL provides the information about the marker to be used or not for MAS [159].

QTL cloning
Large numbers of QTLs have been isolated in plants and are mostly cloned by positional cloning. Positional cloning is also known as map-based cloning. Map-based cloning is mainly applied for detection of the genetic basis responsible for a mutant phenotype [160]. Mapbased cloning facilitates the allocation of a QTL having a very small genetic interval and to detect the distance on the DNA sequence. In QTL mapping, mapping populations are developed and molecular markers are then used to assign the shortest genetic distance and to detect the distance on the DNA sequence. Advancement in the sequencing technologies saves time and helps in the detection of accurate and tightly linked QTLs. These new sequencing techniques provide precise results and save more time as compared to map-based cloning. However, these high-throughput techniques are costly, as they require high initial cost. Some important steps in positional cloning are development of NILs or F 2 or a backcross population. Phenotyping of these populations is performed and their screening is done with different molecular markers; however, newly developed methods, i.e. NGS and MassARRAY System, can help in the more and fast detection of SNPs [81]. Fine mapping is performed for the construction of a genetic map with the polymorphic marker to locate a very small genetic interval. Larger numbers of individuals are used in fine mapping to increase the recombination rate, which results in a decreased interval up to 0.16 cM. Nearly 3000-4000 plants should be used as the mapping population and 600 plants as first-pass mapping population in order to achieve higher level of recombination [160]. Such markers should be selected which are tightly linked. A physical map is constructed as the genetic resolution reaches the 0.1-cM level. Anchoring of the genetic map to the physical map is achieved by the utilization of markers near to the QTL [160,161]. A candidate gene is selected and sequenced to design specific primers for PCR amplification of the candidate gene [162].

Chromosome walking
The interval between a QTL and a marker can be decreased by chromosome walking/genome walking. Chromosome waking is a method of positional cloning mainly performed for the identification and isolation cloning of a particular allele. This is a very efficient technique used in the identification of unknown regions flanking a known DNA sequence. During the chromosome walking procedure, first, large insert libraries are developed and then positive clones are identified with a series of cloning steps so that walking should be achieved towards the gene of interest. Different types of chromosome walking techniques have been developed, including inverse PCR [163] and vectorette PCR [164]. In these techniques, restriction enzymes are used for the digestion of genomic DNA and then genomic DNA is ligated. Then PCR is performed for the amplification of flanking regions where the ligated product is utilized as template. For the detection and isolation of promoter elements, genome walking kits are now available on the market [165]. The main disadvantage of map-based/ positional cloning is that it is a very time-consuming and laborious technique.
Advantages and drawbacks of QTL mapping QTL mapping is used to detect the genes which control the trait of interest [144]. It is very useful for the genome-wide scan for QTLs detection in plants.
Diseases are a big concern in agriculture and genes responsible for generation of resistance to these diseases can be detected by QTL mapping [166]. Some important drawbacks of QTL mapping include less allelic diversity, lower number of recombination events [167], being time consuming in case of mapping population development [168] and specificity of the detected QTLs to a given population [169].

Association mapping (AM)
Association mapping (AM) is significant association of molecular markers with a phenotypic trait. Statistically, AM is the covariance between the polymorphism present in the marker and the trait of interest [170,171]. It is more time saving as compared to linkage mapping and provides greater mapping resolution with a higher number of recombination events. AM facilitates the identification of a greater number of alleles due to availability of more genetic variations with larger background; historically measured phenotypic data can also be used for AM [172,173].
Why association mapping? Linkage mapping, known as bi-parental mapping, is a classical mapping technique used to study the linkage in several plant species over 20 years [174]. The major limitations of QTL mapping are described above and these limitations could be overcome by the introduction of linkage disequilibrium (LD)-based AM [175,176].

Linkage disequilibrium (LD)
Non-random association of alleles at different loci is known as LD. LD describes the increased or decreased (non-equal) frequency of haplotypes in a population. LD can be described as P AB 6 ¼ P A £ P B [177], where A and B are two alleles present at different loci; P AB describes the frequency of haplotypes having both alleles at the two loci; P A and P B show the frequency of haplotypes having a single A or B allele, respectively. LD is also known as gametic phase disequilibrium or gametic disequilibrium [166]. In 1917, LD was first defined by Jennings and quantified in 1964 by Lewtonin (reviewed in [178]). It is necessary to obtain knowledge about the LD patterns for the genomic areas of the targeted organism. Similarly, there should be prior knowledge about the specificity of the extent of LD present between various populations. The square of the correlation coefficient (r 2 ) and the disequilibrium coefficient (D 0 ) are two widely used statistical methods for measuring the LD. GOLD [179], R and TASSEL [180] are the most commonly used software applications to describe the structure and pattern of LD.
Factors affecting LD Genetic and demographic factors are responsible for generating haplotypic blocks [177]. Recombination and mutation are responsible for the significant LD. New mutations, autogamy, epistasis, genetic isolation, population size, selection, kinship and genomic rearrangements are responsible for the increase in LD. LD decreases with the higher rates of mutation, recombination, gene conversion and recurrent mutations [177].
General methodology of association mapping AM involves the selection of individuals from a natural population having a wide range of genetic diversity. Complete and precise phenotyping is performed for various traits of interest preferably in different locations and environments for many years. After genotyping with favourable markers, the structure of populations and their kinship are determined. Different statistics like D, D' or r 2 are performed for the quantification of LD [181]. Finally, the phenotyping and genotyping data are associated by using some statistical software programmes. TASSEL is the most widely used software for AM [180]. The methodology of AM is shown in Figure 2.

Types of association mapping
Generally, AM is divided into two categories: (i) candidate-gene-based AM and (ii) genome-wide association study.
Candidate-gene-based association mapping. It is a very useful technique where scientists study the correlation present between a trait of interest and the DNA polymorphism present in a gene. Candidate genes are generally genes having direct or indirect effect on the trait of interest with known biological functions [181,182]. Biologically relevant candidates are selected with their trait dissection and they are ordered according to their evolutionary data obtained from their physiological, chemical and genetic studies [183]. This technique needs the detection of SNPs present between lines and within specific genes. The simplest method to investigate the candidate gene depends on the re-sequencing of amplicons. The exon, promoter and introns with 5'/3' untranslated regions are accountable factors in the investigation of candidate gene SNPs. The amount of SNPs per unit length requires for the detection of significant association which can be described by the pace of LD decay for a specific candidate gene locus [184]. The candidate-gene technique has been successfully used for the characterization and cloning of QTLs in the last many years. This technique has been successfully used for the development of many tightly linked genes into functional markers (FMs) [185].
Genome-wide association study (GWAS). Recent advancements in the field of sequencing and genotyping have made possible the GWAS in various species. This is a powerful technique mainly used to study the genetics of natural variations and traits of interest. Now several organizations have developed GWAS platforms commercially. Normally, inbred lines are used for GWAS and, after the genotyping of these lines, multiple times of phenotyping are performed [186]. For the detection of QTLs, a large size of population (up to tens of thousands of individuals) is used to obtain high resolution. Millions of SNPs are produced through GWAS and the SNP number also increases, as more and more advancement in technology is coming [187]. This technique facilitates greater resolution, ability to investigate the haplotype blocks small in size which are significantly correlated with quantitative trait variations [188] and is a very cost-effective method with high throughput [186]. GWAS has been performed in nearly all economically important crops, like maize, sorghum, millet and rice [189].

Nested association mapping (NAM)
This technique was first proposed by Yu et al. [190]. In nested association mapping (NAM), various designed mapping families are connected with each other. NAM is a technique that combines the effectiveness of both association and linkage mapping and has been applied successfully in the determination of FMs in various plants. Some important steps in NAM include the phenotyping for various traits of interest, followed by complete sequencing/genotyping of diverse founders/parents or dense genotyping. Then different markers are applied on both parents and progenies for genotyping in order to investigate the transfer of high-density maker information from parents to progenies. Finally, genome-wide analysis is performed for the association of phenotypic data with genotypic data [190]. This technique can be used effectively in the identification of FMs [191].

Multi-parent advanced generation inter-cross (MAGIC)
This technique provides higher rate of recombination and enhanced mapping resolution as compared to biparental mapping by interrogating several alleles. The main idea behind the development of the MAGIC populations is to enhance the intercrossing level and to increase the genome shuffling [192]. Advanced intercrossed lines are used as populations for MAGIC and are developed through performing random and subsequent inter-crosses in a population, which is developed when two inbred lines are crossed [193]. MAGIC populations are very beneficial in different breeding programmes and can be used as permanent mapping populations in the determination of more accurate QTLs as well as directly or indirectly in the development of a variety [194].

Marker-assisted selection (MAS)
MAS is a technique in which phenotypic selection is made on the basis of the genotype of a marker [4]. MAS is a molecular breeding technique that helps to avoid the difficulties concerned with conventional plant breeding. It has totally changed the standard of selection [144,182]. Plant breeders mostly use MAS for the identification of suitable dominant or recessive alleles across a generation and for the identification of the most favourable individuals across the segregating progeny [195]. Some important steps involved in MAS are described in Figure 3.

Important MAS schemes
Important schemes used for MAS are: (1) marker-assisted backcrossing; (2) gene pyramiding; (3) marker-assisted recurrent selection; (4) genomic selection. Marker-assisted backcrossing (MABC). Backcrossing is a very old technique and its efficiency was improved when molecular markers were introduced. MABC is a backcrossing technique in which molecular markers are used [196]. MABC involves three levels. The first level is known as 'foreground selection' and markers are utilized in combination with or to substitute screening for the gene or QTL [197]. The second level is known as 'recombinant selection'. At this level, backcross progeny having target genes or QTL is selected and recombination is performed between linked flanking markers and the target locus. By recombinant selection, the size of the donor chromosome segment is reduced [198]. The third level of MABC is known as 'background selection'. At this level, backcross progeny having a large amount of recurrent parent genome is selected using markers which are unlinked with the target locus [199].
Marker-assisted recurrent selection (MARS). This is very handful technique in which molecular markers are applied at each generation in order to target all traits of interest; it was proposed in the 1990s [200]. In this technique, crossing is performed in selected individuals at every crossing and selection cycle. MARS is specially involved with the improvement of F 2 population that is achieved through one cycle of MAS (having phenotypic data with marker scores) followed by performing 2-3 cycles of marker-based selections (having marker scores only). It is a simple technique which can be applied easily without requiring any prior knowledge of QTLs, and the selection totally depends on the associations established between the marker and trait during the MARS programme [201].
Marker-assisted gene pyramiding. This is a technique in which multiple QTLs/genes for a single or multiple traits are transferred into a cultivar which is deficient for these traits. This technique is mainly applied to increase the level of resistance to particular diseases and insects through the selection of two or more genes simultaneously [202]. MAS has been successfully applied to pyramid many desired genes in various crops [203,204].

Functional/diagnostic markers (FMs)
FMs are also known as the perfect markers or diagnostic markers. Diagnostic/functional molecular markers provide a unique opportunity to screen large collections of germplasm for allelic diversity in short time with high accuracy and for traits having FMs. FMs are developed from polymorphic regions present within the genome that cause variation in phenotypic traits [133,147,205,206]. Some important advantages of FMs are that any population can be studied by FMs [168] and they are directly linked with the allele of locus of interest. As FMs are directly linked with the genes of interest and recombination between gene and marker is absent, false selection and loss of information in marker-assisted breeding are avoided [205]. FMs have ability to fix the alleles in a population more efficiently and selection is more balanced and controlled. FMs can be used for the construction of linked FM haplotypes and also for the validation of cultivars identity [169]. The most important steps involved in the development of FMs are described in Figure 4.

FMs in plant breeding
Mainly FMs have been successfully applied for the breeding of agronomic traits, quality traits and disease resistance in crop plants. Many FMs are available for different agronomic traits that provide an opportunity for plant breeders to select rare recombinants without wasting time and resources in the screening of large numbers of plants [207]. In wheat, 30 genes have been cloned and more than 97 FMs have been developed for various traits of interest, like disease resistance genes and processing quality, and these FMs are successfully being used for wheat breeding [208].

Targeting induced local lesions in genome (TILLING)
Targeting induced local lesions in genome (TILLING) was first developed by McCallum in the late 1990s while working on characterizing the function of two genes in Arabidopsis plants. It is a non-transgenic technique in reverse genetics and is satisfactorily applicable in most plants [211]. The first important step involved in TILLING is the development of a mutated population using a standard mutagen like ethyl methanesulfonate (EMS) [212]. Then identification of mutations in the targeted sequence is achieved by using various methods like high-performance liquid chromatography, mass spectrometry, array-based technologies and enzymatic mismatch cleavage [213]. After this, some bioinformatics tools like PARSESNP (Project Aligned Related Sequences and Evaluate SNPs) are applied for the analysis of these mutants. This technique is applicable for any species and is not affected by genome size and ploidy levels. A greater rate of point mutations can be achieved through this technique. High-throughput TILLING is time saving and provides precise identification of new alleles at a less cost [212]. The key steps involved in TILLING are described in Figure 5.

Virus-induced gene silencing (VIGS)
Virus-induced gene silencing (VIGS) is a viral vector methodology that exploits an RNA-mediated defence mechanism. When a virus infects a plant cell, it also activates the RNA-based defence system against this virus. This infection leads to viral RNA replication which results in the production of a dsRNA replication intermediate. This dsRNA replication intermediate results in the production of siRNA in the infected cell. After this, the siRNAs base pairs guide the RNase complex in such a way that it specifically targets the single-stranded (ss) target RNA which is alike to the dsRNAs [214]. VIGS is a virus vector technique which utilizes this defence system. Replication in the dsRNA intermediate would be processed in a way that siRNA present in the damaged cells would correspond to parts of the viral vector genome and also including any non-viral insert. Thus, when insertion is made in the host cell, the RNase complex is targeted by siRNA to the corresponding host mRNA and symptoms reveal the loss of function of the encoded protein in the infected plant [215]. In recent years, VIGS has been applied successfully in plant reverse genomics. It is a very simple, cost-effective and highthroughput method. Mainly, it is used in the identification of function loss of a gene of interest [214,216]. The general methodology of VIGS is shown in Figure 6.

Recent advancements in multiplexed functional/ linked markers
With the passage of time, advancements are coming consistently in markers technology. SNP markers have become the marker of choice after the development of NGS and have been applied in the genotyping of various crops. A large number of linked markers have been converted into FMs and successfully used in MAS programmes in different crops. However, most of these markers are present in uniplex form. In order to achieve more effective and precise results from MAS, uniplex assays are shifting to multiplex systems [146]. KASP (Kompetitive Allele Specific PCR) is a recent multiplexed technique used to convert uniplex to multiplex systems by combing several markers in a single assay. KASP TM was first developed by KBioscience, or LGC Genomics [217], in order to achieve in-house genotyping and was finally developed into a worldwide leading genotyping technology. It is a homogenous technology and its genotyping is based on fluorescence. This technique depends on allele-specific oligo extension and for the generation of signals, fluorescence resonance energy transfer is used. Plates with 96, 384 and 1536 wells can be used for genotyping by KASP. The success in assay designing in KASP is 98%-100% with 93%-94% successful conversions into a working assay. KASP is time saving and with low cost as compared to the GoldenGate® assay [218]. The KASP assay has been applied successfully mainly in wheat, maize, rice and in a few other crops. The KASP assay has been successfully applied with NGS to develop multiplexed trait lined markers in wheat [219]. Recently, 70 KASP assays have been developed and successfully validated in wheat and are significantly associated with various traits of interest in wheat crops [219].

Genomic selection (GS): a step forward from MAS
Genomic selection (GS) is an advanced form of markerassisted selection and was first developed by Meuwissen et al. [220]. It is a technique that has the ability to predict the genetic values of selected candidates depending on the genome-estimated breeding values (GEBVs) predicted from high density of markers that are distributed throughout the genome. GEBV is a prediction model that combines the phenotypic data with marker and pedigree data in order to increase the accuracy of prediction. As compared to MAS, GEBV is dependent on all markers including major and minor marker effects [221]. In this technique, genetic markers having the ability to cover the whole genome are selected and utilized in a way that all QTLs are in LD with at least a single marker [222]. Genomic selection of complex traits and highthroughput phenotyping have brought a revolution in breeding by enhancing the accuracy level of selection [223]. Important steps in GS are: (1) Development of a training population using diverse germplasm; (2) Phenotyping and genotyping of the training population; (3) Selection of individuals having superior GEBVs on the basis of their genotypic data; (4) Progeny of the genotypes which are used as study material in the testing population are taken as input for the GS model and give GEBVs; (5) Individuals with maximum GEBVs are again selected; (6) Selected individuals are used as parents of the next offspring for continuous selection and breeding [220,224]. The general methodology of GS is described in Figure 7.

High-throughput phenotyping
With the rapid increase in the world population, the demand for food is also increasing and there is a need to develop high-yielding varieties with more resistance to biotic and abiotic stress. There is a need to precisely correlate genotype with phenotype [225]. For precise phenotyping, high-throughput phenotyping platform (HTPP) was introduced. HTPP is successful in the precise acquiring of comprehensive measurement of plant attributes which provide accurate information about the traits of interest [226]. Advanced cameras, sensors, robotics and computers are used to collect precise data [227]. Similarly, development is also coming in the field  of HTPP, making it possible to obtain precise data for various complex traits [228].

Genomic selection and genome editing together: new way in crop improvement
With the advancements in the field of genetic engineering, many techniques have been evolved to modify a single locus of a target organism. This dream comes true with the development of CRISPR (clustered regularly interspaced short palindromic repeat), a gene-editing technology. Genome editing has revolutionized plant breeding and has been applied successfully in different economically important crops. This technique facilitates the direct improvement of less favourable alleles into more favourable alleles. For the production of improved crop varieties, it is necessary to utilize genome selection and genome editing collectively. Genome editing shortens the time when backcrossing is done between elite varieties and exotic germplasm. This exotic germplasm serves as the encyclopaedia for the ancient alleles that are referenced for the development of modern varieties having resistance against biotic and abiotic stress. For the recombination of alleles that are already adapted, GS is then applied [229].

Genome editing (CRISPR)
CRISPR is a genome-editing technique applied successfully in various plants [230]. Cas9 is a recent advancement in the genome-editing technology and is becoming the technique of choice due to its many advantages, like its being easy to use, genome-editing versatility and ability to cleave methylated loci [231,232]. CRISPR RNAs and Cas protein are the two most important parts in the CRISPR technique. CRISPR RNA (crRNA) and trans-encoded CRISPR RNA (tracrRNA) are two short RNAs that can cleave a particular target site with the help of Cas9 endonuclease (the most explored Cas protein). sgRNA, known as single guide RNA, results when crRNA and tracrRNA are fused artificially [233]. When sgRNA is combined with Cas protein, this leads to the formation of RNA-guided endonuclease that mediates the cleavage at a particular sequence in the genome [234]. On the basis of this Cas protein, the CRISPR-Cas system is grouped into three types; I, II and III. Cas1 and Cas2 are two different proteins which are commonly present in all three types. Type I is present in both archaea and bacteria, while type II is only present in bacteria; however, type III is most commonly present in archaea but also in some bacteria [235]. Genome editing has been performed successfully in model plants like Nicotiana tabacum [236], Arabidopsis [237] and some economically important crops like maize [238] and wheat [239].

Mechanism
Acquisition, expression and interference are the three steps which are used by the CRISPR-Cas system to identify and target the pathogen genetic material. Identification and consolidation of foreign DNA is performed within the CASPR locus as a spacer during acquisition. During the acquisition of DNA fragments, a Protospacer having a short stretch (2-5) of conserved nucleotides (PAMs) is used as the identification motif. The AT (adenine-thymine) leader side of the CRISPR array, a 30-bp single copy of spacer is inserted and duplicated [240]. During the expression step, a long pre-crRNA is transcribed from the CRISPR locus, while tracrRNA and Cas proteins (Cas1, Cas2, Cas9 and Cas4/Casn2) are applied for its processing into crRNAs [241]. The Cas protein complex is guided towards the particular target area of foreign DNA by crRNA for cleavage during the interference step, thus facilitating the immunity against the attack of pathogens [240,242].

Reasons for the underutilization of molecular markers in crop plants
DNA markers were developed in the 1980s and after the development of the first PCR-based markers in the 1990s, a large number of markers have been developed and have been applied for various aspects [243][244][245][246]. However, wise utilization of these markers has begun over the previous few years [198]. After the development and advancement in the marker technology, huge numbers of research papers are being published annually. However, a large proportion of these papers fail to exert their effect on practical level breeding [243]. Similarly, QTL mapping results in the generation of large numbers of publications providing information about newly identified QTLs. These QTLs have been developed during research programmes and there is a need to apply these linked markers after careful validation and to develop functional diagnostic markers that could lead to successful breeding programmes benefiting the farmer fields [198].

Conclusions
The last 30 years have witnessed a continuous development in the molecular markers technology from RFLP to SNPs and a diversity of array-technology-based markers. Advancements in the sequencing technologies have led to the development of NGS platforms that are low cost with high throughput. In spite of the presence of these highly advanced molecular genetic techniques, we are still not achieving our goals. The main reason behind this lies in inaccurate phenotyping. High-throughput phenotyping techniques solve these problems by using light, cameras, sensors, computers and highly modified devices for the collection of very precise phenotypic data, which is a core requirement to achieving our breeding goals successfully. CRISPR technology has revolutionized the plant breeding and genetics and researchers are focusing on editing the genomes of all economically important plants. The coming years are likely to see continued innovations in molecular marker technology to make it more precise, productive and cost effective in order to investigate the underlying biology of various traits of interest.