High resolution melting curve analysis revealed SNPs in major cannabinoid genes associated with drug and non-drug types of cannabis

ABSTRACT Cannabis sativa L. has a long history of cultivation for food, fibre, medicine and recreational drug use. Production of high tetrahydrocannabinol (THC) plants for narcotic use (drug type) is illegal and controlled in most countries. In contrast, cultivation of low THC plants (fibre type, also known as ‘industrial hemp’) is promoted in many countries. The determination of C. sativa L. chemotypes is based on the content of the major cannabinoids: THC, cannabidiol (CBD) and cannabinol (CBN). The THC:CBD content ratio is a candidate marker for differentiating the fibre and drug types of cannabis. The ability to accurately characterize the cannabinoid chemical phenotype (chemotype) is crucial for the development of specific C. sativa cultivars for pharmacological, hemp fibre or seed end-use. High resolution melting (HRM) curve analysis is used as a rapid and effective method for detecting single-nucleotide polymorphisms in plants. In this report, we developed an HRM protocol for differentiating drug and non-drug cannabis plants. According to the results, HRM analysis based on single-nucleotide polymorphisms in the THCAS gene is an accurate method for identifying the drug type of cannabis and could be used to control legal and illegal cannabis cultivation.


Introduction
Cannabis (Cannabis sativa L.) has a long history of cultivation as a food, hemp fibre, seed and seed oil; in addition, the dried flowering tops and leaves are used for medicinal purposes and as a drug [1,2]. Cannabinoids are terpenophenolic secondary metabolites produced by all C. sativa plants in stalked trichomes [3,4]. The biosynthetic pathways of different cannabinoids have been clarified in previous studies [5,6]. According to these studies, specific synthase enzymes catalyse the synthesis of different cannabinoids. Cannabigerolic acid (CBGA), as the common precursor of all the main cannabinoids, is converted through synthase enzymes to delta-9-tetrahydrocannabinolic acid (THCA), cannabidiolic acid (CBDA) and cannabichromenic acid (CBCA) [4,7].
A number of attempts have been made to identify molecular genetic markers that distinguish 'drug' and 'fibre' cannabis plants [2,9,11-13]. Recently, single-nucleotide polymorphism (SNP) markers have been generated based on whole genome sequencing of C. sativa. SNPs offer several advantages: they are mostly bi-allelic, abundant throughout the genome, relatively stable during evolution and have a low mutation rate. Moreover, SNPs are distinguished by which nucleotide is present rather than by fragment size. These characteristics, in combination with their high availability, make SNPs the most popular marker system in genetic analyses [14]. For type differentiation in cannabis, however, what is mostly needed is a robust, cheap, accurate and validated approach, and such an approach can benefit from SNPs as well.
CONTACT Seyed Alireza Salami asalami@ut.ac.ir
High resolution melting (HRM) is a powerful, fast and specific technique for genotyping SNPs in a large number of samples. It is based on the different melting curve profiles produced by sequence variation in double-stranded DNA. HRM screening can distinguish even a single SNP belonging to any of the four classes of single-nucleotide change, based on shifts in melting temperature detected with a fluorescent dye intercalated in the double-stranded DNA [15,16]. Gene scanning and targeted genotyping are the two major applications of this approach in molecular biology studies [17]. HRM analysis is being increasingly used for gene scanning because it is simple, cost effective, sensitive and specific [18-20]. The aim of this study was to develop a SNP assay to discriminate drug and non-drug samples of cannabis based on sequence variation in the THCAS and CBDAS genes.
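The principle can be illustrated with a minimal sketch. It is not the algorithm of any HRM software: it assumes a toy model in which normalized fluorescence falls sigmoidally around the melting temperature, estimates Tm as the peak of the negative derivative -dF/dT, and compares two variants whose Tm values (82.0 and 82.6 °C) and transition width are hypothetical.

```python
import math

def melt_curve(tm, temps, width=0.5):
    """Toy model (assumption): normalized fluorescence drops sigmoidally around tm."""
    return [1.0 / (1.0 + math.exp((t - tm) / width)) for t in temps]

def estimate_tm(temps, fluor):
    """Return the midpoint of the interval where -dF/dT (finite difference) is largest."""
    best_t, best_slope = None, float("-inf")
    for i in range(1, len(temps)):
        slope = -(fluor[i] - fluor[i - 1]) / (temps[i] - temps[i - 1])
        if slope > best_slope:
            best_slope = slope
            best_t = 0.5 * (temps[i] + temps[i - 1])
    return best_t

# Temperature grid from 65 to 95 degC in 0.1 degC steps.
temps = [65.0 + 0.1 * i for i in range(301)]

# Two hypothetical sequence variants with slightly different melting temperatures.
curve_a = melt_curve(82.0, temps)
curve_b = melt_curve(82.6, temps)

tm_a = estimate_tm(temps, curve_a)
tm_b = estimate_tm(temps, curve_b)

# Largest vertical gap between the two normalized curves (a simple difference plot).
max_diff = max(abs(a - b) for a, b in zip(curve_a, curve_b))

print("estimated Tm, variant A:", round(tm_a, 2))
print("estimated Tm, variant B:", round(tm_b, 2))
print("maximum curve difference:", round(max_diff, 3))
```

Real melting data are noisier and are usually normalized and temperature-shifted before curves are grouped, so this sketch only shows why a small Tm shift is enough to separate two curve shapes.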

Materials and methods
Eighty-seven cannabis accessions were used in the HRM assay. Seeds were provided by IPK (

Primer design and HRM analysis
Primers were designed to target polymorphisms within the THCA and CBDA synthase genes with the aim of creating an assay able to differentiate between active and inactive forms of the genes. All THCA synthase sequences available from GenBank of the National Center for Biotechnology Information were aligned using Clustal X 2.0.3 [21]. The aligned THCA synthase sequences were divided into two groups: those from drug cannabis material and those from low tetrahydrocannabinol (THC) hemp material. The primers were designed to amplify a 214 bp fragment of the THCAS gene (F: 5′-GGAAGAAGACGGCTTTCTCA-3′, R: 5′-GCAGTGTACCAAAGTTCATACAT-3′) from position 1127 to 1341 bp. Similarly, a primer pair was designed to amplify a 222 bp fragment of the CBDAS gene (F: 5′-GGCAGAACGGTGCTTTCAAG-3′, R: 5′-CCCAACTACATATGTACCATAAC-3′) from position 1124 to 1346 bp (Figure 1). The melting temperature of the primers was adjusted to 60 °C.
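As a quick sanity check on the primer set, the sketch below computes length, GC fraction and a rough melting temperature for each primer, plus the amplicon span implied by the stated positions. The Wallace rule (2 °C per A/T, 4 °C per G/C) is used here only as a rule-of-thumb assumption; the text does not say how the authors calculated Tm. The primer strings are written without internal hyphens or spaces, and the dictionary keys are hypothetical labels.

```python
def gc_count(seq):
    """Number of G and C bases in a primer."""
    return sum(1 for base in seq if base in "GC")

def wallace_tm(seq):
    """Rule-of-thumb Tm in degC: 2 per A/T plus 4 per G/C (rough estimate only)."""
    gc = gc_count(seq)
    at = len(seq) - gc
    return 2 * at + 4 * gc

# Primer sequences as given in the text (5' to 3'); labels are illustrative.
primers = {
    "THCAS_F": "GGAAGAAGACGGCTTTCTCA",
    "THCAS_R": "GCAGTGTACCAAAGTTCATACAT",
    "CBDAS_F": "GGCAGAACGGTGCTTTCAAG",
    "CBDAS_R": "CCCAACTACATATGTACCATAAC",
}

# Start and end positions (bp) of each amplicon as stated in the text.
amplicon_positions = {"THCAS": (1127, 1341), "CBDAS": (1124, 1346)}

for name, seq in primers.items():
    gc_fraction = gc_count(seq) / len(seq)
    print(name, "length:", len(seq), "GC:", round(gc_fraction, 2),
          "Wallace Tm:", wallace_tm(seq))

for gene, (start, end) in amplicon_positions.items():
    print(gene, "span (end - start):", end - start, "bp")
```

Nearest-neighbour models that account for salt and primer concentration generally give different, more reliable values than this rule of thumb, so the numbers are indicative only.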
A real-time PCR assay was carried out using 5x HOT FIREPol® EvaGreen® HRM Mix (Solis BioDyne Co., Estonia) on a LightCycler 96 (Roche Co., Germany). qPCR was performed in 20 µL reactions containing 4 µL of HRM mix, 30 ng of genomic DNA and 0.2 µmol/L of each primer. A negative control was also included in each run. The PCR conditions were as follows: 15 min at 95 °C as initial activation, followed by 40 cycles of 95 °C/15 s, 60 °C/20 s and 72 °C/20 s. The melting curve was then acquired as follows: 95 °C/60 s, 40 °C/60 s, 65 °C/1 s, followed by continuous acquisition up to 95 °C/1 s in increments of 0.3 °C/s. The data were analysed using the LightCycler® 96 SW 1.1 software.
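The reaction set-up can be checked with a few lines of arithmetic. This is a sketch under stated assumptions: volumes are taken to be in microlitres and the primer concentration in µmol/L (the usual scale for a 5x master mix), the 10 µmol/L primer stock is hypothetical, and the melt step is read as a continuous ramp from 65 to 95 °C at 0.3 °C/s.

```python
# Values taken from the protocol (units assumed: microlitres and micromolar).
reaction_volume_ul = 20.0
hrm_mix_volume_ul = 4.0
hrm_mix_stock_x = 5.0      # "5x" master mix
primer_final_um = 0.2      # final concentration of each primer

# Hypothetical working stock; not stated in the protocol.
primer_stock_um = 10.0

# Final master-mix strength after dilution into the reaction.
mix_final_x = hrm_mix_stock_x * hrm_mix_volume_ul / reaction_volume_ul

# C1 * V1 = C2 * V2, solved for the primer stock volume V1.
primer_volume_ul = primer_final_um * reaction_volume_ul / primer_stock_um

# Duration of the melt ramp, assuming a continuous 65 -> 95 degC ramp at 0.3 degC/s.
ramp_seconds = (95.0 - 65.0) / 0.3

print("final mix strength (x):", mix_final_x)
print("primer stock volume per reaction (uL):", round(primer_volume_ul, 2))
print("melt ramp duration (s):", round(ramp_seconds, 1))
```

The master mix comes out at 1x in the final reaction, which is what makes the microlitre reading of the volumes self-consistent.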

Sequence analysis
The HRM-PCR products of the different groups were sequenced using BigDye Terminator chemistry on an ABI sequencer, and the sequences were edited using the Geneious R7.1.4 software.

Results and discussion
The THCAS and CBDAS genes, the major genes of the cannabinoid pathway, were studied by HRM analysis. Since the sequence of the CBDAS gene is very similar to that of the THCAS gene (87.9% homology) [2], we designed primers 'a' and 'b' to avoid amplifying the CBDA synthase gene fragment and primers 'c' and 'd' to avoid amplifying the THCA synthase gene fragment (Figure 1).
The melting curves obtained for THCAS clearly separated all 87 samples into two distinct groups, I and II (Figure 2). The first group comprised 85 accessions (accessions 1-13, 15-33 and 35-87) and the second group comprised only accessions 14 and 34, which were labelled as drug type and wild-type, respectively. Interestingly, a wild-type accession was categorized into the drug group, and four drug-labelled accessions (numbers 37, 48, 55 and 56) were grouped with the fibre accessions based on THCAS gene scanning (Figure 3). The cannabis accessions used in this study were initially characterized as drug or non-drug types based on their THC:CBD ratio. This ratio is not the only index needed to distinguish cannabis types, because of the many copy number variations of THCAS genes in the genome, the high similarity between the THCAS and CBDAS genes and the SNPs associated with active and inactive forms of THCAS. On the other hand, we found a few cannabis genotypes in Iran whose anatomy exactly matched a commercial fibre cultivar but which had a high THC:CBD ratio (unpublished data). It is also possible that the discordant accessions were mislabelled. Sawler et al. [22] report that such outliers may be due to sample mix-up, or that their classification as hemp or marijuana may be incorrect. Alternatively, these samples may be true outliers that represent exceptional strains genetically unlike the others in their group [22].
As a next step, the THCAS nucleotide sequences and the predicted amino-acid sequences from the two groups were aligned. The results showed that SNP c.1224 T > G is a transversion of thymine (T) to guanine (G) (active/drug form to inactive/non-drug form), corresponding to an aspartic acid residue and a glutamic acid residue, respectively.
The second identified SNP, c.1232 C > T, is a transition of cytosine (C) to thymine (T) (active/drug form to inactive/non-drug form), corresponding to an alanine residue and a valine residue, respectively. The third SNP, c.1272 G > T, is a transversion of guanine (G) to thymine (T) (active/drug form to inactive/non-drug form), corresponding to a glutamic acid residue and an aspartic acid residue, respectively.
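The transition/transversion labels and the codon positions of the three SNPs can be reproduced with a short sketch. It assumes that the c. numbering counts coding nucleotides from 1 at the A of the start codon, which the text does not state explicitly; the function names are illustrative.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

def substitution_type(ref, alt):
    """Transition: purine<->purine or pyrimidine<->pyrimidine; otherwise transversion."""
    pair = {ref, alt}
    same_class = pair <= PURINES or pair <= PYRIMIDINES
    return "transition" if same_class else "transversion"

def codon_position(cds_pos):
    """Map a 1-based coding position to (codon number, position within the codon)."""
    return (cds_pos - 1) // 3 + 1, (cds_pos - 1) % 3 + 1

# The three THCAS SNPs described above: (coding position, drug-form base, non-drug base).
snps = [(1224, "T", "G"), (1232, "C", "T"), (1272, "G", "T")]

for pos, ref, alt in snps:
    codon, offset = codon_position(pos)
    print(f"c.{pos} {ref}>{alt}: {substitution_type(ref, alt)}, "
          f"codon {codon}, position {offset}")
```

Under this numbering the SNPs fall at the third, second and third codon positions, respectively, which is consistent with the reported Asp/Glu, Ala/Val and Glu/Asp changes under the standard genetic code.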
Although all these substitutions are conservative, such substitutions may in some cases lead to changes in protein function. It is unclear why these two forms would function differently. One possibility is post-translational modification. Other undetermined factors may also influence cannabinoid biosynthesis and, accordingly, the production of THC. Investigating the gaps that remain in the C. sativa genome could potentially lead to more knowledge about cannabinoid biosynthesis.
[...] Figure 5). The comparative analysis of the sequencing results indicated the presence of four SNPs. There were a total of five combinations of nucleotide substitutions in the different HRM groups for CBDAS, but these could not differentiate the drug type accessions from the fibre type.
It is difficult to distinguish drug type from non-drug type cannabis plants by morphological and chemical analysis in the early vegetative stage [2]. At the same time, differentiating between drug and non-drug types is one of the most important prerequisites in cannabis breeding programmes. Although there are precise techniques for quantitative analysis of THC and cannabidiol (CBD), such as gas chromatography-mass spectrometry, gas chromatography and high-performance liquid chromatography, these techniques are expensive and are not available in all laboratories. Moreover, the quantification of THC and CBD in young cannabis plants might not be accurate owing to the absence or low abundance of these chemicals [13]. Therefore, specific PCR markers based on DNA polymorphisms in major cannabinoid genes can serve as a powerful tool to distinguish and study genetic variations in cannabis samples [2,13,23].
For example, the sequence-characterized amplified region (SCAR) marker B190/B200 was tested by de Meijer et al. [9] on individual plants of an F2 population and the S2 from the original cross parents. They examined pure THC, pure CBD or heterozygous CBD/THC. The THC phenotype was associated with the amplification of a band of approximately 190 bp, whereas pure CBD plants showed a band of ~200 bp. In most cases, heterozygous plants showed both fragments, indicating a co-dominant nature of the marker.
Kojoma et al. [2] compared the sequences of the THCAS gene of drug and fibre genotypes, revealing two distinct forms of the THCAS gene capable of differentiating drug and fibre genotypes. These gene variants differ in 37 amino-acid substitutions.
Pacifico et al. [11] also developed a set of markers for the differentiation of the THCA and CBDA synthase genes and compared these with the SCAR marker from de Meijer et al. [9]. These markers were able to separate chemotypes I, II and III successfully.
Rotherham and Harbison [12] developed a SNP marker resulting from an A-to-G substitution (transition) at the 887 position of the THCAS gene for differentiation of 'drug' and 'fibre' cannabis plants. Thichak et al. [13] developed the same PCR marker, by using multiplex PCR for identification of drug type and fibre type of 100 hemp samples. They developed a multiplex PCR method for simultaneous amplification of two specific molecular markers from hemp samples for identification of hemp types with 93% relative accuracy.
In the present study, we used a robust and accurate HRM approach to distinguish between drug and non-drug types of cannabis based on SNP variation. So far, the HRM technique has been used to genotype different plant species, such as Hordeum vulgare L. [24], Medicago sativa L. [25], Zea mays L. [26], Oryza sativa L. [27], Lupinus albus L. [28] and Brassica rapa L. [29]. HRM differentiated drug type accessions from the others based on nucleotide sequence changes that corresponded to amino-acid differences between the active and inactive forms of THCAS. As described by Kojoma et al. [2], the sequencing results of the THCAS gene in drug and fibre accessions revealed two distinct forms of the THCA synthase gene, one belonging to drug type accessions and the other to fibre type accessions. These divergent THCA synthase sequences represent alleles coding for an active and an inactive form of THCAS.
The SNPs found in the CBDAS gene, however, did not provide any additional identification capability for fibre type cannabis. It is possible that there are active and inactive forms of the CBDA synthase gene sequence which the present SNP assay could not differentiate.

Conclusions
The present study confirmed that the THCAS gene allowed discrimination between the drug type and fibre type of cannabis among 87 individual accessions in an SNP assay based on HRM analysis. These results demonstrated the power, sensitivity and specificity of this particular HRM assay for identifying drug type genotypes based on a combination of SNPs within the amplified THCAS gene fragment. Conversely, amplification of the CBDAS fragment was not sufficiently informative in the HRM approach, since the SNPs found did not allow discrimination of the accessions under study.