Estimation of polymorphisms in the drug-metabolizing enzyme, cytochrome P450 2C19 gene in six major ethnicities of Pakistan

ABSTRACT Interindividual differences in cytochrome P450 (CYP) 2C19 activity may result in variations in the therapeutic response to drugs metabolized by this enzyme. Differences at the gene level may translate to the protein level, with consequent impairment of enzyme activity. As a result, patients with such genetic differences might experience undesirable effects or no effect at all. The aim of the present study was to determine the allelic and genotype frequencies of low-activity variants of the CYP2C19 gene in healthy individuals from six distinct ethnicities of Pakistan. Blood samples were taken from healthy volunteers following informed consent. DNA isolation was followed by PCR amplification and restriction fragment length polymorphism analysis. Selected samples were sequenced by Sanger sequencing. The frequency of the major allele was 84.93% at the CYP2C19*2 site and 91.85% at the CYP2C19*3 site, while the minor allele was present at 15.06% for CYP2C19*2 and 8.14% for CYP2C19*3. For CYP2C19*2, the frequency of the *1*1 genotype was 75.80%, *1*2 was 18.27%, and *2*2 was 5.92%, whereas for CYP2C19*3, the frequency of the *1*1 genotype was 84.19%, *1*3 was 15.30%, and *3*3 was 0.49% in the Pakistani population. Substantial variation in genotype and allelic frequencies was observed among the ethnicities. Our study demonstrates that a substantial fraction of the Pakistani population has at least one minor allele, which indicates that a large number of patients may potentially be affected by these variations. In particular, the notable frequency of poor metabolizer (PM) genotypes suggests implications for treatment response and for the severity and frequency of adverse effects in patients receiving drugs metabolized by CYP2C19.


Introduction
Cytochrome P450 (CYP) is a superfamily of phase 1 enzymes that metabolizes the majority of clinically available drugs, in addition to the synthesis and metabolism of many endogenous substances [1]. This superfamily comprises about 57 enzymes, but only about a dozen carry out the majority of biotransformation reactions and are responsible for the metabolism of an overwhelming majority of drugs on the market [2]. Since these enzymes metabolize several different therapeutic agents, polymorphisms in their genes result in variability in their enzymatic activities, which subsequently leads to interindividual variation in the efficacy and adverse effects of drugs [3]. On the other hand, CYP enzymes are also bioengineered to enhance their function, for example to improve the steroid hydroxylation efficiency of CYP106A2 [4].
CYP2C19 is an important member of the CYP450 superfamily (Figure 1). About one-tenth of all clinically used drugs are metabolized by this enzyme, including important classes of medications such as antiepileptics, antidepressants, proton pump inhibitors, antipsychotics, and antiplatelet drugs [1]. Interindividual variation in this gene is likely to lead to nonresponse or toxic effects in affected individuals. The population can be divided into extensive metabolizers (EM) and poor metabolizers (PM) based on whether the enzyme activity is normal or reduced, respectively. Clopidogrel, an antiplatelet drug, is converted to its active metabolite by CYP2C19, and hence PMs respond poorly to this drug. Therefore, the US FDA has issued a black box warning for clopidogrel, mentioning its diminished effectiveness in PMs [5]. Another prodrug, proguanil, is metabolized to cycloguanil by CYP2C19, but PMs were found to lack cycloguanil, and these patients had decreased cycloguanil protection and an increased risk of Plasmodium infection [6,7].
More than a dozen alleles have been shown to reduce CYP2C19 enzyme activity, and seven of them produce inactive enzymes. However, two alleles, CYP2C19*2 and CYP2C19*3, account for most cases of the low-activity phenotype. CYP2C19*2 is the more common of the two and leads to the formation of an inactive splice variant as a result of a G-to-A substitution in exon 5 [8]. CYP2C19*3, located in exon 4, is a G-to-A substitution that creates a stop codon, causing premature termination of translation and a nonfunctional protein [9]. Both of these alleles cause significant alterations in drug response and adverse effects in PMs. About 50-90% of PM phenotypes can be explained by these two alleles [10]. The frequencies of these two alleles are higher in the Asian population, at 15%, than in Caucasian and African populations, at 2-5% (Scott et al., 2012). CYP2C19*3 is rarely present in the Caucasian population and is, therefore, considered to be largely specific to the Asian population [3].
Prior knowledge of the frequencies of variant alleles in a population may help clinicians revise their prescribing practices, including dose adjustments and alternative medications. However, no such data on CYP2C19 genetic variants are available for the Pakistani population. As mentioned previously, genetic variation in the CYP2C19 gene affects drug safety and efficacy. We hypothesized that there is enough genetic variation in the CYP2C19 gene in the Pakistani population to warrant further genotype-phenotype clinical studies. If this variation is low in the Pakistani population, there is little incentive to carry out further clinical studies. But if the genetic variants that affect drug safety and efficacy are present at high frequency, then further studies would be useful, as their outcome would affect a sizable fraction of the Pakistani population. Therefore, the aim of the current study was to determine the prevalence of the CYP2C19*2 and *3 gene variants in the Pakistani population. We specifically investigated samples drawn from the six major ethnic groups residing in Pakistan to determine the prevalence of the wild-type CYP2C19*1 allele and the CYP2C19*2 and CYP2C19*3 variants. One of our goals was to compare the frequencies of the variant alleles observed in the current study with those of global and regional populations.

Ethical compliance
This study was conducted at the College of Pharmaceutical Sciences, Shifa Tameer-e-Millat University Islamabad, Pakistan. The study was approved by the institutional review board (IRB#699-147-2016). All participating volunteers were briefed about the study, and written informed consent was obtained prior to commencing the study. The study was carried out in accordance with the World Medical Association Declaration of Helsinki guidelines.

Sampling & DNA extraction
Samples for this study were collected from various cities of Pakistan. A total of 405 samples were obtained from healthy volunteers belonging to the Pathan, Sindhi, Punjabi, Balochi, Urdu-speaking, and Seraiki ethnic groups residing in Pakistan. Ethnicity was self-reported by the participants. EDTA-containing Vacutainers were used to collect approximately 5 ml of blood from each volunteer. The Gene Jet Genomic DNA extraction kit was used to extract DNA from the blood. The quality and quantity of the isolated DNA were determined by spectrophotometry and by running it on an agarose gel.

Genotyping
CYP2C19*2 genotyping was done by PCR followed by restriction fragment length polymorphism analysis. An oligonucleotide pair was used which, when bound to its complementary sequences on the genomic DNA, encompassed the CYP2C19*2 site. The PCR program consisted of an initial denaturation of 10 minutes at 95°C, followed by 35 cycles of denaturation, annealing, and extension, and then a 7-minute final extension at 72°C. The final PCR product was visualized by running it on a 2% agarose gel. After restriction digestion, the three CYP2C19*2 genotypes were identified by running the digested DNA on a 2% agarose gel again.
Similarly, CYP2C19*3 genotyping was performed using a separate set of oligonucleotides spanning introns 3 and 4 of the CYP2C19 gene and containing the CYP2C19*3 allele site. Thermal conditions of this PCR were similar to the ones used for CYP2C19*2 genotyping. Restriction digestion yielded products which were run on a 2% agarose gel to determine the CYP2C19*3 genotype of each sample. More than 40% of the samples were sequenced to confirm the genotyping results.

Statistical analysis
Data obtained from restriction fragment length polymorphism analysis and sequencing were used to determine the allelic and genotype frequencies of CYP2C19*2 and CYP2C19*3. These allelic and genotype values are provided along with 95% confidence intervals (CI), which were calculated as $\mathrm{CI} = p \pm 1.96 \times \mathrm{SE}$, with $\mathrm{SE} = \sqrt{p(1-p)/n}$, where $p$ is the proportion and $n$ is the sample size. Differences between the observed and expected frequencies were assessed with the chi-squared test, which was also used to test for deviation from Hardy-Weinberg equilibrium.
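The two calculations described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' analysis code: the function names are hypothetical, the genotype counts in the Hardy-Weinberg example are made up for illustration, and the choice of n = 810 chromosomes (2 x 405 participants) for the allele-frequency interval is an assumption, since the text does not state which n was used.

```python
import math


def wald_ci(p, n, z=1.96):
    """Wald confidence interval for a proportion: p +/- z * sqrt(p(1-p)/n)."""
    se = math.sqrt(p * (1 - p) / n)
    # Clamp to [0, 1] because a proportion cannot leave that range.
    return max(0.0, p - z * se), min(1.0, p + z * se)


def hardy_weinberg_chi2(n_wt, n_het, n_hom):
    """Chi-squared goodness-of-fit test against Hardy-Weinberg proportions.

    n_wt, n_het, n_hom are observed counts of wild-type homozygotes,
    heterozygotes, and variant homozygotes at one biallelic site.
    """
    n = n_wt + n_het + n_hom
    q = (2 * n_hom + n_het) / (2 * n)  # variant allele frequency
    p = 1 - q                          # wild-type allele frequency
    observed = (n_wt, n_het, n_hom)
    expected = (p * p * n, 2 * p * q * n, q * q * n)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Three genotype classes minus one estimated allele frequency minus one
    # leaves 1 degree of freedom; for 1 df the chi-squared survival function
    # equals erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Interval for the reported CYP2C19*2 minor allele frequency (15.06%),
# assuming n = 810 chromosomes (assumption, see lead-in).
low, high = wald_ci(0.1506, 810)
print(f"95% CI: {low:.4f} to {high:.4f}")

# Made-up counts, not study data: a sample in exact Hardy-Weinberg proportions.
chi2, p_value = hardy_weinberg_chi2(25, 50, 25)
print(f"chi2 = {chi2:.3f}, p = {p_value:.3f}")
```

The Wald interval is the form given in the text; it can behave poorly for proportions close to 0 or 1 (such as the *3*3 genotype frequency), where other interval methods may be preferable.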

Results
Our study is the first to investigate genetic polymorphisms in the CYP2C19 gene in healthy individuals from six distinct ethnic populations of Pakistan, the world's 5th most populous country. Our results show that the minor allele was present at a frequency of 15.06% for CYP2C19*2 and 8.14% for CYP2C19*3. Our study shows that a substantial fraction of the Pakistani population, and certain ethnic segments in particular, carry reduced-function polymorphisms at high frequency. These findings have safety and efficacy implications for our population and encourage further genotype-phenotype studies in hospital and clinical settings involving individual drugs and diseases.

Allelic and genotype frequencies of CYP2C19*2
The prevalence of the CYP2C19*2 allele in the various ethnicities of the Pakistani population is depicted in Table 1 and Figure 2. The minor allele was found at a frequency of 15.06% in the Pakistani population (Table 1). The Punjabi population exhibited the highest prevalence of CYP2C19*2, followed by the Balochi, Urdu-speaking, and Sindhi populations. The lowest prevalence of CYP2C19*2 was found in the Pathan population.

Allelic and genotype frequencies of CYP2C19*3
Frequencies of the CYP2C19*3 allele in the Pakistani population are shown in Table 2. This variant was found at a lower frequency than CYP2C19*2 (8.14% vs. 15.06%). The Punjabi population displayed a higher prevalence of this polymorphism than most other ethnicities, although the highest allele frequency was found in the Balochi ethnicity (15%). In the Pakistani population, the prevalence of the *1*1 genotype was 84.19%, *1*3 was 15.30%, and *3*3 was 0.49%. The highest frequency of the *1*1 genotype was found in the Sindhi and Seraiki populations at 96% (Supplementary Table 2 and Figure 4). The highest frequency of the heterozygous genotype (*1*3) was found in the Balochi ethnic group at 30%. The homozygous genotype (*3*3) was observed only in the Seraiki population, which was also the only population that did not display any heterozygous genotype. No other ethnic group displayed a *3*3 genotype (Figure 3). Tables 3 and 4 compare the allelic frequencies of CYP2C19*2 and *3, respectively, in the Pakistani population with those reported for other populations. CYP2C19*3 is not found in European and South American populations. The allelic frequencies found in the Han Chinese were similar to the Pakistani frequencies, but various South Asian populations displayed significantly different prevalence of this allele (Table 4).
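As a consistency check on the reported figures, the minor allele frequency at a biallelic site equals the variant-homozygote frequency plus half the heterozygote frequency. The short Python sketch below applies this to the genotype percentages reported for the whole cohort; the function name is hypothetical and the code is illustrative, not the authors' own analysis.

```python
def minor_allele_frequency(het_pct, hom_pct):
    """Minor allele frequency (%) from genotype percentages.

    Each variant homozygote carries two copies of the minor allele and each
    heterozygote carries one, so the allele frequency is hom + het / 2.
    """
    return hom_pct + het_pct / 2


# Genotype percentages as reported for the Pakistani cohort.
cyp2c19_2_maf = minor_allele_frequency(het_pct=18.27, hom_pct=5.92)
cyp2c19_3_maf = minor_allele_frequency(het_pct=15.30, hom_pct=0.49)

print(f"CYP2C19*2 minor allele frequency: {cyp2c19_2_maf:.2f}%")
print(f"CYP2C19*3 minor allele frequency: {cyp2c19_3_maf:.2f}%")
```

Up to rounding, both values agree with the reported allele frequencies of 15.06% and 8.14%.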

Discussion
Pakistan, as stated by the statistics bureau, ranks 6th among the most populous countries of the world [12]. Of the 94% of Pakistan's population represented by its major ethnic groups, the Punjabi population makes up 38.78%, making it the largest ethnic group in the country, followed by the Pashtun, Sindhi, Siraiki, Urdu-speaking, and Balochi populations at 18.24%, 14.57%, 10.53%, 7.57%, and 3.57%, respectively [13].
Although Pakistan is home to a large and diverse society, little pharmacogenetic research has been carried out in its population. This study aims to partially address this gap by outlining the frequencies of the two most significant genetic variants in the CYP2C19 gene. The current study finds the prevalence of the CYP2C19*2 allele to be about 15% in healthy Pakistani participants, which is in a similar range to that found in Denmark, Norway, Sweden, Germany, Portugal, Australia, Saudi Arabia, Tanzania, Ethiopia, and Zimbabwe [14][15][16][17][18][19][20][21][22][23]. Egypt and Canada previously showed the lowest prevalence of the minor allele at 11% [3,24]. Compared to the Pakistani population, the CYP2C19*2 allele was more prevalent in Korean, Chinese Taiwanese, Japanese, North Indian, Venda, African American, and Iranian populations [16][17][18]25,26], while the Filipino population has the highest prevalence of the CYP2C19*2 allele [17]. Among the ethnicities studied here, Punjabi samples revealed the highest frequency of the CYP2C19*2 minor allele, followed by the Balochi ethnicity, while the Pathan population showed the lowest prevalence. The lowest prevalence of the wild-type genotype was found in the Punjabi ethnicity and the highest in the Pathan population (Figure 3).
The prevalence of the CYP2C19*3 allele in the Pakistani population was about 8.14% and was similar to that found in the Korean and Japanese populations [17,18]. The Egyptian population exhibited the lowest prevalence of the minor allele [3], while the highest frequency is reported from the Korean population at 11.6%. However, no CYP2C19*3 allele is reported from Canada, Poland, Denmark, Germany, Portugal, Australia, Saudi Arabia, or Zimbabwe, or from North Indian, Venda, and African American populations. The Baloch ethnic group revealed the highest frequency of the minor allele, whereas the lowest prevalence was found in the Urdu-speaking ethnicity. The Sindhi and Seraiki ethnicities displayed the highest frequency of the wild-type genotype, whereas the Baloch ethnicity exhibited the lowest. The highest frequency of the heterozygous genotype was also found in the Baloch population.
The Pakistani population is a blend of various populations owing to several invasions during the first and second millennia. Partly owing to the British domination of the subcontinent from the 18th to the 20th century and the Arab influence from the 8th century, the Pakistani population shows genetic influences from several different populations. The large population size also lends itself to greater genetic diversity. Several regional populations, such as Asian, Middle Eastern, and Caucasian, blend in the Pakistani population [27]. The genetic structure of various Pakistani ethnic groups has been elucidated by several international human genome consortia through the identification and analysis of multiple gene variants [11,28]. Some studies suggest that the genetic makeup of the Pakistani population is akin to both Caucasian and South Indian populations [29], while others indicate that these ethnicities are similar to European populations [30,31]. It has been suggested that the contribution of several ethnicities, such as Aryan, Arab, Persian, Turkish, Kurdish, and Dravidian, to the ancestry of the Baloch population is the reason for its extreme divergence from other regional ethnicities seen in the present study [32]. A recent study reported frequencies of the CYP2C19*2 polymorphism in various ethnic Pakistani populations and found that CYP2C19*2 is prevalent at 29% [33]. However, their investigation did not include two of the most populous ethnicities, namely the Balochi and Urdu-speaking populations. They also did not report the frequencies of the CYP2C19*3 polymorphism. In contrast to their study, we confirmed our results by direct Sanger sequencing of more than a quarter of our samples. Another recent study reported frequencies of the CYP2C19*2 polymorphism in the Pakistani population, based on 155 healthy individuals from one city of Pakistan. However, about 60% of their study participants belonged to the Urdu-speaking population, while the next largest ethnicity was Sindhi at 6.5%. The frequency of CYP2C19*2 was reported at 31% [34].
Rehman and colleagues reported CYP2C19 polymorphisms in 100 cardiac patients taking clopidogrel [35]. They reported a very high frequency of the CYP2C19*2 polymorphism, but this may be due to a possible association of this polymorphism with cardiac diseases. They also did not report any CYP2C19*3 data. Another investigation reported frequencies of the CYP2C19*2 polymorphism in breast cancer patients on triple therapy, including cyclophosphamide, fluorouracil, and doxorubicin. They reported a 27.2% frequency for CYP2C19*2 in 67 breast cancer patients [36]. Another study, conducted on 527 hypertensive patients and 530 unrelated healthy controls, reported frequencies of 75% and 25% for the wild-type and mutant alleles, respectively, in hypertensive patients, and 64.2% and 35.8%, respectively, in controls [33]. The frequency of CYP2C19*2 was in a similar range to that found in our study. The observed difference may be due to differences in the participants recruited for each study.

Limitations of the study
Limitations of our study include a relatively small sample size; a larger sample would have yielded more reliable results. Another limitation is that the two SNPs (CYP2C19*2 and *3) were considered independently. Therefore, it is theoretically possible that in a sample called heterozygous at one site (e.g., *1*2), the allele called *1 actually carries *3, so that the true genotype would be *2*3. This possibility could not be tested in our study. No phenotypes (e.g., drug response, drug concentrations in the blood, etc.) were tested in our study. A genotype-phenotype association analysis would have been useful. However, we plan to conduct such studies in the future as we get more resources.

Conclusions
To the best of our knowledge, this study is the first to report the frequency of the CYP2C19*2 and *3 gene polymorphisms among the healthy population of several distinct ethnicities of Pakistan. It is known that the CYP2C19 enzyme metabolizes approximately 10% of drugs available on the market [2]. Since approximately 2.6 billion unit doses of drugs are dispensed each year in Pakistan [37], the CYP2C19 enzyme is expected to metabolize roughly 260 million unit doses. The current study demonstrates that a substantial fraction of the Pakistani population (about 30%) carries at least one minor allele, which suggests that a large number of these doses (about 78 million) may be taken by patients potentially affected by these variations. To further clarify the genotype-phenotype correlation, we recommend further research on individual drugs metabolized by the CYP2C19 enzyme. Once a physician knows a patient's metabolizing status, this information may lead to safer and more effective prescribing and better performance of drugs.
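The back-of-the-envelope estimate above can be reproduced as follows. This is a rough sketch that uses only the figures quoted in the text (2.6 billion unit doses per year, about 10% of drugs metabolized by CYP2C19, about 30% of people carrying a minor allele); the variable names are hypothetical, and the calculation assumes, as a simplification, that doses are spread evenly across drugs and across carriers and non-carriers.

```python
# Figures quoted in the text.
total_unit_doses = 2.6e9    # unit doses dispensed per year in Pakistan [37]
cyp2c19_drug_share = 0.10   # share of drugs metabolized by CYP2C19 [2]
carrier_share = 0.30        # share of people with at least one minor allele

# Simplifying assumption: doses are distributed evenly across drugs and people.
cyp2c19_doses = total_unit_doses * cyp2c19_drug_share
doses_in_carriers = cyp2c19_doses * carrier_share

print(f"CYP2C19-metabolized unit doses per year: {cyp2c19_doses / 1e6:.0f} million")
print(f"Of these, taken by minor-allele carriers: {doses_in_carriers / 1e6:.0f} million")
```

The result is a count of unit doses per year rather than a count of distinct patients, so it is best read as an order-of-magnitude indication of exposure.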