Prioritization of genetic variants predisposing to coronary heart disease in the Bulgarian population using centenarian exomes

Abstract Coronary heart disease (CHD) is a major cause of mortality and morbidity in Europe. CHD is usually caused by atherosclerosis. Despite extensive studies that have identified a large number of genetic variants, strong evidence of association with CHD is not easily replicated across populations. Two DNA pools were constructed: one with 32 Bulgarian centenarians and one with 61 young healthy Bulgarian individuals. The pools were whole-exome sequenced, and variants were annotated and quality filtered (89,810 filtered variants). Allele frequencies were estimated and Fisher’s exact test was used to evaluate the significance of allele frequency differences between the two pools. Two publicly available databases, Ensembl and DisGeNET, were used as a source of 2025 CHD-associated variants (CHD-AVs). These single nucleotide polymorphisms were screened in our data and 158 variants in 133 genes were found. ToppGene pathway analysis of the genes called in both pools identified 37 genes participating in 9 significantly over-represented pathways. Eight variants in these 37 genes have a significantly higher frequency in young individuals and are nominated for CHD association. Variants called only in the pool of young individuals (13 variants) are also nominated for association with CHD. Based on sufficiently unambiguous literature data, we prioritize five of the nominated variants as associated with CHD in the studied Bulgarian sample. Centenarian genomes can be used to provide additional information regarding the clinical relevance of genetic variants reported as associated with CHD.

The aim of this study was to evaluate the carriership of genetic risk factors for predisposition to CHD in healthy Bulgarian subjects in order to prevent the development and complications of the disease.
The healthy individuals are from two different age groups, young and centenarians. The fact that centenarians have reached an extreme age indicates that their genomes are devoid of pathological variants, including CHD-associated variants (CHD-AVs). For these reasons, their genomes could be considered a "gold standard" for healthy aging. It can be presumed that variants which are more frequent in the general population than in centenarians may be associated with pathological phenotypes that decrease life expectancy. The multifactorial etiology of CHD implies that known CHD-associated variants can have different effects in different populations, depending also on environmental factors. Understanding the intra-population genetic differences between centenarians and healthy young subjects can indirectly shed light on the phenotypic effect of risk genetic variants in the given population and environmental background.
The CHD risk genetic variants in Bulgarians were determined by nomination and prioritization of CHD-AVs reported in publicly available databases and found in Bulgarian exome data. Variants were nominated for association with CHD either because they lie in genes involved in significantly over-represented pathways and have a statistically significantly higher frequency among young healthy individuals than among centenarians, or because they are present exclusively among young healthy individuals. Nominated variants were then prioritized as associated with CHD based on sufficiently unambiguous literature data.

Ethics statement
The project was approved by the Ethics Committee of the Medical University of Sofia in accordance with the national and international legislation. Each DNA donor was informed about the aims of the project and signed written informed consent prior to sample collection.

Subjects
The study included a group of 32 centenarians from Bulgaria aged from 100 to 106 years (mean age 102.4) and an ethnically matched control group composed of 61 young healthy individuals from 21 to 25 years old (mean age 21.9).

Study design
Centenarians were interviewed about their lifestyle, diet, tobacco smoking, alcohol consumption, physical activity, social contacts, medical history and other important factors: positive mood, preserved memory, periods of stress, financial problems, and other long-lived family members.

DNA extraction and sequencing
DNA was extracted from buccal swab samples from the centenarians and blood samples from the healthy subjects. We constructed two pools, one of Bulgarian centenarians and one of healthy young individuals, with equimolar amounts of DNA from each participant. Pool sequencing was chosen as a cost-effective method for population-based studies that provides reliable estimation of allele frequencies [32][33][34]. The pools were whole-exome sequenced using BGI v4 chemistry on a BGISEQ-500 platform (by BGI Genomics) at a mean 250x coverage, which is required for the detection of low-frequency alleles. The obtained VCF files were annotated using the web-based ANNOVAR platform [35]. The detected variants were subjected to robust filtering based on the number of individuals per pool > 30, genotype quality > 99, mapping quality > 60, number of reads per variant allele > 2, and total depth of coverage > 30.
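The filtering criteria listed above can be illustrated with a minimal Python sketch. The record type `PooledVariant`, its field names, and the `inclusive` switch are hypothetical and introduced only for illustration; the study's actual pipeline and file fields are not described in the text. The thresholds are read literally as strict inequalities. Because some variant callers cap genotype quality at 99 and some aligners cap mapping quality at 60, a pipeline might in practice apply these two limits inclusively, which the `inclusive` flag lets one try.

```python
from dataclasses import dataclass
import operator


@dataclass(frozen=True)
class PooledVariant:
    """Hypothetical per-variant record; field names are assumptions."""
    variant_id: str
    individuals_in_pool: int
    genotype_quality: float
    mapping_quality: float
    alt_reads: int
    total_depth: int


# Thresholds as stated in the text (each one is a lower bound).
THRESHOLDS = {
    "individuals_in_pool": 30,
    "genotype_quality": 99,
    "mapping_quality": 60,
    "alt_reads": 2,
    "total_depth": 30,
}


def passes_filters(variant, inclusive=False):
    """Return True if the variant clears every threshold.

    inclusive=False applies ">" as written in the text;
    inclusive=True applies ">=" (an assumption, see the lead-in).
    """
    compare = operator.ge if inclusive else operator.gt
    return all(
        compare(getattr(variant, field), limit)
        for field, limit in THRESHOLDS.items()
    )


# Made-up example records, not data from the study.
example_pass = PooledVariant("example_1", 32, 100, 61, 5, 250)
example_capped = PooledVariant("example_2", 61, 99, 60, 5, 250)
example_few_reads = PooledVariant("example_3", 61, 100, 61, 2, 250)

kept = [v.variant_id
        for v in (example_pass, example_capped, example_few_reads)
        if passes_filters(v)]
print(kept)
```

Keeping the thresholds in one dictionary makes it easy to see, and to change, which criteria are applied.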

Data analysis
The Ensembl genome browser [36] and DisGeNET [37] databases were searched for coronary heart disease associated variants. The functional significance of gene assemblages was tested using the web-based platform ToppGene [38]. Pathway-based analyses allow interpretation of genes with respect to the molecular function of the product and the biological processes in which their proteins are involved [39]. This approach is an informative follow-up for whole-exome sequencing (WES) studies, as it provides insight into the biological basis of a given phenotype. In order to minimize false positive discoveries, the p-values were adjusted according to Benjamini and Hochberg [40], and according to Benjamini and Yekutieli [41].
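The two false discovery rate adjustments named above can be sketched in a few lines of Python. This is an illustrative implementation of the standard step-up procedures, not the code used by ToppGene; the function name `adjust_pvalues` and the example p-values are assumptions. The Benjamini and Yekutieli adjustment differs from the Benjamini and Hochberg one only by the extra factor c(m) = 1 + 1/2 + ... + 1/m, which makes it valid under arbitrary dependence between tests at the cost of being more conservative.

```python
def adjust_pvalues(pvalues, method="bh"):
    """Return FDR-adjusted p-values in the input order.

    method="bh": Benjamini-Hochberg; method="by": Benjamini-Yekutieli.
    """
    m = len(pvalues)
    if m == 0:
        return []
    if method == "bh":
        factor = 1.0
    elif method == "by":
        # Harmonic-sum correction for arbitrary dependence.
        factor = sum(1.0 / k for k in range(1, m + 1))
    else:
        raise ValueError("method must be 'bh' or 'by'")

    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        candidate = pvalues[idx] * m * factor / rank
        running_min = min(running_min, candidate)
        adjusted[idx] = running_min
    return adjusted


# Made-up p-values for illustration only.
example_p = [0.01, 0.04, 0.03, 0.005]
bh = adjust_pvalues(example_p, method="bh")
by = adjust_pvalues(example_p, method="by")
print(bh)
print(by)
```

A pathway would be called significantly over-represented when its adjusted p-value falls below the chosen FDR level.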

Centenarians interview data
The lifestyle data collected in the interviews show that, in general, Bulgarian centenarians are not vegetarians (98%); most of them consume salty (84%) and sweet (86%) foods, and 40% consume animal fats. They consume fish, but not regularly, and their diet does not differ from what is typical for Bulgarians. Almost all centenarians have a normosthenic habitus. It is important to note that 76% of centenarians have long-lived relatives, which reflects the heritability of longevity. Another feature of almost all centenarians (93%) is their positive thinking: they are optimistic, with preserved cognitive functions and memory. Only 7% of them smoke, and not regularly, as opposed to 30% in the general population according to the National Statistical Institute (NSI) of Bulgaria. Most centenarians do not consume alcohol daily or do not consume it at all (63% compared to 30% in the general population). They have regular (38% vs. 9% according to NSI) or moderate physical activity (farming activity, home activity, long walks) (62% vs. 40%).

CHD-AVs found in Bulgarian exome data
After filtering of whole-exome sequencing data, the total number of annotated variants detected in the Bulgarian pools was n = 89,810, of which n = 72,791 were found in both pools, n = 8766 only in controls and n = 8253 only in centenarians. Ensembl and DisGeNET databases list 1479 and 760 variants, respectively, as associated with CHD, yielding 2025 unique variants. Of these, 158 variants (in 133 genes) were found in Bulgarian exome data: 139 (in 114 genes) in both pools; 13 (in 13 genes) in young healthy individuals only and 6 (in 6 genes) in centenarians only.
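The partition of variants into "both pools", "controls only" and "centenarians only" amounts to set operations on the variant identifiers called in each pool. The sketch below is illustrative only: the function name `partition_by_pool` is hypothetical, and the three rsIDs are used as a toy input chosen from variants that the text later reports in each category. The second half checks that the counts reported above are internally consistent.

```python
def partition_by_pool(centenarian_ids, control_ids):
    """Split variant IDs by the pool(s) in which they were called."""
    centenarians = set(centenarian_ids)
    controls = set(control_ids)
    return {
        "both": centenarians & controls,
        "controls_only": controls - centenarians,
        "centenarians_only": centenarians - controls,
    }


# Toy input: rs5443 is reported in both pools, rs2230806 only in
# young individuals, rs5065 only in centenarians.
groups = partition_by_pool(
    centenarian_ids=["rs5443", "rs5065"],
    control_ids=["rs5443", "rs2230806"],
)
print(groups)

# Consistency checks on the counts reported in the text.
all_variants = {"both": 72_791, "controls_only": 8_766,
                "centenarians_only": 8_253}
total_variants = sum(all_variants.values())  # 89,810 filtered variants

chd_variants = {"both": 139, "controls_only": 13, "centenarians_only": 6}
total_chd_variants = sum(chd_variants.values())  # 158 CHD-AVs found

# Inclusion-exclusion: 1479 + 760 listed, 2025 unique, so the
# two databases share 1479 + 760 - 2025 entries.
shared_between_databases = 1479 + 760 - 2025
print(total_variants, total_chd_variants, shared_between_databases)
```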

Nomination of CHD-AVs
ToppGene functional enrichment analysis and candidate gene prioritization were performed using the 114 genes found in both pools. This analysis revealed 37 genes involved in nine significantly overrepresented pathways: 1. Pathway of folate cycle/metabolism; 2. Lipid digestion, mobilization, and transport; 3. Metabolism of lipids and lipoproteins; 4. Metabolism of vitamins and cofactors; 5. Lipoprotein metabolism; 6. LDL-mediated lipid transport; 7. Methionine cycle/metabolic; 8. Hemostasis; 9. Remethylation of homocysteine metabolism - cobalamin dependent (Table 1; Figure 1). These pathways could be grouped into 3 major categories: a) lipid and lipoprotein metabolism, b) hemostasis and c) vitamin and cofactor metabolism. Disturbances in the biochemical processes in these pathways could lead to the development of CHD.
The 37 genes from the overrepresented ToppGene pathways contained 47 variants in our samples (Table 2). Eight of these variants in 7 genes (rs5443 in GNB3, rs1129293 in PIK3CG, rs693 in APOB, rs20455 in KIF6, rs1801131 and rs1801133 in MTHFR, rs174546 in FADS1, and rs2000813 in LIPG) showed statistically higher frequencies in the control pool compared to the centenarian pool. The thirteen variants found only in young individuals can be speculated to have a possible negative impact on longevity. ToppGene analysis of their genes found no significantly overrepresented pathways.
The workflow of CHD-AV nomination and prioritization is summarized in Figure 2.

Table 1. Nine over-represented ToppGene pathways involving 37 of the 133 genes carrying CHD-AVs in Bulgarians. FDR, false discovery rate; B&H, according to Benjamini and Hochberg [40]; B&Y, according to Benjamini and Yekutieli [41]; genes carrying variants with statistically higher frequencies in the control pool compared to the centenarian pool are given in bold.
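The comparison of allele frequencies between the two pools, which the abstract states was done with Fisher's exact test, can be sketched as a two-sided test on a 2x2 table of allele counts. The implementation below uses only the Python standard library; the function name and the example counts are hypothetical. It assumes that pooled read data have already been converted to estimated allele counts on 2N chromosomes per pool (122 for 61 young individuals, 64 for 32 centenarians), which is an assumption about the workflow rather than a description of it.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the table [[a, b], [c, d]].

    Rows are pools; columns are alternate and reference allele counts.
    Sums the probabilities of all tables with the same margins that
    are no more likely than the observed one.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x alternate alleles in row 1.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_observed = prob(a)
    low = max(0, col1 - row2)
    high = min(row1, col1)
    # Small relative tolerance guards against floating-point ties.
    p_value = sum(
        prob(x) for x in range(low, high + 1)
        if prob(x) <= p_observed * (1 + 1e-7)
    )
    return min(1.0, p_value)


# Made-up allele counts for illustration only.
control_alt, control_total = 50, 122          # 61 young individuals
centenarian_alt, centenarian_total = 12, 64   # 32 centenarians

p = fisher_exact_two_sided(
    control_alt, control_total - control_alt,
    centenarian_alt, centenarian_total - centenarian_alt,
)
print(control_alt / control_total, centenarian_alt / centenarian_total, p)
```

A variant would be nominated under the first criterion when its frequency is higher in the control pool and the test is significant at the chosen level.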

Prioritization of variants called in both pools
Based on the significantly higher allele frequency in the young individuals, 8 variants in the genes from these pathways were nominated for CHD association: rs5443, rs1129293, rs693, rs20455, rs1801131, rs1801133, rs174546 and rs2000813. After a literature review, we prioritize 5 variants that have a well-established association with the pathogenesis of CHD.
The prioritization of rs5443 in the GNB3 gene, encoding the G protein beta-3 subunit, is supported by literature data on its influence on lipid metabolism and its role as an independent risk factor for hard coronary events [42,43].
Another nominated variant is rs693 in the APOB (apolipoprotein B) gene, which is one of the most commonly studied genes for familial hypercholesterolemia. This variant shows significant association with increased levels of circulating atherogenic lipoproteins, thus conferring increased risk for CHD [44,45].
Among the nominated variants there are two in the MTHFR (methylenetetrahydrofolate reductase) gene: rs1801131 and rs1801133. The MTHFR enzyme is a major participant in the folate metabolism that interlinks with various processes of DNA synthesis, repair and methylation [46]. These variants are frequently found in linkage; they both lead to reduced enzyme activity of the product. The rs1801133 variant seems to have significantly greater influence, but the linkage hinders the discrimination of their individual effects [47]. The variant rs1801133 has been identified as a risk factor for CHD, possibly and partly through dyslipidemic mechanisms [48]. Taking the above stated arguments into account, we prioritize both rs1801131 and rs1801133 for association with CHD.
The rs174546 variant is a prominent single nucleotide polymorphism (SNP) in the FADS1 gene encoding fatty acid desaturase 1, which catalyzes a rate-limiting step in the metabolism of dietary n-3 and n-6 polyunsaturated fatty acids (PUFA) [49]. The variant is associated with altered FADS1 expression leading to significant changes in the serum fatty acid ratios and increased plasma triacylglycerol levels [50,51]. Based on the literature data, we accept this variant as associated with CHD.
Despite the significantly higher allele frequency in the young individuals, the remaining 3 nominated variants (rs1129293, rs20455 and rs2000813) were not prioritized for CHD predisposition. The rs1129293 in the PIK3CG (phosphoinositide-3-kinase, catalytic, gamma) gene is reported to be associated with poor responsiveness to clopidogrel and decreased plasma HDL-cholesterol without metabolic or inflammatory modifications [52]. It thus seems to be more relevant to drug response than to the risk of CHD. The rs20455 variant in KIF6 (Kinesin Family Member 6) is a potential pharmacogenetic variant, useful for predicting the response to statin therapy in individuals with dyslipidemia [53]. Though some studies have identified an association with CHD, it does not seem to be associated with the pathogenesis per se, as pooled and meta-analyses fail to replicate the positive finding [54,55]. The rs2000813 in LIPG (lipase G) gene is associated with normal/moderately decreased enzyme activity leading to normal/increased HDL-C levels which are associated with protective effect against atherosclerotic cardiovascular disease [56][57][58].

Prioritization of variants called in the young healthy individuals pool only
None of the 13 variants from Ensembl/DisGeNET called only in the pool of young healthy individuals was prioritized as associated with CHD in Bulgarians.
A literature review of these 13 variants gives contradictory results about their functional significance. One study has found a significant association of the variant rs2230806 in the ABCA1 gene with susceptibility to CHD [59], whereas other studies establish this variant to have a protective role against CHD risk [60,61]. There are no supportive literature data for the role of rs2271570 in the C4orf33 gene as a CHD risk factor. The ROS1 gene encodes a proto-oncogene receptor tyrosine kinase, and the T allele of rs529038 was not found in centenarians, probably due to its oncogenesis-related negative effect on longevity. Data from a meta-analysis did not identify an association of rs1801157 in the CXCL12 gene with CHD [62]. For the rs10509681 variant in the CYP2C8 gene, there are insufficient data for association with CHD. BDNF encodes a protein that plays a role in the regulation of synaptic plasticity and neuronal connections, and the variant rs6265 is reported as benign. There are insufficient studies on the association of rs14259 in PSMD9, rs1805015 in IL4R, rs255049 in DPEP3, rs2015086 in CCL18, rs2074158 in DHX58, rs1135889 in FBF1 and rs132875 in NUP50 with predisposition to CHD.

Variants called in the centenarian pool only
The six variants found only in centenarians are as follows: rs5065 in NPPA, rs1058793 in ATP10D, rs9449444 in IBTK, rs2066714 in ABCA1, rs11838267 in C1S, and rs11881940 in HNRNPUL1. They are not expected to reduce life expectancy. Two are classified as benign in ClinVar, whereas the remaining 4 are not reported at all. None of these 6 variants could be nominated for association with CHD in the Bulgarian population.

Conclusions
CHD is determined by complex interactions between genetic and environmental factors, both of which vary across populations. The clinical relevance of reported CHD-AVs is complicated by population-specific differences in their frequency and phenotype associations. Out of 2025 CHD-AVs from Ensembl/DisGeNET, only 158 were found in Bulgarian exomes of healthy individuals. Only 5 of these were prioritized for association with CHD after applying the following criteria: participation of these variants' genes in significantly overrepresented pathways, significantly higher allele frequency in the young individuals compared to the centenarian pool, and sufficient support of our results from other studies.

Figure 2. Workflow of prioritization of coronary heart disease associated variants in the present study.

Disclosure statement
No potential conflict of interest was reported by the authors.