Integrative genomic analysis of N6-methyladenosine-single nucleotide polymorphisms (m6A-SNPs) associated with breast cancer

ABSTRACT Due to the important role of N6-methyladenosine (m6A) in breast cancer, single nucleotide polymorphisms (SNPs) in genes with m6A modification may also be involved in breast cancer pathogenesis. In this study, we used a public genome-wide association study dataset to identify m6A-SNPs associated with breast cancer and to further explore their potential functions. We found 113 m6A-SNPs associated with breast cancer that reached the genome-wide suggestive threshold (5.0E-05), and 86 m6A-SNPs had eQTL signals. Only six genes were differentially expressed between controls and breast cancer cases in GEO datasets (GSE15852, GSE115144, and GSE109169), and the SNPs rs4829 and rs9610915 were located next to the m6A modification sites in the 3ʹUTRs of TOM1L1 and MAFF, respectively. In addition, we found that polyadenylate-binding protein cytoplasmic 1 might have a potential interaction with rs4829 (TOM1L1) and rs9610915 (MAFF). In summary, these findings indicated that the SNPs rs4829 and rs9610915 are potentially associated with breast cancer because they had eQTL signals, altered gene expression, and were located next to the m6A modification sites in the 3ʹUTRs of their coding genes. However, further studies are still needed to clarify how genetic variation affects the epigenetic modification, m6A, and its subsequent functions in the pathogenesis of breast cancer.


Introduction
Breast cancer is the most frequently diagnosed cancer and the leading cause of cancer-related death among women worldwide [1,2]. Numerous potential risk loci and susceptible genes have been identified by genome-wide association studies (GWASs) in breast cancer [3,4]. Presently, highpenetrance genes (BRCA1, BRCA2, CDH1, STK11, PTEN, and TP53) and moderate-penetrance genes (ATM, CHEK2, PALB2, and BRIP1) are known to genetically predispose individuals to the development of breast cancer [5]. Recently, a growing number of studies have suggested that genetic variations, especially single nucleotide polymorphisms (SNPs), contribute to breast cancer susceptibility [6,7]because SNPs located in protein-coding regions, as well as multiple SNPs located in noncoding regions, can alter gene expression and participate in epigenetic regulation.
N6-methyladenosine (m 6 A) modification is the most abundant internal modification of messenger RNA (mRNA) and non-coding RNA, such as microRNA, long non-coding RNA, and circular RNA in eukaryotes [8,9]. Emerging evidence has shown that m 6 A is involved in RNA metabolism and is dynamically regulated by methyltransferases, demethylases, and binding proteins [10,11]. Furthermore, it was confirmed that m 6 A plays an important role in breast cancer [12,13]. For example, the m 6 A demethylase FTO promoted breast cancer progression by inhibiting BNIP3 13 , and the HBXIP-elevated methyltransferase METTL3 promoted breast cancer progression by inhibiting the tumor suppressor let-7 g [12]. Recent studies have also suggested that SNPs may affect m 6 A by altering the RNA sequences of the target sites or crucial nucleotides [14] and also contribute to the progression of breast cancer. Importantly, m 6 A-SNPs have the potential to regulate gene expression and mRNA stability, consequently leading to disease [15,16]. Yang et al. found that rs2416282 contributed to esophageal squamous cell carcinoma (ESCC) risk by regulating YTHDC2 expression [16]. Recently, another study found that the SNP rs5746136 affected m 6 A modification and influenced the expression of SOD2 in bladder cancer [15], and some METTL14 SNPs were similarly related to neuroblastoma [17]. Zhuo also reported that WTAP SNPs may be genetic modifiers in hepatoblastoma [18]. Ren et al. reported that the rs8400 AA genotype was correlated with increased expression of ALKBH5 and may be a risk factor for hepatoblastoma in the clinical stage III + IV subgroup [19]. In glioma, studies found the m 6 A-SNPs, rs7766006 (WTAP), rs3738067 (YTHDF2), and rs9939609 (FTO), to increase cancer risk, whereas rs2293595 (YTHDC1), rs3813832 (YTHDC1), and rs8047395 (FTO) reduced the risk [20].
Given that m 6 A-SNPs play a pivotal role in cancer [18,[21][22][23] and the effect of m 6 A -SNPs in breast cancer is unclear, identifying m 6 A -SNPs associated with breast cancer is necessary and could provide a new annotation for the pathogenic mechanism of breast cancer risk loci identified by GWAS. Therefore, this study aimed to identify m 6 A-SNPs associated with breast cancer using a public GWAS database and the m 6 AVar database, and to demonstrate their potential functions.

Screening for m 6 A -SNPs associated with breast cancer
The m 6 AVar database provides information for the annotation, visualization, and exploration of variants associated with m 6 A [24,25]. In the m 6 AVar database, variants were defined at three confidence levels, namely high confidence level when RNA modification sites were directly obtained from the m 6 A-Label-seq, DART-seq, m- 6 A-REF-seq, or miCLIP experiment; medium confidence level when RNA modification sites were obtained from the MeRIP-Seq or m 6 A-Seal-seq experiment; and low confidence level when RNA modification sites were obtained by prediction of the whole transcriptome.
The intersection of SNPs, which reached the genome-wide suggestive threshold (5.0E-05), between the breast cancer GWAS and m 6 Avar databases was used to explore m 6 A-SNPs associated with breast cancer. In this study, breast cancer GWAS data were downloaded from the website (ftp://ftp.ebi.ac.uk/pub/databases/gwas/ summary_statistics/MichailidouK_29059683_ GCST004988/), and the m 6 A-SNP list was obtained from the m 6 Avar database (http:// rmvar.renlab.org/download.html).

Expression quantitative trait loci (eQTL) analysis of m 6 A-SNPs
eQTL analysis can be used to evaluate the potential function of m 6 A-SNPs with eQTL signals in transcription regulation, such as altering protein binding, changing motifs, and affecting deoxyribonuclease. Hence, eQTL analysis was performed to investigate the breast cancer-associated m 6 A-SNPs that could affect RNA modification using the HaploReg browser (https://www.encode project.org/software/haploreg/) [26,27].

Gene Ontology (GO) enrichment analysis
GO enrichment analysis was performed using the Metascape tools (https://metascape.org/gp/index. html#/main/step1). GO biological process, GO molecular function, and GO cellular component were used to classify these genes according their functional annotation [28].

Prediction of m 6 A-SNPs affecting RNA modification
To determine whether the m 6 A-SNPs affected RNA modification and predicted changes in m 6 A modification, we used the sequencebased RNA adenosine methylation site predictor (SRAMP), which is an online m 6 A modification prediction tool [31]. Further, the USCS browser (GRCh37/hg19; https://genome.ucsc.edu/) includes abundant assemblies and annotations of the genomes of vertebrates and model organisms. Therefore, we used it to analyze the potential functions of m 6 A-SNPs [32,33].

Identification of m 6 A-SNPs associated with breast cancer
To identify m 6 A-SNPs associated with breast cancer, the intersection of the GWAS and m 6 AVar databases was analyzed. Of the 11,637,797 SNPs and 1,401,814 m 6 A-SNPs in the breast cancer GWAS and m 6 A Var databases (Figures 1 and 2), we identified 113 m 6 A-SNPs, which reached the genome-wide suggestive threshold (5.0E-05), to be associated with breast cancer (Table S1).
To check whether these m 6 A-SNPs have eQTL effects, the HaploReg browser was used, finding  that 86 of these breast cancer-associated m- 6 A-SNPs had eQTL signals. Subsequently, we found that of the 86 m 6 A-SNPs with eQTL signals, 31 belonged to the high confidence category, 20 to the medium confidence category, and 35 to the low confidence category. Additionally, among these 86 m 6 A-SNPs, 72 were loss-of-function SNPs and 14 were gain-of-function SNPs (Table S1).
According to GO enrichment analysis, we found that the molecular functions of these m 6 A-SNPs were associated with rRNA binding, calcium-dependent protein binding, DNA-binding transcription activator activity, and RNA polymerase II-specific binding. In addition, the functions of the genes with these m 6 A-SNPs were enriched in various biological processes, such as the activation of protein kinase A activity, mammary gland duct morphogenesis, and autophagosome maturation (Figure 3).
The results showed that five m 6 A-SNPs belonged to the high or medium confidence category, including rs1802212 (TOM1L1), rs4829 (TOM1L1), rs2267372 (MAFF), rs9610915 (MAFF), and rs1424916 (TNS1) ( Table 1). Based on the SRAMP website, we found moderately-highly convincible m 6 A-modified  predicted peaks near TOM1L1 (rs4829) and a moderately convincible m 6 A-modified predicted peak near MAFF (rs9610915) (Figure 6). We next used the USCS browser to analyze the potential functions of the m 6 A-SNPs. An integrative analysis of the potential functions of rs4829 and rs9610915 is shown in Figure 7. The SNP rs4829 was located in the 3ʹUTR of TOM1L1 on chromosome 17 and rs9610915 was located in the 3ʹUTR of MAFF on chromosome 22 (Figure 7). This finding suggested that these two SNPs may affect the m6A modification site and regulate gene expression to participate in breast cancer pathogenesis.

Discussion
m 6 A plays an important role in breast cancer, but the effect of m 6 A-SNPs in breast cancer is unclear. Identifying m 6 A-SNPs associated with breast cancer will provide a new annotation for the pathogenic mechanism of breast cancer risk loci identified by GWAS; therefore, we identified m 6 A-SNPs associated with breast Figure 6. The genomic sequence of TOM1L1 and MAFF transcripts were used to predict the m 6 A modification on website (http:// www.cuilab.cn/sramp). There were moderately-highly convincible m 6 A-modified predicted peaks near TOM1L1 (rs4829) and a moderately convincible m 6 A-modified predicted peak near MAFF (rs9610915). cancer and explored their potential functions. We found 113 m 6 A-SNPs that were associated with breast cancer, and 86 m 6 A-SNPs that had eQTL signals.
Additionally, there were 31 m 6 A-SNPs belonging to the high confidence category; further, 75% of the m 6 A-SNPs were distributed in intron regions, 20% were distributed in the 3ʹUTR region, and 5% were distributed in the 5ʹUTR region. However, there were no SNPs of m 6 (Table S1). This finding indicated that SNPs of these 18 m 6 A regulators may not affect breast cancer.
We also found that six corresponding genes of eight m 6 A-SNPs were differentially expressed in breast cancer, and the SNPs rs4829 and rs9610915 were located next to the m 6 A modification site in the 3ʹUTR of TOM1L1 and MAFF, respectively, suggesting that these two SNPs may affect the m 6 A modification site and regulate gene expression to play a role in breast cancer. Studies have shown that TOM1L1 is associated with breast cancer. For example, Chevalier et al. found that TOM1L1 amplification may enhance the metastatic progression of ERBB2-positive breast cancers by regulating the ERBB2-driven proteolytic invasion [34,35]. There is also evidence that MAFF regulates cancer pathogenesis. For instance, Martínez-Hernández et al. found 17 variants of MAFF in chronic myeloid leukemia (CML), and the rs9610915 SNP of MAFF was significantly associated with CML [36]. Further, RIP-chip GeneST from the ENCODE/SUNY Albany data suggested that polyadenylate-binding protein cytoplasmic 1 (PABPC1) might have a potential interaction with rs4829 (TOM1L1) and rs9610915 (MAFF) ( Figure  5). It is known that PABP1 participates in the initiation of translation and stabilization of mRNA and that PABPC1 can encode PABP1 protein and promote tumor progression of gliomas and hepatocellular carcinomas [37,38]. In breast cancer, studies have indicated that PABPC1 mediates SNHG14-induced oncogenic effects [39].

Conclusion
In this study, we used a comprehensive analysis of GWAS raw data and summary statistics combined with eQTL and differential gene expression analyses to identify breast cancer-associated m-6 A-SNPs. We found that two SNPs (rs4829 and rs9610915) had eQTL signals, altered gene expression, and were located next to the m 6 A modification sites in the 3ʹUTRs of their coding genes and that PABPC1 might have a potential interaction with these two SNPs. These findings indicated that the SNPs rs4829 and rs9610915 may be potentially associated with breast cancer. However, the relationship between these m 6 A-SNPs and breast cancer risk and their potential roles in the pathogenesis of breast cancer still require clarification.
(2) SNPs rs4829 and rs9610915 had eQTL signals, altered the expression, located next to the m6A 3'UTR modification site of their coding genes. (3) SNPs rs4829 and rs9610915 may be potentially associated with breast cancer. (4) PABPC1 might have a potential interaction with SNPs rs4829 and rs9610915.

Disclosure statement
No potential conflict of interest was reported by the author(s).