Pages 1079-1091
Received 01 Oct 2017
Accepted 20 Aug 2019
Accepted author version posted online: 12 Sep 2019
Published online: 16 Oct 2019
 

Abstract

Studying the effects of groups of single nucleotide polymorphisms (SNPs), as in a gene, genetic pathway, or network, can provide novel insight into complex diseases such as breast cancer, uncovering new genetic associations and augmenting the information that can be gleaned from studying SNPs individually. Common challenges in set-based genetic association testing include weak effect sizes, correlation between SNPs in a SNP-set, and scarcity of signals, with individual SNP effects often ranging from extremely sparse to moderately sparse in number. Motivated by these challenges, we propose the Generalized Berk–Jones (GBJ) test for the association between a SNP-set and outcome. The GBJ extends the Berk–Jones statistic by accounting for correlation among SNPs, and it provides advantages over the Generalized Higher Criticism test when signals in a SNP-set are moderately sparse. We also provide an analytic p-value calculation for SNP-sets of any finite size, and we develop an omnibus statistic that is robust to the degree of signal sparsity. An additional advantage of our work is the ability to conduct inference using individual SNP summary statistics from a genome-wide association study (GWAS). We evaluate the finite sample performance of the GBJ through simulation and apply the method to identify breast cancer risk genes in a GWAS conducted by the Cancer Genetic Markers of Susceptibility Consortium. Our results suggest evidence of association between FGFR2 and breast cancer and also identify other potential susceptibility genes, complementing conventional SNP-level analysis. Supplementary materials for this article, including a standardized description of the materials available for reproducing the work, are available as an online supplement.
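For orientation, the classical Berk–Jones statistic that the GBJ builds on can be sketched in a few lines. This is not the GBJ itself: the sketch assumes independent, standard normal test statistics under the null and ignores the correlation among SNPs that the GBJ is designed to handle. The function names (`two_sided_p`, `binomial_kl`, `berk_jones`) and the convention of maximizing over only the smaller half of the ordered p-values are illustrative assumptions, not taken from the article.

```python
import math


def two_sided_p(z: float) -> float:
    """Two-sided p-value for a test statistic assumed standard normal under the null."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def binomial_kl(x: float, y: float) -> float:
    """Kullback-Leibler divergence between Bernoulli(x) and Bernoulli(y), with 0 < y < 1."""
    total = 0.0
    if x > 0.0:
        total += x * math.log(x / y)
    if x < 1.0:
        total += (1.0 - x) * math.log((1.0 - x) / (1.0 - y))
    return total


def berk_jones(z_scores):
    """Classical Berk-Jones statistic under an independence assumption.

    Compares the observed fraction j/d of the j smallest p-values with the
    j-th smallest p-value itself, and keeps the largest binomial
    log-likelihood-ratio over the smaller half of the ordered p-values
    (an assumed convention for this sketch).
    """
    d = len(z_scores)
    p_sorted = sorted(two_sided_p(z) for z in z_scores)
    best = 0.0
    for j in range(1, d // 2 + 1):
        frac = j / d
        p = p_sorted[j - 1]
        if p == 0.0:
            # The p-value underflowed to zero: treat as an arbitrarily strong signal.
            return math.inf
        if p < frac:
            # Only count excesses of small p-values (evidence against the null).
            best = max(best, d * binomial_kl(frac, p))
    return best


if __name__ == "__main__":
    # Hypothetical marginal z-scores for a four-SNP set; not data from the article.
    print(berk_jones([5.0, 0.3, -0.1, 0.8]))
```

A larger value indicates stronger evidence that at least one SNP in the set is associated with the outcome. Under correlation, the binomial comparison above is no longer appropriate, which is the gap the abstract describes the GBJ as addressing.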

Supplementary Material

The supplementary materials provide the proof of Theorem 1 from Section 3.3, offer further details on how to calculate the exact p-value from Equation (5) in Section 3.4, demonstrate the accuracy of the p-value calculation from Section 3.4, give an alternative visualization of the rejection region plots from Section 4, list the exact simulation parameters and provide further power results from Section 5, show diagnostic QQ-plots from the analysis of Section 6, and evaluate the accuracy of the summary statistic correlation approximation using data from Section 6.

Additional information

Funding

This work was supported by the National Institutes of Health grants R35-CA197449, P01-CA134294, U19-CA203654, and R01-HL113338. The authors would like to thank the editor, associate editor, and referees for helpful comments that have improved the article.
