Enhanced statistical tests for GWAS in admixed populations: assessment using African Americans from CARe and a Breast Cancer Consortium.
Pasaniuc B., Zaitlen N., Lettre G., Chen GK., Tandon A., Kao WHL., Ruczinski I., Fornage M., Siscovick DS., Zhu X., Larkin E., Lange LA., Cupples LA., Yang Q., Akylbekova EL., Musani SK., Divers J., Mychaleckyj J., Li M., Papanicolaou GJ., Millikan RC., Ambrosone CB., John EM., Bernstein L., Zheng W., Hu JJ., Ziegler RG., Nyante SJ., Bandera EV., Ingles SA., Press MF., Chanock SJ., Deming SL., Rodriguez-Gil JL., Palmer CD., Buxbaum S., Ekunwe L., Hirschhorn JN., Henderson BE., Myers S., Haiman CA., Reich D., Patterson N., Wilson JG., Price AL.
While genome-wide association studies (GWAS) have primarily examined populations of European ancestry, more recent studies often involve additional populations, including admixed populations such as African Americans and Latinos. In admixed populations, linkage disequilibrium (LD) exists both at a fine scale in ancestral populations and at a coarse scale (admixture-LD) due to chromosomal segments of distinct ancestry. Disease association statistics in admixed populations have previously considered SNP association (LD mapping) or admixture association (mapping by admixture-LD), but not both. Here, we introduce a new statistical framework for combining SNP and admixture association in case-control studies, as well as methods for local ancestry-aware imputation. We illustrate the gain in statistical power achieved by these methods by analyzing data from 6,209 unrelated African Americans from the CARe project genotyped on the Affymetrix 6.0 chip, in conjunction with both simulated and real phenotypes, as well as by analyzing the FGFR2 locus using breast cancer GWAS data from 5,761 African-American women. We show that, at typed SNPs, our method yields an 8% increase in statistical power for finding disease risk loci compared to the power achieved by standard methods in case-control studies. At imputed SNPs, we observe an 11% increase in statistical power for mapping disease loci when our local ancestry-aware imputation framework and the new scoring statistic are jointly employed. Finally, we show that our method increases statistical power in regions harboring the causal SNP when the causal SNP is untyped and cannot be imputed. Our methods and our publicly available software are broadly applicable to GWAS in admixed populations.
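
The abstract does not spell out the scoring statistic, so the following is only a minimal sketch of one generic way to combine two sources of evidence at a locus. It is not the authors' method or software. It assumes (hypothetically) that the SNP association statistic and the admixture association statistic are each chi-square distributed with 1 degree of freedom under the null and are independent of each other, in which case their sum is chi-square distributed with 2 degrees of freedom. All function names and numbers below are illustrative.

```python
import math


def chi2_sf_1df(x: float) -> float:
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))


def chi2_sf_2df(x: float) -> float:
    """Upper-tail probability of a chi-square variable with 2 degrees of freedom."""
    return math.exp(-x / 2.0)


def combined_p_value(snp_chi2: float, admixture_chi2: float) -> float:
    """P-value for the sum of two statistics.

    Assumption (not from the source): both inputs are independent 1-df
    chi-square statistics under the null, so their sum has 2 df.
    """
    return chi2_sf_2df(snp_chi2 + admixture_chi2)


if __name__ == "__main__":
    # Made-up statistics for a single locus, for illustration only.
    snp_chi2 = 20.0
    admixture_chi2 = 12.0
    print("SNP-only p-value:      ", chi2_sf_1df(snp_chi2))
    print("Admixture-only p-value:", chi2_sf_1df(admixture_chi2))
    print("Combined p-value:      ", combined_p_value(snp_chi2, admixture_chi2))
```

Summing the statistics costs an extra degree of freedom, so a combined test of this simple form gains power only when both signals carry evidence. Whether the framework described above uses this or a different construction is not stated in the abstract.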