Global Biobank analyses provide lessons for developing polygenic risk scores across diverse cohorts
Wang Y., Namba S., Lopera E., Kerminen S., Tsuo K., Läll K., Kanai M., Zhou W., Wu KH., Favé MJ., Bhatta L., Awadalla P., Brumpton B., Deelen P., Hveem K., Lo Faro V., Mägi R., Murakami Y., Sanna S., Smoller JW., Uzunovic J., Wolford BN., Wu KHH., Rasheed H., Hirbo JB., Bhattacharya A., Zhao H., Surakka I., Lopera-Maya EA., Chapman SB., Karjalainen J., Kurki M., Mutaamba M., Partanen JJ., Brumpton BM., Chavan S., Chen TT., Daya M., Ding Y., Feng YCA., Gignoux CR., Graham SE., Hornsby WE., Ingold N., Johnson R., Laisk T., Lin K., Lv J., Millwood IY., Palta P., Pandit A., Preuss MH., Thorsteinsdottir U., Zawistowski M., Zhong X., Campbell A., Crooks K., de Bock GH., Douville NJ., Finer S., Fritsche LG., Griffiths CJ., Guo Y., Hunt KA., Konuma T., Marioni RE., Nomdo J., Patil S., Rafaels N., Richmond A., Shortt JA., Straub P., Tao R., Vanderwerff B., Barnes KC., Boezen M., Chen Z., Chen CY., Cho J., Smith GD., Finucane HK., Franke L., Gamazon ER., Ganna A., Gaunt TR., Ge T., Huang H., Huffman J., Koskela JT., Lajonchere C., Law MH., Li L., Lindgren CM., Loos RJF., MacGregor S., Matsuda K., Olsen CM., Porteous DJ., Shavit JA., Snieder H.
Polygenic risk scores (PRSs) have been widely explored in precision medicine. However, few studies have thoroughly investigated best practices for PRSs in global populations across different diseases. Here, we used data from the Global Biobank Meta-analysis Initiative (GBMI) to explore methodological considerations and PRS performance in 9 different biobanks for 14 disease endpoints. Specifically, we constructed PRSs using pruning and thresholding (P + T) and PRS-continuous shrinkage (PRS-CS). For both methods, using a European-based linkage disequilibrium (LD) reference panel resulted in comparable or higher prediction accuracy compared with several other non-European-based panels. PRS-CS overall outperformed the classic P + T method, especially for endpoints with higher SNP-based heritability. Notably, prediction accuracy was heterogeneous across endpoints, biobanks, and ancestries, especially for asthma, which has known variation in disease prevalence across populations. Overall, we provide lessons for PRS construction, evaluation, and interpretation using GBMI resources and highlight the importance of best practices for PRS in the biobank-scale genomics era.
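For readers unfamiliar with the P + T approach named above, the following is a minimal illustrative sketch, not the GBMI pipeline. It assumes a precomputed pairwise LD matrix of r² values from a reference panel and ignores the physical-distance window that real clumping tools apply. The function names (`prune_and_threshold`, `polygenic_score`), the thresholds, and the toy data are all hypothetical.

```python
import numpy as np

def prune_and_threshold(pvals, ld_r2, p_threshold=1e-3, r2_threshold=0.1):
    """Simplified P + T: greedily keep the most significant variants that
    pass the p-value threshold and are not in LD with an already kept one.
    Assumption: ld_r2 is a symmetric matrix of pairwise r^2 values."""
    kept = []
    for j in np.argsort(pvals):           # most significant variants first
        if pvals[j] > p_threshold:         # thresholding step
            break
        if all(ld_r2[j, k] < r2_threshold for k in kept):  # pruning step
            kept.append(int(j))
    return sorted(kept)

def polygenic_score(genotypes, betas, kept):
    """Weighted sum of allele counts over the retained variants."""
    return genotypes[:, kept] @ betas[kept]

# Toy data (hypothetical): 4 variants, 2 individuals.
betas = np.array([0.5, 0.4, -0.3, 0.1])      # GWAS effect sizes
pvals = np.array([1e-9, 1e-6, 1e-4, 0.5])    # GWAS p-values
ld_r2 = np.eye(4)
ld_r2[0, 1] = ld_r2[1, 0] = 0.8              # variants 0 and 1 are in high LD
genotypes = np.array([[2.0, 1.0, 0.0, 1.0],
                      [0.0, 2.0, 1.0, 0.0]]) # allele counts per individual

kept = prune_and_threshold(pvals, ld_r2)
scores = polygenic_score(genotypes, betas, kept)
```

In this toy example, variant 1 is dropped because it is in high LD with the more significant variant 0, and variant 3 is dropped by the p-value threshold. PRS-CS, by contrast, retains all variants and shrinks their effect sizes using a continuous shrinkage prior rather than selecting a subset.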