Abstract

Polygenic scores (PGS) are individual-level measures that quantify the genetic contribution to a given trait. PGS have predominantly been developed using European ancestry samples, and recent studies have shown that the predictive performance of European ancestry-derived PGS is lower in non-European ancestry samples, reflecting differences in linkage disequilibrium, variant frequencies, and variant effects across populations. However, how best to maximize performance within any one ancestry group given the data available, and the extent to which this varies between traits, are largely unexplored. Here, we investigate the effect of sample size and ancestry composition on the predictive performance of PGS for fifteen traits in UK Biobank and evaluate an importance reweighting approach that aims to counteract the under-representation of certain groups within training data. We find that, for a minority of the traits, PGS estimated using a relatively small number of Black/Black British individuals outperformed, on a Black/Black British test set, scores estimated using a much larger number of White individuals. For example, a PGS for mean corpuscular volume trained on only Black individuals achieved a 4-fold improvement over a corresponding PGS trained on only White individuals. For the remainder of the traits, the reverse was true: a PGS for height trained on only Black/Black British individuals explained less than 0.5% of the variance in height in a Black/Black British test set, compared to 3.9% for a PGS trained on a much larger training set consisting of only White individuals.
We find that while importance weighting provides moderate benefit for some traits (for example, a 40% improvement for mean corpuscular volume compared to no reweighting), the improvement is modest in most cases, arguing that only targeted collection of data from underrepresented groups can address differences in PGS performance.

Author Summary

Polygenic scores (PGS) are individual-level measures that quantify the genetic contribution to a trait, such as height or blood pressure. Recent improvements in PGS performance have largely been limited to populations of European ancestry, reflecting the lack of ethnic diversity in genomic samples collected to date. Because of its potential negative impact on health inequalities, this lack of transferability across ancestries poses one of the most important technical and ethical challenges to the clinical utility and application of PGS. Although there have recently been promising improvements in PGS performance for underrepresented groups, a significant gap remains. In addition, while the growing availability of population-scale biobanks, such as UK Biobank, provides an opportunity to bridge part of this gap, the use of individual-level data within multiple-ancestry datasets is largely unexplored. In this paper, we evaluate the use of such datasets, combined with a novel reweighting approach, to improve predictive performance for underrepresented groups. We also consider how traits vary in terms of the best strategy for combining individuals of different ancestries to improve PGS performance. We find that, for a minority of traits, the optimal strategy for developing a PGS for Black or Black British individuals is to use only the small sample available for this ethnic group. For other traits, we find that reweighting has little effect and that the best strategy for the minority group is to use the largest training set, which contains only White individuals.
Importantly, even given the optimal strategy, a large gap in PGS performance remains, indicating that only targeted collection of data from underrepresented groups can address differences in PGS performance.
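To make the idea of importance reweighting concrete, the sketch below upweights an underrepresented group in a penalized regression of a trait on genotypes. It is an illustration only, not the method used in this paper: the weighting rule, the ridge penalty, and the names `importance_weights`, `weighted_ridge`, `target_share`, and `penalty` are all assumptions made for this example, and genotypes and trait values are assumed to be centered so that no intercept is needed.

```python
import numpy as np


def importance_weights(groups, target_group, target_share):
    """Per-sample weights giving `target_group` a chosen share of total weight.

    Hypothetical weighting rule for illustration: every sample in the target
    group gets the same weight, every other sample gets another, and the
    weights are rescaled to average 1.
    """
    groups = np.asarray(groups)
    is_target = groups == target_group
    n_target = int(is_target.sum())
    n_other = int((~is_target).sum())
    if n_target == 0 or n_other == 0:
        raise ValueError("both the target group and the remainder must be non-empty")
    if not 0.0 < target_share < 1.0:
        raise ValueError("target_share must lie strictly between 0 and 1")
    weights = np.where(
        is_target,
        target_share / n_target,
        (1.0 - target_share) / n_other,
    )
    return weights * len(groups)


def weighted_ridge(X, y, weights, penalty=1.0):
    """Minimize sum_i w_i * (y_i - x_i @ beta)**2 + penalty * ||beta||**2.

    Closed-form solution: beta = (X^T W X + penalty * I)^{-1} X^T W y.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    weights = np.asarray(weights, dtype=float)
    Xw = X * weights[:, None]  # rows of X scaled by their sample weights
    A = X.T @ Xw + penalty * np.eye(X.shape[1])
    b = Xw.T @ y
    return np.linalg.solve(A, b)
```

With eight samples from a majority group and two from a minority group, `importance_weights(groups, "minority", 0.5)` gives each minority sample four times the weight of each majority sample. Setting `target_share` to the minority group's actual proportion recovers unweighted ridge regression.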
Cold Spring Harbor Laboratory