Most patients with rare diseases do not receive a molecular diagnosis, and the aetiological variants and causative genes for more than half of such disorders remain to be discovered [1]. Here we used whole-genome sequencing (WGS) in a national health system to streamline diagnosis and to discover unknown aetiological variants in the coding and non-coding regions of the genome. We generated WGS data for 13,037 participants, of whom 9,802 had a rare disease, and provided a genetic diagnosis to 1,138 of the 7,065 extensively phenotyped participants. We identified 95 Mendelian associations between genes and rare diseases, of which 11 have been discovered since 2015 and at least 79 are confirmed to be aetiological. By generating WGS data of UK Biobank participants [2], we found that rare alleles can explain the presence of some individuals in the tails of a quantitative trait for red blood cells. Finally, we identified four novel non-coding variants that cause disease through the disruption of transcription of ARPC1B, GATA1, LRBA and MPL. Our study demonstrates the synergy of using WGS for both diagnosis and aetiological discovery in routine healthcare.

Original publication




Type: Journal article

Pages: 96 - 102

Department of Haematology, University of Cambridge, Cambridge Biomedical Campus, Cambridge, UK.

NIHR BioResource for the 100,000 Genomes Project; Erythrocytes; Humans; Rare Diseases; Adaptor Proteins, Signal Transducing; Phenotype; Alleles; Quantitative Trait Loci; Internationality; Databases, Factual; National Health Programs; State Medicine; GATA1 Transcription Factor; Actin-Related Protein 2-3 Complex; Receptors, Thrombopoietin; United Kingdom; Whole Genome Sequencing