Characterising the loss-of-function impact of 5’ untranslated region variants in 15,708 individuals
Whiffin N., Karczewski KJ., Zhang X., Chothani S., Smith MJ., Evans DG., Roberts AM., Quaife NM., Schafer S., Rackham O., Alföldi J., O’Donnell-Luria AH., Francioli LC., Armean IM., Banks E., Bergelson L., Cibulskis K., Collins RL., Connolly KM., Covarrubias M., Cummings B., Daly MJ., Donnelly S., Farjoun Y., Ferriera S., Gabriel S., Gauthier LD., Gentry J., Gupta N., Jeandet T., Kaplan D., Laricchia KM., Llanwarne C., Minikel EV., Munshi R., Neale BM., Novod S., Petrillo N., Poterba T., Roazen D., Ruano-Rubio V., Saltzman A., Samocha KE., Schleicher M., Seed C., Solomonson M., Soto J., Tiao G., Tibbetts K., Tolonen C., Vittal C., Wade G., Wang A., Wang Q., Watts NA., Weisburd B., Aguilar Salinas CA., Ahmad T., Albert CM., Ardissino D., Atzmon G., Barnard J., Beaugerie L., Benjamin EJ., Boehnke M., Bonnycastle LL., Bottinger EP., Bowden DW., Bown MJ., Chambers JC., Chan JC., Chasman D., Cho J., Chung MK., Cohen B., Correa A., Dabelea D., Daly MJ., Darbar D., Duggirala R., Dupuis J., Ellinor PT., Elosua R., Erdmann J., Esko T., Färkkilä M., Florez J., Franke A., Getz G., Glaser B., Glatt SJ., Goldstein D., Gonzalez C., Groop L., Haiman C., Hanis C., Harms M., Hiltunen M., Holi MM., Hultman CM., Kallela M., Kaprio J., Kathiresan S., Kim B-J., Kim YJ., Kirov G., Kooner J., Koskinen S., Krumholz HM., Kugathasan S., Kwak SH., Laakso M., Lehtimäki T., Loos RJF., Lubitz SA., Ma RCW., Marrugat J., Mattila KM., McCarroll S., McCarthy MI., McGovern D., McPherson R., Meigs JB., Melander O., Metspalu A., Neale BM., Nilsson PM., O’Donovan MC., Ongur D., Orozco L., Owen MJ., Palmer CNA., Palotie A., Park KS., Pato C., Pulver AE., Rahman N., Remes AM., Rioux JD., Ripatti S., Roden DM., Saleheen D., Salomaa V., Samani NJ., Scharf J., Schunkert H., Shoemaker MB., Sklar P., Soininen H., Sokol H., Spector T., Sullivan PF., Suvisaari J., Tai ES., Teo YY., Tiinamaija T., Tsuang M., Turner D., Tusie-Luna T., Vartiainen E., Watkins H., Weersma RK., Wessman M., Wilson JG., Xavier RJ., 
Vawter MP., Cook SA., Barton PJR., MacArthur DG., Ware JS.
Abstract
Upstream open reading frames (uORFs) are tissue-specific cis-regulators of protein translation. Isolated reports have shown that variants that create or disrupt uORFs can cause disease. Here, in a systematic genome-wide study using 15,708 whole genome sequences, we show that variants that create new upstream start codons, and variants disrupting stop sites of existing uORFs, are under strong negative selection. This selection signal is significantly stronger for variants arising upstream of genes intolerant to loss-of-function variants. Furthermore, variants creating uORFs that overlap the coding sequence show signals of selection equivalent to coding missense variants. Finally, we identify specific genes where modification of uORFs likely represents an important disease mechanism, and report a novel uORF frameshift variant upstream of NF2 in neurofibromatosis. Our results highlight uORF-perturbing variants as an under-recognised functional class that contributes to penetrant human disease, and demonstrate the power of large-scale population sequencing data in studying non-coding variant classes.
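To make the variant classes named above concrete, the following is a minimal illustrative sketch, not the pipeline used in this study. It assumes a 5' UTR and coding sequence given as plain DNA strings on the transcript strand, 0-based positions, and single-nucleotide substitutions only. The function names and the three category labels are hypothetical and chosen for illustration.

```python
from typing import Optional

STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code


def new_uaug_start(utr: str, pos: int, alt: str) -> Optional[int]:
    """Return the 0-based start of an ATG created by substituting `alt`
    at `pos` in the 5' UTR, or None if no new ATG is created."""
    utr, alt = utr.upper(), alt.upper()
    mutated = utr[:pos] + alt + utr[pos + 1:]
    # Only the three codon windows that contain `pos` can change.
    for start in range(max(0, pos - 2), min(pos, len(utr) - 3) + 1):
        if mutated[start:start + 3] == "ATG" and utr[start:start + 3] != "ATG":
            return start
    return None


def classify_uaug(utr: str, cds: str, uaug_start: int) -> str:
    """Classify the reading frame opened by an upstream ATG (illustrative labels).

    `utr` is the (already mutated) 5' UTR and `cds` the coding sequence.
    """
    seq = (utr + cds).upper()
    cds_start = len(utr)
    # Walk codon by codon from the upstream ATG looking for an in-frame stop.
    for i in range(uaug_start, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            if i + 3 <= cds_start:
                return "uORF contained in the 5' UTR"
            break  # the first stop lies beyond the start of the CDS
    if (cds_start - uaug_start) % 3 == 0:
        return "in-frame N-terminal extension"
    return "out-of-frame overlapping ORF"


if __name__ == "__main__":
    cds = "ATGGCTTAA"          # toy coding sequence (assumption)
    ref_utr = "CCACGGCCTGACC"  # toy 5' UTR (assumption)
    start = new_uaug_start(ref_utr, 3, "T")  # ACG -> ATG
    mutated = ref_utr[:3] + "T" + ref_utr[4:]
    print(start, classify_uaug(mutated, cds, start))
```

In this toy example the substitution turns ACG into ATG and the new reading frame terminates at an in-frame TGA before the coding sequence begins. Shortening or lengthening the UTR changes the frame relative to the coding sequence and therefore the category returned.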