Clinical genetic testing approaches typically focus on the ~1.5% of the genome that codes directly for protein, where deciphering the effect of an individual variant is relatively straightforward. Using this strategy, however, ~50% of rare disease remains genetically unexplained. A critical portion of the remaining genomic sequence has important roles in regulating protein expression; prioritising likely disease-relevant variants and regions in this non-coding sequence nevertheless presents a considerable challenge.
5’ untranslated regions (5’UTRs) are encoded directly upstream of protein-coding regions and regulate both the stability of the mRNA and the rate at which it is translated into protein. Using whole-genome sequence data from 15,708 individuals in the Genome Aggregation Database (gnomAD), together with cohorts of disease patients, we systematically explored the deleteriousness of variants in 5’UTRs. This analysis identified a subset of 5’UTR variants that reduce protein translation and lead to dominant human disease.