Intergenic disease-associated regions are abundant in novel transcripts.
Bartonicek N., Clark MB., Quek XC., Torpy JR., Pritchard AL., Maag JLV., Gloss BS., Crawford J., Taft RJ., Hayward NK., Montgomery GW., Mattick JS., Mercer TR., Dinger ME.
Genotyping of large populations through genome-wide association studies (GWAS) has successfully identified many genomic variants associated with traits or disease risk. Unexpectedly, a large proportion of GWAS single nucleotide polymorphisms (SNPs) and associated haplotype blocks are in intronic and intergenic regions, hindering their functional evaluation. While some of these risk-susceptibility regions encompass cis-regulatory sites, their transcriptional potential has never been systematically explored.To detect rare tissue-specific expression, we employed the transcript-enrichment method CaptureSeq on 21 human tissues to identify 1775 multi-exonic transcripts from 561 intronic and intergenic haploblocks associated with 392 traits and diseases, covering 73.9 Mb (2.2%) of the human genome. We show that a large proportion (85%) of disease-associated haploblocks express novel multi-exonic non-coding transcripts that are tissue-specific and enriched for GWAS SNPs as well as epigenetic markers of active transcription and enhancer activity. Similarly, we captured transcriptomes from 13 melanomas, targeting nine melanoma-associated haploblocks, and characterized 31 novel melanoma-specific transcripts that include fusion proteins, novel exons and non-coding RNAs, one-third of which showed allelically imbalanced expression.This resource of previously unreported transcripts in disease-associated regions ( http://gwas-captureseq.dingerlab.org ) should provide an important starting point for the translational community in search of novel biomarkers, disease mechanisms, and drug targets.