The generalized transcription of the genome Víctor Gámez Visairas Genomics Course 2014/15
INTRODUCTION Human genome: – Non-transcribed – Transcribed (coding and non-coding) Only 5% are protein-coding genes Dark matter? Nowadays: genes 1990s: genes
Non-coding RNA (ncRNA) Difficulties in defining a non-coding transcript Overlapping What fraction of all intergenic sequence in the human genome is transcribed into stable noncoding RNA products? What are their sequences and expression patterns? Function? Ribosomes, tRNA, snRNA… (Kapranov et al., Science 2007)
cDNA Sanger sequencing FANTOM project findings (early 2000s) – Majority of transcripts contain single exons (no splicing) – Poorly conserved – Low expression levels Do they have a real function?
Tiling Arrays This technology detects transcription using probes that are regularly spaced on the genome Transcribed dark matter was found by tiling microarrays to be even more abundant in human, mouse and other genomes Limitation: transcription outside known genes
RNA-seq Tens of millions of fragment reads are mapped to a reference genome sequence and intersected with existing or novel transcript and gene annotations Both the proportion of reads that contribute to transcribed dark matter (‘dark matter mass’) and the fraction of the genome sequence covered by such reads (‘dark matter coverage’) can be calculated.
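The two quantities defined on this slide can be illustrated with a minimal Python sketch. It assumes reads and annotated genes lie on a single chromosome and are given as half-open (start, end) intervals. The function names and toy coordinates are hypothetical and not taken from any cited pipeline. Coverage is expressed here relative to all transcribed bases; dividing by the genome length instead would give the fraction of the genome sequence covered.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) on one chromosome


def merge(intervals: List[Interval]) -> List[Interval]:
    """Merge overlapping or adjacent intervals into a sorted, disjoint list."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def covered_bases(intervals: List[Interval]) -> int:
    """Count the bases covered by at least one interval."""
    return sum(end - start for start, end in merge(intervals))


def overlaps_any(read: Interval, genes: List[Interval]) -> bool:
    """True if the read shares at least one base with any annotated gene."""
    return any(read[0] < g_end and g_start < read[1] for g_start, g_end in genes)


def dark_matter_stats(reads: List[Interval], genes: List[Interval]) -> Tuple[float, float]:
    """Return (dark matter mass, dark matter coverage) for mapped reads."""
    if not reads:
        raise ValueError("at least one mapped read is required")
    dark = [r for r in reads if not overlaps_any(r, genes)]
    mass = len(dark) / len(reads)                         # share of reads outside genes
    coverage = covered_bases(dark) / covered_bases(reads)  # share of transcribed bases outside genes
    return mass, coverage


# Toy data (hypothetical): one gene with four stacked reads, plus two isolated reads.
genes = [(100, 200)]
reads = [(100, 150), (110, 160), (120, 170), (150, 200), (300, 350), (400, 450)]
mass, coverage = dark_matter_stats(reads, genes)
print(f"mass={mass:.2f} coverage={coverage:.2f}")
```

In the toy data, reads pile up on the annotated gene while the reads outside it are sparse, so mass comes out lower than coverage. This mirrors the pattern of low mass and high coverage reported for real data.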
RNA-seq Dark matter mass is thus relatively low, consistent with previous observations from cDNA sequencing and tiling arrays that ncRNAs are expressed at low levels. Dark matter coverage, on the other hand, is relatively high, with over a quarter of all transcribed regions not overlapping known genes (Ponting et al., Hum. Mol. Gen. 2010)
RNA-seq Underestimation: – Ambiguous gene annotations – Antisense transcripts – Transposable elements (TEs) contained within non-coding transcripts
FUNCTIONALITY Together, cDNA sequencing, tiling arrays and RNA-seq approaches have identified thousands of long (>200 bp) intergenic ncRNA (lincRNA) loci in the human and mouse genomes. Differential expression among tissues. Purifying selection? Low-abundance lincRNAs may act in cis. In contrast, lincRNAs that have stable secondary structures and act in trans are perhaps more likely to be abundant.
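The working definition of a lincRNA locus used here (long, non-coding, intergenic) can be written as a simple filter. The sketch below is illustrative only and is not the method of the cited studies. It assumes each transcript already carries a coding/non-coding flag from upstream annotation, that all features lie on one chromosome, and that coordinates are half-open. The `Transcript` class, field names and toy values are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple

MIN_LENGTH = 200  # lincRNAs are defined as longer than 200 bp


@dataclass
class Transcript:
    name: str
    start: int    # genomic start, half-open interval
    end: int      # genomic end
    length: int   # length of the mature transcript in bp
    coding: bool  # assumed to come from an upstream coding-potential call


def is_lincrna_candidate(t: Transcript, genes: List[Tuple[int, int]]) -> bool:
    """Long, non-coding, and not overlapping any annotated protein-coding gene."""
    if t.coding or t.length <= MIN_LENGTH:
        return False
    intergenic = not any(t.start < g_end and g_start < t.end for g_start, g_end in genes)
    return intergenic


# Toy annotation (hypothetical coordinates).
genes = [(1000, 5000)]
transcripts = [
    Transcript("t1", 6000, 6500, 450, coding=False),  # long, intergenic, non-coding
    Transcript("t2", 1200, 1800, 500, coding=False),  # overlaps a gene: genic, not intergenic
    Transcript("t3", 7000, 7150, 150, coding=False),  # too short
    Transcript("t4", 8000, 9000, 900, coding=True),   # protein-coding
]
candidates = [t.name for t in transcripts if is_lincrna_candidate(t, genes)]
print(candidates)
```

A transcript such as `t2` would still count as a lncRNA, just a genic rather than an intergenic one, so this filter is deliberately narrower than the lncRNA class as a whole.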
Chromatin modification (Qu and Adelson, Frontiers in Gen. 2012)
Transcriptional regulation (Qu and Adelson, Frontiers in Gen. 2012)
Post-transcriptional regulation (Qu and Adelson, Frontiers in Gen. 2012)
lncRNAs in Human Disease p53 response through regulation in trans lncRNAs in GWAS: diabetes, gliomas, coronary diseases Missing heritability? lncRNA-specific drugs Use as biomarkers?
Conclusions We now know that the human genome contains thousands of lncRNAs, both genic and intergenic. This new class of non-protein-coding RNAs (ncRNAs) lacks functional ORFs, is modestly conserved and seems to regulate protein-coding gene expression both negatively and positively, in cis and in trans. Diverse mechanisms of action have been observed. Main goal: characterize their status and functionality. The concept of a ‘gene’ will increasingly appear incomplete and overly simplistic.
References
Derrien et al., The long non-coding RNAs: a new (p)layer in the dark matter. Frontiers in Genetics. January (2).
Qu and Adelson, Evolutionary conservation and functional roles of ncRNA. Frontiers in Genetics. October (3) 2012.
Ponting et al., Transcribed dark matter: meaning or myth? Human Molecular Genetics. (19) 2010.
Wilhelm et al., Defining transcribed regions using RNA-seq. Nature Protocols. (5).
Shapiro et al., The coding and non-coding architecture of the Caulobacter crescentus genome. PLOS Genetics. July (10) 2014.
Cirulli et al., Screening the human exome: a comparison of whole genome and whole transcriptome sequencing. Genome Biology. (11) 2010.