1
mRNA-Seq: methods and applications
Jim Noonan GENE 760
2
Introduction to mRNA-seq
Technical methodology
Read mapping and normalization
Estimating isoform-level gene expression
De novo transcript reconstruction
Sensitivity and sequencing depth
Differential expression analysis
3
mRNA-seq workflow
Wang et al. Nat Rev Genet 10:57 (2009)
Martin and Wang Nat Rev Genet 12:671 (2011)
4
Illumina RNA-seq library preparation
Capture poly-A RNA with poly-T oligo-attached beads (100 ng total) (2x)
  RNA quality must be high: degradation produces 3’ bias
  Non-poly-A RNAs are not recovered
Fragment mRNA
Synthesize ds cDNA
Ligate adapters
Amplify
Generate clusters and sequence
5
Ribosomal RNA subtraction
RiboMinus
6
Mapping RNA-seq reads and quantifying transcripts
7
RNA-seq reads mapped to a reference genome
Normalization: reads per kilobase of feature length per million mapped reads (RPKM)
Quantify expression of known genes (counting)
  Gene-model level: composite of the whole gene vs. constitutive exons
  Differences that we see between samples could be due to splicing
  Methods exist for isoform-level expression values
  Transcriptome reconstruction: combination of TopHat and paired-end tags
What is a “feature”?
What about genomes with poor genome annotation?
What about species with no sequenced genome?
For a detailed comparison of normalization methods, see:
  Bullard et al. BMC Bioinformatics 11:94 (2010)
  Robinson and Oshlack. Genome Biol 11:R25 (2010)
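The RPKM definition above can be written out as a short calculation. This is a minimal sketch, not the code of any particular package: the function name `rpkm` and the example numbers are illustrative assumptions, and a "feature" here is simply whatever region (exon, composite gene model, isoform) the reads were counted over.

```python
def rpkm(read_count, feature_length_bp, total_mapped_reads):
    """Reads per kilobase of feature length per million mapped reads.

    All three inputs are assumed to come from the same sample; the
    function name and argument names are illustrative, not a library API.
    """
    if feature_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("feature length and library size must be positive")
    kilobases = feature_length_bp / 1_000          # feature length in kb
    millions = total_mapped_reads / 1_000_000      # library size in millions
    return read_count / kilobases / millions


# Hypothetical example: the same read count on a short and a long feature.
short_gene = rpkm(500, 2_000, 10_000_000)    # 2 kb feature
long_gene = rpkm(500, 20_000, 10_000_000)    # 20 kb feature
print(short_gene, long_gene)
```

With equal read counts, the longer feature gets the lower RPKM, which is the length correction the normalization is meant to provide.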
8
Quantifying gene expression by RNA-seq
Use existing gene annotation:
  Align to genome plus annotated splices
  Depends on high-quality gene annotation
  Which annotation to use: RefSeq, GENCODE, UCSC?
  Isoform quantification?
  Identifying novel transcripts?
Reference-guided alignments:
  Align to genome sequence
  Infer splice events from reads
  Allows transcriptome analyses of genomes with poor gene annotation
De novo transcript assembly:
  Assemble transcripts directly from reads
  Allows transcriptome analyses of species without reference genomes
9
Composite gene model approach
Map reads to genome
Map remaining reads to known splice junctions
Requires good gene models
Isoforms are ignored
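One way to picture a composite gene model is as the union of all annotated exons of a gene, with isoform structure discarded. The sketch below assumes half-open `(start, end)` coordinates on a single strand and uses made-up exon positions; the helper names are hypothetical, and real pipelines also handle strand, junction-spanning reads, and multi-mapping reads.

```python
def composite_exons(isoform_exons):
    """Merge the exons of all isoforms into non-overlapping intervals.

    isoform_exons: list of isoforms, each a list of (start, end) tuples
    in half-open coordinates. Overlapping or abutting exons are merged.
    """
    intervals = sorted(exon for exons in isoform_exons for exon in exons)
    merged = []
    for start, end in intervals:
        if merged and start <= merged[-1][1]:
            # Overlaps (or touches) the previous interval: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def composite_length(merged):
    """Total length in bp of the composite model."""
    return sum(end - start for start, end in merged)


def count_reads(merged, read_starts):
    """Count reads whose start position falls inside the composite model."""
    return sum(any(s <= pos < e for s, e in merged) for pos in read_starts)


# Hypothetical gene with two annotated isoforms.
isoforms = [
    [(100, 200), (300, 400)],
    [(150, 250), (300, 350), (500, 600)],
]
model = composite_exons(isoforms)
print(model, composite_length(model), count_reads(model, [120, 260, 310, 550, 700]))
```

Because every isoform contributes to one merged model, a change in isoform usage can shift the gene-level count without any change in overall transcription, which is why the slide notes that isoforms are ignored.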
10
Which gene annotation to use?
11
Strategies for transcript assembly
Garber et al. Nat Methods 8:469 (2011)
12
Splice-aware short read aligners
Martin and Wang Nat Rev Genet 12:671 (2011)
13
Reference based transcript assembly
Martin and Wang Nat Rev Genet 12:671 (2011)
14
Transcript assembly programs
Martin and Wang Nat Rev Genet 12:671 (2011)
15
Cufflinks: ab initio transcript assembly
Step 1: map reads to reference genome
Trapnell et al. Nat Biotechnol 28:511 (2010)
16
Cufflinks: ab initio transcript assembly
Isoform abundances estimated by maximum likelihood
Trapnell et al. Nat Biotechnol 28:511 (2010)
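To give a feel for maximum-likelihood isoform abundance estimation, here is a deliberately simplified expectation-maximization sketch. It is not the Cufflinks implementation: it assumes each read is simply "compatible" with a set of isoforms, that reads start uniformly along an isoform, and it ignores fragment length distributions and sequence bias. The function names and the toy data are hypothetical.

```python
def estimate_read_fractions(read_compat, lengths, n_iter=200):
    """EM estimate of the fraction of reads originating from each isoform.

    read_compat: list of sets, the isoforms each read is compatible with.
    lengths: dict mapping isoform name to length in bp.
    """
    isoforms = list(lengths)
    theta = {t: 1.0 / len(isoforms) for t in isoforms}  # uniform start
    for _ in range(n_iter):
        expected = {t: 0.0 for t in isoforms}
        # E-step: split each read among its compatible isoforms in
        # proportion to theta[t] / length[t] (uniform read start model).
        for compat in read_compat:
            weights = {t: theta[t] / lengths[t] for t in compat}
            total = sum(weights.values())
            if total == 0:
                continue  # no support left for any compatible isoform
            for t, w in weights.items():
                expected[t] += w / total
        # M-step: new read fractions are the normalized expected counts.
        n = sum(expected.values())
        theta = {t: expected[t] / n for t in isoforms}
    return theta


def relative_abundance(theta, lengths):
    """Convert read fractions to length-corrected relative abundances."""
    raw = {t: theta[t] / lengths[t] for t in theta}
    total = sum(raw.values())
    return {t: v / total for t, v in raw.items()}


# Toy data: 30 reads unique to A, 10 unique to B, 20 shared by both.
reads = [{"A"}] * 30 + [{"B"}] * 10 + [{"A", "B"}] * 20
lengths = {"A": 1000, "B": 1000}
theta = estimate_read_fractions(reads, lengths)
print(theta, relative_abundance(theta, lengths))
```

In this toy case the shared reads end up split in the same 3:1 ratio as the uniquely assigned reads, which is the intuition behind resolving ambiguous reads by likelihood rather than discarding them.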
17
Graph-based transcript assembly
Martin and Wang Nat Rev Genet 12:671 (2011)
18
Graph-based transcript assembly
Martin and Wang Nat Rev Genet 12:671 (2011)
19
Trinity: de novo transcript assembly
Grabherr et al. Nat Biotechnol 29:644 (2011)
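De novo assemblers such as Trinity build on de Bruijn graphs of k-mers extracted from the reads. The sketch below shows only that core idea on error-free toy reads: it builds the graph and greedily extends a contig while the path is unambiguous. It is not Trinity's algorithm, which adds several further stages; the function names, the reads, and the choice of k are all illustrative assumptions.

```python
from collections import defaultdict


def de_bruijn_graph(reads, k):
    """Build a de Bruijn graph: nodes are (k-1)-mers, edges are k-mers."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])  # prefix -> suffix
    return graph


def extend_contig(graph, start):
    """Walk forward from `start` while each node has exactly one successor."""
    contig = start
    node = start
    visited = {start}
    while len(graph.get(node, ())) == 1:
        (nxt,) = graph[node]
        if nxt in visited:       # stop on cycles
            break
        contig += nxt[-1]        # each step adds one base
        visited.add(nxt)
        node = nxt
    return contig


# Three overlapping, error-free toy reads from one hypothetical transcript.
reads = ["ATGGCGT", "GGCGTGC", "CGTGCAA"]
graph = de_bruijn_graph(reads, k=4)
print(extend_contig(graph, "ATG"))
```

In real data, branch points in the graph arise from alternative splicing, paralogs, and sequencing errors, and resolving them is where the assemblers differ.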
20
What depth of sequencing is required to characterize a transcriptome?
Wang et al. Nat Rev Genet 10:57 (2009)
21
Considerations
Gene length:
  Long genes are detected before short genes
Expression level:
  High expressors are detected before low expressors
Complexity of the transcriptome:
  Tissues with many cell types require more sequencing
Feature type:
  Composite gene models
  Common isoforms
  Rare isoforms
Detection vs. quantification:
  Obtaining confident expression level estimates (e.g., “stable” RPKMs) requires greater coverage
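The effect of gene length, expression level, and depth on detection can be illustrated by inverting the RPKM definition to get an expected read count, then treating the observed count as Poisson. This is a back-of-the-envelope sketch under that Poisson assumption (real libraries are noisier); the function names and the example values are hypothetical.

```python
import math


def expected_reads(rpkm, length_bp, total_mapped_reads):
    """Expected read count implied by an RPKM value at a given depth."""
    return rpkm * (length_bp / 1_000) * (total_mapped_reads / 1_000_000)


def detection_probability(rpkm, length_bp, total_mapped_reads, min_reads=1):
    """P(at least `min_reads` reads) assuming Poisson-distributed counts."""
    lam = expected_reads(rpkm, length_bp, total_mapped_reads)
    below = sum(
        math.exp(-lam) * lam ** k / math.factorial(k) for k in range(min_reads)
    )
    return 1.0 - below


# Same expression level (RPKM = 1), same depth, different gene lengths.
for length in (500, 5_000):
    p = detection_probability(1.0, length, 1_000_000)
    print(length, round(p, 3))
```

Raising `min_reads` mimics the difference between merely detecting a transcript and having enough reads for a stable expression estimate: the required depth grows quickly for short or lowly expressed features.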
22
Transcript detection is biased in favor of long genes
Tarazona et al. Genome Res 21:2213 (2011)
23
Applications of mRNA-seq
Characterizing transcriptome complexity
  Alternative splicing
Differential expression analysis
  Gene- and isoform-level expression comparisons
Novel RNA species
  lincRNAs and eRNAs
  Pervasive transcription
Translation
  Ribosome profiling
Allele-specific expression
  Effect of genetic variation on gene expression
  Imprinting
RNA editing
  Novel events
24
Alternative isoform regulation in human tissue transcriptomes
Wang et al. Nature 456:470 (2008)
25
Diversity of alternative splicing events in human tissues
Wang et al. Nature 456:470 (2008)
26
Differential expression
Garber et al. Nat Methods 8:469 (2011)
27
Programs for identifying DE genes in RNA-seq datasets
Program    Assumed distribution for count data    URL
DESeq      Negative binomial                      www-huber.embl.de/users/anders/DESeq/
DEGseq     Poisson
edgeR
baySeq
Cuffdiff                                          cufflinks.cbcb.umd.edu/
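The practical difference between the Poisson and negative binomial assumptions is how variance relates to the mean: Poisson fixes the variance at the mean, while the negative binomial adds a dispersion term, variance = mean + dispersion × mean². The sketch below estimates dispersion for one gene by a simple method of moments. It is only an illustration: the listed programs use more elaborate estimators that share information across genes, and the function names and replicate counts here are hypothetical. Counts are assumed to be already normalized for library size.

```python
from statistics import mean, variance


def nb_variance(mu, dispersion):
    """Negative binomial variance; dispersion = 0 reduces to Poisson."""
    return mu + dispersion * mu ** 2


def moment_dispersion(counts):
    """Method-of-moments dispersion estimate from replicate counts.

    Returns 0.0 when the sample variance does not exceed the mean,
    i.e. when the replicates look no noisier than Poisson.
    """
    m = mean(counts)
    v = variance(counts)  # sample variance (n - 1 denominator)
    if m == 0:
        return 0.0
    return max(0.0, (v - m) / m ** 2)


# Hypothetical normalized counts for one gene in three biological replicates.
replicates = [10, 20, 30]
alpha = moment_dispersion(replicates)
print(alpha, nb_variance(mean(replicates), alpha))
```

When biological replicates vary more than Poisson allows, a Poisson-based test will tend to understate the noise and call too many genes differentially expressed.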
28
Differential expression:
Characterizing transcriptome dynamics during brain development
Embryonic mouse cortex RNA-seq DEX
Neuronal functions: synaptic transmission, cell adhesion
Neuronal migration
“Stemness” functions: cell cycle, M phase, Sox2, Oct4
Ayoub et al. PNAS 108:14950 (2011)
29
Differential expression:
Characterizing transcriptome dynamics during brain development
Embryonic mouse cortex RNA-seq: differential (DE) isoforms
Ayoub et al. PNAS 108:14950 (2011)
30
Novel RNA species: annotating lincRNAs
Guttman et al. Nat Biotechnol 28:503 (2010)
31
Enhancer-associated RNAs (eRNAs)
Neurons treated with KCl
Kim et al. Nature 465:182 (2010)
32
Enhancer-associated RNAs (eRNAs)
Ren B. Nature 465:173 (2010)
33
How much of the genome is transcribed?
van Bakel et al. PLoS Biol. 8:e (2010)
34
Exploiting sequence information in RNA-seq reads
Majewski and Pastinen. Trends Genet 27:72 (2011)
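One way sequence information in reads is used is allele-specific expression: at a heterozygous site, reads carrying each allele are counted and compared with the balanced expectation. The sketch below runs an exact two-sided binomial test on such counts. It is an illustrative assumption rather than the method of the cited review; the function name and the read counts are hypothetical, and it ignores reference mapping bias and overdispersion, both of which matter in practice.

```python
from math import comb


def allelic_imbalance_pvalue(ref_reads, alt_reads, expected_ref_fraction=0.5):
    """Exact two-sided binomial test for allelic imbalance at one site.

    Sums the probabilities of all outcomes that are no more likely than
    the observed one under the expected reference-allele fraction.
    """
    n = ref_reads + alt_reads
    if n == 0:
        return 1.0  # no reads, no evidence either way
    p = expected_ref_fraction

    def pmf(k):
        return comb(n, k) * p ** k * (1 - p) ** (n - k)

    observed = pmf(ref_reads)
    # Small relative tolerance so ties are not lost to floating-point error.
    total = sum(pmf(k) for k in range(n + 1) if pmf(k) <= observed * (1 + 1e-9))
    return min(1.0, total)


# Hypothetical heterozygous sites: balanced vs. fully skewed read counts.
print(allelic_imbalance_pvalue(5, 5))
print(allelic_imbalance_pvalue(10, 0))
```

Setting `expected_ref_fraction` slightly above 0.5 is one simple way to allow for reads from the reference allele mapping more readily than reads from the alternate allele.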
35
Detecting variants that affect splicing
Pickrell et al. Nature 464:768 (2010)
36
mRNA-seq applications
Summary: mRNA-seq applications
Quantify transcriptome complexity and compare across biological states
Determine how transcriptomes are translated in different biological contexts
Effect of genetic variation on gene expression
Imprinting and RNA editing