1
RNA-seq: the future of transcriptomics...?
Disclaimer: Tiago Hori is not an expert on RNA-seq
2
RNA-seq, or RNA sequencing, is not a completely novel idea (Wang et al., 2009).
- Earlier tag-based methods: SAGE, long-SAGE, MPSS
- Recent developments in next-generation sequencing (NGS) have made whole-transcriptome analyses more accessible.
- Does it work? Comparison with microarrays
- Advantages and disadvantages
- How does it work?
- Challenges
- Are microarrays going to go extinct?
Weapons of choice:
3
Marioni et al., 2008
- Microarray intensities correlate well with RNA-seq count data.
- Fold-changes from Affymetrix arrays also correlate well with fold-changes from Illumina-based RNA-seq.
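The slide does not say which correlation measure was used. As a minimal sketch, assuming a rank-based (Spearman) correlation and entirely made-up values for five hypothetical genes, the comparison between array intensities and read counts could look like this:

```python
import math

def ranks(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

# Made-up illustration data (not from Marioni et al.)
array_intensity = [120.0, 850.0, 40.0, 2300.0, 610.0]
read_counts = [15, 160, 3, 900, 95]
rho = spearman(array_intensity, read_counts)
print(f"Spearman rho = {rho:.3f}")
```

A rank-based measure is a convenient choice here because intensities and counts live on different scales; it only asks whether the two platforms order the genes the same way.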
4
The Pros and Cons of RNA-seq – do the benefits definitely outweigh the problems?
Advantages (Wang et al., 2009):
- Identifies not only differentially expressed genes but also differential allelic expression, SNPs, splice variants, and new genes or isoforms.
- It is not limited to a fixed set of probes.
- It is not affected by the background signal or saturation that cause problems when studying high- and low-expression transcripts.
5
The Pros and Cons of RNA-seq – do the benefits definitely outweigh the problems?
Disadvantages:
- Cost
- Dependence on a reference genome or transcriptome*
- Large amounts of data, requiring large storage space and computational power
- Statistical methods are still in their infancy
* See Trapnell et al., 2010, Nature Biotechnology (used 430 million paired-end reads to assemble a transcriptome de novo).
6
How does it work? (Ozsolak and Milos, 2011)
Agilent polyA selection NimbleGen selection array
Generation of target cDNA (sequence-specific, e.g. for allele discrimination)
Helicos sequencing
7
How does it work? Oshlack et al., 2010
8
Mapping
Challenges:
- Computational power required
- Exon junctions
- Alleles and SNPs
Two main methods:
- Based on hash tables (local alignment, similar to BLAST)
- Based on prefix/suffix tries
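To make the hash-table approach concrete, here is a minimal sketch of seed-and-verify mapping. It is an illustration under simplifying assumptions (one seed per read, mismatches only, no gaps, a toy reference), not the algorithm of any particular aligner; the function names and the `max_mismatches` parameter are hypothetical.

```python
from collections import defaultdict

def build_kmer_index(reference, k):
    """Hash table from every k-mer in the reference to its start positions."""
    index = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return index

def map_read(read, reference, index, k, max_mismatches=1):
    """Seed with the read's first k-mer, then verify by counting mismatches."""
    hits = []
    for pos in index.get(read[:k], []):
        window = reference[pos:pos + len(read)]
        if len(window) < len(read):
            continue  # read would run off the end of the reference
        mismatches = sum(a != b for a, b in zip(read, window))
        if mismatches <= max_mismatches:
            hits.append((pos, mismatches))
    return hits

# Toy reference and a read carrying one substitution (e.g. a SNP or an error)
reference = "ACGTACGTTAGCCGATAGCTAGGCTA"
index = build_kmer_index(reference, k=4)
hits = map_read("TAGCCGTTAG", reference, index, k=4)
print(hits)  # list of (position, mismatch count) pairs
```

The seed lookup is a constant-time dictionary access, while the verification step is where tolerance for SNPs and sequencing errors comes in. A read whose mismatch falls inside the seed would be missed by this sketch, which is one reason real tools use several seeds per read.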
9
BFAST (Homer et al., 2009)
BWA-SW (Li and Durbin, 2010)
One of the biggest challenges in mapping is reducing the “RAM footprint” of the reference genome. Different tools accomplish this through different ways of indexing the reference. The other challenge is to map accurately while still allowing reads that differ from the reference (e.g. because of SNPs or sequencing errors) to be mapped.
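As an illustration of the indexing idea behind the prefix/suffix-trie family of mappers, the sketch below builds a Burrows-Wheeler-transform index and finds exact matches by backward search. It is a teaching sketch only: it keeps the full suffix array and a full occurrence table in memory, so it does not itself achieve the small footprint that real tools obtain by sampling these structures, and it handles exact matches only. All function names are hypothetical.

```python
def suffix_array(text):
    """Start positions of all suffixes, in sorted order (naive construction)."""
    return sorted(range(len(text)), key=lambda i: text[i:])

def bwt_index(reference):
    """Build suffix array, first-column offsets, and occurrence table."""
    text = reference + "$"  # "$" sorts before A, C, G, T
    sa = suffix_array(text)
    bwt = "".join(text[i - 1] for i in sa)  # character preceding each suffix
    alphabet = sorted(set(text))
    # first[c]: number of characters in text that sort strictly before c
    first, total = {}, 0
    for c in alphabet:
        first[c] = total
        total += text.count(c)
    # occ[c][i]: number of occurrences of c in bwt[:i]
    occ = {c: [0] * (len(bwt) + 1) for c in alphabet}
    for i, ch in enumerate(bwt):
        for c in alphabet:
            occ[c][i + 1] = occ[c][i] + (ch == c)
    return sa, first, occ

def exact_match_positions(pattern, sa, first, occ):
    """Backward search: narrow the suffix-array interval one character at a time."""
    lo, hi = 0, len(sa)  # half-open interval [lo, hi)
    for c in reversed(pattern):
        if c not in first:
            return []
        lo = first[c] + occ[c][lo]
        hi = first[c] + occ[c][hi]
        if lo >= hi:
            return []
    return sorted(sa[i] for i in range(lo, hi))

sa, first, occ = bwt_index("ACGTACGTTAGC")
print(exact_match_positions("ACG", sa, first, occ))
print(exact_match_positions("TAG", sa, first, occ))
```

The search cost depends on the read length rather than the reference length, which is what makes this kind of index attractive for large genomes.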
10
Data summarization: There are 3 main ways of summarizing your data:
- Counts per exon
- Counts per transcript
- Counts per gene
(Oshlack et al., 2010)
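The three levels of summarization can be sketched as follows. The annotation, coordinates, and read positions are invented for illustration, and each read is reduced to a single mapped position. Crediting a read in a shared exon to every transcript containing that exon is a deliberate simplification; dedicated tools resolve such ambiguity more carefully.

```python
from collections import Counter

# Hypothetical annotation: exon id -> (gene, transcripts using it, start, end),
# with half-open coordinates [start, end)
EXONS = {
    "e1": ("geneA", {"tx1", "tx2"}, 100, 200),
    "e2": ("geneA", {"tx1"}, 300, 400),
    "e3": ("geneA", {"tx2"}, 500, 600),
    "e4": ("geneB", {"tx3"}, 900, 1000),
}

def summarize(read_positions):
    """Count reads per exon, per transcript, and per gene."""
    per_exon, per_transcript, per_gene = Counter(), Counter(), Counter()
    for pos in read_positions:
        for exon_id, (gene, transcripts, start, end) in EXONS.items():
            if start <= pos < end:
                per_exon[exon_id] += 1
                per_gene[gene] += 1
                for tx in transcripts:
                    per_transcript[tx] += 1
                break  # exons do not overlap in this toy annotation
    return per_exon, per_transcript, per_gene

# Made-up mapped positions; the last one falls outside every exon
reads = [110, 150, 320, 510, 550, 590, 950, 2000]
per_exon, per_transcript, per_gene = summarize(reads)
print(dict(per_exon))
print(dict(per_transcript))
print(dict(per_gene))
```

Note how the same eight reads give different pictures at each level: geneA looks uniform at the gene level, while the exon counts hint that tx2 is contributing more reads than tx1.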
11
Normalization: Is RNA-seq data absolute mRNA count?
Within libraries:
- Length bias
- Sequencing efficiency
Between libraries:
- Sequencing depth
- Over-representation of highly expressed transcripts
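Two of the effects listed above, sequencing depth and length bias, can be illustrated with simple scaling. The sketch below uses counts per million (CPM) and reads per kilobase per million mapped reads (RPKM); these particular measures are not named on the slide and are chosen here as an assumption, and the gene names, counts, and lengths are made up.

```python
def cpm(counts):
    """Counts per million: rescales for sequencing depth only."""
    total = sum(counts.values())
    return {gene: c * 1e6 / total for gene, c in counts.items()}

def rpkm(counts, lengths_bp):
    """Reads per kilobase per million mapped reads: depth and length."""
    total = sum(counts.values())
    return {
        gene: c * 1e9 / (lengths_bp[gene] * total)
        for gene, c in counts.items()
    }

# Made-up library: geneA and geneB have equal counts but different lengths
counts = {"geneA": 500, "geneB": 500, "geneC": 1000}
lengths_bp = {"geneA": 1000, "geneB": 2000, "geneC": 4000}

cpm_values = cpm(counts)
rpkm_values = rpkm(counts, lengths_bp)
for gene in counts:
    print(gene, cpm_values[gene], rpkm_values[gene])
```

Neither measure addresses the over-representation of highly expressed transcripts: a few very abundant transcripts inflate the library total and push every other gene's scaled value down, so between-library comparisons generally need a more robust scaling factor.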
12
Differential Expression detection:
Challenges:
- Requires biological replication, but perhaps not technical replication.
- Count data are discrete rather than continuous.
- There is evidence that count data follow a negative binomial distribution, which resembles the Poisson distribution but allows the variance to exceed the mean.
- Accounting for type I error across many tests (false discovery)
Bioconductor packages:
- edgeR: developed for SAGE; uses a modified Fisher exact test for overdispersed data (means and variances estimated using maximum likelihood)
- DESeq: similar to edgeR, but uses a different model to estimate means and variances (empirical estimation of the mean-variance relationship)
- baySeq: empirical Bayes inference to test for differential expression
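The false-discovery point above can be illustrated with the Benjamini-Hochberg procedure. The slide does not name a specific correction, so this choice is an assumption, and the p-values are made up; the sketch is independent of how the per-gene tests themselves were computed.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values stay monotone
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Made-up p-values for four hypothetical genes
p_values = [0.01, 0.04, 0.03, 0.20]
adjusted = benjamini_hochberg(p_values)
for p, q in zip(p_values, adjusted):
    print(f"p = {p:.2f}  adjusted = {q:.4f}")
```

Genes whose adjusted value falls below a chosen threshold (0.05 is a common convention) would be reported as differentially expressed, with the threshold interpreted as the expected proportion of false positives among the reported genes.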
13
What do you do with data and what does it all mean?
Systems Biology:
- DAVID and other tools from microarray analysis can be used for GO enrichment
- KEGG pathways
Resources:
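The GO enrichment step mentioned on this slide usually comes down to asking whether a term is over-represented among the differentially expressed genes. A minimal sketch using a plain one-sided hypergeometric test is shown below; this is an assumed, simplified stand-in and not necessarily the exact statistic that DAVID or any other tool reports. The numbers are made up.

```python
from math import comb

def hypergeom_enrichment_p(total_genes, genes_in_term, selected, selected_in_term):
    """P(X >= selected_in_term) when `selected` genes are drawn without
    replacement from `total_genes`, of which `genes_in_term` carry the term."""
    upper = min(genes_in_term, selected)
    numerator = sum(
        comb(genes_in_term, k) * comb(total_genes - genes_in_term, selected - k)
        for k in range(selected_in_term, upper + 1)
    )
    return numerator / comb(total_genes, selected)

# Made-up example: 20 genes in total, 5 annotated with the term,
# 4 genes called differentially expressed, 3 of which carry the term
p = hypergeom_enrichment_p(total_genes=20, genes_in_term=5,
                           selected=4, selected_in_term=3)
print(f"enrichment p-value = {p:.4f}")
```

Because many terms are tested at once, the resulting p-values would normally be corrected for multiple testing before any term is called enriched.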