NESCENT : NGS : Measuring expression


1 NESCENT : NGS : Measuring expression
Jen Taylor, Bioinformatics Team, CSIRO Plant Industry

2 Measuring Expression
What & Why: What is expression and why do we care?
How: Platforms / Technology (closed approaches: microarray; open approaches: sequencing); Experimental Design; Analysis (biases; bioinformatics; statistical issues and analysis)
In action: Workshop on detection of differential expression; case studies in plant functional genomics
CSIRO. Nescent August 2011 - Measuring Expression

3 What is expression / transcriptome ?
Diagram: DNA and the RNA species transcribed from it: mRNA, rRNA, tRNA, siRNA, microRNA, piRNA, tasiRNA, lncRNA

4 Beyond the Genome
1995: Human Genome sequencing begins in earnest ("Mapping the Book of Life") = approx. 140,000 genes
First Draft = 30,000 – 40,000 genes ??
Essential Completion = 24,195 genes !!!???
Image: commemorative stained glass window for F.C. Crick, designed by Maria McClafferty (photograph: Paul Forster), Gonville & Caius College, Cambridge, UK.

5 “The failure of the human genome”
“Despite more than 700 genome-scanning publications and nearly $100bn spent, geneticists still had not found more than a fractional genetic basis for human disease.” Manolio et al., Nature, 2009
“The most likely explanation for why genes for common diseases have not been found is that, with few exceptions, they do not exist. …., if inherited genes are not to blame for our commonest illnesses, can we find out what is?” Guardian, 2011

6 Beyond the Genome: Gene Number ≠ Complexity
Diagram labels: Gene; Transcriptome; Regulation; Complexity

7 Why the expression?
Diagram labels: Genome; Transcriptome; Proteome; Regulatory network**
Annotations: high-throughput friendly; predicts biology; context dependent
**Li et al., 2004

8 Measuring Expression?
Parts description: Function? Interconnectedness?
Comparisons: population-level; between genomes

9 Measuring Expression?
What are important members of a transcriptome?
mRNA: polyadenylated, coding, alternatively spliced
Noncoding RNA: small RNA of varying lengths and functions (18 – 32 bases), e.g. microRNA, siRNA, piRNA, tasiRNA; long non-coding RNA
"Dark" RNA: transcription outside of annotated genes; non-polyadenylated; anti-sense transcription

10 Measuring Expression?
How does the transcriptome vary to give rise to phenotype?
Changes in abundance: Abundance = Rate of Transcription – Rate of Decay
Changes in function: availability for function (polyadenylation, silencing, localisation); suitability for function (alternate splicing)

11 How to measure Expression
PLATFORMS / TECHNOLOGY

12 Measuring Expression : platforms
Closed systems (microarray): probes immobilised on a substrate profile target species in the transcriptome.

13 CSIRO. Nescent August 2011 - Measuring Expression

14 Single and two colour arrays
Workflow: Probe Library; Array Manufacture; Labelling; Hybridisation; Array Scanning
Two colour labelling: Control and Experimental samples
Single colour labelling: Sample A

15 Array profiling
Affymetrix Array                         Targets
Arabidopsis Genome                       24,000
C. elegans Genome                        ,500
Drosophila Genome                        ,500
E. coli Genome                           ,366
Human Genome U133 Plus                   47,000
Mouse Genome                             39,000
Yeast Genome: S. cerevisiae              5,841
Yeast Genome: S. pombe                   5,031
Rat Genome                               30,000
Zebrafish                                14,900
Plasmodium / Anopheles: P. falciparum    4,300
Plasmodium / Anopheles: A. gambiae       14,900
Others: Barley (25,500), Soybean (37, ,300 pathogen), Grape (15,700), Canine (21,700), Bovine (23,000), B. subtilis (5,000), S. aureus (3,300 ORFs), Xenopus (14,400)


18 Closed System – Microarray
Pros: high-throughput; targeted profiling; inexpensive ("population friendly"); analytical methods are standardised
Cons: "closed system", so novel = invisible; difficult to see allele-specific expression; biases due to hybridisation (SNPs; competitive and non-specific hybridisation)

19 Open systems – RNA Sequencing
Technology: Illumina, SOLiD, IonTorrent, 454
Pros: transcript discovery; allelic expression; high resolution abundance measures
Cons: analysis can be complex; expensive; sensitivity is sequencing depth dependent

20 RNA Sequencing
Mortazavi et al., 2008

21 RNASeq - Correspondence
Range > 5 orders of magnitude
Better detection of low abundance transcripts
Marioni et al., 2009

22 Platform Choice / Sample Preparation Choice
What do you want to profile?
Polyadenylated RNA: polyA RNA extraction
Small RNA (< 100 bases): size filtering by gel
Strand-specific
RNA – protein interactions: RNA immunoprecipitation (IP)

23 RNASeq - Workflow
Sample: Total RNA; PolyA RNA; Small RNA
Steps: Library Construction; Sequencing; Base calling & QC; Mapping to Genome or Assembly to Contigs
Outputs: Differential Expression; SNP detection; Transcript structure; Secondary structure; Targets or Products

24 Illumina RNASeq : TruSeq

25 Small RNA sequencing
Small RNA separation: PAGE (gel labels 134, 110, 75, 25); small RNA < 35 bp
Control of: adaptor removal (contaminants); theoretical distributions (GC content); sequencing artefacts; low quality and low complexity sequence removal; base call reliability (PHRED score); collapsing sequence redundancy
Resolved issues: Illumina adaptor version (one codon difference, 3 nt)
ATCTCGTATGCCGTCTTCTGCTTG  v1.5
TCGTATGCCGTCTTCTGCTTG  SRA
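The adaptor-version issue on this slide can be illustrated with a minimal sketch of 3' adaptor removal. The two adaptor strings are the ones shown on the slide; the function name `trim_3prime_adaptor`, the `min_overlap` parameter and the example insert are hypothetical, and the matching is exact (no mismatches allowed), which a production trimmer would normally relax.

```python
# Adaptor sequences as given on the slide.
ADAPTOR_V15 = "ATCTCGTATGCCGTCTTCTGCTTG"
ADAPTOR_SRA = "TCGTATGCCGTCTTCTGCTTG"


def trim_3prime_adaptor(read, adaptor, min_overlap=8):
    """Remove a 3' adaptor from a read using exact matching only.

    First look for the full adaptor anywhere in the read; failing that,
    look for a prefix of the adaptor (at least min_overlap bases long)
    at the very end of the read.
    """
    pos = read.find(adaptor)
    if pos != -1:
        return read[:pos]
    longest = min(len(adaptor), len(read)) - 1
    for overlap in range(longest, min_overlap - 1, -1):
        if read.endswith(adaptor[:overlap]):
            return read[:-overlap]
    return read  # no adaptor detected


# Hypothetical 22 nt small RNA insert followed by the v1.5 adaptor.
insert = "TGAGGTAGTAGGTTGTATAGTT"
read = insert + ADAPTOR_V15

trimmed_correct = trim_3prime_adaptor(read, ADAPTOR_V15)
# Trimming with the wrong adaptor version leaves 3 extra bases on the insert.
trimmed_wrong = trim_3prime_adaptor(read, ADAPTOR_SRA)
```

Trimming with the SRA adaptor when the library was built with v1.5 leaves the extra `ATC` attached to every insert, which shifts the apparent small RNA length distribution by 3 nt.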

26 Strand-specificity
Using adaptors. Ligation: 3' and 5' adaptors added sequentially
Using chemical modification. dUTP: addition and removal after selection
SMART: addition of C's on 5' end
Levin et al., 2010

27 Levin et al., 2010

28 Non-polyA methods
Total RNA extraction: ribosomal RNA and tRNA make up > 95-97% of total RNA
Ribosomal reduction methods: subtractive hybridisation with rRNA probes; exonuclease cleavage of rRNA; NuGen ("proprietary combination of reverse transcriptase and primers in the Ovation RNA-Seq System")
cDNA normalisation methods: partial digestion of any highly abundant species (Evrogen)

29 Platform Choice / Sample Preparation Choice
What do you want to profile?
Polyadenylated RNA: polyA RNA extraction
Small RNA (< 100 bases): size filtering by gel
Strand-specific
RNA – protein interactions: RNA immunoprecipitation (IP)
Non-polyA: rRNA reduction

30 EXPERIMENTAL DESIGN and ANALYSIS

31 RNASeq Experimental Design
Issues: sequencing depth (how much?); number of replicates (how many?)
Aims of the data:
Transcriptome assembly / transcript characterisation: maximise depth
Detection of differential expression (de novo or reference): balance depth and replication
CSIRO. Sequencing Depth V.S. Number of Replicates

32 Defining Replicates
Diagram: technical replicates (one individual, several libraries and lanes) versus biological replicates (Individual 1 and Individual 2, each with Library 1 and Library 2 on separate lanes)
Labels: multiplexing Library 1 to Library 4 (L1, L2, L3, L4) in Lane 1 gives 25% lane / sample; one library per lane gives 100% lane / sample; Depth = 2 x 100% lane / sample

34 Coverage Depth

35 Number of Replicates
# Reps    2     4     6     8     10    12
False P   0.03
False N   0.84  0.72  0.64  0.59  0.54  0.50
True P    0.16  0.28  0.36  0.41  0.46
True N    0.97
edgeR <= 0.01, DESeq <= 0.01
More information in biological replicates than depth for differential expression

36 RNASeq Analysis
Overall aim: to get an accurate measurement of transcript abundance, structure and identity
Biases and Compositions
Alignment: TopHat / Cufflinks
Assembly: ABySS

37 Assumptions
Every transcript / k-mer has equal chance of being sequenced
No. sequences observed ≈ transcript abundance
Gene A = z reads / million; Gene B = y reads / million
If z = 2 x y, then Gene A > Gene B
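Under these assumptions, comparing two genes reduces to scaling raw counts to reads per million. A minimal sketch, with made-up counts and a hypothetical helper name `reads_per_million`:

```python
def reads_per_million(counts):
    """Scale raw read counts to reads per million mapped reads."""
    total = sum(counts.values())
    return {gene: count * 1e6 / total for gene, count in counts.items()}


# Made-up counts for illustration: one million mapped reads in total.
counts = {"geneA": 200, "geneB": 100, "everything_else": 999_700}
rpm = reads_per_million(counts)

# z = 2 x y, so under the slide's assumptions Gene A > Gene B.
ratio = rpm["geneA"] / rpm["geneB"]
```

The slides that follow show why this naive reading breaks down: transcript length, alignment and sequencing biases all violate the equal-chance assumption.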

38 Length Bias
Oshlack and Wakefield, 2009

39 Alignment Bias

40 Alignment Bias

41 Sequencing Bias
Hansen et al., 2010

42 Bias
Every transcript / k-mer has equal chance of being sequenced
No. sequences observed ≈ transcript abundance
Length correction: Gene A = z reads / million / kb; Gene B = y reads / million / kb
Weighting schemas (e.g. Cufflinks): mapability; k-mer / fragment frequencies
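Reads / million / kb is the RPKM measure introduced by Mortazavi et al., 2008. A minimal sketch of the length correction, using made-up gene lengths and counts (the function name `rpkm` is illustrative):

```python
def rpkm(count, length_bp, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads."""
    return count * 1e9 / (length_bp * total_mapped_reads)


total_reads = 1_000_000

# Made-up example: gene A is four times longer than gene B and collects
# four times as many reads, so the per-molecule abundance is the same.
rpkm_a = rpkm(count=400, length_bp=4000, total_mapped_reads=total_reads)
rpkm_b = rpkm(count=100, length_bp=1000, total_mapped_reads=total_reads)
```

Without the per-kb term, gene A would look four times more abundant than gene B purely because it is longer. This corrects only for length; the mapability and k-mer weighting schemas mentioned on the slide are separate corrections.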

43 Bias
Every transcript / k-mer has equal chance of being sequenced
No. sequences observed ≈ transcript abundance
Sample A vs Sample B: Gene A1 = z reads per million; Gene A2 = y reads per million; z = 2 x y

44 Read density variability

45 RNASeq – Compositional properties
Depth of sequence: sequence count ≈ transcript abundance
Majority of the data can be dominated by a small number of highly abundant transcripts
Ability to observe transcripts of smaller abundance is dependent upon sequence depth
Fixed budget of reads

46 A simple example – compositional bias
Sequencing budget / depth: 4000 reads
Sample I (transcripts A, B, C, D): expected counts 1000 each
Sample II (transcripts A, B): expected counts 2000 each
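The example can be reproduced numerically. This sketch assumes, as one reading of the slide, that transcripts A and B have the same true abundance in both samples and that C and D are simply absent from sample II; the abundance values and the helper `expected_counts` are illustrative.

```python
def expected_counts(abundance, budget):
    """Split a fixed read budget in proportion to true abundance."""
    total = sum(abundance.values())
    return {name: budget * value / total for name, value in abundance.items()}


BUDGET = 4000  # sequencing budget from the slide

# Assumed true abundances (arbitrary units): A and B do not change.
sample_i = {"A": 10, "B": 10, "C": 10, "D": 10}
sample_ii = {"A": 10, "B": 10}

counts_i = expected_counts(sample_i, BUDGET)
counts_ii = expected_counts(sample_ii, BUDGET)

# A appears to double although its true abundance is unchanged.
apparent_fold_change = counts_ii["A"] / counts_i["A"]
```

The apparent two-fold change in A and B comes entirely from the composition of the library, which is why counts are relative rather than absolute measures of abundance.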

47 Soil diversity by phylogenetic analysis - Phylum level
454-sequence analysis of bacterial 16S rRNA gene; ~410,000 sequences
Figure: % distribution (0% to 100%) of recognized bacterial phyla in samples A, B and C
A. Richardson, CSIRO

48 RNASeq Bioinformatics Analysis
Aims: to get an accurate measurement of transcript abundance, structure and identity
Biases and Compositions: relative abundances, NOT absolute
Alignment: TopHat
Assembly: ABySS

49 RNA Sequencing analysis
Workflow: Sequence Data; Genome? (Alignment, giving Read Density, or Assembly, giving Contigs)
Outputs: Differential Expression; SNPs; Transcript Characterisation

50 RNASeq – Alignment Considerations
Reads with multiple locations: discard / random allocation; clustering by local coverage; weighting
Reads spanning exons: make and align to exon junction libraries; de novo junction detection
Summarisation of counts: exons; transcript boundaries; inferred read boundaries
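Two of the options for reads with multiple locations, discarding and weighting, can be sketched as follows. Equal fractional weighting (1/n per location) is an assumption chosen for simplicity; the slide does not say which weighting scheme is meant, and the function name `count_reads` and the input layout are hypothetical.

```python
from collections import defaultdict


def count_reads(alignments, strategy="weight"):
    """Summarise reads per locus.

    alignments: one list of loci per read (every locus the read maps to).
    strategy:   "discard" drops multi-mapped reads,
                "weight" gives each of a read's n loci a count of 1/n.
    """
    if strategy not in ("discard", "weight"):
        raise ValueError("strategy must be 'discard' or 'weight'")
    counts = defaultdict(float)
    for loci in alignments:
        if len(loci) == 1:
            counts[loci[0]] += 1.0
        elif strategy == "weight":
            for locus in loci:
                counts[locus] += 1.0 / len(loci)
    return dict(counts)


# Four reads; the third maps equally well to g1 and g2.
alignments = [["g1"], ["g1"], ["g1", "g2"], ["g2"]]
weighted = count_reads(alignments, "weight")
discarded = count_reads(alignments, "discard")
```

Discarding loses signal for genes in repeated or paralogous regions, while equal weighting ignores the local coverage information that the clustering option on the slide would use.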

51 TopHat
Multimapping: ≤10 sites
Assembly: consensus 'island' exon
Trapnell et al., 2009; Roberts et al., 2011

52 TopHat / Cufflinks
Heuristics: "correct" errors in low coverage areas; grabs 45 bp either side of islands to capture splice sites; collapses small islands; looks for junctions within larger, highly covered islands
Cufflinks: calculates the probability of observing a certain fragment within a given transcript, given surrounding fragments
Trapnell et al., 2009; Roberts et al., 2011

53 Alignment
Great if you have a fully annotated reference
Okay if you have a partially annotated reference
"Different" if you have a big bunch of ESTs
Options: align to a neighbouring genome or EST library; de novo transcriptome assembly
Tools: ABySS, Mira, Trinity, HT-Seq, SAMtools

54 RNA Sequencing analysis
Workflow: Sequence Data; Genome? (Alignment, giving Read Density, or Assembly, giving Contigs)
Outputs: Differential Expression; SNPs; Transcript Characterisation

55 Denovo transcriptome assembly
Assemblers: ABySS, MIRA, Trinity, Velvet, AllPaths, Soap-denovo, Euler, CABOG, Edena, SHARCGS, VCAKE, SSAKE, CAP3
Criteria: will run on reasonable computer resources for large genomes (e.g. < 1 TB of RAM); paired end data handling; platform flexible; handles haplotype complexity and polyploid genomes

56 Denovo transcriptome assembly
Assemblers: ABySS, MIRA, Trinity, Velvet, AllPaths, Soap-denovo, Euler, CABOG, Edena, SHARCGS, VCAKE, SSAKE, CAP3
Criteria: will run on reasonable computer resources for large genomes (e.g. < 1 TB of RAM); handles paired end data; handles data from all platforms; handles haplotype complexity and polyploid genomes

57 Assembly – Kmer graphs
K = 4
Miller et al., 2010
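A k-mer graph with K = 4 can be built in a few lines. This is a minimal sketch, not the data structure used by any particular assembler: nodes are k-mers, and an edge joins two k-mers that are adjacent in a read (overlapping by k - 1 bases). The reads and the function name `kmer_graph` are made up for illustration.

```python
from collections import defaultdict


def kmer_graph(reads, k=4):
    """Build a k-mer graph: each k-mer points to the k-mers that follow it."""
    graph = defaultdict(set)
    for read in reads:
        kmers = [read[i:i + k] for i in range(len(read) - k + 1)]
        for left, right in zip(kmers, kmers[1:]):
            graph[left].add(right)
    # Sorted lists make the output deterministic and easy to inspect.
    return {node: sorted(successors) for node, successors in graph.items()}


# Two made-up overlapping reads that share the k-mer CCGG.
reads = ["AACCGG", "CCGGTT"]
graph = kmer_graph(reads, k=4)
```

Following the single path AACC, ACCG, CCGG, CGGT, GGTT spells out the contig AACCGGTT. A sequencing error or polymorphism in one read would add a second successor to some node, producing the spurs and bubbles described on the next slides.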

58 Assembly – Kmer graphs
Spurs: sequencing error
Bubbles: sequencing error; polymorphism
Frayed rope / cycles: repeats
Miller et al., 2010

59 Assembly – Kmer graphs
Spurs: sequencing error
Bubbles: sequencing error; polymorphism
Frayed rope / cycles: repeats
Miller et al., 2010

60 ABySS & TransABySS
User specifies k
Optimal k depends on sequencing depth

61 ABySS & TransABySS
Sequencing depth is relative to transcript abundance
Iterate over multiple k and merge
Contigs contained within a larger contig are "buried"

62 Assessing assembly quality?
Approaches: comparisons between assembly algorithms; contig summary statistics; comparisons to known resources (e.g. ESTs)
Trial on rice transcriptome: 120 million 75 bp single end Illumina reads (embryo)
ABySS: number of contigs = 6,804; contig length range = 38 – 2,818 [mean = 203]
Database comparisons: rice public cDNA sequences = 67,393; contigs with high quality matches to cDNA = 6,555 (96%)
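Contig summary statistics such as those reported for the rice trial (number of contigs, length range, mean) are straightforward to compute from a list of contig lengths. The sketch below also reports N50, a commonly used assembly statistic that the slide does not mention; the lengths are made up and `contig_stats` is a hypothetical helper.

```python
def contig_stats(lengths):
    """Return count, min, max, mean and N50 for a list of contig lengths."""
    if not lengths:
        raise ValueError("no contigs supplied")
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    running = 0
    n50 = ordered[-1]
    # N50: length of the contig at which the running sum of lengths,
    # taken from longest to shortest, first reaches half the total.
    for length in ordered:
        running += length
        if running * 2 >= total:
            n50 = length
            break
    return {
        "n": len(ordered),
        "min": ordered[-1],
        "max": ordered[0],
        "mean": total / len(ordered),
        "n50": n50,
    }


# Made-up contig lengths for illustration.
stats = contig_stats([300, 100, 500, 200, 400])
```

Summary statistics alone say nothing about correctness, which is why the slide pairs them with comparisons against known cDNA or EST resources.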

63 RNASeq Bioinformatics Analysis
Aims: to get an accurate measurement of transcript abundance, structure and identity
Biases and Compositions: relative abundances, NOT absolute
Alignment
Assembly

64 STATISTICAL ISSUES

65 Measuring Expression – Statistical Issues
Data elements; Normalisation; Detection of Differential Expression

66 Count Data: of what?

67 Count Data: of what?
Garber et al., 2011

68 Statistical analysis of RNASeq
Count data
Distribution is positively skewed, not normal
Between sample variability in counts: normalisation

69 Normalization is required
Two scenarios:
1. Different sizes of total reads (library size)
2. Fixed library size, subset of highly expressed reads in 1 sample
Both reduce sequencing budget available for the majority of transcripts

70 Normalisation
Assume the majority of log ratios = 0 [no change]
TMM: Trimmed Mean of M values (log ratios)
Adjust TMM to be equal between samples
Robinson and Oshlack, 2010
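The idea behind TMM can be sketched in a simplified form. This is not the published method: Robinson and Oshlack also trim on absolute expression and use precision weights, whereas the sketch below takes an unweighted mean of the M values after trimming a fixed fraction from each tail. The function name `simple_tmm_factor`, the `trim` default and the counts are illustrative assumptions.

```python
import math


def simple_tmm_factor(sample, reference, trim=0.3):
    """Simplified, unweighted trimmed mean of M values.

    M = log2 of (gene's share of the sample library) divided by
    (gene's share of the reference library). Returns 2 ** mean(M)
    after dropping the `trim` fraction of genes from each tail.
    """
    n_sample, n_reference = sum(sample), sum(reference)
    m_values = sorted(
        math.log2((s / n_sample) / (r / n_reference))
        for s, r in zip(sample, reference)
        if s > 0 and r > 0
    )
    cut = int(len(m_values) * trim)
    kept = m_values[cut:len(m_values) - cut]
    return 2 ** (sum(kept) / len(kept))


# Ten genes; nine are unchanged, one is highly expressed in the sample only.
reference = [100] * 10
sample = [100] * 9 + [1100]

factor = simple_tmm_factor(sample, reference)
effective_library_size = sum(sample) * factor
```

Here the one highly expressed gene doubles the sample's library size, so every unchanged gene looks halved after plain library-size scaling. Trimming removes that gene, and the resulting factor of about 0.5 rescales the effective library size back to that of the reference, consistent with the assumption that most log ratios are 0.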

71 DE genes with and without TMM normalization

72 RNASeq data – Poisson Distributions
Poisson distributions are used when things are counted: the probability of seeing n events in a fixed time or space
Examples: the number of lions on a 1 day safari; the number of raindrops on a tennis court; the number of flying elephants in a year
Requires λ: rate of events
Variance = mean = λ
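The property variance = mean = λ can be checked numerically from the Poisson probability mass function. A minimal sketch using only the standard library; the choice of λ = 3 and the truncation at n = 100 are arbitrary, and the function names are illustrative.

```python
import math


def poisson_pmf(n, lam):
    """Probability of seeing exactly n events when the rate is lam."""
    return math.exp(-lam) * lam ** n / math.factorial(n)


def poisson_moments(lam, n_max=100):
    """Mean and variance computed from the pmf, truncated at n_max."""
    probs = [poisson_pmf(n, lam) for n in range(n_max + 1)]
    mean = sum(n * p for n, p in enumerate(probs))
    variance = sum((n - mean) ** 2 * p for n, p in enumerate(probs))
    return mean, variance


mean, variance = poisson_moments(3.0)
```

Both values come out equal to λ up to floating-point error, which is the single-parameter constraint that real RNASeq counts tend to violate.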

73 RNASeq data – Negative Binomial
RNASeq data is more variable than Poisson: variance > mean = λ
Less prominent for large mean
Over-dispersed Poisson
Noise types:
Shot noise: unavoidable, prominent for low mean
Technical noise: small, hopefully; can be managed
Biological noise: sample differences
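One common way to write the over-dispersed model is the negative binomial parameterisation variance = mean + dispersion x mean^2, as used by packages such as edgeR and DESeq. The sketch below uses that form with an arbitrary dispersion of 0.1; the function names are illustrative, and the split into a shot noise term and a dispersion term is an interpretation of the noise types listed on the slide.

```python
def nb_variance(mean, dispersion):
    """Negative binomial variance: Poisson part plus over-dispersion part."""
    return mean + dispersion * mean ** 2


def squared_cv_parts(mean, dispersion):
    """Split the squared coefficient of variation into its two sources.

    variance / mean**2 = 1/mean (shot noise) + dispersion (extra noise).
    """
    return 1.0 / mean, dispersion


DISPERSION = 0.1  # arbitrary value for illustration

low_var = nb_variance(10, DISPERSION)
high_var = nb_variance(1000, DISPERSION)

low_shot, low_extra = squared_cv_parts(10, DISPERSION)
high_shot, high_extra = squared_cv_parts(1000, DISPERSION)
```

At a mean of 10 the shot noise term (0.1) is as large as the dispersion term, while at a mean of 1000 it is a hundred times smaller (0.001), matching the slide's point that shot noise is prominent for low means.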

74 RNA Seq
Variance also depends on the mean
Anders, 2010

75 RNASeq Model
The total counts for a transcript in sample j from condition c: [equation not captured in the transcript; its terms are labelled Library normalisation, Mean Value, Fitted Variance (overdispersion)]
For a given gene, test for a difference in counts between conditions.
Is mean c1 + mean c2 statistically different to mean c1 + mean c1?
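The equation on this slide was not captured. For orientation only, one widely used model with exactly these three labelled components is the DESeq formulation of Anders and Huber, 2010; whether the slide showed this form is an assumption. Here $K_{ij}$ is the count for gene $i$ in sample $j$, $\rho(j)$ is the condition of sample $j$, $s_j$ is the library normalisation (size) factor, $q_{i,\rho(j)}$ is the mean value for the gene in that condition, and $v_{i,\rho(j)}$ is the fitted raw variance (overdispersion):

```latex
K_{ij} \sim \mathrm{NB}\left(\mu_{ij},\ \sigma^2_{ij}\right), \qquad
\mu_{ij} = s_j \, q_{i,\rho(j)}, \qquad
\sigma^2_{ij} = \mu_{ij} + s_j^2 \, v_{i,\rho(j)}
```

The first term of the variance is the Poisson (shot noise) part and the second is the extra variance that makes the counts over-dispersed.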

76 RNASeq DE Testing
DESeq: Anders and Huber, 2010
EdgeR: Robinson et al., 2009 (R)
BaySeq: Hardcastle and Kelly, 2010 (R)
DEGSeq: Wang et al., 2010 (R)
NBP: Di et al., 2011
LOX: Zhang et al., 2010. Infers expression measures allowing for incorporation of noise from different methodologies in the one experimental design

77 Measuring Expression
What & Why: What is expression and why do we care?
How: Platforms / Technology (closed approaches: microarray; open approaches: sequencing); Experimental Design; Analysis (biases; bioinformatics; statistical issues and analysis)
In action: Workshop on detection of differential expression; case studies in plant functional genomics

78 Thank you
Plant Industry: Jennifer M Taylor, Bioinformatics Leader
Acknowledgements: Jose Robles, Stuart Stephen, Hua Ying, Andrew Spriggs, Alexie Pa; NESCENT Funding

