Dr. Christoph W. Sensen and Dr. Jung Soh, Trieste Course 2017

RNA-Seq

What is RNA-Seq?
- An experimental protocol that uses next-generation sequencing technologies to sequence the messenger RNA molecules within a biological sample, in an effort to determine the primary sequence and relative abundance of each mRNA
  Martin JA, Wang Z (2011) Next-generation transcriptome assembly. Nat Rev Genet. 12(10):
- Also known as “Whole Transcriptome Shotgun Sequencing” (WTSS)

Sequencing strategy
- combination of ½-plate of 454 and 1 lane of 108PE Illumina sequencing
- excellent depth and coverage
- high-quality assemblies
- submission of total RNA samples
- improves quality control
- takes better advantage of sequencing facilities
- similar overall cost
- 76SE Illumina sequencing on selected species for comparative transcriptomics
- repeat sequencing in rare cases of low-quality initial output
[Diagram labels, in order: Metabolite profiling; Plant material; Biochemistry PIs; Total RNA extraction; Bioanalyzer (RNA quality); mRNA isolation; cDNA libraries; Genome Québec Innovation Centre; 454 (1/2-plate); Illumina 1 lane 108PE; Reference transcriptomes (75); Bioinformatics]

RNA-Seq workflow
[Figure: RNA-Seq workflow (label: intron)]
Wang Z, Gerstein M, Snyder M (2009) RNA-Seq: a revolutionary tool for transcriptomics. Nat Rev Genet. 10(1):57-63.

RNA-Seq vs. microarray
Characteristics | RNA-Seq | Microarray
Which transcripts? | All in a sample | Only those for which probes are designed
Transcript sequence generation | Yes | No
Low-abundance transcript detection | | Limited
Abundance info source | Count (of the reads aligned to gene) | Fluorescence level (of the probe spot for gene)
Resolution | Base | Probe sequence
Background noise | Low | High
Additional info | Alternative splicing, transcriptome-level variation |

RNA-Seq data analysis
1. Map reads
   Input: lots of short reads, reference genome
   Output: table of mapped loci per read
2. Bin reads to features
   Input: feature annotation (exons, genes, transcripts)
   Output: table of counts per feature
3. Normalize counts
   Output: table of normalized quantification values per feature
4. Detect differentially expressed (DE) features
   Output: DE features
Steps 2 and 3 are usually combined in a tool.

Mapping reads
- Need a reference genome
- Issues
  - Reads spanning an exon junction
  - Alternative splicing
  - Reads mapping to multiple locations in the genome
  - Huge amounts of data
- Most common mapping results format
  - SAM: Sequence Alignment/Map
  - BAM: binary format of SAM
- Many tools
  - Bowtie, SOAP, BWA, SHRiMP, mrFAST, mrsFAST, ZOOM, SSAHA2, Mosaik
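The SAM format and the mapping issues listed above can be illustrated with a small parser. This is a minimal sketch, not any mapper's own code: the `SamRecord` class and the helper names are hypothetical, the example read is made up, and the multi-mapping check assumes the aligner writes the optional `NH` tag (not all mappers do).

```python
from dataclasses import dataclass, field


@dataclass
class SamRecord:
    """A subset of the 11 mandatory SAM columns plus optional tags."""
    qname: str   # read name
    flag: int    # bitwise flag
    rname: str   # reference sequence name
    pos: int     # 1-based leftmost mapping position
    mapq: int    # mapping quality
    cigar: str   # alignment description, e.g. 30M500N20M
    tags: dict = field(default_factory=dict)


def parse_sam_line(line: str) -> SamRecord:
    """Parse one tab-separated SAM alignment line (not a @header line)."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) < 11:
        raise ValueError("a SAM alignment line has at least 11 columns")
    tags = {}
    for raw in cols[11:]:
        name, typ, value = raw.split(":", 2)   # TAG:TYPE:VALUE
        tags[name] = int(value) if typ == "i" else value
    return SamRecord(cols[0], int(cols[1]), cols[2], int(cols[3]),
                     int(cols[4]), cols[5], tags)


def is_unmapped(rec: SamRecord) -> bool:
    # Flag bit 0x4 means the read could not be placed on the reference.
    return bool(rec.flag & 0x4)


def spans_junction(rec: SamRecord) -> bool:
    # CIGAR operation N is a skipped reference region, i.e. an intron
    # when RNA-Seq reads are aligned to a genome.
    return "N" in rec.cigar


def is_multi_mapped(rec: SamRecord) -> bool:
    # NH = number of reported alignments for this read (optional tag);
    # assumed to be 1 when the tag is absent.
    return rec.tags.get("NH", 1) > 1


# Made-up 50-base read that crosses one exon junction and maps twice.
example = "\t".join(["read1", "0", "chr1", "100", "50", "30M500N20M",
                     "*", "0", "0", "A" * 50, "I" * 50, "NH:i:2"])
rec = parse_sam_line(example)
print(rec.rname, rec.pos, spans_junction(rec), is_multi_mapped(rec))
```

For real data, an established SAM/BAM library is the safer choice; the sketch only shows what the columns mean.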

Bowtie

Binning reads
- Need annotated features
  - Exons, genes, transcripts
- For each feature, the total number of reads mapped is produced
- Not directly comparable across features/samples yet
  - Usually followed by normalization
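Binning can be sketched as counting, for each annotated feature, the mapped reads that overlap it. The function name, coordinates, and feature names below are hypothetical, and the sketch makes simplifying assumptions: reads are single intervals with 1-based inclusive coordinates, and a read overlapping several features is counted once for each of them, which real tools handle more carefully.

```python
def count_reads_per_feature(reads, features):
    """Count reads overlapping each feature.

    reads:    iterable of (chrom, start, end), 1-based inclusive
    features: dict of name -> (chrom, start, end), 1-based inclusive
    """
    counts = {name: 0 for name in features}
    for chrom, r_start, r_end in reads:
        for name, (f_chrom, f_start, f_end) in features.items():
            # Two intervals overlap when each starts before the other ends.
            if chrom == f_chrom and r_start <= f_end and r_end >= f_start:
                counts[name] += 1
    return counts


# Made-up annotation and mapped reads.
features = {
    "geneA": ("chr1", 100, 1099),
    "geneB": ("chr1", 5000, 6999),
}
reads = [
    ("chr1", 150, 199),     # inside geneA
    ("chr1", 1090, 1139),   # overlaps the end of geneA
    ("chr1", 5500, 5549),   # inside geneB
    ("chr1", 3000, 3049),   # intergenic, not counted
    ("chr2", 150, 199),     # other chromosome, not counted
]
print(count_reads_per_feature(reads, features))
# {'geneA': 2, 'geneB': 1}
```

The nested loop is quadratic and only suitable for toy input; with real read volumes an interval index or a sorted sweep would be needed.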

Normalizing counts
- Why normalize?
  - Longer features have more reads mapped
  - Deeper sequencing produces more reads
- RPKM is most frequently used
  - Reads Per Kilobase per Million reads
  - Defined as RPKM = C / (L × N)
    - C = number of reads mapped to a feature
    - L = length of the feature (in kilobases)
    - N = total number of reads from the sample (in millions)
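The RPKM definition C/(LN) translates directly into code. The sketch below uses made-up numbers; the function name and its parameters (length in bases, total reads as a raw count) are choices made here for illustration, with the unit conversions to kilobases and millions done inside the function.

```python
def rpkm(read_count, feature_length_bp, total_reads):
    """Reads Per Kilobase per Million reads: C / (L * N)."""
    length_kb = feature_length_bp / 1_000          # L, in kilobases
    total_millions = total_reads / 1_000_000       # N, in millions
    return read_count / (length_kb * total_millions)


# 500 reads on a 2,000-base gene in a sample of 10 million reads.
print(rpkm(500, 2_000, 10_000_000))     # 25.0

# A gene twice as long, sequenced twice as deep, collects four times
# the reads at the same expression level; RPKM stays the same.
print(rpkm(2_000, 4_000, 20_000_000))   # 25.0
```

The second call shows both reasons for normalizing at once: the raw counts differ by a factor of four, yet the normalized values agree.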

RPKM examples

Gene model predicted for fungus Trametes versicolor using Augustus and RNA-seq hints
Above is a screenshot of a GBrowse instance for the fungal species Trametes versicolor from the Genozymes project. The project is sequencing both DNA and the transcriptome (RNA-seq), and COE is responsible for annotation. The example shows a gene predicted with the ab initio predictor Augustus (Confident models), using hints from RNA-seq to check the accuracy of the prediction.
- Hints are built from short-read alignment of Illumina RNA-seq spliced reads onto the genome (Mapped Reads)
- Spliced reads show direct evidence of introns (next slide)
- Hints are used with ab initio predictors (Augustus) during the training and prediction stages
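How spliced reads give direct evidence of introns can be sketched from the alignment itself: in a SAM CIGAR string, an `N` operation marks reference bases skipped by the read. The code below is an illustrative assumption about how such evidence could be collected, not the Genozymes pipeline and not Augustus's hint format; the function names and alignments are made up.

```python
import re
from collections import Counter

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def introns_from_alignment(pos, cigar):
    """Return (start, end) reference intervals, 1-based inclusive,
    for every N (skipped region) operation in the CIGAR string."""
    introns = []
    ref = pos                       # next reference base to be consumed
    for length, op in CIGAR_OP.findall(cigar):
        n = int(length)
        if op == "N":
            introns.append((ref, ref + n - 1))
        if op in "MDN=X":           # operations that consume the reference
            ref += n
    return introns


def intron_support(alignments):
    """Count how many alignments support each intron."""
    support = Counter()
    for chrom, pos, cigar in alignments:
        for start, end in introns_from_alignment(pos, cigar):
            support[(chrom, start, end)] += 1
    return support


# Made-up alignments: two reads cross the same junction, one is unspliced.
alignments = [
    ("scaffold_1", 100, "30M500N20M"),
    ("scaffold_1", 110, "20M500N30M"),
    ("scaffold_1", 700, "50M"),
]
print(introns_from_alignment(100, "30M500N20M"))   # [(130, 629)]
print(intron_support(alignments))
```

Introns supported by several independent reads are more convincing evidence than those seen once, which is why the sketch keeps a count per intron.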

Splice Variants

“non-coding” RNA molecules
LincRNA-p21 Tran et al., In press

Example MIRA assembly
- Contig: T_rep_c1201
- Read members: 96
- Length: 2429 bp
Combined assembly
- T_rep_c1201 is part of a 6-member contig
- 2 are partial transcripts assembled by PTA

Detecting Differential Expression
- Compare quantification values across samples or across features
- Most tools summarize/normalize counts and suggest DE features
  - Cufflinks/Cuffdiff, R packages (DESeq, edgeR, baySeq, TSPM), SAMtools
- DE features go through similar analysis to microarray data analysis (e.g. validation)
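Comparing quantification values across samples can be sketched as a log2 fold change per feature. This is a deliberately naive illustration with made-up values: it is not a statistical test and not what the tools listed above compute, since they also account for variability between replicates. The function names, the pseudocount, and the threshold are assumptions chosen for the example.

```python
import math


def log2_fold_change(control, treated, pseudocount=1.0):
    """log2 ratio of treated over control; the pseudocount avoids
    division by zero for features with no reads."""
    return math.log2((treated + pseudocount) / (control + pseudocount))


def candidate_de_features(control, treated, min_abs_lfc=1.0):
    """Return features whose absolute log2 fold change reaches the
    threshold (1.0 corresponds to a two-fold change)."""
    candidates = {}
    for feature in control.keys() & treated.keys():
        lfc = log2_fold_change(control[feature], treated[feature])
        if abs(lfc) >= min_abs_lfc:
            candidates[feature] = lfc
    return candidates


# Made-up normalized values for two samples.
control = {"geneA": 25.0, "geneB": 7.0, "geneC": 15.0}
treated = {"geneA": 103.0, "geneB": 7.0, "geneC": 3.0}

print(candidate_de_features(control, treated))
```

Here geneA comes out four-fold up (log2 fold change 2.0), geneC four-fold down (-2.0), and geneB is unchanged. Candidates found this way would still need replicates and a proper test before being called DE.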

Cufflinks

Cufflinks Tutorial

Anaerobic biocorrosion in reactors filled with WP-LS medium

SSV1 Replication Cycle (UV Induced)
