An Introduction to Studying Expression Data Through RNA-seq

An Introduction to Studying Expression Data Through RNA-seq By Jason Van Houten

Outline
- Why do we study RNA?
- What is RNA-seq?
- Issues with RNA quality
- Brief overview of how to make RNA-seq libraries
- Choices to make (depth, paired-end, cost, strand specificity, etc.)
- Examples

Before we start…
- Please ask questions! If I'm not clear, or if you have a comment, don't be afraid to stop me.
- There is more than one right way to do this type of analysis. What I present is only one variation.
- I am not the expert! I am always learning new things too, so if you have a thought or an opinion, don't be afraid to chime in.

Gene Expression: The Central Dogma
- Each level can tell us something different

Why Study RNA over DNA?
- Functional studies
  - The genome may be constant, but an experimental condition can have a pronounced effect on gene expression
  - e.g. drug-treated vs. untreated cell line
  - e.g. wild type vs. knockout mice
- Some molecular features can only be observed at the RNA level
  - Alternative isoforms, fusion transcripts, RNA editing
- Predicting transcript sequence from genome sequence is difficult
  - Alternative splicing, RNA editing, etc.
Source: www.bioinformatics.ca

Why Study RNA over DNA? (continued)
- Interpreting mutations that do not have an obvious effect on protein sequence
  - "Regulatory" mutations that affect which mRNA isoform is expressed and how much
  - e.g. splice sites, promoters, exonic/intronic splicing motifs, etc.
- Prioritizing protein-coding somatic mutations (often heterozygous)
  - If the gene is not expressed, a mutation in that gene would be less interesting
  - If the gene is expressed but only from the wild-type allele, this might suggest loss of function (haploinsufficiency)
  - If the mutant allele itself is expressed, this might suggest a candidate drug target
Source: www.bioinformatics.ca

What is RNA-seq?
- Whole Transcriptome Shotgun Sequencing
- High-throughput sequencing of cDNA to gain information about a sample's RNA content
- "Transcription snapshot": know the expression level of every gene in the genome at that particular point in time

What is Next Gen Seq? Video! Briefly discusses library prep and how sequencing works

We start by asking a question Condition 1 (normal colon) Condition 2 (colon tumor)

We start by asking a question
- Condition 1 (normal colon) vs. Condition 2 (colon tumor)
- Which genes are turned on or off under these conditions?
- What about whole gene pathways? A change in the expression of one gene can affect the expression of many others.

RNA Sequencing Overview
1. Samples of interest: Condition 1 (normal colon), Condition 2 (colon tumor)
2. Isolate RNAs
3. Fragment, generate cDNA, add adapters, size select, PCR amplify
4. Sequence ends: 100s of millions of paired reads, 10s of billions of bases of sequence
5. Map to genome, transcriptome, and predicted exon junctions
6. Downstream analysis
Source: www.bioinformatics.ca

Challenges of Studying RNA
- RNAs consist of small exons that may be separated by large introns
  - Mapping reads to the genome is challenging
- The relative abundance of RNAs varies wildly
  - Abundance spans a 10^5 to 10^7 fold range (5 to 7 orders of magnitude)
  - Since RNA sequencing works by random sampling, a small fraction of highly expressed genes may consume the majority of reads
  - e.g. ribosomal and mitochondrial genes
- RNAs come in a wide range of sizes
  - Small RNAs must be captured separately
  - PolyA selection of large RNAs may result in 3' end bias
- RNA is fragile compared to DNA (easily degraded)
Source: www.bioinformatics.ca

mRNA Selection

Quality (Agilent Bioanalyzer)
- Very good RNA: RIN of 10
- Still good: RIN of 8.9
- Starting to get worse: RIN of 6.3

Quality RIN 3 RIN 2.2

Best Practice
RNA is highly susceptible to degradation by RNase enzymes. RNases are present in cells and tissues and can be carried on hands, labware, or even dust. They are very stable and difficult to inactivate. For these reasons, it is important to follow best laboratory practices while preparing and handling RNA samples.
- When harvesting total RNA, use a method that quickly disrupts tissue and isolates and stabilizes RNA
- Wear gloves and use sterile technique at all times
- Reserve a set of pipettes for RNA work
- Use sterile, RNase-free filter pipette tips to prevent cross-contamination
- Use disposable plasticware that has been certified to be RNase-free
- Prepare all reagents from RNase-free components, including ultrapure water
- Store RNA samples by freezing, and keep samples on ice at all times while working with them
- Avoid extended pauses in the protocol until the RNA has been reverse transcribed into DNA
- Use RNase/DNase decontamination solution to decontaminate work surfaces and equipment
Source: www.invitrogen.com

Now we are ready to sequence!

Length of Reads / Single vs. Paired
- Longer reads give you better alignment confidence
- Longer reads maximize sequencing coverage on the flow cell
  - Coverage: the average number of sequences representing a particular region of the transcriptome
- Paired ends help deduce large insertions, deletions, and rearrangements
- Drawback: it costs more

Depth
- The number of reads per sample/library
- More depth means you are more likely to see genes that are expressed at very low levels
- ~200 million reads can be generated per lane on a flow cell
Reference: Nat Rev Genet 2009, 10, 57-63

How much depth do you need?
- Depends on the application
- Differential gene expression, variant detection: 10x to 30x coverage
  - If you're interested in genes expressed at lower levels, you might need more
- Applications like transcriptome assembly: much more depth needed
- So we choose how much of each library to load onto a lane depending on how much depth we need
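The trade-off between depth per library and libraries per lane can be sketched with simple arithmetic. This is only an illustration: it takes the ~200 million reads per lane figure from the Depth slide, assumes reads are split evenly among the libraries on a lane, and uses a hypothetical target of 25 million reads per library that is not a recommendation from the talk.

```python
def libraries_per_lane(reads_per_lane: int, reads_per_library: int) -> int:
    """Return how many libraries fit on one lane at the requested depth.

    Assumes every library on the lane receives an equal share of reads,
    which is an idealization of real sequencing runs.
    """
    if reads_per_library <= 0:
        raise ValueError("reads_per_library must be positive")
    return reads_per_lane // reads_per_library


READS_PER_LANE = 200_000_000          # approximate figure quoted on the Depth slide
TARGET_READS_PER_LIBRARY = 25_000_000  # hypothetical target, for illustration only

if __name__ == "__main__":
    n = libraries_per_lane(READS_PER_LANE, TARGET_READS_PER_LIBRARY)
    print(f"Libraries that fit on one lane: {n}")
```

Raising the target depth (for example, for transcriptome assembly) lowers the number of libraries that can share a lane.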

Multiplexing
- Add a "barcode" to each sample/library, then mix and sequence
  - A barcode is a unique string of nucleotides within the adapter
- Using the barcode, sequenced reads can be traced back to their appropriate sample
[Figure: libraries carrying barcodes B1 to B4 are mixed and then sequenced together]
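How reads are traced back to their samples can be sketched in a few lines. This is a simplified illustration, not the method used by any particular sequencer or tool: it assumes the barcode sits at the very start of each read and matches exactly, and the barcode sequences and sample names below are made up.

```python
from collections import defaultdict


def demultiplex(reads, barcode_to_sample, barcode_length):
    """Group reads by sample using the barcode at the start of each read.

    Reads whose barcode is not in the lookup table are collected under
    the key "undetermined". The barcode is trimmed off the returned reads.
    """
    bins = defaultdict(list)
    for read in reads:
        barcode = read[:barcode_length]
        insert = read[barcode_length:]
        sample = barcode_to_sample.get(barcode, "undetermined")
        bins[sample].append(insert)
    return dict(bins)


# Hypothetical 4-nt barcodes for four libraries (B1 to B4).
BARCODES = {"ACGT": "B1", "TGCA": "B2", "GATC": "B3", "CTAG": "B4"}

if __name__ == "__main__":
    mixed_reads = ["ACGTTTTT", "GATCGGGG", "ACGTCCCC", "NNNNAAAA"]
    for sample, sample_reads in demultiplex(mixed_reads, BARCODES, 4).items():
        print(sample, sample_reads)
```

Real pipelines usually also tolerate a small number of sequencing errors in the barcode; that is left out here to keep the sketch short.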

Cost
- In our lab, it costs us only about $30 per library to construct one ourselves
- Additionally, you have sequencing costs
  - Depends on read length (cycles) and single vs. paired end
  - Depends on facility and machine
  - $1100-2200 per lane on a HiSeq 2000
- Again, multiplexing reduces the cost per sample
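The effect of multiplexing on cost can be worked through with the figures from this slide. The sketch below assumes the lane cost is shared equally by all libraries on the lane; the choice of 8 libraries per lane is hypothetical.

```python
def cost_per_sample(lane_cost: float, library_cost: float, samples_per_lane: int) -> float:
    """Return the per-sample cost: library prep plus an equal share of the lane."""
    if samples_per_lane < 1:
        raise ValueError("samples_per_lane must be at least 1")
    return library_cost + lane_cost / samples_per_lane


LIBRARY_COST = 30.0   # approximate in-house library cost quoted on the slide
LANE_COST = 1100.0    # lower end of the quoted per-lane price range

if __name__ == "__main__":
    for n in (1, 8):  # 8 libraries per lane is a hypothetical choice
        print(f"{n} sample(s) per lane: ${cost_per_sample(LANE_COST, LIBRARY_COST, n):.2f}")
```

Under these assumptions one sample alone on a lane costs $1130.00, while sharing the lane among 8 libraries brings it down to $167.50 per sample.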

Advantages of RNA-Seq compared with other transcriptomics methods

Typical Differential Gene Expression Workflow
1. Raw reads
2. Filter reads
3. Assemble transcriptome
4. Align to reference genome/transcriptome
5. Count reads that map to genes
6. Run statistical tests
7. Evaluate genes that are differentially expressed

Strand Specificity
Reference: BMC Genomics 2012, 13:721

FPKM (RPKM): Expression Normalization
Fragments (Reads) Per Kilobase of exon model per Million mapped fragments

FPKM = (10^9 × C) / (N × L)

- C = the number of reads mapped onto the gene's exons (raw counts)
- N = the total number of mapped reads in the experiment
- L = the sum of the exon lengths in base pairs (size of the gene)

Example 1: large gene #1 and small gene #2 each have 100 reads. The same count is spread over a longer gene, so expression of gene 1 < expression of gene 2.
Example 2: library 1 has half the depth of library 2. Gene 1 has 50 reads in library 1 and 100 reads in library 2. After normalization, expression of gene 1 is the same in both libraries.
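The two examples can be checked numerically. The sketch below uses the standard definition FPKM = 10^9 × C / (N × L) with C, N, and L as defined on the slide; the gene lengths and library sizes are hypothetical values chosen only to reproduce the two scenarios.

```python
def fpkm(count: int, total_mapped: int, gene_length_bp: int) -> float:
    """Fragments per kilobase of exon model per million mapped fragments.

    count          -- C, reads mapped to the gene's exons
    total_mapped   -- N, total mapped reads in the library
    gene_length_bp -- L, summed exon length in base pairs
    """
    if total_mapped <= 0 or gene_length_bp <= 0:
        raise ValueError("total_mapped and gene_length_bp must be positive")
    return 1e9 * count / (total_mapped * gene_length_bp)


if __name__ == "__main__":
    # Example 1: same read count, different gene lengths (hypothetical sizes).
    large_gene = fpkm(100, 1_000_000, 10_000)
    small_gene = fpkm(100, 1_000_000, 1_000)
    print(f"Large gene: {large_gene:.1f}, small gene: {small_gene:.1f}")

    # Example 2: library 1 has half the depth of library 2 (hypothetical depths).
    lib1 = fpkm(50, 10_000_000, 2_000)
    lib2 = fpkm(100, 20_000_000, 2_000)
    print(f"Library 1: {lib1:.2f}, library 2: {lib2:.2f}")
```

With these made-up numbers the large gene comes out at a tenth of the small gene's FPKM, and the gene in Example 2 gets the same value in both libraries.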

Conclusions
- We know what RNA-seq is!
  - Library preparation
  - Next Generation Sequencing
- RNA quality is very important
  - 3' bias
  - Tips to protect RNA
- Things to consider within the cost versus information balance
- Introduced some analysis

Acknowledgements I thank HHMI, the van der Knaap lab, Dr. Dean Fraga and everyone involved in this workshop for making this possible

Thank You! Questions?