Bioinformatics and OMICs Group Meeting REFERENCE GUIDED RNA SEQUENCING.


Hi
Name: David Oliver
Advisor: Dr. Shtutman
Research: Understanding the role of COPZ2 silencing in cancer progression, using RNA-seq to identify transcriptional changes caused by the loss of COPZ2 and its encoded microRNA.
Experience: Microarray analysis; multiple RNA-seq analyses, including long-read (PacBio) and short-read (Illumina) sequencing experiments.

Why RNA-seq?
What's the question?
◦Differential expression
◦Differential splicing
Advantages over other technologies
◦Increased sensitivity
◦Increased reproducibility
Reference: Alina Sîrbu, Gráinne Kerr, Martin Crane, Heather J. Ruskin. RNA-Seq vs Dual- and Single-Channel Microarray Data: Sensitivity Analysis for Differential Expression and Clustering. PLOS ONE. Published December 10, 2012. DOI: /journal.pone

Before You Start
◦Consult a statistician
◦Consult your sequencing core

Actually Doing RNA-seq
Minimum Requirements
◦Have consulted a statistician and your sequencing core
◦Know that your question can be answered using sequencing technology and that the experimental design is appropriate
◦More than 10,000,000 reads per sample
  ◦Much more depth is required for differential splicing
◦≥ 3 biological replicates
◦Access to a decent amount of computing power
  ◦It can be done on a laptop, but it takes about 3 weeks (ask me how I know)
◦Basic knowledge of Unix and R
  ◦Or, know someone who is willing to help you
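The two numeric minimums above can be turned into a quick pre-flight check before samples are sent off. This is a minimal sketch. The sample sheet, a list of (sample name, condition, total reads) tuples, is a hypothetical layout, and so is `check_design`; neither is part of the pipeline described in this talk.

```python
# Hypothetical sample sheet check: each entry is (sample_name, condition, total_reads).
MIN_READS = 10_000_000  # slide: more than 10,000,000 reads per sample
MIN_REPLICATES = 3      # slide: at least 3 biological replicates


def check_design(samples):
    """Return a list of problems; an empty list means both minimums are met."""
    problems = []
    replicates = {}
    for name, condition, reads in samples:
        replicates[condition] = replicates.get(condition, 0) + 1
        if reads <= MIN_READS:
            problems.append(f"{name}: only {reads:,} reads")
    for condition, n in sorted(replicates.items()):
        if n < MIN_REPLICATES:
            problems.append(f"{condition}: only {n} biological replicate(s)")
    return problems


if __name__ == "__main__":
    sheet = [
        ("ctrl_1", "control", 25_000_000),
        ("ctrl_2", "control", 22_000_000),
        ("ctrl_3", "control", 9_500_000),
        ("kd_1", "knockdown", 30_000_000),
        ("kd_2", "knockdown", 28_000_000),
    ]
    for problem in check_design(sheet):
        print(problem)
```

When run as a script, the example sheet reports `ctrl_3: only 9,500,000 reads` and `knockdown: only 2 biological replicate(s)`.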

Actually Doing RNA-seq
Suggested Pipeline
◦Quality assessment: FastQC, FASTX-Toolkit
◦Alignment: Bowtie2/TopHat2, STAR, NovoAlign
◦Counting reads: featureCounts, with the Gencode annotation
◦Differential expression analysis: edgeR
◦Manipulating sequencing files: SAMtools, BamTools
[Figure: pipeline diagram. Biological system (total RNA or mRNA) → RNA-Seq raw reads → quality filtering (fastQC) → align to target genome (NovoAlign, BowTie2, STAR) → read counting (featureCounts, Gencode) → normalization/quantification (edgeR) → RNA expression levels]

RNA-seq Walkthrough
◦Check some quality markers
◦Getting the target genome
◦Build aligner-specific indexed genome
◦Perform alignment
◦Do some file manipulation
◦Get the annotation file
◦Count reads
◦Perform normalization and quantification

Check some quality markers
FastQC
◦Basic tool for generating quality reports
◦Java-based
◦Does not provide tools for correcting errors (use the FASTX-Toolkit for that)
Other tools
◦FASTX-Toolkit: for fixing some problems with datasets (adapter trimming, read-through error correction, etc.)
◦SAMstat: a tool for alignment QC
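FastQC's per-base quality report comes down to summarizing Phred scores at each read position. Below is a minimal sketch of that idea in plain Python. It assumes Phred+33 encoding (used by Sanger and Illumina 1.8+ FASTQ) and strict 4-line records. It is not FastQC's implementation: it computes only the mean per position, not the box-plot statistics FastQC draws.

```python
import io
from typing import Iterator, List, TextIO, Tuple


def read_fastq(handle: TextIO) -> Iterator[Tuple[str, str, str]]:
    """Yield (read_id, sequence, quality) from a FASTQ file with 4-line records."""
    while True:
        header = handle.readline().rstrip()
        if not header:
            return
        seq = handle.readline().rstrip()
        handle.readline()  # '+' separator line (ignored)
        qual = handle.readline().rstrip()
        if not header.startswith("@") or len(seq) != len(qual):
            raise ValueError(f"malformed FASTQ record: {header!r}")
        yield header[1:], seq, qual


def mean_quality_per_position(handle: TextIO, offset: int = 33) -> List[float]:
    """Mean Phred score at each read position (Phred+33 encoding by default)."""
    totals: List[int] = []
    counts: List[int] = []
    for _, _, qual in read_fastq(handle):
        for i, ch in enumerate(qual):
            if i == len(totals):  # first read long enough to reach this position
                totals.append(0)
                counts.append(0)
            totals[i] += ord(ch) - offset
            counts[i] += 1
    return [t / c for t, c in zip(totals, counts)]


if __name__ == "__main__":
    fastq = "@r1\nACGT\n+\nIIII\n@r2\nACG\n+\n###\n"
    print(mean_quality_per_position(io.StringIO(fastq)))  # [21.0, 21.0, 21.0, 40.0]
```

With this encoding, 'I' decodes to Q40 and '#' to Q2. The last position is averaged over the one read long enough to reach it.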

Getting the target genome

Build aligner-specific indexed genome
This step is performed by the aligner. It takes a variable amount of time, depending on the type of index used and the size of the genome being indexed.
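A toy version shows why the index is built once, up front. Bowtie2 indexes the genome with an FM-index (based on the Burrows-Wheeler transform), and STAR uses an uncompressed suffix array. This sketch builds a plain suffix array naively and uses binary search to find exact matches of a short read. It is illustrative only. The function names are hypothetical, and real aligners also handle mismatches, gaps, and (for RNA-seq) spliced alignments.

```python
from typing import List


def build_suffix_array(genome: str) -> List[int]:
    """Start positions of all suffixes of `genome`, in lexicographic order.

    Naive O(n^2 log n) construction: fine for a toy sequence, hopeless for hg38.
    """
    return sorted(range(len(genome)), key=lambda i: genome[i:])


def find_exact(genome: str, sa: List[int], pattern: str) -> List[int]:
    """All start positions where `pattern` occurs exactly, via two binary searches."""
    m = len(pattern)

    def prefix(k: int) -> str:
        return genome[sa[k]:sa[k] + m]

    lo, hi = 0, len(sa)
    while lo < hi:  # first suffix whose m-prefix is >= pattern
        mid = (lo + hi) // 2
        if prefix(mid) < pattern:
            lo = mid + 1
        else:
            hi = mid
    first = lo
    hi = len(sa)
    while lo < hi:  # first suffix whose m-prefix is > pattern
        mid = (lo + hi) // 2
        if prefix(mid) <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return sorted(sa[first:lo])


if __name__ == "__main__":
    genome = "ACGTACGTTACG"
    sa = build_suffix_array(genome)  # built once, reused for every read
    print(find_exact(genome, sa, "ACG"))  # [0, 4, 9]
```

Index construction is the expensive part. After it, each lookup costs only a logarithmic number of string comparisons, which is why aligners save the index to disk and reuse it across samples.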

Perform alignment
tophat2 -p 12 --no-coverage-search --b2-N 1 --b2-L 32 --b2-i S,1,0.5 --b2-D … --b2-R 25 -o $RNAwork/ $RNAwork/Indexes/hg38_index $RNAwork/sample1.fastq
Reads:
    Input    :
    Mapped   :  (90.6% of input)
    of these :  (14.1%) have multiple alignments (436 have >20)
90.6% overall read mapping rate.
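The summary TopHat2 prints (overall mapping rate, and the fraction of mapped reads with multiple alignments) can be recomputed from any SAM file. This is a minimal sketch. It assumes single-end reads, as in the single-FASTQ command above, and an aligner that writes the NH:i tag (number of reported alignments). The total input read count is passed in separately, because unmapped reads may be written to a different file or left out entirely. `mapping_summary` is a hypothetical name.

```python
from typing import Dict, Iterable


def mapping_summary(sam_lines: Iterable[str], total_input_reads: int) -> Dict[str, float]:
    """Single-end mapping statistics from SAM text lines.

    A read counts as mapped if any of its records lacks FLAG 0x4 (unmapped);
    it counts as multi-mapped if a mapped record carries an NH:i tag > 1.
    """
    mapped, multi = set(), set()
    for line in sam_lines:
        if line.startswith("@"):  # header line
            continue
        fields = line.rstrip("\n").split("\t")
        qname, flag = fields[0], int(fields[1])
        if flag & 0x4:
            continue
        mapped.add(qname)  # secondary records (0x100) share the QNAME, so no double count
        for tag in fields[11:]:  # optional TAG:TYPE:VALUE fields
            if tag.startswith("NH:i:") and int(tag[5:]) > 1:
                multi.add(qname)
    return {
        "mapped": len(mapped),
        "mapped_pct": 100 * len(mapped) / total_input_reads,
        "multi": len(multi),
        "multi_pct_of_mapped": 100 * len(multi) / len(mapped) if mapped else 0.0,
    }
```

The multi-mapping percentage is taken relative to mapped reads, matching the "of these" line in the TopHat2 summary.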

Do some file manipulation
Depending on which aligner you use, the aligned sequences may be output as a BAM or a SAM file.
◦Sequence Alignment/Map format (SAM)
  ◦Contains all the alignment information, plus room for user-defined information about the alignments
◦Binary Alignment/Map format (BAM)
  ◦A binary version of the SAM file
  ◦Added benefit of being much smaller and faster for other software to access
  ◦Not all software can convert from BAM back to SAM
To manipulate these formats (e.g. sort, remove duplicates, remove unaligned sequences), use either samtools or bamtools.
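Two of the most common manipulations are dropping unaligned reads and sorting by coordinate. These are normally done with samtools, for example `samtools view -b -F 4` followed by `samtools sort`. To show what those steps do to the records, here is a minimal plain-Python sketch. The function name is hypothetical, and it works only on uncompressed SAM text. It does not handle BAM, duplicate removal, or paired-end mate information.

```python
from typing import Dict, List


def filter_and_sort(sam_lines: List[str]) -> List[str]:
    """Drop unmapped records (FLAG 0x4) and sort the rest by coordinate.

    Reference order follows the @SQ header lines, then POS; records with
    equal keys keep their input order (Python's sort is stable).
    """
    header: List[str] = []
    records: List[List[str]] = []
    ref_order: Dict[str, int] = {}
    for line in sam_lines:
        line = line.rstrip("\n")
        if line.startswith("@"):
            if line.startswith("@HD"):  # record the new sort order in the header
                kept = [f for f in line.split("\t") if not f.startswith("SO:")]
                line = "\t".join(kept + ["SO:coordinate"])
            elif line.startswith("@SQ"):
                for field in line.split("\t")[1:]:
                    if field.startswith("SN:"):
                        ref_order[field[3:]] = len(ref_order)
            header.append(line)
            continue
        fields = line.split("\t")
        if int(fields[1]) & 0x4:  # unmapped, like `samtools view -F 4`
            continue
        records.append(fields)
    records.sort(key=lambda f: (ref_order.get(f[2], len(ref_order)), int(f[3])))
    return header + ["\t".join(f) for f in records]
```

Sorting by the @SQ order rather than alphabetically by chromosome name keeps the output consistent with the reference the reads were aligned to.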

Get the annotation file
Annotation files are readily available from multiple sources:
◦Gencode
◦Ensembl
◦Vega
◦RefSeq
These annotation sources mainly vary in the number of non-coding RNAs that have been annotated:
◦RefSeq < Gencode < Ensembl < Vega
We use Gencode.
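Whichever source you choose, the biotype information lives in the ninth (attributes) column of the GTF file, so comparing annotations comes down to parsing that column. This is a minimal sketch. It assumes GTF input in which Gencode stores the biotype as `gene_type` and Ensembl stores it as `gene_biotype`. It counts `gene` records per biotype, which is one way to check the non-coding RNA differences listed above. The function names are hypothetical.

```python
import re
from collections import Counter
from typing import Dict, Iterable

# key "value" pairs in GTF column 9; unquoted values such as `level 2;` are skipped
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def parse_attributes(field: str) -> Dict[str, str]:
    """Turn a GTF attribute string into a dict of quoted key/value pairs."""
    return dict(ATTR_RE.findall(field))


def count_gene_types(gtf_lines: Iterable[str]) -> Counter:
    """Count `gene` records per biotype (Gencode: gene_type; Ensembl: gene_biotype)."""
    counts: Counter = Counter()
    for line in gtf_lines:
        if line.startswith("#"):  # comment/header lines
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "gene":
            continue
        attrs = parse_attributes(cols[8])
        counts[attrs.get("gene_type") or attrs.get("gene_biotype", "unknown")] += 1
    return counts
```

Running this over two annotation files and comparing the non-coding categories gives a concrete view of how the sources differ.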

Count Reads
featureCounts
◦We used to use htseq-count, which was quite nice, but we've switched to featureCounts because it is much, much, much faster.
◦Also available as an R package (Bioconductor: Rsubread)
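Conceptually, gene-level counting means deciding which gene's exons each aligned read overlaps. The sketch below is a simplified, hypothetical re-implementation of featureCounts' default behaviour as we understand it:

- A read overlapping exons of exactly one gene is assigned to that gene.
- A read overlapping more than one gene is left uncounted as ambiguous.
- The status labels mirror the categories in featureCounts' summary output.

It is unstranded, uses a 1-base minimum overlap, and scans exons linearly. It is far slower than the real tool and is only meant to show the logic.

```python
from collections import Counter
from typing import List, Tuple

Block = Tuple[str, int, int]      # (chrom, start, end), 1-based inclusive
Exon = Tuple[str, str, int, int]  # (gene_id, chrom, start, end)


def count_reads(reads: List[List[Block]], exons: List[Exon]) -> Tuple[Counter, Counter]:
    """Gene-level counts in the spirit of featureCounts' defaults (simplified).

    Each read is a list of aligned blocks (more than one for spliced reads).
    A read is assigned when its blocks overlap exons of exactly one gene.
    """
    counts: Counter = Counter()
    status: Counter = Counter()
    for blocks in reads:
        hits = {
            gene
            for chrom, start, end in blocks
            for gene, e_chrom, e_start, e_end in exons
            if chrom == e_chrom and start <= e_end and e_start <= end  # >= 1 base overlap
        }
        if len(hits) == 1:
            counts[hits.pop()] += 1
            status["Assigned"] += 1
        elif hits:
            status["Unassigned_Ambiguity"] += 1
        else:
            status["Unassigned_NoFeatures"] += 1
    return counts, status
```

Exons are grouped by gene before the decision is made, so a spliced read spanning two exons of the same gene still counts once for that gene.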

Perform normalization and quantification
edgeR:
library(edgeR)
counts <- read.csv(file = "All_counts.csv", row.names = 1) ### assumes gene IDs are in the first column
counts <- na.omit(counts)
counts <- counts[rowSums(counts) > 0, ] ### drop genes with zero counts in every sample (safe even if there are none)
### start edgeR ###
### assumes the 12 count columns are ordered as 3 replicates per group, in this order
group.levels <- c("DU145.miR1", "DU145.miR148a", "DU145.miR148b", "DU145.miR152")
group <- factor(rep(group.levels, each = 3), levels = group.levels)
y <- DGEList(counts = as.matrix(counts), group = group) ### convert count matrix to a DGEList object
design <- model.matrix(~0 + group) ### experimental design: one column per group
keep <- rowSums(cpm(y)) > 10 ### assumption: the filter expression is garbled on the slide; only the threshold 10 survives
y <- y[keep, ] ### remove genes with really low counts per million
y$samples$lib.size <- colSums(y$counts) ### recalculate library sizes after removing low-CPM genes
y <- calcNormFactors(y) ### between-sample (TMM) normalization
y <- estimateGLMRobustDisp(y, design) ### estimate gene-wise dispersions, robust to outliers
fit <- glmFit(y, design) ### fit the normalized counts to a negative binomial generalized linear model
### perform a likelihood ratio test on each contrast (each miRNA vs DU145.miR1) ###
### each contrast vector needs one entry per design column (4 here)
lrt.du145.mir148a <- glmLRT(fit, contrast = c(-1, 1, 0, 0))
lrt.du145.mir148b <- glmLRT(fit, contrast = c(-1, 0, 1, 0))
lrt.du145.mir152 <- glmLRT(fit, contrast = c(-1, 0, 0, 1))
### generate a user-friendly output table ###
tt.du145.mir148a <- topTags(lrt.du145.mir148a, n = Inf, sort.by = "none")
tt.du145.mir148b <- topTags(lrt.du145.mir148b, n = Inf, sort.by = "none")
tt.du145.mir152 <- topTags(lrt.du145.mir152, n = Inf, sort.by = "none")
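To see what the low-CPM filtering and the library-size recalculation in the edgeR script do to the numbers, the two steps can be mirrored in plain Python. The slide's filter expression is partly garbled, so this sketch assumes a hypothetical rule: keep a gene when its CPM summed across samples exceeds 10. It does not reproduce TMM normalization (`calcNormFactors`), dispersion estimation, or the GLM tests, and the function names are illustrative.

```python
from typing import List, Sequence, Tuple


def cpm(counts: Sequence[Sequence[int]], lib_sizes: Sequence[int]) -> List[List[float]]:
    """Counts per million: each count divided by its sample's library size, times 1e6."""
    return [[c / lib * 1e6 for c, lib in zip(row, lib_sizes)] for row in counts]


def filter_low_cpm(genes: Sequence[str], counts: Sequence[Sequence[int]],
                   threshold: float = 10.0) -> Tuple[List[str], List[int]]:
    """Keep genes whose CPM summed over samples exceeds `threshold` (assumed rule),
    then recompute library sizes from the genes that remain."""
    lib_sizes = [sum(col) for col in zip(*counts)]  # column sums = per-sample totals
    kept = [(g, row) for g, row, cpm_row in zip(genes, counts, cpm(counts, lib_sizes))
            if sum(cpm_row) > threshold]
    new_lib_sizes = [sum(col) for col in zip(*(row for _, row in kept))]
    return [g for g, _ in kept], new_lib_sizes
```

Recomputing library sizes after filtering matters because CPM values and normalization factors are relative to the totals. Without the recalculation, the removed genes would still count toward each sample's depth.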

Expected Results

Long Read (> 1 kb) RNA-seq
Long-read analysis is performed with essentially the same workflow. For alignment, STAR or GMAP work equally well.

Questions?