The iPlant Collaborative Community Cyberinfrastructure for Life Science Tools and Services Workshop RNA-Seq visualization with cummeRbund.

Papers and source materials Useful References *Graphics taken from these publications

Tuxedo Workflow Differential expression *TopHat and Cufflinks require a sequenced genome

Discovery Environment Using a GUI [Workflow diagram: Your Data (FASTQ) in the iPlant Data Store; TopHat (Bowtie), Cufflinks, Cuffmerge, and Cuffdiff in the Discovery Environment; CummeRbund in Atmosphere]

CummeRbund Bioconductor R library; getting started in Atmosphere. “Allows for persistent storage, access, exploration, and manipulation of Cufflinks high-throughput sequencing data. In addition, provides numerous plotting functions for commonly used visualizations.” Any image with R can work, and you could also search for an image with cummeRbund already installed.

Bring your Data into Atmosphere Using iCommands or iDrop

Connect with VNC Visualization use case for Atmosphere VNC Viewer

Installing CummeRbund Available via Bioconductor
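
The slide does not show the install commands. A minimal sketch, assuming a current Bioconductor release where packages are installed through the BiocManager package (older releases used a different installer script, so adapt this to the R version on your image):

```r
# Install BiocManager from CRAN if it is not already available
if (!requireNamespace("BiocManager", quietly = TRUE)) {
  install.packages("BiocManager")
}

# Install cummeRbund from Bioconductor, then load it
BiocManager::install("cummeRbund")
library(cummeRbund)
```

If the Atmosphere image already has cummeRbund installed, only the `library(cummeRbund)` line is needed.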

Examine the Cufflinks data
> cuff <- readCufflinks()
> cuff
CuffSet instance with:
  2 samples
  genes
  isoforms
  TSS
  CDS
  promoters
  splicing
  relCDS

Visualize sample dispersion
> disp <- dispersionPlot(genes(cuff))
> disp
Counts vs. dispersion. Overdispersion: greater variability in a data set than would be expected based on a given model (in our case, extra-Poisson variation). If you use a Poisson model, you will overestimate differential expression.
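
To make "extra-Poisson variation" concrete, here is a small base-R simulation. It is only an illustration: the negative binomial distribution, the mean of 50, and the `size` value are assumptions chosen for the sketch, not values taken from the slides or from Cuffdiff.

```r
set.seed(1)

n  <- 10000   # number of simulated counts (hypothetical)
mu <- 50      # shared mean count (hypothetical)

# Poisson counts: variance equals the mean
pois_counts <- rpois(n, lambda = mu)

# Negative binomial counts with the same mean but extra variance:
# variance = mu + mu^2 / size
nb_counts <- rnbinom(n, mu = mu, size = 5)

# Variance-to-mean ratio: about 1 for Poisson, larger when overdispersed
vmr <- function(x) var(x) / mean(x)

pois_vmr <- vmr(pois_counts)
nb_vmr   <- vmr(nb_counts)

cat("Poisson variance/mean:          ", round(pois_vmr, 2), "\n")
cat("Overdispersed variance/mean:    ", round(nb_vmr, 2), "\n")
```

A test that assumes the Poisson ratio of about 1 would treat the extra spread in the second data set as signal, which is why it would call too many genes differentially expressed.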

Variation matters Poisson adequately describes technical variation

Overdispersed Data

Squared Coefficient of Variation
> genes.scv <- fpkmSCVPlot(genes(cuff))
> genes.scv
A normalized measure of cross-replicate variability that represents the relationship of the standard deviation to the mean. Differences in SCV can result in lower numbers of differentially expressed genes due to a higher degree of variability between replicate FPKM estimates.
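
The quantity itself is simple to compute by hand. A minimal sketch in base R, using a made-up matrix of replicate FPKM values (the gene names and numbers are hypothetical, and this does not reproduce how fpkmSCVPlot fits or draws its curve):

```r
# Hypothetical FPKM values: one row per gene, one column per replicate
fpkm <- matrix(
  c( 10,  12, 11,
    100, 150, 50,
      5,   5,  5),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("geneA", "geneB", "geneC"),
                  c("rep1", "rep2", "rep3"))
)

# Squared coefficient of variation: (standard deviation / mean)^2
scv <- function(x) (sd(x) / mean(x))^2

# One SCV value per gene, computed across its replicates
gene_scv <- apply(fpkm, 1, scv)
print(gene_scv)
```

Here geneB has the noisiest replicates relative to its mean, so it has the largest SCV; geneC has identical replicates and an SCV of zero.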

Distributions of FPKM scores across samples
> dens <- csDensity(genes(cuff))
> dens
> densRep <- csDensity(genes(cuff), replicates=T)
> densRep
Non-parametric estimate of the probability density function (pdf)
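
A "non-parametric estimate of the pdf" is a kernel density estimate. The sketch below shows the idea with base R's `density()` on simulated values; the log-normal FPKM values and the pseudocount of 1 are assumptions made for illustration and are not taken from the workshop data or from csDensity itself.

```r
set.seed(42)

# Simulated FPKM values (hypothetical, roughly log-normal)
fpkm <- rlnorm(2000, meanlog = 2, sdlog = 1.5)

# Work on a log10 scale; the pseudocount avoids log10(0)
log_fpkm <- log10(fpkm + 1)

# Kernel density estimate: no distribution family is assumed
dens <- density(log_fpkm)

# Sanity check: the area under a density curve should be close to 1
# (trapezoid rule over the evaluation grid)
area <- sum(diff(dens$x) * (head(dens$y, -1) + tail(dens$y, -1)) / 2)
cat("Area under the estimated density:", round(area, 3), "\n")

plot(dens, main = "Density of simulated FPKM values",
     xlab = "log10(FPKM + 1)")
```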

FPKM Pairwise Scatter Plots
> csScatter(genes(cuff), 'WT', 'hy5', smooth=T)

Saving your Plots
Just in case you are not working in RStudio:
1. Plot type (e.g. jpeg, png, pdf), called with (file_path_and_file_name)
2. Plot function
3. dev.off()
> png('csScatter.png') # Will save in working directory
> csScatter(genes(cuff), 'WT', 'hy5', smooth=T)
> dev.off()
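
The same open-device, plot, close-device pattern works for any R plot. A self-contained sketch using a base graphics plot and the `pdf()` device (the file name and the plotted data are placeholders; swap in `png()` or `jpeg()` and a cummeRbund plot call as needed):

```r
# 1. Open a graphics device that writes to a file
out_file <- file.path(tempdir(), "example_plot.pdf")  # placeholder path
pdf(out_file, width = 7, height = 5)

# 2. Call the plot function; output goes to the file, not the screen
x <- 1:10
plot(x, x^2, main = "Example plot", xlab = "x", ylab = "x squared")

# 3. Close the device; the file is only complete after this call
dev.off()

cat("Plot written to:", out_file, "\n")
```

Note the parentheses on `dev.off()`: typing `dev.off` without them prints the function instead of closing the device, and the file stays incomplete. If a plot object is created inside a function or loop, it may also need an explicit `print()` before `dev.off()`.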

Selecting and Filtering Gene Sets
Using the 'getSig' function # Enables you to get genes at significance n
> sig <- getSig(cuff, alpha=0.05, level='genes') # genes of significance 0.05
> length(sig) # returns the number of genes in the sig object
> sig <- getSig(cuff, alpha=0, level='genes')
> tail(sig, 100) # displays the last 100 genes in the sig object you just made
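
Conceptually, filtering at a significance level means keeping the IDs of genes whose multiple-testing-corrected value falls below alpha. A rough base-R sketch of that idea on a made-up results table (the column names, gene IDs, values, and the strict `<` comparison are all assumptions for illustration, not cummeRbund internals):

```r
# Hypothetical differential expression results
diff_table <- data.frame(
  gene_id = c("g1", "g2", "g3", "g4"),
  q_value = c(0.001, 0.20, 0.04, 0.05),
  stringsAsFactors = FALSE
)

alpha <- 0.05

# Keep the IDs of genes whose corrected value is below the threshold
sig_ids <- diff_table$gene_id[diff_table$q_value < alpha]

length(sig_ids)    # number of genes passing the filter
tail(sig_ids, 1)   # last ID in the filtered vector
```

The result is a plain character vector of IDs, which is why `length()` and `tail()` work on it directly.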

Selecting and Filtering Gene Sets
Using the 'getGenes' function # Get the gene information
> sigGenes <- getGenes(cuff, sig)
Plot this in another scatter plot
> csScatter(sigGenes, 'WT', 'hy5')

Heat mapping Similar Expression Values
> sigGenes <- getGenes(cuff, tail(sig, 50)) # last 50 genes in the list we created
> csHeatmap(sigGenes, cluster='both')

Heat mapping Similar Expression Values
> csHeatmap(sigGenes, cluster='both', replicates=TRUE)

Expression Plots by Genes
> myGeneId <- "AT5G41471"
> myGene <- getGene(cuff, myGeneId)
> myGene

Expression Plots by Genes
> expressionPlot(myGene, replicates=TRUE)

Keep asking: ask.iplantcollaborative.org

The iPlant Collaborative is funded by a grant from the National Science Foundation Plant Cyberinfrastructure Program (#DBI ).