Published by Charlene Mitchell. Modified over 9 years ago.
1
The iPlant Collaborative: Community Cyberinfrastructure for Life Science
Tools and Services Workshop: RNA-Seq visualization with cummeRbund
2
Papers and source materials: useful references (*graphics taken from these publications)
3
Tuxedo workflow for differential expression (*TopHat and Cufflinks require a sequenced reference genome)
4
Discovery Environment: using a GUI
[Diagram: Your Data (FASTQ) and the iPlant Data Store feeding the Discovery Environment (TopHat (Bowtie), Cufflinks, Cuffmerge, Cuffdiff), with CummeRbund run in Atmosphere]
5
CummeRbund: a Bioconductor R library. Getting started in Atmosphere
"Allows for persistent storage, access, exploration, and manipulation of Cufflinks high-throughput sequencing data. In addition, provides numerous plotting functions for commonly used visualizations."
Any Atmosphere image with R installed will work; you could also search for an image that already has cummeRbund installed.
6
Bring your Data into Atmosphere Using iCommands or iDrop
7
Connect with VNC: a visualization use case for Atmosphere (VNC Viewer)
8
Installing cummeRbund: available via Bioconductor
9
Examine the Cufflinks data
> library(cummeRbund)
> cuff <- readCufflinks()  # reads the Cuffdiff output in the working directory (default dir = getwd())
> cuff
CuffSet instance with:
 2 samples
 33714 genes
 43481 isoforms
 35113 TSS
 32924 CDS
 33621 promoters
 35113 splicing
 27350 relCDS
10
Visualize sample dispersion
> disp <- dispersionPlot(genes(cuff))
> disp
Counts vs. dispersion
Overdispersion: greater variability in a data set than would be expected under a given model (in our case, extra-Poisson variation). If you use a Poisson model, you will overestimate differential expression: the model underestimates the variance, so too many genes are called significant.
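To see why a Poisson model overstates differential expression, the sketch below simulates genes with no true change between two conditions (three replicates each) using overdispersed negative binomial counts, then tests each gene with a Poisson GLM and with a quasi-Poisson GLM. All parameters (500 genes, mu = 200, size = 5) are assumptions chosen for illustration, and quasi-Poisson is only a stand-in for an overdispersion-aware model; it is not the model Cuffdiff uses.

```r
set.seed(42)
n_genes <- 500   # hypothetical number of genes with NO true change
mu      <- 200   # assumed mean count per replicate
size    <- 5     # assumed NB size parameter; Var = mu + mu^2 / size
group   <- factor(rep(c("WT", "hy5"), each = 3), levels = c("WT", "hy5"))

# For each gene: simulate 3 vs 3 overdispersed counts and test the group effect twice.
pvals <- t(sapply(seq_len(n_genes), function(i) {
  y <- rnbinom(6, mu = mu, size = size)
  fit_pois  <- glm(y ~ group, family = poisson)       # assumes Var = mean
  fit_quasi <- glm(y ~ group, family = quasipoisson)  # estimates extra dispersion
  c(poisson      = coef(summary(fit_pois))[2, 4],
    quasipoisson = coef(summary(fit_quasi))[2, 4])
}))

# Fraction of null genes called "significant" at p < 0.05 under each model.
false_pos <- colMeans(pvals < 0.05)
print(false_pos)
```

With overdispersed data, the Poisson column flags a large share of genes that do not change, while the quasi-Poisson column stays near the nominal 5%.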
11
Variation matters (source: http://www.fgcz.ch/education/StatMethodsExpression/03_Count_data_analysis.pdf)
A Poisson model adequately describes technical variation; variation between biological replicates is typically overdispersed.
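A minimal simulation of the difference, assuming Poisson counts for technical replicates and negative binomial counts for biological replicates (Var = mu + mu^2/size). The mean and size values are hypothetical.

```r
set.seed(1)
n    <- 10000   # number of simulated replicates of one gene
mu   <- 100     # assumed true mean count
size <- 10      # assumed NB size parameter (dispersion = 1 / size)

# Technical replicates: Poisson, so the variance is expected to equal the mean.
tech <- rpois(n, lambda = mu)

# Biological replicates: negative binomial, variance expected to be mu + mu^2 / size.
bio <- rnbinom(n, mu = mu, size = size)

summary_tbl <- data.frame(
  model             = c("Poisson", "Negative binomial"),
  mean              = c(mean(tech), mean(bio)),
  variance          = c(var(tech), var(bio)),
  expected_variance = c(mu, mu + mu^2 / size)
)
print(summary_tbl)
```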
12
Overdispersed Data
13
Squared Coefficient of Variation
> genes.scv <- fpkmSCVPlot(genes(cuff))
> genes.scv
A normalized measure of cross-replicate variability: the squared ratio of the standard deviation to the mean (SCV = σ²/μ²).
A higher SCV can result in fewer differentially expressed genes, because replicate FPKM estimates vary more.
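SCV can be computed directly from replicate values. This sketch uses a hypothetical FPKM matrix and the sample variance (n - 1 denominator); fpkmSCVPlot may use a different estimator internally.

```r
# Hypothetical replicate FPKM values for three genes in one condition.
fpkm <- rbind(
  geneA = c(10, 10, 10),
  geneB = c(5, 10, 15),
  geneC = c(1, 4, 7)
)

# Squared coefficient of variation: variance divided by the squared mean.
scv <- function(x) var(x) / mean(x)^2

gene_scv <- apply(fpkm, 1, scv)
print(gene_scv)  # geneA 0, geneB 0.25, geneC 0.5625
```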
14
Distributions of FPKM scores across samples
> dens <- csDensity(genes(cuff))
> dens
> densRep <- csDensity(genes(cuff), replicates = TRUE)
> densRep
Non-parametric estimate of the probability density function (pdf)
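A kernel density estimate is one such non-parametric estimate. The sketch below applies R's density() to simulated log10(FPKM + 1) values; the log-normal data and the pseudocount of 1 are assumptions, and csDensity may transform the data differently.

```r
set.seed(7)
# Hypothetical FPKM values for one sample (log-normal chosen only for illustration).
fpkm     <- rlnorm(5000, meanlog = 2, sdlog = 1.5)
log_fpkm <- log10(fpkm + 1)    # pseudocount of 1 avoids log10(0)

d <- density(log_fpkm)         # Gaussian kernel, bandwidth from bw.nrd0()

# A density should integrate to about 1 over its evaluation grid.
area <- sum(d$y) * diff(d$x[1:2])
print(area)

plot(d, main = "Density of log10(FPKM + 1)")
```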
15
FPKM Pairwise Scatter Plots
> csScatter(genes(cuff), "WT", "hy5", smooth = TRUE)
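A base-R approximation of what a pairwise scatter with smoothing conveys, using simulated WT and hy5 FPKMs (hypothetical data in which most genes are unchanged). lowess() is used here as the smoother; csScatter builds its plot with ggplot2 and may use a different smoother, so the curves will not match exactly.

```r
set.seed(3)
n   <- 2000
wt  <- rlnorm(n, meanlog = 2, sdlog = 1.5)       # hypothetical WT FPKMs
hy5 <- wt * exp(rnorm(n, mean = 0, sd = 0.3))    # hypothetical hy5 FPKMs, noise only

# Compare on a log scale so highly expressed genes do not dominate the plot.
x <- log10(wt + 1)
y <- log10(hy5 + 1)

r     <- cor(x, y)       # Pearson correlation of log expression
trend <- lowess(x, y)    # locally weighted smooth of hy5 vs WT

plot(x, y, pch = ".", xlab = "log10(WT FPKM + 1)", ylab = "log10(hy5 FPKM + 1)")
abline(0, 1, col = "grey")   # y = x: no change between samples
lines(trend, col = "red")
print(r)
```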
16
Saving your plots (in case you are not working in RStudio)
1. Open a graphics device, e.g. jpeg(), png(), or pdf(), passing the file path and name
2. Call the plot function
3. Close the device with dev.off()
> png("csScatter.png")  # saves in the working directory
> csScatter(genes(cuff), "WT", "hy5", smooth = TRUE)
> dev.off()
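When plotting from a function or a source()d script rather than the interactive prompt, it helps to close the device even if plotting fails and to print ggplot objects explicitly: cummeRbund's plotting functions generally return ggplot2 objects, which are not auto-printed in those contexts. save_plot below is a hypothetical helper, not part of cummeRbund; the example call uses a base-R plot so it runs without cummeRbund data.

```r
# Hypothetical helper: open a file device, draw, and always close the device.
save_plot <- function(path, plot_fun, device = grDevices::pdf, ...) {
  device(path, ...)                          # grDevices::png or grDevices::jpeg also work
  on.exit(grDevices::dev.off(), add = TRUE)  # runs even if plot_fun() throws an error
  p <- plot_fun()
  # ggplot2 objects are only drawn when printed.
  if (inherits(p, "ggplot")) print(p)
  invisible(path)
}

out <- file.path(tempdir(), "example.pdf")
save_plot(out, function() plot(1:10, main = "Example plot"))
```

With cummeRbund loaded, the same helper could wrap a call such as function() csScatter(genes(cuff), "WT", "hy5", smooth = TRUE).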
17
Selecting and Filtering Gene Sets Using the getSig function
# getSig returns the IDs of features that are significant at the given alpha
> sig <- getSig(cuff, alpha = 0.05, level = "genes")  # genes significant at 0.05
> length(sig)  # returns the number of genes in the sig object
> sig <- getSig(cuff, alpha = 0, level = "genes")
> tail(sig, 100)  # displays the last 100 genes in the sig object you just made
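The alpha argument acts as a cutoff on multiple-testing-corrected q-values. Assuming Benjamini-Hochberg adjustment (an assumption about how Cuffdiff's q-values are produced), the filtering step can be sketched in base R with hypothetical gene IDs and p-values:

```r
# Hypothetical raw p-values for six genes.
pvals <- c(g1 = 0.0001, g2 = 0.004, g3 = 0.02, g4 = 0.03, g5 = 0.2, g6 = 0.7)

# Benjamini-Hochberg FDR adjustment turns p-values into q-values.
qvals <- p.adjust(pvals, method = "BH")

# Keep the genes whose q-value is at or below alpha.
alpha   <- 0.05
sig_ids <- names(pvals)[qvals <= alpha]
print(sig_ids)  # "g1" "g2" "g3" "g4"
```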
18
Selecting and Filtering Gene Sets Using the getGenes function
# Get the gene information
> sigGenes <- getGenes(cuff, sig)
Plot this in another scatter plot:
> csScatter(sigGenes, "WT", "hy5")
19
Heat mapping Similar Expression Values
> sigGenes <- getGenes(cuff, tail(sig, 50))  # last 50 genes in the list we created
> csHeatmap(sigGenes, cluster = "both")
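With cluster = "both", the heatmap reorders both genes (rows) and samples (columns) by clustering. A base-R sketch of two-way clustering on a hypothetical FPKM matrix, using Euclidean distance and hclust's default complete linkage; csHeatmap may use a different distance (for example, Jensen-Shannon), so treat this as an illustration of the idea only.

```r
# Hypothetical FPKM matrix: rows are genes, columns are replicates.
fpkm <- matrix(
  c(100, 100,   1,   1,
      1,   1, 100, 100,
     90, 110,   1,   2,
      2,   1, 120,  90),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("g1", "g2", "g3", "g4"),
                  c("WT_1", "WT_2", "hy5_1", "hy5_2"))
)
m <- log10(fpkm + 1)

# Cluster rows (genes) and columns (samples) independently.
row_order <- hclust(dist(m))$order
col_order <- hclust(dist(t(m)))$order

# Reordered matrix: genes with similar profiles end up next to each other.
print(round(m[row_order, col_order], 2))
```

stats::heatmap(m) draws a similar two-way clustered heatmap with dendrograms.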
20
Heat mapping Similar Expression Values
> csHeatmap(sigGenes, cluster = "both", replicates = TRUE)
21
Expression Plots by Gene
> myGeneId <- "AT5G41471"
> myGene <- getGene(cuff, myGeneId)
> myGene
22
Expression Plots by Gene
> expressionPlot(myGene, replicates = TRUE)
23
Keep asking: ask.iplantcollaborative.org
24
The iPlant Collaborative is funded by a grant from the National Science Foundation Plant Cyberinfrastructure Program (#DBI-0735191).