The iPlant Collaborative Community Cyberinfrastructure for Life Science Tools and Services Workshop RNA-Seq visualization with cummeRbund
Papers and source materials Useful References *Graphics taken from these publications
Tuxedo Workflow Differential expression *TopHat and Cufflinks require a sequenced genome
Discovery Environment: Using a GUI
TopHat (Bowtie) → Cufflinks → Cuffmerge → Cuffdiff → CummeRbund
Your Data (FASTQ) · iPlant Data Store · Discovery Environment · Atmosphere
CummeRbund: a Bioconductor R library; getting started in Atmosphere
“Allows for persistent storage, access, exploration, and manipulation of Cufflinks high-throughput sequencing data. In addition, provides numerous plotting functions for commonly used visualizations.”
Any image with R installed can work; you can also search for an image with cummeRbund already installed.
Bring your Data into Atmosphere Using iCommands or iDrop
Connect with VNC
Visualization use case for Atmosphere: VNC Viewer
Installing CummeRbund Available via Bioconductor
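One way to check for the package from within R, and optionally install it, is sketched below. `ensure_cummerbund` is a hypothetical helper name introduced only for illustration, and the sketch assumes a recent R release in which Bioconductor packages are installed through `BiocManager`; older R releases used a different Bioconductor installer.

```r
# Hypothetical helper (not part of cummeRbund): returns TRUE if the package
# is available, optionally installing it from Bioconductor first.
ensure_cummerbund <- function(install = FALSE) {
  if (requireNamespace("cummeRbund", quietly = TRUE)) {
    return(TRUE)
  }
  if (!install) {
    return(FALSE)
  }
  # Assumption: BiocManager is the Bioconductor installer for this R version.
  if (!requireNamespace("BiocManager", quietly = TRUE)) {
    install.packages("BiocManager")
  }
  BiocManager::install("cummeRbund")
  requireNamespace("cummeRbund", quietly = TRUE)
}

# Check only; pass install = TRUE to actually download and install.
available <- ensure_cummerbund(install = FALSE)
if (available) library(cummeRbund)
```

Keeping the install step behind an explicit argument avoids triggering a large download every time the script is sourced.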
Examine the cufflinks data
> cuff <- readCufflinks()
> cuff
CuffSet instance with:
  2 samples
  genes
  isoforms
  TSS
  CDS
  promoters
  splicing
  relCDS
Visualize sample dispersion
> disp <- dispersionPlot(genes(cuff))
> disp
Plots counts vs. dispersion.
Overdispersion: greater variability in a data set than would be expected under a given model (in our case, extra-Poisson variation).
If you use a Poisson model on overdispersed data, you will overestimate differential expression.
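The idea of extra-Poisson variation can be illustrated with a small base-R simulation. This is a sketch only: the mean count, the number of replicates, and the use of a negative binomial as the overdispersed model are all assumptions chosen for illustration, not a description of how Cuffdiff models counts.

```r
set.seed(42)        # reproducible simulation
n_reps  <- 10000    # simulated replicate measurements
mean_ct <- 100      # hypothetical mean read count for one gene

# Poisson counts: the variance equals the mean.
pois_counts <- rpois(n_reps, lambda = mean_ct)

# Negative binomial counts: variance = mu + mu^2 / size (extra-Poisson).
nb_counts <- rnbinom(n_reps, mu = mean_ct, size = 5)

# Variance-to-mean ratio: close to 1 for Poisson, larger when overdispersed.
pois_ratio <- var(pois_counts) / mean(pois_counts)
nb_ratio   <- var(nb_counts) / mean(nb_counts)
print(c(poisson = pois_ratio, neg_binomial = nb_ratio))
```

A test that assumes the Poisson ratio when the data actually follow the second pattern underestimates the noise, which is why it calls too many genes differentially expressed.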
Variation matters Poisson adequately describes technical variation
Overdispersed Data
Squared Coefficient of Variation
> genes.scv <- fpkmSCVPlot(genes(cuff))
> genes.scv
A normalized measure of cross-replicate variability: the squared ratio of the standard deviation to the mean.
A higher SCV means more variability between replicate FPKM estimates, which can result in fewer genes being called differentially expressed.
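As a worked example of the definition, the squared coefficient of variation can be computed by hand in base R. The FPKM values and gene names below are made up, and `scv` is a hypothetical helper; this follows the textbook formula and is not necessarily the exact computation performed inside `fpkmSCVPlot`.

```r
# Hypothetical FPKM values: rows are genes, columns are replicates.
fpkm <- rbind(
  geneA = c(10, 10, 10),
  geneB = c(5, 10, 15),
  geneC = c(90, 100, 110)
)

# Squared coefficient of variation: (standard deviation / mean)^2.
scv <- function(x) (sd(x) / mean(x))^2

# Apply the helper to each gene (row) across its replicates.
gene_scv <- apply(fpkm, 1, scv)
print(gene_scv)
```

Because the standard deviation is divided by the mean, genes at very different expression levels can be compared on the same scale.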
Distributions of FPKM scores across samples
> dens <- csDensity(genes(cuff))
> dens
> densRep <- csDensity(genes(cuff), replicates=T)
> densRep
Non-parametric estimate of the probability density function (pdf)
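A non-parametric density estimate of this kind can be sketched with base R's `density()` function. The simulated log-normal FPKM values and the pseudocount of 1 are assumptions made for this example; they are not taken from the workshop data or from the internals of `csDensity`.

```r
set.seed(1)
# Hypothetical FPKM values drawn from a log-normal distribution.
fpkm <- rlnorm(2000, meanlog = 2, sdlog = 1.5)

# Log-transform; the pseudocount of 1 avoids log10(0) (an assumption here).
log_fpkm <- log10(fpkm + 1)

# Gaussian kernel density estimate on an evenly spaced grid.
dens_est <- density(log_fpkm)

# A density should integrate to roughly 1 over its grid.
area <- sum(dens_est$y) * diff(dens_est$x[1:2])
print(area)
```

Calling `plot(dens_est)` draws the estimated curve, which is the base-R analogue of one line in the density plot.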
FPKM Pairwise Scatter Plots
> csScatter(genes(cuff), 'WT', 'hy5', smooth=T)
Saving your Plots
Just in case you are not working in RStudio:
1. Open a graphics device for the plot type (e.g. jpeg, png, pdf), giving the file path and file name
2. Call the plot function
3. Close the device with dev.off()
> png('csScatter.png') # Will save in the working directory
> csScatter(genes(cuff), 'WT', 'hy5', smooth=T)
> dev.off()
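The same open, plot, close pattern can be tried without any Cufflinks data. The sketch below uses the `pdf()` device and a base-R plot as stand-ins; the file name and the plotted values are placeholders chosen for illustration.

```r
# Write to a temporary directory so the example leaves no stray files behind.
out_file <- file.path(tempdir(), "example_plot.pdf")

pdf(out_file)                                # 1. open the graphics device
plot(1:10, (1:10)^2, main = "Example plot")  # 2. draw the plot
invisible(dev.off())                         # 3. close the device to flush the file

print(file.exists(out_file))
```

Note the parentheses on `dev.off()`: without them R only prints the function and the file is never finalized. cummeRbund plotting functions return ggplot objects, so inside a function or loop the call should be wrapped in `print()` for anything to be drawn on the device.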
Selecting and Filtering Gene Sets
Using the 'getSig' function
# Enables you to get the genes that are significant at a given level (alpha)
> sig <- getSig(cuff, alpha=0.05, level='genes') # genes of significance 0.05
> length(sig) # returns the number of genes in the sig object
> sig <- getSig(cuff, alpha=0, level='genes')
> tail(sig, 100) # displays the last 100 genes in the sig object you just made
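The kind of thresholding involved in picking genes at a significance level can be pictured with a small base-R sketch. The gene IDs and p-values are made up, and the Benjamini-Hochberg adjustment via `p.adjust` is an assumption for illustration; this is a conceptual analogue, not cummeRbund's implementation of `getSig`.

```r
# Hypothetical test results; gene IDs and p-values are invented.
tests <- data.frame(
  gene_id = c("g1", "g2", "g3", "g4"),
  p_value = c(0.001, 0.01, 0.03, 0.5),
  stringsAsFactors = FALSE
)

alpha <- 0.05

# Adjust p-values for multiple testing (Benjamini-Hochberg).
tests$q_value <- p.adjust(tests$p_value, method = "BH")

# Keep the IDs whose adjusted p-value is at or below alpha.
sig_ids <- tests$gene_id[tests$q_value <= alpha]
print(sig_ids)
print(length(sig_ids))
```

The result is a plain character vector of IDs, which is the same shape of object that is then passed on to retrieve the full gene records.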
Selecting and Filtering Gene Sets
Using the 'getGenes' function
# Get the gene information
> sigGenes <- getGenes(cuff, sig)
Plot this in another scatter plot:
> csScatter(sigGenes, 'WT', 'hy5')
Heat mapping Similar Expression Values
> sigGenes <- getGenes(cuff, tail(sig, 50)) # last 50 genes in the list we created
> csHeatmap(sigGenes, cluster='both')
Heat mapping Similar Expression Values
> csHeatmap(sigGenes, cluster='both', replicates=T)
Expression Plots by Genes
> myGeneId <- "AT5G41471"
> myGene <- getGene(cuff, myGeneId)
> myGene
Expression Plots by Genes
> expressionPlot(myGene, replicates=T)
Keep asking: ask.iplantcollaborative.org
The iPlant Collaborative is funded by a grant from the National Science Foundation Plant Cyberinfrastructure Program (#DBI ).