RNA-Seq Visualization cummrRbund in Atmosphere Jason Williams iPlant / Cold Spring Harbor Laboratory
*Graphics taken from these publications
The Tuxedo Protocol *TopHat and Cufflinks require a sequenced genome
Tophat Explain reference-sequence based NGS read alignments. Explain that we are skipping the cufflinks step because the Arabidopsis transcriptome is so well annotated that we can use the TAIR gene models as our refernce transcripts for CuffDiff
TopHat outputs in IGV
Using CummeRbund in Atmosphere Explain that we are skipping the cufflinks step because the Arabidopsis transcriptome is so well annotated that we can use the TAIR gene models as our refernce transcripts for CuffDiff
Using CummeRbund in Atmosphere Visualize and mine Cuffdiff results Output files from Cuffdiff are reorganized into a local database Explain that we are skipping the cufflinks step because the Arabidopsis transcriptome is so well annotated that we can use the TAIR gene models as our refernce transcripts for CuffDiff
Choose the right image We will be using “RNA-Seq Visualization” Rmi-BE9C2D12 Any image w/R can work, and you could also search For an image with cummeRbund installed Explain that we are skipping the cufflinks step because the Arabidopsis transcriptome is so well annotated that we can use the TAIR gene models as our refernce transcripts for CuffDiff
Installing cummeRbund in R
Installing cummeRbund in R
Reading the data in > cuff <- readCufflinks() > cuff CuffSet instance with: 2 samples 33714 genes 43481 isoforms 35113 TSS 32924 CDS 33621 promoters 35113 splicing 27350 relCDS
Visualizing dispersion >disp<-dispersionPlot(genes(cuff)) >disp Counts vs. dispersion Overdispersion greater variability in a data set than would be expected based on a given model ( in our case extra-Poisson variation) If you use Poisson model, you will overestimate differential expression
Visualizing dispersion Poisson adequately describes technical variation http://www.fgcz.ch/education/StatMethodsExpression/03_Count_data_analysis.pdf
Visualizing dispersion
Squared-coefficient of Variation (SCV) >genes.scv<-fpkmSCVPlot(genes(cuff)) >genes.scv Normalized measure of cross-replicate variability Represents the relationship of the standard deviation to the mean Differences in SCV can result in lower numbers of differentially expressed genes due to a higher degree of variability between replicate fpkm estimates
Distributions of FPKM scores across samples >dens<-csDensity(genes(cuff)) >dens >densRep<-csDensity(genes(cuff),replicates=T) >densRep Non-parametric estimate of pdf
FPKM Pairwise Scatter Plots > csScatter(genes(cuff),‘WT’,‘hy5’,smooth=T)
Saving your Plots 1. Plot type: >(e.g. jpeg, png, pdf) (file_path_and_file_name) 2. Plot function 3. dev.off() > png (‘csScatter.png’) #Will save in working directory > csScatter(genes(cuff),‘WT’,‘hy5’,smooth=T) >dev.off
Selecting and Filtering Gene Sets Using the ‘getSig’ function # Enables you to get genes at significance n >sig <-getSig(cuff, alpha=0.05, level =‘genes’) # genes of significance 0.05 >length(sig) #returns the number of genes in the sig object >sig <-getSig(cuff, alpha=0, level=‘genes’) >tail(sig,100) #displays the last 100 genes in the sig object you just made
Selecting and Filtering Gene Sets Using the ‘getGenes’ function # Get the gene information >sigGenes <- getGenes(cuff,sig) Plot this in another scatter plot >csScatter(sigGenes, ‘WT’, ‘hy5’)
Heat mapping Similar Expression Values >sigGenes <-getGenes(cuff,tail(sig,50)) #last 50 genes in the list we created of genes >csHeatmap(sigGenes,cluster=‘both’)
Heat mapping Similar Expression Values >csHeatmap(sigGenes,cluster=‘both’,replicates=‘T’)
Expression Plots by Genes > myGeneId<-”AT5G41471" > myGene<-getGene(cuff,myGeneId) > myGene
Expression Plots by Genes > expressionPlot(myGene,replicates=‘T’)