
1 Bioinformatics for biologists (2)
Dr. Habil Zare, PhD, PI of Oncinfo Lab, Assistant Professor, Department of Computer Science, Texas State University
Presented at University of Texas Health Science Center – San Antonio, 25 March 2016

2 Session 2 Part 2
- Sample size for RNA Seq experiments
- DNA methylation (minfi)
Bioinformatics for biologists workshop (2), Dr. Habil Zare, Oncinfo Lab, UTHSC San Antonio, 25 March 2016

3 How many RNA Seq samples?
- Compared the tools for differential expression analysis
- Assessed the association between number of samples and statistical power.

4 How many RNA Seq samples?
The more samples, the better for identifying differentially expressed genes. Statistical power saturates at 5-10 samples.
[Figure: statistical power versus number of samples (Ching et al.)]
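
The saturation of power with sample size can be explored with a small simulation. This is only a sketch in base R, not the procedure of Ching et al.: it assumes negative binomial counts for a single gene, a two-fold change, a dispersion of 0.1, and a t-test on log counts instead of DESeq2 or edgeR. The function name `estimate_power` and all parameter values are illustrative assumptions.

```r
set.seed(1)

## Fraction of simulated experiments in which one gene is called significant.
## All defaults are assumptions chosen for illustration.
estimate_power <- function(n, mu = 100, fold = 2, disp = 0.1,
                           n_sim = 500, alpha = 0.05) {
  hits <- replicate(n_sim, {
    ctrl <- rnbinom(n, mu = mu,        size = 1 / disp)  # control counts
    trt  <- rnbinom(n, mu = mu * fold, size = 1 / disp)  # treated counts
    t.test(log2(ctrl + 1), log2(trt + 1))$p.value < alpha
  })
  mean(hits)
}

sample_sizes <- c(3, 5, 10, 20)            # samples per group
power <- sapply(sample_sizes, estimate_power)
names(power) <- sample_sizes
round(power, 2)
```

Where the curve flattens depends on the assumed fold change and dispersion, so the numbers from this sketch should not be read as a sample size recommendation.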

5 How many RNA Seq samples?
DESeq2 and edgeR are relatively more powerful tools.
[Figure: statistical power (Ching et al.)]

6 Similar pattern in other datasets

7 Higher dispersion => more samples needed
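
To see why dispersion drives sample size, recall that under a negative binomial model the variance of a count is mu + dispersion * mu^2. The sketch below pairs that with a normal-approximation sample size formula of the kind used for RNA-Seq power calculations (for example in Hart et al. 2013). The exact form of the formula, the function names, and the parameter values are assumptions for illustration, not something taken from the slides.

```r
## Negative binomial variance for mean mu and a given dispersion
nb_variance <- function(mu, dispersion) mu + dispersion * mu^2

## Rough samples per group from a normal approximation (assumed form):
## n = 2 * (z_{1-alpha/2} + z_{power})^2 * (1/mu + dispersion) / log(fold_change)^2
samples_needed <- function(mu, dispersion, fold_change,
                           alpha = 0.05, power = 0.8) {
  z <- qnorm(1 - alpha / 2) + qnorm(power)
  2 * z^2 * (1 / mu + dispersion) / log(fold_change)^2
}

dispersions <- c(0.01, 0.1, 0.5)
data.frame(
  dispersion  = dispersions,
  variance    = nb_variance(100, dispersions),
  n_per_group = ceiling(samples_needed(100, dispersions, fold_change = 2))
)
```

With the mean and fold change held fixed, the estimated number of samples grows roughly in proportion to the dispersion once the dispersion dominates 1/mu.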

8 The value of paired samples
Design your experiments to be as structured as you can, e.g., pair normal and tumor tissues, pre- and post-treatment samples, etc.
[Figure: statistical power versus number of samples]
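
A toy simulation can illustrate why pairing helps. It assumes normally distributed log-expression values with large subject-to-subject variation and a smaller measurement noise; every number and name here is a hypothetical choice, and plain t-tests stand in for a real multifactor RNA-Seq analysis.

```r
set.seed(2)

## One simulated experiment with n normal/tumor pairs; returns both p-values.
simulate_once <- function(n, effect = 1, subject_sd = 2, noise_sd = 0.5) {
  baseline <- rnorm(n, mean = 8, sd = subject_sd)   # per-subject expression level
  normal   <- baseline + rnorm(n, sd = noise_sd)
  tumor    <- baseline + effect + rnorm(n, sd = noise_sd)
  c(unpaired = t.test(tumor, normal)$p.value,
    paired   = t.test(tumor, normal, paired = TRUE)$p.value)
}

p_values <- replicate(500, simulate_once(n = 8))    # 2 x 500 matrix
power <- rowMeans(p_values < 0.05)
power
```

The paired test removes the subject-level variation from the comparison, which is why it detects the same effect far more often in this setup.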

9 The value of paired samples
2 datasets

10 Concluding remarks
- Remove the genes with “low” counts in all conditions, e.g., those with counts <5 in every condition (Rau et al. 2013).
- While more samples lead to higher statistical power, the gain is negligible beyond a certain number (5-10).
- The gain in power is minimal beyond 5–20 million reads.
- Paired-sample data increase statistical power. Structure your experiment and use multifactor analysis.
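
The low-count filter from the first remark can be sketched in base R. The count matrix below is made up, and the threshold is applied per sample (one column per sample), which is one possible reading of "in every condition".

```r
## Hypothetical count matrix: rows are genes, columns are samples
counts <- matrix(c( 0,  2,  1,  3,
                   10, 12,  0,  1,
                   50, 60, 55, 70,
                    4,  4,  4,  4),
                 nrow = 4, byrow = TRUE,
                 dimnames = list(paste0("gene", 1:4),
                                 c("ctrl1", "ctrl2", "trt1", "trt2")))

min_count <- 5
## Keep a gene if at least one sample reaches the threshold,
## i.e. drop genes whose counts are below 5 everywhere.
keep <- rowSums(counts >= min_count) > 0
filtered <- counts[keep, , drop = FALSE]
filtered
```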

11 More references on sample size estimation:
- Ching, Travers, Sijia Huang, and Lana X. Garmire. "Power analysis and sample size estimation for RNA-Seq differential expression." RNA 20.11 (2014): 1684-1696.
- Hart, Steven N., et al. "Calculating sample size estimates for RNA sequencing data." Journal of Computational Biology 20.12 (2013): 970-978.
- Wu, Hao, Chi Wang, and Zhijin Wu. "PROPER: comprehensive power evaluation for differential expression using RNA-seq." Bioinformatics 31.2 (2015): 233-241.
- Rau, A., M. Gallopin, G. Celeux, and F. Jaffrezic. "Data-based filtering for replicated high-throughput transcriptome sequencing experiments." Bioinformatics 29 (2013): 2146-2152.

12 DNA methylation data analysis
- More than 10,000 DNA methylation samples are available through TCGA and GEO.
- Analysis of DNA methylation data is still evolving.

13 46 Bioconductor packages, as of March 2016

14 minfi
- Reads data (Illumina’s 450k array IDAT files) into R
- Performs QC and normalization
- Identifies differentially methylated positions (DMPs)

15 Installing minfi and minfiData
    source("https://bioconductor.org/biocLite.R")
    biocLite("minfi")
    biocLite("minfiData")
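
On current Bioconductor releases, installation goes through the BiocManager package rather than biocLite. A minimal sketch follows; the wrapper name `install_minfi` is hypothetical, and wrapping the calls in a function only keeps the snippet from installing anything until it is called.

```r
## Hypothetical helper: installs minfi and its example data via BiocManager.
## Needs network access; run it once from an interactive R session.
install_minfi <- function() {
  if (!requireNamespace("BiocManager", quietly = TRUE)) {
    install.packages("BiocManager")
  }
  BiocManager::install(c("minfi", "minfiData"))
}

## install_minfi()
```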

16 Analyzing an example dataset
We follow the package vignette:
    browseVignettes("minfi")

17 Reading data
    library(minfi)
    baseDir <- system.file("extdata", package = "minfiData")
    list.files(baseDir)
    targets <- read.450k.sheet(baseDir)
    RGset <- read.450k.exp(targets = targets)
    pd <- pData(RGset)  ## phenotypic data
    pd
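
In later minfi releases the `read.450k.*` readers were renamed to `read.metharray.*`. Assuming a recent minfi and minfiData are installed, the same step might look like the sketch below; the wrapper name `read_example_arrays` is hypothetical and is used only so that nothing is read until the function is called.

```r
## Hypothetical wrapper around the renamed minfi readers.
read_example_arrays <- function() {
  base_dir <- system.file("extdata", package = "minfiData")
  targets  <- minfi::read.metharray.sheet(base_dir)   # parse the sample sheet
  minfi::read.metharray.exp(targets = targets)        # read the IDAT files
}

## library(minfi)
## RGset <- read_example_arrays()
## pd <- pData(RGset)   # phenotypic data
```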

18 QC
    densityPlot(RGset, sampGroups = pd$Sample_Group, main = "Beta", xlab = "Beta")
Beta values are expected to cluster around 0 or 1.

19 QC
    par(oma = c(2, 10, 1, 1))
    densityBeanPlot(RGset, sampGroups = pd$Sample_Group, sampNames = pd$Sample_Name)

20 Normalization
    MSet.norm <- preprocessIllumina(RGset, bg.correct = TRUE, normalize = "controls", reference = 2)
Different normalization methods have been proposed, and new ones are still being developed.

21 Multi-dimensional scaling (MDS) plot
    mdsPlot(MSet.norm, numPositions = 1000, sampGroups = pd$Sample_Group, sampNames = pd$Sample_Name)
Similar to PCA, an MDS plot is useful for identifying outlier samples.
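
The idea behind an MDS plot can be tried without any array data. The sketch below uses simulated values and base R only (`dist` and `cmdscale`); it keeps the most variable rows, in the spirit of `numPositions`, but it is not the code of `mdsPlot`, and the matrix, group labels, and shift size are invented.

```r
set.seed(3)

## Simulated methylation-like values: 2000 positions x 6 samples
values <- matrix(rnorm(2000 * 6), nrow = 2000,
                 dimnames = list(NULL, paste0("sample", 1:6)))
group <- rep(c("normal", "cancer"), each = 3)
values[1:300, 4:6] <- values[1:300, 4:6] + 3     # group difference in 300 rows

## Keep the 1000 most variable positions
row_var <- apply(values, 1, var)
top <- order(row_var, decreasing = TRUE)[1:1000]

## Classical MDS on Euclidean distances between samples
coords <- cmdscale(dist(t(values[top, ])), k = 2)

plot(coords, col = as.integer(factor(group)), pch = 19,
     xlab = "MDS 1", ylab = "MDS 2")
text(coords, labels = rownames(coords), pos = 3)
```

Samples from the same group should land close together, and a sample far from its own group is a candidate outlier.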

22 Getting M values
    ## A small subset to speed up the demo:
    mset <- MSet.norm[1:20000, ]
    ## Getting the M values:
    M <- getM(mset, type = "beta", betaThreshold = 0.001)
- M values show the level of methylation. They are logit-transformed beta values and are more appropriate for DMP analysis (Du et al. 2010).
- Beta values below 0.001 or above 0.999 are truncated to those limits to avoid numerical issues.
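
The transformation itself is short enough to write out. The sketch below is a base R illustration of a base-2 logit with truncation, M = log2(beta / (1 - beta)); the function name `beta_to_m` is hypothetical, and the code is meant to show the idea rather than to reproduce minfi's internals.

```r
## Convert beta values to M values, truncating extreme betas first
beta_to_m <- function(beta, threshold = 0.001) {
  clipped <- pmin(pmax(beta, threshold), 1 - threshold)  # keep within [t, 1 - t]
  log2(clipped / (1 - clipped))
}

beta <- c(0, 0.2, 0.5, 0.8, 1)
round(beta_to_m(beta), 2)
```

Without the truncation, beta values of exactly 0 or 1 would map to minus or plus infinity.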

23 Differentially methylated positions
    dmp <- dmpFinder(M, pheno = pd$Sample_Group, type = "categorical")
    head(dmp)
Rows are ordered by p-value.
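
As a rough illustration of what a position-by-position test produces, the sketch below runs a t-test on every row of a simulated M-value matrix and sorts the result by p-value. It is a stand-in using base R only, not the method implemented in `dmpFinder`; the data, the probe names, and the column names `pval` and `qval` are assumptions.

```r
set.seed(4)

group <- factor(rep(c("normal", "cancer"), each = 3))

## Simulated M values: 100 positions x 6 samples, first 5 positions shifted
M_sim <- matrix(rnorm(100 * 6), nrow = 100,
                dimnames = list(paste0("cg", 1:100), NULL))
M_sim[1:5, group == "cancer"] <- M_sim[1:5, group == "cancer"] + 6

## One t-test per position, then multiple-testing adjustment
p <- apply(M_sim, 1, function(x) t.test(x ~ group)$p.value)
result <- data.frame(pval = p, qval = p.adjust(p, method = "BH"))
result <- result[order(result$pval), ]   # most significant positions first
head(result)
```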

24 Plotting methylation levels
    cpgs <- rownames(dmp)[1:4]
    par(mfrow = c(2, 2))
    plotCpg(mset, cpg = cpgs, pheno = pd$Sample_Group)

25 Acknowledgments (Oncinfo Lab Members)
- Dr. Habil Zare, PhD: The PI, Computational Biologist
- Dr. Amir Forpushani, PhD: Postdoc, Computational Biologist
- Rupesh Agrihari: Grad student, Computer Science
I would like to thank Amir for helping me in preparing the pathway analysis slides, and Rupesh for his assistance to the audience during the workshop.

26 References:
- Pan Du, Xiao Zhang, Chiang-Ching Huang, et al. Comparison of Beta-value and M-value methods for quantifying methylation levels by microarray analysis. BMC Bioinformatics, 11:587, 2010.

