

Bioinformatics for biologists (2)
Dr. Habil Zare, PhD, PI of Oncinfo Lab, Assistant Professor, Department of Computer Science, Texas State University
Presented at the University of Texas Health Science Center at San Antonio, 25 March 2016

Session 2, Part 2
- Sample size for RNA-Seq experiments
- DNA methylation (minfi)
Bioinformatics for biologists workshop (2), Dr. Habil Zare, Oncinfo Lab, UTHSC San Antonio, 25 March 2016

How many RNA-Seq samples?
- Compared tools for differential expression analysis.
- Assessed the association between the number of samples and statistical power.

How many RNA-Seq samples?
- The more samples, the better for identifying differentially expressed genes.
- However, statistical power saturates at around 5-10 samples.
[Figure: statistical power vs. number of samples. Source: Ching et al.]

How many RNA-Seq samples?
- DESeq2 and edgeR are relatively more powerful than the other tools compared.
[Figure: statistical power of the compared tools. Source: Ching et al.]

Similar pattern in other datasets

Higher dispersion (more variability among replicates) means more samples are needed.
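The sample-size and dispersion points can be illustrated with a small simulation. The sketch below is not the procedure used by Ching et al.; it is a base-R toy model that draws negative binomial counts for a single gene and records how often a t-test on log2 counts detects a 2-fold change. The mean count (100), the fold change, the two dispersion values, the choice of test, and the helper name simulate_power are all illustrative assumptions; real power analyses (e.g., PROPER, cited in the references) model many genes and correct for multiple testing.

```r
# Simulated power for ONE gene: how often a test detects a 2-fold change,
# as a function of replicates per group (n) and negative binomial dispersion.
# Illustrative assumptions: mean count 100, Welch t-test on log2(count + 1),
# alpha = 0.05, no multiple-testing correction.
simulate_power <- function(n, dispersion, mu = 100, fold = 2,
                           n_sim = 500, alpha = 0.05) {
  size <- 1 / dispersion  # rnbinom variance = mu + mu^2 / size = mu + dispersion * mu^2
  detected <- replicate(n_sim, {
    control   <- rnbinom(n, size = size, mu = mu)
    treatment <- rnbinom(n, size = size, mu = mu * fold)
    t.test(log2(treatment + 1), log2(control + 1))$p.value < alpha
  })
  mean(detected)  # fraction of simulations with p < alpha
}

set.seed(1)
n_values <- c(3, 5, 10)
dispersions <- c(low = 0.1, high = 0.5)
# Rows: replicates per group; columns: dispersion setting
power <- sapply(dispersions, function(d)
  sapply(n_values, simulate_power, dispersion = d))
rownames(power) <- paste0("n=", n_values)
print(round(power, 2))
```

Under these assumptions, estimated power should rise with n and be lower in the high-dispersion column, mirroring the slides; the exact numbers depend on the seed and the chosen parameters.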

The value of paired samples
Design your experiments to be as structured as you can, e.g., pair normal and tumor tissues, pre- and post-treatment samples, etc.
[Figure: statistical power vs. number of samples]

The value of paired samples
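Why pairing helps can be seen in a similar toy simulation. Each hypothetical subject contributes one normal and one tumor measurement on a log scale, and a subject-specific effect is shared by both measurements. The effect size, the standard deviations, the number of subjects, and the helper compare_designs are assumptions chosen only for illustration.

```r
# Toy comparison of unpaired vs. paired analysis of the same data.
# A large subject-to-subject effect inflates the variance seen by an
# unpaired test but cancels in the within-subject (paired) differences.
compare_designs <- function(n_subjects = 5, effect = 0.5, subject_sd = 1,
                            noise_sd = 0.3, n_sim = 1000, alpha = 0.05) {
  res <- replicate(n_sim, {
    subject <- rnorm(n_subjects, sd = subject_sd)           # shared per subject
    normal  <- subject + rnorm(n_subjects, sd = noise_sd)
    tumor   <- subject + effect + rnorm(n_subjects, sd = noise_sd)
    c(unpaired = t.test(tumor, normal)$p.value < alpha,
      paired   = t.test(tumor, normal, paired = TRUE)$p.value < alpha)
  })
  rowMeans(res)  # estimated power of each analysis
}

set.seed(2)
design_power <- compare_designs()
print(design_power)
```

In a multifactor RNA-Seq model the same idea corresponds to including the pairing variable in the design, for example a DESeq2 design formula of the form ~ patient + condition.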

Concluding remarks
- Remove genes with "low" counts in all conditions, e.g., those with counts < 5 in every condition (Rau et al. 2013).
- While more samples lead to higher statistical power, the gain is negligible beyond a certain number (5-10).
- The gain in power is minimal beyond 5–20 million reads.
- Paired-sample designs increase statistical power. Structure your experiment and use multifactor analysis.
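The low-count filter in the first remark might be sketched as follows. Reading "counts < 5 in every condition" as "mean count below 5 within each condition" is an assumption, as is the helper name filter_low_counts; Rau et al. (2013) derive the threshold from the data rather than fixing it at 5.

```r
# Keep a gene if its mean count reaches min_count in at least one condition;
# drop it if it is below min_count in every condition (assumed reading).
filter_low_counts <- function(counts, condition, min_count = 5) {
  condition <- factor(condition)
  groups <- levels(condition)
  # genes x conditions matrix of per-condition mean counts
  cond_means <- matrix(
    vapply(groups, function(lv)
      rowMeans(counts[, condition == lv, drop = FALSE]),
      numeric(nrow(counts))),
    nrow = nrow(counts), dimnames = list(rownames(counts), groups))
  keep <- rowSums(cond_means >= min_count) > 0
  counts[keep, , drop = FALSE]
}

# Hypothetical toy count matrix (genes x samples)
counts <- rbind(geneA = c(0, 1, 2, 0),
                geneB = c(1, 0, 20, 30),
                geneC = c(50, 60, 40, 55))
condition <- c("normal", "normal", "tumor", "tumor")
filtered <- filter_low_counts(counts, condition)
print(rownames(filtered))
```

Here geneA is removed because its mean count is below 5 in both conditions, while geneB is kept because it is well expressed in the tumor samples.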

More references on sample size estimation:
- Ching, Travers, Sijia Huang, and Lana X. Garmire. "Power analysis and sample size estimation for RNA-Seq differential expression." RNA (2014).
- Hart, Steven N., et al. "Calculating sample size estimates for RNA sequencing data." Journal of Computational Biology (2013).
- Wu, Hao, Chi Wang, and Zhijin Wu. "PROPER: comprehensive power evaluation for differential expression using RNA-seq." Bioinformatics 31.2 (2015).
- Rau, A., M. Gallopin, G. Celeux, and F. Jaffrezic. "Data-based filtering for replicated high-throughput transcriptome sequencing experiments." Bioinformatics 29 (2013): 2146–

DNA methylation data analysis
- More than 10,000 DNA methylation samples are available through TCGA and GEO.
- Analysis of DNA methylation data is still evolving.

46 Bioconductor packages, as of March 2016

minfi
- Reads data (Illumina's 450k array IDAT files) into R.
- Performs QC and normalization.
- Identifies differentially methylated positions (DMPs).

Installing minfi and minfiData
The biocLite() installer used at the time has since been replaced by BiocManager. With current Bioconductor releases:
if (!requireNamespace("BiocManager", quietly = TRUE))
  install.packages("BiocManager")
BiocManager::install(c("minfi", "minfiData"))

Analyzing an example dataset
We follow the package vignette:
browseVignettes("minfi")

Reading data
baseDir <- system.file("extdata", package = "minfiData")
list.files(baseDir)
targets <- read.450k.sheet(baseDir)  # later minfi releases provide read.metharray.sheet()
RGset <- read.450k.exp(targets = targets)  # later minfi releases provide read.metharray.exp()
pd <- pData(RGset)  ## phenotypic data
pd

QC
densityPlot(RGset, sampGroups = pd$Sample_Group, main = "Beta", xlab = "Beta")
Beta values are expected to cluster around 0 (unmethylated) or 1 (methylated), giving a bimodal density.
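For reference, a beta value is the fraction of signal coming from the methylated probe, beta = Meth / (Meth + Unmeth + offset), where Illumina's convention uses offset = 100 to stabilize beta when both intensities are low. The base-R sketch below simulates intensities for a hypothetical mix of unmethylated and methylated CpGs (the gamma parameters, the 40/60 split, and the helper beta_value are arbitrary assumptions, not real 450k data) to show why the density is expected to peak near 0 and 1.

```r
# Beta value from methylated and unmethylated intensities
beta_value <- function(meth, unmeth, offset = 100) {
  meth / (meth + unmeth + offset)
}

set.seed(3)
n <- 5000
# Hypothetical mix: about 40% mostly methylated, 60% mostly unmethylated CpGs
methylated <- runif(n) < 0.4
meth   <- ifelse(methylated, rgamma(n, shape = 20, rate = 0.002),
                             rgamma(n, shape = 2,  rate = 0.002))
unmeth <- ifelse(methylated, rgamma(n, shape = 2,  rate = 0.002),
                             rgamma(n, shape = 20, rate = 0.002))
beta <- beta_value(meth, unmeth)
plot(density(beta), main = "Simulated beta values", xlab = "Beta")
```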

QC
par(oma = c(2, 10, 1, 1))  # widen the left outer margin so sample names fit
densityBeanPlot(RGset, sampGroups = pd$Sample_Group, sampNames = pd$Sample_Name)

Normalization
## Illumina (GenomeStudio-style) preprocessing: background correction and
## normalization to control probes, using array 2 as the reference
MSet.norm <- preprocessIllumina(RGset, bg.correct = TRUE, normalize = "controls", reference = 2)
Several normalization methods have been proposed, and new ones are still being developed.

Multi-dimensional scaling (MDS) plot
mdsPlot(MSet.norm, numPositions = 1000, sampGroups = pd$Sample_Group, sampNames = pd$Sample_Name)
Like PCA, an MDS plot is useful for identifying outlier samples.
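The idea behind an MDS plot can be sketched in base R: keep the most variable positions, compute Euclidean distances between samples, and embed the samples in two dimensions with classical MDS (cmdscale). This approximates what mdsPlot() displays but is not its implementation; the simulated matrix and the helper mds_coords are assumptions.

```r
# Classical MDS on the most variable positions of a positions x samples matrix
mds_coords <- function(beta, num_positions = 1000) {
  v <- apply(beta, 1, var)                                   # per-position variance
  top <- order(v, decreasing = TRUE)[seq_len(min(num_positions, nrow(beta)))]
  d <- dist(t(beta[top, , drop = FALSE]))                    # distances between samples
  cmdscale(d, k = 2)                                         # 2-D embedding
}

set.seed(4)
n_pos <- 5000
group <- rep(c("GroupA", "GroupB"), each = 3)
base <- runif(n_pos, 0.2, 0.6)                               # per-position baseline
beta <- base + matrix(rnorm(n_pos * 6, sd = 0.02), nrow = n_pos,
                      dimnames = list(NULL, paste0("S", 1:6)))
# Hypothetical signal: the first 300 positions are 0.3 higher in GroupB
beta[1:300, group == "GroupB"] <- beta[1:300, group == "GroupB"] + 0.3
coords <- mds_coords(beta)
plot(coords, pch = 19, col = ifelse(group == "GroupA", "black", "red"),
     xlab = "Dimension 1", ylab = "Dimension 2")
text(coords, labels = rownames(coords), pos = 3)
```

A sample that sits far from the rest of its group in such a plot is a candidate outlier worth checking before differential analysis.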

Getting M values
## A small subset to speed up the demo:
mset <- MSet.norm[1:20000, ]
## Getting the M values:
M <- getM(mset, type = "beta", betaThreshold = 0.001)
- M values measure the level of methylation. They are logit-transformed beta values, M = log2(beta / (1 - beta)), and are more appropriate for DMP analysis (Du et al. 2010).
- Beta values below 0.001 or above 0.999 are truncated to these bounds to avoid numerical issues (M would be infinite at beta = 0 or 1).
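The beta-to-M conversion and the truncation can be written out directly. This base-R sketch assumes the same threshold of 0.001; the helper names beta_to_m and m_to_beta are hypothetical, and the inverse formula beta = 2^M / (2^M + 1) follows from the definition of M.

```r
# Beta -> M with truncation to [threshold, 1 - threshold]
beta_to_m <- function(beta, threshold = 0.001) {
  beta <- pmin(pmax(beta, threshold), 1 - threshold)
  log2(beta / (1 - beta))
}
# M -> beta (inverse of the logit transform above)
m_to_beta <- function(m) 2^m / (2^m + 1)

beta <- c(0, 0.2, 0.5, 0.8, 1)
m <- beta_to_m(beta)
print(round(m, 2))
print(m_to_beta(m))
```

Beta = 0.5 maps to M = 0, beta = 0.2 and 0.8 map to M = -2 and 2, and the truncated extremes give large but finite M values of opposite sign.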

Differentially methylated positions
dmp <- dmpFinder(M, pheno = pd$Sample_Group, type = "categorical")
head(dmp)
Rows are ordered by p-value.
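Conceptually, a DMP test for a categorical phenotype fits, at each position, a model of the M values against the group labels and tests for a group effect. The sketch below does this with a one-way ANOVA F-test per row in base R; it is not dmpFinder()'s code, and Benjamini-Hochberg adjusted p-values from p.adjust() stand in for whatever q-value procedure dmpFinder applies. The simulated matrix and the helper f_test_positions are assumptions.

```r
# Per-position one-way ANOVA F-test of M values against a categorical phenotype
f_test_positions <- function(M, pheno) {
  pheno <- factor(pheno)
  stats <- t(apply(M, 1, function(m) {
    tab <- anova(lm(m ~ pheno))
    c(f = tab[["F value"]][1], pval = tab[["Pr(>F)"]][1])
  }))
  res <- data.frame(stats, qval = p.adjust(stats[, "pval"], method = "BH"))
  res[order(res$pval), ]  # rows ordered by p-value
}

set.seed(5)
group <- rep(c("case", "control"), each = 6)
M <- matrix(rnorm(100 * 12, sd = 0.5), nrow = 100,
            dimnames = list(paste0("cg", 1:100), NULL))
# Hypothetical differentially methylated positions: cg1 and cg2
M[1:2, group == "case"] <- M[1:2, group == "case"] + 3
dmp <- f_test_positions(M, group)
head(dmp)
```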

Plotting methylation levels
cpgs <- rownames(dmp)[1:4]  # the four top-ranked positions
par(mfrow = c(2, 2))  # 2 x 2 grid of panels
plotCpg(mset, cpg = cpgs, pheno = pd$Sample_Group)

Acknowledgments (Oncinfo Lab members)
- Dr. Habil Zare, PhD: PI, computational biologist
- Dr. Amir Forpushani, PhD: postdoc, computational biologist
- Rupesh Agrihari: grad student, computer science
I would like to thank Amir for helping me prepare the pathway analysis slides, and Rupesh for assisting the audience during the workshop.

References:
- Du, Pan, Xiao Zhang, et al. "Comparison of Beta-value and M-value methods for quantifying methylation levels by microarray analysis." BMC Bioinformatics 11 (2010): 587.