CSCE555 Bioinformatics Lecture 16 Identifying Differentially Expressed Genes from microarray data Meeting: MW 4:00PM-5:15PM SWGN2A21 Instructor: Dr. Jianjun Hu

Presentation transcript:

CSCE555 Bioinformatics
Lecture 16: Identifying Differentially Expressed Genes from Microarray Data
Meeting: MW 4:00PM-5:15PM SWGN2A21
Instructor: Dr. Jianjun Hu
Course page:
University of South Carolina, Department of Computer Science and Engineering

Outline
The problem: identifying differentially expressed genes
Statistical methods: t-test
Non-parametric: rank product
Summary
10/9/2015

The Biological Problem: Identify Differentially Expressed Genes
[Figure: no treatment vs. treatment]
Which pathways will be affected? Which genes are involved?

Identify differentially expressed genes
One of the core goals of microarray data analysis is to identify which genes show good evidence of being differentially expressed (DE). This goal has two parts:
1. Select a statistic that ranks the genes in order of evidence for differential expression, from strongest to weakest.
2. Choose a critical value for the ranking statistic, above which any value is considered significant.

k-fold change
1. Differential expression is measured by the ratio of expression levels between two samples.
2. Genes with ratios above a fixed cut-off k, that is, those whose expression underwent a k-fold change, were said to be differentially expressed.
3. This is not a statistical test: there is no associated value that indicates the level of confidence in designating a gene as differentially expressed or not differentially expressed.

k-fold change
4. Replication is essential in experimental design because it allows an estimate of variability.
5. The ability to assess such variability allows identification of biologically reproducible changes in gene expression levels.
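The k-fold rule from the two slides above can be sketched in a few lines of Python. This is only an illustration: the function name `fold_change_calls`, the dictionary layout, and the toy expression values are assumptions, and treating a ratio at or below 1/k as a k-fold decrease is a common convention rather than something the slides state.

```python
from statistics import mean

def fold_change_calls(control, treated, k=2.0):
    """Return {gene: (ratio, is_called)} using a fixed k-fold cut-off.

    control, treated: dict mapping gene name -> list of positive
    expression values (replicates) on the raw, not log, scale.
    A gene is called if its mean ratio is >= k (up) or <= 1/k (down);
    the symmetric down-regulation rule is an assumption of this sketch.
    """
    calls = {}
    for gene in control:
        ratio = mean(treated[gene]) / mean(control[gene])
        calls[gene] = (ratio, ratio >= k or ratio <= 1.0 / k)
    return calls

# Toy data (hypothetical values, three replicates per condition).
control = {"geneA": [100.0, 110.0, 90.0], "geneB": [50.0, 55.0, 45.0]}
treated = {"geneA": [310.0, 290.0, 300.0], "geneB": [52.0, 60.0, 47.0]}

calls = fold_change_calls(control, treated, k=2.0)
for gene, (ratio, called) in calls.items():
    print(gene, round(ratio, 2), called)
```

Note that the rule only uses the replicate means. The spread of the replicates plays no role, which is exactly the weakness points 3 to 5 describe.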

Standard statistical tests
1. More typically, researchers now rely on variants of common statistical tests.
2. These generally involve two parts: calculating a test statistic and determining the significance of the observed statistic.
3. A standard statistical test for detecting significant change between repeated measurements of a variable in two groups is the t-test.
4. This can be generalized to multiple groups via the ANOVA F statistic.
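As a rough illustration of the first part (calculating a test statistic), the sketch below computes an equal-variance two-sample t statistic for each gene and ranks genes by its absolute value. The helper names `pooled_t` and `rank_by_t` and the toy data are assumptions made for this example; determining significance is deliberately left out.

```python
from math import sqrt
from statistics import mean, variance

def pooled_t(x, y):
    """Two-sample t statistic with pooled (equal) variance.

    x, y: lists of replicate measurements for one gene in two groups.
    Raises ZeroDivisionError if both groups have zero variance.
    """
    n1, n2 = len(x), len(y)
    sp2 = ((n1 - 1) * variance(x) + (n2 - 1) * variance(y)) / (n1 + n2 - 2)
    return (mean(y) - mean(x)) / sqrt(sp2 * (1.0 / n1 + 1.0 / n2))

def rank_by_t(group1, group2):
    """Return gene names sorted by |t|, strongest evidence first."""
    t = {g: pooled_t(group1[g], group2[g]) for g in group1}
    return sorted(t, key=lambda g: abs(t[g]), reverse=True), t

# Toy data (hypothetical log-expression values).
group1 = {"geneA": [1.0, 2.0, 3.0], "geneB": [5.0, 6.5, 4.0]}
group2 = {"geneA": [4.0, 5.0, 6.0], "geneB": [5.5, 4.5, 6.0]}

ranking, t_values = rank_by_t(group1, group2)
print(ranking, t_values)
```

Unlike the fold-change rule, the statistic divides the mean difference by an estimate of its variability, so a large ratio from noisy replicates no longer ranks automatically at the top.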

Standard statistical tests
1. For most practical cases, computing a standard t or F statistic is appropriate, although referring to the t or F distributions to determine significance is often not.
2. The main hazard in using such methods occurs when there are too few replicates to obtain an accurate estimate of experimental variances. In such cases, modeling methods that use pooled variance estimates may be helpful.

Standard statistical tests
1. Regardless of the test statistic used, one must determine its significance.
2. Standard interpretations of t-like tests assume that the data are sampled from normal populations with equal variances.
3. Expression data may fail to satisfy either or both of these constraints.

Standard statistical tests
The use of non-parametric rank-based statistics is also common, via both:
1. traditional statistical methods, and
2. ad hoc ones designed specifically for microarray data.

RankProd: a non-parametric method to detect differentially regulated genes in replicated experiments
What does it do? What is the method implemented in the package?
RankProd uses the so-called rank product non-parametric method (Breitling et al., 2004) to identify up-regulated or down-regulated genes under one condition against another condition. Rank Product is a non-parametric statistic that detects items that are consistently highly ranked in a number of lists, for example genes that are consistently found among the most strongly up-regulated genes in a number of replicate experiments.
How does it compare to other methods for similar purpose?
(1) It originates from an analysis of biological reasoning and is easy to understand.
(2) It is fast, simple and robust to outliers (suitable for noisy data).
(3) It provides statistical significance for each gene and allows for the control of the overall significance (e.g., false discovery rate).
(4) It provides a straightforward way to do cross-platform meta-analysis (it integrates data generated at different laboratories or under different environments into one study, and achieves increased power).

Rank Product
Calculate RP:
Calculate significance:
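The formulas on this slide did not survive in the transcript, so the following is only a sketch of one common formulation, not necessarily the one shown in the lecture. It assumes that in each of k replicates the genes are ranked by log fold change (rank 1 = most up-regulated), that RP is the geometric mean of a gene's k ranks, and that significance is estimated by shuffling values within each replicate. The function names and toy data are hypothetical.

```python
import random
from math import prod

def rank_products(log_ratios):
    """log_ratios: list of k replicates, each a list of per-gene log fold
    changes. Returns the geometric mean of each gene's ranks, where
    rank 1 is the most up-regulated gene in a replicate."""
    k = len(log_ratios)
    n = len(log_ratios[0])
    ranks = []
    for rep in log_ratios:
        order = sorted(range(n), key=lambda g: rep[g], reverse=True)
        r = [0] * n
        for position, g in enumerate(order, start=1):
            r[g] = position
        ranks.append(r)
    return [prod(ranks[i][g] for i in range(k)) ** (1.0 / k) for g in range(n)]

def rank_product_pvalues(log_ratios, n_perm=200, seed=0):
    """Estimate, per gene, how often a rank product this small or smaller
    arises when values are shuffled independently within each replicate."""
    rng = random.Random(seed)
    observed = rank_products(log_ratios)
    n = len(observed)
    counts = [0] * n
    for _ in range(n_perm):
        shuffled = [rng.sample(rep, len(rep)) for rep in log_ratios]
        null_rp = rank_products(shuffled)
        for g in range(n):
            counts[g] += sum(1 for v in null_rp if v <= observed[g])
    return [c / (n_perm * n) for c in counts]

# Toy data: 3 replicates x 5 genes (hypothetical log ratios).
reps = [
    [2.1, 0.1, -0.3, 0.2, -1.0],
    [1.8, -0.2, 0.4, 0.0, -0.9],
    [2.5, 0.3, 0.1, -0.1, -1.2],
]
rp = rank_products(reps)
pvals = rank_product_pvalues(reps)
print(rp, pvals)
```

A small RP means the gene sits near the top of every list. To look for down-regulated genes, the same sketch can be run on the negated log ratios.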

Permutation tests for calculating significance levels
Permutation tests, generally carried out by repeatedly scrambling the samples' class labels and computing t statistics for all genes in the scrambled data, best capture the unknown structure of the data.
Tusher, V.G., Tibshirani, R. & Chu, G. Significance analysis of microarrays applied to the ionizing radiation response. Proc. Natl Acad. Sci. USA 98, (2001).
Golub, T.R. et al. Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science 286, (1999).
Dudoit, S., Yang, Y.-H., Callow, M.J. & Speed, T.P. Statistical methods for identifying differentially expressed genes in replicated cDNA microarray experiments. Technical Report 578 (Department of Statistics, University of California at Berkeley, Berkeley, CA, 2000).
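The label-scrambling procedure described on this slide can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure of any of the cited papers: it uses an equal-variance t statistic, per-gene two-sided p-values, and a +1 correction in numerator and denominator. The names `t_stat` and `permutation_pvalues` and the toy data are hypothetical.

```python
import random
from math import sqrt
from statistics import mean, variance

def t_stat(x, y):
    """Equal-variance two-sample t statistic (assumes nonzero pooled variance)."""
    n1, n2 = len(x), len(y)
    sp2 = ((n1 - 1) * variance(x) + (n2 - 1) * variance(y)) / (n1 + n2 - 2)
    return (mean(y) - mean(x)) / sqrt(sp2 * (1.0 / n1 + 1.0 / n2))

def permutation_pvalues(data, labels, n_perm=500, seed=0):
    """data: gene -> list of values, one per sample.
    labels: list of 0/1 class labels, one per sample.
    Returns gene -> permutation p-value for |t|."""
    rng = random.Random(seed)

    def all_t(lab):
        # t statistic of every gene for one assignment of class labels
        out = {}
        for gene, values in data.items():
            a = [v for v, l in zip(values, lab) if l == 0]
            b = [v for v, l in zip(values, lab) if l == 1]
            out[gene] = t_stat(a, b)
        return out

    observed = all_t(labels)
    exceed = {g: 0 for g in data}
    for _ in range(n_perm):
        perm = labels[:]
        rng.shuffle(perm)          # scramble the class labels
        null = all_t(perm)         # recompute t for all genes
        for g in data:
            if abs(null[g]) >= abs(observed[g]):
                exceed[g] += 1
    return {g: (exceed[g] + 1) / (n_perm + 1) for g in data}

# Toy data: 4 control and 4 treated samples (hypothetical values).
data = {
    "geneA": [5.0, 5.2, 4.9, 5.1, 8.0, 8.3, 7.9, 8.1],
    "geneB": [5.0, 6.1, 4.2, 5.6, 5.3, 4.8, 6.0, 5.1],
}
labels = [0, 0, 0, 0, 1, 1, 1, 1]
pvals = permutation_pvalues(data, labels)
print(pvals)
```

Because the null distribution is built from the data itself, the p-values do not rely on the normality or equal-variance assumptions needed to read significance from a t table. With only a few samples per group, however, the number of distinct label arrangements limits how small a p-value can get.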

Summary
The problem: identify differentially expressed genes from microarray data
How to identify: t-test and rank product
How to evaluate the significance of identified genes