Statistical Analysis of Microarray Data

Slides:



Advertisements
Similar presentations
1 Statistics Achim Tresch UoC / MPIPZ Cologne treschgroup.de/OmicsModule1415.html
Advertisements

From the homework: Distribution of DNA fragments generated by Micrococcal nuclease digestion mean(nucs) = bp median(nucs) = 110 bp sd(nucs+ = 17.3.
INT 506/706: Total Quality Management Lec #9, Analysis Of Data.
Detecting Differentially Expressed Genes Pengyu Hong 09/13/2005.
Getting the numbers comparable
DNA Microarray Bioinformatics - #27611 Program Normalization exercise (from last week) Dimension reduction theory (PCA/Clustering) Dimension reduction.
Gene ontology & hypergeometric test Simon Rasmussen CBS - DTU.
DNA Microarray Bioinformatics - #27612 Normalization and Statistical Analysis.
Dimension reduction : PCA and Clustering Agnieszka S. Juncker Slides: Christopher Workman and Agnieszka S. Juncker Center for Biological Sequence Analysis.
Microarray Data Preprocessing and Clustering Analysis
Gene Expression Data Analyses (3)
Differentially expressed genes
CDNA Microarray Design and Pre-processing By H. Bjørn Nielsen.
Classification of Microarray Data. Sample Preparation Hybridization Array design Probe design Question Experimental Design Buy Chip/Array Statistical.
Dimension reduction : PCA and Clustering by Agnieszka S. Juncker
Dimension reduction : PCA and Clustering Slides by Agnieszka Juncker and Chris Workman.
Analysis of Differential Expression T-test ANOVA Non-parametric methods Correlation Regression.
Classification of Microarray Data. Sample Preparation Hybridization Array design Probe design Question Experimental Design Buy Chip/Array Statistical.
1 Test of significance for small samples Javier Cabrera.
Biol 500: basic statistics
Chi-Square and F Distributions Chapter 11 Understandable Statistics Ninth Edition By Brase and Brase Prepared by Yixun Shi Bloomsburg University of Pennsylvania.
Introduction to DNA microarrays DTU - January Hanne Jarmer.
Scanning and image analysis Scanning -Dyes -Confocal scanner -CCD scanner Image File Formats Image analysis -Locating the spots -Segmentation -Evaluating.
Analysis of GO annotation at cluster level by H. Bjørn Nielsen Slides from Agnieszka S. Juncker.
Marshall University School of Medicine Department of Biochemistry and Microbiology BMS 617 Lecture 5 – Testing for equivalence or non-inferiority. Power.
Different Expression Multiple Hypothesis Testing STAT115 Spring 2012.
Estimation and Hypothesis Testing Faculty of Information Technology King Mongkut’s University of Technology North Bangkok 1.
Inferential Statistics: SPSS
Multiple testing in high- throughput biology Petter Mostad.
Hypothesis testing – mean differences between populations
Using Bayesian Networks to Analyze Expression Data N. Friedman, M. Linial, I. Nachman, D. Hebrew University.
Differential Gene Expression Dennis Kostka, Christine Steinhoff Slides adapted from Rainer Spang.
Analysis of variance Petter Mostad Comparing more than two groups Up to now we have studied situations with –One observation per object One.
ANOVA (Analysis of Variance) by Aziza Munir
Section Copyright © 2014, 2012, 2010 Pearson Education, Inc. Lecture Slides Elementary Statistics Twelfth Edition and the Triola Statistics Series.
Chapter 10: Analyzing Experimental Data Inferential statistics are used to determine whether the independent variable had an effect on the dependent variance.
Bioinformatics Expression profiling and functional genomics Part II: Differential expression Ad 27/11/2006.
Statistical analysis of expression data: Normalization, differential expression and multiple testing Jelle Goeman.
Section Copyright © 2014, 2012, 2010 Pearson Education, Inc. Lecture Slides Elementary Statistics Twelfth Edition and the Triola Statistics Series.
Statistical Principles of Experimental Design Chris Holmes Thanks to Dov Stekel.
Dimension reduction : PCA and Clustering Slides by Agnieszka Juncker and Chris Workman modified by Hanne Jarmer.
The Statistical Analysis of Data. Outline I. Types of Data A. Qualitative B. Quantitative C. Independent vs Dependent variables II. Descriptive Statistics.
Introduction to Microarrays Dr. Özlem İLK & İbrahim ERKAN 2011, Ankara.
Learning Objectives Copyright © 2002 South-Western/Thomson Learning Statistical Testing of Differences CHAPTER fifteen.
Analysis of Variance (ANOVA) Brian Healy, PhD BIO203.
Introduction to Statistical Analysis of Gene Expression Data Feng Hong Beespace meeting April 20, 2005.
Statistical Methods for Identifying Differentially Expressed Genes in Replicated cDNA Microarray Experiments Presented by Nan Lin 13 October 2002.
Analysis of GO annotation at cluster level by Agnieszka S. Juncker.
Statistics for Differential Expression Naomi Altman Oct. 06.
Suppose we have T genes which we measured under two experimental conditions (Ctl and Nic) in n replicated experiments t i * and p i are the t-statistic.
Statistical Analysis of Microarray Data By H. Bjørn Nielsen.
Introduction to Microarrays. The Central Dogma.
ANOVA P OST ANOVA TEST 541 PHL By… Asma Al-Oneazi Supervised by… Dr. Amal Fatani King Saud University Pharmacy College Pharmacology Department.
For a specific gene x ij = i th measurement under condition j, i=1,…,6; j=1,2 Is a Specific Gene Differentially Expressed Differential expression.
Comp. Genomics Recitation 10 4/7/09 Differential expression detection.
Kin 304 Inferential Statistics Probability Level for Acceptance Type I and II Errors One and Two-Tailed tests Critical value of the test statistic “Statistics.
Marshall University School of Medicine Department of Biochemistry and Microbiology BMS 617 Lecture 6 –Multiple hypothesis testing Marshall University Genomics.
Distinguishing active from non active genes: Main principle: DNA hybridization -DNA hybridizes due to base pairing using H-bonds -A/T and C/G and A/U possible.
Statistical Analysis for Expression Experiments Heather Adams BeeSpace Doctoral Forum Thursday May 21, 2009.
Analysis of variance Tron Anders Moger
A Quantitative Overview to Gene Expression Profiling in Animal Genetics Armidale Animal Breeding Summer Course, UNE, Feb Analysis of (cDNA) Microarray.
Hypothesis Testing. Suppose we believe the average systolic blood pressure of healthy adults is normally distributed with mean μ = 120 and variance σ.
ANalysis Of VAriance (ANOVA) Used for continuous outcomes with a nominal exposure with three or more categories (groups) Result of test is F statistic.
Hypothesis Tests. An Hypothesis is a guess about a situation that can be tested, and the test outcome can be either true or false. –The Null Hypothesis.
Basic Statistic for Research Dr. Subash Gopinath School of Bioprocess Engineering, UniMAP.
Statistical Inference for the Mean Objectives: (Chapter 8&9, DeCoursey) -To understand the terms variance and standard error of a sample mean, Null Hypothesis,
Micro array Data Analysis. Differential Gene Expression Analysis The Experiment Micro-array experiment measures gene expression in Rats (>5000 genes).
Differential Gene Expression
Getting the numbers comparable
Statistical Analysis of Microarray Data
Presentation transcript:

Statistical Analysis of Microarray Data By H. Bjørn Nielsen & Hanne Jarmer

The DNA Array Analysis Pipeline Question Experimental Design Array design Probe design Sample Preparation Hybridization Buy Chip/Array Image analysis Normalization Expression Index Calculation Comparable Gene Expression Data Statistical Analysis Fit to Model (time series) Advanced Data Analysis Clustering PCA Classification Promoter Analysis Meta analysis Survival analysis Regulatory Network

What's the question? without alcohol with alcohol Typically we want to identify differentially expressed genes Example: alcohol dehydrogenase is expressed at a higher level when alcohol is added to the media alcohol dehydrogenase without alcohol with alcohol

Statistics However, the measurements contain stochastic noise There is no way around it He’s going to say it Statistics

You can choose to think of statistics as a black box Noisy measurements p-value statistics But, you still need to understand how to interpret the results

The output of the statistics P-value The chance of rejecting the null hypothesis by coincidence ---------------------------- For gene expression analysis we can say: the chance that a gene is categorized as differentially expressed by coincidence

The statistics gives us a p-value for each gene We can rank the genes according to the p-value But, we can’t really trust the p-value in a strict statistical way!

Why not! For two reasons: We are rarely fulfilling all the assumptions of the statistical test We have to take multi-testing into account

The t-test Assumptions 1. The observations in the two categories must be independent 2. The observations should be normally distributed 3. The sample size must be ‘large’ (>30 replicates)

Multi-testing? In a typical microarray analysis we test thousands of genes If we use a significance level of 0.05 and we test 1000 genes. We expect 50 genes to be significant by chance 1000 x 0.05 = 50

Correction for multiple testing Bonferroni: P ≤ 0.01 N Confidence level of 99% Benjamini-Hochberg: P ≤ i N 0.01 N = number of genes i = number of accepted genes

But really, those methods are too strict What we can trust is the ability of the statistical test to rank the genes according to their reliability The number of genes that are needed or can be handled in downstream processes can be used to set the cutoff If we permute the samples we can get an estimate of the False Discovery Rate (FDR) in this set

Volcano Plot log2 fold change (M) P-value

What's inside the black box ‘statistics’ t-test or ANOVA

The t-test Calculate T Lookup T in a table

The t-test II The t-test tests for difference in means () wt wt mut mutant Intensity of gene x Density

The t-test III The t statistic is based on the sample mean and variance t

ANOVA ANalysis Of Variance Very similar to the t-test, but can test multiple categories Ex: is gene x differentially expressed between wt, mutant 1 and mutant 2 Advantage: it has more ‘power’ than the t-test

ANOVA II Density Intensity Variance between groups Variance within groups Density Intensity

Blocks and paired tests Some undesired factors may influence the experiments, the effect of such can be greatly reduced if they are blocked out or if the experiment is paired. Some possible blocks: - Dye - Patient - Technician - Batch - Day of experiment - Array

Paired t-test Block 1 Block 2 Block 3 Hypothesis: There is no difference between the mean BLUE and RED Block 1 Block 2 Block 3

An example: (2-way ANOVA with blocks) Experimental Design: A187 CreA Glucose Ethanol Glucose Ethanol 3x Everything was done in batches to capture the systematic noise

Example: Batch to batch variation Within batch variation is lower than the between batch variation

Example: Data analysis - Blocking We can capture the batch variation by blocking Two-way ANOVA Glucose Ethanol A187 CreA Effect 1 Effect 2 Batch A B C

Ex: Result of the 2-way ANOVA (3 p-values) A genotype p-value A growth media p-value An interaction p-value Two-way ANOVA Ethanol Glucose A187 CreA Media effect Genotype effect Batch A B C

Conclusion Array data contains stochastic noise Therefore statistics is needed to conclude on differential expression We can’t really trust the p-value But the statistics can rank genes The capacity/needs of downstream processes can be used to set cutoff FDR can be estimated t-test is used for two category tests ANOVA is used for multiple categories