Significance analysis of microarrays (SAM)

SAM can be used to pick out significant genes based on differential expression between sets of samples. It is currently implemented for the following designs:
- two-class unpaired
- two-class paired
- multi-class
- censored survival
- one-class

SAM gives estimates of the False Discovery Rate (FDR), which is the proportion of the genes called significant that are likely to have been identified by chance alone. It is a highly interactive algorithm: after looking at the distribution of the test statistic, users can dynamically change the threshold for significance through the tuning parameter delta.

SAM designs
Two-class unpaired: picks out genes whose mean expression level is significantly different between two groups of samples (analogous to a between-subjects t-test).
Two-class paired: samples are split into two groups, and there is a 1-to-1 correspondence between a sample in group A and one in group B (analogous to a paired t-test).

Multi-class: picks up genes whose mean expression is different across more than two groups of samples (analogous to one-way ANOVA).
Censored survival: picks up genes whose expression levels are correlated with duration of survival.
One-class: picks up genes whose mean expression across experiments is different from a user-specified mean.

SAM Two-Class Unpaired
1. Assign experiments to two groups. For example, in an expression matrix with rows Gene 1 to Gene 6 and columns Exp 1 to Exp 6, assign Experiments 1, 2 and 5 to group A, and Experiments 3, 4 and 6 to group B.
2. Question: is the mean expression level of a gene in group A significantly different from its mean expression level in group B?
[Figure: expression matrix (Gene 1 to Gene 6 by Exp 1 to Exp 6), shown before and after the columns are labeled as Group A and Group B]

SAM Two-Class Unpaired
Permutation tests
i) For each gene, compute the d-value (analogous to the t-statistic). This is the observed d-value for that gene.
ii) Rank the genes in ascending order of their d-values.
iii) Randomly shuffle the values of the genes between groups A and B, such that the reshuffled groups A and B have the same number of elements as the original groups A and B, respectively. Compute the d-value for each randomized gene.
[Figure: Gene 1 under the original grouping (Exp 1, Exp 2, Exp 3, Exp 4, Exp 5, Exp 6) and under a randomized grouping (Exp 1, Exp 4, Exp 5, Exp 2, Exp 3, Exp 6), each split into Group A and Group B]
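Steps i) and iii) can be sketched in a few lines of Python. This is a minimal illustration, not the implementation behind these slides: the slides only say the d-value is "analogous to the t-statistic", so the pooled-variance form below and the small constant s0 added to the denominator (the form used in the Tusher et al. SAM paper) are assumptions, as are the function name and the example values.

```python
import math
import random

def d_value(values, in_group_b, s0=0.0):
    """d = (mean_B - mean_A) / (s + s0) for one gene (assumed form)."""
    a = [v for v, is_b in zip(values, in_group_b) if not is_b]
    b = [v for v, is_b in zip(values, in_group_b) if is_b]
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    # Pooled sum of squares, as in an equal-variance two-sample t-statistic.
    ss = sum((v - mean_a) ** 2 for v in a) + sum((v - mean_b) ** 2 for v in b)
    s = math.sqrt((1 / len(a) + 1 / len(b)) * ss / (len(a) + len(b) - 2))
    return (mean_b - mean_a) / (s + s0)

# Made-up values for one gene; Exp 1, 2 and 5 are in group A, Exp 3, 4 and 6 in group B.
gene = [1.0, 2.0, 5.0, 6.0, 3.0, 7.0]
labels = [False, False, True, True, False, True]  # True means group B

observed_d = d_value(gene, labels)

# Step iii): shuffle the group labels, which keeps both group sizes unchanged.
shuffled = labels[:]
random.Random(0).shuffle(shuffled)
permuted_d = d_value(gene, shuffled)
```

With this sign convention a positive d-value means the group B mean is larger. If a gene has zero variance within both groups and s0 is 0, the denominator is zero, which is one practical reason to use a positive s0.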

iv) Rank the permuted d-values of the genes in ascending order.
v) Repeat steps iii) and iv) many times, so that each rank has many randomized d-values. Take the average of the randomized d-values at each rank. This is the expected d-value of the gene whose observed (unpermuted) d-value has that rank.
vi) Plot the observed d-values vs. the expected d-values.
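The averaging in steps iv) and v) could look like the sketch below. It assumes a pooled-variance d-value with an optional constant s0 in the denominator, shuffles the group labels once per permutation and applies that shuffle to every gene, and uses a hypothetical function name (expected_d_values) and made-up data.

```python
import math
import random

def d_value(values, in_group_b, s0=0.0):
    """d = (mean_B - mean_A) / (s + s0) for one gene (assumed form)."""
    a = [v for v, is_b in zip(values, in_group_b) if not is_b]
    b = [v for v, is_b in zip(values, in_group_b) if is_b]
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    ss = sum((v - mean_a) ** 2 for v in a) + sum((v - mean_b) ** 2 for v in b)
    s = math.sqrt((1 / len(a) + 1 / len(b)) * ss / (len(a) + len(b) - 2))
    return (mean_b - mean_a) / (s + s0)

def expected_d_values(matrix, in_group_b, n_perm=200, seed=0, s0=0.0):
    """Average the rank-ordered permuted d-values over n_perm label shuffles."""
    rng = random.Random(seed)
    labels = list(in_group_b)
    sums = [0.0] * len(matrix)
    for _ in range(n_perm):
        rng.shuffle(labels)  # step iii): group sizes stay the same
        permuted = sorted(d_value(row, labels, s0) for row in matrix)  # step iv)
        for rank, d in enumerate(permuted):
            sums[rank] += d
    return [total / n_perm for total in sums]  # step v): one average per rank

# Made-up expression matrix: one row per gene, one column per experiment.
matrix = [
    [1.0, 2.0, 5.0, 6.0, 3.0, 7.0],
    [4.1, 3.9, 4.0, 4.3, 4.2, 3.8],
    [9.0, 8.5, 2.0, 2.5, 8.0, 3.0],
]
labels = [False, False, True, True, False, True]  # True means group B

observed = sorted(d_value(row, labels) for row in matrix)
expected = expected_d_values(matrix, labels)
```

Both lists are in ascending order, so observed[i] and expected[i] form the coordinates of the i-th point in the plot of step vi).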

SAM Two-Class Unpaired
[Figure: plot of observed vs. expected d-values with the "observed d = expected d" line. Labeled regions: significant positive genes (i.e., mean expression of group B > mean expression of group A) and significant negative genes (i.e., mean expression of group A > mean expression of group B)]
The more a gene deviates from the "observed = expected" line, the more likely it is to be significant. Moving along the x-axis in the positive or the negative direction, find the first gene whose observed d-value differs from its expected d-value by at least delta. That gene and every gene beyond it in the same direction are considered significant.
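One way to read the delta rule is sketched below. It assumes the observed and expected d-values are both sorted in ascending order and paired by rank, and that the walk in each direction starts at the rank where the expected d-value changes sign. The starting point, the function name and the numbers are assumptions made for illustration.

```python
def call_significant(observed, expected, delta):
    """Return (negative_ranks, positive_ranks) for rank-paired d-values."""
    n = len(observed)
    diffs = [o - e for o, e in zip(observed, expected)]
    # First rank whose expected d-value is not negative (assumed starting point).
    start = next((i for i, e in enumerate(expected) if e >= 0), n)
    # Positive side: first gene whose observed d exceeds expected d by at least delta.
    first_pos = next((i for i in range(start, n) if diffs[i] >= delta), n)
    # Negative side: first gene whose observed d is below expected d by at least delta.
    first_neg = next(
        (i for i in range(start - 1, -1, -1) if diffs[i] <= -delta), -1
    )
    # The first gene found on each side and all genes beyond it are called.
    return list(range(0, first_neg + 1)), list(range(first_pos, n))

# Made-up, rank-ordered values for seven genes.
observed = [-5.0, -1.2, -0.4, 0.1, 0.6, 1.5, 4.0]
expected = [-1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5]

negative, positive = call_significant(observed, expected, delta=1.0)
negative_loose, positive_loose = call_significant(observed, expected, delta=0.4)
```

Lowering delta can only add genes to either list, which is what makes the threshold convenient to tune interactively.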

SAM Two-Class Unpaired
For each permutation of the data, compute the number of positive and negative significant genes for a given delta, as explained in the previous slide. The median of these counts over all permutations is the median number of falsely significant genes, and dividing it by the number of genes called significant in the observed data gives the median False Discovery Rate. The rationale is that any genes designated as significant in the randomized data are being picked up purely by chance (i.e., "falsely" discovered). Therefore, the median number picked up over many randomizations is a good estimate of the number of false discoveries.
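The arithmetic can be sketched as follows. The sketch assumes the delta rule has already been turned into two fixed d-value cutoffs, cut_low and cut_up (hypothetical names for the d-values of the first significant negative and positive genes), and that a permuted gene counts as falsely significant when its d-value lies at or beyond either cutoff. All numbers are made up.

```python
from statistics import median

def median_fdr(permuted_d_sets, cut_low, cut_up, n_called):
    """Return (median false count, median FDR) for fixed d-value cutoffs."""
    false_counts = [
        sum(1 for d in d_values if d <= cut_low or d >= cut_up)
        for d_values in permuted_d_sets
    ]
    median_false = median(false_counts)
    # FDR is the expected number of false calls divided by the number of calls.
    fdr = median_false / n_called if n_called else 0.0
    return median_false, fdr

# Made-up d-values from three permutations of a three-gene data set.
permuted_d_sets = [
    [-3.0, 0.1, 2.5],
    [-0.5, 0.2, 0.9],
    [-1.0, 0.0, 3.1],
]

median_false, fdr = median_fdr(permuted_d_sets, cut_low=-2.0, cut_up=2.0, n_called=4)
```

Here the per-permutation counts are 2, 0 and 1, so the median false count is 1 and, with 4 genes called in the observed data, the estimated median FDR is 0.25.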