Significance analysis of microarrays (SAM)

Significance analysis of microarrays (SAM)
SAM can be used to pick out significant genes based on differential expression between sets of samples. It is currently implemented for the following designs:
- two-class unpaired
- two-class paired
- multi-class
- censored survival
- one-class

SAM
SAM estimates the False Discovery Rate (FDR), the proportion of the genes called significant that are likely to have been identified as significant purely by chance. It is a highly interactive method: users can change the significance threshold dynamically (through the tuning parameter delta) after looking at the distribution of the test statistic.

SAM designs
Two-class unpaired: picks out genes whose mean expression level differs significantly between two groups of samples (analogous to a between-subjects t-test).
Two-class paired: samples are split into two groups, and there is a one-to-one correspondence between each sample in group A and a sample in group B (analogous to a paired t-test).
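The paired design's analogy to the paired t-test can be sketched as a statistic computed on within-pair differences. This is a minimal sketch under an assumed SAM-style form: the mean difference (B minus A) divided by its standard error plus a small constant `s0`. The function name `paired_d_value` and the `s0` default are hypothetical, and SAM's own software may compute the paired score differently.

```python
import math
from statistics import mean, stdev


def paired_d_value(a, b, s0=0.0):
    """Hypothetical paired d-statistic for one gene.

    a[j] and b[j] are the expression values of the j-th matched pair
    (sample j in group A and its partner in group B).
    s0 is an assumed small non-negative constant; with s0 = 0 this is
    the ordinary paired t-statistic.
    """
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need at least two matched pairs")
    diffs = [bj - aj for aj, bj in zip(a, b)]  # within-pair differences
    se = stdev(diffs) / math.sqrt(len(diffs))  # standard error of the mean difference
    return mean(diffs) / (se + s0)


# Three matched pairs whose differences are 1, 2 and 2.
d = paired_d_value([1.0, 2.0, 3.0], [2.0, 4.0, 5.0])
```

Because each gene is compared within matched pairs, variation shared by the two members of a pair cancels in the difference, which is why the paired design is preferred when the samples really are matched.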

SAM designs
Multi-class: picks out genes whose mean expression differs across more than two groups of samples (analogous to one-way ANOVA).
Censored survival: picks out genes whose expression levels are correlated with survival time.
One-class: picks out genes whose mean expression across experiments differs from a user-specified mean.
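Since the multi-class design is described as analogous to one-way ANOVA, the sketch below computes the classic one-way ANOVA F-statistic for a single gene. It illustrates the analogy only: SAM's own multi-class score is not reproduced here, and the function name `anova_f` is a hypothetical illustration.

```python
from statistics import mean


def anova_f(groups):
    """One-way ANOVA F-statistic for one gene.

    groups: list of lists; groups[k] holds the gene's expression values
    in the samples of class k. Assumes at least two classes and a
    non-zero within-group spread.
    """
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand = mean(v for g in groups for v in g)
    # Between-group sum of squares: how far each class mean sits from the grand mean.
    ssb = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    # Within-group sum of squares: spread of samples around their own class mean.
    ssw = sum((v - mean(g)) ** 2 for g in groups for v in g)
    return (ssb / (k - 1)) / (ssw / (n - k))


f = anova_f([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
```

A large F means the class means differ by much more than the within-class scatter would suggest, which is the property the multi-class design looks for.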

SAM Two-Class Unpaired
1. Assign experiments to two groups. For example, in the expression matrix below, assign experiments 1, 2 and 5 to group A, and experiments 3, 4 and 6 to group B.
[Figure: expression matrix with rows Gene 1 to Gene 6 and columns Exp 1 to Exp 6, shown unlabeled and then with the columns labeled Group A and Group B]
2. Question: is the mean expression level of a gene in group A significantly different from its mean expression level in group B?

SAM Two-Class Unpaired
Permutation tests
i) For each gene, compute the d-value (analogous to a t-statistic). This is the observed d-value for that gene.
ii) Rank the genes in ascending order of their d-values.
iii) Randomly reassign the experiments between groups A and B, keeping the reshuffled groups A and B the same sizes as the original groups A and B. Compute the d-value of each gene under this randomized grouping.
[Figure: Gene 1 under the original grouping (columns Exp 1 to Exp 6 in order) and under a randomized grouping (columns reordered Exp 1, Exp 4, Exp 5, Exp 2, Exp 3, Exp 6), each split into Group A and Group B]
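Steps i) to iii) can be sketched in Python. This assumes the d-value form described by Tusher, Tibshirani and Chu (2001): the difference of group means divided by a pooled standard deviation plus a small positive constant s0 (the "fudge factor"). The sign here is B minus A, so positive d-values mean higher expression in group B. The names `d_value` and `random_grouping` and the example matrix are hypothetical, and choosing s0 (SAM picks it from the data) is left to the caller.

```python
import math
import random
from statistics import mean


def d_value(x, group_a, group_b, s0=0.0):
    """SAM-style d-statistic for one gene (assumed form; see lead-in).

    x: the gene's expression values, indexed by experiment (0-based).
    group_a, group_b: experiment indices in groups A and B.
    With s0 = 0 this equals the pooled-variance two-sample t-statistic.
    """
    a = [x[i] for i in group_a]
    b = [x[i] for i in group_b]
    n1, n2 = len(a), len(b)
    ma, mb = mean(a), mean(b)
    # Pooled sum of squared deviations around each group's own mean.
    ss = sum((v - ma) ** 2 for v in a) + sum((v - mb) ** 2 for v in b)
    s = math.sqrt((1 / n1 + 1 / n2) / (n1 + n2 - 2) * ss)
    return (mb - ma) / (s + s0)


def random_grouping(n_a, n_experiments, rng):
    """Step iii): reassign experiments at random, keeping group sizes fixed."""
    order = list(range(n_experiments))
    rng.shuffle(order)
    return sorted(order[:n_a]), sorted(order[n_a:])


# Exp 1, 2, 5 -> group A; Exp 3, 4, 6 -> group B (0-based indices below).
group_a, group_b = [0, 1, 4], [2, 3, 5]
matrix = [
    [1.0, 2.0, 5.0, 6.0, 3.0, 7.0],  # gene expressed higher in group B
    [4.0, 5.0, 4.5, 5.5, 6.0, 5.0],  # gene with equal group means
]
observed = sorted(d_value(row, group_a, group_b) for row in matrix)  # steps i) and ii)
perm_a, perm_b = random_grouping(len(group_a), 6, random.Random(0))
permuted = [d_value(row, perm_a, perm_b) for row in matrix]  # one randomization
```

The same random column assignment is applied to every gene in a given permutation, so correlations between genes are preserved. A positive s0 also keeps the denominator away from zero for genes whose expression barely varies.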

SAM Two-Class Unpaired
iv) Rank the permuted d-values of the genes in ascending order.
v) Repeat steps iii) and iv) many times, so that every rank position has many randomized d-values. For each rank, take the average of the randomized d-values at that rank; this is the expected d-value of the gene holding that rank in the observed (unpermuted) ranking.
vi) Plot the observed d-values against the expected d-values.
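Steps iv) to vi) can be sketched as follows. The input is assumed to be the per-permutation d-values produced by step iii); the function name `expected_d_values` and the numbers in the example are hypothetical. Putting the expected d-value on the x-axis matches the next slide's reference to the x-axis, but that orientation is an assumption.

```python
from statistics import mean


def expected_d_values(permuted_d_lists):
    """Steps iv) and v): average the rank-ordered permuted d-values.

    permuted_d_lists: one list per permutation, holding every gene's
    d-value computed under that permutation's randomized grouping.
    Returns a list whose k-th entry is the expected d-value for the gene
    with the k-th smallest observed d-value.
    """
    ranked = [sorted(ds) for ds in permuted_d_lists]  # step iv) for each permutation
    n_genes = len(ranked[0])
    return [mean(r[k] for r in ranked) for k in range(n_genes)]


# Two permutations of a three-gene data set (illustrative numbers only).
expected = expected_d_values([[0.5, -1.0, 2.0], [1.0, 0.0, -2.0]])
observed = sorted([-2.4, 0.3, 3.1])
points = list(zip(expected, observed))  # step vi): (x = expected, y = observed)
```

Averaging by rank rather than by gene is what makes the expected values comparable to the sorted observed values: both describe the k-th most extreme statistic.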

SAM Two-Class Unpaired
[Figure: observed vs. expected d-values with the "observed d = expected d" line; significant positive genes (mean expression of group B > mean expression of group A) and significant negative genes (mean expression of group A > mean expression of group B) are marked]
The further a gene lies from the "observed = expected" line, the more likely it is to be significant. Moving outward from zero along the x-axis in the positive direction, find the first gene whose observed d-value exceeds its expected d-value by at least delta; that gene and every gene beyond it are considered significant. The same rule is applied in the negative direction, where the observed d-value must fall below the expected d-value by at least delta.
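The delta rule can be sketched in Python, assuming the observed d-values are sorted ascending, the expected d-values are aligned with them by rank, and "moving outward from zero" means starting where the expected d-value changes sign. The function name `sam_significant` and the example numbers are hypothetical; the function also returns the d-value cutoffs that the chosen delta implies.

```python
import math


def sam_significant(observed, expected, delta):
    """Return (positive_ranks, negative_ranks, cut_low, cut_up) for one delta.

    observed: observed d-values sorted ascending.
    expected: expected d-values for the same ranks (also ascending).
    """
    n = len(observed)
    positive, negative = [], []
    # Positive side: walk right from the first rank with expected d >= 0.
    for k in range(n):
        if expected[k] >= 0 and observed[k] - expected[k] >= delta:
            positive = list(range(k, n))  # this gene and every gene beyond it
            break
    # Negative side: walk left from the last rank with expected d < 0.
    for k in range(n - 1, -1, -1):
        if expected[k] < 0 and expected[k] - observed[k] >= delta:
            negative = list(range(0, k + 1))  # this gene and every gene below it
            break
    cut_up = observed[positive[0]] if positive else math.inf
    cut_low = observed[negative[-1]] if negative else -math.inf
    return positive, negative, cut_low, cut_up


observed = [-3.0, -0.5, 0.1, 0.4, 2.5]
expected = [-1.5, -0.6, 0.0, 0.5, 1.2]
pos, neg, cut_low, cut_up = sam_significant(observed, expected, delta=1.0)
```

Raising delta shortens both lists, which is the threshold users adjust interactively.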

SAM Two-Class Unpaired
For each permutation of the data, count the positive and negative significant genes for a given delta, using the cutoffs that delta defines on the observed d-values as explained on the previous slide. The median of these counts over all permutations estimates the number of falsely significant genes; dividing it by the number of genes called significant in the observed data gives the median False Discovery Rate. The rationale is that any genes designated as significant in the randomized data are picked up purely by chance (i.e., "falsely" discovered), so the median number picked up over many randomizations is a good estimate of the number of false discoveries.
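The FDR estimate can be sketched as below, assuming the cutoffs come from applying delta to the observed data (for example, the smallest significant positive and the largest significant negative observed d-values). The function name `median_fdr` and the example numbers are hypothetical. Some SAM implementations also multiply this ratio by an estimate of the proportion of genes that are truly unchanged; that adjustment is omitted here.

```python
from statistics import median


def median_fdr(observed, permuted_d_lists, cut_low, cut_up):
    """Median FDR estimate for one delta.

    observed: observed d-values of all genes.
    permuted_d_lists: one list of d-values per permutation.
    cut_low, cut_up: cutoffs that delta defines on the observed data;
    a gene is called significant if d <= cut_low or d >= cut_up.
    """

    def n_called(ds):
        return sum(1 for d in ds if d <= cut_low or d >= cut_up)

    called = n_called(observed)
    if called == 0:
        return 0.0  # convention: nothing called, so nothing falsely called
    false_counts = [n_called(ds) for ds in permuted_d_lists]  # chance discoveries
    return median(false_counts) / called


observed = [-3.0, -0.5, 0.1, 0.4, 2.5]
perms = [
    [-3.5, 0.0, 0.2, 0.3, 1.0],
    [0.0, 0.1, -0.1, 0.2, 0.3],
    [-4.0, 0.0, 0.1, 0.2, 3.0],
]
fdr = median_fdr(observed, perms, cut_low=-3.0, cut_up=2.5)
```

Here two genes are called in the observed data and the permutations call 1, 0 and 2 genes, so the median false count is 1 and the estimated FDR is 1/2.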