Multiple Comparisons with Gene Expression Arrays Using a Data Driven Ordering of Hypotheses Siegfried Kropf, Jürgen Läuter, Magdeburg, Germany Peter H.

Presentation transcript:

Multiple Comparisons with Gene Expression Arrays Using a Data Driven Ordering of Hypotheses. Siegfried Kropf, Jürgen Läuter (Magdeburg, Germany); Peter H. Westfall (Lubbock, Texas, USA); Markus Eszlinger (Leipzig, Germany). MCP 2002, Bethesda, Maryland, USA, August 5-7, 2002.

Introduction
Two well-known multiple comparison procedures (MCPs) control the familywise error rate (FWE):
- Testing with a-priori ordered hypotheses (without $\alpha$-adjustment)
- Bonferroni-Holm (data-dependent order, with $\alpha$-adjustment)
In the analysis of high-dimensional gene expression arrays, these are not applicable or not optimal.
We are looking for a method with data-dependent ordering of hypotheses but without $\alpha$-adjustment.
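The two classical procedures named in the introduction can be sketched as follows. This is a minimal illustration in plain Python that assumes the P-values are already computed; the function names and the example P-values are hypothetical and not taken from the talk.

```python
def fixed_sequence(p_values, alpha=0.05):
    """A-priori ordered testing: test in the given order at full level alpha
    and stop at the first non-significant hypothesis."""
    rejected = []
    for i, p in enumerate(p_values):
        if p > alpha:
            break
        rejected.append(i)
    return rejected


def holm(p_values, alpha=0.05):
    """Bonferroni-Holm: order by increasing P-value and compare the
    k-th smallest (k = 0, 1, ...) with alpha / (m - k)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    rejected = []
    for step, i in enumerate(order):
        if p_values[i] > alpha / (m - step):
            break
        rejected.append(i)
    return rejected


p = [0.012, 0.030, 0.200, 0.001]   # made-up P-values for illustration
print(fixed_sequence(p))           # [0, 1]
print(holm(p))                     # [3, 0]
```

The fixed sequence needs no adjustment but depends entirely on the quality of the pre-specified order; Holm chooses the order from the data but pays for it with the adjusted thresholds.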

Basic method
Consider the one-sample situation first: an $n \times p$ data matrix $(x_{ki})$ from $n$ iid $p$-dimensional normal data vectors.
Aim: test the local hypotheses $H_i: \mu_i = 0$ ($i = 1, \dots, p$) with strong control of the FWE at level $\alpha$.
Procedure I: sort the variables by decreasing values of the sums of squares $w_{ii} = \sum_{k=1}^{n} x_{ki}^2$; in that order, carry out the unadjusted one-sample t tests for the variables as long as significance is attained.
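A minimal sketch of Procedure I in plain Python, assuming the ordering criterion is the sum of squares about zero of each variable. The helper `t_two_sided_p` is a hypothetical stand-in for a library t-test (it integrates the t density numerically), and the data are invented for illustration.

```python
import math


def t_two_sided_p(t, df, steps=2000):
    """Two-sided P-value of Student's t, via Simpson integration of the density."""
    a = abs(t)
    if a == 0:
        return 1.0
    c = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)
    f = lambda x: c * (1 + x * x / df) ** (-(df + 1) / 2)
    h = a / steps                      # steps must be even for Simpson's rule
    s = f(0.0) + f(a)
    for k in range(1, steps):
        s += (4 if k % 2 else 2) * f(k * h)
    central = s * h / 3                # P(0 < T < |t|)
    return max(0.0, 1.0 - 2.0 * central)


def procedure_one(data, alpha=0.05):
    """data: list of n rows, each with p values. Returns rejected column indices."""
    n, p = len(data), len(data[0])
    cols = [[row[i] for row in data] for i in range(p)]
    w = [sum(v * v for v in col) for col in cols]        # sums of squares about 0
    order = sorted(range(p), key=lambda i: -w[i])        # decreasing w
    rejected = []
    for i in order:
        mean = sum(cols[i]) / n
        var = sum((v - mean) ** 2 for v in cols[i]) / (n - 1)
        t = mean / math.sqrt(var / n)                    # assumes var > 0
        if t_two_sided_p(t, n - 1) > alpha:              # unadjusted test at level alpha
            break                                        # stop at first non-significance
        rejected.append(i)
    return rejected


data = [  # 6 cases, 3 variables (invented numbers)
    [2.1, 0.3, 1.0], [1.9, -0.2, 1.2], [2.3, 0.1, 0.8],
    [2.0, -0.4, 1.1], [1.8, 0.2, 0.9], [2.2, 0.0, 1.0],
]
print(procedure_one(data))  # [0, 2]
```

Note that every test in the sequence uses the full level alpha; the price is that the sequence ends at the first non-significant variable, even if later variables would be significant.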

Remarks:
- This procedure maintains the FWE for normally distributed sample vectors with arbitrary covariance structure.
- Proof in Kropf (2000) and Kropf and Läuter (2002), based on multivariate theorems for spherically distributed observation vectors (Läuter, Glimm and Kropf, 1996, 1998).
- To yield an efficient order of the variables, the variances of the variables should be approximately equal, because with $w_{ii} = \sum_{k=1}^{n} x_{ki}^2$ we have $E(w_{ii}) = n(\mu_i^2 + \sigma_i^2)$.
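The role of the variances in the ordering can be made explicit with a short derivation. The notation $\bar{x}_i$, $s_i^2$ (sample mean and variance of variable $i$) and $\sigma_i^2$ is assumed here for illustration; the slide's own formulas did not survive in the transcript.

```latex
\begin{align*}
w_{ii} &= \sum_{k=1}^{n} x_{ki}^2 = (n-1)\,s_i^2 + n\,\bar{x}_i^2, \\
E(w_{ii}) &= (n-1)\,\sigma_i^2 + n\left(\frac{\sigma_i^2}{n} + \mu_i^2\right)
           = n\left(\sigma_i^2 + \mu_i^2\right).
\end{align*}
```

If all $\sigma_i^2$ are roughly equal, a large sum of squares mainly reflects a large $|\mu_i|$, so the variables with true effects tend to come first. A variable with large variance but zero mean can otherwise move to the front and stop the sequence early.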

Example 1
- 6 patients with nodules in the thyroid gland (3 hot, 3 cold)
- Atlas Human Cancer 1.2 Array: 6 blocks of 98 genes each (double spotted), i.e. 588 genes plus housekeeping genes

Comparison of nodules vs. surrounding (hot and cold nodules together, i.e. one-sample test vs. 0)
1. Block A only (98 genes, 2 spots averaged, corrected with housekeeping genes)
[Table: gene no., sum of squares, unadjusted P-value]
Number of significant genes: locally (unadjusted) 33; Westfall-Young 0; Holm's procedure 0; Procedure I 10

2. Blocks A-F (588 genes), very similar results: unadjusted 131; Holm 0; Procedure I 9; Westfall-Young 1
Simulation experiments guided by the example with one block: n = 6, ..., 33 cases; p = 98 normally distributed variables with variance 1 and pairwise correlation 0.5; expectation 0 for 88 variables, a different expectation for the other 10 variables.
[Figure: average number of significant genes in Monte Carlo replications vs. sample size n]
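A simulation design of this kind can be generated with a shared-factor construction. The sketch below is plain Python and only produces the data; the effect size `shift` of the 10 non-null variables is a hypothetical parameter, because the value used in the talk is not preserved in the transcript.

```python
import math
import random


def simulate(n, p=98, rho=0.5, n_alt=10, shift=1.0, rng=None):
    """Draw n rows of p normal variables with variance 1 and pairwise
    correlation rho; the last n_alt variables get expectation `shift`
    (an assumed value), all others expectation 0."""
    rng = rng or random.Random()
    a, b = math.sqrt(rho), math.sqrt(1.0 - rho)
    data = []
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)        # shared factor induces the common correlation
        row = [a * z + b * rng.gauss(0.0, 1.0) for _ in range(p)]
        for i in range(p - n_alt, p):
            row[i] += shift
        data.append(row)
    return data


data = simulate(n=6, rng=random.Random(0))
print(len(data), len(data[0]))  # 6 98
```

Each variable is $\sqrt{\rho}\,z + \sqrt{1-\rho}\,e_i$ with independent standard normal $z$ and $e_i$, which gives variance 1 and correlation $\rho$ between any two variables. Repeating the draw many times and counting rejections per procedure would give the averages shown in the figure.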

Extensions
- Other testing problems, particularly the comparison of two or more independent samples: order by sums of squares related to the variable-wise total mean of all samples, then apply two-sample t tests or one-way ANOVA.
- Other subsets of variables (e.g., pairs of variables): see Kropf and Läuter (Biometrical Journal, end of 2002).
- A "distribution-free" version is possible.
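For the multi-sample extension, the ordering step might look as follows. This is a sketch under the assumption that the criterion is the sum of squared deviations from the pooled (total) mean of each variable; the function name and the toy data are hypothetical.

```python
def order_by_total_ss(groups):
    """groups: list of samples, each a list of rows (one row per case).
    Returns (order, w): variables sorted by decreasing sum of squares
    about the total mean over all samples, and the sums of squares."""
    rows = [row for g in groups for row in g]   # pool all samples
    n, p = len(rows), len(rows[0])
    w = []
    for i in range(p):
        col = [row[i] for row in rows]
        m = sum(col) / n                        # total mean, ignores group labels
        w.append(sum((v - m) ** 2 for v in col))
    return sorted(range(p), key=lambda i: -w[i]), w


group_a = [[0.0, 5.0], [0.2, 5.1]]
group_b = [[2.0, 5.0], [2.2, 4.9]]
order, w = order_by_total_ss([group_a, group_b])
print(order)  # [0, 1]
```

The ordering uses no group labels, only the pooled data. Variable 0 comes first because the group difference inflates its total sum of squares, while the large common level of variable 1 is removed by centering at the total mean. The two-sample t tests or ANOVA F tests would then be carried out in this order.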

Example 2
- 30 patients with nodules in the thyroid: 15 hot nodules, 15 cold nodules
- Tissue samples of nodules and surrounding analyzed with Affymetrix® Gene Chips
- Signal log ratio of nodule vs. surrounding from each patient for each of the genes
- Approximately multivariate normal distribution, "similar" variances for all genes, expectation 0 if unaffected

Cold nodules vs. surrounding (one-sample problem)
[Table: gene no., P value]
The present procedure stops already after the 2nd gene.
For comparison, number of significant genes: without any adjustment 1064; Bonferroni/Holm 1 (gene 8104); Westfall/Young 0.
The procedure is sensitive to disturbances. It should be smoothed (see below, hybridisation with Bonferroni/Holm).

A weighted procedure (Westfall, Kropf, Finos, 2002), in the notation of the one-sample problem:
1. Calculate the P-values $p_i$ ($i = 1, \dots, p$) of the usual unadjusted one-sample t test for each of the p variables.
2. For each variable, determine the sum of squares $w_{ii}$ and the weight $g_i = w_{ii}^{\eta}$ for fixed $\eta \ge 0$.
3. Calculate the weighted P-values $q_i = p_i / g_i$ and order the variables by increasing values of $q_i$.
4. Let $S_j$ denote the set of indices of all variables following the jth ordered variable in that order (including that variable itself). Then the hypothesis $H_{(j)}$ for the jth ordered variable is rejected iff $q_{(k)} \le \alpha / \sum_{h \in S_k} g_h$ for all $k = 1, \dots, j$.
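The weighted procedure can be sketched in plain Python as a weighted Holm step-down. Two points are assumptions, since the formulas are not legible in the transcript: the weights are taken as the sums of squares raised to a fixed power `eta`, and the rejection rule compares each weighted P-value with alpha divided by the sum of the weights of the variables not yet rejected. The input numbers are invented.

```python
def weighted_procedure(p_values, sums_of_squares, eta, alpha=0.05):
    """Weighted step-down test (assumed form: weights g_i = w_ii ** eta).
    Returns indices of rejected hypotheses in order of rejection."""
    g = [w ** eta for w in sums_of_squares]             # weights, fixed given the w_ii
    q = [p / gi for p, gi in zip(p_values, g)]          # weighted P-values
    order = sorted(range(len(q)), key=lambda i: q[i])   # increasing q
    remaining = sum(g)                                  # weight sum over the set S_j
    rejected = []
    for i in order:
        if q[i] > alpha / remaining:
            break
        rejected.append(i)
        remaining -= g[i]                               # drop rejected variable from S_j
    return rejected


p = [0.012, 0.030, 0.200, 0.001]    # unadjusted P-values (made up)
w = [25.0, 6.0, 0.5, 1.0]           # sums of squares (made up)
print(weighted_procedure(p, w, eta=0.0))  # [3, 0]  (same as Holm)
print(weighted_procedure(p, w, eta=1.0))  # [0, 3, 1]
```

In this toy example the weighting shifts part of the alpha budget to variables with large sums of squares, so a third hypothesis is rejected that plain Holm misses.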

Basic idea of proof:
- We restrict the consideration to the submatrix consisting of those variables with true null hypothesis (expectation zero). This matrix is left-spherically distributed.
- For fixed sums of squares $w_{ii}$ ($i = 1, \dots, p$) and cross products, its conditional distribution is also left-spherical.
- As the weights are then fixed, the standard theory (Fang, Zhang, 1990; Läuter, Glimm and Kropf, 1998) can be applied and ensures that the FWE is maintained for each condition and hence unconditionally, too.
Special cases:
- $\eta = 0$: the procedure is identical to Bonferroni/Holm.
- $\eta \to \infty$: according to Westfall and Krishen (2001), the critical function converges to that of the fixed order used in Procedure I.
In an application, $\eta$ has to be fixed in advance!
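The two special cases can be checked directly, assuming the rule has the weighted Holm form with weights $g_i = w_{ii}^{\eta}$ (an assumption, as the slide formula is not preserved) and that the sums of squares have no ties:

```latex
% Assumed rule: reject H_{(j)} while
%   q_{(j)} = p_{(j)} / g_{(j)} \le \alpha / \sum_{h \in S_j} g_h , \qquad g_i = w_{ii}^{\eta}.
\begin{align*}
\eta = 0:&\quad g_i = 1,\; q_i = p_i,\; \sum_{h \in S_j} g_h = p - j + 1
  \;\Rightarrow\; p_{(j)} \le \frac{\alpha}{p - j + 1} \quad \text{(Holm)}, \\
\eta \to \infty:&\quad \frac{g_{(j)}}{\sum_{h \in S_j} g_h}
  = \left[\, \sum_{h \in S_j} \left( \frac{w_{hh}}{w_{(j)(j)}} \right)^{\eta} \right]^{-1} \to 1
  \;\Rightarrow\; p_{(j)} \le \alpha \quad \text{(fixed sequence)}.
\end{align*}
```

For large $\eta$ the ordering by $q_i = p_i / w_{ii}^{\eta}$ is dominated by the sums of squares, so the jth ordered variable has the largest $w_{hh}$ within $S_j$ and every other ratio in the bracket tends to zero. Intermediate values of $\eta$ interpolate between the two procedures.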

Example 2 again
Cold nodules vs. surrounding: unadjusted 1064, Westfall/Young 0.
[Figure: results of the weighted procedure between the limiting cases B/H and Procedure I]
Is the choice of genes stable?

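Example 2, cont.
Hot nodules vs. surrounding: unadjusted 2597, Westfall/Young 93.
Hot vs. cold nodules: unadjusted 1290, Westfall/Young [value missing].
[Figures: results of the weighted procedure between the limiting cases B/H and Procedure I, one panel per comparison]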

Summary
- A new technique for multiple testing with data-dependent ordering of hypotheses is proposed.
- It keeps the FWE in the strong sense for arbitrary multivariate normal data.
- To provide high power, the variables should have approximately equal variances.
- The proposal is advantageous in very small samples of high-dimensional data.
- The method is sensitive to disturbances.
- Westfall's proposal of the weighted procedure establishes a link between the above procedure and the Bonferroni-Holm method and smooths out these disturbances.
- The weighted procedure is a real alternative to existing analysis techniques for microarray data.

References
Fang, K.-T. and Zhang, Y.-T., 1990: General Multivariate Analysis. Science Press, Beijing, and Springer-Verlag, Berlin Heidelberg.
Kropf, S., 2000: Hochdimensionale multivariate Verfahren in der medizinischen Statistik. Shaker Verlag, Aachen.
Kropf, S. and Läuter, J., 2002: Multiple Tests for Different Sets of Variables Using a Data-Driven Ordering of Hypotheses, with an Application to Gene Expression Data. Biometrical Journal, in print.
Läuter, J., Glimm, E., and Kropf, S., 1996: New Multivariate Tests for Data with an Inherent Structure. Biometrical Journal 38. Erratum: Biometrical Journal 40.
Läuter, J., Glimm, E., and Kropf, S., 1998: Multivariate Tests Based on Left-Spherically Distributed Linear Scores. Annals of Statistics 26. Erratum: Annals of Statistics 27.
Westfall, P.H., Kropf, S., and Finos, L., 2002: Weighted FWE-controlling methods in high-dimensional situations. Submitted for IMS Philadelphia companion volume.
Westfall, P.H. and Krishen, A., 2001: Optimally weighted, fixed sequence, and gatekeeping multiple testing procedures. Journal of Statistical Planning and Inference 99.
Westfall, P.H. and Young, S.S., 1993: Resampling-Based Multiple Testing. John Wiley & Sons, New York.