Published by Abigayle Chapman. Modified over 9 years ago.
1
Adapted from: Wulff HR, Andersen B, Brandenhoff P, Guttler F (1987): What do doctors know about statistics? Statistics in Medicine 6:3-10

Suppose we conduct a t-test of the difference between two means and obtain a p-value < .05. Does this mean:
a) There is less than a 5% chance that the results are due to chance.
b) If there really is no difference between the population means, there is less than a 5% chance of obtaining a difference this large or larger.
c) There is a 95% chance that if the study is repeated, the result will be replicated.
d) There is a 95% chance that there is a real difference between the two population means.
2
What is a p-value? The probability of obtaining a test statistic (data) that departs as much as or more than the observed test statistic (data) if the null hypothesis were true.
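The definition above can be made concrete with a small sketch. It is only an illustration under stated assumptions: an exact permutation test is swapped in for the t-test, the null hypothesis is that group labels are exchangeable, the test statistic is the absolute difference in means, and the function name `exact_permutation_p_value` is hypothetical.

```python
from itertools import combinations
from statistics import mean


def exact_permutation_p_value(group_a, group_b):
    """Two-sided exact permutation p-value for a difference in means.

    Assumed null: the group labels are exchangeable, so every way of
    splitting the pooled data into groups of these sizes is equally likely.
    """
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    observed = abs(mean(group_a) - mean(group_b))

    as_extreme = 0
    total = 0
    # Enumerate every relabeling of the pooled data.
    for idx in combinations(range(len(pooled)), n_a):
        chosen = set(idx)
        a = [pooled[i] for i in chosen]
        b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        # Count relabelings whose statistic departs as much as or more
        # than the observed one (small tolerance for float rounding).
        if abs(mean(a) - mean(b)) >= observed - 1e-12:
            as_extreme += 1
        total += 1
    return as_extreme / total


p = exact_permutation_p_value([1, 2, 3], [4, 5, 6])
print(p)
```

Full enumeration is only practical for small samples; the point is that the p-value is a probability computed under the null model, not the probability that the null is true.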
3
Which Null Hypotheses are Meaningful and Testable? Those that precisely specify a probability model for the data.
4
A Perspective
We study: Samples (Data)
We wish to obtain knowledge about: Populations (Nature)
5
Gene Family-Based Hypothesis Testing
Sketch of Typical (outmoded and inappropriate) Approach:
1. For Genes 1 to K, define a vector, R, of length K that contains the values of a categorical variable denoting group membership.
2. For Genes 1 to K, define a vector, C, of length K that contains the values of a binary variable denoting whether or not the gene was ‘significant’ or ‘interesting’ by some standard.
3. Conduct some frequentist significance test for an association between R and C.
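The three steps above might look like the following sketch, shown only to make the typical approach concrete. The choice of a one-sided Fisher exact test for step 3 is an assumption (the slide says only "some frequentist significance test"), and the function name `family_enrichment_p_value` and the toy data are hypothetical. The test treats genes as independent, which is one reason the slide calls the approach outmoded and inappropriate.

```python
from math import comb


def family_enrichment_p_value(R, C, family):
    """One-sided Fisher exact p-value for enrichment of 'significant'
    genes in one family, given membership vector R and binary vector C.

    Assumes the K genes are independent draws (hypergeometric model).
    """
    K = len(R)
    n_family = sum(1 for r in R if r == family)   # genes in the family
    n_sig = sum(C)                                 # significant genes overall
    x = sum(1 for r, c in zip(R, C) if r == family and c == 1)

    # P(X >= x) when n_family genes are drawn without replacement
    # from K genes, of which n_sig are significant.
    denom = comb(K, n_family)
    upper = min(n_sig, n_family)
    tail = sum(
        comb(n_sig, i) * comb(K - n_sig, n_family - i)
        for i in range(x, upper + 1)
    )
    return tail / denom


# Step 1: group membership for K = 6 genes (toy data).
R = ["a", "a", "a", "b", "b", "b"]
# Step 2: 1 if the gene was flagged 'significant', else 0 (toy data).
C = [1, 1, 1, 0, 0, 0]
# Step 3: test for association between R and C.
print(family_enrichment_p_value(R, C, "a"))
```

Note that reducing each gene to a 0/1 flag in step 2 discards the continuous evidence for each gene.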
7
The Independence Issue: A Real Example
8
Gene Family-Based Hypothesis Testing
Which Null Hypothesis is Being Tested?
1. None of the genes in family c are differentially expressed (associated, methylated, etc.).
2. The proportion of genes in family c that are differentially expressed is equal to the proportion of genes in the remainder of the genome that are differentially expressed (beware of ‘anti-Bayesian’ element).
3. The proportion of genes in family c that are differentially expressed to an extent greater than is equal to the proportion of genes in the remainder of the genome that are differentially expressed.
Note: These can all be subsumed under the general: H0:
9
Union-Intersection vs Intersection-Union Tests

Union-Intersection
- The compound hypothesis is rejected if any one of the individual hypotheses is rejected.
- A multiplicity adjustment procedure is required to control the type I error rate.
- The rejection region for this test is the union of the rejection regions corresponding to the individual tests.

Intersection-Union
- The compound hypothesis is rejected only if all of the individual hypotheses are rejected.
- An overall type I error rate of α is maintained without multiplicity adjustment.
- The rejection region for this test is the intersection of the rejection regions corresponding to the individual tests.

Methods not yet well established. Bayesian methods involving posterior probabilities in place of p-values may be especially useful. When P << N, methods are well established (e.g., multiple regression). When P >> N, optimal methods are not yet clear.
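The two decision rules can be contrasted in a short sketch. It assumes each individual test has already produced a p-value, and it uses Bonferroni as the multiplicity adjustment for the union-intersection rule; that choice is an assumption, since the slide does not name a procedure. Both function names are hypothetical.

```python
def union_intersection_reject(p_values, alpha=0.05):
    """Reject the compound null if ANY individual test rejects.

    A multiplicity adjustment is needed; Bonferroni (alpha / m) is
    used here as one simple, assumed choice.
    """
    m = len(p_values)
    return any(p <= alpha / m for p in p_values)


def intersection_union_reject(p_values, alpha=0.05):
    """Reject the compound null only if ALL individual tests reject.

    Each test is run at level alpha with no adjustment, which is
    equivalent to comparing the largest p-value with alpha.
    """
    return max(p_values) <= alpha


p_values = [0.001, 0.04, 0.2]
print(union_intersection_reject(p_values))   # one very small p-value suffices
print(intersection_union_reject(p_values))   # the largest p-value decides
```

The union-intersection rule is driven by the smallest p-value and the intersection-union rule by the largest, so the same set of p-values can lead to opposite conclusions depending on which compound hypothesis is being tested.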
10
What assumptions are being made?
Normality? Exchangeability? Independence? Other?
Non-Parametric: Non-Panacea (Cohen, J.)
Asymptotic
Exact
11
Major Issues to Ask About in Selecting a Method for Gene Family or Pathway Testing
► What is the null?
► Does the method assume that all components (e.g., SNPs or gene expression levels) are independent?
► Is the method ‘anti-Bayesian’?
► Does the method use the continuity of information (not simply significant or not)?