1 Statistics Achim Tresch UoC / MPIPZ Cologne treschgroup.de/OmicsModule1415.html


1 Statistics. Achim Tresch, UoC / MPIPZ Cologne, treschgroup.de/OmicsModule1415.html, tresch@mpipz.mpg.de

2 II. Testing: Induction from the sample to the population.
Estimation, regression: Measure in the sample → measure in the population? Variance? Confidence intervals?
Significance testing: Difference in the sample → difference in the population? Probability of a false call?

3 What allows us to conclude from the sample to the population?
The sample has to be representative (figures about drug abuse among students cannot be generalized to the whole population of Germany).
How is representativeness achieved?
Large sample numbers.
Random recruitment of samples from the population, e.g. dial a random phone number, or choose a random name from the register of births (advantages/disadvantages?).
Randomization: random allocation of the samples to the different experimental groups.

4 A non-sheep detector. Training: Measure the length of all sheep that cross your way.

5 A non-sheep detector. Training: Measure the length of all sheep that cross your way. Determine the distribution of the quantity of interest.

6 A non-sheep detector. Testing: For any unknown animal, test the hypothesis that it is a sheep. Measure its length and compare it to the learned length distribution of the sheep. If its length is "out of bounds", the animal is called a non-sheep (rejection of the hypothesis). Otherwise, we cannot say much (non-rejection).

7 A non-sheep detector. Advantage of the method: One does not need to know much about sheep. Disadvantage: It produces errors. [Figure: a decision boundary separates negative calls from positive calls, giving true negatives, false negatives, false positives and true positives.]

8 Statistical Hypothesis Testing. State a null hypothesis H0 ("nothing happens, there is no difference..."). Choose an appropriate test statistic (the data-derived quantity that finally leads to the decision). This implicitly determines the null distribution (the distribution of the test statistic under the null hypothesis).

9 Statistical Hypothesis Testing. State an alternative hypothesis (e.g. "the test statistic is higher than expected under the null hypothesis"). Determine a decision boundary d. This is equivalent to the choice of a significance level α, i.e. the probability of a false positive call under the null hypothesis that you are willing to accept. [Figure: null distribution with decision boundary d separating the acceptance region from the rejection region; the tail area in the rejection region is α.]

10 Statistical Hypothesis Testing. Calculate the actual value of the test statistic in the sample, and make your decision according to the pre-specified(!) decision boundary d. Value in the acceptance region: keep H0 (no rejection). Value in the rejection region: reject H0 (assume the alternative hypothesis).

11 Good test statistics, bad test statistics: a good statistic. [Figure: distribution of the test statistic under the null hypothesis and under the alternative hypothesis, with decision boundary d.]
Null hypothesis is true: accepting the null hypothesis is the right decision; rejecting it is a Type I error (false positive).
Alternative is true: accepting the null hypothesis is a Type II error (false negative); rejecting it is the right decision.

12 Good test statistics, bad test statistics: a bad statistic. [Figure: distribution of the test statistic under the null hypothesis and under the alternative hypothesis, with decision boundary d.]
Null hypothesis is true: accepting the null hypothesis is the right decision; rejecting it is a Type I error (false positive).
Alternative is true: accepting the null hypothesis is a Type II error (false negative); rejecting it is the right decision.

13 The Offenbach Oracle. Throw the 20-sided die. Score = 20: reject the null hypothesis. Score ≠ 20: keep the null hypothesis. This is (independent of the null hypothesis) a valid statistical test at a 5% type I error level! Toni, 29, Offenbach, mechanic and moral philosopher.

14 The Offenbach Oracle. But: The distribution of the test statistic is identical under the null hypothesis H0 and the alternative hypothesis H1. This test cannot discriminate between the two alternatives! It rejects with a probability of 5% whether H0 or H1 is true, so 95% of the positives will be missed.
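
The argument can be checked with a few lines of exact arithmetic. This is only a sketch: the function name `oracle_rejection_probability` is an illustrative assumption, and the die is assumed to be fair.

```python
from fractions import Fraction

def oracle_rejection_probability(sides: int = 20) -> Fraction:
    """Probability that a fair die with `sides` faces shows its top score.

    The die never looks at the data, so this probability is the same
    whether the null or the alternative hypothesis is true.
    """
    return Fraction(1, sides)

type_one_error = oracle_rejection_probability()  # rejection rate under H0
power = oracle_rejection_probability()           # rejection rate under H1
missed_positives = 1 - power                     # true effects that go undetected

print(float(type_one_error), float(power), float(missed_positives))
```

The type I error level and the power coincide, which is exactly what makes the test valid but uninformative.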

15 The p-value. Given a test statistic and its actual value t in a sample, a p-value can be calculated: each test value t maps to a p-value, which is the probability of observing a value of the test statistic at least as extreme as the actual value t, under the assumption of the null hypothesis. Example: t = 4.2 gives p = 0.08.

16 The p-value. Example: t = 0.7 gives p = 0.42.

17 Test decisions according to the p-value. The decision boundary d corresponds to the significance level α; the observed test statistic t corresponds to the p-value. t is more extreme than d exactly when p is smaller than α. With α = 0.05:
p ≥ α (e.g. p = 0.83): keep H0 (no rejection).
p < α (e.g. p = 0.02): reject H0 (assume the alternative hypothesis).
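
The equivalence between "t beyond d" and "p below α" can be illustrated with a small sketch. The setting is an assumption chosen for illustration and is not taken from the slides: the test statistic is the number of heads in n = 20 tosses, the null hypothesis is a fair coin, and large values count as extreme. The function names are hypothetical.

```python
from math import comb

def upper_tail_p_value(t: int, n: int, p0: float = 0.5) -> float:
    """P(T >= t) for T ~ Binomial(n, p0): the one-sided p-value of t."""
    return sum(comb(n, k) * p0**k * (1 - p0) ** (n - k) for k in range(t, n + 1))

def decision_boundary(n: int, alpha: float, p0: float = 0.5) -> int:
    """Smallest value d whose p-value is below alpha (rejection region: T >= d)."""
    for d in range(n + 2):
        if upper_tail_p_value(d, n, p0) < alpha:
            return d
    raise ValueError("alpha must be positive")

n, alpha = 20, 0.05
d = decision_boundary(n, alpha)

# For every possible outcome, both decision rules agree.
agreement = all(
    (t >= d) == (upper_tail_p_value(t, n) < alpha) for t in range(n + 1)
)
print("decision boundary d =", d, "| rules agree:", agreement)
```

Fixing d in advance and comparing p with α afterwards are therefore two views of the same pre-specified rule.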

18 One- and two-sided hypotheses: one-sided alternative. H0: The value of a quantity of interest in group A is not higher than in group B. H1: The value of a quantity of interest in group A is higher than in group B. [Figure: acceptance region with a single rejection region on one side.]

19 One- and two-sided hypotheses: two-sided alternative. H0: The quantity of interest has the same value in group A and group B. H1: The quantity of interest is different in group A and group B. [Figure: acceptance region with a rejection region on each side.] Generally, two-sided alternatives are more conservative: deviations in both directions are detected.

20 Example "Testing": Colon Carcinoma. What about this fact? Variable: vaccine (scale: binary). Endpoint: 4-year survival (scale: binary). Survivors: 32 × 94% ≈ 30 with vaccine, (62 − 32) × 77% ≈ 23 without vaccine.

21 Example "Testing": Colon Carcinoma. Interesting questions: Does the vaccine have any effect? Is this effect "significant"?
Vaccine yes (n=32): 4-year survival yes 30 (94%), no 2 (6%)
Vaccine no (n=30): 4-year survival yes 23 (77%), no 7 (23%)

22 Example "Testing": Colon Carcinoma. Null hypothesis H0: Vaccination has no impact (either positive or negative) on the patients. The survival rates in the vaccine and non-vaccine group in the whole population are the same. Alternative hypothesis H1: For the whole population, the survival rates in the vaccine and non-vaccine group are different. Choose the significance level α (usually: α = 5%; 1%; 0.1%). Interpretation of the significance level α: If there is no difference between the groups, one obtains a false positive result with a probability of α.

23 Example "Testing": Colon Carcinoma. Choice of test statistic: "Fisher's Exact Test". Sir Ronald Aylmer Fisher, 1890-1962: theoretical biology, evolution theory, statistics.

24 Example "Testing": Colon Carcinoma. Compute the value of the test statistic t after the experiment has been carried out. This value can be converted into a p-value: p = 0.0766 ≈ 7.7%. Since we have chosen a significance level α = 5%, and p > α, we cannot reject the null hypothesis, thus we keep it. Formulation of the result: At a 5% significance level (and using Fisher's Exact Test), no significant effect of vaccination on survival could be detected. Consequence: We are not (yet) sufficiently convinced of the utility of this therapy. But this does not mean that there is no difference at all!
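
The p-value can be recomputed from the 2x2 table (30 and 2 in the vaccine group, 23 and 7 in the non-vaccine group). The following is a minimal sketch of a two-sided Fisher's exact test using only the standard library. The function name is hypothetical, and the two-sided convention used here (sum all tables that are at most as probable as the observed one) is an assumption about how the slide's value was obtained.

```python
from math import comb

def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are at most as probable as the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Probability that the top-left cell equals x, given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / total

    lo, hi = max(0, col1 - row2), min(row1, col1)
    p_obs = prob(a)
    # A small relative tolerance guards against floating-point ties.
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))

# Rows: vaccine yes / no; columns: 4-year survival yes / no
p = fisher_exact_two_sided(30, 2, 23, 7)
alpha = 0.05
print(round(p, 4), "reject H0" if p < alpha else "keep H0")
```

Under these assumptions the result agrees with the quoted p ≈ 7.7%, so H0 is kept at the 5% level.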

25 "No test based upon the theory of probability can by itself provide any valuable evidence of the truth or falsehood of a hypothesis." Neyman J, Pearson E (1933) Phil Trans R Soc A. Egon Pearson (1895-1980), Jerzy Neyman (1894-1981). Non-significance ≠ equivalence. Statistics can never prove a hypothesis, it can only provide evidence.

26 Confidence intervals. 95% confidence interval: an estimated interval which contains the "true value" of a quantity with a probability of 95%. Example: point estimate 24.3, interval estimate (20.5, 29.5) (e.g. % votes for the SPD in the EU elections). (1 − α) confidence interval: an estimated interval which contains the "true value" of a quantity with a probability of (1 − α). 1 − α = confidence level, α = error probability. Use confidence intervals with caution! [Section: I. Description]

27 Specific statistical tests: Comparison of two group means. [Figure: gene expression measurements of Gene A, Gene B, ... in group 1 and group 2.] Which gene is expressed at a higher level?

28 Two group comparison. Data: expression of gene g in different samples of group 1 and group 2. Hypothesis: The expression of gene g in group 1 is lower than in group 2. Test statistic: e.g. the difference of group means. Decision for "lower expression" if [formula not preserved in the transcript].

29 Two group comparison. Bad idea: the difference of group means d. Problem: d is not scale invariant. Solution: Divide d by an estimate of the standard deviation s(d) in the two groups. This is the t-statistic, giving rise to the (unpaired) t-test.
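
A short sketch shows the scaling problem and its remedy. The measurements are illustrative values, and estimating s(d) by the Welch (unequal variance) standard error is an assumption, since the slide does not say which estimate is meant.

```python
from math import sqrt
from statistics import mean, variance

def mean_difference(x, y):
    """Raw difference of group means d."""
    return mean(x) - mean(y)

def t_statistic(x, y):
    """d divided by its estimated standard error (Welch form, an assumption)."""
    standard_error = sqrt(variance(x) / len(x) + variance(y) / len(y))
    return mean_difference(x, y) / standard_error

group1 = [18, 3, 6, 9, 5]      # illustrative measurements
group2 = [15, 10, 8, 7, 12]

# The same measurements expressed in a unit that is 1000 times smaller
scaled1 = [1000 * v for v in group1]
scaled2 = [1000 * v for v in group2]

d_raw, d_scaled = mean_difference(group1, group2), mean_difference(scaled1, scaled2)
t_raw, t_scaled = t_statistic(group1, group2), t_statistic(scaled1, scaled2)
print(d_raw, d_scaled)   # d changes with the unit of measurement
print(t_raw, t_scaled)   # t does not
```

Changing the unit multiplies d by 1000 but leaves t unchanged, which is why a fixed decision boundary makes sense for t and not for d.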

30 Wilcoxon (rank sum) test (equivalent to the Mann-Whitney test). Question: Given independent samples in group 1 and group 2, are the values in group 1 smaller than in group 2?
Measurements group 1: 18 3 6 9 5
Measurements group 2: 15 10 8 7 12
Raw scale: 3 5 6 7 8 9 10 12 15 18
Rank scale: 1 2 3 4 5 6 7 8 9 10
Rank sum group 1: 1+2+3+6+10 = 22
Rank sum group 2: 4+5+7+8+9 = 33
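
The rank sums can be reproduced with a few lines of code. This is a sketch that assumes there are no tied values, as in the example; `rank_sums` is a hypothetical helper name.

```python
def rank_sums(group1, group2):
    """Rank all values jointly (1 = smallest) and sum the ranks per group.

    Assumes there are no ties, as in the example data.
    """
    pooled = sorted(group1 + group2)
    rank = {value: position + 1 for position, value in enumerate(pooled)}
    return sum(rank[v] for v in group1), sum(rank[v] for v in group2)

group1 = [18, 3, 6, 9, 5]
group2 = [15, 10, 8, 7, 12]
w1, w2 = rank_sums(group1, group2)
print(w1, w2)  # 22 33
```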

31 Wilcoxon (rank sum) test (equivalent to the Mann-Whitney test). Choose the rank sum of group 1 as test statistic W. [Figure: rank sum distribution for group 1, |group 1| = 5, |group 2| = 5; x-axis: Wilcoxon W from 15 to 40, with W = 22 marked.] P(W ≤ 22, given the groups do not differ in their location) = 0.15. The p-value corresponding to W can be computed exactly for small sample numbers. For large numbers, there exist good approximations.
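
For small groups the exact computation is a simple enumeration, sketched below. It assumes no ties; the function name is hypothetical. Under the null hypothesis every choice of 5 ranks out of 1..10 for group 1 is equally likely, so the p-value is a count divided by the number of choices.

```python
from itertools import combinations

def rank_sum_lower_tail(w_observed: int, n1: int, n2: int) -> float:
    """Exact P(W <= w_observed) under the null hypothesis, without ties.

    Enumerates all ways to assign n1 of the ranks 1..n1+n2 to group 1.
    """
    ranks = range(1, n1 + n2 + 1)
    sums = [sum(choice) for choice in combinations(ranks, n1)]
    return sum(s <= w_observed for s in sums) / len(sums)

p = rank_sum_lower_tail(22, 5, 5)
print(round(p, 3))
```

The enumeration covers 252 rank assignments here; for larger groups the count grows quickly, which is where the approximations come in.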

32 Summary: Two-group comparison of a continuous variable. Question: Do the measurements in the two groups differ in their location?
Gaussian data, paired samples: paired two-sample t-test
Gaussian data, unpaired samples: unpaired two-sample t-test
Non-Gaussian data, paired samples: Wilcoxon signed rank test
Non-Gaussian data, unpaired samples: Wilcoxon rank sum test

33 Comparison of two binary variables. Example: clinical trial, unpaired design (each test person receives only one treatment).
Medication verum: effect 65, no effect 7
Medication placebo: effect 44, no effect 13
Question: Do the distributions in group 1 and group 2 differ? Unpaired data: Fisher's exact test.

34 Odds and odds ratio.
Fair coin: heads 54, tails 46
Bent coin: heads 82, tails 18
Odds (= chances): odds (fair coin) = 54 : 46 = 1.17; odds (bent coin) = 82 : 18 = 4.56.
Odds ratio: the ratio of the two odds.
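
As a worked example, the odds and their ratio for the two coins can be computed directly. The helper name `odds` is an illustrative assumption.

```python
def odds(successes: int, failures: int) -> float:
    """Odds = number of successes divided by number of failures."""
    return successes / failures

odds_fair = odds(54, 46)   # heads : tails for the fair coin
odds_bent = odds(82, 18)   # heads : tails for the bent coin
odds_ratio = odds_bent / odds_fair

print(round(odds_fair, 2), round(odds_bent, 2), round(odds_ratio, 2))
```

An odds ratio of 1 would mean that both coins have the same odds of heads; here the bent coin's odds are almost four times as high.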

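35 Comparison of two categorical variables. Null hypothesis: 5-year survival is independent of tumor size. Unpaired data: chi-square test (χ² test).
Tumor size 1: 5-year survival no 10, yes 8
Tumor size 2: 5-year survival no 20, yes 23
Tumor size 3: 5-year survival no 19, yes 10
Tumor size 4: 5-year survival no 32, yes 18
In this example, p < 0.001.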

36 Comparison of two categorical variables. Unpaired data: chi-square test (χ² test). Requirements: The sample number is sufficiently large (n ≥ 60). The expected number of observations is not too small (≥ 5) for all possible observations. Note that for binary data and large n, the chi-square test and Fisher's test are equivalent.

37 Summary: Comparison of two categorical variables. Question: Do there exist differences in the distribution of one variable if grouped by the second variable?
Binary data, paired: McNemar test
Binary data, unpaired: Fisher's exact test
Non-binary data, paired: (Bowker symmetry test)
Non-binary data, unpaired: chi-square (χ²) test

38 Summary: Description and Testing.
Each row lists variable, design: numerical description; graphical description; test.
Continuous, two sample: medians, quartiles; 2 boxplots; Wilcoxon rank sum test, t-test*
Continuous, paired: medians, quartiles of differences; boxplot of differences; Wilcoxon signed rank test, paired t-test*
Binary, two sample: cross table, odds ratio; barplot; Fisher's exact test
Binary, paired: cross table; barplot; McNemar test
Categorical, two sample: cross table; 3D barplot; χ² test
* If differences follow a normal distribution

39 Remarks on Testing. Data description is the mandatory first step of every statistical analysis / test. Test results should report the outcome (significant/not significant) together with the p-value that has been obtained. Never report a p-value of exactly 0! (why?)

40 Statistical significance ≠ relevance. For large sample numbers, even tiny differences may produce significant findings. For small sample numbers, an observed relevant difference can be statistically insignificant.

41 Multiple Testing. Examples of multiple tests: Testing of several endpoints (systolic and diastolic blood pressure, pulse, ...). Comparison of several groups (e.g., 4 groups require 6 pairwise two-group comparisons). Let us set a significance level of 5%, and suppose the null hypothesis holds in all cases. → If we perform 6 tests, the probability of reporting at least one false positive finding can increase to as much as 30% (6 × 5%; about 26% if the tests are independent)!
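
The numbers for 6 tests can be checked with a small sketch. The function names are hypothetical; the first formula assumes independent tests, the second is the general upper bound that holds without that assumption.

```python
def familywise_error_rate(alpha: float, m: int) -> float:
    """P(at least one false positive) for m independent tests at level alpha."""
    return 1 - (1 - alpha) ** m

def union_bound(alpha: float, m: int) -> float:
    """Upper bound on the same probability, valid for dependent tests too."""
    return min(1.0, m * alpha)

independent = familywise_error_rate(0.05, 6)
worst_case = union_bound(0.05, 6)
print(round(independent, 3), round(worst_case, 3))
```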

42 Multiple Testing, Bonferroni Correction. Remedy: Bonferroni correction. For m tests and a target significance level α, perform each individual test at a significance level of α/m (local significance level). The probability of producing a false positive finding in at least one of the m tests is then at most α (multiple / global significance level).
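
A minimal sketch of the correction applied to a list of p-values follows. The function name and the six p-values are hypothetical and serve only as an illustration.

```python
def bonferroni_reject(p_values, alpha=0.05):
    """Return one decision per test, each made at the local level alpha / m."""
    m = len(p_values)
    local_alpha = alpha / m
    return [p < local_alpha for p in p_values]

# Hypothetical p-values from six pairwise group comparisons
p_values = [0.004, 0.03, 0.2, 0.008, 0.6, 0.049]
decisions = bonferroni_reject(p_values)
print(decisions)
```

With m = 6 the local level is 0.05 / 6 ≈ 0.0083, so p-values such as 0.03 or 0.049, which would be significant in a single test, no longer lead to rejection.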
