Statistical Methods in Computer Science Hypothesis Testing I: Treatment experiment designs Ido Dagan.


1 Statistical Methods in Computer Science Hypothesis Testing I: Treatment experiment designs Ido Dagan

2 Hypothesis Testing: Intro (Empirical Methods in Computer Science © 2006-now Gal Kaminka/Ido Dagan)
When setting up experiments:
  Goal: to assess falsifying hypotheses, e.g., "treatment has no effect"
  Goal fails => the falsifying hypothesis is not true (unlikely) => our theory survives
The falsifying hypothesis is called the null hypothesis, denoted H0
We want to check whether results like ours would be unlikely if H0 were true

3 Comparison Hypothesis Testing
A very simple design: the treatment experiment
  Also known as a lesion study or ablation test
Two populations: control and treatment (finite or infinite)
  Assumed identical except for the independent variable
    treatment: Ind1 & Ex1 & Ex2 & ... & Exn ==> Dep1
    control:   Ind0 & Ex1 & Ex2 & ... & Exn ==> Dep2
Treatment condition: a categorical independent variable
What are possible hypotheses?

4 Hypotheses for a Treatment Experiment
H1: Treatment has an effect
H0: Treatment has no effect
  Any effect is due to chance
But how do we measure effect? We know different ways to characterize data:
  Central tendency (mean, median, mode, ...)
  Dispersion (variance, interquartile range, std. dev.)
  Shape (e.g., kurtosis)

5 Hypotheses for a Treatment Experiment
H1: Treatment has an effect
H0: Treatment has no effect
  Any effect is due to chance
Transformed into:
H1: Treatment changes the mean of the population
H0: Treatment does not change the mean of the population
  Any effect is due to chance

6 Hypotheses for a Treatment Experiment
H1: Treatment has an effect
H0: Treatment has no effect
  Any effect is due to chance
Transformed into:
H1: Treatment changes the variance of the population
H0: Treatment does not change the variance of the population
  Any effect is due to chance

7 Hypotheses for a Treatment Experiment
H1: Treatment has an effect
H0: Treatment has no effect
  Any effect is due to chance
Transformed into:
H1: Treatment changes the shape of the population
H0: Treatment does not change the shape of the population
  Any effect is due to chance

8 Chance Results for Samples
The problem:
  Suppose first that we know the mean of the control population and sample the treatment population
  We find: mean treatment result = 0.7, mean control = 0.5
  How do we know there is a real difference?
  The difference could be due to chance, because we measure the value from a sample and not from the whole population
In a treatment experiment there are two populations; the null hypothesis H0 states that their means are equal
The key question: what is the probability of getting 0.7 in a sample from the treatment population, given H0?
  If it is low, then we can reject H0

9 One-sample testing: Basics
We begin with a simple case
  We are given a known control population P
    For example: life expectancy for patients (without treatment)
    Known parameters (e.g., known mean)
  Now we sample the treatment population
    Sample mean = Mt
The question: was the mean Mt drawn by chance from a population that behaves the same (mean, variance, ...) as the control population?
To answer this, we must know: what is the sampling distribution of the mean of P?

10 Sampling Distributions
Suppose that, given P, we repeat the following:
  Draw N sample points, calculate mean M1
  Draw N sample points, calculate mean M2
  ...
  Draw N sample points, calculate mean Mn
The collection of means forms a distribution, too: the sampling distribution of the mean
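
The repeated-sampling procedure on this slide can be simulated directly. The sketch below is only an illustration: the exponential population, the sample size of 25, and the function name `sampling_distribution_of_mean` are assumptions, not part of the original slides.

```python
import random
import statistics

def sampling_distribution_of_mean(population, sample_size, num_samples, seed=0):
    """Draw num_samples samples of size sample_size (with replacement)
    and return the list of their means."""
    rng = random.Random(seed)
    means = []
    for _ in range(num_samples):
        sample = rng.choices(population, k=sample_size)
        means.append(statistics.fmean(sample))
    return means

# Hypothetical, deliberately skewed population (exponential, mean about 1).
pop_rng = random.Random(42)
population = [pop_rng.expovariate(1.0) for _ in range(10_000)]

means = sampling_distribution_of_mean(population, sample_size=25, num_samples=2_000)
print("population mean:     ", round(statistics.fmean(population), 3))
print("mean of sample means:", round(statistics.fmean(means), 3))
```

The list `means` is itself a data set; plotting its histogram shows the sampling distribution of the mean.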

11 Central Limit Theorem
The sampling distribution of the mean of samples of size N, drawn from a population with mean M and std. dev. S:
  1. Approaches a normal distribution as N increases, for which:
  2. Mean = M
  3. Standard deviation = $S / \sqrt{N}$
     This is called the standard error of the sample mean
Regardless of the shape of the underlying population
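
A quick numerical check of the standard error claim, S divided by the square root of N, is sketched below. The uniform population and the chosen values of N are assumptions made for illustration only.

```python
import math
import random
import statistics

def empirical_standard_error(population, n, num_samples, rng):
    """Std. dev. of the means of num_samples samples of size n."""
    means = [statistics.fmean(rng.choices(population, k=n))
             for _ in range(num_samples)]
    return statistics.pstdev(means)

rng = random.Random(1)
# Hypothetical non-normal population: uniform on [0, 1].
population = [rng.uniform(0.0, 1.0) for _ in range(20_000)]
S = statistics.pstdev(population)

results = {}
for n in (4, 16, 64):
    predicted = S / math.sqrt(n)          # standard error predicted by the theorem
    observed = empirical_standard_error(population, n, 3_000, rng)
    results[n] = (predicted, observed)
    print(f"N={n:3d}  predicted={predicted:.4f}  observed={observed:.4f}")
```

Quadrupling N should roughly halve both columns.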

12 So? Why should we care?
We can now examine the likelihood of obtaining the observed sample mean for the known population
  If it is “too unlikely”, then we can reject the null hypothesis
  e.g., if the probability of obtaining such a mean by chance is less than 5%
The process:
  We are given a control population C
    Mean Mc and standard deviation Sc
  A sample of the treatment population
    Sample size N, mean Mt and standard deviation St
  If Mt is sufficiently different from Mc, then we can reject the null hypothesis

13 Z-test by example
We are given:
  Control mean Mc = 1, std. dev. = 0.948
  Treatment N = 25, Mt = 2.8
We compute:
  Standard error = 0.948/sqrt(25) = 0.948/5 ≈ 0.19
  Z score of Mt = (2.8 - population mean given H0)/0.19 = (2.8 - 1)/0.19 ≈ 9.47
  Now we compute the percentile rank of 9.47
  This determines the probability of obtaining an Mt of 2.8 or higher by chance,
  under the assumption that the real mean is 1
Notice: the Z score has a standard normal distribution
  The sample mean is normally distributed; subtracting and dividing by constants to obtain Z maintains normality
  Mean = 0, std. dev. = 1
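
The arithmetic of this example can be reproduced in a few lines. This is a minimal sketch; the function names are illustrative. Without rounding the standard error to 0.19, the Z score comes out near 9.49 rather than 9.47.

```python
import math

def one_sample_z(sample_mean, pop_mean, pop_std, n):
    """Z score of a sample mean under H0 (population mean and std known)."""
    standard_error = pop_std / math.sqrt(n)
    return (sample_mean - pop_mean) / standard_error

def upper_tail_probability(z):
    """P(Z >= z) for a standard normal variable Z."""
    return 0.5 * math.erfc(z / math.sqrt(2))

# Values from the slide: Mc = 1, std. dev. = 0.948, N = 25, Mt = 2.8.
z = one_sample_z(sample_mean=2.8, pop_mean=1.0, pop_std=0.948, n=25)
p = upper_tail_probability(z)
print(f"z = {z:.2f}")
print(f"P(Z >= z) = {p:.3g}")
```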

14 One- and two-tailed hypotheses
The Z-test computes the percentile rank of the sample mean
  Using a percentile table for the standard normal distribution
  Assumption: the sample mean is drawn from the sampling distribution of the control population
What kinds of null hypotheses are rejected?
  Determined in advance by the research question
One-tailed hypothesis testing:
  H0: Mt = Mc
  H1: Mt > Mc
  If we obtain Z >= 1.645, reject H0
  The population mean is most likely higher than Mc, to explain Mt
[Figure: standard normal curve; Z = 0 is the 50th percentile (P50), Z = 1.645 is the 95th percentile (P95); 95% of the population lies below Z = 1.645]

15 One- and two-tailed hypotheses
What kinds of null hypotheses are rejected?
Two-tailed hypothesis testing:
  H0: Mt = Mc
  H1: Mt != Mc
  If we obtain Z >= 1.96, reject H0
  If we obtain Z <= -1.96, reject H0
[Figure: standard normal curve; Z = -1.96 is the 2.5th percentile (P2.5), Z = 0 is P50, Z = 1.96 is P97.5; 95% of the population lies between Z = -1.96 and Z = 1.96]
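
The critical values 1.645 and 1.96 need not come from a printed table. The sketch below derives them from the standard normal distribution in Python's `statistics` module; the helper names `critical_value` and `reject_h0` are illustrative assumptions.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, std. dev. 1

def critical_value(alpha, two_tailed):
    """Z threshold that leaves alpha (or alpha/2 per tail) in the rejection region."""
    tail = alpha / 2 if two_tailed else alpha
    return STANDARD_NORMAL.inv_cdf(1 - tail)

def reject_h0(z, alpha=0.05, two_tailed=True):
    crit = critical_value(alpha, two_tailed)
    if two_tailed:
        return abs(z) >= crit      # H1: Mt != Mc
    return z >= crit               # H1: Mt > Mc

print(round(critical_value(0.05, two_tailed=False), 3))
print(round(critical_value(0.05, two_tailed=True), 3))
```

A Z score of 1.7, for example, is rejected by the one-tailed test but not by the two-tailed test, which is why the choice must be made before looking at the data.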

16 Testing Errors
The decision to reject the null hypothesis H0 may lead to errors
  Type I error: rejecting H0 though it is true (false positive)
  Type II error: failing to reject H0 though it is false (false negative)
  Classification perspective: false/true positives and negatives
We are worried about the probabilities of these errors (upper bounds)
  alpha: probability of a Type I error
  beta: probability of a Type II error
Normally, alpha is set to 0.05 or 0.01
  This is our rejection criterion for H0 (usually the focus of significance tests)
1 - beta is the power of the test (its sensitivity)
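
Both error rates can be estimated by simulation. The sketch below reuses the control parameters from the Z-test example (mean 1, std. dev. 0.948, N = 25) and assumes normally distributed data; the alternative mean of 1.5 used to estimate power is a hypothetical value chosen only for illustration.

```python
import math
import random
import statistics

def rejection_rate(true_mean, h0_mean, pop_std, n, trials, rng, z_crit=1.96):
    """Fraction of simulated experiments in which a two-tailed Z-test rejects H0."""
    standard_error = pop_std / math.sqrt(n)
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, pop_std) for _ in range(n)]
        z = (statistics.fmean(sample) - h0_mean) / standard_error
        if abs(z) >= z_crit:
            rejections += 1
    return rejections / trials

rng = random.Random(7)
# H0 is true: every rejection is a Type I error, so the rate estimates alpha.
type_i_rate = rejection_rate(1.0, 1.0, 0.948, 25, 4_000, rng)
# H0 is false (hypothetical true mean 1.5): the rate estimates the power, 1 - beta.
power = rejection_rate(1.5, 1.0, 0.948, 25, 4_000, rng)
print("estimated Type I error rate:", type_i_rate)
print("estimated power:", power)
```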

17 Two designs for treatment experiments
One-sample: compare a sample to a known population
  e.g., compare to a specification or known history
Two-sample: compare two samples and establish whether they are drawn from the same underlying distribution

18 Two-sample Z-test
Up until now, we assumed we know the control population mean
  But what about cases where it is unknown?
This is called the two-sample case:
  We have samples from two populations: treatment and control
  For now, assume we know the std. dev. of both populations
  We want to compare the estimated (sample) means

19 Two-sample Z-test (assume std known)
Compare the difference between two population means
  When the samples are independent (e.g., two patient groups)
H0: M1 - M2 = d0
H1: M1 - M2 != d0 (this is the two-tailed version)
Test statistic, with sample means $\bar{X}_1, \bar{X}_2$, population std. devs. $\sigma_1, \sigma_2$ and sample sizes $N_1, N_2$:
  $Z = \dfrac{(\bar{X}_1 - \bar{X}_2) - d_0}{\sqrt{\sigma_1^2/N_1 + \sigma_2^2/N_2}}$
  The denominator follows from var(X - Y) = var(X) + var(Y) for independent variables
When we test for equality, d0 = 0
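
A minimal sketch of the two-sample Z-test with known population standard deviations follows. The standard error of the difference comes from var(X - Y) = var(X) + var(Y); the numbers in the example (means 5.2 and 4.8, std. dev. 1.0, 50 points per group) are hypothetical.

```python
import math

def two_sample_z(mean1, mean2, std1, std2, n1, n2, d0=0.0):
    """Z score for H0: M1 - M2 = d0, with known population std. devs."""
    standard_error = math.sqrt(std1 ** 2 / n1 + std2 ** 2 / n2)
    return ((mean1 - mean2) - d0) / standard_error

def two_tailed_p(z):
    """P(|Z| >= |z|) for a standard normal variable Z."""
    return math.erfc(abs(z) / math.sqrt(2))

# Hypothetical treatment and control samples.
z = two_sample_z(mean1=5.2, mean2=4.8, std1=1.0, std2=1.0, n1=50, n2=50)
p = two_tailed_p(z)
print(f"z = {z:.2f}, two-tailed p = {p:.4f}")
```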

20 Mean comparison when std unknown
Up until now, we assumed we know the population std.
  But what about cases where the std is unknown? => It has to be approximated
When the population std is unknown: use the sample std
  This works when N is sufficiently large (e.g., N > 30)
Population std: $\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)^2}$
Sample std: $s = \sqrt{\frac{1}{N-1}\sum_{i=1}^{N}(x_i - \bar{x})^2}$
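
The two standard deviations differ only in the divisor, N for a full population and N - 1 for a sample. The sketch below spells both out on a small made-up data set and can be checked against `statistics.pstdev` and `statistics.stdev`.

```python
import math
import statistics

def population_std(values):
    """Std. dev. when values are the whole population (divide by N)."""
    mu = sum(values) / len(values)
    return math.sqrt(sum((x - mu) ** 2 for x in values) / len(values))

def sample_std(values):
    """Std. dev. estimated from a sample (divide by N - 1)."""
    mean = sum(values) / len(values)
    return math.sqrt(sum((x - mean) ** 2 for x in values) / (len(values) - 1))

data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up values with mean 5
print("population std:", population_std(data))
print("sample std:    ", round(sample_std(data), 3))
```

The sample std is always slightly larger, and the gap shrinks as N grows.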

21 The Student's t-test
The Z-test works well with relatively large N
  e.g., N > 30 for the central limit theorem
  But it is less accurate when the population std is unknown
    The std estimate is not a constant anymore
In this case, and with small N, the t-test is used
  The t-distribution approaches the normal for larger N (~60-120)
t-test:
  Performed like the Z-test, but with the sample std
  Compared against the t-distribution
    The t-score is not normally distributed (its denominator is a random variable)
  Assumes the sample mean is normally distributed
    Which it is (approximately), by the central limit theorem, even though the t-score (based on the sample std) is not
  Requires the degrees of freedom, N-1 for a sample of size N; there is a different distribution for each number of degrees of freedom
    The std of the t-distribution decreases as df increases, and the distribution approaches the normal
[Figure: t-distribution centered at t = 0 (P50), with thicker tails than the normal]
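
A one-sample t-score is computed like a Z score, with the sample std in place of the population std. In the sketch below the sample values are made up, and the critical value 2.262 (two-tailed, alpha = 0.05, 9 degrees of freedom) is taken from a standard t table because Python's standard library has no t-distribution.

```python
import math
import statistics

def one_sample_t(sample, h0_mean):
    """Return (t-score, degrees of freedom) for H0: population mean = h0_mean."""
    n = len(sample)
    s = statistics.stdev(sample)            # sample std (divides by N - 1)
    standard_error = s / math.sqrt(n)
    t = (statistics.fmean(sample) - h0_mean) / standard_error
    return t, n - 1

# Hypothetical treatment sample of 10 measurements.
sample = [1.2, 0.8, 1.9, 2.4, 1.1, 1.7, 2.2, 0.9, 1.5, 1.3]
t, df = one_sample_t(sample, h0_mean=1.0)

T_CRIT_DF9 = 2.262  # two-tailed, alpha = 0.05, df = 9 (from a t table)
reject = abs(t) >= T_CRIT_DF9
print(f"t = {t:.3f}, df = {df}, reject H0: {reject}")
```

With the normal threshold of 1.96 the test would be too lenient for such a small sample; the t table compensates with a larger threshold.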

22 t-test variations
Available in Excel or statistical software packages
  Two-sample and one-sample t-test
  Two-tailed and one-tailed t-test
  t-test assuming equal or unequal variances
  Paired t-test
    Same inputs (e.g., before/after treatment), so the samples are not independent
The t-test is common for testing hypotheses about means

23 Testing variance hypotheses
F-test: compares variances of populations
  Z-test, t-test: compare means of populations
The testing procedure is similar
  H0: $\sigma_x = \sigma_y$
  H1: $\sigma_x \neq \sigma_y$ OR $\sigma_x > \sigma_y$ OR $\sigma_x < \sigma_y$
Now calculate $f = s_x^2 / s_y^2$, where $s_x$ is the sample std of X
  When f is far from 1, the variances are likely different
  To determine the likelihood (how far), compare to the F distribution
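
The F statistic is the ratio of the two sample variances. The sketch below computes it along with the two degrees of freedom needed to look up the F distribution; the data sets are made up, and the lookup itself is left to a table or a statistics package.

```python
import statistics

def f_statistic(x, y):
    """Return (f, numerator df, denominator df) for samples x and y."""
    f = statistics.variance(x) / statistics.variance(y)  # sample variances (N - 1)
    return f, len(x) - 1, len(y) - 1

# Hypothetical samples with the same mean but very different spread.
x = [2, 4, 4, 4, 5, 5, 7, 9]
y = [4, 5, 5, 5, 5, 5, 6, 5]
f, df_num, df_den = f_statistic(x, y)
print(f"f = {f:.2f} with ({df_num}, {df_den}) degrees of freedom")
```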

24 The F distribution
F is based on the ratios of sample to population variances (s: sample std, σ: population std): $F = \dfrac{s_x^2/\sigma_x^2}{s_y^2/\sigma_y^2}$
  According to H0, the two population standard deviations are equal, so they cancel and $F = s_x^2/s_y^2$
F-distribution
  Two parameters: numerator and denominator degrees of freedom
  Degrees of freedom (here): N-1 for each sample
  Assumes both variables are normal

25 Other tests for two-sample testing
There are many other tests for two-sample testing
  Each with its own assumptions and associated power
For instance, the Kolmogorov-Smirnov (KS) test
  A non-parametric measure of the difference between two distributions
Turn to your friendly statistics book for help
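
As a rough illustration of what the KS test measures: its statistic is the largest vertical gap between the empirical cumulative distribution functions of the two samples. The sketch below computes only that statistic; turning it into a p-value needs the KS distribution from a statistics package or table. The function name and the example data are assumptions.

```python
from bisect import bisect_right

def ks_statistic(a, b):
    """Largest absolute difference between the empirical CDFs of a and b."""
    a_sorted, b_sorted = sorted(a), sorted(b)
    d = 0.0
    # The maximum gap is always attained at one of the observed values.
    for v in a_sorted + b_sorted:
        cdf_a = bisect_right(a_sorted, v) / len(a_sorted)
        cdf_b = bisect_right(b_sorted, v) / len(b_sorted)
        d = max(d, abs(cdf_a - cdf_b))
    return d

print(ks_statistic([1, 2, 3, 4], [3, 4, 5, 6]))
```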

26 Testing correlation hypotheses
We now examine the significance of r
To do this, we have to examine the sampling distribution of r
  The distribution of r values we get from different samples
The sampling distribution of r is not easy to work with (what does it look like?)
Fisher's r-to-z transform: $z(r) = \frac{1}{2}\ln\frac{1+r}{1-r}$
  Approximately normal sampling distribution (N > 10)
  Mean = z(ρ) (of the population)
  Standard error (independent of r): $1/\sqrt{N-3}$

27 Testing correlation hypotheses
We now plug in these values and do a Z-test
For example:
  Let the correlation coefficient r for variables x, y be 0.14
  Suppose n = 30
  H0: ρ = 0
  H1: ρ != 0
  $z(r) = \frac{1}{2}\ln\frac{1.14}{0.86} \approx 0.141$, $z(\rho) = 0$, standard error $= 1/\sqrt{27} \approx 0.192$
  $Z = (0.141 - 0)/0.192 \approx 0.73$, which is below 1.96
  Cannot reject H0
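
The example can be worked through numerically with Fisher's transform, which equals the inverse hyperbolic tangent, and the usual standard error 1/sqrt(n - 3). This is a sketch under those standard formulas; the function names are illustrative.

```python
import math

def fisher_z(r):
    """Fisher's r-to-z transform: 0.5 * ln((1 + r) / (1 - r))."""
    return math.atanh(r)

def correlation_z_test(r, n, rho0=0.0):
    """Return (Z score, two-tailed p) for H0: population correlation = rho0."""
    standard_error = 1.0 / math.sqrt(n - 3)
    z = (fisher_z(r) - fisher_z(rho0)) / standard_error
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p

# Values from the slide: r = 0.14, n = 30, H0: rho = 0.
z, p = correlation_z_test(r=0.14, n=30)
print(f"Z = {z:.2f}, reject H0 at alpha = 0.05: {abs(z) >= 1.96}")
```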

28 Treatment Experiments (single-factor experiments)
Allow comparison of multiple treatment conditions
  treatment 1: Ind1 & Ex1 & Ex2 & ... & Exn ==> Dep1
  treatment 2: Ind2 & Ex1 & Ex2 & ... & Exn ==> Dep2
  control:            Ex1 & Ex2 & ... & Exn ==> Dep3
Compare performance of algorithm A to B to C ...
Control condition: optional (e.g., to establish a baseline)
Cannot use the tests we learned: why?

