Statistical Methods in Computer Science Hypothesis Testing I: Treatment experiment designs Ido Dagan.


Empirical Methods in Computer Science © 2006-now Gal Kaminka/Ido Dagan

Hypothesis Testing: Intro
- We have looked at setting up experiments
- Goal: to try to prove a falsifying hypothesis
  - If this goal fails => the falsifying hypothesis is (likely) not true => our theory survives
- The falsifying hypothesis is called the null hypothesis, marked H0
- We want to show that the likelihood of H0 being true is low

Comparison Hypothesis Testing
- A very simple design: the treatment experiment
  - Also known as a lesion study / ablation test
    treatment: Ind1 & Ex1 & Ex2 & .... & Exn ==> Dep1
    control:         Ex1 & Ex2 & .... & Exn ==> Dep2
- Treatment condition: a categorical independent variable
- What are the possible hypotheses?

Hypotheses for a Treatment Experiment
- H1: Treatment has an effect
- H0: Treatment has no effect; any apparent effect is due to chance
- But how do we measure "effect"? We know of different ways to characterize data:
  - Central tendency: mean, median, mode, ....
  - Dispersion measures (variance, interquartile range, std. dev.)
  - Shape (e.g., kurtosis)

Hypotheses for a Treatment Experiment
- H1: Treatment has an effect
- H0: Treatment has no effect; any apparent effect is due to chance
- Transformed into:
  - H1: Treatment changes the mean of the population
  - H0: Treatment does not change the mean of the population; any apparent effect is due to chance

Hypotheses for a Treatment Experiment
- H1: Treatment has an effect
- H0: Treatment has no effect; any apparent effect is due to chance
- Transformed into:
  - H1: Treatment changes the variance of the population
  - H0: Treatment does not change the variance of the population; any apparent effect is due to chance

Hypotheses for a Treatment Experiment
- H1: Treatment has an effect
- H0: Treatment has no effect; any apparent effect is due to chance
- Transformed into:
  - H1: Treatment changes the shape of the population
  - H0: Treatment does not change the shape of the population; any apparent effect is due to chance

Chance Results
- The problem: suppose we sample the treatment and control groups
  - We find mean treatment result = 0.7, mean control result = 0.5
  - How do we know there is a real difference? It could be due to chance!
- In other words: what is the probability of getting 0.7 given H0?
  - If it is low, then we can reject H0

Testing Errors
- The decision to reject the null hypothesis H0 may lead to errors:
  - Type I error: rejecting H0 though it is true (false positive)
  - Type II error: failing to reject H0 though it is false (false negative)
  - This is the classification perspective of false/true positives/negatives
- We are worried about the probability of these errors (upper bounds):
  - alpha bounds the Type I error probability; normally alpha is set to 0.05 or 0.01
    - This is our rejection criterion for H0 (usually the focus of significance tests)
  - beta bounds the Type II error probability; 1-beta is the power of the test (its sensitivity)

Two Designs for Treatment Experiments
- One-sample: compare a sample to a known population
  - e.g., compare to a specification
- Two-sample: compare two samples, and establish whether they are produced from the same underlying distribution

One-sample Testing: Basics
- We begin with a simple case
- We are given a known control population P
  - For example: life expectancy for patients (w/o treatment)
  - Known parameters (e.g., known mean)
  - Recall the terminology: population vs. sample
- Now we sample the treatment population: mean = Mt
- Was the mean Mt drawn by chance from the known control population?
- To answer this, we must know: what is the sampling distribution of the mean of P?

Sampling Distributions
- Suppose, given P, we repeat the following:
  - Draw N sample points, calculate mean M1
  - Draw N sample points, calculate mean M2
  - ...
  - Draw N sample points, calculate mean Mn
- The collection of means forms a distribution, too: the sampling distribution of the mean

Central Limit Theorem
- The sampling distribution of the mean of samples of size N, from a population with mean M and std. dev. S:
  1. Approaches a normal distribution as N increases, for which:
  2. Mean = M
  3. Standard deviation = S / sqrt(N)
     - This is called the standard error of the sample mean
- This holds regardless of the shape of the underlying population
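The claim above is easy to check by simulation. The sketch below (illustrative only, Python standard library) repeatedly draws samples of size N from a deliberately skewed population and compares the standard deviation of the sample means to S / sqrt(N):

```python
import math
import random
import statistics

random.seed(0)

N = 25          # sample size
TRIALS = 10000  # number of repeated samples

# A deliberately non-normal population: exponential with mean 1 and std. dev. 1
def draw_sample():
    return [random.expovariate(1.0) for _ in range(N)]

# Collect the sampling distribution of the mean
means = [statistics.mean(draw_sample()) for _ in range(TRIALS)]

observed_se = statistics.stdev(means)
theoretical_se = 1.0 / math.sqrt(N)  # S / sqrt(N), with S = 1 here

print(observed_se, theoretical_se)  # both close to 0.2
```

Despite the skewed population, the distribution of means is centered on the population mean with spread close to the theoretical standard error.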

So? Why Should We Care?
- We can now examine the likelihood of obtaining the observed sample mean from the known population
- If it is "too unlikely", then we can reject the null hypothesis
  - e.g., if the likelihood that the mean is due to chance is less than 5%
- The process:
  - We are given a control population C, with mean Mc and standard deviation Sc
  - We have a sample of the treatment population: sample size N, mean Mt, and standard deviation St
  - If Mt is sufficiently different from Mc, then we can reject the null hypothesis

Z-test by Example
- We are given:
  - Control: mean Mc = 1, std. dev. = 0.948
  - Treatment: N = 25, Mt = 2.8
- We compute:
  - Standard error = 0.948 / sqrt(25) = 0.948 / 5 = 0.19
  - Z score of Mt = (2.8 - population mean given H0) / 0.19 = (2.8 - 1) / 0.19 = 9.47
- Now we compute the percentile rank of Z = 9.47
  - This gives the probability of receiving an Mt of 2.8 or higher by chance, under the assumption that the real mean is 1
- Notice: the Z score has a standard normal distribution
  - The sample mean is normally distributed, and is shifted/scaled by constants; Z has mean = 0, std. dev. = 1
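A sketch of the arithmetic in this example, using only the standard library (the normal tail probability comes from the error function; keeping the unrounded standard error gives Z ≈ 9.49, whereas rounding the standard error to 0.19 gives the slide's 9.47):

```python
import math

def z_test_one_sample(sample_mean, pop_mean, pop_std, n):
    """Return (z, one-tailed p-value) for a one-sample Z-test."""
    se = pop_std / math.sqrt(n)               # standard error of the mean
    z = (sample_mean - pop_mean) / se
    # P(Z >= z) under the standard normal, via the error function
    p_upper = 0.5 * (1.0 - math.erf(z / math.sqrt(2.0)))
    return z, p_upper

z, p = z_test_one_sample(sample_mean=2.8, pop_mean=1.0, pop_std=0.948, n=25)
print(z, p)  # z ≈ 9.49, p effectively 0: reject H0
```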

One- and Two-tailed Hypotheses
- The Z-test computes the percentile rank of the sample mean
  - Assumption: it is drawn from the sampling distribution of the control population
- What kind of null hypotheses are rejected?
- One-tailed hypothesis testing:
  - H0: Mt = Mc
  - H1: Mt > Mc
  - If we receive Z >= 1.645, reject H0
[Figure: standard normal curve; Z = 0 is P50, Z = 1.645 is P95; 95% of the population lies below 1.645]

One- and Two-tailed Hypotheses
- The Z-test computes the percentile rank of the sample mean
  - Assumption: it is drawn from the sampling distribution of the control population
- What kind of null hypotheses are rejected?
- Two-tailed hypothesis testing:
  - H0: Mt = Mc
  - H1: Mt != Mc
  - If we receive Z >= 1.96, reject H0
  - If we receive Z <= -1.96, reject H0
[Figure: standard normal curve; Z = -1.96 is P2.5, Z = 0 is P50, Z = 1.96 is P97.5; 95% of the population lies between -1.96 and 1.96]
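A minimal helper contrasting the two rejection rules (the 1.645 and 1.96 cutoffs are the standard-normal critical values for alpha = 0.05 quoted on the slides):

```python
def reject_h0(z, tails):
    """Decide whether to reject H0 at alpha = 0.05."""
    if tails == 1:
        return z >= 1.645          # one-tailed: only a large positive Z rejects
    if tails == 2:
        return abs(z) >= 1.96      # two-tailed: either extreme rejects
    raise ValueError("tails must be 1 or 2")

print(reject_h0(1.7, tails=1))   # True: beyond the one-tailed cutoff
print(reject_h0(1.7, tails=2))   # False: not beyond the two-tailed cutoff
print(reject_h0(-2.5, tails=2))  # True
```

The same Z can reject under the one-tailed rule but not the two-tailed one, which is why the alternative hypothesis must be fixed before looking at the data.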

Two-sample Z-test
- Up until now, we assumed we have the population mean
- But what about cases where this is unknown?
- This is called a two-sample case:
  - We have samples of two populations: treatment & control
  - For now, assume we know the std of both populations
- We want to compare the estimated (sample) means

Two-sample Z-test (assume std known)
- Compare the difference of two population means, when the samples are independent (e.g., two patient groups)
  - H0: M1 - M2 = d0
  - H1: M1 - M2 != d0 (this is the two-tailed version)
  - Z = ((m1 - m2) - d0) / sqrt(S1^2/N1 + S2^2/N2), where m1, m2 are the sample means
    - since var(X-Y) = var(X) + var(Y) for independent variables
- When we test for equality, d0 = 0
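A sketch of the two-sample statistic, assuming (as on this slide) that both population standard deviations are known; the variance-of-a-difference identity supplies the denominator. The numbers in the call are hypothetical, for illustration only:

```python
import math

def z_two_sample(m1, m2, s1, s2, n1, n2, d0=0.0):
    """Z statistic for H0: M1 - M2 = d0, with population stds s1, s2 known."""
    # var(X - Y) = var(X) + var(Y) for independent sample means
    se = math.sqrt(s1**2 / n1 + s2**2 / n2)
    return ((m1 - m2) - d0) / se

# Hypothetical sample means and sizes
z = z_two_sample(m1=2.8, m2=2.0, s1=1.0, s2=1.0, n1=25, n2=25)
print(z)  # ≈ 2.83: beyond 1.96, so the two-tailed test rejects H0
```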

Mean Comparison when std Unknown
- Up until now, we assumed we have the population std
- But what about cases where the std is unknown?
  - => It has to be approximated, when N is sufficiently large (e.g., N > 30)
- When the population std is unknown: use the sample std
  - Population std: sigma = sqrt( (1/N) * sum (x_i - M)^2 )
  - Sample std: s = sqrt( (1/(N-1)) * sum (x_i - m)^2 )
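The distinction between the two estimators shows up directly in code: Python's statistics module provides both (pstdev divides by N, stdev by N-1):

```python
import statistics

data = [1, 2, 3, 4, 5]

pop_std = statistics.pstdev(data)    # divides by N:   sqrt(10/5) = sqrt(2)
sample_std = statistics.stdev(data)  # divides by N-1: sqrt(10/4) = sqrt(2.5)

print(pop_std, sample_std)  # ≈ 1.414, ≈ 1.581
```

The sample std is always the (slightly) larger of the two, correcting the downward bias of estimating the mean from the same data.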

The Student's t-test
- The Z-test works well with relatively large N (e.g., N > 30)
  - But it is less accurate when the population std is unknown
- In this case, and for small N, the t-test is used
  - It approaches the normal distribution for large N
- t-test: performed like a Z-test, with the sample std, but compared against the t-distribution
  - The t-score is not normally distributed (its denominator is itself a random variable)
  - Assumes the sample mean is normally distributed
  - Requires use of the sample size: N-1 degrees of freedom, with a different distribution for each degree
[Figure: t-distribution centered at t = 0 (P50), with thicker tails than the normal distribution]
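A sketch of the one-sample t statistic (standard library only, so it stops at the statistic and its degrees of freedom; in practice the statistic is compared to the t-distribution with N-1 degrees of freedom, e.g. via a table or a stats package). The data values are hypothetical:

```python
import math
import statistics

def t_one_sample(data, mu0):
    """t statistic and degrees of freedom for H0: population mean = mu0."""
    n = len(data)
    m = statistics.mean(data)
    s = statistics.stdev(data)   # sample std: divides by n - 1
    t = (m - mu0) / (s / math.sqrt(n))
    return t, n - 1

t, df = t_one_sample([1, 2, 3, 4, 5], mu0=2.0)
print(t, df)  # t ≈ 1.414 with 4 degrees of freedom
```

The only mechanical difference from the Z-test is the use of the sample std; the comparison distribution changes because that std is itself an estimate.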

t-test Variations
- Available in Excel or statistical software packages:
  - Two-sample and one-sample t-test
  - Two-tailed and one-tailed t-test
  - t-test assuming equal and unequal variances
  - Paired t-test: same subjects or inputs (e.g., before/after treatment), so the samples are not independent
- The t-test is common for testing hypotheses about means

Testing Variance Hypotheses
- F-test: compares the variances of populations (the Z-test and t-test compare means of populations)
- The testing procedure is similar:
  - H0: sigma_x^2 = sigma_y^2
  - H1: sigma_x^2 != sigma_y^2 OR sigma_x^2 > sigma_y^2 OR sigma_x^2 < sigma_y^2
  - Now calculate f = s_x^2 / s_y^2, where s_x is the sample std of X
  - When f is far from 1, the variances are likely different
  - To determine the likelihood (how far), compare to the F distribution

The F Distribution
- F is based on the ratio of the two sample variances, each estimating its population variance
  - According to H0, the two standard deviations are equal
- The F-distribution has two parameters: the numerator and denominator degrees-of-freedom
  - Degrees-of-freedom (here): N-1 of each sample
- Assumes both variables are normal
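The f ratio itself is simple to compute (standard library only; the p-value would come from the F distribution with the two degrees-of-freedom parameters, which needs a table or a stats package). The data lists are hypothetical; by convention the larger variance goes in the numerator:

```python
import statistics

def f_ratio(x, y):
    """Ratio of sample variances, larger over smaller, with its dfs."""
    vx, vy = statistics.variance(x), statistics.variance(y)  # n-1 denominators
    if vx >= vy:
        return vx / vy, len(x) - 1, len(y) - 1
    return vy / vx, len(y) - 1, len(x) - 1

f, df_num, df_den = f_ratio([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
print(f, df_num, df_den)  # f = 4.0 with (4, 4) degrees of freedom
```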

Other Tests for Two-sample Testing
- There exist multiple other tests for two-sample testing, each with its own assumptions and associated power
- For instance, the Kolmogorov-Smirnov (KS) test: a non-parametric estimate of the difference between two distributions
- Turn to your friendly statistics book for help

Testing Correlation Hypotheses
- We now examine the significance of r
- To do this, we have to examine the sampling distribution of r
  - What distribution of r values will we get from the different samples?
- The sampling distribution of r is not easy to work with
- Fisher's r-to-z transform: z' = 0.5 * ln((1+r)/(1-r))
  - The standard error of the z' sampling distribution is: SE = 1/sqrt(n-3)

Testing Correlation Hypotheses (cont.)
- We now plug these values in and do a Z-test
- For example: let the correlation coefficient r for variables x, y be 0.14, and suppose n = 30
  - H0: r = 0
  - H1: r != 0
  - z' = 0.5 * ln(1.14/0.86) ≈ 0.14, SE = 1/sqrt(27) ≈ 0.19, Z ≈ 0.14/0.19 ≈ 0.73 < 1.96
- Cannot reject H0
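The example above can be reproduced directly; math.atanh is exactly Fisher's transform 0.5 * ln((1+r)/(1-r)):

```python
import math

def r_significance_z(r, n):
    """Z statistic for H0: population correlation = 0, via Fisher's r-to-z."""
    z_prime = math.atanh(r)          # 0.5 * ln((1 + r) / (1 - r))
    se = 1.0 / math.sqrt(n - 3)      # standard error of z'
    return z_prime / se

z = r_significance_z(r=0.14, n=30)
print(z)  # ≈ 0.73 < 1.96: cannot reject H0 at alpha = 0.05, two-tailed
```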

Treatment Experiments (Single-factor Experiments)
- Allow comparison of multiple treatment conditions:
    treatment 1: Ind1 & Ex1 & Ex2 & .... & Exn ==> Dep1
    treatment 2: Ind2 & Ex1 & Ex2 & .... & Exn ==> Dep2
    control:            Ex1 & Ex2 & .... & Exn ==> Dep3
  - e.g., compare the performance of algorithm A to B to C ....
- Control condition: optional (e.g., to establish a baseline)
- Cannot use the tests we learned: why?