Hypothesis Testing

The New York Times Daily Dilemma
• Select 50% of users to see headline A: "Titanic Sinks"
• Select 50% of users to see headline B: "Ship Sinks Killing Thousands"
• Do people click more on headline A or B?

Testing Hypotheses
[Figure: two populations, each showing a few observed values and many unobserved values marked "?"]
Which one has the larger average?

The Two-Sample t-Test
Is the difference in averages between two groups larger than we would expect from chance alone?

More Broadly: Hypothesis Testing Procedures
• Parametric: Z test, t-test, Cohen's d
• Nonparametric: Wilcoxon rank-sum test, Kruskal-Wallis H-test, Kolmogorov-Smirnov test

Parametric Test Procedures
• Test population parameters (e.g., the mean)
• Make distributional assumptions (e.g., a normal distribution)
• Examples: Z test, t-test, χ² test, F test

Nonparametric Test Procedures
• Not tied to population parameters; they test, for example, probability distributions or independence
• Data values are not used directly; the tests use the ordering of the data
• Examples: Wilcoxon rank-sum test, Kolmogorov-Smirnov test
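
Because a rank-based test uses only the ordering of the data, any strictly increasing transformation of all observations should leave it unchanged. A minimal R sketch, reusing the in-class Left/Right values purely as example data and picking sqrt() as an arbitrary monotone transform, checks this with base R's wilcox.test() and contrasts it with t.test():

```r
# In-class example values, reused only as illustration data
Left  <- c(20, 5, 500, 15, 30)
Right <- c(0, 50, 70, 100)

# Wilcoxon rank-sum test on the raw values and on sqrt-transformed values
# (sqrt is strictly increasing on [0, Inf), so every rank is preserved)
w_raw  <- wilcox.test(Left, Right)
w_sqrt <- wilcox.test(sqrt(Left), sqrt(Right))
c(raw = w_raw$p.value, sqrt = w_sqrt$p.value)  # identical: same ranks, same W

# The t-test uses the values themselves, so its p-value changes
t_raw  <- t.test(Left, Right)$p.value
t_sqrt <- t.test(sqrt(Left), sqrt(Right))$p.value
c(raw = t_raw, sqrt = t_sqrt)
```

The data contain no ties, so wilcox.test() computes an exact p-value here.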

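Two-Sample t-Test: In-Class Experiment with R

    Left  <- c(20, 5, 500, 15, 30)
    Right <- c(0, 50, 70, 100)
    # alternative is one of "two.sided", "less", or "greater";
    # var.equal = TRUE selects the pooled-variance (Student) t-test
    t.test(Left, Right, alternative = "two.sided", var.equal = TRUE, conf.level = 0.95)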

t-Test (Independent Samples)
Two hypotheses:
H0: μ1 - μ2 = 0
H1: μ1 - μ2 ≠ 0
The goal is to evaluate whether the difference between the averages of the two populations is zero.
The t-test makes the following assumptions:
• The values in X(0) and X(1) follow a normal distribution
• Observations are independent
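
The normality assumption can be checked before running the test. A minimal sketch, assuming the Shapiro-Wilk test (base R's shapiro.test(), one common choice, not something the slides prescribe) applied to the in-class data:

```r
Left  <- c(20, 5, 500, 15, 30)
Right <- c(0, 50, 70, 100)

# Shapiro-Wilk test per sample: H0 = the sample comes from a normal distribution
sw <- sapply(list(Left = Left, Right = Right), function(x) shapiro.test(x)$p.value)
sw
# A small p-value is evidence against normality; with only 4-5 points the
# check has little power. A rank-based alternative is wilcox.test(Left, Right).
```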

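t-Test Calculation
General t formula:
$$t = \frac{\text{sample statistic} - \text{hypothesized population parameter}}{\text{estimated standard error}}$$
Independent samples t:
$$t = \frac{\bar{X}^{(0)} - \bar{X}^{(1)}}{\hat{\sigma}_{\bar{X}^{(0)} - \bar{X}^{(1)}}}$$
The numerator is the difference of the empirical averages (the hypothesized difference is 0 under H0); the denominator is the estimated standard deviation of that difference, which is still to be determined.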
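
The general formula can be checked against R's built-in test. A minimal sketch, assuming the pooled-variance (Student) standard error that the in-class call var.equal = TRUE selects; the helper name pooled_t and its delta0 argument (the hypothesized difference) are hypothetical:

```r
Left  <- c(20, 5, 500, 15, 30)
Right <- c(0, 50, 70, 100)

# Hypothetical helper: (sample statistic - hypothesized value) / estimated SE
pooled_t <- function(x0, x1, delta0 = 0) {
  n0 <- length(x0); n1 <- length(x1)
  # pooled variance: weighted average of the two sample variances
  sp2 <- ((n0 - 1) * var(x0) + (n1 - 1) * var(x1)) / (n0 + n1 - 2)
  se  <- sqrt(sp2 * (1 / n0 + 1 / n1))   # estimated standard error
  (mean(x0) - mean(x1) - delta0) / se
}

pooled_t(Left, Right)
t.test(Left, Right, var.equal = TRUE)$statistic  # should agree
```

Passing mu = delta0 to t.test() tests a nonzero hypothesized difference in the same way.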

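t-Test: Standard Deviation Calculation
How much variance do we incur when we use the difference of the empirical averages to represent the true difference in averages? Because the two samples are independent, the variances of the two averages add:
$$\operatorname{Var}\left(\bar{X}^{(0)} - \bar{X}^{(1)}\right) = \frac{\sigma_0^2}{n_0} + \frac{\sigma_1^2}{n_1}$$
The standard deviation of the difference in empirical averages is the square root of this sum.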

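t-Test: Standard Deviation Calculation (2/2)
Estimated standard deviation of the difference in empirical averages:
$$\hat{\sigma}_{\bar{X}^{(0)} - \bar{X}^{(1)}} = \sqrt{\frac{s_0^2}{n_0} + \frac{s_1^2}{n_1}}$$
where $s_0^2$ is the sample variance of X(0) and $n_0$ is the number of observations in X(0) (likewise for X(1)), with degrees of freedom
$$\nu = \frac{\left(s_0^2/n_0 + s_1^2/n_1\right)^2}{\dfrac{(s_0^2/n_0)^2}{n_0 - 1} + \dfrac{(s_1^2/n_1)^2}{n_1 - 1}}$$
This version is also known as Welch's t-test.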
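
Both formulas can be verified against R's default t.test(), which runs Welch's version unless var.equal = TRUE. A minimal sketch on the in-class data; the helper name welch is hypothetical:

```r
Left  <- c(20, 5, 500, 15, 30)
Right <- c(0, 50, 70, 100)

# Hypothetical helper: Welch t statistic and Welch-Satterthwaite degrees of freedom
welch <- function(x0, x1) {
  v0 <- var(x0) / length(x0)   # s_0^2 / n_0
  v1 <- var(x1) / length(x1)   # s_1^2 / n_1
  se <- sqrt(v0 + v1)          # SD of the difference in empirical averages
  nu <- (v0 + v1)^2 / (v0^2 / (length(x0) - 1) + v1^2 / (length(x1) - 1))
  c(t = (mean(x0) - mean(x1)) / se, df = nu)
}

w  <- welch(Left, Right)
tt <- t.test(Left, Right)      # Welch is R's default (var.equal = FALSE)
w
c(t = unname(tt$statistic), df = unname(tt$parameter))  # should agree
```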

t-Statistics and p-value
H0: μ1 - μ2 = 0
H1: μ1 - μ2 ≠ 0
• What is the p-value?
• Can we ever accept hypothesis H1?
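
For the two-sided alternative, the p-value is the probability, under H0, of a t statistic at least as extreme as the observed one. A minimal sketch recomputing it from the t distribution with base R's pt() and comparing it with t.test():

```r
Left  <- c(20, 5, 500, 15, 30)
Right <- c(0, 50, 70, 100)

tt    <- t.test(Left, Right)        # Welch two-sample test, two-sided
t_obs <- unname(tt$statistic)
nu    <- unname(tt$parameter)       # Welch degrees of freedom

# two-sided p-value: P(|T| >= |t_obs|) for T ~ t(nu)
p_manual <- 2 * pt(-abs(t_obs), nu)
c(manual = p_manual, t.test = tt$p.value)  # should agree
```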

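t-Test: Effect Size
The t-test only tests whether the difference is zero or not. What about the effect size? Cohen's d:
$$d = \frac{\bar{X}^{(0)} - \bar{X}^{(1)}}{s}, \qquad s = \sqrt{\frac{(n_0 - 1)s_0^2 + (n_1 - 1)s_1^2}{n_0 + n_1 - 2}}$$
where s is the pooled standard deviation (the square root of the pooled variance).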
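
Cohen's d expresses the difference in averages in units of the pooled standard deviation, so it does not depend on the measurement scale. A minimal sketch; the helper name cohens_d is hypothetical:

```r
# Hypothetical helper: Cohen's d with the pooled standard deviation
cohens_d <- function(x0, x1) {
  n0 <- length(x0); n1 <- length(x1)
  s <- sqrt(((n0 - 1) * var(x0) + (n1 - 1) * var(x1)) / (n0 + n1 - 2))
  (mean(x0) - mean(x1)) / s
}

Left  <- c(20, 5, 500, 15, 30)
Right <- c(0, 50, 70, 100)
cohens_d(Left, Right)
cohens_d(c(1, 2, 3), c(3, 4, 5))  # means 2 and 4, pooled SD 1, so d = -2
```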

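Bayesian Approach
• Probability of the hypothesis given the data
• The Bayes factor: the ratio of the probability of the data D under each hypothesis, $K = P(D \mid H_1) / P(D \mid H_0)$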
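
Computing an exact Bayes factor requires priors on the parameters. A rough sketch, assuming the BIC approximation (BIC ≈ -2 log marginal likelihood, so $\log K \approx (\mathrm{BIC}_0 - \mathrm{BIC}_1)/2$) with base R's lm() and BIC(); this is an approximation swapped in for illustration, not the method from the slides, and the helper name bf10_bic is hypothetical:

```r
# Hypothetical helper: BIC approximation to the Bayes factor K = P(D | H1) / P(D | H0)
bf10_bic <- function(x0, x1) {
  y <- c(x0, x1)
  group <- factor(rep(c("g0", "g1"), times = c(length(x0), length(x1))))
  m0 <- lm(y ~ 1)        # H0: both groups share one mean
  m1 <- lm(y ~ group)    # H1: each group has its own mean
  exp((BIC(m0) - BIC(m1)) / 2)  # > 1 favors H1, < 1 favors H0
}

bf10_bic(c(20, 5, 500, 15, 30), c(0, 50, 70, 100))
```

With identical groups the extra parameter only adds a log(n) penalty, so K falls below 1.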

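Nonparametric Testing of Distributions
• Two-sample Kolmogorov-Smirnov test
  ◦ Do X(0) and X(1) come from the same underlying distribution?
  ◦ The hypothesis (same distribution) is rejected at level α if
$$D_{n,m} = \sup_x \left|F_{0,n}(x) - F_{1,m}(x)\right| > c(\alpha)\sqrt{\frac{n + m}{n\,m}}, \qquad c(\alpha) = \sqrt{-\tfrac{1}{2}\ln\left(\tfrac{\alpha}{2}\right)}$$
where $F_{0,n}$ and $F_{1,m}$ are the empirical distribution functions of the n observations in X(0) and the m observations in X(1), c(α) is the confidence-interval factor, and $\sqrt{(n+m)/(nm)}$ is the sample-size correction (source: Wikipedia).
• The K-S test is less sensitive when the difference between the curves is greatest at the beginning or the end of the distributions; it works best when the distributions differ near the center.
• Good reading: M. Tygert, "Statistical tests for whether a given set of independent, identically distributed draws comes from a specified probability density," PNAS, 2010.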
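
The rejection rule can be applied by hand with base R's ecdf(). A minimal sketch; the helper name ks_two_sample is hypothetical, and c(α) is the large-sample approximation from the formula, whereas ks.test() may compute exact p-values for small samples:

```r
# Hypothetical helper: two-sample K-S statistic and asymptotic critical value
ks_two_sample <- function(x0, x1, alpha = 0.05) {
  F0 <- ecdf(x0)
  F1 <- ecdf(x1)
  pts <- c(x0, x1)                        # sup |F0 - F1| is attained at a sample point
  D <- max(abs(F0(pts) - F1(pts)))
  n <- length(x0); m <- length(x1)
  c_alpha <- sqrt(-log(alpha / 2) / 2)    # confidence-interval factor c(alpha)
  crit <- c_alpha * sqrt((n + m) / (n * m))  # with the sample-size correction
  list(D = D, critical = crit, reject = D > crit)
}

set.seed(1)
a <- rnorm(20)
b <- rnorm(30, mean = 0.5)
res <- ks_two_sample(a, b)
res$D
unname(ks.test(a, b)$statistic)  # should agree with res$D
```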

Chi-Squared Test
• Each Twitter user has a gender and a number of tweets.
• We want to determine whether gender is related to the number of tweets.
• Use the chi-square test for independence.

When to Use the Chi-Squared Test
• When to use the chi-square test for independence:
  ◦ Uniform sampling design
  ◦ Categorical features
  ◦ Population is significantly larger than the sample
• State the hypotheses:
  ◦ H0: ?
  ◦ H1: ?

Example Chi-Squared Test

    men   <- c(300, 100, 40)
    women <- c(350, 200, 90)
    data  <- as.data.frame(rbind(men, women))
    names(data) <- c('low', 'med', 'large')
    data
    chisq.test(data)

Reject H0 (p < 0.05) means …
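
What chisq.test() computes can be reproduced step by step: expected counts under independence, the χ² statistic, its degrees of freedom, and the upper-tail p-value. A minimal sketch on the same counts, stored as a matrix rather than a data frame:

```r
men   <- c(300, 100, 40)
women <- c(350, 200, 90)
obs <- rbind(men, women)
colnames(obs) <- c("low", "med", "large")

# expected counts under H0 (independence): row total * column total / grand total
expected <- outer(rowSums(obs), colSums(obs)) / sum(obs)
stat <- sum((obs - expected)^2 / expected)
dof  <- (nrow(obs) - 1) * (ncol(obs) - 1)
p    <- pchisq(stat, dof, lower.tail = FALSE)

ct <- chisq.test(obs)  # no continuity correction is used for tables larger than 2 x 2
c(manual = stat, chisq.test = unname(ct$statistic))  # should agree
```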

Revisiting The New York Times Dilemma
• Select 50% of users to see headline A: "Titanic Sinks"
• Select 50% of users to see headline B: "Ship Sinks Killing Thousands"
• Should we assign half the readers to headline A and half to headline B?
  ◦ Yes?
  ◦ No?
  ◦ Which test should we use?
  ◦ What happens if A is MUCH better than B?
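
Clicks are binary outcomes, so one option (not necessarily the answer intended on the slide) is a two-proportion test. A minimal sketch with hypothetical click and view counts, using base R's prop.test() and showing that, without continuity correction, its statistic equals the square of the pooled two-proportion z statistic:

```r
# Hypothetical click counts, not data from the slides
clicks <- c(A = 120, B = 150)
views  <- c(A = 1000, B = 1000)

pt_res <- prop.test(clicks, views, correct = FALSE)  # H0: equal click rates

# Same test by hand: pooled two-proportion z-test
p_hat  <- clicks / views
p_pool <- sum(clicks) / sum(views)
z <- (p_hat[["A"]] - p_hat[["B"]]) /
  sqrt(p_pool * (1 - p_pool) * (1 / views[["A"]] + 1 / views[["B"]]))
c(z_squared = z^2, prop.test = unname(pt_res$statistic))  # should agree
```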

Sequential Analysis (Sequential Hypothesis Test)
• How to stop the experiment early if a hypothesis seems true
  ◦ Stopping criteria often need to be decided before the experiment starts
  ◦ If ever needed:
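
One classic sequential test is Wald's sequential probability ratio test (SPRT); the slide does not say which method it has in mind. A minimal sketch for a single click rate, assuming H0: p = p0 vs. H1: p = p1, error rates α and β fixed before the experiment starts, and a hypothetical helper name sprt_bernoulli:

```r
# Hypothetical helper: Wald's SPRT for Bernoulli observations (1 = click, 0 = no click)
sprt_bernoulli <- function(x, p0, p1, alpha = 0.05, beta = 0.2) {
  upper <- log((1 - beta) / alpha)   # at or above: stop and accept H1 (p = p1)
  lower <- log(beta / (1 - alpha))   # at or below: stop and accept H0 (p = p0)
  llr <- 0                           # cumulative log-likelihood ratio
  for (k in seq_along(x)) {
    step <- if (x[k] == 1) log(p1 / p0) else log((1 - p1) / (1 - p0))
    llr <- llr + step
    if (llr >= upper) return(list(decision = "H1", n = k))
    if (llr <= lower) return(list(decision = "H0", n = k))
  }
  list(decision = "undecided", n = length(x))
}

set.seed(42)
sprt_bernoulli(rbinom(500, size = 1, prob = 0.15), p0 = 0.10, p1 = 0.15)
```

The stopping thresholds depend only on α and β, which is why they are fixed before any data arrive.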

But there is a better way…

Bandit Algorithms
• K distinct hypotheses (so far we had K = 2)
  ◦ Hypothesis = choosing an NYT headline
• Each time we pull arm i we get reward X_i (simple version of the problem)
• The underlying population (reward distribution) does not change over time
• Bandit algorithms attempt to minimize regret
  ◦ If n = total actions and n_i = total actions of arm i, the regret is
$$R_n = n\,\mu^* - \sum_{i=1}^{K} \mu_i\,\mathbb{E}[n_i], \qquad \mu_i = \mathbb{E}[X_i], \quad \mu^* = \max_i \mu_i \text{ (the largest true average)}$$
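
Regret counts the reward lost by not always pulling the arm with the largest true average. A minimal sketch that plugs realized pull counts in for E[n_i]; the helper name regret and the example means and counts are hypothetical:

```r
# Hypothetical helper: n * mu_star - sum_i mu_i * n_i, with realized counts for E[n_i]
regret <- function(mu, pulls) {
  sum(pulls) * max(mu) - sum(mu * pulls)
}

regret(mu = c(0.4, 0.7), pulls = c(30, 70))  # 9: 30 pulls of an arm 0.3 below the best
regret(mu = c(0.4, 0.7), pulls = c(0, 100))  # 0: every pull went to the best arm
```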

Challenge
• Note that regret is defined over the true average reward
• How can we estimate the true average reward E[X_i]?
  ◦ We need lots of observations from population i
  ◦ But what happens if E[X_i] is small?
• Core of decision-making problems:
  ◦ Exploration vs. exploitation
  ◦ When exploring, we seek to improve the estimated average reward
  ◦ When exploiting, we try what has worked better in the past
• Balancing exploration and exploitation:
  ◦ Instead of trying the action with the highest estimated average, we try the action with the highest upper bound on its confidence interval (more on this next class)

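UCB1 (Upper Confidence Bound 1)
• Multi-Armed Bandit (MAB)
  ◦ A bandit process is a special type of Markov decision process
  ◦ Generally, the reward X_i(n_i) at the n_i-th pull of arm i is drawn from P[X_i(n_i) | X_i(n_i - 1)]
• UCB1
  ◦ Use the arm i that maximizes
$$\bar{X}_i + \sqrt{\frac{2 \ln n}{n_i}}$$
where $\bar{X}_i$ is the empirical average reward of arm i, $n_i$ is its number of pulls, and n is the total number of pulls so far.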
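
The selection rule generalizes directly from two arms to K arms. A minimal sketch of one UCB1 decision, assuming each arm is played once before the index is used (as UCB1 prescribes); the helper name ucb1_choose is hypothetical:

```r
# Hypothetical helper: choose the next arm under UCB1
# means: empirical average reward of each arm; counts: number of pulls n_i of each arm
ucb1_choose <- function(means, counts) {
  if (any(counts == 0)) return(which(counts == 0)[1])  # play every arm once first
  n <- sum(counts)                                     # total pulls so far
  which.max(means + sqrt(2 * log(n) / counts))         # highest upper confidence bound
}

ucb1_choose(means = c(0.5, 0.4), counts = c(100, 2))  # rarely pulled arm 2 gets explored
ucb1_choose(means = c(0.5, 0.4), counts = c(50, 50))  # equal counts: arm 1 is exploited
```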

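R Example

    # UCB1 on two Bernoulli arms, tracking the t-test p-value of the observations over time
    numT <- 1000  # number of time steps (value missing in the original; 1000 is an assumed example)
    ttest <- c()  # t-test p-value at each time step
    mean1 <- 0.4  # true mean of population (arm) 1
    mean2 <- 0.7  # true mean of population (arm) 2
    # initialize observations: pull each arm once
    x1 <- c(rbinom(n = 1, size = 1, prob = mean1))
    x2 <- c(rbinom(n = 1, size = 1, prob = mean2))
    n1 <- 1
    n2 <- 1
    for (i in 2:numT) {
      # UCB1 index (upper confidence bound) of arm 1 and arm 2; i = total pulls so far
      ucb_1 <- mean(x1) + sqrt(2 * log(i) / n1)
      ucb_2 <- mean(x2) + sqrt(2 * log(i) / n2)
      # decide which arm to pull: the one with the larger upper confidence bound
      if (ucb_1 > ucb_2) {
        x1 <- c(rbinom(n = 1, size = 1, prob = mean1), x1)
        n1 <- n1 + 1
      } else {
        x2 <- c(rbinom(n = 1, size = 1, prob = mean2), x2)
        n2 <- n2 + 1
      }
      # compute the t-test p-value of the observations so far;
      # t.test() stops with an error if both samples are constant, so record NA then
      if ((n1 > 2) && (n2 > 2)) {
        d <- tryCatch(t.test(x1, x2)$p.value, error = function(e) NA_real_)
      } else {
        d <- 1
      }
      ttest <- c(ttest, d)
    }
    par(mfrow = c(1, 2))
    plot(2:numT, ttest, type = "l", xlab = "Time", ylab = "t-Test p-value", log = "y")
    barplot(c(n1, n2), xlab = "Arm", ylab = "Number of pulls", names.arg = c("n1", "n2"))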