Hypothesis Testing

Presentation transcript:

1 Hypothesis Testing

2 The New York Times Daily Dilemma
• Select 50% of users to see headline A
  ◦ Titanic Sinks
• Select 50% of users to see headline B
  ◦ Ship Sinks Killing Thousands
• Do people click more on headline A or B?

3 Testing Hypotheses: Two Populations
[Figure: two populations in which only a few values (12, 10, 9, 4) are observed and the rest are unknown]
Which one has the largest average?

4 The two-sample t-test
Is the difference in averages between two groups more than we would expect based on chance alone?

5 More Broadly: Hypothesis Testing Procedures
• Parametric
  ◦ Z Test
  ◦ t Test
  ◦ Cohen's d
• Nonparametric
  ◦ Wilcoxon Rank Sum Test
  ◦ Kruskal-Wallis H-Test
  ◦ Kolmogorov-Smirnov Test

6 Parametric Test Procedures
• Tests Population Parameters (e.g. Mean)
• Distribution Assumptions (e.g. Normal distribution)
• Examples: Z Test, t-Test, χ² Test, F Test

7 Nonparametric Test Procedures
• Not Related to Population Parameters
  ◦ Example: Probability Distributions, Independence
• Data Values not Directly Used
  ◦ Uses Ordering of Data
• Examples: Wilcoxon Rank Sum Test, Kolmogorov-Smirnov Test

8 Two-sample t-Test: In-class experiment with R
    Left <- c(20, 5, 500, 15, 30)
    Right <- c(0, 50, 70, 100)
    # alternative can be "two.sided", "less", or "greater"
    t.test(Left, Right, alternative = "two.sided", var.equal = TRUE, conf.level = 0.95)
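Because Left contains one extreme value (500), it may be instructive to run a rank-based test on the same data. The sketch below reuses the two vectors from the in-class experiment and compares the pooled t-test with R's `wilcox.test`. The names `t_res`, `w_res`, and `W_manual` are ours and do not come from the slides.

```r
# Data from the in-class experiment
Left  <- c(20, 5, 500, 15, 30)
Right <- c(0, 50, 70, 100)

# Parametric: two-sample t-test with pooled variance, as on the slide
t_res <- t.test(Left, Right, alternative = "two.sided", var.equal = TRUE)

# Nonparametric: Wilcoxon rank-sum test, which uses only the ordering of the values
w_res <- wilcox.test(Left, Right, alternative = "two.sided")

# The W statistic counts the (Left, Right) pairs in which the Left value is larger
W_manual <- sum(outer(Left, Right, ">"))

print(c(t_p = t_res$p.value, wilcoxon_p = w_res$p.value))
print(c(W = unname(w_res$statistic), W_manual = W_manual))
```

The rank-based statistic is unchanged if 500 is replaced by any value above 100, whereas the t statistic depends on the actual magnitude.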

9 t-Test (Independent Samples)
The goal is to evaluate whether the average difference between two populations is zero.
Two hypotheses:
• $H_0: \mu_1 - \mu_2 = 0$
• $H_1: \mu_1 - \mu_2 \neq 0$
The t-test makes the following assumptions:
• The values in $X^{(0)}$ and $X^{(1)}$ follow a normal distribution
• Observations are independent

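10 t-Test Calculation
General t formula:
$$t = \frac{\text{sample statistic} - \text{hypothesized population parameter}}{\text{estimated standard error}}$$
Independent samples t:
$$t = \frac{(\bar{X}^{(0)} - \bar{X}^{(1)}) - (\mu_1 - \mu_2)}{s_{\bar{X}^{(0)} - \bar{X}^{(1)}}}$$
where $\bar{X}^{(0)}$ and $\bar{X}^{(1)}$ are the empirical averages. How do we estimate the standard deviation $s_{\bar{X}^{(0)} - \bar{X}^{(1)}}$?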

11 t-Test: Standard Deviation Calculation
Standard deviation of the difference in empirical averages: how much variance is there when we use the average difference of observations to represent the true average difference?

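12 t-Test: Standard Deviation Calculation (2/2)
Standard deviation of the difference in empirical averages:
$$s_{\bar{X}^{(0)} - \bar{X}^{(1)}} = \sqrt{\frac{s_0^2}{n_0} + \frac{s_1^2}{n_1}}$$
with degrees of freedom
$$\nu = \frac{\left(\frac{s_0^2}{n_0} + \frac{s_1^2}{n_1}\right)^2}{\frac{(s_0^2/n_0)^2}{n_0 - 1} + \frac{(s_1^2/n_1)^2}{n_1 - 1}}$$
where $s_0^2$ is the sample variance of $X^{(0)}$ and $n_0$ is the number of observations in $X^{(0)}$ (likewise $s_1^2$ and $n_1$ for $X^{(1)}$).
Also known as Welch's t.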
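The Welch calculation can be checked by hand against R's `t.test`, which uses the Welch version by default. The following is a minimal sketch; the observations in `x0` and `x1` are made up for illustration and are not from the slides.

```r
# Hypothetical observations for the two groups (not from the slides)
x0 <- c(12, 10, 9, 14, 11, 13)
x1 <- c(4, 9, 7, 6, 8)

n0 <- length(x0); n1 <- length(x1)   # number of observations
v0 <- var(x0);    v1 <- var(x1)      # sample variances

# Standard deviation of the difference in empirical averages
se <- sqrt(v0 / n0 + v1 / n1)

# Welch-Satterthwaite degrees of freedom
df <- (v0 / n0 + v1 / n1)^2 /
  ((v0 / n0)^2 / (n0 - 1) + (v1 / n1)^2 / (n1 - 1))

# t statistic under H0: mu_1 - mu_2 = 0
t_stat <- (mean(x0) - mean(x1)) / se

# t.test() applies Welch's version when var.equal = FALSE (the default)
res <- t.test(x0, x1)
print(c(t_manual = t_stat, t_builtin = unname(res$statistic)))
print(c(df_manual = df, df_builtin = unname(res$parameter)))
```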

13 t-Statistics p-value
$H_0: \mu_1 - \mu_2 = 0$
$H_1: \mu_1 - \mu_2 \neq 0$
• What is the p-value?
• Can we ever accept hypothesis $H_1$?
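One way to build intuition for the p-value is to compute it from the t statistic directly and then to simulate many experiments in which $H_0$ is true. The sketch below does both with simulated normal data; the sample size of 30, the seed, and the number of repetitions are arbitrary choices of ours.

```r
set.seed(1)  # arbitrary seed, for reproducibility only

# 1) The two-sided p-value is the tail probability of the t distribution under H0
x0 <- rnorm(30, mean = 0, sd = 1)
x1 <- rnorm(30, mean = 0, sd = 1)
res <- t.test(x0, x1)
p_manual <- 2 * pt(-abs(unname(res$statistic)), df = unname(res$parameter))

# 2) When H0 is true, roughly 5% of experiments still give p < 0.05
n_rep <- 2000
p_values <- replicate(n_rep, t.test(rnorm(30), rnorm(30))$p.value)
false_positive_rate <- mean(p_values < 0.05)

print(c(p_manual = p_manual, p_builtin = res$p.value))
print(false_positive_rate)
```

A small p-value therefore says the data would be unusual if $H_0$ were true; it is not the probability that $H_1$ is true.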


15 t-Test: Effect Size
The t-test only tests whether the difference is zero or not. What about effect size?
Cohen's d:
$$d = \frac{\bar{X}^{(0)} - \bar{X}^{(1)}}{s}$$
where $s$ is the pooled standard deviation.
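Cohen's d can be computed in a few lines. The sketch below assumes the usual pooled standard deviation, weighted by degrees of freedom; the helper name `cohens_d` and the data are hypothetical.

```r
# Cohen's d with a pooled standard deviation (hypothetical helper, not from the slides)
cohens_d <- function(x0, x1) {
  n0 <- length(x0); n1 <- length(x1)
  # Pooled variance: sample variances weighted by their degrees of freedom
  pooled_var <- ((n0 - 1) * var(x0) + (n1 - 1) * var(x1)) / (n0 + n1 - 2)
  (mean(x0) - mean(x1)) / sqrt(pooled_var)
}

x0 <- c(2, 4, 6, 8)   # made-up observations
x1 <- c(1, 3, 5, 7)
d <- cohens_d(x0, x1)
print(d)
```

Unlike the t statistic, d does not grow with the sample size, so it describes how large the difference is rather than how sure we are that it is nonzero.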


17 Bayesian Approach
• Probability of hypothesis given data
• The Bayes factor
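For reference, the two quantities named on this slide are usually written as follows. These are the standard textbook definitions with hypothesis $H$ and data $D$; the notation is ours and may differ from what the slide displayed.

```latex
% Posterior probability of a hypothesis H given data D (Bayes' rule)
P(H \mid D) = \frac{P(D \mid H)\, P(H)}{P(D)}

% Bayes factor comparing H_1 against H_0
K = \frac{P(D \mid H_1)}{P(D \mid H_0)}

% Posterior odds = Bayes factor x prior odds
\frac{P(H_1 \mid D)}{P(H_0 \mid D)} = K \cdot \frac{P(H_1)}{P(H_0)}
```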


19 Nonparametric Testing of Distributions
• Two-sample Kolmogorov-Smirnov Test
  ◦ Do $X^{(0)}$ and $X^{(1)}$ come from the same underlying distribution?
  ◦ Hypothesis (same distribution) rejected at level $p$ if (Wikipedia)
$$D_{n_0,n_1} > c(p)\sqrt{\frac{n_0 + n_1}{n_0 n_1}}$$
    where $D_{n_0,n_1}$ is the empirical statistic, $c(p)$ is the confidence interval factor, the square root is the sample size correction, and $n_0$, $n_1$ are the sample sizes.
The K-S test is less sensitive when the difference between the curves is greatest at the beginning or the end of the distributions. It works best when the distributions differ at the center.
Good reading: M. Tygert, Statistical tests for whether a given set of independent, identically distributed draws comes from a specified probability density. PNAS 2010
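The rejection rule can be tried out in R. The sketch below computes the empirical statistic as the largest gap between the two empirical CDFs and compares it with the large-sample threshold. It assumes the confidence factor $c(\alpha) = \sqrt{-\tfrac{1}{2}\ln(\alpha/2)}$ given in the Wikipedia article; the simulated samples and the seed are our own.

```r
set.seed(42)  # arbitrary seed
x0 <- rnorm(80, mean = 0,   sd = 1)   # hypothetical sample X(0)
x1 <- rnorm(60, mean = 0.5, sd = 1)   # hypothetical sample X(1)
n0 <- length(x0); n1 <- length(x1)

# Empirical statistic: largest vertical gap between the two empirical CDFs
pts <- sort(c(x0, x1))
D <- max(abs(ecdf(x0)(pts) - ecdf(x1)(pts)))

# Large-sample rejection threshold at level alpha
alpha <- 0.05
c_alpha <- sqrt(-0.5 * log(alpha / 2))               # confidence factor, about 1.36
threshold <- c_alpha * sqrt((n0 + n1) / (n0 * n1))   # with sample size correction

res <- ks.test(x0, x1)
print(c(D_manual = D, D_builtin = unname(res$statistic), threshold = threshold))
print(D > threshold)   # TRUE means "same distribution" is rejected at level alpha
```

`ks.test` reports its own p-value, which for small samples can differ slightly from the threshold rule because the threshold is an asymptotic approximation.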


21 Chi-Squared Test
• Twitter users have a gender and a number of tweets.
• We want to determine whether gender is related to number of tweets.
• Use the chi-square test for independence

22 When to use Chi-Squared test
• When to use chi-square test for independence:
  ◦ Uniform sampling design
  ◦ Categorical features
  ◦ Population is significantly larger than sample
• State the hypotheses:
  ◦ $H_0$?
  ◦ $H_1$?

23 Example Chi-Squared Test
    men <- c(300, 100, 40)
    women <- c(350, 200, 90)
    data <- as.data.frame(rbind(men, women))
    names(data) <- c('low', 'med', 'large')
    data
    chisq.test(data)
Reject $H_0$ (p<0.05) means …
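To see what `chisq.test` computes for this table, the statistic can be rebuilt from the expected counts under independence. The sketch below uses the same counts as the slide; the variable names `tab`, `expected`, and `chi2` are ours.

```r
men   <- c(300, 100, 40)
women <- c(350, 200, 90)
tab <- rbind(men, women)
colnames(tab) <- c("low", "med", "large")

# Expected counts if gender and tweet volume were independent:
# (row total * column total) / grand total
expected <- outer(rowSums(tab), colSums(tab)) / sum(tab)

# Chi-squared statistic, degrees of freedom, and upper-tail p-value
chi2 <- sum((tab - expected)^2 / expected)
df <- (nrow(tab) - 1) * (ncol(tab) - 1)
p_value <- pchisq(chi2, df = df, lower.tail = FALSE)

res <- chisq.test(tab)
print(c(chi2_manual = chi2, chi2_builtin = unname(res$statistic)))
print(c(p_manual = p_value, p_builtin = res$p.value))
```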


25 Revisiting The New York Times Dilemma
• Select 50% of users to see headline A
  ◦ Titanic Sinks
• Select 50% of users to see headline B
  ◦ Ship Sinks Killing Thousands
• Assign half the readers to headline A and half to headline B?
  ◦ Yes?
  ◦ No?
  ◦ Which test to use?
What happens if A is MUCH better than B?

26 Sequential Analysis (Sequential Hypothesis Test)
• How to stop the experiment early if the hypothesis seems true
  ◦ Stopping criteria often need to be decided before the experiment starts
  ◦ If ever needed:
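To see why the stopping criterion matters, consider an experimenter who runs a t-test after every few observations and stops at the first p < 0.05. The simulation below is a rough sketch of that situation with both groups drawn from the same distribution, so every rejection is a false positive. The helper `peeking_rejects`, the look schedule, and the sample sizes are our own assumptions.

```r
set.seed(7)  # arbitrary seed

# One experiment in which H0 is true: both groups share the same distribution.
# Returns TRUE if any interim look reports p < alpha.
peeking_rejects <- function(n_max = 200, look_every = 10, alpha = 0.05) {
  a <- rnorm(n_max)
  b <- rnorm(n_max)
  looks <- seq(look_every, n_max, by = look_every)
  any(sapply(looks, function(n) t.test(a[1:n], b[1:n])$p.value < alpha))
}

n_rep <- 500
# False positive rate when stopping at the first "significant" look
rate_peeking <- mean(replicate(n_rep, peeking_rejects()))
# False positive rate with a single test at a sample size fixed in advance
rate_fixed <- mean(replicate(n_rep, t.test(rnorm(200), rnorm(200))$p.value < 0.05))

print(c(peeking = rate_peeking, fixed_sample = rate_fixed))
```

Sequential tests are designed to keep the overall error rate at the intended level despite repeated looks, which is why their stopping rule has to be fixed up front.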

27 But there is a better way…

28 Bandit Algorithms
• K distinct hypotheses (so far we had K = 2)
  ◦ Hypothesis = choosing NYT headline
• Each time we pull arm $i$ we get reward $X_i$ (simple version of problem)
• Underlying population (reward distribution) does not change over time
• Bandit algorithms attempt to minimize regret
  ◦ If $n$ = total actions; $n_i$ = total actions on arm $i$
  ◦ Regret is measured against the arm with the largest true average
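As an illustration of regret, the sketch below assumes the common definition: $n$ times the largest true average, minus the sum over arms of $n_i$ times that arm's true average. The slide's own formula is not preserved in this transcript, so treat this definition, the helper `regret`, and the numbers as assumptions.

```r
# Regret after n actions, under the definition assumed above:
# n * (largest true average) - sum over arms of n_i * (true average of arm i)
regret <- function(true_means, pulls) {
  n <- sum(pulls)
  n * max(true_means) - sum(pulls * true_means)
}

true_means <- c(0.4, 0.7)   # hypothetical click rates for headlines A and B
pulls <- c(100, 2400)       # hypothetical n_i for each headline
r <- regret(true_means, pulls)
print(r)

# An even 50/50 split of the same 2500 actions gives a much larger regret
r_split <- regret(true_means, c(1250, 1250))
print(r_split)
```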

29 Challenge
• Note that regret is defined over the true average reward
• How can we estimate the true average reward of $X_i$?
  ◦ We need to get lots of observations from population $i$
  ◦ But what happens if $E[X_i]$ is small?
• Core of decision making problems:
  ◦ Exploration vs. exploitation
  ◦ When exploring we seek to improve the estimated average reward
  ◦ When exploiting we try what has worked better in the past
• Balancing exploration and exploitation:
  ◦ Instead of trying the action with the highest estimated average, we try the action with the highest upper bound on its confidence interval (more on this next class)

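30 UCB1 (Upper Confidence Bound 1)
• Multi-Armed Bandit (MAB)
  ◦ Bandit process is a special type of Markov Decision Process
  ◦ Generally, the reward $X_i(n_i)$ at the $n_i$-th pull of arm $i$ is distributed as $P[X_i(n_i) \mid X_i(n_i - 1)]$
• UCB1
  ◦ Use arm $i$ that maximizes $\bar{X}_i + \sqrt{\frac{2 \ln n}{n_i}}$, where $\bar{X}_i$ is the empirical average reward of arm $i$, $n_i$ is the number of pulls of arm $i$, and $n$ is the total number of pulls so far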
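The UCB1 choice for a single step can be sketched as a small function. It uses the same index as the R example in this deck, the empirical mean plus `sqrt(2*log(t)/n_i)`; the helper name `ucb1_index` and the numbers are hypothetical.

```r
# UCB1 index per arm: empirical mean plus an exploration bonus
# that shrinks as the arm is pulled more often
ucb1_index <- function(means, pulls, t) {
  means + sqrt(2 * log(t) / pulls)
}

means <- c(0.7, 0.4)   # hypothetical empirical averages of two arms
pulls <- c(1000, 5)    # hypothetical number of pulls of each arm
idx <- ucb1_index(means, pulls, t = sum(pulls))
chosen <- which.max(idx)

print(idx)
print(chosen)
```

In this example the arm with the lower empirical average is chosen because it has been pulled only a few times, so its upper confidence bound is still wide. This is the exploration side of the trade-off.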

31 R Example
    numT <- 2500   # number of time steps
    ttest <- c()   # t-test p-value after each step
    mean1 <- 0.4   # mean of population 1
    mean2 <- 0.7   # mean of population 2
    # initialize observations with one pull of each arm
    x1 <- c(rbinom(n=1, size=1, prob=mean1))
    x2 <- c(rbinom(n=1, size=1, prob=mean2))
    n1 <- 1
    n2 <- 1
    for (i in 2:numT) {
      # compute the UCB1 index of arm 1
      reward_1 <- mean(x1) + sqrt(2*log(i)/n1)
      # compute the UCB1 index of arm 2
      reward_2 <- mean(x2) + sqrt(2*log(i)/n2)
      # decide which arm to pull
      if (reward_1 > reward_2) {
        x1 <- c(rbinom(n=1, size=1, prob=mean1), x1)
        n1 <- n1 + 1
      } else {
        x2 <- c(rbinom(n=1, size=1, prob=mean2), x2)
        n2 <- n2 + 1
      }
      # compute the t-test p-value of the observations;
      # skip while both samples are constant, where t.test() can fail or return NaN
      if ((n1 > 2) && (n2 > 2) && (var(x1) + var(x2) > 0)) {
        d <- t.test(x1, x2)$p.value
      } else {
        d <- 1
      }
      ttest <- c(ttest, d)
    }
    par(mfrow=c(1,2))
    plot(2:numT, ttest, "l", xlab="Time", ylab="t-Test pvalue", log="y")
    barplot(c(n1,n2), xlab="Arm Pulls", ylab="Observations", names.arg=c("n1", "n2"))

