Published by Neil Neal. Modified over 9 years ago.
1
Hypothesis Testing
2
The New York Times Daily Dilemma
Select 50% of users to see headline A
◦ Titanic Sinks
Select 50% of users to see headline B
◦ Ship Sinks Killing Thousands
Do people click more on headline A or B?
3
Testing Hypotheses: Two Populations
[Figure: two populations; a few values are observed (12, 10, 9, 4), the rest are unknown (?)]
Which one has the largest average?
4
The two-sample t-test
Is the difference in averages between two groups more than we would expect based on chance alone?
5
More Broadly: Hypothesis Testing Procedures
Parametric
◦ Z Test
◦ t Test
◦ Cohen's d
Nonparametric
◦ Wilcoxon Rank Sum Test
◦ Kruskal-Wallis H-Test
◦ Kolmogorov-Smirnov Test
6
Parametric Test Procedures
◦ Test population parameters (e.g. mean)
◦ Make distribution assumptions (e.g. normal distribution)
◦ Examples: Z Test, t-Test, χ² Test, F Test
7
Nonparametric Test Procedures
◦ Not related to population parameters (example: probability distributions, independence)
◦ Data values not directly used; uses the ordering of the data
◦ Examples: Wilcoxon Rank Sum Test, Kolmogorov-Smirnov Test
8
Two-sample t-Test
In-class experiment with R
Left = c(20, 5, 500, 15, 30)
Right = c(0, 50, 70, 100)
# alternative must be one of "two.sided", "less", or "greater"
t.test(Left, Right, alternative = "two.sided", var.equal = TRUE, conf.level = 0.95)
9
t-Test (Independent Samples)
The goal is to evaluate whether the average difference between two populations is zero.
Two hypotheses:
◦ $H_0: \mu_1 - \mu_2 = 0$
◦ $H_1: \mu_1 - \mu_2 \neq 0$
The t-test makes the following assumptions:
◦ The values in $X^{(0)}$ and $X^{(1)}$ follow a normal distribution
◦ Observations are independent
10
t-Test Calculation
General t formula:
$t = \frac{\text{sample statistic} - \text{hypothesized population parameter}}{\text{estimated standard error}}$
Independent samples t:
◦ Numerator: empirical averages
◦ Denominator: estimated standard deviation??
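The slide's formula for the independent samples t did not survive in this transcript. A hedged reconstruction, assuming the standard textbook form and the notation used on the neighbouring slides ($\bar{X}^{(0)}$ and $\bar{X}^{(1)}$ are the empirical averages; $\hat{\sigma}$ is a symbol introduced here for the estimated standard deviation of their difference):

```latex
t = \frac{\left(\bar{X}^{(0)} - \bar{X}^{(1)}\right) - \left(\mu_1 - \mu_2\right)}
         {\hat{\sigma}_{\bar{X}^{(0)} - \bar{X}^{(1)}}}
```

Under $H_0$ the hypothesized difference $\mu_1 - \mu_2$ is zero, so the numerator reduces to the difference in empirical averages.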
11
t-Test: Standard Deviation Calculation
Standard deviation of the difference in empirical averages
How much variance is there when we use the average difference of observations to represent the true average difference?
12
t-Test: Standard Deviation Calculation (2/2)
Standard deviation of the difference in empirical averages, with degrees of freedom
◦ Formula annotations: sample variance of $X^{(0)}$; number of observations in $X^{(0)}$
Also known as Welch's t
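A minimal sketch of the Welch calculation, assuming the standard formulas (the slide's own equations are not preserved here): the standard deviation of the difference is sqrt(s0²/n0 + s1²/n1), and the degrees of freedom come from the Welch-Satterthwaite approximation. The variable names are illustrative; the data are the Left and Right vectors from the in-class experiment.

```r
# Welch's t statistic and degrees of freedom computed by hand.
x0 <- c(20, 5, 500, 15, 30)
x1 <- c(0, 50, 70, 100)

n0 <- length(x0)
n1 <- length(x1)
v0 <- var(x0)  # sample variance of x0
v1 <- var(x1)  # sample variance of x1

# Estimated standard deviation of the difference in empirical averages
se <- sqrt(v0 / n0 + v1 / n1)
t_stat <- (mean(x0) - mean(x1)) / se

# Welch-Satterthwaite approximation of the degrees of freedom
df <- (v0 / n0 + v1 / n1)^2 /
  ((v0 / n0)^2 / (n0 - 1) + (v1 / n1)^2 / (n1 - 1))

# t.test() uses Welch's version when var.equal = FALSE (its default)
welch <- t.test(x0, x1, var.equal = FALSE)
print(c(by_hand = t_stat, t_test = unname(welch$statistic)))
print(c(by_hand = df, t_test = unname(welch$parameter)))
```

The hand-computed values should agree with what t.test() reports, which is a quick way to check which variant of the test a call is running.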
13
t-Statistics p-value
$H_0: \mu_1 - \mu_2 = 0$
$H_1: \mu_1 - \mu_2 \neq 0$
What is the p-value?
Can we ever accept hypothesis $H_1$?
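For the two-sided test, the p-value is the probability, under $H_0$, of a t statistic at least as extreme as the one observed. A small sketch with simulated data (the sample sizes, means, and seed are assumptions chosen only for illustration) recovers it from the t distribution:

```r
# Two-sided p-value recovered from the t statistic and degrees of freedom.
set.seed(1)                    # illustrative seed
x0 <- rnorm(30, mean = 0)      # simulated sample from population 0
x1 <- rnorm(30, mean = 0.5)    # simulated sample from population 1

res <- t.test(x0, x1)
t_stat <- unname(res$statistic)
df <- unname(res$parameter)

# Probability mass in both tails beyond |t|
p_manual <- 2 * pt(-abs(t_stat), df)
print(c(by_hand = p_manual, t_test = res$p.value))
```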
15
t-Test: Effect Size
The t-Test only tests whether the difference is zero or not. What about effect size?
Cohen's d, where s is the pooled standard deviation
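A minimal sketch of Cohen's d, assuming the usual definition: the difference in empirical averages divided by the pooled standard deviation s. The function name cohens_d and the toy data are illustrative.

```r
# Cohen's d: difference in means scaled by the pooled standard deviation.
cohens_d <- function(x0, x1) {
  n0 <- length(x0)
  n1 <- length(x1)
  # Pooled standard deviation: weighted average of the two sample variances
  s_pooled <- sqrt(((n0 - 1) * var(x0) + (n1 - 1) * var(x1)) / (n0 + n1 - 2))
  (mean(x0) - mean(x1)) / s_pooled
}

# Means 4 and 3, both sample variances equal to 4, so s = 2
d <- cohens_d(c(2, 4, 6), c(1, 3, 5))
print(d)  # 0.5
```

Unlike the p-value, d does not shrink or grow with the number of observations; it only describes how far apart the two averages are relative to the spread of the data.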
17
Bayesian Approach
◦ Probability of hypothesis given data
◦ The Bayes factor
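The equations for this slide are not preserved in the transcript. As a hedged reminder of the standard definitions (the symbols $H$, $D$, and $K$ are assumptions, not necessarily the slide's notation): Bayes' rule gives the probability of a hypothesis given data, and the Bayes factor compares how well two hypotheses explain the same data.

```latex
P(H \mid D) = \frac{P(D \mid H)\, P(H)}{P(D)}
\qquad
K = \frac{P(D \mid H_1)}{P(D \mid H_0)}
\qquad
\frac{P(H_1 \mid D)}{P(H_0 \mid D)} = K \cdot \frac{P(H_1)}{P(H_0)}
```

The last identity shows the role of $K$: it is the factor by which the data update the prior odds of $H_1$ against $H_0$.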
19
Nonparametric Testing of Distributions
Two-sample Kolmogorov-Smirnov Test
◦ Do $X^{(0)}$ and $X^{(1)}$ come from the same underlying distribution?
◦ Hypothesis (same distribution) rejected at level p if (formula from Wikipedia, annotated with three parts: empirical, confidence interval factor, sample size correction)
The K-S test is less sensitive when the difference between the curves is greatest at the beginning or the end of the distributions. It works best when the distributions differ at the center.
Good reading: M. Tygert, "Statistical tests for whether a given set of independent, identically distributed draws comes from a specified probability density", PNAS 2010.
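A hedged sketch of the two-sample test, assuming the large-sample rejection rule given on Wikipedia: reject at level alpha if D > c(alpha) * sqrt((n + m) / (n * m)), where D is the largest gap between the two empirical CDFs and c(alpha) = sqrt(-ln(alpha / 2) / 2). The simulated data, seed, and variable names are illustrative.

```r
# Two-sample Kolmogorov-Smirnov statistic by hand, compared with ks.test().
set.seed(42)                    # illustrative seed
x0 <- rnorm(200)                # simulated sample from distribution 0
x1 <- rnorm(300, mean = 0.5)    # simulated sample from a shifted distribution
n <- length(x0)
m <- length(x1)

# Empirical CDFs, evaluated at every observed point
F0 <- ecdf(x0)
F1 <- ecdf(x1)
pooled <- c(x0, x1)
D <- max(abs(F0(pooled) - F1(pooled)))   # empirical statistic

alpha <- 0.05
c_alpha <- sqrt(-0.5 * log(alpha / 2))   # confidence interval factor
threshold <- c_alpha * sqrt((n + m) / (n * m))  # with sample size correction
reject <- D > threshold

ks <- ks.test(x0, x1)
print(c(by_hand = D, ks_test = unname(ks$statistic), threshold = threshold))
```

In practice ks.test() reports a p-value directly; the threshold form is shown only because it mirrors the three annotated parts of the rejection rule.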
21
Chi-Squared Test
Twitter users have a gender and a number of tweets. We want to determine whether gender is related to number of tweets.
Use the chi-square test for independence.
22
When to use Chi-Squared test
When to use the chi-square test for independence:
◦ Uniform sampling design
◦ Categorical features
◦ Population is significantly larger than sample
State the hypotheses:
◦ $H_0$?
◦ $H_1$?
23
Example Chi-Squared Test
men = c(300, 100, 40)
women = c(350, 200, 90)
data = as.data.frame(rbind(men, women))
names(data) = c('low', 'med', 'large')
data
chisq.test(data)
Reject $H_0$ (p<0.05) means …
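To see what chisq.test() computes for this table, a minimal sketch rebuilds the statistic by hand, assuming the standard Pearson formula: expected counts under independence are (row total × column total) / grand total, and the statistic sums (observed − expected)² / expected over all cells. The names observed, expected, and chi_sq are illustrative.

```r
# Pearson chi-square test of independence, computed by hand.
men <- c(300, 100, 40)
women <- c(350, 200, 90)
observed <- rbind(men, women)
colnames(observed) <- c("low", "med", "large")

# Expected counts if gender and tweet volume were independent
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)

chi_sq <- sum((observed - expected)^2 / expected)
df <- (nrow(observed) - 1) * (ncol(observed) - 1)
p_value <- pchisq(chi_sq, df = df, lower.tail = FALSE)

res <- chisq.test(observed)
print(c(by_hand = chi_sq, chisq_test = unname(res$statistic)))
print(c(by_hand = p_value, chisq_test = res$p.value))
```

For this table the p-value falls below 0.05, so the test rejects the hypothesis that the row and column variables are independent.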
25
Revisiting The New York Times Dilemma
Select 50% of users to see headline A
◦ Titanic Sinks
Select 50% of users to see headline B
◦ Ship Sinks Killing Thousands
Assign half the readers to headline A and half to headline B?
◦ Yes?
◦ No?
◦ Which test to use?
What happens if A is MUCH better than B?
26
Sequential Analysis (Sequential Hypothesis Test)
How to stop an experiment early if the hypothesis seems true
◦ Stopping criteria often need to be decided before the experiment starts
◦ If ever needed:
27
But there is a better way…
28
Bandit Algorithms
K distinct hypotheses (so far we had K = 2)
◦ Hypothesis = choosing NYT headline
Each time we pull arm i we get reward $X_i$ (simple version of the problem)
Underlying population (reward distribution) does not change over time
Bandit algorithms attempt to minimize regret
◦ If n = total actions and $n_i$ = total pulls of arm i, regret compares against the arm with the largest true average
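The regret formula itself is missing from this transcript. A hedged reconstruction, assuming the usual definition for stochastic bandits (the symbols $R_n$, $\mu_i$, and $\mu^{*}$ are introduced here, with $\mu_i = E[X_i]$ the true average reward of arm $i$):

```latex
R_n = n\,\mu^{*} - \sum_{i=1}^{K} \mu_i\, E[n_i],
\qquad
\mu^{*} = \max_{1 \le i \le K} \mu_i
```

The first term is what we would earn on average by always pulling the arm with the largest true average; the sum is what the algorithm earns on average given how often it pulls each arm.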
29
Challenge
Note that regret is defined over the true average reward.
How can we estimate the true average reward of $X_i$?
◦ We need to get lots of observations from population i
◦ But what happens if $E[X_i]$ is small?
Core of decision making problems:
◦ Exploration vs. exploitation
◦ When exploring we seek to improve the estimated average reward
◦ When exploiting we try what has worked better in the past
Balancing exploration and exploitation:
◦ Instead of trying the action with the highest estimated average, we try the action with the highest upper bound on its confidence interval (more on this next class)
30
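UCB1 (Upper Confidence Bound 1)
Multi-Armed Bandit (MAB)
◦ Bandit process is a special type of Markov Decision Process
◦ Generally, the reward $X_i(n_i)$ at the $n_i$-th pull of arm i is distributed as $P[X_i(n_i) \mid X_i(n_i - 1)]$
UCB1
◦ Use the arm i that maximizes $\bar{X}_i + \sqrt{2 \ln n / n_i}$, where $\bar{X}_i$ is the average reward observed from arm i, $n_i$ is the number of times arm i has been pulled, and n is the total number of pulls so far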
31
R Example
numT <- 2500   # number of time steps
ttest <- c()   # t-Test p-value after each time step
mean1 <- 0.4   # mean of population 1
mean2 <- 0.7   # mean of population 2
# initialize with one observation from each arm
x1 <- c(rbinom(n = 1, size = 1, prob = mean1))
x2 <- c(rbinom(n = 1, size = 1, prob = mean2))
n1 <- 1
n2 <- 1
for (i in 2:numT) {
  # UCB1 index of arm 1: empirical mean plus exploration bonus
  reward_1 <- mean(x1) + sqrt(2 * log(i) / n1)
  # UCB1 index of arm 2
  reward_2 <- mean(x2) + sqrt(2 * log(i) / n2)
  # pull the arm with the larger index
  if (reward_1 > reward_2) {
    x1 <- c(rbinom(n = 1, size = 1, prob = mean1), x1)
    n1 <- n1 + 1
  } else {
    x2 <- c(rbinom(n = 1, size = 1, prob = mean2), x2)
    n2 <- n2 + 1
  }
  # compute the t-Test p-value of the observations so far;
  # skip it while both samples are constant, since t.test() cannot
  # produce a usable p-value when there is no variance at all
  if ((n1 > 2) && (n2 > 2) && (var(x1) + var(x2) > 0)) {
    d <- t.test(x1, x2)$p.value
  } else {
    d <- 1
  }
  ttest <- c(ttest, d)
}
par(mfrow = c(1, 2))
plot(2:numT, ttest, type = "l", xlab = "Time", ylab = "t-Test p-value", log = "y")
barplot(c(n1, n2), xlab = "Arm Pulls", ylab = "Observations", names.arg = c("n1", "n2"))
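The two-arm script can be generalized to K arms. The following is a sketch under stated assumptions, not the lecture's code: rewards are Bernoulli, the index is the empirical mean plus sqrt(2 * log(t) / n_i), and the reported regret uses the realized pull counts of a single run in place of their expectations. The function name ucb1 and its arguments are hypothetical.

```r
# UCB1 for K Bernoulli arms, returning pull counts and single-run regret.
ucb1 <- function(true_means, num_steps) {
  k <- length(true_means)
  stopifnot(num_steps > k)
  # Pull every arm once so each empirical mean is defined
  pulls <- rep(1, k)
  reward_sums <- rbinom(k, size = 1, prob = true_means)
  for (t in (k + 1):num_steps) {
    # Empirical mean plus exploration bonus for each arm
    index <- reward_sums / pulls + sqrt(2 * log(t) / pulls)
    arm <- which.max(index)
    reward_sums[arm] <- reward_sums[arm] +
      rbinom(1, size = 1, prob = true_means[arm])
    pulls[arm] <- pulls[arm] + 1
  }
  # Regret against always pulling the arm with the largest true average
  regret <- num_steps * max(true_means) - sum(true_means * pulls)
  list(pulls = pulls, regret = regret)
}

set.seed(7)  # illustrative seed
out <- ucb1(c(0.4, 0.7), num_steps = 2500)
print(out$pulls)
print(out$regret)
```

With two arms the regret reduces to the gap between the true means (0.3 here) times the number of pulls spent on the worse arm, which makes the cost of exploration easy to read off.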