Nov 7, 2013 Lirong Xia Hypothesis testing and statistical decision theory.

Slides:



Advertisements
Similar presentations
1/2/2014 (c) 2001, Ron S. Kenett, Ph.D.1 Parametric Statistical Inference Instructor: Ron S. Kenett Course Website:
Advertisements

Advanced Piloting Cruise Plot.
Introductory Mathematics & Statistics for Business
STATISTICS HYPOTHESES TEST (I)
Decision Analysis and Its Applications to Systems Engineering The Hampton Roads Area International Council on Systems Engineering (HRA INCOSE) chapter.
Jeopardy Q 1 Q 6 Q 11 Q 16 Q 21 Q 2 Q 7 Q 12 Q 17 Q 22 Q 3 Q 8 Q 13
Jeopardy Q 1 Q 6 Q 11 Q 16 Q 21 Q 2 Q 7 Q 12 Q 17 Q 22 Q 3 Q 8 Q 13
Title Subtitle.
DIVIDING INTEGERS 1. IF THE SIGNS ARE THE SAME THE ANSWER IS POSITIVE 2. IF THE SIGNS ARE DIFFERENT THE ANSWER IS NEGATIVE.
Addition Facts
Overview of Lecture Parametric vs Non-Parametric Statistical Tests.
C82MST Statistical Methods 2 - Lecture 2 1 Overview of Lecture Variability and Averages The Normal Distribution Comparing Population Variances Experimental.
SADC Course in Statistics Tests for Variances (Session 11)
STATISTICAL INFERENCE ABOUT MEANS AND PROPORTIONS WITH TWO POPULATIONS
Chapter 7 Sampling and Sampling Distributions
Chapter 17/18 Hypothesis Testing
Hypothesis Test II: t tests
Department of Engineering Management, Information and Systems
On Comparing Classifiers : Pitfalls to Avoid and Recommended Approach
Randomized Algorithms Randomized Algorithms CS648 1.
6. Statistical Inference: Example: Anorexia study Weight measured before and after period of treatment y i = weight at end – weight at beginning For n=17.
(This presentation may be used for instructional purposes)
Detection Chia-Hsin Cheng. Wireless Access Tech. Lab. CCU Wireless Access Tech. Lab. 2 Outlines Detection Theory Simple Binary Hypothesis Tests Bayes.
ABC Technology Project
5-1 Chapter 5 Theory & Problems of Probability & Statistics Murray R. Spiegel Sampling Theory.
Chapter 7, Sample Distribution
Hypothesis Tests: Two Independent Samples
Chapter 4 Inference About Process Quality
Squares and Square Root WALK. Solve each problem REVIEW:
Type I & Type II errors Brian Yuen 18 June 2013.
Chapter 5 Test Review Sections 5-1 through 5-4.
GG Consulting, LLC I-SUITE. Source: TEA SHARS Frequently asked questions 2.
Chapter 8: Introduction to Hypothesis Testing. 2 Hypothesis Testing An inferential procedure that uses sample data to evaluate the credibility of a hypothesis.
Addition 1’s to 20.
25 seconds left…...
Putting Statistics to Work
Week 1.
Statistical Inferences Based on Two Samples
Copyright © 2010 Pearson Addison-Wesley. All rights reserved. Chapter 10 One- and Two-Sample Tests of Hypotheses.
Hypothesis Testing Variance known?. Sampling Distribution n Over-the-counter stock selling prices calculate average price of all stocks listed [  ]calculate.
We will resume in: 25 Minutes.
CHAPTER 15: Tests of Significance: The Basics Lecture PowerPoint Slides The Basic Practice of Statistics 6 th Edition Moore / Notz / Fligner.
IP, IST, José Bioucas, Probability The mathematical language to quantify uncertainty  Observation mechanism:  Priors:  Parameters Role in inverse.
Statistically-Based Quality Improvement
Testing Hypotheses About Proportions
Simple Linear Regression Analysis
Oct 9, 2014 Lirong Xia Hypothesis testing and statistical decision theory.
Multiple Regression and Model Building
January Structure of the book Section 1 (Ch 1 – 10) Basic concepts and techniques Section 2 (Ch 11 – 15): Inference for quantitative outcomes Section.
4/4/2015Slide 1 SOLVING THE PROBLEM A one-sample t-test of a population mean requires that the variable be quantitative. A one-sample test of a population.
Classification Classification Examples
Lirong Xia Reinforcement Learning (2) Tue, March 21, 2014.
Chapter 26 Comparing Counts
1 Review Lecture: Guide to the SSSII Assignment Gwilym Pryce 5 th March 2006.
Oct 14, 2014 Lirong Xia Recommender systems acknowledgment: Li Zhang, UCSC.
Statistical Significance What is Statistical Significance? What is Statistical Significance? How Do We Know Whether a Result is Statistically Significant?
HYPOTHESIS TESTING Four Steps Statistical Significance Outcomes Sampling Distributions.
Statistical Significance What is Statistical Significance? How Do We Know Whether a Result is Statistically Significant? How Do We Know Whether a Result.
PY 427 Statistics 1Fall 2006 Kin Ching Kong, Ph.D Lecture 6 Chicago School of Professional Psychology.
Oct 6, 2014 Lirong Xia Judgment aggregation acknowledgment: Ulle Endriss, University of Amsterdam.
Hypothesis Testing.
Jeopardy Hypothesis Testing T-test Basics T for Indep. Samples Z-scores Probability $100 $200$200 $300 $500 $400 $300 $400 $300 $400 $500 $400.
+ Chapter 9 Summary. + Section 9.1 Significance Tests: The Basics After this section, you should be able to… STATE correct hypotheses for a significance.
1 Chapter 10: Introduction to Inference. 2 Inference Inference is the statistical process by which we use information collected from a sample to infer.
Not in FPP Bayesian Statistics. The Frequentist paradigm Defines probability as a long-run frequency independent, identical trials Looks at parameters.
Descriptive and Inferential Statistics Descriptive statistics The science of describing distributions of samples or populations Inferential statistics.
Hypothesis Testing Steps for the Rejection Region Method State H 1 and State H 0 State the Test Statistic and its sampling distribution (normal or t) Determine.
More about tests and intervals CHAPTER 21. Do not state your claim as the null hypothesis, instead make what you’re trying to prove the alternative. The.
Hypothesis testing and statistical decision theory
AP Statistics: Chapter 21
Presentation transcript:

Nov 7, 2013 Lirong Xia Hypothesis testing and statistical decision theory

Hypothesis testing (definitions) Statistical decision theory –a more general framework for statistical inference –try to explain the scene behind tests 1 Schedule

The average GRE quantitative score of –RPI graduate students vs. –national average: 558(139) Method 1: compute the average score of all RPI graduate students and compare to national average End of class 2 An example

Two heuristic algorithms: which one runs faster in general? Method 1: compare them on all instances Method 2: compare them on a few “randomly” generated instances 3 Another example

You have a random variable X –you know the shape of X: normal the standard deviation of X: 1 –you don’t know the mean of X After observing one sample of X (with value x ), what can you say when comparing the mean to 0? –what if you see 10? –what if you see 2? –what if you see 1? 4 Simplified problem: one sample location test

Method 1 –if x >1.645 then say the mean is strictly positive Method 2 –if x < then say the mean is strictly negative Method 3 –if x 1.96 then say the mean is non- zero How should we evaluate these methods? 5 Some quick answers

Given a statistical model –parameter space: Θ –sample space: S –Pr( s | θ ) H 1 : the alternative hypothesis –H 1 ⊆ Θ –the set of parameters you think contain the ground truth H 0 : the null hypothesis –H 0 ⊆ Θ –H 0 ∩H 1 = ∅ –the set of parameters you want to test (and ideally reject) Output of the test –reject the null: suppose the ground truth is in H 0, it is unlikely that we see what we observe in the data –retain the null: we don’t have enough evidence to reject the null 6 The null and alternative hypothesis

Combination 1 (one-sided, right tail) –H 1 : mean>0 –H 0 : mean=0 (why not mean<0?) Combination 2 (one-sided, left tail) –H 1 : mean<0 –H 0 : mean=0 Combination 3 (two-sided) –H 1 : mean≠0 –H 0 : mean=0 A hypothesis test is a mapping f : S{reject, retain} 7 One sample location test

H 1 : mean>0 H 0 : mean=0 Parameterized by a number 0< α <1 –is called the level of significance Let x α be such that Pr(X> x α |H 0 )= α – x α is called the critical value Output reject, if – x>x α, or Pr(X> x |H 0 )< α Pr(X> x |H 0 ) is called the p-value Output retain, if – x≤x α, or p-value≥ α 8 One-sided Z-test 0 xαxα α

Popular values of α : –5%: x α = std (somewhat confident) –1%: x α = 2.33 std (very confident) α is the probability that given mean=0, a randomly generated data will leads to “reject” –Type I error 9 Interpreting level of significance 0 xαxα α

H 1 : mean≠0 H 0 : mean=0 Parameterized by a number 0< α <1 Let x α be such that 2Pr(X> x α |H 0 )= α Output reject, if – x>x α, or x<x α Output retain, if – -x α ≤x≤x α 10 Two-sided Z-test 0 xαxα α -x α

One/two-sided Z test: hypothesis tests for one sample location test (for different H 1 ’s) Outputs either to “reject” or “retain” the null hypothesis And defined a lot of seemingly fancy terms on the way –null/alternative hypothesis –level of significance –critical value –p-value –Type I error 11 What we have learned so far…

What the heck are you doing by using different H 1 ? –the description of the tests does not depend on the selection of H 1 –if we reject H 0 using one-sided test (mean>0), shouldn’t we already be able to say mean≠0? Why need two-sided test? What the heck are you doing by saying “reject” and “retain” –Can’t you just predict whether the ground truth is in H 0 or H 1 ? 12 Questions that haunted me when I first learned these

What is a “correct” answer given by a test? –when the ground truth is in H 0, retain the null (≈saying that the ground truth is in H 0 ) –when the ground truth is in H 1, reject the null (≈saying that the ground truth is in H 1 ) –only consider cases where θ ∈ H 0 ∪ H 1 Two types of errors –Type I: wrongly reject H 0, false alarm –Type II: wrongly retain H 0, fail to raise the alarm –Which is more serious? 13 Evaluation of hypothesis tests

14 Type I and Type II errors Output RetainReject Ground truth in H0H0 size: 1- α Type I: α H1H1 Type II: β power: 1- β Type I: the max error rate for all θ ∈ H 0 α =sup θ ∈ H 0 Pr(false alarm| θ ) Type II: the error rate given θ ∈ H 1 Is it possible to design a test where α = β =0? –usually impossible, needs a tradeoff

One-sided Z-test –we can freely control Type I error –for Type II, fix some θ ∈ H 1 15 Illustration 0 α: Type I error θ β: Type II error Output RetainReject Ground truth in H0H0 size: 1- α Type I: α H1H1 Type II: β power: 1- β

Errors for one-sided Z-test Errors for two-sided Z-test, same α 16 Using two-sided Z-test for one-sided hypothesis 0 θ α: Type I error Type II error α: Type I error Type II error

H 1 : mean>0 H 0 : mean≤0 (vs. mean=0) Sup θ ≤0 Pr(false alarm| θ )=Pr(false alarm| θ=0 ) –Type I error is the same Type II error is also the same for any θ>0 Any better tests? 17 Using one-sided Z-test for a set-valued null hypothesis

A hypothesis test f is uniformly most powerful (UMP), if –for any other test f’ with the same Type I error –for any θ ∈ H 1, we have Type II error of f < Type II error of f’ Corollary of Karlin-Rubin theorem: One- sided Z-test is a UMP for H 0 :≤0 and H 1 :>0 –generally no UMP for two-sided tests 18 Optimal hypothesis tests

Tell you the H 0 and H 1 used in the test –e.g., H0:mean≤0 and H1:mean>0 Tell you the test statistic, which is a function from data to a scalar –e.g., compute the mean of the data For any given α, specify a region of test statistic that will leads to the rejection of H 0 –e.g., 19 Template of other tests 0

Step 1: look for a type of test that fits your problem (from e.g. wiki) Step 2: choose H 0 and H 1 Step 3: choose level of significance α Step 4: run the test 20 How to do test for your problem?

Given –statistical model: Θ, S, Pr( s | θ ) –decision space: D –loss function: L( θ, d ) ∈ℝ We want to make a decision based on observed generated data –decision function f : dataD 21 Statistical decision theory

D={reject, retain} L( θ, reject)= –1, if θ ∈ H 1 –0, if θ ∈ H 0 (type I error) L( θ, retain)= –1, if θ ∈ H 0 –0, if θ ∈ H 1 (type II error) 22 Hypothesis testing as a decision problem

Given data and the decision d –EL B (data, d ) = E θ |data L( θ,d ) Compute a decision that minimized EL for a given the data 23 Bayesian expected loss

Given the ground truth θ and the decision function f –EL F ( θ, f ) = E data| θ L( θ,f ( θ )) Compute a decision function with small EL for all possible ground truth –c.f. uniformly most powerful test: for all θ ∈ H 1, the UMP test always has the lowest expected loss (Type II error) 24 Frequentist expected loss

Problem: make a decision based on randomly generated data Z-test –null/alternative hypothesis –level of significance –reject/retain Statistical decision theory framework –Bayesian expected loss –Frequentist expected loss 25 Recap