Oct 9, 2014 Lirong Xia Hypothesis testing and statistical decision theory.

Slide 1: Schedule
Hypothesis testing
Statistical decision theory
– a more general framework for statistical inference
– tries to explain what happens behind the scenes of tests
Two applications of the minimax theorem
– Yao's minimax principle
– finding a minimax rule in statistical decision theory

Slide 2: An example
The average GRE quantitative score of
– RPI graduate students vs.
– the national average: 558 (std 139)
Randomly sample some GRE Q scores of RPI graduate students and make a decision based on them
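As a sketch of this decision, here is how one might compare a sample against the national figures. Only the mean 558 and std 139 come from the slide; the sample values are invented for illustration.

```python
from statistics import NormalDist, mean

# Hypothetical sample of GRE Q scores (illustrative numbers, not real data)
sample = [640, 700, 590, 720, 660, 610, 680, 650]
n = len(sample)
mu0, sigma = 558, 139          # national mean and standard deviation (from the slide)

# z-statistic: how many standard errors the sample mean lies above mu0
z = (mean(sample) - mu0) / (sigma / n ** 0.5)
p_value = 1 - NormalDist().cdf(z)   # one-sided: Pr(Z > z)
print(f"z = {z:.2f}, one-sided p-value = {p_value:.4f}")
```

A large z (small p-value) is evidence that RPI students score above the national average; the following slides formalize this.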

Slide 3: Simplified problem: one-sample location test
You have a random variable X
– you know the shape of X: normal
– you know the standard deviation of X: 1
– you don't know the mean of X

Slide 4: The null and alternative hypotheses
Given a statistical model
– parameter space: Θ
– sample space: S
– Pr(s|θ)
H1: the alternative hypothesis
– H1 ⊆ Θ
– the set of parameters you think contains the ground truth
H0: the null hypothesis
– H0 ⊆ Θ
– H0 ∩ H1 = ∅
– the set of parameters you want to test (and ideally reject)
Output of the test
– reject the null: assuming the ground truth is in H0, it is unlikely that we would observe the data we did
– retain the null: we don't have enough evidence to reject the null

Slide 5: One-sample location test
Combination 1 (one-sided, right tail)
– H1: mean > 0
– H0: mean = 0 (why not mean < 0?)
Combination 2 (one-sided, left tail)
– H1: mean < 0
– H0: mean = 0
Combination 3 (two-sided)
– H1: mean ≠ 0
– H0: mean = 0
A hypothesis test is a mapping f : S → {reject, retain}

Slide 6: One-sided Z-test
H1: mean > 0; H0: mean = 0
Parameterized by a number 0 < α < 1
– α is called the level of significance
Let x_α be such that Pr(X > x_α | H0) = α
– x_α is called the critical value
Output reject if
– x > x_α, or equivalently Pr(X > x | H0) < α
– Pr(X > x | H0) is called the p-value
Output retain if
– x ≤ x_α, or equivalently p-value ≥ α
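The test above can be sketched with Python's standard library; the function name is illustrative, and the single-observation setup (one draw from N(mean, 1)) follows the slide's simplified problem.

```python
from statistics import NormalDist

def one_sided_z_test(x, alpha=0.05):
    """One-sided (right-tail) Z-test for H0: mean = 0 vs H1: mean > 0,
    for a single observation x from N(mean, 1)."""
    x_alpha = NormalDist().inv_cdf(1 - alpha)  # critical value: Pr(X > x_alpha | H0) = alpha
    p_value = 1 - NormalDist().cdf(x)          # Pr(X > x | H0), the p-value
    decision = "reject" if p_value < alpha else "retain"
    return decision, x_alpha, p_value

print(one_sided_z_test(2.0))   # 2.0 exceeds the critical value, so H0 is rejected
```

Note the two conditions on the slide are equivalent: x > x_α exactly when the p-value is below α, since the normal CDF is monotone.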

Slide 7: Interpreting the level of significance
Popular values of α:
– 5%: x_α ≈ 1.645 std (somewhat confident)
– 1%: x_α ≈ 2.33 std (very confident)
α is the probability that, given mean = 0, randomly generated data lead to "reject"
– the Type I error
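These critical values can be checked numerically against the standard normal CDF:

```python
from statistics import NormalDist

Z = NormalDist()
# Tail probabilities at the slide's critical values
print(1 - Z.cdf(1.645))   # close to 0.05 -> 5% level of significance
print(1 - Z.cdf(2.33))    # close to 0.01 -> 1% level of significance
```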

Slide 8: Two-sided Z-test
H1: mean ≠ 0; H0: mean = 0
Parameterized by a number 0 < α < 1
Let x_α be such that 2 Pr(X > x_α | H0) = α
Output reject if
– x > x_α or x < -x_α
Output retain if
– -x_α ≤ x ≤ x_α
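A minimal sketch of the two-sided rule, again for a single observation from N(mean, 1) and with an illustrative function name:

```python
from statistics import NormalDist

def two_sided_z_test(x, alpha=0.05):
    """Two-sided Z-test for H0: mean = 0 vs H1: mean != 0 (X ~ N(mean, 1))."""
    # The alpha probability mass is split across both tails
    x_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # 2 * Pr(X > x_alpha | H0) = alpha
    return "reject" if abs(x) > x_alpha else "retain"

print(two_sided_z_test(1.8))   # |1.8| is below the two-sided critical value -> retain
print(two_sided_z_test(-2.5))  # |-2.5| exceeds it -> reject
```

Note 1.8 would be rejected by the one-sided test at the same α but is retained here, because the two-sided critical value is larger.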

Slide 9: Evaluation of hypothesis tests
What is a "correct" answer given by a test?
– when the ground truth is in H0, retain the null (≈ saying that the ground truth is in H0)
– when the ground truth is in H1, reject the null (≈ saying that the ground truth is in H1)
– only consider cases where θ ∈ H0 ∪ H1
Two types of errors
– Type I: wrongly reject H0 (false alarm)
– Type II: wrongly retain H0 (fail to raise the alarm)
– Which is more serious?

Slide 10: Type I and Type II errors

Ground truth in | Output: Retain | Output: Reject
H0              | size: 1-α      | Type I: α
H1              | Type II: β     | power: 1-β

Type I: the maximum error rate over all θ ∈ H0
– α = sup_{θ∈H0} Pr(false alarm | θ)
Type II: the error rate given θ ∈ H1
Is it possible to design a test where α = β = 0?
– usually impossible; a tradeoff is needed
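A quick simulation can estimate both error rates for the one-sided Z-test. The choice θ = 1 for the point in H1 is arbitrary, picked only for illustration.

```python
import random
from statistics import NormalDist

random.seed(0)
alpha = 0.05
x_alpha = NormalDist().inv_cdf(1 - alpha)   # critical value of the one-sided Z-test

def rejection_rate(true_mean, trials=100_000):
    """Fraction of samples X ~ N(true_mean, 1) landing in the rejection region."""
    return sum(random.gauss(true_mean, 1) > x_alpha for _ in range(trials)) / trials

type1 = rejection_rate(0.0)        # reject rate under H0 (mean = 0): should be ~alpha
type2 = 1 - rejection_rate(1.0)    # retain rate at a fixed theta in H1 (mean = 1)
print(f"Type I ~ {type1:.3f}, Type II at mean=1 ~ {type2:.3f}")
```

The simulation illustrates the tradeoff: shrinking α pushes x_α to the right, which raises β at every fixed θ in H1.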

Slide 11: Illustration
One-sided Z-test
– we can freely control the Type I error α
– for Type II, fix some θ ∈ H1 and measure β
(figure: the threshold x_α cuts area α from the right tail of the H0 density and area β from the left tail of the density centered at θ; a second test is plotted in comparison with the one-sided Z-test)

Slide 12: Using a two-sided Z-test for a one-sided hypothesis
(figure: Type I and Type II errors of the one-sided Z-test vs. the two-sided Z-test at the same α; the two-sided test has a larger Type II error at every θ > 0)

Slide 13: Using a one-sided Z-test for a set-valued null hypothesis
H0: mean ≤ 0 (vs. mean = 0); H1: mean > 0
sup_{θ≤0} Pr(false alarm | θ) = Pr(false alarm | θ = 0)
– the Type I error is the same
The Type II error is also the same for any θ > 0
Any better tests?

Slide 14: Optimal hypothesis tests
A hypothesis test f is uniformly most powerful (UMP) if
– for any other test f' with the same Type I error
– for any θ ∈ H1, the Type II error of f ≤ the Type II error of f'
Corollary of the Karlin-Rubin theorem: the one-sided Z-test is UMP for H0: mean ≤ 0 and H1: mean > 0
– there is generally no UMP test for two-sided hypotheses

Slide 15: Template of other tests
Tell you the H0 and H1 used in the test
– e.g., H0: mean ≤ 0 and H1: mean > 0
Tell you the test statistic, which is a function from data to a scalar
– e.g., compute the mean of the data
For any given α, specify the region of the test statistic that leads to the rejection of H0
– e.g., reject when the statistic exceeds a critical value x_α
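The template can be sketched as a higher-order function: a test is a statistic plus a rejection region for it. The names `run_test`, `z_stat`, and `region` are hypothetical; the instantiation is the one-sided Z-test from the earlier slides, applied to the sample mean with known σ = 1.

```python
from statistics import NormalDist, mean

def run_test(data, statistic, rejection_region):
    """Generic template: compute the statistic, check the rejection region."""
    t = statistic(data)
    return ("reject" if rejection_region(t) else "retain"), t

# Instantiation: one-sided Z-test on the sample mean (known sigma = 1)
alpha = 0.05

def z_stat(data):
    return mean(data) * len(data) ** 0.5   # (mean - 0) / (1 / sqrt(n))

region = lambda t: t > NormalDist().inv_cdf(1 - alpha)
print(run_test([0.8, 1.2, 0.5, 1.0], z_stat, region))
```

Swapping in a different statistic and region yields a different named test without changing the surrounding machinery.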

Slide 16: How to run a test for your problem
Step 1: look for a type of test that fits your problem (from e.g. Wikipedia)
Step 2: choose H0 and H1
Step 3: choose the level of significance α
Step 4: run the test

Slide 17: Statistical decision theory
Given
– a statistical model: Θ, S, Pr(s|θ)
– a decision space: D
– a loss function: L(θ, d) ∈ ℝ
We want to make a decision based on the observed data
– a decision function f : data → D

Slide 18: Hypothesis testing as a decision problem
D = {reject, retain}
L(θ, reject) =
– 0, if θ ∈ H1
– 1, if θ ∈ H0 (Type I error)
L(θ, retain) =
– 0, if θ ∈ H0
– 1, if θ ∈ H1 (Type II error)

Slide 19: Bayesian expected loss
Given the data and a decision d
– EL_B(data, d) = E_{θ|data} L(θ, d)
Compute a decision that minimizes EL_B given the data
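A minimal discrete sketch, assuming the posterior over θ has already been computed and Θ is collapsed to two points standing in for H0 and H1 (all assumptions for illustration):

```python
# 0-1 loss for the testing problem, as on the previous slide: loss[(theta, decision)]
loss = {
    (0, "reject"): 1, (0, "retain"): 0,   # theta in H0: rejecting is a Type I error
    (1, "reject"): 0, (1, "retain"): 1,   # theta in H1: retaining is a Type II error
}

def bayes_decision(posterior):
    """Pick the decision d minimizing EL_B = E_{theta|data} L(theta, d)."""
    def expected_loss(d):
        return sum(p * loss[(theta, d)] for theta, p in posterior.items())
    return min(["reject", "retain"], key=expected_loss)

# If the posterior puts 80% on H1, rejecting has the smaller expected loss
print(bayes_decision({0: 0.2, 1: 0.8}))
```

With 0-1 loss this reduces to picking the hypothesis with higher posterior probability.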

Slide 20: Frequentist expected loss
Given the ground truth θ and a decision function f
– EL_F(θ, f) = E_{data|θ} L(θ, f(data))
Compute a decision function with small EL_F for all possible ground truths
– c.f. the uniformly most powerful test: for all θ ∈ H1, the UMP test always has the lowest expected loss (Type II error)
A minimax decision rule is f = argmin_f max_θ EL_F(θ, f)
– the most robust choice against the unknown parameter
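A small sketch of a minimax search, under assumptions not in the slide: Θ = {0, 1}, one observation X ~ N(θ, 1), 0-1 loss, and rules restricted to thresholds f_t that reject θ = 0 when X > t. The frequentist losses then have closed forms.

```python
from statistics import NormalDist

Phi = NormalDist().cdf

def frequentist_loss(theta, t):
    """EL_F(theta, f_t) = E_{data|theta} L(theta, f_t(data)) in closed form:
    Type I error 1 - Phi(t) at theta = 0, Type II error Phi(t - 1) at theta = 1."""
    return 1 - Phi(t) if theta == 0 else Phi(t - 1)

# Minimax over a grid of thresholds: minimize the worse of the two losses
ts = [i / 100 for i in range(-100, 201)]
t_star = min(ts, key=lambda t: max(frequentist_loss(0, t), frequentist_loss(1, t)))
print(t_star)   # the minimizing threshold sits midway between the two means
```

The minimax threshold turns out to equalize the two losses, which previews the equal-loss condition in the theorem on slide 25.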

Slide 21: Two interesting applications of game theory

Slide 22: The minimax theorem
For any simultaneous-move two-player zero-sum game
– the value of a player's mixed strategy s is her worst-case utility against the other player
– Value(s) = min_{s'} U(s, s')
– s1 is a mixed strategy for player 1 with maximum value
– s2 is a mixed strategy for player 2 with maximum value
Theorem [von Neumann]: Value(s1) = -Value(s2)
– (s1, s2) is a Nash equilibrium
– for any s1' and s2', Value(s1') ≤ Value(s1) = -Value(s2) ≤ -Value(s2')
– to prove that s1* is minimax, it suffices to find s2* with Value(s1*) = -Value(s2*)
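The theorem can be checked on a toy game. Matching pennies is used here as an assumed example, and a grid search over mixed strategies stands in for exact optimization.

```python
# Matching pennies: payoffs to player 1 (zero-sum, so player 2 gets -U)
U = [[1, -1],
     [-1, 1]]

def value1(p):
    """Worst-case payoff of player 1's mixed strategy (p, 1-p).
    The opponent's best response can be taken over pure strategies."""
    return min(p * U[0][j] + (1 - p) * U[1][j] for j in range(2))

def value2(q):
    """Worst-case payoff of player 2's mixed strategy (q, 1-q) over columns."""
    return min(-(q * U[i][0] + (1 - q) * U[i][1]) for i in range(2))

grid = [i / 100 for i in range(101)]
v1 = max(value1(p) for p in grid)   # player 1's best worst-case value
v2 = max(value2(q) for q in grid)
print(v1, -v2)   # equal, as the minimax theorem promises
```

Both maxima are attained at the uniform strategy (0.5, 0.5), and Value(s1) = -Value(s2) = 0.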

Slide 23: Application 1: Yao's minimax principle
Question: how do we prove that a randomized algorithm A is (asymptotically) fastest?
– Step 1: analyze the running time of A
– Step 2: show that any other randomized algorithm runs slower on some input
– but how do we choose such a worst-case input for all other algorithms?
Theorem [Yao 77]: for any randomized algorithm A
– the worst-case expected running time of A is at least
– the expected running time of the fastest deterministic algorithm against any fixed distribution over inputs
Example: you designed an O(n²) randomized algorithm; to prove that no randomized algorithm is asymptotically faster, you can
– find a distribution π over all inputs (of size n)
– show that the expected running time of any deterministic algorithm on π is Ω(n²)
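The inequality can be illustrated on a toy runtime matrix (invented numbers): rows are deterministic algorithms, columns are inputs, and a randomized algorithm is a distribution over the rows.

```python
# Toy runtime matrix: T[a][x] = running time of deterministic algorithm a on input x
T = [[1, 9],
     [8, 2],
     [5, 5]]

def worst_case_expected(randomized):
    """Worst input for a randomized algorithm (a distribution over rows of T)."""
    return max(sum(p * T[a][x] for a, p in enumerate(randomized)) for x in range(2))

def best_det_against(dist):
    """Expected runtime of the fastest deterministic algorithm against an
    input distribution (a distribution over columns of T)."""
    return min(sum(q * T[a][x] for x, q in enumerate(dist)) for a in range(3))

# Yao: every randomized algorithm's worst case is at least the deterministic
# optimum against any fixed input distribution
lower = best_det_against([0.5, 0.5])
for p0 in range(0, 11):
    for p1 in range(0, 11 - p0):
        p = [p0 / 10, p1 / 10, (10 - p0 - p1) / 10]
        assert worst_case_expected(p) >= lower - 1e-9
print("Yao's bound holds on this toy matrix; lower bound =", lower)
```

Choosing the input distribution well (here, uniform over the two inputs) yields the strongest lower bound.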

Slide 24: Proof
Two players: you, Nature
Pure strategies
– you: deterministic algorithms
– Nature: inputs
Payoffs
– you: negative expected running time
– Nature: expected running time
For any randomized algorithm A
– its largest expected running time on some input
– is at least the expected running time of your best (mixed) strategy
– = the expected running time against Nature's best (mixed) strategy
– which is at least the smallest expected running time of any deterministic algorithm on some distribution over inputs

Slide 25: Application 2: finding a minimax rule
Guess a least favorable distribution π over the parameters
– let f_π denote its Bayesian decision rule
– Proposition: f_π minimizes the expected loss among all rules, i.e., f_π = argmin_f E_{θ∼π} EL_F(θ, f)
Theorem: if EL_F(θ, f_π) is the same for all θ, then f_π is minimax

Slide 26: Proof
Two players: you, Nature
Pure strategies
– you: deterministic decision rules
– Nature: the parameter θ
Payoffs
– you: negative frequentist loss; you want to minimize the maximum frequentist loss
– Nature: frequentist loss EL_F(θ, f) = E_{data|θ} L(θ, f(data)); Nature wants to maximize the minimum frequentist loss
Need to prove that f_π is minimax
– it suffices to show that there exists a mixed strategy π* for Nature (a distribution over Θ)
– such that for every rule f and every parameter θ, EL_F(π*, f) ≥ EL_F(θ, f_π)
– the inequality holds for π* = π. QED

Slide 27: Recap
Problem: make a decision based on randomly generated data
Z-test
– null/alternative hypothesis
– level of significance
– reject/retain
Statistical decision theory framework
– Bayesian expected loss
– frequentist expected loss
Two applications of the minimax theorem