1 Interpretation of the Test Statistic or: basic Hypothesis Testing, with applications, in 15 minutes Patrick Nolan Stanford University GLAST LAT DC2 Kickoff.

Presentation transcript:

1 Interpretation of the Test Statistic, or: Basic Hypothesis Testing, with Applications, in 15 Minutes. Patrick Nolan, Stanford University. GLAST LAT DC2 Kickoff, 2 March 2006.

2 The Likelihood Ratio Likelihood is defined to be the probability of observing the data, assuming that our model is correct: L(θ) ≡ P(x|θ). Here x is the observed data and θ is the parameter(s) of the model. Likelihood is a function of the model parameters (aka the “hypothesis”). Suppose there are two models with parameter(s) θ₀ and θ₁. Typically θ₀ represents a “null” hypothesis (for instance, no point source is present) while θ₁ represents an “alternate” hypothesis (for instance, there is a point source). The likelihood ratio is Λ ≡ L(θ₀)/L(θ₁). If Λ is small, then the alternate hypothesis explains the data better than the null hypothesis. This needs to be made quantitative.
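
As a rough illustration of the ratio defined on this slide, the sketch below evaluates ln Λ for a single-bin photon-counting experiment. The Poisson model, the function names, and the numbers (130 observed counts, 100 expected background counts, 30 expected source counts) are all assumptions made for the example; they are not taken from the talk or from any GLAST or EGRET software.

```python
import math


def poisson_log_likelihood(n_obs, mu):
    """Return ln P(n_obs | mu) for a Poisson distribution with mean mu > 0."""
    return n_obs * math.log(mu) - mu - math.lgamma(n_obs + 1)


def log_likelihood_ratio(n_obs, background, source):
    """Return ln(Lambda) = ln L(theta_0) - ln L(theta_1).

    theta_0 (null): only background is present.
    theta_1 (alternate): background plus a point source of fixed strength.
    """
    ln_l0 = poisson_log_likelihood(n_obs, background)
    ln_l1 = poisson_log_likelihood(n_obs, background + source)
    return ln_l0 - ln_l1


# Hypothetical numbers, chosen only for illustration.
ln_lambda = log_likelihood_ratio(n_obs=130, background=100.0, source=30.0)
print(f"ln(Lambda) = {ln_lambda:.3f}")
```

A negative ln Λ means Λ < 1, so in this made-up case the alternate hypothesis describes the data better than the null. Working with logarithms avoids underflow when the likelihoods themselves are tiny.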

3 The Power of a Statistical Test In hypothesis testing, we decide whether we think θ₀ or θ₁ is the best explanation for the data. There are two ways we could go wrong:
Type 1 error: Null is true, but we choose alternate (spurious detection). Prob = α
Type 2 error: Alternate is true, but we choose null (missed detection). Prob = β
This is the notation used in every textbook. We would like to have both α and β be small, but there are tradeoffs. The usual procedure is to design a statistical test so that α is fixed at some value, called the size or significance level of the test. For a single test, a number like 5% might be OK. When looking for point sources in many places, a smaller α is needed because there are many opportunities for a Type 1 error. Once α is fixed, 1 − β is called the power of the test. Large power means that real effects are unlikely to be missed. The likelihood ratio is useful in this context because of the Neyman-Pearson lemma, which says that the likelihood ratio is the “best” way to choose between hypotheses. If we choose the alternative hypothesis over the null when Λ < k, where P(Λ < k | θ₀) = α, then the results will be unbiased and the test is the most powerful available.
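
The tradeoff between the size and the power of a test can be made concrete with a small sketch. It assumes a single-bin Poisson counting experiment with a known background and a fixed source strength, for which the condition Λ < k is equivalent to observing at least some critical number of counts. The function names and the numbers (background 10, source 10, α = 0.05) are invented for the example.

```python
import math


def poisson_pmf(k, mu):
    """P(N = k) for N ~ Poisson(mu), computed in log space for stability."""
    return math.exp(k * math.log(mu) - mu - math.lgamma(k + 1))


def poisson_tail(k_min, mu):
    """P(N >= k_min) for N ~ Poisson(mu)."""
    return 1.0 - sum(poisson_pmf(k, mu) for k in range(k_min))


def critical_count(background, alpha):
    """Smallest count threshold whose false-alarm probability is <= alpha."""
    k = 0
    while poisson_tail(k, background) > alpha:
        k += 1
    return k


# Hypothetical inputs, chosen only for illustration.
background = 10.0   # expected counts under the null hypothesis
source = 10.0       # extra expected counts under the alternate hypothesis
alpha = 0.05        # requested size of the test

n_crit = critical_count(background, alpha)
size = poisson_tail(n_crit, background)            # actual Type 1 error rate
power = poisson_tail(n_crit, background + source)  # 1 - beta
beta = 1.0 - power                                 # Type 2 error rate

print(f"claim a detection when N >= {n_crit}")
print(f"size = {size:.4f}, power = {power:.4f}, beta = {beta:.4f}")
```

Because counts are discrete, the achieved size is generally a little below the requested α. Lowering α, as one would when searching many sky positions, raises the threshold and therefore lowers the power.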

4 Making it Quantitative Usually we deal with composite hypotheses. That is, θ isn’t a single point in parameter space, but we allow a range of values. Then we compare the best representatives of the two hypotheses: choose θ₀ to maximize L(θ₀) and θ₁ to maximize L(θ₁) in the allowed regions of parameter space. In order to use the likelihood ratio test (LRT) we need to be able to solve the equation P(Λ < k | θ₀) = α for k. In general the distribution of Λ is unknown, but Wilks’s Theorem gives a useful asymptotic expression. The alternate model must “include” the null. That is, the set of null parameters {θ₀} must be a subset of {θ₁}. For instance, θ₀ describes N point sources, while θ₁ has N+1 sources. When there are many events, −2 ln(Λ) ~ χ²ᵣ, the χ² distribution with r degrees of freedom. This is what we call TS (“test statistic”). Here r is the difference in the number of parameters in the null and alternate sets. This is the basis for the popular χ² and F tests. If r = 1, then √TS follows the unit normal distribution! Thus a 3-sigma result requires TS = 9, that is, ln(Λ) = −4.5. Why doesn’t this work? See next page.
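
The conversion from a TS value to a tail probability under the Wilks approximation can be sketched as follows for r = 1, where the χ² tail has a closed form in terms of the complementary error function. The function names are hypothetical, and ln Λ = −4.5 is an illustrative input that corresponds to TS = 9. The sketch takes the asymptotic distribution at face value, which the next slides argue is not safe for source detection.

```python
import math


def ts_from_log_lambda(ln_lambda):
    """TS = -2 ln(Lambda)."""
    return -2.0 * ln_lambda


def chi2_1dof_tail(ts):
    """P(chi^2 with 1 degree of freedom > ts) = erfc(sqrt(ts / 2))."""
    return math.erfc(math.sqrt(ts / 2.0))


ts = ts_from_log_lambda(-4.5)   # illustrative input: TS = 9
p_value = chi2_1dof_tail(ts)    # chance of TS this large under the null
n_sigma = math.sqrt(ts)         # equivalent Gaussian significance for r = 1

print(f"TS = {ts:.1f}, p = {p_value:.5f}, significance = {n_sigma:.1f} sigma")
```

For r = 1 the χ² tail probability equals the two-sided Gaussian tail probability at √TS, which is why TS = 9 is quoted as a 3-sigma result.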

5 Conditions, caveats & quibbles How many photons do we need to use the asymptotic distribution? I’m not sure. The faintest EGRET detections on a strong background always had at least ~50. That’s certainly enough. Can GLAST detect a source with fewer? More seriously, Wilks’s Theorem doesn’t work for our most common situation. It is valid only under “regularity” conditions on the likelihood function and the area of parameter space we study. Example: We want to know if there is a point source at a certain position. The brightness of the source will be the only adjustable parameter in the alternate model. Of course the brightness must be ≥ 0. When the brightness → 0, the alternate and null models are indistinguishable. This violates one of the regularity conditions, because the null value (brightness = 0) lies on the boundary of the allowed parameter range. What are the consequences?

6 EGRET pathology: not so bad Extensive simulations were done using the EGRET likelihood program and a realistic background model, with no point sources. The histogram of test statistic values doesn’t follow the χ²₁ distribution. It’s low by a factor of 2. This discrepancy isn’t surprising. Half of the simulations would produce a better fit with negative source brightness. This isn’t allowed, so Λ = 1 (TS = 0) in all these cases. There should be a δ-function at 0 in the graph. Statisticians call the resulting distribution ½δ + ½χ²₁.
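
The factor of 2 can be reproduced with a toy Monte Carlo. This is not the EGRET simulation: it assumes a single Gaussian measurement with unit variance whose mean (the source brightness) is constrained to be ≥ 0, so that TS = max(z, 0)² for a null-hypothesis fluctuation z. The function name, the seed, and the threshold of 4 are arbitrary choices for the example.

```python
import math
import random


def simulate_null_ts(n_trials, seed=0):
    """Simulate TS under the null for a brightness constrained to be >= 0.

    Toy model (an assumption, not the EGRET likelihood): the unconstrained
    best-fit brightness is a unit normal deviate z. Negative values are
    not allowed, so the fit is clipped at 0 and TS = max(z, 0)**2.
    """
    rng = random.Random(seed)
    ts_values = []
    for _ in range(n_trials):
        z = rng.gauss(0.0, 1.0)
        best_fit = max(z, 0.0)
        ts_values.append(best_fit ** 2)
    return ts_values


ts_values = simulate_null_ts(100_000)
n = len(ts_values)

frac_zero = sum(1 for ts in ts_values if ts == 0.0) / n
frac_above = sum(1 for ts in ts_values if ts > 4.0) / n
chi2_tail = math.erfc(math.sqrt(4.0 / 2.0))   # P(chi^2_1 > 4)

print(f"fraction with TS = 0: {frac_zero:.3f}")
print(f"P(TS > 4) simulated:  {frac_above:.4f}")
print(f"half of chi^2_1 tail: {0.5 * chi2_tail:.4f}")
```

In this toy model about half the trials pile up at TS = 0 and the tail above any positive threshold is half the χ²₁ tail, which is the behavior the slide describes.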

7 GLAST pathology: ??? We are in the early stages of similar simulations for GLAST. The results are harder to understand. In this example, about ¾ of the cases result in TS = 0, rather than the expected half. About half of the positive TS values are < 0.1. The distribution cuts off at large TS more sharply than a χ² should. If this type of behavior persists, the interpretation of TS values will be more difficult. We will need to use simulations to produce probability tables.
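
One simple way to turn null-hypothesis simulations into a probability table is to count how often the simulated TS reaches a given value. The sketch below shows that idea under stated assumptions: the function name is hypothetical, and the list of simulated TS values is invented, with three quarters of the entries set to zero only to echo the behavior described on this slide. It is not output from any GLAST simulation.

```python
from bisect import bisect_left


def empirical_tail_probability(ts_observed, simulated_ts):
    """Fraction of null simulations with TS >= ts_observed."""
    ordered = sorted(simulated_ts)
    n_at_or_above = len(ordered) - bisect_left(ordered, ts_observed)
    return n_at_or_above / len(ordered)


# Invented stand-in for simulation output (100 null trials).
simulated = [0.0] * 75 + [0.05] * 12 + [1.0] * 8 + [4.0] * 4 + [9.0] * 1

# A small probability table: TS threshold -> chance probability.
for threshold in (0.05, 1.0, 4.0, 9.0):
    prob = empirical_tail_probability(threshold, simulated)
    print(f"P(TS >= {threshold}) = {prob:.2f}")
```

The resolution of such a table is limited by the number of simulations: with N trials, probabilities much below 1/N cannot be estimated, so a claim at the 3-sigma level or beyond needs many thousands of null simulations.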

8 Final Words This is by no means everything we need to know about statistics. I have said nothing about parameter estimation, upper limits, or comparing models which are not “nested”. Finding an efficient method to optimize the parameter values is a major effort. The problem of multiple point sources is an example of a “mixture model”. How do we decide when to stop adding more sources? That’s cutting-edge research in statistics. I have also skipped over the Bayesian method for dealing with the hypothesis testing problem. That could be a whole other talk. Some of us have been talking with Ramani Pilla of the Statistics dept. at Case Western. She has a novel method which avoids the use of Wilks’s Theorem. The computation of probabilities is quite involved, but it should be tractable for comparisons with only one additional parameter.

9 References The ultimate reference for all things statistical is Kendall & Stuart, “The Advanced Theory of Statistics”. I have consulted the 1979 edition, Volume 2. It is very dense and mathematical. More accessible versions can be found in Barlow’s “Statistics” and Cowan’s “Statistical Data Analysis”, both written for physicists. These books are a bit expensive, but I like them. They consider both Bayesian and frequentist methods. A cheaper alternative is Wall & Jenkins, “Practical Statistics for Astronomers”. It tends to skimp on the theory, but it could be useful. The downfall of the LRT was pointed out clearly by Protassov et al. 2002, ApJ 571, 545. Pilla’s method is described in Pilla et al. 2005, PRL 95. The EGRET likelihood method is explained by Mattox et al. 1996, ApJ 461, 396.