Lecture 3 - Hypothesis Testing and Inference with Likelihood


Lecture 3 - Hypothesis Testing and Statistical Inference using Likelihood: The Central Role of Models (C. D. Canham)

Outline...
- Statistical inference: it's what we use statistics for, but there are some surprisingly tricky philosophical difficulties that have plagued statisticians for over a century
- The "frequentist" vs. "likelihoodist" solutions
- Hypothesis testing as a process of comparing alternate models
- Examples: ANOVA and ANCOVA
- The issue of parsimony

Inference defined...
"a : the act of passing from one proposition, statement, or judgment considered as true to another whose truth is believed to follow from that of the former; b : the act of passing from statistical sample data to generalizations (as of the value of population parameters) usually with calculated degrees of certainty" (Merriam-Webster Online Dictionary)
Sampling theory is usually what comes to mind first when we think about the issues of generalization...

Statistical Inference...
...typically concerns inferring properties of an unknown distribution from data generated by that distribution. Components:
- Point estimation
- Hypothesis testing
- Model comparison

Probability and Inference
- How do you choose the "correct inference" from your data, given inevitable uncertainty and error?
- Can you assign a probability to your certainty in the correctness of a given inference? (Hint: if this is really important to you, then you should consider becoming a Bayesian, as long as you can accept what I consider to be some fairly objectionable baggage...)
- How do you choose between alternate hypotheses?
- Can you assess the strength of your evidence for alternate hypotheses?

The crux of the problem...
"Thus, our general problem is to assess the relative merits of rival hypotheses in the light of observational or experimental data that bear upon them..." (Edwards, pg. 1)
Edwards, A. W. F. 1992. Likelihood. Expanded Edition. Johns Hopkins University Press.

Assigning Probabilities to Hypotheses
Unfortunately, hypotheses (or even different parameter estimates) cannot generally be treated as "data" (outcomes of trials). Statisticians have debated alternate solutions to this problem for centuries, with no generally agreed-upon solution.

One Way Out: Classical "Frequentist" Statistics and Tests of Null Hypotheses
- Probability is defined in terms of the outcome of a series of repeated trials.
- Hypothesis testing proceeds via the "significance" of pre-defined test statistics: what is the probability of observing a particular value of a predefined test statistic, given an assumed hypothesis about the underlying scientific model, and assumptions about the probability model of the test statistic?
- Hypotheses are never "accepted"; they are "rejected" (categorically) if the probability of obtaining the observed value of the test statistic is very small (the "p-value").
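The frequentist recipe above can be sketched numerically. This is a minimal, dependency-free illustration (the data, sample size, and null value are all invented for the example), and it uses a normal approximation in place of the exact t-distribution tail probability:

```python
import math
import random

random.seed(1)

# Invented sample: is the population mean equal to 0?
sample = [random.gauss(0.5, 1.0) for _ in range(20)]
n = len(sample)
mean = sum(sample) / n
sd = math.sqrt(sum((x - mean) ** 2 for x in sample) / (n - 1))
se = sd / math.sqrt(n)

# One-sample t statistic against the null hypothesis mu = 0
t = (mean - 0.0) / se

# Two-sided p-value; a normal approximation to the t tail keeps
# this sketch dependency-free (scipy.stats.t would give the exact value)
p_approx = 2 * (1 - 0.5 * (1 + math.erf(abs(t) / math.sqrt(2))))
print(f"t = {t:.2f}, approximate two-sided p = {p_approx:.4f}")
```

Note that the procedure never weighs the null against any specific alternative; it only asks how surprising the test statistic would be if the null were true.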

An Implicit Assumption
The data are an approximate "sample" of an underlying "true" reality; i.e., there is a true population mean, and the sample provides an estimate of it...

Limitations of Frequentist Statistics
- They do not provide a means of measuring the relative strength of observational support for alternate hypotheses; they merely help decide when to "reject" individual hypotheses in comparison to a single "null" hypothesis. So you conclude the slope of the line is not 0: how strong is your evidence that the slope is really 0.45 vs. 0.50?
- They are extremely non-intuitive: just what is a "confidence interval" anyway? Compare the practice of using a t-statistic to compare a sample mean to some hypothesized value with turning the question around and asking what the likelihood is of particular parameter values, given your dataset.
- "Significance testing": how can you really quantify Type I and Type II errors if there isn't an underlying "true" result?

Confidence Intervals
A typical definition: "...If a series of samples are drawn and the mean of each calculated, 95% of the means would be expected to fall within the range of two* standard errors above and two below the mean of these means..." (*actually, 1.96)
Source: http://bmj.bmjjournals.com/collections/statsbk/4.shtml
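The repeated-sampling definition above is easy to check by simulation. A minimal sketch (the population parameters, sample size, and trial count are arbitrary choices for illustration):

```python
import math
import random

random.seed(42)

MU, SIGMA, N = 10.0, 2.0, 25   # invented "true" population and sample size
TRIALS = 2000

covered = 0
for _ in range(TRIALS):
    sample = [random.gauss(MU, SIGMA) for _ in range(N)]
    sample_mean = sum(sample) / N
    se = SIGMA / math.sqrt(N)  # sigma treated as known, so the z interval applies
    # Does the 95% interval around this sample's mean capture the true mean?
    if sample_mean - 1.96 * se <= MU <= sample_mean + 1.96 * se:
        covered += 1

coverage = covered / TRIALS
print(f"Empirical coverage of the 95% interval: {coverage:.3f}")
```

Roughly 95% of the intervals capture the true mean, which is exactly the repeated-sampling claim in the definition; note that this is a statement about the procedure, not a probability statement about any one interval.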

The "null hypothesis" approach
When and where is "strong inference" really useful? When is it just an impediment to progress?
Platt, J. R. 1964. Strong inference. Science 146:347-353.
Stephens et al. 2005. Information theory and hypothesis testing: a call for pluralism. Journal of Applied Ecology 42:4-12.

Chamberlin's alternative: multiple working hypotheses
Science rarely progresses through a series of dichotomously branched decisions. Instead, we are constantly trying to choose among a large set of alternate hypotheses. The concept is very old, but the computational power needed to adopt this approach has only recently become available.
Chamberlin, T. C. 1890. The method of multiple working hypotheses. Science 15:92.

Hypothesis testing and "significance"
Nester's (1996) Creed:
- TREATMENTS: all treatments differ
- FACTORS: all factors interact
- CORRELATIONS: all variables are correlated
- POPULATIONS: no two populations are identical in any respect
- NORMALITY: no data are normally distributed
- VARIANCES: variances are never equal
- MODELS: all models are wrong
- EQUALITY: no two numbers are the same
- SIZE: many numbers are very small
If you accept even some of this creed, then the rational basis for most hypothesis testing and tests of "statistical significance" disappears. Unfortunately, Nester wasn't a likelihoodist, and didn't point out the obvious way forward...
Nester, M. R. 1996. An applied statistician's creed. Applied Statistics 45:401-410.

Hypothesis testing vs. estimation
"The problem of estimation is of more central importance (than hypothesis testing)... for in almost all situations we know that the effect whose significance we are measuring is perfectly real, however small; what is at issue is its magnitude." (Edwards, 1992, pg. 2)
Just how useful is it to conclude that the slope of the regression is not equal to 1, compared to assessing the strength of evidence for a particular, estimated value of the slope?
"An insignificant result, far from telling us that the effect is non-existent, merely warns us that the sample was not large enough to reveal it." (Edwards, 1992, pg. 2)

Hypothesis testing and probability: the likelihood compromise
Probability (of the data) cannot generally be used directly to test alternate hypotheses (about parameters)...

The "Likelihood Principle"
L(θ | x) ∝ P(x | θ)
In plain English: the likelihood (L) of the set of parameters (θ) in the scientific model, given an observation (x), is proportional to the probability of observing the data, given the parameters. This probability is something we can calculate, using the appropriate underlying probability model (i.e., a PDF).
Emphasize the difference between likelihood (of the model, given the data) and probability (of the data, given the model).
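The principle can be made concrete with a toy calculation: evaluate the same normal PDF, but read it as a function of the parameter rather than of the data. The dataset, the fixed sigma, and the grid of candidate means are all invented for illustration:

```python
import math

def normal_log_pdf(x, mu, sigma):
    # Log of the normal probability density of x, given mu and sigma
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)

data = [4.1, 5.3, 4.8, 5.0, 4.6]  # invented observations

def log_likelihood(mu, sigma=0.5):
    # L(theta | data) is proportional to P(data | theta): the same
    # quantity, read as a function of the parameter mu with the data fixed
    return sum(normal_log_pdf(x, mu, sigma) for x in data)

# Evaluate the likelihood over a grid of candidate means, 4.0 to 6.0
grid = [m / 10 for m in range(40, 61)]
best = max(grid, key=log_likelihood)
print(f"Grid MLE of mu: {best}")
```

The grid point with the highest likelihood lands next to the sample mean, as it must for a normal model: probability ran "forward" from parameters to data, while likelihood runs "backward" from data to parameters.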

The most important point of the course...
Any hypothesis test can be framed as a comparison of alternate models. (Being free of the constraints imposed by the alternate models embedded in classical statistical tests is perhaps the most important benefit of the likelihood approach.)

A simple example: the likelihood alternative to 1-way ANOVA
Basic model: a set of observations (j = 1..n) that can be classified into i = 1..a distinct groups (i.e., levels of treatment A); in traditional notation, y_ij = μ + α_i + ε_ij. The likelihood alternative frames this as one candidate model among several to be compared.

Differences in Frequentist vs. Likelihood Approaches
Traditional frequentist approach: report the "significance" of a test based on a test statistic calculated from sums of squares (the F statistic), with a necessary assumption of homogeneous, normally distributed error.
Likelihood approach: compare a set of alternate models, assess the strength of evidence in your data for each of them, and identify the "best" model. If the assumption about the error term isn't appropriate, use a different error term!

So, what would make sense as alternate models?
- Our first model: a separate mean for each group
- A "null" model: a single grand mean for all observations
- Could and should you test additional models that lump some groups together (particularly if that lumping is based on looking at the estimated group means)?
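One way to sketch the comparison is to fit each candidate by maximum likelihood under a shared, homogeneous normal error and compare log-likelihoods directly. The three-group dataset is invented for illustration:

```python
import math

# Invented data: observations classified into three treatment groups
groups = {
    "A": [3.9, 4.2, 4.5, 4.1],
    "B": [5.0, 5.4, 5.1, 5.3],
    "C": [4.0, 4.3, 3.8, 4.1],
}
all_y = [y for ys in groups.values() for y in ys]
n = len(all_y)

def normal_loglik(data, mu, sigma):
    # Log-likelihood of the data under a normal PDF with mean mu, sd sigma
    return sum(-0.5 * math.log(2 * math.pi * sigma ** 2)
               - (y - mu) ** 2 / (2 * sigma ** 2) for y in data)

def mean(v):
    return sum(v) / len(v)

# Null model: a single grand mean (plus one error sd)
grand = mean(all_y)
sigma_null = math.sqrt(sum((y - grand) ** 2 for y in all_y) / n)  # ML estimate
ll_null = normal_loglik(all_y, grand, sigma_null)

# Alternate model: a separate mean per group, shared error sd
ss_within = sum(sum((y - mean(ys)) ** 2 for y in ys) for ys in groups.values())
sigma_alt = math.sqrt(ss_within / n)
ll_alt = sum(normal_loglik(ys, mean(ys), sigma_alt) for ys in groups.values())

print(f"logL, one grand mean: {ll_null:.2f}; separate group means: {ll_alt:.2f}")
```

The separate-means model always has the higher log-likelihood because the null model is nested within it; whether the improvement justifies the extra parameters is the parsimony question taken up later in the lecture.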

Remember that the error term is part of the model...
And you don't just have to accept that a simple, normally distributed, homogeneous error is appropriate. You could:
- Estimate a separate error term for each group
- Use an error term that varies as a function of the predicted value
- Use an error distribution that isn't normal
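The first option above is straightforward in a likelihood framework: just let each group carry its own sigma. A toy sketch with invented data (two groups with visibly different spreads):

```python
import math

# Invented data: two groups with very different spreads
groups = {
    "low":  [2.1, 2.0, 2.2, 1.9],
    "high": [5.0, 7.5, 3.8, 6.2],
}

def normal_loglik(data, mu, sigma):
    return sum(-0.5 * math.log(2 * math.pi * sigma ** 2)
               - (y - mu) ** 2 / (2 * sigma ** 2) for y in data)

def mean(v):
    return sum(v) / len(v)

mus = {g: mean(ys) for g, ys in groups.items()}
n = sum(len(ys) for ys in groups.values())

# Model 1: separate means, one shared (homogeneous) error sd
ss = sum((y - mus[g]) ** 2 for g, ys in groups.items() for y in ys)
sigma_shared = math.sqrt(ss / n)
ll_shared = sum(normal_loglik(ys, mus[g], sigma_shared)
                for g, ys in groups.items())

# Model 2: separate means AND a separate error sd per group
def group_sigma(ys):
    return math.sqrt(sum((y - mean(ys)) ** 2 for y in ys) / len(ys))

ll_separate = sum(normal_loglik(ys, mus[g], group_sigma(ys))
                  for g, ys in groups.items())

print(f"shared-sigma logL: {ll_shared:.2f}; per-group-sigma logL: {ll_separate:.2f}")
```

Because the shared-sigma model is a constrained version of the per-group-sigma model, the latter can never fit worse; here the heterogeneous spreads make the difference substantial.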

A more general notation for the model...
The "scientific model" predicts the expected value of each observation. A likelihood function g(y_i | θ) specifies the probability of observing y_i, given the predicted value for that observation (i.e., calculated as a function of the parameters in the scientific model), and any parameters in the PDF (e.g., σ).

Another Example: Analysis of Covariance
A traditional ANCOVA model (homogeneous slopes): y_ij = μ + α_i + β x_ij + ε_ij, with a single slope β shared by all groups.
- What is restrictive about this model?
- How would you generalize this in a likelihood framework?
- What alternate models are you testing with the standard frequentist statistics?
- What more general alternate models might you like to test?
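As a sketch of the model comparison implied here, one can fit the homogeneous-slopes model against a separate-slopes alternative by least squares (equivalent to maximum likelihood under a homogeneous normal error) and compare residual sums of squares. The two-group dataset is invented for illustration:

```python
# Invented ANCOVA data: y vs. covariate x in two groups with different slopes
data = {
    "g1": ([1, 2, 3, 4, 5], [2.1, 3.9, 6.2, 8.0, 9.8]),   # slope near 2
    "g2": ([1, 2, 3, 4, 5], [1.2, 1.9, 3.3, 3.8, 5.1]),   # slope near 1
}

def center(v):
    m = sum(v) / len(v)
    return [u - m for u in v]

def sums(x, y):
    # Centering within each group absorbs the group intercepts exactly
    xc, yc = center(x), center(y)
    sxx = sum(a * a for a in xc)
    sxy = sum(a * b for a, b in zip(xc, yc))
    return xc, yc, sxx, sxy

# Homogeneous-slopes model: one shared slope, separate intercepts
parts, sxx_tot, sxy_tot = {}, 0.0, 0.0
for g, (x, y) in data.items():
    xc, yc, sxx, sxy = sums(x, y)
    parts[g] = (xc, yc)
    sxx_tot += sxx
    sxy_tot += sxy
b_shared = sxy_tot / sxx_tot
rss_shared = sum((yc[i] - b_shared * xc[i]) ** 2
                 for xc, yc in parts.values() for i in range(len(xc)))

# Separate-slopes model: each group gets its own slope
rss_sep = 0.0
for x, y in data.values():
    xc, yc, sxx, sxy = sums(x, y)
    b = sxy / sxx
    rss_sep += sum((yc[i] - b * xc[i]) ** 2 for i in range(len(xc)))

print(f"RSS, shared slope: {rss_shared:.3f}; separate slopes: {rss_sep:.3f}")
```

The frequentist ANCOVA asks whether the slope-heterogeneity term is "significant"; the likelihood framing simply treats shared-slope and separate-slopes as two candidate models to be weighed against each other.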

But is likelihood enough? The challenge of parsimony
The importance of seeking simple answers: "It will not be sufficient, when faced with a mass of observations, to plead special creation, even though, as we shall see, such a hypothesis commands a higher numerical likelihood than any other." (Edwards, 1992, pg. 1, in explaining the need for a rigorous basis for scientific inference, given uncertainty in nature)
Edwards delves into likelihood precisely because he feels that the concept of probability is too limited to serve as the sole basis for scientific inference. His comment on special creation as the highest numerical likelihood can be put in the context of a "full" model, in which every observation is described by a model with n parameters for n observations: everything just is exactly as the creator desired. It is exactly this line of reasoning that led Ockham to propose his razor.

Models, Truth, and "Full Reality" (the Burnham and Anderson view)
"We believe that "truth" (full reality) in the biological sciences has essentially infinite dimension, and hence ... cannot be revealed with only ... finite data and a "model" of those data... We can only hope to identify a model that provides a good approximation to the data available." (Burnham and Anderson 2002, pg. 20)

The "full" model
What I irreverently call the "god" model: everything is the way it is because it is. In statistical terms, this is simply a model with as many parameters as observations, i.e., x_i = θ_i. This will always be the model with the highest likelihood (but it won't be the most parsimonious)...

Parsimony, Ockham's razor, and drawing elephants...
William of Ockham (1285-1349): "Pluralitas non est ponenda sine necessitate" ("entities should not be multiplied unnecessarily").
In our context: if the likelihood of two different models, given the observed data, is similar, Ockham's razor would favor the model with the fewest necessary parts.
"Parsimony: ... 2 : economy in the use of means to an end; especially : economy of explanation in conformity with Occam's razor" (Merriam-Webster Online Dictionary)

So how many parameters DOES it take to draw an elephant...?*
The information-theory perspective asks: how much information is lost when using a simple model to approximate reality? The best model is then the one that loses the smallest amount of information while using the smallest number of parameters. The Kullback-Leibler distance measures this information loss, but it is generally unknowable; Akaike's practical contribution was a criterion, AIC, that identifies the model minimizing (relative) KL distance.
*30 parameters would "carry a chemical engineer into preliminary design" (Wei, 1975; cited in Burnham and Anderson, pg. 30)
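AIC itself is simple to compute: AIC = -2 log L + 2k, where k is the number of estimated parameters and lower values are better. A toy comparison with invented data, in which a two-mean model beats a single mean despite the penalty for its extra parameter:

```python
import math

def normal_loglik(data, mu, sigma):
    return sum(-0.5 * math.log(2 * math.pi * sigma ** 2)
               - (v - mu) ** 2 / (2 * sigma ** 2) for v in data)

def aic(loglik, k):
    # Akaike's Information Criterion: -2 log L + 2k (lower is better)
    return -2 * loglik + 2 * k

# Invented data from two clearly different groups
g1 = [4.1, 4.4, 3.9, 4.2, 4.0]
g2 = [6.2, 6.5, 5.9, 6.3, 6.1]
y = g1 + g2
n = len(y)

# One-mean model: parameters mu and sigma (k = 2)
mu = sum(y) / n
s1 = math.sqrt(sum((v - mu) ** 2 for v in y) / n)
aic_one = aic(normal_loglik(y, mu, s1), 2)

# Two-mean model: mu1, mu2, and a shared sigma (k = 3)
m1, m2 = sum(g1) / len(g1), sum(g2) / len(g2)
ss = sum((v - m1) ** 2 for v in g1) + sum((v - m2) ** 2 for v in g2)
s2 = math.sqrt(ss / n)
aic_two = aic(normal_loglik(g1, m1, s2) + normal_loglik(g2, m2, s2), 3)

print(f"AIC, one mean: {aic_one:.2f}; two means: {aic_two:.2f}")
```

The extra parameter costs 2 AIC units, but the gain in log-likelihood from separating the two groups dwarfs that penalty; with groups that truly shared a mean, the penalty would tip the balance the other way.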

The brave new world...
Science is the development of simplified models as explanations (approximations) of reality. The "quality" of the explanation (the model) will be a balance of many factors, both quantitative and qualitative.

Consilience: E. O. Wilson's view of science
In his book Consilience: The Unity of Knowledge (1998), Wilson asserts that the sciences, humanities, and arts have a common goal: to give a purpose to understanding the details, to lend to all inquirers "a conviction, far deeper than a mere working proposition, that the world is orderly and can be explained by a small number of natural laws." (Source: Wikipedia)