Published by Linette Anderson. Modified over 8 years ago.
1
Lecture 3
Hypothesis Testing and Statistical Inference using Likelihood: The Central Role of Models
Likelihood Methods in Ecology
April 25-29, 2011, Granada, Spain
2
Outline…
• Statistical inference:
  - it’s what we use statistics for, but there are some surprisingly tricky philosophical difficulties that have plagued statisticians for over a century…
• The “frequentist” vs. “likelihoodist” solutions
• Hypothesis testing as a process of comparing alternate models
• Examples – ANOVA and ANCOVA
• The issue of parsimony
3
Inference defined... “a : the act of passing from one proposition, statement, or judgment considered as true to another whose truth is believed to follow from that of the former b : the act of passing from statistical sample data to generalizations (as of the value of population parameters) usually with calculated degrees of certainty” Source: Merriam-Webster Online Dictionary
4
Statistical Inference...
... Typically concerns inferring properties of an unknown distribution from data generated by that distribution ...
Components:
  - Point estimation
  - Hypothesis testing
  - Model comparison
5
Probability and Inference
• How do you choose the “correct inference” from your data, given inevitable uncertainty and error?
• Can you assign a probability to your certainty in the correctness of a given inference?
  - (hint: if this is really important to you, then you should consider becoming a Bayesian, as long as you can accept what I consider to be some fairly objectionable baggage…)
• How do you choose between alternate hypotheses?
  - Can you assess the strength of your evidence for alternate hypotheses?
6
The crux of the problem...
“Thus, our general problem is to assess the relative merits of rival hypotheses in the light of observational or experimental data that bear upon them....” (Edwards, pg 1).
Edwards, A.W.F. 1992. Likelihood. Expanded Edition. Johns Hopkins University Press.
7
Assigning Probabilities to Hypotheses
• Unfortunately, hypotheses (or even different parameter estimates) cannot generally be treated as “data” (outcomes of trials)
• Statisticians have debated alternate solutions to this problem for centuries
  - (with no generally agreed upon solution)
8
One Way Out: Classical “Frequentist” Statistics and Tests of Null Hypotheses
• Probability is defined in terms of the outcome of a series of repeated trials.
• Hypothesis testing via “significance” of pre-defined “statistics”
  - What is the probability of observing a particular value of a predefined test statistic, given an assumed hypothesis about the underlying scientific model, and assumptions about the probability model of the test statistic...
  - Hypotheses are never “accepted”, but are “rejected” (categorically) if the probability of obtaining a value of the test statistic at least as extreme as the one observed (the “p-value”) is very small
9
An Implicit Assumption
• The data are an approximate “sample” of an underlying “true” reality – i.e., there is a true population mean, and the sample provides an estimate of it...
10
Limitations of Frequentist Statistics
• Do not provide a means of measuring relative strength of observational support for alternate hypotheses (they merely help decide when to “reject” a single “null” hypothesis...)
  - So you conclude the slope of the line is not equal to 0. How strong is your evidence that the slope is really 0.45 vs. 0.50?
• Extremely non-intuitive: just what is a “confidence interval” anyway...
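The slope question above can be put in likelihood terms with a small sketch. Everything here is an assumption made for illustration: the data are invented, the line is forced through the origin, and the error is taken to be normal with a known standard deviation. The point is only that the ratio of two likelihoods measures relative support for two specific slope values.

```python
import math

def normal_log_likelihood(residuals, sigma):
    """Sum of log N(0, sigma^2) densities over the residuals."""
    n = len(residuals)
    ss = sum(r * r for r in residuals)
    return -n * math.log(sigma * math.sqrt(2 * math.pi)) - ss / (2 * sigma ** 2)

# Hypothetical data (invented for illustration): y is roughly 0.5 * x plus noise.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [0.6, 0.9, 1.6, 1.9, 2.6, 2.9]
sigma = 0.2  # assumed known, to keep the comparison to a single parameter

def log_lik_for_slope(slope):
    """Log-likelihood of the data under y = slope * x + normal error."""
    residuals = [yi - slope * xi for xi, yi in zip(x, y)]
    return normal_log_likelihood(residuals, sigma)

ll_045 = log_lik_for_slope(0.45)
ll_050 = log_lik_for_slope(0.50)

# The likelihood ratio expresses how much better one slope is supported than the other.
likelihood_ratio = math.exp(ll_050 - ll_045)
print(f"log-likelihood at 0.45: {ll_045:.3f}")
print(f"log-likelihood at 0.50: {ll_050:.3f}")
print(f"likelihood ratio (0.50 vs 0.45): {likelihood_ratio:.1f}")
```

With these made-up numbers the data support a slope of 0.50 over 0.45, and the ratio says by how much, which is the kind of statement a test against a slope of 0 does not provide.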
11
The “null hypothesis” approach
• When and where is “strong inference” really useful?
• When is it just an impediment to progress?
Stephens et al. 2005. Information theory and hypothesis testing: a call for pluralism. Journal of Applied Ecology 42:4-12.
Platt, J. R. 1964. Strong inference. Science 146:347-353.
12
Chamberlin’s alternative: multiple working hypotheses
• Science rarely progresses through a series of dichotomously branched decisions…
• Instead, we are constantly trying to choose among a large set of alternate hypotheses
  - Concept is very old, but the computational power needed to adopt this approach has only recently become available…
Chamberlin, T. C. 1890. The method of multiple working hypotheses. Science 15:92.
13
Hypothesis testing and “significance”
Nester’s (1996) Creed:
  - TREATMENTS: all treatments differ
  - FACTORS: all factors interact
  - CORRELATIONS: all variables are correlated
  - POPULATIONS: no two populations are identical in any respect
  - NORMALITY: no data are normally distributed
  - VARIANCES: variances are never equal
  - MODELS: all models are wrong
  - EQUALITY: no two numbers are the same
  - SIZE: many numbers are very small
Nester, M. R. 1996. An applied statistician’s creed. Applied Statistics 45:401-410.
14
Hypothesis testing vs. estimation
“The problem of estimation is of more central importance, (than hypothesis testing).. for in almost all situations we know that the effect whose significance we are measuring is perfectly real, however small; what is at issue is its magnitude.” (Edwards, 1992, pg. 2)
“An insignificant result, far from telling us that the effect is non-existent, merely warns us that the sample was not large enough to reveal it.” (Edwards, 1992, pg. 2)
15
The most important point of the course… Any hypothesis test can be framed as a comparison of alternate models… (and being free of the constraints imposed by the alternate models embedded in classical statistical tests is perhaps the most important benefit of the likelihood approach…)
16
A simple example: The likelihood alternative to 1-way ANOVA
• Basic model: a set of observations (j = 1..n) that can be classified into i = 1..a distinct groups (i.e. levels of treatment A)
• A likelihood alternative
17
Differences in Frequentist vs. Likelihood Approaches
• Traditional Frequentist Approach:
  - Report “significance” of a test that …… based on a test statistic calculated from sums of squares (F statistic), with a necessary assumption of a homogeneous and normally distributed error
• Likelihood Approach
  - Compare a set of alternate models, assess the strength of evidence in your data for each of them, and identify the “best” model
  - If the assumption about the error term isn’t appropriate, use a different error term!
18
So, what would make sense as alternate models?
A “null” model:
Our first model:
Could and should you test additional models that lump some groups together (particularly if that lumping is based on looking at the estimated group means)?
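The alternate models on this slide can be compared directly by their maximized log-likelihoods. The sketch below is illustrative only: the three groups and their values are invented, the error is assumed normal with one shared variance, and the helper names (`fit`, `max_normal_log_lik`) are hypothetical.

```python
import math

def max_normal_log_lik(residuals):
    """Normal log-likelihood evaluated at the maximum likelihood estimate of the variance."""
    n = len(residuals)
    sigma2 = sum(r * r for r in residuals) / n
    return -0.5 * n * (math.log(2 * math.pi * sigma2) + 1)

def mean(values):
    return sum(values) / len(values)

# Hypothetical observations for three treatment levels (invented for illustration).
groups = {
    "A": [4.1, 3.9, 4.3, 4.0],
    "B": [5.2, 5.0, 5.4, 5.1],
    "C": [4.2, 4.0, 4.1, 4.4],
}

def fit(partition):
    """Fit one mean per set of lumped groups; return (log-likelihood, parameter count)."""
    residuals = []
    for lumped in partition:
        obs = [v for name in lumped for v in groups[name]]
        m = mean(obs)
        residuals.extend(v - m for v in obs)
    # One mean per lumped set, plus one shared standard deviation.
    return max_normal_log_lik(residuals), len(partition) + 1

models = {
    "null (one mean)": fit([("A", "B", "C")]),
    "A and C lumped": fit([("A", "C"), ("B",)]),
    "separate means": fit([("A",), ("B",), ("C",)]),
}
for name, (ll, k) in models.items():
    print(f"{name}: log-likelihood = {ll:.2f}, parameters = {k}")
```

Because the models are nested, the log-likelihood can only rise as means are added; whether the gain justifies the extra parameters is the parsimony question taken up later in the lecture. Choosing which groups to lump after looking at the estimated means, as the slide warns, means the lumped model was partly chosen by the data.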
19
Remember that the error term is part of the model…
And you don’t just have to accept that a simple, normally distributed, homogeneous error is appropriate…
  - Estimate a separate error term for each group
  - Or an error term that varies as a function of the predicted value
  - Or where the error isn’t normally distributed
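As a rough illustration of the first option, the sketch below compares a model with one shared standard deviation against one that estimates a separate standard deviation for each group. The data are invented so that the two groups differ in spread, and normal errors are assumed in both models.

```python
import math

def max_normal_log_lik(residuals):
    """Normal log-likelihood evaluated at the maximum likelihood estimate of the variance."""
    n = len(residuals)
    sigma2 = sum(r * r for r in residuals) / n
    return -0.5 * n * (math.log(2 * math.pi * sigma2) + 1)

# Hypothetical groups with visibly different spread (invented for illustration).
groups = {
    "low": [10.1, 9.9, 10.0, 10.2, 9.8],
    "high": [20.5, 18.0, 22.1, 19.2, 21.0],
}

# Residuals around each group's own mean (the scientific model is the same in both cases).
residuals_by_group = {}
for name, obs in groups.items():
    group_mean = sum(obs) / len(obs)
    residuals_by_group[name] = [v - group_mean for v in obs]

# Homogeneous error: one sigma for all residuals (2 means + 1 sigma = 3 parameters).
pooled = [r for res in residuals_by_group.values() for r in res]
ll_homogeneous = max_normal_log_lik(pooled)

# Heterogeneous error: one sigma per group (2 means + 2 sigmas = 4 parameters).
ll_heterogeneous = sum(max_normal_log_lik(res) for res in residuals_by_group.values())

print(f"homogeneous error:   log-likelihood = {ll_homogeneous:.2f}")
print(f"heterogeneous error: log-likelihood = {ll_heterogeneous:.2f}")
```

Only the error term changes between the two fits, which is the sense in which the error term is part of the model being compared.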
20
A more general notation for the model…
The “scientific model”: predicts a value for each observation as a function of its parameters and the independent variables.
The “likelihood function”: [ g(y_i | θ) ] specifies the probability of observing y_i, given the predicted value for that observation (calculated as a function of the parameters in the scientific model and the independent variables) and any parameters in the PDF (e.g. σ).
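One way to read this notation is as two interchangeable pieces: a function that produces predictions and a PDF that scores the observations against them. The sketch below assumes a straight-line scientific model and a normal PDF purely as examples; the function names and data are hypothetical.

```python
import math

def normal_pdf(y, predicted, sigma):
    """g(y | predicted, sigma): normal density of y centred on the predicted value."""
    z = (y - predicted) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

def linear_model(x, params):
    """An example scientific model: predicted value = a + b * x."""
    a, b = params
    return a + b * x

def log_likelihood(y_obs, x_obs, scientific_model, model_params, pdf, pdf_params):
    """Sum of log g(y_i | predicted_i, pdf parameters) over all observations."""
    total = 0.0
    for x, y in zip(x_obs, y_obs):
        predicted = scientific_model(x, model_params)
        total += math.log(pdf(y, predicted, *pdf_params))
    return total

# Hypothetical data (invented for illustration).
x_obs = [0.0, 1.0, 2.0, 3.0]
y_obs = [1.1, 2.9, 5.2, 6.8]

ll_good = log_likelihood(y_obs, x_obs, linear_model, (1.0, 2.0), normal_pdf, (0.3,))
ll_poor = log_likelihood(y_obs, x_obs, linear_model, (1.0, 1.0), normal_pdf, (0.3,))
print(f"a=1, b=2: {ll_good:.2f}")
print(f"a=1, b=1: {ll_poor:.2f}")
```

Swapping in a different `scientific_model` or a different `pdf` changes the model being evaluated without changing how the likelihood is computed.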
21
Another Example: Analysis of Covariance
• A traditional ANCOVA model (homogeneous slopes):
• What is restrictive about this model?
• How would you generalize this in a likelihood framework?
  - What alternate models are you testing with the standard frequentist statistics?
  - What more general alternate models might you like to test?
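One generalization is to let each group have its own slope and compare that model with the homogeneous-slopes model. The sketch below assumes two groups, a straight-line response to the covariate, and a shared normal error; the data and names are invented for illustration and are not the lecture's own example.

```python
import math

def max_normal_log_lik(residuals):
    """Normal log-likelihood evaluated at the maximum likelihood estimate of the variance."""
    n = len(residuals)
    sigma2 = sum(r * r for r in residuals) / n
    return -0.5 * n * (math.log(2 * math.pi * sigma2) + 1)

def mean(values):
    return sum(values) / len(values)

# Hypothetical (covariate, response) data for two groups (invented for illustration).
data = {
    "control": ([1.0, 2.0, 3.0, 4.0], [2.1, 2.9, 4.2, 4.8]),
    "treated": ([1.0, 2.0, 3.0, 4.0], [3.0, 5.1, 6.9, 9.2]),
}

def sums(x, y):
    """Return (x mean, y mean, Sxx, Sxy) for one group."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return mx, my, sxx, sxy

group_sums = {name: sums(x, y) for name, (x, y) in data.items()}

def residuals_for(slopes):
    """Residuals when each group has its own intercept and the slope given for it."""
    res = []
    for name, (x, y) in data.items():
        mx, my, _, _ = group_sums[name]
        b = slopes[name]
        a = my - b * mx  # least-squares intercept for this group, given slope b
        res.extend(yi - (a + b * xi) for xi, yi in zip(x, y))
    return res

# Homogeneous slopes: one pooled within-group slope (2 intercepts + 1 slope + sigma).
common_slope = sum(s[3] for s in group_sums.values()) / sum(s[2] for s in group_sums.values())
common = {name: common_slope for name in data}
# Separate slopes: one slope per group (2 intercepts + 2 slopes + sigma).
separate = {name: s[3] / s[2] for name, s in group_sums.items()}

ll_common = max_normal_log_lik(residuals_for(common))
ll_separate = max_normal_log_lik(residuals_for(separate))
print(f"homogeneous slopes: slope = {common_slope:.2f}, log-likelihood = {ll_common:.2f}")
print(f"separate slopes:    slopes = {separate}, log-likelihood = {ll_separate:.2f}")
```

Other candidates, such as a model with no group effect or one with a non-linear response to the covariate, could be added to the same comparison in the same way.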
22
But is likelihood enough? The challenge of parsimony
“It will not be sufficient, when faced with a mass of observations, to plead special creation, even though, as we shall see, such a hypothesis commands a higher numerical likelihood than any other.” (Edwards, 1992, pg. 1, in explaining the need for a rigorous basis for scientific inference, given uncertainty in nature...)
The importance of seeking simple answers...
23
Models, Truth, and “Full Reality” (The Burnham and Anderson view...) “We believe that “truth” (full reality) in the biological sciences has essentially infinite dimension, and hence... cannot be revealed with only... finite data and a “model” of those data...... We can only hope to identify a model that provides a good approximation to the data available.” (Burnham and Anderson 2002, pg. 20)
24
The “full” model
• What I irreverently call the “god” model: everything is the way it is because it is…
• In statistical terms, this is simply a model with as many parameters as observations
  - i.e.: x_i = θ_i
This will always be the model with the highest likelihood! (but it won’t be the most parsimonious)…
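A small numerical check of this claim, under assumptions chosen for convenience: the observations are invented counts and the error is taken to be Poisson, so that the full model (one rate per observation, θ_i = x_i) has a finite likelihood that can be compared with a one-parameter model.

```python
import math

def poisson_log_lik(counts, rates):
    """Sum of Poisson log-probabilities of each count under its own rate."""
    return sum(
        x * math.log(lam) - lam - math.lgamma(x + 1)
        for x, lam in zip(counts, rates)
    )

# Hypothetical counts (invented for illustration; all positive so log(rate) is defined).
counts = [3, 7, 4, 9, 5, 2]

# One-parameter model: a single rate shared by all observations (its MLE is the mean).
common_rate = sum(counts) / len(counts)
ll_common = poisson_log_lik(counts, [common_rate] * len(counts))

# Full model: one parameter per observation, theta_i = x_i.
ll_full = poisson_log_lik(counts, [float(x) for x in counts])

print(f"common rate ({common_rate:.1f}), 1 parameter:  log-likelihood = {ll_common:.2f}")
print(f"full model, {len(counts)} parameters: log-likelihood = {ll_full:.2f}")
```

The full model wins on likelihood because each rate is tuned to its own observation, but it uses as many parameters as there are data points and summarizes nothing.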
25
Parsimony, Ockham’s razor, and drawing elephants...
William of Ockham (1285-1349): “Pluralitas non est ponenda sine necessitate” (“entities should not be multiplied unnecessarily”)
“Parsimony:... 2 : economy in the use of means to an end; especially : economy of explanation in conformity with Occam's razor” (Merriam-Webster Online Dictionary)
26
So how many parameters DOES it take to draw an elephant...?*
*30 would “carry a chemical engineer into preliminary design” (Wei, 1975) (cited in B&A, pg 30)
Information Theory perspective: “How much information is lost when using a simple model to approximate reality?”
Answer: the Kullback-Leibler Distance (generally unknowable)
More Practical Answer: Akaike’s Information Criterion (AIC) estimates relative KL distance, so the candidate model with the lowest AIC is the one estimated to lose the least information
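In its basic form AIC is computed as 2k - 2 ln(L), where k is the number of estimated parameters and L is the maximized likelihood. The sketch below applies that formula to three candidate models; the log-likelihood values and parameter counts are hypothetical numbers chosen for illustration, not results from any data in this lecture.

```python
def aic(log_likelihood, n_params):
    """Akaike's Information Criterion: 2k - 2 ln(L)."""
    return 2 * n_params - 2 * log_likelihood

# Hypothetical (maximized log-likelihood, parameter count) pairs for three models.
candidates = {
    "null": (-25.0, 2),
    "group means": (-18.0, 4),
    "full": (-15.0, 12),
}

scores = {name: aic(ll, k) for name, (ll, k) in candidates.items()}
best = min(scores, key=scores.get)

# Differences from the best model are what matter, not the absolute AIC values.
deltas = {name: score - scores[best] for name, score in scores.items()}
for name in candidates:
    print(f"{name}: AIC = {scores[name]:.1f}, delta = {deltas[name]:.1f}")
print(f"lowest AIC: {best}")
```

In this made-up case the full model has the highest likelihood but pays for its twelve parameters, so the intermediate model has the lowest AIC.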
27
The brave new world…
• Science is the development of simplified models as explanations (approximations) of reality…
• The “quality” of the explanation (the model) will be a balance of many factors (both quantitative and qualitative)