Slide 1: Hypothesis Testing
Slide 2: Statistical Inference – dealing with parameter and model uncertainty
- Confidence intervals (credible intervals)
- Hypothesis tests
- Goodness-of-fit
- Model selection (AIC)
- Model averaging
- Bayesian model updating
Slide 3: Statistical Testing of Hypotheses
- Objective: determine whether parameters differ from hypothesized values.
- The testing procedure is framed as a comparison of null and alternative hypotheses:
  - null hypothesis
  - alternative hypothesis
  - compound (one-sided) alternatives
Slide 4: Procedure for Null Hypothesis Testing
- Specify the null and alternative hypotheses.
- Compute a test statistic: a random variable whose sampling distribution is known when the null hypothesis is true (e.g., the difference between the sample means of two groups when the true means are equal).
- Compare the observed value of the statistic to that distribution; the test is a binary decision made at significance level α (the full procedure is sketched in code below).
- Two types of incorrect decisions: rejecting H0 when it is true (Type I error), Pr = α; not rejecting H0 when it is false (Type II error), Pr = β.
- Power of the test = 1 − β.
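A minimal sketch of this procedure in Python (not from the original slides), using a two-sample t-test on hypothetical data; the group values, sample sizes, and α are assumptions for illustration:

```python
# Sketch of the null-hypothesis testing procedure for a two-group comparison;
# the data and the significance level are hypothetical.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
group_a = rng.normal(loc=10.0, scale=2.0, size=30)   # hypothetical sample, group A
group_b = rng.normal(loc=11.0, scale=2.0, size=30)   # hypothetical sample, group B

alpha = 0.05                                          # significance level chosen in advance
t_stat, p_value = stats.ttest_ind(group_a, group_b)   # test statistic under H0: equal means

# Binary decision: reject H0 if the p-value falls below alpha.
decision = "reject H0" if p_value < alpha else "fail to reject H0"
print(f"t = {t_stat:.2f}, p = {p_value:.3f} -> {decision}")
```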
Slide 5: P-values
- The probability of obtaining a test statistic at least as extreme as the observed one, given that the null hypothesis is true (illustrated by simulation below).
- Not Pr(null hypothesis is true).
- A measure of how consistent the data are with the null, not of the strength of evidence for the alternative.
- Depends on the null hypothesis (if the null is that the groups differ by 1 rather than 0, the p-value will differ).
- Depends on sample size.
- Provides no information on the size or precision of the estimated effect (i.e., it is neither a measure of biological relevance nor a confidence interval).
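A rough illustration of the definition, assuming a hypothetical observed difference in group means and a known spread; the p-value is approximated as the tail probability of the statistic's simulated null distribution:

```python
# Sketch of what a p-value means: the tail probability of the test statistic's
# null distribution, approximated here by simulation. All values are hypothetical.
import numpy as np

rng = np.random.default_rng(2)
observed_diff = 1.1            # hypothetical observed difference in group means
n_per_group, sigma = 30, 2.0   # hypothetical sample size and spread

# Simulate the difference in sample means many times under H0 (true difference = 0).
null_diffs = (rng.normal(0, sigma, (10_000, n_per_group)).mean(axis=1)
              - rng.normal(0, sigma, (10_000, n_per_group)).mean(axis=1))

# Two-sided p-value: how often a null difference is at least as extreme as observed.
p_value = np.mean(np.abs(null_diffs) >= abs(observed_diff))
print(f"approximate p-value = {p_value:.3f}")
```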
Slide 6: Conclusions vs. Reality
- We don't reject H0, and H0 is true (Ha false): probability 1 − α (e.g., 0.95). Correctly saying there is no difference when there really is none: 95/100 times when there is no effect, we'll correctly say there is no effect.
- We don't reject H0, but H0 is false (Ha true): probability β (e.g., 0.20), the Type II error. Saying there is no difference when there really is one: 20/100 times when there is an effect, we'll say there is no effect.
- We reject H0 (accept Ha), but H0 is true: probability α (e.g., 0.05), the Type I error. Saying there is a difference when there is none: 5/100 times when there is no effect, we'll say there is one.
- We reject H0, and H0 is false: probability 1 − β (e.g., 0.80), the power of the test. Saying there is a difference when there is one: 80/100 times when there is an effect, we'll say there is one.
These rates are simulated in the sketch below.
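A simulation sketch of the rates in this table, under an assumed (hypothetical) effect size, sample size, and α; it estimates the Type I error rate when H0 is true and the power when H0 is false:

```python
# Estimate how often a two-sample t-test at alpha = 0.05 rejects H0 when it is
# true (Type I error) and when it is false (power). Effect size, spread, and
# sample size are hypothetical.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
alpha, n, n_sims = 0.05, 30, 5_000

def rejection_rate(true_diff):
    """Fraction of simulated experiments in which H0 (equal means) is rejected."""
    rejections = 0
    for _ in range(n_sims):
        a = rng.normal(0.0, 2.0, n)
        b = rng.normal(true_diff, 2.0, n)
        if stats.ttest_ind(a, b).pvalue < alpha:
            rejections += 1
    return rejections / n_sims

print("Type I error rate (H0 true):  ", rejection_rate(0.0))   # should be near alpha
print("Power (true difference = 1.5):", rejection_rate(1.5))   # equals 1 - beta
```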
Slide 7: Comments
- Lower α, lower power; higher α, higher power.
- A lower α is conservative about rejecting the null when it is true (i.e., less prone to saying there is an effect when there really isn't one).
- A higher α increases the chance of a Type I error, decreases the chance of a Type II error, and decreases the rigor of the test.
Slide 8: Sample Design – Choosing a Sample Size
- Sample size can be chosen to hit a target precision (e.g., confidence interval width) or a target power (hypothesis testing); a power-based sketch follows.
- Either approach requires assumptions and tentative parameter values (e.g., effect size), so it is an exercise in approximation.
- It may also reveal cases where the minimal sufficient sample size would exceed the budget or is logistically impractical to achieve.
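One way to sketch this in code: estimate power by simulation over a grid of sample sizes under an assumed effect size and spread (both hypothetical), and stop at the first n that meets the target power:

```python
# Sketch of choosing a sample size by targeting power, under tentative values
# for the effect size and standard deviation; all numbers are hypothetical.
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
alpha, target_power = 0.05, 0.80
effect_size, sigma = 1.0, 2.0        # assumed effect and spread (must be guessed in advance)

def estimated_power(n, n_sims=2_000):
    """Approximate power of a two-sample t-test with n animals per group."""
    hits = sum(stats.ttest_ind(rng.normal(0, sigma, n),
                               rng.normal(effect_size, sigma, n)).pvalue < alpha
               for _ in range(n_sims))
    return hits / n_sims

# Increase n until the target power is reached (an exercise in approximation).
for n in range(10, 201, 10):
    power = estimated_power(n)
    if power >= target_power:
        print(f"n = {n} per group gives estimated power {power:.2f}")
        break
```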
Slide 9: Likelihood Ratio Tests
- Compare the fit of a hypothesized model to another model that generally contains more parameters: the null model versus an alternative model with additional parameters.
- Based on maximum likelihood estimation theory: evaluate the MLEs under the restricted and the more general parameterizations.
- Calculate the likelihood ratio statistic, 2[lnL(general) − lnL(restricted)]; under the null it is chi-square distributed with degrees of freedom equal to the difference in the number of parameters between the models (worked sketch below).
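A small worked sketch using a binomial survival model with hypothetical counts: the restricted (null) model has one common survival probability for two groups, the general model has one probability per group, and the LRT statistic is referred to a chi-square distribution with 1 df:

```python
# Likelihood ratio test comparing a restricted model (common survival probability)
# with a more general model (group-specific probabilities). Counts are hypothetical.
import numpy as np
from scipy import stats

survived = np.array([45, 30])      # hypothetical survivors per group
released = np.array([60, 60])      # hypothetical releases per group

def binom_loglik(p, k, n):
    """Binomial log-likelihood summed over groups."""
    return np.sum(stats.binom.logpmf(k, n, p))

# MLEs: pooled p under the null, group-specific p under the alternative.
p_null = survived.sum() / released.sum()
p_alt = survived / released

lnL_null = binom_loglik(p_null, survived, released)
lnL_alt = binom_loglik(p_alt, survived, released)

# LRT statistic; df = difference in the number of parameters (2 - 1 = 1).
lrt = 2 * (lnL_alt - lnL_null)
df = 2 - 1
p_value = stats.chi2.sf(lrt, df)
print(f"LRT = {lrt:.2f}, df = {df}, p = {p_value:.3f}")
```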
Slide 10: Goodness of Fit (GOF)
- Assesses the 'absolute' fit of a model.
- Goal: determine whether the data are consistent with the statistical model.
- A test statistic is generated from the probability model using the estimated parameters.
- Is there variation in the data that is out of the ordinary and not reflected in our statistical model?
Slide 11: Pearson's χ² GOF Test
- Logic: if the model is 'correct', the expected and observed frequencies in each multinomial cell should be similar.
- Example: roll a die 1000 times and ask whether the model P(X=1) = P(X=2) = … = P(X=6) = 1/6 is a good one (sketched in code below).
- If the sample size is adequate (expect at least 2 per cell), χ² = Σ (observed_i − expected_i)² / expected_i, with df = (number of cells) − 1.
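The die example, sketched directly from the formula above; the observed counts are hypothetical:

```python
# Pearson chi-square GOF test for the die example: are 1000 rolls consistent
# with equal probabilities for all six faces? The counts are hypothetical.
import numpy as np
from scipy import stats

observed = np.array([180, 160, 170, 150, 175, 165])   # hypothetical counts, sum = 1000
expected = np.full(6, observed.sum() / 6)              # equal-probability model

chi2_stat = np.sum((observed - expected) ** 2 / expected)
df = len(observed) - 1                                 # number of cells minus 1
p_value = stats.chi2.sf(chi2_stat, df)
print(f"chi-square = {chi2_stat:.2f}, df = {df}, p = {p_value:.3f}")

# The same test is available directly from scipy:
print(stats.chisquare(observed, expected))
```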
Slide 12: General GOF with Large Samples
- Pearson's χ² statistic
- Direct use of the deviance
Slide 13: Bootstrap GOF Test
- Compute ML estimates for the parameters.
- Produce an empirical distribution of the fit statistic: simulate capture histories for each released animal, assuming the parameters equal their MLEs, by 'flipping coins' to determine survival and capture in each period; repeat for all {R_i} released animals, re-estimate the parameters, and compute the deviance.
- Compare the original deviance with the empirical distribution (i.e., at what percentile does it fall?). A simplified sketch follows.
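A simplified sketch of the idea, using a plain binomial survival model as a stand-in for a full capture-recapture likelihood; the releases and survivor counts are hypothetical:

```python
# Parametric bootstrap GOF sketch: fit the model, simulate data at the MLE,
# refit each simulated data set, and ask where the observed deviance falls in
# the bootstrap distribution. Numbers are hypothetical.
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)

def deviance(k, n, p_hat):
    """Binomial deviance of the fitted model relative to the saturated model."""
    p_sat = k / n
    lnL_sat = np.sum(stats.binom.logpmf(k, n, p_sat))
    lnL_fit = np.sum(stats.binom.logpmf(k, n, p_hat))
    return 2 * (lnL_sat - lnL_fit)

released = np.array([60, 55, 70])           # hypothetical releases per occasion
survived = np.array([40, 30, 52])           # hypothetical survivors per occasion
p_mle = survived.sum() / released.sum()     # MLE under a constant-survival model
obs_dev = deviance(survived, released, p_mle)

# 'Flip coins' for each released animal at the MLE, refit, and record the deviance.
boot_devs = []
for _ in range(1_000):
    sim_survived = rng.binomial(released, p_mle)
    sim_p = sim_survived.sum() / released.sum()
    boot_devs.append(deviance(sim_survived, released, sim_p))

percentile = np.mean(np.array(boot_devs) <= obs_dev)
print(f"observed deviance = {obs_dev:.2f}, bootstrap percentile = {percentile:.2f}")
```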
Slide 14: What Indicates Lack of Fit?
- With a GOF test, the hope and purpose is to accept (fail to reject) the null hypothesis.
- This is counter to the usual logic of statistical hypothesis testing.
- What, then, is a 'significant' P-value?
Slide 15: What Might Cause Lack of Fit?
- Inadequate model structure for detection or survival, e.g.:
  - age dependence, size dependence, etc.
  - trap dependence
  - animals released earlier surviving at a different rate
  - non-random temporary emigration
- Lack of independence among animals
Slide 16: Solutions
- Inadequate model structure? Improve it. Goal: subdivide animals sufficiently that p and S are equal within each group.
- Warning: inadequate model structure doesn't always result in lack of fit, e.g.:
  - permanent emigration (confounded with S)
  - random temporary emigration (confounded with p)
  - random ring loss (confounded with S)
- Lack of independence? Correct for overdispersion: inflate variances using quasi-likelihood.
Slide 17: Adjusting Variances for Overdispersion
- Based on quasi-likelihood theory.
- ĉ (c-hat) = deviance / df
- Adjusted variance = ĉ × (ML variance)
A numerical sketch follows.
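As arithmetic, with hypothetical values for the deviance, df, and ML variance:

```python
# Quasi-likelihood variance adjustment: c-hat from deviance/df, then inflated
# variance and standard error. All values are hypothetical.
model_deviance, df = 58.4, 40        # hypothetical model deviance and degrees of freedom
ml_variance = 0.0025                 # hypothetical ML variance of a parameter estimate

c_hat = model_deviance / df          # overdispersion factor (c-hat > 1 suggests overdispersion)
adj_variance = c_hat * ml_variance   # inflate the ML variance by c-hat
adj_se = adj_variance ** 0.5

print(f"c-hat = {c_hat:.2f}, adjusted variance = {adj_variance:.5f}, adjusted SE = {adj_se:.4f}")
```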
Slide 18: Bootstrap Adjustment for Overdispersion
- For each simulated sample, compute the deviance and ĉ = deviance / df.
- Bootstrap ĉ = (observed deviance) / (mean bootstrap deviance), or (observed ĉ) / (mean bootstrap ĉ); a sketch follows.
- Note: the deviance could be replaced with Pearson's χ², or the mean with the median.
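A sketch of the bootstrap ĉ calculation, assuming the bootstrap deviances were produced by a simulation loop like the one sketched under slide 13; the numbers here are placeholders:

```python
# Bootstrap estimate of c-hat: observed deviance relative to the mean of the
# bootstrap deviances. The values below are hypothetical placeholders.
import numpy as np

observed_deviance = 58.4                                     # hypothetical
boot_deviances = np.array([41.2, 38.7, 45.0, 40.1, 43.8])    # would come from the bootstrap loop

c_hat_boot = observed_deviance / boot_deviances.mean()
print(f"bootstrap c-hat (mean)   = {c_hat_boot:.2f}")

# More robust variant: use the median of the bootstrap deviances instead of the mean.
print(f"bootstrap c-hat (median) = {observed_deviance / np.median(boot_deviances):.2f}")
```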