Presentation is loading. Please wait.

Presentation is loading. Please wait.

Today: Quizz 11: review. Last quizz! Wednesday: Guest lecture – Multivariate Analysis Friday: last lecture: review – Bring questions DEC 8 – 9am FINAL.

Similar presentations


Presentation on theme: "Today: Quizz 11: review. Last quizz! Wednesday: Guest lecture – Multivariate Analysis Friday: last lecture: review – Bring questions DEC 8 – 9am FINAL."— Presentation transcript:

1 Today: Quizz 11: review. Last quizz! Wednesday: Guest lecture – Multivariate Analysis Friday: last lecture: review – Bring questions DEC 8 – 9am FINAL EXAM EN 2007

2 Resampling

3 Resampling Introduction We have relied on idealized models of the origins of our data (ε ~N) to make inferences But, these models can be inadequate Resampling techniques allow us to base the analysis of a study solely on the design of that study, rather than on a poorly-fitting model

4 Fewer assumptions –Ex: resampling methods do not require that distributions be Normal or that sample sizes be large Generality: Resampling methods are remarkably similar for a wide range of statistics and do not require new formulas for every statistic Promote understanding: Boostrap procedures build intuition by providing concrete analogies to theoretical concepts Resampling Why resampling

5 Resampling Collection of procedures to make statistical inferences without relying on parametric assumptions - bias - variance, measures of error - Parameter estimation - hypothesis testing

6 Resampling Procedures Permutation (randomization) Bootsrap Jackknife Monte Carlo techniques

7 Resampling With replacement Without replacement

8 Resampling permutation

9 If number of samples in each group=10 And n groups=2  number of permutations=184756 If n groups = 3  number of permutations > 5 000 billions

10 Resampling Bootstrap "to pull oneself up by one's bootstraps" The Surprising Adventures of Baron Munchausen, (1781) by Rudolf Erich Raspe Bradley Efron 1979

11 Resampling Bootstrap Hypothesis testing, parameter estimation, assigning measures of accuracy to sample estimates e.g.: se, CI Useful when: formulas for parameter estimates are based on assumptions that are not met computational formulas only valid for large samples computational formulas do not exist

12 Resampling Bootstrap Assume that sample is representative of population Approximate the distribution of the population by repeatedly resampling (with replacement) from the sample

13 Resampling Bootstrap

14

15 Non-parametric bootstrap resample observation from original samples Parametric bootstrap fit a particular model to the data and then use this model to produce bootstrap samples

16 Confidence intervals Non Parametric Bootstrap Large Capelin Small Capelin Other Prey

17 Confidence intervals Non Parametric Bootstrap 74 lc  %N lc = 48.3% 76 sc  %N sc = 49.6% 3 ot  %N lc = 1.9% What about uncertainty around the point estimates? Bootstrap

18 Confidence intervals Non Parametric Bootstrap Bootstrap: - 153 balls, each with a tag: 76 sc, 74 lc, 3 ot - Draw 153 random samples (with replacement) and record tag - Calculate %N lc * 1, %N sc * 1, %N ot * 1 - Repeat nboot=50 000 times (%N lc * 1, %N sc * 1, %N ot * 1 ), (%N lc * 2, %N sc * 2, %N ot * 2 ),…, ( %N lc * nboot, %N sc * nboot, %N ot * nboot ) - sort the %N i * b  UCL = 48750 th %N i * b (0.975)  LCL = 1250 th %N i * b (0.025)

19 Confidence intervals Parametric Bootstrap

20

21

22

23 Params β α Confidence intervals Parametric Bootstrap

24 Params β* 1 α* 1 Confidence intervals Parametric Bootstrap

25 Params β* 2 α* 2 Confidence intervals Parametric Bootstrap

26 Params β* nboot α* nboot

27 Params β* 1, β* 2, …,β* nboot Construct Confidence Interval for β 1. Percentile Method 2. Bias-Corrected Percentile Confidence Limits 3. Accelerated Bias-Corrected Percentile Limits 4.Bootstrap-t 5, 6, …., Other methods Confidence intervals Parametric Bootstrap

28 Bootstrap CI – Percentile Method

29

30 Bootstrap Caveats Independence Incomplete data Outliers Cases where small perturbations to the data-generating process produce big swings in the sampling distribution

31 Resampling Jackknife Tukey 1958 Quenouille 1956 Estimate bias and variance of a statistic Concept: Leave one observation out and recompute statistic

32 Cross-validation Jackknife Assess the performance of the model How accurately will the model predict a new observation?

33 Cross-validation Jackknife

34

35

36

37

38

39

40

41

42

43

44

45

46

47 Jackknife Bootstrap Differences Both estimate variability of a statistic between subsamples Jackknife provides estimate of the variance of an estimator Bootstrap first estimates the distribution of the estimator. From this distribution, we can estimate the variance Using the same data set: bootstrap results will always be different (slightly) jackknife results will always be identical

48 Resampling Monte Carlo Stanisław Ulam John von Neumann Mid 1940s “The first thoughts and attempts I made to practice [the Monte Carlo Method] were suggested by a question which occurred to me in 1946 as I was convalescing from an illness and playing solitaires. The question was what are the chances that a Canfield solitaire laid out with 52 cards will come out successfully? After spending a lot of time trying to estimate them by pure combinatorial calculations, I wondered whether a more practical method than "abstract thinking" might not be to lay it out say one hundred times and simply observe and count the number of successful plays.”

49 Resampling Monte Carlo Monte Carlo methodS: not just one no clear consensus on how they should be defined Commonality: repeated sampling from populations with known characteristics, i.e. we assume a distribution and create random samples that follow that distribution, then compare our estimated statistic to the distribution of outcomes

50 Monte Carlo Goal: assess the robustness of constant escapement and constant harvest rate policies with respect to management error for Pacific salmon fisheries

51 Monte Carlo Spawner-return dynamic model Constant harvest rate Constant escapement Stochastic production variation Management error Average catchCV of catch

52

53

54

55 Discussion Was the bootstrap example I showed parametric or non-parametric? Could you think an example of the other case? So, what’s the difference between a bootstrap and a Monte Carlo?

56 74 large capelin 76 small capelin 3 other prey Bootstrap samples: 153 balls, each with a tag: sc, lc, ot Draw 153 random samples and record the tag Repeat 50 000 times

57 Discussion In principle both the parametric and the non-parametric bootstrap are special cases of Monte Carlo simulations used for a very specific purpose: estimate some characteristics of the sampling distribution. The idea behind the bootstrap is that the sample is an estimate of the population, so an estimate of the sampling distribution can be obtained by drawing many samples (with replacement) from the observed sample, compute the statistic in each new sample. Monte Carlo simulations are more general: basically it refers to repeatedly creating random data in some way, do something to that random data, and collect some results. This strategy could be used to estimate some quantity, like in the bootstrap, but also to theoretically investigate some general characteristic of an estimator which is hard to derive analytically. In practice it would be pretty safe to presume that whenever someone speaks of a Monte Carlo simulation they are talking about a theoretical investigation, e.g. creating random data with no empirical content what so ever to investigate whether an estimator can recover known characteristics of this random `data', while the (parametric) bootstrap refers to an emprical estimation. The fact that the parametric bootstrap implies a model should not worry you: any empirical estimate is based on a model. Hope this helps, Maarten ----------------------------------------- Maarten L. Buis http://www.stata.com/statalist/archive/2008-06/msg00802.html


Download ppt "Today: Quizz 11: review. Last quizz! Wednesday: Guest lecture – Multivariate Analysis Friday: last lecture: review – Bring questions DEC 8 – 9am FINAL."

Similar presentations


Ads by Google