Simulation: Sensitivity, Bootstrap, and Power

Nathaniel MacNell EPID 799C Fall 2017

Overview
Simulations: Why?
Simulations: How?
Simulations: In Practice: Bootstrap (confidence intervals), Sensitivity Analysis, Power Calculations

[Statistical] Simulations: Why?
We study populations and characterize their attributes using (probability) distributions. We use the concept of randomness to stand in for lack of information about a population. We also use randomness as a tool for causal inference (in the model of the randomized experiment). We can use random sampling to simulate a variety of (parametric) statistical processes.

Simulations: How?
A 4-step process: 1. Characterize. 2. (Re)sample. 3. Calculate statistic. 4. Summarize.

1. Characterize
Determine the distribution(s) of interest. In R, a distribution is represented as a vector of values. An empirical distribution (for example, the values of height for each person in a research study) is just the data itself. A parametric distribution (for example, a normal distribution of height with mean 68 inches and standard deviation 4 inches) can be constructed from statistics estimated from data or specified a priori. You can also build associations between variables into the data.
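A minimal sketch of the characterize step in R. The observed heights, the weight variable, and the slope, intercept, and noise values are made-up assumptions for illustration; only the normal distribution with mean 68 and standard deviation 4 comes from the text.

```r
set.seed(799)  # arbitrary seed so the sketch is reproducible

# Empirical distribution: the observed values themselves.
# These heights (inches) are invented illustration values, not study data.
height_obs <- c(61, 63, 64, 66, 67, 68, 68, 70, 72, 75)

# Parametric distribution: normal with mean 68 and SD 4, as in the text.
height_par <- rnorm(1000, mean = 68, sd = 4)

# Building in an association: weight depends on height plus random noise.
# The intercept, slope, and noise SD are assumptions chosen for illustration.
weight_par <- -180 + 5 * height_par + rnorm(1000, mean = 0, sd = 15)

# The built-in association shows up as a positive correlation.
cor(height_par, weight_par)
```

Changing the slope or the noise standard deviation is one way to make the simulated association stronger or weaker.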

2. (Re)sample
Use the sample() function to draw a subset from the empirical distribution at random. Alternatively, use built-in functions like rnorm() to sample from a parametric distribution. A sample of size 1 simulates a random variable. A sample of size >1 simulates a random sample. Most applications require sampling with replacement unless you are interested in a permutation-type problem. [Typically, when the sample is small relative to a large source distribution, there isn't much of a difference.]
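The resampling options above might look like this in R. The height values are invented for illustration, and the sample size of 25 for the parametric draw is an arbitrary choice.

```r
set.seed(799)  # arbitrary seed so the sketch is reproducible

# Invented illustration values standing in for an empirical distribution.
height_obs <- c(61, 63, 64, 66, 67, 68, 68, 70, 72, 75)

# A sample of size 1 simulates a single random variable.
one_draw <- sample(height_obs, size = 1)

# A resample of the same size, drawn with replacement
# (some values may repeat and others may be left out).
resample <- sample(height_obs, size = length(height_obs), replace = TRUE)

# A permutation: same values, shuffled order (sampling without replacement).
permuted <- sample(height_obs)

# A random sample of size 25 from a parametric distribution.
par_sample <- rnorm(25, mean = 68, sd = 4)
```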

3. Calculate Statistic
Write code to calculate the statistic of interest. Recall that "statistic" is just a general name for any summary of the data (including multivariate statistics). Examples: the mean, median, min, or max of a sample; measures of occurrence (risk, odds); measures of association (ratios or differences between other statistics); measures comparing to a baseline or null hypothesis (p-values, confidence intervals, etc.).
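As a sketch, statistics like these can be wrapped in small R functions so they are easy to apply to each (re)sample. The helper names risk_ratio(), odds(), and odds_ratio() and the tiny dataset are hypothetical, introduced here only for illustration.

```r
# Hypothetical helper: risk ratio comparing exposed (1) with unexposed (0).
# Both arguments are 0/1 vectors of the same length.
risk_ratio <- function(exposure, outcome) {
  risk_exposed   <- mean(outcome[exposure == 1])
  risk_unexposed <- mean(outcome[exposure == 0])
  risk_exposed / risk_unexposed
}

# Hypothetical helpers: convert a risk to odds, then take the ratio of odds.
odds <- function(p) p / (1 - p)
odds_ratio <- function(exposure, outcome) {
  odds(mean(outcome[exposure == 1])) / odds(mean(outcome[exposure == 0]))
}

# Tiny invented dataset: 3 of 4 exposed and 1 of 4 unexposed have the outcome.
exposure <- c(1, 1, 1, 1, 0, 0, 0, 0)
outcome  <- c(1, 1, 1, 0, 1, 0, 0, 0)

risk_ratio(exposure, outcome)  # (3/4) / (1/4)
odds_ratio(exposure, outcome)  # (0.75 / 0.25) / (0.25 / 0.75)
```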

4. Summarize
We now need to calculate statistics for the statistic of interest. In other words, we want to characterize the distribution of the statistic across (re)samples. Examples: the mean of the sample means; the standard deviation of the sample mean; the mean of the odds ratio; a confidence interval for the odds ratio; the proportion of the distribution above a threshold (e.g. power, significance).
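One way to sketch the summarize step in R is to repeat the sample-and-calculate steps many times with replicate() and then summarize the resulting vector. The number of simulations (2000), the sample size (25), and the threshold (69 inches) are arbitrary assumptions for illustration.

```r
set.seed(799)  # arbitrary seed so the sketch is reproducible

n_sims <- 2000  # number of simulated samples (assumed)

# Steps 2 and 3 repeated: draw a sample of 25 heights, compute its mean.
sample_means <- replicate(n_sims, mean(rnorm(25, mean = 68, sd = 4)))

# Step 4: summarize the distribution of the statistic.
mean(sample_means)                        # mean of the sample means
sd(sample_means)                          # SD of the sample mean; theory gives 4 / sqrt(25)
quantile(sample_means, c(0.025, 0.975))   # middle 95% of the sample means
mean(sample_means > 69)                   # proportion above a threshold
```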

Example 1: Bootstrap
We can use resampling to estimate univariate statistics and their uncertainty (for example, confidence intervals); this is particularly useful when the analytic calculation is difficult or not straightforward.
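A minimal sketch of a percentile bootstrap confidence interval for a median, written in base R. The data are simulated stand-ins for a real study variable, and the choices of 2000 resamples and the percentile method are assumptions; other bootstrap interval methods exist.

```r
set.seed(799)  # arbitrary seed so the sketch is reproducible

# Stand-in for observed data (simulated here; replace with a real variable).
height_obs <- rnorm(100, mean = 68, sd = 4)

n_boot <- 2000  # number of bootstrap resamples (assumed)

# Resample the data with replacement and recompute the median each time.
boot_medians <- replicate(n_boot, median(sample(height_obs, replace = TRUE)))

# Percentile interval: the 2.5th and 97.5th percentiles of the bootstrap medians.
ci <- quantile(boot_medians, c(0.025, 0.975))

median(height_obs)  # point estimate
ci                  # approximate 95% confidence interval
```

The same pattern works for other statistics by swapping median() for another function of the resampled data.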

Example 2: Sensitivity Analysis
We can use edited copies of our dataset, consistent with different assumptions (or, typically, violations of standard assumptions), to assess the degree to which our results are affected by those assumptions. Issues that can be examined this way include measurement error, misclassification, covariance, interference, adherence, and residual confounding.
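As one illustration, the sketch below examines exposure misclassification: it makes edited copies of a dataset in which each exposure value is flipped with some probability, then recomputes the risk ratio. The simulated dataset, the helper names risk_ratio() and misclassify(), the flip probabilities, and the assumption that errors are unrelated to the outcome are all hypothetical choices for illustration.

```r
set.seed(799)  # arbitrary seed so the sketch is reproducible

# Simulated stand-in dataset: 30% exposed, risk 0.20 if exposed, 0.10 if not.
n <- 5000
exposure <- rbinom(n, 1, 0.3)
outcome  <- rbinom(n, 1, ifelse(exposure == 1, 0.20, 0.10))

# Hypothetical helper: risk ratio for exposed (1) versus unexposed (0).
risk_ratio <- function(exposure, outcome) {
  mean(outcome[exposure == 1]) / mean(outcome[exposure == 0])
}

# Hypothetical helper: flip each 0/1 value independently with probability p_flip.
misclassify <- function(x, p_flip) {
  flip <- rbinom(length(x), 1, p_flip) == 1
  ifelse(flip, 1 - x, x)
}

# For each assumed error rate, average the risk ratio over 200 edited copies.
p_flips <- c(0, 0.05, 0.10, 0.20)
rr_by_error <- sapply(p_flips, function(p) {
  mean(replicate(200, risk_ratio(misclassify(exposure, p), outcome)))
})
names(rr_by_error) <- p_flips
rr_by_error
```

Under these assumptions the risk ratio moves toward 1 as the error rate increases, which gives a sense of how sensitive the result is to this kind of misclassification.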

Example 3: Power
We can use parametric distributions to estimate the probability of rejecting the null hypothesis, or to characterize the expected confidence intervals, under a specific set of assumptions. This is useful for complex designs, which is to say essentially all study designs you will work on (few dissertations have the luxury of being randomized controlled trials). As in any power analysis, the outputs from your simulation are only as good as the assumptions you have made and how realistic they are.
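A minimal sketch of a simulated power calculation for a simple two-group comparison of means. The helper simulate_p(), the group size (50), the assumed difference (2 inches), the standard deviation (4), the 0.05 significance level, and the use of a t-test are all assumptions for illustration; a real design would replace them with its own data-generating process and analysis.

```r
set.seed(799)  # arbitrary seed so the sketch is reproducible

# Hypothetical helper: simulate one study and return its p-value.
simulate_p <- function(n_per_group, mean_diff, sd) {
  control <- rnorm(n_per_group, mean = 68, sd = sd)
  treated <- rnorm(n_per_group, mean = 68 + mean_diff, sd = sd)
  t.test(treated, control)$p.value
}

# Repeat the simulated study many times under the assumed effect.
p_values <- replicate(2000, simulate_p(n_per_group = 50, mean_diff = 2, sd = 4))

# Estimated power: proportion of simulated studies that reject at alpha = 0.05.
power_est <- mean(p_values < 0.05)
power_est
```

For a design this simple, power.t.test(n = 50, delta = 2, sd = 4) from base R gives an analytic answer that can be used to check the simulation; the simulation approach becomes more useful when no such formula is available.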

Lab: Practice Simulations
