Making computing skills part of learning introductory stats


1 Making computing skills part of learning introductory stats
Dr. Kari Lock Morgan, Department of Statistics, Penn State University, USA
Royal Statistical Society, 10/13/16

2 Simulation-Based Inference
ASA 2016 Recommendations for Intro Stat
GAISE: Guidelines for Assessment and Instruction in Statistics Education
1. Teach statistical thinking.
   Teach statistics as an investigative process of problem-solving and decision making.
   Give students experience with multivariable thinking.
2. Focus on conceptual understanding.
3. Integrate real data with a context and purpose.
4. Foster active learning.
5. Use technology to explore concepts and analyze data.
6. Use assessments to improve and evaluate student learning.

3 Question #1: Does drinking tea boost your immune system?

4 Tea and Immune Response
Participants were randomized to drink five or six cups of either tea (black) or coffee every day for two weeks (both drinks contain caffeine, but only tea contains L-theanine). After two weeks, blood samples were exposed to an antigen, and the immune system response was measured. Does tea boost immunity (compared with coffee)?
Antigens in Tea-Beverage Prime Human Vγ2Vδ2 T Cells in vitro and in vivo for Memory and Non-memory Antibacterial Cytokine Responses, Kamath et al., Proceedings of the National Academy of Sciences, May 13, 2003.

5 Tea and Immune System
$\bar{x}_T - \bar{x}_C = 34.82 - 17.70 = 17.12$
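The difference in means can be checked directly from the response values shown on the slides. Below is a minimal Python sketch; the data lists are typed in from the slides, and the two 0 values in the coffee group are an assumption, inferred because n2 = 10 and a coffee mean of 17.70 require two further values summing to 0.

```python
from statistics import mean

# Immune-response values from the tea/coffee experiment, typed in from the slides.
# Assumption: the two 0 values in the coffee group are inferred (n2 = 10 and a
# coffee mean of 17.70 require two extra values that sum to 0).
tea = [5, 11, 13, 18, 20, 47, 48, 52, 55, 56, 58]
coffee = [0, 0, 3, 11, 15, 16, 21, 21, 38, 52]

x_bar_t = mean(tea)       # sample mean of the tea group
x_bar_c = mean(coffee)    # sample mean of the coffee group
diff = x_bar_t - x_bar_c  # observed statistic: difference in means

print(f"{x_bar_t:.2f} - {x_bar_c:.2f} = {diff:.2f}")  # 34.82 - 17.70 = 17.12
```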

6 Getting the p-value: Option 1
Check conditions.
Compute the statistic: choose a formula, plug and chug.
$n_1 = 11$, $n_2 = 10$
$t = \dfrac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} = 2.07$
Use a theoretical distribution (which one? df?).
0.025 < p-value < 0.05
So what's a p-value???
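The "plug and chug" step for the t statistic can be sketched in Python, using the same unpooled standard error as the slide's formula. The data lists are assumptions typed in from the slides (the two 0 coffee values are inferred from n2 = 10 and the coffee mean of 17.70), and the sketch stops at t, leaving the choice of reference distribution and df to the traditional method.

```python
from math import sqrt
from statistics import mean, stdev

tea = [5, 11, 13, 18, 20, 47, 48, 52, 55, 56, 58]
coffee = [0, 0, 3, 11, 15, 16, 21, 21, 38, 52]  # two 0s inferred (assumption)


def two_sample_t(x1, x2):
    """Unpooled two-sample t statistic: (x̄1 - x̄2) / sqrt(s1²/n1 + s2²/n2)."""
    n1, n2 = len(x1), len(x2)
    s1, s2 = stdev(x1), stdev(x2)  # sample SDs (divisor n - 1)
    se = sqrt(s1**2 / n1 + s2**2 / n2)
    return (mean(x1) - mean(x2)) / se


t = two_sample_t(tea, coffee)
print(f"t = {t:.2f}")  # t = 2.07
```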

7 Traditional Inference
Plugging numbers into formulas does little to reinforce conceptual understanding.
With a different formula for each test/interval, students often get mired in the details and fail to see the big picture.
We need a better way…

8 Actual Experiment
[Figure: the 21 participants shown as cards, divided into a Tea group and a Coffee group]

9 Actual Experiment
Tea: 5, 11, 13, 18, 20, 47, 48, 52, 55, 56, 58
Coffee: 0, 0, 3, 11, 15, 16, 21, 21, 38, 52

10 Actual Experiment
Two plausible explanations:
1. Tea boosts immunity.
2. Random chance.
What might happen just by random chance?
[Figure: the response cards in the Tea and Coffee groups]

11 Simulation
[Figure: the response cards from both groups are pooled and shuffled, then re-dealt at random into a Tea group and a Coffee group]

12 Simulation
[Figure: the shuffled cards re-dealt into a new Tea group and Coffee group]

13 Simulation: Repeat Many Times!
[Figure: another random re-deal of the cards into Tea and Coffee groups]
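The shuffle-and-re-deal can be repeated many times by computer. Below is a minimal Python sketch of a randomization test, not the StatKey implementation: the function names, the 10,000 re-deals, and the seed are illustrative choices; "as extreme" is taken as one-sided (at least as large) because the question is whether tea boosts immunity; and the two 0 coffee values are inferred from n2 = 10 and the coffee mean of 17.70.

```python
import random

tea = [5, 11, 13, 18, 20, 47, 48, 52, 55, 56, 58]
coffee = [0, 0, 3, 11, 15, 16, 21, 21, 38, 52]  # two 0s inferred (assumption)


def diff_in_means(group1, group2):
    return sum(group1) / len(group1) - sum(group2) / len(group2)


def randomization_test(group1, group2, reps=10_000, seed=1):
    """One-sided randomization test of H0: no difference vs Ha: mean1 > mean2.

    Pool all responses, shuffle, re-deal into groups of the original sizes,
    and record the difference in means each time. The p-value is the
    proportion of re-deals at least as large as the observed difference.
    """
    rng = random.Random(seed)
    observed = diff_in_means(group1, group2)
    pooled = group1 + group2  # new list, so the inputs are not modified
    n1 = len(group1)
    hits = 0
    for _ in range(reps):
        rng.shuffle(pooled)  # shuffle the "cards"
        if diff_in_means(pooled[:n1], pooled[n1:]) >= observed - 1e-9:
            hits += 1  # small tolerance guards against floating-point ties
    return observed, hits / reps


observed, p_value = randomization_test(tea, coffee)
print(f"observed difference = {observed:.2f}, p-value ≈ {p_value:.3f}")
```

The estimated p-value should be in the same neighborhood as the 2.6% quoted for the randomization test, though the exact figure depends on the seed and the number of re-deals.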

14 StatKey
We need technology!
www.lock5stat.com/statkey
Free, easy to use, online (or offline as a Chrome app)

15 Randomization Test p-value
[Figure: distribution of the statistic if H0 is true, with the observed statistic marked; the p-value is the proportion of simulated statistics as extreme as the observed statistic]
If there were no difference between tea and coffee regarding immune system response, we would see results this extreme about 2.6% of the time.

16 p-value: The chance of obtaining a statistic as extreme as that observed, just by random chance, if the null hypothesis is true

17 Simulation-Based Inference
- Intrinsically connected to concepts
- Same procedure works for many statistics
- More generalizable (new statistics or designs)
- Minimal background knowledge needed
- Fewer conditions; conditions are transparent

18 Question #2: What is the average mercury level of fish (largemouth bass) in Florida lakes?

19 Mercury Levels in Fish
Lange, T., Royals, H. and Connor, L. (2004). Mercury accumulation in largemouth bass (Micropterus salmoides) in a Florida lake. Archives of Environmental Contamination and Toxicology, 27(4),

20 Getting a Margin of Error
Confidence Interval: statistic ± ME
[Figure: many samples drawn from the population; the statistic is calculated for each sample, giving the sampling distribution]
Standard Error (SE): standard deviation of the sampling distribution
Margin of Error (ME): for a 95% CI, ME = 2 × SE

21 Assessing Uncertainty
Key idea: how much do statistics vary from sample to sample? Problem? We can’t take lots of samples from the population!

22 Getting a Margin of Error
GOAL: statistic ± ME
[Figure: the population is unknown (???); the sample serves as our best guess at the population, many samples are drawn from it, and the statistic is calculated for each, giving the distribution of the statistic]
Standard Error (SE): standard deviation of the statistic
Margin of Error (ME): for a 95% CI, ME = 2 × SE

23 Bootstrapping
What is our best guess at the population, given the sample data? The sample itself!
Draw samples repeatedly from the sample data (of size n = 53)… with replacement! (bootstrapping)
Calculate the statistic for each bootstrap sample.
SE = standard deviation of these statistics
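The bootstrap steps above fit in a few lines of Python. This is an illustration under assumptions: `bootstrap_se`, the 5,000 resamples, and the seed are hypothetical choices, and because the 53 mercury values are not reproduced here, the tea-group responses from the immune-system experiment stand in as the sample.

```python
import random
from statistics import mean, stdev


def bootstrap_se(sample, statistic=mean, reps=5_000, seed=1):
    """Bootstrap standard error of `statistic` for one sample (hypothetical helper).

    Each bootstrap sample has the same size n as the original and is drawn
    with replacement; the SE is the standard deviation of the bootstrap
    statistics.
    """
    rng = random.Random(seed)
    n = len(sample)
    boot_stats = [statistic(rng.choices(sample, k=n)) for _ in range(reps)]
    return stdev(boot_stats)


# Illustration only: the 53 mercury values are not in this transcript,
# so the tea-group responses stand in as the sample.
tea = [5, 11, 13, 18, 20, 47, 48, 52, 55, 56, 58]
se = bootstrap_se(tea)
print(f"bootstrap SE of the mean ≈ {se:.2f}")
```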

24 We Need Technology!
StatKey: lock5stat.com/statkey
Rossman/Chance: rossmanchance.com/applets
iNZight: stat.auckland.ac.nz/~wild/iNZight
R: cran.r-project.org
RStudio: rstudio.com
Fathom: fathom.concord.org
TinkerPlots: tinkerplots.com
JMP: jmp.com
Minitab Express: minitab.com
StatCrunch: statcrunch.com

25 Mercury Levels in Fish
statistic ± 2 × SE
0.527 ± 2 × 0.047 = (0.433, 0.621)
95% Confidence Interval: We are 95% confident that the average mercury level in fish in Florida lakes is between 0.433 and 0.621 ppm.
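The interval on this slide is just statistic ± 2 × SE. A tiny Python helper (the name `ci_two_se` is hypothetical) makes the arithmetic explicit:

```python
def ci_two_se(statistic, se):
    """Approximate 95% confidence interval: statistic ± 2 × SE."""
    me = 2 * se  # margin of error
    return statistic - me, statistic + me


lo, hi = ci_two_se(0.527, 0.047)  # mean mercury (ppm) and its bootstrap SE
print(f"({lo:.3f}, {hi:.3f})")  # (0.433, 0.621)
```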

26 Same process for every parameter!
Estimate the margin of error and/or a confidence interval for...
- proportion ($p$)
- difference in means ($\mu_1 - \mu_2$)
- difference in proportions ($p_1 - p_2$)
- standard deviation ($\sigma$)
- correlation ($\rho$)
- ...
Sample with replacement, calculate the statistic, repeat...
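To illustrate "same process for every parameter", here is a Python sketch in which one hypothetical `bootstrap_se` helper handles both a difference in means and a standard deviation; only the statistic changes. For two groups it resamples each group separately at its own size, which is one common bootstrap choice for independent samples. The tea/coffee values stand in as data, with the two 0 coffee values inferred from n2 = 10 and the coffee mean of 17.70.

```python
import random
from statistics import mean, stdev

tea = [5, 11, 13, 18, 20, 47, 48, 52, 55, 56, 58]
coffee = [0, 0, 3, 11, 15, 16, 21, 21, 38, 52]  # two 0s inferred (assumption)


def bootstrap_se(groups, statistic, reps=5_000, seed=1):
    """Bootstrap SE for a statistic of one or more groups (hypothetical helper).

    Each group is resampled separately, with replacement and at its own size;
    `statistic` is then applied to the resampled groups.
    """
    rng = random.Random(seed)
    boot_stats = []
    for _ in range(reps):
        resampled = [rng.choices(g, k=len(g)) for g in groups]
        boot_stats.append(statistic(*resampled))
    return stdev(boot_stats)


def diff_means(a, b):
    return mean(a) - mean(b)


# Same procedure, two different parameters.
se_diff = bootstrap_se([tea, coffee], diff_means)  # mu1 - mu2
se_sd = bootstrap_se([tea], stdev)                 # sigma of the tea group

est = diff_means(tea, coffee)
print(f"difference in means: {est:.2f} ± {2 * se_diff:.2f}")
print(f"SD of tea responses: {stdev(tea):.2f} ± {2 * se_sd:.2f}")
```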

27 Mercury and pH in Lakes
For Florida lakes, what is the correlation between average mercury level (ppm) in fish taken from a lake and acidity (pH) of the lake?
r =
Give a 95% CI for ρ.
Lange, Royals, and Connor, Transactions of the American Fisheries Society (1993)
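For a correlation, the "sample with replacement" step resamples whole cases, so each lake's (pH, mercury) pair stays together. The sketch below assumes the statistic ± 2 × SE rule used earlier in the talk and runs on made-up illustrative values, not the Lange, Royals, and Connor data; `pearson_r` and `bootstrap_corr_ci` are hypothetical helper names.

```python
import random
from math import sqrt
from statistics import stdev


def pearson_r(xs, ys):
    """Sample correlation r; returns None if either variable is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return None
    return sxy / sqrt(sxx * syy)


def bootstrap_corr_ci(xs, ys, reps=5_000, seed=1):
    """Approximate 95% CI for rho via r ± 2 × SE, resampling (x, y) pairs."""
    rng = random.Random(seed)
    pairs = list(zip(xs, ys))
    boot_r = []
    for _ in range(reps):
        resample = rng.choices(pairs, k=len(pairs))  # resample whole lakes
        bx, by = zip(*resample)
        r = pearson_r(bx, by)
        if r is not None:  # skip rare resamples where r is undefined
            boot_r.append(r)
    r_obs = pearson_r(xs, ys)
    se = stdev(boot_r)
    return r_obs, (r_obs - 2 * se, r_obs + 2 * se)


# Hypothetical values for illustration only (NOT the Florida lakes data).
ph = [4.5, 5.2, 5.8, 6.3, 6.9, 7.4, 8.1, 8.6]
mercury = [1.10, 0.95, 0.80, 0.62, 0.55, 0.40, 0.31, 0.25]
r, (lo, hi) = bootstrap_corr_ci(ph, mercury)
print(f"r = {r:.3f}, 95% CI ≈ ({lo:.3f}, {hi:.3f})")
```

Note that r ± 2 × SE can spill past ±1 when r is near the boundary; a percentile interval taken from the bootstrap distribution is a common alternative in that case.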

28 Simulation-Based Inference
Students leave the course with…
- Better conceptual understanding (Tintle et al., JSE, 2011; Maurer and Lock, TISE, 2016)
- Better retention of concepts (Tintle et al., SERJ, 2012)
- Broader ability to apply what they have learned
- Familiarity with modern computationally intensive methods

29 Conceptual Understanding Scores on a National Assessment
p-value: averages of 43%, 60%, and 63%; national average: 47%

30 Student Behavior
Students were given data on the second midterm (right after learning about t-intervals!) and asked to compute a confidence interval for the mean.
How they created the interval:
Bootstrapping, t.test in R, Formula
84%, 8%

31 Common Core State Standards in Mathematics (High School)
- Use data from a sample survey to estimate a population mean or proportion; develop a margin of error through the use of simulation methods for random sampling.
- Use data from a randomized experiment to compare two treatments; use simulation to decide if differences between parameters are significant.

32 Traditional Methods Still There
- We use simulation-based inference to introduce inference and build understanding.
- We then cover traditional normal- and t-based methods and SE formulas (this goes quickly!).
- The CLT is easy to motivate after simulation.
- Testing and interval concepts are already there.
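One way to connect simulation to the traditional SE formulas is to compare a bootstrap SE with s/√n for the same sample. Below is a minimal Python sketch; the tea-group responses serve as an assumed example sample, and the seed and number of resamples are arbitrary choices.

```python
import random
from math import sqrt
from statistics import mean, stdev

tea = [5, 11, 13, 18, 20, 47, 48, 52, 55, 56, 58]  # example sample

# Simulation: bootstrap distribution of the sample mean.
rng = random.Random(1)
boot_means = [mean(rng.choices(tea, k=len(tea))) for _ in range(5_000)]
se_boot = stdev(boot_means)

# Traditional formula: SE of a sample mean = s / sqrt(n).
se_formula = stdev(tea) / sqrt(len(tea))

print(f"bootstrap SE ≈ {se_boot:.2f}, formula SE = {se_formula:.2f}")
```

The two values should be close. For a small sample like this one the bootstrap SE tends to run a little smaller, because resampling from the sample behaves like dividing by n rather than n − 1.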

33 Sir R. A. Fisher "Actually, the statistician does not carry out this very simple and very tedious process [the randomization test], but his conclusions have no justification beyond the fact that they agree with those which could have been arrived at by this elementary method." -- Sir R. A. Fisher, 1936

34 George Cobb “... the consensus curriculum is still an unwitting prisoner of history. What we teach is largely the technical machinery of numerical approximations based on the normal distribution and its many subsidiary cogs. This machinery was once necessary, because the conceptually simpler alternative based on permutations was computationally beyond our reach. Before computers statisticians had no choice. These days we have no excuse. Randomization-based inference makes a direct connection between data production and the logic of inference that deserves to be at the core of every introductory course.” -- Professor George Cobb, 2007

35 Want More?
Simulation-based inference blog: causeweb.org/sbi
Videos, presentations, and more: lock5stat.com
Recordings from the 2016 Electronic Conference on Teaching Statistics (eCOTS): causeweb.org/cause/ecots/ecots16
me:
