Teaching the statistical investigation process with simulation-based inference. Beth Chance (Cal Poly, San Luis Obispo) and Nathan Tintle (Dordt College)


Introductions  Beth  Nathan

Goals  What/why SBI? (11:00-11:30 ET)  One proportion examples and where to from here (11:30-11:50)  Q+A (11:50-12:00)  Two group simulation (12:00-12:15)  How to assess, and what does student performance look like? (12:15-12:35)  How to get started / get more information; Q+A (12:35-12:45)

Brief and select history of stat ed  Consensus approach for intro stats by late 1990s, but nexus in early 1980s  Descriptive Statistics  Probability/Design/Sampling Distributions  Inference (testing and intervals)  GAISE College Report (2005)  Six pedagogical suggestions for Stat 101: Conceptual understanding, Active learning, Real data, Statistical literacy and thinking, Use technology, and Use assessments for learning

Brief history of stat ed  No real pressure to change content  Major changes  Increased computational resources for data collection and analysis  Recognition of the utility of simulation to enhance student understanding of random processes  Assessment results illustrating that students don’t really (a) improve much pre to post-course on standardized tests of statistical thinking or (b) retain much from a typical introductory statistics course

Intro Stat as a Cathedral of Tweaks (a George Cobb analogy)  Boswell famously described his biographical subject Samuel Johnson as a “cathedral of tics” because of his gesticulations and tics.  Thesis: the usual normal-distribution-worshipping intro course is a cathedral of tweaks.

The orthodox doctrine  The orthodox doctrine is simple  Central limit theorem justifies use of normal distribution  If observed statistic is in the tails (>2SEs), reject null  Confidence interval is estimate +/- 2SEs

The Cathedral of Tweaks  One tower: z vs. t  If we know the population SD, we use z  If we estimate the SD, we use t… except for proportions, where we use z, not t, even when we estimate the SD…  …except when doing tests on proportions, where we use the null value in the SD

Still More Tweaks  Another tower: If your data set is not normal you may need to transform  Another tower: If you work with small samples there are guidelines for when you can use methods based on the normal, e.g., n > 30, or np > 5 and n(1-p) > 5

The consequence  Few students ever leave our course seeing statistics as this

The consequence  The better students may get a fuzzy impression

The consequence  All too many noses stay too close to the canvas, and see disconnected details

A potential solution?  ‘Simulation-based methods’ = simulation, bootstrapping and/or permutation tests (Alt: Resampling, Randomization, etc.)  Use of these methods to:  Estimate/approximate the null distribution for significance tests  Estimate/approximate the margin of error for confidence intervals

General trends  Momentum behind simulation-based approach to inference in last 8-10 years  Cobb 2005 talk (USCOTS)  Cobb 2007 paper (TISE)  2011 USCOTS: The Next Big Thing  Continued workshops, sessions – e.g., numerous at eCOTS!

General trends  Recent curricula  Lock5 (theory and randomization, more traditional sequence of topics)  Tintle et al. ISI (theory and randomization, four pillars of inference and then chapters based on type of data)  CATALST (emphasis on modelling)  OpenIntro  Others--- Statistical Reasoning in Sports (Tabor- geared to HS students)

General trends  Many sessions at conferences talking about approach, benefits, questions/concerns  Assessment: Multiple papers (Tintle et al. 2011, Tintle et al. 2012, Tintle et al. 2014, Chance et al. 2014, Swanson et al. 2014); Better on many things, do no harm on others; more papers coming

Simulating a single proportion

Set-up: Can dogs understand human cues?  A dog is shown two cups (on ground, 2.5 meters from dog) and then given a choice of which one to approach.  Before approaching the cups the researcher leans in one direction or the other  The dog (Harley) chooses the correct cup 9 out of 10 times  Is the dog ‘understanding’ the researcher?

Questions for students  What do you think?  Why?

In class dialogue  Probably ‘understanding’ the researcher  Assuming some things about the study design  Not always the same cup; same color/kinds of cups; object underneath doesn’t have a scent, etc.  Why ‘understanding the researcher’?  9 out of 10 is ‘convincing’  Why convincing?  Unlikely to happen by chance

In class dialogue  What about people not convinced? How would you convince them of your ‘gut feeling’ that 9 out of 10 is ‘rare’ and ‘not likely to happen by chance’  What would happen by chance is 5 or 6 or 4 or …  Flip a coin

In class tactile simulation  Flip coins  Students come to front and put dots on a dotplot  Illustrate that 9 out of 10 heads is rare -> confirming the intuition that 9 out of 10 correct is rare
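The tactile coin-flip simulation can also be mirrored in a few lines of code. This is only a sketch (the function name `simulate_null` and the rep count are our choices, not part of the classroom applet), but it captures the same logic: flip 10 fair coins many times and see how often 9 or more come up heads.

```python
import random

def simulate_null(n_trials=10, reps=10000, seed=1):
    """Simulate 'just guessing': n_trials 50/50 choices, repeated reps times.
    Returns the estimated probability of 9 or more correct by chance alone."""
    random.seed(seed)
    extreme = 0
    for _ in range(reps):
        correct = sum(random.random() < 0.5 for _ in range(n_trials))
        if correct >= 9:
            extreme += 1
    return extreme / reps

print(simulate_null())  # roughly 0.01 -- 9 of 10 is indeed rare under chance alone
```

The exact chance is 11/1024 (about 0.011), so the in-class dotplot, the code, and the theory all agree that 9-of-10 is convincing evidence.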

Applet  See rossmanchance.com (or our Wiley textbook site for Introduction to Statistical Investigations) for links to the applets  One Proportion applet demo

Take homes  Logic of inference very early in the class  No technical lingo  Follow-up with 6 out of 10. Mechanical arm points at a cup. Dog just guessing?

Another quick example  Eight of the last 10 patients with heart transplants at St. George’s Hospital died within 30 days. This made news because heart transplant surgeries were suspended pending an investigation  Historical national data show a ~15% 30-day mortality rate after heart transplant  What do you think? Would you suspend heart transplants at that hospital? Could there be another explanation?  How can we investigate the “random chance” explanation?

St. George’s  Simulation  Coin tossing?  Roll a die?  Spinner?  Observations  Where is the distribution centered?  Why is it not symmetric?  Do I care?  Where does 8 fall in this distribution?
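A spinner-style version of this simulation can be sketched in code (our own sketch, not the applet): each patient "spins" a 15% chance of death, and repeating this shows a null distribution centered near 1.5 deaths and right-skewed. Because 8-of-10 is so far in the tail that a simulation may never produce it, the sketch also computes the exact binomial tail probability.

```python
import math
import random
from collections import Counter

def null_distribution(n=10, p=0.15, reps=10000, seed=1):
    """Spin a 15% 'death' spinner for each of n patients, many times over.
    Returns counts of how often each number of deaths occurred."""
    random.seed(seed)
    return Counter(sum(random.random() < p for _ in range(n)) for _ in range(reps))

def exact_tail(k=8, n=10, p=0.15):
    """Exact binomial P(X >= k): results at least as extreme as 8 deaths."""
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

dist = null_distribution()
# The distribution is centered near n*p = 1.5 and is right-skewed, not symmetric.
print(max(dist, key=dist.get))  # most common outcome under chance (typically 1)
print(exact_tail())             # about 8.7e-06 -- 8 of 10 essentially never happens by chance
```

So "random chance" is an untenable explanation for the observed result, which is exactly the conclusion students reach from the simulated dotplot.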

Take homes  Follow-up: 71 out 361 patients at St. George’s died since 1986 (19.67%)

Take homes  Where do you go from here  P-value/null-alt hypothesis language  What impacts strength of evidence  Standardized statistics  Normal approx. to binomial (“Theory based approach” )  St. George’s  Process to population

Take homes  Have them design their own simulations for a while  Technology – not a black box; directly connects to tactile in class simulation  Contrast with traditional approach  Lots of probability scaffolding; abstract theory; disconnection from real data; technical language and notation, etc.  Less ‘spiraling’ and less opportunity to do inference (the main objective?)

More take homes  SBI  Integration of GAISE (content and pedagogy)  Keeping data front and center (e.g., 6 steps of inference)  Build on strong conceptual foundation of what inference is  Layer confidence intervals, generalizability and causation on top of this foundation  Through choice of examples they see many other important issues dealing with data collection and messy data, but always in the context of a full statistical investigation

Q+A

In our course…  Chapter 1 – simulating one proportion (logic of inference – significance testing)  Chapter 2- importance of random samples (scope of inference - generalizing) (one proportion)  Chapter 3- estimation (logic of inference - confidence intervals) (one proportion)  Chapter 4 – randomized experiments vs. observational studies (scope of inference – causation) (two groups)  Chapters 5-7 – comparing two groups (proportions, quant variable, paired)  Chapters 8-10 – comparing multiple groups/regression (association)

In our course…  Chapters 5-10  Focus on overall statistical process  Six steps  Integrated presentation of descriptive and inferential statistics  Shuffling to break the association  3S process: Statistic, Simulate, Strength of Evidence  Theory-based approaches predict ‘what would happen if you simulated’ (more or less) and have valid predictions if certain data conditions are met  Simplified versions of those conditions, can always verify with simulation!

Lingering effects of sleep deprivation  Participants were trained on a visual discrimination task on the computer and then half were not allowed to sleep that night. Everyone got as much sleep as they wanted on nights 2 and 3 and then the subjects were retested. The response variable is the improvement in their reaction times (positive values indicate how much faster they were on the task the second time)

Lingering effects of sleep deprivation  Key question: Could this have happened by random chance alone?  Now: randomness is from the random assignment in the experiment  So what do we need to know?  How does our statistic behave by random chance alone when there really is no treatment effect?  How can we simulate this?

Lingering effects of sleep deprivation  Key question: Could this have happened by random chance alone?  Students take 21 index cards and write down each improvement score  The cards are shuffled and 11 are dealt to be the “sleep deprived” group and the remaining 10 are the “unrestricted sleep” group  Assuming nothing special about which group you are assigned to, your outcome is not going to change, there is no treatment effect  After each shuffle we calculate the new statistic and produce a distribution of the different values of the statistic under this model

Lingering effects of sleep deprivation  Applet demo

Follow up  Compare the original (real) data with the shuffled (“fake”) data

Take home messages  Core logic of inference is the same  From this point on, practically a “downhill” slope  Standardized statistic is simply statistic/SE (SE from simulation)  “Quick and dirty” 95% CI is simply statistic +/- 2*SE (SE from simulation)  Alternative choices of statistic are nice and easy  “Why are we using the mean instead of the median if the median is better?”  Students are ‘ready’ to confront different situations  Theory-based approach is a convenient prediction when certain conditions are met – overlay of distribution
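The standardized statistic and the "quick and dirty" interval both fall out of the simulated null distribution with almost no extra machinery. A minimal sketch, assuming we already have a list of simulated null statistics (the toy normal draws and the helper name `summarize` are ours, for illustration only):

```python
import math
import random

def summarize(observed, simulated_stats):
    """Standardized statistic and 'quick and dirty' 95% CI, with the SE
    estimated as the standard deviation of the simulated null statistics."""
    reps = len(simulated_stats)
    mean = sum(simulated_stats) / reps
    se = math.sqrt(sum((s - mean) ** 2 for s in simulated_stats) / (reps - 1))
    z = observed / se                       # null distribution is centered at (about) 0
    ci = (observed - 2 * se, observed + 2 * se)
    return z, ci

# e.g. with a shuffled-difference null distribution (toy stand-in values):
random.seed(2)
null_stats = [random.gauss(0, 5) for _ in range(5000)]
z, ci = summarize(-15.9, null_stats)
print(round(z, 1), [round(x, 1) for x in ci])
```

The point for students: the SE is read off the simulation they just ran, so the standardized statistic and interval reinforce, rather than replace, the simulation logic.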

How do you do assessment?  May ask students to use applets on exam  Applets can be used on personal devices, most can be downloaded locally in advance  But not required  Can be asked to interpret results  Can be asked to design the simulation  Do ask more conceptual questions about logic and scope of inference  Interpretation of p-value

What kinds of questions do you ask?  Screen capture and fill in blanks / interpret output  “What values would you use in the applet to…”  “Which graph represents the null distribution?” (e.g., where is it centered)  “Circle the dots that represent the p-value” or “Indicate on the graph how to find the p-value”  “Based on the simulated null distribution, how strong is the evidence against the null hypothesis?”  What-if questions  Show a skewed simulated distribution and ask ‘what’s wrong’ with the theory-based p-value  How would the null distribution change if we increased the sample size?

Another example assessment question  Two different approaches were taken in order to yield a p-value.  Option # sets of 20 “coin tosses” were generated where the probability of heads was 10%. Out of the 1000 sets of tosses 129 sets had at least 4 heads occur, and so a p-value of is obtained, showing little evidence that more than 10% of Dordt students study more than 35 hours a week.  Option #2. The Theory-Based Inference applet was used, generating a z- score of 1.49 with a p-value of 0.068, yielding moderate evidence that more than 10% of Dordt students study more than 35 hours a week.

Another example assessment question  One Proportion applet results (Option #1)  Theory-Based Inference applet results (Option #2)  Student question: Briefly explain which p-value (Option #1 or Option #2) is more valid and why.
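To see why the two options disagree, both p-values can be computed directly. The exact binomial tail is what the 1000-set coin-toss simulation approximates, while the normal approximation is what the Theory-Based Inference applet reports (this sketch is ours, not the applet's internals):

```python
import math

n, p0, observed = 20, 0.10, 4   # 20 students, null proportion 10%, 4 'successes'

# Exact binomial tail P(X >= 4): what the coin-toss simulation is approximating
exact_p = sum(math.comb(n, k) * p0**k * (1 - p0)**(n - k)
              for k in range(observed, n + 1))

# Theory-based (normal approximation) p-value
p_hat = observed / n
z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)
theory_p = 0.5 * (1 - math.erf(z / math.sqrt(2)))  # upper-tail normal probability

print(round(exact_p, 3), round(z, 2), round(theory_p, 3))
# -> 0.133 1.49 0.068: the skewed null distribution makes the
#    normal approximation overstate the evidence here
```

The simulation's 0.129 sits right next to the exact 0.133, which is the intended answer: with a small sample and a small null proportion, the simulation-based p-value is the more valid one.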

Assessment results  Major assessment studies are underway  Evidence is mounting for  Improved student conceptual understanding of numerous inferential outcomes  “No harm” on other outcomes  For both stronger and weaker students  Regardless of institution or level of instructor experience with SBI

Dordt’s before and after story  Methods  Traditional curriculum (Moore 2010): 94 students; spring 2011  New curriculum (ISI, 2011 version): 155 students; fall 2011 and spring 2012  All students completed the 40-question CAOS test during the first week of the semester and again during the last week of the semester. Students were given course credit for completing the assessment test, but not for their performance, and the test was administered electronically outside of class.  Two instructors taught the course each semester, with one instructor the same each semester, and one different in spring 2011 than in fall 2011/spring 2012

Dordt’s before and after story  Overall performance  Very similar to Tintle et al. (2011) results at another institution  Approximately twice the gains using the new curriculum as compared to traditional (11.6% vs. 5.6%; p<0.001)  [Figure: pre-test and post-test score distributions]

Dordt’s before and after story – CAOS subscales (pretest → posttest means, with cohort comparison):
 Data Collection and Design: Random. 34.8% → 53.1% (diff. 18.2%); Tradition. 34.9% → 36.5% (diff. 1.6%); cohort p<0.001; 95% CI (9.2%, 23.9%)
 Descript. Statistics: Random. 55.1% → 61.1% (6.0%); Tradition. 53.5% → 69.6% (16.1%); 95% CI (-2.1%, -18.1%)
 Graphical Representations: Random. 55.8% → 64.4% (8.6%); Tradition. 58.5% → 60.9% (2.4%); 95% CI (0.6%, 11.4%)
 Boxplots: Random. 35.0% → 41.6% (6.6%); Tradition. 32.4% → 34.1% (1.6%); 95% CI (-2.3%, 12.3%)
 Bivariate Data: Random. 58.1% → 60.7% (2.6%); Tradition. 56.4% → 64.8% (8.4%); 95% CI (-13.3%, 1.6%)

Dordt’s before and after story – averages by topic (pretest → posttest means, with cohort comparison):
 Probability: Random. 31.9% → 56.5% (diff. 24.5%); Tradition. 32.4% → 35.2% (diff. 2.7%); cohort p<0.001; 95% CI (10.8%, 32.7%)
 Sampling Variability: Random. 36.7% → 39.4% (2.7%); Tradition. 38.7% → 43.5% (4.8%); 95% CI (-9.4%, 5.2%)
 Confidence Intervals: Random. 37.9% → 51.8% (13.9%); Tradition. 42.9% → 47.8% (4.9%); 95% CI (1.1%, 16.7%)
 Tests of Significance: Random. 46.1% → 70.0% (23.9%); Tradition. 50.0% → 60.6% (10.6%); p<0.001; 95% CI (6.6%, 19.9%)

Transferability  Fall 2013 and Spring 2014  22 different instructor-semesters  17 different instructors  12 different institutions  N=725; pre-post on 30-question ISI assessment (adapted from CAOS)  Many different instructional styles (traditional classroom, active learning pedagogy, computer lab, flipped classroom)  Many different institutions (high school, community college, large university, mid-sized university, small liberal arts college)

Transferability – overall  Similar findings to the authors’ institutions; significantly better overall post-course performance

Transferability – by subscale

Subscale | Pretest | Posttest | Diff. | Paired t-test p-value
Overall | 48.7% | 57.8% | 9.1% | <0.001
Data Collection and Design | 64.7% | 67.2% | 2.4% | 0.03
Descript. Statistics | 36.8% | 44.5% | 7.7% | <0.001
Graphical Representations | 50.9% | 59.0% | 8.1% | <0.001
Probability | 35.8% | 47.2% | 11.4% | <0.001
Sampling Variability | 20.9% | 24.8% | 4.0% |
CIs | 52.7% | 64.2% | 11.5% | <0.001
Tests of Sig. | 58.7% | 70.5% | 11.8% | <0.001

(‘13-’14; ‘14-’15) data  Student and instructor variables; within-section clustering; etc.

Thinking about student ability levels  Better only for weak students? Only for strong students?

Pre- and post-course conceptual understanding stratified by ACT score

ACT Group | Curriculum | Pre-test Mean (SD) | Post-test Mean (SD) | Change Mean (SD) | Difference in curriculum means
Low | Consensus (n=21) | 41.7 (10.2) | 46.3 (10.1) | 4.0 (11.7) | 8.2***
Low | Early-SBI (n=55) | 42.7 (10.1) | 54.9 (11.9) | 12.2 (10.5)*** |
Middle | Consensus (n=34) | 46.0 (8.2) | 52.4 (10.3) | 6.5 (9.2)*** | 4.7*
Middle | Early-SBI (n=48) | 43.4 (10.0) | 55.1 (10.8) | 11.2 (11.4)*** |
High | Consensus (n=36) | 51.3 (7.7) | 57.1 (7.7) | 5.8 (9.2)** | 6.0*
High | Early-SBI (n=49) | 47.8 (9.8) | 59.5 (12.0) | 11.8 (10.1)*** |
Overall | Consensus (n=91) | 46.4 (9.3) | 52.0 (11.0) | 5.6 (9.8) | 6.0***
Overall | Early-SBI (n=152) | 44.9 (10.1) | 56.5 (11.6) | 11.6 (10.7) |

Pre- and post-course conceptual understanding stratified by pre-course performance among SBI students (n=1078)

How grouped | Grouping (n) | Pre-test Mean (SD) | Post-test Mean (SD) | Change Mean (SD)
Pre-test concept score | Less-preparation (291) | 35.0 (5.0) | 49.6 (12.2) | 14.7 (12.4)***
Pre-test concept score | Typical preparation (586) | 50.9 (5.6) | 57.7 (12.4) | 6.8 (12.1)***
Pre-test concept score | Higher preparation (201) | 68.9 (6.4) | 73.3 (10.7) | 4.3 (9.6)***
Self-reported college GPA | Weaker (193) | 45.6 (12.3) | 52.9 (13.1) | 7.3 (11.8)***
Self-reported college GPA | Typical (654) | 50.0 (12.0) | 58.1 (13.6) | 8.1 (12.6)***
Self-reported college GPA | Stronger (231) | 53.8 (13.6) | 64.9 (14.7) | 11.1 (12.2)***
Overall | | 50.0 (12.6) | 58.6 (14.3) | 8.6 (12.5)***

Pre- and post-course conceptual understanding by subscale – less prepared and/or weaker SBI students

Subscale | Grouping | Pre-test Mean (SD) | Post-test Mean (SD) | Change Mean (SD)
Graphical Representations | Pre-test | 30.5 (18.4) | 46.5 (20.9) | 15.8 (24.4)***
Graphical Representations | GPA | 44.0 (25.1) | 51.3 (25.0) | 6.5 (25.9)**
Data collection and design | Pre-test | 50.9 (21.2) | 56.6 (23.9) | 5.2 (31.6)**
Data collection and design | GPA | 62.6 (23.5) | 60.4 (25.9) | -2.6 (33.4)
Descriptive statistics | Pre-test | 17.2 (27.2) | 31.8 (35.1) | 14.9 (43.6)***
Descriptive statistics | GPA | 31.3 (34.8) | 36.1 (33.4) | 4.4 (43.0)
Tests of significance | Pre-test | 40.7 (13.6) | 57.5 (17.4) | 16.6 (21.9)***
Tests of significance | GPA | 49.1 (17.0) | 60.0 (17.3) | 10.7 (21.4)***
Confidence Intervals | Pre-test | 33.4 (16.2) | 50.2 (22.7) | 17.1 (25.8)***
Confidence Intervals | GPA | 40.0 (18.5) | 50.4 (23.9) | 10.7 (25.6)***
Sampling Variability | Pre-test | 28.0 (29.3) | 40.6 (34.8) | 12.6 (44.6)***
Sampling Variability | GPA | 44.8 (35.3) | 44.4 (39.2) | -0.0 (46.4)
Probability/Simulation | Pre-test | 19.9 (28.4) | 38.7 (34.9) | 18.2 (44.4)***
Probability/Simulation | GPA | 29.8 (31.2) | 41.9 (36.4) | 10.9 (41.1)***

Discussion  What we know  Increasing interest in the approach  Including high school and the Common Core State Standards  The ISI versions of the curriculum (early, middle and current) have demonstrated  Improved learning gains and retention in logic and scope of inference compared to the traditional curriculum at the same institutions  These results appear to translate reasonably well to other institutions  ‘Do no harm’ in descriptive statistics and other areas  Preliminary evidence that the more SBI you do, the more beneficial the effect (analysis ongoing)

Discussion  What we don’t know  Pedagogy? Content? Spiraling?  Conflated!  What you should ‘take’ and what you can ‘leave’; student learning trajectories  Key instructor/institutional requirements for success  How the approach can be improved even further for greater success

Our plans…  Assessment initiative  Pre- and post-course concepts and attitudes; common exam questions  Goal: What works, what doesn’t; comparisons by institution, instructor, style, etc. Individualized instructor reports to learn about your own students’ outcomes  First edition of the ISI curriculum available via Wiley (sample materials; applets)  Continued conversation  Blog and listserv re: teaching with SBI  Numerous articles and FAQ on the blog  Upcoming longer workshops  Philadelphia (June 4-5), Atlanta GA (July 7-9), JSM (July 30), Mathfest (August 5-6), AMATYC (Nov 17-20)

Q+A  Getting started: pilots; on-site training, etc.  Convincing colleagues (in and out of the department)  Technology integration (applets; stat package)  Large class sizes  Why so much time on proportions and not quantitative variables?  What about bootstrapping?

Acknowledgments  Entire ISI Team (Tintle, Chance, Cobb, Rossman, Roy, Swanson and VanderStoep)  Funding: NSF (DUE and DUE ), Wiley, other funding agencies (HHMI; Teagle Foundation, etc.)

References
 Cobb, G. (2007). The introductory statistics course: A Ptolemaic curriculum? Technology Innovations in Statistics Education, 1(1).
 delMas, R., Garfield, J., Ooms, A., & Chance, B. (2007). Assessing students’ conceptual understanding after a first course in statistics. Statistics Education Research Journal, 6(2).
 Holcomb, J., Chance, B., Rossman, A., & Cobb, G. (2010a). Assessing student learning about statistical inference. Proceedings of the 8th International Conference on Teaching Statistics.
 Holcomb, J., Chance, B., Rossman, A., Tietjen, E., & Cobb, G. (2010b). Introducing concepts of statistical inference via randomization tests. Proceedings of the 8th International Conference on Teaching Statistics.
 Tintle, N., Chance, B., Cobb, G., Rossman, A., Roy, S., Swanson, T., & VanderStoep, J. (2016). Introduction to Statistical Investigations. Hoboken, NJ: John Wiley and Sons.
 Tintle, N., VanderStoep, J., Holmes, V-L., Quisenberry, B., & Swanson, T. (2011). Development and assessment of a preliminary randomization-based introductory statistics curriculum. Journal of Statistics Education, 19(1).
 Tintle, N., Topliff, K., VanderStoep, J., Holmes, V-L., & Swanson, T. (2012). Retention of statistical concepts in a preliminary randomization-based introductory statistics curriculum. Statistics Education Research Journal, 11(1).