Adventures in teaching and learning data analysis with R
Bill Sundstrom and Michael Kevane
Department of Economics
October 9, 2017
Agenda
What is R?
Arguments for and against using R in teaching
How we use R in teaching ECON 41/42: Data Analysis and Econometrics
Looking forward: How we could adopt R as a standard across the university and make it work
Illustrations of the power and beauty of R
What is R?
“R is a free software environment for statistical computing and graphics.”
Large user community: statisticians, academics, business users
Open source
“The current R is the result of a collaborative effort with contributions from all over the world. R was initially written by Robert Gentleman and Ross Ihaka—also known as ‘R & R’ of the Statistics Department of the University of Auckland. Since mid-1997 there has been a core group with write access to the R source.”
Bay Area R Users Group
Why teach with R?
Instead of Excel, SPSS, or Stata
Powerful: R can handle virtually any statistical or graphical task
Open source: free
Open source: dynamic, constantly extended by user-contributed packages
Script-based: replicability; collaborative verification and troubleshooting; portability
Not easy, but not as hard as it looks
R is a great skill for our students!
Essentials of R
Download R
Download RStudio (interface)
Script-based language
Easy to read in .csv files and data files from the Internet
ggplot() function (from the ggplot2 package) for graphing
Basic statistics with very simple commands
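A minimal sketch of these essentials, assuming a small made-up data set: the variables id, educ, and wage are hypothetical and are read from inline text so the script is self-contained. With real data you would pass read.csv() a file path or a URL instead. The ggplot2 part runs only if that package is installed.

```r
# Hypothetical data, read from inline text so the example is self-contained;
# with a real file you would write read.csv("mydata.csv") (hypothetical name)
# or pass a URL to read.csv()
csv_text <- "id,educ,wage
1,12,15.5
2,16,24.0
3,14,19.0
4,18,30.5"
df <- read.csv(text = csv_text)

# Basic statistics are one-liners
summary(df)
mean(df$wage)
sd(df$wage)
cor(df$educ, df$wage)

# ggplot() comes from the ggplot2 package: install.packages("ggplot2") once
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(df, aes(x = educ, y = wage)) +
    geom_point() +
    labs(x = "Years of education", y = "Hourly wage")
  # print(p) draws the scatter plot in the plot window
}
```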
Common concerns
“It’s too hard”
“It’s too easy”
“It’s too anarchic”
“It’s too hard”
Command-driven rather than pull-down menus
Scripts (command files)
BUT: These are virtues!
Most R commands are actually intuitive
Scripts are important: more efficient in the long run; documentation, replication, robustness/sensitivity checks, sharing, algorithmic thinking
Intuitive: For instance, suppose you want a new variable equal to the sample median of a variable… just do it.
Not just for regressions. Try replicating Excel plots with new data…
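To illustrate “just do it”: a sketch using a hypothetical data frame df with an illustrative variable wage. The sample median becomes a new column with a single assignment, because R recycles the scalar to every row.

```r
# Hypothetical data frame with an illustrative variable 'wage'
df <- data.frame(wage = c(12, 18, 25, 40, 9))

# New variable equal to the sample median (the scalar is recycled to every row)
df$wage_median <- median(df$wage)

# Just as direct: a logical indicator for above-median observations
df$above_median <- df$wage > df$wage_median

df
```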
“It’s too easy”
The availability of canned routines and packages makes it too easy to use techniques without understanding them.
BUT: This is a problem with any statistical package
You can do everything from scratch in R! (Example: OLS)
Run the OLS script…
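A sketch of what an OLS-from-scratch script might look like (not necessarily the script used in class), using simulated data: it computes beta_hat = (X'X)^(-1) X'y and the conventional standard errors with matrix algebra, then checks them against the canned lm() routine.

```r
# Simulated data so the example is reproducible
set.seed(42)
n <- 100
x <- rnorm(n)
y <- 2 + 3 * x + rnorm(n)          # simulated: true intercept 2, slope 3

# Design matrix: a column of ones for the intercept, then x
X <- cbind(1, x)

# OLS estimator: beta_hat = (X'X)^(-1) X'y
XtX_inv  <- solve(t(X) %*% X)
beta_hat <- XtX_inv %*% t(X) %*% y

# Residuals, error variance estimate, and conventional standard errors
e      <- y - X %*% beta_hat
sigma2 <- sum(e^2) / (n - ncol(X))
se     <- sqrt(diag(sigma2 * XtX_inv))

cbind(estimate = drop(beta_hat), std_error = se)

# The canned routine gives the same numbers
fit <- lm(y ~ x)
coef(summary(fit))[, c("Estimate", "Std. Error")]
```

Explicitly inverting X'X keeps the link to the textbook formula visible; lm() itself works through a QR decomposition, which is numerically more stable.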
“It’s too anarchic”
The Wikipedia diss: How can we trust open source?
R is kind of a “wild west” compared with Stata or Excel
Often multiple ways to do the same thing
Can we trust the user community?
BUT: There are many sophisticated users vetting the most common routines and packages.
Welcome to Silicon Valley!
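As an example of “multiple ways to do the same thing,” here are three base-R routes to group means on a hypothetical data frame. They return different object types (a named array, a data frame, a named vector) but the same numbers; packages such as dplyr offer yet another route.

```r
# Hypothetical data: wages in two groups
df <- data.frame(
  group = c("a", "a", "b", "b", "b"),
  wage  = c(10, 14, 20, 22, 24)
)

# Way 1: tapply() returns a named one-dimensional array
m1 <- tapply(df$wage, df$group, mean)

# Way 2: aggregate() with a formula returns a data frame
m2 <- aggregate(wage ~ group, data = df, FUN = mean)

# Way 3: split() the vector by group, then sapply() the mean
m3 <- sapply(split(df$wage, df$group), mean)

m1
m2
m3
```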
Genuine issues
If we teach with R, downstream instructors will need to learn it…
… but it’s easy if you know any other stat software
How to make it even easier: more adopters… tipping point
Stata’s data management seems better than R’s.
Will talk about Michael’s project in a bit.
How we make it work
ECON 41: Data Analysis and Econometrics
Econ majors now take this in place of OMIS 41
ECON 42: Data Analysis Applications
2-unit lab course: R-based
Specific challenges
Many students have little programming exposure
Most of our students don’t know much about their own computers, e.g., folder structure and where their files are
If R is taught in the context of a statistics course, students face a double challenge: mastering both difficult statistical concepts and the software
Hence the lab section is extremely helpful: it partitions the cognitive load and creates a low-stress environment for learning the software
In the lab
We do not try to teach coding from scratch
We do insist that students run R from scripts
We provide sample scripts and tutorials for all the basic procedures they will need
Troubleshooting for the first two weeks, then running regressions and interpreting results
Culminates in a short data analysis project
The sample scripts accompany tutorials in our Guide to R. The lab follows ECON 41 concepts closely. Example: tutorial_4 replication of table in textbook. Explain project.
Examples
Tutorial #6: Replicate a table from their textbook
Data analysis project: Replicate and reevaluate a classic economics article using up-to-date data
Barry Chiswick, “The Effect of Americanization on the Earnings of Foreign-Born Men” (JPE, 1978)
Continuing work…
Guide to R
Community of practice, Fall 2017: Mondays 4:00-5:15
Econ major data analysis concentration
Psychology switching to R for intro stats
Goals: a help desk in the library staffed by excellent, well-trained student assistants
It would also be nice to support a drop-in tutoring center employing top students.