Adventures in teaching and learning data analysis with R

Bill Sundstrom and Michael Kevane
Department of Economics
October 9, 2017

Agenda
- What is R?
- Arguments for and against using R in teaching
- How we use R in teaching ECON 41/42: Data Analysis and Econometrics
- Looking forward: How we could adopt R as a standard across the university and make it work
- Illustrations of the power and beauty of R

What is R?
- “R is a free software environment for statistical computing and graphics.”
- Large user community: statisticians, academics, business users
- Open source: “The current R is the result of a collaborative effort with contributions from all over the world. R was initially written by Robert Gentleman and Ross Ihaka, also known as "R & R" of the Statistics Department of the University of Auckland. Since mid-1997 there has been a core group with write access to the R source.”
- Bay Area R Users Group

Why teach with R? (Instead of Excel, SPSS, Stata)
- Powerful: R does everything
- Open-source: Free
- Open-source: Dynamic
- Script based: Replicability; collaborative verification and trouble-shooting; portability
- Not easy, but not as hard as it looks
- R is a great skill for our students!

Essentials of R
- Download R
- Download RStudio (interface)
- Script-based language
- Easy to read in .csv files and data files off the Internet
- ggplot command for graphing
- Basic statistics: very simple commands
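
As a minimal sketch of these essentials, the script below writes a small made-up data set to a temporary .csv file, reads it back in, computes a few basic statistics, and draws a scatter plot with ggplot2 if that package happens to be installed. The variable names (educ, wage) and all values are invented for illustration and do not come from the course materials.

```r
# Illustrative data only: the variables and values are made up.
csv_path <- tempfile(fileext = ".csv")
write.csv(
  data.frame(educ = c(10, 12, 12, 14, 16, 16, 18),
             wage = c(11, 14, 15, 18, 24, 22, 30)),
  csv_path, row.names = FALSE
)

# Read the .csv file; read.csv() also accepts a URL in place of a local path.
wages <- read.csv(csv_path)

# Basic statistics are one-line commands.
mean_wage <- mean(wages$wage)
sd_wage <- sd(wages$wage)
print(summary(wages))

# Graphing with ggplot() from the ggplot2 package (skipped if not installed).
if (requireNamespace("ggplot2", quietly = TRUE)) {
  p <- ggplot2::ggplot(wages, ggplot2::aes(x = educ, y = wage)) +
    ggplot2::geom_point() +
    ggplot2::labs(x = "Years of education", y = "Hourly wage")
  print(p)
}
```

Saving these lines as a script and running it from RStudio gives students a repeatable record of every step, which is the point of the script-based approach.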

Common concerns
- “It’s too hard”
- “It’s too easy”
- “It’s too anarchic”

“It’s too hard”
- Command-driven rather than pull-down menus
- Scripts (command files)
- BUT: These are virtues!
- Most R commands are actually intuitive
- Scripts are important: more efficient in the long run, documentation, replication, robustness/sensitivity, sharing, algorithmic thinking

Intuitive: For instance, suppose you want a new variable equal to the sample median of a variable… just do it. Not just for regressions. Try replicating Excel plots with new data…
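
The "just do it" point about the sample median can be sketched in a couple of lines. The data frame and the variable name wage are hypothetical, chosen only to make the example self-contained.

```r
# Illustrative data only: 'wage' is a hypothetical variable.
wages <- data.frame(wage = c(11, 14, 15, 18, 24, 22, 30))

# New variable equal to the sample median of wage (recycled to every row).
wages$median_wage <- median(wages$wage)

# A follow-on variable: is each observation above the median?
wages$above_median <- wages$wage > wages$median_wage
print(wages)
```

Because the steps live in a script, rerunning them on new data means changing only the line that defines the data.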

“It’s too easy”
- The availability of canned routines and packages makes it too easy to use techniques without understanding them.
- BUT: This is a problem with any statistical package
- You can do everything from scratch in R! (Example: OLS)

Run the OLS script…
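
The OLS script shown in the talk is not reproduced in this transcript. As a rough sketch of what "from scratch" can look like, the example below computes the OLS coefficients and standard errors with matrix algebra, using the usual formula $\hat\beta = (X'X)^{-1}X'y$, and compares them with the canned lm() routine. The simulated data and the variable names are assumptions made for illustration.

```r
# Simulated, illustrative data: the "true" model here is invented.
set.seed(42)
n <- 200
educ <- runif(n, 8, 20)
wage <- 2 + 1.5 * educ + rnorm(n, sd = 3)

# Design matrix with a column of ones for the intercept.
X <- cbind(Intercept = 1, educ = educ)
y <- wage

# OLS coefficients: solve (X'X) b = X'y rather than inverting explicitly.
beta_hat <- solve(t(X) %*% X, t(X) %*% y)

# Residuals, error-variance estimate, and conventional standard errors.
u_hat <- y - X %*% beta_hat
sigma2_hat <- sum(u_hat^2) / (n - ncol(X))
se_hat <- sqrt(diag(sigma2_hat * solve(t(X) %*% X)))

# Compare with the canned routine.
fit <- lm(wage ~ educ)
print(cbind(by_hand = drop(beta_hat), lm = coef(fit)))
print(cbind(by_hand = se_hat, lm = summary(fit)$coefficients[, "Std. Error"]))
```

The two columns should agree up to numerical precision, which lets students see that lm() is doing nothing more mysterious than the textbook formula.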

“It’s too anarchic”
- The Wikipedia diss: How can we trust open source?
- R is kind of “wild west” compared with Stata or Excel
- Often multiple ways to do the same thing
- Can we trust the user community?
- BUT: There are many sophisticated users vetting the most common routines and packages.
- Welcome to Silicon Valley!

Genuine issues
- If we teach with R, downstream instructors will need to learn it…
- … but it’s easy if you know any other stat software
- How to make it even easier: More adopters… tipping point
- Stata data management seems better.

Will talk about Michael’s project in a bit.

How we make it work
- ECON 41: Data Analysis and Econometrics
  - Econ majors now take this in place of OMIS 41
- ECON 42: Data Analysis Applications
  - 2-unit lab course: R-based

Specific challenges
- Many students have little programming exposure
- Most of our students don’t know much about their own computers, e.g., folder structure and where their files are
- If R is taught in the context of a statistics course, students face the double challenge of mastering both difficult statistical concepts and the software
- Hence the lab section is extremely helpful: it partitions the cognitive load and creates a low-stress environment for learning the software

In the lab
- We do not try to teach coding from scratch
- We do insist that students run R from scripts
- We provide sample scripts and tutorials for all the basic procedures they will need
- Trouble-shooting for the first two weeks, then running regressions and interpreting results
- Culminates in a short data analysis project

The sample scripts accompany tutorials in our Guide to R. The lab follows ECON 41 concepts closely. Example: tutorial_4 replication of table in textbook. Explain project.

Examples
- Tutorial #6: Replicate a table from their textbook
- Data analysis project: Replicate and reevaluate a classic economics article using up-to-date data
  - Barry Chiswick, “The Effect of Americanization on the Earnings of Foreign-Born Men” (JPE, 1978)

Continuing work…
- Guide to R
- Community of practice, Fall 17: Mondays 4:00-5:15
- Econ major data analysis concentration
- Psychology switching to R for intro stats
- Goals: Help desk in library with excellent and trained student assistants
- Also, it would be nice to support a drop-in tutoring center employing top students.