Fall 2018 Research Workshop Cindy Traub, PhD


R for Mere Mortals

What is R?
- “a language and environment for statistical computing and graphics” [https://www.r-project.org/about.html]
- Open source
- Extendable by independently written packages
- Addresses computational needs of different disciplines

Useful beyond traditional statistics
- Accessing and/or “cleaning” data
- Working with tabular data too large for Excel to handle
- Retrieving data from websites
- Analyzing text
- Creating visualizations, databases, maps
- …and much more

What we will learn about R today
- How do you get data in and out of R?
- Where does your data live? (Before, during, after your analysis)
- How can you “see” your data in R?
- How can you get help when you get stuck?

Where is R in the research cycle?
- Ask a Good Question
- Collect/Obtain Raw Data
- Clean and Prep Data
- Generate Statistical Results
Adapted in part from: “An introduction to data cleaning with R” (de Jonge and van der Loo, 2013)

Tiny Example: City/State/Zip What do you notice?

Tiny Example: City/State/Zip
- Missouri vs. MO
- St. vs. Saint
- Right data, wrong spot (CaseNo 5)
- Wrong data (zip in CaseNo 3)
Q: How would you find these errors in a “big” data set?
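One possible way to approach that question is sketched below. The `addresses` table is invented for illustration (the workshop's actual table is not reproduced in this transcript), and the "two capital letters" and "five digits" rules are assumptions about what a clean State and Zip should look like.

```r
# Hypothetical data, invented for illustration only.
addresses <- data.frame(
  CaseNo = 1:5,
  City   = c("St. Louis", "Saint Louis", "St. Louis", "Columbia", "MO"),
  State  = c("MO", "Missouri", "MO", "MO", "Columbia"),
  Zip    = c("63130", "63105", "6313", "65201", "65203"),
  stringsAsFactors = FALSE
)

# Listing or counting the distinct values exposes inconsistent spellings.
print(unique(addresses$City))
print(table(addresses$State))

# Assumed rule: a clean State is exactly two capital letters.
bad_state <- !grepl("^[A-Z]{2}$", addresses$State)

# Assumed rule: a clean Zip is exactly five digits.
bad_zip <- !grepl("^[0-9]{5}$", addresses$Zip)

# Keep only the rows that need a closer look.
suspect_rows <- addresses[bad_state | bad_zip, ]
print(suspect_rows)
```

The same checks scale to a large file, because they describe a rule rather than relying on someone scanning every row by eye.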

Simple tasks R does well:
- Read in (or write out) a csv file
- Display/count unique values in a column
- Select rows matching 1 or more conditions
- Aggregate (group together) data by 1 or more features
- Plot your data
- Split apart or glue together text strings
- Reproducibility (aka “show your work”) through scripts or R Markdown
- Can handle tabular data too big for Excel to open
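A minimal sketch of several of these tasks using only base R. The `sales` data, its column names, and the temporary file are all made up for illustration; plotting is left out so the example has no side effects beyond the temporary file.

```r
# Hypothetical data, invented for illustration only.
sales <- data.frame(
  region = c("East", "West", "East", "West", "North"),
  units  = c(10, 4, 7, 12, 5),
  stringsAsFactors = FALSE
)

# Write the data out to a csv file, then read it back in.
csv_path <- tempfile(fileext = ".csv")
write.csv(sales, csv_path, row.names = FALSE)
sales_in <- read.csv(csv_path, stringsAsFactors = FALSE)

# Display and count the unique values in a column.
regions <- unique(sales_in$region)
region_counts <- table(sales_in$region)

# Select rows matching more than one condition.
big_east <- sales_in[sales_in$region == "East" & sales_in$units > 8, ]

# Aggregate (group together) units by region.
units_by_region <- aggregate(units ~ region, data = sales_in, FUN = sum)

# Split apart and glue together text strings.
parts <- strsplit("St. Louis, MO", ", ")[[1]]
label <- paste(parts[1], parts[2], sep = " / ")

print(units_by_region)
print(label)
```

Saving these lines in a script is what makes the work reproducible: rerunning the file repeats every step in the same order.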

How does R compare to Excel?

Similarities
- Good with tabular data
- Can define new values based on old
- Can create charts
- Can take actions based on values
- Can filter/subset data based on attributes
- Many different functions available

Differences
- Hard to track/describe steps taken in Excel (winner for reproducibility: R)
- Entries are static in R, can be dynamic in Excel (winner: Excel)
- Capacity to handle large files (winner: R)
- “Fancy” formatting/visual display of tabular data (winner: Excel)

What does R look like?
Unlike traditional office productivity software, there are many ways to access and interact with the software:
- Command line
- Point-click graphical interface like R Studio
- Learning: many options
  - Datacamp, TryR
  - Coursera, swirl

What else can I do in R?
- CRAN Task Views: sorted by topic
- Look for vignettes
- Make a sound choice of technique

R Studio intro
- Please open R Studio on your machine.
- Go to http://libguides.wustl.edu/R to get files.
- Within R Studio, open sampleRcode.R
- Tips are included in the file gettingstaRted.pdf

Cleaning your data[frame]

Look at your data.
- Quantities as numbers (not text)?
- One column = one type of data?
- One row = one observation?
- Any typos/name standardization?
- Any extra characters?
- Values make sense in context?
- Create/consolidate any columns?
- Reshape (wide to tall or opposite)?
- Did dates read in correctly?

Useful commands:
- head(mydata)
- tail(mydata)
- str(mydata)
- summary(mydata)
- names(mydata)
- unique(mydata$col1)
- table(mydata$col1)
- mydata[row(s), col(s)]
- mydata$newCol_name <- …
- plot(mydata$col2)
- boxplot(mydata$col3)
- pairs(mydata)
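A short sketch of how a few of these commands might be combined on a small table. The contents of `mydata` are invented for illustration, and `col2_doubled` is a hypothetical new column name; the plotting commands are omitted here.

```r
# Hypothetical data, invented for illustration; "mydata" follows the slide's naming.
mydata <- data.frame(
  col1 = c("MO", "MO", "Missouri", "IL"),
  col2 = c("12", "15", "9", "20"),   # quantities that arrived as text
  stringsAsFactors = FALSE
)

# Look at the data first.
str(mydata)                  # col2 is reported as chr, not num
print(head(mydata, 2))       # first two rows
print(names(mydata))         # column names
print(table(mydata$col1))    # counts reveal "Missouri" vs. "MO"

# Quantities as numbers (not text): convert the column.
mydata$col2 <- as.numeric(mydata$col2)

# Name standardization: replace the long form with the abbreviation.
mydata$col1[mydata$col1 == "Missouri"] <- "MO"

# Create a new column from an existing one.
mydata$col2_doubled <- mydata$col2 * 2

# Index by row(s) and column(s): rows 1 to 2 of the column named col2.
first_two <- mydata[1:2, "col2"]

print(summary(mydata))
```

Checking `str()` before and after the conversion is a quick way to confirm that the column type actually changed.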

Thanks for coming! Slides are available on my R libguide: http://libguides.wustl.edu/R Any questions? Please complete the survey about today's session and sign the attendance sheet. Thanks! Contact Cindy at ct@wustl.edu

Bonus slides follow
- More common functions
- Other useful data tools
- Data structures in R

Common useful functions
Assuming "obj" is the name of a data frame, vector, etc.
- length(obj)
- str(obj)
- class(obj)
- names(obj)
- c(obj1, obj2, …)
- cbind(obj1, obj2, …)
- rbind(obj1, obj2, …)
- ls() (note: that is lowercase LS)
- dir()
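To show how the combining functions differ, here is a small sketch with two made-up numeric vectors; `obj1` and `obj2` stand in for whatever objects you have.

```r
# Two hypothetical vectors, invented for illustration.
obj1 <- c(1, 2, 3)
obj2 <- c(4, 5, 6)

print(length(obj1))   # number of elements
print(class(obj1))    # type of object

# c() joins the vectors end to end into one longer vector.
combined <- c(obj1, obj2)

# cbind() places them side by side as columns.
as_columns <- cbind(obj1, obj2)

# rbind() stacks them as rows.
as_rows <- rbind(obj1, obj2)

print(dim(as_columns))   # rows, then columns
print(dim(as_rows))

print(ls())    # names of objects in the current workspace
print(dir())   # files in the current working directory
```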

What can you store and where?

Data Types
- Numbers
  - Integers (1, 2, -15, etc.)
  - Decimal-valued (3.14, -2.9)
- Text
  - Character strings (“Atticus”, “C:/Users/Labuser/Desktop”)
- Boolean
  - TRUE or FALSE

Storage Objects
- Constant
- Vector
- List
- Matrix
- Data Frame
- Data Table
- Shapefile
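The sketch below creates one example of each data type and of the storage objects that base R provides. All variable names are made up for illustration. Data tables and shapefiles typically come from add-on packages, so they are not shown.

```r
# Data types (hypothetical values, for illustration only).
whole   <- 2L          # integer (the L suffix requests an integer)
decimal <- 3.14        # decimal-valued number
text    <- "Atticus"   # character string
flag    <- TRUE        # Boolean

print(c(class(whole), class(decimal), class(text), class(flag)))

# Storage objects.
v  <- c(3.14, -2.9)                       # vector: every element has the same type
l  <- list(name = "Atticus", n = 2L)      # list: elements may have different types
m  <- matrix(1:6, nrow = 2)               # matrix: 2 rows, 3 columns, one type
df <- data.frame(name  = c("a", "b"),     # data frame: columns may differ in type
                 value = c(1.5, 2.5),
                 stringsAsFactors = FALSE)

# Mixing types in a vector forces everything to a single common type.
mixed <- c(1, "a", TRUE)
print(class(mixed))
```

The last step is worth remembering when cleaning data: one stray text entry in a numeric column is enough to make the whole column text.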

Useful tools + technology for data
- R (Resources at http://libguides.wustl.edu/R)
- Excel (Data Validation, Filters)
- Python and Jupyter Notebooks (https://www.python.org, http://jupyter.org)
- OpenRefine (http://openrefine.org/)
- Gephi for network viz (https://gephi.org/)
- Mallet for NLP (http://mallet.cs.umass.edu/)
- D3 JS (https://d3js.org)