Introduction to R and RStudio


Introduction to R and RStudio Jeff Witmer 9 March 2016

R is
- A software package for statistical computing and graphics
- A collection of 6,700 packages (as of June 2015, so more now)
- A (not ideal) programming language
- A work environment
- Widely used
- Powerful
- Free
R is an interpreted language, but much of it is written in compiled C code.

Some history
- S was developed at Bell Labs, starting in the 1970s
- R was created in the 1990s by Ross Ihaka and Robert Gentleman
- R was based on S, with code written in C
- S largely was used to make good graphs – not an easy thing in 1975. R, like S, is quite good for graphing.
For lots of examples, see http://rgraphgallery.blogspot.com/ or http://www.r-graph-gallery.com/
See ggplot2-cheatsheet-2.0.pdf (or, for more detail, see http://docs.ggplot2.org/current/)

A few simple graphs using the ggplot2 package

An example of graphing using the GGally package in R

Who uses R?

RStudio is
- An Integrated Development Environment (IDE) for R
- A gift, from J.J. Allaire (Macalester College, ’91) to the world
- An easy (easier) way to use R
- Available as a desktop product or, as used at OC, run off of a file server
- Free – unless you want the newest version, with more bells and whistles, and you are not eligible for the educational discount (= free)
RStudio supports RPubs – see http://rpubs.com/jawitmer

RStudio screen shot

R is object-oriented
e.g.,
    MyModel <- lm(wt ~ ht, data = mydata)
then
    hist(MyModel$residuals)
Note: lm(wt ~ ht*age + log(bp), data = mydata) regresses wt on ht, age, the ht-by-age interaction, and log(bp). There is no need to create the interaction or the log(bp) variable outside of the lm() command.
Comparing nested models:
    mod1 <- lm(wt ~ ht*age + log(bp), data = mydata)
    mod2 <- lm(wt ~ ht + log(bp), data = mydata)
    anova(mod2, mod1)    #gives a nested F-test
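A minimal sketch of the nested-model comparison that can be run as-is. The slide's `mydata` is not available here, so the built-in `mtcars` data frame stands in for it (an assumption made only for illustration); the variable roles are otherwise arbitrary.

```r
# mtcars (built into R) stands in for the slide's `mydata`
full    <- lm(mpg ~ wt * hp + log(disp), data = mtcars)  # wt, hp, wt:hp, log(disp)
reduced <- lm(mpg ~ wt + log(disp), data = mtcars)       # drops hp and wt:hp

# A fitted model is an object; its pieces are reached with $ or accessors
resids     <- residuals(full)      # same values as full$residuals
term_names <- names(coef(full))    # the interaction was created by lm() itself

# Nested F-test: do hp and the wt:hp interaction improve on the reduced model?
cmp <- anova(reduced, full)
print(cmp)
```

The smaller model goes first in `anova()`, so the second row of the table reports the F-test for the extra terms.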

R as a programming language
If you want R to be (relatively) fast, take advantage of vector operations; e.g., use the replicate command (rather than a loop) or the tapply function.
E.g., replicate(n=25, addingLines(n=10)) calls the addingLines function (something I wrote) 25 times.
    > with(Dabbs, tapply(testosterone, occupation, mean))
       Actor       MD Minister     Prof
        12.7     11.6      8.4     10.6
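The two styles can be compared side by side in a short sketch. `one_run()` is a hypothetical stand-in for the author's `addingLines()` function, and `mtcars` replaces the `Dabbs` data, which is not available here. The sketch shows the shape of the code only and makes no timing claim.

```r
set.seed(1)

# Hypothetical stand-in for addingLines(): returns one simulated number
one_run <- function(n) mean(rnorm(n))

# Loop version: preallocate a result vector, then fill it
loop_out <- numeric(25)
for (i in seq_len(25)) loop_out[i] <- one_run(n = 10)

# replicate() version: evaluate the expression 25 times and collect the results
rep_out <- replicate(n = 25, one_run(n = 10))

# tapply(): apply mean() to mpg within each level of cyl
group_means <- tapply(mtcars$mpg, mtcars$cyl, mean)
print(group_means)
```

Note that the first argument of `replicate()` is named `n`; the expression passed as the second argument is re-evaluated on every repetition.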

If you want to know how to do something in R
- See the “Minimal R.pdf” handout
- Go to the Quick-R.com page (http://www.statmethods.net/)
- Google “How do I do xxx in R?” A standing joke among R users is that the answer is always “There are many ways to do that in R.”
- See http://swirlstats.com/
- See https://www.datacamp.com/home

Speaking of many ways to do something in R…
(1) mean(mydata$ht)
(2) with(mydata, mean(ht))
(3) mean(ht, data=mydata)    (not base R, whose mean() has no data argument; this form presumably relies on the mosaic package)
However
(1) plot(mydata$ht, mydata$wt) works
(2) with(mydata, plot(ht, wt)) works
(3) plot(ht, wt, data=mydata) does not work
(3a) plot(wt~ht, data=mydata) works
plot(wt~ht, data=mydata) feeds the plot command a formula, whereas plot(ht, wt, data=mydata) doesn’t.
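The working and failing forms can be checked directly in base R. This is a sketch with a made-up four-row `mydata`; form (3) of `mean()` is left out because base R does not provide it. The failing `plot()` call is wrapped in `tryCatch()` so the script keeps running.

```r
# Made-up values, for illustration only
mydata <- data.frame(ht = c(150, 160, 170, 180), wt = c(50, 58, 69, 80))

m1 <- mean(mydata$ht)          # (1) pull the column out with $
m2 <- with(mydata, mean(ht))   # (2) evaluate the call inside the data frame

pdf(NULL)  # null graphics device, so nothing is drawn to screen or file
plot(mydata$ht, mydata$wt)     # works
with(mydata, plot(ht, wt))     # works
plot(wt ~ ht, data = mydata)   # works: the formula method looks names up in `data`

# Bare names are evaluated before `data` is ever consulted, so this fails
bare <- tryCatch(plot(ht, wt, data = mydata),
                 error = function(e) conditionMessage(e))
invisible(dev.off())

print(bare)
```

The difference is that a formula is passed along unevaluated, so the plotting method can resolve `wt` and `ht` inside `data`, while bare `ht` is evaluated immediately in the calling environment.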

The mosaic package (Kaplan, Pruim, Horton) was created to make R easy to use for intro stats.
mosaic package syntax: goal(y ~ x|z, data=mydata)
E.g.: tally(~sex, data=HELPrct)
E.g.: t.test(age ~ sex, data=HELPrct)
E.g.: t.test(age ~ sex, data=HELPrct)$p.value
E.g.: favstats(age ~ substance|sex, data=HELPrct)
See MinimalR-2pages.pdf
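The formula-plus-data pattern is not unique to mosaic; several base R functions accept it too. The sketch below uses only base R and built-in data sets (`mtcars`, `sleep`, `ToothGrowth`) rather than mosaic or `HELPrct`, so it illustrates the pattern, not the mosaic functions themselves.

```r
# Counts of one variable, written as ~x (compare tally(~sex, ...))
counts <- xtabs(~ cyl, data = mtcars)

# Response by group, written as y ~ x
p_val <- t.test(extra ~ group, data = sleep)$p.value

# Summaries of y within groups defined by two variables
cell_means <- aggregate(len ~ supp + dose, data = ToothGrowth, FUN = mean)

print(counts)
print(cell_means)
```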

The mosaic package mPlot() command makes graphing easy. mPlot(SaratogaHouses)

The openintro package edaPlot() command makes exploring data graphically easy to do. edaPlot(SaratogaHouses)

The mosaic, tidyr, and dplyr packages handle SQL-type work: merging files, extracting subsets, etc.
    data(NCHS)    #loads in the NCHS data frame
    newNCHS <- NCHS %>% sample_n(size=5000) %>% filter(age > 18)
    #takes a sample of size 5000, extracts only the rows for which age > 18, and saves the result in newNCHS
See data-wrangling-cheatsheet.pdf
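One detail of the pipeline is worth seeing in action: sampling before filtering usually leaves fewer rows than the sample size. The sketch below assumes the dplyr package is installed and uses a hypothetical `people` data frame in place of `NCHS`.

```r
library(dplyr)

set.seed(42)
# Hypothetical data: 200 people, half aged 10 and half aged 30
people <- data.frame(id = 1:200, age = rep(c(10, 30), times = 100))

# Order used on the slide: sample first, then keep the adults
sample_then_filter <- people %>%
  sample_n(size = 50) %>%
  filter(age > 18)

# Reversed order: keep the adults first, then sample from them
filter_then_sample <- people %>%
  filter(age > 18) %>%
  sample_n(size = 50)

nrow(sample_then_filter)  # at most 50; depends on the random draw
nrow(filter_then_sample)  # exactly 50
```

Which order is right depends on the goal: a random sample of everyone, restricted to adults, or a fixed-size random sample of adults.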

I use R, and the do() command in the mosaic package, for simulations.
    data(FirstYearGPA)    #loads in the data frame
    FY <- FirstYearGPA    #rename the data frame
    lm(GPA ~ SATM, data=FY)    #gives 0.0012 as slope
    lm(GPA ~ SATM, data=FY)$coeff[2]    #just look at the slope
    do(3)*lm(GPA ~ shuffle(SATM), data=FY)$coeff[2]    #break link b/w GPA and SATM
    null.dist <- do(1000)*lm(GPA ~ shuffle(SATM), data=FY)$coeff[2]    #1000 random slopes
    histogram(null.dist$SATM, v=0.0012)    #look at the 1000 slopes
    with(null.dist, tally(abs(SATM.)>=0.0012))    #How many are far from zero?
    with(null.dist, tally(abs(SATM.)>=0.0012, format='prop'))    #What proportion are far from zero?
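The same shuffle-and-refit idea can be sketched in base R, with `replicate()` and `sample()` standing in for mosaic's `do()` and `shuffle()`. The built-in `mtcars` data replaces `FirstYearGPA`, and the helper `shuffled_slope()` is a hypothetical name introduced for this example.

```r
set.seed(2016)

# Observed slope of mpg on wt
obs_slope <- coef(lm(mpg ~ wt, data = mtcars))[["wt"]]

# One shuffled slope: sample() permutes wt, breaking its link to mpg
shuffled_slope <- function() {
  coef(lm(mpg ~ sample(wt), data = mtcars))[[2]]
}

# 1000 slopes from the "no association" world
null_slopes <- replicate(1000, shuffled_slope())

# Proportion of shuffled slopes at least as far from zero as the observed one
p_perm <- mean(abs(null_slopes) >= abs(obs_slope))
print(p_perm)
```

Comparing absolute values makes this a two-sided check, matching the `abs()` used in the `tally()` calls.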

Using Predict.Plot to show Pr(win) as SaveDiff varies, for a fixed set of values for six other predictors.
    plot(jitter(Win,amount=.05)~SaveDiff,data=LaXdata)
    Predict.Plot(modelDiff, pred.var="SaveDiff", DrawDiff=-11, ShotDiff=6, TODiff=-3,
                 ClearPctDiff=0.0952, ShotGoalDiff=1, GroundDiff=5,
                 add=TRUE, plot.args=list(col='blue'))    #OCWLaX game vs BW
    myx=data.frame(DrawDiff=-11, ShotDiff=6, TODiff=-3, SaveDiff = 0,
                   ClearPctDiff=0.0952, ShotGoalDiff=1, GroundDiff=5)
    predict.glm(modelDiff,myx,type="response")    #gives 0.896
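The prediction step can be reproduced without the lacrosse data or `Predict.Plot`. The sketch below fits a small logistic regression to the built-in `mtcars` data (a stand-in for `modelDiff` and `LaXdata`, chosen only for illustration) and uses `predict()` with `type = "response"` to get probabilities while one predictor varies and the other is held fixed.

```r
# Logistic regression: probability of a manual transmission (am = 1)
fit <- glm(am ~ wt + hp, family = binomial, data = mtcars)

# Vary wt over a grid while hp is held at one fixed value
grid <- data.frame(wt = seq(2, 4, by = 0.5), hp = 150)
grid$prob <- predict(fit, newdata = grid, type = "response")

# A single scenario, in the spirit of the slide's `myx`
one_case <- data.frame(wt = 3, hp = 150)
p_one <- predict(fit, newdata = one_case, type = "response")

print(grid)
print(p_one)
```

Without `type = "response"`, `predict()` returns values on the log-odds scale, not probabilities.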