Prepared by volunteers of the Ann Arbor Chapter of the American Statistical Association, in cooperation with the Department of Statistics and the Center for Statistical Consultation and Research of the University of Michigan

Ann Arbor ASA Up and Running Series: R

- Introduction
- R Help
- Functions
- Working with Data
- Importing/Exporting Data
- Graphs + Statistics
- Practice Problems
- Further Resources

- Presentation Materials
- R Class Materials
- Select files:
  - R Workshop.pptx
  - furniture.csv
  - furniture.txt
  - R code.docx
  - Short-refcard.pdf
- Save a copy of each file to the Desktop

- R is open source, with code available to users
- R is an object-oriented programming language
  - It is based on the S computer language
- R is commonly used for statistical analysis
- R is a free software package
- R-project.org

- Statistical analysis is done using pre-defined functions in R
- Upon download of the 'base' package, you have access to many functions
- More advanced functions will require the download of other packages

- Methods for many topics in statistics are readily available
  - Linear modeling, linear mixed modeling, clustering, multivariate analysis, non-parametric methods, classification, among others
- R produces high quality graphics
  - Simple plots are easy
  - With more practice, users can produce publishable graphics!

- To open R: Start > All Programs > Math & Statistics > R
- (Screenshot: the R workspace window)

- Open an editor window: File > New script
- Typing commands in the editor is more convenient than typing them directly into the workspace
- (Screenshot: the workspace and the editor window)

- Users create different data objects in R
- Data objects include variables, arrays of numbers, character strings, functions and other more complicated structures
- <- assigns a value to a data object
- Type in your editor window: a <- 7
- Submit this command by highlighting it and pressing Ctrl+R
- Practice creating different data objects and submit them to the workspace

- Type objects()
  - This shows that you have created the object a during this R session
- You can recall previously submitted commands by using the up/down arrow keys
- You can remove this object by typing rm(a)
- Try removing some objects you created, then type objects() to check that they are no longer listed

- Set up a vector named x:
  - x <- c(5,4,3,6)
  - This is an assignment statement
  - The function c() creates a vector by concatenating its arguments
- Perform vector/matrix arithmetic:
  - v <- 3*x - 5
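The object and vector commands from these slides can be tried together. This is a minimal sketch: the names a, x and v come from the slides, and the use of exists() to confirm the removal is an illustrative addition.

```r
# Create a scalar object and list the objects in the current session
a <- 7
objects()            # the listing includes "a"

# Build a vector by concatenating values with c()
x <- c(5, 4, 3, 6)

# Arithmetic is applied to every element of the vector
v <- 3 * x - 5
print(v)             # 10 7 4 13

# Remove an object, then check whether it still exists in the workspace
rm(a)
exists("a", inherits = FALSE)   # FALSE
```

Because arithmetic is element-wise, no loop is needed to transform a whole vector.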

Questions?

R Help

- CRAN: Search
  - Searches the R archives (manuals, mailing lists, help files, etc.)
  - When faced with a tough analysis question, see whether another R user has addressed the question before

- To get help on any specific function:
  - help(function.name)
  - ?function.name
- If no help page is found in the packages currently loaded, search all installed packages:
  - ??function.name
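A short sketch of the help commands in a script. The function seq is only an example topic, and apropos() is an addition not mentioned on the slide; it lists objects whose names match a pattern.

```r
# Look up the help page for a function whose name you know.
# Assigning the result keeps a script quiet; typing help("seq") or ?seq
# at the prompt displays the page directly.
h <- help("seq")
class(h)                      # "help_files_with_topic"

# List visible objects whose names match a regular expression
apropos("^read\\.")           # includes read.csv and read.table

# Search the help pages of all installed packages (this can be slow):
# ??"linear model"
# help.search("linear model")
```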

- To see a list of all of the functions that come with the base R package:
  - library(help = "base")
- Use straight quotes in R code. Curly quotes pasted from slides or a word processor cause an error:
  - library(help = “base”)
  - Error: unexpected input in "library(help = “"

- Two popular R resource websites:
  - Rseek.org
  - nabble.com

- To browse R's HTML help pages in a web browser, submit help.start()

Questions?

Functions

- There are thousands of available functions in R
- The Reference Card provides a strong working knowledge of commonly used functions
- Look at the organization of the Reference Card
- Try out a few of the functions available!

- Sequences
  - seq(-5, 5, by=.2)
  - seq(length=51, from=-5, by=.2)
  - Both produce a sequence from -5 to 5 with a step of 0.2 between elements
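The two seq() calls can be checked against each other. A small sketch: the object names s1, s2 and s3 are illustrative, and length.out is the full name of the argument that the slide abbreviates as length.

```r
# Fix the end point and the step size
s1 <- seq(-5, 5, by = 0.2)

# Fix the start, the step size and the number of elements
s2 <- seq(from = -5, by = 0.2, length.out = 51)

length(s1)           # 51
all.equal(s1, s2)    # TRUE

# Fix both end points and let R work out the step
s3 <- seq(0, 1, length.out = 5)
s3                   # 0.00 0.25 0.50 0.75 1.00
```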

- Replications
  - rep("x", times=5)
  - rep("x", each=5)
  - For a single value, both produce x replicated 5 times
  - For a longer vector, times repeats the whole vector, while each repeats every element in turn
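The difference between times and each only shows up with vectors longer than one element. A brief sketch, using an illustrative vector named abc:

```r
abc <- c("a", "b", "c")

# times repeats the whole vector
r_times <- rep(abc, times = 2)
r_times              # "a" "b" "c" "a" "b" "c"

# each repeats every element before moving to the next one
r_each <- rep(abc, each = 2)
r_each               # "a" "a" "b" "b" "c" "c"

# times can also be a vector giving one count per element
r_mixed <- rep(abc, times = c(1, 2, 3))
r_mixed              # "a" "b" "b" "c" "c" "c"
```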

Questions?

Working with Data

- There are many data sets available for use in R
  - data() lists what is available
- We will work with the trees data set
  - data(trees)
  - This data set is now ready to use in R
- The following are useful commands:
  - summary(trees): summary of the variables
  - dim(trees): dimensions of the data set
  - names(trees): variable names
  - attach(trees): makes the variables accessible directly by name

- R has saved the data set trees as a data frame object
  - Check this by typing class(trees)
- R stores this data in matrix row/column format: data.frame[rows, columns]
  - trees[c(1:2), 2]: first 2 rows of the 2nd column
  - trees[3, c("Height", "Girth")]: row 3, referencing columns by name
  - trees[-c(10:20), "Height"]: the Height variable, skipping rows 10 to 20
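The indexing forms above can be run directly on the built-in trees data set. A minimal sketch; the object names first_two, row3, kept and heights are illustrative.

```r
data(trees)

class(trees)         # "data.frame"
dim(trees)           # 31 rows, 3 columns
names(trees)         # "Girth" "Height" "Volume"

# Rows 1 and 2 of the 2nd column (Height); a single column comes back as a vector
first_two <- trees[1:2, 2]

# Row 3, with columns chosen by name; the result is a one-row data frame
row3 <- trees[3, c("Height", "Girth")]

# A negative index drops rows: Height without rows 10 to 20
kept <- trees[-(10:20), "Height"]
length(kept)         # 20

# The $ operator extracts one column by name without attach()
heights <- trees$Height
```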

- The subset() command is very useful for extracting data by a logical condition: the 1st argument is the data, and the 2nd argument is the logical subset requirement
  - subset(trees, Height>80): subset where all tree heights >80
  - subset(trees, Height … 10): subset where all tree heights … 10
  - subset(trees, Height … 11): subset where all tree heights … 11
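Conditions in subset() can be combined with & (and) and | (or). The sketch below only illustrates the pattern: the Height > 80 condition comes from the slide, while the Girth threshold of 15 and the object names are hypothetical choices, not the values used in the original presentation.

```r
data(trees)

# Single condition, as on the slide
tall <- subset(trees, Height > 80)

# Both conditions must hold (hypothetical Girth threshold)
both <- subset(trees, Height > 80 & Girth > 15)

# At least one condition must hold
either <- subset(trees, Height > 80 | Girth > 15)

# The optional select argument keeps only the named columns
tall_cols <- subset(trees, Height > 80, select = c(Girth, Height))

c(tall = nrow(tall), both = nrow(both), either = nrow(either))
```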

Questions?

Importing/Exporting Data

- The most common (and easiest) file to import is a text file, using the read.table() command
- R needs to be told where the file is located
  - Set the working directory, which tells R where all your files are located: setwd("C:\\Users\\akazanis\\Desktop")
  - OR point to the working directory: File > Change dir… and choose the location of the files
  - OR include the full path of your file in the read.table() command

- Include the full path of your file in the read.table() command:
  read.table("C:\\Users\\akazanis\\Desktop\\furniture.txt", header=TRUE, sep="")
- In Windows paths, use double backslashes \\ rather than a single backslash \ (forward slashes / also work)
- header=TRUE or header=FALSE tells R whether the first line of the file contains column names

- Another way of specifying the file's location is to set the working directory first and then read in the file
  - setwd("C:\\Users\\akazanis\\Desktop")
  - read.table("furniture.txt", header=TRUE, sep="")
- OR point to the file's location with File > Change dir…, then read in the data file
  - read.table("furniture.txt", header=TRUE, sep="")
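Both ways of locating a file can be tried without the workshop files. In this sketch a small stand-in furniture.txt is written to a temporary folder first; its contents and column order are hypothetical (only the column names Type, Area and Cost appear in the slides), and the real file is not reproduced here.

```r
# Write a small, hypothetical stand-in for furniture.txt to a temporary folder
tmp_dir  <- tempdir()
txt_path <- file.path(tmp_dir, "furniture.txt")
writeLines(c("Type Area Cost",
             "chair 20 45",
             "desk 40 110",
             "table 55 120"), txt_path)

# Option 1: give the full path to read.table()
furn1 <- read.table(txt_path, header = TRUE, sep = "")

# Option 2: set the working directory, then use the bare file name.
# setwd() returns the previous directory, so it can be restored afterwards.
old_wd <- setwd(tmp_dir)
furn2  <- read.table("furniture.txt", header = TRUE, sep = "")
setwd(old_wd)

identical(furn1, furn2)   # TRUE
str(furn1)
```

file.path() builds the path with forward slashes, which avoids the doubled backslashes needed in hand-typed Windows paths.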

- It is also popular to import csv files, since Excel files are easily converted to csv files
- read.csv() and read.table() are very similar, although they handle missing values differently
  - read.csv() automatically assigns NA to missing (empty) entries
  - read.table() with sep="" stops with an error when a row has a blank entry, because a whitespace-delimited field cannot be empty
  - Type NA in place of each missing value before reading the file into R

- Let's remove a data entry from both furniture.txt and furniture.csv
  - In the first row, erase 100 from the Area column
- Now read in the data from these two files using read.table() and read.csv()
- You should see that read.table() cannot read the data unless you type in an entry (such as NA) for the missing value
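The exercise can be reproduced without editing the workshop files. The sketch below writes two tiny hypothetical files (their contents are invented stand-ins, not the real furniture data) in which the first row lacks its Area value, then reads each one.

```r
txt_path <- tempfile(fileext = ".txt")
csv_path <- tempfile(fileext = ".csv")

# The first data row has no Area value in either file
writeLines(c("Type Area Cost", "chair  45", "desk 40 110"), txt_path)
writeLines(c("Type,Area,Cost", "chair,,45", "desk,40,110"), csv_path)

# read.csv() turns the empty field into NA
csv_data <- read.csv(csv_path)
csv_data$Area            # NA 40

# read.table() with sep="" sees only two fields in that row and stops;
# tryCatch() captures the error message instead of halting the script
txt_result <- tryCatch(
  read.table(txt_path, header = TRUE, sep = ""),
  error = function(e) conditionMessage(e)
)
txt_result

# After typing NA into the gap, read.table() succeeds
writeLines(c("Type Area Cost", "chair NA 45", "desk 40 110"), txt_path)
txt_data <- read.table(txt_path, header = TRUE, sep = "")
txt_data$Area            # NA 40
```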

- The foreign package is installed automatically when you download R
- Submit library(foreign) for many more options for importing data:
  - read.xport() (SAS transport files), read.spss() (SPSS), read.dta() (Stata), read.mtp() (Minitab)
- For more information on these options, submit help(read.xxxx), replacing xxxx with the ending of the function name

- You can export data by using the write.table() command:
  write.table(trees, "treesDATA.txt", row.names=FALSE, sep=",")
  - trees: the data set we want exported
  - "treesDATA.txt": the name of the file to be written
  - By default R writes the file to the current working directory unless you give a full path
  - row.names=FALSE: tells R that we do not wish to write the row names
  - sep=",": the output file is comma delimited
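A quick way to check an export is to read the file back in. This sketch writes to a temporary folder rather than the working directory; out_path and trees_back are illustrative names.

```r
data(trees)

# Write the data set as a comma-delimited file without row names
out_path <- file.path(tempdir(), "treesDATA.txt")
write.table(trees, out_path, row.names = FALSE, sep = ",")

# Inspect the first lines of the file that was written
readLines(out_path, n = 3)

# Read it back; read.csv() expects a header line and comma separators
trees_back <- read.csv(out_path)
dim(trees_back)      # 31 rows, 3 columns
```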

Questions?

Graphs + Statistics

- Assign a name to the furniture data set as we read it in, so that we can analyze it:
  furn <- read.table("furniture.txt", sep="", header=TRUE)
- To examine the data set:
  - dim(furn)
  - summary(furn)
  - names(furn)
  - attach(furn)
- It is important to attach the data before the subsequent steps, which refer to variables such as Area and Cost directly by name

- R can produce both very simple and very complex graphs
- Make a simple scatter plot of the Area and Cost variables from the furniture data set:
  plot(Area, Cost, main="Area vs Cost", xlab="Area", ylab="Cost")
  - Area is on the x-axis
  - Cost is on the y-axis
  - main, xlab and ylab set the title and the axis labels

- Examine the distributions of variables using graphs in R
  - hist(Area): histogram of Area
  - hist(Cost): histogram of Cost
  - boxplot(Cost ~ Type): boxplot of Cost by Type

- We can make the boxplot much prettier:
  boxplot(Cost~Type, main="Boxplot of Cost by Type", col=c("orange","green","blue"), xlab="Type", ylab="Cost")
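The plotting commands can be tried without the workshop file. The data frame below is a hypothetical stand-in for the furniture data (invented values; only the column names Type, Area and Cost come from the slides). The sketch also uses furn$Area and the data argument instead of attach(), which is one way to avoid name clashes between data sets.

```r
# Hypothetical stand-in for the furniture data set
furn <- data.frame(
  Type = factor(rep(c("chair", "desk", "table"), each = 4)),
  Area = c(20, 25, 30, 35, 40, 50, 60, 70, 55, 65, 80, 100),
  Cost = c(45, 60, 62, 80, 110, 150, 160, 210, 120, 135, 180, 230)
)

# When run as a script, draw to a null device so that no plot file is written
if (!interactive()) pdf(NULL)

# Scatter plot with a title and axis labels
plot(furn$Area, furn$Cost, main = "Area vs Cost", xlab = "Area", ylab = "Cost")

# Histogram; hist() also returns the bin counts it used
h <- hist(furn$Area, main = "Histogram of Area", xlab = "Area")

# Boxplot of Cost by Type, one color per group
b <- boxplot(Cost ~ Type, data = furn,
             main = "Boxplot of Cost by Type",
             col = c("orange", "green", "blue"),
             xlab = "Type", ylab = "Cost")
b$names              # "chair" "desk" "table"
```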

- Scatter plot matrix of all variables in a data set, using the pairs() function
  - pairs(furn)
- Correlation/covariance matrix of the numeric variables
  - cor(furn[,c(2:3)])
  - cov(furn[,c(2:3)])

- Simple linear regression using the furniture data
  - m1 <- lm(Cost ~ Area)
  - summary(m1)
  - coef(m1)
  - fitted.values(m1)
  - residuals(m1)

- Plot the residuals against the fitted values
  - plot(fitted.values(m1), residuals(m1))
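A self-contained sketch of the regression steps. The data frame is a hypothetical stand-in for the furniture data (invented values), and the model is fitted with the data argument rather than attach(); the dashed reference line at zero is an illustrative addition.

```r
# Hypothetical stand-in for the furniture data set
furn <- data.frame(
  Type = factor(rep(c("chair", "desk", "table"), each = 4)),
  Area = c(20, 25, 30, 35, 40, 50, 60, 70, 55, 65, 80, 100),
  Cost = c(45, 60, 62, 80, 110, 150, 160, 210, 120, 135, 180, 230)
)

# Simple linear regression of Cost on Area
m1 <- lm(Cost ~ Area, data = furn)
coef(m1)                    # intercept and slope
summary(m1)$r.squared       # proportion of variance explained

# Residuals against fitted values; a patternless band around zero is the ideal
if (!interactive()) pdf(NULL)   # no plot file when run as a script
plot(fitted.values(m1), residuals(m1),
     xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)
```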

- Scatter plot of Area and Cost with fitted lines:
  plot(Area, Cost, main="Cost Regression Example", xlab="Area", ylab="Cost")
  - abline(lm(Cost~Area), col=3, lty=1)
  - lines(lowess(Cost~Area), col=3, lty=2)
- Interactively add a legend
  - legend(locator(1), c("Linear","Lowess"), lty=c(1,2), col=3)
  - Point to the graph and click where you wish to place the legend!

- Identify different points on the graph
  - identify(Area, Cost, row.names(furn))
  - Makes it easy to identify outliers
- Use the locator() command to quantify differences between the regression fit and the lowess line
  - locator(2)
  - Example: compare the predicted values of Cost when Area is equal to 50
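locator() and identify() need a mouse, so they cannot run in a script. The sketch below is a non-interactive alternative under stated assumptions: the data frame is a hypothetical stand-in for the furniture data, the legend is placed with the keyword "topleft", and the comparison at Area = 50 uses predict() for the linear fit and approx() to interpolate the lowess curve.

```r
# Hypothetical stand-in for the furniture data set
furn <- data.frame(
  Type = factor(rep(c("chair", "desk", "table"), each = 4)),
  Area = c(20, 25, 30, 35, 40, 50, 60, 70, 55, 65, 80, 100),
  Cost = c(45, 60, 62, 80, 110, 150, 160, 210, 120, 135, 180, 230)
)

if (!interactive()) pdf(NULL)   # no plot file when run as a script

plot(furn$Area, furn$Cost, main = "Cost Regression Example",
     xlab = "Area", ylab = "Cost")

# Linear fit (solid) and lowess smooth (dashed)
m1  <- lm(Cost ~ Area, data = furn)
low <- lowess(furn$Area, furn$Cost)
abline(m1, col = 3, lty = 1)
lines(low, col = 3, lty = 2)

# Place the legend by keyword instead of clicking with locator(1)
legend("topleft", legend = c("Linear", "Lowess"), lty = c(1, 2), col = 3)

# Compare the two fits at Area = 50 without clicking on the plot
pred_lm  <- predict(m1, newdata = data.frame(Area = 50))
pred_low <- approx(low$x, low$y, xout = 50)$y
pred_lm - pred_low
```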

- Multiple regression with Area and Type as predictors and Cost as the response variable
  - m2 <- lm(Cost ~ Area + Type)
  - summary(m2)
  - Summary of the regression results, including coefficients

- Let's see whether the multiple regression model is significantly better than the simple model by using ANOVA
  - anova(m1, m2)
- The ANOVA table compares the two nested regression models by testing the null hypothesis that the Type predictor is not needed in the model
- Result: the p-value is < .05, so there is evidence that Type is a useful predictor
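The nested-model comparison can be sketched as follows. The data are a hypothetical stand-in for the furniture data, so the p-value produced here says nothing about the result reported on the slide; the point is only how to fit both models and pull the p-value out of the ANOVA table.

```r
# Hypothetical stand-in for the furniture data set
furn <- data.frame(
  Type = factor(rep(c("chair", "desk", "table"), each = 4)),
  Area = c(20, 25, 30, 35, 40, 50, 60, 70, 55, 65, 80, 100),
  Cost = c(45, 60, 62, 80, 110, 150, 160, 210, 120, 135, 180, 230)
)

# Simple model, and the larger model that adds Type
m1 <- lm(Cost ~ Area, data = furn)
m2 <- lm(Cost ~ Area + Type, data = furn)

# F test of the null hypothesis that Type adds nothing beyond Area
cmp <- anova(m1, m2)
print(cmp)

# The p-value sits in the second row of the "Pr(>F)" column
p_value <- cmp[["Pr(>F)"]][2]
p_value < 0.05
```

Type has three levels, so adding it uses two extra degrees of freedom, which appear in the Df column of the table.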

Practice Problems

Problem 1
a) Create a sequence: start at 0 and go to 5 with a step of 0.5
b) Replicate 'a b c' 3 times
c) Replicate 'a' 3 times, 'b' 3 times, 'c' 3 times in one command

Problem 2
a) Make a histogram of Girth from the trees data set. Include a title.
b) Make a boxplot of Height from the trees data set. Color it blue and label your axes.
c) Make a scatter plot of Girth and Height.

Problem 3
a) Create a simple linear model with Girth as predictor and Height as response. Extract the coefficients.
b) Add Volume to the model.
c) How can we tell if this model is preferred to the simpler model?

## Problem 1
seq(0, 5, by=.5)                     # a) 0, 0.5, ..., 5
rep("a b c", each=3)                 # b) the string "a b c" three times
rep(c("a", "b", "c"), each=3)        # c) each letter three times in turn

## Problem 2
data(trees)
attach(trees)
names(trees)
hist(Girth, main="Histogram of Trees Girth")
boxplot(Height, main="Boxplot of Height of Trees", col="blue", xlab="Trees", ylab="Height")
# Girth is on the x-axis and Height on the y-axis, so label them accordingly
plot(Girth, Height, main="Girth vs Height of Trees", xlab="Girth", ylab="Height")

## Problem 3
m1 <- lm(Height~Girth)
summary(m1)
coef(m1)                             # a) extract the coefficients
m2 <- lm(Height~Girth+Volume)        # b) add Volume to the model
summary(m2)
anova(m1, m2)                        # c) compare the nested models

Questions?

Further Resources

- R Project web page: R-project.org
  - On the left-hand side of the screen, click on the CRAN link under Download, Packages
  - CRAN is the Comprehensive R Archive Network
- Choose one of the U.S. mirrors ( … is recommended)

- Download and install R
  - Click on the folder that best describes your operating system.
  - When using Windows, click on the "base" subdirectory. This will allow you to download the "base R" package.
  - Download R for Windows.
  - R is updated quite frequently, and the version number is always changing.
  - Save the *.exe file to your computer.
  - Double-click on the *.exe file. A wizard will appear to guide you through the setup of the R software on your machine.
  - An R icon on your desktop/taskbar gives a shortcut to R.
  - Double-click on this icon, and you are ready to go!

- Project home
- Documentation
- Help forum
- Journal
- Graphical Gallery
- Graphical Manual
- Seek

- UCLA
- Harvard/MIT
- An Introduction to R

- Ann Arbor Chapter of the American Statistical Association (Ann Arbor ASA)
  - R
  - SAS
  - SAS' JMP
  - SPSS
  - Stata
  - Statistics with Excel
  - MS Access

- Center for Statistical Consultation and Research (CSCAR)
  - Statistical Analysis with R
  - Intermediate SAS
  - Using ArcGIS
  - Applied Structural Equation Modeling
  - Introduction to NVivo
  - Applications of Hierarchical Linear Models
  - Introduction to Programming in Stata
  - Regression Analysis
  - Classification and Regression Trees Using JMP
  - Introduction to SPSS
