Welcome to the R intro Workshop Before we begin, please download the “SwissNotes.csv” and “cardiac.txt” files from the ISCC website, under the R workshop.

Presentation transcript:


Introduction to R Workshop in Methods from the Indiana Statistical Consulting Center Thomas A. Jackson February 15, 2013

Overview The R Project for Statistical Computing “R is a language and environment for statistical computing and graphics. It is a GNU project which is similar to the S language and environment which was developed at Bell Laboratories (formerly AT&T, now Lucent Technologies) by John Chambers and Colleagues. R can be considered as a different implementation of S. There are some important differences, but much code written for S runs unaltered under R.” - Description from CRAN Website

Benefits R … is free; is interactive: we can type something in and work with it (how we analyze data can be broken into small steps); is interpretative: we give it commands and it translates them into mathematical procedures or data management steps; can be used in a batch: nice because it is documented; is a calculator: unlike other calculators, though, because you can create variables and objects

Let’s Get R Started How to open R → Start Menu → Programs → Departmentally Supported → Stat/Math → R

Graphical User Interface (GUI) Three Environments Command Window (aka Console) Script Window Plot Window

Command Window Basics To quit: type q() Save workspace image? Moves from memory to hard drive Storing a variable in memory: <-, ->, or = a <- 5 stores the number 5 in the object "a" pi -> b stores the number π ≈ 3.141593 in "b" x = stores the result of the calculation (3) in "x" "=" only assigns leftward (name on the left, value on the right) Try not to overwrite built-in names such as t, c, and pi!
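A minimal sketch of the three assignment forms described on this slide. The expression `1 + 2` is only an illustrative stand-in for the slide's calculation, and `masked` and `restored` are hypothetical names added to show what happens when a built-in name is overwritten.

```r
# Three ways to store a value
a <- 5        # leftward arrow: name on the left, value on the right
pi -> b       # rightward arrow: value on the left, name on the right
x = 1 + 2     # "=" assigns leftward only (illustrative calculation)

# Overwriting a built-in name hides it until our copy is removed
pi <- 3
masked <- pi      # picks up our copy, 3
rm(pi)            # delete our copy; the built-in constant is visible again
restored <- pi
```

Removing the user-defined copy with rm() is enough to get the original constant back, because the built-in value lives in the base package and is never actually changed.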

Command Window Basics Printing to output Calculations that are not stored print to output > [1] 8 Type name to view stored object > a [1] 5 Use print() > print(a) [1] 5 View objects in workspace objects() or ls()

Command Window Basics Clearing the console (command window) Mac: Edit → Clear Console Windows: Edit → Clear Console or Mac: Alt + Command + L Windows: Ctrl + L Removing variables from memory rm() or remove() > x <- 4 > rm(x) rm(list = ls()) remove all variables

Script Window Basics Saving syntax (code) Mac: File → New Windows: File → New Script Documenting code: # Comments out everything on line behind Running code from Script Window Mac: Apple + Enter Windows: F5 or Ctrl + r

Working Directory Obtaining working directory getwd() Mac: Misc → Get Working Directory Windows: File → Change dir... Changing working directory setwd() Mac: Misc → Change Working Directory Windows: File → Change dir...

Path Names Specify with forward slashes or double backslashes Enclose in single or double quotation marks Examples setwd("C:/Program Files/R/R-2.6.1") setwd('C:\\Program Files\\R\\R-2.6.1')
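As a rough illustration of the two path styles, the sketch below builds one path with file.path() and converts the double-backslash form to the same string. The folder and file names are hypothetical, and tempdir() stands in for a real project folder.

```r
# file.path() joins pieces with forward slashes on every platform
p <- file.path("C:", "Program Files", "R", "data.csv")   # hypothetical location

# The double-backslash spelling denotes the same path
q <- gsub("\\", "/", "C:\\Program Files\\R\\data.csv", fixed = TRUE)
same <- identical(p, q)

# setwd() invisibly returns the previous directory, so it is easy to go back
old <- setwd(tempdir())
setwd(old)
```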

R Help Helpful commands If you know the function name: help() or ? > help(log) > ?exp If you do not know the function name: help.search() or ?? > help.search("anova") > ??regression

Documentation Elements of a documentation file Function{Package} Description Usage: What your code should look like, “=“ gives default Arguments: Inputs to the function Details Value: What the function will return See Also: Related functions Examples

Online Resources CRAN Website: R Seek: Quick-R tutorial: R Tutor: UCLA: R listservs Google Google tip: include “[R]” (instead of just “R”) with search topic to help filter out non-R websites

Additional Packages Over 2,500 listed on the CRAN website! Use with caution Initial download of R: base, graphics, stats, utils 1) Installing a package: Mac: Packages & Data → Package Installer Use Package Search to locate and press 'Install Selected' Windows: Packages → Install Packages Locate desired package and press 'OK' install.packages("MASS") 2) Using an installed package: You MUST call it into active memory with library() > library(MASS)

Data Structures R has several basic types (or “classes”) of data: Numeric - Numbers Character – Strings (letters, words, etc.) Logical – TRUE or FALSE Vector Matrix Array Data Frame List NOTE: There are other classes, but these are most common. Understanding differences will save you some headache.

Data Structures Find class of data Unknown class: class() Check particular class: is.classname() > a <- 5 > class(a) [1] "numeric" > is.character(a) [1] FALSE Change class: as.classname() > as.character(a) [1] "5"
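The class checks and conversions above can be tried as follows. This is a small sketch; `a_chr`, `bad`, and `mixed` are illustrative names that do not appear in the slides.

```r
a <- 5
cls <- class(a)              # "numeric"
a_chr <- as.character(a)     # "5"

# A conversion that cannot succeed yields NA (R also issues a warning)
bad <- suppressWarnings(as.numeric("five"))

# c() coerces mixed inputs to the most general type present
mixed <- c(1, "a", TRUE)
mixed_cls <- class(mixed)    # "character"
```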

Vectors Combine items into vector: c() > c(1,2,3,4,5,6) [1] 1 2 3 4 5 6 Repeat a number or sequence of numbers: rep() > rep(1,5) [1] 1 1 1 1 1 > rep(c(2,5,7), times = 3) [1] 2 5 7 2 5 7 2 5 7

Vectors Sequence generation: seq() > seq(1,5) [1] 1 2 3 4 5 > seq(1,5, by = .5) [1] 1.0 1.5 2.0 2.5 3.0 3.5 4.0 4.5 5.0 Try 1:10 or 10:1
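A short sketch of a few variations on rep() and seq() that the slides do not show: the `each` argument and the `length.out` argument. The object names are illustrative.

```r
by_times <- rep(c(2, 5, 7), times = 3)   # repeats the whole vector: 2 5 7 2 5 7 2 5 7
by_each  <- rep(c(2, 5, 7), each = 3)    # repeats each element:     2 2 2 5 5 5 7 7 7

# Two ways to ask for the same sequence: by step size or by number of points
step_seq <- seq(1, 5, by = 0.5)
len_seq  <- seq(1, 5, length.out = 9)

down <- 10:1                             # the colon operator also counts downward
```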

Matrices Create matrix: matrix() 6 x 1 matrix: matrix(1:6, ncol = 1) 2 x 3 matrix: matrix(1:6, nrow =2, ncol =3) 2 x 3 matrix filling across rows first: matrix(1:6, nrow = 2, ncol = 3, byrow = TRUE) Create matrix of more than two dimensions (array): array()
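To make the effect of `byrow` concrete, the sketch below builds the same 2 x 3 matrix both ways and then a small three-dimensional array. Object names are illustrative.

```r
m_col <- matrix(1:6, nrow = 2, ncol = 3)                 # fills down columns (default)
m_row <- matrix(1:6, nrow = 2, ncol = 3, byrow = TRUE)   # fills across rows

first_row_col <- m_col[1, ]   # 1 3 5
first_row_row <- m_row[1, ]   # 1 2 3

# array() generalises matrix() to more than two dimensions
arr <- array(1:24, dim = c(2, 3, 4))
last <- arr[2, 3, 4]          # the final element, 24
```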

Lists Create a list: list() Holds vectors, matrices, arrays, etc. of varying lengths Objects in the list can be named or unnamed > list(matrix(0, 2, 2), y = rep(c("A", "B"), each = 2)) [[1]] [,1] [,2] [1,] 0 0 [2,] 0 0 $y [1] "A" "A" "B" "B" Data Frame: specialized list that holds variables of same length

Data Frames Create a data frame: data.frame() Like a matrix, holds specified number of rows and columns > x <- 1:4 > y <- rep(c("A", "B"), each = 2) > data.frame(x,y) x y 1 1 A 2 2 A 3 3 B 4 4 B Unnamed variables get assigned names > data.frame(1:2, c("A", "B")) X1.2 c..A....B.. 1 1 A 2 2 B
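One way to avoid the automatically generated column names shown above is to name the columns when the data frame is created. A minimal sketch, with `id` and `group` as hypothetical column names:

```r
df <- data.frame(id = 1:4, group = rep(c("A", "B"), each = 2))

col_names <- names(df)     # "id" "group"
n_rows <- nrow(df)         # 4
n_cols <- ncol(df)         # 2
second_group <- as.character(df$group[3])   # "$" extracts one column by name
```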

Basic Operations Arithmetic: +, -, *, / Order of operations: () Exponentiation: ^, exp() Other: log(), sqrt() Evaluate standard Normal density curve at x = 3 > x <- 3 > 1/sqrt(2*pi)*exp(-(x^2)/2) [1] 0.004431848

Vectorization R is great at vectorizing operations Feed a matrix or vector into an expression Receive an object of similar dimension as output For example, evaluate at x = 0,1,2,3 > x <- c(0,1,2,3) > 1/sqrt(2*pi)*exp(-(x^2)/2) [1] 0.398942280 0.241970725 0.053990967 0.004431848
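The hand-written density formula can be checked against R's built-in dnorm(), which evaluates the same standard Normal density. A short sketch:

```r
x <- c(0, 1, 2, 3)

# The expression from the slide, applied to the whole vector at once
by_hand <- 1 / sqrt(2 * pi) * exp(-(x^2) / 2)

# dnorm() with its default mean = 0 and sd = 1 is the standard Normal density
built_in <- dnorm(x)

agree <- isTRUE(all.equal(by_hand, built_in))
```

Comparing with all.equal() rather than == allows for tiny floating-point differences between the two calculations.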

Logical Operations Compare: ==, >, <, >=, <=, != > a <- c(1,1,2,4,3,1) > a == 2 [1] FALSE FALSE TRUE FALSE FALSE FALSE And: & or && Or: | or || Find location of TRUEs: which() > which(a == 1) [1] 1 2 6
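A brief sketch of how the single and double forms of "and" differ in practice: `&` works element by element, while `&&` is meant for single TRUE/FALSE values such as the condition of an if(). The names `between`, `n_ones`, and `first_is_one` are illustrative.

```r
a <- c(1, 1, 2, 4, 3, 1)

between <- a > 1 & a < 4       # element-wise: one result per element
where <- which(between)        # positions 3 and 5
n_ones <- sum(a == 1)          # TRUE counts as 1, so this counts matches: 3

# && compares single values, which is what if() expects
first_is_one <- FALSE
if (length(a) > 0 && a[1] == 1) {
  first_is_one <- TRUE
}
```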

Subsetting > a <- 1:5 > b <- matrix(1:12,nrow = 3) Use Square brackets [] Pick range of elements: a[1:3] Pick particular elements: a[c(1,3,5)] Do not include elements: a[-c(1,4)]

Subsetting (cont.) Use commas in more than one dimension (matrices & data frames) Pick particular elements: b[1:2,2:4] Give all rows and specified columns: b[,1:2] Give all columns and specified rows: b[1:2,] Note: b[2] treats the matrix as a vector (filled column by column) and gives the specified element
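The subsetting rules above can be tried on the 3 x 4 matrix `b` from the previous slide. The `drop = FALSE` argument is an addition not mentioned in the slides; it keeps a one-column result as a matrix.

```r
b <- matrix(1:12, nrow = 3)          # columns are 1:3, 4:6, 7:9, 10:12

block <- b[1:2, 2:4]                 # rows 1-2, columns 2-4
col_vec <- b[, 1]                    # a single column collapses to a vector
col_mat <- b[, 1, drop = FALSE]      # drop = FALSE keeps it a 3 x 1 matrix
fifth <- b[5]                        # single index counts down the columns: 5
```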

Reading External Data Files SwissNotes.csv Data set Compiled by Bernard Flury Contains measurements on 200 Swiss Bank Notes 100 genuine and 100 counterfeit notes

Reading External Data Files (cont.) Most general function: read.table() read.table(file, header = FALSE, sep = "", …) Creates a data frame File name must be in quotes, single or double File name is case sensitive Include the file name extension, and the full path if the data are not in the working directory > read.table("C:/Users/jacksota/Desktop/SwissNotes.csv", T, ",") Don't know the file extension? Try: file.choose() > read.table(file.choose(), header = TRUE, sep = ",") sep defines the separator, e.g. "," or "\t" or "" header indicates variable names should be read from first row

Reading External Data Files For comma delimited files: read.csv() For tab delimited files: read.delim() For Minitab, SPSS, SAS, STATA, etc. data: foreign package Contains functions to read variety of file formats Functions operate like read.table() Contains functions for writing data into these file formats
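The sketch below writes a tiny comma-delimited file to a temporary location and reads it back with both read.table() and read.csv(), to show that the two calls give the same result. The two-row data set is made up for illustration and is not taken from SwissNotes.csv.

```r
# Hypothetical miniature data set standing in for a real file
notes <- data.frame(Length = c(214.8, 214.6),
                    Type = c("Genuine", "Counterfeit"))

path <- tempfile(fileext = ".csv")
write.csv(notes, path, row.names = FALSE)

# read.csv() is read.table() with header = TRUE and sep = "," preset
via_table <- read.table(path, header = TRUE, sep = ",")
via_csv   <- read.csv(path)

same_names <- identical(names(via_table), names(via_csv))
unlink(path)   # remove the temporary file
```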

Data Frame Hints Identify variable names in data frame: names() > data1 <- read.table("SwissNotes.csv", sep = ",", header = TRUE) > names(data1) [1] "Length" "LeftHeight" "RightHeight" "LowerInner.Frame" [5] "UpperInner.Frame" "Diagonal" "Type" Assign name to data frame variables > names(data1) <- c("Length", "LeftHeight", "RightHeight", "LowerInner.Frame", "UpperInner.Frame", "Diagonal", "Type") Note: names are strings and MUST be contained in quotes

Data Frame Hints (cont.) Create objects out of each data frame variable: attach() In the Swiss Note data, to refer to Type as its own object > attach(data1) > Type [1] Genuine Genuine Genuine ...

Data Frame Hints (cont.) Remove attached objects from workspace: detach() > detach(data1) > Type Error: object "Type" not found Note: Type is still part of original data frame, but is no longer a separate object.
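As an alternative to attach() and detach(), which the slides use, columns can also be reached with `$` or inside with(). This is a sketch of that alternative, not the workshop's method; the three-row `data1` here is a made-up stand-in for the real file.

```r
# Hypothetical stand-in for the data frame read from SwissNotes.csv
data1 <- data.frame(Length = c(214.8, 214.6, 215.1),
                    Type = c("Genuine", "Genuine", "Counterfeit"))

types <- as.character(data1$Type)            # "$" reaches one column directly

# with() evaluates an expression using the columns as if they were objects
mean_length <- with(data1, mean(Length))
n_genuine <- with(data1, sum(Type == "Genuine"))
```

Because nothing is attached, there is nothing to detach afterwards and no risk of a column name hiding another object.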

plot() function plot() is the primary plotting function Calling plot will open a new plotting window Documentation: ?plot For complete list of graphical parameters to manipulate: ?par

plot() function Let's visualize the SwissNotes.csv data. After loading the data into R, attach the data frame using attach(data1). Let's try a scatter plot of LeftHeight by RightHeight. > plot(LeftHeight, RightHeight)

plot() function Change symbols: Option pch=. See ?par for details. >plot(LeftHeight,RightHeight,pch=2)

plot() Function Change symbol color: Option col= Specify by number or by name: col=2 or col="red" Hint: Type palette() to see colors associated with number Type colors() to see all possible colors > plot(LeftHeight, RightHeight, col="red")

What types of points can we get?

plot() Function Change plot type: Option type = “p” for points “l” for lines “b” for both “c” for lines part alone of “b” “o” for both overplotted “h” for histogram like (or high-density) vertical lines “s” for stair steps “S” for other steps, see Details below “n” for no plotting

plot() Function Points with lines…works better on sorted list of points > plot(LeftHeight,RightHeight,type="o")

Scatterplots for Multiple Groups Use plot() with points() to plot different groups in same plot Genuine notes vs. Counterfeit notes > plot(LeftHeight[Type=="Genuine"],RightHeight[Type=="Genuine"],col="red") > points(LeftHeight[Type=="Counterfeit"],RightHeight[Type=="Counterfeit"],col="blue")

Axis Labels and Plot Titles The plot() command call has options to Specify x-axis label: xlab = “X Label” Specify y-axis label: ylab = “Y Label” Specify plot title: main = “Main Title” Specify subtitle: sub = “Subtitle”

Axis Labels and Plot Titles > plot(LeftHeight[Type=="Genuine"],RightHeight[Type=="Genuine"],col="red",main="Plot of Bank Note Heights",sub="Measurements are in mm",xlab="Height of Left Side",ylab="Height of Right Side") > points(LeftHeight[Type=="Counterfeit"],RightHeight[Type=="Counterfeit"],col="blue")

Legends > legend("topleft",c("Genuine Notes","Counterfeit Notes"),pch=c(21,21),col=c("red","blue"))
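The grouped scatter plot, labels, and legend from the last few slides can be combined as below. This sketch uses simulated measurements (the vectors `left`, `right`, and `type` are invented, not the Swiss notes data) and passes a vector of colours to a single plot() call instead of using plot() followed by points(). It draws to a temporary PDF so that it also runs without a display.

```r
set.seed(1)
type  <- rep(c("Genuine", "Counterfeit"), each = 20)   # simulated group labels
left  <- rnorm(40, mean = 130, sd = 0.4)               # simulated heights
right <- left + rnorm(40, sd = 0.2)

# One colour per point, chosen by group
cols <- ifelse(type == "Genuine", "red", "blue")

out <- tempfile(fileext = ".pdf")
pdf(out)
plot(left, right, col = cols, pch = 21,
     main = "Plot of Bank Note Heights",
     xlab = "Height of Left Side", ylab = "Height of Right Side")
legend("topleft", legend = c("Genuine Notes", "Counterfeit Notes"),
       pch = 21, col = c("red", "blue"))
invisible(dev.off())
```

With a colour vector the axis limits automatically cover both groups, whereas plot() followed by points() sizes the axes from the first group only.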

Adding Lines To add straight lines to plot: abline() abline() refers to standard equation for a line: y = bx + a Horizontal line: abline(h= ) Vertical Line: abline(v= ) Otherwise: abline(a=, b= ) or abline(coef=c(a,b))

Adding Lines > abline(coef=c( ,0.8319))
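Intercept and slope values for abline() often come from a fitted regression. The sketch below shows one way to obtain them with lm() on simulated data; the slides do not say how their coefficients were computed, so this is only an assumption about a typical workflow, and `x`, `y`, and `fit` are illustrative names.

```r
set.seed(2)
x <- rnorm(30, mean = 130, sd = 0.4)          # simulated predictor
y <- 20 + 0.85 * x + rnorm(30, sd = 0.2)      # simulated response

fit <- lm(y ~ x)
coefs <- coef(fit)                            # intercept first, slope second

pdf(tempfile(fileext = ".pdf"))               # draw to a file, no display needed
plot(x, y)
abline(a = coefs[1], b = coefs[2])            # fitted line: y = a + b * x
abline(h = mean(y), lty = 2)                  # horizontal reference line
abline(v = mean(x), lty = 2)                  # vertical reference line
invisible(dev.off())
```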

Histograms Histograms are another popular plotting option. > hist(Length)

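pairs() Function Using the Swiss notes data, assumed from here on to be stored in a data frame named swiss (R also ships an unrelated built-in data set called swiss, so read the file into that name first) > pairs(swiss)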

Boxplots To create boxplots: boxplot() Specify one or more variables to plot. > boxplot(swiss$Length) > boxplot(swiss[,2:3])

Boxplots Use a formula specification for side-by-side boxplots. Note: boxplot() has many options, e.g. notches. See ?boxplot. > boxplot(Length~Type,notch=TRUE,data=swiss)

Mean or Average mean() > mean(swiss[,"Length"]) > mean(swiss) (in R 3.0.0 and later this returns NA with a warning for a data frame; use colMeans() instead) rowMeans() > rowMeans(swiss[,1:6]) colMeans() > colMeans(swiss[,1:6])

Variability Variance: var() > var(swiss[,"Length"]) > var(swiss[,1:6]) Covariance: cov() > cov(swiss[,1:6]) Correlation: cor() > cor(swiss[,1:6])
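These summary functions need numeric input, so a non-numeric column such as Type has to be left out. A sketch of one way to select the numeric columns automatically, using a made-up four-row data frame `d` rather than the real data:

```r
# Hypothetical stand-in with two numeric columns and one text column
d <- data.frame(Length   = c(214.8, 214.6, 215.1, 214.9),
                Diagonal = c(141.0, 141.7, 139.8, 140.2),
                Type     = c("Genuine", "Genuine", "Counterfeit", "Counterfeit"))

# Keep only the columns for which is.numeric() is TRUE
num <- d[, sapply(d, is.numeric)]

col_means <- colMeans(num)    # one mean per column
cov_mat   <- cov(num)         # 2 x 2 covariance matrix
cor_mat   <- cor(num)         # 2 x 2 correlation matrix, 1s on the diagonal
```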

Five-number Summary
> summary(swiss[1:3])
     Length        LeftHeight     RightHeight
 Min.   :213.8   Min.   :129.0   Min.   :
 1st Qu.:        1st Qu.:        1st Qu.:129.7
 Median :214.9   Median :130.2   Median :130.0
 Mean   :214.9   Mean   :130.1   Mean   :
 3rd Qu.:        3rd Qu.:        3rd Qu.:130.2
 Max.   :216.3   Max.   :131.0   Max.   :131.1

Creating Tables table() produces crosstabs of factors or categorical variables Using the cardiac data:
> table(cardiac[,7:9])
, , newMI = 0
      chestpain
gender  0  1
     F  6 10
     M  4  8
, , newMI = 1
      chestpain
gender  0  1
     F
     M

Univariate t-tests t.test() produces 1- and 2-sample (paired or independent) t-tests. 1-sample t-test > t.test(x,alternative="two.sided",mu=0,conf.level=0.95) 2 independent samples t-test > t.test(x,y,alternative="two.sided",mu=0,paired=FALSE,conf.level=0.95) paired t-test > t.test(x,y,alternative="two.sided",mu=0,paired=TRUE,var.equal=TRUE,conf.level=0.95)

2 Independent Samples t-test x: diagonal measurements for Genuine bank notes y: diagonal measurements for Counterfeit bank notes > x = swiss[Type=="Genuine","Diagonal"] > y = swiss[Type=="Counterfeit","Diagonal"] > t.test(x,y,alternative="greater",mu=0,paired=FALSE,var.equal=TRUE)

2 Independent Samples t-test > t.test(x,y,alternative=“greater”,mu=0, paired=FALSE,var.equal=TRUE) Two Sample t-test data: x and y T = , df = 198, p-value < 2.2e-16 alternative hypothesis: true difference in means is greater than 0 95 percent confidence interval: Inf sample estimates: mean of x mean of y
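The object returned by t.test() can be stored and its pieces extracted by name. The sketch below runs the same kind of one-sided, equal-variance test on simulated samples; the means and standard deviations are invented for illustration and are not the bank note results.

```r
set.seed(42)
x <- rnorm(50, mean = 141.5, sd = 0.5)   # simulated "genuine" diagonals
y <- rnorm(50, mean = 139.5, sd = 0.5)   # simulated "counterfeit" diagonals

res <- t.test(x, y, alternative = "greater", mu = 0,
              paired = FALSE, var.equal = TRUE)

t_stat  <- unname(res$statistic)   # the t statistic
df      <- unname(res$parameter)   # degrees of freedom: 50 + 50 - 2
p_value <- res$p.value
ci      <- res$conf.int            # one-sided interval, upper limit is Inf
```

Storing the result this way makes it easy to reuse the numbers in later calculations instead of copying them from printed output.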

Generating Random Numbers R contains functions for generating random numbers from many well-known distributions. Random number from standard normal distribution: > rnorm(1,mean=0,sd=1) [1] Vector of random numbers from uniform distribution: > runif(3, min=0, max=1) [1] To reproduce results: set.seed()
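A small sketch of how set.seed() makes random draws repeatable. The seed value 123 is arbitrary, and `first`, `second`, and `third` are illustrative names.

```r
set.seed(123)
first <- runif(3, min = 0, max = 1)

set.seed(123)                      # same seed, so the same numbers come back
second <- runif(3, min = 0, max = 1)

third <- runif(3, min = 0, max = 1)   # no reset: the stream simply continues

repeatable <- identical(first, second)
```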

Function Basics if() statement > n = rnorm(1) > if (n < 0){ n = abs(n) } if() statement with else > n = rnorm(1) > if (n < 0){ n = abs(n) } else { n = 0 }

Function Basics for() loop > temp = rep(0,10) > for (i in 1:10){ temp[i] = i+1 } > temp [1]  2  3  4  5  6  7  8  9 10 11

Function Basics while() loop > n = 1 > while (n < 10 ){ n = n+1 }
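The two loops from these slides are written out below together with a vectorized expression that gives the same result as the for() loop. The counter `steps` is an illustrative addition to show how many times the while() body runs.

```r
# for() loop: fill a vector one element at a time
temp <- rep(0, 10)
for (i in seq_along(temp)) {
  temp[i] <- i + 1
}

# The same values without a loop
vec <- (1:10) + 1

# while() loop: repeat until the condition becomes FALSE
n <- 1
steps <- 0
while (n < 10) {
  n <- n + 1
  steps <- steps + 1
}
```

For simple element-wise arithmetic the vectorized form is usually shorter and faster; loops remain useful when each step depends on the previous one.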

Creating Functions test.function = function(input arguments){ commands to execute }

Creating Functions For example, let’s define a new function average to find the average of a set of numbers. average = function(x){ n = length(x) average = sum(x)/n print(average) }
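A variation on the average function that returns its result instead of printing it, so the value can be stored and reused. The `na.rm` argument is an illustrative addition, loosely modelled on the argument of the same name in mean().

```r
average <- function(x, na.rm = FALSE) {
  if (na.rm) {
    x <- x[!is.na(x)]       # optionally drop missing values first
  }
  sum(x) / length(x)        # the last evaluated expression is the return value
}

avg1 <- average(c(2, 4, 6))                    # 4
avg2 <- average(c(1, NA, 3))                   # NA: missing values propagate
avg3 <- average(c(1, NA, 3), na.rm = TRUE)     # 2
```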

Sourcing After writing a function in a script file, bring it into working memory using source(). source("pathname/test.function.R")
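To see source() in action without an existing script, the sketch below writes a one-line script to a temporary file and then sources it. The doubling function is a made-up example body for test.function.

```r
# Create a small script file in a temporary location
script <- tempfile(fileext = ".R")
writeLines("test.function <- function(x) x * 2", script)

# source() runs the file, which defines test.function in the workspace
source(script)
result <- test.function(21)   # 42

unlink(script)                # remove the temporary file
```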