Basics of Using R, by Xiao He

Presentation transcript:

Basics of Using R
Xiao He

AGENDA
1. What is R?
2. Basic operations
3. Different types of data objects
4. Importing data
5. Basic data manipulation



WHAT IS R?
1. Free, open-source statistical programming language.
2. Comes with many statistical functions.
3. Thousands of statistical packages that users can download.
4. Ability to produce high-quality plots.
5. Requires users to write code.
6. CASE SENSITIVE!

WHAT IS R? (cont'd)
Download: (choose a mirror)
- Choose a version compatible with your OS.


WHAT IS R? (cont'd)
Command-line style
If you are working on longer or more complicated scripts, or if you want to save your work, it is good practice to write your code in a script editor (in R, go to File > "New Document" on Mac or File > "New Script" on Windows).


BASIC OPERATIONS
1. Arithmetic operations:
   - +, -, * (element-wise multiplication), /, ^ or **, sqrt(), abs()
   - %*% (matrix multiplication)
   - Order of operations applies! Use parentheses to order operations if needed.
   - E.g., (2 - 3)/4 vs. 2 - 3/4
2. Assignment:
   - "<-" assigns the value on its right side to the name on its left side.
   - Data objects can be created using <-.
   - E.g., a <- 2 (assigns 2 to an object named a)
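The operators on this slide can be tried directly in the console. The following is a small sketch; the object names (with_parens, m, and so on) are illustrative and not part of the original slides.

```r
# Operator precedence: division binds tighter than subtraction.
with_parens    <- (2 - 3) / 4
without_parens <- 2 - 3 / 4

# ^ and ** both mean exponentiation.
power_a <- 2 ^ 3
power_b <- 2 ** 3

# sqrt() and abs() are built-in functions.
root      <- sqrt(16)
magnitude <- abs(-3.5)

# * multiplies element by element; %*% is matrix multiplication.
m <- matrix(1:4, nrow = 2)   # columns are (1, 2) and (3, 4)
elementwise <- m * m
matrix_prod <- m %*% m

# Assignment with <- creates an object named a.
a <- 2

print(c(with_parens, without_parens, power_a, power_b, root, magnitude))
print(elementwise)
print(matrix_prod)
```

Comparing the two printed matrices shows why * and %*% are listed separately.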

BASIC OPERATIONS
EXERCISE 1: Arithmetic operations and assignment
Ex1.1: [expression shown on the slide]
Ex1.2: [expression shown on the slide]
Ex1.3: Assign the result of Ex1.1 to an object named ex1.1


DATA OBJECTS
1. Vectors
2. Matrices
3. Data frames (tables)

DATA OBJECTS
1. Vectors
   a. Dimensionless (a vector has no dim attribute).
   b. Data points of the same type: e.g., numeric or character string, but not both. Mixed values are coerced to a common type.
   How do we create vectors? Use c(...)
2. Matrices
3. Data frames (tables)
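A short sketch of what "same type, but not both" means in practice. The vector names here are made up for illustration.

```r
# c() combines values into a vector.
nums  <- c(1.5, 2, 3)
words <- c("a", "b")

# Mixing types does not fail; R coerces everything to one common type.
mixed <- c(1.5, "a", TRUE)

print(class(nums))
print(class(words))
print(class(mixed))   # every element has become a character string
print(mixed)

# A plain vector has no dim attribute.
print(dim(nums))
```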

DATA OBJECTS
EXERCISE 2: Creating vectors
Ex2.1: Create a vector named v1 that stores the following values: 2, 4, 1, 4, 6, 1
Ex2.2: Create a vector named v2 that stores the following character strings: "apple", "pear", "kiwi", "plum"
Ex2.3: Create a vector named v3 that stores the following values: 1.3, 0.2, 3.2, 5.1, 4.3, 6.7
Ex2.4: Create a vector named v4 that stores the following Booleans: TRUE, FALSE, FALSE, TRUE
Ex2.5: Concatenate v1 and v3, and name the resulting vector v5.
Ex2.6: Check the number of elements in a vector using length().
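One possible way to work through Exercise 2 is sketched below. It uses only c() and length(), and is not necessarily the presenter's own solution.

```r
v1 <- c(2, 4, 1, 4, 6, 1)                    # Ex2.1: numeric
v2 <- c("apple", "pear", "kiwi", "plum")     # Ex2.2: character
v3 <- c(1.3, 0.2, 3.2, 5.1, 4.3, 6.7)        # Ex2.3: numeric
v4 <- c(TRUE, FALSE, FALSE, TRUE)            # Ex2.4: logical

# Ex2.5: c() also concatenates existing vectors.
v5 <- c(v1, v3)

# Ex2.6: number of elements.
print(length(v1))
print(length(v5))
```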

DATA OBJECTS
1. Vectors
2. Matrices
   a. 2-dimensional
   b. Data points of the same type: e.g., numeric or character string, but not both.
   How do we create matrices?
3. Data frames (tables)

DATA OBJECTS
EXERCISE 3: Creating matrices
Create a 3 by 2 matrix that stores the following values:
Column 1: 2.3, 2.1, 3.4
Column 2: 4.3, 1.2, 5.2
**There are a few ways of doing this.

DATA OBJECTS
EXERCISE 3 (cont'd): ways to create the 3 by 2 matrix (named m1) with Column 1: 2.3, 2.1, 3.4 and Column 2: 4.3, 1.2, 5.2
1) Create two vectors and then use cbind().
2) Use cbind() without explicitly creating vectors.
3) Create one vector to store all 6 values, and use matrix() to convert it into a matrix.
4) Use matrix() without explicitly creating a vector.
5) Check the dimensions of a matrix using dim(), nrow(), and ncol().
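The four approaches listed above might look like the following sketch. The names way1 to way4, col1, col2, and all_values are illustrative choices, not names from the slides.

```r
# 1) Two vectors, then cbind() (columns are named after the vectors).
col1 <- c(2.3, 2.1, 3.4)
col2 <- c(4.3, 1.2, 5.2)
way1 <- cbind(col1, col2)

# 2) cbind() without creating the vectors first.
way2 <- cbind(c(2.3, 2.1, 3.4), c(4.3, 1.2, 5.2))

# 3) One vector of all 6 values, then matrix().
#    matrix() fills column by column unless byrow = TRUE is given.
all_values <- c(2.3, 2.1, 3.4, 4.3, 1.2, 5.2)
way3 <- matrix(all_values, nrow = 3, ncol = 2)

# 4) matrix() without an intermediate vector.
way4 <- matrix(c(2.3, 2.1, 3.4, 4.3, 1.2, 5.2), nrow = 3)

# 5) Check the dimensions.
print(dim(way3))
print(nrow(way3))
print(ncol(way3))
```

All four objects hold the same values; only way1 carries column names, because cbind() picks them up from the vector names.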

DATA OBJECTS
1. Vectors
2. Matrices
3. Data frames
   a. 2-dimensional
   b. Can store different data types.
   How do we create data frames?

DATA OBJECTS
EXERCISE 4: Creating data frames
Ex4.1: Convert a matrix into a data frame.
Ex4.2: Create a data frame using data.frame(). Suppose we have 2 variables: the 1st variable is called `score`, and the 2nd variable is called `id`.
   score: 68, 70, 82, 96
   id: "subj1", "subj2", "subj3", "subj4"
Ex4.3: Check the dimensions using dim(), nrow(), and ncol().
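A possible sketch for Exercise 4. The slide does not show which function converts the matrix, so as.data.frame() is used here as one standard option; the object names are illustrative.

```r
# Ex4.1: convert a matrix into a data frame.
m <- matrix(c(2.3, 2.1, 3.4, 4.3, 1.2, 5.2), nrow = 3)
df_from_matrix <- as.data.frame(m)   # columns get default names V1, V2

# Ex4.2: build a data frame from two variables of different types.
scores <- data.frame(
  score = c(68, 70, 82, 96),
  id    = c("subj1", "subj2", "subj3", "subj4")
)

# Ex4.3: check the dimensions.
print(dim(scores))
print(nrow(scores))
print(ncol(scores))
str(scores)
```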


IMPORT DATA
- Natively supported data files: .txt, .dat, .csv
- Some R packages extend support to data formats of other popular statistical programs, such as SPSS, Stata, and SAS, e.g., the R package `foreign` and the R package `RODBC` (Excel).
(There are additional ways to import data that are not discussed here.)

IMPORT DATA: VECTORS & MATRICES
1. Import vectors and matrices using scan(). (Due to time constraints, we won't discuss this here.)
   scan() reads data points from a file (e.g., .txt and .dat).

IMPORT DATA: DATA FRAMES
2. Import data frames using read.table().
   read.table(file, header = FALSE, sep = "", ...)
   file: path and name of the file to be read in.*
   header: whether the 1st row contains column names.
   sep: the character that separates values in a row.
   *You can use file.choose() instead of typing out the file path and file name.

   1. Let's import the dataset vocab.txt and save it as vocab. First, open the text file in a text editor to see what the dataset looks like.
      vocab <- read.table(file="path/to/vocab.txt", header=FALSE)
      Is the code above correct or wrong, given what you saw in the data file?
      vocab <- read.table(file="path/to/vocab.txt", header=TRUE)  # Correct code
      head(vocab)
      str(vocab)  # str() displays the structure of an R object.

File paths:
   Windows: "C:\Users\XiaoHe\Desktop\my_data_file.csv"
   Mac: "/Users/xiaohe/Dropbox/R workshop/my_data_file.csv"
NOTE: On Windows, the path cannot be used as is, because R treats the backslash as an escape character inside strings. Either change every backslash "\" to a forward slash "/", or change every single backslash to a DOUBLE backslash:
   "C:/Users/XiaoHe/Desktop/my_data_file.csv"
   or
   "C:\\Users\\XiaoHe\\Desktop\\my_data_file.csv"
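The effect of the header argument can be seen without the workshop files. The sketch below writes a tiny made-up file to a temporary location (the contents and the names demo_file, wrong, and right are assumptions for illustration, not the real vocab.txt) and reads it both ways.

```r
# Made-up, whitespace-separated data with a header row.
demo_file <- tempfile(fileext = ".txt")
writeLines(c("id score group",
             "1 68 a",
             "2 70 b"), demo_file)

# header = FALSE treats the line of column names as a data row.
wrong <- read.table(file = demo_file, header = FALSE)

# header = TRUE uses the first row as column names.
right <- read.table(file = demo_file, header = TRUE)

print(dim(wrong))
print(dim(right))
str(right)

# Two valid ways to write the same Windows path in R code.
forward <- "C:/Users/XiaoHe/Desktop/my_data_file.csv"
doubled <- "C:\\Users\\XiaoHe\\Desktop\\my_data_file.csv"
cat(doubled, "\n")   # the doubled backslashes print as single ones
```

Reading with header = FALSE yields one extra row and generic column names (V1, V2, ...), which is a quick way to spot the mistake.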

IMPORT DATA: DATA FRAMES (cont'd)
2. Import data frames using read.table().
   read.table(file, header = FALSE, sep = "", ...)

   2. Let's import another dataset, called pima.csv, and save it as pima. First, open the file in a text editor to see what the dataset looks like.
      pima <- read.table(file=file.choose(), header=TRUE, sep=",")
      head(pima)
      str(pima)
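For comma-separated files, sep = "," is the key argument. The sketch below uses a small made-up CSV in a temporary file (not the real pima.csv) so that it runs without file.choose(). It also shows read.csv(), a base R wrapper around read.table() whose defaults already assume a header row and commas.

```r
# Made-up CSV content written to a temporary file.
demo_csv <- tempfile(fileext = ".csv")
writeLines(c("id,score",
             "subj1,68",
             "subj2,70"), demo_csv)

# read.table() needs header and sep spelled out for a CSV file.
via_table <- read.table(file = demo_csv, header = TRUE, sep = ",")

# read.csv() uses header = TRUE and sep = "," by default.
via_csv <- read.csv(demo_csv)

print(head(via_table))
str(via_csv)
```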

IMPORT DATA: DATA FRAMES
3. Import datasets stored in formats not natively supported, using the package `foreign`.
   `foreign` must be installed. In R, a package is installed using
      install.packages("pkg_name")
   After installing a package, we need to load it each time we want to use it:
      library(pkg_name)
   So to install `foreign`, we do
      install.packages("foreign")
   To use the functions in `foreign`, we do
      library(foreign)
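A common pattern is to install a package only when it is missing and then load it. The helper name ensure_loaded below is hypothetical; requireNamespace(), install.packages(), and library() are base R functions.

```r
# Hypothetical helper: install a package if needed, then load it.
ensure_loaded <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)   # needs an internet connection and a CRAN mirror
  }
  # character.only = TRUE is required when the package name is stored in a variable.
  library(pkg, character.only = TRUE)
}

ensure_loaded("foreign")
print("package:foreign" %in% search())
```

Installation happens once per machine, while library() has to be called in every new R session.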


IMPORT DATA: DATA FRAMES (cont'd)
3. Import datasets stored in formats not natively supported, using the package `foreign`.
   read.spss(): SPSS
   read.dta(): Stata
   read.xport(): SAS (XPORT format)
   Let's now import an SPSS dataset called boston.sav.
      boston <- read.spss(file.choose(), to.data.frame=TRUE)
      head(boston)
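Running read.spss() requires an actual .sav file such as boston.sav. As a self-contained stand-in, the sketch below demonstrates read.dta() instead, by first writing a made-up data frame to a temporary Stata file with write.dta() (also from `foreign`). The data and object names are assumptions for illustration only.

```r
library(foreign)

# Made-up data frame used only for the round trip.
scores <- data.frame(
  score = c(68, 70, 82, 96),
  id    = c("subj1", "subj2", "subj3", "subj4"),
  stringsAsFactors = FALSE
)

dta_file <- tempfile(fileext = ".dta")
write.dta(scores, dta_file)         # write a Stata file
scores_back <- read.dta(dta_file)   # read it back as a data frame

print(head(scores_back))
```

With a real SPSS file, the call on the slide works the same way: read.spss() returns a list unless to.data.frame = TRUE is given.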


MANIPULATE DATA OBJECTS: Subsetting
1. Vectors (we will use the vector v1 we created earlier)
   > v1
   [1] 2 4 1 4 6 1
   a) Select observations using `[index]`.
   b) Delete observations using `[-index]` (negative index). This returns a copy without those elements; v1 itself is unchanged unless you reassign it.
Exercise 5
Ex5.1: Select one observation: select the 2nd obs.
Ex5.2: Select contiguous observations: select the 3rd, 4th, and 5th obs.
Ex5.3: Select non-contiguous observations: select the 1st, 4th, and 5th obs.

MANIPULATE DATA OBJECTS: Subsetting vectors (cont'd)
Exercise 5 (cont'd)
Ex5.4: Delete one observation: delete the 2nd obs.
Ex5.5: Delete contiguous observations: delete the 3rd, 4th, and 5th obs.
Ex5.6: Delete non-contiguous observations: delete the 1st, 4th, and 5th obs.
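One way to approach Ex5.1 to Ex5.6, sketched with the v1 values from Exercise 2. The result names (ex5.1 and so on) are illustrative.

```r
v1 <- c(2, 4, 1, 4, 6, 1)

# Selecting with positive indices.
ex5.1 <- v1[2]             # 4
ex5.2 <- v1[3:5]           # 1 4 6
ex5.3 <- v1[c(1, 4, 5)]    # 2 4 6

# Dropping with negative indices (v1 itself is not modified).
ex5.4 <- v1[-2]            # 2 1 4 6 1
ex5.5 <- v1[-(3:5)]        # 2 4 1; the parentheses matter, since -3:5 means -3, -2, ..., 5
ex5.6 <- v1[-c(1, 4, 5)]   # 4 1 1

print(ex5.5)
print(length(v1))          # still 6
```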

MANIPULATE DATA OBJECTS: Subsetting
2. Matrices (we will use the matrix m1a we created earlier)
   > m1a
        [,1] [,2]
   [1,]  2.3  4.3
   [2,]  2.1  1.2
   [3,]  3.4  5.2
Matrices are 2-D, so we can use both the row index and the column index for subsetting: [row_index, col_index].
Exercise 5 (cont'd)
Ex5.7: Select a single data point: select the 3rd row in the 2nd column.
Ex5.8: Select an entire column/row: select the 3rd row; select the 1st column.

MANIPULATE DATA OBJECTS: Subsetting matrices (cont'd)
Exercise 5 (cont'd)
Ex5.9: An example involving non-contiguous rows: select the 1st and the 3rd rows in the 1st column.
(Negative indices also work for matrices, but won't be shown here.)
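A sketch for Ex5.7 to Ex5.9. It assumes m1a holds the values from Exercise 3; the result names are illustrative.

```r
# Assumed contents of m1a (the 3 by 2 matrix from Exercise 3).
m1a <- matrix(c(2.3, 2.1, 3.4, 4.3, 1.2, 5.2), nrow = 3)

ex5.7     <- m1a[3, 2]         # single data point: row 3, column 2
ex5.8_row <- m1a[3, ]          # entire 3rd row (leave the column index empty)
ex5.8_col <- m1a[, 1]          # entire 1st column (leave the row index empty)
ex5.9     <- m1a[c(1, 3), 1]   # rows 1 and 3 of column 1

# Selecting one row or column returns a plain vector by default;
# drop = FALSE keeps the result as a matrix.
col_as_matrix <- m1a[, 1, drop = FALSE]

print(ex5.7)
print(ex5.9)
print(dim(col_as_matrix))
```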

MANIPULATE DATA OBJECTS: Subsetting
3. Data frames (we will use the data frame vocab we imported earlier)
   > head(vocab)  # display the first 6 rows
   [head(vocab) output; columns: year, sex, education, vocabulary]
Since data frames are 2-D, we can also use the row index and the column index to extract and subset data: [row_index, col_index]
Ex5.10: Save the 2nd to the 4th rows in a new data frame named vocab.a.

MANIPULATE DATA OBJECTS: Subsetting data frames (cont'd)
Ex5.11: Save the 2nd and the 3rd rows of columns 2 and 4.

MANIPULATE DATA OBJECTS: Subsetting data frames (cont'd)
We can also use `df_name$col_name` to extract an individual column.
Ex5.12: Extract the year column.

MANIPULATE DATA OBJECTS: Subsetting data frames (cont'd)
We can also use `df_name[, "col_name"]` to extract columns.
Ex5.13: (a) Extract the education column. (b) Extract both the vocabulary and the education columns.
NOTE: This method also works with matrices that have column names.
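The data frame exercises (Ex5.10 to Ex5.13) can be sketched on a small stand-in. The data frame vocab_demo below uses the same column names as vocab, but its values are made up for illustration and are not taken from vocab.txt; the result names other than vocab.a are illustrative as well.

```r
# Made-up stand-in for vocab (same column names, invented values).
vocab_demo <- data.frame(
  year       = c(2004, 2004, 2004, 2004, 2004),
  sex        = c("Female", "Female", "Male", "Female", "Male"),
  education  = c(9, 14, 14, 17, 12),
  vocabulary = c(3, 6, 9, 8, 5)
)

vocab.a <- vocab_demo[2:4, ]            # Ex5.10: rows 2 to 4, all columns
vocab.b <- vocab_demo[2:3, c(2, 4)]     # Ex5.11: rows 2 and 3 of columns 2 and 4

years <- vocab_demo$year                # Ex5.12: one column via $
edu   <- vocab_demo[, "education"]      # Ex5.13 (a): one column by name
both  <- vocab_demo[, c("vocabulary", "education")]   # Ex5.13 (b)

print(vocab.a)
print(vocab.b)
print(class(edu))    # a single column comes back as a vector
print(class(both))   # several columns stay a data frame
```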

MANIPULATE DATA OBJECTS: Subsetting data frames using subset()
   subset(x, subset, select)
   x: data frame
   subset: logical expression indicating the elements or rows to keep.
   select: column(s) to be selected; default: all columns.
Ex5.14: Let's select a subset of pima for women with more than 10 pregnancies.

MANIPULATE DATA OBJECTS: Subsetting data frames using subset() (cont'd)
Ex5.15: Select a subset of pima for women with more than 10 pregnancies AND at least 44 years of age.

MANIPULATE DATA OBJECTS: Subsetting data frames using subset() (cont'd)
Ex5.16: Select a subset of pima for women who were either never pregnant or had more than 12 pregnancies, keeping only the first 3 columns.
Ex5.17: Select a subset of pima for women who had more than 10 pregnancies and did not have diabetes.
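The subset() exercises (Ex5.14 to Ex5.17) are sketched below on a made-up stand-in. The transcript does not show the column names of pima.csv, so the names pregnant, glucose, age, and diabetes (coded 0/1) are assumptions, as are all the values.

```r
# Made-up stand-in for pima; column names and values are assumptions.
pima_demo <- data.frame(
  pregnant = c(0, 11, 13, 2, 12, 14),
  glucose  = c(100, 150, 120, 90, 140, 160),
  age      = c(25, 44, 50, 30, 40, 60),
  diabetes = c(0, 1, 0, 0, 0, 1)
)

# Ex5.14: more than 10 pregnancies.
ex5.14 <- subset(pima_demo, pregnant > 10)

# Ex5.15: more than 10 pregnancies AND at least 44 years of age (& is "and").
ex5.15 <- subset(pima_demo, pregnant > 10 & age >= 44)

# Ex5.16: never pregnant OR more than 12 pregnancies (| is "or"); first 3 columns only.
ex5.16 <- subset(pima_demo, pregnant == 0 | pregnant > 12, select = 1:3)

# Ex5.17: more than 10 pregnancies and no diabetes.
ex5.17 <- subset(pima_demo, pregnant > 10 & diabetes == 0)

print(ex5.16)
print(ex5.17)
```

Inside subset(), column names can be used directly, without the `pima_demo$` prefix.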

MISC.
1. Check what objects are currently in your workspace
      ls()
      objects()
2. Remove objects
      rm(object1_name, object2_name)
      rm(list=ls())  # removes all objects, so be careful!
3. Unload a previously loaded package
      detach("package:package_name", unload=TRUE)
4. Check the arguments of a function
      args(function_name)
5. Help file
      ?function_name
6. Write a data frame to a file
      write.table(df_name, "file_name")
   Check ?write.table for additional arguments.
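A few of these helpers in action, as a sketch. The data frame and file names are made up, and the file goes to a temporary location so nothing in the working directory is touched.

```r
# Made-up data frame to write out.
scores <- data.frame(score = c(68, 70), id = c("subj1", "subj2"))

# 6. Write a data frame to a file; row.names = FALSE leaves out the row labels.
out_file <- tempfile(fileext = ".txt")
write.table(scores, out_file, row.names = FALSE)

# Read it back to confirm the round trip.
read_back <- read.table(out_file, header = TRUE)
print(read_back)

# 4. Show the arguments of a function.
print(args(write.table))

# 1. and 2. List objects, then remove one.
print(ls())
rm(scores)
print(exists("scores"))
```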

Thanks!