1
Introduction to R
Carolina Salge
March 29, 2017
2
R
R is a free software environment for statistical computing and graphics
Object-oriented
Runs on a wide variety of platforms
Highly extensible (through packages)
3
RStudio
Panes shown: Datasets; Scripts; Results; Files, plots, packages, & help
4
Agenda for Intro to R
You will learn a few R basics: how to create vectors, matrices, arrays, data frames, and lists, and how to omit missing values from a dataset.
You will also learn about packages (for example, how to install and load them).
You will then learn how to read a file into R (e.g., a CSV), inspect it, reference it, manipulate it, and save it to your machine.
Finally, we will finish with sqldf, notebook compiling, and more resources that you can exploit on your own.
5
Script
A script is a set of R commands
A program
c is short for combine in c(369.40, …)
    # CO2 parts per million
    co2 <- c(369.40,371.07,373.17,375.78,377.52,379.76,381.85,383.71,385.57,384.78)
    year <- (2000:2009) # A range of values
    # Show values
    co2
    year
    # Compute mean and standard deviation
    mean(co2)
    sd(co2)
    plot(year,co2)
6
Exercise
Plot kWh per square foot by year for the following University of Georgia data

    year  sqfeet      kWh
    2007  14,214,216  2,141,705
    2008  14,359,041  2,108,088
    2009  14,752,886  2,150,841
    2010  15,341,886  2,211,414
    2011  15,573,100  2,187,164
    2012  15,740,742  2,057,364
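One possible way to approach the exercise, offered as a sketch and not as the official solution: type the three columns in as vectors, divide element by element, and plot. The variable name kWhPerSqFoot and the axis labels are my own choices.

```r
# University of Georgia data, typed in from the exercise table
year   <- 2007:2012
sqfeet <- c(14214216, 14359041, 14752886, 15341886, 15573100, 15740742)
kWh    <- c(2141705, 2108088, 2150841, 2211414, 2187164, 2057364)

# Division is element-wise, so this yields one ratio per year
kWhPerSqFoot <- kWh / sqfeet
kWhPerSqFoot

# type = "b" draws both points and connecting lines
plot(year, kWhPerSqFoot, type = "b",
     xlab = "Year", ylab = "kWh per square foot")
```

The commas in the table are thousands separators and must be left out when typing the numbers into c().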
7
Datasets
A dataset is a table
One row for each observation
Columns contain observation values
Same as the relational model
R supports multiple data structures and multiple data types
8
Data structures
Vector
A single row table where data are all of the same type (here it is numeric)
Matrix
A table where all data are of the same type
    co2 <- c(369.40,371.07,373.17,375.78,377.52,379.76,381.85,383.71,385.57,384.78)
    year <- (2000:2009)
    co2[2] # get the second value
    m <- matrix(1:12, nrow=4,ncol=3)
    m
    m[4,3] # value in the fourth row, third column
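Beyond picking out a single value, square brackets also accept ranges, negative positions, and logical conditions. The sketch below reuses the co2, year, and m objects from the slide; the particular selections are only illustrative.

```r
co2 <- c(369.40, 371.07, 373.17, 375.78, 377.52,
         379.76, 381.85, 383.71, 385.57, 384.78)
year <- 2000:2009

co2[2:4]          # second through fourth values
co2[-1]           # everything except the first value
year[co2 > 380]   # years in which co2 exceeded 380
length(co2)       # number of elements

m <- matrix(1:12, nrow = 4, ncol = 3)
m[2, ]            # the whole second row
m[, 3]            # the whole third column
dim(m)            # number of rows, then number of columns
```

Leaving an index blank, as in m[2, ], means "take everything along that dimension".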
9
Exercise
Create a matrix with 6 rows and 3 columns containing the numbers 1 through 18
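A sketch of one way to do this, following the matrix() call shown earlier. The names m6 and m6r are hypothetical. The exercise does not say whether the numbers should run down the columns or across the rows, so both are shown.

```r
# matrix() fills column by column unless told otherwise
m6 <- matrix(1:18, nrow = 6, ncol = 3)
m6
m6[6, 3]   # bottom-right value

# byrow = TRUE fills row by row instead
m6r <- matrix(1:18, nrow = 6, ncol = 3, byrow = TRUE)
m6r
m6r[1, ]   # first row
```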
10
Data structures
Array
Extends a matrix beyond two dimensions
Data frame
Same as a relational table
Columns can have different data types
Typically, read a file to create a data frame
    a <- array(1:24, c(4,3,2)) # 4 rows, 3 columns, 2 layers (three dimensions)
    a[1,1,1] # row 1, column 1 of layer 1
    gender <- c("m","f","f")
    age <- c(5,8,3)
    df <- data.frame(gender,age)
    df[1,2] # first row of column 2
    df[1,] # all columns in row 1
    df[,2] # all rows in column 2
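The sketch below extends the array and data frame examples with a few more ways to reach into them. Selecting a column by name with $ and filtering rows with a condition are standard R, but the particular selections are my own illustration.

```r
a <- array(1:24, dim = c(4, 3, 2))
dim(a)        # size along each of the three dimensions
a[4, 3, 2]    # last value in the array
a[, , 2]      # the whole second layer, a 4 x 3 matrix

gender <- c("m", "f", "f")
age <- c(5, 8, 3)
df <- data.frame(gender, age)

df$age             # a column selected by name
df[df$age > 4, ]   # only the rows where age exceeds 4
str(df)            # column names and types at a glance
```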
11
Data structures
List
An ordered collection of objects
Can store a variety of objects under one name
    l <- list(co2,m,df) # a list with a vector, a matrix, and a data frame
    l[[3]] # third element of the list (df)
    l[[2]] # second element of the list (m)
    l[[1]] # first element of the list (co2)
    l[[1]][2] # second value of the first element (co2)
    l[[2]][2,2] # row 2, column 2 of the second element (m)
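List elements can also be given names, which is often easier to read than positions. This is a minimal sketch; the shortened co2 vector and the element names are chosen only for illustration.

```r
co2 <- c(369.40, 371.07, 373.17)        # shortened for illustration
m <- matrix(1:12, nrow = 4, ncol = 3)
l <- list(co2 = co2, m = m)             # named elements

names(l)
l$co2[2]              # same as l[["co2"]][2] or l[[1]][2]

# Single brackets return a smaller list; double brackets return the object itself
is.list(l["m"])       # TRUE
is.matrix(l[["m"]])   # TRUE
```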
12
Types of data
Classification
Nominal (high, medium, or low)
Sorting or ranking
Ordinal (ranking of tennis players)
Intervals between ordinal data are not necessarily equal. Murray (ranked 1) may be a lot better than Djokovic (ranked 2), but Djokovic may not be a lot better than Wawrinka (ranked 3)
Measurement
Interval
Ratio (time, distance)
Ratio data have equal intervals
Are Celsius and Fahrenheit interval or ratio data?
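One way to explore the Celsius and Fahrenheit question numerically is to express the same two temperatures in both units and compare ratios and differences. This is only a sketch; the temperatures 10 and 20 are arbitrary values picked for the illustration.

```r
celsius <- c(10, 20)
fahrenheit <- celsius * 9 / 5 + 32   # the same two temperatures: 50 and 68

# Ratios: does "twice as hot" survive a change of unit?
celsius[2] / celsius[1]         # 2
fahrenheit[2] / fahrenheit[1]   # 1.36

# Differences: equal steps in one unit are equal steps in the other
diff(celsius)      # 10
diff(fahrenheit)   # 18
```

A ratio that changes when the unit changes suggests the zero point is arbitrary, which is the distinction between interval and ratio scales.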
13
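Factors
Nominal and ordinal data are factors
Before R 4.0.0, data.frame() and read.csv() converted strings to factors by default; since R 4.0.0 strings stay as character unless stringsAsFactors = TRUE is set
Factors determine how data are analyzed and presented
Failing to realize that a column contains a factor can cause confusion
Use str() to find out a data frame’s structure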
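A short sketch of how factors behave, including one common source of the confusion mentioned above. The vectors size and x are made-up illustration data, not taken from the slides.

```r
size <- c("high", "low", "medium", "low")

# Nominal factor: levels are sorted alphabetically by default
f <- factor(size)
levels(f)          # "high" "low" "medium"
as.integer(f)      # the underlying integer codes: 1 2 3 2

# Ordinal factor: state the order explicitly
o <- factor(size, levels = c("low", "medium", "high"), ordered = TRUE)
o < "high"         # comparisons are allowed on ordered factors

# Numbers stored as a factor convert to their codes, not their values
x <- factor(c("10", "20", "10"))
as.numeric(x)                 # 1 2 1
as.numeric(as.character(x))   # 10 20 10
```

Running str() on a data frame shows which columns are factors before such conversions go wrong.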
14
Missing values
Missing values are indicated by NA (not available)
Arithmetic expressions and functions containing missing values generate missing values
Use the na.rm=TRUE option to exclude missing values from calculations
    sum(c(1,NA,2)) # NA
    sum(c(1,NA,2),na.rm=TRUE) # 3
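The same behavior applies to other summary functions, and is.na() can be used to locate or count the missing values first. A minimal sketch using the vector from the slide:

```r
x <- c(1, NA, 2)

is.na(x)        # FALSE TRUE FALSE
sum(is.na(x))   # how many values are missing: 1

mean(x)                  # NA, because one value is unknown
mean(x, na.rm = TRUE)    # 1.5, the mean of the two known values
```

Writing TRUE in full is a little safer than T, because T is an ordinary variable that a script could overwrite.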
15
Missing values
Remove rows with missing values by using na.omit()
    gender <- c("m","f","f","f")
    age <- c(5,8,3,NA)
    df <- data.frame(gender,age)
    df2 <- na.omit(df)
    View(df2)
    View(df)
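To check what na.omit() removed without opening the viewer, the row counts can be compared directly. The sketch below also shows complete.cases(), a base R function that flags the rows with no missing values; it is not used in the slides and is included here only as an alternative.

```r
gender <- c("m", "f", "f", "f")
age <- c(5, 8, 3, NA)
df <- data.frame(gender, age)

df2 <- na.omit(df)
nrow(df)    # 4 rows before
nrow(df2)   # 3 rows after: the row with the missing age is gone

complete.cases(df)          # TRUE TRUE TRUE FALSE
df[complete.cases(df), ]    # the same three rows, selected by hand
```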
16
Packages
R’s base set of packages can be extended by installing additional packages
Over 4,000 packages
Search the R Project site to identify packages and functions
Install using RStudio
Packages must be installed prior to use, and a script must load each package it uses with either
    require(packagename)
    library(packagename)
17
Packages
    # install ONCE on your computer
    # can also use RStudio to install
    install.packages("knitr")
    # load EVERY TIME before using a package in a session
    # either call loads the package into memory
    require(knitr)
    library(knitr)
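The two loading functions differ in how they react when a package is not installed: require() returns FALSE with a warning, while library() stops with an error. The sketch below demonstrates this with the stats package, which ships with R, and with a deliberately made-up package name (notARealPackage123 is hypothetical and assumed not to exist on your machine).

```r
# stats is part of every R installation, so this needs no install step
ok <- require(stats)
ok   # TRUE

# require() reports a missing package by returning FALSE
found <- suppressWarnings(
  require("notARealPackage123", character.only = TRUE, quietly = TRUE)
)
found   # FALSE

# library() raises an error instead, caught here so the script keeps running
res <- tryCatch(
  library("notARealPackage123", character.only = TRUE),
  error = function(e) "error"
)
res   # "error"
```

Because library() fails loudly, it is often the safer choice at the top of a script; require() is convenient when the code needs to test whether a package is available.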