Reading a file
R can read a wide variety of input formats, including:
- text files
- statistical package formats (e.g., SAS)
- database management systems (DBMS)
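Reading a text file
read.csv() reads a delimited text file, such as a CSV file, and creates a data frame. Specify as required:
- whether the first line is a header (header)
- the field separator (sep)
- which column holds the row names (row.names)
read.csv() is part of base R, so no package needs to be loaded. The paths below point to the author's own file and will not be found on your computer; replace them with the path to a file on your machine.
Mac:
t <- read.csv("~/Dropbox/Carolina/Paper2/Fixed Encoding Data/changeBrasil.txt", stringsAsFactors=FALSE)
PC (every backslash in an R string must be doubled; forward slashes also work):
t <- read.csv("C:\\Dropbox\\Carolina\\Paper2\\Fixed Encoding Data\\changeBrasil.txt", stringsAsFactors=FALSE)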
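The header, sep and row.names arguments can be tried without any file on disk. The sketch below is a minimal example, assuming a small hypothetical semicolon-separated table passed to read.csv() through its text argument (which is forwarded to read.table()) instead of a file path.

```r
# Hypothetical semicolon-separated data, supplied inline so no file is needed
csv_text <- "id;city;rate
A1;Rio;3.5
B2;Recife;4.1
C3;Manaus;2.8"

# header = TRUE : the first line holds the column names
# sep = ";"     : fields are separated by semicolons
# row.names = 1 : use the first column (id) as the row names
d <- read.csv(text = csv_text, header = TRUE, sep = ";",
              row.names = 1, stringsAsFactors = FALSE)

print(d)    # a data frame with columns city and rate
class(d)    # "data.frame"
```

For a real file, replace text = csv_text with the file path, as in the Mac and PC examples.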
Reading a text file from a URL
read.table() can also read a file directly from a URL, where url is a character string holding the web address:
t <- read.table(url, header=TRUE, sep=',')
Learning about an object
url <- "http://people.terry.uga.edu/rwatson/data/centralparktemps.txt"
t <- read.table(url, header=TRUE, sep=',')
head(t) # first six rows
tail(t) # last six rows
dim(t) # dimensions (rows, columns)
str(t) # structure of a dataset
class(t) # type of object
In RStudio, click the object's name in the Environment pane (top right) to view its contents, and click the blue icon next to it to see its structure.
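The same inspection functions work on any data frame. As a sketch that does not depend on the URL, the code below uses R's built-in airquality data set (daily New York air-quality readings, with temperature in degrees Fahrenheit) in place of t; head() and tail() also accept a row count.

```r
# airquality ships with R (datasets package), so this runs without a download
d <- airquality

head(d, 3)    # first three rows (the default is six)
tail(d, 3)    # last three rows
dim(d)        # number of rows and columns
str(d)        # column names, types, and first values
class(d)      # "data.frame"
names(d)      # column names only
```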
Referencing data
Refer to a column as datasetName$columnName: the data set name goes before the $ and the column name after it.
# Referencing your data
# Qualify with the data frame name to reference a column
mean(t$temperature)
sd(t$temperature)
max(t$year)
range(t$month)
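Continuing with the built-in airquality data as a stand-in for t, the sketch below applies the same summary functions, shows [[ ]] as an alternative to $, and illustrates one point the slide does not cover: a column with missing values returns NA unless na.rm = TRUE is given.

```r
d <- airquality

mean(d$Temp)      # mean daily maximum temperature (F)
sd(d$Temp)        # standard deviation
max(d$Temp)
range(d$Month)    # smallest and largest month number

# d[["Temp"]] is an equivalent way to reference the same column
identical(d$Temp, d[["Temp"]])

# Ozone has missing values, so na.rm = TRUE is needed to get a number
mean(d$Ozone)                  # NA
mean(d$Ozone, na.rm = TRUE)
```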
Creating a new column
The formula to convert Fahrenheit to Celsius is C = (F - 32) × 5/9 (see http://www.manuelsweb.com/temp.htm).
# Creating a new column
t$Ctemp <- round((t$temperature-32)*5/9,1) # Celsius, rounded to one decimal place
head(t)
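Because R arithmetic is vectorised, the conversion is applied to every row without a loop. The sketch below wraps the formula in a helper, f_to_c (a hypothetical name, not part of any package), checks it on known reference points, and applies it to the built-in airquality temperature column.

```r
# Hypothetical helper: Fahrenheit-to-Celsius conversion, C = (F - 32) * 5/9
f_to_c <- function(f) (f - 32) * 5 / 9

f_to_c(c(32, 212, -40))    # 0, 100, -40

# The same formula applied to a whole column at once (vectorised)
d <- airquality
d$Ctemp <- round(f_to_c(d$Temp), 1)
head(d[, c("Temp", "Ctemp")])
```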
Renaming a column and writing a file
# Renaming a column
colnames(t)[3] <- 'Ftemp' # rename the third column to indicate Fahrenheit
head(t)
# Save a file
write.table(t,"centralparktempsCF.txt")
The file is saved in R's current working directory (often Documents, or the folder containing your script or project); run getwd() to see where that is.
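Two refinements, offered as a sketch using airquality rather than the Central Park data: renaming a column by name instead of by position, and writing a CSV that can be read straight back in. The file name airquality_sample.csv is hypothetical and is written to a temporary folder so nothing is left in the working directory.

```r
d <- airquality[1:5, c("Month", "Day", "Temp")]

# Rename by name rather than by position, so the code still works
# if the column order changes
names(d)[names(d) == "Temp"] <- "Ftemp"

# Write to a temporary folder; write.csv separates fields with commas
# and, with row.names = FALSE, omits the row-number column
path <- file.path(tempdir(), "airquality_sample.csv")
write.csv(d, path, row.names = FALSE)

getwd()    # the folder a relative path such as "centralparktempsCF.txt" would use
back <- read.csv(path)
back
```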
sqldf
An R package for running SQL queries on data frames
- Queries return a data frame
- Uses SQLite by default and also supports other databases, such as MySQL
Subset and sort
library(sqldf)
options(sqldf.driver = "SQLite") # use SQLite to avoid a conflict with RMySQL
# Selecting rows
trowSQL <- sqldf("select * from t where year = 1999")
# Selecting columns (base R by position, then sqldf by name)
tcol <- t[,c(1:2,4)]
tcolSQL <- sqldf("select year, month, Ctemp from t")
# Selecting rows and columns
trowcolSQL <- sqldf("select year, month, Ctemp from t where year > 1989 and year < 2000")
# Sorting on columns: year descending, then month ascending
sSQL <- sqldf("select * from t order by year desc, month")
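If sqldf is not installed, the same operations can be done in base R. The sketch below uses airquality (with Month and Day standing in for year and month) and gives the matching SQL query as a comment above each step; it is an alternative to sqldf, not a description of how sqldf works internally.

```r
d <- airquality
d$Ctemp <- round((d$Temp - 32) * 5 / 9, 1)

# Selecting rows: select * from d where Month = 7
trow <- d[d$Month == 7, ]

# Selecting columns: select Month, Day, Ctemp from d
tcol <- d[, c("Month", "Day", "Ctemp")]

# Selecting rows and columns:
# select Month, Day, Ctemp from d where Month > 5 and Month < 9
trowcol <- subset(d, Month > 5 & Month < 9, select = c(Month, Day, Ctemp))

# Sorting: select * from d order by Month desc, Day
s <- d[order(-d$Month, d$Day), ]
head(s)
```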
Recoding
Some analyses are easier after recoding the data, for example splitting a continuous measure into two categories.
t$Category <- 'Other' # start with every row labelled 'Other'
t$Category[t$Ftemp >= 30] <- 'Hot' # relabel rows at or above the threshold
head(t)
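The two-step assignment can also be written with ifelse(), and cut() splits a measure into more than two bands. In this sketch the thresholds (80, 70 and 85 degrees Fahrenheit) and labels are arbitrary illustrative choices applied to airquality, not values taken from the original analysis.

```r
d <- airquality

# Two categories in one step: "Hot" at or above 80 F, otherwise "Other"
d$Category <- ifelse(d$Temp >= 80, "Hot", "Other")
table(d$Category)

# Three bands; right = FALSE makes intervals closed on the left:
# [-Inf, 70) Cool, [70, 85) Warm, [85, Inf) Hot
d$Band <- cut(d$Temp, breaks = c(-Inf, 70, 85, Inf),
              labels = c("Cool", "Warm", "Hot"), right = FALSE)
table(d$Band)
```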
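Deleting information in a column
To clear a column's values, assign NA; to delete the column entirely, assign NULL.
t$Category <- NA # keeps the column but sets every value to NA
t$Category <- NULL # removes the column from t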
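The difference between NA and NULL is easiest to see by comparing column counts. This sketch uses the first six rows of airquality and a hypothetical Category column added only for the demonstration.

```r
d <- head(airquality)      # six rows, six columns
d$Category <- "Other"      # hypothetical extra column (now seven columns)

cleared <- d
cleared$Category <- NA     # column kept, every value becomes NA

dropped <- d
dropped$Category <- NULL   # column removed

ncol(cleared)              # 7
ncol(dropped)              # 6
```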
Aggregate data
aggregate() summarizes data using a specified function, for example to compute the mean of the monthly temperatures for each year.
# Average Fahrenheit temperature for each year
a <- aggregate(t$Ftemp, by=list(t$year), FUN=mean)
# Name columns
colnames(a) <- c('year', 'mean')
a
# The same summary with sqldf
sqldf("select year, avg(Ftemp) as mean from t group by year")
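aggregate() also has a formula interface that names the grouping column automatically. The sketch below computes the mean daily maximum temperature for each month of the built-in airquality data (standing in for the yearly Central Park means) and checks it against the by = list(...) form and tapply().

```r
d <- airquality

# Formula interface: mean of Temp for each Month
# (the formula form drops rows with NA in Temp or Month, via na.action = na.omit)
a <- aggregate(Temp ~ Month, data = d, FUN = mean)
colnames(a) <- c("month", "mean")
a

# Same result with the by = list(...) form; columns are named Group.1 and x
b <- aggregate(d$Temp, by = list(d$Month), FUN = mean)

# tapply() returns the group means as a named vector
tapply(d$Temp, d$Month, mean)
```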
Exercise
Using sqldf, compute the maximum temperature for the year 2000.
Compile a notebook
A notebook is a report of an analysis that interweaves R code and its output.
- In RStudio, choose File > Compile Notebook …
- Select HTML, PDF, or Word output.
- Install knitr before first use, and install any suggested packages when prompted.
Resources
- R books
- Reference card
- Quick-R
- DataCamp
If you ever use R and get an error, DO NOT PANIC. Search the web for the error message and look for answers on Stack Overflow; they are usually very good!
Key points
- R is a platform for a wide variety of data analytics:
  - statistical analysis
  - data visualization
  - HDFS and MapReduce
  - text mining
  - energy informatics
- R is a programming language
- There is much to learn