Published by Stephen Campbell. Modified over 6 years ago.
1
Reading a file
R can read a wide variety of input formats: text, statistical package formats (e.g., SAS), and DBMS.
2
Reading a text file
Delimited text file, such as CSV
Creates a data frame
Specify as required: presence of header, separator, row names
The paths below point to the author's machine; R will not find this local file on your computer.
require(readr) # not needed for read.csv, which is part of base R
# Mac
t <- read.csv("~/Dropbox/Carolina/Paper2/Fixed Encoding Data/changeBrasil.txt", stringsAsFactors=FALSE)
# PC (every backslash in a Windows path must be doubled)
t <- read.csv("C:\\Dropbox\\Carolina\\Paper2\\Fixed Encoding Data\\changeBrasil.txt", stringsAsFactors=FALSE)
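As a minimal sketch of the header and separator options, the following writes a tiny made-up CSV to a temporary file and reads it back. The file contents and the name `demo` are illustrative assumptions, not the author's data.

```r
# Write a small, made-up delimited file to a temporary location
path <- tempfile(fileext = ".csv")
writeLines(c("year,month,temperature",
             "1999,1,33.4",
             "1999,2,38.7",
             "2000,1,31.3"), path)

# header = TRUE: the first line holds column names
# sep = ",": fields are comma-delimited
demo <- read.csv(path, header = TRUE, sep = ",", stringsAsFactors = FALSE)
print(demo)
```

Because the example creates its own file, it runs on any machine regardless of folder layout.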
3
Reading a text file
R can read a file directly from a URL
t <- read.table(url, header=TRUE, sep=',')
4
Learning about an object
Click on the name of the file in the top-right window to see its content
Click on the blue icon of the file in the top-right window to see its structure
url <- "..."
t <- read.table(url, header=TRUE, sep=',')
head(t) # first six rows
tail(t) # last six rows
dim(t) # dimensions (rows, columns)
str(t) # structure of the data set
class(t) # type of object
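The inspection functions can be tried without downloading anything. This sketch builds a small data frame with invented values (the name `demo` and its contents are assumptions for illustration) and applies the same calls.

```r
# A made-up data frame: two years of monthly readings
demo <- data.frame(year = rep(2000:2001, each = 12),
                   month = rep(1:12, times = 2),
                   temperature = seq(30, 76, by = 2))

head(demo)   # first six rows
tail(demo)   # last six rows
dim(demo)    # number of rows, then number of columns
str(demo)    # column names, types, and a preview of the values
class(demo)  # "data.frame"
```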
5
Referencing data
datasetName$columnName
The part before $ is the data set; the part after it is the column
# Referencing your data
# Qualify the column name with the data set name
mean(t$temperature)
sd(t$temperature)
max(t$year)
range(t$month)
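A short sketch of column referencing on invented values follows; `demo` is a hypothetical data frame, chosen so the results are easy to check by hand.

```r
# Made-up readings, for illustration only
demo <- data.frame(year = c(1999, 2000, 2001),
                   temperature = c(32, 50, 68))

# dataset$column extracts one column as a vector
temps <- demo$temperature

mean(temps)   # 50
sd(temps)     # 18
range(temps)  # 32 68

# demo[["temperature"]] is an equivalent way to extract the same column
same <- identical(temps, demo[["temperature"]])
```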
6
Creating a new column
Formula to transform Fahrenheit to Celsius: C = (F - 32) * 5/9
# Creating a new column
t$Ctemp <- round((t$temperature-32)*5/9,1)
head(t)
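To check the conversion against known reference points, the formula can be wrapped in a small helper. The function name `f_to_c` is a hypothetical choice for this sketch, not something the slides define.

```r
# Hypothetical helper: Fahrenheit to Celsius, rounded to one decimal place
f_to_c <- function(f) round((f - 32) * 5 / 9, 1)

# Freezing point, boiling point, and a mild day
f_to_c(c(32, 212, 50))  # 0 100 10

# The same helper can fill a new column of a data frame
demo <- data.frame(temperature = c(32, 212, 50))
demo$Ctemp <- f_to_c(demo$temperature)
```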
7
Renaming a column and writing a file
# Renaming a column
colnames(t)[3] <- 'Ftemp' # rename third column to indicate Fahrenheit
head(t)
# Save a file
write.table(t,"centralparktempsCF.txt")
The file is written to R's current working directory (run getwd() to see where that is)
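Renaming by position breaks if the column order changes, so one alternative is to rename by matching the old name. The sketch below does that on invented data and writes to a temporary file rather than the working directory; `demo`, `out`, and `back` are illustrative names.

```r
# Made-up data, for illustration only
demo <- data.frame(year = c(1999, 2000),
                   month = c(1, 1),
                   temperature = c(33.4, 31.3))

# Rename by matching the old name instead of relying on position 3
colnames(demo)[colnames(demo) == "temperature"] <- "Ftemp"

# Write to a temporary file, then read it back to confirm the round trip
out <- tempfile(fileext = ".txt")
write.table(demo, out)
back <- read.table(out, header = TRUE)

getwd()  # where a relative file name such as "myfile.txt" would be written
```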
8
sqldf
An R package for using SQL with data frames
Returns a data frame
Supports MySQL
9
Subset and Sort
Selecting rows
Selecting columns
Selecting rows and columns
Sorting on column name
library(sqldf)
options(sqldf.driver = "SQLite") # to avoid a conflict with RMySQL
# Selecting rows
trowSQL <- sqldf("select * from t where year = 1999")
# Selecting columns
tcol <- t[,c(1:2,4)]
tcolSQL <- sqldf("select year, month, Ctemp from t")
# Selecting rows and columns
trowcolSQL <- sqldf("select year, month, Ctemp from t where year > 1989 and year < 2000")
# Sorting on column name
sSQL <- sqldf("select * from t order by year desc, month")
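For comparison, the same selections can be written in base R without sqldf. This is a swapped-in alternative, not the approach the slide uses, and the data frame `demo` holds invented values.

```r
# Made-up data, for illustration only
demo <- data.frame(year = c(1999, 1999, 2000, 2000),
                   month = c(1, 2, 1, 2),
                   Ftemp = c(33.4, 38.7, 31.3, 37.3),
                   Ctemp = c(0.8, 3.7, -0.4, 2.9))

# Rows: like "select * from t where year = 1999"
rows_1999 <- subset(demo, year == 1999)

# Columns: like "select year, month, Ctemp from t"
cols <- demo[, c("year", "month", "Ctemp")]

# Rows and columns together
rows_cols <- subset(demo, year > 1989 & year < 2000,
                    select = c(year, month, Ctemp))

# Sort: like "order by year desc, month"
sorted <- demo[order(-demo$year, demo$month), ]
```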
10
Recoding
Some analyses might be facilitated by the recoding of data
Split a continuous measure into two categories
t$Category <- 'Other'
head(t)
t$Category[t$Ftemp >= 30] <- 'Hot'
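The two-step recode can also be written as a single `ifelse()` call. The sketch below uses invented readings and the same threshold of 30 as the slide; `demo` is a hypothetical data frame.

```r
# Made-up readings, for illustration only
demo <- data.frame(Ftemp = c(25, 30, 45))

# One step: "Hot" where the condition holds, "Other" elsewhere
demo$Category <- ifelse(demo$Ftemp >= 30, "Hot", "Other")

table(demo$Category)  # count of rows in each category
```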
11
Deleting information on a column
Assign NA to blank out the values while keeping the column; assign NULL to remove the column itself
t$Category <- NA
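The difference between assigning NA and assigning NULL is easy to see on a small example. The data frame `demo` is invented for this sketch.

```r
# Made-up data, for illustration only
demo <- data.frame(Ftemp = c(25, 45),
                   Category = c("Other", "Hot"))

# NA keeps the column but replaces every value with a missing value
demo$Category <- NA
kept_after_na <- "Category" %in% names(demo)    # TRUE

# NULL removes the column from the data frame
demo$Category <- NULL
kept_after_null <- "Category" %in% names(demo)  # FALSE
```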
12
Aggregate data
Summarize data using a specified function
Compute the mean monthly temperature for each year
# Average F temperature for each year
a <- aggregate(t$Ftemp, by=list(t$year), FUN=mean)
# Name columns
colnames(a) <- c('year', 'mean')
a
sqldf("select year, avg(Ftemp) as mean from t group by year")
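`aggregate()` also accepts a formula, which keeps the grouping column's name so less renaming is needed. A minimal sketch on invented values, with `demo` as a hypothetical data frame:

```r
# Made-up data, for illustration only
demo <- data.frame(year = c(1999, 1999, 2000, 2000),
                   Ftemp = c(30, 40, 35, 45))

# "Ftemp ~ year" reads as: summarize Ftemp within each value of year
a <- aggregate(Ftemp ~ year, data = demo, FUN = mean)

# Rename the summary column to match the slide's output
colnames(a)[colnames(a) == "Ftemp"] <- "mean"
a
```

With these values the yearly means are 35 for 1999 and 40 for 2000.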
13
Exercise
Using sqldf, compute the maximum temperature for the year 2000
14
Compile a notebook
A notebook is a report of an analysis
Interweaves R code and output
File > Compile Notebook …
Select HTML, PDF, or Word output
Install knitr before use
Install suggested packages
15
HTML
16
Resources
R books
Reference card
Quick-R
DataCamp
If you ever use R and get an error, DO NOT PANIC. Google your error and search for answers on Stack Overflow; they are usually very good!
17
Key points
R is a platform for a wide variety of data analytics
Statistical analysis
Data visualization
HDFS and MapReduce
Text mining
Energy Informatics
R is a programming language
Much to learn