R Data Import/Export Dr. Jieh-Shan George YEH


1 R Data Import/Export Dr. Jieh-Shan George YEH jsyeh@pu.edu.tw

2 Save and Load R Data
Data in R can be saved as .Rdata files with the function save() and restored with load().
    getwd()                          # show the current working directory
    setwd("c:\\temp")
    a <- 1:10
    save(a, file = "dumData.Rdata")
    rm(a)
    load("dumData.Rdata")
    print(a)
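The save()/load() round trip on this slide can also be sketched with more than one object and without changing the working directory. The names rdata_path, b, restored and loaded_names are illustrative assumptions, not part of the original slides; loading into a separate environment is one option among several, chosen here so that nothing in the global workspace is overwritten.

```r
# Work in a temporary file so the working directory is left untouched.
rdata_path <- tempfile(fileext = ".Rdata")   # hypothetical file name

a <- 1:10
b <- c(x = 1.5, y = 2.5)

# save() accepts several objects and stores each under its own name.
save(a, b, file = rdata_path)
rm(a, b)

# Loading into a fresh environment avoids overwriting objects of the
# same name that may already exist in the global workspace.
restored <- new.env()
loaded_names <- load(rdata_path, envir = restored)

print(loaded_names)   # names of the objects that were restored
print(restored$a)

unlink(rdata_path)    # delete the temporary file
```

load() returns the names of the restored objects invisibly, which is a convenient way to find out what an unfamiliar .Rdata file contained.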

3 scan() - Read data into a vector or list from the console or a file
    cat("2 3 5 7", "11 13 17 19", file = "ex1.data", sep = "\n")
    scan(file = "ex1.data", what = list(x = 0, y = "", z = 0), flush = TRUE)
    cat("TITLE extra line", "2 3 5 7", "11 13 17", file = "ex2.data", sep = "\n")
    pp <- scan("ex2.data", skip = 1, quiet = TRUE)
    scan("ex2.data", skip = 1)
    scan("ex2.data", skip = 1, nlines = 1)  # only 1 line after the skipped one
    pp2 <- scan("ex2.data", what = list("", "", ""))  # flush is FALSE, so "7" is read
    pp3 <- scan("ex2.data", what = list("", "", ""), flush = TRUE)
    unlink("ex2.data")  # unlink deletes the file

4 Import from and Export to .CSV Files
Create a data frame df1 and save it as a .CSV file with write.csv(). The data frame is then loaded from the file into df2 with read.csv().
    var1 <- 1:5
    var2 <- (1:5) / 10
    var3 <- c("R", "and", "Data Mining", "Examples", "Case Studies")
    df1 <- data.frame(var1, var2, var3)
    names(df1) <- c("VariableInt", "VariableReal", "VariableChar")
    write.csv(df1, "dummmyData.csv", row.names = FALSE)
    df2 <- read.csv("dummmyData.csv")
    print(df2)

5 read.table() - Reads a file in table format and creates a data frame from it
Usage:
    read.table(file, header = FALSE, sep = "", row.names, col.names, nrows = -1, skip = 0)
Example:
    HousePrice <- read.table("houses.data")
    HousePrice <- read.table("houses.data", header = TRUE)
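The slides do not show what houses.data contains, so the effect of header = TRUE is easier to see with a small made-up file. In the sketch below the column names, the values and the variable names houses_path, no_header and with_header are all assumptions made for illustration.

```r
# Hypothetical contents for a small whitespace-separated file;
# the real houses.data from the slides is not available.
houses_path <- tempfile(fileext = ".data")
writeLines(c("Price Floor Rooms",
             "52.00 111 5",
             "54.75 128 6",
             "57.50 101 5"),
           houses_path)

# With the default header = FALSE the first line is read as data,
# so every column also holds a column title and cannot be numeric.
no_header <- read.table(houses_path)

# With header = TRUE the first line supplies the column names.
with_header <- read.table(houses_path, header = TRUE)

str(no_header)
str(with_header)

unlink(houses_path)   # delete the temporary file
```

Comparing the two str() listings shows why the second call on the slide is usually the one that is wanted when a file starts with a title row.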

6 Running Time Comparison (1/3)
One common use of scan() is to read in a large matrix. Suppose the file matrix.dat contains just the numbers for a 200 x 2000 matrix.
    A <- matrix(scan("matrix.dat", n = 200*2000), 200, 2000, byrow = TRUE)
On one test this took 1 second under Linux (3 seconds under Windows on the same machine), whereas
    A <- as.matrix(read.table("matrix.dat"))
took 10 seconds (and more memory), and
    A <- as.matrix(read.table("matrix.dat", header = FALSE, nrows = 200, comment.char = "", colClasses = "numeric"))
took 7 seconds.

7 Running Time Comparison (2/3)
Note that timings can depend on the type read and on the data.
    writeLines(as.character((1+1e6):2e6), "ints.dat")
    xi <- scan("ints.dat", what = integer(0), n = 1e6)    # 0.77s
    xn <- scan("ints.dat", what = numeric(0), n = 1e6)    # 0.93s
    xc <- scan("ints.dat", what = character(0), n = 1e6)  # 0.85s
    xf <- as.factor(xc)                                   # 2.2s
    DF <- read.table("ints.dat")                          # 4.5s

8 Running Time Comparison (3/3)
    code <- c("LMH", "SJC", "CHCH", "SPC", "SOM")
    writeLines(sample(code, 1e6, replace = TRUE), "code.dat")
    y <- scan("code.dat", what = character(0), n = 1e6)  # 0.44s
    yf <- as.factor(y)                                   # 0.21s
    DF <- read.table("code.dat")                         # 4.9s
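The timings quoted on these three slides come from one particular machine and will differ elsewhere. A minimal sketch of how such a comparison might be repeated locally with system.time() follows; the file size is reduced to 1e5 values to keep the run short, and timing_path, n, t_scan and t_table are names assumed for this example only.

```r
# Write n integers, one per line, to a temporary file.
timing_path <- tempfile(fileext = ".dat")
n <- 1e5   # smaller than the 1e6 used on the slides (assumption)
writeLines(as.character(seq_len(n)), timing_path)

# system.time() evaluates its argument and reports the time it took.
t_scan <- system.time(
  xi <- scan(timing_path, what = integer(0), n = n, quiet = TRUE)
)
t_table <- system.time(
  DF <- read.table(timing_path, colClasses = "integer", nrows = n)
)

# Elapsed wall-clock seconds for each approach on this machine.
print(c(scan = t_scan[["elapsed"]], read.table = t_table[["elapsed"]]))

unlink(timing_path)   # delete the temporary file
```

Single runs are noisy, so repeating each measurement a few times gives a fairer picture than one number per call.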

9 PACKAGE ‘XLSX’

10 Package ‘xlsx’
http://cran.r-project.org/web/packages/xlsx/xlsx.pdf
    install.packages("xlsx")
    require(xlsx)
    # example of reading xlsx sheets
    file <- system.file("tests", "test_import.xlsx", package = "xlsx")
    res <- read.xlsx(file, 2)  # read the second sheet
    # example of writing xlsx sheets
    file <- paste(tempfile(), "xlsx", sep = ".")
    write.xlsx(USArrests, file = file)
    # USArrests contains statistics, in arrests per 100,000 residents, for assault,
    # murder, and rape in each of the 50 US states in 1973. Also given is the
    # percent of the population living in urban areas.
    res <- read.xlsx("mydata.xlsx", 1, encoding = "utf-8")  # read the first sheet

11 Output to connections
    zz <- file("ex.data", "w")  # open an output file connection
    cat("TITLE extra line", "2 3 5 7", "", "11 13 17", file = zz, sep = "\n")
    cat("One more line\n", file = zz)
    close(zz)

12 Output to connections
    ## capture R output: use examples from help(lm)
    zz <- textConnection("ex.lm.out", "w")
    sink(zz)
    example(lm, prompt.prefix = "> ")
    sink()
    close(zz)
    ## now 'ex.lm.out' contains the output for further processing.
    ## Look at it by, e.g.,
    cat(ex.lm.out, sep = "\n")

13 Input from connections
    ## read in the file created in the last examples
    readLines("ex.data")
    unlink("ex.data")
    ## read listing of current directory (Unix)
    readLines(pipe("ls -1"))
    ## read listing of current directory (Windows)
    readLines(pipe("dir"))
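The readLines() calls on this slide read a whole file at once. When a connection is opened explicitly, successive reads continue from where the previous one stopped, which can be sketched as below. The file contents mirror the earlier ex.data example, while conn_path, title and rest are names assumed for illustration.

```r
# Recreate a small file like the one written on the output slides.
conn_path <- tempfile(fileext = ".data")
writeLines(c("TITLE extra line", "2 3 5 7", "", "11 13 17"), conn_path)

zz <- file(conn_path, "r")      # open an input file connection
title <- readLines(zz, n = 1)   # consumes only the first line
rest <- readLines(zz)           # continues after the line already read
close(zz)

print(title)
print(rest)

unlink(conn_path)               # delete the temporary file
```

Reading in pieces like this is one way to handle a title line separately from the data that follows it.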

14 PACKAGE ‘XML’

15 Parsing XML
    library(XML)
    u <- "http://www.w3schools.com/xml/cd_catalog.xml"
    xml_data <- xmlToList(u)  # convert an XML node/document to a more R-like list
    xml_data
    class(xml_data)
    xml_data[["CD"]][["TITLE"]]
    library(plyr)
    df <- ldply(xml_data, data.frame)  # split list to data frame

16 PACKAGE ‘JSONLITE’

17 Parsing JSON
    library(jsonlite)
    jsoncars <- toJSON(mtcars)     # convert R objects to JSON
    mtcars2 <- fromJSON(jsoncars)  # convert JSON back to R objects
    all.equal(mtcars, mtcars2)
Reference: https://cran.r-project.org/web/packages/jsonlite/vignettes/json-aaquickstart.html
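The slide converts an existing data frame to JSON and back. Going the other way, from JSON text that did not originate in R, might look like the sketch below. The JSON string and the names json_text, records and pretty are made up for illustration, and the jsonlite package is assumed to be installed.

```r
library(jsonlite)

# A hypothetical JSON array of records, e.g. as returned by a web API.
json_text <- '[{"name":"a","value":1},{"name":"b","value":2}]'

# By default fromJSON() simplifies an array of records to a data frame.
records <- fromJSON(json_text)
str(records)

# pretty = TRUE adds line breaks and indentation for human readers.
pretty <- toJSON(records, pretty = TRUE)
print(pretty)
```

The automatic simplification is convenient for tabular data; for irregular JSON the result may instead be a nested list.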

18 BUILT IN DATASETS

19 Accessing built-in datasets
Around 100 datasets are supplied with R (in package datasets).
    data()
    data(infert)
To access data from a particular package, use the package argument.
    data(package = "rpart")
    data(Puromycin, package = "datasets")
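Called interactively, data() opens a listing in a separate viewer. The same listing can also be inspected programmatically, as in this sketch; the variable name available is an assumption, and the column names used are those of the matrix that data() returns in its results component.

```r
# data() returns an object whose results component is a character matrix
# with one row per dataset.
available <- data(package = "datasets")$results
print(head(available[, c("Item", "Title")]))

# Load one dataset by name and check its structure.
data(Puromycin, package = "datasets")
str(Puromycin)
```

Searching the Title column with grep() is one way to find a dataset on a particular topic.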

20 Editing data
The data editor is useful for making small changes once a data set has been read. The commands
    data(car90, package = "rpart")
    xnew <- edit(car90)
open car90 in the editor and assign the edited copy to xnew. If you want to alter the original dataset xold, the simplest way is to use fix(xold), which is equivalent to xold <- edit(xold). To enter new data via the spreadsheet interface, use
    xnew <- edit(data.frame())

21 OPEN DATA ONLINE
UNdata, http://data.un.org/
Data.gov, https://www.data.gov/
European Union Open Data Portal, https://open-data.europa.eu/

