
R Data Import/Export Dr. Jieh-Shan George YEH


1 R Data Import/Export Dr. Jieh-Shan George YEH jsyeh@pu.edu.tw

2 Save and Load R Data
Data in R can be saved as .Rdata files with the function save() and restored with load().
    getwd()
    setwd("c:\\temp")
    a <- 1:10
    save(a, file="dumData.Rdata")
    rm(a)
    load("dumData.Rdata")
    print(a)
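As a hedged aside to the save()/load() example, base R also provides saveRDS() and readRDS() for a single object. Unlike load(), readRDS() returns the object, so it can be assigned to any name. The sketch below writes to a temporary file (an assumption, so that it does not depend on c:\\temp existing); the names rds_file and b are illustrative only.

```r
# Sketch: round-trip one object with saveRDS()/readRDS() (base R).
# The temporary file path is an illustrative assumption.
rds_file <- tempfile(fileext = ".rds")
a <- 1:10
saveRDS(a, file = rds_file)
rm(a)
b <- readRDS(rds_file)   # the object comes back under a name we choose
print(b)
unlink(rds_file)         # delete the temporary file
```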

3 Fixed-width-format files
    cat("2 3 5 7", "11 13 17 19", file="ex1.data", sep="\n")
    scan(file="ex1.data", what=list(x=0, y="", z=0), flush=TRUE)
    cat("TITLE extra line", "2 3 5 7", "11 13 17", file = "ex2.data", sep = "\n")
    pp <- scan("ex2.data", skip = 1, quiet = TRUE)
    scan("ex2.data", skip = 1)
    scan("ex2.data", skip = 1, nlines = 1)  # only 1 line after the skipped one
    pp2 <- scan("ex2.data", what = list("","",""))  # flush is FALSE, so "7" is read
    pp3 <- scan("ex2.data", what = list("","",""), flush = TRUE)
    unlink("ex2.data")  # unlink deletes the file
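The scan() calls on this slide read whitespace-separated fields. For files whose columns are defined purely by character position, base R's read.fwf() (in package utils) is one option. The following is a minimal sketch; the file contents and the widths 1, 2 and 3 are made up for illustration.

```r
# Sketch: read a true fixed-width file with read.fwf() from utils.
fwf_file <- tempfile()
cat("123456", "987654", file = fwf_file, sep = "\n")
# Column widths of 1, 2 and 3 characters (chosen for illustration).
fw <- read.fwf(fwf_file, widths = c(1, 2, 3))
print(fw)
unlink(fwf_file)
```

Each input line is cut into the fields 1 / 23 / 456 and 9 / 87 / 654, which become columns V1, V2 and V3.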

4 Import from and Export to .CSV Files
Create a data frame df1 and save it as a .CSV file with write.csv(). The data frame is then loaded from the file into df2 with read.csv().
    var1 <- 1:5
    var2 <- (1:5) / 10
    var3 <- c("R", "and", "Data Mining", "Examples", "Case Studies")
    df1 <- data.frame(var1, var2, var3)
    names(df1) <- c("VariableInt", "VariableReal", "VariableChar")
    write.csv(df1, "dummmyData.csv", row.names = FALSE)
    df2 <- read.csv("dummmyData.csv")
    print(df2)
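To check that a CSV round trip preserves the data, the written and re-read data frames can be compared with all.equal(). This is a sketch under two assumptions: a temporary file is used instead of the working directory, and stringsAsFactors = FALSE is set explicitly so the result does not depend on the R version's default.

```r
# Sketch: verify a write.csv()/read.csv() round trip.
csv_file <- tempfile(fileext = ".csv")
df1 <- data.frame(VariableInt = 1:5,
                  VariableReal = (1:5) / 10,
                  VariableChar = c("R", "and", "Data Mining",
                                   "Examples", "Case Studies"),
                  stringsAsFactors = FALSE)
write.csv(df1, csv_file, row.names = FALSE)
df2 <- read.csv(csv_file, stringsAsFactors = FALSE)
same <- isTRUE(all.equal(df1, df2))   # TRUE when values and types match
print(same)
unlink(csv_file)
```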

5 Scan
One common use of scan() is to read in a large matrix. Suppose the file matrix.dat contains just the numbers for a 200 x 2000 matrix. Then we can use
    A <- matrix(scan("matrix.dat", n = 200*2000), 200, 2000, byrow = TRUE)
On one test this took 1 second under Linux (3 seconds under Windows on the same machine), whereas
    A <- as.matrix(read.table("matrix.dat"))
took 10 seconds (and more memory), and
    A <- as.matrix(read.table("matrix.dat", header = FALSE, nrows = 200, comment.char = "", colClasses = "numeric"))
took 7 seconds.
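The comparison above can be tried on any machine with system.time(). The sketch below uses a much smaller 20 x 50 matrix written to a temporary file (both are assumptions for illustration), so the absolute timings will not resemble the figures quoted on the slide; it mainly shows that both approaches recover the same values.

```r
# Sketch: compare scan() and read.table() on a small numeric matrix.
mat_file <- tempfile()
m <- matrix(as.numeric(1:1000), nrow = 20, ncol = 50)
# write() emits values in column-major order, so transpose to store rows.
write(t(m), file = mat_file, ncolumns = 50)

t_scan <- system.time(
  A <- matrix(scan(mat_file, n = 20 * 50, quiet = TRUE), 20, 50, byrow = TRUE)
)
t_table <- system.time(
  B <- as.matrix(read.table(mat_file, header = FALSE, colClasses = "numeric"))
)
# Elapsed seconds for each approach (machine dependent).
print(c(scan = t_scan[["elapsed"]], read.table = t_table[["elapsed"]]))
unlink(mat_file)
```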

6 Note that timings can depend on the type read and the data.
    writeLines(as.character((1+1e6):2e6), "ints.dat")
    xi <- scan("ints.dat", what=integer(0), n=1e6)    # 0.77s
    xn <- scan("ints.dat", what=numeric(0), n=1e6)    # 0.93s
    xc <- scan("ints.dat", what=character(0), n=1e6)  # 0.85s
    xf <- as.factor(xc)                               # 2.2s
    DF <- read.table("ints.dat")                      # 4.5s
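The what argument controls the type of the vector that scan() returns, which is part of why the timings differ. A small sketch, using 1000 integers in a temporary file (sizes chosen for illustration only):

```r
# Sketch: the same file read as integer, numeric and character.
ints_file <- tempfile()
writeLines(as.character(1:1000), ints_file)
xi <- scan(ints_file, what = integer(0), quiet = TRUE)
xn <- scan(ints_file, what = numeric(0), quiet = TRUE)
xc <- scan(ints_file, what = character(0), quiet = TRUE)
# Storage type of each result.
types <- sapply(list(xi = xi, xn = xn, xc = xc), typeof)
print(types)
unlink(ints_file)
```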

7
    code <- c("LMH", "SJC", "CHCH", "SPC", "SOM")
    writeLines(sample(code, 1e6, replace=TRUE), "code.dat")
    y <- scan("code.dat", what=character(0), n=1e6)  # 0.44s
    yf <- as.factor(y)                               # 0.21s
    DF <- read.table("code.dat")                     # 4.9s

8
    zz <- read.csv("mr.csv", strip.white = TRUE)
    # reshape from wide to long: repeat columns 1-2 four times and stack columns 3-6
    zzz <- cbind(zz[gl(nrow(zz), 1, 4*nrow(zz)), 1:2], stack(zz[, 3:6]))
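Since mr.csv is not included with the slides, the reshaping step can be illustrated on a small made-up data frame. The column names id, site and t1 to t4 below are hypothetical; only the cbind()/gl()/stack() expression comes from the slide.

```r
# Sketch: wide-to-long reshaping on a hypothetical two-row data frame.
zz <- data.frame(id = c("a", "b"), site = c("x", "y"),
                 t1 = c(1, 2), t2 = c(3, 4), t3 = c(5, 6), t4 = c(7, 8),
                 stringsAsFactors = FALSE)
# gl() cycles through the row indices four times; stack() turns the four
# measurement columns into one 'values' column plus an 'ind' label column.
zzz <- cbind(zz[gl(nrow(zz), 1, 4 * nrow(zz)), 1:2], stack(zz[, 3:6]))
print(zzz)
```

The result has one row per (id, measurement) pair: eight rows with columns id, site, values and ind.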


10 read.table
    HousePrice <- read.table("houses.data")
    HousePrice <- read.table("houses.data", header=TRUE)
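The effect of header=TRUE is easiest to see on a tiny file. The sketch below does not use houses.data; it writes a hypothetical three-line file (the column names Price, Floor and Rooms and the values are invented) and reads it both ways.

```r
# Sketch: read.table() with and without header = TRUE.
houses_file <- tempfile()
cat("Price Floor Rooms",
    "52.00 111 8",
    "54.75 128 6",
    file = houses_file, sep = "\n")
no_header   <- read.table(houses_file)                 # first line treated as data
with_header <- read.table(houses_file, header = TRUE)  # first line gives names
str(no_header)
str(with_header)
unlink(houses_file)
```

Without header=TRUE the names line becomes a data row and the columns are named V1, V2, V3, so the numeric columns are no longer read as numbers.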

11 scan() function
    inp <- scan("input.dat", list("",0,0))
    inp <- scan("input.dat", list(id="", x=0, y=0))
    X <- matrix(scan("light.dat", 0), ncol=5, byrow=TRUE)
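When what is a list, scan() returns a list with one vector per field, typed after the corresponding list element. A minimal sketch with an invented two-line input file:

```r
# Sketch: scan() with a named list as the 'what' template.
input_file <- tempfile()
cat("a 1.5 10", "b 2.5 20", file = input_file, sep = "\n")
inp <- scan(input_file, list(id = "", x = 0, y = 0), quiet = TRUE)
str(inp)   # a list with components id (character), x and y (numeric)
# The list converts directly to a data frame if that is more convenient.
inp_df <- as.data.frame(inp, stringsAsFactors = FALSE)
print(inp_df)
unlink(input_file)
```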

12 BUILT IN DATASETS

13 Accessing built-in datasets
Around 100 datasets are supplied with R (in package datasets).
    data()
    data(infert)
To access data from a particular package, use the package argument:
    data(package="rpart")
    data(Puromycin, package="datasets")
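data() can also load a dataset into a chosen environment via its envir argument, and the object returned by data(package=...) can be inspected programmatically. The sketch below assumes only the datasets package that ships with R; the names e and available are illustrative.

```r
# Sketch: load a dataset into its own environment and list what is available.
e <- new.env()
data(Puromycin, package = "datasets", envir = e)
print(ls(e))          # the environment now holds Puromycin
str(e$Puromycin)
# The 'results' matrix of the returned object has one row per dataset.
available <- data(package = "datasets")$results[, "Item"]
print(head(available))
```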

14 Editing data
edit() is useful for making small changes once a data set has been read. The commands
    data(car90, package="rpart")
    xnew <- edit(car90)
open car90 in a spreadsheet-like editor and assign the edited copy to xnew. If you want to alter the original dataset xold, the simplest way is to use fix(xold), which is equivalent to xold <- edit(xold). To enter new data via the spreadsheet interface, use
    xnew <- edit(data.frame())

15 PACKAGE ‘XLSX’

16 Package ‘xlsx’
http://cran.r-project.org/web/packages/xlsx/xlsx.pdf
    install.packages("xlsx")
    require(xlsx)
    # example of reading xlsx sheets
    file <- system.file("tests", "test_import.xlsx", package = "xlsx")
    res <- read.xlsx(file, 2)  # read the second sheet
    # example of writing xlsx sheets
    file <- paste(tempfile(), "xlsx", sep=".")
    # USArrests contains statistics, in arrests per 100,000 residents, for assault, murder, and rape
    # in each of the 50 US states in 1973, along with the percent of the population living in urban areas.
    write.xlsx(USArrests, file=file)
    res <- read.xlsx("mydata.xlsx", 1, encoding="utf-8")  # read the first sheet

17 Output to connections
    zz <- file("ex.data", "w")  # open an output file connection
    cat("TITLE extra line", "2 3 5 7", "", "11 13 17", file = zz, sep = "\n")
    cat("One more line\n", file = zz)
    close(zz)
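A connection opened with mode "w" starts a new file, while mode "a" appends to an existing one. The following sketch, using a temporary file as an assumption, writes two lines, closes the connection, then reopens it to append a third.

```r
# Sketch: write to a file connection, then reopen it in append mode.
out_file <- tempfile()
con <- file(out_file, "w")        # "w" creates or truncates the file
writeLines(c("TITLE extra line", "2 3 5 7"), con)
close(con)

con <- file(out_file, "a")        # "a" opens for appending
cat("One more line\n", file = con)
close(con)

lines_written <- readLines(out_file)
print(lines_written)
unlink(out_file)
```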

18 Output to connections
    ## capture R output: use examples from help(lm)
    zz <- textConnection("ex.lm.out", "w")
    sink(zz)
    example(lm, prompt.prefix = "> ")
    sink()
    close(zz)
    ## now ‘ex.lm.out’ contains the output for further processing.
    ## Look at it by, e.g.,
    cat(ex.lm.out, sep = "\n")
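For the common case of capturing printed output as a character vector, capture.output() from package utils is a shorter alternative to the textConnection()/sink() pattern; it is offered here as a sketch rather than a replacement for the slide's approach. The lm() fit on the built-in cars data is only an illustration.

```r
# Sketch: capture printed output as a character vector.
out <- capture.output(print(1:3))
print(out)   # one element holding the printed line "[1] 1 2 3"

# Capture a longer printout, here the summary of a simple linear model.
fit_lines <- capture.output(summary(lm(dist ~ speed, data = cars)))
cat(fit_lines, sep = "\n")
```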

19 Input from connections
    ## read in file created in last examples
    readLines("ex.data")
    unlink("ex.data")
    ## read listing of current directory (Unix)
    readLines(pipe("ls -1"))
    ## read listing of current directory (Windows)
    readLines(pipe("dir"))
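When a connection is opened explicitly, successive readLines() calls continue from where the previous one stopped, which allows a file to be read in pieces. A minimal sketch with an invented three-line temporary file:

```r
# Sketch: read a file in pieces from an explicitly opened connection.
in_file <- tempfile()
writeLines(c("first", "second", "third"), in_file)

con <- file(in_file, "r")          # open for reading
first_line <- readLines(con, n = 1)
remaining  <- readLines(con)       # continues after the first line
close(con)                         # always close connections you open

print(first_line)
print(remaining)
unlink(in_file)
```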

