R Data Import/Export Dr. Jieh-Shan George YEH

Slides:



Advertisements
Similar presentations
Introduction to R Brody Sandel. Topics Approaching your analysis Basic structure of R Basic programming Plotting Spatial data.
Advertisements

R for Macroecology Aarhus University, Spring 2011.
Writing functions in R Some handy advice for creating your own functions.
 Statistics package  Graphics package  Programming language  Can be used to share/reproduce analyses  Many new packages being created - can be downloaded.
Data in R. General form of data ID numberSexWeightLengthDiseased… 112m … 256f3.61 NA1… 3……………… 4……………… n91m5.1711… NOTE: A DATASET IS NOT A MATRIX!
S Programming in R Bill Venables CSIRO Mathematics and Information Sciences Auckland, 7 July 2006.
Basics of Using R Xiao He 1. AGENDA 1.What is R? 2.Basic operations 3.Different types of data objects 4.Importing data 5.Basic data manipulation 2.
Computing for Data Analysis R statistics programming environment Ming Ni 11/14/2014.
Introduction to GTECH 201 Session 13. What is R? Statistics package A GNU project based on the S language Statistical environment Graphics package Programming.
Basic Data Input. To get started, you can give students binary data already in the R format. – save() one or more R objects to a file (with.rda extension)
Evan Girvetz Winkenwerder Intro to R Programming: Lecture 2 © R Foundation, from
CIS 240 Introduction to UNIX Instructor: Sue Sampson.
Computer Skills Preparatory Year Presented by: L.Obead Alhadreti.
How to Use the R Programming Language for Statistical Analyses Part I: An Introduction to R Jennifer Urbano Blackford, Ph.D. Department of Psychiatry Kennedy.
LISA Short Course Series R Basics
Statistical Software An introduction to Statistics Using R Instructed by Jinzhu Jia.
Systems Software Operating Systems.
LISA Short Course Series Basics of R Lin Zhang Feb. 16, 2015 LISA: Basics of RFeb. 16, 2015.
3. Functions and Arguments. Writing in R is like writing in English Jump three times forward Action Modifiers.
ATM 315 Environmental Statistics Course Goto Follow the link and then choose the desktop application.
LISA Short Course Series R Basics Ana Maria Ortega Villa Fall 2013 LISA: R BasicsFall 2013.
First Screen : First window form will always remain open, for the user to select menu options. 1.
An introduction to R: get familiar with R Guangxu Liu Bio7932.
Hands-on Introduction to R. Outline R : A powerful Platform for Statistical Analysis Why bother learning R ? Data, data, data, I cannot make bricks without.
Standard Grade Computing System Software & Operating Systems.
The Original and Current Basic R “Console” command line interface….
Introduction to to R Emily Kalah Gade University of Washington Credit to Kristin Siebel for development of much of this PowerPoint.
Arko Barman with modification by C.F. Eick COSC 4335 Data Mining Spring 2015.
Data Objects in R Vector1 dimensionAll elements have the same data types Data types: numeric, character logic, factor Matrix2 dimensions Array2 or more.
Piotr Wolski Introduction to R. Topics What is R? Sample session How to install R? Minimum you have to know to work in R Data objects in R and how to.
Linux+ Guide to Linux Certification, Third Edition
Systems Software Operating Systems. What is software? Software is the term that we use for all the programs and data that we use with a computer system.
R: Packages & Data. Presented here are a number of ways to accomplish a task, some are redundant or may not represent the best way to accomplish a task.
R-Studio and Revolution Analytics have built additional functionality on top of base R.
R Programming Yang, Yufei. Normal distribution.
Outline Comparison of Excel and R R Coding Example – RStudio Environment – Getting Help – Enter Data – Calculate Mean – Basic Plots – Save a Coding Script.
R packages/libraries Data input/output Rachel Carroll Department of Public Health Sciences, MUSC Computing for Research I, Spring 2014.
File Input and Output July 2nd, Inputs and Outputs Inputs Keyboard Mouse storage(hard drive) Networks O utputs Graphs Images Videos(image stacks)
WDO-It! 102 Workshop: Using an abstraction of a process to capture provenance UTEP’s Trust Laboratory NDR HP MP.
Define and describe operating systems which contain a Command Line Interface (CLI) Define and describe operating systems which contain a Graphical User.
R Data Import/Export Dr. Jieh-Shan George YEH
Introduction to R Carol Bult The Jackson Laboratory Functional Genomics (BMB550) Spring 2011.
Bioinformatics for biologists
Linux+ Guide to Linux Certification, Second Edition
Introduction to Programming on MATLAB Ecological Modeling Course Sep 11th, 2006.
Hands-on Introduction to R. We live in oceans of data. Computers are essential to record and help analyse it. Competent scientists speak C/C++, Java,
1 Introduction to Perl / R 26 – 27 April, 2010 By Fabian Glaser and Michael Shmoish Bioinformatics Knowledge Unit, The Lorry I. Lokey Interdisciplinary.
Review > x[-c(1,4,6)] > Y[1:3,2:8] > island.data fishData$weight[1] > fishData[fishData$weight < 20 & fishData$condition.
16BIT IITR Data Collection Module If you have not already done so, download and install R from download.
Working with data in R 2 Fish 552: Lecture 3. Recommended Reading An Introduction to R (R Development Core Team) –
G-scan PC Utility Viewer Manual
R basics workshop Sohee Kang Math and Stats Learning Centre
Arko Barman COSC 6335 Data Mining Fall 2014
DATA MANAGEMENT MODULE: Getting Data Into and Out of R
Welcome to Math’s Tutorial Session-3 Data handling
Prepare data for importing
Uploading and handling databases
R basics workshop Sohee Kang Math and Stats Learning Centre
DATA MANAGEMENT MODULE: Getting Data Into and Out of R
Introduction to javadoc
Code is on the Website Outline Comparison of Excel and R
Devtools and package building Phuse Non-clinical Scripts
funCTIONs and Data Import/Export
Basics of R, Ch Functions Help Managing your Objects
Testthat package testing Phuse Non-clinical Scripts
Stata Basic Course.
Creating a dataset in R Instructor: Li, Han
Topic: Loops Loops Idea While Loop Introduction to ranges For Loop
Recapitulation of Lecture 5
Finding Statistics from Data
Presentation transcript:

R Data Import/Export Dr. Jieh-Shan George YEH

Save and Load R Data Data in R can be saved as.Rdata files with function save(). getwd() setwd("c:\\temp") a <- 1:10 save(a, file="dumData.Rdata") rm(a) load("dumData.Rdata") print(a)

Fixed-width-format files cat(" ", " ", file="ex1.data", sep="\n") scan(file="ex1.data", what=list(x=0, y="", z=0), flush=TRUE) cat("TITLE extra line", " ", " ", file = "ex2.data", sep = "\n") pp <- scan("ex2.data", skip = 1, quiet = TRUE) scan("ex2.data", skip = 1) scan("ex2.data", skip = 1, nlines = 1) # only 1 line after the skipped one pp2 read "7" pp3<-scan("ex2.data", what = list("","",""), flush = TRUE) unlink("ex2.data") # unlink deletes the file

Import from and Export to.CSV Files Create a dataframe df1 and save it as a.CSV le with write.csv(). The dataframe is loaded from file to df2 with read.csv() var1 <- 1:5 var2 <- (1:5) / 10 var3 <- c("R", "and", "Data Mining", "Examples", "Case Studies") df1 <- data.frame(var1, var2, var3) names(df1) <- c("VariableInt", "VariableReal", "VariableChar") write.csv(df1, "dummmyData.csv", row.names = FALSE) df2 <- read.csv("dummmyData.csv") print(df2)

Scan One common use of scan is to read in a large matrix. Suppose file matrix.dat just contains the numbers for a 200 x 2000 matrix. Then we can use A <- matrix(scan("matrix.dat", n = 200*2000), 200, 2000, byrow = TRUE) On one test this took 1 second (under Linux, 3 seconds under Windows on the same machine) Whereas A <- as.matrix(read.table("matrix.dat")) took 10 seconds (and more memory), and A <- as.matrix(read.table("matrix.dat", header = FALSE, nrows = 200, comment.char = "", colClasses = "numeric")) took 7 seconds.

Note that timings can depend on the type read and the data. writeLines(as.character((1+1e6):2e6), "ints.dat") xi <- scan("ints.dat", what=integer(0), n=1e6) # 0.77s xn <- scan("ints.dat", what=numeric(0), n=1e6) # 0.93s xc <- scan("ints.dat", what=character(0), n=1e6) # 0.85s xf <- as.factor(xc) # 2.2s DF <- read.table("ints.dat") # 4.5s

code <- c("LMH", "SJC", "CHCH", "SPC", "SOM") writeLines(sample(code, 1e6, replace=TRUE), "code.dat") y <- scan("code.dat", what=character(0), n=1e6) # 0.44s yf <- as.factor(y) # 0.21s DF <- read.table("code.dat") # 4.9s

zz <- read.csv("mr.csv", strip.white = TRUE) zzz <- cbind(zz[gl(nrow(zz), 1, 4*nrow(zz)), 1:2], stack(zz[, 3:6]))

read.table HousePrice <- read.table("houses.data") HousePrice <- read.table("houses.data", header=TRUE)

scan() function inp <- scan("input.dat", list("",0,0)) inp <- scan("input.dat", list(id="", x=0, y=0)) X <- matrix(scan("light.dat", 0), ncol=5, byrow=TRUE)

BUILT IN DATASETS

Accessing built in datasets Around 100 datasets are supplied with R (in package datasets) data() data(infert) To access data from a particular package, use the package argument data(package="rpart") data(Puromycin, package="datasets")

Editing data This is useful for making small changes once a data set has been read. The command data(car90, package="rpart") xnew <- edit(car90) If you want to alter the original dataset xold, the simplest way is to use fix(xold), which is equivalent to xold <- edit(xold). to enter new data via the spreadsheet interface. xnew <- edit(data.frame())

PACKAGE ‘XLSX’

Package ‘xlsx’ install.packages("xlsx") require(xlsx) # example of reading xlsx sheets file <- system.file("tests", "test_import.xlsx", package = "xlsx") res <- read.xlsx(file, 2) # read the second sheet # example of writing xlsx sheets file <- paste(tempfile(), "xlsx", sep=".") write.xlsx(USArrests, file=file) #This data set contains statistics, in arrests per 100,000 residents for assault, murder, and rape in each of the 50 US states in Also given is the percent of the population living in urban areas. res <- read.xlsx("mydata.xlsx", 1, encoding="utf-8") # read the sheet1

Output to connections zz <- file("ex.data", "w") # open an output file connection cat("TITLE extra line", " ", "", " ", file = zz, sep = "\n") cat("One more line\n", file = zz) close(zz)

Output to connections ## capture R output: use examples from help(lm) zz <- textConnection("ex.lm.out", "w") sink(zz) example(lm, prompt.prefix = "> ") sink() close(zz) ## now ‘ex.lm.out’ contains the output for futher processing. ## Look at it by, e.g., cat(ex.lm.out, sep = "\n")

Input from connections ## read in file created in last examples readLines("ex.data") unlink("ex.data") ## read listing of current directory (Unix) readLines(pipe("ls -1")) ## read listing of current directory (windows) readLines(pipe(“dir"))