R Data Import/Export Dr. Jieh-Shan George YEH


Save and Load R Data

Data in R can be saved as .Rdata files with the function save() and restored with load().

    getwd()
    setwd("c:\\temp")
    a <- 1:10
    save(a, file = "dumData.Rdata")
    rm(a)
    load("dumData.Rdata")
    print(a)
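The same round trip can be sketched without changing the working directory, assuming a temporary file is acceptable. The object name a follows the slide; rdata_path and loaded are hypothetical names introduced here for illustration.

```r
# Round trip through save()/load() using a temporary file.
# rdata_path and loaded are hypothetical names, not from the slides.
rdata_path <- tempfile(fileext = ".Rdata")

a <- 1:10
save(a, file = rdata_path)   # write the object to disk
rm(a)                        # remove it from the workspace

loaded <- load(rdata_path)   # load() returns the names of the restored objects
print(loaded)
print(a)

unlink(rdata_path)           # delete the temporary file
```

Capturing the return value of load() is one way to see which objects a file restored, since load() otherwise places them in the workspace silently.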

scan() - Read data into a vector or list from the console or a file

    cat(" ", " ", file = "ex1.data", sep = "\n")
    scan(file = "ex1.data", what = list(x = 0, y = "", z = 0), flush = TRUE)

    cat("TITLE extra line", " ", " ", file = "ex2.data", sep = "\n")
    pp <- scan("ex2.data", skip = 1, quiet = TRUE)
    scan("ex2.data", skip = 1)
    scan("ex2.data", skip = 1, nlines = 1)                  # only 1 line after the skipped one
    pp2 <- scan("ex2.data", what = list("", "", ""))        # flush is FALSE: also reads "7"
    pp3 <- scan("ex2.data", what = list("", "", ""), flush = TRUE)
    unlink("ex2.data")                                      # unlink deletes the file
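A small runnable sketch of the skip, nlines and flush arguments follows. The sample numbers written to the file and the name scan_path are assumptions made for illustration; they are not taken from the slides.

```r
# The file contents below are assumed sample data; scan_path is a hypothetical name.
scan_path <- tempfile(fileext = ".data")
cat("TITLE extra line", "2 3 5 7", "11 13 17", file = scan_path, sep = "\n")

# Skip the title line and read every remaining number into one numeric vector.
all_values <- scan(scan_path, skip = 1, quiet = TRUE)

# Read only the first line after the skipped one.
first_line <- scan(scan_path, skip = 1, nlines = 1, quiet = TRUE)

# Read three character fields per record; flush = TRUE discards
# whatever is left on a line once the three fields have been read.
fields <- scan(scan_path, skip = 1, what = list("", "", ""),
               flush = TRUE, quiet = TRUE)

print(all_values)
print(first_line)
print(fields)

unlink(scan_path)   # delete the temporary file
```

With flush = TRUE each input line maps to exactly one record, which is why the fourth value on the first data line is dropped instead of starting a new record.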

Import from and Export to .CSV Files

Create a data frame df1 and save it as a .CSV file with write.csv(). The data frame is then loaded from the file into df2 with read.csv().

    var1 <- 1:5
    var2 <- (1:5) / 10
    var3 <- c("R", "and", "Data Mining", "Examples", "Case Studies")
    df1 <- data.frame(var1, var2, var3)
    names(df1) <- c("VariableInt", "VariableReal", "VariableChar")
    write.csv(df1, "dummmyData.csv", row.names = FALSE)
    df2 <- read.csv("dummmyData.csv")
    print(df2)

read.table() - Reads a file in table format and creates a data frame from it

Usage:

    read.table(file, header = FALSE, sep = "", row.names, col.names, nrows = -1, skip = 0)

Example:

    HousePrice <- read.table("houses.data")
    HousePrice <- read.table("houses.data", header = TRUE)
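The effect of the header argument can be sketched with a small file. The column names and values below are made up for illustration, and houses_path is a hypothetical name; the real houses.data file is not reproduced in the slides.

```r
# Made-up sample data; houses_path is a hypothetical name.
houses_path <- tempfile(fileext = ".data")
writeLines(c("Price Rooms Age",
             "52.00 8 23",
             "54.75 6 9",
             "57.50 7 15"), houses_path)

# Without header = TRUE the first line is treated as data,
# so the columns get default names V1, V2, V3.
no_header <- read.table(houses_path)

# With header = TRUE the first line supplies the column names.
with_header <- read.table(houses_path, header = TRUE)

str(no_header)
str(with_header)

unlink(houses_path)   # delete the temporary file
```

Comparing the two str() outputs shows why the header matters: when the name row is read as data, the columns can no longer be parsed as numbers.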

Running Time Comparison (1/3)

One common use of scan() is to read in a large matrix. Suppose the file matrix.dat contains just the numbers for a 200 x 2000 matrix.

    A <- matrix(scan("matrix.dat", n = 200*2000), 200, 2000, byrow = TRUE)

On one test this took 1 second under Linux (3 seconds under Windows on the same machine), whereas

    A <- as.matrix(read.table("matrix.dat"))

took 10 seconds (and more memory), and

    A <- as.matrix(read.table("matrix.dat", header = FALSE, nrows = 200,
                              comment.char = "", colClasses = "numeric"))

took 7 seconds.

Running Time Comparison (2/3)

Note that timings can depend on the type read and on the data.

    writeLines(as.character((1+1e6):2e6), "ints.dat")
    xi <- scan("ints.dat", what = integer(0), n = 1e6)     # 0.77s
    xn <- scan("ints.dat", what = numeric(0), n = 1e6)     # 0.93s
    xc <- scan("ints.dat", what = character(0), n = 1e6)   # 0.85s
    xf <- as.factor(xc)                                    # 2.2s
    DF <- read.table("ints.dat")                           # 4.5s

Running Time Comparison (3/3)

    code <- c("LMH", "SJC", "CHCH", "SPC", "SOM")
    writeLines(sample(code, 1e6, replace = TRUE), "code.dat")
    y <- scan("code.dat", what = character(0), n = 1e6)    # 0.44s
    yf <- as.factor(y)                                     # 0.21s
    DF <- read.table("code.dat")                           # 4.9s
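The timings quoted on these slides come from older hardware, so they are best re-measured locally. The following is a rough sketch using system.time() on a smaller file. The size n, the name ints_path and the result names are assumptions for illustration, and the numbers will differ by machine and R version.

```r
# Re-measure the scan() vs read.table() comparison on a smaller file.
# n, ints_path and the result names are hypothetical choices.
n <- 1e5
ints_path <- tempfile(fileext = ".dat")
writeLines(as.character(seq_len(n)), ints_path)

# scan() with a declared type and a known number of items.
t_scan <- system.time(
  xi <- scan(ints_path, what = integer(0), n = n, quiet = TRUE)
)

# read.table() with all defaults: it has to guess the column type.
t_table <- system.time(DF <- read.table(ints_path))

# read.table() with hints that spare it some of that work.
t_table_typed <- system.time(
  DF2 <- read.table(ints_path, colClasses = "integer",
                    nrows = n, comment.char = "")
)

# Elapsed wall-clock seconds for each approach.
elapsed <- c(scan             = t_scan[["elapsed"]],
             read.table       = t_table[["elapsed"]],
             read.table.typed = t_table_typed[["elapsed"]])
print(elapsed)

unlink(ints_path)   # delete the temporary file
```

A single run like this is noisy; repeating each call several times and comparing the medians gives a steadier picture.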

PACKAGE ‘XLSX’

Package ‘xlsx’

    install.packages("xlsx")
    require(xlsx)

    # example of reading xlsx sheets
    file <- system.file("tests", "test_import.xlsx", package = "xlsx")
    res <- read.xlsx(file, 2)   # read the second sheet

    # example of writing xlsx sheets
    # USArrests contains statistics, in arrests per 100,000 residents, for
    # assault, murder, and rape in each of the 50 US states in 1973. Also
    # given is the percent of the population living in urban areas.
    file <- paste(tempfile(), "xlsx", sep = ".")
    write.xlsx(USArrests, file = file)

    res <- read.xlsx("mydata.xlsx", 1, encoding = "UTF-8")   # read the first sheet

Output to connections

    zz <- file("ex.data", "w")   # open an output file connection
    cat("TITLE extra line", " ", "", " ", file = zz, sep = "\n")
    cat("One more line\n", file = zz)
    close(zz)

Output to connections

    ## capture R output: use examples from help(lm)
    zz <- textConnection("ex.lm.out", "w")
    sink(zz)
    example(lm, prompt.prefix = "> ")
    sink()
    close(zz)
    ## now 'ex.lm.out' contains the output for further processing.
    ## Look at it by, e.g.,
    cat(ex.lm.out, sep = "\n")

Input from connections

    ## read in file created in last examples
    readLines("ex.data")
    unlink("ex.data")

    ## read listing of current directory (Unix)
    readLines(pipe("ls -1"))

    ## read listing of current directory (Windows)
    readLines(pipe("dir"))
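Writing through an explicitly opened connection and reading the file back can be sketched as one self-contained round trip. The numeric lines written here and the names conn_path, out and inp are assumptions for illustration.

```r
# Write through an open file connection, then read the file back.
# The data lines and the names conn_path, out, inp are hypothetical.
conn_path <- tempfile(fileext = ".data")

out <- file(conn_path, "w")   # open an output file connection
cat("TITLE extra line", "2 3 5 7", "", "11 13 17", file = out, sep = "\n")
cat("One more line\n", file = out)   # appended: the connection is still open
close(out)

inp <- file(conn_path, "r")   # open an input file connection
title <- readLines(inp, n = 1)   # the first line only
rest <- readLines(inp)           # continues from where the last read stopped
close(inp)

print(title)
print(rest)

unlink(conn_path)   # delete the temporary file
```

Because the connection stays open between calls, successive cat() calls append and successive readLines() calls resume from the current position, which is the main difference from passing a file name each time.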

PACKAGE ‘XML’

Parsing XML

    library(XML)
    u <- "..."   # URL of the XML document (not preserved in the transcript)
    xml_data <- xmlToList(u)   # convert an XML node/document to a more R-like list
    xml_data
    class(xml_data)
    xml_data[["CD"]][["TITLE"]]

    library(plyr)
    df <- ldply(xml_data, data.frame)   # split the list into a data frame

PACKAGE ‘JSONLITE’

Parsing JSON

    library(jsonlite)
    jsoncars <- toJSON(mtcars)      # convert an R object to JSON
    mtcars2 <- fromJSON(jsoncars)   # convert JSON back to an R object
    all.equal(mtcars, mtcars2)

Reference: https://cran.r-project.org/web/packages/jsonlite/vignettes/json-aaquickstart.html

BUILT IN DATASETS

Accessing built-in datasets

Around 100 datasets are supplied with R (in the package datasets).

    data()
    data(infert)

To access data from a particular package, use the package argument.

    data(package = "rpart")
    data(Puromycin, package = "datasets")
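As a brief sketch, the list of available datasets can also be inspected programmatically instead of read from the pager that data() opens. The variable name available is hypothetical.

```r
# Load one built-in dataset by name and look at its structure.
data(Puromycin, package = "datasets")
str(Puromycin)

# data(package = ...) returns an object whose $results matrix has
# one row per dataset; the "Item" column holds the dataset names.
available <- data(package = "datasets")$results[, "Item"]
head(available)
```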

Editing data

edit() is useful for making small changes once a data set has been read. The command

    data(car90, package = "rpart")
    xnew <- edit(car90)

opens car90 in a spreadsheet-like editor and assigns the edited copy to xnew. If you want to alter the original dataset xold, the simplest way is to use fix(xold), which is equivalent to xold <- edit(xold). Use

    xnew <- edit(data.frame())

to enter new data via the spreadsheet interface.

OPEN DATA ONLINE

UNdata, Data.gov, European Union Open Data Portal