Welcome to Math’s Tutorial Session-3 Data handling
R scripts
# Write an R script to check whether a person is eligible to vote or not.
num = as.integer(readline(prompt = "Enter your age: "))
if (num >= 18) {
  print("eligible")
} else {
  print("not eligible")
}
R scripts contd…
# Sum of the natural numbers from 1 up to the number entered.
num = as.integer(readline(prompt = "Enter a number: "))
if (num < 0) {
  print("Enter a positive number")
} else {
  sum = 0
  # use while loop to iterate until zero
  while (num > 0) {
    sum = sum + num
    num = num - 1
  }
  print(paste("The sum is", sum))
}
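The while loop above can be cross-checked against R's vectorized sum() and the closed form n(n+1)/2. The sketch below is only an illustration: sum_to is a hypothetical helper name, not part of the slides, and it takes the number as an argument instead of reading it with readline().

```r
# Hypothetical helper (not from the slides): sum of 1..n using a while loop.
sum_to <- function(n) {
  total <- 0
  while (n > 0) {
    total <- total + n
    n <- n - 1
  }
  total
}

n <- 10
loop_result   <- sum_to(n)          # loop, as on the slide
vector_result <- sum(seq_len(n))    # vectorized equivalent
closed_form   <- n * (n + 1) / 2    # closed-form formula

print(c(loop_result, vector_result, closed_form))
```

All three values should agree for any non-negative whole number.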
Packages required
Packages must be installed first and then loaded with the library() function.
install.packages("rJava")
install.packages("xlsxjars")
install.packages("xlsx")
install.packages("data.table")
Types of files Text files. Excel files. CSV files.
How to import a file in RStudio
Steps:
Create the file (text file, Excel file, CSV file, …).
Import it into RStudio: click Import Dataset in the Environment pane, choose your saved file, and view it in the top-level workspace.
Command to run an R script file: source("file_path")
Work with Table File
A data table can reside in a text file. The cells inside the table are separated by blank characters.
Example:
100 a1 b1
200 a2 b2
Save with .txt extension. Then load the data into the workspace with the function read.table.
Command:
mydata = read.table("file_path") # read text file
mydata # print data frame
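To try read.table without creating a file by hand, the sketch below writes the two example rows to a temporary file first. The temporary path is an assumption made for illustration; in practice you would pass the path of your own .txt file.

```r
# Write the two-row example table to a temporary file (illustrative path).
path <- tempfile(fileext = ".txt")
writeLines(c("100 a1 b1", "200 a2 b2"), path)

# The file has no header row, so read.table names the columns V1, V2, V3.
mydata <- read.table(path)
print(mydata)
print(dim(mydata))   # number of rows and columns
```

If the first line of the file holds column names, pass header = TRUE to read.table.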
R – Excel File
Microsoft Excel is the most widely used spreadsheet program; it stores data in the .xls or .xlsx format. R can read directly from these files using Excel-specific packages. A few such packages are XLConnect, xlsx, and gdata. We will use the xlsx package, which can also write to Excel files.
Work with Excel file
An Excel file can be imported in RStudio.
In Import Dataset:
-- select the Excel file using the Browse option.
-- import the file into RStudio.
The Excel file is stored in a variable. Type the variable name to view its contents.
OR
library(readxl)
xyz <- read_excel("C:/Users/admin/Desktop/R/xyz.xlsx")
View(xyz)
Reading from and writing to Excel file
# Reading and writing Excel files needs these 3 packages.
# First install the packages
install.packages("rJava")
install.packages("xlsxjars")
install.packages("xlsx")
# Load the packages into R
library(rJava)
library(xlsxjars)
library(xlsx)
Reading the Excel file
# In the path address, replace backslashes with forward slashes.
# Note: detectDates = TRUE is an argument of read.xlsx() in the openxlsx package; read.xlsx() from the xlsx package loaded here does not list it.
wipotrends <- read.xlsx("C:/Documents and Settings/Archana/Desktop/R Tutorial/wipotrends.xlsx",
                        sheetIndex = 1, startRow = 1, endRow = 23,
                        as.data.frame = TRUE, header = TRUE)
# print the file
wipotrends
Contd..
# If you want to start from the 5th row, set startRow = 5
wipotrends <- read.xlsx("C:/Documents and Settings/Archana/Desktop/RTutorial/wipotrends.xlsx",
                        sheetIndex = 1, startRow = 5, endRow = 23,
                        as.data.frame = TRUE, header = TRUE)
# print the file
wipotrends
Writing to Excel file
write.xlsx(wipotrends, "C:/Documents and Settings/Archana/Desktop/RTutorial/file1_new.xlsx",
           sheetName = "Sheet1", col.names = TRUE, row.names = TRUE,
           append = FALSE, showNA = TRUE)
Check the path: the file file1_new.xlsx is created automatically in that directory.
Work with .csv file
The sample data can be in comma-separated values (CSV) format. Each cell in such a file is separated by a special character, usually a comma, although other characters can be used as well. The first row of the data file should contain the column names instead of actual data.
Example:
Col1,Col2,Col3
100,a1,b1
200,a2,b2
Save with .csv extension.
Command:
mydata = read.csv("Filename") # read csv file
Example
mydata.csv
id,name,salary,start_date,dept
1,Rick,623.3,2012-01-01,IT
2,Dan,515.2,2013-09-23,Operations
3,Michelle,611,2014-11-15,IT
4,Ryan,729,2014-05-11,HR
5,Gary,843.25,2015-03-27,Finance
6,Nina,578,2013-05-21,IT
7,Simon,632.8,2013-07-30,Operations
8,Guru,722.5,2014-06-17,Finance
Reading a CSV File
Use the read.csv() function to read a CSV file available in your current working directory:
data <- read.csv("mydata.csv")
print(data)
Analyzing the CSV File
By default the read.csv() function gives the output as a data frame. We can also check the number of columns and rows.
data <- read.csv("mydata.csv")
print(is.data.frame(data))
print(ncol(data))
print(nrow(data))
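The checks above can be tried without an existing mydata.csv. This sketch writes a shortened four-row copy of the sample data to a temporary file (the path is an assumption for illustration) and then inspects what read.csv() returns.

```r
# A shortened copy of the sample data, written to a temporary CSV file.
csv_lines <- c(
  "id,name,salary,start_date,dept",
  "1,Rick,623.3,2012-01-01,IT",
  "2,Dan,515.2,2013-09-23,Operations",
  "3,Michelle,611,2014-11-15,IT",
  "4,Ryan,729,2014-05-11,HR"
)
path <- tempfile(fileext = ".csv")
writeLines(csv_lines, path)

data <- read.csv(path)
print(is.data.frame(data))   # TRUE: read.csv returns a data frame
print(dim(data))             # rows, then columns
str(data)                    # column names and the type R chose for each
```

str() is a quick way to confirm that numeric columns such as salary were read as numbers and not as text.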
Contd..
Once we read data into a data frame, we can apply all the functions applicable to data frames.
Get the maximum salary:
# Create a data frame.
data <- read.csv("mydata.csv")
# Get the max salary from data frame.
sal <- max(data$salary)
print(sal)
Contd..
Get the details of the person with max salary.
We can fetch rows meeting specific filter criteria, similar to a SQL where clause.
# Create a data frame.
data <- read.csv("mydata.csv")
# Get the max salary from data frame.
sal <- max(data$salary)
# Get the person detail having max salary.
retval <- subset(data, salary == max(salary))
print(retval)
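As a sketch of how subset() filtering compares with plain bracket indexing, the example below builds a small data frame inline with read.csv(text = ...) so it runs without any file. The four rows are a shortened copy of the sample data, and the variable names are chosen only for illustration.

```r
# Small inline copy of the sample data (no file needed).
data <- read.csv(text = "id,name,salary,dept
1,Rick,623.3,IT
2,Dan,515.2,Operations
3,Michelle,611,IT
4,Ryan,729,HR")

# Same filter written two ways: subset() and logical row indexing.
top_subset  <- subset(data, salary == max(salary))
top_bracket <- data[data$salary == max(data$salary), ]
print(top_subset)
print(top_bracket)

# Conditions can be combined with & (and) or | (or);
# select = keeps only the listed columns.
it_high <- subset(data, dept == "IT" & salary > 600, select = c(name, salary))
print(it_high)
```

subset() is convenient interactively because column names can be used directly; bracket indexing does the same job and is often preferred inside functions.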
Contd..
Get the people who joined in or after 2014.
# Create a data frame.
data <- read.csv("mydata.csv")
# >= keeps anyone who joined on 2014-01-01 itself.
retval <- subset(data, as.Date(start_date) >= as.Date("2014-01-01"))
print(retval)
Writing into a CSV File
R can create a CSV file from an existing data frame. The write.csv() function is used to create the CSV file. This file is created in the working directory.
# Create a data frame.
data <- read.csv("mydata.csv")
retval <- subset(data, as.Date(start_date) >= as.Date("2014-01-01"))
Contd..
# Write filtered data into a new file.
write.csv(retval, "output.csv")
newdata <- read.csv("output.csv")
print(newdata)
Here the column X holds the row names of retval, which write.csv() writes by default. It can be dropped using an additional parameter while writing the file.
Contd..
# Write filtered data into a new file, without the row names.
write.csv(retval, "output.csv", row.names = FALSE)
newdata <- read.csv("output.csv")
print(newdata)
To extract specific columns from CSV
dataFile = read.csv("filename.csv", header = TRUE)
# suppose you want column 1 and column 3
col1 = 1
col3 = 3
modifiedDataFile1 = dataFile[, c(col1, col3)]
# write extracted data to file
write.csv(modifiedDataFile1, file = "Myinfo.csv")
example
vec_rev = c(100, 20, 500)
vec_mar = vec_rev * 0.02
vec_city = c("HUBLI", "DHARWAD", "BELGAUM")
salesdf = data.frame(vec_rev, vec_mar, vec_city)
write.csv(salesdf, "mydataframe.csv", row.names = FALSE)
salesdf_2 = read.csv("mydataframe.csv")
salesdf_2
R – Data Reshaping Data Reshaping in R is about changing the way data is organized into rows and columns. Most of the time data processing in R is done by taking the input data as a data frame. It is easy to extract data from the rows and columns of a data frame but there are situations when we need the data frame in a format that is different from format in which we received it. R has many functions to split, merge and change the rows to columns and vice-versa in a data frame.
Contd..
The cbind() function joins vectors or data frames column-wise (plain vectors give a matrix). The rbind() function stacks vectors or data frames row-wise; data frames must have the same columns.
rbind and cbind functions
first_row <- c(1, 2, 3)
second_row <- c(10, 20, 30)
third_row <- c(100, 200, 300)
fourth_row <- c(1000, 1000, 1000)
tmp <- rbind(first_row, second_row, third_row, fourth_row)
row_scores <- rowSums(tmp)
scores <- cbind(tmp, row_scores)
rownames(scores) <- c("row1", "row2", "row3", "row4")
colnames(scores) <- c("c1", "c2", "c3", "total")
scores
example
# rbind
rd1 = data.frame(cI = c(1:6), product = c(rep("toaster", 3), rep("radio", 3)))
rd2 = data.frame(cI = c(1:4), product = c(rep("TV", 3), rep("Mobile", 1)))
rd3 = rbind(rd1, rd2)
rd3
# cbind
cd1 = data.frame(cI = c(1:6), product = c(rep("toaster", 3), rep("radio", 3)))
# the second data frame uses its own column name so the result has no duplicate cI column
cd2 = data.frame(country = c(rep("IND", 6)))
cd3 = cbind(cd1, cd2)
cd3
Merging data frames (inner, outer, left, right)
inner join
rdf11 = data.frame(cI = c(1:6), product = c(rep("toaster", 5), rep("radio", 1)))
rdf22 = data.frame(cI = c(2, 4, 6), state = c(rep("goa", 2), rep("karnatak", 1)))
rdf11
rdf22
merge(rdf11, rdf22, by = "cI")
outer join
rdf111 = data.frame(cI = c(1:6), product = c(rep("toaster", 3), rep("radio", 3)))
rdf222 = data.frame(cI = c(1:6), state = c(rep("goa", 3), rep("karnatak", 3)))
merge(x = rdf111, y = rdf222, by = "cI", all = TRUE)
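Left and right joins use the same merge() function with all.x = TRUE or all.y = TRUE. The sketch below is modelled on the inner-join data; the extra row with cI = 8 and state "kerala" is made up here so that the right join has an unmatched row to show.

```r
products <- data.frame(cI = 1:6,
                       product = c(rep("toaster", 5), "radio"))
# cI = 8 is an invented row with no match in products.
states <- data.frame(cI = c(2, 4, 6, 8),
                     state = c("goa", "goa", "karnatak", "kerala"))

# Left join: keep every row of products; unmatched states become NA.
left_res <- merge(products, states, by = "cI", all.x = TRUE)
print(left_res)

# Right join: keep every row of states; unmatched products become NA.
right_res <- merge(products, states, by = "cI", all.y = TRUE)
print(right_res)
```

With neither argument set, merge() performs an inner join; all = TRUE sets both and gives the outer join.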
Data tables
R has a package called data.table that extends and enhances the functionality of data.frames. A data.table can have an index (key), like a database table. This allows faster value access, group-by operations and joins.
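A minimal sketch of a keyed data.table follows, assuming the data.table package is installed. The sales table and its column names are invented for illustration; setkey() sets the key, and the lookup and group-by use standard data.table syntax.

```r
library(data.table)  # assumes the data.table package is installed

# Invented example data.
sales <- data.table(
  city   = c("HUBLI", "DHARWAD", "BELGAUM", "HUBLI", "DHARWAD"),
  amount = c(100, 20, 500, 50, 30)
)

setkey(sales, city)          # sorts by city and marks it as the key
print(key(sales))

hubli <- sales[.("HUBLI")]   # keyed lookup: all rows whose city is HUBLI
print(hubli)

# Group-by: total amount per city.
totals <- sales[, .(total = sum(amount)), by = city]
print(totals)
```

The keyed lookup uses a binary search on the sorted key column instead of scanning every row, which is where the speed advantage on large tables comes from.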
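example
library(data.table)
theDF <- data.frame(A = 1:10, B = letters[1:10], C = LETTERS[11:20],
                    D = rep(c("one", "two", "three"), length.out = 10))
theDF
class(theDF$B)
write.csv(theDF, "datatable2.csv", row.names = FALSE)
# The data.table() call is cut off on the slide; columns B, C and D are assumed to mirror theDF.
theDT <- data.table(A = 1:10, B = letters[1:10], C = LETTERS[11:20],
                    D = rep(c("one", "two", "three"), length.out = 10))
theDT
class(theDT$B)
write.csv(theDT, "datatable1.csv", row.names = FALSE)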
R – Strings
Any value written within a pair of single or double quotes in R is treated as a string. Internally R stores every string within double quotes, even when you create it with single quotes.
Examples of valid strings:
a <- 'Start and end with single quote'
b <- "Start and end with double quotes"
c <- "single quote ' in between double quotes"
d <- 'Double quotes " in between single quote'
String Manipulation
Concatenating strings - paste() function
Syntax: paste(..., sep = " ", collapse = NULL)
... represents any number of arguments to be combined.
sep is the separator placed between the arguments. It is optional and defaults to a single space.
collapse is an optional string used to join the elements of the resulting vector into one string. It does not change spaces inside the individual strings.
example
a <- "Hello"
b <- 'How'
c <- "are you? "
print(paste(a, b, c))
print(paste(a, b, c, sep = "-"))
print(paste(a, b, c, sep = "", collapse = ""))
String manipulation
paste("good", "bad")
paste(c("good", "bad"), c("morning", "evening"))
paste(c("good", "bad"), c("morning", "evening"), sep = "/")
paste("good", c("morning", "evening"))
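The difference between sep and collapse shows up most clearly with a vector argument. In this sketch (the variable names are chosen only for illustration), sep joins the arguments element by element, while collapse joins the elements of the result into a single string.

```r
words <- c("good", "bad", "ugly")

# sep works across arguments: one result per element of words.
with_sep <- paste(words, "day", sep = "-")
print(with_sep)        # "good-day" "bad-day" "ugly-day"

# collapse joins the elements of one vector into a single string.
collapsed <- paste(words, collapse = ", ")
print(collapsed)       # "good, bad, ugly"

# Both together: sep is applied first, then collapse.
both <- paste(words, "day", sep = "-", collapse = " | ")
print(both)            # "good-day | bad-day | ugly-day"
```

When every argument is a single string, as in the a, b, c example, collapse has nothing to join and the output is the same with or without it.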
Counting number of characters in a string - nchar() function
This function counts the number of characters in a string, including spaces.
Syntax: nchar(x)
x is the vector input.
Example:
result <- nchar("Count the number of characters")
print(result)
Changing the case – toupper() & tolower() functions
These functions change the case of the characters of a string.
Syntax: toupper(x) tolower(x)
Example:
# Changing to upper case.
result <- toupper("Changing To Upper")
print(result)
# Changing to lower case.
result <- tolower("CHANGING TO LOWER")
print(result)
Extracting parts of a string - substring() function
Syntax: substring(x, first, last)
x is the character vector input.
first is the position of the first character to be extracted.
last is the position of the last character to be extracted.
Example:
# Extract characters from 5th to 7th position.
result <- substring("Extract", 5, 7)
print(result)
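Because x is a character vector, substring() also works on several strings at once, and first and last can themselves be vectors. A short sketch, with variable names chosen only for illustration:

```r
x <- c("Extract", "Substring")

# Same positions applied to every element of x.
prefixes <- substring(x, 1, 3)
print(prefixes)      # "Ext" "Sub"

# If last is omitted, extraction runs to the end of the string.
tail_part <- substring("Extract", 5)
print(tail_part)     # "act"

# Vectors of positions give one piece per (first, last) pair.
pieces <- substring("Extract", 1:3, 3:5)
print(pieces)        # "Ext" "xtr" "tra"
```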