Welcome to Math’s Tutorial Session-3 Data handling



R scripts
# Write an R script to check whether a person is eligible to vote.
num = as.integer(readline(prompt = "Enter a number: "))
if (num >= 18) {
  print("eligible")
} else {
  print("not eligible")
}

R scripts contd…
# Sum the integers from 1 to num.
num = as.integer(readline(prompt = "Enter a number: "))
if (num < 0) {
  print("Enter a positive number")
} else {
  sum = 0
  # use while loop to iterate until zero
  while (num > 0) {
    sum = sum + num
    num = num - 1
  }
  print(paste("The sum is", sum))
}
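The same loop can be wrapped in a function so that it runs without keyboard input. This is only a minimal sketch: the name sum_to_n is hypothetical (it is not in the slides), and the closed form n(n+1)/2 is included purely as a cross-check.

```r
# Hypothetical helper: sum of the integers 1..n using the same while loop.
sum_to_n <- function(n) {
  if (n < 0) stop("Enter a positive number")
  total <- 0
  while (n > 0) {
    total <- total + n
    n <- n - 1
  }
  total
}

n <- 10
loop_result <- sum_to_n(n)
formula_result <- n * (n + 1) / 2   # closed form, used only for comparison
print(paste("The sum is", loop_result))
print(loop_result == formula_result)
```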

Packages required
Packages must be installed first and then loaded with the library() function.
install.packages("rJava")
install.packages("xlsxjars")
install.packages("xlsx")
install.packages("data.table")

Types of files Text files. Excel files. CSV files.

How to import a file in RStudio
Steps:
-- Create the file (text file, Excel file, CSV file, …).
-- Import it into RStudio: click Import Dataset in the Environment pane and choose your saved file.
-- View it in the top-level workspace.
Command to run an R script file: source("file_path")

Work with a table file
A data table can reside in a text file. The cells inside the table are separated by blank characters.
Example:
100   a1   b1
200   a2   b2
Save the file with a .txt extension, then load the data into the workspace with the function read.table.
Command:
mydata = read.table("file_path")   # read text file
mydata                             # print data frame
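To try read.table without creating a file by hand, the two example rows can be written to a temporary file first. A small sketch; the temporary path and the variable name txt_path are assumptions made for illustration.

```r
# Write the two-row example table to a temporary text file.
txt_path <- tempfile(fileext = ".txt")
writeLines(c("100   a1   b1", "200   a2   b2"), txt_path)

# read.table splits on whitespace by default; header = FALSE because
# the file has no row of column names, so R names the columns V1, V2, V3.
mydata <- read.table(txt_path, header = FALSE)
print(mydata)
print(dim(mydata))
```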

R – Excel File
Microsoft Excel is the most widely used spreadsheet program; it stores data in the .xls or .xlsx format. R can read directly from these files using Excel-specific packages such as XLConnect, xlsx, and gdata. We will be using the xlsx package, which can also write to Excel files.

Work with an Excel file
An Excel file can be imported in RStudio through Import Dataset:
-- select the Excel file with the Browse option.
-- import the file into RStudio.
The contents of the Excel file are stored in a variable. Type the variable name to view them.
OR
library(readxl)
xyz <- read_excel("C:/Users/admin/Desktop/R/xyz.xlsx")
View(xyz)

Reading from and writing to an Excel file
# Reading and writing Excel files needs these 3 packages.
# First install the packages
install.packages("rJava")
install.packages("xlsxjars")
install.packages("xlsx")
# Load the packages into R
library(rJava)
library(xlsxjars)
library(xlsx)

Reading the Excel file
# In the path, replace backslashes with forward slashes.
# If you have dates in the dataset then set the detectDates argument to TRUE
wipotrends <- read.xlsx("C:/Documents and Settings/Archana/Desktop/R Tutorial/wipotrends.xlsx",
                        sheetIndex = 1, startRow = 1, endRow = 23,
                        as.data.frame = TRUE, header = TRUE)
# print the data
wipotrends

Contd..
# If you want to start from the 5th row then set startRow = 5
wipotrends <- read.xlsx("C:/Documents and Settings/Archana/Desktop/RTutorial/wipotrends.xlsx",
                        sheetIndex = 1, startRow = 5, endRow = 23,
                        as.data.frame = TRUE, header = TRUE)
# print the data
wipotrends

Writing to an Excel file
write.xlsx(wipotrends, "C:/Documents and Settings/Archana/Desktop/RTutorial/file1_new.xlsx",
           sheetName = "Sheet1", col.names = TRUE, row.names = TRUE,
           append = FALSE, showNA = TRUE)
Check the path: the file file1_new.xlsx is created automatically in that directory.

Work with a .csv file
The sample data can be in comma-separated values (CSV) format. Each cell in such a data file is separated by a special character, which is usually a comma, although other characters can be used as well. The first row of the data file should contain the column names instead of actual data.
Example:
Col1,Col2,Col3
100,a1,b1
200,a2,b2
Save the file with a .csv extension.
Command:
mydata = read.csv("Filename")   # read csv file
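When the separator is not a comma, read.csv accepts a sep argument. The sketch below reuses the example table with a semicolon separator; the semicolon and the inline text are assumptions chosen for illustration, not something the slides prescribe.

```r
# Same example table, but with ";" as the separator (illustrative assumption).
csv_text <- "Col1;Col2;Col3\n100;a1;b1\n200;a2;b2"

# text = lets read.csv parse a string instead of a file on disk.
mydata <- read.csv(text = csv_text, sep = ";")
print(mydata)
```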

Example
mydata.csv
id,name,salary,start_date,dept
1,Rick,623.3,2012-01-01,IT
2,Dan,515.2,2013-09-23,Operations
3,Michelle,611,2014-11-15,IT
4,Ryan,729,2014-05-11,HR
5,Gary,843.25,2015-03-27,Finance
6,Nina,578,2013-05-21,IT
7,Simon,632.8,2013-07-30,Operations
8,Guru,722.5,2014-06-17,Finance

Reading a CSV file
Use the read.csv() function to read a CSV file available in your current working directory:
data <- read.csv("mydata.csv")
print(data)

Analyzing the CSV file
By default the read.csv() function returns its output as a data frame. We can also check the number of columns and rows.
data <- read.csv("mydata.csv")
print(is.data.frame(data))
print(ncol(data))
print(nrow(data))
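A self-contained way to try these checks is to write a few rows of the example to a temporary file and read them back. This sketch uses only the first four rows of the sample data; the file name mydata_demo.csv is a hypothetical stand-in for mydata.csv.

```r
# Write the first four example rows to a temporary CSV file.
csv_path <- file.path(tempdir(), "mydata_demo.csv")
writeLines(c(
  "id,name,salary,start_date,dept",
  "1,Rick,623.3,2012-01-01,IT",
  "2,Dan,515.2,2013-09-23,Operations",
  "3,Michelle,611,2014-11-15,IT",
  "4,Ryan,729,2014-05-11,HR"
), csv_path)

data <- read.csv(csv_path)
print(is.data.frame(data))   # read.csv returns a data frame
print(ncol(data))            # number of columns
print(nrow(data))            # number of rows
str(data)                    # column types chosen by read.csv
```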

Contd..
Once we read data in a data frame, we can apply all the functions applicable to data frames.
Get the maximum salary:
# Create a data frame.
data <- read.csv("mydata.csv")
# Get the max salary from data frame.
sal <- max(data$salary)
print(sal)

Contd..
Get the details of the person with the maximum salary.
We can fetch rows meeting specific filter criteria, similar to a SQL WHERE clause.
# Create a data frame.
data <- read.csv("mydata.csv")
# Get the max salary from data frame.
sal <- max(data$salary)
# Get the person detail having max salary.
retval <- subset(data, salary == max(salary))
print(retval)

Contd..
Get the people who joined after 2014-01-01.
# Create a data frame.
data <- read.csv("mydata.csv")
retval <- subset(data, as.Date(start_date) > as.Date("2014-01-01"))
print(retval)
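The comparison operator decides whether the boundary date itself is kept: > excludes 2014-01-01, while >= includes it. A minimal sketch of the difference; the row for "Asha" is a hypothetical record added only to sit exactly on the boundary, and emp is an illustrative variable name.

```r
# Three rows from the example plus one hypothetical boundary row (id 9).
emp <- read.csv(text = "id,name,start_date
1,Rick,2012-01-01
3,Michelle,2014-11-15
4,Ryan,2014-05-11
9,Asha,2014-01-01")

cutoff <- as.Date("2014-01-01")
after       <- subset(emp, as.Date(start_date) >  cutoff)  # strictly after the cutoff
on_or_after <- subset(emp, as.Date(start_date) >= cutoff)  # cutoff date included

print(after)
print(on_or_after)
```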

Writing into a CSV file
R can create a CSV file from an existing data frame. The write.csv() function is used to create the CSV file. This file gets created in the working directory.
# Create a data frame.
data <- read.csv("mydata.csv")
retval <- subset(data, as.Date(start_date) > as.Date("2014-01-01"))

Contd..
# Write filtered data into a new file.
write.csv(retval, "output.csv")
newdata <- read.csv("output.csv")
print(newdata)
Here the column X holds the row names that write.csv() writes by default. It can be dropped using an additional parameter while writing the file.

Contd..
# Write filtered data into a new file.
write.csv(retval, "output.csv", row.names = FALSE)
newdata <- read.csv("output.csv")
print(newdata)
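A small round trip makes the origin of the X column visible. This is a sketch with a two-row data frame and a temporary file, both chosen for illustration.

```r
df <- data.frame(name = c("Michelle", "Ryan"), salary = c(611, 729))
out_path <- tempfile(fileext = ".csv")

# Default: row names are written as an unnamed first column,
# which read.csv then labels "X".
write.csv(df, out_path)
with_rownames <- read.csv(out_path)
print(names(with_rownames))

# row.names = FALSE leaves the row names out of the file.
write.csv(df, out_path, row.names = FALSE)
without_rownames <- read.csv(out_path)
print(names(without_rownames))
```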

To extract specific columns from a CSV file
dataFile = read.csv("filename.csv", header = TRUE)
# suppose you want column 1 and column 3
col1 = 1
col3 = 3
modifiedDataFile1 = dataFile[, c(col1, col3)]
# write extracted data to file
write.csv(modifiedDataFile1, file = "Myinfo.csv")
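Columns can also be picked by name, which keeps working if the column order in the file changes. A brief sketch on an in-memory data frame; the column names come from the earlier employee example and the variable names are illustrative.

```r
dataFile <- data.frame(
  id     = 1:3,
  name   = c("Rick", "Dan", "Michelle"),
  salary = c(623.3, 515.2, 611)
)

by_position <- dataFile[, c(1, 3)]             # columns 1 and 3
by_name     <- dataFile[, c("id", "salary")]   # the same columns, selected by name

print(by_name)
print(identical(by_position, by_name))
```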

example
vec_rev = c(100, 20, 500)
vec_mar = vec_rev * 0.02
vec_city = c("HUBLI", "DHARWAD", "BELGAUM")
salesdf = data.frame(vec_rev, vec_mar, vec_city)
write.csv(salesdf, "mydataframe.csv", row.names = FALSE)
salesdf_2 = read.csv("mydataframe.csv")
salesdf_2

R – Data Reshaping
Data reshaping in R is about changing the way data is organized into rows and columns. Most data processing in R takes its input as a data frame. It is easy to extract data from the rows and columns of a data frame, but there are situations when we need the data frame in a format different from the one in which we received it. R has many functions to split, merge, and change rows to columns and vice versa in a data frame.

Contd..
We can join multiple vectors column-wise using the cbind() function; the result is a matrix unless one of the inputs is a data frame. We can also stack two data frames that have the same columns, row-wise, using the rbind() function.
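The difference between the two results can be checked directly. A minimal sketch; the vectors and the extra row are made-up illustration data.

```r
ids    <- c(1, 2, 3)
scores <- c(10, 20, 30)

m  <- cbind(ids, scores)         # plain vectors bound column-wise give a matrix
df <- data.frame(ids, scores)    # data.frame() gives a data frame

more    <- data.frame(ids = 4, scores = 40)
stacked <- rbind(df, more)       # rbind needs matching column names

print(is.matrix(m))
print(is.data.frame(stacked))
print(stacked)
```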

rbind and cbind functions
first_row <- c(1, 2, 3)
second_row <- c(10, 20, 30)
third_row <- c(100, 200, 300)
fourth_row <- c(1000, 1000, 1000)
tmp <- rbind(first_row, second_row, third_row, fourth_row)
row_scores <- rowSums(tmp)
scores <- cbind(tmp, row_scores)
rownames(scores) <- c("row1", "row2", "row3", "row4")
colnames(scores) <- c("c1", "c2", "c3", "total")
scores

example
# rbind
rd1 = data.frame(cI = c(1:6), product = c(rep("toaster", 3), rep("radio", 3)))
rd2 = data.frame(cI = c(1:4), product = c(rep("TV", 3), rep("Mobile", 1)))
rd3 = rbind(rd1, rd2)
rd3
# cbind
cd1 = data.frame(cI = c(1:6), product = c(rep("toaster", 3), rep("radio", 3)))
cd2 = data.frame(cI = rep("IND", 6))   # one value per row of cd1
cd3 = cbind(cd1, cd2)
cd3

Merging data frames (inner, outer, left, right)

inner join
rdf11 = data.frame(cI = c(1:6), product = c(rep("toaster", 5), rep("radio", 1)))
rdf22 = data.frame(cI = c(2, 4, 6), state = c(rep("goa", 2), rep("karnatak", 1)))
rdf11
rdf22
merge(rdf11, rdf22, by = "cI")

outer join
rdf111 = data.frame(cI = c(1:6), product = c(rep("toaster", 3), rep("radio", 3)))
rdf222 = data.frame(cI = c(1:6), state = c(rep("goa", 3), rep("karnatak", 3)))
merge(x = rdf111, y = rdf222, by = "cI", all = TRUE)
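Left and right joins use the same merge() function with all.x or all.y. The sketch below is modelled on the inner-join data; the data frames left_df and right_df are illustrative and not taken from the slides.

```r
left_df  <- data.frame(cI = 1:4,
                       product = c("toaster", "toaster", "radio", "radio"))
right_df <- data.frame(cI = c(2, 4, 6),
                       state = c("goa", "goa", "karnatak"))

# Left join: keep every row of left_df; unmatched rows get NA for state.
left_join <- merge(left_df, right_df, by = "cI", all.x = TRUE)

# Right join: keep every row of right_df; unmatched rows get NA for product.
right_join <- merge(left_df, right_df, by = "cI", all.y = TRUE)

print(left_join)
print(right_join)
```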

Data tables
R has a package called data.table that extends and enhances the functionality of data.frames. data.tables have an index, like databases. This allows faster value access, group-by operations, and joins.

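example
library(data.table)
theDF <- data.frame(A = 1:10, B = letters[1:10], C = LETTERS[11:20],
                    D = rep(c("one", "two", "three"), length.out = 10))
theDF
class(theDF$B)
write.csv(theDF, "datatable2.csv", row.names = FALSE)
# The arguments after A were cut off in the slide; the columns below mirror theDF.
theDT <- data.table(A = 1:10, B = letters[1:10], C = LETTERS[11:20],
                    D = rep(c("one", "two", "three"), length.out = 10))
theDT
class(theDT$B)
write.csv(theDT, "datatable1.csv", row.names = FALSE)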

R – Strings
Any value written within a pair of single quotes or double quotes in R is treated as a string. Internally R stores every string within double quotes, even when you create it with single quotes.
Examples of valid strings:
a <- 'Start and end with single quote'
b <- "Start and end with double quotes"
c <- "single quote ' in between double quotes"
d <- 'Double quotes " in between single quote'

String Manipulation
Concatenating strings - paste() function
Syntax: paste(..., sep = " ", collapse = NULL)
... represents any number of arguments to be combined.
sep is the separator placed between the arguments. It is optional.
collapse is an optional string used to join the elements of the resulting vector into a single string. It does not affect the spaces inside any individual string.

example
a <- "Hello"
b <- 'How'
c <- "are you? "
print(paste(a, b, c))
print(paste(a, b, c, sep = "-"))
print(paste(a, b, c, sep = "", collapse = ""))
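The effect of collapse only shows up when paste() produces a vector with more than one element. A short sketch; the " | " separator and the variable names are arbitrary choices for illustration.

```r
words <- c("good", "bad")
times <- c("morning", "evening")

# sep joins the arguments element by element: the result has two strings.
pairs <- paste(words, times, sep = "-")

# collapse then joins those two strings into a single string.
joined <- paste(words, times, sep = "-", collapse = " | ")

print(pairs)
print(joined)
```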

String manipulation
paste("good", "bad")
paste(c("good", "bad"), c("morning", "evening"))
paste(c("good", "bad"), c("morning", "evening"), sep = "/")
paste("good", c("morning", "evening"))

Counting the number of characters in a string - nchar() function
This function counts the number of characters in a string, including spaces.
Syntax: nchar(x)
x is the vector input.
Example:
result <- nchar("Count the number of characters")
print(result)
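Because x is a vector, nchar() returns one count per element. A quick sketch with made-up strings:

```r
phrases <- c("R", "data frame", "")

# One count per element; the space in "data frame" is counted too.
counts <- nchar(phrases)
print(counts)

# The slide's example string, counted the same way.
result <- nchar("Count the number of characters")
print(result)
```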

Changing the case – toupper() & tolower() functions
These functions change the case of the characters of a string.
Syntax:
toupper(x)
tolower(x)
Example:
# Changing to upper case.
result <- toupper("Changing To Upper")
print(result)
# Changing to lower case.
result <- tolower("CHANGING TO LOWER")
print(result)

Extracting parts of a string - substring() function
Syntax: substring(x, first, last)
x is the character vector input.
first is the position of the first character to be extracted.
last is the position of the last character to be extracted.
Example:
# Extract characters from 5th to 7th position.
result <- substring("Extract", 5, 7)
print(result)
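substring() also works on a whole character vector, and last can be left out to read to the end of the string. A small sketch; pulling the year out of the start_date values from the CSV example is an illustrative use, not something the slides do.

```r
# Positions 5 to 7 of a single string.
result <- substring("Extract", 5, 7)
print(result)

# Omitting last extracts from position 3 to the end.
tail_part <- substring("Extract", 3)
print(tail_part)

# Vectorised: the first four characters of each date string.
dates <- c("2012-01-01", "2013-09-23", "2014-11-15")
years <- substring(dates, 1, 4)
print(years)
```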