ETL – Using R. Kiran Math, Developer. Work @: Flour in Greenville, SC. kiranmath@outlook.com
Motivation
GOAL: turn Raw Sensor Data into Tidy Data
R <- Core && R <- packages: Base Packages, ggplot2, sqldf, RODBC, dplyr, stringr, reshape2, tidyr, lubridate
Home Sale Price. Question: I have a 3,000 sq ft house located in ZIP code 29615. How much will it sell for?
Data analysis workflow (diagram, @hadleywickham): Get & Tidy, Transform, Visualize, Model
Get Data – From CSV File
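A minimal sketch of reading a CSV file into a data frame with base R's read.csv(). The file contents, column names, and values below are made up for illustration; they are not the housing data used in the talk.

```r
# Hypothetical data: write a tiny CSV to a temporary file so the example
# is self-contained, then read it back into a data frame.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("sqft,zipcode,price",
             "1500,29615,110000",
             "2000,29607,130000"),
           csv_path)

# header = TRUE is the default for read.csv(); stringsAsFactors = FALSE
# keeps any text columns as character vectors.
dat <- read.csv(csv_path, stringsAsFactors = FALSE)
print(dat)
```

In practice csv_path would simply be the path to an existing file.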
Data frame (dat): columns are variables, rows are observations; dat[5,3] is the cell in row 5, column 3. A data frame is used for storing data tables. It is a list of vectors of equal length. To retrieve the data in a cell, enter its row and column coordinates inside the single square bracket "[]" operator, separated by a comma.
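The indexing rules above can be sketched as follows. The data frame and its values are invented for illustration only.

```r
# Hypothetical data frame: 5 observations (rows) of 3 variables (columns).
dat <- data.frame(
  id    = 1:5,
  sqft  = c(1200, 1500, 2000, 2500, 3000),
  price = c(98000, 110000, 130000, 150000, 170000)
)

cell <- dat[5, 3]   # row 5, column 3: the price of the fifth home
row5 <- dat[5, ]    # leaving the column blank returns the whole row
col3 <- dat[, 3]    # leaving the row blank returns the whole column

# A data frame is a list of equal-length vectors, so list-style access
# by column name also works.
prices <- dat$price
is.list(dat)
```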
str(dat): If you need a quick overview of your dataset, use the R command str() to look at its structure. It tells you the classes of your variables and the number of observations.
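A small sketch of str() on a made-up data frame (the column names and values are assumptions for illustration). Note that R is case-sensitive, so the call is str(dat), all lowercase.

```r
# Hypothetical data: one numeric variable and one factor variable.
dat <- data.frame(
  sqft    = c(1500, 2000, 2500),
  zipcode = factor(c("29615", "29607", "29615"))
)

# Prints the number of observations and variables, then one line per
# variable with its class and the first few values.
str(dat)
```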
R – summary(): summary(object) shows the distribution of the variables in your dataset. Numerical variables: summary() gives you the range, quartiles, median, and mean. Factor variables: summary() gives you a table with frequencies.
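To illustrate the two cases, here is a short sketch with invented values: one numeric vector and one factor.

```r
# Hypothetical values, for illustration only.
sqft    <- c(1200, 1500, 2000, 2500, 3000)
zipcode <- factor(c("29615", "29607", "29615", "29601", "29615"))

# Numeric: minimum, 1st quartile, median, mean, 3rd quartile, maximum.
num_summary <- summary(sqft)
print(num_summary)

# Factor: a count of observations for each level.
fac_summary <- summary(zipcode)
print(fac_summary)
```

Calling summary() on a whole data frame applies the same logic column by column.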
SELECT - dplyr: the pipe operator %>% passes the object on the LHS as the first argument to the function on the RHS.
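A minimal sketch of select() with and without the pipe, assuming the dplyr package is installed. The data frame is made up for illustration.

```r
library(dplyr)

# Hypothetical data frame.
dat <- data.frame(
  sqft    = c(1500, 2000, 2500),
  zipcode = c("29615", "29607", "29615"),
  price   = c(110000, 130000, 150000)
)

# Piped form: dat becomes the first argument of select().
piped <- dat %>% select(sqft, price)

# Equivalent plain function call.
nested <- select(dat, sqft, price)

identical(piped, nested)
```

The piped form reads left to right, which helps once several steps are chained together.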
Linear Regression model
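As a rough sketch of the idea, a linear model of price on square footage can be fit with base R's lm() and then used for prediction. The data below are invented and exactly linear so the result is easy to check; they are not the data behind the talk's demo and say nothing about real prices in any ZIP code.

```r
# Hypothetical training data: price = 50000 + 40 * sqft, with no noise.
homes <- data.frame(sqft = c(1200, 1500, 2000, 2500, 3500))
homes$price <- 50000 + 40 * homes$sqft

# Fit price as a linear function of square footage.
fit <- lm(price ~ sqft, data = homes)
print(coef(fit))   # intercept and slope

# Predict the price of a 3000 sq ft home from the fitted line.
pred <- predict(fit, newdata = data.frame(sqft = 3000))
print(unname(pred))
```

With real sale records the fit would not be exact, and further predictors such as ZIP code could be added to the formula.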
Home Sale. Question: I have a 3,000 sq ft house; how much will it sell for? Answer: $198,000
DEMO – Housing Price
Motivation: move Excel Data through ETL into a SQL Server Table
Gather: gathers columns into rows (diagram labels: tDat, gDat). Spread does the opposite.
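A small sketch of gather() and spread(), assuming the tidyr package is installed. The names tDat and gDat follow the slide; the columns and values are invented, and treating tDat as the wide input and gDat as the gathered result is an assumption.

```r
library(tidyr)

# Hypothetical wide table: one row per sensor, one column per reading.
tDat <- data.frame(
  sensor = c("A", "B"),
  t1     = c(1.0, 2.0),
  t2     = c(1.5, 2.5),
  stringsAsFactors = FALSE
)

# Gather the t1 and t2 columns into key/value rows.
gDat <- gather(tDat, key = "reading", value = "value", t1, t2)
print(gDat)

# Spread does the opposite: key/value rows go back into columns.
wide <- spread(gDat, key = "reading", value = "value")
print(wide)
```

Newer tidyr releases offer pivot_longer() and pivot_wider() for the same reshaping; gather() and spread() are still available.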
Mutate (gDat): computes and appends one or more new columns.
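A minimal sketch of mutate(), assuming the dplyr package is installed. The gDat contents and the new column names value_x10 and is_high are hypothetical.

```r
library(dplyr)

# Hypothetical long-format data, one row per sensor reading.
gDat <- data.frame(
  sensor  = c("A", "A", "B", "B"),
  reading = c("t1", "t2", "t1", "t2"),
  value   = c(1.0, 1.5, 2.0, 2.5)
)

# mutate() returns a copy of gDat with the new columns appended;
# the original data frame is left unchanged.
gDat2 <- gDat %>%
  mutate(value_x10 = value * 10,
         is_high   = value > 1.75)
print(gDat2)
```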
DEMO – Import Data into SQL Server
Thank you