Published by Hubert Welch. Modified over 6 years ago.
1
ETL – Using R
Kiran Math, Developer
Work @: Flour in Greenville, SC
2
Motivation
3
GOAL: Raw Sensor Data -> Tidy Data
4
R <- Core && R <- packages
Core: base packages
Packages: ggplot2, sqldf, RODBC, dplyr, stringr, reshape2, tidyr, lubridate
5
Home Sale Price
Question: I have a 3000 square ft house located in zipcode. How much will it sell for?
6
Workflow diagram (credit: @hadleywickham): Get & Tidy, Transform, Visualize, Model
7
Get Data – From CSV File
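The slide gives no code for this step, so the following is only a minimal sketch of reading a CSV file into a data frame with base R's read.csv(). The file contents and the column names (sqft, bedrooms, price) are made up for illustration; the sample file is written to a temporary path so the sketch runs on its own.

```r
# Hypothetical sample data, written to a temporary file so the sketch is self-contained.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("sqft,bedrooms,price",
             "1500,2,110000",
             "2000,3,140000",
             "2500,3,165000"), csv_path)

# read.csv() parses the header row and returns a data frame.
# stringsAsFactors = FALSE keeps any text columns as character vectors.
dat <- read.csv(csv_path, stringsAsFactors = FALSE)

# Show the first rows to confirm the import worked.
head(dat)
```

With a real file, csv_path would simply be the path to that file.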
8
Data frame
Columns are variables, rows are observations. dat[5,3] selects a single cell of the data frame dat.
A data frame is used for storing data tables. It is a list of vectors of equal length. To retrieve the data in a cell, enter its row and column coordinates, separated by a comma, in the single square bracket "[]" operator.
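As a small illustration of the indexing described above, the sketch below builds a data frame from equal-length vectors and retrieves the cell dat[5,3]. The column names and values are hypothetical.

```r
# Hypothetical housing data: each column is a variable, each row an observation.
dat <- data.frame(
  sqft     = c(1500, 2000, 2500, 3500, 4000),
  bedrooms = c(2, 3, 3, 4, 5),
  price    = c(110000, 140000, 165000, 240000, 270000)
)

# Row 5, column 3: the price of the fifth house.
cell <- dat[5, 3]

# Leaving a coordinate empty selects the whole row or column.
fifth_row    <- dat[5, ]
third_column <- dat[, 3]

# The same cell can also be reached by column name.
same_cell <- dat$price[5]
```

Here cell and same_cell both hold 270000, the value in the fifth row of the price column.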
9
str(dat)
If you need a quick overview of your dataset, use the R command str() to look at its structure. It tells you the class of each variable and the number of observations.
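A minimal sketch of str() on a small data frame follows. The data are hypothetical and include one factor column so that more than one class shows up in the overview.

```r
# Hypothetical data with numeric and factor variables.
dat <- data.frame(
  sqft  = c(1500, 2000, 2500),
  type  = factor(c("house", "condo", "house")),
  price = c(110000, 140000, 165000)
)

# Prints one line per variable: its name, class and first few values,
# preceded by the number of observations and variables.
str(dat)
```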
10
R – summary()
summary(object)
Shows the distribution of the variables in your dataset.
Numerical variables: summary() gives you the range, quartiles, median, and mean.
Factor variables: summary() gives you a table with frequencies.
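The two cases above can be sketched on a small, made-up data frame: summary() on a numeric column returns the six summary statistics, and on a factor column it returns counts per level.

```r
# Hypothetical data with one numeric and one factor variable.
dat <- data.frame(
  price = c(110000, 140000, 165000, 240000, 270000),
  type  = factor(c("house", "condo", "house", "house", "condo"))
)

# Numeric variable: minimum, quartiles, median, mean and maximum.
s_num <- summary(dat$price)

# Factor variable: frequency of each level.
s_fac <- summary(dat$type)

# summary() on the whole data frame applies the right method per column.
summary(dat)
```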
11
SELECT - dplyr
The pipe operator %>% passes the object on its left-hand side (LHS) as the first argument to the function on its right-hand side (RHS).
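A short sketch of select() with and without the pipe, assuming the dplyr package is installed. The data frame and its column names are hypothetical.

```r
library(dplyr)

# Hypothetical housing data.
dat <- data.frame(
  sqft     = c(1500, 2000, 2500),
  bedrooms = c(2, 3, 3),
  price    = c(110000, 140000, 165000)
)

# Ordinary call: the data frame is the first argument of select().
a <- select(dat, sqft, price)

# Piped call: dat on the LHS becomes the first argument of select() on the RHS.
b <- dat %>% select(sqft, price)

# Both forms keep the same two columns.
identical(a, b)
```

The piped form reads left to right, which helps once several transformation steps are chained together.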
12
Workflow diagram (credit: @hadleywickham): Get & Tidy, Transform, Visualize, Model
13
Linear Regression model
14
Home Sale
Question: I have a 3000 sq ft house. How much will it sell for?
Answer: $198,000
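The deck does not show the model code or the data behind this answer. As a rough sketch of the approach, the block below fits a simple linear regression with lm() and predicts the price of a 3000 sq ft house. The training data are invented for illustration, so the prediction is not expected to reproduce the figure on the slide.

```r
# Hypothetical training data: living area (sq ft) and sale price (USD).
homes <- data.frame(
  sqft  = c(1500, 2000, 2500, 3500, 4000),
  price = c(110000, 140000, 165000, 240000, 270000)
)

# Fit price as a linear function of area: price = intercept + slope * sqft.
fit <- lm(price ~ sqft, data = homes)

# Estimated intercept and slope (price change per extra square foot).
coef(fit)

# Predict the sale price of a 3000 sq ft house from the fitted line.
pred <- predict(fit, newdata = data.frame(sqft = 3000))
pred
```

A real model would likely use more predictors (location, for example) and far more observations. This sketch uses area alone to keep the idea visible.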
15
DEMO – Housing Price
17
Motivation
18
Motivation: Excel Data -> ETL -> SQL Server Table
19
Motivation
20
Gather: gathers columns into rows (tDat -> gDat)
Spread: does the opposite (gDat -> tDat)
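A minimal sketch of the two operations with tidyr's gather() and spread(), assuming the tidyr package is installed. The names tDat and gDat come from the slide; the columns and values are hypothetical.

```r
library(tidyr)

# Hypothetical wide table: one column per month.
tDat <- data.frame(
  sensor = c("A", "B"),
  jan    = c(20, 25),
  feb    = c(30, 35)
)

# gather(): collapse the month columns into key/value pairs, one row per reading.
gDat <- gather(tDat, key = "month", value = "reading", jan, feb)

# spread(): the opposite, turning the key column back into separate columns.
wide <- spread(gDat, key = month, value = reading)
```

gDat has one row per sensor and month (four rows here), while wide has one row per sensor again. Newer tidyr releases also offer pivot_longer() and pivot_wider() for the same reshaping; gather() and spread() are kept here because they are the functions named on the slide.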
21
Mutate (gDat): computes and appends one or more new columns
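A short sketch of mutate() on a gathered table, assuming the dplyr package is installed. The name gDat follows the slide; the columns and the Celsius to Fahrenheit conversion are hypothetical choices made only to show a computed column.

```r
library(dplyr)

# Hypothetical long-format sensor readings in degrees Celsius.
gDat <- data.frame(
  sensor    = c("A", "B", "A", "B"),
  month     = c("jan", "jan", "feb", "feb"),
  reading_c = c(20, 25, 30, 35)
)

# mutate() keeps the existing columns and appends the computed one.
gDat2 <- gDat %>%
  mutate(reading_f = reading_c * 9 / 5 + 32)

gDat2
```

The original columns are unchanged; reading_f is added as a fourth column on the right.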
22
DEMO – Import Data into SQL SERVER
23
Thank you