1
R Programming for SQL Developers: ETL Using R
Kiran Math, Consultant
2
Motivation
Excel Data -> ETL -> SQL Server Table
3
Motivation
4
Motivation
5
Demo: Motivation
6
Installation
R: Comprehensive R Archive Network (CRAN)
RStudio installation
7
R <- Core && R <- packages
Core: base packages
Packages: ggplot2, sqldf, RODBC, dplyr, stringr, reshape2, tidyr, lubridate
8
Zillow
9
Get & Tidy -> Transform -> Visualize -> Model (diagram: @hadleywickham)
10
Basics 1 - Vector
# Define a variable
a <- 25
# Call a variable
a
## [1] 25
# Do something to it
a + 10
## [1] 35
# Create a vector - numeric
x <- c(0.5, 0.6, 0.7)
# Call it
x
## [1] 0.5 0.6 0.7
# Do something to the vector
mean(x)
## [1] 0.6
11
Basics 2 - Functions
Functions are blocks of code that make R modular and facilitate code reuse.
Funct_name <- function(arg1, arg2, ...) {
  ### do something
}
## Compute the mean of a vector of numbers
meanX <- function(a_vector) {
  s <- sum(a_vector)
  l <- length(a_vector)
  m <- s / l
  return(m)
}
### Create a vector
v <- c(1, 2, 3, 4, 5)
### Find the mean
meanX(v)
## [1] 3
12
Data frame
Diagram of dat: Variables (columns), Observations (rows), Number of Rows
A data frame is used for storing data tables. It is a list of vectors of equal length. To retrieve the data in a cell, enter its row and column coordinates inside the single square bracket "[]" operator, separated by a comma.
To preview the data frame: head(dat), tail(dat)
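A minimal sketch of the preview and cell-lookup calls described on this slide. The data frame below is hypothetical: only the SaleDate column name appears in the deck, and the other columns and all values are made up for illustration.

```r
# Hypothetical sales data (column names other than SaleDate are assumptions)
dat <- data.frame(
  City      = c("Austin", "Dallas", "Houston", "Austin"),
  SaleDate  = c("2016-01-15", "2016-02-20", "2016-03-05", "2016-04-10"),
  SalePrice = c(250000, 310000, 275000, 298000),
  stringsAsFactors = FALSE
)

head(dat, 2)   # first two observations
tail(dat, 2)   # last two observations
nrow(dat)      # number of rows (observations)
ncol(dat)      # number of columns (variables)

# Single square brackets take [row, column]
cell <- dat[2, 3]             # row 2, column 3
same <- dat[2, "SalePrice"]   # same cell, with the column given by name
```

Leaving one coordinate empty returns a whole row or column, for example dat[2, ] or dat[, 3].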
13
R – str()
Compactly displays the internal structure of an R object; a diagnostic function.
str(object, ...)
If you need a quick overview of your dataset (for example, tDat), use the R command str() and look at the structure. It tells you the classes of your variables and the number of observations.
Change the class of column SaleDate:
dat$SaleDate <- as.Date(dat$SaleDate)
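As a rough illustration, the sketch below inspects a small hypothetical data frame with str() before and after converting SaleDate. The values are invented, and the dates are assumed to be in ISO format (YYYY-MM-DD), which as.Date() parses without a format argument.

```r
# Hypothetical data; SaleDate starts out as text, as it often does after an import
dat <- data.frame(
  SaleDate  = c("2016-01-15", "2016-02-20", "2016-03-05"),
  SalePrice = c(250000, 310000, 275000),
  stringsAsFactors = FALSE
)

str(dat)                          # SaleDate is listed as chr
class_before <- class(dat$SaleDate)

# Change the class of the column from character to Date
dat$SaleDate <- as.Date(dat$SaleDate)

str(dat)                          # SaleDate is now listed as Date
class_after <- class(dat$SaleDate)
```

For other date layouts, as.Date() accepts a format argument such as format = "%m/%d/%Y".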
14
R – summary()
summary(object)
Shows the distribution of the variables in your dataset (for example, tDat).
Numerical variables: summary() gives you the range, quartiles, median, and mean.
Factor variables: summary() gives you a table with frequencies.
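A small sketch of both cases, using a hypothetical data frame with one numeric and one factor column (names and values are assumptions, not taken from the deck).

```r
# Hypothetical data: one factor variable and one numeric variable
dat <- data.frame(
  City      = factor(c("Austin", "Dallas", "Austin", "Houston", "Austin")),
  SalePrice = c(250000, 310000, 275000, 298000, 265000)
)

# Numeric variable: minimum, quartiles, median, mean, maximum
price_summary <- summary(dat$SalePrice)
print(price_summary)

# Factor variable: number of observations per level
city_summary <- summary(dat$City)
print(city_summary)

# Whole data frame: one summary per column
summary(dat)
```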
15
Reshaping Data - dplyr
select(): subset variables (columns).
Data frames shown: tDat, dat
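A minimal sketch of select(), assuming the dplyr package is installed. The data frame and its column names are hypothetical.

```r
library(dplyr)

# Hypothetical data frame with three variables
dat <- data.frame(
  City      = c("Austin", "Dallas", "Houston"),
  SaleDate  = c("2016-01-15", "2016-02-20", "2016-03-05"),
  SalePrice = c(250000, 310000, 275000),
  stringsAsFactors = FALSE
)

# Keep only the named columns
dat_small <- select(dat, City, SalePrice)

# A minus sign drops a column instead
dat_no_date <- select(dat, -SaleDate)
```

Both calls return a new data frame and leave dat unchanged.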
16
Filter Data - dplyr
filter() allows you to select a subset of rows in a data frame.
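A short sketch of row filtering, assuming dplyr is installed; the data and column names are hypothetical. Conditions separated by commas are combined with a logical AND.

```r
library(dplyr)

# Hypothetical sales data
dat <- data.frame(
  City      = c("Austin", "Dallas", "Austin", "Houston", "Austin"),
  SalePrice = c(250000, 310000, 275000, 298000, 265000),
  stringsAsFactors = FALSE
)

# Rows where a single condition holds
expensive <- filter(dat, SalePrice > 270000)

# Rows where both conditions hold
austin_over_260k <- filter(dat, City == "Austin", SalePrice > 260000)
```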
17
Piping - dplyr
%>% passes the object on the LHS as the first argument to the function on the RHS.
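To make the rule concrete, the sketch below writes the same three-step query twice: once as nested calls and once as a pipeline. It assumes dplyr is installed (which also makes %>% available); the data is hypothetical.

```r
library(dplyr)

# Hypothetical sales data
dat <- data.frame(
  City      = c("Austin", "Dallas", "Austin", "Houston", "Austin"),
  SaleDate  = c("2016-01-15", "2016-02-20", "2016-03-05", "2016-04-10", "2016-05-01"),
  SalePrice = c(250000, 310000, 275000, 298000, 265000),
  stringsAsFactors = FALSE
)

# Nested calls: read from the inside out
nested <- head(select(filter(dat, City == "Austin"), SaleDate, SalePrice), 2)

# Piped calls: read top to bottom; each result becomes the first argument of the next call
piped <- dat %>%
  filter(City == "Austin") %>%
  select(SaleDate, SalePrice) %>%
  head(2)
```

For a SQL developer, the piped form reads roughly like WHERE, then SELECT, then TOP 2.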
18
Reshaping Data - tidyr
gather(): gather columns into rows.
spread() does the opposite.
Data frames shown: tDat, gDat
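A minimal round trip through gather() and spread(), assuming tidyr is installed. The wide table, its column names, and the values are hypothetical. Newer tidyr releases recommend pivot_longer() and pivot_wider() for the same job, but gather() and spread() are still available.

```r
library(tidyr)

# Hypothetical wide table: one column per year
wide <- data.frame(
  City  = c("Austin", "Dallas"),
  Y2015 = c(240000, 300000),
  Y2016 = c(250000, 310000),
  stringsAsFactors = FALSE
)

# Gather the year columns into key/value rows (wide -> long)
long <- gather(wide, key = "Year", value = "MedianPrice", Y2015, Y2016)

# Spread the key/value rows back into columns (long -> wide)
back <- spread(long, key = Year, value = MedianPrice)
```

The long form has one row per city and year, which is the shape most dplyr and ggplot2 functions expect.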
19
Make New Variable (Column)
mutate(): computes and appends one or more new columns.
Data frame shown: gDat
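A small sketch of mutate(), assuming dplyr is installed. The columns SalePrice and SqFt and the derived column names are hypothetical.

```r
library(dplyr)

# Hypothetical data
dat <- data.frame(
  SalePrice = c(250000, 310000),
  SqFt      = c(1250, 1550)
)

# Append two computed columns; the existing columns are kept
dat2 <- mutate(
  dat,
  PricePerSqFt = SalePrice / SqFt,
  PriceK       = SalePrice / 1000
)
```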
20
Reshaping Data - tidyr
separate(): separate one column into several.
unite() does the opposite.
Data frames shown: gDat, tDat
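A minimal sketch of splitting one column into several with separate(), assuming tidyr is installed. The Location column and its underscore separator are hypothetical. The trip back uses unite(), tidyr's function for pasting columns together.

```r
library(tidyr)

# Hypothetical data: city and state packed into one column
dat <- data.frame(
  Location = c("Austin_TX", "Denver_CO"),
  stringsAsFactors = FALSE
)

# Split Location into two columns at the underscore
parts <- separate(dat, Location, into = c("City", "State"), sep = "_")

# Paste the two columns back together
joined <- unite(parts, "Location", City, State, sep = "_")
```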
21
Get & Tidy -> Transform -> Visualize -> Model (diagram: @hadleywickham)
22
Data Visualization – ggplot2
Based on the Grammar of Graphics. One can build every graph from the same few components: a data set, a set of geoms (visual marks that represent the data), and a coordinate system.
23
Data Visualization – ggplot2
To display data values, map the variables in the data set to aesthetic properties of the geom, such as color, size, and x and y locations.
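As an illustration of aesthetic mapping, the sketch below maps four hypothetical variables to x, y, color, and size on a point geom. It assumes ggplot2 is installed; the data frame and its column names are made up.

```r
library(ggplot2)

# Hypothetical housing data
dat <- data.frame(
  SqFt      = c(1200, 1500, 1800, 2100, 2400),
  SalePrice = c(250000, 310000, 275000, 298000, 365000),
  Bedrooms  = c(2, 3, 3, 4, 4),
  City      = c("Austin", "Dallas", "Austin", "Houston", "Austin")
)

p <- ggplot(data = dat) +                  # the data set
  geom_point(                              # the geom: one point per observation
    aes(x = SqFt, y = SalePrice,           # variables mapped to x and y locations
        color = City, size = Bedrooms)     # variables mapped to color and size
  ) +
  coord_cartesian()                        # the coordinate system (Cartesian is the default)

# print(p) renders the plot
```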
24
Data Visualization – ggplot2
qplot() creates a complete plot with the given data, geom, and mappings. It supplies many useful defaults.
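A one-call sketch with qplot(), assuming ggplot2 is installed; the data is hypothetical. Recent ggplot2 releases mark qplot() as deprecated, so the call may emit a warning while still producing the plot.

```r
library(ggplot2)

# Hypothetical housing data
dat <- data.frame(
  SqFt      = c(1200, 1500, 1800, 2100, 2400),
  SalePrice = c(250000, 310000, 275000, 298000, 365000)
)

# x, y, data, and geom in a single call; scales, axes, and theme use defaults
p <- qplot(SqFt, SalePrice, data = dat, geom = "point")

# print(p) renders the plot
```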
25
Data Visualization – ggplot2
ggplot() begins a plot that you finish by adding layers to. Add layer elements with +. No defaults, but it provides more control than qplot().
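The layering idea can be sketched step by step: start from an empty plot and add one element at a time with +. This assumes ggplot2 is installed; the data, labels, and theme choice are illustrative only.

```r
library(ggplot2)

# Hypothetical housing data
dat <- data.frame(
  SqFt      = c(1200, 1500, 1800, 2100, 2400),
  SalePrice = c(250000, 310000, 275000, 298000, 365000),
  City      = c("Austin", "Dallas", "Austin", "Houston", "Austin")
)

# Begin the plot: data and shared aesthetics only, nothing is drawn yet
p <- ggplot(dat, aes(x = SqFt, y = SalePrice))

# Add a layer of points, colored by city
p <- p + geom_point(aes(color = City))

# Add labels and a theme
p <- p + labs(title = "Sale price by size", x = "Square feet", y = "Sale price")
p <- p + theme_minimal()

# print(p) renders the plot
```

Because each step returns a plot object, intermediate versions can be stored and reused with different layers.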
27
Data Visualization – ggplot2
lm()
Begin a plot that you can finish by adding layers to. No defaults, but provides more control than qplot().
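The slide names lm() under the ggplot2 heading without showing code, so the following is only a guess at the intended demo: fit a linear model with lm() and overlay the fitted line on a scatter plot with geom_smooth(method = "lm"). The data, column names, and model formula are all hypothetical.

```r
library(ggplot2)

# Hypothetical housing data
dat <- data.frame(
  SqFt      = c(1000, 1500, 2000, 2500),
  SalePrice = c(210000, 290000, 410000, 490000)
)

# Fit a simple linear regression of price on size
fit <- lm(SalePrice ~ SqFt, data = dat)
coef(fit)       # intercept and slope
summary(fit)    # fit statistics

# Predict the price of an 1800 sq ft home from the fitted model
pred <- predict(fit, newdata = data.frame(SqFt = 1800))

# Scatter plot with the same linear fit drawn as a layer
p <- ggplot(dat, aes(x = SqFt, y = SalePrice)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE)

# print(p) renders the plot
```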
28
Thank you