R Programming For Sql Developers ETL USING R


1 R Programming for SQL Developers: ETL Using R
Kiran Math, Consultant

2 Motivation
Excel Data → ETL → SQL Server Table

3 Motivation

4 Motivation

5 Demo: Motivation

6 Installation Comprehensive R Archive Network (CRAN)
R Studio Installation

7 R <- Core && R <- Packages
Base packages, plus: ggplot2, sqldf, RODBC, dplyr, stringr, reshape2, tidyr, lubridate

8 zillow

9 Data analysis workflow (diagram, @hadleywickham)
Stages: Get & Tidy, Transform, Visualize, Model

10 Basics 1 - Vector
# Define a variable
a <- 25
# Call a variable
a
## [1] 25
# Do something to it
a + 10
## [1] 35
# Create a vector - numeric
x <- c(0.5, 0.6, 0.7)
# Call it
x
## [1] 0.5 0.6 0.7
# Do something to the vector
mean(x)
## [1] 0.6

11 Basics 2 - Functions
Functions are blocks of code that make R modular and facilitate code reuse.
Funct_name <- function(arg1, arg2, ...) {
  ### do something
}
## Compute the mean of a vector of numbers
meanX <- function(a_vector) {
  s <- sum(a_vector)
  l <- length(a_vector)
  m <- s / l
  return(m)
}
### Create a vector
v <- c(1, 2, 3, 4, 5)
### Find the mean
meanX(v)
## [1] 3

12 Data frame
A data frame (here, dat) is used for storing data tables. It is a list of vectors of equal length: each column is a variable and each row is an observation.
To retrieve the data in a cell, enter its row and column coordinates in the single square bracket "[]" operator. The two coordinates are separated by a comma.
To preview the data frame: head(dat), tail(dat)
To count the rows: nrow(dat)
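A minimal sketch of the indexing described above. The data frame below is made up for illustration; its column names and values are assumptions, not the data used in the talk.

```r
# Hypothetical sample data (column names and values are illustrative only)
dat <- data.frame(
  City      = c("Austin", "Boston", "Denver"),
  SalePrice = c(250000, 410000, 330000),
  stringsAsFactors = FALSE
)

head(dat, 2)          # first two observations
tail(dat, 1)          # last observation
nrow(dat)             # number of observations (rows)
ncol(dat)             # number of variables (columns)

dat[2, 2]             # cell at row 2, column 2
dat[2, "SalePrice"]   # same cell, with the column addressed by name
dat[, "City"]         # every row of a single column
```

Addressing columns by name rather than by position tends to survive later changes to the column order.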

13 R – str()
Compactly displays the internal structure of an R object; a diagnostic function.
str(object, ...)
If you need a quick overview of your data set (shown on the slide for tDat), use str() and look at the structure. It tells you the classes of your variables and the number of observations.
Change the class of column SaleDate:
dat$SaleDate <- as.Date(dat$SaleDate)
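As a rough illustration of inspecting a data frame and then changing a column's class, the sketch below uses a made-up data frame. Only the SaleDate column name comes from the slide; the SalePrice column and all values are assumptions.

```r
# Hypothetical data: dates arrive as text, as they often do after an import
dat <- data.frame(
  SaleDate  = c("2015-01-15", "2015-02-20", "2015-03-05"),
  SalePrice = c(250000, 410000, 330000),
  stringsAsFactors = FALSE
)

str(dat)                                # SaleDate is reported as chr

# Convert the text column to the Date class (ISO yyyy-mm-dd is parsed by default)
dat$SaleDate <- as.Date(dat$SaleDate)

str(dat)                                # SaleDate is now reported as Date
class(dat$SaleDate)
```

For dates in another layout, as.Date() accepts a format argument, for example format = "%m/%d/%Y".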

14 R – summary()
summary(object)
Shows the distribution of the variables in your data set (shown on the slide for tDat).
Numerical variables: summary() gives you the range, quartiles, median, and mean.
Factor variables: summary() gives you a table with frequencies.
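The two behaviours of summary() can be seen side by side in a small sketch. The data frame is hypothetical; the Type and SalePrice columns are assumptions made for the example.

```r
# Hypothetical data with one factor column and one numeric column
dat <- data.frame(
  Type      = factor(c("Condo", "House", "House", "Condo", "House")),
  SalePrice = c(200000, 350000, 420000, 180000, 390000)
)

price_summary <- summary(dat$SalePrice)  # min, quartiles, median, mean, max
type_summary  <- summary(dat$Type)       # frequency of each factor level

price_summary
type_summary

summary(dat)                             # both at once, one block per column
```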

15 Reshaping Data - dplyr
select() subsets variables (columns).
tDat → Dat
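A minimal sketch of column selection with dplyr, assuming the package is installed. The tDat contents below are invented for the example.

```r
library(dplyr)

# Hypothetical wide table (names and values are illustrative only)
tDat <- data.frame(
  City      = c("Austin", "Boston"),
  State     = c("TX", "MA"),
  SalePrice = c(250000, 410000),
  stringsAsFactors = FALSE
)

dat      <- select(tDat, City, SalePrice)  # keep columns by name
no_state <- select(tDat, -State)           # drop a column by name

names(dat)
names(no_state)
```

This plays the role of the column list in a SQL SELECT statement.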

16 Filter Data - dplyr
filter() allows you to select a subset of rows in a data frame.
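Row filtering could look like the sketch below, which assumes dplyr is installed and uses made-up data. Conditions separated by commas are combined with AND, much like a SQL WHERE clause.

```r
library(dplyr)

# Hypothetical data (illustrative values only)
dat <- data.frame(
  City      = c("Austin", "Boston", "Boston", "Denver"),
  SalePrice = c(250000, 410000, 290000, 330000),
  stringsAsFactors = FALSE
)

# Rows with a sale price above 300,000
expensive <- filter(dat, SalePrice > 300000)

# Several conditions: all of them must hold
boston_expensive <- filter(dat, City == "Boston", SalePrice > 300000)

expensive
boston_expensive
```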

17 Piping - dplyr
%>% passes the object on the left-hand side as the first argument to the function on the right-hand side.
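To make the pipe concrete, the sketch below writes the same query twice, once with nested calls and once with %>%. It assumes dplyr is installed; the data is invented.

```r
library(dplyr)

# Hypothetical data (illustrative values only)
dat <- data.frame(
  City      = c("Austin", "Boston", "Boston", "Denver"),
  SalePrice = c(250000, 410000, 290000, 330000),
  stringsAsFactors = FALSE
)

# Nested form: read from the inside out
nested <- select(filter(dat, SalePrice > 300000), City)

# Piped form: read top to bottom; each result becomes the first argument
# of the next function
piped <- dat %>%
  filter(SalePrice > 300000) %>%
  select(City)

all.equal(nested, piped)
```

The piped form reads in the order the steps run, which is why it is usually preferred for longer chains.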

18 Reshaping Data - tidyr
gather() gathers columns into rows.
spread() does the opposite.
tDat → gDat
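A small wide-to-long round trip, assuming tidyr is installed. The tDat and gDat names follow the slide, but the columns and values are made up. Newer tidyr releases also offer pivot_longer() and pivot_wider() for the same job; gather() and spread() remain available.

```r
library(tidyr)

# Hypothetical wide table: one column per year (illustrative values only)
tDat <- data.frame(
  City  = c("Austin", "Boston"),
  Y2014 = c(240000, 395000),
  Y2015 = c(250000, 410000),
  stringsAsFactors = FALSE
)

# Gather the year columns into key/value rows (wide -> long)
gDat <- gather(tDat, key = "Year", value = "SalePrice", Y2014, Y2015)

# Spread the key/value rows back into columns (long -> wide)
wide <- spread(gDat, key = Year, value = SalePrice)

gDat
wide
```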

19 Make New Variables (Columns)
mutate() computes and appends one or more new columns.
gDat
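Adding a derived column might look like this sketch, which assumes dplyr is installed. The SqFt and PricePerSqFt columns are hypothetical and chosen only to show the mechanics.

```r
library(dplyr)

# Hypothetical data (illustrative values only)
gDat <- data.frame(
  City      = c("Austin", "Boston"),
  SalePrice = c(250000, 410000),
  SqFt      = c(1250, 1640),
  stringsAsFactors = FALSE
)

# Append a computed column; the existing columns are kept unchanged
gDat <- mutate(gDat, PricePerSqFt = SalePrice / SqFt)

gDat
```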

20 Reshaping Data - tidyr
separate() separates one column into several.
unite() does the opposite.
gDat → tDat
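Splitting one column into several can be sketched as follows, assuming tidyr is installed. The Location column and its "City_State" layout are assumptions made for the example. In tidyr, unite() is the inverse of separate().

```r
library(tidyr)

# Hypothetical data: two values packed into one text column
gDat <- data.frame(
  Location  = c("Austin_TX", "Boston_MA"),
  SalePrice = c(250000, 410000),
  stringsAsFactors = FALSE
)

# Split Location at the underscore into two new columns
tDat <- separate(gDat, Location, into = c("City", "State"), sep = "_")

# Paste the two columns back together into one
rejoined <- unite(tDat, Location, City, State, sep = "_")

tDat
rejoined
```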

21 Data analysis workflow (diagram, @hadleywickham)
Stages: Get & Tidy, Transform, Visualize, Model

22 Data Visualization – ggplot2
Based on the Grammar of Graphics: every graph can be built from the same few components:
a data set
a set of geoms (visual marks that represent the data)
a coordinate system

23 Data Visualization – ggplot2
To display data values, map the variables in the data set to aesthetic properties of the geom, such as color, size, and x and y locations.
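The mapping from variables to aesthetics can be sketched with aes(), assuming ggplot2 is installed. The data frame and its column names are made up for illustration.

```r
library(ggplot2)

# Hypothetical data (illustrative values only)
dat <- data.frame(
  SqFt      = c(900, 1250, 1600, 2100, 2500),
  SalePrice = c(180000, 255000, 310000, 420000, 495000),
  Type      = c("Condo", "Condo", "House", "House", "House"),
  Beds      = c(1, 2, 3, 4, 4)
)

# Each variable is mapped to one aesthetic property of the point geom:
# SqFt -> x position, SalePrice -> y position, Type -> color, Beds -> size
p <- ggplot(dat, aes(x = SqFt, y = SalePrice, color = Type, size = Beds)) +
  geom_point()

print(p)
```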

24 Data Visualization – ggplot2
qplot() creates a complete plot from the given data, geom, and mappings. It supplies many useful defaults.

25 Data Visualization – ggplot2
ggplot() begins a plot that you finish by adding layers to it. Add layer elements with +. It supplies no defaults but provides more control than qplot().

26 Data Visualization – ggplot2
ggplot() begins a plot that you finish by adding layers to it. Add layer elements with +. It supplies no defaults but provides more control than qplot().
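Building a plot layer by layer might look like the sketch below, assuming ggplot2 is installed. The data is invented, and the choice of a point layer followed by a fitted-line layer is only one possible combination.

```r
library(ggplot2)

# Hypothetical data (illustrative values only)
dat <- data.frame(
  SqFt      = c(900, 1250, 1600, 2100, 2500),
  SalePrice = c(180000, 255000, 310000, 420000, 495000)
)

# Step 1: begin the plot; on its own this draws axes but no data
base <- ggplot(dat, aes(x = SqFt, y = SalePrice))

# Step 2: add a layer of points
with_points <- base + geom_point()

# Step 3: add a second layer, here a straight-line fit without the error band
with_fit <- with_points +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE)

print(with_fit)
```

Because each step returns a plot object, intermediate versions can be stored and extended in different directions.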

27 Data Visualization – ggplot2
lm() fits a linear model, which can be shown on a plot as an additional layer. ggplot() begins a plot that you finish by adding layers to it; it supplies no defaults but provides more control than qplot().
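For reference, a linear model can be fitted directly with base R's lm(). The sketch below uses invented data, and the choice of SalePrice as a function of SqFt is an assumption made for the example.

```r
# Hypothetical data (illustrative values only)
dat <- data.frame(
  SqFt      = c(900, 1250, 1600, 2100, 2500),
  SalePrice = c(180000, 255000, 310000, 420000, 495000)
)

# Fit SalePrice as a linear function of SqFt
fit <- lm(SalePrice ~ SqFt, data = dat)

coef(fit)       # intercept and slope of the fitted line

# Predict the price of a hypothetical 1,800 sq ft home
predicted <- predict(fit, newdata = data.frame(SqFt = 1800))
predicted
```

summary(fit) reports standard errors and R-squared for the same model.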

28 Thank you

