Published by Flora Blair. Modified over 8 years ago.
1
R PROGRAMMING FOR SQL DEVELOPERS Kiran Math, Developer at Proterra in Greenville, SC kiranmath@outlook.com
2
MOTIVATION
3
GOAL Raw Sensor Data Tidy Data
4
ZILLOW
5
INSTALLATION Comprehensive R Archive Network (CRAN) https://www.cran.r-project.org/ R Studio https://www.rstudio.com/
6
R <- CORE && R <- PACKAGES Base Packages sqldf RODBC dplyr stringr ggplot2 reshape2 tidyr lubridate
7
BASICS 1 - VECTOR
# Define a variable
a <- 25
# Call a variable
a
## [1] 25
# Do something to it
a + 10
## [1] 35
# Create a vector - numeric
x <- c(0.5, 0.6, 0.7)
# Call it
x
## [1] 0.5 0.6 0.7
# Do something to the vector
mean(x)
## [1] 0.6
8
BASICS 2 - FUNCTIONS
Functions are blocks of code that allow R code to be modular and facilitate code reuse.
Funct_name <- function(arg1, arg2, ...) {
  ### do something
}
## Compute the mean of a vector of numbers
meanX <- function(a_vector) {
  s <- sum(a_vector)
  l <- length(a_vector)
  m <- s / l
  return(m)
}
### Create a vector
v <- c(1, 2, 3, 4, 5)
### Find the mean
meanX(v)
## [1] 3
9
HOME SALE Question: I have a 3,000 sq ft house. How much will it sell for?
10
Workflow: Get & Tidy, Transform, Visualize, Model (diagram credited to @hadleywickham)
11
GET DATA – FROM SQL SERVER
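The connection details and query from this slide are not in the transcript. The following is a minimal sketch, assuming the RODBC package and a hypothetical server, database, and table (`HomeSales`); the driver name and Windows authentication are also assumptions.

```r
# Build an ODBC connection string for SQL Server.
# Driver name and Trusted_Connection are assumptions for this sketch.
build_conn_string <- function(server, database) {
  paste0("Driver={SQL Server};Server=", server,
         ";Database=", database, ";Trusted_Connection=yes;")
}

# Read a table into a data frame. Needs the RODBC package and a reachable
# server; "HomeSales" is a hypothetical table name.
read_home_sales <- function(server, database) {
  con <- RODBC::odbcDriverConnect(build_conn_string(server, database))
  if (!inherits(con, "RODBC")) stop("could not connect to the database")
  on.exit(RODBC::odbcClose(con))  # release the connection when done
  RODBC::sqlQuery(con, "SELECT * FROM HomeSales")
}

# Usage (not run here, needs a live database):
# dat <- read_home_sales("localhost", "RealEstate")
```

`sqlQuery()` returns the result set as a data frame, so the rest of the workflow does not depend on where the data came from.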
12
GET DATA – FROM CSV FILE
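The file used on this slide is not in the transcript. As a rough sketch, base R's `read.csv()` can load a CSV into a data frame; the file below is written to a temporary path first so the example is self-contained, and its column names and values are made up.

```r
# Write a tiny CSV to a temporary file (illustrative values only).
csv_path <- tempfile(fileext = ".csv")
writeLines(c("Area,SalePrice,SaleDate",
             "1500,110000,2015-03-14",
             "2200,152000,2015-06-02",
             "3100,205000,2015-09-20"), csv_path)

# Read the file; keep text columns as character rather than factor.
dat <- read.csv(csv_path, stringsAsFactors = FALSE)
dim(dat)
## [1] 3 3
```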
13
DATA FRAME dat Variables (columns) Observations (rows) dat[5,3] To preview the data frame: head(dat) tail(dat) Number of Rows
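A small sketch of these data frame operations, using a made-up `dat` (the column names and values are assumptions, not the presenter's data):

```r
dat <- data.frame(
  Area      = c(1500, 2200, 3100, 1800, 2600, 2000),
  SalePrice = c(110000, 152000, 205000, 126000, 171000, 140000),
  SaleDate  = c("2015-03-14", "2015-06-02", "2015-09-20",
                "2015-10-05", "2015-11-11", "2015-12-01"),
  stringsAsFactors = FALSE
)

nrow(dat)      # number of observations (rows)
## [1] 6
ncol(dat)      # number of variables (columns)
## [1] 3
dat[5, 3]      # row 5, column 3
## [1] "2015-11-11"
head(dat, 2)   # first two rows
tail(dat, 2)   # last two rows
```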
14
R – STR() str(object, ...) Compactly displays the internal structure of an R object; a diagnostic function. dat$SaleDate <- as.Date(dat$SaleDate) Changes the class of column SaleDate. tDat
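To illustrate the class change, here is a minimal sketch on a made-up data frame where `SaleDate` starts out as text:

```r
dat <- data.frame(
  Area      = c(1500, 2200, 3100),
  SalePrice = c(110000, 152000, 205000),
  SaleDate  = c("2015-03-14", "2015-06-02", "2015-09-20"),
  stringsAsFactors = FALSE
)

class(dat$SaleDate)
## [1] "character"

# Convert the text column to Date, as on the slide.
dat$SaleDate <- as.Date(dat$SaleDate)
class(dat$SaleDate)
## [1] "Date"

# str() lists each column with its type and first few values.
str(dat)
```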
15
R – SUMMARY() summary(object) Shows the distribution of the variables in the dataset. tDat
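A short sketch of `summary()` on a made-up `tDat` (values are illustrative only):

```r
tDat <- data.frame(
  Area      = c(1500, 2200, 3100, 1800, 2600),
  SalePrice = c(110000, 152000, 205000, 126000, 171000)
)

# Min, quartiles, median, mean, and max for every numeric column.
summary(tDat)

# The same statistics for a single column, accessed by name.
s <- summary(tDat$Area)
s[["Median"]]
## [1] 2200
```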
16
RESHAPING DATA - DPLYR select() subsets variables (columns). dat tDat
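For SQL developers, `select()` is close to the column list of a SELECT statement. A minimal sketch, assuming a made-up `dat` and that `tDat` is the narrowed result:

```r
library(dplyr)

dat <- data.frame(
  Area      = c(1500, 2200, 3100),
  SalePrice = c(110000, 152000, 205000),
  SaleDate  = as.Date(c("2015-03-14", "2015-06-02", "2015-09-20"))
)

# Keep only the two columns needed for the analysis.
tDat <- select(dat, Area, SalePrice)
names(tDat)
## [1] "Area"      "SalePrice"
```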
17
FILTER DATA - DPLYR filter() allows you to select a subset of rows in a data frame.
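`filter()` plays roughly the role of a SQL WHERE clause. A small sketch on made-up data; the 2,000 sq ft threshold is arbitrary:

```r
library(dplyr)

tDat <- data.frame(
  Area      = c(1500, 2200, 3100, 1800),
  SalePrice = c(110000, 152000, 205000, 126000)
)

# Keep homes larger than 2,000 sq ft.
big <- filter(tDat, Area > 2000)
nrow(big)
## [1] 2

# Several conditions separated by commas are combined with AND.
mid_priced <- filter(tDat, Area > 2000, SalePrice < 200000)
nrow(mid_priced)
## [1] 1
```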
18
PIPING- DPLYR %>% Passes object on LHS as first argument to function on RHS
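To make the pipe concrete, the sketch below writes the same query twice on made-up data, once as nested calls and once with `%>%`:

```r
library(dplyr)

dat <- data.frame(
  Area      = c(1500, 2200, 3100),
  SalePrice = c(110000, 152000, 205000),
  SaleDate  = as.Date(c("2015-03-14", "2015-06-02", "2015-09-20"))
)

# Nested calls: read from the inside out.
nested <- select(filter(dat, Area > 2000), Area, SalePrice)

# Piped calls: read top to bottom; each result becomes the first
# argument of the next function.
piped <- dat %>%
  filter(Area > 2000) %>%
  select(Area, SalePrice)

all.equal(nested, piped)
## [1] TRUE
```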
19
RESHAPING DATA - TIDYR gather() gathers columns into rows. spread() does the opposite. tDat gDat
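A minimal sketch of the round trip, assuming `tDat` is a wide table and `gDat` the gathered result; the sensor columns and readings are made up:

```r
library(tidyr)

# Wide table: one column per kind of reading.
tDat <- data.frame(
  Id    = c(1, 2),
  Temp  = c(20.5, 21.0),
  Volts = c(12.1, 11.9)
)

# Gather the Temp and Volts columns into key/value rows.
gDat <- gather(tDat, key = "Sensor", value = "Reading", Temp, Volts)
nrow(gDat)
## [1] 4

# spread() reverses the operation.
wide <- spread(gDat, key = Sensor, value = Reading)
names(wide)
## [1] "Id"    "Temp"  "Volts"
```

Newer tidyr releases recommend `pivot_longer()` and `pivot_wider()` for the same job, but `gather()` and `spread()` remain available.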
20
MAKE NEW VARIABLE (COLUMN) mutate() computes and appends one or more new columns. gDat
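In SQL terms, `mutate()` resembles adding a computed column to a SELECT list. A short sketch on made-up data; `PricePerSqFt` is a hypothetical column name:

```r
library(dplyr)

gDat <- data.frame(
  Area      = c(1500, 2000, 3000),
  SalePrice = c(105000, 150000, 240000)
)

# Append a derived column; existing columns are left unchanged.
gDat <- mutate(gDat, PricePerSqFt = SalePrice / Area)
gDat$PricePerSqFt
## [1] 70 75 80
```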
21
RESHAPING DATA - TIDYR separate() separates one column into several. unite() does the opposite. gDat tDat
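A small sketch of splitting a column, using a made-up `gDat` whose `SaleDate` is stored as text; the new column names are assumptions:

```r
library(tidyr)

gDat <- data.frame(
  SaleDate  = c("2015-03-14", "2015-06-02"),
  SalePrice = c(110000, 152000),
  stringsAsFactors = FALSE
)

# Split SaleDate into three columns on the "-" separator.
tDat <- separate(gDat, SaleDate, into = c("Year", "Month", "Day"), sep = "-")
names(tDat)
## [1] "Year"      "Month"     "Day"       "SalePrice"

# tidyr's unite() pastes the pieces back together.
back <- unite(tDat, "SaleDate", Year, Month, Day, sep = "-")
back$SaleDate
## [1] "2015-03-14" "2015-06-02"
```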
22
Workflow: Get & Tidy, Transform, Visualize, Model (diagram credited to @hadleywickham)
23
DATA VISUALIZATION – GGPLOT2 ggplot2 is based on the Grammar of Graphics. One can build every graph from the same few components: a data set, a set of geoms (visual marks that represent the data), and a coordinate system.
24
DATA VISUALIZATION – GGPLOT2 To display data values, map the variables in the dataset to aesthetic properties of the geom, such as color, size, and x and y locations.
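The mapping from variables to aesthetics can be sketched as follows. The data is made up, and `Bedrooms` is a hypothetical column added only to have something to map to color:

```r
library(ggplot2)

dat <- data.frame(
  Area      = c(1500, 2200, 3100, 1800, 2600),
  SalePrice = c(110000, 152000, 205000, 126000, 171000),
  Bedrooms  = c(2, 3, 4, 3, 4)
)

# aes() maps Area to x, SalePrice to y, and Bedrooms to point color.
# size = 3 is set outside aes(), so it is a fixed value, not a mapping.
p <- ggplot(dat, aes(x = Area, y = SalePrice, color = factor(Bedrooms))) +
  geom_point(size = 3)

# print(p) renders the plot in the active graphics device.
```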
25
DATA VISUALIZATION – GGPLOT2 qplot()
26
DATA VISUALIZATION – GGPLOT2 ggplot() Add Layer elements with +
27
DATA VISUALIZATION – GGPLOT2 ggplot() Add Layer elements with +
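The plots from these slides are not in the transcript. As a sketch of adding layer elements with `+`, on made-up data and with assumed labels:

```r
library(ggplot2)

dat <- data.frame(
  Area      = c(1500, 2200, 3100, 1800, 2600),
  SalePrice = c(110000, 152000, 205000, 126000, 171000)
)

p <- ggplot(dat, aes(x = Area, y = SalePrice)) +  # data and mappings
  geom_point() +                                  # geom layer: points
  labs(title = "Sale price vs. area",             # titles and axis labels
       x = "Area (sq ft)", y = "Sale price ($)") +
  theme_minimal()                                 # overall look

# print(p) renders the plot in the active graphics device.
```

Each `+` returns a new plot object, so a plot can be built up step by step and stored in a variable between steps.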
28
LINEAR REGRESSION MODEL
29
LEAST SQUARES METHOD R function lm()
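A minimal sketch of fitting a least-squares line with `lm()`. The data is made up and constructed so that the fitted line is SalePrice = 20000 + 60 * Area; it is not the presenter's data.

```r
dat <- data.frame(
  Area      = c(1000, 1500, 2000, 2500, 3000),
  SalePrice = c(82000, 106000, 140000, 174000, 198000)
)

# Regress SalePrice on Area; lm() minimizes the sum of squared residuals.
fit <- lm(SalePrice ~ Area, data = dat)

# Intercept (about 20000) and slope (about 60 dollars per sq ft) for this data.
coef(fit)

# summary(fit) adds standard errors, R-squared, and p-values.
```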
30
MODEL - CORRELATION cor() Is Area correlated with Sale Price? The output value is between -1 and 1.
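A short sketch of `cor()` on made-up data, including a case that reaches the lower bound of -1:

```r
dat <- data.frame(
  Area      = c(1000, 1500, 2000, 2500, 3000),
  SalePrice = c(82000, 106000, 140000, 174000, 198000)
)

# Pearson correlation between the two columns (the default method).
r <- cor(dat$Area, dat$SalePrice)
r > 0.99   # strong positive linear relationship in this toy data
## [1] TRUE

# A perfectly decreasing linear relationship gives -1.
cor(dat$Area, -2 * dat$Area)
## [1] -1
```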
31
MODEL - PREDICTION
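The prediction code from this slide is not in the transcript. A minimal sketch using `predict()` on a model fitted to made-up data, so the number it produces is not the presenter's result:

```r
dat <- data.frame(
  Area      = c(1000, 1500, 2000, 2500, 3000),
  SalePrice = c(82000, 106000, 140000, 174000, 198000)
)

fit <- lm(SalePrice ~ Area, data = dat)

# New observations must use the same column name as the model's predictor.
new_home <- data.frame(Area = 3000)
pred <- unname(predict(fit, newdata = new_home))

format(round(pred), big.mark = ",", scientific = FALSE)
## [1] "200,000"
```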
32
DATA VISUALIZATION – GGPLOT2 lm()
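One common way to draw a fitted line over a scatter plot is `geom_smooth(method = "lm")`; whether the slide used this or another approach is an assumption. A sketch on made-up data:

```r
library(ggplot2)

dat <- data.frame(
  Area      = c(1000, 1500, 2000, 2500, 3000),
  SalePrice = c(82000, 106000, 140000, 174000, 198000)
)

p <- ggplot(dat, aes(x = Area, y = SalePrice)) +
  geom_point() +
  # Fit y ~ x with lm() and draw the line; se = FALSE hides the error band.
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE)

# print(p) renders the plot in the active graphics device.
```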
33
HOME SALE Question: I have a 3,000 sq ft house. How much will it sell for? Answer: $198,000
34
THANK YOU