R PROGRAMMING FOR SQL DEVELOPERS Kiran Math Developer : Proterra in Greenville SC


1 R PROGRAMMING FOR SQL DEVELOPERS Kiran Math Developer Work @ : Proterra in Greenville SC kiranmath@outlook.com

2 MOTIVATION

3 GOAL Raw Sensor Data Tidy Data

4 ZILLOW

5 [Diagram: Get & Tidy, Transform, Viz, Model. Credit: @hadleywickham]

6 VASCO DA GAMA BRIDGE - LISBON IN PORTUGAL Question : What is the probability of having seventeen or more vehicles crossing the bridge in a particular minute?

7 Raw Data: data on the web, in CSV format
Processing Script: R code that reads the CSV from the web into R
Tidy Data: packages used: tidyr
Data Manipulation and Analysis: R code; average vehicles per minute = 12
Data Model (Poisson distribution): ppois(16, lambda = 12, lower.tail = FALSE) # upper tail; answer: 0.10129
Data Visualization: R code using ggplot2 and base plot
Data Communication: blog; the probability of having seventeen or more vehicles crossing the bridge in a particular minute is 10.1%
Code Repository: GitHub
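The Poisson step above can be reproduced in a few lines of base R. This is a minimal sketch that takes the slide's average of 12 vehicles per minute as given; the variable names are illustrative only.

```r
# Average number of vehicles per minute, as stated on the slide
lambda <- 12

# P(X >= 17) is the upper tail beyond 16, so we ask for P(X > 16)
p_upper <- ppois(16, lambda = lambda, lower.tail = FALSE)

# Cross-check: one minus the sum of the point probabilities for 0..16
p_check <- 1 - sum(dpois(0:16, lambda = lambda))

round(p_upper, 5)
## [1] 0.10129
```

Passing 16 rather than 17 matters: with lower.tail = FALSE, ppois returns P(X > q), so "seventeen or more" means q = 16.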

8 INSTALLATION
Comprehensive R Archive Network (CRAN): https://www.cran.r-project.org/
RStudio: https://www.rstudio.com/

9 ROBERT GENTLEMAN - ROSS IHAKA
• University of Auckland

10 R <- CORE && R <- PACKAGES
Base packages
Packages: ggplot2, sqldf, RODBC, dplyr, stringr, reshape2, tidyr, lubridate

11 FEATURES OF R
• Runs on almost any standard computing platform/OS (even on the PlayStation 3)
• Frequent releases (annual + bug fix releases); active development
• Quite lean, as far as software goes; functionality is divided into modular packages
• Graphics capabilities are very sophisticated and better than most stat packages
• Very active and vibrant user community; R-help and R-devel mailing lists and Stack Overflow

12 DRAWBACKS OF R
• Essentially based on 40-year-old technology.
• Objects must generally be stored in physical memory.

13 BASICS 1 - VECTOR
# Define a variable
a <- 25
# Call a variable
a
## [1] 25
# Do something to it
a + 10
## [1] 35
# Create a vector - numeric
x <- c(0.5, 0.6, 0.7)
# Call it
x
## [1] 0.5 0.6 0.7
# Do something to the vector
mean(x)
## [1] 0.6

14 BASICS 2 - MATRIX
A matrix is a collection of data elements arranged in a two-dimensional rectangular layout.
> A = matrix(
+   c(1, 2, 3, 4, 5, 6),  # the data elements
+   nrow = 2,             # number of rows
+   ncol = 3,             # number of columns
+   byrow = TRUE)         # fill matrix by rows
> A                       # print the matrix
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    4    5    6

15 BASICS 3 – CONTROL STRUCTURES
# If statements
x <- 10
y <- if (x > 75) 'Pass' else 'Fail'
# Get the value of variable y
y
## [1] "Fail"
# For loops
for (index in 1:3) {
  print(index)
}

16 BASICS 4 - FUNCTIONS
Functions are blocks of code that make R code modular and facilitate code reuse.
Funct_name <- function(arg1, arg2, ...) {
  # do something
}
# Compute the mean of a vector of numbers
meanX <- function(a_vector) {
  s <- sum(a_vector)
  l <- length(a_vector)
  m <- s / l
  return(m)
}
# Create a vector
v <- c(1, 2, 3, 4, 5)
# Find the mean
meanX(v)
## [1] 3

17 DATA FRAME
A data frame is used for storing data tables. To retrieve the data in a cell, enter its row and column coordinates in the single square bracket "[]" operator.
mtcars[1, 2]
## [1] 6
mtcars["Mazda RX4", "cyl"]
## [1] 6
Preview a data frame
• head(mtcars)
• tail(mtcars)
• View(mtcars)
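For readers coming from SQL, the same bracket operator also covers row filtering and column selection. The sketch below uses only base R and the built-in mtcars data; the SQL in the comments is a rough analogy, not an exact translation, and six_cyl is an illustrative name.

```r
# Single cell by position, then by row name and column name
mtcars[1, 2]
mtcars["Mazda RX4", "cyl"]

# Roughly: SELECT mpg, cyl, hp FROM mtcars WHERE cyl = 6
six_cyl <- mtcars[mtcars$cyl == 6, c("mpg", "cyl", "hp")]

# Roughly: ... ORDER BY mpg DESC
six_cyl <- six_cyl[order(-six_cyl$mpg), ]

head(six_cyl)
```

The part before the comma selects rows and the part after it selects columns; leaving either side empty keeps everything on that axis.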

18 BASICS 6 - PLOTS
# Make a very simple plot
# Define vectors
x <- c(1, 3, 6, 9, 12)
y <- c(1.5, 2, 7, 8, 15)
plot(x, y,
     xlab = "x axis", ylab = "y axis", main = "my plot",
     ylim = c(0, 20), xlim = c(0, 20),
     pch = 15, col = "blue")
# Add some more points to the graph
x2 <- c(0.5, 3, 5, 8, 12)
y2 <- c(0.8, 1, 2, 4, 6)
points(x2, y2, pch = 16, col = "green")

19 HOME SALE
I have home sales data for the neighborhood in a SQL Server database.
Question : I have a 3000 sq ft house; how much will it sell for?

20 REGRESSION MODEL

21 Demo : Predict sale price of the house that is 3000 sq ft
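The demo itself is not part of the transcript. As a rough sketch of what such a prediction could look like, the code below fits a simple linear regression with base R's lm() and predicts the price of a 3000 sq ft house. The sales data frame is made up for illustration (and deliberately exactly linear so the result is easy to check); it stands in for the table that would come from SQL Server, and the column names sqft and price are assumptions.

```r
# Hypothetical, made-up sales data standing in for the SQL Server table.
# The prices follow price = 50000 + 100 * sqft exactly, purely for illustration.
sales <- data.frame(
  sqft  = c(1500, 2000, 2500, 3500, 4000),
  price = c(200000, 250000, 300000, 400000, 450000)
)

# Fit a simple linear regression: price as a function of square footage
model <- lm(price ~ sqft, data = sales)

# Predict the sale price of a 3000 sq ft house
predicted <- predict(model, newdata = data.frame(sqft = 3000))
predicted  # 350000 for this toy data
```

With real sales data the fit will not be exact, so it is worth inspecting summary(model) before trusting the prediction.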

22 MANAGING DATA FRAMES WITH DPLYR
The dplyr package provides simple functions that can be chained together to manipulate data quickly and easily.
install.packages("dplyr")
library(dplyr)
Verbs
1. filter – select a subset of the rows of a data frame
2. arrange – reorder the rows of a data frame instead of filtering them
3. select – select columns of a data frame
4. mutate – add new columns to a data frame that are functions of existing columns
5. summarize – collapse values into summary statistics
6. group_by – describe how to break a data frame into groups of rows
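The six verbs above can be chained into one pipeline. This is a minimal sketch on the built-in mtcars data, assuming dplyr is installed; the SQL clauses in the comments are approximate analogies, and the names wt_lb, n_cars and avg_mpg are illustrative.

```r
library(dplyr)

result <- mtcars %>%
  filter(mpg > 20) %>%                 # WHERE mpg > 20
  select(mpg, cyl, wt) %>%             # SELECT mpg, cyl, wt
  mutate(wt_lb = wt * 1000) %>%        # computed column (wt is in 1000 lbs)
  group_by(cyl) %>%                    # GROUP BY cyl
  summarize(n_cars = n(),              # COUNT(*)
            avg_mpg = mean(mpg)) %>%   # AVG(mpg)
  arrange(desc(avg_mpg))               # ORDER BY avg_mpg DESC

result
```

Each verb takes a data frame and returns a data frame, which is what lets the steps be chained in reading order.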

23 DEMO : DPLYR

24 VISUALIZING DATA FRAMES WITH GGPLOT2
Grammar of Graphics
The ggplot2 package provides two workhorse functions for plotting
1. qplot()
2. ggplot()
install.packages("ggplot2")
library(ggplot2)
Building Blocks
1. Data Frame
2. Aesthetics – how data is mapped to color and size ~ aes()
3. Geoms – geometric objects to be drawn, such as points, lines, bars, polygons and text
4. Facets – panels used in conditional plots
5. Stats – statistical transformations ~ binning, quantiles, smoothing
6. Scales – the coding an aesthetic map uses, e.g. male = blue and female = red
7. Coordinate System
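The building blocks above map one-to-one onto layers in a ggplot() call. The following is a small sketch on the built-in mtcars data, assuming ggplot2 is installed; the chosen columns, colors and axis limits are arbitrary illustrations.

```r
library(ggplot2)

p <- ggplot(mtcars,                                    # 1. data frame
            aes(x = wt, y = mpg,
                color = factor(cyl))) +                # 2. aesthetics
  geom_point(size = 2) +                               # 3. geom
  facet_wrap(~ am) +                                   # 4. facets (one panel per am value)
  geom_smooth(method = "lm", formula = y ~ x,
              se = FALSE) +                            # 5. stat (linear smoothing)
  scale_color_manual(name = "cyl",
                     values = c("4" = "blue",
                                "6" = "red",
                                "8" = "darkgreen")) +  # 6. scale
  coord_cartesian(ylim = c(10, 35)) +                  # 7. coordinate system
  labs(x = "Weight (1000 lbs)", y = "Miles per gallon")
```

The plot is built as an object; calling print(p), or typing p at the console, renders it.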

25 DEMO : GGPLOT2

26 THANK YOU

