Published by Reynold Mathews. Modified over 8 years ago.
1
R PROGRAMMING FOR SQL DEVELOPERS Kiran Math Developer Work @ : Proterra in Greenville SC kiranmath@outlook.com
2
MOTIVATION
3
GOAL Raw Sensor Data Tidy Data
4
ZILLOW
5
Data analysis workflow (diagram, after @hadleywickham): Get & Tidy, Transform, Viz, Model
6
VASCO DA GAMA BRIDGE - LISBON IN PORTUGAL Question : What is the probability of having seventeen or more vehicles crossing the bridge in a particular minute?
7
Raw Data: data on the web, CSV format
Processing Script: R code, read CSV from the web into R
Tidy Data: packages used: tidyr
Data Manipulation and Analysis: R code, average vehicles per minute = 12
Data Model (Poisson distribution): ppois(16, lambda=12, lower=FALSE) # upper tail; answer: 0.10129
Data Visualization: R code, ggplot2, base plot
Data Communication: blog, "the probability of having seventeen or more vehicles crossing the bridge in a particular minute is 10.1%"
Code Repository: GitHub
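As a minimal sketch of the Poisson step above, the upper-tail probability can be computed two ways and compared. The rate of 12 vehicles per minute comes from the slide; the variable names are illustrative only.

```r
# Average number of vehicles per minute (from the slide)
lambda <- 12

# P(X >= 17) = P(X > 16): upper tail of the Poisson distribution
p_upper <- ppois(16, lambda = lambda, lower.tail = FALSE)

# Same quantity as the complement of P(X <= 16), summed term by term
p_complement <- 1 - sum(dpois(0:16, lambda = lambda))

round(p_upper, 5)
## [1] 0.10129
```

Passing 16 rather than 17 matters: with lower.tail = FALSE, ppois(q) returns P(X > q), so "seventeen or more" corresponds to q = 16.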
8
INSTALLATION Comprehensive R Archive Network (CRAN) https://www.cran.r-project.org/ R Studio https://www.rstudio.com/
9
ROBERT GENTLEMAN - ROSS IHAKA University of Auckland
10
R <- CORE && R <- PACKAGES Base packages. Contributed packages: ggplot2, sqldf, RODBC, dplyr, stringr, reshape2, tidyr, lubridate
11
FEATURES OF R
- Runs on almost any standard computing platform/OS (even on the PlayStation 3)
- Frequent releases (annual + bug fix releases); active development
- Quite lean, as far as software goes; functionality is divided into modular packages
- Graphics capabilities are very sophisticated and better than most stat packages
- Very active and vibrant user community: R-help and R-devel mailing lists and Stack Overflow
12
DRAWBACKS OF R
- Essentially based on 40-year-old technology
- Objects must generally be stored in physical memory
13
BASICS 1 - VECTOR
# Define a variable
a <- 25
# Call a variable
a
## [1] 25
# Do something to it
a + 10
## [1] 35
# Create a vector - numeric
x <- c(0.5, 0.6, 0.7)
# Call it
x
## [1] 0.5 0.6 0.7
# Do something to the vector
mean(x)
## [1] 0.6
14
BASICS 2 - MATRIX
A matrix is a collection of data elements arranged in a two-dimensional rectangular layout.
A <- matrix(c(1, 2, 3, 4, 5, 6), # the data elements
            nrow = 2,            # number of rows
            ncol = 3,            # number of columns
            byrow = TRUE)        # fill matrix by rows
A # print the matrix
##      [,1] [,2] [,3]
## [1,]    1    2    3
## [2,]    4    5    6
15
BASICS 3 – CONTROL STRUCTURES
# If statements
x <- 10
y <- if (x > 75) 'Pass' else 'Fail'
# Get the value of variable y
y
## [1] "Fail"
# For loops
for (index in 1:3) {
  print(index)
}
16
BASICS 4 - FUNCTIONS
Functions are blocks of code that make R modular and facilitate code reuse.
Funct_name <- function(arg1, arg2, ...) {
  ### do something
}
## Compute the mean of a vector of numbers
meanX <- function(a_vector) {
  s <- sum(a_vector)     # total of all elements
  l <- length(a_vector)  # number of elements
  m <- s / l
  return(m)
}
### Create a vector
v <- c(1, 2, 3, 4, 5)
### Find the mean
meanX(v)
## [1] 3
17
DATA FRAME
A data frame is used for storing data tables. To retrieve the data in a cell, enter its row and column coordinates in the single square bracket "[]" operator.
mtcars[1, 2]
## [1] 6
mtcars["Mazda RX4", "cyl"]
## [1] 6
# Preview the data frame
head(mtcars)
tail(mtcars)
View(mtcars)
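For readers coming from SQL, the same bracket operator also covers row filtering and column selection. The following is a small sketch using the built-in mtcars data set; the variable name six_cyl is illustrative, and the SQL in the comment is only an analogy.

```r
# Roughly: SELECT mpg, cyl, hp FROM mtcars WHERE cyl = 6
# Rows go before the comma (a logical condition), columns after it (names)
six_cyl <- mtcars[mtcars$cyl == 6, c("mpg", "cyl", "hp")]

# Number of rows that matched the condition
nrow(six_cyl)
## [1] 7

# A single column can be pulled out as a vector with $
mean(six_cyl$mpg)
```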
18
BASICS 6 - PLOTS
# Make a very simple plot
# Define vectors
x <- c(1, 3, 6, 9, 12)
y <- c(1.5, 2, 7, 8, 15)
plot(x, y, xlab = "x axis", ylab = "y axis", main = "my plot",
     ylim = c(0, 20), xlim = c(0, 20), pch = 15, col = "blue")
# Add some more points to the graph
x2 <- c(0.5, 3, 5, 8, 12)
y2 <- c(0.8, 1, 2, 4, 6)
points(x2, y2, pch = 16, col = "green")
19
HOME SALE I have home sales data for the neighborhood in a SQL Server database. Question: I have a 3000 sq ft house; how much will it sell for?
20
REGRESSION MODEL
21
Demo : Predict sale price of the house that is 3000 sq ft
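The demo itself is not part of this transcript. As a rough sketch of the idea, a simple linear regression can be fitted with lm() and queried with predict(). The data frame below is made up purely for illustration (in the talk the data lives in SQL Server), and the names home_sales, sqft and price are assumptions.

```r
# Hypothetical home sales data; NOT the data used in the talk
home_sales <- data.frame(
  sqft  = c(1500, 1800, 2200, 2600, 3100, 3500),
  price = c(200000, 235000, 270000, 320000, 365000, 410000)
)

# Fit a simple linear regression: price as a function of square footage
model <- lm(price ~ sqft, data = home_sales)

# Predict the sale price of a 3000 sq ft house
new_house <- data.frame(sqft = 3000)
predicted <- predict(model, newdata = new_house)
predicted
```

The column name in new_house has to match the predictor name used in the formula, otherwise predict() cannot find the variable.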
22
MANAGING DATA FRAMES WITH DPLYR
The dplyr package provides simple functions that can be chained together to easily and quickly manipulate data.
install.packages("dplyr")
library(dplyr)
Verbs
1. filter – select a subset of the rows of a data frame
2. arrange – works similarly to filter, except that instead of filtering or selecting rows, it reorders them
3. select – select columns of a data frame
4. mutate – add new columns to a data frame that are functions of existing columns
5. summarize – summarize values
6. group_by – describe how to break a data frame into groups of rows
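The verbs listed above can be chained with the pipe operator. This is a minimal sketch on the built-in mtcars data set, assuming dplyr is installed; the threshold and the new column names (hp_per_wt, avg_mpg, n_cars) are arbitrary choices for illustration, and the SQL terms in the comments are loose analogies.

```r
library(dplyr)

result <- mtcars %>%
  filter(hp > 100) %>%                    # like WHERE
  select(mpg, cyl, hp, wt) %>%            # like the SELECT column list
  mutate(hp_per_wt = hp / wt) %>%         # computed column
  group_by(cyl) %>%                       # like GROUP BY
  summarize(avg_mpg = mean(mpg),          # aggregates per group
            n_cars  = n()) %>%
  arrange(desc(avg_mpg))                  # like ORDER BY ... DESC

result
```

Each verb takes a data frame and returns a data frame, which is what makes the chain read top to bottom in the order the steps are applied.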
23
DEMO : DPLYR
24
VISUALIZING DATA FRAMES WITH GGPLOT2
Grammar of Graphics
The ggplot2 package provides two workhorse functions for plotting
1. qplot()
2. ggplot()
install.packages("ggplot2")
library(ggplot2)
Building Blocks
1. Data Frame
2. Aesthetics – how data is mapped to color and size ~ aes()
3. Geoms – geometric objects to be drawn, such as points, lines, bars, polygons and text
4. Facets – panels used in conditional plots
5. Stats – statistical transformations ~ binning, quantiles, smoothing
6. Scales – the coding an aesthetic map uses, like male = blue and female = red
7. Coordinate System
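The building blocks above can be combined layer by layer. This is a small sketch on the built-in mtcars data set, assuming ggplot2 is installed; the choice of variables and labels is illustrative. It uses ggplot() rather than qplot(), since qplot() has been deprecated in recent ggplot2 releases.

```r
library(ggplot2)

p <- ggplot(mtcars, aes(x = wt, y = mpg, color = factor(cyl))) +  # data frame + aesthetics
  geom_point(size = 2) +                                          # geom: points
  facet_wrap(~ gear) +                                            # facets: one panel per gear count
  labs(x = "Weight (1000 lbs)", y = "Miles per gallon",
       color = "Cylinders", title = "Fuel economy by weight")

# Printing the object draws the plot
print(p)
```

Wrapping cyl in factor() makes ggplot2 treat it as a discrete variable, so each cylinder count gets its own color instead of a continuous gradient.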
25
DEMO : GGPLOT2
26
THANK YOU