R PROGRAMMING FOR SQL DEVELOPERS Kiran Math Developer : Proterra in Greenville SC

Presentation transcript:


MOTIVATION

GOAL
Raw Sensor Data -> Tidy Data

ZILLOW

Get & Tidy -> Transform -> Model -> Viz (hadleywickham)

VASCO DA GAMA BRIDGE - LISBON, PORTUGAL
Question: What is the probability that seventeen or more vehicles cross the bridge in a particular minute?

Raw Data: data on the web, CSV format
Processing Script: R code; read CSV from the web into R
Tidy Data: packages used: tidyr
Data Manipulation and Analysis: R code; average vehicles per minute: 12
Data Model: Poisson distribution
ppois(16, lambda = 12, lower.tail = FALSE)  # upper tail
Answer: the probability of seventeen or more vehicles crossing the bridge in a particular minute is 10.1%
Data Visualization: R code; ggplot2, base plot
Data Communication: blog
Code Repository: GitHub
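The Poisson step can be sketched as follows, using only the rate of 12 vehicles per minute given on the slide. The variable names are illustrative. The upper-tail call returns P(X > 16), which is the same as P(X >= 17); the second line cross-checks it by summing the point probabilities for 0 through 16 vehicles.

```r
# Average number of vehicles per minute (from the slide)
lambda <- 12

# P(X >= 17) = P(X > 16): upper tail of the Poisson distribution
p_upper <- ppois(16, lambda = lambda, lower.tail = FALSE)

# Cross-check: 1 minus the probability of seeing 0..16 vehicles
p_check <- 1 - sum(dpois(0:16, lambda = lambda))

print(round(p_upper, 3))
print(all.equal(p_upper, p_check))
```

Writing `lower.tail = FALSE` in full avoids relying on R's partial matching of argument names.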

INSTALLATION
- Comprehensive R Archive Network (CRAN)
- RStudio

ROBERT GENTLEMAN - ROSS IHAKA  University of Auckland

R <- CORE && R <- PACKAGES
Base packages
Add-on packages: ggplot2, sqldf, RODBC, dplyr, stringr, reshape2, tidyr, lubridate

FEATURES OF R
- Runs on almost any standard computing platform/OS (even on the PlayStation 3).
- Frequent releases (annual plus bug-fix releases); active development.
- Quite lean, as far as software goes; functionality is divided into modular packages.
- Graphics capabilities are very sophisticated and better than those of most stat packages.
- Very active and vibrant user community: R-help and R-devel mailing lists, and Stack Overflow.

DRAWBACKS OF R
- Essentially based on 40-year-old technology.
- Objects must generally be stored in physical memory.

BASICS 1 - VECTOR
# Define a variable
a <- 25
# Call a variable
a
## [1] 25
# Do something to it
a + 10
## [1] 35
# Create a vector - numeric
x <- c(0.5, 0.6, 0.7)
# Call it
x
## [1] 0.5 0.6 0.7
# Do something to the vector
mean(x)
## [1] 0.6

BASICS 2 - MATRIX
A matrix is a collection of data elements arranged in a two-dimensional rectangular layout.
> A = matrix(
+   c(1, 2, 3, 4, 5, 6),  # the data elements
+   nrow = 2,             # number of rows
+   ncol = 3,             # number of columns
+   byrow = TRUE)         # fill matrix by rows
> A                       # print the matrix
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    4    5    6
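As a small follow-up sketch (not part of the original slide), the same matrix can be inspected and indexed by row and column. The variable names `dims`, `cell`, `second_col` and `At` are illustrative.

```r
# Rebuild the matrix from the slide, filled row by row
A <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 2, ncol = 3, byrow = TRUE)

dims <- dim(A)        # number of rows and columns
cell <- A[2, 3]       # element in row 2, column 3
second_col <- A[, 2]  # leaving the row index empty selects the whole column
At <- t(A)            # transpose: rows become columns

print(dims)
print(cell)
print(second_col)
print(At)
```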

BASICS 3 - CONTROL STRUCTURES
# If statements
x <- 10
y <- if (x > 75) 'Pass' else 'Fail'
# Get the value of variable y
y
## [1] "Fail"
# For loops
for (index in 1:3) {
  print(index)
}
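The `if`/`else` on the slide tests a single value. For a whole column of values, a vectorized `ifelse()` is often the closer match to a SQL `CASE WHEN` expression. The sketch below assumes a made-up `scores` vector that is not from the slides.

```r
# Hypothetical scores, invented for illustration
scores <- c(10, 80, 75, 90)

# Vectorized condition: one result per element,
# roughly CASE WHEN score > 75 THEN 'Pass' ELSE 'Fail' END in SQL
grades <- ifelse(scores > 75, "Pass", "Fail")

print(grades)
```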

BASICS 4 - FUNCTIONS
Functions are blocks of code that make R programs modular and facilitate code reuse.
Funct_name <- function(arg1, arg2, ...) {
  ### do something
}
## Compute the mean of a vector of numbers
meanX <- function(a_vector) {
  s <- sum(a_vector)
  l <- length(a_vector)
  m <- s / l
  return(m)
}
### Create a vector
v <- c(1, 2, 3, 4, 5)
### Find the mean
meanX(v)
## [1] 3

DATA FRAME
A data frame is used for storing data tables. To retrieve the data in a cell, enter its row and column coordinates in the single square bracket "[]" operator.
mtcars[1, 2]
[1] 6
mtcars["Mazda RX4", "cyl"]
[1] 6
Preview a data frame:
- head(mtcars)
- tail(mtcars)
- View(mtcars)
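Beyond single cells, the same bracket operator selects whole rows and columns. The sketch below uses the built-in `mtcars` data set from the slide; the SQL in the comments is only a rough analogy for readers coming from SQL Server, and the variable names are illustrative.

```r
# First three rows, two named columns
# (roughly: SELECT TOP 3 mpg, cyl FROM mtcars)
first_rows <- mtcars[1:3, c("mpg", "cyl")]

# A single column as a plain vector
mpg_values <- mtcars$mpg

# Rows matching a condition, all columns
# (roughly: SELECT * FROM mtcars WHERE cyl = 6)
six_cyl <- mtcars[mtcars$cyl == 6, ]

print(first_rows)
print(length(mpg_values))
print(nrow(six_cyl))
```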

BASICS 6 - PLOTS
# Make a very simple plot
# Define vectors
x <- c(1, 3, 6, 9, 12)
y <- c(1.5, 2, 7, 8, 15)
plot(x, y, xlab = "x axis", ylab = "y axis", main = "my plot",
     ylim = c(0, 20), xlim = c(0, 20), pch = 15, col = "blue")
# Add some more points to the graph
x2 <- c(0.5, 3, 5, 8, 12)
y2 <- c(0.8, 1, 2, 4, 6)
points(x2, y2, pch = 16, col = "green")

HOME SALE
I have home sales data for the neighborhood in a SQL Server database.
Question: I have a 3000 sq ft house; how much will it sell for?

REGRESSION MODEL

Demo : Predict sale price of the house that is 3000 sq ft
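The demo itself is not in the transcript, so the following is only a minimal sketch of how such a prediction might look with a simple linear regression. The `sales` data frame and its `sqft` and `price` columns are invented for illustration. In the talk's setting the rows would come from the SQL Server database instead.

```r
# Toy data, invented for illustration only (not the talk's data)
sales <- data.frame(
  sqft  = c(1500, 2000, 2500, 3500, 4000),
  price = c(205000, 248000, 302000, 398000, 452000)
)

# Fit a simple linear regression: price as a function of square footage
model <- lm(price ~ sqft, data = sales)

# Predict the sale price of a 3000 sq ft house
predicted <- predict(model, newdata = data.frame(sqft = 3000))

print(coef(model))
print(unname(predicted))
```

A single-predictor model is the simplest choice here; a real analysis would likely check the fit and consider additional variables.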

MANAGING DATA FRAMES WITH DPLYR
The dplyr package provides simple functions that can be chained together to manipulate data quickly and easily.
install.packages("dplyr")
library(dplyr)
Verbs
1. filter – select a subset of the rows of a data frame
2. arrange – reorder the rows of a data frame
3. select – select columns of a data frame
4. mutate – add new columns to a data frame that are functions of existing columns
5. summarize – summarize values
6. group_by – describe how to break a data frame into groups of rows
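The six verbs above can be chained in one pipeline. This is a sketch on the built-in `mtcars` data set, assuming dplyr is already installed. The derived column `wt_kg` and the result names are illustrative, and the SQL in the comments is a rough analogy only.

```r
library(dplyr)

result <- mtcars %>%
  filter(mpg > 20) %>%                      # WHERE mpg > 20
  select(mpg, cyl, wt) %>%                  # SELECT mpg, cyl, wt
  mutate(wt_kg = wt * 1000 * 0.4536) %>%    # wt is in 1000 lbs; approximate kg
  group_by(cyl) %>%                         # GROUP BY cyl
  summarize(n = n(),                        # COUNT(*)
            avg_mpg = mean(mpg),            # AVG(mpg)
            avg_wt_kg = mean(wt_kg)) %>%
  arrange(desc(avg_mpg))                    # ORDER BY avg_mpg DESC

print(result)
```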

DEMO : DPLYR

VISUALIZING DATA FRAMES WITH GGPLOT2
Grammar of Graphics
The ggplot2 package provides two workhorse functions for plotting:
1. qplot()
2. ggplot()
install.packages("ggplot2")
library(ggplot2)
Building Blocks
1. Data frame
2. Aesthetics – how data is mapped to color and size ~ aes()
3. Geoms – geometric objects to be drawn, such as points, lines, bars, polygons and text
4. Facets – panels used in conditional plots
5. Stats – statistical transformations ~ binning, quantiles, smoothing
6. Scales – the coding an aesthetic map uses, e.g. male = blue and female = red
7. Coordinate system
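The building blocks listed above map onto layers of a single `ggplot()` call. The sketch below assumes ggplot2 is installed and uses the built-in `mtcars` data set. The color choices and labels are arbitrary and not taken from the talk.

```r
library(ggplot2)

p <- ggplot(mtcars,                                    # 1. data frame
            aes(x = wt, y = mpg,
                color = factor(cyl))) +                # 2. aesthetics
  geom_point(size = 3) +                               # 3. geom: points
  facet_wrap(~ am) +                                   # 4. facets: one panel per transmission type
  geom_smooth(method = "lm", formula = y ~ x,
              se = FALSE) +                            # 5. stat: fitted line per group
  scale_color_manual(name = "Cylinders",
                     values = c("4" = "blue",
                                "6" = "darkgreen",
                                "8" = "red")) +        # 6. scale: value-to-color coding
  coord_cartesian(ylim = c(10, 35)) +                  # 7. coordinate system
  labs(x = "Weight (1000 lbs)", y = "Miles per gallon")

print(p)
```

The plot is stored in `p` first, so it can be extended with further layers before it is drawn.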

DEMO : GGPLOT2

THANK YOU