R Programming for SQL Developers: ETL Using R

R Programming for SQL Developers: ETL Using R
Kiran Math, Consultant
kiranmath@outlook.com

Motivation: Excel Data → ETL → SQL Server Table

Demo: Motivation

Installation
Comprehensive R Archive Network (CRAN): https://www.cran.r-project.org/
RStudio: https://www.rstudio.com/

R <- Core && R <- Packages
Base packages, plus: ggplot2, sqldf, RODBC, dplyr, stringr, reshape2, tidyr, lubridate
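A minimal sketch of checking which of the packages named on this slide are already installed. The variable names are illustrative, and the install line is left commented out so nothing is downloaded unless you choose to.

```r
# Packages named on the slide, using their CRAN spellings
pkgs <- c("ggplot2", "sqldf", "RODBC", "dplyr",
          "stringr", "reshape2", "tidyr", "lubridate")

# requireNamespace() returns FALSE instead of failing when a package is absent
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- pkgs[!installed]

# Uncomment to install whatever is missing from CRAN
# install.packages(missing_pkgs)

print(missing_pkgs)
```

Once installed, a package is attached for the session with library(), for example library(dplyr).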

zillow

Data analysis workflow (@hadleywickham): Get & Tidy → Transform → Visualize → Model

Basics 1 - Vector
# Define a variable
a <- 25
# Call a variable
a
## [1] 25
# Do something to it
a + 10
## [1] 35
# Create a vector - numeric
x <- c(0.5, 0.6, 0.7)
# Call it
x
## [1] 0.5 0.6 0.7
# Do something to the vector
mean(x)
## [1] 0.6

Basics 2 - Functions
Functions are blocks of code that make R programs modular and facilitate code reuse.
Funct_name <- function(arg1, arg2, ...) {
  ### do something
}
## Compute the mean of a vector of numbers
meanX <- function(a_vector) {
  s <- sum(a_vector)
  l <- length(a_vector)
  m <- s / l
  return(m)
}
### Create a vector
v <- c(1, 2, 3, 4, 5)
### Find the mean
meanX(v)
## [1] 3

Data frame
A data frame is used for storing data tables. It is a list of vectors of equal length: the columns are variables and the rows are observations.
To preview the data frame dat: head(dat), tail(dat)
Number of rows: nrow(dat)
To retrieve data in a cell, enter its row and column coordinates in the single square bracket "[]" operator. The two coordinates are separated by a comma.
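The points on this slide can be tried on a small data frame. This is a sketch with made-up values; the column names Address, SaleDate, and SalePrice are assumptions for illustration, not the demo's actual data.

```r
# Hypothetical home-sales data: three observations (rows), three variables (columns)
dat <- data.frame(
  Address   = c("12 Oak St", "34 Elm St", "56 Pine Ave"),
  SaleDate  = c("2016-01-15", "2016-02-20", "2016-03-05"),
  SalePrice = c(150000, 210000, 185000),
  stringsAsFactors = FALSE
)

head(dat, 2)   # first two rows
tail(dat, 1)   # last row
n_rows <- nrow(dat)

# Single square brackets take row and column coordinates: dat[row, column]
cell <- dat[2, 3]                 # row 2, column 3
price_col <- dat[, "SalePrice"]   # leaving the row blank returns every row
```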

R – str()
Compactly displays the internal structure of an R object; a diagnostic function.
str(object, ...)
If you need a quick overview of your dataset, use the R command str() and look at the structure. It tells you the classes of your variables and the number of observations.
Data frame shown: tDat
Change the class of column SaleDate:
dat$SaleDate <- as.Date(dat$SaleDate)
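A short sketch of inspecting a data frame with str() and then converting a text column to the Date class. The data values are invented for illustration.

```r
# Hypothetical data: SaleDate starts out as plain text
dat <- data.frame(
  SaleDate  = c("2016-01-15", "2016-02-20"),
  SalePrice = c(150000, 210000),
  stringsAsFactors = FALSE
)

str(dat)                               # SaleDate is listed as chr
class_before <- class(dat$SaleDate)

# as.Date() parses ISO-formatted (yyyy-mm-dd) strings by default
dat$SaleDate <- as.Date(dat$SaleDate)

str(dat)                               # SaleDate is now listed as Date
class_after <- class(dat$SaleDate)

# Date columns support date arithmetic, which text columns do not
days_between <- as.numeric(dat$SaleDate[2] - dat$SaleDate[1])
```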

R – summary()
summary(object) shows the distribution of the variables in your dataset.
Data frame shown: tDat
Numerical variables: summary() gives you the range, quartiles, median, and mean.
Factor variables: summary() gives you a table with frequencies.
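A brief sketch of the two cases described on this slide, using invented values: one numeric vector and one factor.

```r
# Hypothetical sale prices (numeric) and home types (factor)
prices    <- c(150000, 210000, 185000)
home_type <- factor(c("House", "Condo", "House"))

# Numeric input: minimum, quartiles, median, mean, maximum
price_summary <- summary(prices)
print(price_summary)

# Factor input: a frequency count per level
type_summary <- summary(home_type)
print(type_summary)
```

Calling summary() on a whole data frame applies the matching rule to each column in turn.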

Reshaping Data - dplyr
select() subsets variables (columns).
Data frames shown: tDat, Dat
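For readers coming from SQL, select() plays roughly the role of the column list in a SELECT statement. A minimal sketch, assuming dplyr is installed and using invented data:

```r
library(dplyr)

# Hypothetical home-sales data
dat <- data.frame(
  Address   = c("12 Oak St", "34 Elm St", "56 Pine Ave"),
  SaleDate  = c("2016-01-15", "2016-02-20", "2016-03-05"),
  SalePrice = c(150000, 210000, 185000),
  stringsAsFactors = FALSE
)

# Keep two columns; roughly SELECT Address, SalePrice FROM dat
subset_cols <- select(dat, Address, SalePrice)

# A minus sign drops a column instead of keeping it
no_date <- select(dat, -SaleDate)

print(subset_cols)
```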

Filter Data - dplyr
filter() allows you to select a subset of rows in a data frame.
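filter() corresponds roughly to a SQL WHERE clause. The sketch below assumes dplyr is installed; the data and thresholds are invented for illustration.

```r
library(dplyr)

# Hypothetical home-sales data
dat <- data.frame(
  Address   = c("12 Oak St", "34 Elm St", "56 Pine Ave"),
  SaleDate  = as.Date(c("2016-01-15", "2016-02-20", "2016-03-05")),
  SalePrice = c(150000, 210000, 185000),
  stringsAsFactors = FALSE
)

# Roughly: SELECT * FROM dat WHERE SalePrice > 160000
expensive <- filter(dat, SalePrice > 160000)

# Conditions separated by commas are combined with AND
recent_expensive <- filter(dat,
                           SalePrice > 160000,
                           SaleDate >= as.Date("2016-03-01"))

print(recent_expensive)
```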

Piping - dplyr
%>% passes the object on the left-hand side (LHS) as the first argument to the function on the right-hand side (RHS).
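To see what the pipe buys you, the sketch below writes the same query twice: once as nested calls and once as a pipeline. It assumes dplyr is installed; the data is invented.

```r
library(dplyr)

# Hypothetical home-sales data
dat <- data.frame(
  Address   = c("12 Oak St", "34 Elm St", "56 Pine Ave"),
  SalePrice = c(150000, 210000, 185000),
  stringsAsFactors = FALSE
)

# Nested form: has to be read from the inside out
nested <- select(filter(dat, SalePrice > 160000), Address, SalePrice)

# Piped form: each step receives the previous result as its first argument
piped <- dat %>%
  filter(SalePrice > 160000) %>%
  select(Address, SalePrice)

print(piped)
```

Both forms return the same rows and columns; the piped version simply reads in the order the steps are performed.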

Reshaping Data - tidyr
gather() gathers columns into rows; spread() does the opposite.
Data frames shown: tDat, gDat
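A small sketch of gather() and spread(), assuming tidyr is installed. The names tDat and gDat are borrowed from the slide, but the columns and values here are made up.

```r
library(tidyr)

# Hypothetical wide table: one price column per year
tDat <- data.frame(
  City  = c("Greenville", "Columbia"),
  Y2015 = c(140000, 155000),
  Y2016 = c(150000, 160000),
  stringsAsFactors = FALSE
)

# Gather the two year columns into key/value rows (wide to long)
gDat <- gather(tDat, key = "Year", value = "Price", Y2015, Y2016)
print(gDat)

# Spread the key/value pairs back into one column per year (long to wide)
wide <- spread(gDat, key = Year, value = Price)
print(wide)
```

Newer tidyr releases recommend pivot_longer() and pivot_wider() for the same job, while gather() and spread() remain available.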

Make New Variable (Column) - mutate()
mutate() computes and appends one or more new columns.
Data frame shown: gDat
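A minimal mutate() sketch, assuming dplyr is installed. The gDat name follows the slide; the columns and the derived variables PriceK and Above150K are hypothetical.

```r
library(dplyr)

# Hypothetical long-format price data
gDat <- data.frame(
  City  = c("Greenville", "Columbia", "Greenville", "Columbia"),
  Year  = c("Y2015", "Y2015", "Y2016", "Y2016"),
  Price = c(140000, 155000, 150000, 160000),
  stringsAsFactors = FALSE
)

# Append two columns; the second one uses the first within the same call
gDat2 <- mutate(gDat,
                PriceK    = Price / 1000,
                Above150K = PriceK > 150)

print(gDat2)
```

The original columns are kept and the new ones are added at the end, much like adding computed expressions to a SELECT list.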

Reshaping Data - tidyr
separate() splits one column into several. spread() does the opposite of gather(); the inverse of separate() itself is unite().
Data frames shown: gDat, tDat
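A sketch of separate() and its inverse unite(), assuming tidyr is installed. The data and the new column names Year, Month, and Day are invented for illustration.

```r
library(tidyr)

# Hypothetical data with a date stored as text
sales <- data.frame(
  Address  = c("12 Oak St", "34 Elm St"),
  SaleDate = c("2016-01-15", "2016-02-20"),
  stringsAsFactors = FALSE
)

# Split SaleDate on "-" into three character columns
parts <- separate(sales, SaleDate,
                  into = c("Year", "Month", "Day"), sep = "-")
print(parts)

# unite() pastes the pieces back into a single column
back <- unite(parts, "SaleDate", Year, Month, Day, sep = "-")
print(back)
```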

Data analysis workflow (@hadleywickham): Get & Tidy → Transform → Visualize → Model

Data Visualization – ggplot2
Based on the Grammar of Graphics: one can build every graph from the same few components.
A data set
A set of geoms: visual marks that represent the data
A coordinate system

Data Visualization – ggplot2
To display data values, map the variables in the dataset to aesthetic properties of the geom, such as color, size, and x and y locations.

Data Visualization – ggplot2: qplot()
Creates a complete plot with the given data, geom, and mapping. Supplies many useful defaults.

Data Visualization – ggplot2: ggplot()
ggplot() begins a plot that you finish by adding layers to it. Add layer elements with +. It has no defaults, but it provides more control than qplot().
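The layered approach can be sketched as follows, assuming ggplot2 is installed. The homes data frame and its columns SqFt, SalePrice, and Type are hypothetical.

```r
library(ggplot2)

# Hypothetical home-sales data
homes <- data.frame(
  SqFt      = c(1200, 1500, 1800, 2100, 2400),
  SalePrice = c(150000, 185000, 210000, 240000, 275000),
  Type      = c("Condo", "House", "House", "Condo", "House")
)

# Begin the plot: data plus x/y aesthetic mappings, no marks drawn yet
p <- ggplot(homes, aes(x = SqFt, y = SalePrice))

# Add a point layer (color mapped to Type) and labels with +
p <- p +
  geom_point(aes(color = Type), size = 3) +
  labs(title = "Sale price by floor area",
       x = "Square feet", y = "Sale price")

print(p)
```

Here color is mapped to a variable inside aes(), while size is fixed at a constant outside it.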

Data Visualization – ggplot2: lm()
lm() fits a linear model; the fitted line can be added to a plot as another layer with +.
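The slide text does not say how lm() was used in the demo. One common approach, shown here as an assumption and not as the presenter's method, is to fit the model with lm() and draw the same fit with geom_smooth(method = "lm"). The data is invented.

```r
library(ggplot2)

# Hypothetical home-sales data
homes <- data.frame(
  SqFt      = c(1000, 1500, 2000, 2500),
  SalePrice = c(152000, 198000, 251000, 299000)
)

# Fit a straight line: SalePrice as a function of SqFt
fit <- lm(SalePrice ~ SqFt, data = homes)
slope <- coef(fit)[["SqFt"]]   # estimated price change per extra square foot
print(coef(fit))

# Scatter plot with the linear fit drawn as a second layer
p <- ggplot(homes, aes(x = SqFt, y = SalePrice)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE)

print(p)
```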

Thank you