ETL – Using R Kiran Math Developer : Flour in Greenville SC

Presentation transcript:

ETL – Using R. Kiran Math, Developer. Work @: Flour in Greenville SC. kiranmath@outlook.com

Motivation

GOAL: Raw Sensor Data → Tidy Data

R <- Core && R <- packages. Base packages plus: ggplot2, sqldf, RODBC, dplyr, stringr, reshape2, tidyr, lubridate

Home Sale Price. Question: I have a 3,000 sq ft house located in ZIP code 29615. How much will it sell for?

Workflow (@hadleywickham): Get & Tidy, Transform, Visualize, Model

Get Data – From CSV File
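A minimal sketch of this step in base R. The file contents, the column names (zipcode, sqft, price), and the values are hypothetical and exist only to keep the example self-contained; the data set used in the talk is not shown in the transcript.

```r
# Write a small, made-up CSV to a temporary file so the example runs anywhere.
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "zipcode,sqft,price",
  "29615,1800,150000",
  "29615,2400,172000",
  "29607,3100,210000"
), csv_path)

# read.csv() parses the file into a data frame.
# stringsAsFactors = FALSE keeps any text columns as character vectors.
dat <- read.csv(csv_path, stringsAsFactors = FALSE)
print(dat)
```

In practice csv_path would point at the real file on disk.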

Data frame (dat): A data frame is used for storing data tables. It is a list of vectors of equal length; the columns are variables and the rows are observations. To retrieve the data in a cell, enter its row and column coordinates in the single square bracket "[]" operator, separated by a comma, for example dat[5,3].
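The indexing rule can be tried on a small data frame. This is only an illustration: the columns and values below are hypothetical, and only the name dat and the expression dat[5, 3] come from the slide.

```r
# A made-up data frame: 5 observations (rows) of 3 variables (columns).
dat <- data.frame(
  id    = 1:5,
  sqft  = c(1500, 1800, 2400, 2900, 3200),
  price = c(120000, 150000, 172000, 190000, 215000)
)

cell    <- dat[5, 3]        # row 5, column 3: a single value
row5    <- dat[5, ]         # leaving the column blank returns the whole row
col3    <- dat[, 3]         # leaving the row blank returns the whole column
by_name <- dat[5, "price"]  # columns can also be addressed by name
```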

str(dat): If you need a quick overview of your dataset, use the R command str() and look at the structure. It tells you the classes of your variables and the number of observations. (R is case-sensitive, so the function is str(), not Str().)
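A short sketch of str() on a hypothetical data frame with one factor, one numeric, and one logical column (the columns are made up for illustration):

```r
# Hypothetical data with mixed column classes.
dat <- data.frame(
  zipcode = factor(c("29615", "29607", "29615")),
  sqft    = c(1800, 2400, 3000),
  sold    = c(TRUE, FALSE, TRUE)
)

# str() prints the number of observations and variables,
# followed by the class and first few values of each column.
str(dat)

# The same class information, collected programmatically.
classes <- sapply(dat, class)
```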

R – summary(): summary(object) shows the distribution of the variables in your dataset. Numerical variables: summary() gives you the range, quartiles, median, and mean. Factor variables: summary() gives you a table with frequencies.
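The two behaviours can be seen side by side on a hypothetical data frame with one numeric and one factor column (values invented for illustration):

```r
# Hypothetical data: one numeric variable and one factor variable.
dat <- data.frame(
  sqft    = c(1500, 1800, 2400, 3000, 3200),
  zipcode = factor(c("29615", "29607", "29615", "29615", "29607"))
)

# On the whole data frame, summary() reports every column at once.
print(summary(dat))

# Numeric column: minimum, quartiles, median, mean, maximum.
num_summary <- summary(dat$sqft)

# Factor column: a frequency count per level.
fac_summary <- summary(dat$zipcode)
```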

SELECT – dplyr: the pipe operator %>% passes the object on its LHS as the first argument to the function on its RHS.
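A minimal sketch of the pipe with dplyr's select(), assuming the dplyr package is installed. The data frame and its columns are hypothetical. The piped call and the ordinary nested call are equivalent, which is what "first argument to the function on the RHS" means in practice.

```r
library(dplyr)

# Hypothetical data frame.
dat <- data.frame(
  id    = 1:3,
  sqft  = c(1800, 2400, 3000),
  price = c(150000, 172000, 205000)
)

# Piped form: dat becomes the first argument of select().
piped <- dat %>% select(sqft, price)

# Equivalent call without the pipe.
nested <- select(dat, sqft, price)
```

This is roughly the counterpart of SELECT sqft, price FROM dat in SQL.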

Workflow (@hadleywickham): Get & Tidy, Transform, Visualize, Model

Linear Regression model

Home Sale Question: I have a 3,000 sq ft house. How much will it sell for? Answer: $198,000
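A sketch of how such a question might be answered with base R's lm() and predict(). The training data below is invented, and the model (price as a linear function of square footage only) is an assumption, so the prediction it produces is not the talk's $198,000 figure.

```r
# Hypothetical sales data; the talk's actual data set is not in the transcript.
homes <- data.frame(
  sqft  = c(1200, 1600, 2000, 2400, 2800, 3200),
  price = c(100000, 125000, 148000, 171000, 196000, 220000)
)

# Fit a simple linear regression: price explained by square footage.
fit <- lm(price ~ sqft, data = homes)

# Predict the sale price of a 3,000 sq ft house.
# newdata must use the same column name as the predictor in the formula.
predicted <- predict(fit, newdata = data.frame(sqft = 3000))
print(predicted)
```

coef(fit) shows the fitted intercept and the estimated price per additional square foot.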

DEMO – Housing Price

Motivation

Motivation: Excel Data → ETL → SQL Server Table

Motivation

Gather (tDat, gDat): gathers columns into rows. Spread does the opposite.
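A small sketch of both verbs, assuming the tidyr package is installed. The names tDat and gDat are reused from the slide, but which one is the wide table and which the gathered one is an assumption here, and the sensor readings are made up.

```r
library(tidyr)

# Assumed wide input: one column per month (hypothetical readings).
tDat <- data.frame(
  sensor = c("A", "B"),
  jan    = c(10, 20),
  feb    = c(11, 21),
  stringsAsFactors = FALSE
)

# gather(): collapse the jan and feb columns into key/value rows.
gDat <- gather(tDat, key = "month", value = "reading", jan, feb)

# spread(): the inverse, turning the key/value rows back into columns.
wide <- spread(gDat, key = month, value = reading)
```

Newer tidyr versions recommend pivot_longer() and pivot_wider() for the same job; gather() and spread() remain available.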

Mutate (gDat): computes and appends one or more new columns.
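For illustration, a minimal mutate() call, assuming dplyr is installed. The name gDat follows the slide; its columns and the Celsius-to-Fahrenheit conversion are hypothetical.

```r
library(dplyr)

# Hypothetical long-format sensor data with readings in Celsius.
gDat <- data.frame(
  sensor    = c("A", "A", "B"),
  reading_c = c(20, 25, 30)
)

# mutate() keeps the existing columns and appends the computed one.
gDat2 <- mutate(gDat, reading_f = reading_c * 9 / 5 + 32)
```

The original gDat is left unchanged; mutate() returns a new data frame.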

DEMO – Import Data into SQL SERVER

Thank you