Introduction to R and Data Science Tools in the Microsoft Stack


Introduction to R and Data Science Tools in the Microsoft Stack Jamey Johnston

Agenda
Intro to R: Why R?, R and RStudio, Basics, Objects in R, Packages, Control Flows, RStudio Overview
MS and R: Azure ML, MS R Server, SQL 2016, Power BI
Resources
Source: https://www.r-project.org/logo/
May 2016 Intro to R & Data Science Tools in the MS Stack

Jamey Johnston
Data Scientist for an O&G Company
20+ years DBA Experience
TAMU MS in Analytics: http://analytics.stat.tamu.edu
Semi-Pro Photographer: http://jamey.photography
@STATCowboy, http://STATCowboy.com

Why R?
R is free
Graphics and Data Viz (I typically use others: PBI, Spotfire, SAS JMP)
Flexible Statistical Analysis Toolkit
Very Powerful Open Source Community
Microsoft is investing in R (Revo Analytics)

R and RStudio
R Project for Statistical Computing: https://www.r-project.org/
RStudio: https://www.rstudio.com/

Basics
# - comment
Variable Creation
> # Basics
[1] 15
Help
> help("lm") # lm is the function for Fitting Linear Models
> ?lm
> lm(y ~ x)
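
The variable-creation code that produced the `[1] 15` output did not survive in this transcript. A minimal sketch of what such a step could look like follows; the variable names and values are assumptions chosen only to reproduce that output, and the built-in `cars` data set stands in for the unspecified `y` and `x` of the `lm` call.

```r
# Hypothetical values: the slide's own variable-creation code was not preserved.
x <- 5          # '<-' is the usual assignment operator
y = 10          # '=' also assigns at the top level
total <- x + y
print(total)    # [1] 15

# lm(y ~ x) needs data; here the built-in 'cars' data set is used
# (stopping distance modelled on speed).
fit <- lm(dist ~ speed, data = cars)
print(coef(fit))  # intercept and slope
```

Running `?lm` or `help("lm")` in the console opens the same help page either way.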

Objects in R
Variables, Values, Commands, Functions … Everything in R is an Object
Typical Data in R is stored in:
  Vectors (one row, same data type)
  Matrices (multiple rows, same data type)
  Data Frames (multiple rows, multiple data types). It’s like a Table!
  Lists (collection of objects)

Vector
Building Blocks for data objects in R
c (combine) function creates a Vector
  v <- c(2, 3, 1.5, 3.1, 49)
seq function generates numeric sequences
  s <- seq(from = 0, to = 100, by = .1)
rep function replicates values
  r <- rep(c(1,4), times = 4)
: creates a number sequence incremented by 1 or -1
  colon <- 1:10
length(var) returns the length of a vector
  length(colon)
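
The vector functions listed on this slide can be run together to see what each one returns. This is a small sketch using the slide's own values; the `down` vector and the indexing lines are additions for illustration.

```r
v <- c(2, 3, 1.5, 3.1, 49)              # combine values into a numeric vector
s <- seq(from = 0, to = 100, by = 0.1)  # 0.0, 0.1, ..., 100.0
r <- rep(c(1, 4), times = 4)            # repeat the pair (1, 4) four times
colon <- 1:10                           # ascending integer sequence
down <- 10:1                            # the colon operator also counts down

length(v)      # 5
length(s)      # 1001
r              # 1 4 1 4 1 4 1 4
length(colon)  # 10
v[2]           # 3 (indexing starts at 1)
v * 2          # arithmetic is applied element by element
```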

Matrix
matrix function used to build a matrix
rbind (row bind) and cbind (column bind) combine matrices by row or column
http://www.ats.ucla.edu/stat/r/library/matrix_alg.htm
Demos
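
The demo for this slide is not part of the transcript. As a stand-in, here is a short sketch of `matrix`, `rbind` and `cbind` on made-up values.

```r
# matrix() fills column by column unless byrow = TRUE is given
m <- matrix(1:6, nrow = 2, ncol = 3)
# m is:
#      [,1] [,2] [,3]
# [1,]    1    3    5
# [2,]    2    4    6
m_by_row <- matrix(1:6, nrow = 2, byrow = TRUE)

stacked <- rbind(m, 7:9)         # add a row: 3 x 3
widened <- cbind(m, c(10, 11))   # add a column: 2 x 4

dim(stacked)  # 3 3
dim(widened)  # 2 4
t(m)          # transpose: 3 x 2
```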

Data Frame
It is like a table!
rownames: extract row labels
colnames: extract column labels
Different ways to create data frames: read.table, read.csv, readxl, RODBC
Demos
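
Since the demos are not included here, the following sketch shows two of the ways to create a data frame mentioned on the slide. The column names and values are invented for illustration, and `read.csv` is given inline text through its `text` argument so that no file on disk is needed.

```r
# Build a data frame directly; columns can hold different data types
df <- data.frame(
  name  = c("a", "b", "c"),
  value = c(1.5, 2.5, 3.5),
  stringsAsFactors = FALSE
)
rownames(df)  # "1" "2" "3"
colnames(df)  # "name" "value"

# read.csv usually takes a file path; 'text' lets this sketch run standalone
csv_text <- "id,score\n1,90\n2,85"
scores <- read.csv(text = csv_text)
nrow(scores)  # 2
str(scores)   # column names and types
```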

List
Combines multiple object types into one object: vectors, matrices, data frames, lists, functions
Typically used by functions to return model output, e.g. the output from the lm function
Demo
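
To make the point about model output concrete, this sketch builds a small list by hand and then inspects the object returned by `lm`. The list contents and the choice of the built-in `mtcars` data set are assumptions for illustration, not the presenter's demo.

```r
# A list can hold objects of different types and sizes
bundle <- list(numbers = c(1, 2, 3), label = "demo", table = head(mtcars, 3))
bundle$label            # access by name with $
bundle[["numbers"]][2]  # access by name with [[ ]], then index the vector

# The result of lm() is itself a list with a class attribute
fit <- lm(mpg ~ wt, data = mtcars)
is.list(fit)      # TRUE
names(fit)        # includes "coefficients", "residuals", ...
fit$coefficients  # pull one component out of the model object
```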

Missing Data
NA is used to represent Missing Data
The is.na and which functions are used to manage NA
> x <- c(1.3,2.3,3.4,NA)
> print(x)
[1] 1.3 2.3 3.4 NA
>
> # Returns integer location of values (not the values)
> n <- which (is.na(x))
> v <- which (!is.na(x))
> print(n)
[1] 4
> print(v)
[1] 1 2 3
> # y will be set to the values not = NA
> y <- x[!is.na(x)]
> print(y)
[1] 1.3 2.3 3.4
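
A common follow-up to locating NA values is deciding what to do with them in a calculation. The sketch below reuses the slide's vector; skipping NA with `na.rm` and filling with the mean are shown only as two illustrative options, not as a recommendation from the presentation.

```r
x <- c(1.3, 2.3, 3.4, NA)

mean(x)                 # NA: one missing value makes the result missing
mean(x, na.rm = TRUE)   # 2.333333: NA is dropped before averaging
sum(is.na(x))           # 1: count of missing values

# Replace missing entries with the mean of the observed values
x_filled <- x
x_filled[is.na(x_filled)] <- mean(x, na.rm = TRUE)
print(x_filled)
```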

Packages
Add-ons for R
library()   # List packages already installed
install.packages(c("dplyr", "ggplot2"))   # Install new packages
library(dplyr)   # Load package to be used in R
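
A script can check which packages are present before trying to load them. This is a minimal sketch assuming the two packages of interest are dplyr and ggplot2; the install line is left commented out because it needs network access and a CRAN mirror.

```r
pkgs <- c("dplyr", "ggplot2")

# Names of everything currently installed in the library paths
installed <- rownames(installed.packages())
missing_pkgs <- setdiff(pkgs, installed)

# Uncomment to install whatever is missing
# if (length(missing_pkgs) > 0) install.packages(missing_pkgs)

# requireNamespace() reports availability without attaching the package
for (p in pkgs) {
  cat(p, "available:", requireNamespace(p, quietly = TRUE), "\n")
}
```

Note that `install.packages` takes a character vector of names, while `library` loads one package per call.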

Conditional Operators
Comparisons return a logical vector
> 1:10 == 2
[1] FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
> 1:10 != 2
[1] TRUE FALSE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
> 1:10 > 2
[1] FALSE FALSE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
> 1:10 >= 2
[1] FALSE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
> 1:10 < 2
[1] TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
> 1:10 <= 2
[1] TRUE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
> x <- 2
> x > 1
[1] TRUE

Logical Operations
> x <- 1:4
> x
[1] 1 2 3 4
>
> (x > 2) | (x <= 3)
[1] TRUE TRUE TRUE TRUE
> (x > 2) & (x <= 3)
[1] FALSE FALSE TRUE FALSE
> xor((x > 2), (x < 4))
[1] TRUE TRUE FALSE TRUE
> 0:5 %in% x
[1] FALSE TRUE TRUE TRUE TRUE FALSE
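
Logical vectors like the ones above are most often used to pick elements out of another vector. The sketch below reuses `x <- 1:4` from the slide; the `keep` variable and the comparison values are additions for illustration.

```r
x <- 1:4

keep <- (x > 2) & (x <= 3)  # element-wise AND gives one TRUE/FALSE per element
x[keep]                     # 3: keep only the elements where the test is TRUE
which(keep)                 # 3: position of the TRUE entry

any(x > 3)                  # TRUE: at least one element passes
all(x > 3)                  # FALSE: not every element passes
x[x %in% c(2, 4, 6)]        # 2 4: elements of x found in the other set

# && and || work on single values, which suits if() conditions
if (length(x) > 0 && x[1] == 1) print("starts at 1")
```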

Control Flows
IF … ELSE
  x <- 4
  if (x < 3) print("true") else print("false")
  ifelse(x < 3, "true", "false")   # vectorized form; returns the value instead of printing inside the call
FOR Loops
  for (i in 1:10) print(1:i)
  for (i in 1:nrow(df)) print(df[i,])
  break and next …
WHILE Loops
  i <- 1
  while (i <= 10) {
    print(i)
    i <- i + 1
  }
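
The slide mentions `break` and `next` without showing them. A minimal sketch of both follows; the loop bounds and the `evens` variable are assumptions for illustration, and `head(mtcars, 3)` stands in for the undefined `df`.

```r
# next skips to the following iteration; break leaves the loop entirely
evens <- c()
for (i in 1:10) {
  if (i %% 2 == 1) next   # skip odd numbers
  if (i > 8) break        # stop once past 8
  evens <- c(evens, i)
}
print(evens)  # 2 4 6 8

# seq_len(nrow(df)) is safer than 1:nrow(df) because it yields
# nothing at all when the data frame has zero rows
df <- head(mtcars, 3)
for (i in seq_len(nrow(df))) {
  print(df[i, "mpg"])
}
```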

RStudio
Run Options: Ctrl+Enter, Ctrl+Alt+R
Built-In Docs
Version Control
Projects

RStudio Debugging
Breakpoints (Shift+F9)
R Functions: browser(), debugonce()
Environment Pane
Traceback (Callstack)
Console
  Step into function (Shift+F4)
  Finish Function (Shift+F6)
  Continue Running (Shift+F5)
  Stop Debugging (Shift+F8)

Azure ML
Azure Machine Learning R Integration

MS R Server
Enterprise Class R
Built on the Revolution Analytics acquisition
SQL Server 2016 R Support via R Server
https://www.microsoft.com/en-us/server-cloud/products/r-server/
Source: Microsoft Website (URL above)

SQL 2016 and R
Leverages the MS R Server
Setup and Installation
  Install Advanced Analytics Extensions: https://msdn.microsoft.com/en-us/library/mt590808.aspx
  Install R Packages and Providers for SQL Server R Services: https://msdn.microsoft.com/en-us/library/mt590809.aspx
  Post-Installation Server Configuration: https://msdn.microsoft.com/en-us/library/mt590536.aspx

SQL 2016 and R
SQL Server R Services Tutorials: https://msdn.microsoft.com/en-US/library/mt591993.aspx
DEMO - iris-sepal-example.sql
sp_execute_external_script (Transact-SQL): https://msdn.microsoft.com/en-us/library/mt604368.aspx
sp_execute_external_script
    @language = N'language',
    @script = N'script'
    [ , @input_data_1 = 'input_data_1' ]
    [ , @input_data_1_name = N'input_data_1_name' ]
    [ , @output_data_1_name = 'output_data_1_name' ]
    [ WITH <execute_option> [ ,...n ] ]
[;]
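
The R code passed in @script receives the rows from @input_data_1 as a data frame and hands a data frame back to SQL Server. The sketch below simulates that round trip in plain R, assuming the default variable names InputDataSet and OutputDataSet (used when the two name parameters are omitted). It is not the presenter's iris-sepal-example.sql demo; the built-in `iris` data set stands in for the query result and the aggregation is an invented example.

```r
# Stand-in for the rows SQL Server would pass from @input_data_1
# (assumption: simulated locally with the built-in iris data set)
InputDataSet <- iris[, c("Sepal.Length", "Sepal.Width", "Species")]

# Body that could be passed as @script:
# mean sepal measurements per species, returned as a data frame
OutputDataSet <- aggregate(
  cbind(Sepal.Length, Sepal.Width) ~ Species,
  data = InputDataSet,
  FUN = mean
)

print(OutputDataSet)  # one row per species
```

Inside SQL Server only the middle statement would go into @script, since the input data frame is supplied by the engine.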

Power BI
Running R Scripts in Power BI Desktop
https://powerbi.microsoft.com/en-us/documentation/powerbi-desktop-r-scripts/
https://powerbi.microsoft.com/en-us/blog/announcing-preview-of-r-visuals-in-power-bi-desktop/
Demo: mtcars.pbix
Options Needed

Resources
UCLA idre: http://statistics.ats.ucla.edu/stat/r/
R-Bloggers (sign up for daily email): http://www.r-bloggers.com/
Quick-R: http://www.statmethods.net/
R in Action (book to go with website)
Hadley Wickham: http://hadley.nz/

Thank You Sponsors! Visit the Sponsor tables to enter their end of day raffles. Turn in your completed Event Evaluation form at the end of the day in the Registration area to be entered in additional drawings.

Questions?
Thank you for attending!
@STATCowboy, http://STATCowboy.com
Download Demos and PPT