Bridging the Data Science and SQL Divide for Practitioners
October 7, 2017
Matthew A Simonson, PhD, and Alex Barbeau
What is R?
1. Popular programming language
2. Very powerful: capable of advanced statistical and machine learning algorithms as well as publication-quality graphics
3. Thriving user community, expanding every year thanks to being taught at most universities
EPS2014 6/15/2018
© 2014 Microsoft Corporation. All rights reserved. Microsoft, Windows, and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries.
Microsoft Acquires Revolution Analytics
Remove memory constraints; parallel algorithms; common R packages; enterprise solution delivery
In 2015, Revolution Analytics was purchased by Microsoft. Its product offers solutions that complement open-source R:
1.) Improved scalability
2.) Easier deployment in a large-scale commercial setting
How can R interact with SQL?
Option 1: Connect to SQL Server, then run and process with R locally
- Connect to SQL Server using a SQL connector library (for example, an ODBC connector)
- Download tables into R on the local workstation and work with data frames (the tables R stores data in)
Cons:
- Large-scale analysis is not very feasible
- Limited by local memory
- Few processors, so minimal if any parallelization
Pros:
- Allows testing code on smaller subsets of data, in a friendly environment for debugging R
How can R interact with SQL?
Option 2: R Machine Learning Services
1. Connect to SQL Server using Microsoft R Machine Learning Services
2. Execute an R script that runs in the database
Cons: R code is difficult to develop and debug in the Microsoft R Server environment
Pros: described in the slides that follow
Why R Machine Learning Services?
- It provides a platform for developing and deploying intelligent applications that uncover new insights.
- It makes it convenient to use the R language, utilize packages from the R community, and create models and generate predictions using your SQL Server data.
- By keeping analytics close to the data, you remove the costs and security risks associated with data movement.
Other Benefits?
Revolution's functions for multi-core analysis
- Built-in functions for distributed computing
- Why? Useful for speeding up analysis of large data sets
Scalable to very big data analysis
- Compatible with large-scale analysis environments such as Hadoop and Apache Spark
Architecture: resource governance and resource allocation
- Computing resources can be easily capped, so R jobs will not crash your system or slow things down
- A set amount of computing resources can be easily allocated
Examples: Translating a Standalone Model to R Server
- Getting started with R scripting
- Using SQL Server data in R scripts
- Naming data elements
- Defining R input parameters
- Adding R scripts to stored procedures
- Defining R output parameters
- Summary
Getting started with R scripting
A basic example of how to use the sp_execute_external_script stored procedure:
- Provide a value for the language parameter: specify R
- Provide a value for the script parameter: define the R script
- The executed procedure returns its output as a table
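The call described above can be sketched as follows. This is a minimal illustration, not the script shown in the talk: it assumes a SQL Server instance with R Services (Machine Learning Services) installed and the "external scripts enabled" option turned on, and the R script itself is made up.

```sql
-- Run a one-line R script inside SQL Server.
-- The R code below is illustrative only.
EXEC sp_execute_external_script
    @language = N'R',   -- language parameter: R
    @script   = N'OutputDataSet <- data.frame(doubled = c(1, 2, 3) * 2);';
```

The data frame assigned to OutputDataSet is what comes back to SQL Server as the table output.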
Getting started with R scripting
It is easier to put the script in a T-SQL variable and then pass that variable to the stored procedure. Why?
1.) Easier to read
2.) Easier to update the R script
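A sketch of the variable-based form, under the same assumptions (R Services enabled; the variable name @rscript and the R code are hypothetical placeholders):

```sql
-- Keep the R script in a T-SQL variable so it can be read and edited
-- separately from the procedure call.
DECLARE @rscript NVARCHAR(MAX);
SET @rscript = N'
    # R code lives here
    doubled <- c(1, 2, 3) * 2;
    OutputDataSet <- data.frame(doubled);';

EXEC sp_execute_external_script
    @language = N'R',
    @script   = @rscript;
```

With the script in its own variable, a multi-line R script can be laid out one statement per line, and changing it does not touch the EXEC statement.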
Using SQL Server data in R scripts
- Use a T-SQL variable to store the SELECT statement
- Specify the SELECT statement that retrieves data from the database
- Update the R script to assign the InputDataSet value to the OutputDataSet variable
- The result is returned as table output
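The steps above might look like the sketch below. To stay self-contained it does not query a real table: the three inline rows, the column names, and the variable names @rscript and @sqlscript are all made up for illustration. SalesYTD is cast to FLOAT as a conservative choice so that it maps cleanly to an R numeric.

```sql
-- R script: pass the input data straight through to the output.
DECLARE @rscript NVARCHAR(MAX) = N'OutputDataSet <- InputDataSet;';

-- SELECT statement stored in a T-SQL variable.
-- The rows are invented sample data, not data from the talk.
DECLARE @sqlscript NVARCHAR(MAX) = N'
    SELECT SalesPersonID, CAST(SalesYTD AS FLOAT) AS SalesYTD
    FROM (VALUES (1, 2100000.00), (2, 3500000.00), (3, 700000.00))
         AS s (SalesPersonID, SalesYTD)';

EXEC sp_execute_external_script
    @language     = N'R',
    @script       = @rscript,
    @input_data_1 = @sqlscript;   -- query result arrives in R as InputDataSet
```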
Using SQL Server data in R scripts
Now for some data processing:
- Find the average monthly sales, given that 7 months have passed
- Divide SalesYTD by 7, then round to two decimal places
- The result is returned as table output
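A sketch of that calculation, again using invented inline rows instead of a real table (the column name MonthlyAvg and both variable names are assumptions):

```sql
-- R script: add a column holding SalesYTD / 7, rounded to two decimals.
DECLARE @rscript NVARCHAR(MAX) = N'
    OutputDataSet <- InputDataSet;
    OutputDataSet$MonthlyAvg <- round(InputDataSet$SalesYTD / 7, 2);';

-- Invented sample rows standing in for the real sales data.
DECLARE @sqlscript NVARCHAR(MAX) = N'
    SELECT SalesPersonID, CAST(SalesYTD AS FLOAT) AS SalesYTD
    FROM (VALUES (1, 2100000.00), (2, 3500000.00), (3, 700000.00))
         AS s (SalesPersonID, SalesYTD)';

EXEC sp_execute_external_script
    @language     = N'R',
    @script       = @rscript,
    @input_data_1 = @sqlscript;
```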
Naming data elements
So far, we have used the default names for the input and output data sets (InputDataSet and OutputDataSet).
- Specify the name of the input data set, then update the variable name in the R code
- Specify the name of the output data set, then update the variable name in the R code
- Specify column names and data types using the WITH RESULT SETS clause
- The result is returned as table output
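The naming steps above can be sketched as follows. The data set names RawData and AvgSales, the column names, and the sample rows are all hypothetical; only the parameter names @input_data_1_name and @output_data_1_name and the WITH RESULT SETS clause belong to SQL Server itself.

```sql
-- The R code now uses the custom names instead of
-- InputDataSet / OutputDataSet.
DECLARE @rscript NVARCHAR(MAX) = N'
    AvgSales <- RawData;
    AvgSales$MonthlyAvg <- round(RawData$SalesYTD / 7, 2);';

-- Invented sample rows standing in for the real sales data.
DECLARE @sqlscript NVARCHAR(MAX) = N'
    SELECT SalesPersonID, CAST(SalesYTD AS FLOAT) AS SalesYTD
    FROM (VALUES (1, 2100000.00), (2, 3500000.00), (3, 700000.00))
         AS s (SalesPersonID, SalesYTD)';

EXEC sp_execute_external_script
    @language           = N'R',
    @script             = @rscript,
    @input_data_1       = @sqlscript,
    @input_data_1_name  = N'RawData',    -- name of the input data frame in R
    @output_data_1_name = N'AvgSales'    -- name of the data frame to return
WITH RESULT SETS ((
    SalesPersonID INT,
    SalesYTD      FLOAT,
    MonthlyAvg    FLOAT
));
```

The WITH RESULT SETS clause gives the returned columns names and SQL data types, which the earlier calls left unspecified.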
Defining R input parameters
- Declare variables as input parameters; each declaration includes the name of the variable and its data type
- Update the scripts to reference the parameters
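A sketch of parameter passing, assuming two parameters named @TotalSales and @TotalMonths (the data set names, column names, and sample rows are invented). Parameters are declared in @params and then supplied by name; inside the R script a parameter is referenced without the @ prefix, while the input query uses the usual @ form.

```sql
-- The R script references the TotalMonths parameter (no @ prefix in R).
DECLARE @rscript NVARCHAR(MAX) = N'
    AvgSales <- RawData;
    AvgSales$MonthlyAvg <- round(RawData$SalesYTD / TotalMonths, 2);';

-- The input query references @TotalSales. Rows are invented sample data.
DECLARE @sqlscript NVARCHAR(MAX) = N'
    SELECT SalesPersonID, CAST(SalesYTD AS FLOAT) AS SalesYTD
    FROM (VALUES (1, 2100000.00), (2, 3500000.00), (3, 700000.00))
         AS s (SalesPersonID, SalesYTD)
    WHERE SalesYTD > @TotalSales';

EXEC sp_execute_external_script
    @language           = N'R',
    @script             = @rscript,
    @input_data_1       = @sqlscript,
    @input_data_1_name  = N'RawData',
    @output_data_1_name = N'AvgSales',
    @params             = N'@TotalSales FLOAT, @TotalMonths INT',  -- name and type
    @TotalSales         = 2000000,
    @TotalMonths        = 7
WITH RESULT SETS ((
    SalesPersonID INT,
    SalesYTD      FLOAT,
    MonthlyAvg    FLOAT
));
```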
Adding R scripts to stored procedures
Why use stored procedures? You can call your R script just like any other database object.
- Include input parameters as part of the procedure definition
- Assign the @MinSales parameter to the @TotalSales parameter
- Assign the @MonthsYTD parameter to the @TotalMonths parameter
Benefits?
- Persistent structure
- Pass different parameter values
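One way the wrapping described above could look. The procedure name dbo.GetMonthlyAverages, the data set and column names, and the sample rows are hypothetical; the parameter names follow the slide. GO is the batch separator used by tools such as SSMS and sqlcmd.

```sql
CREATE PROCEDURE dbo.GetMonthlyAverages   -- hypothetical name
    @MinSales  FLOAT,
    @MonthsYTD INT
AS
BEGIN
    DECLARE @rscript NVARCHAR(MAX) = N'
        AvgSales <- RawData;
        AvgSales$MonthlyAvg <- round(RawData$SalesYTD / TotalMonths, 2);';

    -- Invented sample rows standing in for the real sales data.
    DECLARE @sqlscript NVARCHAR(MAX) = N'
        SELECT SalesPersonID, CAST(SalesYTD AS FLOAT) AS SalesYTD
        FROM (VALUES (1, 2100000.00), (2, 3500000.00), (3, 700000.00))
             AS s (SalesPersonID, SalesYTD)
        WHERE SalesYTD > @TotalSales';

    EXEC sp_execute_external_script
        @language           = N'R',
        @script             = @rscript,
        @input_data_1       = @sqlscript,
        @input_data_1_name  = N'RawData',
        @output_data_1_name = N'AvgSales',
        @params             = N'@TotalSales FLOAT, @TotalMonths INT',
        @TotalSales         = @MinSales,    -- procedure parameter -> script parameter
        @TotalMonths        = @MonthsYTD
    WITH RESULT SETS ((
        SalesPersonID INT,
        SalesYTD      FLOAT,
        MonthlyAvg    FLOAT
    ));
END;
GO

-- Call it like any other stored procedure, with whatever values are needed.
EXEC dbo.GetMonthlyAverages @MinSales = 2000000, @MonthsYTD = 7;
```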
Defining R output parameters
Why? Define an output parameter to return a scalar value or a data frame.
Example: update the preceding procedure to return only a scalar value
- Declare the @mean variable
- Add the @MeanOut parameter and include the OUTPUT keyword
- Remove the round function; find the mean of SalesYTD and assign it to the output parameter
- Set the WITH RESULT SETS clause to NONE so that no data frame is returned
- Remove the @output_data_1_name parameter, because it cannot be used when specifying NONE for the WITH RESULT SETS clause (otherwise an error is returned)
We can now execute the stored procedure, which returns a scalar value of 461084.189.
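A sketch of the scalar-output variant, under stated assumptions: the procedure name dbo.GetMeanSales, the data set name RawData, and the sample rows are invented, and the final SELECT is just one possible way to hand the value back to the caller. Because the rows are made up, this sketch will not reproduce the value quoted on the slide.

```sql
CREATE PROCEDURE dbo.GetMeanSales   -- hypothetical name
    @MinSales FLOAT
AS
BEGIN
    DECLARE @mean FLOAT;            -- receives the scalar result

    -- R assigns the mean to MeanOut, which maps to the @MeanOut parameter.
    DECLARE @rscript NVARCHAR(MAX) = N'
        MeanOut <- mean(RawData$SalesYTD);';

    -- Invented sample rows standing in for the real sales data.
    DECLARE @sqlscript NVARCHAR(MAX) = N'
        SELECT SalesPersonID, CAST(SalesYTD AS FLOAT) AS SalesYTD
        FROM (VALUES (1, 2100000.00), (2, 3500000.00), (3, 700000.00))
             AS s (SalesPersonID, SalesYTD)
        WHERE SalesYTD > @TotalSales';

    EXEC sp_execute_external_script
        @language          = N'R',
        @script            = @rscript,
        @input_data_1      = @sqlscript,
        @input_data_1_name = N'RawData',
        @params            = N'@TotalSales FLOAT, @MeanOut FLOAT OUTPUT',
        @TotalSales        = @MinSales,
        @MeanOut           = @mean OUTPUT   -- OUTPUT keyword on both sides
    WITH RESULT SETS NONE;                  -- no data frame is returned

    SELECT @mean AS MeanSales;
END;
GO

EXEC dbo.GetMeanSales @MinSales = 2000000;
```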
Summary
What have we learned? How to execute R within SQL Server to analyze stored data.
The Core Pieces:
- Define a SELECT statement
- Assign results to a variable
- Use the variable in the R script
- Manipulate data in the R script
- Output data to the output variable
- Return data to the SQL Server environment
Some Final Pieces of Advice
- Do not develop R code while writing the stored procedure (develop and debug it in a local R environment first)
- Make sure the syntax is precise
- Stick to R libraries that are broadly supported
Thank you!