Bridging the Data Science and SQL Divide for Practitioners

Bridging the Data Science and SQL Divide for Practitioners
October 7, 2017
Matthew A. Simonson, PhD, and Alex Barbeau

What is R?
1. A popular programming language.
2. Very powerful: capable of advanced statistical and machine learning algorithms, as well as publication-quality graphics.
3. A thriving user community that grows every year, helped by R's wide use in university teaching.
EPS2014 6/15/2018
© 2014 Microsoft Corporation. All rights reserved. Microsoft, Windows, and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries.

Microsoft Acquires Revolution Analytics
Remove memory constraints; parallel algorithms; common R packages; enterprise solution delivery
In 2015, Microsoft acquired Revolution Analytics. Revolution's products complement open-source R by:
1.) improving scalability, and
2.) making R easier to deploy in a large-scale commercial setting.

How can R interact with SQL?
Option 1: Connect to SQL Server, then run and process the data with R locally.
1.) Connect through an ODBC connector, using a SQL connector library in R.
2.) Download tables into R on the local workstation and work with them as data frames (data frames are the tables in which R stores data).
Cons: Large-scale analysis is not very feasible. You are limited by local memory, and a workstation has few processors, so there is minimal if any parallelization.
Pros: Allows testing code on smaller subsets of the data, and is a friendly environment for debugging R.

How can R interact with SQL?
Option 2: R Machine Learning Services
1. Connect to SQL Server using Microsoft R Machine Learning Services.
2. Execute an R script that runs inside the database.
3. Cons: R code is difficult to develop and debug in the Microsoft R Server environment.
4. Pros: described in the next three slides.

Why R Machine Learning Services?
It provides a platform for developing and deploying intelligent applications that uncover new insights:
1.) It makes it convenient to use the R language.
2.) It lets you use packages from the R community.
3.) It lets you create models and generate predictions using your SQL Server data.
By keeping analytics close to the data, you avoid the costs and security risks associated with moving data.

Other Benefits?
Revolution's functions for multi-core analysis: built-in functions for parallel and distributed computing.
Why? They are useful for speeding up the analysis of large data sets.
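A related switch that SQL Server itself exposes is the @parallel parameter of sp_execute_external_script. Note that this is SQL Server's own parallelism option rather than one of the Revolution (RevoScaleR) functions the slide refers to. The sketch assumes SQL Server 2016 or later with R enabled and the AdventureWorks sample database; Sales.SalesOrderDetail is only an illustrative stand-in for a large table. Because each parallel worker receives only part of the input, a script like this may return one row per worker rather than a single grand total.

```sql
-- Sketch only: assumes R is enabled on the instance and the AdventureWorks
-- sample database exists; Sales.SalesOrderDetail is an illustrative choice.
EXEC sp_execute_external_script
    @language = N'R',
    -- Each worker sees only its own slice of InputDataSet, so this counts
    -- the rows in that slice, not in the whole table.
    @script = N'OutputDataSet <- data.frame(RowsSeen = nrow(InputDataSet));',
    @input_data_1 = N'SELECT CAST(LineTotal AS float) AS LineTotal FROM Sales.SalesOrderDetail;',
    -- Let SQL Server split the input query across parallel R processes when it can.
    @parallel = 1
-- WITH RESULT SETS is required when @parallel = 1 streams output to the client.
WITH RESULT SETS ((RowsSeen int));
```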

Other Benefits?
Scales to very large data analysis:
1.) Compatible with large-scale analysis environments such as Hadoop and Apache Spark.

Other Benefits?
Architecture: resource governance and resource allocation
1.) The computing resources available to R can be capped, so R workloads do not starve or slow down the rest of the system.
2.) A set amount of computing resources can be allocated to R.
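In SQL Server, this capping is done through Resource Governor's external resource pools, which govern the R processes separately from the database engine. A minimal sketch, assuming an edition that supports Resource Governor (such as Enterprise or Developer); the 60/40 memory split and the 50% CPU cap are illustrative values, not recommendations.

```sql
-- Inspect the current limits for the database engine and for external (R) processes.
SELECT name, max_memory_percent
FROM sys.resource_governor_resource_pools
WHERE name = N'default';

SELECT name, max_cpu_percent, max_memory_percent
FROM sys.resource_governor_external_resource_pools
WHERE name = N'default';

-- Illustrative split: leave the database engine up to 60% of memory...
ALTER RESOURCE POOL "default" WITH (MAX_MEMORY_PERCENT = 60);

-- ...and cap R processes at 40% of memory and 50% of CPU.
ALTER EXTERNAL RESOURCE POOL "default"
    WITH (MAX_CPU_PERCENT = 50, MAX_MEMORY_PERCENT = 40);

-- Apply the new settings (this also enables Resource Governor if it was off).
ALTER RESOURCE GOVERNOR RECONFIGURE;
```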

Examples: Translating a Standalone Model to R Server
Getting started with R scripting
Using SQL Server data in R scripts
Naming data elements
Defining R input parameters
Adding R scripts to stored procedures
Defining R output parameters
Summary

Getting started with R scripting
A basic example of how to use the sp_execute_external_script stored procedure:
1.) Provide a value for the @language parameter: specify R.
2.) Provide a value for the @script parameter: define the R script, and explain what it is doing.
3.) Executing the procedure returns the script's output as a table.
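A minimal sketch of a first call, assuming SQL Server 2016 or later with the R feature installed. External scripts are disabled by default, so the first statements turn them on; on some versions the instance (and its Launchpad service) must be restarted before the setting takes effect. The column name Answer is illustrative.

```sql
-- One-time setup: allow sp_execute_external_script to run on this instance.
EXEC sp_configure 'external scripts enabled', 1;
RECONFIGURE WITH OVERRIDE;

-- @language names the runtime; @script holds the R code.
-- OutputDataSet is the default name of the data frame returned to SQL Server.
EXEC sp_execute_external_script
    @language = N'R',
    @script = N'OutputDataSet <- data.frame(Answer = 6 * 7);'
WITH RESULT SETS ((Answer float));
```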

Getting started with R scripting
It is easier to put the script in a T-SQL variable and then pass that variable to the stored procedure.
Why?
1.) The code is easier to read.
2.) The R script is easier to update.
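Storing the script in a variable might look like this. The variable name @rscript and the toy vector are illustrative; any R code can go inside the NVARCHAR(MAX) string, one statement per line, which is what makes it easier to read and edit.

```sql
-- Keeping the R code in its own variable lets it span several readable lines.
DECLARE @rscript NVARCHAR(MAX) = N'
x <- c(2, 4, 6, 8)   # a small numeric vector
OutputDataSet <- data.frame(MeanX = mean(x))
';

EXEC sp_execute_external_script
    @language = N'R',
    @script = @rscript   -- pass the variable instead of an inline literal
WITH RESULT SETS ((MeanX float));
```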

Using SQL Server data in R scripts
1.) Specify a SELECT statement to retrieve data from the database, and store it in a T-SQL variable; the statement is passed to the procedure through the @input_data_1 parameter.
2.) Update the R script to assign the InputDataSet value to the OutputDataSet variable.
Table output: the selected rows are returned unchanged.
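A sketch of this pattern, assuming the AdventureWorks sample database. The slides do not name their source table, so Sales.vSalesPerson and its FirstName, LastName and SalesYTD columns are stand-ins; SalesYTD is cast to float to keep the type conversion into R simple.

```sql
-- The query whose rows R receives as the InputDataSet data frame.
DECLARE @sqlscript NVARCHAR(MAX) = N'
SELECT FirstName, LastName, CAST(SalesYTD AS float) AS SalesYTD
FROM Sales.vSalesPerson;';

-- Pass the input straight through to the output unchanged.
DECLARE @rscript NVARCHAR(MAX) = N'OutputDataSet <- InputDataSet;';

EXEC sp_execute_external_script
    @language = N'R',
    @script = @rscript,
    @input_data_1 = @sqlscript;
```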

Using SQL Server data in R scripts
Now for some data processing: find the average monthly sales, given that 7 months have passed.
Divide SalesYTD by 7, then round to two decimal places.
Table output: SalesYTD now holds each average monthly value.
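The monthly-average step could be written as follows, again assuming the AdventureWorks Sales.vSalesPerson view as a stand-in data source. As on the slide, the divisor 7 is hard-coded in the R script.

```sql
DECLARE @sqlscript NVARCHAR(MAX) = N'
SELECT FirstName, LastName, CAST(SalesYTD AS float) AS SalesYTD
FROM Sales.vSalesPerson;';

-- Average monthly sales after 7 months: SalesYTD / 7, rounded to 2 decimal places.
DECLARE @rscript NVARCHAR(MAX) = N'
df <- InputDataSet
df$SalesYTD <- round(df$SalesYTD / 7, 2)
OutputDataSet <- df;';

EXEC sp_execute_external_script
    @language = N'R',
    @script = @rscript,
    @input_data_1 = @sqlscript;
```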

Naming data elements
So far, we have used the default names for the input and output data sets (InputDataSet and OutputDataSet).
1.) Specify the name of the input data set (the @input_data_1_name parameter), and update the variable name in the R code to match.
2.) Specify the name of the output data set (the @output_data_1_name parameter), and update the variable name in the R code to match.
3.) Specify column names and data types using the WITH RESULT SETS clause.
Table output: the returned columns carry the names and data types given in WITH RESULT SETS.
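With custom names and an explicit result shape, the call might look like the following. SalesIn, SalesOut and the MonthlySales column name are hypothetical choices, and the source is assumed to be the AdventureWorks Sales.vSalesPerson view.

```sql
DECLARE @sqlscript NVARCHAR(MAX) = N'
SELECT FirstName, LastName, CAST(SalesYTD AS float) AS SalesYTD
FROM Sales.vSalesPerson;';

-- The R code uses the custom names instead of InputDataSet / OutputDataSet.
DECLARE @rscript NVARCHAR(MAX) = N'
SalesOut <- SalesIn
SalesOut$SalesYTD <- round(SalesOut$SalesYTD / 7, 2);';

EXEC sp_execute_external_script
    @language = N'R',
    @script = @rscript,
    @input_data_1 = @sqlscript,
    @input_data_1_name = N'SalesIn',    -- name of the input data frame inside R
    @output_data_1_name = N'SalesOut'   -- name of the data frame returned to SQL Server
-- Column names and types of the returned table, in column order.
WITH RESULT SETS ((FirstName NVARCHAR(50), LastName NVARCHAR(50), MonthlySales FLOAT));
```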

Defining R input parameters
1.) Declare variables as input parameters, giving each variable's name and data type.
2.) Update the script to reference the parameters (inside R, a parameter is referenced by its name without the @ prefix).
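Input parameters are declared in the @params string and then given values as extra named arguments to sp_execute_external_script. In the sketch below, @TotalSales and @TotalMonths borrow their names from the stored-procedure slides, but using @TotalSales as a minimum-sales filter and the values 2000000 and 7 are assumptions, as is the AdventureWorks Sales.vSalesPerson source.

```sql
DECLARE @sqlscript NVARCHAR(MAX) = N'
SELECT FirstName, LastName, CAST(SalesYTD AS float) AS SalesYTD
FROM Sales.vSalesPerson;';

-- Inside R, @TotalSales and @TotalMonths are visible as TotalSales and TotalMonths.
DECLARE @rscript NVARCHAR(MAX) = N'
SalesOut <- SalesIn[SalesIn$SalesYTD > TotalSales, ]
SalesOut$SalesYTD <- round(SalesOut$SalesYTD / TotalMonths, 2);';

EXEC sp_execute_external_script
    @language = N'R',
    @script = @rscript,
    @input_data_1 = @sqlscript,
    @input_data_1_name = N'SalesIn',
    @output_data_1_name = N'SalesOut',
    -- Declare each parameter's name and data type...
    @params = N'@TotalSales FLOAT, @TotalMonths INT',
    -- ...then supply its value (illustrative values).
    @TotalSales = 2000000,
    @TotalMonths = 7
WITH RESULT SETS ((FirstName NVARCHAR(50), LastName NVARCHAR(50), MonthlySales FLOAT));
```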

Adding R scripts to stored procedures
Why use stored procedures? You can call your R script just like any other database object.
1.) Include the input parameters as part of the procedure definition.
2.) Assign the @MinSales procedure parameter to the @TotalSales script parameter.
3.) Assign the @MonthsYTD procedure parameter to the @TotalMonths script parameter.
Benefits?
1.) A persistent structure: the script is stored in the database.
2.) You can pass different parameter values on each call.
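Wrapped in a stored procedure, the call might look like this. Only the mapping of @MinSales to @TotalSales and @MonthsYTD to @TotalMonths comes from the slides; the procedure name dbo.GetMonthlySales, the use of @TotalSales as a minimum-sales filter, the sample values, and the AdventureWorks Sales.vSalesPerson source are assumptions.

```sql
-- Hypothetical procedure name; it wraps the R call so it can be run like any other database object.
CREATE PROCEDURE dbo.GetMonthlySales
    @MinSales  FLOAT,   -- passed to R as TotalSales
    @MonthsYTD INT      -- passed to R as TotalMonths
AS
BEGIN
    SET NOCOUNT ON;

    -- Source query (assumed AdventureWorks view).
    DECLARE @sqlscript NVARCHAR(MAX) = N'
SELECT FirstName, LastName, CAST(SalesYTD AS float) AS SalesYTD
FROM Sales.vSalesPerson;';

    -- Keep people above the sales threshold, then average their sales per month.
    DECLARE @rscript NVARCHAR(MAX) = N'
SalesOut <- SalesIn[SalesIn$SalesYTD > TotalSales, ]
SalesOut$SalesYTD <- round(SalesOut$SalesYTD / TotalMonths, 2);';

    EXEC sp_execute_external_script
        @language = N'R',
        @script = @rscript,
        @input_data_1 = @sqlscript,
        @input_data_1_name = N'SalesIn',
        @output_data_1_name = N'SalesOut',
        @params = N'@TotalSales FLOAT, @TotalMonths INT',
        @TotalSales = @MinSales,     -- procedure parameter -> script parameter
        @TotalMonths = @MonthsYTD
    WITH RESULT SETS ((FirstName NVARCHAR(50), LastName NVARCHAR(50), MonthlySales FLOAT));
END;
GO

-- The procedure persists in the database; each call can pass different values.
EXEC dbo.GetMonthlySales @MinSales = 2000000, @MonthsYTD = 7;
EXEC dbo.GetMonthlySales @MinSales = 1000000, @MonthsYTD = 7;
```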

Defining R output parameters
Why? An output parameter lets you return a scalar value or a data frame.
Example: update the preceding procedure to return only a scalar value.
1.) Declare the @mean variable.
2.) Add the @MeanOut parameter, including the OUTPUT keyword.
3.) Remove the round function; instead, find the mean of SalesYTD and assign it to the output parameter.
4.) Set the WITH RESULT SETS clause to NONE so that no data frame is returned.
5.) Remove the @output_data_1_name parameter: it cannot be used when NONE is specified for the WITH RESULT SETS clause (otherwise an error is returned).
We can now execute the stored procedure, which returns a scalar value of 461084.189.
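A standalone sketch of the scalar-output pattern, written outside a stored procedure for brevity. Inside R the output parameter is referred to as MeanOut (no @); the input name SalesIn and the AdventureWorks Sales.vSalesPerson source are assumptions.

```sql
-- Variable that receives the scalar computed in R.
DECLARE @mean FLOAT;

EXEC sp_execute_external_script
    @language = N'R',
    -- No rounding; the mean is assigned to the output parameter MeanOut.
    @script = N'MeanOut <- mean(SalesIn$SalesYTD)',
    @input_data_1 = N'SELECT CAST(SalesYTD AS float) AS SalesYTD FROM Sales.vSalesPerson;',
    @input_data_1_name = N'SalesIn',
    -- Declare the parameter with the OUTPUT keyword in @params...
    @params = N'@MeanOut FLOAT OUTPUT',
    -- ...and again when binding it to the T-SQL variable.
    @MeanOut = @mean OUTPUT
-- No data frame is returned, so @output_data_1_name is omitted.
WITH RESULT SETS NONE;

SELECT @mean AS MeanSalesYTD;
```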

Summary
What have we learned? How to execute R within SQL Server to analyze stored data.
The core pieces:
1.) Define a SELECT statement.
2.) Assign the results to a variable.
3.) Use the variable in the R script.
4.) Manipulate the data in the R script.
5.) Write the results to the output variable.
6.) Return the data to the SQL Server environment.

Some Final Pieces of Advice
1.) Do not develop R code while writing the stored procedure; develop and debug it separately first.
2.) Make sure the syntax is precise.
3.) Stick to R libraries that are broadly supported.
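One way to check which R libraries are actually available to in-database scripts is to ask the server's own R runtime. A minimal sketch; the result-set column names are illustrative.

```sql
-- List the R packages (and versions) installed for this instance's R runtime.
EXEC sp_execute_external_script
    @language = N'R',
    @script = N'
pkgs <- installed.packages()[, c("Package", "Version")]
OutputDataSet <- data.frame(pkgs, stringsAsFactors = FALSE);'
WITH RESULT SETS ((PackageName NVARCHAR(255), PackageVersion NVARCHAR(100)));
```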

Thank you!