Predicting Loan Delinquency at 1M Transactions per Second
David Smith @revodavid
R Community Lead, Microsoft
It looks like you’ve created a predictive model… NOW WHAT?
TRAINING A MODEL IS EASY
OPERATIONALIZING IT IS HARDER
http://hamiltonmusical.wikia.com/wiki/Right_Hand_Man
Generating Predictions

Batch Mode
  Create many (millions!) predictions at once
  Time required is proportional to the number of predictions

Real Time
  Only a few (maybe only one!) data points available to predict
  There may be multiple requests in a short timeframe
  Latency is the key metric here
  Many applications require sub-second latency at the endpoint
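The difference between the two modes can be sketched in base R. This is a minimal illustration, not the benchmark from this talk: the data is synthetic, `glm()` stands in for the real model, and the column names are borrowed from the Lending Club data dictionary only for flavor.

```r
# Synthetic data standing in for loan records (not Lending Club data).
set.seed(42)
n <- 5000
loans <- data.frame(
  int_rate   = runif(n, 5, 25),
  revol_util = runif(n, 0, 100)
)
# Synthetic outcome: higher rate and utilization raise the odds of a bad loan.
p <- plogis(-4 + 0.12 * loans$int_rate + 0.01 * loans$revol_util)
loans$is_bad <- rbinom(n, 1, p)

model <- glm(is_bad ~ int_rate + revol_util, data = loans, family = binomial)

# Batch mode: score every row in a single call.
batch_time <- system.time(
  batch_scores <- predict(model, newdata = loans, type = "response")
)[["elapsed"]]

# Real-time mode: score one row per call, as a request-driven service would,
# and record the latency of each call.
n_requests <- 200
latencies <- numeric(n_requests)
single_scores <- numeric(n_requests)
for (i in seq_len(n_requests)) {
  t0 <- proc.time()[["elapsed"]]
  single_scores[i] <- predict(model, newdata = loans[i, ], type = "response")
  latencies[i] <- proc.time()[["elapsed"]] - t0
}

cat("Batch:", n, "predictions in", batch_time, "s\n")
cat("Real time: median latency", median(latencies) * 1000, "ms per request\n")
```

Both modes produce the same scores. What changes is the metric: total elapsed time for the batch, per-request latency for real time.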
Real-Time Operationalization Options

Rewrite prediction code in some other language
  PMML / C++ / Java / …
OR, use your R code:
  Deploy as a web service with Microsoft R Server
  Deploy as a stored procedure in SQL Server
Lending Club Loan Performance Data
www.lendingclub.com/info/download-data.action
Feature selection and generation: aka.ms/lendingclub

LoanStatNew             Description
all_util                Balance to credit limit on all trades
annual_inc_joint        The combined self-reported annual income provided by the co-borrowers during registration
dti_joint               A ratio calculated using the co-borrowers' total monthly payments on the total debt obligations, excluding mortgages and the requested LC loan, divided by the co-borrowers' combined self-reported monthly income
int_rate                Interest rate on the loan
mths_since_last_record  The number of months since the last public record
revol_util              Revolving line utilization rate, or the amount of credit the borrower is using relative to all available revolving credit
total_rec_prncp         Principal received to date
is_bad (generated)      Late > 16 days, Default, or Charged Off
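The generated is_bad label could be derived along these lines. This is a sketch under assumptions: the `loan_status` column name and the exact status strings are not given on the slide, and the sample rows are made up.

```r
# Hypothetical sample of raw status values; column name and strings are assumed.
loans <- data.frame(
  loan_status = c("Fully Paid", "Current", "Late (16-30 days)",
                  "Late (31-120 days)", "Default", "Charged Off",
                  "In Grace Period"),
  stringsAsFactors = FALSE
)

# Statuses treated as "bad": late more than 16 days, default, or charged off.
bad_statuses <- c("Late (16-30 days)", "Late (31-120 days)",
                  "Default", "Charged Off")

# TRUE for delinquent loans, FALSE otherwise.
loans$is_bad <- loans$loan_status %in% bad_statuses

print(loans)
```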
Operationalization with Microsoft R Server

Microsoft R Server configured for operationalizing R analytics (Services / Sessions)

Data Scientist: Deployment
  Publish R function into web services
  publishService, via Microsoft R Client (mrsdeploy package)
Quant: Consumption
  Explore and consume services in R directly
  getService, via Microsoft R Client (mrsdeploy package)
Developer: Integration
  Swagger-based APIs: consume with any programming language
  Apps make REST API calls
IT Administrator: Configuration
  In-cloud or on-prem
  Add nodes to scale out
  High availability & load balancing
Flexible vs Real-Time Deployment

Flexible Deployment
  Publish any R script or function as a web service
  R interpreter runs the script on demand via REST API

    library(mrsdeploy)
    publishService(
      serviceType='Script',
      code=<<R script or function>>)

Real-Time Deployment
  Publish an R model object (RevoScaleR or MicrosoftML)
  Prediction engine generates scores from data via REST API

    library(mrsdeploy)
    publishService(
      serviceType='Realtime',
      model=<<R object>>)
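Before publishing a flexible service, it helps to check the scoring function locally. The sketch below assumes a toy `glm()` model on synthetic data. `score_loan` and the service name `loanScoring` are hypothetical, and the `publishService()` call is left commented out because it needs a configured Microsoft R Server and a prior `remoteLogin()`.

```r
# Fit a small model locally; glm() stands in for whatever model you trained.
set.seed(1)
train <- data.frame(int_rate = runif(500, 5, 25))
train$is_bad <- rbinom(500, 1, plogis(-4 + 0.2 * train$int_rate))
loan_model <- glm(is_bad ~ int_rate, data = train, family = binomial)

# The function a flexible ("Script") service would run on each request.
score_loan <- function(int_rate) {
  new_data <- data.frame(int_rate = int_rate)
  as.numeric(predict(loan_model, newdata = new_data, type = "response"))
}

# Local check before publishing: two probabilities between 0 and 1.
print(score_loan(c(6, 24)))

# Publishing might then look roughly like this (not run here):
# library(mrsdeploy)
# api <- publishService(
#   name    = "loanScoring",                 # hypothetical service name
#   code    = score_loan,
#   model   = loan_model,
#   inputs  = list(int_rate = "numeric"),
#   outputs = list(answer = "numeric"),
#   v       = "v1.0.0"
# )
```

A real-time service takes only the model object, so this approach does not carry over directly: the model would need to come from one of the supported RevoScaleR or MicrosoftML functions rather than `glm()`.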
Real-Time Deployment Models

Linear Regression (rxLinMod, rxFastLinear)
Logistic Regression (rxLogit, rxLogisticRegression)
Classification / Regression trees (rxDTree, rxFastTrees)
Classification / Regression forests (rxDForest, rxFastForest)
Stochastic gradient-boosted decision trees (rxBTrees)
One-class Support Vector Machines (rxOneClassSvm)
Convolutional Neural Networks (rxNeuralNet)
Also: pre-trained models for text sentiment and image featurization

Source: https://msdn.microsoft.com/en-us/microsoft-r/operationalize/data-scientist-manage-services#publish-web-services

Have a model object that was created with one of the following supported functions:
  From the RevoScaleR package, these specific functions: rxLogit, rxLinMod, rxBTrees, rxDTree, and rxDForest
  From the MicrosoftML package, only the machine learning task and transform task functions, which include rxFastTrees, rxFastForest, rxLogisticRegression, rxOneClassSvm, rxNeuralNet, rxFastLinear, featurizeText, concat, categorical, categoricalHash, selectFeatures, featurizeImage, getSentiment, loadImage, resizeImage, extractPixels, selectColumns, and dropColumns

https://docs.microsoft.com/en-us/azure/machine-learning/machine-learning-azure-ml-netsharp-reference-guide
Flexible and real-time scoring with Microsoft R Server
Demonstration
Server: Azure Data Science Virtual Machine, Azure GS5 instance (32 cores, 448 GB memory)
Client: SurfaceBook
Remote client, 10 threads with a payload of 100 predictions
Flexible vs Real-Time Performance Comparison
Server: Standard_D3_v2 (4 CPU cores, 14 GB RAM), Windows

Algorithm                     Real time (ms)   Flexible (ms)
rxLogit (model size 2K)       3.5              39.2
rxNeuralNet (model size 8K)   2.5              122.0

Model size                    Real time (ms)   Flexible (ms)
2 MB (rxLogisticRegression)   5.0              9215.7
43 MB                         5.4              20255.6
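To make the gap concrete, the speedup of real-time over flexible scoring can be computed straight from the figures on this slide. The row labels are shorthand, and nothing here is a new measurement.

```r
# Latencies copied from the comparison above, in milliseconds.
perf <- data.frame(
  model = c("rxLogit (2K)", "rxNeuralNet (8K)",
            "rxLogisticRegression (2 MB)", "43 MB model"),
  realtime_ms = c(3.5, 2.5, 5.0, 5.4),
  flexible_ms = c(39.2, 122.0, 9215.7, 20255.6)
)

# How many times faster real-time scoring is than flexible scoring.
perf$speedup <- round(perf$flexible_ms / perf$realtime_ms, 1)

print(perf)
```

Real-time latency stays in the low single-digit milliseconds as model size grows, while flexible latency climbs from tens of milliseconds to several seconds. The ratio therefore goes from about 11x for the small rxLogit model to several thousand for the 43 MB model.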
Deployment in SQL Server 2016

Flexible
  Apps call sp_execute_external_script in SQL Server 2016
Real-Time
  Microsoft R Client (RevoScaleR package): rxSerializeObject
  Apps call sp_rxPredict in SQL Server 2016
SQL Server: R Script Operationalization
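The R script passed to sp_execute_external_script reads its input rows from a data frame named InputDataSet and returns results in OutputDataSet (the default names). The sketch below simulates that contract locally in plain R, assuming a toy `glm()` model on synthetic data and a model stored as serialized bytes. In SQL Server the bytes would typically come from a table column and be passed in as a parameter. The output column name `is_bad_prob` is hypothetical.

```r
# Train and serialize a model, as you would before storing it in a table.
set.seed(7)
train <- data.frame(int_rate = runif(300, 5, 25))
train$is_bad <- rbinom(300, 1, plogis(-4 + 0.2 * train$int_rate))
model_bytes <- serialize(
  glm(is_bad ~ int_rate, data = train, family = binomial),
  connection = NULL
)

# Stand-in for the rows SQL Server would pass in from @input_data_1.
InputDataSet <- data.frame(int_rate = c(7.5, 12.0, 22.9))

# --- Body of the R script that would go in @script ---
model <- unserialize(model_bytes)
OutputDataSet <- data.frame(
  is_bad_prob = as.numeric(
    predict(model, newdata = InputDataSet, type = "response")
  )
)
# --- End of script body ---

print(OutputDataSet)
```

Each call has the R runtime unserialize the model and run the script, which is what makes this path flexible.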
SQL Server: Real-Time Operationalization
1M predictions/sec
Same benchmark
One-sixth the resources
SQL Server 2017
8 sockets, 192 cores
6 TB RAM
Flexible operationalization
blog.revolutionanalytics.com/2016/09/fraud-detection.html
Operationalization Overview

Flexible Operationalization: any R function / package
Real-Time Operationalization: specific RevoScaleR / MicrosoftML models

SQL Server
  Flexible:
    EXEC sp_execute_external_script
      @language = N'R',
      @script = N'<<R script>>'
  Real-Time:
    EXEC sp_rxPredict
      @model = <<serialized R object>>,
      @inputData = <<SQL query>>

Microsoft R Server
  Flexible:
    library(mrsdeploy)
    publishService(
      serviceType='Script',
      code=<<R script or function>>)
  Real-Time:
    library(mrsdeploy)
    publishService(
      serviceType='Realtime',
      model=<<R object>>)

Use Microsoft R Server 9+ or SQL Server 2016+ as the deployment server
Flexible Operationalization supports any R code / package
Real-Time Operationalization supports Microsoft R models with improved latency
Thank You!
David Smith @revodavid
R Community Lead, Microsoft

Special thanks:
  Pratik Palnitkar, Microsoft
  Arun Gurunathan, Microsoft

Download Microsoft R Client: aka.ms/rclient
Data Science Virtual Machine: aka.ms/dsvm