BRK2289: Predicting Azure Consumption using Ensemble Learning
Siddharth Kumar, Senior Data Scientist Manager, Customer Growth and Analytics Team
5/20/2018 1:18 PM
© Microsoft Corporation. All rights reserved. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.

Session Overview
- Introduction
- Model Workflow
- Data Preparation
- Feature Engineering
- Model Building
- Model Validation
- Model Deployment
- Learnings

Context
The objective was to build a machine learning solution to predict Azure customers' spend over the next 6 months.

Session Goals
- Understand the data science challenges faced in the real world
- Learn how to solve a regression problem while overcoming those challenges
- Learn tricks to improve model performance using deep learning
- Develop an intelligent solution using ML / deep learning on the Microsoft AI Platform
- See a scalable way to deploy ML models using Microsoft HDInsight clusters

Model workflow
- Data collection: data source identification, initial dataset creation
- Data Preparation: data pre-processing, transformations
- Feature Engineering: feature extraction, feature transformation, feature selection
- Model Building: model architecture, model stacking
- Model Validation: picking the right accuracy measures
- Model Deployment: using Microsoft Azure

Data Preparation
- Population Selection: identify the right sample based on business requirements
- Data Cleansing: noise/outlier treatment, missing values
- Data Transformation: normalization / log transformation, aggregation and encoding
- Data Reduction: remove constant (zero-variance) and highly correlated variables

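The data reduction step can be illustrated with a minimal sketch. It is not the presenter's code: the column names, the sample values, and the 0.95 correlation threshold are all assumptions chosen for illustration.

```python
import math

def variance(col):
    """Population variance of a list of floats."""
    mean = sum(col) / len(col)
    return sum((v - mean) ** 2 for v in col) / len(col)

def pearson(a, b):
    """Pearson correlation of two equal-length, non-constant lists."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    norm = math.sqrt(
        sum((x - mean_a) ** 2 for x in a) * sum((y - mean_b) ** 2 for y in b)
    )
    return cov / norm

def reduce_columns(columns, corr_threshold=0.95):
    """Return the names of columns kept after data reduction.

    columns: dict mapping column name -> list of floats.
    corr_threshold is an assumed cut-off, not a value from the slides.
    """
    # Step 1: drop constant (zero-variance) columns.
    non_constant = [name for name, col in columns.items() if variance(col) > 0.0]
    # Step 2: keep a column only if it is not highly correlated
    # with any column that has already been kept.
    selected = []
    for name in non_constant:
        if all(
            abs(pearson(columns[name], columns[other])) < corr_threshold
            for other in selected
        ):
            selected.append(name)
    return selected

# Hypothetical columns for illustration only.
columns = {
    "prior_spend": [10.0, 20.0, 35.0, 50.0],
    "prior_spend_copy": [20.0, 40.0, 70.0, 100.0],  # perfectly correlated
    "constant_flag": [1.0, 1.0, 1.0, 1.0],          # zero variance
    "seat_count": [5.0, 1.0, 4.0, 2.0],
}
kept_columns = reduce_columns(columns)
print(kept_columns)
```

The greedy pass keeps the first column of each correlated group, so column order matters; a real pipeline might instead keep the column that correlates best with the response.
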
Data Selection and Transformation
[Figure: log transformation]
- An appropriate transformation of the response variable can improve model performance
- Transforming predictors also helps for some algorithms
- The log transformation above helped us identify the right population to model

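A small sketch of a log transformation of the response variable follows. The slide only says "Log"; using log(1 + x) so that zero spend stays defined is an assumption, and the spend values are made up.

```python
import math

def to_log_scale(spend):
    """Map spend to log(1 + spend); log1p is an assumed choice that handles zeros."""
    return [math.log1p(v) for v in spend]

def from_log_scale(log_spend):
    """Invert to_log_scale so predictions can be reported in the original units."""
    return [math.expm1(v) for v in log_spend]

# Hypothetical spend values spanning several orders of magnitude.
spend = [0.0, 12.5, 480.0, 25000.0]
log_spend = to_log_scale(spend)
restored = from_log_scale(log_spend)
print(log_spend)
print(restored)
```

A model trained on the transformed response predicts on the log scale, so error metrics computed there are not in currency units until the inverse is applied.
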
Hypotheses-based feature generation
- Prior spend: customers with a higher historic run rate (spending rate) are more likely to spend more on Azure
- Offering (what offer they subscribed to): customers subscribed to higher-tier offers are likely to spend more on Azure
- Geography (what country they belong to): customers from developed countries are likely to have higher spend; customers from tech-dominant states/regions are more likely to spend more
- Industry (which industry they belong to): customers associated with professional services or the tech industry are more likely to spend more on Azure

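As a rough illustration of turning these hypotheses into model inputs, the sketch below derives one feature per hypothesis. Every field name, the tier ranking, and the industry list are hypothetical; the slides do not describe the actual schema.

```python
def build_features(customer, tier_rank, tech_industries):
    """Derive hypothesis-driven features for one customer record.

    customer: dict with hypothetical keys 'monthly_spend', 'offer',
              'country', and 'industry'.
    tier_rank: dict mapping offer name -> ordinal rank (assumed encoding).
    tech_industries: set of industries treated as tech/professional services.
    """
    months = len(customer["monthly_spend"])
    # Prior spend hypothesis: average monthly run rate over the history window.
    run_rate = sum(customer["monthly_spend"]) / months if months else 0.0
    return {
        "prior_run_rate": run_rate,
        # Offering hypothesis: higher tiers get a higher ordinal rank.
        "offer_tier_rank": tier_rank.get(customer["offer"], 0),
        # Geography hypothesis: keep the raw country for later encoding.
        "country": customer["country"],
        # Industry hypothesis: flag tech and professional-services customers.
        "is_tech_or_services": int(customer["industry"] in tech_industries),
    }

# Hypothetical inputs for illustration only.
tier_rank = {"trial": 1, "pay_as_you_go": 2, "enterprise": 3}
tech_industries = {"technology", "professional services"}
customer = {
    "monthly_spend": [100.0, 150.0, 200.0],
    "offer": "enterprise",
    "country": "DE",
    "industry": "technology",
}
features = build_features(customer, tier_rank, tech_industries)
print(features)
```
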
Using Deep Learning for Feature Engineering
Deep Features (diagram: input layer, hidden layers, output layer)
- The hidden layers were used to generate non-linear features
- The deep features led to a 10% improvement in Mean Absolute Error (MAE)
Autoencoders (diagram: input layer, encoded layer)
- Autoencoders are helpful for feature representation
- They are also used for feature reduction (dimensionality reduction)
- For this project, autoencoders led to a minor lift in accuracy (1%)

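The autoencoder idea can be sketched with a tiny single-hidden-layer network in NumPy. This is a toy under stated assumptions, not the model from the talk: the synthetic data, layer size, learning rate, and epoch count are all invented for illustration.

```python
import numpy as np

def train_autoencoder(X, n_hidden=2, epochs=300, lr=0.1, seed=0):
    """Train a one-hidden-layer autoencoder with plain gradient descent.

    Returns an encode function (inputs -> encoded layer) and the loss history.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W_enc = rng.normal(scale=0.1, size=(d, n_hidden))
    b_enc = np.zeros(n_hidden)
    W_dec = rng.normal(scale=0.1, size=(n_hidden, d))
    b_dec = np.zeros(d)
    losses = []
    for _ in range(epochs):
        H = np.tanh(X @ W_enc + b_enc)   # encoded layer (non-linear features)
        X_hat = H @ W_dec + b_dec        # linear reconstruction of the input
        err = X_hat - X
        losses.append(float(np.mean(err ** 2)))
        # Backpropagate the mean squared reconstruction error.
        grad_out = 2.0 * err / err.size
        grad_W_dec = H.T @ grad_out
        grad_b_dec = grad_out.sum(axis=0)
        grad_pre = (grad_out @ W_dec.T) * (1.0 - H ** 2)  # tanh derivative
        grad_W_enc = X.T @ grad_pre
        grad_b_enc = grad_pre.sum(axis=0)
        W_enc -= lr * grad_W_enc
        b_enc -= lr * grad_b_enc
        W_dec -= lr * grad_W_dec
        b_dec -= lr * grad_b_dec

    def encode(Z):
        return np.tanh(Z @ W_enc + b_enc)

    return encode, losses

# Synthetic data: 6 observed columns driven by 2 latent factors (assumption).
rng = np.random.default_rng(42)
latent = rng.normal(size=(200, 2))
X = latent @ rng.normal(size=(2, 6)) + 0.05 * rng.normal(size=(200, 6))
X = (X - X.mean(axis=0)) / X.std(axis=0)   # standardize before training

encode, losses = train_autoencoder(X)
encoded_features = encode(X)               # reduced representation, shape (200, 2)
print(encoded_features.shape, losses[0], losses[-1])
```

The encoded columns could then be appended to, or substituted for, the original predictors. The "deep features" on the slide come from the hidden layers of a supervised network rather than an autoencoder, but they are extracted the same way: by reading out hidden-layer activations.
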
Model Architecture
Analytical dataset -> Base learners -> Super learner
Base learners:
- Gradient Boosting Machine: GBM1, GBM2, GBM3, GBM4, GBM5
- Distributed Random Forest: DRF1, DRF2, DRF3, DRF4, DRF5
- Generalized Linear Model: GLM1, GLM2, GLM3, GLM4, GLM5
- Deep Neural Nets (ANN): DL1, DL2, DL3, DL4, DL5

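The base learner / super learner arrangement is a form of stacking. The NumPy sketch below shows the general technique with out-of-fold predictions. It is an illustration only: the two base learners (least squares and k-nearest neighbours) are simple stand-ins for the GBM, DRF, GLM, and neural net models on the slide, and the 5-fold split and synthetic data are assumptions.

```python
import numpy as np

def fit_linear(X, y):
    """Ordinary least squares with an intercept; returns a predict function."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return lambda Z: np.column_stack([np.ones(len(Z)), Z]) @ coef

def fit_knn(X, y, k=5):
    """k-nearest-neighbour regressor; returns a predict function."""
    def predict(Z):
        dists = ((Z[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
        nearest = np.argsort(dists, axis=1)[:, :k]
        return y[nearest].mean(axis=1)
    return predict

def out_of_fold_predictions(fitters, X, y, n_folds=5, seed=0):
    """Predict each row with base models that never saw that row in training."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(X)), n_folds)
    oof = np.zeros((len(X), len(fitters)))
    for holdout in folds:
        train = np.setdiff1d(np.arange(len(X)), holdout)
        for j, fit in enumerate(fitters):
            oof[holdout, j] = fit(X[train], y[train])(X[holdout])
    return oof

def fit_stack(fitters, X, y):
    """Train the super learner on out-of-fold base predictions."""
    oof = out_of_fold_predictions(fitters, X, y)
    meta = fit_linear(oof, y)                      # super learner (linear here)
    base_models = [fit(X, y) for fit in fitters]   # refit base learners on all rows
    return lambda Z: meta(np.column_stack([m(Z) for m in base_models]))

# Synthetic regression problem with a linear and a non-linear component.
rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, size=(400, 2))
y = 2.0 * X[:, 0] + np.sin(3.0 * X[:, 1]) + 0.1 * rng.normal(size=400)
X_train, y_train, X_test, y_test = X[:300], y[:300], X[300:], y[300:]

stacked = fit_stack([fit_linear, fit_knn], X_train, y_train)
stack_mae = float(np.mean(np.abs(stacked(X_test) - y_test)))
baseline_mae = float(np.mean(np.abs(y_train.mean() - y_test)))
print(stack_mae, baseline_mae)
```

Training the super learner on out-of-fold predictions rather than in-sample predictions keeps it from simply rewarding whichever base learner overfits most.
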
Model Validation
Picking the right performance metric is important.

Metric | Formula | Pro | Con
R^2 | 1 - SS_res / SS_tot | % of the variance in the response explained by the predictors | Can be misleading
MAE | (1/n) * sum(abs(y_i - yhat_i)) | Outlier resistant | No sense of direction
RMSE | sqrt((1/n) * sum((y_i - yhat_i)^2)) | General-purpose metric | Outlier sensitive

The decision should be driven by the specific business use case.

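The three metrics can be computed with a few lines of standard-library Python. The sketch uses the textbook definitions; the sample values are made up for illustration.

```python
import math

def mae(actual, predicted):
    """Mean absolute error: average size of the errors, ignoring direction."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root mean squared error: squaring makes large errors count more."""
    return math.sqrt(
        sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    )

def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot

# Hypothetical values for illustration only.
actual = [3.0, 5.0, 2.0, 7.0]
predicted = [2.0, 5.0, 4.0, 7.0]
print(mae(actual, predicted))        # 0.75
print(rmse(actual, predicted))       # about 1.118
print(r_squared(actual, predicted))  # about 0.661
```

In this example the single error of size 2 pulls RMSE well above MAE, which is the outlier sensitivity noted in the table.
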
Model Validation: our case
[Figure: over-prediction vs. under-prediction]
Model stats:
- R^2: 0.82
- RMSE: 0.5963707
- MAE: 0.247357
The spike at 100% indicates cases where the model failed to predict any spend although there was some spend.

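One way to see why such cases land at exactly 100% is to express each error as a percentage of actual spend. The sketch below assumes that definition and a sign convention in which positive values mean under-prediction; the slide does not state how the chart's axis was computed.

```python
def percent_error(actual, predicted):
    """Signed error as a percentage of actual spend.

    Positive means under-prediction, negative means over-prediction
    (an assumed convention). Returns None when actual spend is zero,
    because the ratio is undefined.
    """
    if actual == 0:
        return None
    return 100.0 * (actual - predicted) / actual

# Hypothetical cases for illustration only.
missed_spend = percent_error(200.0, 0.0)       # model predicted no spend
under_predicted = percent_error(200.0, 150.0)
over_predicted = percent_error(200.0, 260.0)
print(missed_spend, under_predicted, over_predicted)  # 100.0 25.0 -30.0
```

Under this definition every customer with some actual spend and a predicted spend of zero maps to 100%, regardless of the spend amount, which would produce a spike like the one described.
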
Scalable Model Deployment using Microsoft Azure
Pipeline stages: Data collection -> Data Preparation -> Feature Engineering -> Model Building -> Output -> Endpoint dashboards
Components shown: SQL Database, Storage blob, Microsoft Power BI
*The pipeline was set up using Azure Data Factory

Summary of Learnings
- Model Evaluation: select the evaluation metric based on the use case
- Data preparation: preprocessing is important
- Feature Engineering: spending time on feature engineering leads to higher rewards
- Model Tuning: don't use grid search for large datasets
- Stacking: uncorrelated models make stacking useful
- Deep Learning: use deep learning for feature engineering
- Crowdsourcing: learn many models, not just one
- Benchmark: establish a benchmark model