11/21/2018 11:32 PM BRK3316 Operationalizing Microsoft Cognitive Toolkit and TensorFlow models with HDInsight Spark Mary Wahl Data Scientist, AI Enablement.

Presentation transcript:

11/21/2018 11:32 PM BRK3316 Operationalizing Microsoft Cognitive Toolkit and TensorFlow models with HDInsight Spark Mary Wahl Data Scientist, AI Enablement Artificial Intelligence & Research @ Microsoft © Microsoft Corporation. All rights reserved. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.

Common customer request: Train a DNN at scale on a huge pool of collected images… and apply it in real time to new images.

On further investigation: Very few of those images are labeled… and the customer would like the model’s predictions on the rest.

Machine Learning, Analytics, & Data Science Conference
Session Goals
- Introduce an example use case
- Explain methods for DNN operationalization with PySpark
  - Using Cognitive Toolkit (CNTK) and TensorFlow (TF) APIs
  - Using MMLSpark
- Highlight common and insidious errors
- Enable attendees to adapt the methods

Example use case: aerial image classification

Land use classification of aerial imagery
- Large, freely available, labeled datasets
  - Imagery: National Agriculture Imagery Program, every two years
  - Labels: National Land Cover Database, every five years (with delay)
- Common need in industry and government
  - Enforce regulations, collect taxes, geopolitical surveillance
  - Monitor crop performance, property value estimation, marketing
- Land use classes (image labels): Barren, Forested, Shrub, Cultivated, Grassland, Developed

Selecting training and validation data

Training method: transfer learning
- Adapts pretrained models for new tasks
  - Used AlexNet and 52-layer ResNet pretrained on ImageNet classification task
- Accommodates smaller training datasets
  - Avoids overfitting by retraining only part of the model
  - Used a balanced training set of 44k labeled images
- Lower computation burden
  - Performed retraining in under one hour on a single-GPU Windows Data Science Virtual Machine

Data readers offer huge benefits during training
- Minibatching
  - Makes efficient use of multiple cores
  - Improves gradient estimation
  - Faster convergence (potentially)
- Queuing
  - Pre-load the next minibatch while the GPU processes the current one
- Distributed training
  - Partition data between workers
- Transformations
  - Add diversity through random cropping/scaling/colorization
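The minibatching and queuing ideas above can be illustrated without any deep learning framework. The following is a minimal sketch, not the CNTK or TensorFlow reader implementation: `minibatches` and `prefetch` are hypothetical helper names, the file names are made up, and a background thread stands in for the reader that prepares the next minibatch while the current one is being processed.

```python
import queue
import threading


def minibatches(items, batch_size):
    """Yield consecutive lists of at most batch_size items."""
    batch = []
    for item in items:
        batch.append(item)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # final, possibly smaller, minibatch
        yield batch


def prefetch(batch_iter, depth=2):
    """Prepare upcoming minibatches on a background thread.

    While the consumer (e.g. a GPU training step) works on one minibatch,
    the producer thread is already loading up to `depth` more.
    """
    done = object()  # sentinel marking the end of the stream
    buffer = queue.Queue(maxsize=depth)

    def producer():
        for batch in batch_iter:
            buffer.put(batch)
        buffer.put(done)

    threading.Thread(target=producer, daemon=True).start()
    while True:
        batch = buffer.get()
        if batch is done:
            return
        yield batch


# Hypothetical file names standing in for a training set.
filenames = ["img_%03d.png" % i for i in range(10)]
batches = list(prefetch(minibatches(filenames, batch_size=4)))
```

The bounded queue is the design choice that matters here: it lets loading run ahead of processing without reading the whole dataset into memory.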

Most commonly used data readers
- Cognitive Toolkit (CNTK): “MAP file” lists filename and label for each image in the training set
  - Read by MinibatchSource
- TensorFlow: “TFRecords” are binary files containing images and labels
  - Read by TFRecordReader
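For reference, a CNTK MAP file is plain text with one tab-separated image path and integer class index per line. A minimal sketch of writing one follows. The class-to-index mapping, the helper name `write_map_file`, and the file paths are all hypothetical, chosen only to mirror the six land-use classes in this talk.

```python
import io

# Hypothetical mapping from land-use class name to integer label.
CLASS_INDEX = {
    "Barren": 0,
    "Cultivated": 1,
    "Developed": 2,
    "Forested": 3,
    "Grassland": 4,
    "Shrub": 5,
}


def write_map_file(labeled_paths, out):
    """Write one 'path<TAB>label_index' line per image to a text stream."""
    for path, class_name in labeled_paths:
        out.write("%s\t%d\n" % (path, CLASS_INDEX[class_name]))


# Hypothetical training images; a real run would list files on disk.
examples = [
    ("train/Forested/tile_0001.png", "Forested"),
    ("train/Developed/tile_0002.png", "Developed"),
]
buf = io.StringIO()  # stands in for open("map.txt", "w")
write_map_file(examples, buf)
map_text = buf.getvalue()
```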

Quick look: data preparation and use in training

Batch scoring with CNTK and TF models on HDInsight Spark

Motivation for operationalizing DNNs on Spark
- Reduces image data transfer latency
  - Cluster and images can be located on the same Azure Data Lake Store (HDFS)
- Even scoring with DNNs is a time-intensive task
  - Often 100s of milliseconds per image on CPU
- Split scoring task over arbitrarily many worker nodes
  - No interdependency -> “embarrassingly parallel” scoring is possible
- Familiar Python interface to Cognitive Toolkit/TensorFlow

Operationalization architecture on Azure
[Diagram: Azure HDInsight Spark, with Azure Data Lake Store (HDFS) or an Azure storage account]

Replicating image loading steps on Spark
- Can’t use the data readers that we used during training:
  - Cognitive Toolkit: MinibatchSource expects local file access to images listed in MAP files
  - TensorFlow: Can’t realistically write TFRecords for all files
- Alternative: match the data loading steps that each reader performed during training with custom code

Image pre-processing with OpenCV
- Color channels loaded in “BGR” order
  - Many other packages load images in RGB order
- Image data dimensions: “# color channels x width x height”
  - Many other packages load images with dimensions “width x height x # channels”
- Data type (float vs. int, precision) may also differ
- NB: some mistakes have a surprisingly small effect on prediction accuracy!
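The channel-order and dimension-order mismatches above can be made concrete with NumPy. This is a minimal sketch under stated assumptions: the helper name `to_channels_first_bgr` is hypothetical, and the target format (BGR, channels first, float32) is assumed rather than taken from the talk, so it should be replaced by whatever the training-time reader actually produced. Note that NumPy indexes image arrays rows first, i.e. height x width x channels.

```python
import numpy as np


def to_channels_first_bgr(image_rgb_hwc):
    """Convert an RGB, height x width x channels image into
    BGR, channels x height x width, float32."""
    bgr = image_rgb_hwc[:, :, ::-1]      # reverse channel axis: RGB -> BGR
    chw = np.transpose(bgr, (2, 0, 1))   # channels-last -> channels-first
    return np.ascontiguousarray(chw, dtype=np.float32)


# Tiny synthetic 2 x 3 image that is pure red in RGB order.
image = np.zeros((2, 3, 3), dtype=np.uint8)
image[:, :, 0] = 255

converted = to_channels_first_bgr(image)
# Red is now the last of the three channel planes.
```

Checking shape, dtype, and a known pixel value like this on a single image is a cheap way to catch the silent mismatches the slide warns about.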

Split scoring task appropriately among workers
- Divide images into n partitions:

    image_rdd = sc.binaryFiles('adl://account_name.azuredatalakestore.net/images/*.png', minPartitions=num_workers).coalesce(num_workers)

- Map partitions to workers:

    labeled_images = image_rdd.mapPartitions(image_scoring_func).collect()

- Workers access data through a tuple generator:

    def image_scoring_func(file_generator):
        for file in file_generator:
            # file is a two-tuple: [0] filename, [1] byte data
            ...
        return predicted_labels
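The per-partition scoring function can be exercised locally before it is handed to Spark, because `mapPartitions` simply passes the function an iterator of records and expects an iterable back. The sketch below is an illustration rather than the demo's code: `load_model` and its placeholder prediction rule are hypothetical stand-ins for loading a trained CNTK or TensorFlow model, and a plain Python iterator stands in for one partition of `sc.binaryFiles(...)` output.

```python
def load_model():
    """Hypothetical stand-in for loading a trained CNTK/TF model."""
    def predict(image_bytes):
        # Placeholder rule so the sketch runs without a real model.
        return "Developed" if len(image_bytes) % 2 == 0 else "Forested"
    return predict


def image_scoring_func(file_generator):
    """Score every (filename, byte data) pair in one partition."""
    predict = load_model()  # load once per partition, not once per image
    for filename, image_bytes in file_generator:
        yield filename, predict(image_bytes)


# Local stand-in for one partition: (filename, bytes) two-tuples.
partition = iter([("a.png", b"\x00\x01"), ("b.png", b"\x00\x01\x02")])
results = list(image_scoring_func(partition))
```

Writing the function as a generator keeps memory use flat within a partition, and loading the model outside the loop avoids paying the load cost for every image.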

Demo: Batch scoring on Azure HDInsight Spark

Results: Parallelization and processing time
- Measured time required to score an entire balanced test set of 11,760 images.
- From 38 minutes to <1 minute through parallelization (using CPU-only workers)
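As a rough sanity check on these figures, the arithmetic can be worked through directly. This sketch assumes the 38-minute figure is the single-worker baseline, which the slide implies but does not state; the variable names are illustrative only.

```python
num_images = 11760
baseline_minutes = 38          # assumed to be the single-worker time
parallel_minutes_bound = 1     # slide reports "<1 minute"

# Average scoring time per image on the baseline configuration.
seconds_per_image = baseline_minutes * 60 / num_images

# Lower bound on the speedup, since the parallel run took under a minute.
min_speedup = baseline_minutes / parallel_minutes_bound
```

That works out to roughly 0.19 seconds per image, consistent with the "100s of milliseconds per image on CPU" estimate given earlier in the talk, and to a speedup of more than 38x.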

Results: overall classification accuracy
- ~80% for both CNTK and TensorFlow

Operationalizing CNTK models with Microsoft Machine Learning for Apache Spark

Microsoft Machine Learning for Apache Spark (MMLSpark)
- Easily ingest and preprocess images from HDFS
  - Seamless integration with CNTK and OpenCV
- Featurize images and other inputs with pretrained DNNs
  - Bring your own model (BYOM) or use one of many pretrained CNTK models
  - Can use a GPU edge node to accelerate this process
- Train classifiers on featurized images
  - Fast form of transfer learning that does not require GPU compute

Demo: Training and scoring with DNNs using MMLSpark

Results: Identifying newly-developed regions
[Image panels: 2010, 2016]

Results: Predicting land use in Middlesex County, MA in 2016
- Most recent ground-truth labels are from 2011
- Red: developed; white: cultivated; green: all others (undeveloped)
- Come visit us at Microsoft’s NERD Center!

Where to learn more:
- End-to-end tutorial covering the aerial image classification use case, with sample data/code/models: aka.ms/aerialimageclassification
- Download MMLSpark and examples from: https://github.com/Azure/mmlspark
- You can reach me (Mary Wahl) at mawah@Microsoft.com

Please evaluate this session
- From your PC or tablet: visit MyIgnite https://myignite.microsoft.com/evaluations
- Phone: download and use the Microsoft Ignite mobile app https://aka.ms/ignite.mobileapp
- Your input is important!
