BigDL Deep Learning Library on HDInsight

Presentation transcript:

BigDL Deep Learning Library on HDInsight (session THR3040). Microsoft Ignite, September 2017. Xiaoyong Zhu, Microsoft; Sergey Ermolin, Intel. © Microsoft Corporation. All rights reserved. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.


BigDL within the Spark framework: end-to-end big data analytics with deep learning functionality, directly on Spark. BigDL is natively integrated with the big data (Hadoop/Spark) ecosystem: massively distributed and scale-out, sends compute to data, fault tolerance, elasticity, incremental scaling, dynamic resource sharing. https://software.intel.com/bigdl

BigDL features (component diagram). APIs: Python API, Scala API. Examples and documents. Example models: Seq2Seq, VGG, ResNet, LeNet, Inception. Training: SGD, Adagrad, cross entropy, distributed training. Layers: batch normalization, spatial convolution, ReLU, LRN, RNN, pooling, and 100+ other layers. Foundation: tensor library with MKL integration.

BigDL features: distributed deep learning applications (training, fine-tuning, and prediction) on Apache Spark. No changes to existing Hadoop/Spark clusters are needed. https://software.intel.com/bigdl

BigDL benefits: Lets you write deep learning applications as standard Spark programs. Runs on top of existing Spark or Hadoop/Hive clusters. Adds rich deep learning functionality to Apache Spark. Feature parity with Caffe and TensorFlow. High performance through Intel MKL and multi-threaded programming. Efficient scale-out with all-reduce communication on Spark. BigDL has been open source since 2016: https://github.com/intel-analytics/BigDL and https://software.intel.com/bigdl
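The "all-reduce communication" point can be illustrated with a toy sketch in plain Python. This is not BigDL's implementation; the function name `all_reduce_mean` and the two-phase layout (each worker reduces one slice of the gradient, then every worker gathers all slices) are assumptions chosen only to show the idea of averaging gradients without a single central aggregator.

```python
from typing import List


def all_reduce_mean(worker_grads: List[List[float]]) -> List[List[float]]:
    """Toy all-reduce: every worker ends up with the element-wise mean gradient."""
    n_workers = len(worker_grads)
    size = len(worker_grads[0])
    # Split the parameter index range into one contiguous slice per worker.
    bounds = [(w * size) // n_workers for w in range(n_workers + 1)]

    # Phase 1 (reduce-scatter): worker w averages only its own slice,
    # reading that slice from every worker's local gradient.
    reduced_slices = []
    for w in range(n_workers):
        lo, hi = bounds[w], bounds[w + 1]
        reduced_slices.append(
            [sum(g[i] for g in worker_grads) / n_workers for i in range(lo, hi)]
        )

    # Phase 2 (all-gather): every worker collects all reduced slices.
    full = [x for s in reduced_slices for x in s]
    return [list(full) for _ in range(n_workers)]


# Two hypothetical workers, each holding a local gradient of four parameters.
grads = [[1.0, 2.0, 3.0, 4.0], [3.0, 4.0, 5.0, 6.0]]
result = all_reduce_mean(grads)
print(result[0])
```

Because each worker is responsible for only one slice, the aggregation work is spread evenly across workers rather than concentrated on one node.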

BigDL can reuse and fine-tune models from other frameworks: it loads existing Caffe, Torch, and TensorFlow model files and saves them to storage as BigDL model files. This allows a transition from single-node to distributed application deployment, is useful for inference, allows minor model tuning, and allows model sharing between data scientists and production engineers. Scoring can be done outside of Spark, as a Java app. https://software.intel.com/bigdl

BigDL integration with Spark Streaming for runtime training and prediction. Data sources: HDFS/S3, Kafka, Flume, Kinesis, Twitter. Diagram components: Spark Streaming RDDs, BigDL model, Train, Predict, Evaluator, StreamWriter. https://software.intel.com/bigdl

Python API support: based on PySpark, the Python API in BigDL allows use of existing Python libraries: NumPy, SciPy, pandas, scikit-learn, Matplotlib. Install with: pip install bigdl. https://software.intel.com/bigdl

Jupyter notebook support: BigDL applications run directly in Jupyter notebooks. Share and reproduce: notebooks can be shared with others and are easy to reproduce and track. Rich content: text, images, videos, LaTeX, and JavaScript; code can also produce rich content. Rich toolbox: Apache Spark from Python, R, and Scala; pandas, scikit-learn, ggplot2, dplyr, etc. https://software.intel.com/bigdl

Visualization of the optimization process with TensorBoard: BigDL integrates with TensorBoard, a suite of web applications from Google for visualizing and understanding deep learning applications. https://software.intel.com/bigdl

HDInsight on Linux Overview

HDInsight (Linux) supports: Hive, Hive LLAP, and standard Hadoop (ETL, reporting, ad hoc queries, data mining and analysis, log analysis, data warehousing); Spark (real-time analysis, streaming analysis, machine learning, ETL, graph analysis, real-time SQL query); R Server (advanced analytics over big data, machine learning, statistical analysis); HBase and Phoenix (NoSQL storage with SQL-friendly interfaces through Phoenix, suitable for key-value stores or schema-changing logs); Storm (real-time streaming analysis); Kafka (high-throughput data ingestion engine).

Scale compute and storage independently. Compute: gateway nodes, head nodes, worker nodes, edge nodes, ZooKeeper nodes. Storage: Azure Blob Storage or Azure Data Lake Store.

Demo

Train a CNN model on the MNIST dataset: install BigDL on HDInsight (easy as 1-2-3); configure Spark settings; set up BigDL parameters; set up network topologies; run, train, and see results.

Set up HDInsight Cluster in a few steps

Monitor HDInsight Cluster via Ambari GUI

BigDL is easily installed and built (“Deploy to Azure”)

Spark Session configuration
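The slide image for this step is not in the transcript, so the following is only a hedged sketch of what a session configuration for a Jupyter notebook on HDInsight might look like. It assumes the notebook uses sparkmagic's `%%configure -f` magic with Livy session fields (`numExecutors`, `executorCores`, `conf`). The Spark properties are recalled from BigDL's bundled `spark-bigdl.conf` and should be checked against the BigDL version you install. The helper name `bigdl_session_settings` and the cluster sizes are hypothetical. The rounding step reflects BigDL's documented expectation that the batch size be a multiple of the total core count.

```python
import json


def bigdl_session_settings(num_executors, executor_cores, requested_batch_size):
    """Build a Livy-style session dict and a batch size that fits the cluster."""
    total_cores = num_executors * executor_cores
    # Round the requested batch size up to the next multiple of total cores.
    batch_size = -(-requested_batch_size // total_cores) * total_cores
    session = {
        "numExecutors": num_executors,
        "executorCores": executor_cores,
        "conf": {
            # Assumed values, recalled from BigDL's spark-bigdl.conf.
            "spark.shuffle.reduceLocality.enabled": "false",
            "spark.shuffle.blockTransferService": "nio",
            "spark.scheduler.minRegisteredResourcesRatio": "1.0",
            "spark.speculation": "false",
        },
    }
    return session, batch_size


# Hypothetical cluster: 4 executors with 8 cores each.
session, batch_size = bigdl_session_settings(4, 8, 100)
print("%%configure -f")
print(json.dumps(session, indent=2))
print("batch size:", batch_size)
```

The printed JSON is meant to be pasted into a notebook cell under `%%configure -f`; the BigDL jar location would also have to be added to the session, which is omitted here because the path depends on the installation.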

Network Layout
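The network diagram itself is missing from the transcript. As a worked example of how a small CNN's layer sizes follow from a 28x28 MNIST image, the sketch below walks through a LeNet-5 style stack. The specific layer sizes (5x5 convolutions with 6 and 12 feature maps, 2x2 max pooling) are an assumption based on the LeNet variant commonly used in MNIST tutorials, not a reading of the slide.

```python
def conv_out(size, kernel, stride=1, pad=0):
    """Output width of a convolution or pooling window along one dimension."""
    return (size + 2 * pad - kernel) // stride + 1


# (name, kernel, stride, output channels or None to keep the channel count)
layers = [
    ("conv1 5x5, 6 maps", 5, 1, 6),
    ("pool1 2x2", 2, 2, None),
    ("conv2 5x5, 12 maps", 5, 1, 12),
    ("pool2 2x2", 2, 2, None),
]

size, channels = 28, 1  # MNIST: one 28x28 grayscale image
shapes = []
for name, kernel, stride, out_channels in layers:
    size = conv_out(size, kernel, stride)
    if out_channels is not None:
        channels = out_channels
    shapes.append((name, channels, size, size))

# Number of inputs to the first fully connected layer after flattening.
flat = channels * size * size

for shape in shapes:
    print(shape)
print("flattened features:", flat)
```

The flattened size is the value the first fully connected layer must accept, so an arithmetic check like this catches mismatched layer dimensions before a training job is submitted to the cluster.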

To learn more about BigDL + HDInsight: github.com/intel-analytics/BigDL, software.intel.com/bigdl, https://blogs.msdn.microsoft.com/azuredatalake/2017/03/17/how-to-use-bigdl-on-apache-spark-for-azure-hdinsight/

Please evaluate this session. From your PC or tablet: visit MyIgnite https://myignite.microsoft.com/evaluations. Phone: download and use the Microsoft Ignite mobile app https://aka.ms/ignite.mobileapp. Your input is important!
