Microsoft Machine Learning & Data Science Summit

Presentation transcript:

Microsoft Machine Learning & Data Science Summit September 26 – 27 | Atlanta, GA

Big, fast and data-furious… with Spark
Maxim Lukiyanov, Senior Program Manager, Big Data, Microsoft

Session objectives and takeaways
Session objectives:
- Discover tools and techniques enabling interactive data analysis on Spark
- Explore interactive Spark, notebooks, job submission server, BI tools, developer tools, and Azure Cloud
- Discuss problems and solutions of resource management in Spark
Key takeaway 1: The productivity of data scientists is bound by the speed of the development cycle.
Key takeaway 2: The speed of the development cycle in big data projects can be kept high as long as the right tools and techniques are used.
© 2012 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.

What is your top concern for big data projects?

#1: Length of the development cycle

Length of development cycle
- Universal metric to track and improve
- Affects productivity
- Predicts project risk

Development phases
- Data exploration and experimentation
- Data sharing
- Development of production code
- Debugging

Interactive Spark on Azure
[Architecture diagram. Clients: Jupyter notebooks, IntelliJ/Eclipse, command line (SSH), BI tools (ODBC). Services: Livy server (REST), Thrift server. YARN queues hosting the Spark applications: Default Queue, Thrift Queue. Storage: local HDFS, Blob Storage, Data Lake Store.]
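The Livy server in the diagram accepts Spark work over REST. As a minimal sketch of what a client such as a notebook kernel might send, the snippet below builds (without sending) the two POST requests that start an interactive session and run a statement in it. The base URL and the session id are assumptions for illustration; a real cluster exposes Livy at its own address and returns the session id in the response to the first call.

```python
import json
import urllib.request

# Hypothetical endpoint: 8998 is Livy's default port, but a real cluster
# exposes Livy at its own cluster-specific URL, usually with authentication.
LIVY_URL = "http://localhost:8998"


def livy_request(path, payload):
    """Build (but do not send) a JSON POST request for the Livy REST API."""
    return urllib.request.Request(
        url=LIVY_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Step 1: ask Livy to start an interactive PySpark session.
create_session = livy_request("/sessions", {"kind": "pyspark"})

# Step 2: run a code snippet inside that session. Session id 0 is a
# placeholder; the real id is returned in the response to step 1.
run_statement = livy_request(
    "/sessions/0/statements",
    {"code": "sc.parallelize(range(1000)).count()"},
)

# Sending is left commented out because it needs a running Livy server:
# with urllib.request.urlopen(create_session) as resp:
#     print(json.load(resp))
```

Keeping request construction separate from sending makes the payloads easy to inspect before pointing the client at a live cluster.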

Components

Apache Spark
Interactive compute engine
- Interactive on small datasets
- Interactive on large datasets on large clusters with in-memory or SSD caching
- Built-in sampling
Upcoming in Spark 2.0
- Tungsten Phase 2 (3-10x speedup)
- Structured Streaming
Great momentum
- Active and large community
- Supported by all major big data vendors
- Fast release cadence

Evolution of big data
[Diagram; surviving label: Data Sources]

Spark on Azure Cloud (HDInsight)
Fully managed service
- 100% open source Apache Spark and Hadoop bits
- Latest releases of Spark
- Fully supported by Microsoft and Hortonworks
- 99.9% Azure Cloud SLA
- Certifications: PCI, ISO 27018, SOC, HIPAA, EU-MC
Tools for data exploration, experimentation and development
- Jupyter Notebooks (Scala, Python, automatic data visualizations)
- IntelliJ/Eclipse plugin (job submission, remote debugging)
- ODBC connector for Power BI, Tableau, Qlik, SAP, Excel, etc.

Demo: Components in action (Maxim Lukiyanov)

Resource management

Interactive Spark on Azure
[Architecture diagram, repeated from earlier in the deck]

YARN resource management
Dynamic resource allocation (Thrift)
- The Thrift server adds executors when processing SQL queries
- After a timeout, it shrinks back
Resource preemption (between queues)
- Thrift takes resources from other apps during activity, and vice versa
- When multiple apps are active, the resources are shared fairly
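The grow-and-shrink behavior described for the Thrift server corresponds to Spark's dynamic allocation settings. The sketch below collects the relevant configuration keys and renders them as `--conf` arguments of the kind accepted by spark-submit. The keys are standard Spark properties; the queue name and every value are illustrative assumptions, since the talk does not give concrete settings.

```python
# Illustrative values only; the talk does not state concrete settings.
thrift_conf = {
    # Hypothetical queue name for the Thrift server's Spark application.
    "spark.yarn.queue": "thrift",
    # Let Spark request and release executors based on workload.
    "spark.dynamicAllocation.enabled": "true",
    # External shuffle service, so executors can be removed without
    # losing their shuffle output.
    "spark.shuffle.service.enabled": "true",
    "spark.dynamicAllocation.minExecutors": "1",
    "spark.dynamicAllocation.maxExecutors": "20",
    # Idle executors are released after this timeout ("shrinks back").
    "spark.dynamicAllocation.executorIdleTimeout": "60s",
}


def to_submit_args(conf):
    """Render a config dict as a flat list of --conf key=value arguments."""
    args = []
    for key, value in sorted(conf.items()):
        args += ["--conf", f"{key}={value}"]
    return args


submit_args = to_submit_args(thrift_conf)
```

The resulting list can be appended to whatever command launches the Thrift server on a given cluster.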

YARN resource management: Limitations
Bugs
- Capacity scheduler + default resource calculator configuration works
- Dominant resource calculator breaks preemption logic
Limitations
- No resource preemption between applications
- No application sharing between notebooks in Livy
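As a rough sketch of the configuration reported as working (capacity scheduler with the default resource calculator, plus preemption between queues), the snippet below renders the relevant YARN properties in Hadoop's XML configuration format. The property names are standard Hadoop ones; the two queue names and the 50/50 capacity split are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# capacity-scheduler.xml: queue layout and resource calculator.
# Queue names and capacity percentages are illustrative assumptions.
capacity_scheduler = {
    "yarn.scheduler.capacity.resource-calculator":
        "org.apache.hadoop.yarn.util.resource.DefaultResourceCalculator",
    "yarn.scheduler.capacity.root.queues": "default,thrift",
    "yarn.scheduler.capacity.root.default.capacity": "50",
    "yarn.scheduler.capacity.root.thrift.capacity": "50",
    # Let either queue borrow the whole cluster while the other is idle.
    "yarn.scheduler.capacity.root.default.maximum-capacity": "100",
    "yarn.scheduler.capacity.root.thrift.maximum-capacity": "100",
}

# yarn-site.xml: select the capacity scheduler and enable the
# scheduling monitor that drives preemption between queues.
yarn_site = {
    "yarn.resourcemanager.scheduler.class":
        "org.apache.hadoop.yarn.server.resourcemanager"
        ".scheduler.capacity.CapacityScheduler",
    "yarn.resourcemanager.scheduler.monitor.enable": "true",
}


def to_hadoop_xml(props):
    """Render a dict as a Hadoop <configuration> XML string."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


capacity_scheduler_xml = to_hadoop_xml(capacity_scheduler)
yarn_site_xml = to_hadoop_xml(yarn_site)
```

On a managed cluster these properties are usually edited through the cluster's management UI instead of by hand-writing the XML files.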

Summary
Components
- Apache Spark
- Jupyter + sparkmagic kernel (or Zeppelin)
- Livy job server
- Apache YARN resource management using queues and preemption
- Columnar file formats (Parquet, ORC)
- IntelliJ/Eclipse + plugin for HDInsight
- [Non-OSS]
- BI tools: Power BI, Tableau, Qlik, SAP, Excel, etc.
- Azure Cloud
Techniques
- Sample, sample, sample
- CACHE TABLE (or auto-caching using Alluxio)
- Scale out on demand using elasticity of the cloud
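The sampling and CACHE TABLE techniques can be combined in a single Spark SQL statement. The helper below is a small sketch that only builds the statement text; it does not run it. The function name and the table names are hypothetical, while `CACHE TABLE ... AS SELECT` and `TABLESAMPLE (n PERCENT)` are existing Spark SQL syntax.

```python
def sample_and_cache_sql(table, percent, cached_name):
    """Build a Spark SQL statement that caches a percentage sample of a table.

    The table names passed in are the caller's; nothing here checks
    that they exist in any catalog.
    """
    if not 0 < percent <= 100:
        raise ValueError("percent must be in the range (0, 100]")
    return (
        f"CACHE TABLE {cached_name} AS "
        f"SELECT * FROM {table} TABLESAMPLE ({percent} PERCENT)"
    )


# Hypothetical table names, for illustration only.
statement = sample_and_cache_sql("events", 1, "events_sample")
```

Interactive queries can then run against the small cached table during exploration and be pointed at the full table once the logic is settled.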

In review: session objectives and takeaways
Session objectives:
- Discover tools and techniques enabling interactive data analysis on Spark
- Explore interactive Spark, notebooks, job submission server, BI tools, developer tools, and Azure Cloud
- Discuss problems and solutions of resource management in Spark
Key takeaway 1: The productivity of data scientists is bound by the speed of the development cycle.
Key takeaway 2: The speed of the development cycle can remain high even in big data projects as long as the right tools and techniques are used.

Related content
- SparkMagic kernel for Jupyter notebook: https://github.com/jupyter-incubator/sparkmagic
- Livy job server: https://github.com/cloudera/livy
- IntelliJ IDEA plug-in documentation: https://azure.microsoft.com/en-us/documentation/articles/hdinsight-apache-spark-intellij-tool-plugin/
- Azure Spark documentation: https://azure.microsoft.com/en-us/documentation/services/hdinsight/

Q & A
