Build interactive data analysis environments using Apache Spark

Presentation transcript:

Build interactive data analysis environments using Apache Spark. Microsoft 2016, 5/29/2018 4:13 PM, BRK3226. Maxim Lukiyanov, Senior Program Manager, Big Data. © 2016 Microsoft Corporation. All rights reserved. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.

Agenda: How it all fits together. Components: Apache Spark, notebooks, job submission server, BI tools, developer tools, Azure cloud. Resource management.

What is your top concern for big data projects?

#1: Length of development cycle.

Length of development cycle: a universal metric to track and improve. It affects productivity and predicts project risk.

Development phases: data exploration and experimentation, data sharing, development of production code, and debugging.

Interactive Spark on Azure [diagram]: clients (Jupyter notebooks, IntelliJ/Eclipse, the command line over SSH, and BI tools over ODBC) reach Spark applications running on YARN through the Livy server (REST) and the Thrift server. Applications run in a Default queue and a Thrift queue and read data from local HDFS, Blob Storage, and Data Lake Store.
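Notebooks and IDE plugins talk to the cluster through the Livy server's REST interface. The sketch below only builds the two central requests (create an interactive session, then run a statement in it) with the Python standard library; it does not send them. The `/sessions` and `/sessions/{id}/statements` endpoints and the `kind`, `queue`, and `code` fields are part of Livy's REST API, while the endpoint URL and the chosen queue are assumptions: Livy's default port is 8998, and on HDInsight it is typically reached through the cluster gateway with basic authentication instead.

```python
import json
import urllib.request

# Assumption: Livy on its default port on the head node. Behind a cluster
# gateway (e.g. .../livy on HDInsight) you would change this URL and add auth.
LIVY_URL = "http://localhost:8998"


def create_session_request(kind: str = "pyspark", queue: str = "default") -> urllib.request.Request:
    """Build POST /sessions, which starts an interactive Spark session in a YARN queue."""
    body = {"kind": kind, "queue": queue}
    return urllib.request.Request(
        url=f"{LIVY_URL}/sessions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_statement_request(session_id: int, code: str) -> urllib.request.Request:
    """Build POST /sessions/{id}/statements, which runs a code snippet in an existing session."""
    body = {"code": code}
    return urllib.request.Request(
        url=f"{LIVY_URL}/sessions/{session_id}/statements",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing either request to `urllib.request.urlopen` would send it; a client then polls `GET /sessions/{id}/statements/{statementId}` until the result is available. Because each Livy session is a separate Spark application, this also illustrates the limitation noted later: two notebooks get two applications rather than sharing one.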

Components

Apache Spark. Interactive compute engine: interactive on small datasets; interactive on large datasets on large clusters with in-memory or SSD caching; built-in sampling. New in Spark 2.0: Tungsten Phase 2 (3-10x speedup) and Structured Streaming. Great momentum: an active and large community, support from all major big data vendors, and a fast release cadence.
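Built-in sampling is what keeps exploration interactive on large data: run the query on a small fraction first. In PySpark this is `DataFrame.sample(withReplacement=False, fraction=0.01, seed=42)`. The pure-Python sketch below shows the idea behind sampling without replacement, where every row is kept independently with probability `fraction`; it is an illustration, not Spark's implementation, and the function name is hypothetical.

```python
import random
from typing import Iterable, List, TypeVar

T = TypeVar("T")


def bernoulli_sample(rows: Iterable[T], fraction: float, seed: int) -> List[T]:
    """Keep each row independently with probability `fraction` (Bernoulli sampling).

    The result size is only approximately fraction * len(rows), which is also
    true of fraction-based sampling in Spark.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be in [0, 1]")
    rng = random.Random(seed)  # a fixed seed makes the sample reproducible
    return [row for row in rows if rng.random() < fraction]
```

Fixing the seed matters during exploration: rerunning a notebook cell with the same seed yields the same sample, so results stay comparable while the code changes.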

Evolution of big data [diagram; only the label "Data Sources" survives]

Spark on Azure Cloud (HDInsight). Fully managed service: 100% open source Apache Spark and Hadoop bits; latest releases of Spark (2.0 is coming later this week); fully supported by Microsoft and Hortonworks; 99.9% Azure Cloud SLA; certifications: PCI, ISO 27018, SOC, HIPAA, EU-MC. Tools for data exploration, experimentation and development: Jupyter notebooks (Scala, Python, automatic data visualizations); IntelliJ/Eclipse plugin (job submission, remote debugging); ODBC connector for Power BI, Tableau, Qlik, SAP, Excel, etc.

Demo: Components in action Maxim Lukiyanov

Resource Management


YARN resource management. Dynamic resource allocation (Thrift): the Thrift server adds executors while it processes SQL queries and shrinks back after a timeout. Resource preemption (between queues): the Thrift server takes resources from other applications while it is active, and vice versa; when multiple applications are active, resources are shared fairly.
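The grow-and-shrink behavior of the Thrift server comes from Spark's dynamic allocation. The sketch below collects the relevant settings and renders them as `--conf` flags for `sbin/start-thriftserver.sh`, which accepts the same options as `spark-submit`. The property names are standard Spark settings; the values and the `thrift` queue name are assumptions chosen for illustration, and on a managed cluster these settings are usually changed through the cluster's management UI (Ambari on HDInsight) rather than on the command line.

```python
from typing import Dict, List

# Illustrative values only; size min/max executors and the timeout to your cluster.
THRIFT_DYNAMIC_ALLOCATION: Dict[str, str] = {
    "spark.dynamicAllocation.enabled": "true",
    # On YARN (Spark 2.x), dynamic allocation needs the external shuffle service
    # so that shuffle files stay available after idle executors are released.
    "spark.shuffle.service.enabled": "true",
    "spark.dynamicAllocation.minExecutors": "1",
    "spark.dynamicAllocation.maxExecutors": "20",
    # Executors idle for this long are removed: the "shrinks back" step.
    "spark.dynamicAllocation.executorIdleTimeout": "120s",
    # Hypothetical queue name matching the Thrift queue in the diagram.
    "spark.yarn.queue": "thrift",
}


def thriftserver_args(conf: Dict[str, str]) -> List[str]:
    """Render settings as spark-submit style flags for start-thriftserver.sh."""
    args = ["--master", "yarn"]
    for key, value in sorted(conf.items()):
        args += ["--conf", f"{key}={value}"]
    return args
```

For example, `" ".join(["./sbin/start-thriftserver.sh"] + thriftserver_args(THRIFT_DYNAMIC_ALLOCATION))` produces the full command line. Placing the Thrift server in its own queue is what lets YARN preemption move capacity between it and the notebooks' default queue.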

YARN resource management: limitations. Bugs: the Capacity Scheduler with the default resource calculator (DefaultResourceCalculator) works, but the dominant resource calculator (DominantResourceCalculator) breaks the preemption logic. Limitations: no resource preemption between applications, and no application sharing between notebooks in Livy.
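The working combination above (Capacity Scheduler, default resource calculator, preemption between queues) can be sketched as Hadoop configuration. The snippet renders two property maps in the `<configuration><property>` layout used by `yarn-site.xml` and `capacity-scheduler.xml`. The property and class names are standard Hadoop YARN settings, but the queue names and the 50/50 capacity split are assumptions, and it is worth checking the names against the Hadoop version on your cluster.

```python
import xml.etree.ElementTree as ET
from typing import Dict

# yarn-site.xml: use the Capacity Scheduler and enable the preemption monitor.
YARN_SITE: Dict[str, str] = {
    "yarn.resourcemanager.scheduler.class":
        "org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler",
    "yarn.resourcemanager.scheduler.monitor.enable": "true",
    "yarn.resourcemanager.scheduler.monitor.policies":
        "org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy",
}

# capacity-scheduler.xml: the memory-only DefaultResourceCalculator plus two
# queues. Queue names and capacities are illustrative assumptions.
CAPACITY_SCHEDULER: Dict[str, str] = {
    "yarn.scheduler.capacity.resource-calculator":
        "org.apache.hadoop.yarn.util.resource.DefaultResourceCalculator",
    "yarn.scheduler.capacity.root.queues": "default,thrift",
    "yarn.scheduler.capacity.root.default.capacity": "50",
    "yarn.scheduler.capacity.root.thrift.capacity": "50",
    # Each queue may grow into idle capacity; preemption reclaims it on demand.
    "yarn.scheduler.capacity.root.default.maximum-capacity": "100",
    "yarn.scheduler.capacity.root.thrift.maximum-capacity": "100",
}


def to_hadoop_xml(props: Dict[str, str]) -> str:
    """Render a property map as a Hadoop <configuration> document."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")
```

Setting `maximum-capacity` above `capacity` is what makes preemption meaningful: an idle queue's share can be borrowed, and the preemption policy takes it back when the owning queue becomes active again.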

Summary. Components: Apache Spark; Jupyter + sparkmagic kernel (or Zeppelin); Livy job server; Apache YARN resource management using queues and preemption; columnar file formats (Parquet, ORC); IntelliJ IDEA + plugin for HDInsight; [non-OSS] BI tools: Power BI, Tableau, Qlik, SAP, Excel, etc.; Azure cloud. Techniques: sample, sample, sample; CACHE TABLE (or auto-caching using Alluxio); scale out on demand using the elasticity of the cloud.
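Columnar formats help interactive queries because a query that touches one column only has to read that column. The toy model below stores the same small table in a row layout and in a columnar layout and counts how many values each layout reads to compute an average fare. The column names and data are hypothetical, and real Parquet and ORC files add compression, encodings, and per-column statistics on top of this basic idea.

```python
from typing import Dict, List, Tuple

# Hypothetical taxi-style table: (trip_id, pickup, fare).
ROWS: List[Tuple[int, str, float]] = [
    (1, "JFK", 12.5),
    (2, "LGA", 8.0),
    (3, "EWR", 30.25),
]
COLUMNS = ("trip_id", "pickup", "fare")

# Row layout: the values of one record are stored together.
row_store: List[Tuple[int, str, float]] = list(ROWS)

# Columnar layout: the values of one column are stored together.
column_store: Dict[str, list] = {
    name: [row[i] for row in ROWS] for i, name in enumerate(COLUMNS)
}


def avg_fare_row_store(rows: List[Tuple[int, str, float]]) -> Tuple[float, int]:
    """Average fare from the row layout; every value of every record is read."""
    total, values_read = 0.0, 0
    for row in rows:
        values_read += len(row)  # the whole record is read to reach one field
        total += row[2]
    return total / len(rows), values_read


def avg_fare_column_store(cols: Dict[str, list]) -> Tuple[float, int]:
    """Average fare from the columnar layout; only the 'fare' column is read."""
    fares = cols["fare"]
    return sum(fares) / len(fares), len(fares)
```

Both functions return the same average, but the row layout reads 9 values while the columnar layout reads 3; the ratio grows with the number of columns a table has and a query ignores.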

Resources: SparkMagic kernel for Jupyter Notebook: https://github.com/jupyter-incubator/sparkmagic. Livy job server: https://github.com/cloudera/livy. IntelliJ IDEA plug-in documentation: https://azure.microsoft.com/en-us/documentation/articles/hdinsight-apache-spark-intellij-tool-plugin/. NYTaxi data science notebooks: https://azure.microsoft.com/en-us/documentation/articles/machine-learning-data-science-spark-overview/.

Q & A Maxim Lukiyanov

Free IT Pro resources to advance your career in cloud technology (Microsoft Ignite 2016). Plan your career path: Microsoft IT Pro Career Center, www.microsoft.com/itprocareercenter. Cloud role mapping; expert advice on skills needed; self-paced curriculum by cloud role; $300 Azure credits and extended trials; Pluralsight 3-month subscription (10 courses); phone support incident; weekly short videos and insights from Microsoft’s leaders and engineers; connect with a community of peers and Microsoft experts. Get started with Azure: Microsoft IT Pro Cloud Essentials, www.microsoft.com/itprocloudessentials. Demos and how-to videos: Microsoft Mechanics, www.microsoft.com/mechanics. Connect with peers and experts: Microsoft Tech Community, https://techcommunity.microsoft.com.

Please evaluate this session. Your feedback is important to us! From your PC or tablet, visit MyIgnite at http://myignite.microsoft.com. From your phone, download and use the Ignite Mobile App by visiting https://aka.ms/ignite.mobileapp.
