Data Platform and Analytics Foundational Training

1 Data Platform and Analytics Foundational Training
Microsoft C+E Technology Training. Solution area: Data Analytics. Solution: Big Data. Technology: Apache Spark. [Speaker Name]

2 Apache Spark: A unified framework
8/23/2018 5:54 PM Apache Spark: A unified framework. A unified, open source, parallel data processing framework for big data analytics. Its libraries, Spark SQL (interactive queries), Spark Streaming (stream processing), Spark MLlib (machine learning), and GraphX (graph computation), all run on the Spark core engine, which can be scheduled by YARN, Mesos, or the standalone scheduler.
© 2013 Microsoft Corporation. All rights reserved. Microsoft, Windows, and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.

3 Apache Spark benefits
Apache Spark benefits: performance, developer productivity, a unified engine, and its ecosystem.

4 Advantages of a unified platform
Advantages of a unified platform: in many pipelines, exchanging data between separate engines is the dominant cost. In the diagram, input streams of events, Spark Streaming, machine learning, Spark SQL, and a NoSQL DB are connected within a single Spark pipeline, so the stages share data without crossing engine boundaries.

5 Spark integrates well with Hadoop
Spark integrates well with Hadoop. Primary resource managers: Hadoop 1.0+ or Hadoop YARN. Alternative resource managers: Mesos or Spark's own (standalone) resource manager.
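As a sketch of how the choice of resource manager shows up when submitting a job, the helper below builds the value passed to spark-submit's --master option for each manager named on the slide. The URL formats (yarn, spark://HOST:7077 for the standalone manager, mesos://HOST:5050) follow Spark's documented master URL conventions; the function name master_url, the host head-node, and the script my_app.py are hypothetical, and Hadoop 1.0 deployments are not covered.

```python
from typing import Optional

# Default ports of Spark's documented master URL formats
# (standalone master: 7077, Mesos master: 5050).
DEFAULT_PORTS = {"standalone": 7077, "mesos": 5050}


def master_url(manager: str, host: Optional[str] = None, port: Optional[int] = None) -> str:
    """Hypothetical helper: build the value for `spark-submit --master`."""
    if manager == "yarn":
        # On YARN the cluster location comes from the Hadoop configuration
        # (HADOOP_CONF_DIR or YARN_CONF_DIR), so the URL is just "yarn".
        return "yarn"
    if manager == "local":
        # Run driver and executors in one JVM using all available cores.
        return "local[*]"
    if manager in DEFAULT_PORTS:
        if host is None:
            raise ValueError(f"{manager} requires a master host")
        scheme = "spark" if manager == "standalone" else "mesos"
        return f"{scheme}://{host}:{port or DEFAULT_PORTS[manager]}"
    raise ValueError(f"unknown resource manager: {manager}")


if __name__ == "__main__":
    # Print the command line a user might run for each manager (hypothetical host/script).
    for mgr, host in [("yarn", None), ("standalone", "head-node"), ("mesos", "head-node")]:
        print(f"spark-submit --master {master_url(mgr, host)} my_app.py")
```

On YARN the URL carries no host because Spark locates the ResourceManager through the Hadoop configuration files, which is part of what makes Spark easy to drop into an existing Hadoop cluster.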

6 Faster data, faster results
[Chart: logistic regression running time in seconds, Hadoop vs. Spark (the Spark bar is labeled 0.9); logistic regression on a 100-node cluster with 100 GB of data.]
Spark was a 2014 Sort Benchmark winner, sorting 3x faster than the 2013 winner (Hadoop). Spark is fast not just for in-memory computation but for on-disk computation too. tinyurl.com/spark-sort

7 What makes Spark fast? Data sharing between steps of a job
What makes Spark fast? Data sharing between steps of a job. In traditional MapReduce, each step reads its input from HDFS and writes its output back to HDFS, so Step 2 can start only from what Step 1 wrote to disk. In Spark, the job reads from and writes to HDFS, while intermediate results pass from one step to the next in memory.
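A toy model of the difference, assuming local JSON files as a stand-in for HDFS: the MapReduce-style runner materializes every step's output to disk and reads it back for the next step, while the in-memory runner hands each step's result directly to the next. The function names run_mapreduce_style and run_in_memory are hypothetical, and neither is how the real frameworks are implemented; real Spark jobs also write shuffle data to local disk, which this sketch ignores.

```python
import json
import os
import tempfile
from typing import Callable, List

Step = Callable[[List[int]], List[int]]


def run_mapreduce_style(data: List[int], steps: List[Step], workdir: str) -> List[int]:
    """Each step reads its input from disk and writes its output back to disk."""
    path = os.path.join(workdir, "input.json")
    with open(path, "w") as f:
        json.dump(data, f)
    for i, step in enumerate(steps):
        with open(path) as f:  # read the previous step's output from "HDFS"
            records = json.load(f)
        path = os.path.join(workdir, f"step{i + 1}.json")
        with open(path, "w") as f:  # materialize this step's output
            json.dump(step(records), f)
    with open(path) as f:
        return json.load(f)


def run_in_memory(data: List[int], steps: List[Step]) -> List[int]:
    """Intermediate results are handed from step to step in memory."""
    records = data
    for step in steps:
        records = step(records)
    return records


if __name__ == "__main__":
    steps: List[Step] = [
        lambda xs: [x * 2 for x in xs],          # Step 1: transform
        lambda xs: [x for x in xs if x % 3 == 0],  # Step 2: filter
    ]
    with tempfile.TemporaryDirectory() as workdir:
        print(run_mapreduce_style(list(range(10)), steps, workdir), sorted(os.listdir(workdir)))
    print(run_in_memory(list(range(10)), steps))
```

Both runners return the same records; only the disk traffic differs, with one intermediate file per step plus the initial input in the MapReduce-style version.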

8 Spark cluster architecture
Spark cluster architecture: a driver program holding a SparkContext, a cluster manager, and worker nodes that run tasks and keep a cache. The driver runs the user's main function and launches the various parallel operations on the worker nodes, then collects the results of those operations. Worker nodes read and write data from and to HDFS, and also cache transformed data in memory as RDDs.
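The roles on this slide can be sketched with a thread pool standing in for the cluster. In this hypothetical model (ToyDriver, ToyWorker, and sum_of_map are illustrative names, not Spark classes), the driver splits the data into partitions, schedules one task per partition on a worker, and combines the partial results; each worker caches its transformed partitions in memory so a second action reuses them instead of recomputing. Caching is keyed by a caller-chosen dataset name, a simplification of how Spark tracks persisted RDDs.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Sequence, Tuple


class ToyWorker:
    """Hypothetical worker node: runs tasks and keeps transformed partitions in memory."""

    def __init__(self) -> None:
        self.cache: Dict[Tuple[str, int], List[int]] = {}
        self.computed = 0  # how many partitions this worker actually transformed
        self._lock = threading.Lock()

    def run_task(self, name: str, index: int, partition: List[int],
                 fn: Callable[[int], int]) -> int:
        key = (name, index)
        with self._lock:
            if key not in self.cache:  # compute once, then reuse from memory
                self.cache[key] = [fn(x) for x in partition]
                self.computed += 1
            records = self.cache[key]
        return sum(records)  # partial result sent back to the driver


class ToyDriver:
    """Hypothetical driver: splits data, schedules one task per partition, collects results."""

    def __init__(self, num_workers: int) -> None:
        self.workers = [ToyWorker() for _ in range(num_workers)]

    def sum_of_map(self, name: str, data: Sequence[int], fn: Callable[[int], int],
                   num_partitions: int) -> int:
        partitions = [list(data[i::num_partitions]) for i in range(num_partitions)]
        with ThreadPoolExecutor(max_workers=len(self.workers)) as pool:
            futures = [
                # Partition i always goes to the same worker, so its cache can be reused.
                pool.submit(self.workers[i % len(self.workers)].run_task, name, i, part, fn)
                for i, part in enumerate(partitions)
            ]
            return sum(f.result() for f in futures)  # driver combines partial results


if __name__ == "__main__":
    driver = ToyDriver(num_workers=2)
    for _ in range(2):  # two actions over the same cached dataset
        print(driver.sum_of_map("squares", range(1, 11), lambda x: x * x, num_partitions=4))
    print("partitions transformed:", sum(w.computed for w in driver.workers))
```

Running the same action twice returns 385 both times, and the workers transform each of the four partitions only once.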

9 Cluster
[Diagram: Spark on a cluster. Labels: Browser, Zeppelin, Jupyter, Spark submit, Gateway; head node running the Spark master with App 0, App 1, and App 2; Spark driver with Spark context and RDD; worker nodes 1 to 4, each running a worker that executes jobs and tasks.]

10 Use Cases

11 Apache Spark use cases
Apache Spark use cases: high-performance batch computation, interactive analytics, machine learning, real-time stream processing, and data integration and ETL.

12 Azure HDInsight supports Spark
Microsoft delivers interactive analytics on Big Data with Azure HDInsight

13 Power BI supports Spark
Power BI includes an out-of-the-box connector for Spark, enabling users to create and share interactive reports and dashboards on any device.

14 © 2016 Microsoft Corporation. All rights reserved
© 2016 Microsoft Corporation. All rights reserved. Microsoft, Windows, Microsoft Azure, and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION

