Published by Osborne Green. Modified over 9 years ago.
1
UNIT 2: BigData Analytics with Spark and Spark Platforms
Shelly Garion, IBM Research -- Haifa
© 2015 IBM Corporation
2
Outline
Map/Reduce
Scala
Spark Core API
Transformations and Actions
Spark Platforms:
– MLlib: Machine Learning
– GraphX: Graph Processing
– SQL
– Streaming
What's new?
3
How to Analyze BigData?
4
Basic Example: Word Count (Spark & Python)
Source: Holden Karau, Making interactive BigData applications fast and easy, Spark Workshop April 2014, http://stanford.edu/~rezab/sparkworkshop/
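The code shown on this slide did not survive in the transcript. As a rough stand-in, here is a minimal sketch of the word-count idea in plain Python (standard library only, not the Spark API): a "map" step emits a (word, 1) pair per word and a "reduce" step sums the pairs that share a key. In PySpark the same pipeline is commonly expressed with `flatMap`, `map`, and `reduceByKey` on an RDD. The function name `word_count` and the sample lines are illustrative assumptions.

```python
from functools import reduce


def word_count(lines):
    """Count words using an explicit map phase and reduce phase."""
    # Map phase: split every line into words and emit one (word, 1) pair per word.
    pairs = [(word, 1) for line in lines for word in line.split()]

    # Reduce phase: fold the pairs into a dict, summing counts per word.
    def add_pair(counts, pair):
        word, n = pair
        counts[word] = counts.get(word, 0) + n
        return counts

    return reduce(add_pair, pairs, {})


# Hypothetical input, standing in for lines read from a text file.
counts = word_count(["to be or not to be", "to do"])
print(counts)
```

On a cluster the pairs would be partitioned by key so that each reducer sums only its own words; the single-process fold above skips that step.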
5
Basic Example: Word Count (Spark & Scala)
Source: Holden Karau, Making interactive BigData applications fast and easy, Spark Workshop April 2014, http://stanford.edu/~rezab/sparkworkshop/
6
Scala
Spark was originally written in Scala
– Java and Python APIs were added later
Scala: high-level language for the JVM
– Object oriented
– Functional programming
– Favors immutable data
– Inspired by criticism of the shortcomings of Java
Static types
– Comparable in speed to Java
– Type inference saves us from having to write explicit types most of the time
Interoperates with Java
– Can use any Java class
– Can be called from Java code
7
Scala vs. Java
Source: Holden Karau, Making interactive BigData applications fast and easy, Spark Workshop April 2014, http://stanford.edu/~rezab/sparkworkshop/
8
Spark
Source: Holden Karau, Making interactive BigData applications fast and easy, Spark Workshop April 2014, http://stanford.edu/~rezab/sparkworkshop/
9
Spark & Scala: Creating RDD
or SoftLayer object store
Source: Holden Karau, Making interactive BigData applications fast and easy, Spark Workshop April 2014, http://stanford.edu/~rezab/sparkworkshop/
10
Spark & Scala: Basic Transformations
Source: Holden Karau, Making interactive BigData applications fast and easy, Spark Workshop April 2014, http://stanford.edu/~rezab/sparkworkshop/
11
Spark & Scala: Basic Actions
Source: Holden Karau, Making interactive BigData applications fast and easy, Spark Workshop April 2014, http://stanford.edu/~rezab/sparkworkshop/
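In Spark, transformations such as `map` and `filter` are lazy: they only describe a computation, and nothing runs until an action such as `collect` or `count` asks for a result. The sketch below imitates that split with Python's built-in lazy iterators. It is an analogy in plain Python, not Spark code, and the names `square` and `calls` are illustrative assumptions.

```python
calls = []  # records every element the mapped function actually touches


def square(x):
    calls.append(x)
    return x * x


numbers = range(1, 6)

# "Transformations": build a lazy pipeline; nothing is computed yet.
squared = map(square, numbers)
even_squares = filter(lambda v: v % 2 == 0, squared)
calls_before_action = len(calls)

# "Action": materializing the result forces the whole pipeline to run.
result = list(even_squares)
calls_after_action = len(calls)

print(calls_before_action, calls_after_action, result)
```

One difference worth keeping in mind: a Python iterator is consumed after one pass, whereas an RDD can be recomputed or cached and reused across several actions.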
12
Spark & Scala: Key-Value Operations
Source: Holden Karau, Making interactive BigData applications fast and easy, Spark Workshop April 2014, http://stanford.edu/~rezab/sparkworkshop/
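The slide's examples are not preserved in this transcript. To give a feel for what key-value operations do, the following plain-Python sketch imitates the semantics of three common pair-RDD operations (`reduceByKey`, `groupByKey`, `join`) on in-memory lists of (key, value) tuples. The helper names and the sample data are assumptions made for illustration; this is not the Spark API.

```python
from collections import defaultdict


def reduce_by_key(pairs, func):
    """Combine all values that share a key using a binary function."""
    out = {}
    for key, value in pairs:
        out[key] = func(out[key], value) if key in out else value
    return out


def group_by_key(pairs):
    """Collect all values that share a key into a list."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def join(left, right):
    """Inner join: pair up values from both sides that share a key."""
    right_groups = group_by_key(right)
    return [(k, (lv, rv)) for k, lv in left for rv in right_groups.get(k, [])]


# Hypothetical data: page visits and page titles keyed by URL.
visits = [("index.html", 1), ("about.html", 1), ("index.html", 1)]
titles = [("index.html", "Home"), ("about.html", "About")]

totals = reduce_by_key(visits, lambda a, b: a + b)
grouped = group_by_key(visits)
joined = join(visits, titles)
print(totals, grouped, joined)
```

When only an aggregate per key is needed, the reduce-style helper avoids building the full list of values that the group-style helper keeps in memory.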
13
Example: Spark Core API
Source: Aaron Davidson, A deeper understanding of Spark internals, Spark Summit July 2014, https://spark-summit.org/2014/
14
Example: Spark Core API
Source: Aaron Davidson, A deeper understanding of Spark internals, Spark Summit July 2014, https://spark-summit.org/2014/
15
Example: Spark Core API
Source: Aaron Davidson, A deeper understanding of Spark internals, Spark Summit July 2014, https://spark-summit.org/2014/
16
Example: Spark Core API
Better implementation:
Source: Aaron Davidson, A deeper understanding of Spark internals, Spark Summit July 2014, https://spark-summit.org/2014/
17
Example: PageRank
How to implement the PageRank algorithm using Map/Reduce?
Source: Hossein Falaki, Numerical Computing with Spark, Spark Workshop April 2014, http://stanford.edu/~rezab/sparkworkshop/
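One common way to answer the question above is to treat each iteration as a map step followed by a reduce step: every page "maps" its current rank into equal contributions to the pages it links to, and the contributions are then "reduced" by summing them per destination. The sketch below does this in plain Python on an in-memory dict. The damping factor 0.85, the update rule `(1 - d) + d * sum`, the iteration count, and the tiny three-page graph are assumptions chosen for illustration, not values taken from the slide.

```python
def pagerank(links, iterations=50, damping=0.85):
    """links maps each page to the list of pages it links to."""
    pages = set(links) | {dst for dsts in links.values() for dst in dsts}
    ranks = {page: 1.0 for page in pages}
    for _ in range(iterations):
        # Map: each page sends rank / out_degree to every page it links to.
        contributions = [
            (dst, ranks[src] / len(dsts))
            for src, dsts in links.items()
            for dst in dsts
        ]
        # Reduce: sum the contributions received by each destination page.
        received = {page: 0.0 for page in pages}
        for dst, contrib in contributions:
            received[dst] += contrib
        # Apply damping to obtain the ranks for the next iteration.
        ranks = {p: (1 - damping) + damping * received[p] for p in pages}
    return ranks


# Hypothetical graph: a links to b and c, b links to c, c links to a.
links = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
ranks = pagerank(links)
print(sorted(ranks.items()))
```

Pages without outgoing links are not redistributed in this sketch, so their rank mass would be lost; the sample graph avoids that case.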
18
Spark Platform
Source: Patrick Wendell, Big Data Processing, Spark Workshop April 2014, http://stanford.edu/~rezab/sparkworkshop/
19
Spark Platform: GraphX
Source: Patrick Wendell, Big Data Processing, Spark Workshop April 2014, http://stanford.edu/~rezab/sparkworkshop/
20
Spark Platform: GraphX Example: PageRank
PageRank is implemented using Pregel graph processing
21
Spark Platform: MLlib
Source: Patrick Wendell, Big Data Processing, Spark Workshop April 2014, http://stanford.edu/~rezab/sparkworkshop/
22
Spark Platform: MLlib Example: K-Means Clustering
Goal: Segment tweets into clusters by geolocation using Spark MLlib K-means clustering
Source: https://chimpler.wordpress.com/2014/07/11/segmenting-audience-with-kmeans-and-voronoi-diagram-using-spark-and-mllib/
23
Spark Platform: MLlib Example: K-Means Clustering
Source: https://chimpler.wordpress.com/2014/07/11/segmenting-audience-with-kmeans-and-voronoi-diagram-using-spark-and-mllib/
24
Spark Platform: MLlib Example: K-Means Clustering
Source: https://chimpler.wordpress.com/2014/07/11/segmenting-audience-with-kmeans-and-voronoi-diagram-using-spark-and-mllib/
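The K-means slides above rely on code and figures that are not preserved here. As a rough illustration of the clustering step itself, the following plain-Python sketch runs the classic assign-then-recompute loop (Lloyd's algorithm) on a handful of 2-D points standing in for (longitude, latitude) pairs. It does not use MLlib. The coordinates are made up, and seeding the centers with the first `k` points is a simplification chosen to keep the run reproducible; real implementations usually initialize the centers more carefully.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def kmeans(points, k, iterations=10):
    # Simplified initialization (assumption): use the first k points as centers.
    centers = list(points[:k])
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: sq_dist(p, centers[i]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its cluster
        # (an empty cluster keeps its previous center).
        centers = [mean(c) if c else centers[i] for i, c in enumerate(clusters)]
    return centers


# Hypothetical (longitude, latitude) pairs forming two well-separated groups.
points = [
    (0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
    (10.0, 10.0), (10.0, 11.0), (11.0, 10.0),
]
centers = sorted(kmeans(points, k=2))
print(centers)
```

Treating longitude and latitude as flat Euclidean coordinates is itself an approximation that is only reasonable over small areas.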
25
Spark Platform: Streaming
Source: Patrick Wendell, Big Data Processing, Spark Workshop April 2014, http://stanford.edu/~rezab/sparkworkshop/
26
Spark Platform: Streaming Example
27
Spark Platform: SQL
Source: Patrick Wendell, Big Data Processing, Spark Workshop April 2014, http://stanford.edu/~rezab/sparkworkshop/
28
Spark Platform: SQL & MLlib Example
// SVM using Stochastic Gradient Descent
Source: Xiangrui Meng, MLlib: scalable machine learning on Spark, Spark Workshop April 2014, http://stanford.edu/~rezab/sparkworkshop/
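Only the comment line of the slide's code survives, so the sketch below illustrates the named technique in isolation rather than the slide's actual program: a linear SVM trained by per-example gradient steps on the L2-regularized hinge loss, in plain Python without MLlib or Spark SQL. The learning rate, regularization strength, epoch count, and toy data are assumptions. Examples are visited in a fixed order to keep the run reproducible, whereas a typical stochastic gradient descent run would shuffle or sample them.

```python
def train_linear_svm(data, epochs=20, lr=0.1, reg=0.01):
    """data is a list of (features, label) pairs with labels in {-1, +1}."""
    dim = len(data[0][0])
    w = [0.0] * dim
    b = 0.0
    for _ in range(epochs):
        for x, y in data:
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            # Gradient step for the L2 regularizer: shrink the weights slightly.
            w = [wi * (1 - lr * reg) for wi in w]
            # Hinge-loss subgradient: only examples inside the margin contribute.
            if margin < 1:
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b


def predict(w, b, x):
    """Classify a point by the sign of the decision function."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if score >= 0 else -1


# Hypothetical, linearly separable training set.
data = [
    ((2.0, 2.0), 1), ((3.0, 3.0), 1), ((2.0, 3.0), 1),
    ((-2.0, -2.0), -1), ((-3.0, -3.0), -1), ((-2.0, -3.0), -1),
]
w, b = train_linear_svm(data)
predictions = [predict(w, b, x) for x, _ in data]
print(w, b, predictions)
```

In a distributed setting the gradient is typically computed over a sampled mini-batch spread across partitions instead of one example at a time.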
29
What's new in 2015?
SparkR (R interface)
DataFrame – API via Spark SQL
Spark ML – support for pipelines
Source: Matei Zaharia, New directions for Spark in 2015, Spark Summit East March 2015, https://spark-summit.org/east-2015/