HAMA: An Efficient Matrix Computation with the MapReduce Framework Sangwon Seo, Edward J. Yoon, Jaehong Kim, Seongwook Jin, Jin-Soo Kim, Seungryoul Maeng (IEEE CloudCom 2010) Presented by Kyung-Bin Lim, Dec 3, 2014
2 / 35 Outline: Introduction, Methodology, Experiments, Conclusion
3 / 35 Apache HAMA An easy-to-use tool for data-intensive scientific computation Massive matrix/graph computations are its primary workloads The fundamental design has shifted from matrix computation on MapReduce to graph processing on BSP A Pregel-like system running on HDFS – uses ZooKeeper as a synchronization barrier
4 / 35 Our Focus This paper covers the earlier version 0.1 of HAMA – latest version: 0.7.0, released Mar. 2014 Focuses only on matrix computation with MapReduce Presents simple case studies
5 / 35 The HAMA Architecture We propose a distributed scientific framework called HAMA (based on HPMR) – provides transparent matrix/graph primitives
6 / 35 The HAMA Architecture HAMA API: Easy-to-use Interface HAMA Core: Provides matrix/graph primitives HAMA Shell: Interactive User Console
7 / 35 Contributions of HAMA Compatibility – takes advantage of all Hadoop features Scalability – scales as Hadoop does, thanks to this compatibility Flexibility – the compute engine is configurable, so multiple engines are supported Applicability – HAMA's primitives can be applied to various applications
8 / 35 Outline: Introduction, Methodology, Experiments, Conclusion
9 / 35 Case Study Using a case-study approach, we introduce two basic primitives built on the MapReduce model running on HAMA – matrix multiplication and finding a linear solution – and compare them with MPI versions of these primitives
10 / 35 Case Study Representing matrices – by default, HAMA uses HBase (a NoSQL database) HBase is modeled after Google's Bigtable: a column-oriented, semi-structured distributed database with high scalability (a sketch of the row-per-matrix-row layout follows)
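The deck does not show the storage schema itself. As a hedged illustration only: one natural layout keeps one HBase table row per matrix row, with one column per non-zero entry. The plain-Java sketch below mimics that layout with maps instead of a real HBase client; the class and method names are my own, not HAMA's API.

```java
import java.util.HashMap;
import java.util.Map;

// Hedged sketch: a sparse matrix laid out the way a matrix might be stored
// in HBase -- one table row per matrix row, one column qualifier per entry.
// Plain maps stand in for the HBase client; names here are hypothetical.
public class SparseMatrix {
    // rowIndex -> (columnIndex -> value); missing keys are implicit zeros
    private final Map<Integer, Map<Integer, Double>> rows = new HashMap<>();

    public void set(int i, int j, double value) {
        rows.computeIfAbsent(i, k -> new HashMap<>()).put(j, value);
    }

    public double get(int i, int j) {
        Map<Integer, Double> row = rows.get(i);
        return (row == null) ? 0.0 : row.getOrDefault(j, 0.0);
    }

    public Map<Integer, Double> row(int i) {
        return rows.getOrDefault(i, new HashMap<>());
    }
}
```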
11 / 35 Case Study – Multiplication: Iterative Way Iterative approach (algorithm shown as a figure in the original slide)
12 / 35 Case Study – Multiplication: Iterative Way A simple, naïve strategy Works well with sparse matrices – sparse matrix: most entries are 0 (a hedged sketch follows)
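The algorithm itself appears only in the slide figures, so the following is one plausible reading, not the paper's verified code: each map task takes one row of A and, for every non-zero entry a_ik, accumulates a_ik times row k of B into row i of C; writing the finished row plays the role of the reduce step. The sketch reuses the SparseMatrix class from above and runs single-process.

```java
import java.util.HashMap;
import java.util.Map;

// Hedged sketch of the iterative (row-at-a-time) multiplication C = A * B,
// reusing the SparseMatrix sketch above. Plain loops stand in for the
// MapReduce job; this is an interpretation of the slide figures, not the
// paper's actual HAMA code.
public class IterativeMultiply {
    static SparseMatrix multiply(SparseMatrix a, SparseMatrix b, int numRowsA) {
        SparseMatrix c = new SparseMatrix();
        for (int i = 0; i < numRowsA; i++) {            // "map" over rows of A
            Map<Integer, Double> acc = new HashMap<>(); // accumulates row i of C
            for (Map.Entry<Integer, Double> e : a.row(i).entrySet()) {
                int k = e.getKey();
                double aik = e.getValue();              // only non-zero entries
                for (Map.Entry<Integer, Double> f : b.row(k).entrySet()) {
                    acc.merge(f.getKey(), aik * f.getValue(), Double::sum);
                }
            }
            for (Map.Entry<Integer, Double> e : acc.entrySet()) {
                c.set(i, e.getKey(), e.getValue());     // "reduce": write row i
            }
        }
        return c;
    }
}
```

Sparsity helps here because only non-zero entries of A trigger any work, which matches the slide's claim that the iterative way suits sparse matrices.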
13–18 / 35 Multiplication: Iterative Way (step-by-step animation frames of the iterative algorithm; figures only)
19 / 35 Case Study – Multiplication: Block Way Multiplication can be done using sub-matrices (blocks) Works well with dense matrices
20 / 35 Case Study – Multiplication: Block Way Block approach – minimizes data movement (network cost)
21 / 35 Case Study – Multiplication: Block Way Block approach (algorithm shown as a figure; a sketch follows)
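The blocked algorithm is likewise given only as a figure. As a hedged sketch of the general technique (the paper's MapReduce collation and shuffle details are not reproduced here): partition A and B into fixed-size blocks and form each output block as C[I][J] = sum over K of A[I][K] * B[K][J], so any one task needs just two blocks at a time.

```java
// Hedged sketch of blocked multiplication C = A * B for dense n x n
// matrices. Each (bi, bj, bk) step touches only one block of A and one
// block of B, which is the data-movement saving the slide refers to.
// This is the generic technique, not the paper's exact MapReduce version.
public class BlockMultiply {
    static double[][] multiply(double[][] a, double[][] b, int blockSize) {
        int n = a.length;
        double[][] c = new double[n][n];
        for (int bi = 0; bi < n; bi += blockSize)          // block row of C
            for (int bj = 0; bj < n; bj += blockSize)      // block column of C
                for (int bk = 0; bk < n; bk += blockSize)  // accumulate A[bi][bk] * B[bk][bj]
                    for (int i = bi; i < Math.min(bi + blockSize, n); i++)
                        for (int k = bk; k < Math.min(bk + blockSize, n); k++) {
                            double aik = a[i][k];
                            for (int j = bj; j < Math.min(bj + blockSize, n); j++)
                                c[i][j] += aik * b[k][j];
                        }
        return c;
    }
}
```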
22 / 35 Case Study – Finding Linear Solution Solve Ax = b for the unknown x A: a known square, symmetric, positive-definite matrix b: a known vector Use the conjugate gradient (CG) approach
23 / 35 Case Study – Finding Linear Solution Two candidate methods – Cramer's rule – conjugate gradient method
24 / 35 Case Study – Finding Linear Solution Cramer's rule (shown as a formula figure; restated below)
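The slide presents Cramer's rule only as a figure. The standard statement (a textbook fact, not copied from the deck) is:

```latex
% Cramer's rule for Ax = b with \det(A) \neq 0, where A_i denotes A with
% its i-th column replaced by b:
x_i = \frac{\det(A_i)}{\det(A)}, \qquad i = 1, \dots, n
```

Evaluating n + 1 determinants is prohibitively expensive for large n, which is presumably why the deck moves on to CG.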
25 / 35 Case Study – Finding Linear Solution Conjugate gradient method – find a search direction (conjugate direction) – find a step size (line search); the standard update formulas follow
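The deck states these two steps only as bullets. For reference, the standard textbook CG updates (not transcribed from the slides), with r_k the residual and p_k the conjugate direction:

```latex
% One CG iteration for Ax = b (textbook form):
\alpha_k = \frac{r_k^{\top} r_k}{p_k^{\top} A p_k}
  \quad \text{(step size from the line search)}
x_{k+1} = x_k + \alpha_k p_k, \qquad
r_{k+1} = r_k - \alpha_k A p_k
\beta_k = \frac{r_{k+1}^{\top} r_{k+1}}{r_k^{\top} r_k}, \qquad
p_{k+1} = r_{k+1} + \beta_k p_k
  \quad \text{(next conjugate direction)}
```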
26 / 35 Case Study – Finding Linear Solution Conjugate gradient method (algorithm shown as a figure; a runnable sketch follows)
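Since the CG algorithm appears only as a figure, here is a hedged single-node Java sketch of the textbook method. In the paper each vector and matrix operation would run as a HAMA/MapReduce primitive; plain loops are used here so the overall control flow stays visible.

```java
// Hedged sketch of the conjugate gradient method for Ax = b (A symmetric
// positive-definite), written as plain single-node Java rather than HAMA
// primitives. Textbook algorithm, not transcribed from the slide figure.
public class ConjugateGradient {
    static double[] solve(double[][] a, double[] b, int maxIter, double tol) {
        int n = b.length;
        double[] x = new double[n];          // initial guess x0 = 0
        double[] r = b.clone();              // residual r0 = b - A*x0 = b
        double[] p = r.clone();              // initial direction p0 = r0
        double rsOld = dot(r, r);
        for (int k = 0; k < maxIter && Math.sqrt(rsOld) > tol; k++) {
            double[] ap = matVec(a, p);
            double alpha = rsOld / dot(p, ap);      // step size (line search)
            for (int i = 0; i < n; i++) {
                x[i] += alpha * p[i];
                r[i] -= alpha * ap[i];
            }
            double rsNew = dot(r, r);
            double beta = rsNew / rsOld;            // conjugate direction update
            for (int i = 0; i < n; i++) p[i] = r[i] + beta * p[i];
            rsOld = rsNew;
        }
        return x;
    }

    static double dot(double[] u, double[] v) {
        double s = 0;
        for (int i = 0; i < u.length; i++) s += u[i] * v[i];
        return s;
    }

    static double[] matVec(double[][] a, double[] v) {
        double[] out = new double[v.length];
        for (int i = 0; i < a.length; i++) out[i] = dot(a[i], v);
        return out;
    }
}
```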
27 / 35 Outline: Introduction, Methodology, Experiments, Conclusion
28 / 35 Evaluations TUSCI (TU Berlin SCI) cluster – 16 nodes, each with two Intel P4 Xeon processors and 1 GB of memory – connected via SCI (Scalable Coherent Interface) network interfaces in a 2D torus topology – running OpenCCS (an environment similar to HOD) Test sets (listed in a table on the original slide)
29 / 35 HPMR's Enhancements Prefetching – increases data locality Pre-shuffling – reduces the amount of intermediate output to shuffle
30 / 35 Evaluations Comparison of average execution time and scale-up for matrix multiplication (chart in the original slide)
31 / 35 Evaluations Comparison of average execution time and scale-up for CG (chart in the original slide)
32 / 35 Evaluations Comparison of average execution time for CG when a single node is overloaded (chart in the original slide)
33 / 35 Outline: Introduction, Methodology, Experiments, Conclusion
34 / 35 Conclusion HAMA provides an easy-to-use tool for data-intensive computations – matrix computation with MapReduce – graph computation with BSP