Published by Hengki Agusalim. Modified over 5 years ago.
1
Motivation
Contemporary big data tools, such as MapReduce and graph processing frameworks, have fixed data abstractions and support only a limited set of communication operations.
MPI provides a rich set of highly optimized collective communication operations but offers limited data abstractions.
To improve expressiveness and performance in big data processing, we introduce the Harp library, which provides data abstractions with matching communication abstractions and transforms the MapReduce programming model into a map-collective model.
2
Features
Hadoop plugin (on Hadoop 1.2.1 and Hadoop 2.2.0)
Hierarchical data abstractions on arrays, key-values and graphs for easy programming expressiveness
Collective communication model supporting various communication operations on the data abstractions
Caching with buffer management for the memory allocation required by computation and communication
BSP-style parallelism
Fault tolerance with checkpointing
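To make BSP-style execution with checkpoint-based fault tolerance concrete, here is a minimal single-process Python sketch. It assumes each worker holds a local state that a step function updates once per superstep. Every worker finishes its compute phase before the next superstep starts (the barrier), and a snapshot of all states is kept every few supersteps so a failed run can restart from the latest snapshot. The names `run_bsp`, `recover` and `checkpoint_every` are hypothetical and are not Harp's API. The communication phase is left out to keep the sketch short, and a real system would write checkpoints to durable storage rather than keep them in memory.

```python
import copy


def run_bsp(initial_states, step_fn, num_supersteps, checkpoint_every=2):
    """Hypothetical BSP driver: compute phase, implicit barrier, periodic checkpoint."""
    states = list(initial_states)
    checkpoints = {0: copy.deepcopy(states)}  # snapshot of the initial states
    for step in range(1, num_supersteps + 1):
        # Compute phase: each worker updates only its own local state.
        states = [step_fn(s) for s in states]
        # Barrier: reached once every worker has finished this superstep.
        if step % checkpoint_every == 0:
            checkpoints[step] = copy.deepcopy(states)
    return states, checkpoints


def recover(checkpoints):
    """Return the latest superstep number and a copy of the states saved there."""
    last = max(checkpoints)
    return last, copy.deepcopy(checkpoints[last])
```

For example, `run_bsp([0, 10], lambda s: s + 1, num_supersteps=5)` ends with states `[5, 15]` and snapshots at supersteps 0, 2 and 4. After a failure, `recover` yields superstep 4 with states `[4, 14]`, so only the remaining superstep has to be re-run.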
3
Architecture
[Figure: layered architecture. Application layer: MapReduce Applications and Map-Collective Applications. Framework layer: MapReduce V2 (for MapReduce applications) and Harp (for Map-Collective applications). Resource manager layer: YARN.]
4
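Parallelism Model
[Figure: the MapReduce model, in which map tasks (M) pass data through a shuffle to reduce tasks (R), compared with the Map-Collective model, in which map tasks (M) exchange data directly through collective communication.]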
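The difference between the two parallelism models can be sketched in a few lines of Python. This is a minimal in-process simulation under assumed inputs (each map task holds a list of key-value pairs), not Harp code. In the MapReduce path, mappers emit pairs, the shuffle groups them by key, and the summed result exists only at the reducers. In the map-collective path, each map task combines locally and an allreduce leaves every map task holding the same global result, so an iterative algorithm can continue in the same tasks. The function names are hypothetical.

```python
from collections import defaultdict


def mapreduce_sum(partitions):
    """MapReduce style: map -> shuffle (group by key) -> reduce; result lives at reducers."""
    emitted = [(k, v) for part in partitions for k, v in part]  # map phase
    shuffled = defaultdict(list)
    for k, v in emitted:  # shuffle phase: group values by key
        shuffled[k].append(v)
    return {k: sum(vs) for k, vs in shuffled.items()}  # reduce phase


def allreduce_sum(partitions):
    """Map-collective style: local combine, then allreduce so every map task gets the result."""
    local = []
    for part in partitions:  # each map task combines its own pairs
        acc = defaultdict(int)
        for k, v in part:
            acc[k] += v
        local.append(dict(acc))
    combined = defaultdict(int)
    for acc in local:  # reduction across all map tasks
        for k, v in acc.items():
            combined[k] += v
    # "All" part of allreduce: every map task receives its own copy of the result.
    return [dict(combined) for _ in partitions]
```

With two map tasks holding `[("a", 1), ("b", 2)]` and `[("a", 3)]`, `mapreduce_sum` returns `{"a": 4, "b": 2}` once, while `allreduce_sum` returns that dictionary once per map task.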
5
Hierarchical Data Abstraction and Collective Communication
[Figure: three-level data abstraction, with the collective communication operations defined on each level.]
Tables: Array Table <Array Type>, Key-Value Table, Vertex Table, Edge Table, Message Table. Operations: Broadcast, Allgather, Allreduce, Regroup (combine/reduce), Message-to-Vertex, Edge-to-Vertex.
Partitions: Array Partition <Array Type>, Key-Value Partition, Vertex Partition, Edge Partition, Message Partition. Operations: Broadcast, Send.
Basic types: Byte Array, Int Array, Long Array, Double Array, Commutable Key-Values, Struct Object (Vertices, Edges, Messages). Operations: Broadcast, Send, Gather.
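The table/partition hierarchy and two of the table-level operations can be illustrated with a small Python sketch. It assumes a table maps partition IDs to partitions (here, equal-length lists of numbers) and that partitions sharing an ID are merged with a user-supplied combiner. `allreduce` merges all workers' tables and gives every worker the full result. `regroup` instead moves each partition to one owner worker (assumed here to be `partition_id % num_workers`) and combines duplicates there. The `Table` class, the ownership rule and the function names are hypothetical illustrations, not Harp's actual classes or partitioning policy.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Table:
    """Hypothetical table: partition ID -> partition data (a list of numbers)."""
    partitions: Dict[int, List[float]] = field(default_factory=dict)

    def add(self, pid, data, combine):
        # Partitions with the same ID are merged with the combiner; new IDs are copied in.
        if pid in self.partitions:
            self.partitions[pid] = combine(self.partitions[pid], data)
        else:
            self.partitions[pid] = list(data)


def elementwise_sum(a, b):
    # Assumes both partitions have the same length (zip would silently truncate otherwise).
    return [x + y for x, y in zip(a, b)]


def allreduce(tables, combine):
    """Merge every worker's table; each worker receives a copy of the merged table."""
    merged = Table()
    for t in tables:
        for pid, data in t.partitions.items():
            merged.add(pid, data, combine)
    return [Table({pid: list(d) for pid, d in merged.partitions.items()}) for _ in tables]


def regroup(tables, combine):
    """Send each partition to its owner worker (pid % n) and combine duplicates there."""
    n = len(tables)
    out = [Table() for _ in range(n)]
    for t in tables:
        for pid, data in t.partitions.items():
            out[pid % n].add(pid, data, combine)
    return out
```

With worker 0 holding partitions `{0: [1.0, 2.0], 1: [5.0]}` and worker 1 holding `{0: [3.0, 4.0]}`, `allreduce` gives both workers `{0: [4.0, 6.0], 1: [5.0]}`, while `regroup` leaves worker 0 with `{0: [4.0, 6.0]}` and worker 1 with `{1: [5.0]}`. Regroup moves each partition only to its owner, whereas allreduce replicates the whole table on every worker.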
6
Performance on Madrid Cluster (8 nodes)