Outline
- Motivations
- Methods: reduction-based MapReduce, memory hierarchy, multi-group scheme
- Experimental results
- Conclusions

Motivations
- The MapReduce programming model emerged with the development of data-intensive computing.
- GPUs are cost-effective and power-efficient, which makes them suitable for implementing MapReduce.
- Effective utilization of the fast shared memory is the key challenge: shared memory is small, while MapReduce generates a large number of key-value pairs.
- Reduction-intensive applications: the reduction operation is associative and commutative, and many key-value pairs share the same key.

GPU background
- Processing component: the host launches kernels on the device; each kernel executes as a grid of thread blocks, and each block contains many threads.
- Memory component: each thread has registers and local memory, each block has its own shared memory, and all blocks share the device memory, constant memory, and texture memory.
[Figure: standard CUDA grid/block/thread and memory-hierarchy diagrams.]

Reduction-based MapReduce
- Each map call inserts its (key, value) pair directly into a reduction object instead of emitting intermediate pairs; reduce combines two values with the associative, commutative operation. (A CUDA sketch of this idea appears after the Conclusions.)

    map(input) {
        (key, value) = process(input);
        reduction_object->insert(key, value);
    }

    reduce(value1, value2) {
        value1 = operation(value1, value2);
    }

Memory hierarchy
- Stores reduction objects in both shared memory and device memory.
- Combines the reduction results from all blocks.
- Handles shared-memory overflow (pairs that do not fit spill to the device-memory reduction object).
[Figure: each block keeps reduction objects in its shared memory; a device-memory reduction object and result array are combined by the CPU-side MapReduce in host memory.]

Memory allocator
[Figure: reduction-object layout with per-entry KeyIdx/ValIdx indices followed by key size, value size, key data, and value data.]

Multi-group scheme
- Divides each thread block into sub-groups.
- Each sub-group owns its own shared-memory reduction object.
- Trades off memory overhead against locking overhead. (See the second CUDA sketch after the Conclusions.)
[Figure: map tasks emit key-value pairs (k1:v ... k5:v), which are grouped by key (K1: v,v,v,v; K2: v; K3: v,v; K4: v,v,v; K5: v) and passed to reduce tasks.]

Experimental results
- Configuration: NVIDIA Quadro FX 5800 GPU, 4 GB device memory, 16 KB shared memory per block.
- Results reported: speedup over a sequential implementation, speedup over MapCG on reduction-intensive and non-reduction-intensive applications, speedup over Ji et al.'s work, and the effect of the multi-group scheme.

Conclusions
- Presents methods for utilizing shared memory for MapReduce on GPUs.
- Uses a reduction-based method to reduce the memory overhead of MapReduce and performs reductions in shared memory.
- Designs an approach for storing reduction objects across the memory hierarchy of a GPU.
- Balances memory overhead and locking overhead by developing a multi-group scheme.
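The following is a minimal CUDA sketch of the reduction-based approach for a word-count-style workload, assuming pre-hashed integer keys, integer counts, and addition as the reduction operation. All identifiers (insert, map_reduce_kernel, SHARED_SLOTS, DEVICE_SLOTS, EMPTY_KEY) are illustrative and not the paper's actual API: each block keeps a small reduction object (an open-addressing hash table) in shared memory, map calls insert into it directly, overflow spills to a larger device-memory object, and the block's partial results are merged into device memory at the end.

```cuda
#include <cuda_runtime.h>

#define SHARED_SLOTS 512           // small table: only 4 KB of the 16 KB shared memory
#define DEVICE_SLOTS (1 << 20)     // larger fallback table in device memory
#define EMPTY_KEY    0xFFFFFFFFu   // sentinel marking an unused slot

// Insert (key, value) into an open-addressing hash table with `slots` entries,
// reducing by addition when the key is already present.  Returns false if the
// table is full so the caller can spill to another reduction object.
__device__ bool insert(unsigned int *keys, int *vals, int slots,
                       unsigned int key, int value) {
    unsigned int h = key % slots;
    for (int probe = 0; probe < slots; ++probe) {
        unsigned int idx = (h + probe) % slots;
        unsigned int prev = atomicCAS(&keys[idx], EMPTY_KEY, key);
        if (prev == EMPTY_KEY || prev == key) {   // slot claimed, or key matches
            atomicAdd(&vals[idx], value);         // reduce: value1 += value2
            return true;
        }
    }
    return false;                                 // reduction object overflowed
}

__global__ void map_reduce_kernel(const unsigned int *input, int n,
                                  unsigned int *d_keys, int *d_vals) {
    // Block-private reduction object kept in fast shared memory.
    __shared__ unsigned int s_keys[SHARED_SLOTS];
    __shared__ int          s_vals[SHARED_SLOTS];
    for (int i = threadIdx.x; i < SHARED_SLOTS; i += blockDim.x) {
        s_keys[i] = EMPTY_KEY;
        s_vals[i] = 0;
    }
    __syncthreads();

    // Map phase: each map() emits directly into the reduction object rather
    // than materialising a list of intermediate key-value pairs.
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n;
         i += gridDim.x * blockDim.x) {
        unsigned int key = input[i];              // stands in for process(input)
        if (!insert(s_keys, s_vals, SHARED_SLOTS, key, 1))
            insert(d_keys, d_vals, DEVICE_SLOTS, key, 1);  // shared-memory overflow
    }
    __syncthreads();

    // Merge this block's shared-memory results into the device-memory object;
    // duplicate keys from different blocks are reduced there.
    for (int i = threadIdx.x; i < SHARED_SLOTS; i += blockDim.x) {
        if (s_keys[i] != EMPTY_KEY)
            insert(d_keys, d_vals, DEVICE_SLOTS, s_keys[i], s_vals[i]);
    }
}

int main() {
    const int n = 1 << 22;
    unsigned int *d_in, *d_keys;
    int *d_vals;
    cudaMalloc(&d_in, n * sizeof(unsigned int));
    cudaMalloc(&d_keys, DEVICE_SLOTS * sizeof(unsigned int));
    cudaMalloc(&d_vals, DEVICE_SLOTS * sizeof(int));
    cudaMemset(d_in, 0, n * sizeof(unsigned int));                  // dummy input keys
    cudaMemset(d_keys, 0xFF, DEVICE_SLOTS * sizeof(unsigned int));  // all EMPTY_KEY
    cudaMemset(d_vals, 0, DEVICE_SLOTS * sizeof(int));
    map_reduce_kernel<<<128, 256>>>(d_in, n, d_keys, d_vals);
    cudaDeviceSynchronize();
    cudaFree(d_in); cudaFree(d_keys); cudaFree(d_vals);
    return 0;
}
```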
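The multi-group scheme can be sketched as a variant of the kernel above: the block's shared memory is carved into NUM_GROUPS smaller reduction objects, and each thread only inserts into the object owned by its sub-group, so fewer threads contend on any one table (lower locking overhead) at the cost of the same key possibly occupying a slot in several sub-objects (higher memory overhead). This sketch reuses insert(), SHARED_SLOTS, DEVICE_SLOTS, and EMPTY_KEY from the previous one; NUM_GROUPS and multi_group_kernel are again illustrative names, not the paper's API, and the kernel assumes blockDim.x is a multiple of NUM_GROUPS.

```cuda
#define NUM_GROUPS  4
#define GROUP_SLOTS (SHARED_SLOTS / NUM_GROUPS)

__global__ void multi_group_kernel(const unsigned int *input, int n,
                                   unsigned int *d_keys, int *d_vals) {
    // Shared memory is partitioned into NUM_GROUPS small reduction objects.
    __shared__ unsigned int s_keys[NUM_GROUPS][GROUP_SLOTS];
    __shared__ int          s_vals[NUM_GROUPS][GROUP_SLOTS];
    int group = threadIdx.x / (blockDim.x / NUM_GROUPS);   // this thread's sub-group

    for (int i = threadIdx.x; i < NUM_GROUPS * GROUP_SLOTS; i += blockDim.x) {
        s_keys[i / GROUP_SLOTS][i % GROUP_SLOTS] = EMPTY_KEY;
        s_vals[i / GROUP_SLOTS][i % GROUP_SLOTS] = 0;
    }
    __syncthreads();

    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n;
         i += gridDim.x * blockDim.x) {
        unsigned int key = input[i];
        // Only blockDim.x / NUM_GROUPS threads ever contend on this sub-object.
        if (!insert(s_keys[group], s_vals[group], GROUP_SLOTS, key, 1))
            insert(d_keys, d_vals, DEVICE_SLOTS, key, 1);   // overflow spill
    }
    __syncthreads();

    // Merge every sub-group's partial results; duplicate keys held by
    // different sub-groups are reduced together in the device-memory object.
    for (int i = threadIdx.x; i < NUM_GROUPS * GROUP_SLOTS; i += blockDim.x) {
        int g = i / GROUP_SLOTS, s = i % GROUP_SLOTS;
        if (s_keys[g][s] != EMPTY_KEY)
            insert(d_keys, d_vals, DEVICE_SLOTS, s_keys[g][s], s_vals[g][s]);
    }
}
```

Choosing NUM_GROUPS is the tuning knob this scheme exposes: more groups means smaller tables and more duplicated keys, fewer groups means more threads serialised on the same atomic updates.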