A Map-Reduce System with an Alternate API for Multi-Core Environments
Presented by Wei Jiang, April 27, 2019
Outline

- Background
- MapReduce
- Generalized Reduction
- System Design and Implementation
- Experiments
- Related Work
- Conclusions
Background

We have evaluated FREERIDE and Hadoop MapReduce on a set of applications. Phoenix is one implementation of MapReduce for shared-memory systems, written in C with a small code size. We also want to make FREERIDE smaller and more lightweight.
Google’s MapReduce Engine
Phoenix implementation

Phoenix is based on the same principles but targets shared-memory systems. It consists of:
- A simple API that is visible to application programmers
- An efficient runtime that handles parallelization, resource management, and fault recovery
Phoenix runtime
Generalized Reduction
Processing structures
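The generalized-reduction processing structure can be sketched in a few lines of C: instead of emitting (key, value) pairs for a separate reduce phase, each thread folds its portion of the input directly into a shared reduction object. Everything below is an illustrative stand-in, not MATE or Phoenix code.

```c
/* Minimal sketch of generalized reduction: threads accumulate a histogram
 * directly into a shared reduction object instead of emitting pairs. */
#include <pthread.h>

#define NBINS 4
#define NTHREADS 2
#define N 1000

static int data[N];
static long reduction_object[NBINS];            /* shared reduction object */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

static void *reduction_worker(void *arg) {
    long tid = (long)arg;
    long local[NBINS] = {0};                    /* private copy cuts contention */
    for (int i = (int)tid; i < N; i += NTHREADS)
        local[data[i] % NBINS]++;               /* reduction(): fold element in */
    pthread_mutex_lock(&lock);                  /* combination step */
    for (int b = 0; b < NBINS; b++)
        reduction_object[b] += local[b];
    pthread_mutex_unlock(&lock);
    return NULL;
}

long histogram_total(void) {
    pthread_t t[NTHREADS];
    for (int i = 0; i < N; i++) data[i] = i;
    for (long i = 0; i < NTHREADS; i++)
        pthread_create(&t[i], NULL, reduction_worker, (void *)i);
    for (int i = 0; i < NTHREADS; i++)
        pthread_join(t[i], NULL);
    long total = 0;
    for (int b = 0; b < NBINS; b++) total += reduction_object[b];
    return total;                               /* every element counted once */
}
```

Because updates to the reduction object are commutative, no intermediate (key, value) pairs ever need to be stored or shuffled.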
A Case Study: Apriori
A Case Study: Apriori (continued)
System Design and Implementation
- Basic dataflow of MATE (MapReduce with AlternaTE API)
- Data structures used to communicate between the user code and the runtime
- Three sets of functions in MATE
- Example: how to write a user application
MATE runtime dataflow

Basic one-stage dataflow
Data structures (1): scheduler_args_t, basic fields

Field       Description
---------   -----------
Input_data  Input data pointer
Data_size   Input dataset size
Data_type   Input data type
Stage_num   Computation-stage number
Splitter    Pointer to the splitter function
Reduction   Pointer to the reduction function
Finalize    Pointer to the finalize function
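The basic fields map naturally onto a C struct. The rendering below is hypothetical: field names follow the table, but the exact types and casing in MATE's real header may differ.

```c
/* Illustrative struct matching the basic scheduler_args_t fields above;
 * types are guesses, not MATE's actual declarations. */
#include <stddef.h>

typedef struct reduction_args reduction_args_t;   /* opaque here */

typedef struct {
    void  *input_data;     /* input data pointer */
    size_t data_size;      /* input dataset size in bytes */
    int    data_type;      /* input data type tag */
    int    stage_num;      /* computation-stage number */
    int  (*splitter)(void *, int, reduction_args_t *);
    void (*reduction)(reduction_args_t *);
    void (*finalize)(void *);
} scheduler_args_t;
```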
Data structures (2): scheduler_args_t, optional fields for performance tuning

Field                  Description
---------------------  -----------
Unit_size              Number of bytes for one element
L1_cache_size          L1 data cache size in bytes
Model                  Shared-memory parallelization model
Num_reduction_workers  Maximum number of reduction worker threads
Num_procs              Maximum number of processor cores used
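As a hypothetical sketch, the tuning fields could look like the struct below. In MATE these are optional members of scheduler_args_t itself; they are grouped separately here only for illustration, and the types are assumptions.

```c
/* Illustrative grouping of the optional tuning fields; not MATE's layout. */
typedef struct {
    int unit_size;              /* bytes per input element */
    int l1_cache_size;          /* L1 data cache size, used to size splits */
    int model;                  /* shared-memory parallelization model */
    int num_reduction_workers;  /* max reduction worker threads */
    int num_procs;              /* max processor cores used */
} tuning_args_t;
```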
Functions (1): internal to the runtime, transparent to users (R = required, O = optional)

- static inline void *schedule_tasks(thread_wrapper_arg_t *)  (R)
- static void *combination_worker(void *)
- static int array_splitter(void *, int, reduction_args_t *)
- void clone_reduction_object(int num)
- static inline int isCpuAvailable(unsigned long, int)
Functions (2): APIs provided by the runtime (R = required, O = optional)

- int mate_init(scheduler_args_t *args)  (R)
- int mate_scheduler(void *args)
- int mate_finalize(void *args)  (O)
- void reduction_object_pre_init()
- int reduction_object_alloc(int size) -- returns the object id
- void reduction_object_post_init()
- void accumulate/maximal/minimal(int id, int offset, void *value)
- void reuse_reduction_object()
- void *get_intermediate_result(int iter, int id, int offset)
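The reduction-object calls can be modeled with a few lines of plain C: alloc returns an object id, and accumulate applies a commutative update at a given offset. This toy stand-in only shows the semantics; it is not MATE's implementation.

```c
/* Toy model of the reduction-object API semantics (illustrative only). */
#include <stdlib.h>

#define MAX_OBJS 16

static long *objs[MAX_OBJS];
static int   nobjs = 0;

int reduction_object_alloc_demo(int nslots) {   /* returns the object id */
    objs[nobjs] = calloc((size_t)nslots, sizeof(long));
    return nobjs++;
}

void accumulate_demo(int id, int offset, long value) {
    objs[id][offset] += value;                  /* commutative update */
}

long object_read_demo(int id, int offset) {
    return objs[id][offset];
}

long demo(void) {
    int id = reduction_object_alloc_demo(4);
    accumulate_demo(id, 0, 5);
    accumulate_demo(id, 0, 7);                  /* order does not matter */
    return object_read_demo(id, 0);
}
```

Because the updates commute, the runtime is free to apply them from any worker thread in any order, which is what makes the single-stage dataflow safe.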
Functions (3): APIs defined by the user (R = required, O = optional)

- int (*splitter_t)(void *, int, reduction_args_t *)  (O)
- void (*reduction_t)(reduction_args_t *)  (R)
- void (*combination_t)(void *)
- void (*finalize_t)(void *)
Implementation Considerations
- Data partitioning: splits are dynamically assigned to worker threads
- Buffer management: two temporary buffers, one for reduction objects and one for combination results
- Fault tolerance: failed tasks are re-executed; checkpointing may be a better solution
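Dynamic data partitioning can be sketched with a shared counter: each worker grabs the next unprocessed split rather than receiving a fixed share up front, so faster threads naturally take more work. All names here are illustrative.

```c
/* Sketch of dynamic split assignment via a shared counter (illustrative). */
#include <pthread.h>

#define NSPLITS 64
#define NTHREADS 4

static pthread_mutex_t split_lock = PTHREAD_MUTEX_INITIALIZER;
static int next_split = 0;
static long processed[NTHREADS];

static int grab_split(void) {                 /* splitter: hand out next chunk */
    pthread_mutex_lock(&split_lock);
    int s = (next_split < NSPLITS) ? next_split++ : -1;
    pthread_mutex_unlock(&split_lock);
    return s;
}

static void *worker(void *arg) {
    long tid = (long)arg;
    int s;
    while ((s = grab_split()) >= 0)
        processed[tid]++;                     /* stand-in for reduction work */
    return NULL;
}

long run_partitioning(void) {
    pthread_t t[NTHREADS];
    for (long i = 0; i < NTHREADS; i++)
        pthread_create(&t[i], NULL, worker, (void *)i);
    long total = 0;
    for (int i = 0; i < NTHREADS; i++) {
        pthread_join(t[i], NULL);
        total += processed[i];
    }
    return total;                             /* each split handled exactly once */
}
```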
What is in the user code?

- Implement the necessary functions, such as splitter, reduction, and finalize
- Generate the input dataset
- Set up the fields in scheduler_args_t
- Initialize the middleware and declare the reduction object(s)
- Execute reduction tasks by calling mate_scheduler (one or more passes)
- Optionally do some finalizing work
K-means user code

int main(int argc, char **argv)
{
    parse_args();
    generate_points();
    generate_means();
    mate_init(&args);                 /* args: a filled-in scheduler_args_t */
    reduction_object_pre_init();
    while (needed)
        reduction_object_alloc(size);
    reduction_object_post_init();
    while (!finished) {
        mate_scheduler();             /* one pass of reduction tasks */
        update_means();
        reuse_reduction_object();
        process_next_iteration();
    }
    mate_finalize();
    return 0;
}
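Since the MATE calls above cannot run outside the runtime, here is a self-contained, single-threaded 1-D k-means with the same loop shape: assign points, accumulate into a reduction-object-like sum/count pair, update the means, and iterate. Every name and value below is illustrative, not MATE code.

```c
/* Self-contained 1-D k-means following the user-code structure above. */
#include <math.h>

#define NPTS 6
#define K 2
#define ITERS 10

double kmeans_mean0(void) {
    double pts[NPTS] = {1, 2, 3, 10, 11, 12};
    double means[K] = {0, 5};                  /* initial guesses */
    for (int it = 0; it < ITERS; it++) {
        double sum[K] = {0};
        int    cnt[K] = {0};                   /* the "reduction object" */
        for (int i = 0; i < NPTS; i++) {       /* reduction over each point */
            int best = 0;
            for (int k = 1; k < K; k++)
                if (fabs(pts[i] - means[k]) < fabs(pts[i] - means[best]))
                    best = k;
            sum[best] += pts[i];
            cnt[best]++;
        }
        for (int k = 0; k < K; k++)            /* the update_means() step */
            if (cnt[k]) means[k] = sum[k] / cnt[k];
    }
    return means[0];                           /* converges to mean of {1,2,3} */
}
```

In the MATE version, the per-point loop body becomes the user's reduction() function, and the sum/count arrays live in a runtime-managed reduction object that reuse_reduction_object() clears between passes.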
Experiments: K-means

K-means: 400MB, 3-dim points, k = 100, on one WCI node with 8 cores
Experiments: K-means

K-means: 400MB, 3-dim points, k = 100, on one AMD node with 16 cores
Experiments: PCA

PCA: 8000 x 1024 matrix, on one WCI node with 8 cores
Experiments: PCA

PCA: 8000 x 1024 matrix, on one AMD node with 16 cores
Experiments: Apriori

Apriori: 1,000,000 transactions, 3% support, on one WCI node with 8 cores
Experiments: Apriori

Apriori: 1,000,000 transactions, 3% support, on one AMD node with 16 cores
Related Work

- Improving MapReduce's API or implementations
- Evaluating MapReduce across different platforms and application domains
- Academia: CGL-MapReduce, Mars, MITHRA, Phoenix, Disco...
- Industry: Facebook (Hive), Yahoo! (Pig Latin, Map-Reduce-Merge), Google (Sawzall), Microsoft (Dryad)
Conclusions

- MapReduce is simple and robust in expressing parallelism
- The two-stage computation style may cause performance losses for some subclasses of data-intensive applications
- MATE provides an alternate API based on generalized reduction
- This variation can reduce the overheads of data management and of communication between Map and Reduce
Questions?