A Map-Reduce System with an Alternate API for Multi-Core Environments


1 A Map-Reduce System with an Alternate API for Multi-Core Environments
Presented by Wei Jiang, April 27, 2019

2 Outline
Background
MapReduce
Generalized Reduction
System Design and Implementation
Experiments
Related Work
Conclusions

3 Background
We have evaluated FREERIDE and Hadoop MapReduce on a set of applications
Phoenix is an implementation of MapReduce for shared-memory systems, written in C, with a small code base
We also want to make FREERIDE smaller and more lightweight

4 Google’s MapReduce Engine

5 Phoenix implementation
Based on the same principles as MapReduce but targets shared-memory systems
Consists of a simple API that is visible to application programmers and an efficient runtime that handles parallelization, resource management, and fault recovery

6 Phoenix runtime

7 Generalized Reduction
Processing structures
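
The processing-structure figure is not part of this transcript. As a rough sketch of the generalized-reduction idea, where each input element is folded directly into a reduction object and no intermediate key-value pairs are produced between a Map and a Reduce stage, the fragment below builds a small histogram. The names `reduction_object_t`, `process_element`, and `generalized_reduction` are hypothetical and are not taken from MATE or FREERIDE.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_BUCKETS 4  /* hypothetical: number of histogram buckets */

/* Hypothetical reduction object: one counter per bucket. */
typedef struct {
    long count[NUM_BUCKETS];
} reduction_object_t;

/* Fold one element into the reduction object. The update is associative
 * and commutative, so elements may be processed in any order. */
void process_element(reduction_object_t *robj, int value)
{
    assert(value >= 0);
    robj->count[value % NUM_BUCKETS] += 1;
}

/* Generalized reduction over an array: a single pass that updates the
 * reduction object in place, with no intermediate (key, value) pairs. */
void generalized_reduction(const int *data, size_t n,
                           reduction_object_t *robj)
{
    for (size_t i = 0; i < n; i++)
        process_element(robj, data[i]);
}
```

Because the update is order-independent, separate threads could each fill a private copy of the object and merge the copies afterwards.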

8 A Case Study: Apriori

9 A Case Study: Apriori

10 System Design and Implementation
Basic dataflow of MATE (MapReduce with AlternaTE API)
Data structures for communication between the user code and the runtime
Three sets of functions in MATE
Example: how to write a user application

11 MATE runtime dataflow
Basic one-stage dataflow

12 Data structures-(1) scheduler_args_t: Basic fields
Field        Description
Input_data   Input data pointer
Data_size    Input dataset size
Data_type    Input data type
Stage_num    Computation-stage number
Splitter     Pointer to Splitter function
Reduction    Pointer to Reduction function
Finalize     Pointer to Finalize function

13 Data structures-(2) scheduler_args_t: Optional fields for performance tuning
Field                  Description
Unit_size              # of bytes for one element
L1_cache_size          # of bytes for the L1 data cache
Model                  Shared-memory parallelization model
Num_reduction_workers  Max # of reduction worker threads
Num_procs              Max # of processor cores used
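
The basic and optional fields can be read together as one C structure. The sketch below is only an illustration: the slides give field names and descriptions, so the C types, the lowercase spelling, the `model_t` enum, and the helpers `default_scheduler_args` and `noop_reduction` are all assumptions. The function-pointer types follow the signatures listed on the "Functions-(3)" slide.

```c
#include <assert.h>
#include <stddef.h>

/* The slides name this type but do not show its fields. */
typedef struct reduction_args reduction_args_t;

/* User-defined function types, as listed on the "Functions-(3)" slide. */
typedef int  (*splitter_t)(void *, int, reduction_args_t *);
typedef void (*reduction_t)(reduction_args_t *);
typedef void (*finalize_t)(void *);

/* Assumed placeholder: the possible values of "Model" are not given. */
typedef enum { MODEL_DEFAULT = 0 } model_t;

typedef struct {
    /* Basic fields */
    void       *input_data;            /* input data pointer */
    size_t      data_size;             /* input dataset size */
    int         data_type;             /* input data type (encoding assumed) */
    int         stage_num;             /* computation-stage number */
    splitter_t  splitter;              /* pointer to Splitter function */
    reduction_t reduction;             /* pointer to Reduction function */
    finalize_t  finalize;              /* pointer to Finalize function */
    /* Optional fields for performance tuning */
    size_t      unit_size;             /* bytes for one element */
    size_t      l1_cache_size;         /* bytes of L1 data cache */
    model_t     model;                 /* shared-memory parallelization model */
    int         num_reduction_workers; /* max reduction worker threads */
    int         num_procs;             /* max processor cores used */
} scheduler_args_t;

/* Hypothetical no-op reduction, used only to have something to point at. */
void noop_reduction(reduction_args_t *args)
{
    (void)args;
}

/* Hypothetical helper: zero every field so optional ones read as "unset",
 * then fill in the fields this sketch treats as mandatory. */
scheduler_args_t default_scheduler_args(void *data, size_t size,
                                        reduction_t reduction)
{
    scheduler_args_t args = {0};
    args.input_data = data;
    args.data_size  = size;
    args.stage_num  = 1;        /* assumed: a single computation stage */
    args.reduction  = reduction;
    return args;
}
```

Zero-initializing the structure lets a runtime treat 0 or NULL in an optional field as "use the default", which is one plausible way to keep tuning fields optional.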

14 Functions-(1) Transparent to users
Function                                                     Description  R/O
static inline void * schedule_tasks(thread_wrapper_arg_t *)               R
static void * combination_worker(void *)
static int array_splitter(void *, int, reduction_args_t *)
void clone_reduction_object(int num)
static inline int isCpuAvailable(unsigned long, int)

15 Functions-(2) APIs provided by the runtime
Function                                                           Description            R/O
int mate_init(scheduler_args_t * args)                                                    R
int mate_scheduler(void * args)
int mate_finalize(void * args)                                                            O
void reduction_object_pre_init()
int reduction_object_alloc(int size)                               returns the object id
void reduction_object_post_init()
void accumulate/maximal/minimal(int id, int offset, void * value)
void reuse_reduction_object()
void * get_intermediate_result(int iter, int id, int offset)
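
To make the intended use of `reduction_object_alloc` and `accumulate` more concrete, here is a single-threaded mock. Only the function names and parameter lists come from the slide. The storage layout, the choice of `double` cells, the reading of `size` as a cell count, the fixed capacity, and the `read_cell` accessor are assumptions made for this sketch; the real runtime presumably also handles threads and other element types.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define MAX_OBJECTS 16  /* assumed capacity of this mock */

static double *objects[MAX_OBJECTS];  /* cell storage per reduction object */
static int     sizes[MAX_OBJECTS];    /* number of cells per object */
static int     num_objects = 0;

/* Allocate a zero-initialized reduction object with `size` cells.
 * Returns the object id, or -1 on failure. */
int reduction_object_alloc(int size)
{
    if (size <= 0 || num_objects >= MAX_OBJECTS)
        return -1;
    double *cells = calloc((size_t)size, sizeof *cells);
    if (cells == NULL)
        return -1;
    objects[num_objects] = cells;
    sizes[num_objects] = size;
    return num_objects++;
}

/* Add *value to cell `offset` of object `id`. */
void accumulate(int id, int offset, void *value)
{
    assert(id >= 0 && id < num_objects);
    assert(offset >= 0 && offset < sizes[id]);
    objects[id][offset] += *(const double *)value;
}

/* Keep the larger of the stored cell and *value
 * (cells start at 0.0 in this mock). */
void maximal(int id, int offset, void *value)
{
    assert(id >= 0 && id < num_objects);
    assert(offset >= 0 && offset < sizes[id]);
    double v = *(const double *)value;
    if (v > objects[id][offset])
        objects[id][offset] = v;
}

/* Hypothetical read accessor for this mock (not on the slide). */
double read_cell(int id, int offset)
{
    assert(id >= 0 && id < num_objects);
    assert(offset >= 0 && offset < sizes[id]);
    return objects[id][offset];
}
```

User code addresses a cell by the pair (object id, offset), so a reduction function never needs a pointer into runtime-owned memory.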

16 Functions-(3) APIs defined by the user
Function                                            Description  R/O
int (*splitter_t)(void *, int, reduction_args_t *)               O
void (*reduction_t)(reduction_args_t *)                          R
void (*combination_t)(void *)
void (*finalize_t)(void *)
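
As an illustration of a user-defined splitter matching the `splitter_t` signature, the sketch below hands out fixed-size chunks of an int array. The slide gives only the signature, so the fields of `reduction_args_t`, the `array_state_t` cursor, the function name `int_array_splitter`, and the return convention (1 when a split was produced, 0 once the input is exhausted) are assumptions.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed layout: the slides name this type but do not list its fields. */
typedef struct {
    void  *data;    /* start of the split */
    size_t length;  /* number of elements in the split */
} reduction_args_t;

typedef int (*splitter_t)(void *, int, reduction_args_t *);

/* Hypothetical splitter state: an int array plus a cursor into it. */
typedef struct {
    int   *base;
    size_t total;
    size_t next;
} array_state_t;

/* Hand out up to req_units elements per call.
 * Returns 1 if a split was produced, 0 once the input is exhausted. */
int int_array_splitter(void *state_in, int req_units, reduction_args_t *out)
{
    array_state_t *st = state_in;
    if (req_units <= 0 || st->next >= st->total)
        return 0;
    size_t remaining = st->total - st->next;
    size_t n = (size_t)req_units < remaining ? (size_t)req_units : remaining;
    out->data   = st->base + st->next;
    out->length = n;
    st->next   += n;
    return 1;
}
```

The last split may be shorter than the requested size, so a reduction function should rely on `length` rather than on the requested unit count.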

17 Implementation Considerations
Data partitioning: dynamically assigns splits to worker threads
Buffer management: two temporary buffers, one for reduction objects, the other for combination results
Fault tolerance: re-executes failed tasks; checkpointing may be a better solution
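
Dynamic assignment of splits can be sketched with a shared cursor: each worker repeatedly claims the next unprocessed split, folds it into a private reduction object, and a combination step merges the private copies at the end. This is a minimal sketch using POSIX threads and C11 atomics, not MATE's implementation. The worker count, the split size, the per-worker copy of the reduction object, and every name below are assumptions.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

#define NUM_WORKERS 4  /* assumed worker count */
#define SPLIT_SIZE  3  /* tiny, so even a short array yields several splits */

typedef struct {
    const int     *data;
    size_t         n;
    atomic_size_t *next_split;  /* shared cursor over split indices */
    long           local_sum;   /* this worker's private reduction object */
} worker_t;

/* Claim splits until none are left; faster workers simply claim more. */
static void *reduction_worker(void *arg)
{
    worker_t *w = arg;
    size_t num_splits = (w->n + SPLIT_SIZE - 1) / SPLIT_SIZE;
    for (;;) {
        size_t s = atomic_fetch_add(w->next_split, 1);
        if (s >= num_splits)
            break;
        size_t begin = s * SPLIT_SIZE;
        size_t end = begin + SPLIT_SIZE < w->n ? begin + SPLIT_SIZE : w->n;
        for (size_t i = begin; i < end; i++)
            w->local_sum += w->data[i];
    }
    return NULL;
}

/* Sum an array with dynamically scheduled workers, then combine. */
long parallel_sum(const int *data, size_t n)
{
    atomic_size_t next_split = 0;
    worker_t  workers[NUM_WORKERS];
    pthread_t threads[NUM_WORKERS];
    int started = 0;

    for (int t = 0; t < NUM_WORKERS; t++) {
        workers[started] = (worker_t){data, n, &next_split, 0};
        if (pthread_create(&threads[started], NULL,
                           reduction_worker, &workers[started]) == 0)
            started++;
    }

    /* The calling thread also claims splits, so the work completes even
     * if no worker thread could be created. */
    worker_t self = {data, n, &next_split, 0};
    reduction_worker(&self);

    /* Combination: merge the private reduction objects. */
    long total = self.local_sum;
    for (int t = 0; t < started; t++) {
        pthread_join(threads[t], NULL);
        total += workers[t].local_sum;
    }
    return total;
}
```

Since every worker writes only to its own `local_sum`, the reduction loop needs no locks; the shared cursor is the only point of synchronization.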

18 What is in the user code?
Implements the necessary functions, such as reduction, splitter, and finalize
Generates the input dataset
Sets up the fields in scheduler_args_t
Initializes the middleware and declares the reduction object(s)
Executes reduction tasks by calling mate_scheduler (one or more passes)
Optionally does some finalizing work

19 K-means user code
int main(int argc, char **argv) {
    scheduler_args_t args;          /* fields are set up before mate_init() */
    parse_args();
    generate_points();
    generate_means();
    mate_init(&args);
    reduction_object_pre_init();
    while (needed)                  /* pseudocode: one call per reduction object */
        reduction_object_alloc(size);
    reduction_object_post_init();
    while (not finished) {          /* pseudocode: one pass per iteration */
        mate_scheduler();
        update_means();
        reuse_reduction_object();
        process_next_iteration();
    }
    mate_finalize();
    return 0;
}
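
The slide shows only the driver loop. As a sketch of what one pass might compute, the code below assigns each point to its nearest mean, accumulates per-cluster coordinate sums and counts, and then recomputes the means. A plain struct stands in for the reduction object, and `kmeans_robj_t`, `nearest_mean`, and `kmeans_reduction` are hypothetical names; only `update_means` appears on the slide, and its body here is an assumption.

```c
#include <assert.h>
#include <float.h>
#include <stddef.h>

#define DIM 3  /* 3-dimensional points, as in the experiments */
#define K   2  /* kept small for the sketch; the experiments use k = 100 */

/* Stand-in reduction object: per-cluster coordinate sums and counts. */
typedef struct {
    double sum[K][DIM];
    long   count[K];
} kmeans_robj_t;

/* Index of the mean closest to p by squared Euclidean distance. */
int nearest_mean(const double *p, double means[K][DIM])
{
    int best = 0;
    double best_dist = DBL_MAX;
    for (int c = 0; c < K; c++) {
        double d = 0.0;
        for (int j = 0; j < DIM; j++) {
            double diff = p[j] - means[c][j];
            d += diff * diff;
        }
        if (d < best_dist) {
            best_dist = d;
            best = c;
        }
    }
    return best;
}

/* Reduction over one split: fold every point into the reduction object. */
void kmeans_reduction(double points[][DIM], size_t n,
                      double means[K][DIM], kmeans_robj_t *robj)
{
    for (size_t i = 0; i < n; i++) {
        int c = nearest_mean(points[i], means);
        for (int j = 0; j < DIM; j++)
            robj->sum[c][j] += points[i][j];
        robj->count[c] += 1;
    }
}

/* After a pass: new mean = sum / count; empty clusters keep their mean. */
void update_means(double means[K][DIM], const kmeans_robj_t *robj)
{
    for (int c = 0; c < K; c++) {
        if (robj->count[c] == 0)
            continue;
        for (int j = 0; j < DIM; j++)
            means[c][j] = robj->sum[c][j] / (double)robj->count[c];
    }
}
```

The reduction object holds only K sums and K counts regardless of the number of points, which is why no per-point intermediate data has to be stored between passes.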

20 Experiments: K-means
K-means: 400MB, 3-dim points, k = 100, on one WCI node with 8 cores

21 Experiments: K-means
K-means: 400MB, 3-dim points, k = 100, on one AMD node with 16 cores

22 Experiments: PCA
PCA: 8000 * 1024 matrix, on one WCI node with 8 cores

23 Experiments: PCA
PCA: 8000 * 1024 matrix, on one AMD node with 16 cores

24 Experiments: Apriori
Apriori: 1,000,000 transactions, 3% support, on one WCI node with 8 cores

25 Experiments: Apriori
Apriori: 1,000,000 transactions, 3% support, on one AMD node with 16 cores

26 Related Work
Improving MapReduce’s API or implementations
Evaluating MapReduce across different platforms and application domains
Academia: CGL-MapReduce, Mars, MITHRA, Phoenix, Disco…
Industry: Facebook (Hive), Yahoo! (Pig Latin, Map-Reduce-Merge), Google (Sawzall), Microsoft (Dryad)

27 Conclusions
MapReduce is simple and robust in expressing parallelism
The two-stage computation style may cause performance losses for some subclasses of data-intensive applications
MATE provides an alternate API that is based on generalized reduction
This variation can reduce the overheads of data management and of communication between Map and Reduce

28 Questions?

