Smart: A MapReduce-Like Framework for In-Situ Scientific Analytics


1 Smart: A MapReduce-Like Framework for In-Situ Scientific Analytics
Yi Wang*, Gagan Agrawal*, Tekin Bicer#, Wei Jiang^
*The Ohio State University, #Argonne National Laboratory, ^Quantcast Corp.

2 Outline
Introduction
Challenges
System Design
Experimental Results
Conclusion

3 The Big Picture of In Situ
In-Situ Algorithms (algorithm/application level): no disk I/O; indexing, compression, visualization, statistical analysis, etc.
In-Situ Resource Scheduling Systems (platform/system level): enhance resource utilization and simplify the management of analytics code; GoldRush, Glean, DataSpaces, FlexIO, etc.
These two levels have been extensively studied, but are they seamlessly connected? Is there anything to do between the two levels?

4 Rethink These Two Levels
In-Situ Algorithms: implemented with low-level APIs like OpenMP/MPI; all the parallelization details are handled manually.
In-Situ Resource Scheduling Systems: play the role of coordinator; focus on scheduling issues like cycle stealing and asynchronous I/O; offer no high-level parallel programming API.
Motivation: Can applications be mapped more easily to the platforms for in-situ analytics? Can the offline and in-situ analytics code be (almost) identical?

5 Opportunity
Explore the programming model level in the in-situ environment:
Sits between the application level and the system level
Hides all the parallelization complexity behind a simplified API
A prominent example: MapReduce + in situ

6 Challenges
Hard to adapt MapReduce to the in-situ environment: MR is not designed for in-situ analytics.
4 mismatches:
Data Loading Mismatch
Programming View Mismatch
Memory Constraint Mismatch
Programming Language Mismatch

7 Data Loading Mismatch
In situ requires taking input from memory.
Ways existing MRs load data:
From distributed file systems: Hadoop and many variants (on HDFS), Google MR (on GFS), and Disco (on DDFS)
From shared/local file systems: MARIANE and CGL-MapReduce; MPI-based: MapReduce-MPI and MRO-MPI
From memory: Phoenix (shared memory)
From data streams: HOP, M3, and iMR

8 Data Loading Mismatch (Cont'd)
Few MR options fit:
Most MRs load data from file systems
Loading data from memory is mostly restricted to shared-memory environments
Wrap simulation output as a data stream? Requires periodic stream spiking, and only a one-time scan is allowed
An exception: Spark can load data from file systems, memory, or data streams; Spark works as long as the input can be converted into an RDD.

9 Programming View Mismatch
Scientific simulation: parallel programming view with explicit parallelism (partitioning, message passing, and synchronization)
MapReduce: sequential programming view; partitions are transparent
Need a hybrid programming view that exposes partitions during data loading but hides parallelism after data loading
Traditional MapReduce implementations cannot explicitly take partitioned simulation output as the input, or launch the execution of analytics from an SPMD region.
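The hybrid view can be illustrated with a small sketch. This is not Smart's actual API; it is a hypothetical analogy that uses `std::thread` in place of MPI ranks. Each SPMD "rank" hands its already-partitioned local data to the analytics, whose logic (here, a sum) stays sequential; the combination across partitions is hidden from the user.

```cpp
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical sketch of a hybrid programming view: partitions are exposed
// at data-loading time (one vector per "rank"), but the analytics code the
// user writes is sequential, and the cross-partition combination is hidden.
double hybrid_view_sum(const std::vector<std::vector<double>>& partitions) {
    double global_sum = 0.0;
    std::mutex m;
    std::vector<std::thread> ranks;
    for (const auto& local : partitions) {
        ranks.emplace_back([&local, &global_sum, &m] {
            // Sequential per-partition analytics: no explicit message passing.
            double local_sum = 0.0;
            for (double v : local) local_sum += v;
            // Combination step, handled by the runtime rather than the user.
            std::lock_guard<std::mutex> g(m);
            global_sum += local_sum;
        });
    }
    for (auto& t : ranks) t.join();
    return global_sum;
}
```

In the real system the partitions would be the per-rank simulation output inside an SPMD region, and the combination would use MPI rather than a shared mutex.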

10 Memory Constraint Mismatch
MR is often memory/disk intensive:
The map phase creates intermediate data
Sorting, shuffling, and grouping do not reduce intermediate data at all
A local combiner cannot reduce the peak memory consumption (in the map phase)
Need an alternate MR API that avoids key-value pair emission in the map phase and eliminates intermediate data in the shuffling phase.
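The alternate-API idea can be sketched as follows (this is an illustration, not Smart's code): instead of emitting one (key, value) pair per element and grouping later, each element updates a reduction map in place, so there is no intermediate data and no shuffle phase at all.

```cpp
#include <cassert>
#include <map>
#include <vector>

// Sketch of reduction without key-value emission: a histogram where each
// element directly increments its bin in a reduction map. Peak memory is
// the size of the map (number of bins), independent of the input size.
std::map<int, int> in_place_histogram(const std::vector<double>& data,
                                      double bin_width) {
    std::map<int, int> reduction_map;  // key: bin index, value: count
    for (double v : data) {
        int bin = static_cast<int>(v / bin_width);
        ++reduction_map[bin];          // accumulate in place, never emit
    }
    return reduction_map;
}
```

A classic map phase would instead materialize one pair per input element before sorting and grouping, which is exactly the intermediate data the slide says a combiner cannot fully avoid.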

11 Programming Language Mismatch
Simulation code is in Fortran or C/C++; rewriting it in other languages is impractical
Mainstream MRs are in Java/Scala: Hadoop in Java; Spark in Scala/Java/Python
Other MRs in C/C++ are not widely adopted

12 Bridging the Gap
Smart addresses all the mismatches:
Loads data from (distributed) memory, even without an extra memcpy in time sharing mode
Presents a hybrid programming view
Achieves high memory efficiency with an alternate API
Implemented in C++11, with OpenMP + MPI

13 System Overview
Distributed System = Partitioning + Shared-Memory System + Combination
In-Situ System = Shared-Memory System + Combination = Distributed System - Partitioning
Partitioning drops out because the input data is already partitioned by the simulation code.

14 Two In-Situ Modes
Space sharing mode: enhances resource utilization when the simulation reaches its scalability bottleneck
Time sharing mode: minimizes memory consumption

15 Ease of Use
Launching Smart:
No extra libraries or configuration
Minimal changes to the simulation code
Analytics code remains the same in different modes
Application development:
Define a reduction object
Derive a Smart scheduler class with three operations:
gen_key(s): generates key(s) for a data chunk
accumulate: accumulates data on a reduction object
merge: merges two reduction objects
Both the reduction map and the combination map are transparent to the user, and all the parallelization complexity is hidden behind a sequential view.
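The three user-defined operations named above can be sketched like this. The class layout and signatures are assumptions for illustration; Smart's actual scheduler base class may differ. The example computes a global mean with a sum/count reduction object.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// User-defined reduction object: running sum and count.
struct MeanObj {
    double sum = 0.0;
    std::size_t count = 0;
};

// Hypothetical scheduler with the slide's three operations.
struct MeanScheduler {
    // gen_key: every chunk maps to a single global key for a global mean.
    int gen_key(const std::vector<double>&) const { return 0; }

    // accumulate: fold one data chunk into a reduction object.
    void accumulate(const std::vector<double>& chunk, MeanObj& obj) const {
        for (double v : chunk) obj.sum += v;
        obj.count += chunk.size();
    }

    // merge: combine two reduction objects (e.g., across threads/nodes).
    void merge(const MeanObj& src, MeanObj& dst) const {
        dst.sum += src.sum;
        dst.count += src.count;
    }
};

// The driver loop the runtime would perform; shown sequentially, since the
// reduction map itself is transparent to the user.
double run_mean(const std::vector<std::vector<double>>& chunks) {
    MeanScheduler s;
    std::map<int, MeanObj> reduction_map;
    for (const auto& c : chunks)
        s.accumulate(c, reduction_map[s.gen_key(c)]);
    const MeanObj& m = reduction_map[0];
    return m.count ? m.sum / static_cast<double>(m.count) : 0.0;
}
```

In the distributed setting, `merge` is what the combination phase would call on per-node reduction objects; the user never writes that loop.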

16 Smart vs. Spark
To make a fair comparison:
Bypass the programming view mismatch: run on an 8-core node (multi-threaded but not distributed)
Bypass the memory constraint mismatch: use a simulation emulator that consumes little memory
Bypass the programming language mismatch: rewrite the simulation in Java and compare only computation time
[Charts: K-Means and Histogram; 40 GB input, 0.5 GB per time-step; 62x and 92x speedups]

17 Smart vs. Spark (Cont’d)
Faster execution, for three reasons:
1) Spark emits massive amounts of intermediate data, whereas Smart performs data reduction in place and eliminates the emission of intermediate data
2) Spark makes a new RDD for every transformation due to RDD immutability, whereas Smart reuses reduction/combination maps even for iterative processing
3) Spark serializes RDDs and sends them through the network even in local mode, whereas Smart takes advantage of the shared-memory environment of each node and avoids replicating any reduction object
Greater (thread) scalability: Spark launches extra threads for other tasks, e.g., communication and the driver's UI; Smart launches no extra threads
Higher memory efficiency: Spark uses over 90% of 12 GB of memory; Smart uses around 16 MB beyond the 0.5 GB time-step

18 Smart vs. Low-Level Implementations
Setup: Smart in time sharing mode vs. low-level OpenMP + MPI implementations; apps: k-means and logistic regression; 1 TB input on 8-64 nodes
Programmability: 55% and 69% of the parallel code is either eliminated or converted into sequential code
Performance:
Up to 9% extra overhead for k-means: the low-level implementation can store reduction objects in contiguous arrays, whereas Smart stores data in map structures and requires extra serialization for each synchronization
Nearly unnoticeable overhead for logistic regression: only a single key-value pair is created
[Charts: Logistic Regression and K-Means]

19 Conclusion
Positioning: programming model research on in-situ analytics
Functionality: addresses all 4 mismatches; offers time sharing and space sharing modes
Performance: outperforms Spark by at least an order of magnitude; comparable to low-level implementations
Open source

20 Q&A
My Google Team is Hiring!

21 Backup Slides

22 Launching Smart in Time Sharing Mode

23 Launching Smart in Space Sharing Mode

24 Data Processing Mechanism

25 Node Scalability
Setup: 1 TB data output by Heat3D; time sharing; 8 cores per node; 4-32 nodes

26 Thread Scalability
Setup: 1 TB data output by Lulesh; time sharing; 64 nodes; 1-8 threads per node

27 Memory Efficiency of Time Sharing
Setup: logistic regression on Heat3D using 4 nodes (left); mutual information on Lulesh using 64 nodes (right)

28 Efficiency of Space Sharing Mode
Setup: 1 TB data output by Lulesh; 8 Xeon Phi nodes with 60 threads per node; apps: K-Means (left) and Moving Median (right)
Space sharing outperforms time sharing by 10% and 48%, respectively.
For histogram, time sharing mode performs better, because its computation is very lightweight and the overall cost is dominated by synchronization.

29 Optimization: Early Emission of Reduction Objects
Motivation: window-based analytics, e.g., moving average, must maintain a large number of reduction objects, leading to high memory consumption
Key insight: most reduction objects can be finalized in the reduction phase
Approach: set a customizable trigger that outputs these reduction objects (locally) as early as possible
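The idea can be sketched on a moving average (the trigger policy and naming here are assumptions, not Smart's implementation): as soon as the stream moves past a window, that window's reduction object is final, so it can be emitted immediately instead of staying resident until the end. Only one running sum needs to be kept at a time.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sketch of early emission for window-based analytics: each window's
// result is emitted as soon as the window closes, so memory held for
// reduction objects stays O(1) instead of O(number of windows).
std::vector<double> moving_average(const std::vector<double>& stream,
                                   std::size_t window) {
    std::vector<double> out;
    double sum = 0.0;  // the single live reduction object
    for (std::size_t i = 0; i < stream.size(); ++i) {
        sum += stream[i];
        if (i + 1 >= window) {
            // Trigger: this window can no longer change, emit it now.
            out.push_back(sum / static_cast<double>(window));
            sum -= stream[i + 1 - window];  // drop the element leaving
        }
    }
    return out;
}
```

Without early emission, one reduction object per window would have to be kept until the final output phase, which is exactly the memory blow-up the slide motivates.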

