Published by Brandon O’Connor. Modified over 6 years ago.
1
Shoal: smart allocation and replication of memory for parallel programs
Stefan Kaestle, Reto Achermann, Timothy Roscoe, Tim Harris
ATC’15
March 31st, 2016
Cho, Hyojae
2
CONTENTS
Introduction
Motivation
Array
Implementation
Evaluation
Conclusion
3
1. Introduction Memory allocation in NUMA multi-core machines
NUMA (Non-Uniform Memory Access)
4
1. Introduction Methods
Manual configuration by programmers
  Programmers struggle to develop software that applies these techniques
  Programmers must repeatedly make manual changes
Automatic online monitoring to decide how to migrate data
  May be expensive
  Limited to a small number of optimizations
5
2. Motivation “memset()” considered harmful on multi-core
6
2. Motivation Shoal
A system that abstracts memory access and provides a rich programming interface
It automatically tunes data placement and access based on memory access patterns
Programmers need not know where the data is stored
7
2. Motivation Shoal
A new interface for memory allocation, including a machine-aware “malloc” call
An abstraction for data access based on arrays
All implementations can be interchanged transparently, without the need to change programs
8
3. Array Array types
Single-node allocation
  Allocates the entire array on the local node
Distribution
  Allocates data split equally across NUMA nodes
Replication
  Allocates several copies of the array
Partitioning
  Allocates data where work units can be executed locally
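The four array types can be pictured as different answers to the question "which NUMA node holds element i?". The sketch below is illustrative only and is not Shoal's implementation: the names `Policy` and `home_nodes` are hypothetical, distribution is assumed to interleave fixed-size blocks round-robin, and partitioning is assumed to give each node one contiguous chunk that matches the index range its workers process.

```python
from enum import Enum


class Policy(Enum):
    SINGLE_NODE = "single-node allocation"
    DISTRIBUTED = "distribution"
    REPLICATED = "replication"
    PARTITIONED = "partitioning"


def home_nodes(index, length, num_nodes, policy, local_node=0, block=4):
    """Return the set of NUMA nodes that hold element `index`.

    Hypothetical model: `block` is the interleaving granularity (in
    elements) assumed for the DISTRIBUTED policy.
    """
    if not 0 <= index < length:
        raise IndexError(index)
    if policy is Policy.SINGLE_NODE:
        # Everything lives on the allocating thread's node.
        return {local_node}
    if policy is Policy.REPLICATED:
        # Every node keeps its own full copy.
        return set(range(num_nodes))
    if policy is Policy.DISTRIBUTED:
        # Assumed round-robin interleaving of fixed-size blocks.
        return {(index // block) % num_nodes}
    # PARTITIONED: one contiguous chunk per node (ceiling division).
    chunk = -(-length // num_nodes)
    return {index // chunk}
```

With 4 nodes and a 16-element array, element 13 lives on node 3 under partitioning, on every node under replication, and only on the local node under single-node allocation.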
9
3. Array Selection of arrays
Maximize local access to minimize interconnect traffic
Load-balance memory across all available controllers
Partitioning
  If the array is only accessed via an index
Replication
  If the array is read-only and fits into every NUMA node
Otherwise, use a uniform distribution
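The selection rules above can be written as a small decision function. This is a minimal sketch under stated assumptions: the function name, its parameters, and the priority order (partitioning checked before replication) are taken from the order of the rules on the slide, not from Shoal's source code.

```python
def select_array_type(indexed_only, read_only, array_bytes, free_bytes_per_node):
    """Pick an array type from access-pattern facts (hypothetical helper).

    indexed_only        -- the array is only accessed via an index
    read_only           -- the program never writes to the array
    array_bytes         -- size of one full copy of the array
    free_bytes_per_node -- free memory on each NUMA node, in bytes
    """
    if indexed_only:
        return "partitioning"
    fits_everywhere = all(array_bytes <= free for free in free_bytes_per_node)
    if read_only and fits_everywhere:
        return "replication"
    # Fallback from the slide: spread the data uniformly over all nodes.
    return "distribution"


# Example: a 1 GiB read-only lookup table on a machine with four 8 GiB nodes.
GIB = 1 << 30
choice = select_array_type(False, True, 1 * GIB, [8 * GIB] * 4)
```

Here `choice` is "replication": the array is read-only and a copy fits on every node. A read-only array that is too large for one of the nodes would fall through to "distribution".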
10
3. Array Selection of arrays
11
4. Implementation The Shoal runtime library
A high-level array representation based on C++ templates
A low-level, OS-specific backend
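The idea of one array interface with interchangeable implementations can be sketched as follows. Shoal itself uses C++ templates; this Python version is only an analogue, and every class and method name in it (`ArrayBackend`, `SingleNodeArray`, `ReplicatedArray`, `get`, `set`) is hypothetical. NUMA nodes are simulated with plain lists.

```python
from abc import ABC, abstractmethod


class ArrayBackend(ABC):
    """Common interface that client code programs against."""

    @abstractmethod
    def get(self, i, node): ...

    @abstractmethod
    def set(self, i, value): ...

    @abstractmethod
    def __len__(self): ...


class SingleNodeArray(ArrayBackend):
    """One copy; every reader goes to the same (simulated) node."""

    def __init__(self, n, fill=0):
        self._data = [fill] * n

    def get(self, i, node):
        return self._data[i]

    def set(self, i, value):
        self._data[i] = value

    def __len__(self):
        return len(self._data)


class ReplicatedArray(ArrayBackend):
    """One copy per node; reads stay local, writes update every copy."""

    def __init__(self, n, num_nodes, fill=0):
        self._copies = [[fill] * n for _ in range(num_nodes)]

    def get(self, i, node):
        return self._copies[node][i]

    def set(self, i, value):
        for copy in self._copies:
            copy[i] = value

    def __len__(self):
        return len(self._copies[0])


def total(arr, node):
    """Client code: identical for every backend."""
    return sum(arr.get(i, node) for i in range(len(arr)))
```

Because `total` only uses the shared interface, swapping `SingleNodeArray` for `ReplicatedArray` changes where the data lives without changing the client code.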
12
4. Implementation An example of high-level DSLs.
DSL: Domain-Specific Language
Foreach (t: G.Nodes) means the nodes array will be accessed sequentially and with an index
Sum(w: t.InNbrs) implies read-only, indexed accesses on the in-neighbors array
13
4. Implementation High-level compiler
High-level program
  Written in a high-level parallel language such as Green-Marl or OptiML
High-level compiler
  Translates high-level code to low-level code
Low-level code with array abstractions
  Written in C++
  Uses Shoal’s abstraction to allocate and access memory
  The concrete choice of array implementation is not made at compile time
14
4. Implementation
Access patterns
  Information about load/store patterns
  Read/write ratio
Shoal library
  Selects array implementations based on the extracted access patterns
OS-specific backends
  Currently runs on Linux and Barrelfish
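To make "load/store patterns" and "read/write ratio" concrete, the sketch below counts accesses per array while running a loop shaped like the Foreach/Sum example. Counting at run time is a stand-in chosen for illustration; the slides describe the patterns as extracted from the high-level program. `TracedArray`, `AccessPattern`, and the CSR-style graph layout (`begin`, `in_nbrs`) are assumptions, not Shoal or Green-Marl names.

```python
from dataclasses import dataclass


@dataclass
class AccessPattern:
    loads: int = 0
    stores: int = 0

    @property
    def read_only(self):
        return self.stores == 0


class TracedArray:
    """List wrapper that counts every indexed load and store."""

    def __init__(self, values):
        self._values = list(values)
        self.pattern = AccessPattern()

    def __getitem__(self, i):
        self.pattern.loads += 1
        return self._values[i]

    def __setitem__(self, i, value):
        self.pattern.stores += 1
        self._values[i] = value

    def __len__(self):
        return len(self._values)


def in_neighbor_sums(begin, in_nbrs, weight, out):
    """For each node t, sum the weights of t's in-neighbors."""
    for t in range(len(out)):                       # sequential, indexed
        acc = 0
        for e in range(begin[t], begin[t + 1]):
            acc += weight[in_nbrs[e]]               # read-only, indexed
        out[t] = acc


# Tiny graph with edges 2->0, 0->1, 1->1 stored by destination node.
begin = TracedArray([0, 1, 3, 3])
in_nbrs = TracedArray([2, 0, 1])
weight = TracedArray([1, 2, 3])
out = TracedArray([0, 0, 0])
in_neighbor_sums(begin, in_nbrs, weight, out)
```

After the run, `in_nbrs.pattern` shows 3 loads and no stores, so it would qualify as read-only, while `out.pattern` shows 3 stores and no loads.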
15
5. Evaluation Goal: Comparison of Shoal and a regular memory runtime
Shoal’s array implementations
Analyze Shoal’s initialization cost
Investigate the benefits of using a DMA engine for array copy
16
5. Evaluation Machines
17
5. Evaluation Scalability (Green-Marl)
Almost 2x faster than the original implementation
18
5. Evaluation Scalability (PARSEC - Streamcluster)
One of the arrays used is replaced with a Shoal array
4x faster than the original implementation
19
5. Evaluation Use DMA engines
20
6. Conclusion Shoal, a library that
provides an array abstraction and rich memory allocation functions
allows automatic tuning of data placement and access depending on workload and machine characteristics
achieves a 2x improvement for a Green-Marl program without changing the Green-Marl input program