1 Versioning Architectures for Local and Global Memory
Hajime Fujita (1, 2, 3), Kamil Iskra (2), Pavan Balaji (2), Andrew A. Chien (1, 2)
1 University of Chicago, 2 Argonne National Laboratory, 3 Intel
Dec 15, 2015, Hajime Fujita, ICPADS 2015

2 Funding Acknowledgment and Legal Disclaimers
Intel and Xeon are trademarks of Intel Corporation in the U.S. and/or other countries. Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more information go to http://www.intel.com/performance. *Other names and brands may be claimed as the property of others.
This work was supported by the Office of Advanced Scientific Computing Research, Office of Science, U.S. Department of Energy, under Award DE-SC0008603 and Contract DE-AC02-06CH11357 and completed in part with resources provided by the University of Chicago Research Computing Center.

3 Background
High error rate in large-scale supercomputers
Growing concern about latent errors (e.g. silent data corruption)
  o Errors that have latency between their occurrence and detection
A multi-versioned data store is a promising approach to address latent errors

4 How Does Multi-versioning Help?
Multi-versioning enables flexible recovery from latent errors
[Figure: three recovery timelines after a latent error is detected: (a) traditional checkpoint/restart, (b) rollback using an old version, (c) forward error correction using an old version (recovery using a part of an old version)]

5 Programming with GVR (Global View Resilience)
Globally-shared, multi-version array for application state preservation
Explicit library calls for array manipulation/version creation
[Figure: processes issuing Put/Get to shared arrays A and B, each with multiple versions (Version 1, Version 2, ...)]

6 Many Versions are Partial Updates
Opportunity for saving storage/bandwidth requirements
H. Fujita, et al., Log-Structured Global Array for Efficient Multi-Version Snapshots, CCGrid 2015

7 How to Make Versions Efficiently?
Approach 1: Copy entire array each time
Approach 2: Keep updated data only
Approach 3: Allocate memory block on-demand
[Figure: the three approaches arranged along a low-to-high axis of runtime overhead and memory savings]
H. Fujita, et al., Empirical Comparison of Three Versioning Architecture, Cluster 2015

8 Approach 1: Flat Array
Copy and keep entire array on each version creation
[Figure: Current Version, Version 1, Version 0, each a full array]
✔ Simple structure, fast access
✖ High memory demand, copy overhead
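
The flat-array approach can be sketched in a few lines. This is a minimal illustration only, not GVR code: the class name `FlatVersionedArray` and the methods `put`, `get`, `version_inc`, and `memory_bytes` are hypothetical names chosen for this example.

```python
class FlatVersionedArray:
    """Hypothetical model: every version is a full copy of the array."""

    def __init__(self, size):
        self.current = bytearray(size)   # the live, writable array
        self.versions = []               # full snapshots, oldest first

    def put(self, offset, data):
        # Overwrite len(data) bytes in place; the array size never changes.
        self.current[offset:offset + len(data)] = data

    def get(self, offset, length, version=None):
        # Reads are a plain slice of either the live array or one snapshot.
        src = self.current if version is None else self.versions[version]
        return bytes(src[offset:offset + length])

    def version_inc(self):
        # Copy the entire array, whether or not anything changed.
        self.versions.append(bytes(self.current))
        return len(self.versions) - 1

    def memory_bytes(self):
        # One full array for the current state plus one per version.
        return len(self.current) + sum(len(v) for v in self.versions)
```

In this model, a 1 KiB array with two versions occupies 3 KiB even if only a few bytes differ between versions, which is the memory and copy cost the slide points to.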

9 Approach 2: Flat with Change Tracking
Use a flat array for current version, then only record updated regions upon version creation
[Figure: Current Version as a full array; Version 1 and Version 0 hold updated regions only]
✔ Relatively fast access, small footprint
✖ At least one full array, change tracking overhead

10 Approach 2: Flat with Change Tracking (Cont.)
Detecting updated region
User: GVR library records updates on write operations (e.g. put() or acc())
Kernel: Page write protection + page fault handling
HW: Use CPU-based dirty-page tracking
  o Requires special modification to the kernel
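
The user-level variant can be sketched as follows: every write goes through the library, so the library can mark the touched blocks as dirty and copy only those blocks when a version is created. This is a simplified model under assumptions, not the GVR implementation; `TrackedArray`, its methods, and the 4096-byte block size are illustrative. The kernel and HW variants rely on page protection and dirty bits and are not modeled here.

```python
BLOCK = 4096  # assumed tracking granularity, in bytes


class TrackedArray:
    """Hypothetical model of user-level change tracking on a flat array."""

    def __init__(self, size, block=BLOCK):
        self.block = block
        self.current = bytearray(size)
        self.dirty = set()      # block numbers written since the last version
        self.versions = []      # per version: {block number: block contents}

    def put(self, offset, data):
        self.current[offset:offset + len(data)] = data
        if data:
            # Record every block overlapped by this write.
            first = offset // self.block
            last = (offset + len(data) - 1) // self.block
            self.dirty.update(range(first, last + 1))

    def version_inc(self):
        # Copy out only the blocks that were updated since the last version.
        snapshot = {
            b: bytes(self.current[b * self.block:(b + 1) * self.block])
            for b in sorted(self.dirty)
        }
        self.versions.append(snapshot)
        self.dirty.clear()
        return len(self.versions) - 1
```

A write that straddles a block boundary marks both blocks, so the snapshot cost follows the number of touched blocks rather than the array size.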

11 Approach 2: Flat with Change Tracking (Cont.)
Versioning directions (keep new or old values?)
Redo versioning: Keep new values
Undo versioning: Keep original values
[Figure: redo and undo versioning side by side, annotated "One additional copy"]
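
One way to picture the two directions is the toy model below. It is an assumption-laden sketch, not the GVR design: it tracks single elements instead of memory regions, assumes a zero-filled initial array, and uses the hypothetical names `RedoArray`, `UndoArray`, and `restore`.

```python
class RedoArray:
    """Redo: each version stores the NEW values of the updated elements."""

    def __init__(self, size):
        self.current = [0] * size
        self.dirty = set()
        self.deltas = []                 # deltas[v] = {index: value at version v}

    def put(self, i, value):
        self.current[i] = value
        self.dirty.add(i)

    def version_inc(self):
        self.deltas.append({i: self.current[i] for i in self.dirty})
        self.dirty.clear()

    def restore(self, v):
        # Replay deltas forward from the (assumed zero-filled) initial state.
        out = [0] * len(self.current)
        for delta in self.deltas[:v + 1]:
            for i, value in delta.items():
                out[i] = value
        return out


class UndoArray:
    """Undo: each version stores the ORIGINAL values that were overwritten."""

    def __init__(self, size):
        self.current = [0] * size
        self.pending = {}                # originals overwritten since last version
        self.undo_logs = []              # undo_logs[v] = values before version v

    def put(self, i, value):
        self.pending.setdefault(i, self.current[i])  # save the original once
        self.current[i] = value

    def version_inc(self):
        self.undo_logs.append(self.pending)
        self.pending = {}

    def restore(self, v):
        # Walk backward from the current state, newest undo log first.
        out = list(self.current)
        for log in [self.pending] + self.undo_logs[:v:-1]:
            for i, value in log.items():
                out[i] = value
        return out
```

In this model, restoring version v out of n replays v + 1 deltas under redo and n - 1 - v undo logs (plus the pending one) under undo, so redo touches less data for old versions and undo touches less for recent ones.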

12 Approach 3: Log-structured Array
Allocate memory block on-demand
  o Allocated regions form a log
  o Log = data + metadata (index)
[Figure: Current Version, Version 1, and Version 0 referencing blocks in the log]
✔ Small footprint
✖ High access overhead
H. Fujita, et al., Log-Structured Global Array for Efficient Multi-Version Snapshots, CCGrid 2015
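
The log-plus-index idea can be sketched as below. This is a simplified model, not the data structure from the cited paper: `LogStructuredArray`, its fields, and the copy-on-first-write policy are assumptions for illustration, and each version keeps a full copy of the index for brevity.

```python
class LogStructuredArray:
    """Hypothetical model: blocks are allocated on demand and appended to a log."""

    def __init__(self, size, block=4096):
        self.size = size
        self.block = block
        self.log = []        # data: appended blocks (bytearray each)
        self.index = {}      # metadata: block number -> log position (current)
        self.sealed = []     # one frozen index per version
        self.fresh = set()   # log positions allocated since the last version

    def _writable_block(self, b):
        pos = self.index.get(b)
        if pos is None or pos not in self.fresh:
            # First write since the last version: append a new block to the
            # log, starting from the old contents (or zeros if never written).
            old = self.log[pos] if pos is not None else bytes(self.block)
            self.log.append(bytearray(old))
            pos = len(self.log) - 1
            self.index[b] = pos
            self.fresh.add(pos)
        return self.log[pos]

    def put(self, offset, data):
        for k, byte in enumerate(data):
            b, o = divmod(offset + k, self.block)
            self._writable_block(b)[o] = byte

    def get(self, offset, length, version=None):
        # Every read goes through the index, which is the access overhead.
        index = self.index if version is None else self.sealed[version]
        out = bytearray()
        for k in range(length):
            b, o = divmod(offset + k, self.block)
            pos = index.get(b)
            out.append(self.log[pos][o] if pos is not None else 0)
        return bytes(out)

    def version_inc(self):
        self.sealed.append(dict(self.index))
        self.fresh.clear()
        return len(self.sealed) - 1
```

Memory grows with the number of blocks actually written rather than with the array size, while each access pays for an index lookup.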

13 Problem Statement
Which array architecture brings the best performance and the lowest memory consumption, under varied workloads?
Design space (local memory versioning and global memory versioning):
Flat
Flat + change tracking
  o Change tracking: user/kernel/HW
  o Versioning direction: undo/redo
Log-structured array

14 Evaluation 1: Runtime Performance of Local Memory Versioning
[Figure: design-space overview (flat; flat + change tracking with user/kernel/HW tracking and undo/redo direction; log-structured array) across local and global memory versioning]

15 Local Memory Versioning: Setup
RandomAccess benchmark from HPC Challenge
  o Repeat 8-byte load/store to uniformly random locations
  o Create a version at a certain interval
Intel® Xeon® processor E5620 (2.4GHz, 4 core)
Linux kernel 3.18.4
Applied a patch to enable access to dirty bit information, based on [Vasavada 2011]
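
As a rough illustration of this workload shape (not the HPC Challenge code itself), the loop below updates 64-bit words at uniformly random positions and triggers a version every fixed number of operations. The function name `random_access`, the XOR update, and the `make_version` callback are assumptions made for this sketch.

```python
import random


def random_access(table, n_ops, ops_per_version, make_version, rng):
    """Update random 8-byte words; call make_version every ops_per_version ops."""
    for op in range(1, n_ops + 1):
        i = rng.randrange(len(table))        # uniformly random location
        table[i] ^= rng.getrandbits(64)      # one 8-byte load and one store
        if op % ops_per_version == 0:
            make_version(table)              # versioning scheme under test


# Example: snapshot a 16-word table every 25 operations.
snapshots = []
table = [0] * 16
random_access(table, 100, 25, lambda t: snapshots.append(list(t)), random.Random(0))
```

A smaller `ops_per_version` means more frequent versioning, which is the parameter the evaluation varies.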

16 Runtime Performance with Various Tracking Schemes
Redo performance relative to flat (no versioning)
[Figure: relative performance of each tracking scheme across versioning frequencies; annotation: "= how many read/write ops per version"]
Best tracking scheme
  o Low frequency: kernel/HW
  o High frequency: user

17 Runtime Performance with Different Versioning Directions
Compare Undo vs Redo
[Figure: Redo performance relative to Undo]
Redo is up to 22% slower due to extra copy

18 Evaluation 2: Runtime Performance and Memory Consumption of Global Memory Versioning
[Figure: design-space overview (flat; flat + change tracking with user/kernel/HW tracking and undo/redo direction; log-structured array) across local and global memory versioning]

19 Synthetic Benchmark
Get() and Put() to random locations + version creation
Parameter: Versioning frequency
Based on APEX-Map [E. Strohmaier et al. 2004]

Locality   Example
High       Radix sort
Medium     N-body
Low        Matmul

[Figure: access probability over array index for processes P0, P1, P2, one line per locality level]
Environment: UChicago RCC Midway
  o Intel® Xeon® processor E5-2670 (8 cores x2)
  o InfiniBand FDR-10
  o MVAPICH2 (gcc)

20 Runtime Performance with Various Global Versioning
[Figure: throughput (Kops/s) of each versioning architecture, medium locality (k=0.025)]
#procs=32, block size=4096 B, array size=256 MiB/proc, read ratio=50%
Flat with change tracking best for performance

21 Memory Usage with Various Global Versioning
[Figure: memory usage (MiB) of each versioning architecture]
#procs=32, block size=4096 B, array size=256 MiB/proc, read ratio=50%, versioning frequency=1e-5
Log-structured array best for memory usage

22 Evaluation 3: Version Retrieval Cost
[Figure: design-space overview (flat; flat + change tracking with user/kernel/HW tracking and undo/redo direction; log-structured array) across local and global memory versioning]

23 Version Retrieval Cost
Partial retrieval
  o e.g. Localized recovery
  1. Create 256 versions with certain fill ratio
  2. Pick one version
  3. Read from 10,000 random locations in that version
Full retrieval
  o e.g. Full rollback
  1. Create 256 versions with certain fill ratio
  2. Pick one version
  3. Read the entire contents of that version
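
The two procedures can be sketched as a small harness. This is an illustrative stand-in for the experiment, not the benchmark code: it stores versions as full copies for simplicity, interprets fill ratio as the fraction of elements updated per version (an assumption), and the function names are hypothetical.

```python
import random
import time


def build_versions(size, n_versions, fill_ratio, rng):
    """Create n_versions snapshots; each version updates fill_ratio of the elements."""
    current = [0] * size
    versions = []
    for v in range(n_versions):
        for i in rng.sample(range(size), int(size * fill_ratio)):
            current[i] = v + 1
        versions.append(list(current))
    return versions


def partial_retrieval(version, n_reads, rng):
    """Localized recovery: read n_reads random locations of one version."""
    return [version[rng.randrange(len(version))] for _ in range(n_reads)]


def full_retrieval(version):
    """Full rollback: read the entire contents of one version."""
    return list(version)


def timed(fn, *args):
    """Return (result, elapsed seconds) for one call."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start
```

For example, `timed(partial_retrieval, versions[100], 10_000, random.Random(1))` after `versions = build_versions(4096, 256, 0.01, random.Random(0))` measures one partial retrieval; repeating it for different picked versions shows how cost depends on version age.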

24 Full Version Retrieval Cost
Flat/log array have constant cost of version rollback
Redo versioning is good at restoring older versions, whereas undo is good at newer versions

25 Partial Version Retrieval Cost
Fill ratio = 1%
Flat/log array have more uniform, shorter latency
Flat with tracking encounters higher variation and average latency

26 Related Work
Log-structured file systems
  o LFS [Rosenblum 1992], PLFS [Bent 2009]
  o Focused on improving write performance, while our focus is on capturing writes
Log-structured distributed data stores
  o RAMCloud [Ongaro 2011, Rumble 2014], SILT [Lim 2011], Pilaf [Mitchell 2013]
  o Similar structure to log-structured array
  o GVR is array-oriented (not KV-oriented)
Incremental checkpointing
  o [Plank 1995], TICK [Gioiosa 2005], [Agarwal 2004]
  o Not focusing on RDMA, a new challenge to transparent change tracking

27 Summary
Compared local and global memory versioning architectures for efficient versioning
Findings from evaluation
  o Flat with change tracking: best performance in most cases
  o Log-structured array: best choice for memory savings, uniform and low-cost recovery
Future Work
  o Analysis of data redundancy inside the array, seeking a way to harden the array (e.g. error correction coding)
  o Investigation on hardware/software architecture that allows fine-grain, efficient change tracking on remote memory
http://gvr.cs.uchicago.edu

28 Backup

29 Fine-grain Comparison on Memory Change Tracking (1)
Memory access latency of the first write to each page
Kernel change tracking has higher latency due to page fault handling

30 Performance Comparison (2)
Performance over various versioning frequencies
RMA, #procs=32, block size=4096 B, array size=512 MB/proc, read ratio=50%
Log-structured array works better for localized (smaller k) access patterns

31 Memory Consumption
Log-structured array requires the least amount of memory
Undo versioning requires additional memory for the undo buffer
Flat array requires fixed amount of memory, regardless of locality
For flat with tracking and log array, higher locality incurs lower memory consumption

32 Incremental/decremental

33 Full Version Retrieval Cost
Flat/log array have constant cost of version rollback
Redo versioning is good at restoring older versions, whereas undo is good at newer versions

