Store Recycling Function Experimental Results
Motivation
- Emergence of multi-/many-core architectures
- Need for effective parallel programming models
- Dynamic task graph scheduling: task scheduling via work stealing, plus memory management
- Our approach: recycle the memory assigned to data-blocks among tasks via store recycling functions

Overview
- Verification run: checks the correctness of recycling functions on a smaller problem instance; a data-block recycled too early is recomputed through re-execution
- Production run: executes the task graph with the verified recycling function on the actual problem instance

Background
- Representation as a DAG: vertices (tasks) and edges (dependences)
- Memory management alternatives:
  - single assignment: likely to run out of memory
  - garbage collection: requires a use count or last-use specification for each data-block

Store Recycling Function
- Maps a task T1 to another task T2 so that the output of T1 occupies the same memory as the output of T2 (see the first sketch after the results)
- Example: Recycle(B) = A
- Examples of incorrect recycling: Recycle(B) = D; Recycle(B) = A and Recycle(C) = A
- Challenges:
  - characterization of the sufficient conditions for a recycling function to be correct
  - determination of the most memory-efficient recycling function given a set of candidates
  - efficient representation of recycling functions during runtime
  - guaranteeing correct execution for every possible schedule and problem instance

Recycling Constraints
- T1 can recycle T2 only if T2 and all of T2's uses causally precede T1 (ensures no premature recycling)
- Two tasks T1 and T2 can recycle the same task T3 only if T1 can recycle T2, or vice versa (ensures no concurrent recycling)
- Causality relationships between tasks are tracked via vector clocks (see the second sketch after the results)

Auto Exploration of Recycling Functions
- Recycling candidates: the immediate and transitive predecessors of each task (see the third sketch after the results)
- Ask the user for the dependence structure, then enumerate all traversal paths
- Guarantees that no concurrent recycling is allowed

Experimental Results
Setup:
- Intel Xeon Phi: 61 cores (244 threads), 8 GB memory
- Benchmarks: Cholesky, FW, Hotspot, LU, Rician, Srad, SW

1) Comparison between single assignment and recycling, and auto-recycling overheads
- Fig. 1: Cholesky with a small (left) and a large (right) problem instance for a varying number of threads
- Fig. 2: Associated costs at 61 threads
- Tab. 1: Memory consumption in MBs

2) Recycling function verification costs
- Tab. 2: Number of recycling functions checked and verified as correct

3) Re-execution overheads (incorrect execution)
- Fig. 3: Re-execution overheads with a representative incorrect recycling function
- Fig. 4: Distribution of incorrect recycling functions to overhead bins
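
Sketch 1. As a rough illustration of how a runtime might apply a store recycling function, the C++ sketch below writes each task's output into an earlier task's data-block when a recycling target exists, and otherwise falls back to single assignment. This is a hypothetical sketch under stated assumptions, not the poster's actual runtime; RecyclingRuntime, allocateOutput, TaskId and Buffer are made-up names.

    #include <cstddef>
    #include <memory>
    #include <string>
    #include <unordered_map>
    #include <vector>

    using TaskId = std::string;
    using Buffer = std::shared_ptr<std::vector<double>>;

    struct RecyclingRuntime {
        // The recycling function, e.g. recycle["B"] = "A" for Recycle(B) = A.
        std::unordered_map<TaskId, TaskId> recycle;
        // Data-block currently holding each task's output.
        std::unordered_map<TaskId, Buffer> output;

        // Obtain the data-block that task t's output will be written into.
        Buffer allocateOutput(const TaskId& t, std::size_t size) {
            auto it = recycle.find(t);
            if (it != recycle.end()) {
                // Reuse the recycled task's data-block; assumes the recycling
                // constraints have already been verified for this mapping.
                Buffer reused = output.at(it->second);
                reused->assign(size, 0.0);
                output[t] = reused;
                return reused;
            }
            // No recycling target: fall back to single assignment.
            Buffer fresh = std::make_shared<std::vector<double>>(size, 0.0);
            output[t] = fresh;
            return fresh;
        }
    };

With rt.recycle["B"] = "A", task B's output overwrites A's data-block, matching the Recycle(B) = A example above; tasks without a mapping each receive a fresh data-block.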
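
Sketch 2. The two recycling constraints can be checked with per-task vector clocks. The sketch below is an illustrative assumption, not the authors' implementation; it assumes each task records a vector clock stamped at completion and the set of tasks that use its output, and the names Task, causallyPrecedes, canRecycle and canBothRecycle are hypothetical.

    #include <cstddef>
    #include <vector>

    struct Task {
        std::vector<unsigned> clock;    // vector clock stamped when the task completes
        std::vector<const Task*> uses;  // tasks that consume this task's output
    };

    // a causally precedes b iff a.clock <= b.clock component-wise with at least
    // one strictly smaller component (clocks are assumed to have equal length).
    bool causallyPrecedes(const Task& a, const Task& b) {
        bool strictlyLess = false;
        for (std::size_t i = 0; i < a.clock.size(); ++i) {
            if (a.clock[i] > b.clock[i]) return false;
            if (a.clock[i] < b.clock[i]) strictlyLess = true;
        }
        return strictlyLess;
    }

    // Constraint 1: T1 may recycle T2 only if T2 and all of T2's uses causally
    // precede T1 (no premature recycling).
    bool canRecycle(const Task& t1, const Task& t2) {
        if (!causallyPrecedes(t2, t1)) return false;
        for (const Task* use : t2.uses)
            if (!causallyPrecedes(*use, t1)) return false;
        return true;
    }

    // Constraint 2: T1 and T2 may both recycle the same task T3 only if one of
    // them can also recycle the other (no concurrent recycling).
    bool canBothRecycle(const Task& t1, const Task& t2, const Task& t3) {
        return canRecycle(t1, t3) && canRecycle(t2, t3) &&
               (canRecycle(t1, t2) || canRecycle(t2, t1));
    }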
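
Sketch 3. The candidate set used by auto exploration, the immediate and transitive predecessors of a task, can be gathered with a depth-first traversal of the dependence DAG. The sketch below is illustrative only; Dag, collectPredecessors and recyclingCandidates are hypothetical names.

    #include <string>
    #include <unordered_map>
    #include <unordered_set>
    #include <vector>

    using TaskId = std::string;
    // Dependence DAG given as: task -> its immediate predecessors.
    using Dag = std::unordered_map<TaskId, std::vector<TaskId>>;

    static void collectPredecessors(const Dag& dag, const TaskId& t,
                                    std::unordered_set<TaskId>& candidates) {
        auto it = dag.find(t);
        if (it == dag.end()) return;
        for (const TaskId& pred : it->second) {
            if (candidates.insert(pred).second)   // only recurse on first visit
                collectPredecessors(dag, pred, candidates);
        }
    }

    // Recycling candidates for task t: all of its immediate and transitive
    // predecessors in the dependence DAG.
    std::unordered_set<TaskId> recyclingCandidates(const Dag& dag, const TaskId& t) {
        std::unordered_set<TaskId> candidates;
        collectPredecessors(dag, t, candidates);
        return candidates;
    }

For a DAG in which A feeds B and B feeds C, recyclingCandidates(dag, "C") returns {A, B}; each candidate assignment would then be checked against the recycling constraints on the smaller verification instance before being used in the production run.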