Parallel and Distributed Simulation: Time Parallel Simulation
Outline
– Space-Time Framework
– Time Parallel Simulation
– Parallel Cache Simulation
– Simulation Using Parallel Prefix
Space-Time Framework
A simulation computation can be viewed as computing the state of the physical processes in the system being modeled over simulated time.
Algorithm:
1. partition the space-time region into non-overlapping regions
2. assign each region to a logical process (LP)
3. each LP computes the state of the physical system for its region, using inputs from other regions and producing new outputs to those regions
4. repeat step 3 until a fixed point is reached
[Figure: space-time diagram (physical processes vs. simulated time) showing one region with its inputs, alongside a space-parallel decomposition into LP 1 through LP 6 (e.g., Time Warp)]
Time Parallel Simulation
Basic idea:
– divide the simulated time axis into non-overlapping intervals
– each processor computes the sample path of the interval assigned to it
Observation: the simulation computation is a sample path through the set of possible states across simulated time.
[Figure: possible system states vs. simulated time, with the sample path divided among processors 1 through 5]
Key question: what is the initial state of each interval (processor)?
Time Parallel Simulation: Relaxation Approach
1. guess the initial state of each interval (processor)
2. each processor computes the sample path of its interval
3. using the final state of the previous interval as the initial state, “fix up” the sample path
4. repeat step 3 until a fixed point is reached
Benefit: massively parallel execution
Liabilities: cost of the “fix up” computation, convergence may be slow (worst case, N iterations for N processors), state may be complex
[Figure: possible system states vs. simulated time, with processors 1 through 5 fixing up their sample paths]
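The four steps above can be sketched as a serial stand-in for the parallel algorithm. Here `traces`, `step`, and `initial_guess` are hypothetical placeholders for a concrete simulation model; in a real implementation, each interval's sample path would be computed on its own processor.

```python
def time_parallel_relaxation(traces, step, initial_guess):
    """Relaxation approach: guess initial states, compute each interval's
    sample path, then fix up initial states until a fixed point is reached."""
    n = len(traces)
    init = [initial_guess] * n              # step 1: guess each interval's initial state
    while True:
        finals = []
        for i in range(n):                  # steps 2-3: compute each interval's sample path
            state = init[i]                 #           (run in parallel on real hardware)
            for event in traces[i]:
                state = step(state, event)
            finals.append(state)
        # step 4: feed final states forward; stop when nothing changes
        new_init = [initial_guess] + finals[:-1]
        if new_init == init:
            return finals
        init = new_init
```

With a state transition that is a running sum, for example, three intervals converge to the same final states a serial simulation would produce.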
Example: Cache Memory
Cache holds a subset of the entire memory
– memory is organized as blocks
– hit: referenced block is in the cache
– miss: referenced block is not in the cache
Replacement policy determines which block to delete when the requested data is not in the cache (miss)
– LRU: delete the least recently used block from the cache
Implementation: Least Recently Used (LRU) stack
– stack contains memory block addresses (block numbers)
– for each memory reference in the input (memory reference trace):
   if the referenced address is in the stack (hit), move it to the top of the stack
   if it is not in the stack (miss), place the address on top of the stack, deleting the address at the bottom
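A minimal LRU-stack simulator following the rules above (the function name and interface are illustrative, not from the original slides):

```python
def lru_simulate(trace, capacity, stack=None):
    """Simulate an LRU stack over a memory-reference trace.
    Returns (hits, misses, final stack contents from top to bottom)."""
    stack = list(stack) if stack else []
    hits = misses = 0
    for block in trace:
        if block in stack:            # hit: move referenced block to top of stack
            stack.remove(block)
            stack.insert(0, block)
            hits += 1
        else:                         # miss: push on top, evict bottom if full
            stack.insert(0, block)
            if len(stack) > capacity:
                stack.pop()
            misses += 1
    return hits, misses, stack
```

Running it on the first eight references of the trace used on the next slide (1 2 1 3 4 3 6 7, capacity 4) yields 2 hits, 6 misses, and final stack [7, 6, 3, 4].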
Example: Trace-Driven Cache Simulation
Given a sequence of references to blocks in memory, determine the number of hits and misses using LRU replacement (4-entry LRU stack).
Address trace, divided among three processors:
processor 1: 1 2 1 3 4 3 6 7
processor 2: 2 1 2 6 9 3 3 6
processor 3: 4 2 3 1 7 2 7 4
First iteration: each processor assumes an initially empty stack. LRU stack contents (top to bottom) after each reference:
processor 1: [1] [2 1] [1 2] [3 1 2] [4 3 1 2] [3 4 1 2] [6 3 4 1] [7 6 3 4]
processor 2: [2] [1 2] [2 1] [6 2 1] [9 6 2 1] [3 9 6 2] [3 9 6 2] [6 3 9 2]
processor 3: [4] [2 4] [3 2 4] [1 3 2 4] [7 1 3 2] [2 7 1 3] [7 2 1 3] [4 7 2 1]
Second iteration: processor i uses the final state of processor i-1 as its initial state (processor 1 is idle):
processor 2 (initial state [7 6 3 4]): [2 7 6 3] [1 2 7 6] [2 1 7 6] [6 2 1 7] [9 6 2 1] – match with the first-iteration state at the same point, so the rest of the sample path is unchanged
processor 3 (initial state [6 3 9 2]): [4 6 3 9] [2 4 6 3] [3 2 4 6] [1 3 2 4] – match with the first-iteration state; done!
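The two-iteration computation above can be sketched as follows (a serial sketch of the parallel algorithm; the early stop when a second-iteration stack equals the first-iteration stack at the same position is the "match!" moment in the example):

```python
def time_parallel_lru(segments, capacity):
    """Time-parallel trace-driven LRU simulation. Iteration 1: every
    processor starts from an empty stack, recording the stack after each
    reference. Iteration 2 (fix-up): processor i restarts from processor
    i-1's final stack, stopping as soon as its stack matches the state it
    had at the same point in iteration 1 (the rest is then unchanged).
    Returns the final stack of each segment."""
    def step(stack, block):
        stack = [b for b in stack if b != block]   # hit: remove old copy
        return ([block] + stack)[:capacity]        # push on top, trim bottom

    # iteration 1: simulate each segment from an empty stack
    paths = []
    for seg in segments:
        stack, path = [], []
        for block in seg:
            stack = step(stack, block)
            path.append(stack)
        paths.append(path)

    # iteration 2: fix up using the previous segment's final state
    finals = [paths[0][-1]]
    for i in range(1, len(segments)):
        stack = finals[-1]
        for j, block in enumerate(segments[i]):
            stack = step(stack, block)
            if stack == paths[i][j]:               # states match: stop early
                break
        finals.append(paths[i][-1] if stack == paths[i][j] else stack)
    return finals
```

On the slide's trace this reproduces the final stacks [7 6 3 4], [6 3 9 2], and [4 7 2 1], with both fix-up passes terminating early at the match points shown above.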
Outline
– Space-Time Framework
– Time Parallel Simulation
– Parallel Cache Simulation
– Simulation Using Parallel Prefix
Time Parallel Simulation Using Parallel Prefix
Basic idea: formulate the simulation computation as a linear recurrence, and solve it using a parallel prefix computation.
Parallel prefix: given N values X_1, X_2, …, X_N, compute the N initial products
P_1 = X_1
P_2 = X_1 ⊗ X_2
P_i = X_1 ⊗ X_2 ⊗ … ⊗ X_i, for i = 1, …, N, where ⊗ is an associative operator
Computation for N = 8 (with ⊗ = +):
step 1: each position adds in the value one position to its left, yielding the partial sums 1–2, 2–3, 3–4, 4–5, 5–6, 6–7, 7–8
step 2: each position adds in the value two positions to its left, yielding 1–3, 1–4, 2–5, 3–6, 4–7, 5–8
step 3: each position adds in the value four positions to its left, yielding 1–5, 1–6, 1–7, 1–8
Parallel prefix requires O(log N) time.
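The log-step ladder above can be sketched serially, with each list comprehension standing in for one parallel step (on a PRAM, all positions in a step update simultaneously):

```python
def parallel_prefix(values, op):
    """Log-step prefix computation: in step k, every position i >= 2**k
    combines in the value 2**k positions to its left. After ceil(log2 N)
    steps, position i holds op applied to values[0..i]."""
    p = list(values)
    shift = 1
    while shift < len(p):
        # one "parallel" step: positions below `shift` are already final
        p = [p[i] if i < shift else op(p[i - shift], p[i]) for i in range(len(p))]
        shift *= 2
    return p
```

With ⊗ = + and the values 1 through 8, three steps produce the running sums 1, 3, 6, 10, 15, 21, 28, 36, as in the diagram.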
Example: G/G/1 Queue
Given, for the ith job:
r_i = interarrival time of the ith job
s_i = service time of the ith job
Compute:
A_i = arrival time of the ith job
D_i = departure time of the ith job, for i = 1, 2, 3, …, N
Solution: rewrite the equations as prefix computations:
A_i = A_{i-1} + r_i (= r_1 + r_2 + r_3 + … + r_i)
D_i = max(D_{i-1}, A_i) + s_i
The arrival times are an ordinary prefix sum; the departure-time recurrence is linear in the (max, +) algebra, so it too can be evaluated by parallel prefix.
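One way to realize both recurrences as prefix computations is sketched below. This is a serial sketch: `accumulate` stands in for the O(log N) parallel prefix, and representing each (max, +) map as a pair is an implementation choice, not something fixed by the slides. Each job contributes the map x → max(x + s_i, A_i + s_i), composing two such maps stays in the same family, and composition is associative.

```python
from itertools import accumulate

def gg1_departures(r, s):
    """Arrival and departure times of a G/G/1 queue via prefix computations.
    A_i is a prefix sum of interarrival times; D_i = max(D_{i-1}, A_i) + s_i
    is evaluated as a prefix product of (max, +) linear maps."""
    A = list(accumulate(r))                  # A_i = r_1 + r_2 + ... + r_i
    # a pair (a, b) represents the map x -> max(x + a, b);
    # composing "f then g" stays in the family:
    def op(f, g):
        a1, b1 = f
        a2, b2 = g
        return (a1 + a2, max(b1 + a2, b2))
    # job i contributes the map x -> max(x + s_i, A_i + s_i)
    maps = [(si, ai + si) for ai, si in zip(A, s)]
    prefix = list(accumulate(maps, op))      # stand-in for parallel prefix
    D = [max(0 + a, b) for a, b in prefix]   # apply each composed map to D_0 = 0
    return A, D
```

For example, with r = (1, 1, 1) and s = (3, 1, 1) this gives arrivals (1, 2, 3) and departures (4, 5, 6), matching the serial recurrence: the second and third jobs wait for the long first service.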
Summary of Time Parallel Algorithms
Con:
– only applicable to a very limited set of problems
Pro:
– allows for massive parallelism
– often, little or no synchronization is required after spawning the parallel computations
– substantial speedups have been obtained for certain problems: queueing networks, caches, ATM multiplexers