Parallel and Distributed Simulation: Time Parallel Simulation


Outline

- Space-Time Framework
- Time Parallel Simulation
- Parallel Cache Simulation
- Simulation Using Parallel Prefix

Space-Time Framework

A simulation computation can be viewed as computing the state of the physical processes in the system being modeled over simulated time.

Algorithm:
1. Partition the space-time region into non-overlapping regions.
2. Assign each region to a logical process (LP).
3. Each LP computes the state of the physical system for its region, using inputs from other regions and producing new outputs to those regions.
4. Repeat step 3 until a fixed point is reached.

[Figure: space-time diagram, with simulated time on one axis and the physical processes on the other; each region receives inputs along its boundary. A second diagram shows space-parallel simulation (e.g., Time Warp) as the special case where each LP (LP 1 through LP 6) covers one physical process across all of simulated time.]

Time Parallel Simulation

Basic idea:
- Divide the simulated time axis into non-overlapping intervals.
- Each processor computes the sample path of the interval assigned to it.

Observation: the simulation computation is a sample path through the set of possible states across simulated time.

[Figure: possible system states vs. simulated time; the sample path is partitioned among processors 1 through 5, one time interval each.]

Key question: What is the initial state of each interval (processor)?

Time Parallel Simulation: Relaxation Approach

1. Guess the initial state of each interval (processor).
2. Each processor computes the sample path of its interval.
3. Using the final state of the previous interval as the initial state, "fix up" the sample path.
4. Repeat step 3 until a fixed point is reached.

Benefit: massively parallel execution.

Liabilities: the cost of the "fix up" computation; convergence may be slow (worst case, N iterations for N processors); the state may be complex.
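The four steps above can be sketched in sequential Python, with a loop standing in for the parallel processors. The state-transition function `f` is a hypothetical stand-in (a running maximum) chosen only so the sketch is runnable; any deterministic transition function fits the same shape.

```python
# Minimal sketch of the relaxation approach to time parallel simulation.
# f is a toy, hypothetical state-transition function (running maximum).
def f(state, event):
    return max(state, event)

def time_parallel_relaxation(events, num_procs, initial_state=0):
    """Partition the event trace into intervals, guess each interval's
    initial state, then iterate the fix-up until a fixed point."""
    n = len(events)
    bounds = [p * n // num_procs for p in range(num_procs + 1)]
    # step 1: guess the initial state of every interval
    guesses = [initial_state] * num_procs
    while True:
        finals = []
        # steps 2-3: each "processor" computes its interval's sample path
        for p in range(num_procs):
            s = guesses[p]
            for e in events[bounds[p]:bounds[p + 1]]:
                s = f(s, e)
            finals.append(s)
        # step 4: feed final states forward; stop at a fixed point
        new_guesses = [initial_state] + finals[:-1]
        if new_guesses == guesses:
            return finals[-1]          # state at the end of simulated time
        guesses = new_guesses
```

With a running-maximum transition the fix-up converges after two iterations, but in general up to N iterations may be needed for N processors, as the slide notes.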

Example: Cache Memory

A cache holds a subset of the entire memory:
- Memory is organized as blocks.
- Hit: the referenced block is in the cache.
- Miss: the referenced block is not in the cache.

The replacement policy determines which block to delete when the requested data is not in the cache (a miss):
- LRU: delete the least recently used block from the cache.

Implementation: Least Recently Used (LRU) stack
- The stack contains memory addresses (block numbers).
- For each memory reference in the input (memory reference trace):
  - if the referenced address is in the stack (hit), move it to the top of the stack;
  - if it is not in the stack (miss), place the address on top of the stack, deleting the address at the bottom.

Example: Trace-Driven Cache Simulation

Given a sequence of references to blocks in memory, determine the number of hits and misses using LRU replacement.

- First iteration: assume each processor's LRU stack is initially empty; each processor simulates the portion of the address trace assigned to it.
- Second iteration: processor i uses the final stack state of processor i-1 as its initial state and re-simulates its portion. When the recomputed stack state matches the state from the first iteration, the remainder of the interval need not be recomputed (processor 1 is idle in the second iteration). Done!

[Figure: worked example with three processors, showing the address trace and the LRU stack contents after each reference in both iterations, and the point in the second iteration where the states match.]
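The two-iteration scheme can be sketched as follows. This sketch assumes each interval is long enough that one fix-up pass suffices (each processor's final stack no longer depends on its guessed initial state); otherwise further iterations would be needed. Function names are illustrative.

```python
def lru_pass(trace, capacity, stack):
    """Replay a trace from a given initial LRU stack.
    Returns (miss_count, final_stack); top of stack is stack[0]."""
    stack = list(stack)
    misses = 0
    for block in trace:
        if block in stack:
            stack.remove(block)            # hit: will move to top
        else:
            misses += 1                    # miss
            if len(stack) == capacity:
                stack.pop()                # evict least recently used
        stack.insert(0, block)
    return misses, stack

def time_parallel_lru(trace, capacity, num_procs):
    """Time parallel trace-driven LRU simulation, two iterations."""
    n = len(trace)
    chunks = [trace[p * n // num_procs:(p + 1) * n // num_procs]
              for p in range(num_procs)]
    # iteration 1: every processor assumes an initially empty stack
    finals = [lru_pass(c, capacity, [])[1] for c in chunks]
    # iteration 2: processor i restarts from processor i-1's final stack
    starts = [[]] + finals[:-1]
    return sum(lru_pass(c, capacity, s)[0] for c, s in zip(chunks, starts))
```

On traces where every interval touches at least `capacity` distinct blocks, the result matches a purely sequential simulation.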

Outline Space-Time Framework Time Parallel Simulation Parallel Cache Simulation Simulation Using Parallel Prefix

Time Parallel Simulation Using Parallel Prefix

Basic idea: formulate the simulation computation as a linear recurrence, and solve it using a parallel prefix computation.

Parallel prefix: given N values X_1, X_2, ..., X_N, compute the N initial products
- P_1 = X_1
- P_2 = X_1 * X_2
- ...
- P_i = X_1 * X_2 * ... * X_i, for i = 1, ..., N, where * is an associative operator.

Algorithm: in the first step, each element combines with the value one position to its left; in the second step, the value two positions to the left; then four positions to the left, and so on. Parallel prefix requires O(log N) time.
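The doubling algorithm described above can be sketched sequentially; each `while` round corresponds to one parallel step in which every element combines with the value `shift` positions to its left.

```python
def parallel_prefix(xs, op):
    """Doubling-scheme inclusive scan: ceil(log2 N) rounds; in round k,
    every element combines with the value 2**k positions to its left.
    All updates within a round would happen in parallel on a real machine."""
    p = list(xs)
    n = len(p)
    shift = 1
    while shift < n:
        # build the whole round at once, mimicking simultaneous updates
        p = [p[i] if i < shift else op(p[i - shift], p[i]) for i in range(n)]
        shift *= 2
    return p
```

Any associative `op` works: addition yields running sums, `max` yields running maxima.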

Example: G/G/1 Queue

Given, for the G/G/1 queue:
- r_i = interarrival time of the i-th job
- s_i = service time of the i-th job

Compute:
- A_i = arrival time of the i-th job
- D_i = departure time of the i-th job, for i = 1, 2, 3, ..., N

Solution: rewrite the equations as parallel prefix computations:
- A_i = A_{i-1} + r_i (= r_1 + r_2 + r_3 + ... + r_i)
- D_i = max(D_{i-1}, A_i) + s_i
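A sketch of how both recurrences reduce to prefix (scan) computations. The arrival times are an ordinary prefix sum. The departure recurrence is handled by a standard trick the slide leaves implicit: each step is a function f_i(x) = max(x + a, b) with a = s_i and b = A_i + s_i, and composing two such functions yields another one, so the D_i are a prefix computation under an associative operator. The sequential `accumulate` calls stand in for O(log N) parallel prefix.

```python
from itertools import accumulate

def gg1_departures(r, s):
    """Arrival and departure times of a G/G/1 queue via two scans
    (D_0 = 0, i.e., the server starts idle at time 0)."""
    # A_i = r_1 + ... + r_i: an ordinary prefix sum
    A = list(accumulate(r))
    # D_i = max(D_{i-1}, A_i) + s_i. Step i is f_i(x) = max(x + a, b)
    # with a = s_i, b = A_i + s_i; composition of such functions:
    def compose(f, g):                 # g applied after f
        a1, b1 = f
        a2, b2 = g
        return (a1 + a2, max(b1 + a2, b2))
    pairs = [(si, ai + si) for si, ai in zip(s, A)]
    prefixes = accumulate(pairs, compose)
    D = [max(a, b) for a, b in prefixes]   # apply each prefix to D_0 = 0
    return A, D
```

The `compose` operator is associative, which is exactly the property parallel prefix needs; on a parallel machine each scan would run in O(log N) time.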

Summary of Time Parallel Algorithms

Con:
- Only applicable to a very limited set of problems.

Pro:
- Allows for massive parallelism.
- Often, little or no synchronization is required after spawning the parallel computations.
- Substantial speedups have been obtained for certain problems: queueing networks, caches, ATM multiplexers.