RSIM: An Execution-Driven Simulator for ILP-Based Shared-Memory Multiprocessors and Uniprocessors.

Introduction
- RSIM: the Rice Simulator for ILP Multiprocessors
  - RSIM is a discrete event-driven simulator based on the YACSIM library
- Purpose
  - Primarily designed to study shared-memory multiprocessor architectures built from state-of-the-art processors
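
A discrete event-driven simulator advances simulated time by jumping from one scheduled event to the next instead of stepping through every cycle. The sketch below illustrates that idea with a priority queue. It is not RSIM or YACSIM code: the names `EventQueue`, `schedule`, `run`, and `demo`, and the 20-unit miss latency, are hypothetical.

```python
import heapq
import itertools


class EventQueue:
    """Minimal discrete-event scheduler (illustrative; not the YACSIM API)."""

    def __init__(self):
        self.now = 0                     # current simulated time
        self._heap = []                  # entries are (time, seq, action)
        self._seq = itertools.count()    # tie-breaker: FIFO order at equal times

    def schedule(self, delay, action):
        """Schedule `action(sim)` to run `delay` time units from now."""
        heapq.heappush(self._heap, (self.now + delay, next(self._seq), action))

    def run(self):
        """Pop events in time order, moving the clock to each event's time."""
        while self._heap:
            self.now, _, action = heapq.heappop(self._heap)
            action(self)


def demo():
    """Simulate a hypothetical cache miss whose reply arrives 20 units later."""
    log = []

    def reply(sim):
        log.append((sim.now, "reply received"))

    def miss(sim):
        log.append((sim.now, "miss issued"))
        sim.schedule(20, reply)          # an event may schedule further events

    def other(sim):
        log.append((sim.now, "other work"))

    q = EventQueue()
    q.schedule(0, miss)
    q.schedule(5, other)
    q.run()
    return log
```

Calling `demo()` returns the three log entries in time order, with the clock jumping from 0 to 5 to 20 without visiting the times in between.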

Architecture Features
- Processor Microarchitecture
- The Cache and Memory System
- The Multiprocessor System

Processor features
- Multiple instruction issue
- Out-of-order (dynamic) scheduling
- Register renaming
- Static and dynamic branch prediction support
- Non-blocking loads and stores
- Speculative load execution before address disambiguation of previous stores
- Simple and optimized memory consistency implementations
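
As an illustration of one feature in the list above, register renaming can be sketched as a mapping from architectural to physical registers in which every write receives a fresh physical register. This is a generic sketch, not RSIM's implementation: the function `rename`, the instruction tuple layout, and the unbounded pool of physical registers are all assumptions made for brevity (real hardware recycles registers through a free list).

```python
def rename(instructions, num_arch_regs=4):
    """Rename registers so each write targets a fresh physical register.

    `instructions` is a list of (dest, src1, src2) architectural register
    numbers. Returns the same instructions using physical register numbers.
    """
    # Initially architectural register i maps to physical register i.
    mapping = list(range(num_arch_regs))
    next_free = num_arch_regs    # assumption: unbounded physical register pool
    renamed = []
    for dest, src1, src2 in instructions:
        # Sources read the current mapping, which preserves true dependences.
        p1, p2 = mapping[src1], mapping[src2]
        # The destination gets a new physical register, removing WAR/WAW hazards.
        mapping[dest] = next_free
        renamed.append((next_free, p1, p2))
        next_free += 1
    return renamed
```

For the program `[(1, 2, 3), (2, 1, 1), (1, 3, 3)]`, the first and third instructions both write architectural register 1, but after renaming they write different physical registers, so the third no longer has to wait for the second to read the old value.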

RSIM Processor Microarchitecture

Memory hierarchy features
- Two-level cache hierarchy
- Multiported and pipelined L1 cache, pipelined L2 cache
- Multiple outstanding cache requests
- Memory interleaving
- Software-controlled non-binding prefetching
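
Memory interleaving spreads consecutive blocks across several banks so that accesses to neighbouring lines can proceed in parallel. The sketch below shows one common scheme, line-granularity low-order interleaving. The slides do not say which scheme RSIM uses, so the function `bank_of` and its default line size and bank count are assumptions.

```python
def bank_of(address, line_size=64, num_banks=4):
    """Return the bank holding `address` under line-granularity interleaving.

    Assumed scheme: consecutive cache lines map to consecutive banks,
    wrapping around after `num_banks` lines.
    """
    return (address // line_size) % num_banks
```

With these defaults, addresses 0, 64, 128, and 192 fall in banks 0 through 3, and address 256 wraps back to bank 0.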

Multiprocessor system features
- CC-NUMA shared-memory system with directory-based cache-coherence protocol
- Support for MSI or MESI coherence protocols
- Support for sequential consistency, processor consistency, and release consistency
- Wormhole-routed mesh network

The RSIM Memory System

RSIM Implementation
- Event-driven simulation library
- Processor out-of-order execution engine
- Processor memory unit
- Cache hierarchy
- Directory and memory module
- Interconnection system

The RSIM Memory and Network System
- Memory hierarchy and interconnection system
- Cache hierarchy
- Directory and memory simulation
- System interconnects

Memory Hierarchy and Interconnection System

Cache hierarchy
- First-level cache (L1)
  - Either write-through with no-write-allocate or write-back with write-allocate
- Second-level cache (L2)
  - Write-back with write-allocate
  - Maintains inclusion of L1

Cache Coherence Protocol
- MSI
  - Requires an explicit upgrade message
- MESI
  - Requires a message to be sent to the directory when an exclusive line is evicted from the L2 cache
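
The difference between the two protocols can be sketched by counting the messages one cache sends for a sequence of accesses to a single line. This is a simplified illustration, not RSIM's protocol tables: the function `messages_for` and the message labels are hypothetical, no other cache is assumed to hold the line (so a MESI read miss is granted Exclusive), and eviction of a Shared line is not modeled.

```python
def messages_for(protocol, accesses):
    """Return (final_state, messages) for one cache's accesses to one line.

    `protocol` is "MSI" or "MESI"; `accesses` is a sequence of
    "read", "write", or "evict". Message names are illustrative labels.
    """
    state, msgs = "I", []
    for op in accesses:
        if op == "read":
            if state == "I":
                msgs.append("read request")
                # Assumption: no other sharers, so MESI grants Exclusive.
                state = "E" if protocol == "MESI" else "S"
        elif op == "write":
            if state == "I":
                msgs.append("read-exclusive request")
            elif state == "S":
                msgs.append("upgrade")        # explicit upgrade message
            # E -> M sends nothing: the line is already held exclusively.
            state = "M"
        elif op == "evict":
            if state == "M":
                msgs.append("write-back")
            elif state == "E":
                msgs.append("replacement")    # tell the directory the line is gone
            state = "I"                       # Shared-line eviction not modeled
        else:
            raise ValueError(f"unknown access: {op}")
    return state, msgs
```

A read followed by a write costs an extra `upgrade` under MSI but not under MESI; in exchange, MESI must notify the directory when a clean exclusive line leaves the cache.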

Supported Cache Coherence Protocols

Directory and Memory Simulation
- The directory is responsible for maintaining the current state of a cache line, serializing accesses to each line, generating and collecting coherence messages, sending replies, and handling race conditions.
- The directory coherence protocol used in RSIM relies on cache-to-cache transfers and uses replacement messages.
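
Two of the responsibilities listed above, serializing accesses to a line and using cache-to-cache transfers, can be sketched with a per-line directory entry. This is a simplified illustration under stated assumptions, not RSIM's data structure: the class `DirectoryEntry`, its fields, and the message tuples are hypothetical, and acknowledgement collection and race handling are omitted.

```python
from collections import deque


class DirectoryEntry:
    """Directory state for one cache line (illustrative only)."""

    def __init__(self):
        self.owner = None        # node holding the line exclusively, if any
        self.sharers = set()     # nodes holding read-only copies
        self.busy = False        # True while a transaction is in flight
        self.pending = deque()   # deferred requests, served in arrival order

    def request(self, node, kind):
        """Handle a "read" or "write" request; return the messages sent."""
        if self.busy:
            # Serialize: defer until the in-flight transaction completes.
            self.pending.append((node, kind))
            return []
        self.busy = True
        if self.owner is not None and self.owner != node:
            # Cache-to-cache transfer: the owner forwards data to the requester.
            msgs = [("forward", self.owner, node)]
            if kind == "read":
                self.sharers = {self.owner, node}
                self.owner = None
            else:
                self.owner = node
            return msgs
        if kind == "read":
            self.sharers.add(node)
            return [("data", "memory", node)]
        # Write: invalidate the other sharers, then grant ownership.
        msgs = [("invalidate", "directory", s)
                for s in sorted(self.sharers - {node})]
        self.sharers, self.owner = set(), node
        return msgs + [("data", "memory", node)]

    def complete(self):
        """End the in-flight transaction and start the next deferred one."""
        self.busy = False
        if self.pending:
            return self.request(*self.pending.popleft())
        return []
```

In this sketch, a read that arrives while a write is still in flight produces no messages until `complete()` is called, at which point the directory asks the new owner to forward the line directly to the reader.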

System Interconnects
- Node bus
  - Connects the L2 cache, network interface, and directory/memory modules within a node
- Network interface modules
  - Connect each node's local bus to the interconnection network
- Multiprocessor interconnection network
  - Separate request and reply networks for deadlock avoidance

Statistics in RSIM
- Overall performance statistics
- Other processor statistics
- Cache, memory, and network statistics