Supporting Cache Coherence in Heterogeneous Multiprocessor Systems

Supporting Cache Coherence in Heterogeneous Multiprocessor Systems Taeweon Suh, Douglas M. Blough, and Hsien-Hsin S. Lee Georgia Institute of Technology

Introduction
- Cache coherence
  - Well-known technique for data consistency among multiprocessors
- Shared memory
  - MEI, MSI, MESI and MOESI protocols
    - PowerPC755: MEI protocol
    - Pentium class: MESI protocol
    - UltraSPARC: MOESI protocol
    - AMD64 class: MOESI protocol
- Distributed shared memory
  - Directory-based coherence

Motivation
- SoC capacity increases as lithography technology advances
- Applications demand heterogeneous multiprocessors and/or IPs on a chip
  - DiMeNsion 8650 (LSI Logic)
  - AD6525 (Analog Devices)
  - Nexperia pnx8500 (Philips)
- Snoop-based protocols fail to address coherence among heterogeneous processors

Contributions
- Systematic methods for integrating distinct coherence protocols in heterogeneous multiprocessor SoC designs
- Performance improvements
- Possible power savings

Integration Methods
- Techniques to integrate coherence protocols
  - Read-to-Write conversion: S (Shared) state removal
  - Shared signal assertion / de-assertion: E (Exclusive) / S (Shared) state removal
- Integrated coherence protocol
  - Common states from distinct protocols
  - Example: MEI and MESI integration yields the MEI protocol
- Snoop-hit buffer
  - Performance booster
  - Power saving
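The "common states" rule can be sketched as a set intersection over the state letters of each protocol. This is only an illustration: the names `PROTOCOLS`, `integrated_states` and `removed_states` are hypothetical, and treating a protocol as nothing more than its set of state letters is a simplifying assumption (real protocols also differ in transitions and bus signals).

```python
# Hypothetical model: each protocol is reduced to its set of state letters.
PROTOCOLS = {
    "MEI": set("MEI"),
    "MSI": set("MSI"),
    "MESI": set("MESI"),
    "MOESI": set("MOESI"),
}


def integrated_states(*names):
    """Return the states common to every named protocol, in MOESI order."""
    common = set.intersection(*(PROTOCOLS[n] for n in names))
    return "".join(s for s in "MOESI" if s in common)


def removed_states(name, *others):
    """Return the states that `name` gives up when integrated with `others`."""
    keep = set(integrated_states(name, *others))
    return "".join(s for s in "MOESI" if s in PROTOCOLS[name] - keep)


if __name__ == "__main__":
    print(integrated_states("MEI", "MESI"))   # common states of MEI and MESI
    print(removed_states("MESI", "MEI"))      # state the MESI side must not enter
    print(removed_states("MESI", "MSI"))
```

Under this model, pairing MESI with MEI removes S from the MESI side and pairing it with MSI removes E, which matches the two removal techniques listed on the slide.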

Read-to-Write Conversion
- S (Shared) state removal
- MEI - MESI integration example

[Figure: Proc 1 (MEI) with Wrapper 1 and Proc 2 (MESI) with Wrapper 2 on a shared bus with the memory controller; figure labels: Write, Read/Write.]

Operations on cache line X

Without our technique:
  Operation      Proc 1 (MEI)   Proc 2 (MESI)
  (1) P2 read    I              I -> E
  (2) P1 read    I -> E         E -> S
  (3) P1 write   E -> M         S (stale)
  (4) P2 read    M              S (stale)

With our technique:
  Operation      Proc 1 (MEI)   Proc 2 (MESI)
  (1) P2 read    I              I -> E
  (2) P1 read    I -> E         E -> I
  (3) P1 write   E -> M         I
  (4) P2 read    M -> I         I -> E
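The four-step MEI/MESI sequence on this slide can be replayed with a small state model. This is a minimal sketch, not the authors' Verilog implementation: the classes and functions are hypothetical, the model tracks a single cache line X, and it assumes the wrapper's conversion amounts to the snooping MESI cache seeing every read as a write.

```python
class Cache:
    """One cache line (line X) in one processor's cache."""
    def __init__(self, protocol):
        self.protocol = protocol   # "MEI" or "MESI"
        self.state = "I"
        self.value = None


def snoop(cache, op):
    """Apply a snooped bus operation; return dirty data to write back, else None."""
    dirty = cache.value if cache.state == "M" else None
    if op == "write" or cache.protocol == "MEI":
        cache.state = "I"          # MEI has no S state, so any snoop hit invalidates
    elif cache.state in ("M", "E"):
        cache.state = "S"          # MESI snooping a plain read keeps a shared copy
    return dirty


def read(req, other, mem, convert):
    if req.state != "I":
        return req.value           # cache hit: no bus traffic, data may be stale
    # Assumed wrapper behavior: present the read to the other side as a write.
    dirty = snoop(other, "write" if convert else "read")
    if dirty is not None:
        mem["X"] = dirty
    req.value = mem["X"]
    shared = other.state != "I"
    req.state = "S" if (shared and req.protocol == "MESI") else "E"
    return req.value


def write(req, other, mem, value):
    if req.state not in ("E", "M"):    # E and M upgrade silently, without bus traffic
        dirty = snoop(other, "write")
        if dirty is not None:
            mem["X"] = dirty
    req.state, req.value = "M", value


def run(convert):
    """Replay (1) P2 read, (2) P1 read, (3) P1 write, (4) P2 read."""
    mem = {"X": 100}
    p1, p2 = Cache("MEI"), Cache("MESI")
    trace = []
    read(p2, p1, mem, convert)
    trace.append((p1.state, p2.state))
    read(p1, p2, mem, convert)
    trace.append((p1.state, p2.state))
    write(p1, p2, mem, 200)
    trace.append((p1.state, p2.state))
    seen = read(p2, p1, mem, convert)
    trace.append((p1.state, p2.state))
    return trace, seen


if __name__ == "__main__":
    for convert in (False, True):
        print("convert =", convert, run(convert))
```

Each trace entry is the pair (Proc 1 state, Proc 2 state) after a step. Without conversion, the final read by P2 hits its stale S copy and returns the old value. With conversion, P2 never enters S, so its final read misses and fetches the value P1 wrote.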

Shared Signal Assertion
- E (Exclusive) state removal
- MSI - MESI integration example

[Figure: Proc 1 (MSI) with Wrapper 1 and Proc 2 (MESI) with Wrapper 2 on a shared bus with the memory controller; figure labels: Shared, Read.]

Operations on cache line X

Without our technique:
  Operation      Proc 1 (MSI)   Proc 2 (MESI)
  (1) P1 read    I -> S         I
  (2) P2 read    S              I -> E
  (3) P2 write   S (stale)      E -> M
  (4) P1 read    S (stale)      M

With our technique:
  Operation      Proc 1 (MSI)   Proc 2 (MESI)
  (1) P1 read    I -> S         I
  (2) P2 read    S              I -> S
  (3) P2 write   I              S -> M
  (4) P1 read    I -> S         M -> S
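The MSI/MESI sequence on this slide can be replayed the same way. Again this is a hedged sketch with hypothetical names. It assumes the MSI processor cannot drive the shared signal on its own, and that the wrapper's fix is to assert the signal on every read by the MESI processor so that it never enters E.

```python
class Cache:
    """One cache line (line X) in one processor's cache."""
    def __init__(self, protocol):
        self.protocol = protocol   # "MSI" or "MESI"
        self.state = "I"
        self.value = None


def snoop_read(cache):
    """Snooped read: an M line is written back and M/E drop to S."""
    dirty = cache.value if cache.state == "M" else None
    if cache.state in ("M", "E"):
        cache.state = "S"
    return dirty


def snoop_invalidate(cache):
    """Snooped write or invalidation: write back an M line, then invalidate."""
    dirty = cache.value if cache.state == "M" else None
    cache.state = "I"
    return dirty


def read(req, other, mem, assert_shared):
    if req.state != "I":
        return req.value           # cache hit: no bus traffic, data may be stale
    dirty = snoop_read(other)
    if dirty is not None:
        mem["X"] = dirty
    req.value = mem["X"]
    if req.protocol == "MSI":
        req.state = "S"            # MSI always loads a read miss in S
    else:
        # Assumption: only a MESI peer, or the wrapper, can drive the shared signal.
        shared = assert_shared or (other.protocol == "MESI" and other.state != "I")
        req.state = "S" if shared else "E"
    return req.value


def write(req, other, mem, value):
    if req.state not in ("E", "M"):    # E and M upgrade silently, without bus traffic
        dirty = snoop_invalidate(other)
        if dirty is not None:
            mem["X"] = dirty
    req.state, req.value = "M", value


def run(assert_shared):
    """Replay (1) P1 read, (2) P2 read, (3) P2 write, (4) P1 read."""
    mem = {"X": 100}
    p1, p2 = Cache("MSI"), Cache("MESI")
    trace = []
    read(p1, p2, mem, assert_shared)
    trace.append((p1.state, p2.state))
    read(p2, p1, mem, assert_shared)
    trace.append((p1.state, p2.state))
    write(p2, p1, mem, 200)
    trace.append((p1.state, p2.state))
    seen = read(p1, p2, mem, assert_shared)
    trace.append((p1.state, p2.state))
    return trace, seen


if __name__ == "__main__":
    for flag in (False, True):
        print("assert_shared =", flag, run(flag))
```

Without the asserted signal, P2 loads the line in E, upgrades to M silently, and P1 keeps reading its stale S copy. With the signal asserted, P2's write starts from S, so it must broadcast an invalidation before modifying the line.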

Snoop-hit Buffer
- A snoop hit on an M line requires 2 transactions intended for the same address
- Performance enhancement and power saving

[Figure: Proc 1 (MEI) with Wrapper 1 and Proc 2 (MESI) with Wrapper 2 on a bus with the memory controller; figure labels: write-back to memory, read, read, snoop-hit buffer (single cache line).]
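One way to picture the snoop-hit buffer is as a single-entry store that captures the line being written back and answers the retried read for the same address. The sketch below rests on that assumption; the slide does not spell out the buffer's exact behavior, and `SnoopHitBuffer`, `Memory` and `serve_snoop_hit` are hypothetical names.

```python
class Memory:
    """Main memory that counts its accesses."""
    def __init__(self):
        self.lines = {}
        self.reads = 0
        self.writes = 0

    def write(self, addr, line):
        self.writes += 1
        self.lines[addr] = line

    def read(self, addr):
        self.reads += 1
        return self.lines[addr]


class SnoopHitBuffer:
    """Single-entry buffer holding the most recently written-back cache line."""
    def __init__(self):
        self.addr = None
        self.line = None

    def capture(self, addr, line):
        self.addr, self.line = addr, line

    def lookup(self, addr):
        return self.line if addr == self.addr else None


def serve_snoop_hit(addr, dirty_line, mem, buf=None):
    """Handle a read that snoop-hits an M line: write-back, then the retried read."""
    mem.write(addr, dirty_line)            # transaction 1: write-back to memory
    if buf is not None:
        buf.capture(addr, dirty_line)      # assumed: buffer snarfs the write-back
        line = buf.lookup(addr)
        if line is not None:
            return line                    # transaction 2 served from the buffer
    return mem.read(addr)                  # transaction 2 served from memory


if __name__ == "__main__":
    plain, buffered = Memory(), Memory()
    serve_snoop_hit(0x40, [1, 2, 3, 4], plain)
    serve_snoop_hit(0x40, [1, 2, 3, 4], buffered, SnoopHitBuffer())
    print(plain.reads, buffered.reads)
```

In this model the write-back still reaches memory in both cases; the buffer only removes the memory read that would follow it, which is where a latency and power saving could come from.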

Simulation Environment
- 3 PowerPC755 (MEI) + 1 ARM920T (no coherence)
- Verilog-HDL implementation
- Simulators: Seamless CVE + VCS
- Baseline: software solution

[Figure: PowerPC755 (MEI) and ARM920T (None) with wrapper and snoop logic on the ASB; figure labels: nFIQ, ARTRY, Arbiter.]

Performance Evaluation (1/3)
- Worst-case simulation
- Each task accesses the same critical sections

[Figure: chart labels 57% and 0.97%]

Performance Evaluation (2/3)
- Best-case simulation
- Each task accesses different critical sections

[Figure: chart labels 426% and 51%]

Performance Evaluation (3/3)
- Typical-case simulation
- Each task randomly selects critical sections

[Figure: chart labels 226%, 68%, 26% and 22%]

Conclusions
- Proposed an integration method of cache coherence protocols for heterogeneous processors
  - Retain common states from distinct coherence protocols
- Performance improved by up to 5.26X with a 96-cycle miss penalty, at the expense of simple hardware
- Possible power savings from the snoop-hit buffer
- Useful and effective methods for heterogeneous multiprocessor SoC designs

Questions? Thanks for your attention!

Backup Slides

Performance Evaluation (2/5)
- Simulation environments (cont.)
- Baseline: software solution
- Lock mechanism: SoCLC [Bilge'02]

  Simulators              Seamless CVE (Mentor Graphics)
                          VCS (Synopsys)
  Operating frequencies   PowerPC755: 100 MHz
                          ARM920T: 50 MHz
                          ASB: 50 MHz
  I$ / D$                 Enabled
  Memory access time      6 cycles for the 1st word
                          1 cycle for each subsequent word
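The memory access time on this slide implies a simple line-fill cost: 6 cycles for the first word plus 1 cycle for each further word. The helper below is a hypothetical illustration of that arithmetic. The 8-word cache line used in the example is an assumption, since the slide does not state the line size.

```python
def line_fill_cycles(words_per_line, first_word=6, subsequent_word=1):
    """Cycles to transfer one cache line, given the per-word memory timing."""
    if words_per_line < 1:
        raise ValueError("a cache line holds at least one word")
    return first_word + (words_per_line - 1) * subsequent_word


if __name__ == "__main__":
    # Assumed line size: 8 words. The slide gives only the per-word timing.
    print(line_fill_cycles(8))
```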

Introduction (2/2)
- Cache coherence example
- PowerPC755: MEI protocol

[Figure: four processors (#1 to #4) with D$ and memory; figure labels: 32, GBL, ARTRY, TT, ADDR.]

Implementation Examples (1/2)
- Intel486: Modified MESI protocol
- PowerPC755: MEI protocol

[Figure: Intel486 (MESI) with wrapper and PowerPC755 (MEI) on a bus with an arbiter; signal labels: INV, ARTRY, HLDA, BOFF, BREQ, BG_BAR, BR_BAR, HOLD, HITM.]

Implementation Examples (2/2)
- PowerPC755: MEI protocol
- ARM920T: No cache coherence support
- Problem: Hardware deadlock due to interrupt response time

[Figure: ARM920T (None) with snoop logic and PowerPC755 (MEI) with wrapper on the ASB with an arbiter; signal labels: ARTRY, BG_BAR, BR_BAR, BGNT, BREQ, nFIQ.]