Published by Bonnie Cain. Modified over 5 years ago.
1
Supporting Cache Coherence in Heterogeneous Multiprocessor Systems
Taeweon Suh, Douglas M. Blough, and Hsien-Hsin S. Lee Georgia Institute of Technology
2
Introduction
Cache Coherence
  Well-known technique for data consistency among multiprocessors
  Shared memory
    MEI, MSI, MESI and MOESI protocols
      PowerPC755: MEI protocol
      Pentium class: MESI protocol
      UltraSPARC: MOESI protocol
      AMD64 class: MOESI protocol
  Distributed shared memory
    Directory-based coherence
3
Motivation
  SoC capacity increases as lithography technology advances
  Applications demand heterogeneous multiprocessors and/or IPs on a chip
    DiMeNsion 8650 (LSI Logic)
    AD6525 (Analog Devices)
    Nexperia pnx8500 (Philips)
  Snoop-based protocols fail to address coherence among heterogeneous processors
4
Contributions
  Systematic integration methods of distinct coherence protocols in heterogeneous multiprocessor SoC designs
  Performance improvements
  Possible power savings
5
Integration Methods
Techniques to integrate coherence protocols
  Read-to-Write conversion
    S (Shared) state removal
  Shared signal assertion / de-assertion
    E (Exclusive) / S (Shared) state removal
  Integrated coherence protocol
    Common states from distinct protocols
    ex) MEI, MESI integration: MEI protocol
  Snoop-hit Buffer
    Performance booster
    Power saving
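The "common states" rule can be read as a set intersection over the protocols' state sets. The sketch below illustrates that reading only. The function names `integrated_states` and `removed_states` are hypothetical and do not come from the slides.

```python
# State sets of the protocols named on the slides.
PROTOCOL_STATES = {
    "MEI": {"M", "E", "I"},
    "MSI": {"M", "S", "I"},
    "MESI": {"M", "E", "S", "I"},
    "MOESI": {"M", "O", "E", "S", "I"},
}


def integrated_states(*protocols):
    """Return the states common to every named protocol (set intersection)."""
    common = set(PROTOCOL_STATES[protocols[0]])
    for name in protocols[1:]:
        common &= PROTOCOL_STATES[name]
    return common


def removed_states(protocol, *others):
    """States of `protocol` that the wrappers must keep it from entering."""
    return PROTOCOL_STATES[protocol] - integrated_states(protocol, *others)


if __name__ == "__main__":
    print(sorted(integrated_states("MEI", "MESI")))  # common states of MEI and MESI
    print(sorted(removed_states("MESI", "MEI")))     # state removed from the MESI side
    print(sorted(removed_states("MESI", "MSI")))     # state removed when pairing with MSI
```

Under this reading, pairing MESI with MEI removes S from the MESI processor, and pairing MESI with MSI removes E, which lines up with the two removal techniques listed above.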
6
Read-to-Write Conversion
S (Shared) state removal
MEI – MESI integration example
[Figure: Proc 1 (MEI) and Proc 2 (MESI) attached through Wrapper 1 and Wrapper 2 to a shared bus and memory controller; signal labels: Write, Read/Write]
Operations on cache line X, without our technique:
  Operation      Proc 1 (MEI)   Proc 2 (MESI)
  (1) P2 read    I              I -> E
  (2) P1 read    I -> E         E -> S
  (3) P1 write   E -> M         S (stale)
  (4) P2 read    M              S (stale)
Operations on cache line X, with our technique:
  Operation      Proc 1 (MEI)   Proc 2 (MESI)
  (1) P2 read    I              I -> E
  (2) P1 read    I -> E         E -> I
  (3) P1 write   E -> M         I
  (4) P2 read    M -> I         I -> E
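The four-step example can be replayed with a small state-machine sketch. This is a simplified model under stated assumptions, not the authors' wrapper logic: it tracks a single cache line, treats write-backs as implicit, and models the conversion as "the MESI processor's wrapper presents every snooped read as a write". The names `snoop`, `access` and `run` are hypothetical.

```python
def snoop(protocol, state, bus_op):
    """Next state of a cache that snoops `bus_op` ("read" or "write") for line X."""
    if state == "I":
        return "I"                      # no copy, nothing to do
    if bus_op == "write" or protocol == "MEI":
        return "I"                      # MEI has no S state, so any snoop hit invalidates
    return "S"                          # MESI cache seeing a read keeps a shared copy


def access(caches, who, op, read_to_write):
    """Processor `who` reads or writes line X; `caches` maps name -> [protocol, state]."""
    protocol, state = caches[who]
    if op == "read" and state != "I":
        return                          # read hit: no bus transaction
    if op == "write" and state in ("E", "M"):
        caches[who][1] = "M"            # silent upgrade: no bus transaction
        return
    shared = False
    for name, entry in caches.items():
        if name == who:
            continue
        # Assumed model of Read-to-Write conversion: the wrapper of a MESI
        # processor shows it a snooped read as a write, so it never enters S.
        seen = "write" if (read_to_write and entry[0] == "MESI") else op
        entry[1] = snoop(entry[0], entry[1], seen)
        shared = shared or entry[1] == "S"
    if op == "write":
        caches[who][1] = "M"
    elif protocol == "MESI" and shared:
        caches[who][1] = "S"
    else:
        caches[who][1] = "E"


def run(read_to_write):
    """Replay the slide's four operations; return (P1, P2) states after each one."""
    caches = {"P1": ["MEI", "I"], "P2": ["MESI", "I"]}
    trace = []
    for who, op in [("P2", "read"), ("P1", "read"), ("P1", "write"), ("P2", "read")]:
        access(caches, who, op, read_to_write)
        trace.append((caches["P1"][1], caches["P2"][1]))
    return trace


if __name__ == "__main__":
    print("without:", run(False))
    print("with:   ", run(True))
```

Without the conversion the run ends with P1 in M while P2 still holds the line in S, which is the stale copy from the example. With it, P2 is invalidated at step (2), so its read at step (4) misses and fetches the current data.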
7
Shared Signal Assertion
E (Exclusive) state removal
MSI - MESI integration example
[Figure: Proc 1 (MSI) and Proc 2 (MESI) attached through Wrapper 1 and Wrapper 2 to a shared bus and memory controller; signal labels: Shared, Read]
Operations on cache line X, without our technique:
  Operation      Proc 1 (MSI)   Proc 2 (MESI)
  (1) P1 read    I -> S         I
  (2) P2 read    S              I -> E
  (3) P2 write   S (stale)      E -> M
  (4) P1 read    S (stale)      M
Operations on cache line X, with our technique:
  Operation      Proc 1 (MSI)   Proc 2 (MESI)
  (1) P1 read    I -> S         I
  (2) P2 read    S              I -> S
  (3) P2 write   S -> I         S -> M
  (4) P1 read    I -> S         M -> S
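The MSI and MESI example can be replayed the same way. The sketch rests on a few modeling assumptions: one cache line, implicit write-backs, an MSI processor that never drives the shared signal itself, and a wrapper that asserts the shared signal on every read by the MESI processor. The names `p1_access`, `p2_access` and `run` are hypothetical.

```python
def p1_access(op, p1, p2):
    """MSI processor P1 accesses line X; returns the new (P1, P2) states."""
    if op == "read":
        if p1 != "I":
            return p1, p2                       # read hit: no bus transaction
        # Read miss: P2 snoops a read and keeps any valid copy as S
        # (an M line is assumed to be written back first).
        return "S", ("I" if p2 == "I" else "S")
    if p1 == "M":
        return p1, p2                           # write hit on M
    return "M", "I"                             # bus write invalidates P2


def p2_access(op, p1, p2, force_shared):
    """MESI processor P2 accesses line X; returns the new (P1, P2) states."""
    if op == "read":
        if p2 != "I":
            return p1, p2                       # read hit: no bus transaction
        p1_next = "I" if p1 == "I" else "S"     # P1 snoops the read
        # Assumption: only the wrapper can assert the shared signal here.
        return p1_next, ("S" if force_shared else "E")
    if p2 in ("E", "M"):
        return p1, "M"                          # silent upgrade: P1 sees nothing
    return "I", "M"                             # write from S or I goes on the bus


def run(force_shared):
    """Replay the slide's four operations; return (P1, P2) states after each one."""
    p1, p2 = "I", "I"
    trace = []
    for who, op in [("P1", "read"), ("P2", "read"), ("P2", "write"), ("P1", "read")]:
        if who == "P1":
            p1, p2 = p1_access(op, p1, p2)
        else:
            p1, p2 = p2_access(op, p1, p2, force_shared)
        trace.append((p1, p2))
    return trace


if __name__ == "__main__":
    print("without:", run(False))
    print("with:   ", run(True))
```

The difference shows at step (3). From E the MESI write is silent and P1 keeps a stale S copy. From S the write must go on the bus, which invalidates P1.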
8
Snoop-hit Buffer
  Snoop-hit on M-line requires 2 transactions intended for the same address
  Performance enhancement and power saving
[Figure: Proc 1 (MEI) with Wrapper 1 and Proc 2 (MESI) with Wrapper 2 on a shared bus with the memory controller and a snoop-hit buffer (single cache line); transaction labels: Write-back to memory, Read, Read]
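One way to picture the saving is to count memory accesses for the two transactions. The sketch below assumes that the write-back still updates memory, that the buffer captures the same line as it passes, and that the following read to that address is then served from the buffer. The slides do not spell out these details, and the classes `Memory` and `SnoopHitBuffer` and the function `snoop_hit_on_m` are hypothetical.

```python
class Memory:
    """Backing memory that counts accesses so the two cases can be compared."""

    def __init__(self):
        self.data = {}
        self.reads = 0
        self.writes = 0

    def write(self, addr, line):
        self.writes += 1
        self.data[addr] = line

    def read(self, addr):
        self.reads += 1
        return self.data.get(addr)


class SnoopHitBuffer:
    """Holds a single cache line captured from the most recent write-back."""

    def __init__(self):
        self.addr = None
        self.line = None

    def capture(self, addr, line):
        self.addr, self.line = addr, line

    def lookup(self, addr):
        """Return the buffered line on an address match, else None."""
        return self.line if addr == self.addr else None


def snoop_hit_on_m(memory, addr, dirty_line, buffer=None):
    """Model the two same-address transactions that follow a snoop hit on an M line."""
    # Transaction 1: the owner writes the modified line back to memory.
    memory.write(addr, dirty_line)
    if buffer is not None:
        buffer.capture(addr, dirty_line)        # assumed: captured during the write-back
    # Transaction 2: the requester reads the same address.
    if buffer is not None:
        line = buffer.lookup(addr)
        if line is not None:
            return line                         # served without a memory read
    return memory.read(addr)


if __name__ == "__main__":
    plain, buffered = Memory(), Memory()
    snoop_hit_on_m(plain, 0x40, "new data")
    snoop_hit_on_m(buffered, 0x40, "new data", SnoopHitBuffer())
    print("memory reads without buffer:", plain.reads)
    print("memory reads with buffer:   ", buffered.reads)
```

In this model the buffered case skips the memory read entirely, which is where both the latency and the power saving would come from.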
9
Simulation Environment
  3 PowerPC755 (MEI) + 1 ARM920T (no coherence)
  Verilog-HDL implementation
  Simulators: Seamless CVE + VCS
  Baseline: Software solution
[Figure: PowerPC755 (MEI) and ARM920T (None) on the ASB with an arbiter and a wrapper containing snoop logic; signal labels: nFIQ, ARTRY]
10
Performance Evaluation (1/3)
Worst-case simulation
  Each task accesses the same critical sections
[Chart; annotated values: 57%, 0.97%]
11
Performance Evaluation (2/3)
Best-case simulation
  Each task accesses different critical sections
[Chart; annotated values: 426%, 51%]
12
Performance Evaluation (3/3)
Typical-case simulation
  Each task randomly selects critical sections
[Chart; annotated values: 68%, 22%]
13
Performance Evaluation (3/3)
Typical-case simulation
  Each task randomly selects critical sections
[Chart; annotated values: 226%, 68%, 26%, 22%]
14
Conclusions
  Propose an integration method of cache coherence protocols for heterogeneous processors
    Retain common states from distinct coherence protocols
  Performance improved by
    Up to 5.26X with 96-cycle miss penalty, at the expense of simple hardware
  Possible power savings from snoop-hit buffer
  Useful and effective methods for heterogeneous multiprocessor SoC designs
15
Questions?
Thanks for your attention!
16
Backup Slides
17
Performance Evaluation (2/5)
Simulation environments (cont.)
  Baseline: software solution
  Lock mechanism: SoCLC [Bilge’02]
  Simulators              Seamless CVE (Mentor Graphics), VCS (Synopsys)
  Operating frequencies   PowerPC755: 100 MHz; ARM920T: 50 MHz; ASB: 50 MHz
  I$ / D$                 Enabled
  Memory access time      6 cycles for 1st word, 1 cycle for each subsequent word
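The memory access time above amounts to a simple burst formula: 6 cycles for the first word plus 1 cycle for each further word. A minimal sketch follows. The function name `burst_cycles` and the 8-word burst in the usage line are illustrative assumptions, not figures from the slides.

```python
def burst_cycles(words, first=6, subsequent=1):
    """Cycles to transfer `words` words: `first` for word 1, `subsequent` for each later word."""
    if words < 1:
        raise ValueError("a transfer moves at least one word")
    return first + (words - 1) * subsequent


if __name__ == "__main__":
    # Hypothetical 8-word burst, used only to illustrate the arithmetic.
    print(burst_cycles(8))
```

The slide does not say which clock these cycles refer to, so the sketch leaves the result in cycles rather than converting to time.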
18
Introduction (2/2)
Cache Coherence Example
  PowerPC755: MEI protocol
[Figure: processors #1 to #4, D$ and memory; labels: 32, GBL, ARTRY, TT, ADDR]
19
Implementation Examples (1/2)
  Intel486: Modified MESI protocol
  PowerPC755: MEI protocol
[Figure: Intel486 (MESI) and PowerPC755 (MEI) on a shared bus with a wrapper and an arbiter; signal labels: INV, ARTRY, HLDA, BOFF, BREQ, BG_BAR, BR_BAR, HOLD, HITM]
20
Implementation Examples (2/2)
  PowerPC755: MEI protocol
  ARM920T: No cache coherence support
  Problem: Hardware deadlock due to interrupt response time
[Figure: PowerPC755 (MEI) and ARM920T (None) on the ASB with an arbiter and a wrapper containing snoop logic; signal labels: ARTRY, BG_BAR, BR_BAR, BGNT, BREQ, nFIQ]