Supporting Cache Coherence in Heterogeneous Multiprocessor Systems


1 Supporting Cache Coherence in Heterogeneous Multiprocessor Systems
Taeweon Suh, Douglas M. Blough, and Hsien-Hsin S. Lee
Georgia Institute of Technology

2 Introduction Cache Coherence
Well-known technique for keeping data consistent among multiple processors
Shared memory: MEI, MSI, MESI, and MOESI protocols
    PowerPC755: MEI protocol
    Pentium class: MESI protocol
    UltraSPARC: MOESI protocol
    AMD64 class: MOESI protocol
Distributed shared memory: directory-based coherence

3 Motivation
SoC capacity increases as lithography technology advances
Applications demand heterogeneous multiprocessors and/or IPs on a chip
    DiMeNsion 8650 (LSI Logic)
    AD6525 (Analog Devices)
    Nexperia pnx8500 (Philips)
Snoop-based protocols fail to address coherence among heterogeneous processors

4 Contributions
Systematic integration methods of distinct coherence protocols in heterogeneous multiprocessor SoC designs
Performance improvements
Possible power savings

5 Integration Methods
Techniques to integrate coherence protocols
    Read-to-Write conversion: S (Shared) state removal
    Shared signal assertion / de-assertion: E (Exclusive) / S (Shared) state removal
Integrated coherence protocol
    Common states from distinct protocols
    e.g., MEI and MESI integration: MEI protocol
Snoop-hit buffer
    Performance booster
    Power saving
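The "common states" rule can be read as a set intersection over the state sets of the protocols being integrated. The following is a minimal sketch of that reading, not the authors' implementation; the names `PROTOCOL_STATES`, `integrated_states`, and `removed_states` are hypothetical and introduced only for illustration.

```python
# State sets of the snoop-based protocols named in the slides.
PROTOCOL_STATES = {
    "MEI": {"M", "E", "I"},
    "MSI": {"M", "S", "I"},
    "MESI": {"M", "E", "S", "I"},
    "MOESI": {"M", "O", "E", "S", "I"},
}


def integrated_states(*protocols):
    """Return the states common to every listed protocol (assumed rule)."""
    return set.intersection(*(PROTOCOL_STATES[p] for p in protocols))


def removed_states(protocol, *others):
    """Return the states a wrapper would have to hide from `protocol`."""
    return PROTOCOL_STATES[protocol] - integrated_states(protocol, *others)


if __name__ == "__main__":
    # MEI + MESI keeps M, E, I; the MESI side loses S.
    print(sorted(integrated_states("MEI", "MESI")), sorted(removed_states("MESI", "MEI")))
    # MSI + MESI keeps M, S, I; the MESI side loses E.
    print(sorted(integrated_states("MSI", "MESI")), sorted(removed_states("MESI", "MSI")))
```

Under this reading, pairing MEI with MSI would leave only M and I, which matches the slide's mention of removing both E and S.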

6 Read-to-Write Conversion
S (Shared) state removal: MEI – MESI integration example
States of cache line X after each operation (both caches start in I):

                 Without our technique           With our technique
Operation        Proc 1 (MEI)   Proc 2 (MESI)    Proc 1 (MEI)   Proc 2 (MESI)
(1) P2 read      I              E                I              E
(2) P1 read      E              S                E              I
(3) P1 write     M              S (stale)        M              I
(4) P2 read      M              S (stale)        I              E

[Figure: Proc 1 (MEI) with Wrapper 1 and Proc 2 (MESI) with Wrapper 2 on a bus with the memory controller; transaction labels: Write, Read/Write]
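The four-step example above can be replayed with a small state-machine model. This is a sketch under simplifying assumptions, not the wrapper hardware itself: one cache line, two caches, no shared signal ever asserted, and a hypothetical `read_to_write` flag that makes P2 see every snooped read as a write. The function name `run_mei_mesi` and the version counters used to detect stale copies are illustrative only.

```python
def run_mei_mesi(ops, read_to_write):
    """Replay (processor, operation) pairs on one cache line.

    P1 follows MEI and P2 follows MESI. Returns one tuple per step:
    (P1 state, P2 state, True if any valid copy is stale).
    """
    state = {"P1": "I", "P2": "I"}
    version = {"P1": 0, "P2": 0}  # version of the data each cache holds
    latest = 0                    # version produced by the most recent write
    trace = []
    for proc, op in ops:
        other = "P2" if proc == "P1" else "P1"
        if op == "read" and state[proc] == "I":
            # Read miss: the other cache snoops the bus transaction.
            if other == "P2" and not read_to_write:
                # Plain MESI snoop of a read: a valid line ends up in S.
                if state[other] != "I":
                    state[other] = "S"
            else:
                # MEI snoop of a read, or MESI snoop of a converted write:
                # the line is invalidated (after write-back if it was M).
                state[other] = "I"
            state[proc] = "E"  # assumption: no shared signal is asserted
            version[proc] = latest
        elif op == "write":
            if state[proc] in ("I", "S"):
                state[other] = "I"  # a bus write or upgrade invalidates
            # A write from E or M is silent: no bus transaction occurs.
            state[proc] = "M"
            latest += 1
            version[proc] = latest
        stale = any(state[p] != "I" and version[p] < latest for p in state)
        trace.append((state["P1"], state["P2"], stale))
    return trace


OPS = [("P2", "read"), ("P1", "read"), ("P1", "write"), ("P2", "read")]

if __name__ == "__main__":
    for label, flag in (("without", False), ("with", True)):
        print(label, run_mei_mesi(OPS, read_to_write=flag))
```

Without the conversion, P2 is left in S while P1 silently moves from E to M, so P2's final read hits stale data. With the conversion, P2 is invalidated at step (2) and its final read misses, forcing P1 to give up the modified line.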

7 Shared Signal Assertion
E (Exclusive) state removal: MSI - MESI integration example
States of cache line X after each operation (both caches start in I):

                 Without our technique           With our technique
Operation        Proc 1 (MSI)   Proc 2 (MESI)    Proc 1 (MSI)   Proc 2 (MESI)
(1) P1 read      S              I                S              I
(2) P2 read      S              E                S              S
(3) P2 write     S (stale)      M                I              M
(4) P1 read      S (stale)      M                S              S

[Figure: Proc 1 (MSI) with Wrapper 1 and Proc 2 (MESI) with Wrapper 2 on a bus with the memory controller; transaction labels: Shared, Read]
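The MSI and MESI example can be replayed the same way. The sketch below assumes one cache line, two caches, and a hypothetical `assert_shared` flag meaning the wrapper asserts the shared signal on every read miss by the MESI processor, so that it loads the line in S rather than E. The name `run_msi_mesi` and the version counters are illustrative, not taken from the slides.

```python
def run_msi_mesi(ops, assert_shared):
    """Replay (processor, operation) pairs on one cache line.

    P1 follows MSI and P2 follows MESI. Returns one tuple per step:
    (P1 state, P2 state, True if any valid copy is stale).
    """
    state = {"P1": "I", "P2": "I"}
    version = {"P1": 0, "P2": 0}  # version of the data each cache holds
    latest = 0                    # version produced by the most recent write
    trace = []
    for proc, op in ops:
        other = "P2" if proc == "P1" else "P1"
        if op == "read" and state[proc] == "I":
            # Read miss: a snooping cache holding M or E drops to S
            # (an M line is written back first).
            if state[other] in ("M", "E"):
                state[other] = "S"
            if proc == "P1":
                state[proc] = "S"  # MSI has no E state
            else:
                # MESI loads E unless it samples the shared signal.
                state[proc] = "S" if assert_shared else "E"
            version[proc] = latest
        elif op == "write":
            if state[proc] in ("I", "S"):
                state[other] = "I"  # a bus write or upgrade invalidates
            # A write from E or M is silent: no bus transaction occurs.
            state[proc] = "M"
            latest += 1
            version[proc] = latest
        stale = any(state[p] != "I" and version[p] < latest for p in state)
        trace.append((state["P1"], state["P2"], stale))
    return trace


OPS = [("P1", "read"), ("P2", "read"), ("P2", "write"), ("P1", "read")]

if __name__ == "__main__":
    for label, flag in (("without", False), ("with", True)):
        print(label, run_msi_mesi(OPS, assert_shared=flag))
```

Forcing S on the MESI side means its write at step (3) must go to the bus, which is what invalidates the MSI copy and prevents the stale read at step (4).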

8 Snoop-hit Buffer
A snoop hit on an M line requires 2 transactions to the same address: a write-back to memory and a read
The snoop-hit buffer (a single cache line) gives performance enhancement and power saving
[Figure: Proc 1 (MEI) with Wrapper 1 and Proc 2 (MESI) with Wrapper 2 on a bus with the memory controller and the snoop-hit buffer; transaction labels: Write-back to memory, Read, Read]
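One way to picture the saving is to count memory accesses for the write-back and read pair, with and without a single-line buffer. The sketch below is an assumption-laden model, not the hardware design: it assumes the write-back still reaches memory and that the buffer only serves the following read to the same address. The classes `Memory` and `SnoopHitBuffer` and the function `snoop_hit_on_m` are hypothetical names.

```python
class Memory:
    """Main memory that counts how often it is accessed."""

    def __init__(self):
        self.data = {}
        self.reads = 0
        self.writes = 0

    def read(self, addr):
        self.reads += 1
        return self.data.get(addr, 0)

    def write(self, addr, line):
        self.writes += 1
        self.data[addr] = line


class SnoopHitBuffer:
    """Single-entry buffer that keeps the most recently written-back line."""

    def __init__(self, memory):
        self.memory = memory
        self.addr = None
        self.line = None

    def write_back(self, addr, line):
        # Keep a copy of the line and still update memory (assumption).
        self.addr, self.line = addr, line
        self.memory.write(addr, line)

    def read(self, addr):
        # Serve the read from the buffer when the address matches.
        if addr == self.addr:
            return self.line
        return self.memory.read(addr)


def snoop_hit_on_m(addr, dirty_line, memory, buffer=None):
    """Model the two transactions that follow a snoop hit on an M line."""
    if buffer is None:
        memory.write(addr, dirty_line)  # transaction 1: write-back
        return memory.read(addr)        # transaction 2: read from memory
    buffer.write_back(addr, dirty_line)  # transaction 1: write-back
    return buffer.read(addr)             # transaction 2: read from buffer


if __name__ == "__main__":
    plain = Memory()
    snoop_hit_on_m(0x100, "dirty", plain)
    buffered = Memory()
    snoop_hit_on_m(0x100, "dirty", buffered, SnoopHitBuffer(buffered))
    print("memory reads without buffer:", plain.reads)
    print("memory reads with buffer:", buffered.reads)
```

In this model the buffer removes one memory read per snoop hit on an M line, which is the kind of access the slide associates with both the performance gain and the possible power saving.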

9 Simulation Environment
3 PowerPC755 (MEI) + 1 ARM920T (no coherence)
Verilog-HDL implementation
Simulators: Seamless CVE + VCS
Baseline: software solution
[Figure: block diagram with PowerPC755 (MEI), ARM920T (None), wrapper, snoop logic, arbiter, and ASB; signals shown: ARTRY, nFIQ]

10 Performance Evaluation (1/3)
Worst-case simulation: each task accesses the same critical sections
[Chart; labeled values: 57%, 0.97%]

11 Performance Evaluation (2/3)
Best-case simulation: each task accesses different critical sections
[Chart; labeled values: 426%, 51%]

12 Performance Evaluation (3/3)
Typical-case simulation: each task randomly selects critical sections
[Chart; labeled values: 68%, 22%]

13 Performance Evaluation (3/3)
Typical-case simulation: each task randomly selects critical sections
[Chart; labeled values: 226%, 68%, 26%, 22%]

14 Conclusions
Propose an integration method of cache coherence protocols for heterogeneous processors
    Retain common states from distinct coherence protocols
Performance improved by up to 5.26X with a 96-cycle miss penalty, at the expense of simple hardware
Possible power savings from the snoop-hit buffer
Useful and effective methods for heterogeneous multiprocessor SoC designs

15 Questions? Thanks for your attention!

16 Backup Slides

17 Performance Evaluation (2/5)
Simulation environments (cont.)
Baseline: software solution
Lock mechanism: SoCLC [Bilge'02]
Simulators: Seamless CVE (Mentor Graphics), VCS (Synopsys)
Operating frequencies: PowerPC755 100 MHz, ARM920T 50 MHz, ASB 50 MHz
I$ / D$: enabled
Memory access time: 6 cycles for the 1st word, 1 cycle for each subsequent word
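The memory access time above implies a simple cost formula for a burst of consecutive words. The sketch below works it through; the function name `burst_cycles` is hypothetical, and the 8-word line used in the example is an assumption (a 32-byte line of 4-byte words), since the slides do not state the line size.

```python
FIRST_WORD_CYCLES = 6  # from the slide: 6 cycles for the 1st word
NEXT_WORD_CYCLES = 1   # from the slide: 1 cycle for each subsequent word


def burst_cycles(words):
    """Cycles needed to transfer `words` consecutive words from memory."""
    if words < 1:
        raise ValueError("a burst transfers at least one word")
    return FIRST_WORD_CYCLES + NEXT_WORD_CYCLES * (words - 1)


if __name__ == "__main__":
    # Assumed 8-word cache line: 6 + 7 * 1 cycles.
    print(burst_cycles(8))
```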

18 Introduction (2/2)
Cache coherence example: PowerPC755 (MEI protocol)
[Figure: D$ and memory with numbered steps #1 to #4; labels shown: 32, GBL, ARTRY, TT, ADDR]

19 Implementation Examples (1/2)
Intel486: modified MESI protocol
PowerPC755: MEI protocol
[Figure: Intel486 (MESI) and PowerPC755 (MEI) on a bus with a wrapper and an arbiter; signals shown: INV, ARTRY, HLDA, BOFF, BREQ, BG_BAR, BR_BAR, HOLD, HITM]

20 Implementation Examples (2/2)
PowerPC755: MEI protocol
ARM920T: no cache coherence support
Problem: hardware deadlock due to interrupt response time
[Figure: PowerPC755 (MEI) and ARM920T (None) on the ASB with a wrapper, snoop logic, and an arbiter; signals shown: ARTRY, BG_BAR, BR_BAR, BGNT, BREQ, nFIQ]

