Lu Peng, Jih-Kwon Peir, Konrad Lai


1 Signature Buffer: Bridging Performance Gap between Registers and Caches
Lu Peng, Jih-Kwon Peir, Konrad Lai

2 Introduction
Two types of storage
Registers: fast and small; supply data for operations
Memory: large and slow; cache for recently used data
Most RISC processors operate only on data from registers
Data communication path: producer -> store -> load -> consumer

3 Introduction
Future processors with 35nm technology
10 GHz clock
64 KB L1 cache
3-7 cycles L1 cache access time
IPC degrades by 3.5% per additional cycle on L1 cache access time
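To put the 3.5% figure in perspective, the following is a rough sketch of how IPC might fall across the 3-7 cycle range. It assumes the loss adds up linearly per extra cycle and takes 3 cycles as the baseline; both are assumptions made here for illustration, and the function name `relative_ipc` is hypothetical.

```python
def relative_ipc(l1_latency_cycles, baseline_cycles=3, loss_per_cycle=0.035):
    """Relative IPC versus the baseline latency.

    Assumes a linear loss of 3.5% per additional L1 cycle; the slides give
    only the per-cycle figure, so the linear model is an assumption.
    """
    extra_cycles = l1_latency_cycles - baseline_cycles
    return 1.0 - loss_per_cycle * extra_cycles


if __name__ == "__main__":
    # Walk the 3-7 cycle range quoted on the slide.
    for latency in range(3, 8):
        print(latency, "cycles ->", round(relative_ipc(latency), 3))
```

Under this linear assumption, a 7-cycle L1 would cost roughly 14% of IPC relative to a 3-cycle L1.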

4 Signature Buffer
Zero-cycle load
“The load and its dependent instructions can be fetched, dispatched and executed at the same time”
Avoid address calculation
Each load and store uses a signature for accessing the storage
The signature buffer can be accessed in early pipeline stages
A signature consists of:
Color of the base register
Displacement value
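The signature idea can be sketched in a few lines. This is a minimal illustration, not the authors' hardware design: it assumes that each base register carries a "color" that is replaced with a fresh value whenever the register is overwritten, so that two references share a signature only if they use the same register value and the same displacement. The names `Signature` and `RegisterColors` are hypothetical.

```python
from dataclasses import dataclass
from itertools import count


@dataclass(frozen=True)
class Signature:
    color: int         # color of the base register
    displacement: int  # displacement value from the instruction encoding


class RegisterColors:
    """Tracks one color per architectural register (assumed scheme)."""

    def __init__(self, num_regs=32):
        # Assumption: register i starts with color i.
        self.colors = list(range(num_regs))
        self._fresh = count(num_regs)

    def redefine(self, reg):
        # Assumption: overwriting a register assigns it a fresh color.
        self.colors[reg] = next(self._fresh)

    def signature(self, base_reg, displacement):
        # No address calculation is needed: only the color and displacement.
        return Signature(self.colors[base_reg], displacement)


colors = RegisterColors()
store_sig = colors.signature(2, 8)   # e.g. store to 8(r2)
load_sig = colors.signature(2, 8)    # later load from 8(r2)
colors.redefine(2)                   # r2 is overwritten
stale_sig = colors.signature(2, 8)   # same encoding, different base value
```

Here `store_sig == load_sig`, so the load can be correlated with the store before any address is computed, while `stale_sig` differs because the base register has changed.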

5 Outline
Motivation
Implementation
Performance evaluation

6 Motivation – Memory Reference Correlations
Signature correlations: store-load and load-load pairs can be correlated directly by the signature
Signature reference locality: nearby memory references often differ by a small displacement value with the same base register
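Both properties can be illustrated with a toy, line-based buffer indexed by signature. This is only a sketch under stated assumptions: the 32-byte line size, the dictionary layout, and the class name `SignatureBufferSketch` are all hypothetical and chosen for readability, not taken from the slides.

```python
LINE_BYTES = 32  # hypothetical line size


class SignatureBufferSketch:
    """Toy buffer keyed by (color, line index); not the real SB design."""

    def __init__(self):
        self.lines = {}  # (color, line_index) -> {offset: value}

    @staticmethod
    def _split(displacement):
        # Nearby displacements with the same color fall into the same line.
        return divmod(displacement, LINE_BYTES)

    def write(self, color, displacement, value):
        line, offset = self._split(displacement)
        self.lines.setdefault((color, line), {})[offset] = value

    def read(self, color, displacement):
        line, offset = self._split(displacement)
        # Returns None on a miss.
        return self.lines.get((color, line), {}).get(offset)


sb = SignatureBufferSketch()
sb.write(color=5, displacement=16, value="x")  # store
first = sb.read(color=5, displacement=16)      # store-load correlation
second = sb.read(color=5, displacement=16)     # load-load correlation
sb.write(color=5, displacement=20, value="y")  # small displacement change
```

The second write lands in the same line as the first, which is the locality that a line-based organization is meant to capture.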

7 Example 1
Signature correlations
Signature reference locality
[Figure: Source and assembly code of function copy_disjunct from Parser]

8 Example 2
[Figure: Source and assembly code of function bsW from Bzip]

9 Signature Buffer

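10 Signature Buffer: Initial State
[Figure: signature buffer diagram; labels shown: 1, 2, 3, 32]

11 Signature Buffer
[Figure: signature buffer diagram after updates; labels shown: 1, 2 -> 32, 1, 100, 3, 32 -> 33]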

12 Data Alignment

13 Data Alignment
[Figure: SB Directory (fields: SB tag, L1 tag, Valid, Bound) and SB Data Array, alongside the L1 Tag Array and L1 Data Array]
Requests (Signature): A-001 -> A-101 -> B-010 -> X-000
(Real Address): C D D D-000

14 Data Alignment
SB MISS!
[Figure: SB Directory (fields: SB tag, L1 tag, Valid, Bound) and SB Data Array, alongside the L1 Tag Array and L1 Data Array]
SB side labels: A C I-V 101 001
L1 side labels: 100; tags C D
Requests (Signature): A-001 -> A-101 -> B-010 -> X-000
(Real Address): C D D D-000

15 Data Alignment
SB MISS!
[Figure: SB Directory (fields: SB tag, L1 tag, Valid, Bound) and SB Data Array, alongside the L1 Tag Array and L1 Data Array]
SB side labels: A C V-V 101 101 001
L1 side labels: 100 000; tags C D
Requests (Signature): A-001 -> A-101 -> B-010 -> X-000
(Real Address): C D D D-000

16 Data Alignment
SB MISS!
[Figure: SB Directory (fields: SB tag, L1 tag, Valid, Bound) and SB Data Array, alongside the L1 Tag Array and L1 Data Array]
SB side labels: A C V-V 101; B D I-V 101 001 010
L1 side labels: 100 101 000; tags C D
Requests (Signature): A-001 -> A-101 -> B-010 -> X-000
(Real Address): C D D D-000

17 Data Alignment
SB MISS! Invalidate high A, low B
[Figure: SB Directory (fields: SB tag, L1 tag, Valid, Bound) and SB Data Array, alongside the L1 Tag Array and L1 Data Array]
SB side labels: A C I-V 101; B D I-I 101 001 010
L1 side labels: 100 101 000; tags C D
Requests (Signature): A-001 -> A-101 -> B-010 -> X-000
(Real Address): C D D D-000

18 Microarchitecture
Bypass I: SB hit or an early store-load forwarding
Bypass II: Normal store-load forwarding

19 Microarchitecture

20 Performance Evaluation

21 Performance Evaluation – IPC
SB – nospec: 13% speedup
SB – perfect: 14% speedup

22 Performance Evaluation – Load Distribution
Normal store-load forwarding and L1 accesses are reduced to 30% of loads; 70% of loads benefit from the SB
The SB with a perfect memory dependence predictor obtains 23% zero-cycle loads

23 Performance Evaluation – SB Hit Ratio
Average SB hit rate is about 51%

24 Performance Evaluation – Comparison with L0 Cache
The performance benefit of the SB grows with L1 latency and is always above that of an L0 cache

25 Performance Evaluation – Comparison with L0 Cache
Larger L0 => higher hit rate
SB is less sensitive to size

26 Advantages
Non-speculative: data obtained from the SB without intervening stores is always correct.
All loads can access the data from the SB without any restriction on the type of the loads or base registers.
Loads through the SB can bypass the address generation and cache access completely.
Store/load correlation is established from the instruction encoding bits to simplify the hardware requirement.
The SB uses line-based granularity to capture spatial locality.

27 Questions?

28 Loads – SB Specific
Early S-L forwarding: a load has an identical signature to an earlier store in the LSQ, with no intervening store in between (zero-cycle load & SB hit)
Early SB access: the SB is accessed after a load is fetched and decoded (zero-cycle load & SB hit)
Delayed SB access: the SB is accessed after memory dependence resolution because of intervening stores (SB hit)
Non-signature forwarding: consecutive SB misses to the same SB line get forwarded data from previous misses (SB miss)
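The four SB-specific cases can be read as a decision procedure. The sketch below is one possible ordering of the checks, written for illustration only: the boolean inputs, the `NORMAL` fallback category, and the names `LoadPath` and `classify_load` are assumptions, not part of the presented design.

```python
from enum import Enum


class LoadPath(Enum):
    EARLY_SL_FORWARDING = "early S-L forwarding"
    EARLY_SB_ACCESS = "early SB access"
    DELAYED_SB_ACCESS = "delayed SB access"
    NON_SIGNATURE_FORWARDING = "non-signature forwarding"
    NORMAL = "normal S-L forwarding or L1 access"  # assumed fallback


def classify_load(store_match, intervening_store, sb_hit, earlier_miss_same_line):
    """Classify a load using the categories listed above.

    store_match: an earlier store in the LSQ has an identical signature.
    intervening_store: some store lies between that point and the load.
    sb_hit: the signature hits in the SB.
    earlier_miss_same_line: a previous SB miss targeted the same SB line.
    """
    if store_match and not intervening_store:
        return LoadPath.EARLY_SL_FORWARDING
    if sb_hit:
        # Intervening stores force the access to wait for dependence resolution.
        if intervening_store:
            return LoadPath.DELAYED_SB_ACCESS
        return LoadPath.EARLY_SB_ACCESS
    if earlier_miss_same_line:
        return LoadPath.NON_SIGNATURE_FORWARDING
    return LoadPath.NORMAL


def is_zero_cycle(path):
    # Only the two early paths are labeled zero-cycle loads.
    return path in (LoadPath.EARLY_SL_FORWARDING, LoadPath.EARLY_SB_ACCESS)
```

For example, `classify_load(False, True, True, False)` yields `LoadPath.DELAYED_SB_ACCESS`, which counts as an SB hit but not as a zero-cycle load.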

