1
Signature Buffer: Bridging Performance Gap between Registers and Caches
Lu Peng, Jih-Kwon Peir, Konrad Lai
2
Introduction: Two types of storage
Registers: fast and small; supply data for operations
Memory: large and slow; cache for recently used data
Most RISC processors operate only on data from registers
Data communication path: producer -> store -> load -> consumer
3
Introduction
Future processors with 35nm technology: 10 GHz clock, 64 KB L1 cache, 3-7 cycles L1 cache access time
IPC degrades by 3.5% per additional cycle of L1 cache access time
4
Signature Buffer
Zero-cycle load: “The load and its dependent instructions can be fetched, dispatched and executed at the same time”
Avoid address calculation:
  Each load and store uses a signature for accessing the storage
  The signature buffer can be accessed in early pipeline stages
A signature consists of:
  Color of the base register
  Displacement value
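To make the signature idea concrete, here is a minimal Python sketch of a lookup keyed by (color, displacement). It is not the authors' hardware design. The names `Signature` and `SignatureBufferModel` are hypothetical, and the coloring rule (a fresh color whenever the base register is overwritten) is an assumption; the slides only say that a signature consists of the base register's color and the displacement. The model also ignores capacity, line granularity, and aliasing between different signatures.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Signature:
    color: int         # version tag of the base register (assumed encoding)
    displacement: int  # displacement taken from the instruction encoding


class SignatureBufferModel:
    """Toy model: maps signatures to data without computing any address."""

    def __init__(self, num_regs: int = 32) -> None:
        # Assumption: each register starts with its own color, and a fresh
        # color is allocated whenever the register is overwritten.
        self.color_of: List[int] = list(range(num_regs))
        self.next_color = num_regs
        self.data: Dict[Signature, int] = {}

    def write_register(self, reg: int) -> None:
        """Overwriting a base register gives it a new color."""
        self.color_of[reg] = self.next_color
        self.next_color += 1

    def signature(self, base_reg: int, displacement: int) -> Signature:
        return Signature(self.color_of[base_reg], displacement)

    def store(self, base_reg: int, displacement: int, value: int) -> None:
        self.data[self.signature(base_reg, displacement)] = value

    def load(self, base_reg: int, displacement: int) -> Optional[int]:
        """Return the data on a signature match, or None on a miss."""
        return self.data.get(self.signature(base_reg, displacement))


if __name__ == "__main__":
    sb = SignatureBufferModel()
    sb.store(base_reg=2, displacement=8, value=100)
    print(sb.load(2, 8))   # same base color and displacement: hit
    sb.write_register(2)   # base register changes, so its color changes
    print(sb.load(2, 8))   # old signature no longer matches: miss
```

Because the key is built only from the register color and the instruction's displacement, the lookup in this sketch needs no address calculation, which is the property the slide highlights.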
5
Outline
Motivation
Implementation
Performance evaluation
6
Motivation – Memory Reference Correlations
Signature correlations: store-load and load-load pairs can be correlated directly by the signature
Signature reference locality: nearby memory references often differ by a small displacement value with the same base register
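Signature reference locality suggests grouping signatures into lines, so that references with the same base register and nearby displacements share one buffer line. The sketch below illustrates that grouping. The 32-byte line size and the function name `sb_line_and_offset` are assumptions made for this example, not values from the slides.

```python
from typing import Tuple

LINE_BYTES = 32  # hypothetical SB line size; the slides do not give one


def sb_line_and_offset(color: int, displacement: int) -> Tuple[Tuple[int, int], int]:
    """Split a signature into an SB line tag and a byte offset within that line."""
    # divmod floors toward negative infinity, so negative displacements
    # still yield an offset in the range 0..LINE_BYTES-1.
    line_index, offset = divmod(displacement, LINE_BYTES)
    return (color, line_index), offset


if __name__ == "__main__":
    # Two references off the same base register, 4 bytes apart:
    tag_a, off_a = sb_line_and_offset(color=7, displacement=8)
    tag_b, off_b = sb_line_and_offset(color=7, displacement=12)
    print(tag_a == tag_b, off_a, off_b)
```

Under these assumptions both references fall in the same line and differ only in their offset, so a single line fill could serve both.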
7
Example 1: Signature correlations and signature reference locality
Source and Assembly Codes of Function copy_disjunct from Parser
8
Example 2: Source and Assembly Codes of Function bsW from Bzip
9
Signature Buffer
10
Signature Buffer: Initial State
[Figure: initial state; values shown: 1, 2, 3, 32]
11
Signature Buffer
[Figure: state after updates; values shown: 1, 2 -> 32, 1, 100, 3, 32 -> 33]
12
Data Alignment
13
Data Alignment
[Figure: SB Directory (fields: SB tag, L1 tag, Valid, Bound) and SB Data Array, next to the L1 Tag Array and L1 Data Array]
Requests (Signature): A-001 -> A-101 -> B-010 -> X-000
(Real Address): C D D D-000
14
Data Alignment: SB MISS!
SB Directory (SB tag, L1 tag, Valid, Bound): A C I-V 101
SB Data Array: 001
L1 Tag Array: C D
L1 Data Array: 100
Requests (Signature): A-001 -> A-101 -> B-010 -> X-000
(Real Address): C D D D-000
15
Data Alignment: SB MISS!
SB Directory (SB tag, L1 tag, Valid, Bound): A C V-V 101
SB Data Array: 101 001
L1 Tag Array: C D
L1 Data Array: 100 000
Requests (Signature): A-001 -> A-101 -> B-010 -> X-000
(Real Address): C D D D-000
16
Data Alignment: SB MISS!
SB Directory (SB tag, L1 tag, Valid, Bound): A C V-V 101; B D I-V
SB Data Array: 101 001 010
L1 Tag Array: C D
L1 Data Array: 100 101 000
Requests (Signature): A-001 -> A-101 -> B-010 -> X-000
(Real Address): C D D D-000
17
Data Alignment: SB MISS! Invalidate high A, low B
SB Directory (SB tag, L1 tag, Valid, Bound): A C I-V 101; B D I-I
SB Data Array: 101 001 010
L1 Tag Array: C D
L1 Data Array: 100 101 000
Requests (Signature): A-001 -> A-101 -> B-010 -> X-000
(Real Address): C D D D-000
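The request sequence on these slides can be replayed with a small simulation. This is one possible reading of the figures, not the authors' specification. It rests on these assumptions: the real offsets of the first three requests are C-100, D-000 and D-101 (inferred from values visible in the figures), lines C and D are adjacent, the Valid field is written high-low, Bound is the first signature offset that spills into the next L1 line, and a miss invalidates any valid half of another SB line that overlaps the real addresses being filled. All names in the code are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

LINE = 8  # slots per line, matching the 3-bit offsets on the slides


@dataclass
class Entry:
    base: int                 # real address that signature offset 0 maps to
    bound: int                # first signature offset that spills into the next L1 line
    high_valid: bool = False
    low_valid: bool = False

    def half_range(self, high: bool) -> Tuple[int, int]:
        """Real-address range [start, end) covered by one half of the SB line."""
        if high:
            return self.base + self.bound, self.base + LINE
        return self.base, self.base + self.bound

    def state(self) -> str:
        """Valid bits in the slides' notation, assumed to read high-low."""
        return ("V" if self.high_valid else "I") + "-" + ("V" if self.low_valid else "I")


def overlaps(a: Tuple[int, int], b: Tuple[int, int]) -> bool:
    return a[0] < b[1] and b[0] < a[1]


def access(sb: Dict[str, Entry], tag: str, offset: int, real: int) -> bool:
    """Return True on an SB hit; on a miss, fill one half and invalidate aliases."""
    entry = sb.get(tag)
    if entry is None:
        base = real - offset
        entry = sb[tag] = Entry(base=base, bound=LINE - base % LINE)
    high = offset >= entry.bound
    if (entry.high_valid if high else entry.low_valid):
        return True
    filled = entry.half_range(high)
    for other in sb.values():
        if other is entry:
            continue
        # Assumed rule: halves of other lines that alias the filled range are dropped.
        if other.high_valid and overlaps(filled, other.half_range(True)):
            other.high_valid = False
        if other.low_valid and overlaps(filled, other.half_range(False)):
            other.low_valid = False
    if high:
        entry.high_valid = True
    else:
        entry.low_valid = True
    return False


def run_trace(requests: List[Tuple[str, int, int]]) -> List[Tuple[bool, Dict[str, str]]]:
    """Replay requests and record (hit, valid bits per SB tag) after each one."""
    sb: Dict[str, Entry] = {}
    snapshots = []
    for tag, offset, real in requests:
        hit = access(sb, tag, offset, real)
        snapshots.append((hit, {t: e.state() for t, e in sb.items()}))
    return snapshots


C, D = 2, 3  # hypothetical line numbers for the adjacent L1 lines C and D
REQUESTS = [
    ("A", 0b001, C * LINE + 0b100),  # A-001 -> C-100 (real offset inferred)
    ("A", 0b101, D * LINE + 0b000),  # A-101 -> D-000 (real offset inferred)
    ("B", 0b010, D * LINE + 0b101),  # B-010 -> D-101 (real offset inferred)
    ("X", 0b000, D * LINE + 0b000),  # X-000 -> D-000
]

if __name__ == "__main__":
    for hit, states in run_trace(REQUESTS):
        print("hit" if hit else "miss", states)
```

Under these assumptions all four requests miss, and the valid bits for lines A and B step through I-V, V-V, then V-V with I-V, and finally I-V with I-I, which matches the directory states on the slides. The last step is the "Invalidate high A, low B" case: the line fetched for X covers all of D, so it overlaps both the high half of A and the low half of B.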
18
Microarchitecture
Bypass I: SB hit or an early store-load forwarding
Bypass II: normal store-load forwarding
19
Microarchitecture
20
Performance Evaluation
21
Performance Evaluation – IPC
SB – nospec: 13% speedup
SB – perfect: 14% speedup
22
Performance Evaluation – Load Distribution
Normal S-L forwarding and L1 accesses are reduced to 30% of loads; 70% of loads benefit from the SB
With a perfect memory dependence predictor, the SB obtains 23% zero-cycle loads
23
Performance Evaluation – SB Hit Ratio
Average SB hit rate is about 51%
24
Performance Evaluation – Comparison with L0 Cache
The performance benefit of the SB grows with L1 latency and is always above that of an L0 cache
25
Performance Evaluation – Comparison with L0 Cache
Larger L0 => higher hit rate
SB is less sensitive to size
26
Advantages
Non-speculative: data obtained from the SB without intervening stores is always correct.
All loads can access data from the SB, with no restriction on the type of load or base register.
Loads served by the SB bypass address generation and cache access completely.
Store/load correlation is established from the instruction encoding bits, which simplifies the hardware requirements.
The SB uses line-based granularity to capture spatial locality.
27
Questions?
28
Loads – SB Specific
Early S-L forwarding: a load has an identical signature to an earlier store in the LSQ, with no intervening store in between. (zero-cycle load & SB hit)
Early SB access: the SB is accessed after a load is fetched and decoded. (zero-cycle load & SB hit)
Delayed SB access: the SB is accessed after memory dependence resolution because of intervening stores. (SB hit)
Non-Signature Forwarding: consecutive SB misses to the same SB line get data forwarded from previous misses. (SB miss)
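The four load categories above can be written as a small decision function. This is a sketch only: the boolean inputs, the order of the checks, and the extra `NORMAL` fallback for loads that take the regular store-load forwarding or L1 path are assumptions made for illustration, since the slide does not give the exact selection logic.

```python
from enum import Enum


class LoadKind(Enum):
    EARLY_SL_FORWARDING = "early S-L forwarding"
    EARLY_SB_ACCESS = "early SB access"
    DELAYED_SB_ACCESS = "delayed SB access"
    NON_SIGNATURE_FORWARDING = "non-signature forwarding"
    NORMAL = "normal S-L forwarding or L1 access"  # fallback, not on this slide

    @property
    def zero_cycle(self) -> bool:
        """Only the two early cases are labelled zero-cycle loads on the slide."""
        return self in (LoadKind.EARLY_SL_FORWARDING, LoadKind.EARLY_SB_ACCESS)


def classify_load(
    matching_store_in_lsq: bool,   # earlier store, same signature, no store in between
    intervening_stores: bool,      # earlier stores force dependence resolution first
    sb_hit: bool,                  # signature found in the SB
    earlier_miss_same_line: bool,  # a previous miss already targets this SB line
) -> LoadKind:
    if matching_store_in_lsq:
        return LoadKind.EARLY_SL_FORWARDING
    if sb_hit and not intervening_stores:
        return LoadKind.EARLY_SB_ACCESS
    if sb_hit:
        return LoadKind.DELAYED_SB_ACCESS
    if earlier_miss_same_line:
        return LoadKind.NON_SIGNATURE_FORWARDING
    return LoadKind.NORMAL


if __name__ == "__main__":
    kind = classify_load(False, True, True, False)
    print(kind.value, kind.zero_cycle)
```

In this reading, an SB hit that must wait for intervening stores still avoids the L1 access but is no longer a zero-cycle load.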