Lu Peng, Jih-Kwon Peir, Konrad Lai

Signature Buffer: Bridging Performance Gap between Registers and Caches

Introduction
Two types of storage:
Registers: fast and small; supply data for operations
Memory: large and slow; cache for recently used data
Most RISC processors operate only on data from registers
Data communication path: producer -> store -> load -> consumer

Introduction
Future processors with 35nm technology:
10 GHz clock
64 KB L1 cache
3-7 cycles L1 cache access time
IPC degrades by 3.5% per additional cycle of L1 cache access time
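
As a rough illustration of the numbers on this slide, the sketch below extrapolates the quoted 3.5% IPC loss per additional L1 cycle across the 3-7 cycle range. Treating the loss as linear in the number of extra cycles is an assumption made here for illustration; the slide gives only the per-cycle figure, and the function name `ipc_loss` is hypothetical.

```python
def ipc_loss(extra_cycles, loss_per_cycle=0.035):
    """Estimated fractional IPC loss under an assumed linear model."""
    return extra_cycles * loss_per_cycle


BASE_LATENCY = 3  # lower end of the 3-7 cycle range quoted on the slide

for latency in range(3, 8):
    extra = latency - BASE_LATENCY
    print(f"L1 latency {latency} cycles: about {ipc_loss(extra):.1%} IPC loss")
```

Under this assumed model, moving from a 3-cycle to a 7-cycle L1 would cost roughly four times the per-cycle loss.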

Signature Buffer
Zero-cycle load: “The load and its dependent instructions can be fetched, dispatched and executed at the same time”
Avoid address calculation:
Each load and store uses a signature for accessing the storage
The signature buffer can be accessed in early pipeline stages
A signature consists of the color of the base register and the displacement value
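
The idea of looking data up by signature instead of by computed address can be sketched in a few lines. This is a minimal software model, not the hardware design from the talk: it assumes that the "color" is a version tag that changes whenever the base register is redefined, and the class and method names (`SignatureBufferSketch`, `write_register`, and so on) are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Signature:
    color: int         # assumed: version tag of the base register
    displacement: int  # displacement value encoded in the instruction


@dataclass
class SignatureBufferSketch:
    colors: dict = field(default_factory=dict)   # base register -> current color
    entries: dict = field(default_factory=dict)  # Signature -> stored value
    next_color: int = 0

    def write_register(self, reg):
        """Assumption: redefining a base register gives it a fresh color."""
        self.colors[reg] = self.next_color
        self.next_color += 1

    def signature(self, base_reg, displacement):
        if base_reg not in self.colors:
            self.write_register(base_reg)
        return Signature(self.colors[base_reg], displacement)

    def store(self, base_reg, displacement, value):
        self.entries[self.signature(base_reg, displacement)] = value

    def load(self, base_reg, displacement):
        """Return the value on a hit or None on a miss; no address is computed."""
        return self.entries.get(self.signature(base_reg, displacement))


sb = SignatureBufferSketch()
sb.store("r1", 8, 42)
hit = sb.load("r1", 8)    # same color and displacement as the store
sb.write_register("r1")   # base register redefined, so its color changes
miss = sb.load("r1", 8)   # the old signature no longer matches
print(hit, miss)
```

Because the lookup key comes from the register name and the displacement field of the instruction, it is available before the effective address would be, which is the property the slide relies on for early pipeline access.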

Outline
Motivation
Implementation
Performance evaluation

Motivation – Memory Reference Correlations
Signature correlations: store-load and load-load pairs can be correlated directly by the signature
Signature reference locality: nearby memory references often use the same base register and differ only by a small displacement value
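
One way to picture signature reference locality is to group references that share a base-register color and fall within the same block of displacements. The sketch below does that grouping; the 32-byte line size, the sample references, and the helper name `sb_line` are all made up for illustration and are not taken from the slides.

```python
from collections import defaultdict

LINE_BYTES = 32  # hypothetical line size, not taken from the slides


def sb_line(color, displacement, line_bytes=LINE_BYTES):
    """Map a signature to a (color, line index) pair."""
    return (color, displacement // line_bytes)


# (color, displacement) pairs; invented sample references
refs = [(5, 0), (5, 4), (5, 8), (5, 40), (6, 4)]

lines = defaultdict(list)
for color, disp in refs:
    lines[sb_line(color, disp)].append(disp)

for key, disps in lines.items():
    print(key, disps)
```

References with the same color and nearby displacements land in the same group, so a single line-sized entry could serve all of them.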

Example 1: signature correlations and signature reference locality
Source and assembly code of function copy_disjunct from Parser

Example 2
Source and assembly code of function bsW from Bzip

Signature Buffer

Signature Buffer (figure: initial state; values shown: 1, 2, 3, 32)

Signature Buffer (figure: updated state, showing the transitions 2 -> 32 and 32 -> 33)

Data Alignment (figure sequence)
Structures shown: SB Directory (SB tag, L1 tag, Valid, Bound), SB Data Array, L1 Tag Array, L1 Data Array
Requests (Signature): A-001 -> A-101 -> B-010 -> X-000
Requests (Real Address): C D D D-000
Frame 1: structures and request sequence only
Frame 2: SB MISS! SB directory and data array: A C I-V 101 001; L1 data array: 100; L1 tags: C D
Frame 3: SB MISS! SB directory and data array: A C V-V 101 101 001; L1 data array: 100 000; L1 tags: C D
Frame 4: SB MISS! SB directory and data array: A C V-V 101 B D I-V 101 001 010; L1 data array: 100 101 000; L1 tags: C D
Frame 5: SB MISS! Invalidate high A, low B. SB directory and data array: A C I-V 101 B D I-I 101 001 010; L1 data array: 100 101 000; L1 tags: C D

Microarchitecture
Bypass I: SB hit or an early store-load forwarding
Bypass II: normal store-load forwarding

Microarchitecture

Performance Evaluation

Performance Evaluation – IPC
SB – nospec: 13% speedup
SB – perfect: 14% speedup

Performance Evaluation – Load Distribution
Normal S-L forwarding and L1 accesses are reduced to 30%; 70% of loads benefit from the SB
The SB with a perfect memory dependence predictor obtains 23% zero-cycle loads

Performance Evaluation – SB Hit Ratio
Average SB hit rate is about 51%

Performance Evaluation – Comparison with L0 Cache
The performance benefit of the SB grows with L1 latency and is always above that of having an L0 cache

Performance Evaluation – Comparison with L0 Cache
Larger L0 => higher hit rate
SB is less sensitive to size

Advantages
Non-speculative: data obtained from the SB without intervening stores is always correct
All loads can access the data from the SB without any restriction on the type of the loads or base registers
Loads through the SB can bypass the address generation and cache access completely
Store/load correlation is established from the instruction encoding bits to simplify hardware requirements
SB uses line-based granularity to capture spatial locality

Questions?

Loads – SB Specific
Early S-L forwarding: a load has an identical signature to an earlier store in the LSQ, with no intervening store in between (zero-cycle load & SB hit)
Early SB access: the SB is accessed after a load is fetched and decoded (zero-cycle load & SB hit)
Delayed SB access: the SB is accessed after memory dependence resolution because of intervening stores (SB hit)
Non-signature forwarding: consecutive SB misses to the same SB line get forwarded data from previous misses (SB miss)
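
The four SB-specific load categories can be read as a small decision procedure. The sketch below is one possible encoding of the slide's definitions, not the authors' simulator logic: the boolean inputs, the order in which they are tested, and the names `LoadKind` and `classify_load` are assumptions made here for illustration.

```python
from enum import Enum


class LoadKind(Enum):
    EARLY_SL_FORWARDING = "early S-L forwarding"
    EARLY_SB_ACCESS = "early SB access"
    DELAYED_SB_ACCESS = "delayed SB access"
    NON_SIGNATURE_FORWARDING = "non-signature forwarding"
    OTHER = "normal S-L forwarding or L1 access"


def classify_load(matching_store_in_lsq, intervening_store, sb_hit,
                  prior_miss_same_line):
    """Classify one load from four hypothetical per-load flags."""
    # Identical signature to an earlier store, nothing in between.
    if matching_store_in_lsq and not intervening_store:
        return LoadKind.EARLY_SL_FORWARDING
    if sb_hit:
        # Intervening stores force the access to wait for dependence resolution.
        if intervening_store:
            return LoadKind.DELAYED_SB_ACCESS
        return LoadKind.EARLY_SB_ACCESS
    # SB miss, but an earlier miss to the same SB line can supply the data.
    if prior_miss_same_line:
        return LoadKind.NON_SIGNATURE_FORWARDING
    return LoadKind.OTHER


print(classify_load(True, False, True, False).value)
print(classify_load(False, True, True, False).value)
print(classify_load(False, False, False, True).value)
```

Only the first two categories are described on the slide as zero-cycle loads; the delayed case still hits in the SB but waits for memory dependence resolution.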
