Low-Power Cache Organization Through Selective Tag Translation for Embedded Processors with Virtual Memory Support
Xiangrong Zhou and Peter Petrov
Proceedings of the 16th ACM Great Lakes Symposium on VLSI (GLSVLSI '06), Apr. 2006
Citation Count: 6
Presenter: Chun-Hung Lai
2017/4/24
Abstract In this paper we present a novel cache architecture for energy efficient data caches in embedded processors with virtual memory. Application knowledge regarding the nature of memory references is used to eliminate tag address translations for most of the cache accesses. We introduce a novel cache tagging scheme, where both virtual and physical tags co-exist in the cache tag arrays. Physical tags and special handling for the super-set cache index bits are used for references to shared data regions in order to avoid cache consistency problems. By eliminating the need for address translation on cache access for the majority of references, a significant power reduction is achieved. We outline an efficient hardware architecture for the proposed approach, where the application information is captured in a reprogrammable way and the cache architecture is minimally modified. Our experimental results show energy reductions for the address translation hardware in the range of 90%, while the reduction for the entire cache architecture is within the range of 25%-30%.
What's the Problem
A cache organization with virtual memory support is very power consuming
- The address translation (TLB lookup) is performed on every cache access (for PIPT and VIPT caches)
- The TLB accounts for 20-25% of the total cache power
Goal: reduce power by minimizing the number of address translations on cache accesses
Proposal: a selective tag translation cache architecture
- Private data can be handled with virtual tags (no address translation), eliminating the TLB lookup on cache accesses
- Shared data require physical tags (address translation needed) because of the synonym problem: different virtual addresses mapped to the same physical address
Background - VIVT & PIPT
Virtually Indexed Virtually Tagged (VIVT) cache
- Pros: fast and low power (no address translation on cache access)
- Cons: synonyms. Different virtual addresses (from more than one task) are mapped to the same physical address (shared data, e.g., for inter-process communication)
- Since the virtual address is used to access the cache, the shared data will end up in different blocks: a cache consistency problem
Physically Indexed Physically Tagged (PIPT) cache
- Pros: synonyms are no longer an issue
- Cons: delay and power overhead (address translation for each cache access)
Background - VIPT
Virtually Indexed Physically Tagged (VIPT) cache
- Pros: hides the address translation latency. Since the address translation is performed only for the tags, cache indexing can be overlapped with the tag translation
- Pros: can eliminate the cache synonym problem by imposing certain restrictions on the OS memory manager
- Cons: power overhead (address translation for each cache access)
- The most typical cache architecture for general-purpose processors, and the one discussed in this paper
Selective Tag Translation Cache Architecture
Both virtual and physical tags are utilized at the same time
- All cache lines are virtually indexed
- Non-shared data are tagged with virtual tags: saves power, since no address translation is required
- Shared data are tagged with physical tags. They need special care because different VAs are mapped to the same PA; they can be identified in advance and physically tagged when placed in the cache
- A mode bit per cache line distinguishes virtual from physical tags
[Figure: cache access path showing the virtual index, VPN, virtual tag, physical tag, and mode bit]
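A minimal sketch of the per-line state this tagging scheme implies; the struct layout and field names are illustrative assumptions, not the paper's design:

```c
#include <stdint.h>

/* Each line is virtually indexed; the mode bit records whether its tag
 * holds a virtual tag (non-shared data) or a physical tag (shared data). */
struct cache_line {
    uint32_t tag;        /* virtual or physical tag, per the mode bit */
    uint8_t  mode;       /* 0 = virtual tag, 1 = physical tag         */
    uint8_t  valid;
    uint8_t  data[32];   /* 32-byte line (assumed)                    */
};
```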
The Proposed Technique Works Correctly When Synonyms are Aligned
What is an aligned synonym?
- The superset bits of the virtual address are identical to the superset bits of the physical address
- Thus the virtual index is the same as the physical index: VIPT = PIPT
What are the superset bits (or color bits)?
- The bits where the cache index and the VPN intersect, i.e., when the address is used to access the cache, the MSBs of the virtual index overlap with the VPN
- They exist whenever the cache way size is larger than the page size
To eliminate the synonym problem in VIPT: align synonyms in the OS memory manager (see the sketch below)
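A minimal sketch of how the superset (color) bits fall out of the address, assuming a hypothetical 16 KB direct-mapped way and 4 KB pages (so 2 color bits); the addresses are illustrative, not from the paper:

```c
#include <stdint.h>
#include <stdio.h>

#define WAY_SIZE  (16 * 1024)   /* cache way size (assumed) */
#define PAGE_SIZE (4 * 1024)    /* page size (assumed)      */

/* Superset bits = the part of the cache index above the page offset,
 * i.e., bits [log2(WAY_SIZE)-1 : log2(PAGE_SIZE)]; they overlap the VPN. */
static uint32_t superset_bits(uint32_t addr) {
    return (addr % WAY_SIZE) / PAGE_SIZE;   /* 2 bits for these parameters */
}

int main(void) {
    /* An aligned synonym: VA and PA agree in the superset bits, so the
     * virtual index equals the physical index (VIPT behaves like PIPT). */
    uint32_t va = 0x0001B0C0, pa = 0x0042B0C0;
    printf("VA color = %u, PA color = %u\n",
           superset_bits(va), superset_bits(pa));  /* both print 3 */
    return 0;
}
```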
However, When the Synonyms are Not Aligned
The virtual superset bits are not identical to the physical superset bits
- If we use VIPT, a synonym access can conflict with other virtual addresses that don't belong to the same synonym group:
- Two virtual addresses can have the same virtual superset bits, hence the same virtual index, and thus point to the same cache line
- Yet they have different PPNs in which only the physical superset bits differ, so the physical tag part is the same
- The cache would then misunderstand them as the same data (see the sketch below)
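A small numeric illustration of this conflict, under the same assumed geometry (4 KB pages, 2 color bits); all addresses are made up for the example:

```c
#include <stdint.h>
#include <stdio.h>

#define PAGE_BITS     12   /* 4 KB pages (assumed)            */
#define SUPERSET_BITS 2    /* way size / page size = 4 colors */
#define COLOR(x) (((x) >> PAGE_BITS) & ((1u << SUPERSET_BITS) - 1))
#define TAG(x)   ((x) >> (PAGE_BITS + SUPERSET_BITS))  /* above the colors */

int main(void) {
    /* Two distinct virtual pages with the same virtual color... */
    uint32_t va1 = 0x00013000, va2 = 0x00027000;   /* colors 3 and 3 */
    /* ...whose physical pages share the tag bits above the colors but
       differ in the physical color (an unaligned synonym situation). */
    uint32_t pa1 = 0x00082000, pa2 = 0x00083000;   /* colors 2 and 3 */

    printf("VA colors: %u %u\n", COLOR(va1), COLOR(va2)); /* same index */
    printf("PA tags:   0x%x 0x%x\n", TAG(pa1), TAG(pa2)); /* same tag   */
    printf("PA colors: %u %u\n", COLOR(pa1), COLOR(pa2)); /* differ!    */
    /* With virtual indexing plus physical tags, both lookups select the
       same set and match the same tag, so two different physical lines
       would alias unless the superset bits are translated as well. */
    return 0;
}
```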
To Avoid the Previous Conflict When Synonyms are Not Aligned
Goal: translate the virtual superset bits into the physical superset bits with minimal cost
- Since a shared data buffer is allocated at consecutive physical addresses and is also mapped to consecutive virtual addresses, the translation reduces to adding a constant offset to the virtual superset bits
- Superset offset adder: produces the physical superset bits with little delay
- Page offset adder: replaces the TLB for translating the physical tag (power efficient)
- The physical superset bits are concatenated with the physical tag (a sketch follows below)
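A behavioral sketch of the two adders, under the same assumed 4 KB pages and 2 color bits; the struct name, fields, and example addresses are illustrative, not from the paper:

```c
#include <stdint.h>
#include <stdio.h>

#define PAGE_BITS     12
#define SUPERSET_BITS 2
#define SUPERSET_MASK ((1u << SUPERSET_BITS) - 1)

/* Hypothetical per-buffer entry set up by the OS: because the shared
 * buffer is contiguous in both address spaces, the VA-to-PA mapping is
 * a single constant displacement, so both fields are simple addends. */
struct offset_entry {
    uint32_t superset_offset;  /* virtual color + this = physical color */
    uint32_t ppn_offset;       /* VPN + this = PPN                      */
};

/* Superset offset adder: a tiny (2-bit) addition on the access path. */
static uint32_t phys_color(uint32_t va, const struct offset_entry *e) {
    return ((va >> PAGE_BITS) + e->superset_offset) & SUPERSET_MASK;
}

/* Page offset adder: wider, but still cheaper than a TLB lookup. */
static uint32_t phys_tag(uint32_t va, const struct offset_entry *e) {
    uint32_t ppn = (va >> PAGE_BITS) + e->ppn_offset;
    return ppn >> SUPERSET_BITS;     /* tag bits sit above the colors */
}

int main(void) {
    /* Example: a buffer page at VA 0x00013000 backed by PA 0x00082000. */
    struct offset_entry e = { (0x82 - 0x13) & SUPERSET_MASK, 0x82 - 0x13 };
    uint32_t va = 0x00013000;
    printf("physical color = %u, physical tag = 0x%x\n",
           phys_color(va, &e), phys_tag(va, &e));  /* color 2, tag 0x20 */
    return 0;
}
```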
Compiler and OS Support
To apply the proposed scheme:
- The shared data buffers and the hot-spots are identified during the profile, compile, and load phases
- Two extra bits are encoded in each memory reference instruction that accesses a shared data buffer
- Case 1: the most frequently accessed shared data buffers use the offset-adjustment address translation method; the index into the offset table is encoded in the memory reference instruction as well (this is where the benefit comes from)
- Case 2: the less frequently accessed shared data buffers translate the physical superset bits and physical tag through the D-TLB
- Case 3: non-shared data are handled with virtual tags
Offset table: one entry is reserved for each shared buffer; the offset is determined by the OS
Hardware Support
- First, one additional bit is associated with each cache line to indicate whether its tag is physical or virtual
- Second, the offset table is implemented. The label part (L) of each memory instruction carries the synonym bits (selecting among the previous 3 cases, i.e., whether to use a virtual or a physical tag) and the index into the offset table
- Third, the superset offset adder and the page offset adder translate the physical superset bits and the PPN for synonym accesses
The introduced delay is small:
- The superset offset adder on the cache access path is small (typically 2 bits)
- Though the page offset adder is wider, its delay is less than that of the TLB it replaces
- The offset table access is pipelined with the cache access and completed early in the pipeline, so it is outside the critical path
Result: no performance overhead
Overall Hardware Organization
The different address translation paths are controlled by the L field of the memory instruction (see the sketch below):
- Case 1: frequently used shared data use the offset adjustment (virtual superset bits -> physical superset bits, VPN -> physical tag)
- Case 2: non-frequently used shared data take the default D-TLB path (VPN -> physical tag)
- Case 3: non-shared data use the virtual tag directly, with no power spent on address translation
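A behavioral sketch of the three translation paths selected by the L field; the enum values, the stubbed D-TLB, and the field layout are all illustrative assumptions, not the paper's encoding:

```c
#include <stdint.h>
#include <stdio.h>

#define PAGE_BITS     12
#define SUPERSET_BITS 2
#define SUPERSET_MASK ((1u << SUPERSET_BITS) - 1)

enum label { L_NONSHARED, L_SHARED_TLB, L_SHARED_OFFSET };

struct offset_entry { uint32_t superset_offset, ppn_offset; };

struct lookup { uint32_t color; uint32_t tag; int tag_is_physical; };

/* Stand-in for the D-TLB path (case 2); a real design would perform an
 * actual TLB lookup here. */
static uint32_t dtlb_vpn_to_ppn(uint32_t vpn) { return vpn; /* identity stub */ }

/* Produce the set-selecting color bits and the tag to compare; the rest
 * of the index comes from the page offset and needs no translation. */
static struct lookup translate(uint32_t va, enum label l,
                               const struct offset_entry *e) {
    uint32_t vpn = va >> PAGE_BITS;
    struct lookup r = {0, 0, 0};
    switch (l) {
    case L_NONSHARED:                     /* case 3: pure virtual access   */
        r.color = vpn & SUPERSET_MASK;
        r.tag   = vpn >> SUPERSET_BITS;
        break;
    case L_SHARED_TLB: {                  /* case 2: default D-TLB path    */
        uint32_t ppn = dtlb_vpn_to_ppn(vpn);
        r.color = ppn & SUPERSET_MASK;
        r.tag   = ppn >> SUPERSET_BITS;
        r.tag_is_physical = 1;
        break;
    }
    case L_SHARED_OFFSET:                 /* case 1: the two offset adders */
        r.color = (vpn + e->superset_offset) & SUPERSET_MASK;
        r.tag   = (vpn + e->ppn_offset) >> SUPERSET_BITS;
        r.tag_is_physical = 1;
        break;
    }
    return r;
}

int main(void) {
    struct offset_entry e = { 3, 0x6F };  /* values from the prior sketch */
    struct lookup r = translate(0x00013000, L_SHARED_OFFSET, &e);
    printf("color %u, tag 0x%x, physical %d\n", r.color, r.tag, r.tag_is_physical);
    return 0;
}
```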
Experimental Results: Energy Reduction for the Selective Tag Translation Cache - 1/2
Assume the D-TLB is used for physical tag translation
- dm: direct-mapped cache; 2way: 2-way set-associative cache
- Each pair of numbers: the 1st is for the address translation only, the 2nd for the entire cache
Energy reduction for a direct-mapped cache:
- Address translation only: 77.8% ~ 99.3%
- Entire cache, including address translation: 22.1% ~ 29.4%
Experimental Results: Energy Reduction for the Selective Tag Translation Cache - 2/2
Assume the offset-adjustment translation for physical superset bits and PPNs is applied
- dm: direct-mapped cache; 2way: 2-way set-associative cache
- Each pair of numbers: the 1st is for the address translation only, the 2nd for the entire cache
Energy reduction for a direct-mapped cache:
- Address translation only: 82.1% ~ 99.9%
- Entire cache, including address translation: 23.6% ~ 29.6%
Conclusions
This paper proposed a selectively tagged cache architecture for low-power processors with virtual memory support
- References to private data: virtual tags are used, and the power-consuming address translation is eliminated
- References to shared data: physical tags are used to avoid synonym problems
- Furthermore, because shared buffers are allocated consecutively, the address translation can be performed by an adder instead of a TLB lookup, improving the energy reduction further
- Results show that the proposed scheme reduces energy for the entire cache by 25%~30%
Comments on This Paper
- The instruction set extension may not be easy: the proposed scheme adds a label field to memory instructions, and it is unclear whether the unused bits in the instruction encoding are sufficient for it
- The relationship between the related works and the proposed work is not well connected; the step taken beyond the related works is not made concrete
- There is no comparison with related works in the experimental results
- The area and performance overheads are not listed
- The additional power consumption caused by the hardware modifications is not listed, including the offset table, the superset offset adder, and the page offset adder
Related Works
Techniques for minimizing the power/performance overhead of the TLB:
- Reduce the amount of TLB activity by adding a page sharing table to the TLB [2]
- Replace the TLB with a more scalable Synonym Lookaside Buffer [4]
- A TLB that supports up to two pages per entry [7]
- Redirect TLB accesses to a register that holds recent TLB entries [8]
This paper: a selective tag translation cache architecture