Published by Audrey Floyd. Modified over 9 years ago.
1
ISLPED’99 International Symposium on Low Power Electronics and Design
Way-Predicting Set-Associative Cache for High Performance and Low Energy Consumption
Koji Inoue, Tohru Ishihara, and Kazuaki Murakami
Department of Computer Science and Communication Engineering, Kyushu University
2
Conventional 4-Way Set-Associative Cache
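[Figure: the cache is organized as a tag subarray and a cache-line subarray, each divided into Way 0 to Way 3, plus a decode circuit.]
Step 1. Address decode (decode circuit).
Step 2. Read out a tag and a line from every way: activate the word lines, pre(dis)charge the bit lines, and activate the sense amps.
Step 3. Tag comparison.
Step 4 (hit). Provide the requested data by driving the I/O pins.
Step 4 (miss). Cache replacement.
Total energy per access: $E_{cache} = E_{decode} + E_{memory} + E_{io}$, where $E_{decode}$ is the energy for address decoding, $E_{memory}$ the energy for the SRAM (tag and line) access, and $E_{io}$ the energy for driving the I/O pins.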
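As a rough illustration of Steps 1 to 4, the Python sketch below models one access to a conventional 4-way cache and counts how many tag and line subarrays it touches. The geometry (128 sets of 32-byte lines), the class and field names, and the simplistic victim choice are assumptions made for illustration; they are not the circuit shown on the slide.

```python
from dataclasses import dataclass, field

# Hypothetical geometry: 4 ways x 128 sets x 32-byte lines = 16 KB.
NUM_WAYS = 4
LINE_SIZE = 32
NUM_SETS = 128


@dataclass
class AccessStats:
    hit: bool
    tags_read: int   # tag subarrays accessed (contributes to N_tag)
    lines_read: int  # line subarrays accessed (contributes to N_data)
    cycles: int


@dataclass
class ConventionalCache:
    # tags[set][way] holds the stored tag, or None for an invalid way.
    tags: list = field(
        default_factory=lambda: [[None] * NUM_WAYS for _ in range(NUM_SETS)]
    )

    def split(self, addr: int) -> tuple:
        """Step 1: decode the address into (set index, tag)."""
        line = addr // LINE_SIZE
        return line % NUM_SETS, line // NUM_SETS

    def access(self, addr: int) -> AccessStats:
        index, tag = self.split(addr)
        ways = self.tags[index]
        # Step 2: the tag AND the line of every way are read in parallel.
        tags_read, lines_read = NUM_WAYS, NUM_WAYS
        # Step 3: tag comparison.
        hit = tag in ways
        if not hit:
            # Step 4 (miss): fill an invalid way, otherwise evict way 0.
            # (The evaluated caches use LRU; way 0 only keeps this short.)
            victim = ways.index(None) if None in ways else 0
            ways[victim] = tag
        return AccessStats(hit, tags_read, lines_read, cycles=1)
```

Every access reads all four tags and all four lines, hit or miss, which is exactly the line-read energy the phased and way-predicting designs try to avoid.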
3
Phased 4-Way Set-Associative Cache for Low Energy Consumption
Reduces energy consumption at the cost of performance.
Cycle 1:
Step 1. Address decode.
Step 2. Read out only the tags.
Step 3. Tag comparison.
Step 4 (miss). Cache replacement.
Cycle 2 (hit only):
Step 4. Read out only the desired line.
Step 5. Provide the requested data.
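The two-cycle flow of the phased cache can be sketched as a single function that reports which subarrays are touched. The function and field names are hypothetical, and, like the slide's cycle count, the sketch ignores the miss penalty itself.

```python
from dataclasses import dataclass

NUM_WAYS = 4  # 4-way set-associative, as on the slide


@dataclass
class PhasedAccess:
    hit: bool
    tags_read: int
    lines_read: int
    cycles: int


def phased_access(stored_tags: list, tag: int) -> PhasedAccess:
    """One access to a phased 4-way cache (sketch).

    stored_tags: tags held by the ways of the indexed set (None = invalid).
    """
    # Cycle 1, Steps 1-3: decode, read out ONLY the tags, compare them.
    tags_read = NUM_WAYS
    if tag not in stored_tags:
        # Step 4 (miss): go to cache replacement; no line was ever read.
        return PhasedAccess(hit=False, tags_read=tags_read, lines_read=0, cycles=1)
    # Cycle 2, Steps 4-5: read out only the matching line and return it.
    return PhasedAccess(hit=True, tags_read=tags_read, lines_read=1, cycles=2)
```

A hit costs four tag reads but only one line read, at the price of a second cycle.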
4
Way-Predicting Set-Associative Cache - Concept -
How can we achieve high performance and low energy consumption at the same time?
Fast access, by reading out the tag and the line simultaneously: Conventional: good. Phased: bad.
Low energy, by avoiding unnecessary line reads: Conventional: bad. Phased: good.
Approach: predict which way holds the data requested by the processor before the cache access starts.
5
Way-Predicting 4-Way Set-Associative Cache - Operation -
Way prediction: cache-line-based MRU algorithm.
Cycle 1:
Step 0. Way prediction.
Step 1. Address decode.
Step 2. Read out the tag and the line of the predicted way only.
Step 3. Tag comparison.
Step 4 (prediction hit). End.
Cycle 2 (prediction miss):
Step 4. Read out the remaining tags and lines.
Step 5. Tag comparison.
Step 6 (prediction miss, data found in another way). End.
Step 6 (cache miss). Cache replacement.
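The operation above can be sketched as follows, given a predicted way from Step 0. The names are hypothetical, and cache replacement on a miss is left out; the sketch only tracks the subarray reads and cycles that the static analysis later counts.

```python
from dataclasses import dataclass

NUM_WAYS = 4


@dataclass
class WPAccess:
    hit: bool
    prediction_hit: bool
    tags_read: int
    lines_read: int
    cycles: int


def wp_access(stored_tags: list, tag: int, predicted_way: int) -> WPAccess:
    """One access to a way-predicting 4-way cache (sketch).

    stored_tags: tags held by the ways of the indexed set (None = invalid).
    predicted_way: the way chosen by the predictor in Step 0.
    """
    # Cycle 1, Steps 1-3: read and compare only the predicted way.
    if stored_tags[predicted_way] == tag:
        # Step 4 (prediction hit): one tag and one line read, one cycle.
        return WPAccess(True, True, tags_read=1, lines_read=1, cycles=1)
    # Cycle 2, Steps 4-5: read and compare the remaining three ways.
    others = [t for w, t in enumerate(stored_tags) if w != predicted_way]
    hit = tag in others
    # Step 6: end on a hit in another way; otherwise cache replacement
    # (not modeled here). All four ways have now been read.
    return WPAccess(hit, False, tags_read=NUM_WAYS, lines_read=NUM_WAYS, cycles=2)
```

A cache miss is also a prediction miss, so it pays the full two-cycle, four-way cost; this matches the $(1 - PHR)$ terms in the static analysis.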
6
Way-Predicting 4-Way Set-Associative Cache - Organization -
MRU Algorithm
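The slide names an MRU algorithm but the figure is lost. A minimal sketch, assuming one MRU entry per set that records the way accessed most recently and serves as the next prediction for that set, could look like this; the class, the initial prediction of way 0, and the trace format are all hypothetical.

```python
class MRUPredictor:
    """Per-set MRU way predictor (sketch; one entry per set is assumed)."""

    def __init__(self, num_sets: int, num_ways: int = 4):
        self.num_ways = num_ways
        self.mru = [0] * num_sets  # predicted way per set, initially way 0

    def predict(self, set_index: int) -> int:
        # Step 0: a table lookup by set index, so it can complete
        # before the cache access itself starts.
        return self.mru[set_index]

    def update(self, set_index: int, accessed_way: int) -> None:
        # After the access, the way that hit (or was refilled) becomes MRU.
        if not 0 <= accessed_way < self.num_ways:
            raise ValueError("way out of range")
        self.mru[set_index] = accessed_way


def prediction_hit_rate(trace, num_sets: int) -> float:
    """PHR over a trace of (set_index, way_actually_accessed) pairs."""
    predictor = MRUPredictor(num_sets)
    hits = 0
    for set_index, way in trace:
        hits += predictor.predict(set_index) == way
        predictor.update(set_index, way)
    return hits / len(trace)
```

The prediction works well when consecutive accesses to a set keep hitting the same way, which is common for sequential instruction fetch.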
7
Evaluation Environment
Cache models:
Conventional 4-way set-associative cache (4SACache)
Phased 4-way set-associative cache (P4SACache)
Way-predicting 4-way set-associative cache (WP4SACache)
Cache size: 16 KB; cache-line size: 32 B; replacement algorithm: LRU.
Evaluation items:
Performance ($T_{cache}$): average number of clock cycles per access.
Energy ($E_{cache}$): average energy consumption per access, approximated by the memory-array energy:
$E_{cache} \approx E_{memory} = N_{tag} \times E_{tag} + N_{data} \times E_{data}$
where $N_{tag}$ and $N_{data}$ are the average numbers of tag subarrays and line subarrays accessed per access, and $E_{tag}$ and $E_{data}$ are the energies consumed for accessing one tag subarray and one line subarray.
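Both evaluation items are plain averages over an access trace. The sketch below computes them from per-access counts such as those produced by a cache simulator; the tuple layout and the function name are assumptions.

```python
def cache_metrics(accesses, e_tag: float, e_data: float):
    """Return (T_cache, E_cache) averaged over a list of accesses.

    accesses: one (tags_read, lines_read, cycles) tuple per access.
    e_tag, e_data: energy per tag-subarray / line-subarray access.
    """
    n = len(accesses)
    if n == 0:
        raise ValueError("need at least one access")
    n_tag = sum(a[0] for a in accesses) / n    # N_tag
    n_data = sum(a[1] for a in accesses) / n   # N_data
    t_cache = sum(a[2] for a in accesses) / n  # average cycles per access
    e_cache = n_tag * e_tag + n_data * e_data  # E_cache ~ E_memory
    return t_cache, e_cache
```

For example, one prediction hit (1, 1, 1) and one prediction miss (4, 4, 2) give $T_{cache} = 1.5$ and $N_{tag} = N_{data} = 2.5$.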
8
Static Analysis - Energy and Performance Expression -
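Cache | Energy per access | Cycles per access
4SACache | $E_{4SACache} = 4E_{tag} + 4E_{data}$ | $T_{4SACache} = 1$
P4SACache | $E_{P4SACache} = 4E_{tag} + E_{data} \times CHR$ | $T_{P4SACache} = 1 + 1 \times CHR$
WP4SACache | $E_{WP4SACache} = (E_{tag} + E_{data}) + (3E_{tag} + 3E_{data}) \times (1 - PHR)$ | $T_{WP4SACache} = 1 + 1 \times (1 - PHR)$
CHR: cache hit rate; PHR: prediction hit rate.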
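The three expressions translate directly into functions. This is only a transcription of the static model for experimenting with hit rates; the function and parameter names are hypothetical.

```python
def conventional(e_tag: float, e_data: float):
    """(E, T) of 4SACache: all 4 tags and 4 lines every access, 1 cycle."""
    return 4 * e_tag + 4 * e_data, 1.0


def phased(e_tag: float, e_data: float, cache_hit_rate: float):
    """(E, T) of P4SACache: 4 tags always, 1 line and 1 extra cycle on a hit."""
    return 4 * e_tag + e_data * cache_hit_rate, 1.0 + 1.0 * cache_hit_rate


def way_predicting(e_tag: float, e_data: float, pred_hit_rate: float):
    """(E, T) of WP4SACache: 1 way first, the other 3 on a misprediction."""
    miss = 1.0 - pred_hit_rate
    energy = (e_tag + e_data) + (3 * e_tag + 3 * e_data) * miss
    return energy, 1.0 + 1.0 * miss
```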
9
Static Analysis - Best and Worst Case -
[Figure: energy consumption (assuming $E_{tag} = 0.078 E_{data}$) and performance of 4SACache (conventional), P4SACache (phased), and WP4SACache (ours).]
Compared with the conventional cache (4SACache):
Best case (PHR = 100%): 75% lower energy with no performance degradation.
Worst case (PHR = 0%): 100% performance overhead with no energy reduction.
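The best and worst cases follow from normalizing the WP4SACache expressions to the conventional cache. This sketch uses the slide's ratio $E_{tag} = 0.078 E_{data}$ with $E_{data}$ set to 1 as an arbitrary unit; the function name is hypothetical.

```python
E_DATA = 1.0             # arbitrary energy unit
E_TAG = 0.078 * E_DATA   # ratio used on the slide


def wp_vs_conventional(phr: float):
    """(E_WP / E_conv, T_WP / T_conv) for a prediction hit rate in [0, 1]."""
    e_conv, t_conv = 4 * E_TAG + 4 * E_DATA, 1.0
    miss = 1.0 - phr
    e_wp = (E_TAG + E_DATA) + (3 * E_TAG + 3 * E_DATA) * miss
    t_wp = 1.0 + miss
    return e_wp / e_conv, t_wp / t_conv


for phr in (1.0, 0.0):
    e_ratio, t_ratio = wp_vs_conventional(phr)
    print(f"PHR={phr:.0%}: energy {e_ratio:.0%}, time {t_ratio:.0%}")
# PHR=100%: energy 25%, time 100%
# PHR=0%: energy 100%, time 200%
```

At these two extremes the ratios do not depend on $E_{tag}/E_{data}$: the energy ratio is exactly $1/4$ at PHR = 100% and $1$ at PHR = 0%. The 0.078 ratio matters only for intermediate prediction hit rates and for the phased cache.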
10
Experimental Analysis - Prediction Hit Rate -
11
Experimental Analysis - Result of Instruction Cache -
[Figure: normalized $T_{cache}$ and normalized $E_{cache}$ of P4SACache and WP4SACache (our approach) for the instruction cache, with 4SACache = 1.0.]
12
Experimental Analysis - Result of Data Cache -
[Figure: normalized $T_{cache}$ and normalized $E_{cache}$ of P4SACache and WP4SACache (our approach) for the data cache, with 4SACache = 1.0.]
13
Experimental Analysis - Energy and Performance -
[Figure: average over all benchmarks; $E_{cache}$ and $T_{cache}$ of the conventional (4SACache), phased (P4SACache), and way-predicting (WP4SACache) caches for the I-cache and the D-cache, normalized to the conventional cache (100%). Bar labels: 199.4%, 195.8%, 113.0%, 104.1%, 30.3%, 29.4%, 28.1%, 35.2%.]
14
Cache Power Consumption
[Figure: cache size trend.]
Share of total chip power consumed by on-chip caches: DEC CPU*: 25%; StrongARM SA-110 CPU*: 43%; Bipolar ECL CPU**: 50%.
* Kamble, et al., “Analytical Energy Dissipation Models for Low Power Caches”, ISLPED’97.
** Jouppi, et al., “A 300-MHz 115-W 32-b Bipolar ECL Microprocessor”, IEEE Journal of Solid-State Circuits, 1993.
15
Energy Consumption Model
[Figure: components of the power dissipation (bit lines, word lines, sense amps, output drivers, address input, comparators, latches) for a 32 KB direct-mapped I-cache and a 32 KB 4-way D-cache; $E_{memory}$ accounts for 95.6% and 97.7% of the total, respectively.]
Source: Ghose, et al., “Energy Efficient Cache Organizations for Superscalar Processors”, Power-Driven Microarchitecture Workshop, in conjunction with ISCA’98.
Average energy consumption per access: $E_{cache} \approx E_{memory} = N_{tag} \times E_{tag} + N_{data} \times E_{data}$, where $N_{tag}$ and $N_{data}$ are the average numbers of tag subarrays and line subarrays accessed per access, and $E_{tag}$ and $E_{data}$ are the energies consumed for accessing one tag subarray and one line subarray.
16
Experimental Analysis - Environment -
Benchmarks:
SPECint95: 099.go, 124.m88ksim, 126.gcc, 129.compress, 130.li, 132.ijpeg, 134.perl, 147.vortex
SPECfp95: 101.tomcatv, 102.swim, 103.su2cor, 104.hydro2d