Kyushu University Las Vegas, June 2007 The Effect of Nanometer-Scale Technologies on the Cache Size Selection for Low Energy Embedded Systems.

Presentation transcript:

Kyushu University Las Vegas, June 2007 The Effect of Nanometer-Scale Technologies on the Cache Size Selection for Low Energy Embedded Systems Hamid Noori, Maziar Goudarzi, Koji Inoue, and Kazuaki Murakami Kyushu University

Outline
- Motivations and Observations
- Energy Evaluation
- Problem Definition
- Experimental Results
- Conclusion

Outline
- Motivations and Observations
- Problem Formulation
- Energy Evaluation Model
- Experimental Results
- Conclusion

Motivations and Observations (1/2)
- Caches contribute a large portion of energy consumption in embedded systems
- Leakage power is increasing in new nanometer-scale technologies

Motivations and Observations (2/2)
- 4-way set-associative cache with 16-byte block size
- Dynamic: 180nm ~ 4x 100nm & 9x 70nm (CACTI 4.1)
- Static: 70nm ~ 400x 180nm & 5x 100nm (CACTI 4.1)

Goal
- The effect of different nanometer-scale technologies on cache configuration selection in low-energy embedded systems

Outline: Energy Evaluation

Energy Evaluation (1/3)
- Static
- Dynamic
energy_memory(Config, Tech) = energy_dynamic(Config, Tech) + energy_static(Config, Tech)

Energy Evaluation (2/3)
energy_dynamic(Config, Tech) = cache_accesses(Config) * energy_cache_access(Config, Tech) + cache_misses(Config) * energy_miss(Config, Tech)
energy_miss(Config, Tech) = energy_off_chip_access + energy_cache_block_refill(Config, Tech)
energy_static(Config, Tech) = executed_clock_cycles(Config) * clock_period * leakage_power(Config, Tech)

Energy Evaluation (3/3)
- SimpleScalar
  - cache_accesses
  - cache_misses
  - executed_clock_cycles
- CACTI 4.1
  - energy_cache_access
  - energy_cache_block_refill
  - leakage_power
- energy_off_chip_access = 20 nJ
- Clock freq = 200MHz
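The energy model on these slides can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `CacheProfile` container and the sample numbers in the usage note are hypothetical, while the formulas, the 20 nJ off-chip access energy and the 200 MHz clock come from the slides.

```python
from dataclasses import dataclass

# Constants stated on the slide.
ENERGY_OFF_CHIP_ACCESS_J = 20e-9   # 20 nJ per off-chip access
CLOCK_PERIOD_S = 1.0 / 200e6       # 200 MHz clock


@dataclass
class CacheProfile:
    """Inputs for one (Config, Tech) pair. The class itself is hypothetical;
    the field names follow the quantities named on the slides."""
    cache_accesses: int               # from SimpleScalar
    cache_misses: int                 # from SimpleScalar
    executed_clock_cycles: int        # from SimpleScalar
    energy_cache_access: float        # joules per access, from CACTI
    energy_cache_block_refill: float  # joules per refill, from CACTI
    leakage_power: float              # watts, from CACTI


def energy_miss(p: CacheProfile) -> float:
    # One miss costs an off-chip access plus a cache block refill.
    return ENERGY_OFF_CHIP_ACCESS_J + p.energy_cache_block_refill


def energy_dynamic(p: CacheProfile) -> float:
    return (p.cache_accesses * p.energy_cache_access
            + p.cache_misses * energy_miss(p))


def energy_static(p: CacheProfile) -> float:
    # Leakage power integrated over the whole run time.
    return p.executed_clock_cycles * CLOCK_PERIOD_S * p.leakage_power


def energy_memory(p: CacheProfile) -> float:
    return energy_dynamic(p) + energy_static(p)
```

With made-up inputs such as `CacheProfile(1_000_000, 10_000, 2_000_000, 0.5e-9, 1e-9, 0.01)`, the dynamic part is dominated by the access count, while the static part grows with run time and leakage power, which is the trade-off the rest of the talk explores.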

Outline: Problem Definition

Problem Definition
“For a given application, processor architecture, technology, and instruction- and data-cache organization (i.e. the cache associativity and line-size), find the cache size that results in minimum energy consumption (i.e. minimizes Equation 1 for a given technology) over the entire application run.”
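Stated as code, the problem is an exhaustive search over candidate cache sizes for each technology. The sketch below assumes the total energy for every (technology, size) pair has already been computed. The function names and all energy values are hypothetical and serve only to show the selection step.

```python
def min_energy_cache_size(energy_by_size):
    """Return the cache size (dict key) whose total energy (dict value) is lowest."""
    if not energy_by_size:
        raise ValueError("no candidate cache sizes given")
    return min(energy_by_size, key=energy_by_size.get)


def select_cache_sizes(energy_table):
    """Apply the selection independently for each technology."""
    return {tech: min_energy_cache_size(by_size)
            for tech, by_size in energy_table.items()}


# Hypothetical total energies in joules, keyed by cache size in KB.
example_table = {
    "180nm": {16: 0.90, 32: 0.70, 64: 0.45},
    "70nm":  {16: 0.30, 32: 0.42, 64: 0.66},
}
best = select_cache_sizes(example_table)
```

Because the candidate set is small (sizes from 1KB to 64KB in powers of two), a plain minimum over the table is sufficient; no heuristic search is needed.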

Outline: Experimental Results

Experimental Results
- Applications from MiBench
- SimpleScalar
- CACTI 4.1
  - Three technologies: 180nm, 100nm, and 70nm

Instruction Cache

Energy Evaluation for three different technologies - qsort

Energy Saving
- The minimum-energy cache size falls at two different points: 64K for 180nm, and 16K for 100nm and 70nm.
- Total energy is reduced by 38% and 55% in the 100nm and 70nm processes, respectively, when a 16KB instruction cache is selected instead of 64KB.
- In this application (qsort), this saving comes at a performance penalty of 37%.
- We also note that energy is reduced by 50% in the 180nm process when employing a 64KB cache instead of 16KB; i.e., a bigger cache used to result in less energy. But as shown above, this trend is reversed in nanometer technologies.

Other Applications
Columns: Cache Size (180nm, 100nm, 70nm) | 100nm (Energy saving, Performance penalty) | 70nm (Energy saving, Performance penalty)
basicmath 32K 0.0
bitcounts 2K 0.0
Cjpeg 16K 4K
Djpeg 16K 4K
Lame 32K 8K
dijkstra 16K 1K
patricia 32K 0.0
blowfish 32K 8K
rijndael 32K 16K
average

Data Cache

Energy Evaluation for three different technologies - qsort

Energy Saving
- According to the results, 32K, 2K and 1K are the minimum-energy data cache sizes for 180nm, 100nm and 70nm, respectively.
- The minimum-energy caches for the 100nm (2KB) and 70nm (1KB) technologies consume 88% and 56% less energy, respectively, than the minimum-energy cache of the 180nm process (i.e. 32KB).
- The corresponding performance penalties are only 9% and 14%, respectively.
- In 180nm technology, the optimal cache size (32KB) consumes 28% and 40% less energy than the 2KB and 1KB caches, but this relation is reversed, with increasing significance, in 100nm and 70nm technologies.

Other Applications
Columns: Cache Size (180nm, 100nm, 70nm) | 100nm (Energy saving, Performance penalty) | 70nm (Energy saving, Performance penalty)
basicmath 4K 2K
susan 8K 2K
cjpeg 32K 8K
djpeg 32K 8K
lame 32K 16K 8K
dijkstra 32K 8K
patricia 32K 8K
blowfish 32K 8K 4K
rijndael 32K 16K 8K
sha 32K 1K
average

The effect of miss rate on optimal cache size for different technologies

Energy Evaluation

Results
- For the direct-mapped cache, the minimum-energy cache size is 32K for all three technologies.
- For the 2-way cache, 32K, 16K and 16K are the minimum-energy candidates for 180nm, 100nm and 70nm.
- When the slope of the miss rate is very sharp, dynamic energy dominates static energy, and therefore every technology arrives at the same cache size.
- However, when a 2-way set-associative cache is used, the miss-rate curve flattens and static energy becomes more important again. That is why 100nm and 70nm have a different optimal point from 180nm in the 2-way cache.
- Thus, as the miss-ratio variations become softer, the optimal cache sizes for different technologies move farther apart.
- For the instruction cache, where executed clock cycles grow from 800 M to roughly 21 times that, the optimal cache sizes are 64K, 16K and 16K, whereas for the data cache with its softer variation, from 800 M to 1000 M (only 1.2 times more), the minimum-energy cache sizes are 32K, 2K and 1K.
- In the case of the 2-way cache, the optimal cache size for the 100nm and 70nm processes (16KB in both) consumes 9% and 29% less energy, respectively, than the 180nm optimal cache (32KB), with a 25% performance loss.

Conclusions
- The results show that when a low-energy embedded system is re-implemented in a new technology, the cache may need to be re-selected.
- Our study showed that the sharper the slope of the miss rate across cache sizes, the less the optimal cache size varies between technologies.
- The experiments showed that in all cases the optimal cache size decreases in finer technologies despite the increase in misses and dynamic energy. This is due to the high impact of static energy in future technologies and confirms that, unlike in micrometer-scale technologies, simply adding more cache will not reduce total system energy; cache size must be reduced to minimize total system energy in future nanometer technologies.
- In the data cache, this effect is magnified because it sees fewer cache accesses (less dynamic energy) than the instruction cache.
- Since smaller caches are more suitable for low-energy systems in finer technologies, finding a cache configuration that simultaneously optimizes performance and energy will become increasingly difficult.

Thank you for your attention

Energy Saving & Performance Penalty
Energy Saving = (energy_cache180_NTech – energy_cacheNTech) / energy_cache180_NTech
Performance Penalty = (exec_time_cacheNTech – exec_time_cache180) / exec_time_cache180
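The two metrics are simple relative differences, sketched below. The reading of the symbols is an assumption: `energy_cache180_ntech` is taken to be the energy of the 180nm-optimal cache size when built in technology NTech, and `energy_cache_ntech` the energy of the cache size that is optimal for NTech itself. The function names and the sample values in the usage note are hypothetical.

```python
def energy_saving(energy_cache180_ntech, energy_cache_ntech):
    """Fraction of energy saved by the NTech-optimal cache relative to
    keeping the 180nm-optimal cache size in the same technology."""
    return (energy_cache180_ntech - energy_cache_ntech) / energy_cache180_ntech


def performance_penalty(exec_time_cache_ntech, exec_time_cache180):
    """Fractional slowdown of the NTech-optimal cache relative to the
    180nm-optimal cache size."""
    return (exec_time_cache_ntech - exec_time_cache180) / exec_time_cache180
```

For instance, `energy_saving(1.0, 0.62)` gives about 0.38 and `performance_penalty(1.37, 1.0)` gives about 0.37, i.e. a 38% saving at a 37% slowdown in normalized units.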

Instruction Cache – Energy Saving
100nm: 8%, 27%, and 41% for 20°C, 60°C, 100°C (max: 65%)
70nm: 1%, 6%, and 16% for 20°C, 60°C, 100°C (max: 45%)

Instruction Cache – Performance Penalty
100nm: 1%, 1.2%, and 2.2% for 20°C, 60°C, 100°C
70nm: 0.6%, 2.3%, and 16% for 20°C, 60°C, 100°C

Data Cache – Energy Saving
100nm: 3.3%, 25%, and 47% for 20°C, 60°C, 100°C (max: 75%)
70nm: 7%, 22%, and 33% for 20°C, 60°C, 100°C (max: 65%)

Data Cache – Performance Penalty
100nm: 0.8%, 5.3%, and 8% for 20°C, 60°C, 100°C
70nm: 3.6%, 10%, and 20% for 20°C, 60°C, 100°C

Architecture and Reconfiguration Flow for a Temperature-Aware Configurable Cache
- Configurable Cache +
  - Hardware
    - Thermal sensor
    - Accessible read port
  - Software
    - A table in the Operating System (OS) for recording temperature ranges and their suitable cache configuration
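A minimal sketch of the software side might look as follows: the OS keeps a table of temperature ranges and the cache size suited to each, reads the thermal sensor, and reconfigures the cache only when the target size changes. Everything here is an assumption made for illustration, including the table contents, the function names, and the `read_sensor` and `apply_config` callbacks standing in for the hardware read port and the reconfiguration mechanism.

```python
import bisect

# Hypothetical OS table: upper bound of each temperature range in degrees C,
# paired with the cache size in KB considered suitable for that range.
TEMP_UPPER_BOUNDS_C = [80, 100]
CACHE_SIZE_KB = [64, 32]


def cache_size_for_temperature(temp_c):
    """Look up the cache size for a temperature reading.
    Readings above the last bound fall into the last range."""
    idx = bisect.bisect_left(TEMP_UPPER_BOUNDS_C, temp_c)
    idx = min(idx, len(CACHE_SIZE_KB) - 1)
    return CACHE_SIZE_KB[idx]


def reconfigure_if_needed(current_kb, read_sensor, apply_config):
    """Read the sensor, reconfigure only on a change, return the active size."""
    target_kb = cache_size_for_temperature(read_sensor())
    if target_kb != current_kb:
        apply_config(target_kb)
    return target_kb
```

Keeping the policy in a table rather than in hardware means the ranges can be regenerated per application or per technology without touching the cache design.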

Flow of configuring Temperature-Aware Configurable Cache

Temperature measurement accuracy (1/2)
T_j = T_a + θ_JA · P
- T_j: Junction Temperature
- T_a: Ambient Temperature
- P: Power
- θ_JA: Junction-to-Ambient Thermal Resistance

Temperature measurement accuracy (2/2)

Technology  Metric             ARM7TDMI   ARM966E-S
180nm       Power consumption  mW         140 mW
180nm       Frequency          115 MHz    200 MHz
130nm       Power consumption  7.98 mW    62.5 mW
130nm       Frequency          133 MHz    250 MHz
90nm        Power consumption  7.08 mW    51.7 mW
90nm        Frequency          236 MHz    470 MHz

θ_JA: 7 °C/W ~ 35 °C/W
ΔT = (T_j - T_a) ~ 5 °C
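The ΔT figure can be checked directly from T_j = T_a + θ_JA · P. The sketch below plugs in the largest listed power (140 mW for the ARM966E-S at 180nm) and both ends of the θ_JA range. The function names are hypothetical, and picking the largest power as the worst case is an assumption about how the roughly 5 °C bound was obtained.

```python
def temperature_rise(theta_ja_c_per_w, power_w):
    """Junction-to-ambient temperature difference: delta_T = theta_JA * P."""
    return theta_ja_c_per_w * power_w


def junction_temperature(ambient_c, theta_ja_c_per_w, power_w):
    """T_j = T_a + theta_JA * P."""
    return ambient_c + temperature_rise(theta_ja_c_per_w, power_w)


# Largest power on the slide: ARM966E-S at 180nm, 140 mW.
worst_case_power_w = 0.140
rise_low = temperature_rise(7.0, worst_case_power_w)    # about 0.98 degrees C
rise_high = temperature_rise(35.0, worst_case_power_w)  # about 4.9 degrees C
```

Even with the highest thermal resistance, the junction sits only about 5 °C above ambient, so for cores in this power range the sensed temperature tracks the ambient temperature closely.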

Conclusions
- Our results show that up to 66% and 45% of instruction-cache energy can be saved for 100nm and 70nm, respectively, when the temperature changes from 0°C to 100°C.
- Because leakage increases in finer technologies and at higher temperatures, smaller caches will be more energy efficient for future low-energy systems.
- Since smaller caches are more suitable for low-energy systems in finer technologies and at higher temperatures, finding a cache configuration that simultaneously optimizes performance and energy will become increasingly difficult, especially at high temperatures.
- Since the data cache is accessed less often than the instruction cache, it is more easily affected by temperature and technology. By using a configurable data cache, up to 74% and 64% of energy can be saved for 100nm and 70nm, respectively.

Thank you for your attention
Questions?

Motivations and Observations (3/4)
- BSIM3 equation for subthreshold leakage

Experimental Results (1/)
- Applications from MiBench
- SimpleScalar
- CACTI 4.1
  - Three technologies: 180nm, 100nm, and 70nm
  - Six temperatures: 0°C, 20°C, 40°C, 60°C, 80°C, 100°C
- Configurable Cache
  - Size: 64KB~1KB

Qsort-Instruction Cache

Qsort-Instruction Cache
{0°C ~ 80°C} → 64KB, {80°C ~ 100°C} → 32KB
17% energy saving and 19.6% performance penalty

Qsort-Data Cache
2-way set-associative, 16 bytes line size, 100nm.

Qsort-Data Cache
Fig. 12. Static energy for different data cache sizes (100nm).