ISLPED 2003 Power Efficient Comparators for Long Arguments in Superscalar Processors *supported in part by DARPA through the PAC-C program and NSF Dmitry.

Slides:



Advertisements
Similar presentations
09/16/2002 ICCD 2002 A Circuit-Level Implementation of Fast, Energy-Efficient CMOS Comparators for High-Performance Microprocessors* *supported in part.
Advertisements

1 Reducing Datapath Energy Through the Isolation of Short-Lived Operands Dmitry Ponomarev, Gurhan Kucuk, Oguz Ergin, Kanad Ghose Department of Computer.
Federation: Repurposing Scalar Cores for Out- of-Order Instruction Issue David Tarjan*, Michael Boyer, and Kevin Skadron* University of Virginia Department.
University of Michigan Electrical Engineering and Computer Science University of Michigan Electrical Engineering and Computer Science University of Michigan.
Synonymous Address Compaction for Energy Reduction in Data TLB Chinnakrishnan Ballapuram Hsien-Hsin S. Lee Milos Prvulovic School of Electrical and Computer.
Using Virtual Load/Store Queues (VLSQs) to Reduce The Negative Effects of Reordered Memory Instructions Aamer Jaleel and Bruce Jacob Electrical and Computer.
Energy-efficient Instruction Dispatch Buffer Design for Superscalar Processors* Gurhan Kucuk, Kanad Ghose, Dmitry V. Ponomarev Department of Computer Science.
Leakage-Biased Domino Circuits for Dynamic Fine- Grain Leakage Reduction Seongmoo Heo and Krste Asanović Massachusetts Institute of Technology Lab for.
A Scalable Front-End Architecture for Fast Instruction Delivery Paper by: Glenn Reinman, Todd Austin and Brad Calder Presenter: Alexander Choong.
June 20 th 2004University of Utah1 Microarchitectural Techniques to Reduce Interconnect Power in Clustered Processors Karthik Ramani Naveen Muralimanohar.
UPC Reducing Misspeculation Penalty in Trace-Level Speculative Multithreaded Architectures Carlos Molina ψ, ф Jordi Tubella ф Antonio González λ,ф ISHPC-VI,
2/15/2006"Software-Hardware Cooperative Memory Disambiguation", Alok Garg, HPCA Software-Hardware Cooperative Memory Disambiguation Ruke Huang, Alok.
Decomposition of Instruction Decoder for Low Power Design TingTing Hwang Department of Computer Science Tsing Hua University.
Register Packing Exploiting Narrow-Width Operands for Reducing Register File Pressure Oguz Ergin*, Deniz Balkan, Kanad Ghose, Dmitry Ponomarev Department.
PATMOS 2003 Energy Efficient Register Renaming *supported in part by DARPA through the PAC-C program and NSF Gurhan Kucuk, Oguz Ergin, Dmitry Ponomarev,
Scheduling Reusable Instructions for Power Reduction J.S. Hu, N. Vijaykrishnan, S. Kim, M. Kandemir, and M.J. Irwin Proceedings of the Design, Automation.
Defining Wakeup Width for Efficient Dynamic Scheduling A. Aggarwal, O. Ergin – Binghamton University M. Franklin – University of Maryland Presented by:
Energy Efficient Instruction Cache for Wide-issue Processors Alex Veidenbaum Information and Computer Science University of California, Irvine.
Exploiting Load Latency Tolerance for Relaxing Cache Design Constraints Ramu Pyreddy, Gary Tyson Advanced Computer Architecture Laboratory University of.
ICCD’03 1 Distributed Reorder Buffer Schemes for Low Power * *supported in part by DARPA through the PAC-C program and NSF Gurhan Kucuk, Oguz Ergin, Dmitry.
Memory access scheduling Authers: Scott RixnerScott Rixner,William J. Dally,Ujval J. Kapasi, Peter Mattson, John D. OwensWilliam J. DallyUjval J. KapasiPeter.
ISLPED’03 1 Reducing Reorder Buffer Complexity Through Selective Operand Caching *supported in part by DARPA through the PAC-C program and NSF Gurhan Kucuk,
Enhancing Embedded Processors with Specific Instruction Set Extensions for Network Applications A. Chormoviti, N. Vassiliadis, G. Theodoridis, S. Nikolaidis.
Memory: Virtual MemoryCSCE430/830 Memory Hierarchy: Virtual Memory CSCE430/830 Computer Architecture Lecturer: Prof. Hong Jiang Courtesy of Yifeng Zhu.
Restrictive Compression Techniques to Increase Level 1 Cache Capacity Prateek Pujara Aneesh Aggarwal Dept of Electrical and Computer Engineering Binghamton.
U NIVERSITY OF M ASSACHUSETTS, A MHERST Department of Computer Science Emery Berger University of Massachusetts, Amherst Operating Systems CMPSCI 377 Lecture.
ICS’02 1 Low-Complexity Reorder Buffer Architecture* *supported in part by DARPA through the PAC-C program and NSF Gurhan Kucuk, Dmitry Ponomarev, Kanad.
Feb 14 th 2005University of Utah1 Microarchitectural Wire Management for Performance and Power in partitioned architectures Rajeev Balasubramonian Naveen.
Feb 14 th 2005University of Utah1 Microarchitectural Wire Management for Performance and Power in Partitioned Architectures Rajeev Balasubramonian Naveen.
Architectural and Compiler Techniques for Energy Reduction in High-Performance Microprocessors Nikolaos Bellas, Ibrahim N. Hajj, Fellow, IEEE, Constantine.
Improving the Performance of Object-Oriented Languages with Dynamic Predication of Indirect Jumps José A. Joao *‡ Onur Mutlu ‡* Hyesoon Kim § Rishi Agarwal.
1 Route Table Partitioning and Load Balancing for Parallel Searching with TCAMs Department of Computer Science and Information Engineering National Cheng.
Hybrid-Scheduling: A Compile-Time Approach for Energy–Efficient Superscalar Processors Madhavi Valluri and Lizy John Laboratory for Computer Architecture.
Drowsy Caches: Simple Techniques for Reducing Leakage Power Authors: ARM Ltd Krisztián Flautner, Advanced Computer Architecture Lab, The University of.
Exploiting Program Hotspots and Code Sequentiality for Instruction Cache Leakage Management J. S. Hu, A. Nadgir, N. Vijaykrishnan, M. J. Irwin, M. Kandemir.
ISLPED’99 International Symposium on Low Power Electronics and Design
2013/10/21 Yun-Chung Yang An Energy-Efficient Adaptive Hybrid Cache Jason Cong, Karthik Gururaj, Hui Huang, Chunyue Liu, Glenn Reinman, Yi Zou Computer.
1 Instruction Sets and Beyond Computers, Complexity, and Controversy Brian Blum, Darren Drewry Ben Hocking, Gus Scheidt.
Power Management in High Performance Processors through Dynamic Resource Adaptation and Multiple Sleep Mode Assignments Houman Homayoun National Science.
A Centralized Cache Miss Driven Technique to Improve Processor Power Dissipation Houman Homayoun, Avesta Makhzan, Jean-Luc Gaudiot, Alex Veidenbaum University.
Energy-Effective Issue Logic Hasan Hüseyin Yılmaz.
Complexity-Effective Superscalar Processors S. Palacharla, N. P. Jouppi, and J. E. Smith Presented by: Jason Zebchuk.
Implicitly-Multithreaded Processors Il Park and Babak Falsafi and T. N. Vijaykumar Presented by: Ashay Rane Published in: SIGARCH Computer Architecture.
Microprocessor Microarchitecture Limits of Instruction-Level Parallelism Lynn Choi Dept. Of Computer and Electronics Engineering.
1 A Cost-effective Substantial- impact-filter Based Method to Tolerate Voltage Emergencies Songjun Pan 1,2, Yu Hu 1, Xing Hu 1,2, and Xiaowei Li 1 1 Key.
Low-Power Cache Organization Through Selective Tag Translation for Embedded Processors with Virtual Memory Support Xiangrong Zhou and Peter Petrov Proceedings.
1 Energy-Efficient Register Access Jessica H. Tseng and Krste Asanović MIT Laboratory for Computer Science, Cambridge, MA 02139, USA SBCCI2000.
UltraSPARC III Hari P. Ananthanarayanan Anand S. Rajan.
11 Online Computing and Predicting Architectural Vulnerability Factor of Microprocessor Structures Songjun Pan Yu Hu Xiaowei Li {pansongjun, huyu,
PATMOS’02 Energy-Efficient Design of the Reorder Buffer* *supported in part by DARPA through the PAC-C program and NSF Dmitry Ponomarev, Gurhan Kucuk,
Hardware Architectures for Power and Energy Adaptation Phillip Stanley-Marbell.
Superscalar - summary Superscalar machines have multiple functional units (FUs) eg 2 x integer ALU, 1 x FPU, 1 x branch, 1 x load/store Requires complex.
Hybrid Multi-Core Architecture for Boosting Single-Threaded Performance Presented by: Peyman Nov 2007.
COMPSYS 304 Computer Architecture Speculation & Branching Morning visitors - Paradise Bay, Bay of Islands.
12/03/2001 MICRO’01 Reducing Power Requirements of Instruction Scheduling Through Dynamic Allocation of Multiple Datapath Resources* *supported in part.
Cache Pipelining with Partial Operand Knowledge Erika Gunadi and Mikko H. Lipasti Department of Electrical and Computer Engineering University of Wisconsin—Madison.
Dynamic Associative Caches:
Multiscalar Processors
Introduction CPU performance factors
Lynn Choi Dept. Of Computer and Electronics Engineering
Department of Electrical & Computer Engineering
COSC 2021: Computer Organization Instructor: Dr. Amir Asif
Energy-Efficient Address Translation
Presented by: Divya Muppaneni
Reducing Cache Traffic and Energy with Macro Data Load
Code Transformation for TLB Power Reduction
Lois Orosa, Rodolfo Azevedo and Onur Mutlu
Restrictive Compression Techniques to Increase Level 1 Cache Capacity
Presentation transcript:

ISLPED 2003 Power Efficient Comparators for Long Arguments in Superscalar Processors *supported in part by DARPA through the PAC-C program and NSF Dmitry Ponomarev, Gurhan Kucuk, Oguz Ergin, Kanad Ghose Department of Computer Science State University of New York Binghamton, NY International Symposium on Low Power Electronics and Design (ISLPED’03), August 27 th 2003

ISLPED 2003 Outline Motivations 8-bit comparator designs Traditional Comparator Dissipate on Match Comparator (ICCD’02) 32-bit Comparator Designs Results : Application to the Load-Store Queue Conclusions

ISLPED 2003 Motivation Equality comparators are very pervasive in today’s superscalar datapaths. Wake-up logic of the Issue Queues Dependency checking logic Load-Store queues Translation Lookaside Buffers (TLB) Caches Branch Target Buffers (BTB)

ISLPED 2003 Motivation (continued) Traditional comparators dissipate energy on mismatches in any bit position of the arguments In many cases, mismatches are much more frequent than matches Issue queue : Only 3% of all comparisons result in a match (Ergin et.al., ICCD’02) For energy efficiency, dissipate-on-match designs can be considered

ISLPED 2003 Traditional 8-bit Pull-Down Comparator precharge Evaluation

ISLPED 2003 Dissipate-on-Match Comparator (DMC, ICCD’02) Propagation Precharge Discharge Evaluation

ISLPED 2003 Use of the Long Comparators Load-Store queues – to allow loads to bypass earlier stores TLBs – for associative lookup Caches – for associative lookup BTBs – for associative lookup

ISLPED 2003 Traditional 8-bit Pull-Down Comparator

ISLPED 2003 Dissipate-on-Match Comparator (DMC)

ISLPED 2003 Comparison of larger operands : Some Alternatives ~270 ps >400 ps

ISLPED 2003 Choosing the Right Alternative Need to consider impact on : Delay Energy Have to look at distributions of bit values that are compared More than one alternative may be acceptable

ISLPED 2003 Experimental Setup (AccuPower, DATE’02) Compiled SPEC benchmarks Datapath specs Performance stats VLSI layout data SPICE deck SPICE Microarchitectural Simulator Energy/Power Estimator Power/energy stats SPICE measures of Energy per transition Transition counts, Context information

ISLPED 2003 Matching Statistics of 32-bit Addresses in the LSQ

ISLPED 2003 Dissipate-on-Match Comparator (DMC)

ISLPED 2003 Matching Statistics of 32-bit Addresses in the LSQ

ISLPED 2003 Energy savings in comparison of longer operands 19% Energy Increase 19% Energy Savings

ISLPED 2003 Main Results We discussed some energy-efficient 32-bit wide comparator designs 19% comparator-related energy reduction in LSQ is achieved by using a hybrid design (3 TRADs + 1 DMC) compared to the use of 4 TRAD comparators Results can be extended to TLBs and BTBs

ISLPED 2003 THANK YOU ! *supported in part by DARPA through the PAC-C program and NSF LOW POWER RESEARCH GROUP Department of Computer Science State University of New York Binghamton, NY International Symposium on Low Power Electronics and Design (ISLPED’03), August 27 th 2003

ISLPED 2003 Timing Diagrams

ISLPED bit Matching Statistics: LSQ

ISLPED 2003 Traditional 8-bit Pull-Down Comparator

ISLPED 2003 Pass Logic, Single-Stage Comparator (PLSSC)