Bus Serialization for Reducing Power Consumption

Presentation transcript:

Bus Serialization for Reducing Power Consumption
Naoya Hatta†, Niko Demus Barli††, Chitaka Iwama†, Luong Dinh Hung†, Daisuke Tashiro†, Shuichi Sakai†, Hidehiko Tanaka†††
† University of Tokyo
†† Texas Instruments Japan
††† Institute of Information Security

Introduction
Wiring power consumption is an important issue in VLSI design.
SoCs and chip multiprocessors require buses with long wires.
Bus serialization is proposed as a way to reduce bus power consumption.

Outline
Proposition: Objective, Bus Serialization, Layout Optimization
Evaluations
Conclusion
Future Works

Proposition

Objective
T = M f
Throughput must not decrease.
T: Throughput, M: The number of wires, f: Bus frequency
P = a T C V^2
We want to reduce power.
P: Power, a: Activity, C: Bus capacitance, V: Voltage swing

Bus Serialization
Reduce bus capacitance by decreasing the number of wires.
The result is low power and high frequency.
Figure labels: Conventional Bus (Latch, Wire); Serialized Bus (Serializer, Wire, Deserializer).

Layout Changes
The number of wires (M) decreases, so the wire pitch can be widened without increasing area.
Wire resistance (R) decreases.
Wire capacitance (C) decreases.

Parameters Change
T = M f: M decreases, so a higher f is required to keep T unchanged.
f ∝ 1 / (R C): R and C decrease, so f increases and the requirement is met.
P = a T C V^2: C decreases, so power decreases and the objective is met.
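The chain of reasoning on this slide can be checked numerically. The following is a minimal sketch, not the authors' model: the function names and the sample scaling factors for R and C are assumptions chosen only for illustration.

```python
def required_freq_scale(m_old: int, m_new: int) -> float:
    """Factor by which f must rise so that T = M * f is unchanged."""
    return m_old / m_new


def achievable_freq_scale(r_scale: float, c_scale: float) -> float:
    """Factor by which f can rise, given f proportional to 1 / (R * C)."""
    return 1.0 / (r_scale * c_scale)


def power_scale(c_scale: float, t_scale: float = 1.0) -> float:
    """Factor applied to P = a * T * C * V^2 when only T and C change."""
    return t_scale * c_scale


# Hypothetical example: 64 wires become 32, and the wider pitch is assumed
# to scale R by 0.5 and C by 0.8 (made-up numbers, not from the slides).
need = required_freq_scale(64, 32)
have = achievable_freq_scale(0.5, 0.8)
meets_throughput = have >= need
relative_power = power_scale(0.8)
```

With these assumed factors the achievable frequency gain exceeds the required one, so throughput is preserved while power scales with C alone.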

Layout Optimization
Choose the best wire width: the one that gives minimum C (= minimum P) while keeping T > 100%.

Why does power decrease?
Per wire: P = f C V^2. With C → C / 2 and f → 2 f, the per-wire power is unchanged. Power doesn't decrease?
Whole bus: P = M f C V^2. With C → C / 2, f → 2 f, and M → M / 2, power decreases!
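Written out for the idealized factor-of-two case on this slide, the substitution looks as follows. This is only a worked restatement of the slide's formula; it assumes the activity factor is unchanged, and P' is a label introduced here for the serialized-bus power.

```latex
\begin{align*}
P  &= M f C V^2 \\
P' &= \frac{M}{2}\,(2f)\,\frac{C}{2}\,V^2
    = \frac{1}{2}\,M f C V^2
    = \frac{P}{2}
\end{align*}
```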

Evaluation

Condition
Bus specification: bus width 64 bits; number of wires (conventional): 64; number of wires (serialized): 32.
Wire configurations (width, height, etc.): from the International Technology Roadmap for Semiconductors 2002.
Bit pattern: address bus and data bus between L1 cache and L2 cache; L1 cache (data/instruction): 16 KB, 2-way, 64-byte block; SPECint95 benchmark.
Results are compared with a conventional (fully parallel) bus.

Bus Capacitance
The effect of serialization increases as gate length shrinks.

Bus Power Consumption
Power decreases by 34%.

Why Does Power Increase?
Serialization increases the number of transitions.
When the same bit pattern is transferred every cycle, extra transitions occur on the serialized bus, and each one consumes power.
On the address bus, this situation appears frequently.
Figure labels: Conventional Bus; Serialized Bus (extra transition, power is consumed).
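The effect can be reproduced with a small transition counter. This is an illustrative sketch under stated assumptions: the function names are hypothetical, the serialization ratio is 2, and adjacent bits are assumed to share a wire (the slides do not specify the bit ordering).

```python
def count_transitions(signal):
    """Count value changes between consecutive samples on one wire."""
    return sum(1 for prev, cur in zip(signal, signal[1:]) if prev != cur)


def conventional_transitions(words):
    """One wire per bit: wire i carries bit i of every word in turn."""
    return sum(count_transitions(wire) for wire in zip(*words))


def serialized_transitions(words, ratio=2):
    """Each group of `ratio` adjacent bits shares one wire, sent in sequence."""
    width = len(words[0])
    total = 0
    for start in range(0, width, ratio):
        wire = [bit for word in words for bit in word[start:start + ratio]]
        total += count_transitions(wire)
    return total


# The same 4-bit pattern repeated for three cycles (illustrative data).
words = ["1010", "1010", "1010"]
conv = conventional_transitions(words)  # every wire holds a constant value
ser = serialized_transitions(words)     # every shared wire alternates 1, 0, 1, 0, ...
```

For this repeated pattern the conventional bus sees no transitions, while each shared wire of the serialized bus toggles on every transfer.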

Differential Data Transfer (DDT)
Transfer the difference between the present data and the previous data.

Bit pattern   Normal       DDT
0010011010    0010011010   0010011010
0010011011    0010011011   0000000001
0010011100    0010011100   0000000111

Normal: extra transitions occur.
DDT: extra transitions don't occur.
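The DDT column in this example is consistent with a bitwise XOR of each word with its predecessor. The sketch below assumes that interpretation, and also assumes the first word is sent unchanged; the function names are hypothetical and nothing here is taken from the authors' circuit.

```python
def ddt_encode(words):
    """Send the first word as-is, then each word XORed with its predecessor."""
    encoded, prev = [], 0
    for word in words:
        encoded.append(word ^ prev)
        prev = word
    return encoded


def ddt_decode(encoded):
    """Rebuild the original words by accumulating the XOR differences."""
    decoded, prev = [], 0
    for diff in encoded:
        prev ^= diff
        decoded.append(prev)
    return decoded


# The three bit patterns from the slide.
patterns = [0b0010011010, 0b0010011011, 0b0010011100]
sent = ddt_encode(patterns)
as_bits = [format(w, "010b") for w in sent]
```

Because consecutive addresses differ in only a few low-order bits, the encoded words are mostly zeros, which is why a serialized wire carrying them toggles less often.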

Bus Power Consumption (DDT)
Power decreases by 27%.

Comparison
DDT is useful on the address bus but not on the data bus.
In 45 nm technology, power decreases by about 30%.

Power of Peripheral Circuits
The additional power of the peripheral circuits is 2% of the power consumed by the wires.
Conditions: 180 nm process, wire length 5 mm.

Conclusion
A normal serialized bus is suited to the data bus.
A serialized bus with DDT is suited to the address bus.
Bus serialization reduces power consumption by about 30% relative to the conventional bus in a 45 nm process.
As gate length shrinks, bus serialization becomes more effective.

Future Works
Apply to chip multiprocessors: between L1 cache and L2 cache.
Additional costs of DDT: additional circuits and delay.

Capacitance Model

Power Increase by DDT
(Figure: bit-pattern example.)

Bus Power Model

Additional Delay
Conventional bus: 0.17 ns
Serialized bus: 0.15 ns