Feb. 17, 2011 Midterm overview Real life examples of built chips


Feb. 17, 2011
Midterm overview
Real-life examples of built chips
Clock skew
Arithmetic
Data centers
Power reduction techniques
  Dynamic voltage / frequency scaling
  Clock throttling
  Power gating
  Others?
Project: 4b adder with Razor recovery

Go Over Problems: 1c, 2a, 2b, 3c

Crossbar Design

Mirror Adder Stick Diagram

The Mirror Adder
The NMOS and PMOS chains are completely symmetrical. A maximum of two series transistors can be observed in the carry-generation circuitry.
When laying out the cell, the most critical issue is minimizing the capacitance at node Co. Reducing the diffusion capacitances is particularly important.
The capacitance at node Co is composed of four diffusion capacitances, two internal gate capacitances, and six gate capacitances in the connecting adder cell.
The transistors connected to Ci are placed closest to the output.
Only the transistors in the carry stage have to be optimized for speed. All transistors in the sum stage can be minimal size.
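The symmetry of the NMOS and PMOS chains follows from the full adder being self-dual: inverting every input inverts both outputs. A minimal sketch of that logic, assuming the usual carry-reuse form Co = AB + Ci(A + B) and S = ABCi + Co'(A + B + Ci); the function names are illustrative and not taken from the slides.

```python
from itertools import product


def mirror_adder(a, b, ci):
    """Return (sum, carry_out) using the carry-reuse form of the full adder."""
    co = (a & b) | (ci & (a | b))
    co_bar = co ^ 1  # the sum stage reuses the inverted carry
    s = (a & b & ci) | (co_bar & (a | b | ci))
    return s, co


def matches_binary_addition():
    """Check all 8 input combinations against a + b + ci."""
    return all(
        2 * co + s == a + b + ci
        for a, b, ci in product((0, 1), repeat=3)
        for s, co in [mirror_adder(a, b, ci)]
    )


def is_self_dual():
    """Inverting all inputs must invert both outputs (the 'mirror' property)."""
    for a, b, ci in product((0, 1), repeat=3):
        s, co = mirror_adder(a, b, ci)
        s_inv, co_inv = mirror_adder(a ^ 1, b ^ 1, ci ^ 1)
        if (s_inv, co_inv) != (s ^ 1, co ^ 1):
            return False
    return True


if __name__ == "__main__":
    print(matches_binary_addition(), is_self_dual())
```

Because the function is self-dual, the pull-up network can use the same topology as the pull-down network instead of its series/parallel dual, which is what keeps the series stack in the carry stage at two transistors.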

Transmission Gate Full Adder

Manchester Carry Chain

Manchester Carry Chain

Carry-Bypass Adder Also called Carry-Skip

Carry-Bypass Adder (cont.)

Carry Ripple versus Carry Bypass

Carry-Select Adder

Carry Select Adder: Critical Path

Linear Carry Select

Square Root Carry Select

Adder Delays - Comparison

LookAhead - Basic Idea

Look-Ahead: Topology Expanding Lookahead equations: All the way:
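The equations on this slide did not survive extraction. As a hedged illustration of what expanding the lookahead recurrence "all the way" means, the sketch below assumes the standard definitions g_k = a_k b_k, p_k = a_k XOR b_k and c_k = g_k + p_k c_(k-1), and checks the fully expanded form against a ripple computation. Function names and the LSB-first bit ordering are assumptions made for this example.

```python
from itertools import product


def lookahead_carries(a_bits, b_bits, c_in):
    """Carry out of each bit position from the fully expanded lookahead terms.

    Bits are given LSB first.
    """
    g = [a & b for a, b in zip(a_bits, b_bits)]  # generate
    p = [a ^ b for a, b in zip(a_bits, b_bits)]  # propagate
    carries = []
    for k in range(len(g)):
        # c_k = g_k + p_k g_(k-1) + ... + p_k..p_1 g_0 + p_k..p_0 c_in
        c = 0
        prefix = 1  # running product p_k ... p_(j+1)
        for j in range(k, -1, -1):
            c |= prefix & g[j]
            prefix &= p[j]
        c |= prefix & c_in
        carries.append(c)
    return carries


def ripple_carries(a_bits, b_bits, c_in):
    """Reference: carries computed one bit at a time."""
    carries = []
    c = c_in
    for a, b in zip(a_bits, b_bits):
        c = (a & b) | (c & (a ^ b))
        carries.append(c)
    return carries


def agree_on_all_4bit_inputs():
    """Exhaustively compare both methods for 4-bit operands."""
    for bits in product((0, 1), repeat=9):
        a, b, c_in = list(bits[:4]), list(bits[4:8]), bits[8]
        if lookahead_carries(a, b, c_in) != ripple_carries(a, b, c_in):
            return False
    return True


if __name__ == "__main__":
    print(agree_on_all_4bit_inputs())
```

The expanded form removes the serial dependence between carries, but the number of product terms and the fan-in grow with k, which is why wide adders organize the lookahead hierarchically.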

Carry Lookahead Trees Can continue building the tree hierarchically.

Power Reduction Techniques
Stop the clock
Dynamic power reduction
Power gating
Reduce the leakage
How fast can you turn something on/off?
Nothing to do -> sleep
How can you save power while in operation?
Near-threshold design

Power Gating

Kevin Nowka, IBM

Gate Leakage

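Digital Parallelization
Y[n] = X[n] + X[n-1]
[Figure: block diagram. An analog signal is converted to a digital input (5 bits @ 5GS/s, or 8 bits @ 100MHz); clocked stages hold X[n] and X[n-1], which feed an adder producing Y[n]. Clk = 5GHz. The ANALOG and DIGITAL domains are labeled.]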

DSP Parallelization
Y[n] = X[n] + X[n-1]
Y[n-1] = X[n-1] + X[n-2]
[Figure: block diagram. The input (5 bits @ 5GS/s) feeds two adders, clocked on clk and clkb, that produce Y[n] and Y[n-1]. Labels: CLK = 5GHz, CLK = 2.5GHz.]

DSP Parallelization
Clock speed reduced by 1/2
  Intuition? Can parallelize further
Increase number of MACs (multiply/accumulates) by 2
  Intuition? Area goes up by 2
Power decreases (clock rate down by 2, computations up by 2, but easier timing constraints)
What about clock power?
Save a little power, but double the area?
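The trade-off above can be sketched in a few lines: two adders running at half the clock rate produce the same output stream as one adder at the full rate. The power model is an assumption added for illustration, using the first-order relation that dynamic power scales with (number of units) x (clock frequency) x VDD^2. Reading "easier timing constraints" as room to lower VDD is likewise an assumption, not something the slide states.

```python
def serial_filter(x):
    """Y[n] = X[n] + X[n-1], one output per full-rate clock; X[-1] is taken as 0."""
    prev = 0
    y = []
    for sample in x:
        y.append(sample + prev)
        prev = sample
    return y


def two_way_parallel_filter(x):
    """Same filter, but two outputs are produced per half-rate clock cycle."""
    if len(x) % 2:
        raise ValueError("this sketch expects an even number of samples")
    prev = 0
    y = []
    for i in range(0, len(x), 2):
        x_first, x_second = x[i], x[i + 1]
        y.append(x_first + prev)       # adder 1: Y[n-1] = X[n-1] + X[n-2]
        y.append(x_second + x_first)   # adder 2: Y[n]   = X[n]   + X[n-1]
        prev = x_second
    return y


def relative_dynamic_power(units, freq_scale, vdd_scale):
    """First-order estimate relative to one unit at full clock and nominal VDD."""
    return units * freq_scale * vdd_scale ** 2


if __name__ == "__main__":
    samples = [1, 2, 3, 4, 5, 6]
    print(serial_filter(samples) == two_way_parallel_filter(samples))
    print(relative_dynamic_power(2, 0.5, 1.0))  # same VDD
    print(relative_dynamic_power(2, 0.5, 0.8))  # hypothetical 20% lower VDD
```

Under this model, doubling the units and halving the clock leaves dynamic power unchanged at a fixed supply, so any saving has to come from the relaxed timing, for example through a lower supply voltage. Clock distribution power is not modeled here.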

Razor: A Low-Power Pipeline Based on Circuit-Level Timing Speculation http://www.eecs.umich.edu/~taustin/papers/MICRO36-Razor.pdf

Project Description
Minimal: 4b adder, implemented with Razor
  Simulations into near-threshold domain
Grad. student: requires more advanced design
  Analog: opamps built using inverters
  Digital: adiabatic near-threshold
  Power gating: add power gating to your design
Undergrad: extra credit if you do any of the above

Problem 1: On-Chip Wires Consume Energy
On-chip wire power does not scale
Dominated by interconnect capacitance (C·VDD^2)
VDD = 1V: Eb = 150fJ/mm
ON-CHIP (Status Quo): 100 - 300fJ/bit/mm
OUR GOAL: < 5fJ/bit/mm
NOTE: Sub/Near-Threshold doesn't help this problem!
[DOE, Exascale Workshop]
On-chip wires start to dominate power consumption as computational logic energy is minimized by operating in near-threshold. The graph, from a recent DOE exascale study, shows that over the next 8 years the energy to perform a double-precision FLOP will improve by 5x, but the energy of on-chip wires will not. For example, the energy of 1mm and 5mm on-chip links will not change, because energy is proportional to capacitance, and fringe capacitance does not improve with technology scaling. In our initial work with near-threshold computation, a 16b multiply/accumulate takes 200fJ at Vdd=0.4V. Moving that 16b parallel bus a distance of 300um costs 250fJ, which is more energy than the computation itself. Hence, low-Vdd operation accentuates the problem of energy consumption in on-chip wires. We will need to propose another 5-10x improvement in energy efficiency for on-chip wires to close this gap when logic operates in the near-threshold regime.
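The figures quoted in the notes can be put on a common per-bit, per-millimeter basis with a short calculation. This is a sketch that uses only the numbers stated above. The constant and function names are made up for the example, and treating the 250fJ bus figure as evenly split across the 16 bits is an assumption.

```python
MAC_ENERGY_FJ = 200.0        # 16b multiply/accumulate at Vdd = 0.4 V (from the notes)
BUS_ENERGY_FJ = 250.0        # moving the 16b parallel bus 300 um (from the notes)
BUS_WIDTH_BITS = 16
BUS_LENGTH_MM = 0.3
GOAL_FJ_PER_BIT_MM = 5.0     # stated goal: < 5 fJ/bit/mm
STATUS_QUO_FJ_PER_BIT_MM = (100.0, 300.0)  # stated on-chip status quo range


def energy_per_bit_mm(total_fj, bits, length_mm):
    """Normalize a bus transfer energy to fJ per bit per millimeter."""
    return total_fj / (bits * length_mm)


def improvement_needed(current_fj_per_bit_mm, goal_fj_per_bit_mm=GOAL_FJ_PER_BIT_MM):
    """Factor by which wire energy must drop to reach the goal."""
    return current_fj_per_bit_mm / goal_fj_per_bit_mm


if __name__ == "__main__":
    bus = energy_per_bit_mm(BUS_ENERGY_FJ, BUS_WIDTH_BITS, BUS_LENGTH_MM)
    print(f"near-threshold bus: {bus:.1f} fJ/bit/mm")
    print(f"wire energy vs. MAC energy: {BUS_ENERGY_FJ / MAC_ENERGY_FJ:.2f}x")
    print(f"improvement needed to reach goal: {improvement_needed(bus):.1f}x")
    low, high = STATUS_QUO_FJ_PER_BIT_MM
    print(f"status quo vs. goal: {improvement_needed(low):.0f}x to "
          f"{improvement_needed(high):.0f}x")
```

On these numbers the 300um transfer costs more than the multiply/accumulate it feeds, and the near-threshold bus figure sits roughly an order of magnitude above the stated goal.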

Data Center Design http://www.spectrum.ieee.org/feb09/7327