The Effects of Operating Conditions on Speed and Power of Replica-Based SRAM Circuits
Nika Sharifvaghefi, Nicholas Kumar
EE241, Spring 2012

Sense-Amp in SRAMs
Instead of waiting for the actual bitline to discharge fully, we take an early measurement.
- Saves power
- Saves time

Sense-Amp Timing
When do we take the measurement?
- Too early: error
- Too late: wasted power and time
Conventional solution: an inverter chain
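The trade-off on this slide can be sketched numerically. The snippet below is a minimal first-order illustration, not the authors' model: it assumes the bitline is discharged by a constant cell current, so the time to develop a given differential is C * dV / I. The function names, the 10% margin, and all component values are hypothetical.

```python
import math

def required_sense_time(c_bitline_f, dv_sense_v, i_cell_a):
    """Time for the bitline to develop dv_sense_v.

    Assumption: constant cell current, so t = C * dV / I.
    """
    return c_bitline_f * dv_sense_v / i_cell_a

def classify_enable(t_enable_s, t_required_s, margin=0.10):
    """Classify a sense-amp enable time against the required time.

    margin is a hypothetical tolerance: enabling within 10% after the
    required time is treated as acceptable.
    """
    if t_enable_s < t_required_s:
        return "too early: read error"
    if t_enable_s > t_required_s * (1.0 + margin):
        return "too late: wasted power and time"
    return "ok"

# Hypothetical values: 100 fF bitline, 100 mV differential, 20 uA cell current.
t_req = required_sense_time(100e-15, 0.1, 20e-6)
for t_en in (0.4e-9, 0.52e-9, 0.8e-9):
    print(f"{t_en * 1e9:.2f} ns -> {classify_enable(t_en, t_req)}")
```

A fixed inverter-chain delay corresponds to one fixed value of `t_enable_s`, while `t_required_s` moves with the operating conditions.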

Problems
Static, fixed delay
- Hard to choose the right delay without restricting the usage conditions
- Costs a power penalty under normal conditions
PVT variations affect SRAM cells and inverters in different ways
- Again requires extra slack

Solution: Replica
- Use SRAM cells instead of inverters
- They are affected by PVT variations in the same way as the array cells
- The replica is not a full column

Problems with Replica
- Variation due to process still remains
- Unnecessary slack still has to be added
- When Vdd goes down, fewer replica cells are used, so their variation becomes important
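Why fewer replica cells make variation matter more can be illustrated with a small Monte Carlo sketch. This is an assumption-laden toy model, not the authors' simulation: each replica cell draws an independent Gaussian current, the replica delay is a fixed charge divided by the summed current, and all names and values are hypothetical.

```python
import random
import statistics

def replica_delay(n_cells, rng, i_mean=20e-6, i_sigma=2e-6, charge=1e-14):
    """Delay of a replica bitline driven by n_cells cells in parallel.

    Assumption: each cell current is an independent Gaussian sample and
    the delay is charge / total current.
    """
    currents = [rng.gauss(i_mean, i_sigma) for _ in range(n_cells)]
    return charge / sum(currents)

def relative_spread(n_cells, trials=2000, seed=0):
    """Standard deviation of the replica delay divided by its mean."""
    rng = random.Random(seed)  # seeded so the run is repeatable
    delays = [replica_delay(n_cells, rng) for _ in range(trials)]
    return statistics.stdev(delays) / statistics.mean(delays)

for n in (2, 4, 8, 16):
    print(n, f"{relative_spread(n):.3f}")
```

Under these assumptions the relative spread shrinks roughly as one over the square root of the cell count, so a replica built from only a few cells tracks the array less tightly.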

Different Solutions
Divide the bitline into independent parts
- Works when the number of stages is low
- The delay of the extra logic between stages becomes important if the number of cells is low or the number of stages is high
Pick replica cells from a large set of configurable cells, choosing the best ones
- Works very well
- Requires post-silicon testing

Different Solutions
Use a Timing Multiply Circuit (TMC)
- Use more replica cells than the timing alone would call for
- The raw replica output cannot be used directly, or the sense amp would fire too early and give errors
- Make up for the reduced delay by multiplying it with a TMC

TMC
- The clock signal travels through a forward path of delay cells
- As soon as the input signal goes up, the state of the forward path is transferred to a backward path
- If the backward path uses N times as many delay cells as the forward path, the input delay is multiplied by N+1
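The N+1 factor follows from adding the two paths: the forward path contributes the input delay once and the backward path contributes it N more times. The sketch below works in integer picoseconds and adds one assumption that is not on the slide: the forward path only resolves time in whole delay cells, so the multiplied part is quantized. Function names and values are hypothetical.

```python
def ideal_output_delay_ps(t_in_ps, n):
    """Ideal TMC output delay: forward delay plus N times that delay."""
    return (n + 1) * t_in_ps

def tmc_output_delay_ps(t_in_ps, n, cell_delay_ps):
    """TMC output delay with a quantized forward path.

    Assumption: the clock edge has passed a whole number of forward
    cells when the input rises, and the backward path has n cells for
    each of them.
    """
    cells_traversed = t_in_ps // cell_delay_ps
    backward_ps = n * cells_traversed * cell_delay_ps
    return t_in_ps + backward_ps

# Input delay that is a whole number of cells: matches the ideal (N+1) factor.
print(tmc_output_delay_ps(100, 3, 10), ideal_output_delay_ps(100, 3))
# Input delay between cell boundaries: the quantized result falls short.
print(tmc_output_delay_ps(105, 3, 10), ideal_output_delay_ps(105, 3))
```

In this toy model a smaller cell delay shrinks the quantization error at the cost of more delay cells.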

Implementation
Implemented in Cadence using a 90 nm technology.
Result:

Conclusion
Using a TMC
- Lowers the variation and therefore the unnecessary slack
- Does not need post-silicon testing
- Viable at low Vdd
Disadvantages
- Adds area
- Variation in the delay cells