TRAMS PROJECT PTC meeting June 23rd 2011 WP3 progress

TRAMS PROJECT PTC meeting June 23rd 2011 WP3 progress FP7 248789

TASK 3.1 TASK 3.2 TASK 3.3

Mechanisms to improve reliability for a yield of 90% and different cell PPF
[Chart: reliability mechanisms versus cell PPF, PPF = 10^-3 … 1, yield = 90%]
No redundancy (0%)
Dynamic redundancy: reconfiguration (0-100%)
Hardware redundancy: RMR (200% …)
Information redundancy: ECC SEC-DED (37.5%)

The same chart annotated with deliverable labels (as extracted: D3.1, D3.2, D3.4, D3.2).

Minimum size (D3.6): the same chart with the technology nodes marked (45nm, 32nm, 22nm, 18/16nm, 13nm).

Optimized size (D3.6): the same chart with the technology nodes marked (45nm, 32nm, 22nm, 18/16nm, 13nm).
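The overhead percentages quoted on the chart can be reproduced with a short calculation. The sketch below is only one plausible reading: it assumes the 37.5% figure corresponds to a Hamming-based SEC-DED code over 16-bit words (6 check bits per 16 data bits) and that the 200% figure corresponds to RMR with R = 3, ignoring the voter. The function names are illustrative and do not come from the slides.

```python
def secded_check_bits(data_bits: int) -> int:
    """Check bits of a Hamming SEC-DED code protecting `data_bits` data bits."""
    r = 0
    # Single-error correction needs 2^r >= data_bits + r + 1 (Hamming bound).
    while 2 ** r < data_bits + r + 1:
        r += 1
    # One extra overall-parity bit adds double-error detection.
    return r + 1


def secded_overhead(data_bits: int) -> float:
    """Storage overhead of SEC-DED as a fraction of the data bits."""
    return secded_check_bits(data_bits) / data_bits


def rmr_overhead(r: int) -> float:
    """Extra hardware of R-fold modular redundancy, voter not counted."""
    return float(r - 1)


if __name__ == "__main__":
    print(f"SEC-DED over 16 data bits: {secded_overhead(16):.1%}")
    print(f"RMR with R = 3: {rmr_overhead(3):.0%}")
```

Under the same assumption, wider words lower the relative cost of SEC-DED (8 check bits for 64 data bits), which is one reason the word dimension matters at the architectural level.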

Task 3.1 Mitigation mechanisms
D3.1 Report on mitigation mechanisms at layout and circuit level
D3.2 Report on new architectures based on redundancy

D3.1 Report on mitigation mechanisms at layout and circuit level
1. V and R mitigating mechanisms at layout level: regularity, proximity effect, dimensions roughness.
2. Mechanisms to mitigate the effect of degradation mechanisms (mainly NBTI and PBTI): temperature control, voltage scaling, relaxation phases, strained channels, starting burning (Esteve Amat).
3. Robustness enhancement techniques considering voltage biasing and device sizing:
   TRANSISTOR LEVEL: width and length, same aspect ratio, same area, threshold voltage
   CIRCUIT LEVEL: VDD and VSS, full boost, partial boost VBL
   ARCHITECTURAL LEVEL: word dimension, column dimension
4. Performance variability mitigation mechanisms considering device sizing.
All the work will concentrate on 6T SRAM cells as the main vehicle, and on the Si-bulk technologies investigated in D3.6. Other cells (such as 3T1D) and other technologies (such as FinFET) could potentially be included.

D3.2 Report on new architectures based on redundancy
Fault-Tolerant Architectures (uncertainties: permanent and transient faults)
Redundancy (Von Neumann, 1956)
- Static redundancy (built into the system, masks the effects of faults)
- Dynamic redundancy (fault detection, location, containment, recovery)
[Diagram labels: Space (Hardware), Reconfiguration, T3.2 D3.4, Time, T3.1 D3.2, Information]

Well-known HR techniques
- R-Fold Modular Redundancy (RMR)
- Cascaded R-Fold Modular Redundancy (CRMR)
- R-Fold Interwoven Redundancy (RIR)
- Multiplexing techniques
- Voters
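As an illustration of the first technique and of the role of the voter, the following is a minimal behavioural sketch of R-fold modular redundancy with a strict-majority voter. It is a software model under simplifying assumptions (a fault-free voter, replicas modelled as Python callables); `majority_vote` and `rmr` are hypothetical names, not taken from the deliverables.

```python
from collections import Counter
from typing import Callable, Sequence


def majority_vote(outputs: Sequence[int]) -> int:
    """Return the value produced by a strict majority of the replicas."""
    value, count = Counter(outputs).most_common(1)[0]
    if count <= len(outputs) // 2:
        raise ValueError("no strict majority among replica outputs")
    return value


def rmr(replicas: Sequence[Callable[[int], int]], x: int) -> int:
    """Evaluate every replica on the same input and vote on the results."""
    return majority_vote([replica(x) for replica in replicas])


if __name__ == "__main__":
    inverter = lambda bit: bit ^ 1   # fault-free 1-bit inverter
    stuck_at_0 = lambda bit: 0       # replica with a permanent fault
    # With R = 3, a single faulty replica is masked by the other two.
    print(rmr([inverter, inverter, stuck_at_0], 0))
```

The model makes the usual limitation visible: with R = 3, two simultaneous faulty replicas that agree with each other outvote the correct one.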

Averaging Design: Averaging and Adaptable Thresholding Decision Gates
Inspired by biological systems:
- Numerous cells, autonomous, significant variability, sensitive to external factors, faulty elements
- Evolvable in time and space
- Redundancy and plasticity for learning and adaptation
- Robustness to overcome deficient components and transmission lines
- Analog computation
- Averaging and thresholding
Perceptron as an Artificial Neural Network (ANN)

Averaging Design: Adaptive Averaging Cell (AD-AVG). Adaptation according to optimum weights.
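The averaging-and-thresholding idea can be sketched as a weighted average of replica outputs followed by a threshold decision, much like a single perceptron. The slides do not say how the optimum weights are obtained; the sketch below assumes inverse-variance weighting, which minimizes the variance of a weighted average of independent, unbiased noisy values. The replica outputs, variances, and the 0.5 threshold are made-up illustration values.

```python
from typing import List, Sequence


def inverse_variance_weights(variances: Sequence[float]) -> List[float]:
    """Weights proportional to 1/variance, normalized to sum to 1 (assumption)."""
    inverse = [1.0 / v for v in variances]
    total = sum(inverse)
    return [w / total for w in inverse]


def averaging_cell(values: Sequence[float], weights: Sequence[float],
                   threshold: float = 0.5) -> int:
    """Weighted average of analog replica outputs, then a threshold decision."""
    average = sum(w * v for w, v in zip(weights, values))
    return 1 if average >= threshold else 0


if __name__ == "__main__":
    # Normalized analog outputs of three replicas that should all read logic 1.
    outputs = [0.9, 0.1, 0.2]
    variances = [0.01, 0.09, 0.09]  # hypothetical per-replica noise variances
    equal = [1 / 3] * 3
    adapted = inverse_variance_weights(variances)
    print(averaging_cell(outputs, equal))    # plain averaging
    print(averaging_cell(outputs, adapted))  # adapted weights
```

With equal weights the two unreliable replicas pull the average below the threshold, while the adapted weights let the reliable replica dominate the decision.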

INFORMATION REDUNDANCY (CODES) (D3.2 too). Modified Berger adaptive codes for highly efficient K-error correction.
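For background only, the sketch below shows the classic Berger code, not the modified adaptive variant mentioned on the slide, whose construction is not described here. In the classic code the check symbol is the binary count of zeros in the information word, which detects any unidirectional error (all flips in the same direction). The classic code detects errors and does not correct them; the helper names are illustrative.

```python
def berger_check(info_bits: str) -> str:
    """Check symbol of the classic Berger code: binary count of zeros."""
    # ceil(log2(n + 1)) check bits are enough to count up to n zeros.
    width = len(info_bits).bit_length()
    return format(info_bits.count("0"), f"0{width}b")


def berger_encode(info_bits: str) -> str:
    """Append the check symbol to the information bits."""
    return info_bits + berger_check(info_bits)


def berger_is_valid(codeword: str, n_info: int) -> bool:
    """Recompute the check symbol and compare it with the stored one."""
    info, check = codeword[:n_info], codeword[n_info:]
    return berger_check(info) == check


if __name__ == "__main__":
    word = berger_encode("1011000")
    print(word, berger_is_valid(word, 7))
    corrupted = "0" + word[1:]  # a single 1 -> 0 flip in the information part
    print(corrupted, berger_is_valid(corrupted, 7))
```

A 1 -> 0 flip raises the zero count of the information part while it can only lower the stored count, so the two can no longer match; this is the property that makes the code attractive for unidirectional faults.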

D3.3 (M24) Mechanisms to detect latency

Idea Behind the Concept
In essence, a system that can discretize chips based on static variations and sense dynamic variations over time.
Prime requirements:
- Use gradient sensing rather than absolute sensing.
- Minimum area overhead.
- Should be hidden from program execution (shadowed).
- Provide a platform for cross-layer optimizations.

Characteristics of the 3T1D Cells
We place the 3T1D cell next to a 6T cell on the same wordline to measure access time and leakage. The 3T1D transistors are sized to give access times similar to those of the 6T cell, which avoids synchronization and control overhead. The variation of access time with temperature is similar to that of the 6T cell, but at high temperatures the 3T1D cell performs better. The retention time (a function of leakage) can vary by as much as 7X across chips held at the same temperature, because leakage depends strongly on physical parameters such as channel length and threshold voltage.

Embedded 3T1D (Shadow Cell)
Because the two cells sit in close proximity, it is safe to assume that the 3T1D and the 6T suffer the same amount of intra-die and systematic variation. Since read access lies on the critical path, measuring it gives a rough estimate of the performance of a given block. Since retention time is a strong function of leakage, approximating the retention time makes it possible to estimate the leakage (the dominant source of power).

Power/Performance Binning - Result
We tested the proposal on a 32KB cache built with shadow cells and complete memory periphery with 45nm PTM. Nearly 40% of the chips fall into the high-performance low-power bin, which indicates a good yield. Note that the max-power low-performance bin has no chips. This is characteristic of our scheme: because we use a grouped-bin scheme, the bounds of every bin are very loose, so many chips that would fall in that bin are distributed into adjacent bins along the Cartesian coordinates.
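The binning step can be pictured as classifying each chip from two shadow-cell readings: access time as the performance proxy and retention time as the leakage (power) proxy. The sketch below is a simplified two-by-two version and is not the grouped-bin scheme used in the experiment; the thresholds, units, readings, and names are all hypothetical.

```python
from collections import Counter
from typing import Iterable, NamedTuple


class ShadowReading(NamedTuple):
    access_time_ps: float      # read access time of the shadow cell
    retention_time_us: float   # retention time; longer means less leakage


def classify(reading: ShadowReading, access_limit_ps: float,
             retention_limit_us: float) -> str:
    """Map one shadow-cell reading to a performance/power bin label."""
    fast = reading.access_time_ps <= access_limit_ps
    low_leakage = reading.retention_time_us >= retention_limit_us
    performance = "high-performance" if fast else "low-performance"
    power = "low-power" if low_leakage else "high-power"
    return f"{performance} {power}"


def bin_chips(readings: Iterable[ShadowReading], access_limit_ps: float,
              retention_limit_us: float) -> Counter:
    """Count how many chips land in each bin."""
    return Counter(classify(r, access_limit_ps, retention_limit_us)
                   for r in readings)


if __name__ == "__main__":
    chips = [ShadowReading(180.0, 6.0), ShadowReading(230.0, 7.5),
             ShadowReading(190.0, 2.0), ShadowReading(175.0, 5.5)]
    for label, count in bin_chips(chips, 200.0, 4.0).items():
        print(label, count)
```

Because only threshold comparisons are needed, the classification relies on relative rather than absolute measurements, in line with the gradient-sensing requirement stated earlier.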

Conclusions and Future Work
- Reliability is restored by making circuits aware of their composition.
- Power/performance is improved by providing fine-grain guardbands.
- Variation-tolerant 3T1D cells can be used for classification based on power/performance.
- The existing scheme can track both high- and low-frequency variations.
On-going work:
- Understand the impact of random variations and see if there is scope for classification within a memory array.
- Determine the total number of cells required for monitoring the entire cache structure.
- Extend the scheme as a standalone mechanism for logic.
- Provide this information to the layers above (microarchitecture or OS) for cross-layer optimizations.

D3.4 (M24) Compensating/reconfiguration mechanism to reduce variability and improve reliability

Directions
- Resiliency in high-variability scenarios through run-time sensing and dynamic fine-grain body bias to leverage recoverable errors. Use bias to increase speed or reduce power.
- Decode-directed fine-grain tuning. Activate each block with the optimal bias/error detection combination.
- Proactive reconfiguration.

Expected Outcomes
- Lifetime adaptability: through the use of sensors and feedback from error correcting codes, the PRMU detects any reduction or increase in the power, performance or error rate of each block. The architecture can then dynamically adapt the circuits to meet the performance and power goals set.
- PVT impact reduction and yield enhancement: due to the dynamic self-tuning adaptability of the architecture.

Potential: DFGBB at the L1 cache level
Experimental framework: 32KB cache under process variations at 45nm (PTM), 1KB blocks. σ for systematic and random variation is 6.4% for Vth and 3.7% for Leff. Inter-die variations are 3%. Vdd = 1V. 500 samples simulated in HSPICE.
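To make the sampling set-up concrete, the sketch below draws 500 (Vth, Leff) pairs using the quoted sigmas. It is an assumption-heavy illustration: Gaussian distributions, a single inter-die shift applied to both parameters of a sample, and nominal values (0.4 V, 45 nm) chosen purely as placeholders. The actual study feeds such samples to HSPICE, which is not modelled here.

```python
import random
import statistics
from typing import List, Tuple


def sample_dies(n_samples: int = 500,
                vth_nominal: float = 0.4,       # volts, hypothetical nominal
                leff_nominal: float = 45e-9,    # metres, hypothetical nominal
                sigma_vth: float = 0.064,       # 6.4% relative sigma
                sigma_leff: float = 0.037,      # 3.7% relative sigma
                sigma_inter_die: float = 0.03,  # 3% inter-die sigma
                seed: int = 0) -> List[Tuple[float, float]]:
    """Draw (Vth, Leff) pairs with Gaussian relative variations."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n_samples):
        # One inter-die shift per sample, shared by both parameters (assumption).
        inter = rng.gauss(0.0, sigma_inter_die)
        vth = vth_nominal * (1.0 + inter + rng.gauss(0.0, sigma_vth))
        leff = leff_nominal * (1.0 + inter + rng.gauss(0.0, sigma_leff))
        samples.append((vth, leff))
    return samples


if __name__ == "__main__":
    dies = sample_dies()
    vths = [vth for vth, _ in dies]
    print(len(dies), statistics.mean(vths), statistics.stdev(vths))
```

Fixing the seed keeps the sample set reproducible, which matters when the same population is reused to compare yield with and without body bias.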

Potential: DFGBB at the L1 cache level
[Figures: effect of FGBB on yield; FBB voltages and corresponding yield]

Conclusions
- Extensive use of forward and reverse bias provides inherent resiliency to fabrication and run-time variations.
- In addition, it offers an “on/off” functionality and extraordinary power savings.
- Other reconfiguration techniques are under evaluation.

New proactive reconfiguration mechanisms to improve reliability and extend lifetime.

Proactive Recovery/Reconfiguration
Motivation: BTI mechanisms are recoverable.
The redundancy is used to allow non-faulty microarchitecture units to be temporarily deactivated and reactivated on a rotating basis, transitioning between two modes: active mode and recovery mode.
This has advantages over the reactive approach, especially for recoverable mechanisms such as NBTI/PBTI:
- It delays the time at which failure occurs.
- It balances the lifetime among all units.
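The rotation described above can be sketched as a round-robin schedule in which, at every step, exactly one unit rests in recovery mode while the others stay active. This toy model only counts the steps each unit spends under stress; it does not model BTI degradation or recovery physics, and the unit counts and the name `rotate_recovery` are hypothetical.

```python
from typing import List


def rotate_recovery(n_units: int, n_steps: int) -> List[int]:
    """Return the number of steps each unit spent active (under stress)."""
    stress = [0] * n_units
    for step in range(n_steps):
        resting = step % n_units  # unit placed in recovery mode at this step
        for unit in range(n_units):
            if unit != resting:
                stress[unit] += 1
    return stress


if __name__ == "__main__":
    # Four units are needed and one spare is available (hypothetical numbers).
    print(rotate_recovery(n_units=5, n_steps=100))
```

In a reactive scheme the spare would stay idle until a failure, leaving four units stressed for all 100 steps; with rotation the same workload is spread evenly, which is the balancing effect listed above.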

Recovery Methods
Recovery modes:
- Natural recovery
- Power off
- Strong recovery
- Single pair: 2 transistors
- Double pair: 4 transistors
- 4PR worst / 4PR best
4-transistor recovery circuit

Preliminary conclusions about proactive reconfiguration:
In a memory system with spare parts and a reconfiguration mechanism, continuous proactive reconfiguration allows an increase in lifetime.
Drawback: overhead.
Potential advantage: 7X lifetime extension (for a typical case).

WP3. T3.3
Task 3.3: Design flow for timing monitor insertion for runtime monitoring of an ASIC during synthesis. [Task Leader: IMEC (13 pms). Duration: M13-M24.]
Design flow for the automatic insertion of timing monitors in RTL descriptions for near-failure timing violation detection.
Input: RTL-level system description of the ASIC.
Output: Synthesized description of the block including monitoring circuitry.
D3.5: Report on the method that instantiates the monitor insertion in ASIC descriptions. (IMEC) T0+24

Two main types of low-cost timing monitors: Reactive
Copyright IMEC
[Figure labels: Logic, DFF, DFF, comparator (!=), DFF clock. Image source: http://publications.csail.mit.edu/abstracts/abstracts07/cadlerun/cadlerun.html]
Notes: papers on reactive and proactive monitors: Razor, Crystal ball, etc.
T. Austin, D. Blaauw, T. Mudge, K. Flautner, “Making typical silicon matter with Razor”, IEEE Computer Society, Vol. 37, Iss. 3, pp. 57-65, March 2004.

Two main types of low-cost timing monitors: Proactive
Copyright IMEC
[Figure labels: Logic, Sh, DFF, DFF, comparator (!=), DFF clock. Image source: http://publications.csail.mit.edu/abstracts/abstracts07/cadlerun/cadlerun.html]
Notes: papers on reactive and proactive monitors: Razor, Crystal ball, etc.
M. Eireiner, et al., “In-Situ Delay Characterization and Local Supply Voltage Adjustment for Compensation of Local Parametric Variations”, IEEE Journal of Solid-State Circuits, Vol. 42, No. 7, July 2007.
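The difference between the two monitor types can be illustrated with a small behavioural model. It is a sketch under simplifying assumptions, not the IMEC circuit: the monitored signal makes one transition per cycle at a given arrival time, the reactive monitor compares the main flip-flop with a second sample taken slightly after the clock edge, and the proactive monitor compares it with a deliberately delayed copy of the data sampled at the same edge. All names and timing values are hypothetical.

```python
def sample(arrival_ps: float, sample_time_ps: float, old: int, new: int) -> int:
    """Value seen by a flip-flop: `new` only if the data arrived in time."""
    return new if arrival_ps <= sample_time_ps else old


def reactive_error(arrival_ps: float, clock_ps: float, window_ps: float,
                   old: int = 0, new: int = 1) -> bool:
    """True when the main flip-flop captured stale data (late arrival)."""
    main = sample(arrival_ps, clock_ps, old, new)
    shadow = sample(arrival_ps, clock_ps + window_ps, old, new)  # late sample
    return main != shadow


def proactive_warning(arrival_ps: float, clock_ps: float, guard_ps: float,
                      old: int = 0, new: int = 1) -> bool:
    """True when the data arrived inside the guard band before the clock edge."""
    main = sample(arrival_ps, clock_ps, old, new)
    shadow = sample(arrival_ps + guard_ps, clock_ps, old, new)  # delayed data
    return main != shadow


if __name__ == "__main__":
    for arrival in (850.0, 950.0, 1050.0):
        print(arrival,
              reactive_error(arrival, clock_ps=1000.0, window_ps=100.0),
              proactive_warning(arrival, clock_ps=1000.0, guard_ps=100.0))
```

In this model the proactive monitor raises its flag while the main flip-flop still holds correct data, which matches the near-failure detection goal of Task 3.3, whereas the reactive monitor flags the violation only after it has happened and therefore needs a recovery mechanism.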

RT-level monitor insertion flow: single-step RTL2RTL
Inputs: monitor (knob) circuit in RTL; SoC design in RTL; elaboration script; selected statistical critical paths.
Tool: automated insert, connect and route; verify after monitor insertion.
Tool properties:
- Single step before synthesis
- Builds on top of standard existing tools
- Non-invasive: no change in design interface
- Transparent to the designer
- Re-uses the original testbenches
- Automated extended testing
Followed by RTL-to-netlist synthesis and place & route.

Next steps
- Implement the flow.
- Apply it to an existing RTL ASIC design (imec internal).
- Benchmark the outcome: area overhead, power overhead, design time overhead.