Radiation Effects and Mitigation Strategies for modern FPGAs 10 th annual workshop for LHC and Future experiments Los Alamos National Laboratory, USA.

Slides:



Advertisements
Similar presentations
Survey of Detection, Diagnosis, and Fault Tolerance Methods in FPGAs
Advertisements

FPGA (Field Programmable Gate Array)
Commercial FPGAs: Altera Stratix Family Dr. Philip Brisk Department of Computer Science and Engineering University of California, Riverside CS 223.
Sana Rezgui 1, Jeffrey George 2, Gary Swift 3, Kevin Somervill 4, Carl Carmichael 1 and Gregory Allen 3, SEU Mitigation of a Soft Embedded Processor in.
Scrubbing Approaches for Kintex-7 FPGAs
Xilinx CPLDs and FPGAs Module F2-1. CPLDs and FPGAs XC9500 CPLD XC4000 FPGA Spartan FPGA Spartan II FPGA Virtex FPGA.
Multi-Bit Upsets in the Virtex Devices Heather Quinn, Paul Graham, Jim Krone, Michael Caffrey Los Alamos National Laboratory Gary Swift, Jeff George, Fayez.
Radiation Effects on FPGA and Mitigation Strategies Bin Gui Experimental High Energy Physics Group 1Journal Club4/26/2015.
HPEC 2012 Scrubbing Optimization via Availability Prediction (SOAP) for Reconfigurable Space Computing Quinn Martin Alan George.
1 Fault Tolerant FPGA Co-processing Toolkit Oral defense in partial fulfillment of the requirements for the degree of Master of Science 2006 Oral defense.
ICAP CONTROLLER FOR HIGH-RELIABLE INTERNAL SCRUBBING Quinn Martin Steven Fingulin.
Survey of Reconfigurable Logic Technologies
DC/DC Switching Power Converter with Radiation Hardened Digital Control Based on SRAM FPGAs F. Baronti 1, P.C. Adell 2, W.T. Holman 2, R.D. Schrimpf 2,
Maintaining Data Integrity in Programmable Logic in Atmospheric Environments through Error Detection Joel Seely Technical Marketing Manager Military &
Spartan II Features  Plentiful logic and memory resources –15K to 200K system gates (up to 5,292 logic cells) –Up to 57 Kb block RAM storage  Flexible.
Fault-Tolerant Softcore Processors Part I: Fault-Tolerant Instruction Memory Nathaniel Rollins Brigham Young University.
FPGA-Based System Design: Chapter 3 Copyright  2004 Prentice Hall PTR SRAM-based FPGA n SRAM-based LE –Registers in logic elements –LUT-based logic element.
Lecture 2: Field Programmable Gate Arrays I September 5, 2013 ECE 636 Reconfigurable Computing Lecture 2 Field Programmable Gate Arrays I.
The Spartan 3e FPGA. CS/EE 3710 The Spartan 3e FPGA  What’s inside the chip? How does it implement random logic? What other features can you use?  What.
Evolution of implementation technologies
Dynamic Power Consumption In Large FPGAs WILLIAM GARCIA, ANDREW MORTELLARO.
Juanjo Noguera Xilinx Research Labs Dublin, Ireland Ahmed Al-Wattar Irwin O. Irwin O. Kennedy Alcatel-Lucent Dublin, Ireland.
Benefits of Partial Reconfiguration Reducing the size of the FPGA device required to implement a given function, with consequent reductions in cost and.
Lecture 2: Field Programmable Gate Arrays September 13, 2004 ECE 697F Reconfigurable Computing Lecture 2 Field Programmable Gate Arrays.
FPGA and CADs Presented by Peng Du & Xiaojun Bao.
Power Reduction for FPGA using Multiple Vdd/Vth
A comprehensive method for the evaluation of the sensitivity to SEUs of FPGA-based applications A comprehensive method for the evaluation of the sensitivity.
EGRE 427 Advanced Digital Design Figures from Application-Specific Integrated Circuits, Michael John Sebastian Smith, Addison Wesley, 1997 Chapter 4 Programmable.
Section II Basic PLD Architecture. Section II Agenda  Basic PLD Architecture —XC9500 and XC4000 Hardware Architectures —Foundation and Alliance Series.
Open Discussion of Design Flow Today’s task: Design an ASIC that will drive a TV cell phone Exercise objective: Importance of codesign.
M. Adinolfi – University of Oxford – MAPMT Workshop – Imperial College 27 June Rad-Hard qualification for the LHCb RICH L0 electronics M. Adinolfi.
System Arch 2008 (Fire Tom Wada) /10/9 Field Programmable Gate Array.
Reconfiguration Based Fault-Tolerant Systems Design - Survey of Approaches Jan Balach, Jan Balach, Ondřej Novák FIT, CTU in Prague MEMICS 2010.
SiLab presentation on Reliable Computing Combinational Logic Soft Error Analysis and Protection Ali Ahmadi May 2008.
1 Moore’s Law in Microprocessors Pentium® proc P Year Transistors.
J. Christiansen, CERN - EP/MIC
® SPARTAN Series High Volume System Solution. ® Spartan/XL Estimated design size (system gates) 30K 5K180K XC4000XL/A XC4000XV Virtex S05/XL.
FPGA-Based System Design: Chapter 3 Copyright  2004 Prentice Hall PTR FPGA Fabric n Elements of an FPGA fabric –Logic element –Placement –Wiring –I/O.
FPGA-Based System Design: Chapter 3 Copyright  2004 Prentice Hall PTR Topics n FPGA fabric architecture concepts.
Programmable Logic Devices
Vendor Independent SEE Mitigation Solution For FPGAs Kamesh Ramani Pravin Bhandakkar Darren Zacher Melanie Berg (MEI – NASA Goddard)
P173/MAPLD 2005 Swift1 Upset Susceptibility and Design Mitigation of PowerPC405 Processors Embedded in Virtex II-Pro FPGAs.
Introduction to FPGA Created & Presented By Ali Masoudi For Advanced Digital Communication Lab (ADC-Lab) At Isfahan University Of technology (IUT) Department.
Field Programmable Gate Arrays (FPGAs) An Enabling Technology.
MAPLD 2005/202 Pratt1 Improving FPGA Design Robustness with Partial TMR Brian Pratt 1,2 Michael Caffrey, Paul Graham 2 Eric Johnson, Keith Morgan, Michael.
Swankoski MAPLD 2005 / B103 1 Dynamic High-Performance Multi-Mode Architectures for AES Encryption Eric Swankoski Naval Research Lab Vijay Narayanan Penn.
Synthesis Of Fault Tolerant Circuits For FSMs & RAMs Rajiv Garg Pradish Mathews Darren Zacher.
BR 1/991 Issues in FPGA Technologies Complexity of Logic Element –How many inputs/outputs for the logic element? –Does the basic logic element contain.
1 Leakage Power Analysis of a 90nm FPGA Authors: Tim Tuan (Xilinx), Bocheng Lai (UCLA) Presenter: Sang-Kyo Han (ECE, University of Maryland) Published.
LaRC MAPLD 2005 / A208 Ng 1 Radiation Tolerant Intelligent Memory Stack (RTIMS) Tak-kwong Ng, Jeffrey Herath Electronics Systems Branch Systems Engineering.
M.Mohajjel. Why? TTM (Time-to-market) Prototyping Reconfigurable and Custom Computing 2Digital System Design.
ESS | FPGA for Dummies | | Maurizio Donna FPGA for Dummies Basic FPGA architecture.
FPGA-Based System Design: Chapter 1 Copyright  2004 Prentice Hall PTR Moore’s Law n Gordon Moore: co-founder of Intel. n Predicted that number of transistors.
1 Advanced Digital Design Reconfigurable Logic by A. Steininger and M. Delvai Vienna University of Technology.
In-Place Decomposition for Robustness in FPGA Ju-Yueh Lee, Zhe Feng, and Lei He Electrical Engineering Dept., UCLA Presented by Ju-Yueh Lee Address comments.
FPGA-Based System Design: Chapter 3 Copyright  2004 Prentice Hall PTR Topics n FPGA fabric architecture concepts.
Xilinx V4 Single Event Effects (SEE) High-Speed Testing Melanie D. Berg/MEI – Principal Investigator Hak Kim, Mark Friendlich/MEI.
Chandrasekhar 1 MAPLD 2005/204 Reduced Triple Modular Redundancy for Tolerating SEUs in SRAM based FPGAs Vikram Chandrasekhar, Sk. Noor Mahammad, V. Muralidharan.
MAPLD 2005/213Kakarla & Katkoori Partial Evaluation Based Redundancy for SEU Mitigation in Combinational Circuits MAPLD 2005 Sujana Kakarla Srinivas Katkoori.
Issues in FPGA Technologies
Topics SRAM-based FPGA fabrics: Xilinx. Altera..
MAPLD 2005 Reduced Triple Modular Redundancy for Tolerating SEUs in SRAM based FPGAs Vikram Chandrasekhar, Sk. Noor Mahammad, V. Muralidharan Dr. V. Kamakoti.
SEU Mitigation Techniques for Virtex FPGAs in Space Applications
Maintaining Data Integrity in Programmable Logic in Atmospheric Environments through Error Detection Joel Seely Technical Marketing Manager Military &
We will be studying the architecture of XC3000.
The Xilinx Virtex Series FPGA
Evaluation of Power Costs in Triplicated FPGA Designs
Design of a ‘Single Event Effect’ Mitigation Technique for Reconfigurable Architectures SAJID BALOCH Prof. Dr. T. Arslan1,2 Dr.Adrian Stoica3.
Dynamic High-Performance Multi-Mode Architectures for AES Encryption
The Xilinx Virtex Series FPGA
Presentation transcript:

Radiation Effects and Mitigation Strategies for modern FPGAs 10 th annual workshop for LHC and Future experiments Los Alamos National Laboratory, USA

Introduction FPGA benefits in instrumentation design –High density logic –User configurable SRAM and antifuse technologies popular Reliability issues in radiation environments –Latchup –Single event upsets (SEUs) –Multiple bit upsets (MBUs)

Introduction Fault mitigation strategies –Scrubbing SRAM devices (Xilinx specific) Periodic readback and verification Some limits on readback –RAM contention –Half latch constant generation –Fault tolerant design techniques Triple module redundancy (TMR) –Entire design vs. persistent logic –Effectiveness in the face of MBUs difficult to quantify

FPGA Architecture (Xilinx Vertex) SRAM based devices –RAM bits control configuration Logic definition Signal routing Xilinx Vertex family –Configurable logic blocks (CLB) Split into two slices –Look-up tables (LUT)s define logic –Flip flops and carry generation –Routing matrix Pass transistor and buffered connections between CLBs Generous supply of global and local interconnect

FPGA Architecture (Xilinx Vertex) Vertex family (continued) –Block RAM 4K bit blocks Configurable in various widths –I/O blocks (IOB) Many I/O standards supported I/O registers CLB BLOCK RAM IOB 24 To/From Adjacen t CLB To/From CLB 6 position s away Switch boxes

FPGA Architecture (Xilinx Vertex) RAM utilization –Configuration dominates –Sparsely utilized Rarely more than 30% Even in designs where logic is fully utilized –Still dominates by an order of magnitude Virtex XCV1000 memory Utilization Memory Type # of bits% Configuration5,810, Block RAM131, CLB flip-flops26,1120.4

FPGA Architecture (Xilinx Vertex) Half-latch or weak keepers –Provide constants –Save logic resources –Used throughout device –Subject to SEU upset Can reset over time –Not observable Not defined by configuration bits –Reinitialized as part of device initialization Full reconfiguration required Configuration Bits Half-latch T1 T2 T3 A

Failure Modes Latchup –Parasitic bipolar transistors Created as a by product of CMOS fab techniques When activated, short power to ground –Can burn out the device –Epitaxial processing eliminates parasitics Eliminates latchup completely –Lower Vcc decreases vulnerability Bipolar transistors barely forward biased –Xilinx V2 (1,5 Vcc) is latchup immune to 160MeV

Failure Modes Single event upsets (SEUs) –Logic Content Usually manifested as a “glitch” Can be persistent in a feedback element –Counter or ALU –Logic Configuration Altered logic definition Always persistent –Usually results in undesirable operation –Routing Statistically most probable Always persistent –Least likely to result in logic failure

Failure Modes Single event functional interrupts –Power on reset or other global function Usually results in immediate functional interrupt –Device needs to be reconfigured –JTAG or other configuration interface Can inhibit or corrupt readback operations –Device reset required to restore test functionality Multiple bit upsets (MBUs) –Multiple configuration bits altered Can defeat fault tolerant design (TMR)

Mitigation Techniques Scrubbing –Readback and verification of configuration Sets limits on duration of upsets –Partial configuration Supported by Vertex family Allows fine grained reconfiguration Does not reset entire device –Allows user logic to continue to function –Complete reconfiguration Required after SEFI No user functionality for the duration of reconfiguration

Triple Module Redundancy Simple triple module redundancy Three copies of user logic Two of three voting on output –Counter example Simple TMR handles faults –Cannot resynchronize on the fly –Requires logic reset after repair –OK for stateless logic Counter Voter

Triple Module Redundancy Feedback TMR Three copies of user logic State feedback from voter –Counter example Handles faults Resynchronizes –Operational through repair Speed penalty due to feedback Desirable for state based logic Counter Voter

Triple Module Redundancy Feedback TMR can be SEU immune –Must TMR clocks as well –Scrubbing frequency provides upset rate tolerance –For low SEU rates, fault probability becomes SEFI rate –Xilinx has automated TMR tool in beta test Unfortunately, MBUs also occur –Can defeat TMR –Current TMR tools do not floorplan –Occur.1% on vertex, up to 2% on vertexII –Implications still under investigation

Triple Module Redundancy TMR costs –Triple logic utilization At least 3x logic utilization Need to floorplan for MBU resistance –Also for operation during repair No fully automated tool at present –Triple power consumption SRAM devices already inefficient –Slower operation Feedback TMR inherently slower Worse when floorplaning requirements taken into account

Other TMR Techniques Selective TMR –Identify persistent, or state based logic –TMR only these sections Other critical sections may also be TMRed –Application dependent –Subject of ongoing development and test 90% of full TMR performance (preliminary result) Much lower device utilization, power, etc Automated tool in development

Other Pitfalls (virtex) Half-Latches –Unobservable failure mode –Requires device reinitialization to reset –Design tools insert automatically No switch to stop software from inserting them –Los Alamos has developed removal tool Works on completed design –Can fail when design is heavily utilized –Too memory inefficient for largest virtexII devices

Other Pitfalls (virtex) Block RAM has shared output register –Readback can collide with user logic RAM cannot be verified by scrubbing User logic must handle RAM verification Distributed RAM has shared output as well –Similar collision problem Clock delay lock loop module –Status bits inaccurate during upset related failures

Alternatives Antifuse –Configuration based on physical shorts Invulnerable to upset Cannot be altered –Over 90% smaller upset cross section for comparable geometry –Signal routing more efficient Much lower power dissipation for similar device geometry –Lags SRAM in fabrication technology Usually one generation behind Latch up more of a problem than in SRAM devices

Alternatives Rad-hard Antifuse –All flip-flops TMRed in silicon Unmatched reliability High cost Unimpressive performance –Feedback TMR built in –Usually larger geometry –Not available in highest densities offered by antifuse –Some devices even have TMRed RAM Not ECC, but self correcting feedback TMR

When to Use Antifuse Where requirements are well known –Also stable over time Logic density does not exceed what is available –About 2M gates currently Where power consumption is critical –Also low noise Many mixed mode designs and analog/digital front ends

When to use SRAM In system reprogrammability required –Unstable requirements –Desire for generic hardware Cost of TMR and scrubbing tolerated –Schedule does not allow for proper system engineering –NRE for TMRed hardware small compared to total system NRE Fluid hardware/software functional tradeoff

Conclusion FPGAs can be used in elevated Radiation –Errors can be detected and corrected –Fault tolerant design can be utilized TMR can produce designs virtually immune to upset SRAM devices are the only choice for in system reprogrammability Antifuse is naturally more radiation tolerant –A natural choice if reprogrammability not required