HPEC 2012: Scrubbing Optimization via Availability Prediction (SOAP) for Reconfigurable Space Computing
Quinn Martin, Alan George

Slide 2: SOAP
- Background
  - FPGAs and Radiation in Space
  - Traditional Scrubbing Methods
- SOAP Approach
  - Mission Parameters
  - Markov Models
- Mission Case Studies
- Results
- Conclusions

Slide 3: FPGAs
Field-Programmable Gate Arrays (FPGAs)
- Implement custom digital logic hardware with a fabric of logic resources and interconnect
  - Lookup tables (LUTs) implement combinational logic
  - User flip-flops (FFs) implement sequential logic
  - Switch and connection boxes route among resources
- Many are reconfigurable
  - Reconfiguration allows the routing and logic state to be updated
  - Partial reconfiguration can update one partition of the device
  - E.g., Virtex from Xilinx and Stratix from Altera
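To make the LUT bullet concrete, here is a minimal Python sketch, assuming a 4-input LUT whose 16 configuration bits are stored as a packed truth table. The bit ordering and the names `lut_eval` and `lut_init_from` are illustrative assumptions, not a vendor's INIT format. The point for this talk is that a LUT's function lives entirely in configuration memory.

```python
# Hypothetical model of a 4-input LUT: its 16 configuration bits form a truth
# table, packed into an int so that bit i is the output for input pattern i.

def lut_eval(init: int, a: int, b: int, c: int, d: int) -> int:
    """Return the LUT output for inputs a, b, c, d (each 0 or 1)."""
    index = a | (b << 1) | (c << 2) | (d << 3)  # the inputs select one stored bit
    return (init >> index) & 1


def lut_init_from(func) -> int:
    """Build the 16-bit truth table for a Python function of four bits."""
    init = 0
    for index in range(16):
        a, b, c, d = ((index >> i) & 1 for i in range(4))
        if func(a, b, c, d):
            init |= 1 << index
    return init


AND4 = lut_init_from(lambda a, b, c, d: a & b & c & d)
XOR4 = lut_init_from(lambda a, b, c, d: a ^ b ^ c ^ d)
print(hex(AND4))  # only pattern 1111 (bit 15) is set
print(lut_eval(AND4, 1, 1, 1, 1), lut_eval(AND4, 1, 0, 1, 1))
```

Flipping any one of those 16 stored bits changes the LUT's function for exactly one input pattern, which is why configuration upsets alter the implemented logic.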

Slide 4: Reconfigurable FPGAs in Space
Advantages
- Very high performance/power ratio
- Reconfigurable (fully and partially)
  - Adaptable to changing environments and mission requirements
  - Design can be updated after launch
Disadvantages
- Applications are relatively difficult to design and test
- Configuration memory is vulnerable to radiation
  - Upsets can change the application's processing architecture in unpredictable ways
  - Upsets must be repaired via configuration scrubbing

Slide 5: Radiation Effects on FPGAs
Single-event effects (SEE)
- Single-event latchup (SEL): causes a current spike that may damage the device
- Single-event upset (SEU): changes the state of one or more bits, e.g., from logic '0' to '1'
  - Can be a single-bit upset (SBU) or a multi-bit upset (MBU)
- Single-event functional interrupt (SEFI): like an SEU, but affecting a critical device resource
Total ionizing dose (TID)
- Degrades performance over time, leading to eventual device failure
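The SBU/MBU distinction can be sketched by injecting bit flips into a toy configuration frame. This Python example assumes a frame of 41 random 32-bit words (the Virtex-5 frame size given on slide 6) and models an MBU as adjacent flipped bits within one word; real MBU patterns can span words and frames, and `inject_upset` is a hypothetical helper.

```python
import random

WORD_BITS = 32


def inject_upset(frame: list[int], word: int, bit: int, width: int = 1) -> list[int]:
    """Return a copy of `frame` with `width` adjacent bits of one word flipped.

    width == 1 models a single-bit upset (SBU); width > 1 models a simple
    multi-bit upset (MBU) confined to one word, a simplifying assumption.
    """
    if width < 1 or not 0 <= bit <= WORD_BITS - width:
        raise ValueError("upset must fit inside one 32-bit word")
    upset = list(frame)
    upset[word] ^= ((1 << width) - 1) << bit  # XOR flips exactly `width` bits
    return upset


def bits_changed(a: list[int], b: list[int]) -> int:
    """Number of configuration bits that differ between two frames."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


rng = random.Random(0)
golden = [rng.getrandbits(WORD_BITS) for _ in range(41)]  # one 41-word toy frame
sbu = inject_upset(golden, word=7, bit=12)
mbu = inject_upset(golden, word=7, bit=12, width=2)
print(bits_changed(golden, sbu), bits_changed(golden, mbu))  # 1 2
```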

Slide 6: Xilinx Virtex-5/Virtex-6 Configuration
Programmed via the SelectMAP interface
- Runtime configuration interface
- Also allows readback of the existing configuration
- 32 bits per configuration word
- Parallel bus width of 8, 16, or 32 bits
- Maximum clock frequency of 100 MHz
Configuration memory is arranged in frames
- A frame is the minimum unit of access to configuration memory
- Virtex-5: 41 words per frame
- Virtex-6: 81 words per frame
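The frame sizes above give a quick estimate of how long one frame takes to move over SelectMAP. This is a sketch assuming one bus-width transfer per CCLK cycle and ignoring command words, pad frames, and controller overhead; `frame_time_s` is a hypothetical name.

```python
WORD_BITS = 32  # bits per configuration word
WORDS_PER_FRAME = {"Virtex-5": 41, "Virtex-6": 81}


def frame_time_s(family: str, bus_width_bits: int, cclk_hz: float) -> float:
    """Time to transfer one frame over SelectMAP, ignoring all overhead."""
    if bus_width_bits not in (8, 16, 32):
        raise ValueError("SelectMAP bus width must be 8, 16, or 32 bits")
    if cclk_hz > 100e6:
        raise ValueError("CCLK above the 100 MHz maximum")
    frame_bits = WORDS_PER_FRAME[family] * WORD_BITS
    cycles = frame_bits / bus_width_bits  # one bus-width transfer per CCLK cycle
    return cycles / cclk_hz


for family in WORDS_PER_FRAME:
    fast = frame_time_s(family, 32, 100e6)
    slow = frame_time_s(family, 8, 33e6)
    print(f"{family}: {fast * 1e9:.0f} ns at x32/100 MHz, {slow * 1e6:.2f} us at x8/33 MHz")
```

Under those assumptions a Virtex-5 frame takes 410 ns at x32/100 MHz and about 4.97 µs at the x8/33 MHz setting used in the case study; a Virtex-6 frame takes 810 ns and about 9.82 µs.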

Slide 7: FPGA Scrubbing
FPGA configuration scrubbing
- Quickly repairs SEUs before they accumulate
  - Accumulated upsets defeat redundancy strategies (e.g., triple modular redundancy, TMR)
  - Fast repair can prevent SEUs from manifesting as errors
- Can be decomposed into basic scrubbing techniques
  - Correction techniques repair upsets
  - Detection techniques discover and locate upsets
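Why accumulation defeats TMR can be shown with a bitwise 2-of-3 majority voter. In this toy Python model each replica's output is an integer; real configuration upsets corrupt logic or routing inside a replica rather than flipping an output bit directly.

```python
def vote(a: int, b: int, c: int) -> int:
    """Bitwise 2-of-3 majority vote, as used in TMR."""
    return (a & b) | (a & c) | (b & c)


correct = 0b1011  # value every replica should produce

# One upset in one replica is masked by the voter.
print(vote(correct ^ 0b0100, correct, correct) == correct)  # True

# Upsets in two replicas affecting different bits are still masked bitwise.
print(vote(correct ^ 0b0100, correct ^ 0b0001, correct) == correct)  # True

# A second upset accumulating in another replica on the same bit outvotes
# the healthy copy, and the error reaches the output.
print(vote(correct ^ 0b0100, correct ^ 0b0100, correct) == correct)  # False
```

Scrubbing bounds how long the first upset can sit unrepaired, which shrinks the window in which a second one can line up with it.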

Slide 8: FPGA Scrubbing Techniques
Correction techniques
- Golden copy: repairs the configuration from a known "golden" copy (e.g., stored in rad-hard PROM)
- Frame ECC: repairs using a per-frame error syndrome code stored on-chip
Detection techniques
- Frame ECC: detects using a per-frame SECDED (single-error-correct, double-error-detect) Hamming code
- CRC-32: detects using a device-wide CRC-32
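The frame ECC bullets rely on SECDED behavior: correct one flipped bit, detect (but not correct) two. The Python sketch below is a generic extended Hamming code over a short data word, not the layout of Xilinx's on-chip frame ECC; `secded_encode` and `secded_decode` are hypothetical names.

```python
def secded_encode(data: list[int]) -> list[int]:
    """Extended Hamming (SECDED) encoding of a list of data bits.

    Index 0 holds the overall parity bit, power-of-two indices hold Hamming
    parity bits, and the remaining indices hold the data bits in order.
    """
    r = 0
    while (1 << r) < len(data) + r + 1:  # smallest r with 2**r >= k + r + 1
        r += 1
    n = len(data) + r
    code = [0] * (n + 1)
    bits = iter(data)
    for pos in range(1, n + 1):
        if pos & (pos - 1):  # not a power of two: data position
            code[pos] = next(bits)
    for i in range(r):
        p = 1 << i
        # Parity bit p makes every group of positions with bit p set even.
        code[p] = sum(code[pos] for pos in range(1, n + 1) if pos & p) % 2
    code[0] = sum(code) % 2  # overall parity makes the whole word even
    return code


def secded_decode(code: list[int]) -> tuple[str, list[int]]:
    """Return (status, word) where status is "ok", "corrected", or "detected"."""
    n = len(code) - 1
    syndrome = 0
    for pos in range(1, n + 1):
        if code[pos]:
            syndrome ^= pos  # XOR of set positions is 0 for a valid codeword
    odd = sum(code) % 2  # 1 if an odd number of bits flipped
    fixed = list(code)
    if syndrome == 0 and not odd:
        return "ok", fixed
    if odd and syndrome <= n:
        fixed[syndrome] ^= 1  # single error; syndrome 0 means the overall bit
        return "corrected", fixed
    return "detected", fixed  # even flips (e.g., two) or bad syndrome: uncorrectable


data = [1, 0, 1, 1, 0, 0, 1, 0]
word = secded_encode(data)  # 8 data bits -> 13-bit codeword
one_flip = list(word)
one_flip[5] ^= 1
two_flips = list(word)
two_flips[3] ^= 1
two_flips[6] ^= 1
print(secded_decode(word)[0])       # ok
print(secded_decode(one_flip)[0])   # corrected
print(secded_decode(two_flips)[0])  # detected
```

For 8 data bits the code uses 4 Hamming parity bits plus one overall parity bit. Three or more flips can be miscorrected, which is one way an ECC-based scrubber can write bad data back.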

Slide 9: FPGA Scrubbing Strategies
Scrubbing strategies
- Any combination of detection and correction techniques, with a controller to implement the algorithm
- Blind scrubbing: golden-copy correction only, with no detection
- Readback scrubbing: uses some detection technique
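A readback strategy differs from blind scrubbing only in that it checks before it writes. This sketch assumes a device-wide CRC-32 computed with Python's `zlib.crc32` over toy frames. Because a single CRC cannot locate the upset, the sketch falls back to rewriting every frame from the golden copy on a mismatch. The helper names are hypothetical.

```python
import zlib


def crc32_of(frames: list[bytes]) -> int:
    """Device-wide CRC-32 over all configuration frames, in frame order."""
    crc = 0
    for frame in frames:
        crc = zlib.crc32(frame, crc)  # running CRC across frames
    return crc


def blind_scrub(config: list[bytes], golden: list[bytes]) -> int:
    """Blind scrubbing: rewrite every frame from the golden copy, no detection.
    Returns the number of frames written."""
    config[:] = golden
    return len(golden)


def readback_scrub(config: list[bytes], golden: list[bytes], golden_crc: int) -> int:
    """Readback scrubbing: read back, check CRC-32, and repair only on mismatch."""
    if crc32_of(config) == golden_crc:
        return 0  # no upset detected, nothing written
    return blind_scrub(config, golden)  # CRC cannot locate the upset: rewrite all


golden = [bytes([i]) * 164 for i in range(4)]  # 4 toy frames of 41 words (164 bytes)
golden_crc = crc32_of(golden)
config = list(golden)
print(readback_scrub(config, golden, golden_crc))  # 0
config[2] = bytes([config[2][0] ^ 0x01]) + config[2][1:]  # inject an SBU
print(readback_scrub(config, golden, golden_crc))  # 4
print(config == golden)  # True
```

Blind scrubbing pays the full write cost every pass; the readback version writes nothing while the CRC matches, at the cost of a readback pass and CRC logic.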

Slide 10: FPGA Scrubbing Strategies
[figure]

Slide 11: SOAP Approach
Scrubbing Optimization via Availability Prediction (SOAP)
- Uses system availability as the primary metric for scrubbing efficacy
- Models scrubbing strategies as Markov diagrams
- Varies free parameters to find the optimal scrubbing system
  - Environmental parameters λ and α (orbits)
  - System parameters B and f_CCLK (memory and pin constraints)
  - Scrubbing parameters μ and γ (device configuration capability)
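The simplest Markov availability model has two states, up and down, with upset rate λ and repair rate μ. Solving its balance equation gives the steady-state availability sketched below. SOAP's actual models add states for detection and upset types, and the numeric rates here are hypothetical.

```python
def steady_state_availability(lam: float, mu: float) -> float:
    """Two-state Markov model: Up --lam--> Down --mu--> Up.

    Balance in steady state: pi_up * lam = pi_down * mu, with pi_up + pi_down = 1,
    which gives pi_up = mu / (lam + mu).
    """
    if lam < 0 or mu <= 0:
        raise ValueError("need lam >= 0 and mu > 0")
    return mu / (lam + mu)


lam = 1e-5  # hypothetical upset rate (per second)
for mean_repair_s in (10.0, 1.0, 0.1):  # hypothetical mean repair times
    mu = 1.0 / mean_repair_s
    print(f"repair in {mean_repair_s:>4} s -> availability {steady_state_availability(lam, mu):.7f}")
```

Sweeping μ like this is the one-parameter version of what SOAP does across λ, α, B, f_CCLK, μ, and γ.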

Slide 12: SOAP Approach
[figure]

Slide 13: Environmental Parameters
- λ: SEU rates for devices in various orbits of interest
  - Calculated per bit and per device using CREME96
- α: correction factors for single-bit and multi-bit upsets (SBU/MBU)
  - From beam tests on Virtex-5 devices
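Per-bit and per-device λ values are related by the number of configuration bits. This sketch assumes a hypothetical per-bit rate in upsets/bit/day and a hypothetical device size. `split_sbu_mbu` shows one possible way to apply an MBU fraction; it is an assumption for illustration, not the talk's definition of α.

```python
SECONDS_PER_DAY = 86_400.0


def device_upset_rate(lam_bit_per_day: float, config_bits: int) -> float:
    """Per-device SEU rate in upsets/s from a per-bit rate in upsets/bit/day."""
    return lam_bit_per_day * config_bits / SECONDS_PER_DAY


def split_sbu_mbu(lam_device: float, mbu_fraction: float) -> tuple[float, float]:
    """Hypothetical split of an upset-event rate into SBU and MBU event rates."""
    if not 0.0 <= mbu_fraction <= 1.0:
        raise ValueError("mbu_fraction must be in [0, 1]")
    return lam_device * (1.0 - mbu_fraction), lam_device * mbu_fraction


lam_dev = device_upset_rate(1e-10, 30_000_000)  # hypothetical orbit rate and device size
sbu_rate, mbu_rate = split_sbu_mbu(lam_dev, 0.05)  # hypothetical 5% MBU events
print(f"device: {lam_dev:.3e}/s, SBU: {sbu_rate:.3e}/s, MBU: {mbu_rate:.3e}/s")
```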

Slide 14: System Parameters
- Factors chosen by the system designer based on available memories, power budget, etc.
- Affect the scrubbing detection and correction rates (see the equations on the next slide)
- B: configuration bus width in bits
- f_CCLK: configuration clock speed in Hz

Slide 15: Scrubbing Parameters
- μ: repair rate for the scrubbing technique (per second)
- γ: detection rate for the scrubbing technique (per second)
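The exact expressions relating μ and γ to B and f_CCLK are not given here. As a first-order stand-in, this sketch assumes one full scrub (or readback) pass moves every configuration bit over the bus once, so the pass rate is B · f_CCLK divided by the configuration size. Under that assumption, μ for blind scrubbing and γ for readback detection both take this form. Overhead is ignored, and the 30 Mbit device size is hypothetical.

```python
def pass_rate_hz(config_bits: int, bus_width_bits: int, cclk_hz: float) -> float:
    """Full configuration passes per second, assuming one bus-width transfer per
    CCLK cycle and no command, pad-frame, or controller overhead."""
    return bus_width_bits * cclk_hz / config_bits


CONFIG_BITS = 30_000_000  # hypothetical device configuration size (bits)

for width, clk in ((8, 33e6), (32, 100e6)):
    rate = pass_rate_hz(CONFIG_BITS, width, clk)
    print(f"x{width} @ {clk / 1e6:.0f} MHz: {rate:.2f} passes/s ({1 / rate:.3f} s per pass)")
```

With an 8-bit bus at 33 MHz and 30 Mbit of configuration, a pass takes about 0.114 s (about 8.8 passes per second), roughly 12 times slower than a 32-bit bus at 100 MHz.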

Slide 16: Markov Algorithm Models
- Blind
  - No detection
- Built-in CRC-32
  - Basic detection
- Frame ECC with CRC-32
  - CRC acts as a "safety net" for upsets undetected by Frame ECC
- Frame ECC with CRC-32 and Essential Bits (EB)
  - Only scrubs errors that may be critical
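Each strategy's Markov diagram reduces to solving πQ = 0 with Σπ = 1 for the chain's generator matrix Q. The pure-Python solver below handles any small chain. The three-state readback chain in the usage (up to upset to repair to up, with rates λ, γ, μ) is a simplified assumption for illustration, not one of the talk's diagrams, and its rates are hypothetical.

```python
def ctmc_steady_state(states: list[str], rates: dict[tuple[str, str], float]) -> dict[str, float]:
    """Solve pi Q = 0 with sum(pi) = 1 for a small continuous-time Markov chain."""
    n = len(states)
    idx = {s: i for i, s in enumerate(states)}
    # Generator matrix: Q[i][j] = rate i->j, Q[i][i] = -(total outgoing rate).
    q = [[0.0] * n for _ in range(n)]
    for (src, dst), rate in rates.items():
        q[idx[src]][idx[dst]] += rate
        q[idx[src]][idx[src]] -= rate
    # pi Q = 0 is Q^T pi = 0; replace the last (redundant) row with sum(pi) = 1.
    a = [[q[j][i] for j in range(n)] for i in range(n)]
    b = [0.0] * n
    a[-1] = [1.0] * n
    b[-1] = 1.0
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(n):
            if r != col:
                factor = a[r][col] / a[col][col]
                for c in range(col, n):
                    a[r][c] -= factor * a[col][c]
                b[r] -= factor * b[col]
    return {s: b[idx[s]] / a[idx[s]][idx[s]] for s in states}


lam, gamma, mu = 1e-3, 2.0, 10.0  # hypothetical upset, detection, repair rates (1/s)
pi = ctmc_steady_state(
    ["up", "upset", "repair"],
    {("up", "upset"): lam, ("upset", "repair"): gamma, ("repair", "up"): mu},
)
print(f"availability (time in 'up'): {pi['up']:.6f}")
```

For a pure cycle like this one, π_up has the closed form (1/λ)/(1/λ + 1/γ + 1/μ), which is a handy check on the solver; branching diagrams with several upset types need the general solve.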

Slide 17: Blind Scrubbing
[figure]

Slide 18: Readback CRC-32 Scrubbing
[figure]

Slide 19: CRC-32 w/ Frame ECC Scrubbing
[figure]

Slide 20: Case Study
Applies the SOAP method to hypothetical systems with realistic parameters
- Devices
  - Xilinx Virtex-5
  - Xilinx Virtex-6
- Orbits
  - ISS low Earth orbit (LEO)
  - Molniya highly elliptical orbit (HEO)
- 8-bit SelectMAP bus at 33 MHz
  - Accounts for the access speed of a slow rad-hard PROM

Slide 21: Case Study
Two mission types
- Non-upset-critical (non-UC): the system continues to run upon detection and correction of an upset
  - Only critical upsets count as making the system "unavailable"
- Upset-critical (UC): the system requires a reset upon detection of an upset to ensure state integrity
  - Requires detection
  - All detected upsets render the system unavailable for the reset period
  - Benefits from an essential-bits mask used in detection
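The UC trade-off can be approximated with an alternating-renewal model: exponentially distributed time between triggering upsets and a fixed outage per trigger give A = 1/(1 + rate × outage). This sketch assumes hypothetical rates, outage times, and essential-bit fraction, and treats the EB mask simply as scaling the rate of upsets that trigger a reset.

```python
def availability(trigger_rate: float, outage_s: float) -> float:
    """Alternating renewal model: exponential time between triggering upsets
    (rate trigger_rate, per second) and a fixed outage of outage_s seconds."""
    return 1.0 / (1.0 + trigger_rate * outage_s)


lam = 1e-4            # hypothetical device upset rate (upsets per second)
essential = 0.10      # hypothetical fraction of configuration bits that are essential
outage = 0.001 + 1.0  # hypothetical repair time + reset time per trigger (s)

# UC mission without a mask: every detected upset forces a reset.
uc_no_mask = availability(lam, outage)
# UC mission with an essential-bits mask: only essential-bit upsets force a reset.
uc_eb_mask = availability(lam * essential, outage)

print(f"no mask: unavailability {1 - uc_no_mask:.3e}")
print(f"EB mask: unavailability {1 - uc_eb_mask:.3e}")
```

In this model unavailability falls roughly in proportion to the essential-bit fraction, which is consistent with the mask helping UC missions most.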

Slide 22: Non-UC Results
- Continuous blind scrubbing offers the highest availability
- CRC-32 offers similar availability with low implementation complexity
- Frame ECC suffers because TBUs can be falsely corrected, resulting in further errors

Slide 23: UC Results
[figure]

Slide 24: UC Results
[figure]

Slide 25: Results
- Frame ECC with CRC-32 and an Essential Bits mask offers the highest availability
  - Roughly one extra nine over the other methods
  - The Xilinx-provided soft error mitigation (SEM) core implements a similar strategy
- Other strategies remain competitive
  - Frame ECC/EB requires a complex state machine or software, plus additional memory
  - The model does not account for the vulnerability associated with internal scrubbing
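"One extra nine" is easiest to read on a log scale: the number of nines is -log10(1 - A), so one extra nine means ten times less unavailability. The availabilities in this Python sketch are hypothetical.

```python
import math


def nines(availability: float) -> float:
    """Number of nines in an availability figure: 0.999 -> 3, 0.9999 -> 4."""
    if not 0.0 <= availability < 1.0:
        raise ValueError("availability must be in [0, 1)")
    return -math.log10(1.0 - availability)


baseline = 0.9995   # hypothetical availability of a competing strategy
improved = 0.99995  # hypothetical availability with 10x less downtime
print(f"{nines(baseline):.2f} -> {nines(improved):.2f} nines")
```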

Slide 26: Conclusions
- Predicts availability for various FPGA scrubbing strategies on real and hypothetical platforms
- Uses analytical models rather than experimentation
  - Markov availability modeling with a parametric approach
  - Allows optimization of the scrubbing strategy during the design phase
- In the case study, blind scrubbing was best for non-UC missions and Frame ECC with an EB mask was best for UC missions