Radiation Tolerance Studies using Fault Injection on the Readout Control FPGA Design of the ALICE TPC Detector Johan Alme Bergen University College, Norway.

Radiation Tolerance Studies using Fault Injection on the Readout Control FPGA Design of the ALICE TPC Detector
Johan Alme, Bergen University College, Norway, on behalf of the ALICE-TPC collaboration
TWEPP 2012, Oxford, UK, September 2012

Overview
– System overview
– Fault injection
– Purpose of the fault injection test
– Test setup and test flow
– Results
– What did we learn?
TWEPP 2012, Johan Alme

ALICE Detector and TPC Detector
[Figure: the ALICE detector and the TPC detector]

ALICE TPC Readout Electronics
– 216 Readout Control Units (RCUs)
– One RCU consists of:
  – RCU motherboard
  – Detector Control System (DCS) board
  – Source Interface Unit (SIU) card
  – 2 branches of backplanes with up to 25 Front End Cards
– Commercial FPGAs are used
  – The radiation environment is a concern

RCU Main FPGA
– The RCU main FPGA sits in the data path
– Data readout is handled by the Readout Node
– Resource usage:
  – 92% of the CLBs
  – 75% of the BRAM blocks (the remaining 25% of the BRAM cannot be used because of Active Partial Reconfiguration)
– Result: TMR and other mitigation techniques are hardly applicable
[Figure labels: Readout Node, Control Node]
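The statement that TMR is hardly applicable follows from simple area arithmetic. The sketch below assumes that full TMR needs about three times the logic of the unmitigated design and ignores voter overhead. Both are simplifying assumptions and do not come from the slides. The function name `fits_with_tmr` is hypothetical.

```python
def fits_with_tmr(utilization: float, overhead_factor: float = 3.0) -> bool:
    """Return True if a design using `utilization` (0..1) of a resource
    would still fit after being replicated `overhead_factor` times.
    overhead_factor=3.0 models ideal full TMR without voters (assumption)."""
    return utilization * overhead_factor <= 1.0


# Utilization figures from the slide: 92% of CLBs, 75% of BRAM blocks.
for name, used in [("CLB", 0.92), ("BRAM", 0.75)]:
    print(f"{name}: {used:.0%} used -> {used * 3:.0%} with full TMR, "
          f"fits: {fits_with_tmr(used)}")
```

Under this model, a design can be triplicated only if it uses at most one third of a resource. BRAM is a harder case still: the 25% that the Readout Node leaves free is already unusable because of Active Partial Reconfiguration.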

Reconfiguration Network
– Consists of:
  – a radiation-tolerant flash memory
  – a radiation-tolerant flash-based FPGA
  – the DCS board, an embedded PC running Linux
– Corrects SEUs in the Xilinx Virtex-II Pro VP7
– Why it works: Active Partial Reconfiguration
– How it works:
  – The RCU support FPGA reads one frame at a time from the flash memory and from the Xilinx configuration memory
  – The frames are compared bit by bit
  – If a difference is found, the faulty frame is overwritten
– Keyword: flexibility
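The compare-and-overwrite loop described above can be modelled in software. This is a minimal sketch only, with these assumptions:
- A list of golden `bytes` frames stands in for the flash memory.
- A list of `bytearray` frames stands in for the Xilinx configuration memory.
- No real readback or reconfiguration interface is modelled.

The function `scrub_frames` is hypothetical. It also counts the flipped bits per frame, since statistics of this kind are what the support FPGA makes available during operation.

```python
from typing import List, Sequence, Tuple


def scrub_frames(golden: Sequence[bytes],
                 config: List[bytearray]) -> List[Tuple[int, int]]:
    """Compare every configuration frame against its golden copy.

    golden -- reference frames (stand-in for the flash memory contents)
    config -- live frames (stand-in for the FPGA configuration memory)
    Returns (frame_index, flipped_bits) for each frame that was rewritten.
    """
    corrected = []
    for idx, (ref, live) in enumerate(zip(golden, config)):
        # Bit-by-bit comparison: XOR exposes every differing bit.
        flipped = sum(bin(a ^ b).count("1") for a, b in zip(ref, live))
        if flipped:
            live[:] = ref  # overwrite the faulty frame with the golden copy
            corrected.append((idx, flipped))
    return corrected


golden = [bytes([0x00, 0xFF]), bytes([0xAA, 0x55])]
config = [bytearray(f) for f in golden]
config[1][0] ^= 0x01                  # simulate an SEU in frame 1
print(scrub_frames(golden, config))   # [(1, 1)]
```

In a frame-based scheme like this, only the frame that differs is rewritten. The rest of the design keeps running, which is what Active Partial Reconfiguration makes possible.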

What is Fault Injection?
– In the context of FPGA design, fault injection means injecting bitflips into the configuration memory of the FPGA
– Purpose: simulation of radiation-related effects
– Pros:
  – Low cost
  – Simple
  – Great tool for improving radiation tolerance during the development phase
– Cons:
  – The sensitivity of the technology cannot be measured
  – A systematic test covering all possible bit locations takes time
  – Not all elements in the FPGA can be tested
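A full sweep takes time because the number of locations grows as frames × bits per frame. The sketch below illustrates what "injecting a bitflip" and "covering all bit locations" mean. The frame count and frame size are placeholder parameters, not Virtex-II Pro figures. The helper names are hypothetical.

```python
from typing import Iterator, Tuple


def all_bit_locations(n_frames: int, bits_per_frame: int) -> Iterator[Tuple[int, int]]:
    """Yield every (frame, bit) location of a configuration memory exactly once."""
    for frame in range(n_frames):
        for bit in range(bits_per_frame):
            yield frame, bit


def flip_bit(frame: bytearray, bit: int) -> None:
    """Inject a single bitflip at position `bit` (bit 0 = LSB of byte 0)."""
    frame[bit // 8] ^= 1 << (bit % 8)


def coverage(tested: int, n_frames: int, bits_per_frame: int) -> float:
    """Fraction of all configuration bits that have been tested."""
    return tested / (n_frames * bits_per_frame)
```

A systematic campaign iterates over `all_bit_locations`. A partial campaign reports its `coverage` instead.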

Purpose of the Fault Injection Test
1. Estimate the radiation sensitivity of the RCU main FPGA design
2. Estimate the expected rate of functional failures in the RCU main FPGA as a function of integrated luminosity
Two categories of functional failures are distinguished:
– Reliability faults: system crashes that stop data taking for the complete ALICE detector
– Performance faults: errors in the data stream from the RCU that experienced the SEU
  – Loss of data
  – Corrupted data

Fault Injection – Test Setup
– Bitflips* are injected using Active Partial Reconfiguration
– The fault injection software runs on the DCS board, which makes the test setup identical to the real-life system
* K. Røed, J. Alme, D. Fehlker et al., "Fault injection as a test method for an FPGA in charge of data readout for a large tracking detector," Nuclear Instruments and Methods in Physics Research Section A: Accelerators, Spectrometers, Detectors and Associated Equipment, vol. 629, no. 1, pp. 260–268.
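The design keeps running during Active Partial Reconfiguration. One plausible way to inject a single bitflip is therefore a read-modify-write of one frame: read back the frame, flip one bit, and write the frame back. The sketch models this with a hypothetical in-memory `ConfigMemory` class. Its `read_frame`/`write_frame` methods stand in for frame readback and partial reconfiguration and are not a real Xilinx API. The slides do not state whether the actual software works this way or writes pre-computed faulty frames.

```python
class ConfigMemory:
    """Hypothetical in-memory model of a frame-addressed configuration memory."""

    def __init__(self, n_frames: int, frame_bytes: int) -> None:
        self._frames = [bytearray(frame_bytes) for _ in range(n_frames)]

    def read_frame(self, idx: int) -> bytes:
        """Stand-in for reading back one configuration frame."""
        return bytes(self._frames[idx])

    def write_frame(self, idx: int, data: bytes) -> None:
        """Stand-in for partially reconfiguring one frame."""
        if len(data) != len(self._frames[idx]):
            raise ValueError("frame size mismatch")
        self._frames[idx][:] = data


def inject_bitflip(mem: ConfigMemory, frame: int, bit: int) -> None:
    """Read-modify-write one frame so that exactly one configuration bit flips."""
    data = bytearray(mem.read_frame(frame))
    data[bit // 8] ^= 1 << (bit % 8)
    mem.write_frame(frame, bytes(data))
```

Applying the same injection twice restores the original frame. This also makes it easy to undo an injected fault in a test.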

Fault Injection – Test Procedure
– Data from real events recorded with the ALICE detector are uploaded to the Front End Cards
– One bitflip is injected per event
– All events and bitflips are logged
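The "one bitflip per event, log everything" procedure can be sketched as a loop. This sketch rests on several assumptions:
- Bit locations are drawn at random, possibly with repeats; the slides do not say how locations were chosen.
- `run_event` is a hypothetical callback that uploads one recorded event, reads it out and returns an outcome label.
- The flipped bit is restored before the next event. In the real setup this would be the job of the reconfiguration network.

```python
import random
from typing import Callable, List, NamedTuple


class LogEntry(NamedTuple):
    event: int
    frame: int
    bit: int
    outcome: str  # hypothetical labels, e.g. "ok", "reliability", "loss_of_data"


def run_campaign(config: List[bytearray],
                 run_event: Callable[[int], str],
                 n_events: int,
                 seed: int = 0) -> List[LogEntry]:
    """Inject one bitflip per event, read the event out and log everything."""
    rng = random.Random(seed)
    bits_per_frame = len(config[0]) * 8
    log: List[LogEntry] = []
    for event in range(n_events):
        frame = rng.randrange(len(config))
        bit = rng.randrange(bits_per_frame)
        mask = 1 << (bit % 8)
        config[frame][bit // 8] ^= mask   # inject exactly one bitflip
        outcome = run_event(event)        # upload, read out and classify one event
        log.append(LogEntry(event, frame, bit, outcome))
        config[frame][bit // 8] ^= mask   # assumed: restore the bit before the next event
    return log
```

Logging the location of every flip makes it possible to map the faulty outcomes back to configuration frames, as in the bitflip distribution plots.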

RCU Main FPGA Configuration File
[Figure: configuration file regions GCLK, IOB, IOI & LE; BRAM; BRAM interconnect; FPGA Editor view of the design]
– FPGA Editor gives an impression of the logic resources used in the RCU main FPGA
– The black square is the embedded hard-core CPU

Injected Bitflip Distribution
[Figure: two plots, "SEUs leading to reliability fault" and "SEUs leading to performance fault", with the BRAM IC frames marked in each]
– The plots show the bitflips that led to observable functional faults

Results (I)
Type of error      Total # faults   Faults/SEU [%]   SEUPI [SEUs/fault]
All                10341            5.02             ~19.9
Reliability        2210             1.07             ~93.5
Performance        8131             3.94             ~25.4
  Loss of data     2499             1.21             ~82.6
  Data corrupted   5632             2.73             ~36.6
– Number of bitflips injected: coverage 6.5%
– Xilinx conservative estimate: SEUPI = 10 SEUs/fault
  – The result is in the expected range
– SEUPI for reliability faults = ~93.5 SEUs/fault
  – Most functional faults are not critical for the operation of the ALICE detector!
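The SEUPI column is the inverse of the fault probability per SEU. The fault categories partition the totals. The short check below uses the counts and percentages from the table and confirms both relations. The absolute number of injected bitflips is not preserved in this transcript, so the sketch works from the percentages.

```python
# Counts and percentages from the Results (I) table.
results = {
    "All":            (10341, 5.02),
    "Reliability":    (2210, 1.07),
    "Performance":    (8131, 3.94),
    "Loss of data":   (2499, 1.21),
    "Data corrupted": (5632, 2.73),
}


def seupi(fault_percent: float) -> float:
    """SEUs per functional fault: the inverse of the fault probability per SEU."""
    return 100.0 / fault_percent


for name, (count, pct) in results.items():
    print(f"{name:15s} {count:6d} {pct:5.2f}%  SEUPI ~{seupi(pct):.1f}")

# The categories partition the faults: reliability + performance = all,
# and loss of data + data corrupted = performance.
assert results["Reliability"][0] + results["Performance"][0] == results["All"][0]
assert results["Loss of data"][0] + results["Data corrupted"][0] == results["Performance"][0]
```

Every SEUPI value is above the conservative Xilinx estimate of 10 SEUs/fault.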

Results (II)
– Same distribution for each fault type
– 53 SEUs give*:
  – >90% risk of any functional failure
  – >15% risk of a reliability fault
* Run 2010 (as of 9 Aug 2011): integrated luminosity nb⁻¹; number of SEUs = integrated luminosity [nb⁻¹] × 0.49 SEUs/nb⁻¹ = 52.8 SEUs
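The risk figures can be reproduced under a simple model in which each SEU independently causes a fault with probability p = 1/SEUPI. In that case, the risk of at least one fault in n SEUs is 1 − (1 − p)^n. This independence model is an assumption made for the sketch; the slides do not state which model was used. The 0.49 SEUs/nb⁻¹ rate is the measured value quoted on the slides, and the function names are hypothetical.

```python
SEU_PER_INV_NB = 0.49  # measured mean SEU rate in SEUs per nb^-1 (from the slides)


def risk(n_seus: int, seupi: float) -> float:
    """Probability of at least one fault in n_seus upsets, assuming each upset
    independently causes a fault with probability 1/seupi (assumed model)."""
    p = 1.0 / seupi
    return 1.0 - (1.0 - p) ** n_seus


def expected_seus(integrated_lumi_inv_nb: float) -> float:
    """Expected number of SEUs for a given integrated luminosity in nb^-1."""
    return integrated_lumi_inv_nb * SEU_PER_INV_NB


n = 53
print(f"any functional failure: {risk(n, 19.9):.1%}")
print(f"reliability fault:      {risk(n, 93.5):.1%}")
```

Under this model, both stated bounds hold for 53 SEUs: above 90% for any functional failure and above 15% for a reliability fault.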

Functional Faults vs Integrated Luminosity
– The RCU support FPGA makes it possible to gather statistics on the SEUs experienced during operation
– May–August 2011*:
  – Total number of SEUs (216 RCUs): 1552
  – Clear correlation between SEU count and integrated luminosity
  – Mean value: 0.49 SEUs/nb⁻¹
– Estimated number of functional failures in the period May–August:
  – 16.6 reliability faults
  – Error rate: 0.13 reliability faults/day
– Logged and analyzed faults in the period 1 May – 16 June:
  – ~5 faulty situations with a high probability that an SEU was the cause
  – Error rate: 0.11 reliability faults/day
* K. Røed, J. Alme, D. Fehlker et al., "First measurement of single event upsets in the readout control FPGA of the ALICE TPC detector," Journal of Instrumentation, vol. 6, no. 12, p. C12022, 2011.
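Two of the numbers above can be checked directly from the quoted inputs:
- The estimated 16.6 reliability faults is the observed SEU count divided by the reliability SEUPI of 93.5.
- The observed rate of 0.11 faults/day is ~5 faults over the 1 May – 16 June period.

The sketch assumes the days are counted as the plain date difference, excluding the end day. The 0.13 faults/day estimate depends on the exact length of the May–August period, which the slide does not state, so it is not reproduced. The function names are hypothetical.

```python
from datetime import date

SEUPI_RELIABILITY = 93.5  # SEUs per reliability fault, from fault injection


def expected_reliability_faults(n_seus: float, seupi: float = SEUPI_RELIABILITY) -> float:
    """Expected number of reliability faults for a given number of SEUs."""
    return n_seus / seupi


def rate_per_day(n_faults: float, start: date, end: date) -> float:
    """Average number of faults per day between two dates (end day excluded)."""
    return n_faults / (end - start).days


# May - August 2011: 1552 SEUs observed over all 216 RCUs.
print(f"{expected_reliability_faults(1552):.1f} expected reliability faults")
# 1 May - 16 June 2011: about 5 logged faults likely caused by SEUs.
print(f"{rate_per_day(5, date(2011, 5, 1), date(2011, 6, 16)):.2f} faults/day")
```

Because the expected fault count scales linearly with the SEU count, it also scales linearly with integrated luminosity under the measured 0.49 SEUs/nb⁻¹ correlation.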

What did we Learn? (present)
– With the current RCU main FPGA design:
  – Statistically, 93.5 SEUs in the RCU main FPGA are needed to crash the ALICE data taking
  – Expected number of reliability faults (crashes):
– The fault injection study is important for understanding and interpreting error situations that occur during daily operation of the ALICE detector
– Fault injection is an excellent tool for increasing the robustness of the design against radiation-related errors
  – Every change to the design requires repeating the fault injection

What did we Learn? (future)
– Upgrades of the electronics are currently being discussed
  – Focus: higher data rate, "more physics"
– The fault injection study shows that the functional failure rate can become a limiting factor if the integrated luminosity increases and the electronics are not upgraded
– Conclusion: the radiation environment must be taken into account when upgrading the electronics
– Could FPGAs be recommended for the discussed upgrade or for similar applications in the future?
  – Given that almost no mitigation techniques are implemented in the RCU because of area constraints, the answer is YES
  – Also, new FPGA series and tools are more mature with respect to radiation tolerance
– However:
  – When using FPGAs, it might be wise to move as much complex functionality as possible outside the radiation area

Thanks for Listening
Johan Alme (Bergen University College)
Ketil Røed (University of Oslo)
Dominik Fehlker, Kjetil Ullaland, Attiq Ur Rehman (University of Bergen)
Christian Lippmann, Magnus Mager (GSI Frankfurt)