Digital systems and FPGAs in Experimental Particle Physics Tullio Grassi (Univ. of Maryland, USA) Laboratori Nazionali di Legnaro, 13 April 2011
2 Summary
– Introduction to digital electronics
– Typical environment in high-energy physics experiments
– Use of digital systems and FPGAs
– Problems and solutions
3 Analog versus Digital electronics
Analog systems process time-varying signals that can take any value across a continuous range of voltage or current.
Digital systems process time-varying signals that can take only one of two* discrete values of voltage or current.
–The discrete values are called 1 and 0 (ON and OFF, HIGH and LOW, TRUE and FALSE, etc.). One such two-valued quantity is a bit.
4 Digital Logic: basic blocks
– Logic gates
– Registers (flip-flop, memory element): can store 1 bit [Figure: D flip-flop (D-FF) with inputs D and Clk and output Q]
– Wires: carry a digital signal
– Switches
5 Sequential Circuits
– Combinational circuit = combination of gates
– Sequential circuit = combination of gates and registers
[Figure: sequential circuit with inputs and outputs]
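The difference between the two circuit types can be illustrated with a minimal Python sketch. The names `xor_gate`, `DFlipFlop` and `ToggleCircuit` are illustrative assumptions, not taken from the slides: a gate is modeled as a pure function, a register as an object that stores one bit, and the sequential circuit combines the two.

```python
def xor_gate(a: int, b: int) -> int:
    """Combinational logic: the output depends only on the present inputs."""
    return (a ^ b) & 1


class DFlipFlop:
    """Register storing 1 bit: Q takes the value of D on each clock edge."""

    def __init__(self) -> None:
        self.q = 0

    def clock(self, d: int) -> int:
        self.q = d & 1
        return self.q


class ToggleCircuit:
    """Sequential circuit: an XOR gate feeding a D flip-flop, with Q fed back.

    The output depends on the stored state as well as on the input.
    """

    def __init__(self) -> None:
        self.ff = DFlipFlop()

    def clock(self, enable: int) -> int:
        return self.ff.clock(xor_gate(enable, self.ff.q))


if __name__ == "__main__":
    circuit = ToggleCircuit()
    print([circuit.clock(1) for _ in range(4)])
```

With `enable` held at 1 the stored bit toggles on every clock edge; with `enable` at 0 it keeps its value, which a purely combinational circuit cannot do.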
6 Options for building digital circuits
– Discrete components
– ASIC = Application Specific Integrated Circuit
– FPGA = Field Programmable Gate Array
7 FPGA
Field Programmable Gate Array:
–many simple Programmable Logic Blocks (sequential circuits)
–fabric of Programmable Interconnects (wires + switches)
–over time other features have been added (PLL, RAM, multipliers, CPU, etc.)
[Figure: FPGA architecture, showing programmable logic blocks, interconnect (wires) and programmable switches]
8 FPGA vs ASIC
FPGA advantages:
– Faster time-to-market: no layout, masks or other manufacturing steps are needed
– Lower constant/initial expenses (NRE)
– Simpler design cycle: software handles much of the routing, placement and timing
– More predictable project cycle: potential re-spins, wafer capacities, etc. are eliminated
– Reprogrammability: a new configuration can be uploaded
ASIC advantages:
– Full custom capability (including analog), since the device is manufactured to design specs
– Lower unit costs for very high volume designs
– Smaller form factor, since the device is manufactured to design specs
– Higher clock speeds
9 FPGA vs CPU
FPGA advantages:
– more flexible processing
– more flexible input/output
– parallel processing
– multi-clock timing operations
CPU advantages:
– programming a CPU is normally easier than programming an FPGA (it does not require an understanding of digital electronics)
– faster compilation
– easier code portability
– lower unit costs for any volume
Often FPGAs and CPUs are complementary: they co-exist in the same system and perform different tasks.
10 Types of FPGAs
Technology of the programming element | Vendors
SRAM (Static RAM) | Altera, Atmel, Xilinx, etc.
Anti-fuse: non-reversible (one-time-programmable) | Actel (now Microsemi), Aeroflex?
Flash (flash cells: floating gate) | Actel, Lattice
EPROM | Obsolete?
11 Environment of high-energy physics experiments
We will focus on the environment of the LHC accelerator at CERN. This is the accelerator producing the highest radiation levels as of today. Experiments can experience the following conditions:
– Radiation: up to ~200 kGy (= 20 Mrad) and n/cm^2 over a 10-year period
– Magnetic field: up to 5 Tesla
– Limited access: as rare as one access every ~5 years
– Limited space: limited cabling, limited cooling
– Limited material: this is to avoid modifications in the trajectory of the particles
12 Problems induced by radiation on digital systems
1. TID = Total Ionizing Dose. Measured in gray (Gy) or in rad: 1 Gy = 100 rad.
2. SEE = Single-Event Effects:
– SEL = Single-Event Latchup
– SEU = Single-Event Upset
– SET = Single-Event Transient
– SEFI = Single-Event Functional Interrupt
– SEGR = Single-Event Gate Rupture
– SEB = Single-Event Burnout (on high-voltage or high-power electronics; not covered in this presentation)
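The relation between the two dose units is 1 Gy = 100 rad, which can be checked with a few lines of Python. This is a minimal sketch; the function names are illustrative and not taken from the slides. It reproduces the 200 kGy = 20 Mrad figure quoted for the LHC environment.

```python
RAD_PER_GRAY = 100.0  # 1 Gy = 100 rad


def gray_to_rad(dose_gy: float) -> float:
    """Convert an absorbed dose from gray to rad."""
    return dose_gy * RAD_PER_GRAY


def rad_to_gray(dose_rad: float) -> float:
    """Convert an absorbed dose from rad to gray."""
    return dose_rad / RAD_PER_GRAY


if __name__ == "__main__":
    # 200 kGy expressed in Mrad
    print(gray_to_rad(200e3) / 1e6, "Mrad")
```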
13 TID
Most commercial electronic components fail between 20 Gy and 2000 Gy. We divide hardware systems into three categories:
– Hardware exposed to < 10 Gy:
  – normally ok to use commercial components
  – SEEs can be an issue
– Hardware exposed to > 2000 Gy:
  – need ASICs designed to be radiation-tolerant
  – need big money
  – there is a lot of literature
  – not covered in this presentation
– Hardware exposed to intermediate levels (between 10 Gy and 2000 Gy):
  – big money is rarely available
  – try to re-use parts meant for the two previous cases: qualification of commercial parts, use of rad-tol ASICs meant for other applications, some dirty tricks
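The three dose categories can be written as a small lookup. This is a minimal sketch, assuming the 10 Gy and 2000 Gy boundaries quoted on this slide and treating the boundary values themselves as intermediate (the slide does not say which side they belong to). The function name and the returned labels are hypothetical.

```python
LOW_DOSE_LIMIT_GY = 10.0     # below this: commercial components normally ok
HIGH_DOSE_LIMIT_GY = 2000.0  # above this: radiation-tolerant ASICs needed


def tid_category(dose_gy: float) -> str:
    """Return the hardware category for a given total ionizing dose."""
    if dose_gy < 0:
        raise ValueError("dose must be non-negative")
    if dose_gy < LOW_DOSE_LIMIT_GY:
        return "commercial components (watch for SEEs)"
    if dose_gy > HIGH_DOSE_LIMIT_GY:
        return "radiation-tolerant ASICs"
    return "intermediate: qualified commercial parts or re-used rad-tol ASICs"


if __name__ == "__main__":
    for dose in (0.4, 3.0, 50.0, 25e3):
        print(dose, "Gy ->", tid_category(dose))
```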
14 Designer’s point of view
Normally the goal of the designer is to develop a working system:
– minimize R&D
– reuse existing results
– reuse parts of other designs
– design results are shown within a collaboration but not necessarily published
As a consequence, on certain topics the literature is limited.
15 A case study: the CMS detector
16 Digital systems in the CMS detector
Sub-system | TID | Neutrons > 100 keV | Present system
Tracker [2] | 200 kGy | n/cm^2 | ASICs only
ECAL [3, 4] | 25 kGy | | ASICs only
HCAL [5] | 3 Gy | n/cm^2 | ASIC, Actel anti-fuse FPGA, commercial components
Muon detectors [6, 7] | 0.4 Gy | 5x10^10 n/cm^2 | All types, including SRAM FPGAs
Counting room | ~0 | | All, not radiation-qualified
17 Digital systems in the LHCb detector
L0 sub-system [9, 10] | TID | Neutrons (1 MeV equiv.) | Present system
Inner Tracker | 60 kGy | n/cm^2 | ASIC
Outer Tracker | 70 Gy | n/cm^2 | ASIC
Calorimeters | 50 Gy | n/cm^2 | Antifuse FPGA (Actel AX)
Muon detectors [11] | 80 Gy | n/cm^2 | ASIC, Actel ProASIC Plus, commercial components
Counting room | ~0 | | All (not radiation-qualified)
18 Commercial vs RadTol FPGAs
Some vendors (Microsemi, Xilinx, etc.) sell both commercial-grade FPGAs and RT-grade FPGAs. Typically RT-grade FPGAs are 10 times more expensive than commercial FPGAs.
In particle physics experiments it is very rare to use RT-grade parts. Motivations:
– a failure will not cause damage to people
– a failure will not cause damage to property outside the lab
– big number of channels compared to other applications
Sometimes RadTol parts are labeled and sold as commercial parts...
19 SEU prevention in FPGAs
SEU on flip-flops: TMR (Triple Modular Redundancy), fault-tolerant FSM.
There are two commercial synthesisers that can apply TMR to flip-flops automatically, in order to prevent SEUs:
1) Synplify: it is in use at CERN by a few groups; so far so good (circuits not yet deployed).
2) Precision RT
SEU on memory: encoding (e.g. error-correcting codes).
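The principle behind TMR (Triple Modular Redundancy) of a register can be sketched in Python: the value is stored three times and read back through a bitwise majority voter, so an upset in any single copy is outvoted. This is an illustrative software model under stated assumptions, not the output of Synplify or Precision RT; the class and method names are hypothetical.

```python
def majority(a: int, b: int, c: int) -> int:
    """Bitwise 2-out-of-3 majority vote."""
    return (a & b) | (a & c) | (b & c)


class TmrRegister:
    """Register stored as three redundant copies with a majority voter."""

    def __init__(self, width: int = 8) -> None:
        self.mask = (1 << width) - 1
        self.copies = [0, 0, 0]

    def load(self, value: int) -> None:
        self.copies = [value & self.mask] * 3

    def upset(self, copy_index: int, bit: int) -> None:
        """Model an SEU: flip one bit in one of the three copies."""
        self.copies[copy_index] ^= 1 << bit

    def read(self) -> int:
        return majority(*self.copies)

    def refresh(self) -> None:
        """Rewrite all copies with the voted value so errors do not accumulate."""
        self.load(self.read())


if __name__ == "__main__":
    reg = TmrRegister()
    reg.load(0xA5)
    reg.upset(copy_index=1, bit=3)
    print(hex(reg.read()))
```

The voter masks an error only while at most one copy is wrong for a given bit. If a second copy is hit in the same bit before the register is refreshed, the vote returns the wrong value, which is why the voted value is normally written back periodically.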
20 SET prevention in FPGAs
Prevention of SETs:
– TMR that includes combinatorial logic
– filtering with a guard-gate
Precision Rad-Tolerant can do TMR of combinatorial logic. Apparently this feature is not supported for Actel FPGAs (as of today).
Commercially available tools are evolving rapidly with respect to SEU and SET: keep watching.
In the Microelectronics Section of CERN, some designers have been using a custom script that automatically generates TMR on registers and combinatorial logic. The script supports only Verilog 1995 designs. The script is available to people registered on the CERN FPGARadTol web page.
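Guard-gate filtering can be illustrated with a discrete-time Python model. A guard gate updates its output only when the signal and a delayed copy of it agree, and holds the previous output otherwise, so a transient shorter than the delay never reaches the output. This is a behavioural sketch only: the function name, the sample-based time axis and the choice of delay are assumptions, not a description of any specific FPGA primitive.

```python
from typing import List


def guard_gate_filter(samples: List[int], delay: int) -> List[int]:
    """Filter a sampled digital signal with a guard gate.

    The output follows the input only when the input equals its copy
    delayed by `delay` samples; otherwise the last output is held.
    """
    if not samples:
        return []
    state = samples[0]
    out = []
    for i, current in enumerate(samples):
        # Before `delay` samples have elapsed, the delay line still holds
        # the initial value.
        delayed = samples[i - delay] if i >= delay else samples[0]
        if current == delayed:
            state = current
        out.append(state)
    return out


if __name__ == "__main__":
    glitch = [0, 0, 0, 1, 1, 0, 0, 0, 0, 0]  # 2-sample transient
    print(guard_gate_filter(glitch, delay=3))
    step = [0, 0, 1, 1, 1, 1, 1, 1]          # genuine transition
    print(guard_gate_filter(step, delay=3))
```

The cost of the filter is latency: a genuine transition appears at the output only after it has been stable for `delay` samples.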
21 SEL prevention
A SEL is a latch-up caused by a particle crossing the circuit. It can happen on the internal nodes (while normal latch-ups occur mostly on the I/Os due to ESD).
Most modern FPGAs are immune to SEL, but other commercial components can be affected by it. In that case, use an external SEL protection circuit.
22 SEL-protection circuits: a generic scheme
[Figure: block diagram with V_CC_IN, series resistor R (< 1 ohm), voltage threshold, monostable circuit (~1 s), PMOS switch, and V_CC supplying the SEL-sensitive circuit]
When a SEL-sensitive circuit develops a SEL, it draws more current. An external circuit can detect this situation and cycle the power.
Problem: the protection circuit can also be affected by radiation. But since it is a simpler circuit, it can be designed so that it is very unlikely to develop a problem.
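The behaviour of such a protection scheme can be sketched as a small Python model: the voltage drop across the series resistor is compared with a threshold, and when it is exceeded the switch is opened for a fixed time, as a monostable would do. The resistor value (0.5 ohm), threshold (0.25 V) and time step are illustrative assumptions chosen to be consistent with "R < 1 ohm" and "~1 s"; the class name is hypothetical and no real card is being described.

```python
from dataclasses import dataclass


@dataclass
class SelProtector:
    """Behavioural model: sense resistor + threshold + monostable + switch."""

    r_sense_ohm: float = 0.5      # assumed value, below 1 ohm
    v_threshold: float = 0.25     # assumed: trips above 0.5 A
    off_time_s: float = 1.0       # monostable pulse length (~1 s)
    off_remaining_s: float = 0.0  # time left before power is restored
    trips: int = 0

    def step(self, load_current_a: float, dt_s: float) -> bool:
        """Advance by dt_s; return True if the load is powered afterwards."""
        if self.off_remaining_s > 0.0:
            # Switch is open: wait for the monostable pulse to expire.
            self.off_remaining_s = max(0.0, self.off_remaining_s - dt_s)
            return self.off_remaining_s == 0.0
        if load_current_a * self.r_sense_ohm > self.v_threshold:
            # Over-current detected: open the switch and start the pulse.
            self.off_remaining_s = self.off_time_s
            self.trips += 1
            return False
        return True


if __name__ == "__main__":
    protector = SelProtector()
    for current in (0.2, 1.5, 0.0, 0.0, 0.2):
        print(current, "A ->", protector.step(current, dt_s=0.5))
```

Removing power for the full pulse length is what clears the latch-up; the load then restarts from a power-on state.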
23 Prevention of SEL [RepFIP card, LHC]
A few samples have been tested under radiation; in one case the component U9 (L4931CD25) failed at 2x10^11 p/cm^2 (corresponding to a TID of 100 Gy). A few dozen cards are presently installed.
[Figure: schematic of the protection circuit, which supplies various logic components. Labels: small R in series on the power line; threshold on V_R29 (current); mono-stable; PMOS switch]
24 Prevention of SEL [AMS experiment]
[Figure: schematic of the protection circuit, which supplies various logic components]
MAX891 “smart” switch: now obsolete. Newer parts need to be qualified.
25 SEFI
SEFI = Single Event Functional Interrupt
The definition can vary according to the authors, but it normally indicates an SEE which affects the entire device, for instance:
– power-on reset
– global reset, global tristate
– problems in the circuit that programs the rest of the FPGA
For an FPGA, it is difficult or impossible to mitigate SEFI within the FPGA design. SEFI could be mitigated at the system level.
26 SEEs on ASICs and FPGAs at LHC (LET < 40 MeV∙cm^2/mg [8])
Technology | SEL | SEU on configuration | SEU on user logic | SET on configuration | SET on user logic | SEFI
RadTol ASIC |
Anti-fuse FPGA | No [12] | No | Depends on design | ? | No, on RTAX, Aeroflex
Flash-based FPGA | No, on recent families (ProASIC3x) [8] | No | Depends on design | Yes | Depends on design | No for LET < 96 MeV∙cm^2/mg [14]
SRAM-based FPGA | No, on recent Xilinx families [12, 13]. Yes on Altera | Yes: scrubbing or reconfiguration required | Depends on design | Yes | Depends on design | ?
Strong interest in ProASIC3x for new designs.
ASICs can mitigate all SEEs if properly designed.
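The scrubbing mentioned for SRAM-based FPGAs can be illustrated with a simplified Python model: the configuration memory is periodically compared with a reference ("golden") copy and any corrupted frame is rewritten. This is a sketch under strong assumptions. Real devices use vendor-specific readback, CRC or ECC mechanisms that are not modeled here, and the function name and the frame representation as a list of integers are hypothetical.

```python
from typing import List


def scrub(config_frames: List[int], golden_frames: List[int]) -> List[int]:
    """Repair configuration frames in place; return the repaired indices."""
    if len(config_frames) != len(golden_frames):
        raise ValueError("configuration and golden copy differ in length")
    repaired = []
    for index, (frame, golden) in enumerate(zip(config_frames, golden_frames)):
        if frame != golden:
            config_frames[index] = golden  # rewrite the upset frame
            repaired.append(index)
    return repaired


if __name__ == "__main__":
    golden = [0xAA, 0x55, 0xFF]
    config = list(golden)
    config[1] ^= 0x04  # model an SEU in the configuration memory
    print("repaired frames:", scrub(config, golden))
```

The golden copy must itself be stored in a memory that is not affected by upsets, otherwise scrubbing would propagate the error.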
27 The following slides show some examples of problems peculiar to particle physics experiments that are not related to radiation.
28 Huge data acquisition systems
Millions of sensors are built and read out with different technologies.
Nonetheless sensor data must correspond to the same event: all channels must be re-synchronized with the main system clock (40 MHz) and realigned in time to the same event. The result is a huge “synchronous” system.
Mostly custom hardware and interfaces: troubleshooting can be a nightmare.
CMS sub-system | Number of sensor channels | Number of readout fibers
Pixel (Tracker) [15] | 66· | ·10^3
Si strips (Tracker) | 11· | ·10^3
ECAL + Preshower | 200·10^3 |
HCAL | 9·10^3 | 3·10^3
Muon detectors | 600·10^3 |
29 Time-to-Digital-Conversion (TDC)
A TDC is a system which converts the time of occurrence of an event (“hit”) into a digital representation. The measurement is relative to a reference instant.
The TDC itself is a purely digital system, which follows a discriminator (sometimes a sophisticated discriminator, e.g. CFD).
Normally the term TDC is used when the resolution is better than a clock cycle.
TDCs are common in high energy physics experiments, while they are very unusual in other fields. In other fields, it is normally sufficient to measure the time by means of a simple clock counter.
30 TDC implementations
Various approaches to obtain sub-clock resolution [16]:
– time stretching method
– time-to-amplitude method
– Vernier method
– tapped delay line method
– differential delay line method
– wave union method
TDCs tend to depend strongly on process, voltage and temperature variations.
Traditionally they are designed as ASICs, but in the last decade TDCs have also been implemented on FPGAs.
Recent results and performance:
Technology | Resolution
ASIC | 1 ps [17]
SRAM-based FPGA | 10 ps [18]
Flash-based FPGA | No literature. CMS and LHCb upgrades working on a ~1ps TDC (not optimized).
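As an illustration of the tapped delay line method, the Python sketch below combines a coarse clock counter with a fine measurement. The hit propagates along a chain of delay taps until the next clock edge; the number of taps it has crossed, read as a thermometer code, gives the time between the hit and that edge. The 25 ns period corresponds to the 40 MHz system clock mentioned earlier; the 50 ps tap delay, the assumption of identical taps and all function names are hypothetical. A real TDC needs calibration because tap delays vary with process, voltage and temperature.

```python
from typing import Sequence


def thermometer_to_count(taps: Sequence[int]) -> int:
    """Count the leading 1s of the tap register (taps crossed by the hit)."""
    count = 0
    for bit in taps:
        if bit != 1:
            break
        count += 1
    return count


def hit_time_ps(
    edge_index: int,
    taps: Sequence[int],
    clock_period_ps: float = 25_000.0,  # 40 MHz clock
    tap_delay_ps: float = 50.0,         # assumed uniform tap delay
) -> float:
    """Hit time relative to clock edge 0.

    edge_index is the coarse counter value at the clock edge that latched
    the taps; the fine time is subtracted because the hit arrived before
    that edge.
    """
    fine_ps = thermometer_to_count(taps) * tap_delay_ps
    return edge_index * clock_period_ps - fine_ps


if __name__ == "__main__":
    # Hit latched at clock edge 4 after crossing 3 taps.
    print(hit_time_ps(4, [1, 1, 1, 0, 0, 0, 0, 0]), "ps")
```

A plain clock counter would report only `edge_index`, that is a resolution of one clock period; the delay line refines it to one tap delay.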
31 References
[2] “Neutron damage studies of semiconductor lasers for the CMS tracker optical data links”, K. Gill et al.
[3] “The Electromagnetic Calorimeter of CMS, Summary and Status”, Werner Lustermann
[4] “The MGPA Electromagnetic Calorimeter Readout Chip for CMS”, M. Raymond et al.
[5] “Radiation Validation of CMS HCAL ESR”, internal presentation by Julie Whitmore
[6] “EMU DAQ MotherBoard”, internal presentation by Jianhui Gu
[7] “TECHNICAL PROPOSAL FOR THE UPGRADE OF THE CMS DETECTOR THROUGH 2020”, version of 2011/01/14
[8] Private communication, Federico Faccio
[9]
[10]
[11] “Muon Off Detector Electronics Board”, A. Balla, P. Ciambrone
[12] “Radiation Effects in FPGAs,” J. Wang, in 9th Workshop on Electronics for LHC Experiments, October
[13] “Radiation test results of the Virtex FPGA and ZBT SRAM for Space Based Reconfigurable Computing”, MAPL
[14] “RT ProASIC3: The Low-Power, Non-Volatile, Re-programmable and Radiation-Tolerant Flash-based FPGA”, Sana Rezgui, 2010 CMOS Emerging Technologies Workshop,
[15] “Commissioning and performance of the CMS pixel tracker with cosmic ray muons”, 2010 JINST 5 T
[16] “Review of methods for time interval measurements with picosecond resolution”, Józef Kalisz, Institute of Physics Publishing Metrologia, vol 41 pp17 ~
[17] Jen-Chien Hsu and Chauchin Su, “BIST for Measuring Clock Jitter of Charge-Pump Phase-Locked Loops,” IEEE Trans. Instrum. Meas., vol. 57, no. 2, Feb. 2008, pp. 276–285.
[18] “The 10-ps Wave Union TDC: Improving FPGA TDC Resolution beyond Its Cell Delay”, Jinyuan Wu et al.,