FTK: Error Detection and Monitoring
Calliope-Louisa Sotiropoulou, Aristotle University of Thessaloniki
FTK Workshop, Alexandroupoli, 10/03/2014



FTK Tools for Monitoring
Two-front approach:
- Tools for error detection: CRC, Invalid Input, Lost Sync, FIFO overflow, Truncated output
- Tools for performance monitoring: execution cycles measurement

FTK HW Tools for Monitoring
- Synchronization Logic Module
- Spy Buffers and their “freeze” logic
- Error Registers and Flags
- VME (and not only) Error Registers
- Time measurements (execution cycles)

Synchronization Logic Module (Sync Module)
[Figure labels: FPGA logic, Input FIFOs]
- An End Event (EE) word, which includes the event tag, separates hits belonging to different events.
- Data in different streams have to be synchronized to guarantee that hits belonging to the same event are processed together by the AM patterns.
- This also applies to boards with multiple parallel data streams.
- As soon as the first EE word arrives in a stream, that stream is stopped and waits until all streams have received an EE word.
- EE words must match; otherwise → Lost Sync.
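The waiting-and-comparing rule above can be sketched in software as follows. This is only a behavioural illustration, not the FPGA logic: the function name `check_sync` and the use of `None` for "no EE word yet" are assumptions made for the example.

```python
from typing import Optional, Sequence


def check_sync(ee_tags: Sequence[Optional[int]]) -> Optional[bool]:
    """Behavioural model of the EE comparison (hypothetical helper).

    ee_tags holds, per stream, the event tag of the EE word currently
    waiting at the head of the stream, or None if that stream has not
    reached its EE word yet.
    Returns None while still waiting, True if all tags match,
    and False on Lost Sync.
    """
    if any(tag is None for tag in ee_tags):
        return None  # at least one stream is still delivering hits
    return len(set(ee_tags)) == 1  # a single distinct tag means "in sync"
```

For example, `check_sync([5, None, 5])` keeps waiting, while `check_sync([5, 6, 5])` reports Lost Sync.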

Synchronization Logic Module (Sync Module)
Issues to be addressed/decided:
- Report only “Lost Sync”, or also which streams have “Lost Sync”?
  - First case: just compare all EE words and report if they don’t match.
  - Second case: compare with a reference EE word and report which streams don’t match, together with the reference EE word.
- How do we decide which is the “valid” (reference) EE word?
  - Internal counter: increment LVL1ID by one for each event. Must make sure even empty events are received, and take into account the LVL1ID reset after overflow (how?).
  - Majority vote: identify the EE word held by the majority of streams and consider it the “valid” one. More complex logic but error proof; not as fast? (to be looked into)
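The two candidate ways of choosing the reference EE word can be compared with a small model. Everything here is a sketch under assumptions: the function names are hypothetical, and the counter width is left as a parameter because the slides do not state it.

```python
from collections import Counter
from typing import Sequence, Tuple


def majority_reference(ee_tags: Sequence[int]) -> Tuple[int, int]:
    """Majority vote: return (reference tag, bitmask of streams that disagree)."""
    reference, _count = Counter(ee_tags).most_common(1)[0]
    lost_mask = 0
    for stream, tag in enumerate(ee_tags):
        if tag != reference:
            lost_mask |= 1 << stream  # one bit per out-of-sync stream
    return reference, lost_mask


def next_expected_tag(tag: int, width_bits: int) -> int:
    """Internal counter: next event tag, wrapping to 0 after overflow.

    width_bits is an assumed parameter; the real tag width is not given here.
    """
    return (tag + 1) % (1 << width_bits)
```

The majority vote needs no knowledge of the previous event, while the counter approach needs every event (including empty ones) to arrive so that the expected tag stays aligned.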

Sync Module – Old Version
- First version for the AMBFTK, written during the summer by Dimitris
- Currently under development for the AMBSLP (Panos's presentation is next)

Spy Buffers: what are they?
- Pointer: incremented each time a word is popped from the FIFO or sent to output. When it overflows it wraps around and an ‘overflow flag’ is set → circular memory.
- Two modes: SPY or FREEZE
- Copying data during run
- To be read by VME
[Figure labels: Hold, INPUT FIFOs as derandomizers]
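A minimal software model of the circular memory described above, assuming a simple word-at-a-time interface. The class and method names are hypothetical, and the readout order (oldest word first) is a choice made for the example.

```python
class SpyBuffer:
    """Behavioural sketch of a spy buffer (not the firmware implementation)."""

    def __init__(self, depth: int) -> None:
        self.memory = [0] * depth
        self.pointer = 0        # next write position
        self.overflow = False   # set once the pointer has wrapped around
        self.frozen = False     # False = SPY mode, True = FREEZE mode

    def write(self, word: int) -> None:
        """Copy one word that passed through the FIFO; ignored when frozen."""
        if self.frozen:
            return
        self.memory[self.pointer] = word
        self.pointer += 1
        if self.pointer == len(self.memory):
            self.pointer = 0
            self.overflow = True

    def freeze(self) -> None:
        self.frozen = True

    def read_out(self) -> list:
        """Return the stored words, oldest first."""
        if self.overflow:
            return self.memory[self.pointer:] + self.memory[:self.pointer]
        return self.memory[:self.pointer]
```

Once frozen, the buffer keeps the last `depth` words seen before the freeze, which is what makes it useful for inspecting the data around an error.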

[Figure: Spy Buffer location in the INPUT FPGAs (AMBFTK)]

Spy Buffers
Issues to be addressed/decided:
- Size: in the Input chip (“HIT”) we have 12 streams: 12 Input FIFOs + 12 VME FIFOs + 12 Spy Buffers → quite a lot of buffering. Replace the VME FIFOs with the Spy Buffers?
- Format: the word size differs between the input and the output of the AMB. If storing extra information is decided (e.g. timing information), an extra info word per data word could be added.
- Behavior: when should we freeze the Spy Buffers?

Spy Buffer Freeze
Two cases:
- One bit in the EE word received on an input stream means “freeze immediately after you have finished processing the current event”. The event to be monitored will be chosen by the DF, which will set the EE bit in all FTK streams.
- In case of a severe error: Freeze is sent immediately to the previous board together with the event tag, meaning “Freeze after processing the current event”. Alternatively, freeze as soon as the freeze is received.

FTK Monitoring Requirements (AMB examples)
- CRC error: for each link (12 streams) a ‘checksum’ could be monitored (not currently supported). Detected errors should be registered in a 12-bit word.
- FIFO overflow: each FIFO full flag should produce an error if set. Again a 12-bit word.
- Invalid input data: for example an invalid HIT from the ROD (?)
- Lost synchronization: event tags in different streams do not match. 12-bit word.
- Truncated output: too many roads in output. 16-bit word.
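The 12-bit error words above hold one flag per input stream. A minimal sketch of that packing, assuming stream 0 maps to bit 0 (the actual bit order is not specified on the slide) and using a hypothetical helper name:

```python
from typing import Iterable

N_STREAMS = 12  # number of input streams on the AMB, from the slide


def pack_stream_flags(flags: Iterable[bool]) -> int:
    """Pack per-stream error flags into one word, stream i -> bit i (assumed)."""
    word = 0
    for stream, flag in enumerate(flags):
        if flag:
            word |= 1 << stream
    return word & ((1 << N_STREAMS) - 1)  # keep only the 12 defined bits
```

The same helper could serve the CRC, FIFO overflow and Lost Sync registers, since each of them is described as a 12-bit word.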

FTK: Common Error Word Proposal
- In the FTK Monitoring Kick Off Meeting it was proposed to use a common error word format across the whole FTK system.
- This word should be propagated from one board to the next, being updated with every board’s error status.
- 32 bits are available for error identification in the EE word:
  - Use 16 bits for the error bits
  - Use 16 bits for the board encoding (identifying the board that caused the error)

FTK: Common Error Word Proposal
Error bits format:
- Use the 8 least significant bits (LSBs) for General Errors (common to all boards)
- Use the 8 most significant bits (MSBs) for Board Specific Errors
General (Common) Error bits (bits 0–7):
- CRC Error
- FIFO Overflow
- Loss of Sync
- Truncated Output
- Invalid Input Data
- Internal Overflow
- (Two bits still available)
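The 16 error bits could be modelled as below. The slide lists the six common errors but does not assign bit positions, so mapping them to bits 0 to 5 in listing order is an assumption, as are the names `CommonError` and `make_error_bits`.

```python
from enum import IntFlag


class CommonError(IntFlag):
    """General (common) error flags; bit positions are assumed, not specified."""
    CRC_ERROR = 1 << 0
    FIFO_OVERFLOW = 1 << 1
    LOSS_OF_SYNC = 1 << 2
    TRUNCATED_OUTPUT = 1 << 3
    INVALID_INPUT_DATA = 1 << 4
    INTERNAL_OVERFLOW = 1 << 5
    # bits 6 and 7 are still available


def make_error_bits(common: CommonError, board_specific: int) -> int:
    """Combine common flags (bits 0-7) with board-specific flags (bits 8-15)."""
    return ((board_specific & 0xFF) << 8) | (int(common) & 0xFF)
```

For instance, a CRC error plus a Loss of Sync with board-specific flags `0x81` gives `0x8105` under this assumed layout.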

FTK: Common Error Word Proposal
Board Specific Error Bits (bits 8–15)
Some of the Board Specific Error Flags could trigger a General (Common) Error Flag (e.g. FTK_IM Pixel Clustering Error Flags):
- Dropped Hits → Set Common Invalid Input Data
- Full LIFO → Set Common Internal Overflow
- Full Circ Buff → Set Common Internal Overflow
- FE order → Set Common Invalid Input Data
- Loss of Sync → Set Common Loss of Sync
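The "board-specific flag sets a common flag" rule for the FTK_IM example can be written as a lookup table. The pairing of flags comes from the slide; all bit positions and constant names below are assumptions for illustration.

```python
# Common flags (assumed positions within bits 0-7)
COMMON_LOSS_OF_SYNC = 1 << 2
COMMON_INVALID_INPUT = 1 << 4
COMMON_INTERNAL_OVERFLOW = 1 << 5

# FTK_IM Pixel Clustering flags (assumed positions within bits 8-15)
IM_DROPPED_HITS = 1 << 8
IM_FULL_LIFO = 1 << 9
IM_FULL_CIRC_BUFF = 1 << 10
IM_FE_ORDER = 1 << 11
IM_LOSS_OF_SYNC = 1 << 12

# Board-specific flag -> common flag it triggers (pairs taken from the slide)
PROMOTION = {
    IM_DROPPED_HITS: COMMON_INVALID_INPUT,
    IM_FULL_LIFO: COMMON_INTERNAL_OVERFLOW,
    IM_FULL_CIRC_BUFF: COMMON_INTERNAL_OVERFLOW,
    IM_FE_ORDER: COMMON_INVALID_INPUT,
    IM_LOSS_OF_SYNC: COMMON_LOSS_OF_SYNC,
}


def promote(error_bits: int) -> int:
    """Set every common flag implied by the board-specific flags present."""
    for specific, common in PROMOTION.items():
        if error_bits & specific:
            error_bits |= common
    return error_bits
```

Downstream boards can then react to the common bits alone, while the board-specific bits remain available for detailed diagnosis.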

FTK: Common Error Word Proposal
Board Encoding (Board Identification)
- Use 16 bits to identify the board causing the error.
- Use the 8 least significant bits to encode the boards, using one bit per board. Each board in the pipeline will use an “OR” to add its bit to the error word:
  - FTK_IM
  - DF (for errors received from another DF board)
  - DF (for errors caused by the current DF board)
  - AUX
  - AMB
  - pSSB
  - fSSB
  - FLIC
- Will the 2 types of SSB boards have separate monitoring or not?
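The OR-based board encoding can be illustrated as follows. The board list is from the slide; assigning bits 0 to 7 in listing order, and the key names `DF_UPSTREAM` and `DF_LOCAL` for the two DF cases, are assumptions.

```python
# One bit per board, in the order listed on the slide (assumed bit positions)
BOARD_BITS = {
    "FTK_IM": 0,
    "DF_UPSTREAM": 1,  # errors received from another DF board
    "DF_LOCAL": 2,     # errors caused by the current DF board
    "AUX": 3,
    "AMB": 4,
    "pSSB": 5,
    "fSSB": 6,
    "FLIC": 7,
}


def mark_board(identifier: int, board: str) -> int:
    """OR the board's bit into the identifier as the word passes through."""
    return identifier | (1 << BOARD_BITS[board])


# Example: both an AUX and an AMB report an error along the pipeline
identifier = 0
for board in ("AUX", "AMB"):
    identifier = mark_board(identifier, board)
```

Because the update is a plain OR, a board never clears bits set upstream, so the word accumulates the full list of boards that reported an error.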

FTK: Common Error Word Proposal
Tower Encoding (Tower Identification)
- Use the 8 most significant bits to identify the tower.
- Starting from the AUX board, the identifier will be “tower_number mod 8”, which returns a number from 0 up to 7. This will be transformed to a single bit:
  - 0 → bit 8
  - 1 → bit 9
  - 2 → bit 10
  - 3 → bit 11
  - etc.
- So each tower from an 8-tower group will have one specific bit.
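The tower-to-bit mapping above is a one-hot encoding of `tower_number mod 8` shifted into bits 8 to 15. A short sketch, with hypothetical function names:

```python
def tower_bit(tower_number: int) -> int:
    """One-hot tower bit: 0 -> bit 8, 1 -> bit 9, ..., 7 -> bit 15."""
    return 1 << (8 + tower_number % 8)


def mark_tower(identifier: int, tower_number: int) -> int:
    """OR the tower's bit into the 16-bit board/tower identifier."""
    return identifier | tower_bit(tower_number)
```

Note that towers 2 and 10 map to the same bit (bit 10), which is why the encoding is only unique within one 8-tower group.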

FTK: Common Error Word Proposal
Tower Encoding (Tower Identification)
- In the FLIC, 8 channels are received and each channel propagates the information of 8 towers.
- Therefore, if one bit is assigned to each tower and an “OR” is used to propagate the tower identifier in the error word to the FLIC, we will be able to trace how many and which towers produced an error by reading the tower identifier.

FTK: Common Error Word Proposal
[Figure: Error Bits and Board/Tower Identifier fields]

FTK: Common Error Word Proposal
Example: there was a Loss of Sync in an AMB and an AUX on Tower 2 and/or Tower 5. (This could mean there was Loss of Sync in an AUX and an AMB on both towers, or in an AUX on Tower 2 and an AMB on Tower 5, etc. Reading the spy buffers or the VME error registers will make this clear.)
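The example above can be reproduced with a small decoder. The field placement is an assumption made only for this sketch: error bits in the lower 16 bits of the 32-bit word, board/tower identifier in the upper 16 bits, and flags and boards numbered in the order they are listed on the slides.

```python
COMMON_ERRORS = ["CRC Error", "FIFO Overflow", "Loss of Sync",
                 "Truncated Output", "Invalid Input Data", "Internal Overflow"]
BOARDS = ["FTK_IM", "DF (upstream)", "DF (local)", "AUX",
          "AMB", "pSSB", "fSSB", "FLIC"]


def decode(word: int) -> dict:
    """Decode a 32-bit error word under the assumed layout described above."""
    error_bits = word & 0xFFFF           # assumed: lower half
    identifier = (word >> 16) & 0xFFFF   # assumed: upper half
    return {
        "errors": [name for bit, name in enumerate(COMMON_ERRORS)
                   if (error_bits >> bit) & 1],
        "boards": [name for bit, name in enumerate(BOARDS)
                   if (identifier >> bit) & 1],
        "towers_mod_8": [t for t in range(8)
                         if (identifier >> (8 + t)) & 1],
    }


# Loss of Sync, boards AUX and AMB, towers 2 and 5 (hypothetical encoding)
example = decode(0x24180004)
```

The decoded result lists both boards and both towers but cannot say which board failed on which tower, which matches the ambiguity noted in the example.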

32-bit Error Word
Example from FTK_IM: a Loss of Sync occurred in the Pixel Clustering Module of the FTK_IM.

Time measurements (Local vs. Global)
Local (per board):
- Implement a counter on each chip together with each Spy Buffer: same size, same operating frequency.
- Initialize on “Init_event”, start on first word received, stop on EE, store the value in the Spy Buffer → execution time per event per processing element, and execution time per event per board.
Extra local time measurements (if necessary):
- Time of empty FIFOs (initialize on “FIFO_empty”)
- Time of full/HOLD FIFOs (initialize on “FIFO_full”)
- Could be stored periodically in the Spy Buffer with a dedicated word
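The local counter behaviour (initialize, start on first word, stop on EE) can be modelled cycle by cycle. This is a sketch with hypothetical names; whether the cycle carrying the EE word is included in the count is a convention chosen here, not something the slide specifies.

```python
from typing import Optional


class EventTimer:
    """Cycle counter for one event (behavioural sketch, not firmware)."""

    def __init__(self) -> None:
        self.init_event()

    def init_event(self) -> None:
        """Corresponds to "Init_event": clear the counter and stop it."""
        self.count = 0
        self.running = False

    def clock(self, word_valid: bool, is_end_event: bool) -> Optional[int]:
        """Advance one clock cycle.

        Returns the execution cycles when the EE word is seen (the value
        that would be stored in the Spy Buffer), otherwise None.
        """
        if word_valid and not self.running:
            self.running = True          # start on first word received
        if self.running:
            self.count += 1              # EE cycle is counted (assumed)
        if self.running and word_valid and is_end_event:
            self.running = False         # stop on EE
            return self.count
        return None
```

Idle cycles before the first word are not counted, while idle cycles between the first word and the EE word are, so the result reflects the full time the event occupied the processing element.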

Time measurements (Local vs. Global)
Global (System) Processing Time:
- Do we want to have Global (System) Processing Time? Yes...
- Do we want to measure the data delay from the detector? Then we need the L1 accept, at least for the DF board.
- At this point we can only add up the processing time per event from each board (without interconnection delays etc.) just to get a very rough estimate.
[Figure labels: ROS, DF, AMBoard, 4 DOs, 4 TFs, 4 HWs, Final Fit-HW, FLIC]
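The rough estimate described above is just a sum of per-board times. Since each board counts in its own clock cycles, the sketch below converts to nanoseconds first; the function name and the clock frequencies used in the test values are made up for illustration.

```python
from typing import Dict


def rough_system_time_ns(board_cycles: Dict[str, int],
                         clock_mhz: Dict[str, float]) -> float:
    """Sum per-board execution times for one event, in nanoseconds.

    Ignores interconnection delays, so it is only a very rough estimate.
    board_cycles: execution cycles measured on each board.
    clock_mhz: counter clock of each board (hypothetical values).
    """
    return sum(cycles * 1000.0 / clock_mhz[board]
               for board, cycles in board_cycles.items())
```

With 200 cycles at 200 MHz on one board and 100 cycles at 100 MHz on another, the estimate is 2000 ns.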

Conclusions
- FTK Monitoring Tools are partly defined, and development has started for most of the boards.
- Questions that need addressing:
  - How extensive and strict should monitoring be (e.g. Loss of Sync reporting)?
  - Define an error word format → which error flags do we actually need? → Presented in the last FTK weekly meeting. The Error Word is currently being defined for the FTK_IM.
  - Define the time measurement requirements.