A 40 MHz Trigger-free Readout Architecture for the LHCb Experiment
Federico Alessio, CERN; Zbigniew Guzik, IPJ, Swierk, Poland; Richard Jacobsson, CERN
16th IEEE-NPSS Real Time Conference, May 2009, Beijing, China
Future of LHCb
- LHC instantaneous luminosity at the LHCb IP: tunable from 2×10³² to 5×10³² cm⁻²s⁻¹ (a factor 50 below the nominal LHC luminosity)
- Expected ∫L = 10 fb⁻¹ collected after 5 years of operation
- Probe/measure New Physics at the 10% level of sensitivity
- Measurements limited by statistics and by the detector itself, NOT by the LHC
S-LHC
- Collect ∫L = 100 fb⁻¹: a factor 10 increase in the data sample, in a reasonable time; probe New Physics down to the percent level
- Increase the LHCb luminosity by a factor 10, up to 2×10³³ cm⁻²s⁻¹, assuming the same bunch structure
- Effective interaction rate: 30 MHz at S-LHCb vs. 10 MHz at LHCb; bb-pair rate: 1 MHz at S-LHCb vs. 100 kHz at LHCb (see the arithmetic sketch below)
(See also "The LHCb Readout System and Real-Time Event Management", TDA2-1, Thursday, 8h30)
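The rate figures above follow from simple Poisson arithmetic. The sketch below reproduces them under assumed inputs: a visible inelastic cross-section of ~70 mb and ~30 MHz of colliding bunch pairs, both illustrative values rather than official LHCb parameters.

```python
import math

SIGMA_INEL = 7e-26    # assumed visible inelastic pp cross-section (~70 mb), in cm^2
F_CROSSING = 30e6     # assumed rate of colliding bunch pairs at LHCb (~30 MHz)

def pileup(lumi):
    """Mean number of interactions per crossing: mu = L * sigma / f."""
    return lumi * SIGMA_INEL / F_CROSSING

def visible_rate(lumi):
    """Rate of crossings with at least one interaction: f * (1 - exp(-mu))."""
    return F_CROSSING * (1.0 - math.exp(-pileup(lumi)))

for lumi in (2e32, 2e33):   # LHCb baseline vs. the S-LHCb target
    print(f"L = {lumi:.0e} cm^-2 s^-1: mu = {pileup(lumi):.2f}, "
          f"non-empty crossings ~ {visible_rate(lumi) / 1e6:.0f} MHz")
```

With these inputs the non-empty crossing rate comes out near 10 MHz at 2×10³² and saturates towards the full ~30 MHz crossing rate at 2×10³³, which is why the upgraded readout effectively has to run at the crossing rate.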
How to Survive?
- Take the original LHCb performance as a baseline; use new technologies for the sub-detectors to be replaced:
  - more radiation hard
  - reduced spill-over
  - improved granularity
- Continuous 40 MHz trigger-free readout architecture:
  - all detector data passed through the readout network
  - all detector data available to the High-Level Trigger (HLT)
In practice
- Pile-up problem: the current LHCb was not designed for multiple interactions per crossing, and the mean pile-up grows sharply between 2×10³² cm⁻²s⁻¹ and 20×10³² cm⁻²s⁻¹
- Higher radiation damage over time
- Spill-over not completely minimized
- First-level trigger limited for hadronic modes above 2×10³² cm⁻²s⁻¹: 25% efficiency vs. 75% for muonic modes
- Goal: increase the hadron trigger efficiency by at least a factor 2; at S-LHCb, a 1 MHz bb-pair rate; hence trigger-free
Upgraded LHCb Readout System: Rethink / Redraw / Adapt / Upgrade / Replace
[Diagram: FE electronics of the sub-detectors (VELO, ST, OT, RICH, ECal, HCal, Muon) feed Readout Boards into the readout network; event building towards the HLT farm CPUs and a monitoring farm; the Timing & Fast Control system distributes the LHC clock and MEP requests; the L0 trigger path is to be removed]
Architectures, Old vs. New
- No L0 trigger
- Point-to-point bidirectional high-speed optical links
- Same technology and protocol type for readout, TFC and throttle
- Reduced number of links to the FE by relaying ECS and TFC information via the ROB
[Diagrams: current architecture (TFC, ROB, FE, farm, L0 trigger, ECS, LHC clock; throttle, TFC data, TFC data bank and event-request paths) vs. new architecture (S-TFC, S-ROB, S-FE, S-FARM, S-ECS; combined TFC/DATA/ECS links, TFC/throttle links, S-LHC clock, TFC data bank and event requests)]
Overall Requirements
- Define the protocols: very likely the FE-ROB readout link and protocol will be based on the CERN GigaBitTransceiver (GBT)
- Define buffer sizes and a truncation scheme that cope with the worst possible scenario, i.e. large consecutive events that could overflow the memories (a toy model follows this list)
- Fully control the phase of the recovered clock at the FE; the clock phase must be reproducible each time the system is switched off/on
- The jitter of the reconstructed clock must be very small (<10 ps RMS)
- Control the readout rate, so as to allow a "staged" installation
- Partitioning is a crucial aspect for parallel stand-alone tests and sub-detector development (test-bench support)
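As an illustration of the buffer-sizing point, here is a minimal toy model of a FE derandomizer that truncates oversized events rather than overflowing. The depth, drain rate and truncation cap are placeholder values; the real scheme is still to be defined.

```python
import random

BUFFER_WORDS   = 256   # assumed derandomizer depth, in words
DRAIN_PER_TICK = 2     # assumed output-link drain rate, words per 25 ns tick
TRUNCATE_AT    = 16    # assumed per-event cap; larger events are truncated

def simulate(n_ticks=100_000, mean_words=1.8, seed=1):
    random.seed(seed)
    occupancy = peak = n_truncated = 0
    for _ in range(n_ticks):
        size = int(random.expovariate(1.0 / mean_words))  # toy event size
        if size > TRUNCATE_AT:                # truncate rather than overflow
            size, n_truncated = TRUNCATE_AT, n_truncated + 1
        occupancy = min(occupancy + size, BUFFER_WORDS)   # write (clipped)
        occupancy = max(occupancy - DRAIN_PER_TICK, 0)    # drain over the link
        peak = max(peak, occupancy)
    return peak, n_truncated

peak, cut = simulate()
print(f"peak occupancy {peak}/{BUFFER_WORDS} words, {cut} events truncated")
```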
Implications for the Front-End
- The S-FE records and transmits at 40 MHz, via optical links at 4.8 Gb/s (3.2 Gb/s of data)
- Zero suppression must be performed in the rad-hard FE: non-zero-suppressed (NZS) data would amount to ~16 TB/s!
- Asynchronous data transfer:
  - data has to be tagged with identifiers in the header
  - and realigned in the Readout Boards (see the sketch below)
[Figure: S-FE logical scheme, courtesy Ken Wyllie, LHCb]
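A minimal sketch of the tag-and-realign idea: each fragment carries an identifier (here a bunch-crossing ID) in its header, and the readout board regroups fragments per identifier regardless of arrival order. The field names and the four-link setup are illustrative, not the actual LHCb format.

```python
from collections import defaultdict

N_LINKS = 4  # assumed number of FE links feeding one readout board

def realign(fragments):
    """fragments: iterable of (link_id, bxid, payload) tuples in arrival order."""
    pending = defaultdict(dict)   # bxid -> {link_id: payload}
    complete = []
    for link, bxid, payload in fragments:
        pending[bxid][link] = payload
        if len(pending[bxid]) == N_LINKS:   # every link has delivered this BXID
            complete.append((bxid, pending.pop(bxid)))
    return complete

# Fragments arrive asynchronously, interleaved across links and crossings:
arrivals = [(0, 101, b"a"), (1, 102, b"b"), (1, 101, b"c"), (0, 102, b"d"),
            (2, 101, b"e"), (3, 101, b"f"), (2, 102, b"g"), (3, 102, b"h")]
for bxid, event in realign(arrivals):
    print(f"BXID {bxid} realigned from links {sorted(event)}")
```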
Timing and Fast Control (1)
- The readout system requires timing, synchronization, and various synchronous and asynchronous commands
- Receive, distribute and align the LHC clock and revolution frequency to the readout electronics
- Transmit synchronous reset commands and calibration sequences, and control the latency of the commands
- Back-pressure mechanism from the S-ROBs to handle network congestion:
  1. effectively, throttle the readout rate (a toy model of the decision logic follows below)
  2. possibly an "intelligent" throttle mechanism, capable of distinguishing interesting physics events locally in each S-ROB
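The per-S-ROB throttle decision could look like the following hysteresis sketch. The watermark values are assumptions; the "intelligent" variant would add a local keep/drop decision on top of this.

```python
HIGH_WM, LOW_WM = 0.80, 0.60   # assumed hysteresis watermarks (buffer fractions)

class RobThrottle:
    """Per-S-ROB throttle with hysteresis, so the line does not chatter."""
    def __init__(self, capacity_words):
        self.capacity = capacity_words
        self.asserted = False

    def update(self, fill_words):
        if not self.asserted and fill_words >= HIGH_WM * self.capacity:
            self.asserted = True           # assert throttle towards the S-TFC
        elif self.asserted and fill_words <= LOW_WM * self.capacity:
            self.asserted = False          # buffer drained enough: release
        return self.asserted

t = RobThrottle(1000)
for fill in (500, 820, 700, 590, 300):     # buffer rising, then draining
    print(fill, "->", "THROTTLE" if t.update(fill) else "ok")
```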
Timing and Fast Control (2)
- The farm has to grow in size, speed and bandwidth
- Destination control for the event packets, so that the S-ROBs know where (to which IP address) to send each event
- Request mechanism (EVENT REQUESTS), to let the destination controller in the TFC system know whether a node is available
- By definition, such a readout scheme is a "push protocol with a passive pull" (sketched below)
- A data bank containing the identity of the event and trigger-source information is added to each event (TFC DATA BANK)
- The new TFC system (the S-TFC prototype) has to be ready well before the rest of the electronics, to allow development and testing and to validate conformity with the overall specs
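A toy model of the "push protocol with a passive pull": farm nodes post EVENT REQUESTS as credits, and the destination controller assigns one destination per event and broadcasts it to the S-ROBs, which then push without any further handshake. The class and method names are hypothetical.

```python
from collections import deque

class DestinationController:
    def __init__(self):
        self.credits = deque()              # nodes with free capacity, FIFO

    def event_request(self, node_ip, n=1):  # the "passive pull" from a farm node
        self.credits.extend([node_ip] * n)

    def assign(self, event_id):
        """Return (event_id, destination) broadcast to all S-ROBs,
        or None when no node is available (back-pressure)."""
        return (event_id, self.credits.popleft()) if self.credits else None

dc = DestinationController()
dc.event_request("10.0.0.1", n=2)
dc.event_request("10.0.0.2")
for evt in range(4):
    print(evt, dc.assign(evt))   # the last assignment is None -> throttle
```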
S-TFC Architecture (i.e. the new s-heartbeat of the LHCb experiment)
S-TFC Protocols
S-TFC Master ↔ S-TFC Interface link
- TFC control fully synchronous at 2.4 Gb/s (up to 3.0 Gb/s)
- Reed-Solomon encoding used on the TFC links for maximum reliability (header ~16 bits) (cf. CERN GBT)
- Asynchronous data: the TFC info must carry the Event ID (see the packing sketch below)
- Throttle ("trigger") protocol: must be synchronous (currently asynchronous); the protocol will require alignment
- The TFC control protocol is incorporated in the link between S-FE and S-ROB (i.e. CERN GBT)
S-TFC Interface ↔ S-ROB
- Copper or backplane technology (in practice, 20 HI-CAT bidirectional links)
- TFC synchronous control protocol, the same as on the S-TFC Master ↔ S-TFC Interface link
- One GX transmitter with an external 20x fan-out (electrical PHYs)
- Throttle ("trigger") protocol using 20 SERDES interfaces at <1.6 Gb/s
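To make the "TFC info must carry the Event ID" point concrete, here is a hypothetical bit-level packing of a synchronous TFC word. The header pattern and field widths are invented for illustration and are not the real S-TFC or GBT frame layout.

```python
def pack_tfc_word(bxid, event_id, sync_reset=False, calib=False):
    word  = 0xA5 << 56                  # assumed 8-bit header / alignment pattern
    word |= (sync_reset & 1) << 55      # synchronous-reset command bit
    word |= (calib & 1) << 54           # calibration-trigger command bit
    word |= (bxid & 0xFFF) << 32        # 12-bit bunch-crossing ID (0..3563)
    word |= event_id & 0xFFFFFFFF       # 32-bit event identifier
    return word

def unpack_tfc_word(word):
    assert (word >> 56) == 0xA5, "lost alignment"
    return {"sync_reset": (word >> 55) & 1, "calib": (word >> 54) & 1,
            "bxid": (word >> 32) & 0xFFF, "event_id": word & 0xFFFFFFFF}

w = pack_tfc_word(bxid=1234, event_id=567890, calib=True)
print(hex(w), unpack_tfc_word(w))
```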
Overall Requirements (recap of the list above)
Reaching the Requirements: Phase Control
- Use commercial electronics: the clock is fully recovered from the data transmission (lock-to-data mode)
- The phase is adjusted via a register on the PLL
- Jitter is mostly due to the transmission over the fibres and could be minimized at the sending side
- Two options: 1. use the commercial or custom-made word-aligner output; 2. scan the phase of the clock within the "eye diagram" (sketched below)
- Feasibility and fine precision are still under investigation
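Option 2 could be automated along the following lines: step the PLL phase register across its range, count decoding errors at each setting, and centre the recovered clock in the widest error-free window. The set_phase/count_errors interface is hypothetical.

```python
def scan_eye(set_phase, count_errors, n_steps=32):
    """Return the centre of the longest run of error-free phase steps."""
    errors = []
    for step in range(n_steps):
        set_phase(step)                  # write the PLL phase-select register
        errors.append(count_errors())    # e.g. 8b/10b or word-alignment errors
    best_start = best_len = cur_start = cur_len = 0
    for step, e in enumerate(errors):
        if e == 0:
            if cur_len == 0:
                cur_start = step
            cur_len += 1
            if cur_len > best_len:
                best_start, best_len = cur_start, cur_len
        else:
            cur_len = 0
    return best_start + best_len // 2    # middle of the eye opening

# Toy check with a simulated eye that is open between steps 10 and 22:
state = {"phase": 0}
set_phase = lambda s: state.__setitem__("phase", s)
count_errors = lambda: 0 if 10 <= state["phase"] <= 22 else 5
print(scan_eye(set_phase, count_errors))   # -> 16, the centre of the eye
```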
Simulation
- Full simulation framework to study buffer occupancies, memory sizes, latency, configuration and the logical blocks (a toy fragment of such a model follows)
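A toy fragment of the kind of discrete-time model such a framework builds, assuming placeholder fragment sizes and drain rates: a 25 ns clock pushes fragments through a FE buffer and a readout-board buffer while the peak occupancies are recorded.

```python
import random

def run(n_ticks=50_000, mean_frag=3.0, fe_drain=4, rob_drain=4, seed=7):
    random.seed(seed)
    fe = rob = peak_fe = peak_rob = 0
    for _ in range(n_ticks):
        fe += int(random.expovariate(1.0 / mean_frag)) + 1  # new FE fragment
        moved = min(fe, fe_drain)          # FE -> ROB over the readout link
        fe, rob = fe - moved, rob + moved
        rob = max(rob - rob_drain, 0)      # ROB -> event builder / network
        peak_fe, peak_rob = max(peak_fe, fe), max(peak_rob, rob)
    return peak_fe, peak_rob

print("peak FE / ROB occupancy (words):", run())
```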
Summary
- A new approach towards a 40 MHz trigger-free architecture
- Evaluated the old system and carried the experience over
- Point-to-point optical link technology used for the entire readout system
- Maximum flexibility reached with FPGA-based boards: no need for complex routing, no need for a large number of "different" boards; GX transceivers as IP cores from Altera
- No first-level trigger, and no direct link from the TFC to the FE
- Essential to fully control the phase and latency of the clock and of the TFC info
- Validation with a prototype is under way; the TFC system prototypes must be ready before any others
- Developing a full simulation framework
Thanks for your attention
Backup
Intro: Giving Out Numbers
From today (2009) to the near future (2013):
- Clock rate of 40 MHz, effective event rate of 10 MHz
- Expect to collect 10 fb⁻¹, which allows a wide range of analyses with high sensitivity to new physics; spectacular progress foreseen in heavy-flavour physics
- Readout Supervisor based on 4 FPGAs, fully programmable (~25k logic elements in total) and customizable (40/80 MHz clock speed, output based on 1 Gbit/s cards)
- Readout network based on optical links (200 MB/s, ~400 links)
- ~16,000 CPU cores foreseen for the LHCb online farm; ~1,000 3 GHz Intel Harpertown quad-cores (~4,500 individual cores) at present
- Storage system of >50 TB at 400-500 MB/s
- Uninterrupted readout of data at 1 MHz, with an effective reduction by a factor 10 in selecting events; event size of 3.5×10⁴ bytes, "dumped" into the Grid
- Full (and 100% reliable) readout control system in place (ETM PVSS II), able to start, configure and control some ~20,000 elements across the farm, the FE electronics and the ROBs
S-TFC Master, Specs
- Board with one big central FPGA (Altera Stratix IV GX, or alternatively Stratix II GX for R&D)
- Instantiate a set of TFC Master cores to guarantee partitioning control for the sub-detectors (see the sketch below)
- The TFC switch is a programmable patch fabric, i.e. a layer in the FPGA: no need for complex routing, no need for "discrete" electronics
- Shared functionality between instantiations (fewer logic elements)
- More I/O interfaces based on bidirectional transceivers, depending on the number of S-ROB crates
- No direct links to the FE
- Common server that talks directly to each instantiation: TCP/IP server in NIOS II
- Flexibility to implement (and modify) any protocol; GX transceivers as IP cores from Altera
- Bunch-structure (predicted/measured) rate control
- State machines for sequencing resets and calibrations
- Information-exchange interface with the LHC
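A sketch of the multiple-instantiation idea: several TFC Master cores live in one FPGA, each owning a disjoint partition of sub-detectors, so groups can run stand-alone tests in parallel. The command API below is purely illustrative.

```python
class TFCMasterInstance:
    def __init__(self, name):
        self.name = name
        self.partition = set()   # sub-detectors owned by this core

    def send(self, command):
        # In the real system this goes out over the GX transceivers; here we
        # just show the fan-out of one command to the owned sub-detectors.
        return {det: command for det in sorted(self.partition)}

masters = [TFCMasterInstance(f"master{i}") for i in range(4)]
masters[0].partition = {"VELO", "OT", "RICH"}   # physics partition (example)
masters[1].partition = {"MUON"}                 # parallel stand-alone muon test
print(masters[0].send("SYNC_RESET"))
print(masters[1].send("CALIB_PULSE"))
```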
S-TFC Interface, Specs
- Board with an FPGA entirely devoted to fanning out TFC information and fanning in throttle information
- Controlled clock recovery
- Shared network for (intelligent) throttling and TFC distribution; all links bidirectional
- 1 optical Gb/s link to the S-TFC Master
- 1 link per S-ROB, 20 max per board (a full crate); the S-ROB links could use backplane technology (e.g. xTCA) or HI-CAT copper
- Flexible protocol, matching the flexibility of the S-TFC Master
- We will provide the TFC transceiver block for the S-ROBs' FPGA, to bridge data through the readout link from the S-ROB to the S-FE
- For stand-alone test benches, the S-TFC Interface would do the work of a single TFC Master instantiation (see the fan-in sketch below)
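The throttle fan-in on the Interface could reduce the 20 per-link throttle inputs to the single line reported to the S-TFC Master, masked by the active partition. The bit-mask scheme below is an assumption for illustration.

```python
def fanin_throttle(throttle_bits, partition_mask):
    """Both arguments are 20-bit ints, one bit per S-ROB link (bit i = link i).
    Returns True when any link inside the active partition asserts throttle."""
    return (throttle_bits & partition_mask) != 0

# Example: the S-ROBs on links 3 and 17 assert throttle.
bits = (1 << 3) | (1 << 17)
print(fanin_throttle(bits, 1 << 3))        # True: link 3 is in this partition
print(fanin_throttle(bits, 0b0111_0000))   # False: only links 4-6 selected
```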