SuperB FCTS/DAQ Protocol Steffen Luitz and Gregory Dubois-Felsmann SuperB Computing Workshop, Frascati 12/17/08.


From Gregory’s Talk at the Elba Meeting
We are proposing this model - that is, a model with time-addressable pre-L1Accept buffering in the FE electronics.
Next steps for us (core DAQ people - please consider joining us!):
  ◦ Define the time addressing protocol and buffer depth.
  ◦ Specify a detailed model for overlapping-event readout.
  ◦ Write this up.
  ◦ Do modeling to estimate the ring buffer depth needed in Model 2.
Next steps for you:
  ◦ Consider the consequences for FEE design and estimate the marginal cost of the additional requirements of variable latency (Model 2).
  ◦ If those costs look substantial, someone then needs to evaluate the additional complexity and cost of the multiple-path command-and-clock distribution and resynchronization scheme needed in the FCTS in Model 1.

Status Update
Not much has happened since then
  ◦ Gregory and Steffen have new assignments at SLAC
  ◦ No time to do SuperB homework
As a result, this talk will look familiar to those who attended the Elba meeting
Significant work towards the TDR is still needed
  ◦ Can’t be done as a spare-time effort

This Talk
… is very similar to what was presented at the Elba meeting
… is intended to introduce newcomers to the challenges of a kHz virtually dead-time-free DAQ system
… and to (re-)stimulate a vibrant discussion about SuperB DAQ and FCTS design

Basic Assumptions
Previously established
  ◦ Physics goals require an open trigger, e.g., similar to BaBar’s.
  ◦ Level 1 Accept rate of ~100kTps (triggers per second)
    ▪ Unless a highly effective Level 1 Bhabha veto is developed
  ◦ Expecting 50kTps Bhabhas...
    ▪ Stretch goal: be able to handle 150kTps
  ◦ Level 3 Accept rate of ~20kTps
  ◦ Expect “Level 4” offline filter beyond that

More Assumptions and Issues
Level 1 trigger provided with maximum latency similar to BaBar’s: ~12 microseconds
  ◦ There has been no study at all of whether the latency could be reduced significantly (i.e., by at least a factor of two). It’s plausible that this could be done by using more capable components and parallelizing more, but this has not been investigated by experts.
  ◦ Other experiments have been able to do better, but SuperB does not have quite the same prompt and spectacular signals that some do.
DCH and EMC must provide “side channels” of data to the trigger; perhaps the SVT will also contribute
  ◦ Their ability to deliver data quickly is essential to limiting trigger latency
Trigger may need higher-resolution data than in BaBar (e.g., to allow for more precise 3D tracking)
    ▪ This might push latency longer (more complex computations needed) but lower the Level 1 rate in compensation.

Start from BaBar
BaBar-Note-281 (v1.1) described the protocol for communication between the ROMs and the FEEs
The Conceptual Design document for the FCTS describes the extension of this protocol through the FCTS system.
  ◦ More detail is available in the “FCTS Architecture” note
For now we are considering only the event data protocol, corresponding to the “run-time commands” in the BaBar protocol (viz. Section 3.1 of BaBar-Note-281)
Also keep in mind actual BaBar DAQ performance:
  ◦ Design requirement: Overall 2kTps, Front-Ends: 10kTps
  ◦ Achieved performance after multiple upgrades: 7kTps

BaBar Event Data Protocol
In normal triggered data acquisition running:
1. Signal from Level 1 trigger (GLT lines) goes to FCGM
2. If system is not “busy” (within 2.x us of previous command) or “full” (no FE buffers available) or “inhibited” (external signal), send L1Accept command through FCTS to DAQ crates & ROMs
3. ROM forwards L1Accept command to FEEs
4. FEEs capture a previously specified window of data into a buffer (in triggered, i.e., non-EMC, systems)
5. Some variable time later, when resources are available, ROM sends ReadEvent command to FEEs, which send back the earliest available buffer (not addressable, state modeled by ROM)
  ◦ This relies on a trigger delivery latency that is confined to a fixed jitter interval, with readout windows large enough to cover the uncertainty.
  ◦ (This is about 1us in BaBar, though most trigger lines have much better resolution, ~ ns.)
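The gating decision in step 2 can be sketched as below. This is only an illustration of the logic described on the slide, not BaBar code: the class and field names are hypothetical, and the 2.0 us spacing is a placeholder because the slide only says "2.x us".

```python
from dataclasses import dataclass


@dataclass
class FcgmState:
    """Hypothetical snapshot of the conditions checked before sending L1Accept."""
    now_us: float                # current time
    last_command_us: float       # time the previous command was sent
    free_fe_buffers: int         # FE buffers still available (as modeled upstream)
    inhibited: bool              # external inhibit signal
    min_spacing_us: float = 2.0  # placeholder value; the slide only says "2.x us"


def should_send_l1accept(state: FcgmState) -> bool:
    """Return True if an L1Accept may be sent for the current Level 1 signal."""
    busy = (state.now_us - state.last_command_us) < state.min_spacing_us
    full = state.free_fe_buffers == 0
    return not (busy or full or state.inhibited)
```

A trigger that arrives while any of the three conditions holds is simply not forwarded, which is where the per-L1Accept deadtime in this scheme comes from.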

Two Choices
Basic requirement is no intrinsic per-L1Accept deadtime, and deadtime due to “full” designed to be at most ~1% at the nominal 100 kTps trigger rate.
This can be achieved in two ways…

Two Choices
1. BaBar-like fixed-latency model
   Works (roughly) if it is possible to deliver L1Accepts at a minimum spacing equal to the shortest time interval by which the (assumed fully pipelined) trigger can distinguish consecutive events.
    ▪ Essentially this means that there can’t be a meaningful limit on the minimum command spacing (100 ns ~ 1% deadtime).
   In practice, in almost any scenario this requires being able to handle overlapping readout windows.
    ▪ You don’t have to be able to do this for an unlimited number of events, of course, just for bursts long enough that statistically you don’t get significant deadtime due to “full”.
  ◦ Places very stringent requirements on FCTS-DAQ link
2. Variable latency with addressing by time, and queueing of triggers
   In general requires additional pre-L1Accept “ring buffer”-type space, as closely-spaced triggers will effectively be delayed in transit.
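The "100 ns ~ 1% deadtime" figure can be checked with a short estimate. The sketch below assumes Poisson-distributed triggers and a non-paralyzable minimum command spacing (a trigger arriving too soon after an accepted one is dropped); both are modeling assumptions, not statements from the slides.

```python
def deadtime_fraction(rate_hz: float, min_spacing_s: float) -> float:
    """Fraction of triggers lost to a fixed minimum command spacing.

    Assumes Poisson arrivals and non-paralyzable deadtime, for which the
    lost fraction is R*tau / (1 + R*tau).
    """
    x = rate_hz * min_spacing_s
    return x / (1.0 + x)


# Nominal 100 kTps with a 100 ns minimum spacing: about 1%.
nominal = deadtime_fraction(100e3, 100e-9)

# For comparison, a 2 us spacing (a stand-in for the "2.x us" quoted for BaBar)
# at the same rate would lose roughly one trigger in six.
babar_like = deadtime_fraction(100e3, 2e-6)
```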

Buffering
We assume two levels of buffering in the FEEs, in both models:
  ◦ A continuously-running ring buffer upstream of the L1Accepts, long enough for the maximum trigger latency, in either model.
    ▪ In Model 1, its length can be essentially equal to the trigger latency (plus some constant offset)
    ▪ In Model 2, it needs to be longer than this by enough to handle 99% of anticipated trigger bursts. We need to do modeling to be quantitative about this, but we anticipate that the answer will be O(10us) of additional capacity (i.e., roughly a doubling; this is a guess so far, to be checked by modeling).
  ◦ A post-L1Accept buffer. This would likely be constructed as a number of fixed-size slots as in BaBar’s design. The number of L1Accept slots required needs to be determined by modeling, but is very likely to be substantially more than the four in the BaBar design.
    ▪ The driving parameters are that events must be able to be acquired about ten times faster than in BaBar, but the actual readout probably cannot be comparably faster (link speeds are only 2x faster, and we probably cannot afford to have significantly more ROMs).
    ▪ The amount of such buffering needed would be somewhat larger in Model 1.
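As an illustration of what "time-addressable" means for the pre-L1Accept ring buffer, here is a minimal software sketch. The class name, the tick-based addressing, and the error handling are all assumptions made for the example; a real FEE would implement this in hardware.

```python
class TimeAddressedRingBuffer:
    """Toy ring buffer whose contents are addressed by sample time (tick)."""

    def __init__(self, depth_samples: int):
        self.depth = depth_samples
        self.samples = [None] * depth_samples
        self.next_tick = 0  # tick number the next written sample will receive

    def write(self, sample):
        # Overwrite the oldest slot; the buffer runs continuously.
        self.samples[self.next_tick % self.depth] = sample
        self.next_tick += 1

    def read_window(self, start_tick: int, length: int):
        """Return samples for ticks [start_tick, start_tick + length)."""
        oldest = max(0, self.next_tick - self.depth)
        if start_tick < oldest:
            raise LookupError("window already overwritten: buffer too shallow")
        if start_tick + length > self.next_tick:
            raise LookupError("window not yet fully written")
        return [self.samples[t % self.depth]
                for t in range(start_tick, start_tick + length)]
```

The first `LookupError` is the software analogue of the buffer-depth question on this slide: a trigger that is delivered too late finds its data already overwritten.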

Buffering - Tradeoffs
Larger buffers add to channel costs, provide more targets for radiation upsets and transient data loss
  ◦ Variable latency: pre-L1Accept buffers are larger
  ◦ Fixed latency: post-L1Accept buffers are larger (probably less so)
Buffer addressability adds some complexity to front ends
  ◦ Minimal digital delay line approach is not applicable
  ◦ Variable-length readout adds (marginally?) more complexity (see next slides)

Overlapping Readout Windows
Overlapping readout windows need to be supported
More details:
  ◦ This is very difficult to avoid
    ▪ At 100kTps, 1us readout windows will overlap O(10%) of the time
  ◦ Several cases:
    [Diagram; labels: readout, physics, readout, Bhabha (not triggered), physics]
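The O(10%) overlap figure follows from a simple estimate, sketched below under the assumption of Poisson-distributed triggers and equal-length windows (an assumption of this example, not a statement from the slide): the window of a trigger overlaps the next one if the next trigger arrives within one window length.

```python
import math


def overlap_probability(rate_hz: float, window_s: float) -> float:
    """Probability that the next trigger arrives within one readout window.

    Assumes Poisson arrivals, so inter-trigger times are exponential.
    """
    return 1.0 - math.exp(-rate_hz * window_s)


# 100 kTps with 1 us windows gives a little under 10%.
p_overlap = overlap_probability(100e3, 1e-6)
```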

Overlaps – Consequences
For a given distribution of window overlap probability...
  ◦ A function of the trigger rate and readout window width
    ▪ With the window width a function of intrinsic signal width and trigger jitter
... the conditional probability that the signals overlap is a function of the ratio of the signal width to the trigger jitter.
  ◦ I.e., narrow signals could still overlap because of unlucky trigger jitter outcomes, but less often
If the signals do not overlap, only the windows, essentially the only issue is occupancy on the uplinks
  ◦ Bandwidth that you spend on re-reading data isn’t available to handle new data
  ◦ A fixed-latency design still works, but redundant (or useless) readout can’t be avoided without additional complexity in the front ends
If the signals overlap, there are additional issues (L3, reco)!

Overlaps - Bhabhas
If a physics event occurs shortly after a vetoed Bhabha, and if removing the Bhabha’s effect from the event requires acquiring most or all of the signals from the Bhabha itself... then you have to go “back in time” to get the Bhabha.
  ◦ The trigger has to have remembered that it just saw and vetoed a Bhabha
  ◦ The data still have to be available in the front ends, perhaps ~1us longer than the nominal trigger latency.
Both models can handle “going back in time”
  ◦ Fixed latency: make the fixed latency longer
  ◦ Variable latency: almost for free - it’s just like a queued trigger
  ◦ Readout issues are otherwise exactly as for other trigger overlaps

Buffering (2)
Overlapping readout windows need to be supported
  ◦ This has implications for the copy of event data from the ring buffer to the post-L1Accept buffer.
  ◦ We propose that the protocol support copy by reference when windows overlap. This reduces the internal bandwidth required in the FEEs.
    ▪ This requires the system to model an event as composed potentially of one or more by-reference segments followed by a by-value segment.
At a minimum, the by-value segment of an event must be retained somewhere in the system until enough time has passed that a future event cannot need any part of it. This could be done either in the FEE or in the ROM, trading off complexity in the FEE against complexity in the FCTS protocol and the ROM.
  ◦ We prefer putting the greater complexity in the FCTS and ROM, not in the FEE
  ◦ We propose that the L1Accept command include a “length” field in either model, in addition to the time address in Model 2
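One way to picture the by-reference/by-value event model is the sketch below. It is a hypothetical illustration, not the proposed protocol: windows are given as (start, length) pairs in arbitrary time ticks and are assumed to arrive sorted by start time; the function name and tuple layout are invented for the example.

```python
def segment_events(windows):
    """Split each readout window into by-reference and by-value segments.

    windows: list of (start, length) tuples, sorted by start (assumption).
    Returns, per event, a list of ("ref", owner_event, lo, hi) segments that
    point at data already copied for an earlier event, followed by at most
    one ("value", event, lo, hi) segment holding newly copied data.
    """
    stored = []           # retained by-value segments: (owner_event, lo, hi)
    covered_until = None  # end of the latest data copied so far
    result = []
    for idx, (start, length) in enumerate(windows):
        end = start + length
        # Retention rule: a segment ending at or before this start can no
        # longer be needed, because later windows start even later.
        stored = [seg for seg in stored if seg[2] > start]
        segments = [("ref", owner, max(start, lo), min(end, hi))
                    for owner, lo, hi in stored
                    if max(start, lo) < min(end, hi)]
        value_start = start if covered_until is None else max(start, covered_until)
        if value_start < end:
            segments.append(("value", idx, value_start, end))
            stored.append((idx, value_start, end))
            covered_until = end
        result.append(segments)
    return result
```

For windows `[(0, 10), (6, 10), (30, 10)]` the second event becomes a reference to ticks 6-10 of the first event plus a by-value copy of ticks 10-16, while the third event is copied entirely by value.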

Command Protocol
BaBar
  ◦ ROM-to-FEE commands are 12 bits: a 0, a 1, a 5-bit command code, and a 5-bit trigger tag (sequence number). At 60 MHz, this takes 192 ns to transmit.
  ◦ FCTS-to-ROM commands are 104 bits: the full 56-bit 60MHz clock counter (the unique event key), the post-FCTS state of the 32 trigger bits, the 12-bit ROM-to-FEE command, and four flag bits. These take ~1.75 us to transmit, a significant fraction of the minimum command spacing.
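The 12-bit ROM-to-FEE word and the transmission times can be sketched as follows. The field order (most significant bit first: 0, 1, command code, trigger tag) and the one-bit-per-clock serial transmission are assumptions of this example; the slide gives only the field list and the totals. The function names are hypothetical.

```python
def pack_rom_to_fee(command_code: int, trigger_tag: int) -> int:
    """Pack a 12-bit word: 0, 1, 5-bit command code, 5-bit trigger tag."""
    if not 0 <= command_code < 32 or not 0 <= trigger_tag < 32:
        raise ValueError("command code and trigger tag must fit in 5 bits")
    # Assumed bit order, MSB first: the fixed "01" prefix, then the two fields.
    return (0b01 << 10) | (command_code << 5) | trigger_tag


def unpack_rom_to_fee(word: int):
    """Return (command_code, trigger_tag) from a 12-bit word."""
    if word >> 10 != 0b01:
        raise ValueError("missing 0,1 prefix")
    return (word >> 5) & 0x1F, word & 0x1F


def transmit_time_ns(n_bits: int, bit_period_ns: float) -> float:
    """Serial transmission time, assuming one bit per clock period."""
    return n_bits * bit_period_ns
```

With a 16 ns bit period, 12 bits take 192 ns, matching the figure quoted on the slide. At exactly 60 MHz (about 16.7 ns per bit) the same calculation gives 200 ns for 12 bits and about 1.73 us for 104 bits, so "60 MHz" on the slide is presumably a nominal value.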

Command Protocols for SuperB 1
Model 1…
The BaBar ROM-to-FEE command content may be OK.
  ◦ Overall performance would be improved by including a length field
The BaBar ROM-to-FEE command timing is barely compatible with Model 1 at the same clock speeds.
  ◦ Ideally, for 1% deadtime a 100ns command interval would be needed. Somewhat longer intervals could be acceptable in several scenarios:
    ▪ If the trigger cannot generate separate trigger decisions that close together, then they don’t need to be processed, but the longer this interval becomes, the more necessary it becomes for the trigger itself to be able to handle overlapping events and make appropriate decisions (e.g., not vetoing a time interval that contains both a Bhabha and a physics event).
    ▪ If triggers must be delayed in transit because of a somewhat longer command interval, this can be OK as long as the accumulated delay in the maximum burst that needs to be supported (from modeling) is compatible with the trigger jitter specification.

Command Protocols for SuperB 2
Model 1…
The BaBar FCTS-to-ROM command timing is completely unacceptable.
  ◦ It would lead to intolerable trigger delivery delays.
  ◦ The command word length could be somewhat shortened. Two possibilities:
    ▪ The post-FCTS-decision trigger bits could be treated as event data, with the FCTS read out as an additional detector system. (This would preclude the ROMs’ making FEX or other decisions based on trigger content.)
    ▪ The timestamp could probably be shortened from 56 bits to bits by treating it as relative within each data run. This would require a run identifier to become part of the unique event key.
  ◦ These measures are probably insufficient by themselves. The command delivery link would still need to be made several times faster.
    ▪ It looks like we need speeds greater than 1Gbps
  ◦ This in turn might preclude combining commands with clock distribution in the FCTS.
    ▪ If these paths are separated, the ROMs would likely have to be able to resync commands with the clock in order for a BaBar-like event build and out-of-sync detection scheme to work.
    ▪ We are still thinking this through; it seems to add significant complexity to the FCTS protocol and ROM implementation, but apparently not to the FEEs.

Command Protocols for SuperB 3
Model 2…
The ROM-to-FEE command word needs to be extended to include a ring buffer address field.
  ◦ A length field may also be needed if the resolution of overlapping events is made a responsibility of the FCTS and the ROMs.
  ◦ The time resolution of the addressing will need to be somewhere between the system clock period (e.g., 16 ns) and the minimum useful time resolution of the trigger (perhaps 125 ns).
    ▪ Finer resolution allows some reduction in bandwidth, but...
    ▪ After discussions at Elba, it seems that ns is the right range.
  ◦ The ring buffer in Model 2 needs to be somewhat longer than the intrinsic trigger latency, to accommodate queued triggers. Pending detailed modeling, we are guessing that a buffer depth of several tens of usecs should be adequate.
    ▪ It is not necessary to be able to address the range before the shortest possible trigger latency.
  ◦ This means that the address field should be in the range of 6-9 bits.
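The 6-9 bit range for the address field follows from dividing the addressable time range by the addressing resolution. The sketch below shows the arithmetic; the specific ranges and resolutions used in the examples are hypothetical values chosen for illustration, since the slides do not fix them.

```python
import math


def address_bits(addressable_range_ns: float, resolution_ns: float) -> int:
    """Number of bits needed to address a time range at a given resolution."""
    steps = math.ceil(addressable_range_ns / resolution_ns)
    # (steps - 1).bit_length() is the integer form of ceil(log2(steps)).
    return max(1, (steps - 1).bit_length())


# Hypothetical examples (values are assumptions, not from the slides):
bits_a = address_bits(8_000, 125)    # 8 us range at 125 ns steps  -> 6 bits
bits_b = address_bits(32_000, 125)   # 32 us range at 125 ns steps -> 8 bits
bits_c = address_bits(32_000, 62.5)  # 32 us range at 62.5 ns steps -> 9 bits
```

Only the range between the shortest possible trigger latency and the full buffer depth needs to be addressable, which keeps the field at the lower end of what the full depth would suggest.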

Command Protocols for SuperB 4
Model 2 ROM-to-FEE command word
  ◦ The length field needs to be able to select fractions of the normal readout window. “50%” of the possible optimization is obtained merely by allowing reading a half-size window.
  ◦ For Bhabha-followed-by-physics readout, a larger-than-normal window is needed.
    ▪ This can alternatively be done in Model 2 by issuing two triggers, if the system can guarantee that there will be no gap between their readout windows.
  ◦ Either way the length field should not need to be more than 2-4 bits.
  ◦ In Model 2, the resulting increase in command transmission time from adding these fields is almost irrelevant: it just adds somewhat to the required ring buffer depth. (The additional delay should be no more than ~200ns.)

Command Protocols for SuperB 5
Model 2…
The FCTS-to-ROM command word needs to be extended by at least the address and length fields.
  ◦ If the resolution of overlapping events is made a responsibility of the FCTS and the ROMs, the command word would need to be further extended to include a set of descriptors for the multiple segments of an event with overlap.
    ▪ Each descriptor would probably be both an address and a length.
  ◦ Here again slower link speeds just translate into additional ring buffer space required.
    ▪ Even 2-3 us command transmission time (e.g., if the command word is much more complex than for BaBar) could be accommodated.
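How command transmission time translates into extra ring buffer depth can be explored with a toy queueing simulation. The sketch below is a deliberately simple stand-in for the modeling the slides call for: it assumes Poisson triggers, a single command link that sends one command at a time with a fixed transmission time, and no other sources of delay. All names and parameters are hypothetical.

```python
import random


def queued_delivery_delays(rate_hz, command_time_s, n_triggers, seed=1):
    """Simulate trigger delivery delays over one serialized command link.

    Returns, for each trigger, the time from the trigger decision until its
    command has been fully transmitted (queueing wait plus transmission).
    """
    rng = random.Random(seed)  # fixed seed so the toy model is reproducible
    t = 0.0
    link_free_at = 0.0
    delays = []
    for _ in range(n_triggers):
        t += rng.expovariate(rate_hz)   # Poisson arrivals (assumption)
        start = max(t, link_free_at)    # wait if the link is still busy
        link_free_at = start + command_time_s
        delays.append(link_free_at - t)
    return delays


def percentile(values, fraction):
    """Simple empirical percentile (no interpolation)."""
    ordered = sorted(values)
    index = min(len(ordered) - 1, int(fraction * len(ordered)))
    return ordered[index]


# Example: 100 kTps with a 2 us command transmission time.
delays = queued_delivery_delays(100e3, 2e-6, 100_000)
extra_depth_s = percentile(delays, 0.99)  # depth covering 99% of triggers
```

In this picture the ring buffer must cover the trigger latency plus `extra_depth_s`; a realistic estimate would also need the trigger's own burst behavior and the ROM-to-FEE hop.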

Conclusion & Proposal
Model 2 appears to us to be significantly more attractive, and provides additional flexibility in DAQ that we think is advisable.
  ◦ It appears to significantly loosen the requirements on the performance of the FCTS and ROM-to-FEE command links, and it provides a natural and uniform way to solve the overlapping-event problem.
  ◦ The ability to deliver triggers and read out events as soon as possible should increase the overall performance of the system.
  ◦ The corresponding cost is the increased complexity and size of the FEE needed to support an addressable - and significantly larger - ring buffer.
    ▪ Additional complexity and size also raise radiation damage concerns.
Ultimately overall system cost needs to be optimized.
  ◦ This turns out to be a quantitative question.

Next Steps
Next steps for core DAQ people:
  ◦ Define the time addressing protocol and buffer depth.
  ◦ Specify a detailed model for overlapping-event readout.
  ◦ Write this up.
  ◦ Do modeling to estimate the ring buffer depth needed in Model 2.
Next steps for you:
  ◦ Consider the consequences for FEE design and estimate the marginal cost of the additional requirements of variable latency (Model 2).
  ◦ If those costs look substantial, someone then needs to evaluate the additional complexity and cost of the multiple-path command-and-clock distribution and resynchronization scheme needed in the FCTS in Model 1.
  ◦ Additionally, significant effort needs to go into studying the TRIGGER.
My final conclusion: The SuperB FCTS and DAQ system is NOT just a suitably scaled-up version of the BaBar system.