SuperB FCTS/DAQ Protocol Steffen Luitz and Gregory Dubois-Felsmann SuperB Computing Workshop, Frascati 12/17/08
From Gregory’s Talk at the Elba Meeting
We are proposing this model - that is, a model with time-addressable pre-L1Accept buffering in the FE electronics.
Next steps for us (core DAQ people - please consider joining us!):
◦Define the time addressing protocol and buffer depth.
◦Specify a detailed model for overlapping-event readout.
◦Write this up.
◦Do modeling to estimate the ring buffer depth needed in Model 2.
Next steps for you:
◦Consider the consequences for FEE design and estimate the marginal cost of the additional requirements of variable latency (Model 2).
◦If those costs look substantial, someone then needs to evaluate the additional complexity and cost of the multiple-path command-and-clock distribution and resynchronization scheme needed in the FCTS in Model 1.
Status Update
Not much has happened since then
◦Gregory and Steffen have new assignments at SLAC
◦No time to do SuperB homework
As a result, this talk will look familiar to those who attended the Elba meeting
Significant work towards the TDR is still needed
◦It can’t be done as a spare-time effort
This Talk
… is very similar to what was presented at the Elba meeting
… is intended to introduce newcomers to the challenges of a ~100 kHz, virtually dead-time-free DAQ system...
… and to (re-)stimulate a vibrant discussion about SuperB DAQ and FCTS design
Basic Assumptions
Previously established:
◦Physics goals require an open trigger, e.g., similar to BaBar’s.
◦Level 1 Accept rate of ~100 kTps (triggers per second), unless a highly effective Level 1 Bhabha veto is developed
 Expecting ~50 kTps of Bhabhas...
 Stretch goal: be able to handle 150 kTps
◦Level 3 Accept rate of ~20 kTps
◦Expect a “Level 4” offline filter beyond that
More Assumptions and Issues
Level 1 trigger provided with a maximum latency similar to BaBar’s: ~12 microseconds
◦There has been no study at all of whether the latency could be reduced significantly (i.e., by at least a factor of two). It’s plausible that this could be done by using more capable components and parallelizing more, but this has not been investigated by experts.
◦Other experiments have been able to do better, but SuperB does not have quite the same prompt and spectacular signals that some do.
DCH and EMC must provide “side channels” of data to the trigger; perhaps the SVT will also contribute
◦Their ability to deliver data quickly is essential to limiting the trigger latency
The trigger may need higher-resolution data than in BaBar (e.g., to allow for more precise 3D tracking)
◦This might push the latency longer (more complex computations needed) but lower the Level 1 rate in compensation.
Start from BaBar
BaBar-Note-281 (v1.1) described the protocol for communication between the ROMs and the FEEs
The Conceptual Design document for the FCTS describes the extension of this protocol through the FCTS system.
◦More detail is available in the “FCTS Architecture” note
For now we are considering only the event data protocol, corresponding to the “run-time commands” in the BaBar protocol (viz. Section 3.1 of BaBar-Note-281)
Also keep in mind actual BaBar DAQ performance:
◦Design requirement: overall 2 kTps, front-ends 10 kTps
◦Achieved performance after multiple upgrades: 7 kTps
BaBar Event Data Protocol
In normal triggered data acquisition running:
1. A signal from the Level 1 trigger (GLT lines) goes to the FCGM
2. If the system is not “busy” (within 2.x us of the previous command), “full” (no FE buffers available), or “inhibited” (external signal), an L1Accept command is sent through the FCTS to the DAQ crates & ROMs
3. The ROM forwards the L1Accept command to the FEEs
4. The FEEs capture a previously specified window of data into a buffer (in triggered, i.e., non-EMC, systems)
5. Some variable time later, when resources are available, the ROM sends a ReadEvent command to the FEEs, which send back the earliest available buffer (not addressable; its state is modeled by the ROM)
◦This relies on a trigger delivery latency that is confined to a fixed jitter interval, with readout windows large enough to cover the uncertainty.
◦(This is about 1 us in BaBar, though most trigger lines have much better resolution, ~ns.)
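For newcomers, a minimal sketch of this accept/readout flow in Python. The class and method names, the 2.2 us busy window (standing in for the “2.x us” above), and the four-buffer depth are illustrative assumptions, not the actual BaBar implementation.

```python
from collections import deque

BUSY_WINDOW_US = 2.2     # stand-in for the "2.x us" minimum command spacing
FE_BUFFER_SLOTS = 4      # BaBar-like number of post-L1Accept buffers

class Fcgm:
    """Toy model of the L1Accept gating and FIFO readout described above."""
    def __init__(self):
        self.last_command_time_us = -1e9
        self.fe_buffers = deque()   # FE buffer occupancy, as modeled centrally
        self.inhibited = False

    def on_l1_trigger(self, now_us):
        busy = (now_us - self.last_command_time_us) < BUSY_WINDOW_US
        full = len(self.fe_buffers) >= FE_BUFFER_SLOTS
        if busy or full or self.inhibited:
            return None                    # trigger not accepted -> deadtime
        self.last_command_time_us = now_us
        self.fe_buffers.append(now_us)     # FEEs capture a readout window
        return "L1Accept"

    def on_read_event(self):
        # ReadEvent returns the *earliest* buffer; the FEE buffers are a FIFO
        # and are not addressable in the BaBar protocol.
        return self.fe_buffers.popleft() if self.fe_buffers else None
```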
Two Choices
The basic requirement is no intrinsic per-L1Accept deadtime, with deadtime due to “full” designed to be at most ~1% at the nominal 100 kTps trigger rate.
This can be achieved in two ways…
Two Choices
1. BaBar-like fixed-latency model
◦Works (roughly) if it is possible to deliver L1Accepts at a minimum spacing equal to the shortest time interval by which the (assumed fully pipelined) trigger can distinguish consecutive events. Essentially this means that there can’t be a meaningful limit on the minimum command spacing (100 ns ~ 1% deadtime).
◦In practice, in almost any scenario this requires being able to handle overlapping readout windows. You don’t have to be able to do this for an unlimited number of events, of course, just for bursts long enough that statistically you don’t get significant deadtime due to “full”.
◦Places very stringent requirements on the FCTS-DAQ link
2. Variable latency with addressing by time, and queueing of triggers
◦In general requires additional pre-L1Accept “ring buffer”-type space, as closely spaced triggers will effectively be delayed in transit.
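The “100 ns ~ 1% deadtime” rule of thumb follows directly from the trigger rate; a quick first-order check for Poisson-distributed triggers, using the rates quoted above:

```python
# Fraction of Poisson triggers lost to a dead window tau after each accepted
# trigger is ~ rate * tau, valid when rate * tau << 1.
rate_hz = 100e3          # nominal L1Accept rate
min_spacing_s = 100e-9   # minimum command spacing under consideration
print(f"approximate deadtime: {rate_hz * min_spacing_s:.1%}")   # -> 1.0%
```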
Buffering
We assume two levels of buffering in the FEEs, in both models:
◦A continuously running ring buffer upstream of the L1Accepts, long enough for the maximum trigger latency in either model.
 In Model 1, its length can be essentially equal to the trigger latency (plus some constant offset).
 In Model 2, it needs to be longer than this by enough to handle 99% of anticipated trigger bursts. We need to do modeling to be quantitative about this (a toy example is sketched below), but we anticipate that the answer will be O(10 us) of additional capacity, i.e., roughly a doubling (a guess so far, pending modeling).
◦A post-L1Accept buffer. This would likely be constructed as a number of fixed-size slots, as in BaBar’s design.
 The number of L1Accept slots required needs to be determined by modeling, but is very likely to be substantially more than the four in the BaBar design. The driving parameters are that events must be able to be acquired about ten times faster than in BaBar, but the actual readout probably cannot be comparably faster (link speeds are only ~2x faster, and we probably cannot afford significantly more ROMs).
 The amount of such buffering needed would be somewhat larger in Model 1.
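As an illustration of the kind of modeling called for above, here is a toy Monte Carlo of Model 2 trigger queueing. The 100 kHz Poisson trigger rate and the 2 us per-command transmission time are assumptions for illustration only; the real study must also fold in readout-window overlaps and the actual link speeds.

```python
import random

RATE_HZ = 100e3          # assumed L1Accept rate
COMMAND_TIME_US = 2.0    # assumed time to serialize/deliver one trigger command
N_TRIGGERS = 200_000

random.seed(1)
t_us, link_free_at, delays = 0.0, 0.0, []
for _ in range(N_TRIGGERS):
    t_us += random.expovariate(RATE_HZ) * 1e6    # next Poisson trigger (us)
    start = max(t_us, link_free_at)              # wait if the link is busy
    delays.append(start - t_us)                  # queueing delay of this trigger
    link_free_at = start + COMMAND_TIME_US

delays.sort()
p99 = delays[int(0.99 * len(delays))]
print(f"99th-percentile queueing delay: {p99:.1f} us "
      "(extra ring-buffer depth needed beyond the trigger latency)")
```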
Buffering - Tradeoffs
Larger buffers add to channel costs and provide more targets for radiation upsets and transient data loss
◦Variable latency: pre-L1Accept buffers are larger
◦Fixed latency: post-L1Accept buffers are larger (probably less so)
Buffer addressability adds some complexity to the front ends
◦The minimal digital-delay-line approach is not applicable
◦Variable-length readout adds (marginally?) more complexity (see next slides)
Overlapping Readout Windows
Overlapping readout windows need to be supported
More details:
◦This is very difficult to avoid: at 100 kTps, 1 us readout windows will overlap O(10%) of the time
◦Several cases, e.g., a physics readout overlapping another physics readout, or an untriggered (vetoed) Bhabha followed by a physics event (see the next slides)
Overlaps – Consequences
For a given distribution of window-overlap probability...
◦This is a function of the trigger rate and the readout window width.
◦With the window width a function of intrinsic signal width and trigger jitter, the conditional probability that the signals themselves overlap is a function of the ratio of the signal width to the trigger jitter.
 I.e., narrow signals can still overlap because of unlucky trigger-jitter outcomes, but less often.
If the signals do not overlap, only the windows, the only real issue is occupancy on the uplinks
◦Bandwidth spent re-reading data isn’t available to handle new data
◦A fixed-latency design still works, but redundant (or useless) readout can’t be avoided without additional complexity in the front ends
If the signals overlap, there are additional issues (L3, reconstruction)!
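A quick check of the O(10%) figure quoted on the previous slide, assuming Poisson-distributed triggers; the 20% signal-to-window ratio in the second part is purely an assumed example.

```python
import math

rate_hz = 100e3
window_s = 1e-6                   # 1 us readout window
p_window_overlap = 1.0 - math.exp(-rate_hz * window_s)
print(f"window-overlap probability: {p_window_overlap:.1%}")   # ~9.5%

# Very rough: if the intrinsic signal occupies a fraction f of the window
# (the remainder being trigger-jitter allowance), the chance that the signals
# themselves overlap, given overlapping windows, scales roughly like f.
f_signal = 0.2                    # assumed signal width / window width
print(f"rough signal-overlap probability: {p_window_overlap * f_signal:.1%}")
```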
Overlaps - Bhabhas
If a physics event occurs shortly after a vetoed Bhabha, and if removing the Bhabha’s effect from the event requires acquiring most or all of the signals from the Bhabha itself, then you have to go “back in time” to get the Bhabha.
◦The trigger has to have remembered that it just saw and vetoed a Bhabha
◦The data still have to be available in the front ends, perhaps ~1 us longer than the nominal trigger latency
Both models can handle “going back in time”
◦Fixed latency: make the fixed latency longer
◦Variable latency: almost for free - it’s just like a queued trigger
◦Readout issues are otherwise exactly as for other trigger overlaps
Buffering (2)
Overlapping readout windows need to be supported
◦This has implications for the copy of event data from the ring buffer to the post-L1Accept buffer.
◦We propose that the protocol support copy by reference when windows overlap (sketched below). This reduces the internal bandwidth required in the FEEs.
 This requires the system to model an event as potentially composed of one or more by-reference segments followed by a by-value segment.
 At a minimum, the by-value segment of an event must be retained somewhere in the system until enough time has passed that a future event cannot need any part of it. This could be done either in the FEE or in the ROM, trading off complexity in the FEE against complexity in the FCTS protocol and the ROM.
◦We prefer putting the greater complexity in the FCTS and ROM, not in the FEE
◦We propose that the L1Accept command include a “length” field in either model, in addition to the time address in Model 2
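One possible way to represent such an event in the protocol and in the ROM, as a data-structure sketch; the field names and the choice of Python dataclasses are illustrative only, not a proposed implementation.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class ByRefSegment:
    event_tag: int      # earlier event whose data already covers this span
    offset_ticks: int   # start of the shared span within that event
    length_ticks: int

@dataclass
class ByValSegment:
    start_time: int     # ring-buffer time address (Model 2)
    length_ticks: int   # data actually copied out of the ring buffer

@dataclass
class EventDescriptor:
    event_tag: int
    by_ref: List[ByRefSegment]   # zero or more spans shared with earlier events
    by_val: ByValSegment         # newly read-out span

    def total_window(self) -> int:
        return sum(s.length_ticks for s in self.by_ref) + self.by_val.length_ticks
```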
Command Protocol
BaBar
◦ROM-to-FEE commands are 12 bits: a 0, a 1, a 5-bit command code, and a 5-bit trigger tag (sequence number). At 60 MHz, this takes 192 ns to transmit.
◦FCTS-to-ROM commands are 104 bits: the full 56-bit 60 MHz clock counter (the unique event key), the post-FCTS state of the 32 trigger bits, the 12-bit ROM-to-FEE command, and four flag bits. These take ~1.75 us to transmit, a significant fraction of the minimum command spacing.
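These sizes and times can be checked by treating each link as a simple serial stream at one bit per clock tick; the small differences from the quoted 192 ns and ~1.75 us reflect framing details not modeled here (see BaBar-Note-281 for the actual encoding).

```python
CLOCK_MHZ = 60.0                         # nominal system clock from the slide

def serial_time_us(bits: int) -> float:
    # one bit per clock tick; framing details are ignored
    return bits / CLOCK_MHZ

rom_to_fee_bits  = 1 + 1 + 5 + 5         # start bits + command code + trigger tag
fcts_to_rom_bits = 56 + 32 + 12 + 4      # clock counter + trigger bits + command + flags

print(f"ROM-to-FEE : {rom_to_fee_bits} bits, ~{serial_time_us(rom_to_fee_bits)*1e3:.0f} ns")
print(f"FCTS-to-ROM: {fcts_to_rom_bits} bits, ~{serial_time_us(fcts_to_rom_bits):.2f} us")
```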
Command Protocols for SuperB 1
Model 1…
The BaBar ROM-to-FEE command content may be OK.
◦Overall performance would be improved by including a length field
The BaBar ROM-to-FEE command timing is barely compatible with Model 1 at the same clock speeds.
◦Ideally, for 1% deadtime a 100 ns command interval would be needed. Somewhat longer intervals could be acceptable in several scenarios:
 If the trigger cannot generate separate trigger decisions that close together, then they don’t need to be processed; but the longer this interval becomes, the more necessary it becomes for the trigger itself to be able to handle overlapping events and make appropriate decisions (e.g., not vetoing a time interval that contains both a Bhabha and a physics event).
 If triggers must be delayed in transit because of a somewhat longer command interval, this can be OK as long as the accumulated delay in the maximum burst that needs to be supported (from modeling) is compatible with the trigger jitter specification.
Command Protocols for SuperB 2
Model 1…
The BaBar FCTS-to-ROM command timing is completely unacceptable.
◦It would lead to intolerable trigger delivery delays.
◦The command word length could be somewhat shortened. Two possibilities:
 The post-FCTS-decision trigger bits could be treated as event data, with the FCTS read out as an additional detector system. (This would preclude the ROMs’ making FEX or other decisions based on trigger content.)
 The timestamp could probably be shortened from 56 bits by treating it as relative within each data run. This would require a run identifier to become part of the unique event key.
◦These measures are probably insufficient by themselves. The command delivery link would still need to be made several times faster. It looks like we need speeds greater than 1 Gbps (see the estimate below).
◦This in turn might preclude combining commands with clock distribution in the FCTS. If these paths are separated, the ROMs would likely have to be able to resynchronize commands with the clock in order for a BaBar-like event build and out-of-sync detection scheme to work. We are still thinking this through; it seems to add significant complexity to the FCTS protocol and ROM implementation, but apparently not to the FEEs.
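The “greater than 1 Gbps” statement follows from needing to deliver an order-100-bit command within the ~100 ns command spacing targeted in Model 1; the exact shortened word length below is an assumption.

```python
command_bits = 100        # assumed length of a shortened FCTS-to-ROM command
target_spacing_ns = 100   # Model 1 command interval for ~1% deadtime
print(f"required link speed: ~{command_bits / target_spacing_ns:.1f} Gbps")  # bits/ns = Gbps
```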
Command Protocols for SuperB 3
Model 2…
The ROM-to-FEE command word needs to be extended to include a ring buffer address field.
◦A length field may also be needed if the resolution of overlapping events is made a responsibility of the FCTS and the ROMs.
◦The time resolution of the addressing will need to be somewhere between the system clock period (e.g., 16 ns) and the minimum useful time resolution of the trigger (perhaps 125 ns). Finer resolution allows some reduction in bandwidth, but... After discussions at Elba, it seems that this is the right range.
◦The ring buffer in Model 2 needs to be somewhat longer than the intrinsic trigger latency, to accommodate queued triggers. Pending detailed modeling, we are guessing that a buffer depth of several tens of usecs should be adequate. It is not necessary to be able to address the range before the shortest possible trigger latency.
◦This means that the address field should be in the range of 6-9 bits (see the sketch below).
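The 6-9 bit estimate follows from the buffer depth and the addressing resolution; a small sketch with placeholder numbers (30 us depth, 125 ns resolution, ~10 us minimum latency, all assumptions pending the modeling above):

```python
import math

def address_bits(buffer_depth_ns: float, resolution_ns: float,
                 min_latency_ns: float = 0.0) -> int:
    # Only the span between the shortest possible trigger latency and the end
    # of the ring buffer needs to be addressable.
    addressable_ns = buffer_depth_ns - min_latency_ns
    return math.ceil(math.log2(addressable_ns / resolution_ns))

print(address_bits(30_000, 125, 10_000))   # -> 8, within the 6-9 bit range
```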
Command Protocols for SuperB 4
Model 2 ROM-to-FEE command word
◦The length field needs to be able to select fractions of the normal readout window. “50%” of the possible optimization is obtained merely by allowing a half-size window to be read.
◦For Bhabha-followed-by-physics readout, a larger-than-normal window is needed. This can alternatively be done in Model 2 by issuing two triggers, if the system can guarantee that there will be no gap between their readout windows.
◦Either way, the length field should not need to be more than 2-4 bits.
◦In Model 2, the resulting increase in command transmission time from adding these fields is almost irrelevant: it just adds somewhat to the required ring buffer depth. (The additional delay should be no more than ~200 ns.)
Command Protocols for SuperB 5
Model 2…
The FCTS-to-ROM command word needs to be extended by at least the address and length fields.
◦If the resolution of overlapping events is made a responsibility of the FCTS and the ROMs, the command word would need to be further extended to include a set of descriptors for the multiple segments of an event with overlap. Each descriptor would probably be both an address and a length.
◦Here again, slower link speeds just translate into additional ring buffer space required. Even a 2-3 us command transmission time (e.g., if the command word is much more complex than for BaBar) could be accommodated (see the estimate below).
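A rough size and transmission-time estimate for such an extended command, using the BaBar word as a baseline; the field widths and the two-segment limit are guesses for illustration only.

```python
CLOCK_MHZ = 60.0                # nominal bit rate, one bit per clock tick
BASE_BITS = 104                 # BaBar-like FCTS-to-ROM command (earlier slide)
ADDRESS_BITS, LENGTH_BITS = 8, 3
MAX_OVERLAP_SEGMENTS = 2        # assumed; each descriptor = address + length

total_bits = (BASE_BITS + ADDRESS_BITS + LENGTH_BITS
              + MAX_OVERLAP_SEGMENTS * (ADDRESS_BITS + LENGTH_BITS))
print(f"{total_bits} bits -> ~{total_bits / CLOCK_MHZ:.2f} us at {CLOCK_MHZ:.0f} MHz")
# ~2.3 us: acceptable in Model 2, where it only adds to the required
# ring-buffer depth rather than causing deadtime.
```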
Conclusion & Proposal
Model 2 appears to us to be significantly more attractive, and it provides additional flexibility in the DAQ that we think is advisable.
◦It appears to significantly loosen the requirements on the performance of the FCTS and ROM-to-FEE command links, and it provides a natural and uniform way to solve the overlapping-event problem.
◦The ability to deliver triggers and read out events as soon as possible should increase the overall performance of the system.
◦The corresponding cost is the increased complexity and size of the FEE needed to support an addressable - and significantly larger - ring buffer. The additional complexity and size also raise radiation-damage concerns.
Ultimately the overall system cost needs to be optimized.
◦This turns out to be a quantitative question.
Next Steps
Next steps for core DAQ people:
◦Define the time addressing protocol and buffer depth.
◦Specify a detailed model for overlapping-event readout.
◦Write this up.
◦Do modeling to estimate the ring buffer depth needed in Model 2.
Next steps for you:
◦Consider the consequences for FEE design and estimate the marginal cost of the additional requirements of variable latency (Model 2).
◦If those costs look substantial, someone then needs to evaluate the additional complexity and cost of the multiple-path command-and-clock distribution and resynchronization scheme needed in the FCTS in Model 1.
◦Additionally, significant effort needs to go into studying the TRIGGER.
My final conclusion: the SuperB FCTS and DAQ system is NOT just a suitably scaled-up version of the BaBar system