A Level-2 trigger algorithm for the identification of muons in the ATLAS Muon Spectrometer
Alessandro Di Mattia, on behalf of the ATLAS TDAQ group
Computing in High Energy Physics, Interlaken, September 26-30, 2004
Outline:
- The ATLAS trigger
- μFast algorithm
- Relevant physics performances
- Implementation in the Online framework
- Latency of the algorithm
- Conclusions
The LHC challenge to ATLAS Trigger/DAQ
LHC: proton-proton, E_CM = 14 TeV, starting 2007
- L = 10^34 cm^-2 s^-1: 23 collisions per bunch crossing, 25 ns bunch interval
- 1 year at L = 10^34 cm^-2 s^-1: ∫L dt ≈ 100 fb^-1
Challenge to the ATLAS Trigger/DAQ:
- The interaction rate is 10^9 Hz; offline computing can handle O(10^2 Hz).
- Cross sections of physics processes vary over many orders of magnitude:
  – Inelastic: 10^9 Hz
  – W → ℓν: 10^2 Hz
  – ttbar production: 10 Hz
  – Higgs (100 GeV): 0.1 Hz
  – Higgs (600 GeV): 10^-2 Hz
- ATLAS has O(10^8) read-out channels → average event size ~1.5 MByte
The ATLAS Trigger
Level                Type               Output rate   Target processing time
Level-1              Hardware trigger   75 kHz        2.5 μs
Level-2 (HLT)        Software trigger   ~2 kHz        ~10 ms
Event Filter (HLT)   Software trigger   ~200 Hz       ~2 s
High Level Triggers (HLT) = Level-2 + Event Filter
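The rate reduction implied by these figures can be worked out with a few lines of arithmetic. The sketch below is illustrative only: it assumes that Level-1 sees one decision per bunch crossing (25 ns interval, i.e. 40 MHz) and combines the Event Filter output rate with the ~1.5 MByte average event size quoted for ATLAS. The function and variable names are made up for this example.

```python
# Figures taken from the slides; the bunch-crossing input rate is derived
# from the 25 ns bunch interval (assumption: one Level-1 decision per crossing).
BUNCH_CROSSING_RATE_HZ = 1.0 / 25e-9
LEVEL_RATES_HZ = {"Level-1": 75e3, "Level-2": 2e3, "Event Filter": 200.0}
EVENT_SIZE_MBYTE = 1.5


def rejection_factors(input_rate_hz, level_rates_hz):
    """Return the rate reduction each trigger level has to deliver, in order."""
    factors = {}
    rate_in = input_rate_hz
    for level, rate_out in level_rates_hz.items():
        factors[level] = rate_in / rate_out
        rate_in = rate_out  # the output of one level feeds the next one
    return factors


factors = rejection_factors(BUNCH_CROSSING_RATE_HZ, LEVEL_RATES_HZ)
storage_mbyte_per_s = LEVEL_RATES_HZ["Event Filter"] * EVENT_SIZE_MBYTE

for level, factor in factors.items():
    print(f"{level}: rate reduction ~{factor:.1f}")
print(f"Data rate after the Event Filter: ~{storage_mbyte_per_s:.0f} MByte/s")
```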
Standalone muon reconstruction at Level-2
Task of the Level-2 muon trigger:
- Confirm the Level-1 trigger with a more precise p_T estimation within a “Region of Interest” (RoI).
- Contribute to the global Level-2 decision.
To perform the muon reconstruction, RoI data are gathered together and processed in three steps:
1) “Global Pattern Recognition”, involving trigger chambers and positions of MDT tubes (no use of drift time);
2) “Track fit”, involving drift time measurements, performed for each MDT chamber;
3) Fast “p_T estimate” via a Look-Up Table (LUT), with no use of time-consuming fit methods.
Result: η, φ, direction of flight into the spectrometer, and p_T at the interaction vertex.
Global pattern recognition: seeded by the trigger chamber data
- Use the L1 simulation code to select the RPC trigger pattern.
- Valid coincidence in the Low-p_T CMA.
- 1 hit from each trigger station is required to start the pattern recognition on MDT data.
[Figure: approximated muon trajectory after L1 emulation]
Muon Roads and “contiguity algorithm”
- Define “μ-roads” around this trajectory in each chamber.
- Collect hit tubes within the roads using the residual of the muon tube.
- Apply a contiguity algorithm to further remove background hits inside the roads.
(muon hits) = 96%; background hits ~3%
[Figure panels: Low p_T (~6 GeV); High p_T (~20 GeV)]
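The two selection steps can be sketched as follows. This is a minimal illustration, not the algorithm's actual code: the slide does not spell out the contiguity criterion, so the rule used here (keep a hit only if an adjacent layer has a hit within one tube of it) is an assumption, as are the hit layout, the straight-line road parametrisation z = slope * r + intercept and all function names.

```python
import math


def residual(hit, slope, intercept):
    """Perpendicular distance of a tube centre from the road centre line
    z = slope * r + intercept (hypothetical parametrisation)."""
    return (hit["z"] - (slope * hit["r"] + intercept)) / math.sqrt(1.0 + slope * slope)


def collect_road_hits(hits, slope, intercept, half_width):
    """Keep the tubes whose residual lies inside the road."""
    return [h for h in hits if abs(residual(h, slope, intercept)) <= half_width]


def contiguity_filter(hits):
    """Assumed contiguity rule: drop hits with no neighbour in an adjacent
    layer whose tube number differs by at most one."""
    kept = []
    for h in hits:
        has_neighbour = any(
            abs(o["layer"] - h["layer"]) == 1 and abs(o["tube"] - h["tube"]) <= 1
            for o in hits
        )
        if has_neighbour:
            kept.append(h)
    return kept


# Toy chamber: three muon hits in tube 10, one background hit inside the
# road (tube 12) and one far outside it (tube 30). Units are cm.
hits = [
    {"layer": 0, "tube": 10, "z": 30.0, "r": 0.0},
    {"layer": 1, "tube": 10, "z": 30.0, "r": 3.0},
    {"layer": 2, "tube": 10, "z": 30.0, "r": 6.0},
    {"layer": 0, "tube": 12, "z": 36.0, "r": 0.0},
    {"layer": 2, "tube": 30, "z": 90.0, "r": 6.0},
]

road_hits = collect_road_hits(hits, slope=0.1, intercept=30.0, half_width=10.0)
muon_hits = contiguity_filter(road_hits)
print([(h["layer"], h["tube"]) for h in muon_hits])
```

In this toy case the road alone rejects only the distant hit; the isolated hit inside the road is removed by the contiguity step.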
Track Fit
- Use drift time measurements to fit the best straight line crossing all points.
- Compute the track bending using the sagitta method: three points required.
- For a given chamber the sagitta is:
  – s ~ 150 μm for muon p_T = 20 GeV
  – s ~ 500 μm for muon p_T = 6 GeV
  small effects with respect to s_m.
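As an illustration of the sagitta method, the sagitta can be taken as the distance of the middle point from the chord joining the inner and outer points. The sketch below assumes each point is a (z, r) pair in a common plane; the function name and the sample coordinates are hypothetical.

```python
import math


def sagitta(inner, middle, outer):
    """Signed perpendicular distance of the middle point from the straight
    line (chord) joining the inner and outer points."""
    (z1, r1), (z2, r2), (z3, r3) = inner, middle, outer
    chord = math.hypot(z3 - z1, r3 - r1)
    # The 2D cross product gives twice the area of the triangle spanned by the
    # three points; dividing by the chord length gives the triangle height.
    cross = (z3 - z1) * (r2 - r1) - (r3 - r1) * (z2 - z1)
    return cross / chord


# Toy example: the middle point is displaced by 0.025 from a 10-unit chord.
s = sagitta((0.0, 0.0), (5.0, 0.025), (10.0, 0.0))
print(f"sagitta = {s:.3f}")
```

The sign of the result tells on which side of the chord the middle point lies, which follows the bending direction of the track.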
p_T estimate
- Use the linear relation between 1/s and p_T to estimate p_T.
- Prepare Look-Up Tables (LUT) as a set of relations between values of s and p_T for different η-φ regions (s = f(η, φ, p_T)).
- 30 x 60 (η, φ) tables for each detector octant.
Performances, including background simulation for the high luminosity environment:
- Resolution comparable with the ATLAS reconstruction program (within a factor of about 2).
- Track finding efficiency of about 97% for muons.
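A look-up of this kind might be organised as in the sketch below. It assumes the linear relation is stored as two coefficients per (η, φ) bin, with p_T = a / s + b; the coefficient values, the η and φ ranges of the octant and all names are placeholders chosen for the example, not values from the real tables.

```python
import math

N_ETA, N_PHI = 30, 60  # table granularity quoted on the slide


def bin_index(value, low, high, n_bins):
    """Map a value in [low, high) to a bin number, clamping at the edges."""
    i = int((value - low) / (high - low) * n_bins)
    return min(max(i, 0), n_bins - 1)


def make_flat_lut(a, b):
    """Build a table holding the same (a, b) pair in every bin (placeholder)."""
    return [[(a, b) for _ in range(N_PHI)] for _ in range(N_ETA)]


def estimate_pt(lut, eta, phi, sagitta, eta_range, phi_range):
    """Return p_T from the assumed linear relation p_T = a / s + b."""
    i = bin_index(eta, eta_range[0], eta_range[1], N_ETA)
    j = bin_index(phi, phi_range[0], phi_range[1], N_PHI)
    a, b = lut[i][j]
    return a / abs(sagitta) + b


# Hypothetical octant coverage and coefficients (a in GeV*cm, b in GeV).
ETA_RANGE = (-1.0, 1.0)
PHI_RANGE = (0.0, math.pi / 4)
lut = make_flat_lut(a=50.0, b=0.0)

pt = estimate_pt(lut, eta=0.3, phi=0.2, sagitta=2.5,
                 eta_range=ETA_RANGE, phi_range=PHI_RANGE)
print(f"estimated p_T = {pt:.1f} GeV")
```

The estimate costs two divisions and one table access, with no iterative fit involved.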
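Trigger rates (barrel)
Low p_T (6 GeV): L1 rate (kHz) and L2 rate (kHz) by source
- K/π decays, b decays, c decays: [values not recoverable from the slide]
- Fake L1: 1.0 (L1), negligible (L2)
- Total: [values not recoverable from the slide]
High p_T (20 GeV): L1 rate (kHz) and L2 rate (kHz) by source
- K/π decays, b decays, c decays, W decays: [values not recoverable from the slide]
- Fake L1: negligible
- Total: [values not recoverable from the slide]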
HLT Event Selection Software
[Diagram: HLT Data Flow Software and HLT Selection Software]
- Framework: ATHENA/GAUDI
- Reuse of offline components
- Common to Level-2 and EF
- Offline algorithms used in EF
Bytestream model
Standardization of data access forces the data to be modeled according to detector regions ... but ... the bytestream should be optimized for fast access to the detector data.
RPC bytestream: the detector regions can't be easily mapped onto the readout structure, because the latter is geared towards the trigger needs.
- Use an ad hoc solution: PAD -> Coincidence Matrix -> Fired CMA channel
- Data are strictly limited to those needed: no overhead introduced in the data decoding.
MDT bytestream: readout structure mapped onto the MDT chambers.
- CSM -> AMT hit (AMT data word)
- Data access according to chambers is not efficient: optimization needed.
[Diagrams: PAD -> CM (up to 8 CM) -> fired channel; CSM (= MDT chamber) -> MdtAmtHit]
Standard MDT data access scheme: use LVL1 Muon RoI info
- 7 MDT chambers to be accessed.
- This tail is critical for the MDT converter timing.
[Figure: LVL1 RoI and the MDT chambers accessed]
Optimized MDT data access scheme: use Muon Roads
- 3 MDT chambers to be accessed; up to 6 in case the roads overlap two chambers.
- Only three MDT chambers are accessed in most cases.
[Figure: approximated muon trajectory after L1 emulation, with road widths < 50 cm, ~5 cm and < 40 cm; MDT chambers accessed]
Further optimization
A considerable fraction of the data access time is taken by the “data preparation”. Data preparation is for:
– associating space points to detector hits;
– resolving ambiguities in some special detector regions (RPC data only);
– providing refined info to the reconstruction: t0 subtraction (from the MDT drift time), calibration of the space-time relationship of MDT tubes.
To optimize this process, the data preparation is performed inside the algorithm using a standalone detector description that provides:
1) description of the readout;
2) description of the detector geometry;
3) offline versus online map.
Advantages:
- prepare only the data needed for reconstruction;
- use code optimized for speed:
  – detector geometry organized according to readout hierarchy;
  – minimal use of STL, no memory allocation on demand;
- minimize the dependencies on the offline code: this eases the integration.
μFast sequence diagram
[Diagram: RoI reconstruction by the μFast sequences inside the framework infrastructure]
Sequences shown:
- RPC data access (PAD Id; IDC for RPC: PAD -> CM, up to 8 CM -> fired channel)
- Level-1 emulation (PAD -> trigger pattern)
- RPC pattern recognition (-> muon roads)
- MDT data access (IDC for MDT: CSM -> Amt hits; prepared digits)
- Feature extraction (-> Muon Features)
- MDT pattern recognition
- Monitoring (filling histos for monitoring)
μFast and total latency time
Optimized code run on a 2.4 GHz processor.
– Signal: single muon, p_T = 100 GeV
– Cavern background: High Lumi x 2
The total latency shows timings made on the same event sample before and after optimizing the MDT data access.
Optimized version:
– total data access time ~800 μs;
– data access takes the same CPU time as μFast.
μFast takes ~10% of the Level-2 latency.
Cavern background does not increase the processing time.
[Figure panels: Total; μFast; first implementation]
Conclusions
μFast is suitable to perform the muon trigger selection in ATLAS L2.
BARREL RESULTS:
– μFast reconstructs muon tracks in the Muon Spectrometer and measures the p_T at the interaction vertex with a resolution of 5.5% at 6 GeV and 4% at 20 GeV;
– μFast reduces the LVL1 trigger rate from 10.6 kHz to 4.6 kHz (6 GeV), and from 2.4 kHz to 0.24 kHz (20 GeV).
The algorithm is fully implemented in the Online framework.
The algorithm and data access times match the L2 trigger latency: the code is now ready to undergo a further optimization phase, devoted mainly to standardizing the software components.
Backup transparencies
Requirements for implementation
- L2 latency time set to 10 ms;
- Thread safety;
- Data access in a restricted geometrical region (RoI seeding);
- Hide aspects of data access behind offline StoreGate interfaces;
- Use RDO (Raw Data Object) as the atomic data component:
  – translate the bytestream raw data into RDO;
  – conversion mechanism integrated into the data access.
- Standardize the data access for every subdetector:
  – general region lookup to implement the RoI mechanism;
  – common interfaces for detector-specific code, e.g. RDO converters;
  – force a common structure for the RDOs, as far as possible: fit it into detector modules.
- ROB (ReadOut Buffer) access and data preparation/conversion on demand.
[Figure: trigger architecture]
RPC bytestream
The RPC bytestream reflects the organization of the trigger logic:
– ROD -> Rx -> PAD -> Coincidence Matrix (CMA) -> CMA channel
– 1 ROD = 2 Sector Logic = 2 Rx; the RPC detectors are read out by 64 Sector Logic;
– up to 7 PADs in an Rx; up to 8 CMAs in a PAD (4 per view);
– CMA channels = 32/64 depending on the CMA side (Pivot/Confirm).
The Level-1 RoI is the intersection of a CMA processing the RPC eta view with a CMA processing the RPC phi view inside 1 PAD.
1 CMA: coincidences between RPC planes in a 3-dimensional area.
No way to fit the RPC bytestream into RPC detector modules!
[Figure: confirm plane (high p_T), pivot plane, confirm plane (low p_T). Only odd-numbered CMAs are shown; CMAs overlap in the confirm planes, but not in the pivot plane.]
RPC RDO definition
Different types are needed:
– “bare” RDO as persistent representation of the bytestream; it contains raw data from the Level-1 and is used by μFast to run the Level-1 emulation on one RoI;
– “prepared” RDO (or RIO, Reconstruction Input Object), obtained from the RDO with some manipulation of the data to resolve the overlap regions and to associate space positions to the hits. Used by the offline reconstruction.
BARE: a convenient way of organizing RDOs in the IDC is according to PAD. Data requests are simplified thanks to the close correspondence between PAD and RoI.
- PAD -> Coincidence Matrix -> Fired CMA channel
- Data are strictly limited to those needed: no overhead introduced in the data decoding.
PREPARED: stored in StoreGate in a hierarchical structure as defined by offline identifiers, up to the RPC chamber modules.
[Diagram: PAD -> CM (up to 8 CM) -> fired channel]
MDT bytestream
MDT bytestream organization: ROD -> Chamber Service Module (CSM) -> TDC -> TDC channel
– 1 ROD = 1 trigger tower (φ x η x r = 1 x 2 x 3);
– 1 CSM reads 1 MDT chamber; one CSM can have up to 18 TDCs;
– 1 AMT (ATLAS Muon TDC) can have up to 24 channels (= “tubes”).
Fitting the MDT bytestream into MDT detector modules is trivial.
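The limits quoted here (up to 18 TDCs per CSM, up to 24 channels per AMT) bound the number of tubes one CSM can read out at 18 x 24 = 432. The sketch below shows one way to address a tube within a chamber by a flat slot number. The slot numbering is a hypothetical convention for illustration and is not the actual online/offline cabling map.

```python
MAX_TDC_PER_CSM = 18    # from the slide
CHANNELS_PER_AMT = 24   # from the slide
MAX_TUBES_PER_CSM = MAX_TDC_PER_CSM * CHANNELS_PER_AMT


def tube_slot(tdc_id, channel_id):
    """Flatten (TDC, channel) into a single slot number within one CSM."""
    if not 0 <= tdc_id < MAX_TDC_PER_CSM:
        raise ValueError(f"TDC id {tdc_id} out of range")
    if not 0 <= channel_id < CHANNELS_PER_AMT:
        raise ValueError(f"channel id {channel_id} out of range")
    return tdc_id * CHANNELS_PER_AMT + channel_id


def decode_slot(slot):
    """Inverse of tube_slot: recover (TDC, channel) from the slot number."""
    if not 0 <= slot < MAX_TUBES_PER_CSM:
        raise ValueError(f"slot {slot} out of range")
    return divmod(slot, CHANNELS_PER_AMT)


print(f"at most {MAX_TUBES_PER_CSM} tubes per CSM")
print(tube_slot(17, 23), decode_slot(tube_slot(5, 7)))
```

A fixed upper bound like this is what allows per-chamber storage to be sized once, in line with the stated aim of avoiding memory allocation on demand.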
MDT RDO definition
Different types are needed:
– “bare” RDO as persistent representation of the bytestream; it contains MDT raw data and is used by μFast to confirm the Level-1 RoI;
– “prepared” RDO contains refined info (drift time, calibrated time, radius, error).
BARE: a convenient way of organizing RDOs in the IDC is according to CSM, because a CSM can be closely matched both to a detector element and to the trigger tower read-out. No ordering is foreseen for AMT data words.
- CSM -> AMT hit (AMT data word)
- Data access according to chambers is not efficient: optimization needed.
PREPARED: stored in StoreGate with the same structure as the RDO, but contains a list of offline MDT digits.
[Diagram: CSM (= MDT chamber) -> MdtAmtHit]
Optimization of MDT data access
The standard implementation of MDT data access was not efficient:
- ~7 chambers required per RoI, but typically only 3 chambers have muon hits;
- direct impact on the timing performance because:
  – MDT occupancy is dominated by cavern background;
  – MDT converter time scales linearly with the chamber occupancy.
A more efficient access scheme has been implemented using:
- Muon Roads: a refinement of the RoI region, available after L1 emulation. The widths of the Muon Roads are smaller than the chamber size.
- An optimized way of accessing the detector elements: it selects detector elements according to the station (Innermost, Middle, Outermost), the sector and the track path.
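The expected gain follows directly from the linear scaling. The sketch below assumes that every chamber carries the same background-dominated occupancy and that decoding has a fixed cost per hit; both numbers are hypothetical and only their ratio matters.

```python
def converter_time(hits_per_chamber, time_per_hit_us):
    """Converter time, assuming it scales linearly with the hits decoded."""
    return time_per_hit_us * sum(hits_per_chamber)


HITS_PER_CHAMBER = 40   # hypothetical, background-dominated occupancy
TIME_PER_HIT_US = 1.0   # hypothetical decoding cost per hit

roi_based = converter_time([HITS_PER_CHAMBER] * 7, TIME_PER_HIT_US)
road_based = converter_time([HITS_PER_CHAMBER] * 3, TIME_PER_HIT_US)
ratio = road_based / roi_based

print(f"RoI-based access:  {roi_based:.0f} us")
print(f"Road-based access: {road_based:.0f} us ({ratio:.0%} of RoI-based)")
```

Under these assumptions, accessing 3 chambers instead of 7 cuts the converter time to about 3/7 of its original value, whatever the per-hit cost.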
Bytestream dataflow
[Diagram elements: ROD emulation; RoI B. emulation; RoI; Bytestream; RDO Converter; RIO Converter; RIO; μFast; EF and offline rec.; L2 Detector Description (Readout Cabling, Detector Geometry, Online vs Offline map); Offline Detector Description; RDO use; Simulation use]
μFast uses a dedicated detector description code to reconstruct RDOs:
– standalone implementation to ease the integration in HLTSSW;
– detector geometry organized according to readout hierarchy;
– minimal use of STL containers.
μFast implementation
- The processing tasks are implemented by “process” classes:
  – they act on “C style” data structures; no use of the offline EDM;
  – process versioning is implemented through inheritance.
- The “sequence” classes manage the execution of processes and publish the data structure towards the processes and the sequences:
  – they provide interfaces to framework components: MessageSvc, TimerSvc, etc.
- Minimal use of STL containers. No memory allocation on demand.
[Class diagram:
 ProcessBase: pure virtual implementation
 ProcessStd: concrete implementation of the data structure I/O and printouts
 ProcessTYP: concrete implementation of the task type
 Sequence (name: string; type: integer; data: struct; methods: giveData(), start()) runs ProcessStd (name: string; type: integer; data&: struct; methods: run(), printout())]
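The process/sequence split can be illustrated with a small sketch. The real code is not shown on the slide, so this is only a Python rendering of the class diagram: the class and method names (ProcessBase, ProcessStd, Sequence, run(), printout(), giveData(), start()) are taken from it, while the concrete task CountHits, the add() helper and the layout of the shared data structure are invented for the example.

```python
from abc import ABC, abstractmethod


class ProcessBase(ABC):
    """Pure virtual interface that every process has to implement."""

    @abstractmethod
    def run(self):
        """Execute the task; return True on success."""

    @abstractmethod
    def printout(self):
        """Return a printable summary of the process state."""


class ProcessStd(ProcessBase):
    """Holds a reference to the data structure published by the sequence
    and implements the printout; run() is left to the concrete task."""

    def __init__(self, name, type_id, data):
        self.name = name
        self.type = type_id
        self.data = data  # shared reference, not a copy

    def printout(self):
        return f"{self.name} (type {self.type}): {self.data}"


class CountHits(ProcessStd):
    """Hypothetical concrete task: count the hits found in the shared data."""

    def run(self):
        self.data["n_hits"] = len(self.data["hits"])
        return True


class Sequence:
    """Owns the data structure and runs its processes in order."""

    def __init__(self, name, type_id):
        self.name = name
        self.type = type_id
        self.data = {"hits": [], "n_hits": 0}  # stand-in for a C-style struct
        self.processes = []

    def giveData(self):
        return self.data

    def add(self, process_cls, name):
        self.processes.append(process_cls(name, self.type, self.giveData()))

    def start(self):
        # Stop at the first failing process, as all() short-circuits.
        return all(p.run() for p in self.processes)


seq = Sequence("demo", 1)
seq.giveData()["hits"].extend([101, 102, 103])
seq.add(CountHits, "count")
ok = seq.start()
print(ok, seq.processes[0].printout())
```

Keeping the data in one structure owned by the sequence means that a new version of a task can be swapped in through inheritance without touching the data layout.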