Trigger, Online, and Computing Readiness
A. J. Lankford, University of California, Irvine
ATLAS Physics Workshop of the Americas, August 3-5, 2009
Trigger, Online, and Computing Readiness
Outline:
1. DAQ/HLT
2. Trigger
3. Software
4. Computing
5. Summary & Closing
There are far too many activities and far too much progress to report in full, so a few illustrative highlights are selected here. For a more complete picture, see the July 2009 ATLAS Week talks.
TDAQ Schematic
[Figure: schematic of the ATLAS trigger and data acquisition architecture]
DAQ/HLT Overview
DAQ/HLT is ready for beams. Activities this calendar year have focused on:
– Improving uptime and operational stability of DAQ/HLT, e.g. the Start-Stop-Start transition and HLT prescale changes at luminosity-block boundaries
– Improving the tools used by detectors for ensuring efficient operations, e.g. the Run Control GUI, monitoring, and “stopless recovery”
– Preparing to sustain a long period of continuous operation: operational documentation was updated and completed during the May documentation stand-down, and TDAQ shifter training is modular for the different desks (Run Control, DAQ, LVL1, HLT)
DAQ/HLT has also supported detector operations throughout the period:
– April/May slice weeks, June cosmic run
– Fairly continuous detector tests outside of official/combined running periods
Regular TDAQ technical runs on the production DAQ/HLT system are important for testing software changes and measuring performance.
– Issue: what standalone periods will TDAQ have during the data-taking period?
– The pre-series DAQ/HLT is being decoupled from the production system to be used for software tests.
Measuring Data-taking Efficiency
Using a cosmic run with a simulated beam cycle: 77% for this test, and as high as ~83% has been achieved. Inefficiencies are being studied in order to improve further.
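The efficiency quoted above is, in essence, recording time divided by available beam time. A minimal sketch of that bookkeeping (the function and the interval values are hypothetical, not taken from the TDAQ software):

```python
# Hypothetical sketch: data-taking efficiency as the fraction of the
# (simulated) stable-beam period during which the DAQ was recording.
def datataking_efficiency(beam_window, recording_intervals):
    """beam_window: (t_start, t_end) of the stable-beam period.
    recording_intervals: list of (t0, t1) spans with the run recording."""
    b0, b1 = beam_window
    recorded = 0.0
    for t0, t1 in recording_intervals:
        # Clip each recording span to the beam window before summing.
        lo, hi = max(t0, b0), min(t1, b1)
        if hi > lo:
            recorded += hi - lo
    return recorded / (b1 - b0)

# A 100-minute simulated beam cycle with two stops: 77 minutes recorded.
eff = datataking_efficiency((0, 100), [(0, 40), (45, 82)])
```

Studying where the non-recording gaps come from (transitions, stopless recovery, operator actions) is exactly the "study inefficiencies" step the slide mentions.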
Improving FSM Transition Times
Campaign to improve transition times during the April-May slice weeks:
– Feedback to detector groups
– Time-consuming operations put in parallel
Start-up time improved from 9 min (10 Sept 2008) to 7 min (2 July 2009), dominated by HLT configuration. Still room to improve!
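The "put time-consuming operations in parallel" point can be illustrated with a toy sketch (subdetector names and timings are invented; the real FSM transitions are run-control actions, not Python calls): configuring subsystems concurrently makes the transition take roughly as long as the slowest single step, rather than the sum of all steps.

```python
import concurrent.futures
import time

# Hypothetical per-subdetector configure step, modeled as a sleep.
def configure(subdetector, seconds):
    time.sleep(seconds)
    return subdetector

# Invented configure durations in seconds (stand-ins for minutes).
subdetectors = {"Pixel": 0.2, "SCT": 0.1, "LAr": 0.3, "Tile": 0.1}

start = time.monotonic()
with concurrent.futures.ThreadPoolExecutor() as pool:
    done = list(pool.map(lambda kv: configure(*kv), subdetectors.items()))
elapsed = time.monotonic() - start
# Parallel wall time is ~max(step) = 0.3 s, versus sum(steps) = 0.7 s serially.
```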
DAQ/HLT Hardware Status
DAQ, Online, Monitoring: ~100% installed (~360 machines). New servers were recently installed, and new central file storage is now being installed.
HLT: ~35% installed (840 nodes). An additional 300 nodes (bringing the system to ~50%) will be purchased late this year for the 2010 run; subsequent purchases will be driven by the run schedule and needs.
Trigger Overview
Recall the ATLAS TDAQ architecture, a three-level trigger:
– Level 1: L1 Calorimeter Trigger, L1 Muon Trigger, Central Trigger Processor (CTP)
– High-Level Trigger (HLT): Level 2 (RoI-based) and Event Filter
Trigger slices (selection chains): e/gamma, Muon, Tau, Jet, Missing ET, Minimum Bias, B Physics, B tagging
Trigger core software, e.g.: Steering, Configuration, Monitoring, Trigger Decision Tool
Trigger Commissioning & Preparations
Commissioning:
– The trigger was ready for beam in 2008.
– Progress was achieved with the brief single-beam run and with extensive cosmics runs: the Level 1 (muon, calorimeter) triggers selected events from the start and reliably provide events for detector commissioning; the High-Level Trigger (HLT) successfully streamed single-beam events and selected and streamed cosmic events; the cosmics runs provided experience with prolonged stable running.
– This invaluable experience guides this year’s activities, which focus on addressing weak areas, improving robustness, and preparing for the unexpected.
Further preparations:
– Trigger menus: for evolving machine conditions/luminosity
– Trigger tools: for verifying/monitoring correct trigger function to assist physics analysis
– Operational model: for menu/algorithm changes/interventions
As a result, the trigger is even better prepared for running in 2009.
Trigger Timing with Single Beam
The first beams on September 10-12 were very useful for synchronizing the various sub-detectors, in particular for starting to time-in the trigger. The timing of the various components (sub-detectors, trigger system) was synchronized with respect to the beam pick-up (BPTX) reference, and the signal times of the various triggers were adjusted to match it. The plots show the improvement from 10 September to 12 September.
[Figure: trigger timing relative to BPTX on 10 September and 12 September; note the different scales; RPC not yet adjusted]
Highlights from Cosmic Runs
The complete HLT infrastructure was tested:
– Including algorithm steering & configuration, and the online and offline monitoring
– Weak points were identified and were/are being addressed, e.g. a new trigger rate display
L2 inner detector tracking algorithms were useful for selecting events with track candidates (top figure):
– Algorithms were modified to accept tracks not coming from the nominal IP
– Used to create a stream enriched in good track candidates for inner detector and offline tracking commissioning
Many HLT algorithms were exercised with cosmics by relaxing selection thresholds and requirements (bottom figure); many other examples are possible.
[Top figure: Level 2 tracking efficiency for events with a good track reconstructed offline. Bottom figure: E3x7/E7x7 cells in the LAr calorimeter for trigger clusters matching a cluster reconstructed offline.]
Trigger Commissioning Plans
Combined ATLAS cosmic run will start at T0 – 4 weeks, with all systems and 24h coverage:
– The trigger will start with the already familiar Level 1 selection for cosmic events; the menu will be ready at T0 – 10 weeks, to be deployed at T0 – 8 weeks for runs with some systems
– The HLT will run in pass-through mode: exercise the algorithms, event data, monitoring, etc. without rejecting events
– Data streaming by the HLT based on L1 trigger type and on tracking/cosmic event algorithms
– Exercise HLT algorithms with loose event selection to accept and process cosmic events
Single-beam events to be selected with a dedicated menu:
– Based on the beam pick-ups and minimum bias scintillators
– Refine the timing of signals from the various detector systems
– Continue to exercise HLT algorithms in pass-through mode using beam-gas events and halo muons
Initial collisions triggered with Level 1 only:
– A significant amount of work, e.g. on Level 1 calibration, needs to be done with initial collisions
– These data will be essential for commissioning of the detectors, the Level 1 trigger, and the HLT selections
HLT deployed to reject events only when needed to keep the event rate within budget:
– Both Level 1 and HLT prescales can now be updated during the run to increase operational flexibility; prescale factors are constant within luminosity blocks
Prepared to have fast feedback from the trigger on the collected samples:
– Using Tier 1 and/or the CAF to process events and create monitoring histograms and dedicated ntuples with fast turnaround
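The prescale mechanism in the last two bullets can be sketched as a toy model (not the actual trigger code): a prescale of N keeps one out of every N accepted events, and the factor is held constant within each luminosity block, changing only at block boundaries.

```python
# Toy sketch of a prescaled trigger chain. A prescale of N accepts one of
# every N Level-1 accepts; as on the slide, the factor may only change at
# luminosity-block boundaries, so every event in a block sees the same N.
class PrescaledChain:
    def __init__(self, prescale_by_block):
        self.prescale_by_block = prescale_by_block  # {lumiblock: N}
        self.counter = 0
        self.current_block = None

    def accept(self, lumiblock):
        if lumiblock != self.current_block:
            self.current_block = lumiblock
            self.counter = 0           # restart counting at the boundary
        n = self.prescale_by_block[lumiblock]
        self.counter += 1
        return self.counter % n == 0   # keep every n-th trigger

chain = PrescaledChain({1: 2, 2: 5})
kept_block1 = sum(chain.accept(1) for _ in range(10))  # prescale 2: 5 of 10 kept
kept_block2 = sum(chain.accept(2) for _ in range(10))  # prescale 5: 2 of 10 kept
```

Holding the factor fixed within a luminosity block is what makes the recorded rate for each chain well defined per block, which is needed later when computing integrated luminosity per trigger.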
Software Overview + Simulation
Software (e.g. reconstruction + simulation) is ready for first collisions:
– Strong emphasis on software quality & validation, and over the last year also on performance
– Software has been extensively exercised through simulation & reconstruction for studies of physics potential (CSC book) and through cosmics reconstruction
Simulation status:
– The ATLAS implementation of the Geant4 simulation is well under control; it is in continuous production, with 300 M events simulated
– Maintenance requires a thorough cycle of bug fixing, technical validation, recalibration and retuning, and physics validation
– Ongoing campaigns: CPU time of full simulation (the large η coverage and shower modeling require large CPU resources, ~8000 HS06-s/event); high-luminosity pile-up (overlay of events requires large memory resources)
Cosmics Reconstruction & Reprocessing
Reconstruction software has been extensively tested with simulated events during computing challenges and physics studies. Reconstruction of real data recorded during cosmics runs enables tests of reconstruction of real tracks in the presence of actual detector imperfections.
>300 million events, >500 TB of raw data were recorded during Oct-Nov 2008 (a large fraction of the ~1 billion events, 2000 TB expected per year):
– Reconstructed at Tier-0
– Fully reprocessed twice at the Tier-1s: Christmas 2008 with a software job failure rate of 3%; Spring 2009 improved to 0.3%, with all reconstruction algorithms enabled, including most of the object ID
– Fast reprocessing exercised recently
Cosmics have provided excellent exercises in software preparation & validation, handling of jobs and data replication, and analysis.
Software Performance Challenges - 1
Performance of the software is crucial to achieving our physics goals within our limited computing resources.
Simulation time: see above.
Reconstruction time:
– CPU for MC t-tbar is within the 60 HS06-s (15 kSI2K-s) per event target
– Charged particle thresholds & trigger menus must be chosen carefully for the pile-up conditions expected during the first run (up to 7 events per crossing)
Memory usage:
– Grid processors are multi-core servers with 2 GB of memory per core, so the memory footprint of reconstruction jobs must be below 2 GB to fully exploit the cores.
– The Performance Management Board has been working since 2008 to reduce usage. Goal: run well within 2 GB, even with pile-up and increased usage by calibration. Tools have been developed to monitor usage, the (many) developers are being trained, and progress is being made, but there are many contributions.
– Longer term: share memory among cores doing similar things. This will be especially important as future processor chips acquire more cores.
AOD read speed: see the next slide.
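A job's footprint against the 2 GB-per-core budget can be checked from inside the process itself; here is a minimal sketch using Python's standard resource module (Unix-only; the normalization accounts for ru_maxrss being reported in kilobytes on Linux but bytes on macOS). This is an illustration of the kind of monitoring described, not the actual tooling built by the Performance Management Board.

```python
import resource
import sys

# Sketch: check a job's peak resident set size against a 2 GB per-core budget.
def peak_rss_bytes():
    rss = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is kilobytes on Linux but bytes on macOS, so normalize.
    return rss if sys.platform == "darwin" else rss * 1024

BUDGET = 2 * 1024**3  # 2 GB per core, as quoted on the slide
within_budget = peak_rss_bytes() < BUDGET
```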
Software Performance Challenges - 2
AOD read speed:
– Read speed presently dominates (simple) analysis time.
– A detailed study of per-object read speed and optimization is underway, focused on achieving the native performance of ROOT. Some improvements are ready for 15.4; all identified improvements will be ready for turn-on; the effort will be ongoing.
Software Preparations for 1st Collisions
Emphasis on software “quality” before collisions, e.g. much code clean-up.
Construction of a “test scaffold”:
– Nightly release build, followed by a few-events interactive test and a few-hundred-events batch test (reconstruction expert shifter, since Nov 2008)
– Tier0 cache nightly build, followed by the Tier0 Chain Test, 10k events reconstructed through to Data Quality histograms (validation shifter, since Jan 2008)
– Major release, followed by the Big Chain Test (BCT) of several million events
– Tier0 reconstruction monitoring (Tier0 shifter)
Emphasis on software “stability”:
– The release for first collisions is to be built imminently (15.4.0); it requires ~4 subsequent weeks for integration with the TDAQ release and the HLT at Point 1.
– The possibility of a subsequent release (15.5.0) is under study, but the schedule is very tight.
Computing Infrastructure & Operation
ATLAS world-wide computing (WLCG): ~70 sites, including the CERN Tier0, 10 Tier-1s, and ~40 Tier-2 federations.
[Figure: map of WLCG sites]
19
reconstruction analysis Physics analysis at Tier2 Physics analysis at Tier2 Event Summary Data raw data Reprocessing at Tier1 Reprocessing at Tier1 Simulation at Tier1/2 Simulation at Tier1/2 analysis objects (extracted by physics topic) 1st pass raw data reconstruction at Tier0 and export 1st pass raw data reconstruction at Tier0 and export processed data simulation interactive physics analysis Computing 4 main computing operations according to the Computing Model: First-pass processing of detector data at CERN Tier0 and data export to Tier-1’s/Tier-2’s Data re-processing at Tier-1’s using updated calibration constants Simulation of Monte Carlo samples at Tier-1’s and Tier-2’s (Distributed) physics analysis at Tier-2’s and at more local facilities (Tier-3’s) Actual Computing Model (CM) is much more complex: includes data organization, placement and deletion strategy, disk space organization, database replication, bookkeeping, etc. CM and above operations have been exercised and refined over the last years through functional tests and data challenges of increasing functionality, realism and size.
Recent Computing Tests - STEP09
STEP09 (Scale Testing for the Experiment Program 09) was a computing systems commissioning test, performed in conjunction, and at times concurrently, with the other LHC experiments. It involved all major computing activities:
– Simulation production
– Full-chain data distribution
– Reprocessing at Tier-1s
– User analysis challenge: HammerCloud
– ATLAS Central Services infrastructure
A successful exercise for ATLAS, with lessons learned:
– Reprocessing from tape worked well, with CMS concurrently active at the shared T1s; 6 T1s exceeded targets, 3 were close, 1 lagged
– Data distribution was generally good, but we learned how fast backlogs can build up
– ATLAS central services worked well, but there is always room for improvement
– Analysis was tested at very high rates at T2s, with encouraging results; further tests are needed
– Too much effort was required; more automation is needed
[Figure: data transfer rates (MB/s), Tier0 to Tier-1s and Tier-1s to Tier-1s, June 2009. A peak rate of 4 GB/s, higher than the nominal 1-2 GB/s at the LHC, was sustained over 2 weeks using a mixture of cosmics and simulated data.]
Distributed Analysis Tests
Massive, “chaotic” access to data over the grid by users performing distributed analysis has not yet been thoroughly tested.
– Robotic tests (HammerCloud) have run on some analysis queues since Fall 2008 and on all analysis queues since Spring 2009. They use a handful of analysis job types from actual users, were an important component of the STEP09 tests, and provide important benchmarks (and define limitations).
– Tests by actual users under “battlefield conditions” are also needed. STEP09 involved 14 experienced users running on a single data sample in the U.S. A proper test requires a very large data sample (generated with fast simulation).
Outline of the plan:
– Start again with experienced users on one cloud, then expand to more users.
– Some users run jobs on T2s (D3PD production); all copy from T2 to local work space.
– Expand to all clouds; include analysis jobs across clouds.
– Multiple test/improve iterations will likely be needed; the first large-scale test is this summer.
Job success efficiency (%) by site:
  AGLT2  MWT2  NET2  SLAC  SWT2
    59*    80    74    84    75
* One user made many attempts here before overcoming a user configuration issue.
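The per-site efficiencies in the table are just succeeded-over-submitted ratios. A trivial sketch of the bookkeeping (the absolute job counts below are invented for illustration; only the resulting percentages match the table):

```python
# Hypothetical job counts per site; the totals are invented, chosen so the
# resulting percentages match the slide's table.
outcomes = {
    "AGLT2": {"succeeded": 59, "total": 100},
    "MWT2":  {"succeeded": 80, "total": 100},
    "NET2":  {"succeeded": 74, "total": 100},
    "SLAC":  {"succeeded": 84, "total": 100},
    "SWT2":  {"succeeded": 75, "total": 100},
}

def success_pct(site):
    o = outcomes[site]
    return 100.0 * o["succeeded"] / o["total"]

table = {site: success_pct(site) for site in outcomes}
```

As the AGLT2 footnote illustrates, raw job success rates mix genuine site problems with user configuration errors, so repeated-attempt patterns need to be folded into the interpretation.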
Computing Resources 2009-2010 *
              2009 CPU   2009 Disk  2009 Tape  2010 CPU   2010 Disk  2010 Tape
              (kHS06)    (PB)       (PB)       (kHS06)    (PB)       (PB)
Tier0+CAF          57        3.4        5.1         67        4.0        9.0
Tier-1 (sum)       81       12.3        8.3        217       21.9       14.2
Tier-2 (sum)      111       10.8          -        240       20.9          -
* Requested; not yet approved
Trigger, Data Acquisition, Software, and Computing: Summary
These systems are in a remarkable state of readiness for delivering data to the ATLAS physics program, particularly considering that they have not yet seen collisions.
– Single-beam and cosmics data have been exploited.
– Testing, testing, testing.
However, colliding beams will likely present new stresses and surprises. Some remaining challenges:
– Trigger: rapid commissioning (timing, tuning, etc.) while delivering physics data to storage; flexible and timely response to rapidly changing conditions.
– DAQ/HLT: efficient operation, minimizing downtime and deadtime, throughout a long run.
– Software & Computing: the new challenges of analysis on the grid; timely, reliable improvements and reprocessings in response to unforeseen needs; robust operation, with minimum downtime and smooth running through disturbances.
Trigger, Data Acquisition, Software, and Computing: Concluding Remarks
It is my privilege to report to you the status of these projects and activities. These systems transport data from the detector to your desktop and transform the raw data streams into the data sets for your physics.
Don’t take these systems for granted. IT is the enabling technology that allows us to sift through the enormous data sets produced by the unprecedented luminosities required at this √s. Although the IT industry provides the hardware that makes these systems possible, it is the system designs and software that harness that power into a form we can use.
– This is a remarkable technical challenge, as significant as any faced by ATLAS.
When you sit in front of your workstation awaiting your plots, don’t lament; consider contributing to further improvements in these critical areas.
Backup Slides
HLT Tracking: efficiency vs. impact parameter
The following three figures (starting with this one) show the L2 event reconstruction efficiency for 2008 cosmic data, separately for the three Inner Detector tracking algorithms that were running at L2. The L2 efficiency is defined with respect to an event with an offline track and was measured for the RPC L1 stream, since the cosmic algorithms at L2 were used to trigger events only from this stream.
The offline track is required to have at least three silicon space points (SP; the number of pixel hits plus the number of SCT hits divided by two) in the upper and three in the lower part of the silicon barrel. Either of the two track arms can be independently reconstructed at L2 by the silicon algorithms. If there is more than one such track in an event, the track with the most pixel hits is used as the reference (the most SCT hits if there are no pixel hits). We require |d0| < 200 mm for plots other than the d0 dependency.
The track is also required to be within the Transition Radiation Tracker (TRT) read-out time window, because of the large RPC trigger jitter. The TRT reads out three bunch crossings (BC). The TRT operates on the principle of a drift chamber, where the time of arrival of the signal varies over a range of about 50 ns depending on the track-to-wire distance within the TRT detector element (straw). The optimal time window for the TRT is therefore considerably smaller than 75 ns (three BC). Outside the optimal range, hits either have poor tracking accuracy or are lost completely. This has a clear impact on the track reconstruction efficiency, as can be seen in the figure. For plots other than this one we therefore require -10 ns < TRT EP < 25 ns.
[Figure 1: L2 reconstruction efficiency as a function of TRT event phase (EP). Different symbols indicate the different L2 algorithms, as shown in the legend. The L2 efficiency falls off sharply at the edge of the TRT read-out time window. Events with invalid EP (mostly events with no TRT hits on the track) are not shown. With the |d0| < 200 mm requirement.]
[Figure 2: L2 reconstruction efficiency as a function of track impact parameter d0. Different symbols indicate the different L2 algorithms, as shown in the legend. With the -10 ns < TRT EP < 25 ns requirement.]
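The reference-track selection described above can be sketched in code. The field names and event representation here are hypothetical; only the cut values (three space points per arm, |d0| < 200 mm, -10 ns < EP < 25 ns) and the pixel-then-SCT tie-breaking rule come from the text.

```python
# Sketch of the reference-track selection: require >= 3 silicon space
# points in both the upper and lower barrel arms, prefer the track with
# the most pixel hits (most SCT hits if no track has pixel hits), then
# apply the d0 and TRT event-phase windows.
def pick_reference(tracks):
    candidates = [t for t in tracks
                  if t["sp_upper"] >= 3 and t["sp_lower"] >= 3]
    if not candidates:
        return None
    if any(t["pixel_hits"] > 0 for t in candidates):
        return max(candidates, key=lambda t: t["pixel_hits"])
    return max(candidates, key=lambda t: t["sct_hits"])

def passes_cuts(track):
    # d0 in mm, TRT event phase in ns, as quoted in the text.
    return abs(track["d0"]) < 200.0 and -10.0 < track["trt_ep"] < 25.0

# Two toy offline tracks; the first has more pixel hits and wins.
tracks = [
    {"sp_upper": 4, "sp_lower": 3, "pixel_hits": 3, "sct_hits": 6,
     "d0": 50.0, "trt_ep": 5.0},
    {"sp_upper": 3, "sp_lower": 5, "pixel_hits": 1, "sct_hits": 8,
     "d0": 120.0, "trt_ep": 12.0},
]
ref = pick_reference(tracks)
```

The efficiency in the figures is then simply the fraction of such reference events in which an L2 silicon algorithm also reconstructed a track.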
E/gamma Slice: shower shape
The figure shows the shower shape R_eta used for electron and photon selections, calculated at Level-2 and in the Event Filter. It is the ratio of the energy deposit in 3x7 cells (corresponding to 0.075 x 0.175 in Delta eta x Delta phi) over 7x7 cells in the second EM sampling. Clusters are shown only if they could be matched to an offline cluster with ET > 5 GeV.
Note that R_eta can take values above one due to the electronic shaping function used in the LAr calorimeter: it is set up so that noise contributions fluctuate around zero instead of producing an offset, so cell energies can be negative. As a result, the total energy deposit in the 3x7 cells can be larger than that in the 7x7 cells for small signals.
The plot was made using run 90272. It shows nicely the form of the R_eta variable obtained in cosmic data taking and conveys the message that the HLT e/gamma trigger is technically functioning.
[Figure: E3x7/E7x7 cells in the LAr calorimeter for trigger clusters matching a cluster reconstructed offline]
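The R_eta calculation follows directly from the definition above. The sketch below uses a toy 7x7 grid of cell energies (3 cells in eta x 7 in phi for the numerator), not real calorimeter data; with small negative noise energies outside the core, the ratio indeed exceeds one, as the text explains.

```python
# R_eta = E(3x7) / E(7x7) in the second EM sampling, computed from a 7x7
# grid of cell energies (rows = eta, columns = phi, cluster at the centre).
def r_eta(cells):
    """cells: 7x7 nested list of cell energies."""
    e7x7 = sum(sum(row) for row in cells)
    # Numerator: the central 3 eta rows (indices 2..4), all 7 phi columns.
    e3x7 = sum(sum(row) for row in cells[2:5])
    return e3x7 / e7x7

# Toy cluster: all the signal in the 3x7 core, small negative "noise"
# energies outside it, mimicking the LAr shaping behaviour described above.
cells = [[-0.01] * 7 for _ in range(7)]
for i in range(2, 5):
    cells[i] = [1.0] * 7

val = r_eta(cells)  # slightly above 1 for this toy cluster
```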