Software Validation Infrastructure for the ATLAS Trigger
Wolfgang Ehrenfeld – DESY, on behalf of the ATLAS Trigger Validation Group
CHEP 2009 – Prague – 26th March 2009

Content
- introduction: the ATLAS trigger; the ATLAS trigger software project
- validation: the ATLAS nightly build system; automatic testing and monitoring
- new developments: the trigger software validation dashboard
- summary

ATLAS Trigger: Overview
A 3-level trigger system:
1. LVL1 (hardware): decision based on data from the calorimeters and the muon trigger chambers; synchronous at 40 MHz with bunch-crossing identification; latency 2.5 μs; output rate < 75 (100) kHz
2. LVL2 (software): uses Region-of-Interest data identified by LVL1 (ca. 2% of the event) with full granularity from all detectors; latency ~40 ms; output rate ~3 kHz
3. Event Filter (software): has access to the full event and can perform more refined event reconstruction; latency ~4 s

ATLAS Trigger & DAQ Architecture (diagram): LVL2 and the Event Filter run in large PC farms of 8-core nodes on the surface; the LVL1 output rate of < 75 (100) kHz enters the farms, and ~30 PCs handle the transfer to storage.

ATLAS Project Dependencies (diagram of the online/offline project stack): on the offline side Gaudi and LCGCMT at the base, then AtlasCore, AtlasConditions, AtlasEvent, AtlasTrigger, AtlasSimulation, AtlasReconstruction, AtlasAnalysis, AtlasOffline, AtlasProduction and AtlasTier0; on the online side tdaq-common, dqm-common, DetCommon, tdaq and AtlasHLT/HLTMon, with Inner Detector and LVL1 components; AtlasP1HLT is marked as proposed but not yet implemented.

Trigger Software Project
- project details: ~100 developers; ~250 packages distributed over several build projects (DetCommon, AtlasEvent, AtlasConditions, AtlasTrigger, AtlasAnalysis); roughly 10% of the full ATLAS software
- used for: online running at L2 and the EF; offline development and simulation for L1, L2 and the EF
- requirements: stable running on the online farms (~16800 cores, run periods of ~12 hours); stable running for offline studies and MC production on the GRID
- validation goal: deliver high-quality software for stable online running, while the trigger code is expected to evolve continuously to improve trigger operation and performance
(table: AtlasTrigger project size by language, i.e. XML, C, Python and C/C++ headers, with counts of files and of blank, comment and code lines)

Code Development during Cosmic Data Taking (plot: code-change activity over time, with phases marked "cosmics prep", "beam prep" and "stable running")

Wolfgang EhrenfeldATLAS Trigger Software Validation8/25 Steps until Successful Data Taking

ATLAS Trigger Validation Group
- the trigger validation group: ~40 members from the different trigger areas (L1, L2, EF, the various slices, steering)
- main focus on daily validation, with shifts of 4 hours/day for 1 week: monitor the release status; locate problems and bugs; submit bug reports via Savannah; provide a daily status for developers, release coordination and users; develop tools, based on common ATLAS tools, to ease the validation work
- goals: big picture (release): ensure that the trigger part of a release or cache is fully functional; small picture (package): if a package fails (build, configuration, run time), identify the problem and notify the developer via the Savannah bug tracker; unify the validation (frame)work within the trigger community to save manpower
- why a dedicated group? a developer's scope for testing is limited (the code compiles and runs, some tests run); integration with other packages (within AtlasTrigger and from other projects) must be checked; some problems only show up after processing a large number of events

ATLAS Release Structure
- structure for a given release line: the base release; a production cache with mainly bug fixes needed for MC production (GRID); a T0 cache with mainly bug fixes for data taking (P1) and first reconstruction (T0)
- current release lines: MC production; cosmic reprocessing; external package validation, e.g. Geant4; development for data taking
- validation needs to ensure: a correct release or cache build; the run-time configuration; run-time execution; physics correctness

NIghtly COntrol System (NICOS)
- entry page to the nightly build system: lists all active builds (releases, caches); summarises the build status
- builds the full ATLAS software every night: lists the build status per package and per project; runs unit tests after each package build; runs full tests after each project build

NICOS – Project Summary
- summary of a project build: lists all build combinations, i.e. architecture (32/64 bit), operating system (SLC4/5), compiler version (3.4/4.3) and compiler optimization (opt/debug)
- summary of the build status: failed builds (packages); fraction of successful tests
(screenshot: build-status and test-status columns)

NICOS – Package Build for AtlasTrigger
- package status: build status (compilation, library build, auxiliary files); dependency check; unit test
- in case of problems an automatic notification is sent to the package managers (developers)
- caveats: unit tests are not widely used; the dependency check is run only on dedicated builds

AT Night Testing (ATN)
- ATN runs custom jobs with the full project framework on O(10) events to test run-time behaviour
- technical validation: run time (e.g. null-pointer accesses, infinite loops, ...); configuration (Python-based): missing or broken run-time configuration
- regression test: is the code doing the same as before? (a minimal log-file sketch follows below)
- runs automatically after the project build has finished; standalone running is also possible
- common infrastructure supplied by the trigger validation group: configuration and run-time environment; available tests (regression tests on log files and histograms); collection and presentation of status information
- the developer needs to supply: the job configuration for the test job; hooks for the regression test and code to fill the monitoring histograms; references
- enough information is available to find and diagnose problems easily, but sometimes too much information across all available releases/builds: a meta summary is needed
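To illustrate the log-file regression idea, here is a minimal sketch, not the actual ATN implementation: the file names and the filter pattern are assumptions. It keeps only the lines that should be stable from run to run and diffs them against a reference.

```python
# Minimal sketch of an ATN-style log-file regression test.
# File names and the filter pattern are hypothetical, for illustration only.
import difflib
import re

PATTERN = re.compile(r"(REGTEST|ERROR|FATAL)")  # assumed markers for comparable lines

def interesting_lines(path):
    """Keep only the lines that should be identical from run to run."""
    with open(path) as f:
        return [line.rstrip() for line in f if PATTERN.search(line)]

def log_regression(new_log, ref_log):
    """Return a unified diff; an empty list means the test passes."""
    return list(difflib.unified_diff(interesting_lines(ref_log),
                                     interesting_lines(new_log),
                                     fromfile=ref_log, tofile=new_log,
                                     lineterm=""))

if __name__ == "__main__":
    diff = log_regression("test_job.log", "reference.log")
    print("OK" if not diff else "\n".join(diff))
```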

ATN – Test Results (screenshots: log-file regression test and histogram comparison). The regression and histogram tests are essential to check the expected behaviour before a release deadline.

ATN – Meta Summary
Presents the most important information from every single ATN test in a compact way:
- quick overview of the release status (for release coordination), with a traffic light for quick scanning
- time dependence over the nightly build cycle (7 days)
- comparison between different releases, e.g. dev and devval
- one click gives access to the full details
- summary of exit codes, errors and regression-test results (a toy aggregation is sketched below)
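A toy sketch of how such a traffic light could be derived from per-test results; the fields and colour rules are assumptions for illustration, not the actual meta-summary logic.

```python
# Toy traffic-light aggregation over a set of test results.
# The TestResult fields and the colour rules are assumptions.
from dataclasses import dataclass

@dataclass
class TestResult:
    exit_code: int       # 0 means the job finished
    n_errors: int        # ERROR/FATAL messages found in the log
    regression_ok: bool  # log/histogram regression test passed

def traffic_light(results):
    if any(r.exit_code != 0 for r in results):
        return "red"     # at least one job crashed or timed out
    if any(r.n_errors > 0 or not r.regression_ok for r in results):
        return "yellow"  # jobs ran, but with errors or changed behaviour
    return "green"       # everything as expected

print(traffic_light([TestResult(0, 0, True), TestResult(0, 2, True)]))  # yellow
```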

Run Time Tester (RTT)
- RTT runs over O(1000) events in an automated process, similar to ATN, for every nightly build and for releases
- accumulating higher statistics allows: histogram-based physics validation; finding run-time problems that occur only under rare conditions; memory monitoring; CPU-time performance; floating-point exceptions; redundancy
- results, log files and (configurable) intermediate files are available on the web
- tests: part of the ATN tests are also run in RTT, plus additional tests that do not make sense on O(10) events
- samples: 40k minimum-bias events (1 s of data taking); 1k top-pair events (multi-purpose sample); other special samples, e.g. for B physics
See the talk by B. Simmons (Id 140, 15:20).

Memory Monitoring
- any kind of memory leak has a significant effect on the total memory budget; L2 example: 500 nodes of 8 cores each at 75 kHz gives ~20 events/core/s; with 1 GB/core installed and ~800 MB/application needed, only a 200 MB margin remains
- a 1 byte/event leak means ~20 B/s, i.e. ~70 kB/h or ~1 MB per 12 h run; 10 bytes/event give ~10 MB/run and 100 bytes/event ~100 MB/run (the arithmetic is spelled out below)
- the high input rate and the long online runs make any memory leak a problem, so memory monitoring is essential
- aims: L2: memory leak below 10 B/event; EF: memory leak below 1 kB/event
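The slide's budget arithmetic, spelled out as a small calculation with the numbers quoted above (the script itself is just illustrative):

```python
# Memory-leak budget from the slide: at ~20 events/core/s, a per-event
# leak accumulates over a 12-hour run as leak * rate * time.
RATE_HZ = 75e3 / (500 * 8)   # 18.75 events/core/s, ~20 on the slide
RUN_S = 12 * 3600            # one 12-hour run

for leak_bytes in (1, 10, 100):
    total_mb = leak_bytes * RATE_HZ * RUN_S / 1e6
    print(f"{leak_bytes:>3} B/event -> {total_mb:6.1f} MB per 12 h run")
# -> 1 B/event ~0.8 MB, 10 B/event ~8 MB, 100 B/event ~81 MB per run
```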

Memory Monitoring – Results
- a web page gives access to the memory-usage plot and to all job information
- shown are the total memory usage and the per-event leak, taken from the slope of a straight-line fit (14.3 kB in the example plot)
- the numbers are close to the requirements, but the tests are not run in a purely online environment
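A per-event leak estimate like the quoted slope can be obtained by fitting the resident memory against the event number; a minimal sketch with numpy, where the sample values are made up for illustration:

```python
# Minimal sketch: estimate a per-event memory leak as the slope of a
# straight-line fit of resident memory (kB) versus event number.
import numpy as np

event_numbers = np.arange(0, 1000, 50)
rss_kb = (800_000 + 14.3 * event_numbers            # fake baseline + leak
          + np.random.normal(0, 50, event_numbers.size))

slope_kb_per_event, offset_kb = np.polyfit(event_numbers, rss_kb, 1)
print(f"estimated leak: {slope_kb_per_event:.1f} kB/event "
      f"(baseline {offset_kb / 1024:.0f} MB)")
```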

CPU Time Monitoring
- if the average processing time per event exceeds the limit, trigger dead time increases and potentially interesting (e.g. Higgs) events get rejected, so CPU-time monitoring is crucial
- offline CPU-time monitoring is more difficult than offline memory monitoring: event mixing (done in technical runs last year); CPU specifications (online integration/cosmics); online vs. offline environment, e.g. I/O
- a few things are nevertheless possible: per-algorithm monitoring, with the normalization taken from online running (a toy timer is sketched below); coarse estimates (there is no stable test environment with regard to CPU)
(plot: L2 jet trigger timing, pT > 20 GeV)
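Per-algorithm CPU accounting of the kind mentioned above can be illustrated with a small timing helper; the class and the algorithm name are invented for the example and are not ATLAS code.

```python
# Toy per-algorithm CPU-time accounting: accumulate process time per
# algorithm name and report the average per event.
import time
from collections import defaultdict

class AlgTimer:
    def __init__(self):
        self.cpu = defaultdict(float)
        self.calls = defaultdict(int)

    def run(self, name, alg, event):
        t0 = time.process_time()          # CPU time, not wall-clock time
        alg(event)
        self.cpu[name] += time.process_time() - t0
        self.calls[name] += 1

    def report(self):
        for name in sorted(self.cpu):
            mean_ms = 1e3 * self.cpu[name] / self.calls[name]
            print(f"{name:20s} {mean_ms:8.2f} ms/event")

timer = AlgTimer()
for event in range(100):
    # stand-in workload for an L2 algorithm ("L2JetFex" is an invented name)
    timer.run("L2JetFex", lambda ev: sum(i * i for i in range(10_000)), event)
timer.report()
```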

Advanced Tools in RTT – DCube
- single distributions are clearly essential to pin down problems, but a tool for automatic comparison is needed: DCube (Data Quality Browser)
- DCube provides infrastructure for automatic histogram processing and comparison, e.g. Kolmogorov-Smirnov, χ² or bin-by-bin tests
- details are available per histogram and per test, with a colour code for easy readability (a toy comparison is sketched below)
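To illustrate the comparison idea (this is not DCube's actual API), a minimal sketch that compares test and reference bin contents with a χ² test and a strict bin-by-bin check:

```python
# Toy histogram comparison in the spirit of DCube (not its actual API):
# chi2 test plus a strict bin-by-bin check between test and reference.
import numpy as np
from scipy.stats import chi2

def compare_histograms(test, ref, p_threshold=0.05):
    test, ref = np.asarray(test, float), np.asarray(ref, float)
    mask = (test + ref) > 0                        # skip empty bin pairs
    chi2_val = np.sum((test[mask] - ref[mask]) ** 2 / (test[mask] + ref[mask]))
    p_value = chi2.sf(chi2_val, df=mask.sum())
    if np.array_equal(test, ref):
        return "green", p_value                    # bin-by-bin identical
    return ("yellow" if p_value > p_threshold else "red"), p_value

print(compare_histograms([100, 250, 80, 10], [103, 244, 85, 9]))
```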

Advanced RTT Use Cases
- RTT is quite a powerful tool for automatic job execution and testing, and therefore an ideal tool to extend validation to a broader area: testing trigger-related tools (e.g. DB upload) and more physics-related quantities (e.g. trigger counts)
- within trigger software validation there are 8 test packages: general trigger test packages (TriggerTest, TrigAnalysisTest, TrigP1Test) and special-purpose test packages (TrigEgammaValidation, TrigInDetValidation, TrigMenuValidation, TrigMinBiasPerf, TrigTauPerformAthena)
- examples: TrigMenuValidation loads the trigger menu from the DB, runs the trigger simulation and compares with the results obtained from the XML source of the menu; TrigEgammaValidation performs special monitoring of all e/γ-related trigger variables (heavy use of the DCube comparison to spot small differences)

Recent Developments – Dashboard
- interval of validity: all information from tests in the nightly builds (ATN/RTT) is lost after 7 days (the nightly build cycle); only the information from tests of a full release build remains accessible; how to do long-term monitoring?
- trigger software validation dashboard: collect the test information over long time scales in a database; visualise the results in an easy way on the web, e.g. as tables and graphs; provide advanced search and filter capabilities (a possible storage layout is sketched below)
- status: a small-scale prototype has been developed and is under testing
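One way such a dashboard could persist nightly test results is a small relational schema; a minimal sketch with SQLite, where the table layout and all names are assumptions for illustration:

```python
# Minimal sketch of a dashboard backend: one row per nightly test result,
# so results survive the 7-day nightly cycle. Schema is illustrative.
import sqlite3

conn = sqlite3.connect("trigval_dashboard.db")
conn.execute("""
    CREATE TABLE IF NOT EXISTS test_results (
        release    TEXT,     -- nightly branch, e.g. 'dev'
        build_date TEXT,     -- ISO date of the nightly
        package    TEXT,     -- test package the job belongs to
        test_name  TEXT,
        status     TEXT,     -- green / yellow / red
        exit_code  INTEGER,
        n_errors   INTEGER
    )""")
conn.execute("INSERT INTO test_results VALUES (?,?,?,?,?,?,?)",
             ("dev", "2009-03-20", "TriggerTest", "some_slice_test",
              "yellow", 0, 2))
conn.commit()

# Long-term view: the status history of one test over the last 30 nightlies.
for row in conn.execute("""
        SELECT build_date, status FROM test_results
        WHERE test_name = ? ORDER BY build_date DESC LIMIT 30""",
        ("some_slice_test",)):
    print(row)
```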

Trigger Software Validation Dashboard
- the prototype is functional; the full implementation will follow in the near future
- goal: long-term validation/monitoring of physics quantities, e.g. trigger rates
(screenshot: query interface and query result, with a software change and a failed test highlighted)

Summary
- the trigger software project is a substantial software project within ATLAS, providing the online and offline trigger software
- thorough offline validation is an essential step towards stable online running
- offline trigger software validation is done on a shift basis: constant monitoring of the release status; prompt spotting of problems and notification of the developers
- offline trigger software validation is done by a group of people: joining efforts to provide common tools and infrastructure; relieving developers and release coordination of work load
- future developments and plans: achieve an easy and maintainable validation process; put more focus on physics validation; development will continue, but the current high effort cannot be sustained during data taking
Many thanks to the ATLAS Software Infrastructure Team. See the talk by F. Luehring (Id 248, 16:50).