The ATLAS Analysis Architecture
Kyle Cranmer, Brookhaven National Laboratory
HCP, Isola d’Elba, March 23, 2007

Introduction
I will review the ATLAS analysis architecture, including:
‣ the data chain from trigger to final analysis results
‣ the key architectural components and design principles
‣ the analysis tools and distributed analysis on the grid
‣ recent developments in our analysis model
I will try to present this from a physicist’s point of view rather than a computing point of view.

The ATLAS Experiment
‣ 140 million pixels with 50 μm precision
‣ Nearly 200,000 calorimeter cells!

Triggering
‣ Bunch crossings at 40 MHz
‣ Each event takes ~1.6 MB on disk
‣ We couldn’t possibly save all of that data!
‣ A 3-level trigger system selects ~200 Hz of interesting events to save
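As a rough back-of-the-envelope check of what these numbers imply for data rates (illustrative arithmetic only; the exact figures depend on running conditions):

# Data-rate arithmetic behind the trigger numbers quoted above.
bunch_crossing_rate_hz = 40e6      # 40 MHz
event_size_mb = 1.6                # ~1.6 MB per event
recorded_rate_hz = 200.0           # output rate of the 3-level trigger

untriggered = bunch_crossing_rate_hz * event_size_mb   # MB/s if every crossing were kept
recorded = recorded_rate_hz * event_size_mb            # MB/s actually written out

print(f"Without a trigger: {untriggered / 1e6:.0f} TB/s")   # ~64 TB/s
print(f"After the trigger: {recorded:.0f} MB/s")            # ~320 MB/s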

Trigger & Data Acquisition

Streaming Model
Streaming is based on the trigger.
RAW data streams out of the Sub Farm Output (SFO).
We are considering both disjoint and overlapping streams
‣ for both RAW and AOD

Tier Structure of the ATLAS Computing Model
Event Filter Farm at CERN
‣ Located near the experiment; assembles data into a stream to the Tier 0 Center
Tier 0 Center at CERN
‣ Raw data ➔ mass storage at CERN and to Tier 1 centers
‣ Swift production of Event Summary Data (ESD) and Analysis Object Data (AOD)
‣ Ship ESD, AOD to Tier 1 centers ➔ mass storage at CERN
Tier 1 Centers distributed worldwide (10 centers)
‣ Re-reconstruction of raw data, producing new ESD, AOD
‣ Scheduled, group access to full ESD and AOD
Tier 2 Centers distributed worldwide (approximately 30 centers)
‣ Monte Carlo simulation, producing ESD, AOD; move ESD, AOD ➔ Tier 1 centers
‣ On-demand user physics analysis
CERN Analysis Facility
‣ Analysis with heightened access to ESD and RAW/calibration data on demand
Tier 3 Centers distributed worldwide
‣ Physics analysis

Data Processing Stages

Distributed Data Management
In 2004, ATLAS held a Data Challenge associated with a physics workshop.
‣ It was the first time ESD & AOD were used for analysis
‣ Access to AOD was one of the main challenges
Based on this experience, ATLAS created a Distributed Data Management (DDM) system: DQ2
‣ Its scope includes the management of file-based data of all types
A major component of the design is the dataset
‣ composed of several files, each with a logical name and a unique ID
‣ supports hierarchical datasets: datasets made from other datasets
‣ supports versioning
There is an emphasis on monitoring.
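A minimal sketch of the dataset concept described above (illustrative Python only, not the DQ2 implementation or its API):

from dataclasses import dataclass, field
from typing import List

@dataclass
class FileEntry:
    logical_name: str            # logical file name
    guid: str                    # unique ID

@dataclass
class Dataset:
    name: str
    version: int = 1             # datasets support versioning
    files: List[FileEntry] = field(default_factory=list)
    subdatasets: List["Dataset"] = field(default_factory=list)   # hierarchical datasets

    def all_files(self):
        """Flatten the hierarchy into a single list of files."""
        out = list(self.files)
        for sub in self.subdatasets:
            out.extend(sub.all_files())
        return out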

Major Architectural Elements
The Athena framework is an enhanced version of the Gaudi framework, which was originally developed by the LHCb experiment and is now a common ATLAS-LHCb project.
‣ Major design principles are:
● a clear separation between data and algorithms,
● and between transient (in-memory) and persistent (on-disk) data.
All levels of processing of ATLAS data, from the high-level trigger to event simulation, reconstruction, and analysis, take place within the Athena framework.
Package dependencies and builds are managed by CMT.
POOL is used to write the output of reconstruction.

Gaudi: Algorithms, Tools, & Services
Algorithms:
‣ initialize(), execute(), finalize()
‣ retrieve or record data via the StoreGate service
‣ use AlgTools
AlgTools:
‣ interface customized for a particular action
● e.g. execute(Electron)
‣ produced on demand from the ToolSvc
‣ can be public or private to a particular algorithm
Services:
‣ StoreGateSvc is our blackboard for data and manages memory
‣ ToolSvc is a factory for AlgTools
‣ other services for histogramming, data conversion, etc.
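A schematic sketch of this Algorithm / AlgTool / Service pattern (illustrative Python, not the actual Gaudi/Athena C++ interfaces):

class StoreGateSvc:
    """Blackboard service: algorithms record and retrieve event data by key."""
    def __init__(self):
        self._store = {}
    def record(self, key, obj):
        self._store[key] = obj
    def retrieve(self, key):
        return self._store[key]

class ElectronIDTool:
    """AlgTool with an interface customized for one action, e.g. execute(electron)."""
    def execute(self, electron):
        return electron["pt"] > 25.0          # toy identification cut

class MyAlgorithm:
    """Algorithm: initialize() once, execute() per event, finalize() at the end."""
    def __init__(self, store, tool):
        self.store = store                    # service obtained from the framework
        self.tool = tool                      # tool obtained from the ToolSvc
    def initialize(self):
        self.n_selected = 0
    def execute(self):
        electrons = self.store.retrieve("Electrons")
        good = [el for el in electrons if self.tool.execute(el)]
        self.store.record("GoodElectrons", good)
        self.n_selected += len(good)
    def finalize(self):
        print(f"selected {self.n_selected} electrons")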

Python-Based Configuration
Algorithms and AlgTools can declare private data members as properties, which can be modified externally from Python.
‣ We are currently migrating to Configurables
● auto-generated Python classes that encapsulate the properties of the C++ classes
‣ We are also forming higher-level abstractions in Python
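A sketch of what a job options fragment looks like (to be run inside athena.py; the MyAnalysis package, the MyElectronSelector Configurable, and its properties are hypothetical examples, not a real ATLAS package):

from AthenaCommon.AlgSequence import AlgSequence
from AthenaCommon.AppMgr import theApp

topSequence = AlgSequence()

# Configurable auto-generated from the (hypothetical) C++ algorithm's declared properties
from MyAnalysis.MyAnalysisConf import MyElectronSelector
selector = MyElectronSelector("MyElectronSelector")
selector.PtCut = 25.0e3                        # property declared by the C++ class (MeV)
selector.InputCollection = "ElectronAODCollection"
topSequence += selector

theApp.EvtMax = 100                            # application-level settings are also plain Python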

Event Data Model
One of the major design principles of Athena is a clear separation of data and algorithms.
‣ A flexible design of the Event Data Model (EDM) is vital!
● the same EDM class is often used by multiple reconstruction algorithms
● emphasis on common interfaces
Links between objects are implemented with ElementLink and ElementLinkVector
‣ act as pointers and are easy to read/write
‣ use StoreGate to retrieve the data
● supports “read-on-demand”
● the target data can even be in another file!
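A schematic sketch of the ElementLink idea (illustrative Python, not the real C++ template): the link stores a container key and an index rather than a raw pointer, and only resolves the target through the store when it is dereferenced.

class SimpleStore:
    """Stand-in for StoreGate: maps container keys to lists of objects."""
    def __init__(self, containers):
        self._containers = containers
    def retrieve(self, key):
        return self._containers[key]

class ElementLink:
    """Stores a container key and an index instead of a raw pointer."""
    def __init__(self, container_key, index):
        self.container_key = container_key
        self.index = index
    def resolve(self, store):
        # Read-on-demand: the target is only fetched when dereferenced,
        # and the container may live in a different file than the link itself.
        return store.retrieve(self.container_key)[self.index]

# usage
store = SimpleStore({"Tracks": ["trk0", "trk1", "trk2"]})
link = ElementLink("Tracks", 2)
print(link.resolve(store))       # -> "trk2"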

Constituent Navigation
A common feature of our EDM classes is the ability to navigate from an object to its constituents
‣ the relational structure is decoupled from the physical location in the files!
The ESD will have cells, clusters, & tracks.
The AOD will have electrons, clusters, & tracks, but will not have the cells.
The DPD may have a composite particle and the electrons it was made from, but not store the clusters or tracks.
It is possible to navigate from a composite particle down to cells if one has access to the ESD & AOD
‣ navigating between files is called “Back Navigation”
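A small sketch of the back-navigation idea (illustrative Python only): when a constituent lives in an upstream file, the lookup falls through to that file’s store.

class ChainedStore:
    def __init__(self, primary, parent=None):
        self.primary = primary        # containers in the file being analysed (e.g. AOD)
        self.parent = parent          # containers in the upstream file (e.g. ESD)
    def retrieve(self, key):
        if key in self.primary:
            return self.primary[key]
        if self.parent is not None:
            return self.parent.retrieve(key)   # back navigation to the other file
        raise KeyError(key)

esd = ChainedStore({"CaloCells": ["cell0", "cell1"]})
aod = ChainedStore({"Clusters": ["clus0"]}, parent=esd)
print(aod.retrieve("CaloCells"))      # found via back navigation to the ESD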

Skimming, Thinning, and Slimming
We have identified several complementary strategies for reducing and refining information from RAW data down to four-vectors.
Skimming: selecting only interesting events, based on TAGs
Thinning: selecting only interesting objects from a container
Slimming: selecting only interesting properties of an object
Each of these data reduction strategies is highly configurable
‣ allows users to customize the content for their purpose
‣ does not change the underlying event data model
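A toy sketch of the three reduction strategies applied to a list of event dictionaries (illustrative Python; the real operations are configured inside Athena):

def skim(events, predicate):
    """Skimming: keep only interesting events (e.g. selected via TAG quantities)."""
    return [ev for ev in events if predicate(ev)]

def thin(event, container, keep):
    """Thinning: keep only interesting objects within a container."""
    event[container] = [obj for obj in event[container] if keep(obj)]
    return event

def slim(event, container, properties):
    """Slimming: keep only interesting properties of each object."""
    event[container] = [{k: obj[k] for k in properties} for obj in event[container]]
    return event

# usage on a toy event
events = [{"met": 60.0, "electrons": [{"pt": 40.0, "eta": 1.1, "cluster": "..."},
                                      {"pt":  8.0, "eta": 2.3, "cluster": "..."}]}]
selected = skim(events, lambda ev: ev["met"] > 50.0)                                # skimming
selected = [thin(ev, "electrons", lambda el: el["pt"] > 10.0) for ev in selected]   # thinning
selected = [slim(ev, "electrons", ["pt", "eta"]) for ev in selected]                # slimming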

AOD/ESD Merger and Slimming
Initially, AOD & ESD had different representations of particles
‣ needed to reduce information going from ESD to AOD
‣ we were unable to run AOD-based algorithms on the ESD
Recently, we merged the AOD & ESD representations
‣ most of the information is outsourced to external objects
Users can choose what information is needed for their analysis
‣ an example of “slimming”

TAGs & Athena-Aware Ntuples
ATLAS will produce event-level metadata called TAGs
‣ simple quantities for selecting interesting events
‣ can be stored in a database or as a ROOT TTree
‣ TAGs will span the AOD streams
‣ the target size is 1 kB/event
The output of an Athena analysis is often an ntuple
‣ our most common format for Derived Physics Data (DPD)
‣ recently we added hooks to make them “Athena-Aware”
‣ this allows us to use these ntuples as if they were TAGs
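A hedged sketch of using TAG-like event-level metadata stored as a ROOT TTree to preselect events (the file name, tree name, and branch names below are assumptions for illustration):

import ROOT

tag_file = ROOT.TFile.Open("myTags.root")       # hypothetical TAG file
tag_tree = tag_file.Get("CollectionTree")       # assumed tree name

# TTree::GetEntries accepts a selection string, so simple TAG quantities
# can be used to count (and later select) interesting events.
selection = "NLooseElectron >= 2 && MissingET > 50000"   # assumed branch names (MeV)
n_pass = tag_tree.GetEntries(selection)
print(f"{n_pass} of {tag_tree.GetEntries()} events pass the TAG preselection")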

Derived Physics Data (DPD)
The AOD is the last common, centrally produced format.
Analysis on the AOD produces Derived Physics Data (DPD)
‣ the contents will be analysis-specific
The most common DPD format now is an Athena-Aware ntuple
‣ a simple “flat” structure easily accessible in ROOT
‣ can be used for skimming, but cannot be processed in Athena
Recent attention has gone to improving the DPD format
‣ add structure to the DPD (e.g. add an Electron object)
‣ write the DPD with POOL so that we can process it with Athena

Transient-Persistent Separation
ATLAS has recently made a separation between the transient (in-memory) and persistent (on-disk) representations of the data
‣ this allows us to drastically change the transient model and continue to read old data (i.e. schema evolution)
Previously, we used POOL to write the transient class directly
‣ the complex data structure was quite slow to read & write
● typical I/O: ~2 MB/s
Now the persistent model can be highly optimized to reduce space and improve I/O performance
‣ we now read at ~10 MB/s
‣ the size on disk is reduced by factors of 1-10
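A schematic sketch of the transient/persistent split (illustrative Python, not the actual ATLAS T/P converter framework): the persistent class is a compact, versioned representation, and a converter maps between the two, so old files stay readable when the transient class evolves.

class Electron:                       # transient (in-memory) class, free to evolve
    def __init__(self, pt, eta, phi):
        self.pt, self.eta, self.phi = pt, eta, phi

class Electron_p1:                    # persistent (on-disk) representation, version 1
    def __init__(self, packed):
        self.packed = packed          # e.g. compact/compressed values written by POOL

class ElectronCnv_p1:
    """Converter for persistent version 1: only a new converter is needed
    if the transient class changes, so old files remain readable."""
    def pers_to_trans(self, pers):
        pt, eta, phi = pers.packed
        return Electron(pt, eta, phi)
    def trans_to_pers(self, trans):
        return Electron_p1((trans.pt, trans.eta, trans.phi))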

ROOT Interoperability
ATLAS currently has a multi-stage analysis model
‣ jet finding, b-tagging, etc. happen in Athena
‣ final plots, cut optimization, etc. happen in ROOT
The transition point between Athena and ROOT depends on
‣ the requirements of the analysis
‣ personal preferences
‣ the maturity of the analysis
Recently we have developed the technology to use ESD/AOD/DPD written with POOL directly in ROOT
‣ allows us to unify file formats and save disk space
‣ and to port code between ROOT and Athena
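A hedged sketch of inspecting a POOL-written file directly in ROOT: POOL files are ROOT files underneath, so they can at least be opened and browsed. The file name is illustrative, and the conventional event-tree name (“CollectionTree”) is assumed here.

import ROOT

f = ROOT.TFile.Open("myAOD.pool.root")    # hypothetical POOL-written AOD file
tree = f.Get("CollectionTree")            # conventional name of the event tree (assumed)
print(f"{tree.GetEntries()} events")
tree.Print()                              # list the persistified containers/branches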

Interactive Athena
In addition to improving ROOT interoperability, we can use Athena for interactive studies
‣ dictionaries and Python bindings are generated for all EDM classes
From the Athena (Python) command prompt one can:
‣ access any object in StoreGate (e.g. Electrons) for an event
‣ make plots by looping over several events
One can even write an analysis algorithm in Python
‣ it acts as a normal algorithm in the Gaudi/Athena framework
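A sketch of what such a Python analysis algorithm can look like, in the style of PyAthena (the container type, StoreGate key, and exact retrieve usage are assumptions based on typical examples, not a definitive recipe):

from AthenaPython import PyAthena
StatusCode = PyAthena.StatusCode

class MyPyAnalysis(PyAthena.Alg):
    """A Python algorithm scheduled like any C++ algorithm in Athena."""
    def initialize(self):
        self.sg = PyAthena.py_svc('StoreGateSvc')   # blackboard for event data
        self.n_events = 0
        return StatusCode.Success
    def execute(self):
        # assumed container type and StoreGate key
        electrons = self.sg.retrieve('ElectronContainer', 'ElectronAODCollection')
        self.n_events += 1
        return StatusCode.Success
    def finalize(self):
        print('processed %i events' % self.n_events)
        return StatusCode.Success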

Distributed Analysis Tools
Distributed analysis tools take advantage of Grid resources for processing.
‣ Unlike Monte Carlo production and reconstruction processing tasks, analysis is very “chaotic”
The two main tools for distributed analysis are GANGA and pathena
‣ Both tools provide a user interface similar to local running
● local: athena myJobOptions.py myInput.py
● grid: pathena myJobOptions.py --inDS ttbar.AOD --outDS user.KyleCranmer.myAnalysis.ttbar
‣ Both tools provide bookkeeping and are interfaced with the Distributed Data Management tools (e.g. DQ2)
● the output is registered as a DQ2 dataset
● it can be used as the input to subsequent jobs
There are already several hundred users!
‣ Both tools are developing quickly with significant user feedback

EventView
We have developed an analysis framework in Athena based on EventView.
The EventView is an EDM class that:
‣ represents a consistent interpretation of an event (particle ID, combinatorics, etc.)
‣ points to particles in the AOD/ESD with labels
‣ holds UserData for a given “View”
It comes with a very modular set of analysis tools
‣ easy to extend
‣ promotes sharing of code
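A schematic sketch of the EventView idea (illustrative Python, not the actual EventView EDM class): one consistent interpretation of the event, with labelled references into the AOD/ESD plus per-view user data.

class EventView:
    def __init__(self, name):
        self.name = name              # e.g. one combinatoric interpretation of the event
        self.links = {}               # label -> reference to a particle in the AOD/ESD
        self.user_data = {}           # quantities computed for this particular view
    def link_particle(self, label, particle):
        self.links[label] = particle
    def set_user_data(self, key, value):
        self.user_data[key] = value

# usage: one view corresponds to one way of pairing the leptons in an event
view = EventView("ZCandidateView")
view.link_particle("Lepton1", {"pt": 45.0})
view.link_particle("Lepton2", {"pt": 38.0})
view.set_user_data("mll", 91.2)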

Trigger-Aware Analysis
The ATLAS High-Level Trigger also runs Athena
‣ this allows some offline code to be reused online
As a result, our trigger simulation can use exactly the same algorithms as the online system.
Trigger objects are stored in the ESD & AOD
‣ we can study trigger menus by rerunning the trigger with different thresholds
● on both real & simulated data

Event Display
ATLAS has three major event displays
‣ Atlantis: 2D views based on ALEPH’s DALI display; an external Java application
‣ Persint: specific to muons
‣ v-Atlas: runs within Athena; supports extensions by the user

Conclusions
The ATLAS Analysis Architecture has matured significantly in the last few years.
‣ Most analyses are based on AOD & ESD written with POOL
● significant feedback in the last few years has led to an evolution of the analysis model and has fed back into the computing model
‣ We have a strong framework with clear design principles
● including several strategies for data reduction
● and common analysis tools with a large user community
‣ There have been recent developments in the format of Derived Physics Data
● DPD that can be used in both Athena and ROOT
‣ Distributed data management and distributed analysis are also maturing based on feedback from real users
ATLAS is looking forward to the LHC startup!

Backup

Reprocessing