1
ATLAS Architecture for ATLAS Distributed Analysis
David Adams, BNL
March 25, 2004
ATLAS Distributed Analysis Meeting
2
David Adams ATLAS Architecture ATLAS Distributed analysis March 25, 2004
Contents
Scope
Model
  Job-based
AJDL
  Application
  Task
  Dataset
  Job
High-level services
  Analysis service
  Job management service
  Catalog services
Implementation
  Strategy
  Effort providers
  ARDA
  Role of GANGA
  Connection to LHCb
More information
3
Scope
Analysis (not necessarily distributed)
  Supports the manipulation and extraction of summary data (e.g. histograms) from any type of event data
    – AOD, ESD, …
  Supports user-level production of event data
    – e.g. MC generation, simulation and reconstruction
Distributed analysis
  Extends the extraction and production support to include distributed users, data and processing
  Natural extension of non-distributed analysis
  Easily invoked from any ATLAS analysis environment
    – including Python, ROOT, command line
    – easily ported to any future environment (e.g. JAS)
4
Model
The ADA (ATLAS Distributed Analysis) model has the following features:
  Job-based
  Client-service
  Architecture
  Generic interface
  Extensible interface
  Components
  Virtual data
  Provenance
5
Model: Job-based
Adopt a job-based model for distributed processing
  User provides a high-level description of a task to be carried out
    – Typically a "transformation" and an input dataset
  Processing system splits jobs, submits and tracks sub-jobs, and gathers and merges results
  User may also provide suggestions or directives about how to do processing
    – Where to place output data
    – Mechanisms for splitting and merging
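The split / run / gather-and-merge cycle can be illustrated with a minimal Python sketch. It assumes an input dataset is just a list of file names and a result is a histogram-like mapping of counts; `split_dataset`, `run_subjob`, `merge_results` and the toy transformation are hypothetical names, not part of any ADA or DIAL interface, and the sub-jobs run locally rather than on the grid.

```python
from collections import Counter
from typing import Callable, Dict, List, Tuple

# Hypothetical transformation type: maps one input file to histogram-like counts.
Transformation = Callable[[str], Dict[str, int]]


def split_dataset(files: List[str], max_files: int) -> List[List[str]]:
    """Split an input dataset (modelled as a list of file names) into sub-datasets."""
    return [files[i:i + max_files] for i in range(0, len(files), max_files)]


def run_subjob(transformation: Transformation, sub_dataset: List[str]) -> Counter:
    """Run one sub-job: apply the transformation to each file of its sub-dataset."""
    result: Counter = Counter()
    for filename in sub_dataset:
        result.update(transformation(filename))
    return result


def merge_results(results: List[Counter]) -> Counter:
    """Gather the sub-job results and merge them into a single output."""
    merged: Counter = Counter()
    for partial in results:
        merged.update(partial)
    return merged


def process(transformation: Transformation, files: List[str],
            max_files: int = 2) -> Tuple[int, Counter]:
    """Split, run each sub-job (locally here, not on the grid) and merge."""
    sub_datasets = split_dataset(files, max_files)
    partials = [run_subjob(transformation, sub) for sub in sub_datasets]
    return len(sub_datasets), merge_results(partials)


def count_by_prefix(filename: str) -> Dict[str, int]:
    """Toy transformation: one entry per file, binned by file-name prefix."""
    return {filename.split("_")[0]: 1}


n_subjobs, output = process(count_by_prefix, ["aod_1", "aod_2", "esd_1", "aod_3", "esd_2"])
print(n_subjobs, dict(output))  # prints: 3 {'aod': 3, 'esd': 2}
```

The user's "directives" from the slide would map onto parameters such as `max_files` (how to split) or a choice of merge function.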
6
Model: Clients and services
Separate clients from services
  Client should be lightweight and portable
  Services implemented as web services
    – Service can be more robust and have a longer lifetime
    – Does not preclude service from running on the client side
    – Or even accessing Python-based services as plug-ins
  Services and clients to be developed independently
7
Model: Architecture
[Figure: architecture diagram]
8
Model: Generic interface
Define a generic high-level interface
  Processing system can be constructed using only this interface
    – Effort to develop and maintain processing systems can and should be shared
      > It is difficult to make a robust and responsive processing system
  The same applies to user interfaces
  ATLAS can try out different processing systems but present the same user interface
    – Production, DIAL, ARDA, other experiments, future developments
9
Model: Extensible interface
Interface must be extensible with respect to:
  Which application is used to process data
    – Athena, atlsim, ROOT, non-ATLAS, …
  Different means to install applications
  Means for user to configure application for a job
  Different kinds of data
    – Application- and user-specific access to data
10
Model: Components
Identify the fundamental concepts
Express high-level interface in terms of these components
Components:
  Dataset
    – Collection of data
    – Often but not always files
  Transformation
    – Specifies means for transforming an input dataset into an output dataset
    – Application + Task
  Job
    – Tracks the transformation of a dataset in the processing system
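The three components can be sketched as plain Python dataclasses. This is a minimal illustration under stated assumptions: the field names (`id`, `files`, `state`, …) and the version string are hypothetical and are not taken from the AJDL definition.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Dataset:
    """Collection of data; often, but not always, a list of files."""
    id: str
    files: List[str] = field(default_factory=list)


@dataclass
class Application:
    """Executable used to process data (name and version are illustrative)."""
    name: str
    version: str


@dataclass
class Task:
    """User configuration for an application, modelled as named text files."""
    files: Dict[str, str] = field(default_factory=dict)


@dataclass
class Transformation:
    """Means of turning an input dataset into an output dataset: Application + Task."""
    application: Application
    task: Task


@dataclass
class Job:
    """Tracks the transformation of one dataset in the processing system."""
    transformation: Transformation
    input_dataset: Dataset
    state: str = "created"
    output_dataset: Optional[Dataset] = None


app = Application("athena", "1.0")  # hypothetical version string
task = Task({"jobOptions.py": "# user configuration goes here"})
job = Job(Transformation(app, task), Dataset("ds-001", ["a.root", "b.root"]))
print(job.state, job.transformation.application.name)  # prints: created athena
```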
11
Model: Virtual data
Virtual dataset
  Prescription to create dataset = transformation + input dataset
  May be multiple concrete representations
    – Created in separate jobs
    – Different file sizes
    – Data in DB vs. file
    – Copy selected events from a dataset
User typically selects a virtual dataset
  – Selection based on dataset metadata
  – Processing system chooses best (e.g. nearest) concrete representation
  – Or starts job to (re)create
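Resolving a virtual dataset to a concrete representation might look like the following minimal sketch. It assumes that "best" means nearest according to a hypothetical site-distance table; the site names, IDs and the `resolve` helper are illustrative only, and a real processing system could use any other ranking.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class ConcreteDataset:
    id: str
    site: str  # where this representation is stored


@dataclass(frozen=True)
class VirtualDataset:
    """Prescription for a dataset: transformation + input dataset."""
    id: str
    transformation: str
    input_dataset: str


def choose_replica(vds: VirtualDataset,
                   replicas: Dict[str, List[ConcreteDataset]],
                   distance: Dict[str, float]) -> Optional[ConcreteDataset]:
    """Pick the 'best' concrete representation, here simply the nearest site."""
    candidates = replicas.get(vds.id, [])
    if not candidates:
        return None
    return min(candidates, key=lambda c: distance.get(c.site, float("inf")))


def resolve(vds: VirtualDataset,
            replicas: Dict[str, List[ConcreteDataset]],
            distance: Dict[str, float]) -> Tuple[str, ...]:
    """Use the best existing replica, or fall back to (re)creating the dataset."""
    best = choose_replica(vds, replicas, distance)
    if best is None:
        return ("recreate", vds.transformation, vds.input_dataset)
    return ("use", best.id)


replicas = {"vds-001": [ConcreteDataset("cds-a", "CERN"), ConcreteDataset("cds-b", "BNL")]}
distance = {"BNL": 0, "CERN": 5}  # hypothetical distances from the user's site
print(resolve(VirtualDataset("vds-001", "reco", "raw-042"), replicas, distance))
print(resolve(VirtualDataset("vds-002", "sim", "evgen-007"), replicas, distance))
```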
12
Model: Provenance
System should record provenance of any registered dataset
Virtual dataset: input dataset + transformation
  – In VDC = virtual data catalog
Concrete dataset: add processing details
  – Description of job producing dataset
    > Number of sub-jobs
    > Processing nodes
    > Times
    > Etc.
  – In job catalog
13
AJDL
Acronym: Analysis Job Definition Language
Type names label the different kinds of components
  Subtypes extend types
  E.g. Dataset : EventDataset : EventCollectionDataset
Base "data" types include:
  Application – executable to process data
  Task – user configuration of application
  Dataset – describes input and output data
  Job – activity to perform on (or off) the grid
    – Typical: app, task and input dataset → output dataset
Following is the usual interaction diagram
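The subtype chain `Dataset : EventDataset : EventCollectionDataset` maps naturally onto class inheritance. The minimal Python sketch below assumes each class records its AJDL type name in a class attribute; `ajdl_type`, `ajdl_type_chain` and `collection_name` are hypothetical names chosen for illustration.

```python
class Dataset:
    """Base AJDL data type."""
    ajdl_type = "Dataset"

    def __init__(self, dataset_id: str):
        self.id = dataset_id

    def ajdl_type_chain(self) -> str:
        """Type names from the base type down to this subtype, slide-style."""
        names = [cls.ajdl_type for cls in type(self).__mro__ if "ajdl_type" in vars(cls)]
        return " : ".join(reversed(names))


class EventDataset(Dataset):
    ajdl_type = "EventDataset"

    def __init__(self, dataset_id: str, event_count: int):
        super().__init__(dataset_id)
        self.event_count = event_count


class EventCollectionDataset(EventDataset):
    ajdl_type = "EventCollectionDataset"

    def __init__(self, dataset_id: str, event_count: int, collection_name: str):
        super().__init__(dataset_id, event_count)
        self.collection_name = collection_name  # hypothetical attribute


ds = EventCollectionDataset("ds-007", 5000, "MyCollection")
print(ds.ajdl_type_chain())      # Dataset : EventDataset : EventCollectionDataset
print(isinstance(ds, Dataset))   # True
```

A generic high-level service can accept any `Dataset` and rely only on base properties such as `id`, while application code can use the richer subtype.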
14
[Figure: ADA/DIAL user interface interaction diagram. From an analysis framework (e.g. ROOT, e.g. athena) the user steps through 1. locate, 2. select, 3. create or select, 4. select, covering the Application (exe, pkgs), the Task (scripts, code) and the Dataset; 5. submit(app, tsk, ds) to the Analysis Service; 6. split the dataset into Dataset 1 and Dataset 2; 7. create Job 1 and Job 2; 9. each job creates a Result; 10. gather the results.]
15
AJDL: Application
Application specifies executable used to process data
  Two entry points
    Extract and build task
    Process input dataset to produce output dataset
      – Application + Task = Dataset transformation
Carries enough information to
  Locate entry points
    – Or carry the corresponding scripts
  Enable installation of all required software
    – E.g. list of packages for use with package management system
    – There might be subtypes for different package management systems
16
AJDL: Task
Task carries the user configuration for an application
  E.g. runtime configuration or code for a shared library
  Nature of the task is specified by the corresponding application
  At present the task is a collection of embedded text files
Task plus application (transformation) should specify the content of input and output datasets
  Enable users and processing system to
    – Verify transformation is suitable for given input dataset
    – Avoid staging unneeded parts of input dataset
    – Predict the content of output dataset
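The three uses of declared content can be sketched as follows, assuming the transformation declares the type-keys it reads and writes as plain string sets. `TransformationContent`, `plan` and the collection names are hypothetical; the slide does not say how AJDL encodes this content.

```python
from dataclasses import dataclass
from typing import FrozenSet, Set, Tuple


@dataclass(frozen=True)
class TransformationContent:
    """Content declared by application + task: what is read and what is written."""
    reads: FrozenSet[str]
    writes: FrozenSet[str]


def plan(tc: TransformationContent,
         input_content: Set[str]) -> Tuple[Set[str], Set[str], Set[str]]:
    # 1. Verify the transformation is suitable for the input dataset.
    missing = tc.reads - input_content
    if missing:
        raise ValueError(f"input dataset lacks required content: {sorted(missing)}")
    # 2. Stage only what is read; the rest of the input need not be staged.
    to_stage = set(tc.reads)
    not_staged = set(input_content) - tc.reads
    # 3. Predict the content of the output dataset.
    predicted_output = set(tc.writes)
    return to_stage, not_staged, predicted_output


tc = TransformationContent(frozenset({"ElectronCollection"}),
                           frozenset({"ElectronHistograms"}))
print(plan(tc, {"ElectronCollection", "JetCollection", "TrackCollection"}))
```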
17
AJDL: Dataset
Provides data view
Generic properties for use in high-level services:
  Location of data (files, DB, …)
    – So data can be staged
  Content
    – E.g. for ATLAS events: event IDs and type-keys (e.g. good electrons) for each event
    – EventDataset is an important generic subtype
  Constituents for compound dataset
    – Natural boundaries for dataset splitting
Subtypes provide interface for users and applications to access the data
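Splitting along constituent boundaries can be sketched with a compound `EventDataset` whose constituents are themselves event datasets. The greedy grouping by event count is an assumption made for illustration; `split_on_constituents` and the `max_events` limit are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class EventDataset:
    id: str
    event_count: int = 0
    constituents: List["EventDataset"] = field(default_factory=list)

    def total_events(self) -> int:
        """Event count, summed over constituents for a compound dataset."""
        if not self.constituents:
            return self.event_count
        return sum(c.total_events() for c in self.constituents)


def split_on_constituents(ds: EventDataset, max_events: int) -> List[List[EventDataset]]:
    """Group constituents into sub-datasets of at most max_events events.

    Constituents are never cut, so a single constituent larger than
    max_events ends up alone in its own group.
    """
    groups: List[List[EventDataset]] = []
    current: List[EventDataset] = []
    n_events = 0
    for part in ds.constituents or [ds]:
        n = part.total_events()
        if current and n_events + n > max_events:
            groups.append(current)
            current, n_events = [], 0
        current.append(part)
        n_events += n
    if current:
        groups.append(current)
    return groups


parts = [EventDataset("p1", 400), EventDataset("p2", 300),
         EventDataset("p3", 500), EventDataset("p4", 200)]
compound = EventDataset("all", constituents=parts)
print(compound.total_events())  # 1400
print([[p.id for p in g] for g in split_on_constituents(compound, 800)])
# [['p1', 'p2'], ['p3', 'p4']]
```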
18
AJDL: Job
Interface enables users (and high-level services) to monitor and manage jobs on the grid
Generic properties
  State: running, succeeded, failed, paused, …
  Input parameters (e.g. application, task and dataset)
  Result (e.g. output dataset) after completion
Management
  Pause/resume
  Kill
  Update status
  Job management service to implement these
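The job interface can be sketched as a small state machine. The allowed transitions and the separate `KILLED` state are assumptions (the slide lists only running, succeeded, failed, paused, …), and the application, task and dataset strings are placeholders.

```python
from enum import Enum


class JobState(Enum):
    RUNNING = "running"
    PAUSED = "paused"
    SUCCEEDED = "succeeded"
    FAILED = "failed"
    KILLED = "killed"  # assumption: a separate state for killed jobs


# Assumed transitions; finished jobs (succeeded/failed/killed) cannot change state.
ALLOWED = {
    JobState.RUNNING: {JobState.PAUSED, JobState.SUCCEEDED, JobState.FAILED, JobState.KILLED},
    JobState.PAUSED: {JobState.RUNNING, JobState.KILLED},
}


class Job:
    def __init__(self, application: str, task: str, dataset: str):
        self.inputs = {"application": application, "task": task, "dataset": dataset}
        self.state = JobState.RUNNING
        self.result = None  # e.g. output dataset ID, set on success

    def _move(self, new_state: JobState) -> None:
        if new_state not in ALLOWED.get(self.state, set()):
            raise RuntimeError(f"cannot go from {self.state.value} to {new_state.value}")
        self.state = new_state

    def pause(self) -> None:
        self._move(JobState.PAUSED)

    def resume(self) -> None:
        self._move(JobState.RUNNING)

    def kill(self) -> None:
        self._move(JobState.KILLED)

    def update_status(self, reported: JobState, result=None) -> None:
        """Apply a status report, e.g. from a job management service."""
        if reported is self.state:
            return  # no change reported
        self._move(reported)
        if reported is JobState.SUCCEEDED:
            self.result = result


job = Job("root", "hist.C", "ds-001")
job.pause()
job.resume()
job.update_status(JobState.SUCCEEDED, result="ds-001-out")
print(job.state.value, job.result)  # succeeded ds-001-out
```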
19
Nature of AJDL types
Nature of components
  Persistent representation of data
    – Assume XML
    – Content specified by DTD or XML schema
    – AJDL type is XML type
  Classes (C++, Python, Java, …)
    – Language bindings or re-implementations
    – These define the API (user interface)
    – Build tools and GUI on top of these
    – Class types map to AJDL types
    – Need methods to write to and create from XML representation
  Service or resource (as in WSRF)
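A class binding with methods to write to and create from XML can be sketched with Python's standard `xml.etree.ElementTree`. The element and attribute names (`EventDataset`, `event_count`, `file`, `name`) are assumptions for illustration, not the actual AJDL schema.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import List


@dataclass
class EventDataset:
    id: str
    event_count: int
    files: List[str] = field(default_factory=list)

    def to_xml(self) -> str:
        """Write the persistent XML representation."""
        root = ET.Element("EventDataset", {"id": self.id, "event_count": str(self.event_count)})
        for name in self.files:
            ET.SubElement(root, "file", {"name": name})
        return ET.tostring(root, encoding="unicode")

    @classmethod
    def from_xml(cls, text: str) -> "EventDataset":
        """Create an object from its XML representation."""
        root = ET.fromstring(text)
        if root.tag != "EventDataset":
            raise ValueError(f"expected EventDataset, got {root.tag}")
        return cls(
            id=root.get("id"),
            event_count=int(root.get("event_count")),
            files=[f.get("name") for f in root.findall("file")],
        )


ds = EventDataset("ds-001", 2500, ["a.pool.root", "b.pool.root"])
xml_text = ds.to_xml()
print(xml_text)
assert EventDataset.from_xml(xml_text) == ds  # round trip
```

Keeping the XML as the persistent form lets a C++ or Java binding read the same document and build its own object.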
20
AJDL functions
Do we want non-trivial functions associated with AJDL types?
  Trivial means getters and setters
    – E.g. EventDataset XML attribute event count maps to
      > EventDataset::m_event_count
      > EventDataset::set_event_count()
      > EventDataset::get_event_count() const
    – Class headers could be automatically generated from XML DTD/schema or vice versa
  Yes
    – Examples follow
    – Might try to keep generic types trivial
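Generating the trivial accessors automatically can be sketched in Python, following the slide's `m_X` / `set_X()` / `get_X()` naming. The attribute list passed in stands in for what a generator would read from the DTD or schema; `make_ajdl_class` is a hypothetical helper, not an existing tool.

```python
from typing import Iterable


def make_ajdl_class(type_name: str, attributes: Iterable[str]) -> type:
    """Generate a class with trivial accessors for each XML attribute.

    Follows the slide's naming: attribute X -> m_X, set_X(), get_X().
    The attribute list stands in for what would be read from a DTD/schema.
    """
    attributes = list(attributes)
    namespace = {}

    def __init__(self, **values):
        for attr in attributes:
            setattr(self, "m_" + attr, values.get(attr))

    namespace["__init__"] = __init__

    for attr in attributes:
        # Default argument binds the current attribute name (avoids late binding).
        def getter(self, _attr=attr):
            return getattr(self, "m_" + _attr)

        def setter(self, value, _attr=attr):
            setattr(self, "m_" + _attr, value)

        namespace["get_" + attr] = getter
        namespace["set_" + attr] = setter

    return type(type_name, (object,), namespace)


EventDataset = make_ajdl_class("EventDataset", ["id", "event_count"])
ds = EventDataset(id="ds-001")
ds.set_event_count(2500)
print(ds.get_id(), ds.get_event_count(), ds.m_event_count)  # ds-001 2500 2500
```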
21
AJDL functions (cont)
Examples of AJDL functions
  Creator for EventCollectionDataset opens event collection to read the event count
  PacmanApplication must call pacman to locate the site-specific entry points for the application
  Task or Dataset subtype may want to stage files to facilitate local access
  Job subtype implements method to kill job
22
AJDL functions
How to handle multiple languages?
  Simple functions may be reimplemented
  Language wrappings may be used in some places
    – PyLcgDict maps C++ to Python
  Function may be provided by a service
  Some functions absent in some bindings
    – E.g. Java binding for EventCollectionDataset does not provide access to the C++ POOL event collection interface
  Cache data in generic classes
    – Generic EventDataset might hold the event count even though subtypes have means to find this value on demand
  Handle the problem on a case-by-case basis
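The "cache data in generic classes" option can be sketched as a generic type that stores the event count and a subtype that computes it on demand only once. The `open_collection` callable is a hypothetical stand-in for real collection access (it is not the POOL API); a binding without that access can still use the cached value.

```python
from typing import Callable, Optional, Sized


class EventDataset:
    """Generic type: may carry a cached event count."""

    def __init__(self, dataset_id: str, event_count: Optional[int] = None):
        self.id = dataset_id
        self._event_count = event_count

    def _count_events(self) -> int:
        raise NotImplementedError("generic EventDataset cannot count events itself")

    def event_count(self) -> int:
        if self._event_count is None:
            self._event_count = self._count_events()  # compute once, then cache
        return self._event_count


class EventCollectionDataset(EventDataset):
    """Subtype that can find the count on demand by opening its collection."""

    def __init__(self, dataset_id: str, open_collection: Callable[[], Sized]):
        super().__init__(dataset_id)
        self._open_collection = open_collection  # hypothetical stand-in for collection access
        self.opens = 0

    def _count_events(self) -> int:
        self.opens += 1
        return len(self._open_collection())


full = EventCollectionDataset("coll-1", lambda: range(42))
print(full.event_count(), full.event_count(), full.opens)  # 42 42 1

# A binding without collection access can still use the cached value.
light = EventDataset("coll-1", event_count=full.event_count())
print(light.event_count())  # 42
```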
23
High-level services
High-level services use AJDL components
  Middleware does not
Typically high-level services are generic
  Only use generic properties of AJDL components
  Same service for different applications and datasets
  Different experiments or realms can share services
    – E.g. LHCb and ATLAS
Examples
  Analysis (transformation) service
  Job management
  Catalogs
24
Analysis service
Transformation service might be a better name
Provides means to create a concrete dataset
Interface functions
  Request dataset
    – Input is application, task and dataset
    – Output is job ID
    – Associated job carries ID for output dataset
  Fetch job description
    – Input is job ID
    – Output is job
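The two interface functions can be sketched as an in-process Python class, assuming a local stand-in rather than a real web service. The method names, integer job IDs and the output-dataset naming scheme are hypothetical.

```python
import itertools
from dataclasses import dataclass
from typing import Dict


@dataclass
class Job:
    id: int
    application: str
    task: str
    input_dataset: str
    output_dataset_id: str  # the job carries the ID of the dataset it will produce
    state: str = "submitted"


class AnalysisService:
    """In-process stand-in for a (web) analysis/transformation service."""

    def __init__(self) -> None:
        self._jobs: Dict[int, Job] = {}
        self._next_id = itertools.count(1)

    def request_dataset(self, application: str, task: str, dataset: str) -> int:
        """Input: application, task and dataset. Output: job ID."""
        job_id = next(self._next_id)
        output_id = f"{dataset}.out{job_id}"  # hypothetical naming scheme
        self._jobs[job_id] = Job(job_id, application, task, dataset, output_id)
        return job_id

    def fetch_job(self, job_id: int) -> Job:
        """Input: job ID. Output: job description (KeyError if unknown)."""
        return self._jobs[job_id]


service = AnalysisService()
job_id = service.request_dataset("root", "hist.C", "ds-042")
print(job_id, service.fetch_job(job_id).output_dataset_id)  # 1 ds-042.out1
```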
25
Analysis service (cont)
Example scenario for processing a high-level job
  Input is application, task, dataset and job configuration
  Map input virtual dataset to concrete representation
  Split into sub-datasets
  Create sub-job for each sub-dataset
  Stage files for each sub-job
  Locate and possibly install application
  Build (e.g. compile) task
  Run sub-jobs
  Gather and merge results to create output dataset
  Register output dataset (including replica)
  Job provides connection to output dataset and detailed job provenance
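The scenario can be walked through step by step in a minimal Python sketch. Every step is a local stand-in, assuming in-memory dictionaries for the replica catalog and the registry and a trivial "run" that counts files; `process_high_level_job` and the `files_per_subjob` setting are hypothetical. The returned record links the output dataset to a provenance log, as on the slide.

```python
from typing import Dict, List


def process_high_level_job(app: str, task: str, virtual_ds: str, config: Dict[str, int],
                           replica_catalog: Dict[str, dict],
                           registry: Dict[str, dict]) -> dict:
    """Walk through the example scenario; every step is a local stand-in."""
    log: List[str] = []
    concrete = replica_catalog[virtual_ds]                          # map virtual -> concrete
    log.append(f"map {virtual_ds} -> {concrete['id']}")
    files = concrete["files"]
    size = config.get("files_per_subjob", 1)
    sub_datasets = [files[i:i + size] for i in range(0, len(files), size)]   # split
    log.append(f"split into {len(sub_datasets)} sub-datasets")
    sub_jobs = [{"id": k, "files": s} for k, s in enumerate(sub_datasets)]  # one sub-job each
    for sj in sub_jobs:
        log.append(f"stage {sj['files']} for sub-job {sj['id']}")       # stage files
    log.append(f"locate/install {app}")                               # application
    log.append(f"build {task}")                                       # e.g. compile the task
    partial_results = [len(sj["files"]) for sj in sub_jobs]           # "run": count files
    output = {"id": f"{virtual_ds}-out", "value": sum(partial_results)}  # gather and merge
    registry[output["id"]] = output                                   # register output
    log.append(f"register {output['id']}")
    # The returned job links the output dataset to its detailed provenance.
    return {"output_dataset": output["id"], "provenance": log}


catalog = {"vds-7": {"id": "cds-7a", "files": ["f1", "f2", "f3"]}}
registry: Dict[str, dict] = {}
job = process_high_level_job("athena", "jobOptions.py", "vds-7",
                             {"files_per_subjob": 2}, catalog, registry)
print(job["output_dataset"], registry["vds-7-out"]["value"])  # vds-7-out 3
print("\n".join(job["provenance"]))
```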
26
Job management service
Provide means to manage jobs
  Analysis service creating the job provides this
  May also want this functionality elsewhere
Accessed from job interface to implement management functions
  Might create job service (OGSI)
  Or job is a resource (WSRF)
27
Catalog services
Repositories
  Store AJDL components indexed by ID
Selection (metadata) catalogs
  Help user to select input data, task, …
VDC – Virtual Dataset Catalog
  Prescriptions for creating datasets
    – Application, task, input dataset
DRC – Dataset Replica Catalog
  Mapping between virtual and concrete datasets
Job catalog
  Detailed provenance for concrete datasets
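How the repository, VDC, DRC and job catalog fit together can be sketched with in-memory dictionaries. The sketch assumes a prescription is an (application, task, input dataset) tuple and a job record is a free-form dict; `Catalogs`, `register_output` and `provenance` are hypothetical names. Combining the VDC prescription with the job-catalog record gives the full provenance of a concrete dataset.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Prescription = Tuple[str, str, str]  # (application, task, input dataset)


@dataclass
class Catalogs:
    repository: Dict[str, dict] = field(default_factory=dict)   # AJDL components by ID
    vdc: Dict[str, Prescription] = field(default_factory=dict)  # virtual dataset catalog
    drc: Dict[str, List[str]] = field(default_factory=dict)     # virtual -> concrete IDs
    job_catalog: Dict[str, dict] = field(default_factory=dict)  # concrete ID -> job record

    def store(self, component_id: str, component: dict) -> None:
        self.repository[component_id] = component

    def register_output(self, virtual_id: str, prescription: Prescription,
                        concrete_id: str, job_record: dict) -> None:
        self.vdc.setdefault(virtual_id, prescription)
        self.drc.setdefault(virtual_id, []).append(concrete_id)
        self.job_catalog[concrete_id] = job_record

    def provenance(self, concrete_id: str) -> dict:
        """Combine the virtual prescription with the concrete job details."""
        virtual_id = next(v for v, ids in self.drc.items() if concrete_id in ids)
        return {"prescription": self.vdc[virtual_id], "job": self.job_catalog[concrete_id]}


cat = Catalogs()
recipe = ("athena", "reco.py", "raw-009")
cat.register_output("vds-1", recipe, "cds-1a", {"subjobs": 4, "nodes": ["n01", "n02"]})
cat.register_output("vds-1", recipe, "cds-1b", {"subjobs": 2, "nodes": ["n07"]})
print(cat.drc["vds-1"])                 # ['cds-1a', 'cds-1b']
print(cat.provenance("cds-1b")["job"])  # {'subjobs': 2, 'nodes': ['n07']}
```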
28
Implementation strategy
Define AJDL
  Components, nature, interfaces
Implement catalogs
  Tables in AMI
  Programmatic interface
    – (C++ with Python binding)
Analysis services
  Start with existing services or analogs
    – DIAL, ATCOM, Capone, GANGA, …
  Different implementations for different strategies
  At least one using ARDA middleware
29
Implementation strategy (cont)
User interface
  Programmatic interface to high-level services and AJDL components
    – C++, Python and eventually Java bindings
  GANGA will provide Python binding and use it to deliver a GUI
    – Extensible design: client tools plug into Python bus
Middleware
  Whatever works to begin with
  ARDA services will be used in that context
    – Would like to see better integration with other middleware efforts
30
Implementation strategy (cont)
Web service infrastructure
  Short term: use independent persistent services
  Mid-term: follow ARDA strategy
    – GAS – grid access service
  Long term: follow standards such as WSRF
    – Dataset and job become resources?
Releases
  Deliver working prototype in May
    – Robust enough for the average physicist
  Regular releases adding functionality, improving performance and incorporating new middleware
31
Effort providers
Look to the following for effort:
  GANGA for user interface and more
  DIAL for interactive analysis service
  ARDA integration team for ARDA analysis service
  ARDA/EGEE and US grid projects for middleware
  POOL for datasets and metadata?
  SEAL for Python-C++ integration
    – Later Java as well?
  ATLAS physics and computing groups for ATLAS-specific pieces
    – ATLAS applications and datasets
    – System testing and evaluation
32
ARDA
ARDA begins April 1
Two areas in LCG:
  Middleware development (1st report delivered)
  Integration team
ATLAS ARDA prototype
  Collaboration in context of integration team
  Deliver at least one analysis service based on ARDA middleware
  We would also like to collaborate on AJDL and other high-level services
    – ARDA not interested in short term
33
More information
ADA home page: http://www.usatlas.bnl.gov/ADA
This page has links to other projects