David Adams ATLAS ADA: ATLAS Distributed Analysis David Adams BNL June 7, 2004 BNL Technology Meeting.

Presentation transcript:

David Adams ATLAS ADA: ATLAS Distributed Analysis David Adams BNL June 7, 2004 BNL Technology Meeting

David Adams ATLAS ATLAS Distributed Analysis BNL Tech Mtg June 7, 2004
Contents
Analysis model
Key features
Architecture
Status
Deliverables
Authentication and authorization
Service infrastructure
AJDL
Analysis services
Catalogs
Data movement
Clients
Applications
Datasets
Deployment
Monitoring
Status summary
ARDA
Conclusions
Other presentations
More information
Acknowledgements

Analysis model
User selects
Dataset defining the input data
Application to process the data
  –Athena, root, paw, …
Task to configure the application
  –E.g. script to define histograms and code to fill them
User locates an analysis service (or local scheduler)
The dataset, application and task are submitted to that service
Examine partial results and final result
Repeat as desired
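
The analysis model above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `ToyAnalysisService`, its `submit()` and `update()` methods, and the field names are hypothetical and are not the DIAL or AJDL API. The toy service "processes" one input file per update so that partial results become visible before the job finishes.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Dataset:
    """Input data: a hypothetical ID plus a list of file names."""
    dataset_id: str
    files: list[str]


@dataclass
class Application:
    """Program that processes the data (the slides use name + version)."""
    name: str
    version: str


@dataclass
class Task:
    """User configuration for the application, held as named text files."""
    files: dict[str, str]


@dataclass
class Job:
    job_id: int
    total: int                      # number of input files
    done: int = 0                   # files processed so far
    partial_result: list[str] = field(default_factory=list)

    @property
    def finished(self) -> bool:
        return self.done >= self.total


class ToyAnalysisService:
    """Hypothetical stand-in for an analysis service (not the DIAL API)."""

    def __init__(self) -> None:
        self._jobs = {}
        self._inputs = {}

    def submit(self, app: Application, task: Task, dataset: Dataset) -> int:
        """Register a job and return its ID."""
        job = Job(job_id=len(self._jobs) + 1, total=len(dataset.files))
        self._jobs[job.job_id] = job
        self._inputs[job.job_id] = (app, task, dataset)
        return job.job_id

    def update(self, job_id: int) -> Job:
        """Process one more input file and return the current job state."""
        job = self._jobs[job_id]
        app, _task, dataset = self._inputs[job_id]
        if not job.finished:
            name = dataset.files[job.done]
            job.partial_result.append(f"{app.name}-{app.version}:{name}")
            job.done += 1
        return job


if __name__ == "__main__":
    service = ToyAnalysisService()
    job_id = service.submit(
        Application("aodhisto", "0.90"),
        Task({"fill.C": "// user code to fill histograms"}),
        Dataset("ds-1", ["a.root", "b.root"]),
    )
    job = service.update(job_id)
    while not job.finished:
        print("partial result:", job.partial_result)
        job = service.update(job_id)
    print("final result:", job.partial_result)
```

The client only ever talks to the service, which matches the slide's point that the same selections can be handed to different service implementations.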

Key features
Collection of high-level web services
User entry points:
  –Analysis service (job submission and monitoring)
  –Catalog services (repositories, selection, …)
  –Job management service (long-running jobs)
Other services
  –Dataset splitters and mergers
Loose coupling allows
  –Migration from reference to sophisticated implementations
  –Contributions from independent development teams
Common interfaces
  –Selection of most appropriate service at run time
  –One client can be used with many service implementations
  –Many clients can access the same service

Key features (cont)
AJDL – Abstract Job Definition Language
Used to define the high-level service interfaces
Object oriented
  –Application, Task, Dataset, JobPreferences, Job
Extensible
  –E.g. EventDataset, AtlasPoolEventDataset, RootHistogramDataset, …
Data component in XML
  –Argument for service invocation
Class representation
  –For constructing services and clients
  –C++, Python and maybe Java
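
The pairing of an XML data component with an extensible class representation might look like the following Python sketch. The element names, attributes and the `dataset_from_xml()` factory are assumptions made for illustration; the real AJDL schema is not given in these slides.

```python
import xml.etree.ElementTree as ET


class Dataset:
    """Generic component: every dataset has an ID and can write itself as XML."""

    tag = "Dataset"  # hypothetical element name

    def __init__(self, dataset_id: str):
        self.dataset_id = dataset_id

    def to_element(self) -> ET.Element:
        return ET.Element(self.tag, id=self.dataset_id)

    def to_xml(self) -> str:
        return ET.tostring(self.to_element(), encoding="unicode")


class EventDataset(Dataset):
    """Extension: adds an event count on top of the generic interface."""

    tag = "EventDataset"  # hypothetical element name

    def __init__(self, dataset_id: str, event_count: int):
        super().__init__(dataset_id)
        self.event_count = event_count

    def to_element(self) -> ET.Element:
        elem = super().to_element()
        elem.set("event_count", str(self.event_count))
        return elem


def dataset_from_xml(text: str) -> Dataset:
    """Rebuild the matching class from the XML tag (a small factory)."""
    elem = ET.fromstring(text)
    if elem.tag == "EventDataset":
        return EventDataset(elem.get("id"), int(elem.get("event_count")))
    if elem.tag == "Dataset":
        return Dataset(elem.get("id"))
    raise ValueError(f"unknown dataset type: {elem.tag}")


if __name__ == "__main__":
    xml_text = EventDataset("ds-42", 1000).to_xml()
    print(xml_text)                      # XML string passed to a service
    print(type(dataset_from_xml(xml_text)).__name__)
```

A high-level service could work with the base `Dataset` interface alone, while a client that needs event counts would use the subclass.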

Key features (cont)
Multiple implementations of the analysis service
Production system
  –Take advantage of existing infrastructure
  –Provide capability to individual users
  –Use system as a whole or individual executors
DIAL
  –Distributed Interactive Analysis of Large datasets
  –Provide interactive response for analysis jobs
ARDA
  –Based on the new EGEE GLITE middleware
  –Long term replacement for the above?
  –If performance is the same or better and all sites support it
Switch
  –To select between the above and create networks

Key features (cont)
Catalogs, catalogs, catalogs
Repositories
  –Hold XML descriptions of objects indexed by ID
  –E.g. dataset and job repositories
Selection catalogs
  –Provide metadata enabling users to select objects
  –E.g. (virtual) dataset selection catalog
Dataset replica catalog
  –Maps virtual datasets to one or more concrete representations
Virtual data catalog
  –Maps application, task and input dataset to output dataset
  –Both datasets are virtual
  –Included in dataset selection catalog?
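
The four catalog roles listed above can be illustrated with plain in-memory Python classes. All class and method names here are hypothetical; the sketch only shows what each catalog maps from and to, not how the AMI or MySQL catalogs are implemented.

```python
class Repository:
    """Holds XML descriptions of objects, indexed by ID."""

    def __init__(self):
        self._xml_by_id = {}

    def put(self, object_id, xml_text):
        self._xml_by_id[object_id] = xml_text

    def get(self, object_id):
        return self._xml_by_id[object_id]


class SelectionCatalog:
    """Holds metadata per ID so that users can select objects."""

    def __init__(self):
        self._metadata_by_id = {}

    def add(self, object_id, **metadata):
        self._metadata_by_id[object_id] = metadata

    def select(self, **wanted):
        """Return the IDs whose metadata match every requested key/value."""
        return [
            object_id
            for object_id, meta in self._metadata_by_id.items()
            if all(meta.get(key) == value for key, value in wanted.items())
        ]


class ReplicaCatalog:
    """Maps a virtual dataset ID to one or more concrete dataset IDs."""

    def __init__(self):
        self._replicas = {}

    def add(self, virtual_id, concrete_id):
        self._replicas.setdefault(virtual_id, []).append(concrete_id)

    def replicas(self, virtual_id):
        return list(self._replicas.get(virtual_id, []))


class VirtualDataCatalog:
    """Maps (application, task, input dataset) to an output dataset ID."""

    def __init__(self):
        self._outputs = {}

    def record(self, application_id, task_id, input_id, output_id):
        self._outputs[(application_id, task_id, input_id)] = output_id

    def lookup(self, application_id, task_id, input_id):
        """Return the known output dataset ID, or None if never produced."""
        return self._outputs.get((application_id, task_id, input_id))
```

A typical flow would select an ID from the selection catalog, fetch its XML from the repository, and then ask the replica catalog where a concrete copy lives.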

Key features (cont)
User interface(s)
Use CINT to provide direct access to DIAL/AJDL classes from the ROOT command line
Use lcgdict to provide Python bindings for these classes
GANGA uses the latter as the basis for a GUI

Architecture
[Architecture diagram; legend: Release 1.0, Work begun]

Demo
See DIAL web pages to run demo
Runs at BNL and sites with AFS access
Distribution kit provided but not robust
Steps:
Select analysis service
Choose application, task and input dataset
Submit job
Monitor and display partial and final results
Repeat as desired

Status
The following slides are taken from the ATLAS software workshop, May 23-28

Authentication and authorization
Authentication done with GSI
Users: get and register grid certificate
BNL tier 1 providing list of authorized users
From all ATLAS LCG and USATLAS lists
Soon updated automatically every 6 hours
  –For now, send me a message when you join and I will ask for an update
DIAL service uses this list for authorization
  –Interface and implementation by VS
Available to others
BNL tier 1 will provide authorization web service
Summer 2004?
DIAL will add implementation of the same interface

Authentication and authorization (cont)
Like to add user IDs
Long term – beyond lifetime of DNs (distinguished names) and CAs (certificate authorities)
Need catalog to map DNs to IDs
For 1.0?

Service infrastructure
For now, persistent standalone services
No OGSA: OGSI, WSRF, …
No UDDI
  –But we would like to have multiple service instances in 1.0
Expect scaling problems later
  –More service instances for interactive jobs
  –Use production system for long-running jobs

AJDL
Abstract Job Description Language
Components used to define high-level interfaces
Dataset, transformation, job and job preferences
Both data and class to interpret and manipulate data
Data in XML
Classes in C++ and Python; later Java perhaps
Component has generic interface but is extensible
High-level services typically use generic interface
User and endpoint application need extensions
  –E.g. application uses event collection (not needed?)
  –Or user accesses histograms

AJDL (cont)
For release 1.0
Take AJDL components from DIAL
  –XML schema (DTD)
  –C++ class interfaces
  –C++ class implementation
    >Including writing and reading XML
Python implementation by wrapping C++ classes
Future
Separate AJDL from DIAL
Move data for generic interface into generic classes
  –So we don’t need subclasses to implement generic interface
  –Some methods (e.g. kill job) may fail gracefully

AJDL Job
For release 1.0
All exchanged data carried in base class
  –Clients only see base for remote jobs
  –Recently added job ID and list of sub-jobs
Job manipulation done through analysis service
  –Methods include create, start, kill, update, …
Later
Manipulation through remote class?
  –By calling appropriate analysis service
  –Or running job management service
    >Job is a service (OGSI) or resource (WSRF)

AJDL dataset
For release 1.0
Interface in Dataset, EventDataset and CompoundDataset
Partial implementation in BaseDataset, EventDataset and CompoundDataset
Remaining implementation provided by subclasses
  –CbntDataset, AtlasPoolEventDataset, …
  –Subclasses have different XML structure
Later
Try to move XML data into base classes
Maybe drop CompoundDataset
  –Add to Dataset interface

AJDL transformation
Release 1.0
Application is just name + version
  –Scripts to build and run stored at site
  –Software pre-installed
Task is collection of text files
  –Carried in XML
Viability of model demonstrated by the introduction of new applications
  –Some interest in adding task parameters, e.g. ATLAS release
Later
Application carries scripts
Application carries list of SW packages
Automatic installation with PacMan
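
As a rough illustration of "task is collection of text files carried in XML", the Python sketch below packs named text files into one XML document and unpacks them again. The `Task` and `File` element names and both function names are assumptions; the actual DIAL DTD is not shown in these slides.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET


def task_to_xml(files: dict[str, str]) -> str:
    """Pack named text files into a single XML string (hypothetical layout)."""
    root = ET.Element("Task")
    for name, content in files.items():
        file_elem = ET.SubElement(root, "File", name=name)
        file_elem.text = content  # ElementTree escapes <, > and & on output
    return ET.tostring(root, encoding="unicode")


def task_from_xml(text: str) -> dict[str, str]:
    """Recover the file names and contents from the XML string."""
    root = ET.fromstring(text)
    if root.tag != "Task":
        raise ValueError(f"expected a Task element, got {root.tag}")
    # An empty file is serialized without text, so map None back to "".
    return {elem.get("name"): elem.text or "" for elem in root.findall("File")}


if __name__ == "__main__":
    task = {
        "fill.C": "if (pt > 20 && eta < 2.5) h1->Fill(pt);\n",
        "notes.txt": "",
    }
    xml_text = task_to_xml(task)
    print(xml_text)
    print(task_from_xml(xml_text) == task)
```

Because the files travel inside the XML, the service needs no shared file system with the client, which is consistent with the application itself being only a name and version looked up at the site.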

AJDL Job Preferences
Class and XML representation added to DIAL
Mostly a placeholder
Now includes output file catalogs
Plan to add user ID, expected CPU time/event, …

Analysis services
DIAL service is running
Want to improve response time
  –Distribute merging
    >Not yet distributed but no longer blocking
  –Use Condor COD for submission (done but not integrated)
New problem: client-side crashes
Need steering service
Separate interactive analysis jobs from batch analysis and reconstruction (nothing done)
If we support the latter (we do)
Like to add a service (or services) based on production system
Probably not for release 1.0 (no one identified yet)

Catalogs
See “Catalog services for ATLAS” for the list of required catalogs
Repositories
  –XML description indexed by ID
  –For dataset, task, jobs, …
Selection catalogs
  –ID and metadata
  –For datasets and tasks
Use AMI to host most of these catalogs
Access through generic AMI web service
DIAL/AJDL provides client classes for these
  –User-friendly interface

Catalogs (cont)
For release 1.0
Define schema and construct tables
  –AMI dataset and task repositories in place
    >Soon have others: job, job preferences, application, …
Provide interface classes
  –Have AMI interface for repositories
  –Have MySQL interface for DSC and DRC (dataset selection and dataset replica catalogs)
New generic interfaces
  –Abstract SqlTable takes SqlQuery and returns SqlResult
  –Have MySQL implementation of SqlTable
  –Will add AMI implementation
  –Maybe add Oracle?
  –Easy to select catalogs at runtime
  –Base selection (and maybe repository) implementation on these
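
The generic interface described above (an abstract SqlTable that takes an SqlQuery and returns an SqlResult, with the backend chosen at run time) could be sketched in Python as follows. Only the three class names come from the slide; the `execute()` method, the fields, the `BACKENDS` registry and `open_table()` are assumptions, and SQLite from the standard library stands in for the MySQL and AMI implementations so that the example is self-contained.

```python
import sqlite3
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class SqlQuery:
    text: str            # SQL statement with ? placeholders
    params: tuple = ()   # values bound to the placeholders


@dataclass
class SqlResult:
    columns: list        # column names (empty for non-SELECT statements)
    rows: list           # list of row tuples


class SqlTable(ABC):
    """Abstract interface: every backend answers queries the same way."""

    @abstractmethod
    def execute(self, query: SqlQuery) -> SqlResult:
        ...


class SqliteTable(SqlTable):
    """Stand-in backend using SQLite (the slides mention MySQL and AMI)."""

    def __init__(self, path: str = ":memory:"):
        self._conn = sqlite3.connect(path)

    def execute(self, query: SqlQuery) -> SqlResult:
        cursor = self._conn.execute(query.text, query.params)
        # cursor.description is None for statements that return no rows.
        columns = [d[0] for d in cursor.description] if cursor.description else []
        rows = cursor.fetchall()
        self._conn.commit()
        return SqlResult(columns, rows)


# Hypothetical registry so that the backend can be chosen at run time.
BACKENDS = {"sqlite": SqliteTable}


def open_table(kind: str, **options) -> SqlTable:
    try:
        return BACKENDS[kind](**options)
    except KeyError:
        raise ValueError(f"unknown catalog backend: {kind}") from None


if __name__ == "__main__":
    table = open_table("sqlite")
    table.execute(SqlQuery("CREATE TABLE dsc (id TEXT, process TEXT)"))
    table.execute(SqlQuery("INSERT INTO dsc VALUES (?, ?)", ("ds-1", "higgs")))
    result = table.execute(
        SqlQuery("SELECT id FROM dsc WHERE process = ?", ("higgs",))
    )
    print(result.columns, result.rows)
```

A selection catalog built on `SqlTable` would not need to know which backend it is talking to, which is what makes run-time selection straightforward.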

Catalogs (cont)
Future
Distribute catalogs
  –Do not depend on access to central catalog service
Private data in catalogs?
Indicate owner/creator for each entry in catalogs
  –User IDs required for this
Private catalogs for users?
Catalog interface to aid in selection
  –Directly or to build graphical interface
  –To select datasets and tasks
  –Simple queries implemented
    >Enough for 1.0

Data movement
At present data movement supported through FileCatalog interface
OK if data is produced in a catalog capable of presenting data to the user
  –AfsFileCatalog, MagdaFileCatalog
Work required to get existing or new catalogs to use DMS, SRM or GridFTP to move data
Like to add DonQuojoteFileCatalog
Or SrmFileCatalog?

Clients
Use C++/CINT client provided by DIAL
User-friendly access from ROOT
DIAL classes are processed using rootcint
  –Continuing success – near full binding in every release
    >Problems with a few C++ types, e.g. vector
GANGA provides Python interface
For release 1.0, import with LCGDict
  –Problem: no free functions, hence no output stream
    >Adding display() and to_string() methods
  –Python binding now included in DIAL
Future maybe provide some Python implementation
  –Reduce or remove dependency on AJDL C++ library

Clients (cont)
Java – maybe in the future
Command line – partial, based on C++
dial_submit extended to allow use of analysis service
Web browser – some day

Clients (cont)
Like to add client tools to improve user interface
Dataset browser
Task browser and editor
Job monitor
Graphical interface to tie these together
Expect these to be provided by GANGA
Work begun on job configuration panel that includes the above browsers and editors (Alvin’s talk)

ATLAS Applications
Minimum for release 1.0 is AOD to histograms
User provides algorithm to fill histograms from AOD
  –Carried by task
  –Application compiles and (dynamically) links
Ketevi working on this
Partially implemented
  –aodhisto 0.90
    >Input: AtlasPoolEventDataset
    >Output: RootHistogramDataset
  –Scripts in place; need integration and testing
  –At present, user code is not implemented: the application runs the Higgs finding algorithm provided by Ketevi
  –Like to find someone to take over this package

ATLAS Applications (cont)
Also like to have reconstruction
Maybe RDO to ESD for release 1.0
Later add support for bytestream and AOD
Christian and Szymon working on this
Application is in place
  –atlasreco 0.90
    >Input: AtlasPoolEventDataset
    >Output: AtlasPoolEventDataset
  –Does reconstruction using release
  –Many ideas for future development
  –See Christian’s talk
Later add Monte Carlo tasks

ATLAS datasets
For DC2, need to add datasets describing
POOL event collection
  –Implementation now in place: AtlasPoolEventDataset
    >Constructed from a single file holding an implicit collection
    >Will add support for multiple files and explicit collections
    >Latter will make it possible to implement the missing event selection method
    >Which will enable distributed processing of a single file
  –One such dataset in dataset catalogs
ROOT histograms
  –RootHistogramDataset now in place
Bytestream

Deployment
Analysis services:
Interactive service at BNL
  –Using Condor COD or LSF dial queue
Reconstruction service at BNL
  –DIAL using BNL condor
  –Or another service using grid
Service to select between these
Last two only if we support reconstruction
Nice to have another site
Service based on 0.90 alpha is usually running at BNL
  –Presently uses LSF dial queue
  –Need to choose a different queue for reconstruction jobs
  –Probably switch to Condor COD for analysis

Deployment (cont)
Catalog service hosted by AMI
Might deploy some catalog services at BNL
Those for which we do not have AMI clients
Or to study distributed cataloging
Don Quixote at BNL if needed
Now deployed for production

Monitoring
Would like to monitor
Available services
Jobs
CPU usage
Disk usage
No plans and no requirements for release 1.0
Have added some functionality to analysis service and client for interactive use
Would very much like to have graphical monitoring
  –Might be included in GANGA

Status summary
Status of deliverables for release 1.0 (updated):
Authentication and authorization – ok
Service infrastructure – ok
AJDL – ok (enhanced)
Analysis services – will improve not ready
Catalogs – not ready (much work done)
Data movement – not ready (some progress – job prefs)
Clients – will improve (Python now available)
ATLAS Applications – not ready will improve
ATLAS Datasets – not ready done
Deployment – will improve
Monitoring – not provided

Status summary (cont)
Status of deliverables for release 0.90:
Authentication and authorization – ok
Service infrastructure – ok
AJDL – ok
Analysis services – fix
Catalogs – might improve
Data movement – might improve
Clients – ok
ATLAS Applications – ok
ATLAS Datasets – ok
Deployment – ok
Monitoring – ok

ARDA
ATLAS in ARDA
Agreed that ARDA team will deliver an analysis service based on the GLITE (EGEE) middleware
We have promised DIAL release by the end of May
  –Release 0.90 is for ARDA (alpha available now)
  –Try to finalize next week
  –Close to what we have now
  –Try to resolve client crash problem
  –Get demos working
ARDA/GLITE status
I still have not heard much
Meeting planned week of June 14 (by invitation)

Conclusions
Significant effort required for release 1.0
However we do expect to release an end-to-end system
Push this release to June or July
Aim for release 0.90 in next week or two
Alpha release available now
Extra effort could greatly enhance system
Analysis service using production system
Means for data movement
Client tools (work started)
Reconstruction application (first version in place)
Deployment at other sites
Monitoring of any sort

Other presentations
This session we will hear more about
DC2 Reconstruction application – Christian
DC2 analysis application – Ketevi
GANGA/Python interface – Karl
Graphical interface – Alvin
Working group meeting tomorrow
Demos
More detailed discussion

More information
ADA home page:
This page has links to deliverables with more info
Needs updating to reflect changes since April
Release info
Follow links to DIAL and DIAL releases
E.g. release 0.90 described at
  –Includes instructions for installing software and running the demos

Acknowledgments
Much work leading up to and during this workshop
Including effort from
Christian Haeberli
Szymon Gadomski
Ketevi Assamagan
Karl Harrison
Alvin Tan
Vinay Sambamurthy
Nagesh Chetan
Chitra Kannan
Comments and suggestions from many others