ATLAS Distributed Analysis
David Adams, BNL
September 30, 2004
CHEP2004 Track 5: Distributed Computing Systems and Experiences
Contents
  Goals
  Key concepts
    Datasets
    Transformations
    Jobs
  AJDL
  Service architecture
  Analysis services
    DIAL
    ATPROD
    ARDA
  Catalog services
  Data management services
  Clients
  Status
  ARDA
  Conclusions
  Contributors
  More information
Goals
  Provide to globally distributed users:
    Access to globally distributed data that
      – Is comprehensible
      – Enables selection of relevant data
      – Enables sensible placement of data
    Means to perform globally distributed processing on this data
      – High-level view that hides details of underlying middleware
      – But enables monitoring and debugging
      – Automatic, complete and accurate provenance
  All the above must be easy to use
    Well-integrated with analysis environments
      – ROOT, Python, etc.
    Graphical views where appropriate
      – Browse and examine data
      – Monitor jobs, …
Key concepts
  Dataset
    Describes a collection of data
      – E.g. a collection of reconstructed events
      – A collection of histograms, …
  Transformation
    Defines an operation to be performed on the data
    Dataset → Dataset
    Application + task (user configuration of application)
  Job
    Instance of a transformation
    Typical user request processed as a collection of sub-jobs
      – Same transformation acting on sub-datasets
      – Plus splitting of the input dataset and merging of the output
Key concepts (cont)
  [Diagram not reproduced]
Datasets
  Dataset includes
    Identifier
    Location of data, e.g. list of logical files
      – Absent for virtual datasets
    Content (i.e. description of the content)
      – E.g. list of event IDs and the type of data for each event
      – Or a list of histogram names
    List of constituent datasets
      – Usually their IDs
      – When dataset is composite, access to location and content may require use of the constituent datasets
  Dataset selection catalog holds metadata
  Dataset replica catalog holds replica mapping
    1 virtual to N concrete dataset mapping
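The dataset description above can be pictured as a small data structure. The following is a minimal Python sketch, not the DIAL C++ classes: the class name `Dataset`, its field names, the helper `all_logical_files` and the `lfn:` file names are all assumptions made for illustration. It shows how the location of a composite dataset might be resolved through its constituents.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Dataset:
    """Hypothetical stand-in for a dataset description (not the DIAL class)."""
    id: str
    logical_files: List[str] = field(default_factory=list)  # location; empty if absent
    content: List[str] = field(default_factory=list)        # e.g. event IDs or histogram names
    constituents: List[str] = field(default_factory=list)   # IDs of constituent datasets


def all_logical_files(ds: Dataset, repository: Dict[str, Dataset]) -> List[str]:
    """Collect file locations, descending into constituents of a composite dataset."""
    files = list(ds.logical_files)
    for child_id in ds.constituents:
        files.extend(all_logical_files(repository[child_id], repository))
    return files


# Usage: a composite dataset whose location comes from its constituents.
repo = {
    "ds-a": Dataset("ds-a", logical_files=["lfn:a1", "lfn:a2"], content=["evt1", "evt2"]),
    "ds-b": Dataset("ds-b", logical_files=["lfn:b1"], content=["evt3"]),
}
composite = Dataset("ds-ab", constituents=["ds-a", "ds-b"])
files = all_logical_files(composite, repo)
```

Storing constituents by ID rather than by value mirrors the "usually their IDs" note on the slide and keeps the composite description small.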
Datasets (cont)
  For ATLAS data, we identify
    Types of data
      – Used to define dataset categories
      – Category will be part of the content specification
    Types of datasets
      – Currently C++ classes with XML data representation
      – Third column indicates if this class exists
      – Likely will move to XML schema as the primary definition
  See table
Datasets (cont)
  Name    Type                         Exists?  Description
  EVIDS   EventDataset                 ×        List of event IDs
  EVGEN   AtlasPoolEventDataset        ×        From event generator
  HITS    AtlasPoolEventDataset        ×        Hits, e.g. from GEANT
  DIGITS  AtlasPoolEventDataset        ×        Digitization of hits
  RAW     AtlasByteStreamEventDataset           Raw data
  ESD     AtlasPoolEventDataset        ×        Event summary data
  AOD     AtlasPoolEventDataset        ×        Analysis oriented data
  TAG     AtlasPoolTagEventDataset              Event metadata
  NTUP    RootNtupleDataset                     Ntuples
  HISTO   RootHistogramDataset         ×        Histograms
  CBNT    CbntDataset                  ×        DC1 combined ntuples
  TEXT    TextDataset                           Text data, e.g. log files
Transformations
  Transformation
    Describes an operation to act on a dataset to produce a new dataset
    Has two components
      – Application = code shared by multiple transformations
        > Usually scripts to locate and run code in software packages
      – Task = user-supplied configuration (parameters or code)
  Task
    List of files
      – Presently embedded in task
      – Later could also be logical files
    Named parameters
      – Add this soon
    Typically created by user submitting the job
Transformations (cont)
  Application
    Two entry points (presently scripts)
      – build_task to fetch task files, compile, etc.
      – run creates output dataset from input dataset and built task
    Typically created by application developer
  Software package management
    Need an interface to enable build_task and run scripts to locate software on any machine
    E.g. “locate mypkg 1.2.3” returns /usr/contrib/mypkg/1.2.3/rh73_gcc73
    Also support querying and installation
    Implement as thin layer on existing package management systems
      – Pacman, RPM, local build, …
    Use service to handle installation and removal of packages
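The package location interface above could look roughly like the following Python sketch. It is only an illustration under stated assumptions: the in-memory registry `_INSTALLED` and the functions `locate` and `is_installed` are hypothetical, and a real thin layer would delegate to Pacman, RPM or a local build area instead of a dictionary. The example path is the one quoted on the slide.

```python
from typing import Dict, Optional, Tuple

# Hypothetical registry; a real layer would query an existing package manager.
_INSTALLED: Dict[Tuple[str, str], str] = {
    ("mypkg", "1.2.3"): "/usr/contrib/mypkg/1.2.3/rh73_gcc73",
}


def locate(package: str, version: str) -> Optional[str]:
    """Return the install directory of a package version, or None if it is absent."""
    return _INSTALLED.get((package, version))


def is_installed(package: str, version: str) -> bool:
    """Query support: report whether a package version is known on this machine."""
    return locate(package, version) is not None


# Usage: what a build_task or run script might do before invoking the software.
path = locate("mypkg", "1.2.3")
```

Returning `None` for an unknown package leaves the decision to install or fail with the caller, which fits the idea of a separate service handling installation and removal.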
Transformations (cont)
  [Diagram of ATLAS transformations not reproduced]
  For ATLAS we identify the above transformations
    Characterized by input and output dataset categories
    Most common ones listed; others are possible
Jobs
  A job is an instance of a transformation acting on a dataset
    Output result is another dataset
    Partial result may be available before job is complete
  Typical user-submitted job is split into sub-jobs
    By splitting input dataset and applying the same transformation to each sub-dataset
    Strategies for splitting and merging results must be provided
  Provenance
    Dataset provenance is specified by recording the input dataset and transformation
    More complete information is available from the job:
      – Site, CPU, submission, start and stop times, …
      – Log files maintained for some period, perhaps as datasets
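The split, transform and merge pattern described above can be sketched in a few lines of Python. This is a toy illustration, not the DIAL scheduler: the functions `split`, `count_parity` and `run_job`, the dataset ID `ds-demo`, and the use of a list of integers as a stand-in for an event dataset are all assumptions. The result records the input dataset and the transformation as minimal provenance.

```python
from collections import Counter
from typing import Callable, Dict, List


def split(events: List[int], n_parts: int) -> List[List[int]]:
    """Splitting strategy: deal event IDs round-robin into sub-datasets."""
    return [events[i::n_parts] for i in range(n_parts)]


def count_parity(events: List[int]) -> Counter:
    """Toy transformation: a two-bin 'histogram' of even and odd event IDs."""
    return Counter("even" if e % 2 == 0 else "odd" for e in events)


def run_job(dataset_id: str, events: List[int],
            transformation: Callable[[List[int]], Counter], n_parts: int) -> Dict:
    """Apply the same transformation to each sub-dataset, then merge the results."""
    partial = [transformation(sub) for sub in split(events, n_parts)]
    merged = sum(partial, Counter())  # merging strategy: add histograms bin by bin
    # Provenance: the input dataset and the transformation that produced the output.
    provenance = {"input_dataset": dataset_id,
                  "transformation": transformation.__name__,
                  "sub_jobs": len(partial)}
    return {"result": merged, "provenance": provenance}


job = run_job("ds-demo", list(range(10)), count_parity, n_parts=3)
```

Here the sub-jobs run sequentially in one process; in a distributed setting each call to the transformation would be a separate sub-job, and merging could begin as soon as the first partial results arrive.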
AJDL
  AJDL = Abstract Job Definition Language
  Components are representations of
    Dataset
    Transformation = Application + Task
    Job
    JobPreferences
    File
    Identifiers for all the above
  Presently defined as C++ classes
    With methods to write to and read from XML
      – Different for each subclass of Dataset
      – Same for subclasses of Job
    XML specified in DTD files
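To illustrate what "methods to write to and read from XML" might look like for one component, here is a minimal Python sketch using the standard library. The element and attribute names (`task`, `file`, `id`, `name`) and the `Task` fields are invented for this example and are not taken from the AJDL DTDs or the C++ classes.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import List


@dataclass
class Task:
    """Hypothetical task component: an ID plus a list of embedded file names."""
    id: str
    files: List[str] = field(default_factory=list)

    def to_xml(self) -> str:
        """Write the task as an XML string (element names are assumptions)."""
        root = ET.Element("task", {"id": self.id})
        for name in self.files:
            ET.SubElement(root, "file", {"name": name})
        return ET.tostring(root, encoding="unicode")

    @classmethod
    def from_xml(cls, text: str) -> "Task":
        """Rebuild a task from the XML string produced by to_xml."""
        root = ET.fromstring(text)
        return cls(id=root.get("id"),
                   files=[f.get("name") for f in root.findall("file")])


task = Task("task-1", ["analysis.C", "cuts.txt"])
xml_text = task.to_xml()
restored = Task.from_xml(xml_text)
```

A round trip through the XML string is what allows an object inserted at one site to be extracted unchanged at another.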
AJDL (cont)
  Look at moving to XML schema
    Automatically derive classes from XML definitions
      – Automatic support for other languages (Python, Java, …)
    In collaboration with GANGA and others
  At the same time
    Try to find one representation for all datasets
    Introduce separate type for event ID lists
      – Often too large to carry around in a dataset
  Also interested in specifying interfaces for AJDL services
    Those that operate on AJDL components
    Services listed later
    Interested in working with others on these specifications
Service architecture
  ADA (ATLAS Distributed Analysis) itself is distributed
    Allows data access and job management to be distributed
      – Important for scaling to a large number of users
    Collection of web services
      – Analysis service for job processing
      – Job monitoring
      – Catalog services
        > Metadata
        > Repository
        > Replica (not only for files)
    Users interact through clients
      – ROOT client from DIAL
      – Python client from GANGA
Service architecture
  [Diagram not reproduced]
DIAL analysis service
  Two instances running at BNL
    Long-running jobs using Condor job submission
    Interactive response using fast LSF queue
  Working to improve interactive response
    Submit jobs to perform result merging
      – Presently done on service host
    Use parallel jobs for merging
    Long term, look at the use of job agents
      – Possibly as part of ARDA
  Add service to act as switch
    Delegate jobs based on
      – Job requirements
      – Desired response time
      – Resource availability
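The proposed switch service could delegate along the lines of the Python sketch below. Everything here is hypothetical: the `Service` fields, the selection rule in `choose_service` and the numbers in the usage example are assumptions chosen to show the three criteria from the slide, not a description of DIAL.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Service:
    """Hypothetical description of an analysis service behind the switch."""
    name: str
    typical_response_s: float  # expected time to first result, in seconds
    free_slots: int            # currently available job slots
    max_job_hours: float       # longest job the service accepts


def choose_service(services: List[Service], job_hours: float,
                   desired_response_s: float) -> Optional[Service]:
    """Pick the fastest service that satisfies the job and has free capacity."""
    candidates = [
        s for s in services
        if s.max_job_hours >= job_hours                  # job requirements
        and s.typical_response_s <= desired_response_s   # desired response time
        and s.free_slots > 0                             # resource availability
    ]
    return min(candidates, key=lambda s: s.typical_response_s, default=None)


services = [
    Service("interactive", typical_response_s=10, free_slots=4, max_job_hours=0.5),
    Service("batch", typical_response_s=600, free_slots=100, max_job_hours=48),
]
quick = choose_service(services, job_hours=0.1, desired_response_s=60)
long_running = choose_service(services, job_hours=12, desired_response_s=3600)
```

Returning `None` when no service qualifies lets the switch report the failure to the client rather than queue the job somewhere unsuitable.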
ATPROD analysis service
  Enable submission to the existing ATLAS production system
    At least for user-level production
  Strategy
    Split input dataset
    Make an entry in the production catalog for each sub-job
    Monitor catalog and gather and merge results as jobs finish
    Same for the other analysis services
  Not yet implemented
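The three-step strategy above (split, register sub-jobs in a catalog, poll and merge) can be sketched as follows. This is a toy model under stated assumptions: the production catalog is an in-memory dictionary, `fake_production_step` stands in for the production system, and summing integers stands in for a real transformation. None of these names come from ATPROD, which the slide notes is not yet implemented.

```python
from typing import Dict, List, Set

# Hypothetical in-memory stand-in for the production catalog: one entry per sub-job.
catalog: Dict[str, dict] = {}


def submit(job_id: str, events: List[int], n_parts: int) -> List[str]:
    """Split the input dataset and make a catalog entry for each sub-job."""
    entry_ids = []
    for i in range(n_parts):
        entry_id = f"{job_id}.{i}"
        catalog[entry_id] = {"input": events[i::n_parts], "status": "pending", "output": None}
        entry_ids.append(entry_id)
    return entry_ids


def fake_production_step() -> None:
    """Stand-in for the production system: finish one pending sub-job per call."""
    for entry in catalog.values():
        if entry["status"] == "pending":
            entry["output"] = sum(entry["input"])  # toy transformation
            entry["status"] = "done"
            return


def gather(entry_ids: List[str], merged: int, seen: Set[str]) -> int:
    """Merge the results of sub-jobs that finished since the last poll."""
    for entry_id in entry_ids:
        entry = catalog[entry_id]
        if entry["status"] == "done" and entry_id not in seen:
            merged += entry["output"]
            seen.add(entry_id)
    return merged


ids = submit("job-1", list(range(1, 11)), n_parts=3)
total, seen = 0, set()
while len(seen) < len(ids):  # monitor the catalog until every sub-job is merged
    fake_production_step()
    total = gather(ids, total, seen)
```

Because `gather` merges whatever has finished at each poll, a partial result is available before the whole job completes.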
ARDA analysis service
  Enable submission to the gLite WMS
    Let EGEE do the work of matchmaking, brokering, job tracking, monitoring, error reporting, …
  There is a service to submit to the existing prototype system
  Expect first release of gLite next month
    Quickly deploy an analysis service based on this
    Make regular updates taking advantage of more gLite features
Catalog services
  Goals of ADA cataloging:
    Provide a repository for AJDL objects indexed by ID
      – Insert at site A and extract with ID at site B
    Enable users to assign metadata to objects and retrieve with queries
    Record dataset provenance
    Provide job monitoring
  Identify three types of catalogs
    Repository
      – Map ID to XML string
    Metadata catalog
      – Map ID to named attributes
    Replica catalog
      – Map ID to a list of IDs
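The three catalog types above differ only in what an ID maps to, which the following Python sketch makes concrete. The class and method names (`Repository.insert`, `MetadataCatalog.query`, and so on) and the example IDs and attributes are assumptions for illustration; the real catalogs are web services, not in-memory dictionaries.

```python
from typing import Dict, List


class Repository:
    """Maps an ID to the XML string of an AJDL object."""
    def __init__(self) -> None:
        self._xml: Dict[str, str] = {}

    def insert(self, obj_id: str, xml: str) -> None:
        self._xml[obj_id] = xml

    def extract(self, obj_id: str) -> str:
        return self._xml[obj_id]


class MetadataCatalog:
    """Maps an ID to named attributes and answers simple equality queries."""
    def __init__(self) -> None:
        self._attrs: Dict[str, Dict[str, str]] = {}

    def assign(self, obj_id: str, **attrs: str) -> None:
        self._attrs.setdefault(obj_id, {}).update(attrs)

    def query(self, **wanted: str) -> List[str]:
        return sorted(i for i, a in self._attrs.items()
                      if all(a.get(k) == v for k, v in wanted.items()))


class ReplicaCatalog:
    """Maps an ID to a list of IDs, e.g. one virtual dataset to its concrete copies."""
    def __init__(self) -> None:
        self._replicas: Dict[str, List[str]] = {}

    def add(self, obj_id: str, replica_id: str) -> None:
        self._replicas.setdefault(obj_id, []).append(replica_id)

    def replicas(self, obj_id: str) -> List[str]:
        return list(self._replicas.get(obj_id, []))


repo, meta, replica_cat = Repository(), MetadataCatalog(), ReplicaCatalog()
repo.insert("ds-1", "<dataset id='ds-1'/>")
meta.assign("ds-1", category="AOD", production="DC2")
meta.assign("ds-2", category="ESD", production="DC2")
replica_cat.add("ds-1", "ds-1-bnl")
replica_cat.add("ds-1", "ds-1-cern")
aod_ids = meta.query(category="AOD")
```

Keeping the replica catalog generic (ID to list of IDs) is what allows it to serve objects other than files.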
Catalog services (cont)
  Required global catalog instances
    Repositories for Dataset, Application, Task, Job
    Metadata catalog for Dataset
      – Same as that used for production?
    Replica catalog for Dataset
    More later
  First choice is to host these in AMI (soon)
  Next add local job catalog to record analysis service state
    So service can be restarted without losing jobs
  Later look at issues such as
    Distributed cataloging
    Private catalogs
Data management services
  DQ (Don Quijote) was developed as part of production
    Provides access to file replica catalogs from all three grids
    Enables file movement, including between grids
    ADA will adopt this for replica management and movement
  ATLAS has a plan to add a file transfer service
    Adopt this as well when available
  SRM provides file management at the site level
    ATLAS expects sites to deploy this service
    DQ and ADA will use this as it is deployed
  gLite has a suite of data management services
    Including SRM
    Rest of service model is complex; hide it behind DQ
      – Already have DQ interface to AliEn file catalog
Clients
  DIAL provides a ROOT client
    ACLiC used to build dictionaries for DIAL classes
      – All DIAL classes available on the ROOT command line
      – Enables catalog browsing, job submission, monitoring, etc.
  GANGA provides a Python client
    PyLCGDict used to build Python wrappers for DIAL classes
      – All DIAL classes available on the Python command line
    Later build Python-only client
      – Restricted functionality but
      – Greater portability
  GUI
    GANGA is developing a GUI
      – Data browsing
      – Configure, submit and monitor jobs
Status
  Present system includes
    ROOT and Python command line clients
    DIAL analysis services running
      – Interactive service at BNL
      – Batch service at BNL
    Datasets
      – Classes for combined ntuples, ATLAS-POOL event collections
      – All DC1 CBNT data
      – Few DC2 samples
    Transformations
      – DC1 CBNT histograms
      – DIGI: atlasdigi
      – RECO: atlas-reco-8.x.0, x = 3, 4, 5
ARDA
  ATLAS-ARDA prototype
    ARDA is a CERN project to deliver prototype distributed analysis systems for the LHC experiments
      – Based on gLite (EGEE middleware)
    The ATLAS ARDA prototype makes use of the components shown in the figure
    [Figure not reproduced]
    Expect functional system this year
Conclusions
  Status
    ADA is coming together but there is still much to do
    Still in demo mode; for serious use we must add
      – Dataset description of DC2 data
      – Repositories for applications, tasks, datasets and jobs in AMI
      – Dataset selection catalog in AMI
      – Dataset replica catalogs in AMI
      – Transformations for the full DC2 production/analysis chain
      – Means to move output data to a storage element
    Expect all this year
  Future developments (beyond those above)
    Update AJDL moving to XML schema and adding WSDL
    GUI (expect this soon)
    ATPROD service to access more compute resources
    ARDA service to try out EGEE middleware
    Improvements to DIAL service to improve interactive response
Contributors
  DIAL
    D. Adams, W. Deng, V. Sambamurthy, N. Chetan, C. Kannan
  GANGA
    K. Harrison, C. Tan, A. Soroko
  ARDA
    D. Liko, F. Orellana
  AMI
    S. Albrand, J. Fulachier
  ATLAS
    C. Haeberli, J. Bahilo, F. Fassi, G. Rybkine, M. Branco
  Many useful discussions
    All the above and PPDG, GAG, gLite, …
More information
  For more information on ADA, see the home page
    Includes status of subprojects, relevant talks and documents, and links to associated projects
  To try it out, run ROOT demo 3 in the latest DIAL release
  See the ADA paper in the CHEP2004 proceedings