INFSO-RI-508833 Enabling Grids for E-sciencE www.eu-egee.org ATLAS Distributed Analysis A. Zalite / PNPI.

Overview
- Why?
- Goal
- ADA Model
- First steps
- Demo example
- More examples
- Conclusion

Why?
- Huge amount of data
  - The ATLAS experiment is expected to record several petabytes of data per year
  - The ATLAS offline system will produce a similar amount of data (ESD, AOD, ...)
- Globally distributed members of the ATLAS collaboration
  - Over 1000 physicists from all over the world will take part in data analysis
- The data have to be available to all members of the collaboration

Goal
- Provide globally distributed users with
  - Access to globally distributed data
  - Tools to perform globally distributed processing on this data
- Easy to use and to access from the analysis environment
  - Flexible enough to adapt to the environment
- Enable effective use of all ATLAS computing resources
- Trace information about the processing of any data
  - Where did this data (event or analysis) come from?

ADA Model
Components:
- Data are described by a dataset (a collection of data)
  - Location of the data (e.g. files)
  - Content (e.g. list of event IDs and the type of data for each event)
- A transformation describes an operation that can act on a dataset to produce a new dataset
  - Application: scripts used by a job to build the task or to process data
  - Task: carries user parameters or code (e.g. ATLAS release, job options, and/or algorithm code)
- A job is an instance of a transformation acting on a dataset
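The relationship between these components can be sketched in a few lines of C++. This is only an illustration of the model described on the slide: the struct names, their fields and the helper make_job are assumptions made for this sketch and are not the DIAL classes.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-ins for the ADA concepts; these are not the DIAL classes.
struct Dataset {
    std::string name;
    std::vector<std::string> files;   // location of the data
    std::vector<long> event_ids;      // content: which events the dataset holds
};

struct Application {
    std::string name;                 // e.g. "atlasopt"
};

struct Task {
    std::map<std::string, std::string> files;   // file name -> text content
};

// A transformation pairs an application with the task that configures it.
struct Transformation {
    Application app;
    Task task;
};

// A job is one instance of a transformation acting on one dataset.
struct Job {
    Transformation xform;
    Dataset input;
};

// Hypothetical helper: assemble a job from its three ingredients.
Job make_job(const Application& app, const Task& task, const Dataset& ds) {
    return Job{Transformation{app, task}, ds};
}
```

Keeping the application and the task as separate members mirrors the slide: the same application can be reused with different tasks, and the same transformation can be applied to different datasets.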

ADA Model
Many ATLAS-specific transformations have been defined:
- atlasopt: user provides ATLAS release and job options
- aodhisto: atlasopt plus code to build in the UserAnalysis package
- atlasdev: atlasopt plus a local development directory
- atlasdev-src: same as atlasdev, except that the development area is tarred up and will be rebuilt if the platform changes
All these transformations run Athena.

ADA Model: Transformation (diagram)

ADA Model
This view enables distributed processing:
- Split the input dataset
  - Along event, file, or sub-dataset boundaries
- Create a separate sub-job for each sub-dataset
- Implies a post-processing stage to merge results (output datasets)
Users carry out processing by
- Defining a job
  - Application, task and dataset
- Submitting this definition to a scheduler
  - Typically an analysis service
- Polling for status
  - Job state (and sub-job states)
  - Result dataset

ADA Model
On receiving a job request, the scheduler
- Builds the task (or locates an existing build)
- Splits the dataset into sub-datasets
- Creates and submits a sub-job for each sub-dataset
- Merges the results (output datasets) from each sub-job into the overall result
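The split, process and merge sequence can be illustrated with a small, self-contained C++ sketch. It assumes a dataset is just a list of file names split along file boundaries, and it replaces the real sub-job (an Athena run) with a stand-in that only counts files. All names here (split_dataset, run_subjob, merge_results, run_job) are hypothetical and are not part of DIAL.

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

using FileList = std::vector<std::string>;
using Histogram = std::map<std::string, long>;   // bin label -> count

// Split along file boundaries: at most files_per_subjob files per sub-dataset.
std::vector<FileList> split_dataset(const FileList& files, std::size_t files_per_subjob) {
    std::vector<FileList> parts;
    if (files_per_subjob == 0) return parts;
    for (std::size_t i = 0; i < files.size(); i += files_per_subjob) {
        std::size_t end = std::min(i + files_per_subjob, files.size());
        parts.emplace_back(files.begin() + i, files.begin() + end);
    }
    return parts;
}

// Stand-in for a sub-job: it only counts the files it was given.
Histogram run_subjob(const FileList& part) {
    Histogram h;
    h["files_processed"] = static_cast<long>(part.size());
    return h;
}

// Merge stage: add the sub-job results bin by bin.
Histogram merge_results(const std::vector<Histogram>& results) {
    Histogram total;
    for (const Histogram& r : results)
        for (const auto& bin : r)
            total[bin.first] += bin.second;
    return total;
}

// Scheduler-like driver: split, run one sub-job per sub-dataset, merge.
Histogram run_job(const FileList& files, std::size_t files_per_subjob) {
    std::vector<Histogram> results;
    for (const FileList& part : split_dataset(files, files_per_subjob))
        results.push_back(run_subjob(part));
    return merge_results(results);
}
```

Because the merge only adds bins, the sub-jobs are independent of each other and could run on different sites in any order, which is the property the scheduler relies on.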

ADA Architecture (diagram)

ADA
ADA uses the DIAL framework. Release 1.20 of DIAL is the basis for the current ADA system.
To use ADA it is necessary
- To have a Grid certificate
  - A certificate from the Russian CA is OK
- To be a member of the ATLAS VO
  - Takes some time

First Steps
- Working node: LXPLUS at CERN
- Set up the grid environment:
    . /afs/cern.ch/project/gd/LCG-share/sl3/etc/profile.d/grid_env.sh
- Initialize the certificate proxy:
    grid-proxy-init
- DIAL setup (a setup script that defines a few environment variables and aliases) at CERN:
    DIALSETDIR=/afs/cern.ch/user/d/dial/apps/dial/setup
- Verify the user certificate and check the status of the unique ID service by issuing the command "uidtest" after setting up DIAL.

First Steps
The best way to start with DIAL is to run the demos inside ROOT.
These demos define a job
- application (papp)
- task (ptsk)
- dataset (pdst)
and submit it to the current scheduler (msch).
Start:
    dialroot -i
The -i flag means that any missing DIAL configuration, example or demo files will be copied into the local directory (necessary only the first time).

Demo Example
Distributed analysis is an iterative process in which a physicist defines a job, submits it to a processing system, examines the result and then repeats the sequence.
The demo selects an application, a task and a dataset, which together define a job that is submitted to a scheduler.
    root [0] .x demos/demo4.C    Defines papp, ptsk and pdst
    root [1] submit()            Submits a job based on papp, ptsk and pdst
    root [2] get_results()       Gets the job status and partial result
    root [3] TBrowser br         Check the output ntuples and histograms
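Since a job is processed as a set of sub-jobs, the status returned while polling is naturally a summary of the sub-job states, and a partial result covers only the sub-jobs that have finished. The sketch below shows one plausible way to compute such a summary. The state names and both functions are assumptions made for illustration; the actual DIAL job states are not given on the slides.

```cpp
#include <vector>

// Hypothetical state names, chosen for this sketch only.
enum class State { Running, Done, Failed };

// Overall job state: failed if any sub-job failed, done only when all are done.
// An empty list of sub-jobs is reported as done.
State job_state(const std::vector<State>& subjobs) {
    bool all_done = true;
    for (State s : subjobs) {
        if (s == State::Failed) return State::Failed;
        if (s != State::Done) all_done = false;
    }
    return all_done ? State::Done : State::Running;
}

// Fraction of sub-jobs finished, i.e. how complete a partial result is.
double fraction_done(const std::vector<State>& subjobs) {
    if (subjobs.empty()) return 0.0;
    int done = 0;
    for (State s : subjobs)
        if (s == State::Done) ++done;
    return static_cast<double>(done) / static_cast<double>(subjobs.size());
}
```

A user polling in a loop would stop as soon as job_state reports something other than Running, and could inspect the merged partial result at any point before that.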

Demo Example
A job is specified by defining a transformation and selecting a dataset to process with this transformation.
- The transformation is specified by an application and a task.
- The application carries the scripts that do the processing, and the task carries user configuration data.
- demo4 uses aodhisto to create histograms and ntuples from user source code.
- The demo identifies objects by name, extracts the corresponding ID from a selection catalog and uses this ID to extract the object from a repository.

Demo Example
    void demo4() {
      // Names of the application, task and dataset to look up.
      string aname = "aodhisto";
      string tname = "aodhisto_zll_aod";
      string dname = "hma.dc digit.A1_z_ee.aod files";
      // Name -> ID, using the selection catalogs (asc, tsc, dsc).
      aid = asc.id(aname);
      tid = tsc.id(tname);
      did = dsc.id(dname);
      // ID -> object, using the repositories (ar, tr, dr).
      papp = ar.extract(aid);
      ptsk = tr.extract(tid);
      pdst = dr.extract(did);
    }
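The two-step lookup used in demo4 (name to ID through a selection catalog, then ID to object through a repository) can be mimicked outside ROOT with standard containers. The classes below are hypothetical in-memory stand-ins written for this sketch. They borrow the method names id and extract from the demo, but they are not the DIAL catalog or repository classes, and the integer ID type is an assumption.

```cpp
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical selection catalog: maps a human-readable name to an object ID.
class SelectionCatalog {
public:
    void insert(const std::string& name, int id) { ids_[name] = id; }
    // Throws std::out_of_range if the name is not cataloged.
    int id(const std::string& name) const {
        auto it = ids_.find(name);
        if (it == ids_.end()) throw std::out_of_range("unknown name: " + name);
        return it->second;
    }
private:
    std::map<std::string, int> ids_;
};

// Hypothetical repository: maps an ID to the stored object
// (represented here by a plain description string).
class Repository {
public:
    void store(int id, const std::string& object) { objects_[id] = object; }
    // Throws std::out_of_range if the ID is not stored.
    const std::string& extract(int id) const { return objects_.at(id); }
private:
    std::map<int, std::string> objects_;
};

// Name -> ID -> object, in the style of demo4.
std::string lookup(const SelectionCatalog& catalog, const Repository& repo,
                   const std::string& name) {
    return repo.extract(catalog.id(name));
}
```

Separating the catalog from the repository means that names can be added, changed or queried without touching the stored objects, while the ID remains the stable handle for an object.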

Demo Example
Objects:
- papp: pointer to the current application
- ptsk: pointer to the current task
- pdst: pointer to the current dataset
These can be displayed:
    root [4] pprint(papp)    Display the application
    root [5] pprint(ptsk)    Display the task
    root [6] pprint(pdst)    Display the dataset

More Examples
There are more examples:
- demo5 uses esd2aod to create AOD from ESD using the prodsys transformation
- demo6 uses atlasopt to run a job with the provided job options
- demo7 uses atlasdev to run a job based on a user's ATLAS development area
- demo8 uses atlasdev-src to run a job based on a tarball of a user's development area

More Examples
Display the status of all catalogs to verify the connection and see the size of each:
    root [4] show_catalogs()

More Examples
A list of available datasets may be obtained by querying the DSC (dataset selection catalog, object dsc). The DSC is the primary user interface to datasets and plays the role of what is often called a metadata catalog.
- The query is limited to 100 results (12 were received).
- The query restricts the selection to TOP-level datasets, i.e. complete samples intended for user access, and then uses the name to select Rome samples with v10 reconstruction: SUSY data using all AOD data available at BNL.
- Replacing AOD-bnl with AOD gives the samples available at both CERN and BNL.
- Datasets matching a query can be counted with the query_count method.

More Examples
The DSC supports a list of parameters that can be used in the selection of datasets.

More Examples
- List the attributes for a given dataset
- Record the ID and fetch the dataset from the repository

More Examples
Select an application in a similar way.

More Examples
Select a task in a similar way.

More Examples
- The application is usually not modified, but the task very likely needs to be.
- Extract the files from the task.

More Examples
The list of jobOptions can be found in the CVS repository at atlas/PhysicsAnalysis/AnalysisCommon/AnalysisExamples/share/

More Examples
Now it is possible to build a new task from the modified files:
    ptsk = new dial::Task("atlas_release jo.py output_content", "mytask");
- The list of files used to construct the task may be replaced with "*" if you want all the files from the directory.
- Now papp, ptsk and pdst are defined, and the job can be submitted.
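The file-list argument can be read as a small selection rule: either an explicit, space-separated list of file names or "*" for everything in the directory. The sketch below illustrates that rule with an in-memory map standing in for the directory. The function build_task and both type aliases are hypothetical and are not the dial::Task constructor. Skipping names that are not found is also an assumption of this sketch, since the slides do not say how DIAL treats a missing file.

```cpp
#include <map>
#include <sstream>
#include <string>

using Directory = std::map<std::string, std::string>;   // file name -> content (in-memory stand-in)
using TaskFiles = std::map<std::string, std::string>;   // a task as a collection of named text files

// Select files from dir: "*" takes everything, otherwise a space-separated
// list of names. Names not present in dir are skipped (sketch assumption).
TaskFiles build_task(const std::string& file_list, const Directory& dir) {
    if (file_list == "*") return dir;
    TaskFiles task;
    std::istringstream in(file_list);
    std::string name;
    while (in >> name) {
        auto it = dir.find(name);
        if (it != dir.end()) task[name] = it->second;
    }
    return task;
}
```

With an explicit list, stray files in the working directory (notes, backups) stay out of the task, whereas "*" is convenient when the directory contains only task files.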

More on More Examples
It is not necessary to do a lot of typing (as above) to repeat the previous analysis. A simple way to avoid it is a job definition script that defines the application, task and dataset (variables papp, ptsk and pdst).
- Sample script can be found here:
- The sample script is copied into the local directory when the dialroot files are installed (dialroot -i).
- Edit the top part of this script to specify the application, task and dataset of interest.
Run:
    root [0] .x jobdef.C
    root [1] submit()
    ...

More on More Examples
    void jobdef() {
      // Specify names for the application, task and dataset.
      // A typical job definition is created by changing these values.
      // Depending on the following code, a name may be interpreted as
      // one or more of the following.
      // 1. ID: Object identifier.
      // 2. name: Object name in the default selection catalog.
      // 3. directory: Name of a directory holding files to be used
      //    to construct the object.
      // 4. xml: Name of a file holding the XML description of the object.
      // Application: directory, name, or ID.
      string aname = "atlasopt";
      // Task: directory, xml, name, or ID.
      string tname = "atlasopt_example_zll ";
      // Dataset: ID or name.
      string dname = "hma.dc digit.A1_z_ee.aod files";
      ...

More on More Examples
There is a web interface, the “DIAL CATALOG QUERY PAGE”.

More on More Examples
This interface allows switching between:
- Dataset Selection Catalog (DSC)
- Task Selection Catalog (TSC)
- Application Selection Catalog (ASC)
A list of available datasets may be obtained from the DSC query page.
Some useful applications and example tasks are cataloged as well. The application and task catalogs may also be examined using the ASC query page and the TSC query page.

More on Transformations
Transformations are applied to datasets to produce new datasets.
A transformation includes:
- an application, which carries out the processing
- a task, used to configure the application
An application provides two entry points: one to build (e.g. compile) a task and one to process a dataset.
A task is a collection of named text files.
It is not sensible to arbitrarily combine any task with any application.

More on Transformations
There is a task interface that specifies which files must or may be present in a task and how these files are to be used.
Tasks are labeled with the interface they provide, and applications with the task interface they expect.
Task interfaces:
- atlas_release
- atlas_job_options
- atlas_simple_analysis
- atlas_user_analysis
- atlas_developer_directory
- atlas_developer
- atlas_xform

More on Transformations
The task interface atlas_release specifies an ATLAS release.
List of files for atlas_release:
- atlas_release: ATLAS release version, e.g. ...
The task interface atlas_job_options specifies an ATLAS release, job options and output content.
List of files for atlas_job_options:
- atlas_release: ATLAS release version, e.g. ...
- jo.py: user job options
- output_content: describes the output to be saved (content label and name, e.g. HIST hist.root)

More on Transformations
The ADA task interface atlas_user_analysis specifies an ATLAS release and files to replace those in the UserAnalysis package.
List of files for atlas_user_analysis:
- atlas_release: ATLAS release version, e.g. ...
- *.h: header files
- *.cxx: C++ source files
- requirements: CMT requirements file
- AnalysisSkeleton_jobOptions.py: job options file
- output_content: describes the output to be saved
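A task interface can be thought of as a contract on the set of file names a task carries, which an application can check before building. The sketch below checks a task against the atlas_job_options file list given above. It assumes that all three listed files are mandatory, which the slides do not state explicitly (an interface lists files that "must or may" be present). TaskInterface, missing_files and the factory function are hypothetical names, not DIAL code.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

using TaskFiles = std::map<std::string, std::string>;   // file name -> text content

// Hypothetical description of a task interface.
struct TaskInterface {
    std::string name;
    std::set<std::string> required;   // treated as mandatory in this sketch
};

// Returns the required files that the task does not provide (sorted by name).
std::vector<std::string> missing_files(const TaskFiles& task, const TaskInterface& iface) {
    std::vector<std::string> missing;
    for (const std::string& f : iface.required)
        if (task.find(f) == task.end()) missing.push_back(f);
    return missing;
}

// File list for atlas_job_options as given on the slide.
TaskInterface atlas_job_options() {
    return TaskInterface{"atlas_job_options",
                         {"atlas_release", "jo.py", "output_content"}};
}
```

An application expecting atlas_job_options could refuse a task whenever missing_files returns a non-empty list, which is one way to enforce that tasks and applications are not combined arbitrarily.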

Documentation
- The ADA system is described on the ADA home page
- This page also has a link to the DIAL 1.20 release page

Conclusion
- ADA now makes it possible to perform distributed analysis for the ATLAS experiment
- The available documentation allows newcomers to start using ADA
- Further development (especially user-oriented) will allow wider adoption of ADA among physicists