ATLAS DIAL: Distributed Interactive Analysis of Large Datasets
David Adams, Brookhaven National Laboratory
February 13, 2006
CHEP06, Distributed Data Analysis session

Contents
Goals
Interactive analysis
Model
Service paradigm
Datasets
Transformations
Jobs
Schedulers
Catalogs
ATLAS deployment
Performance
Conclusions
Documentation
DIAL contributors
Updated: February 13, 2006

Goals
DIAL was initiated in 2002 and first presented at CHEP03. The goals remain the same:
Demonstrate the feasibility of interactive analysis of large datasets
– How much data can we study interactively?
Set requirements for grid tools and services
– As much as possible, try to use existing and evolving products
Provide ATLAS with a useful environment for distributed analysis
– Enable physicists to easily examine the AOD (Analysis Object Data) and other samples from Monte Carlo production
– Be ready for "real" data when it appears in a year or two

Interactive analysis
Interactive is taken to mean a system that responds to user requests on the time scale of a few minutes (or less)
A user may thus submit many requests over the course of a few hours to try out different ideas without losing his or her train of thought
This does not require that the user literally "interact" directly with running jobs
The result should be available in a few minutes
– If CPU cycles (and bandwidth) are available
– If not, the user should be able to monitor progress and receive updated partial results on an interactive time scale
This responsiveness is obtained with parallel processing (sketched below):
– The input dataset is split
– Subdatasets are processed in parallel
– Results are merged
Data and compute resources must be accessible on this time scale
– Most likely shared with other activities, e.g. production
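A minimal sketch of this split/process/merge pattern, in Python for illustration only; the function names and the simple file-list dataset are assumptions, not DIAL's actual implementation:

from multiprocessing import Pool

def split(files, n):
    # Split the input dataset (here just a file list) into n subdatasets.
    return [files[i::n] for i in range(n)]

def process(subdataset):
    # Stand-in for running the analysis on one subdataset; a real job
    # would produce e.g. histograms rather than a file count.
    return {"nfiles": len(subdataset)}

def merge(results):
    # Merge the partial results into a single result as subjobs finish.
    return {"nfiles": sum(r["nfiles"] for r in results)}

if __name__ == "__main__":
    files = ["file%04d.root" % i for i in range(1872)]
    with Pool(8) as pool:                        # subdatasets in parallel
        partials = pool.map(process, split(files, 8))
    print(merge(partials))                       # {'nfiles': 1872}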

Model
An input dataset describes the data to be processed
A transformation is an operation to be performed on this data
The output of this transformation is another dataset, the result
Job preferences are hints on how to carry out processing
– They should not affect the essence of the result
A job is an instance of applying a transformation to a dataset to produce a result
A scheduler creates and runs jobs
Catalogs store objects and metadata
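A hedged sketch of this model as plain data types (Python for illustration; the real DIAL classes are C++, and the field names here are assumptions):

from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Dataset:
    files: list                       # location of the data it describes

@dataclass
class Transformation:
    application: str                  # the operation to perform on the data
    task: dict                        # configuration data for the application

@dataclass
class JobDefinition:
    transformation: Transformation
    dataset: Dataset
    preferences: dict = field(default_factory=dict)  # hints only; must not
                                                     # affect the result's essence

@dataclass
class Job:
    definition: JobDefinition
    result: Optional[Dataset] = None  # the output is itself a dataset

A scheduler would take a JobDefinition, create a Job and run it; catalogs store such objects and their metadata.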

Service paradigm
DIAL provides a web service framework with modules to implement scheduler and catalog services
Based on gsoap
With a plugin for GSI authentication and authorization
– Delegation maintains the user identity throughout the processing chain
Motivation:
Provide users with a common interface to a wide range of batch and workload management systems
Insulate users from splitting, merging and error recovery
– Users may express preferences, but the work is done at a central point
– Users are free to disconnect during processing

Datasets
A dataset describes a collection of data
Properties:
Unique identifier for persistent reference
– Two 32-bit integers
– All persistent objects have this
Location of the data (e.g. a list of files)
Content is a collection of content blocks; each has
– A content label (e.g. ESD, AOD, HIST, …)
– A list of content identifiers, each a type and key
> E.g. Jets-cone7 or TH1F-jet_pt
– An event ID list (may be restricted to count, first and last)
A list of subdataset IDs
– I.e. datasets are hierarchical (see the sketch below)
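The properties above might be represented as follows (an illustrative Python sketch; the actual classes are C++, and these field names are assumptions):

from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class ContentBlock:
    label: str                        # e.g. "ESD", "AOD", "HIST"
    ids: List[Tuple[str, str]]        # (type, key), e.g. ("TH1F", "jet_pt")
    events: Tuple[int, int, int]      # restricted event ID list: (count, first, last)

@dataclass
class Dataset:
    id: Tuple[int, int]               # unique identifier: two 32-bit integers
    files: List[str]                  # location of the data
    content: List[ContentBlock] = field(default_factory=list)
    subdatasets: List[Tuple[int, int]] = field(default_factory=list)  # hierarchy by ID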

Datasets (cont)
Classes:
The base class Dataset defines the interface
The subclass GenericDataset provides data and the means to stream to and from a generic XML representation
– Most concrete dataset types inherit from this
DIAL provides some other generic types:
– SingleFileDataset – holds a single file
> Base for VO-specific classes, e.g. AtlasPoolEventDataset
> Subclasses add content information
– MultiFileDataset – a generic list of files
> No content information
> Used for integration with ATLAS DQ2
– EventMergeDataset – merger of event datasets
> E.g. of VO-specific single-file datasets
– TextDataset – directly carries named text files
> E.g. for log files

Transformations
A transformation has two components:
The application carries out the processing
The task carries data used to configure the application
A task is a list of named text files
The application defines the task interface
– The expected file names and their meaning
– Not formalized – test by running a job with the task
Files may hold run-time parameters, source code, …
Binary data (e.g. libraries) may be put in an SE (storage element) and the file URL (e.g. an LFN) carried in a task file (see the sketch below)
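For illustration, a task might be written out as a set of named text files like this (the file names here are hypothetical, since the application defines the expected names):

import os

task = {
    "jobOptions.py": "# run-time parameters for the application\n",
    "outputFiles.txt": "histos.root\n",
    # Binary data stays in an SE; the task file carries only the URL.
    "libURL.txt": "lfn:/grid/atlas/user/libs/mycode.tar.gz\n",
}

def write_task(taskdir):
    # Create the task directory and populate it with the named text files.
    os.makedirs(taskdir, exist_ok=True)
    for name, text in task.items():
        with open(os.path.join(taskdir, name), "w") as f:
            f.write(text)

write_task("mytask")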

Transformations (cont)
The application holds two scripts:
build_task
– Used to build, e.g. compile, the task
– Run in a directory containing the task files
– Writes output to that same directory
run
– Does the data processing
– Run in a directory with the input dataset in the file dataset.xml and the task location in the file taskdir
– Expected to write the result dataset to result.xml
Both run in a minimal environment
– POSIX, g++ and pkgmgr
– The latter provides the means to locate software
Both scripts return 0 to indicate success (sketched below)
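A minimal sketch of a run script honoring this contract, assuming Python is available on the worker node (the slides do not specify the script language; the processing itself is a stub):

#!/usr/bin/env python
import sys

def main():
    with open("taskdir") as f:
        taskdir = f.read().strip()     # location of the built task
    with open("dataset.xml") as f:
        dataset_xml = f.read()         # description of the input dataset
    # ... the actual data processing, configured by the task files in
    # taskdir and driven by dataset_xml, would happen here ...
    with open("result.xml", "w") as f:
        f.write("<dataset><!-- result dataset --></dataset>\n")
    return 0                           # 0 indicates success

if __name__ == "__main__":
    sys.exit(main())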

Jobs
A job is constructed from
A job definition
– Application, task, dataset and preferences
A local run directory
– Remote jobs also have a remote directory and a way to bring back the result
The name of the run script (build_task or run)
The base class Job holds data and provides implementations for
Reading job data
Streaming data to and from XML
The base class provides an interface for
Creation
Submission
Killing
Updating the job data
Subclasses implement this for different batch or WM systems

Jobs (cont)
DIAL provides job subclasses:
LsfJob for submission to LSF
CondorJob for submission to Condor or Condor-G
CondorCodJob to use Condor COD (computing on demand)
ScriptedJob to control a job through a user-supplied script
– Rather than adding a new subclass
– Used for the Globus gatekeeper and PANDA
CompoundJob, used internally by schedulers
Job data includes
State (initialized, running, done, failed, killed)
– The last three are terminal
> A job is immutable once one of those states is reached (see the sketch below)
Start, update and stop times
Batch or WMS ID, worker node
The input job definition and output result
And more…
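A simplified stand-in for the Job base class and its state machine (Python for illustration; the method names are assumptions):

TERMINAL = ("done", "failed", "killed")

class Job:
    def __init__(self, definition):
        self.definition = definition   # application, task, dataset, preferences
        self.state = "initialized"
        self.result = None

    def update(self, state, result=None):
        # A job is immutable once a terminal state is reached.
        if self.state in TERMINAL:
            raise RuntimeError("job already in terminal state " + self.state)
        self.state = state
        if result is not None:
            self.result = result

class LsfJob(Job):
    # Subclasses implement submission for a particular batch or WM system;
    # here only the interface is sketched.
    def submit(self):
        pass                           # would call bsub and record the batch ID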

Schedulers
Normally a scheduler is used to submit, monitor and kill jobs
A local scheduler provides submission of single jobs to a particular batch or WM system (via choice of job type)
A master scheduler provides distributed processing:
– Split the input dataset
– Use a local scheduler to process each subdataset
– Merge results as subjobs finish
A scheduler may be provided as a web service
Called an analysis service
A web service client provides the scheduler interface (sketched below):
– Submit a job: job definition → job ID
– Check status: job ID → updated job object (including result)
– Kill the job with the job ID if desired
This is the typical means of accessing a scheduler
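A sketch of that client-side interface (Python for illustration; the method names and polling loop are assumptions, not DIAL's actual API):

import time

class AnalysisServiceClient:
    def submit(self, job_definition):  # job definition -> job ID
        ...
    def status(self, job_id):          # job ID -> updated job object
        ...
    def kill(self, job_id):
        ...

def run_and_wait(client, job_definition, poll=60):
    # The user is free to disconnect and resume polling later.
    job_id = client.submit(job_definition)
    while True:
        job = client.status(job_id)
        if job.state in ("done", "failed", "killed"):
            return job                 # includes the result dataset
        time.sleep(poll)               # interactive time scale: minutes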

Catalogs
Catalogs provide the means to access and select objects
Datasets, applications, tasks and jobs
Repositories
Provide the means to store and retrieve objects
Indexed by object ID
MySQL and flat-file implementations
Selection catalogs
Assign a name and other metadata to an object
An SQL query can be used to select objects (see the sketch below)
MySQL implementation
Catalogs may be accessed directly or via web services
The web service provides GSI authentication and authorization
Removal and update are restricted to the entry owner
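A hedged sketch of a selection catalog (sqlite3 stands in for MySQL here, and the schema is an assumption):

import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE datasets
               (id1 INTEGER, id2 INTEGER, name TEXT, owner TEXT)""")
con.execute("INSERT INTO datasets VALUES (17, 42, 'csc_aod_ref', 'adams')")

# An SQL query selects objects by name or other metadata; removal and
# update of an entry would be restricted to its owner.
rows = con.execute(
    "SELECT id1, id2 FROM datasets WHERE name LIKE 'csc_%'").fetchall()
print(rows)   # [(17, 42)]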

Catalogs (cont)
Object references may be by value, ID, name or query
Value → an XML description is provided in place
ID → XML is obtained from a repository
– Except for jobs that have not reached a terminal state, all persistent objects are immutable, so the same object is always obtained
Name or query → a selection catalog is used to find the ID
– The selected object may be different for different catalogs
– Or may change with time
– E.g. use a name to identify the latest version (see the sketch below)
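The resolution rules above, condensed into an illustrative Python function (the repository and catalog stand-ins and all names are assumptions):

def resolve(ref, repository, catalog):
    # Return the XML description of the referenced object.
    if ref.kind == "value":
        return ref.xml                    # description supplied in place
    if ref.kind == "id":
        return repository[ref.id]         # immutable, so always the same object
    # Name or query: a selection catalog maps it to an ID; the answer may
    # differ between catalogs or change with time (e.g. "latest version").
    return repository[catalog.lookup(ref)]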

ATLAS deployment
DIAL services have been deployed at BNL:
Unique ID service
Repository
Selection catalog
A suite of analysis services
– Local batch
> Fast = LSF queue that overlays jobs (100 MB, 15 min)
> Short = Condor preemptive (1 GB, 90 min)
> Long = Condor low priority (1 GB, 1 day)
– PANDA
– Condor-G (as a demonstration for now)
– Globus (usually not run; big load on the gatekeeper)
Each service has a dedicated URL
Data is available at BNL
All AOD from the last production round (Rome data)
All data from the current production (CSC)
DIAL datasets are available for all the above

ATLAS deployment (cont)
ATLAS transformations have been defined
All run Athena – the ATLAS framework program
In all cases, the task includes
– The ATLAS release tag
– Top-level job options (run-time parameters)
– A list of output files to save (used to construct the output dataset)
Applications:
– atlasopt includes just the above
– aodhisto adds code to be built inside the standard analysis package
> Compiled before the job is run
– atlasdev makes use of arbitrary changes to the ATLAS release
> Packaged in a tarball constructed from the user's development area
> The user must compile (for now)
– atlasxform runs any of the transformations used in data production
Example tasks are provided for each of these transformations

Performance
To assess performance, a reference dataset was constructed
Using much of the available CSC AOD data at BNL
A mix of physics channels, excluding single-particle events
1872 files, each with 100 events (187,200 events in total)
Size is 25 GB, i.e. about 130 kB/event
Larger datasets were constructed by duplicating these files
0.5, 1, 1.5, 2, 3, 4, 5 and 6 times the reference sample
Copies divided between two NFS file servers
Transformation:
The atlasdev transformation was used
The task was atlasdev_aod_histos version 3
– Opens four containers in each event
> Truth, electrons, cone jets and jet tags
– Fills a couple dozen histograms

Performance (cont)
The reference dataset was run as a single job
Athena clock time was 70 minutes
– I.e. 43 ms/event, 3.0 MB/s (130 kB/event ÷ 43 ms/event ≈ 3.0 MB/s)
– The actual data transfer is about half that value
> Some of the event data is not read
The following figure shows results for
The local fast queue (LSF)
– Green squares
The local short queue (Condor preemptive)
– Blue triangles
Condor-G to local fast
– Red diamonds
PANDA
– Violet circles

[Figure: processing time vs. dataset size for the services listed above, with reference lines marking the single-job time and one tenth of the single-job time.]

Conclusions
A useful DIAL system has been deployed for ATLAS
Common analysis transformations
Access to current data
For AOD-to-histogram analysis on large samples, 15 times faster than a single process
Easy to use
Root interface
Web-based monitoring
Packaged datasets, applications and example tasks
Demonstrated viability of remote processing
Via Condor-G or PANDA
Need interactive queues at remote sites
– With a corresponding gatekeeper or DIAL service
Or improve PANDA responsiveness

Documentation
DIAL home page
Service page
User guide

DIAL contributors
GANGA: Karl Harrison, Alvin Tan
DIAL: David Adams, Wensheng Deng, Tadashi Maeno, Vinay Sambamurthy, Nagesh Chetan, Chitra Kannan
ARDA: Dietrich Liko
ATPROD: Frederic Brochu, Alessandro De Salvo
AMI: Solveig Albrand, Jerome Fulachier
ADA: (plus those above) Farida Fassi, Christian Haeberli, Hong Ma, Grigori Rybkine, Hyunwoo Kim