David Adams ATLAS DIAL Distributed Interactive Analysis of Large datasets David Adams BNL March 25, 2003 CHEP 2003 Data Analysis Environment and Visualization.

Slides:



Advertisements
Similar presentations
Workload Management Workpackage Massimo Sgaravatto INFN Padova.
Advertisements

David Adams ATLAS DIAL Distributed Interactive Analysis of Large datasets David Adams BNL June 23, 2003 GAE workshop Caltech.
FALL 2005CSI 4118 – UNIVERSITY OF OTTAWA1 Part 4 Web technologies: HTTP, CGI, PHP,Java applets)
Workload Management WP Status and next steps Massimo Sgaravatto INFN Padova.
Alexandre A. P. Suaide VI DOSAR workshop, São Paulo, 2005 STAR grid activities and São Paulo experience.
David Adams ATLAS DIAL Distributed Interactive Analysis of Large datasets David Adams BNL July 15, 2003 LCG Analysis RTAG CERN.
Copyright © 2007, Oracle. All rights reserved. Managing Concurrent Requests.
David Adams ATLAS ATLAS Distributed Analysis David Adams BNL March 18, 2004 ATLAS Software Workshop Grid session.
GT Components. Globus Toolkit A “toolkit” of services and packages for creating the basic grid computing infrastructure Higher level tools added to this.
9 Chapter Nine Compiled Web Server Programs. 9 Chapter Objectives Learn about Common Gateway Interface (CGI) Create CGI programs that generate dynamic.
K. Harrison CERN, 20th April 2004 AJDL interface and LCG submission - Overview of AJDL - Using AJDL from Python - LCG submission.
03/27/2003CHEP20031 Remote Operation of a Monte Carlo Production Farm Using Globus Dirk Hufnagel, Teela Pulliam, Thomas Allmendinger, Klaus Honscheid (Ohio.
3rd June 2004 CDF Grid SAM:Metadata and Middleware Components Mòrag Burgon-Lyon University of Glasgow.
David Adams ATLAS AJDL: Analysis Job Description Language David Adams BNL December 15, 2003 PPDG Collaboration Meeting LBL.
Nick Brook Current status Future Collaboration Plans Future UK plans.
1 st December 2003 JIM for CDF 1 JIM and SAMGrid for CDF Mòrag Burgon-Lyon University of Glasgow.
ATLAS DIAL: Distributed Interactive Analysis of Large Datasets David Adams – BNL September 16, 2005 DOSAR meeting.
David Adams ATLAS DIAL status David Adams BNL July 16, 2003 ATLAS GRID meeting CERN.
David Adams ATLAS ATLAS Distributed Analysis Plans David Adams BNL December 2, 2003 ATLAS software workshop CERN.
Grid Workload Management Massimo Sgaravatto INFN Padova.
CYBERINFRASTRUCTURE FOR THE GEOSCIENCES Data Replication Service Sandeep Chandra GEON Systems Group San Diego Supercomputer Center.
Event Data History David Adams BNL Atlas Software Week December 2001.
Introduction to dCache Zhenping (Jane) Liu ATLAS Computing Facility, Physics Department Brookhaven National Lab 09/12 – 09/13, 2005 USATLAS Tier-1 & Tier-2.
V. Serbo, SLAC ACAT03, 1-5 December 2003 Interactive GUI for Geant4 by Victor Serbo, SLAC.
Datasets on the GRID David Adams PPDG All Hands Meeting Catalogs and Datasets session June 11, 2003 BNL.
NOVA Networked Object-based EnVironment for Analysis P. Nevski, A. Vaniachine, T. Wenaus NOVA is a project to develop distributed object oriented physics.
David Adams ATLAS ADA, ARDA and PPDG David Adams BNL June 28, 2004 PPDG Collaboration Meeting Williams Bay, Wisconsin.
INFSO-RI Enabling Grids for E-sciencE ATLAS Distributed Analysis A. Zalite / PNPI.
David Adams ATLAS Architecture for ATLAS Distributed Analysis David Adams BNL March 25, 2004 ATLAS Distributed Analysis Meeting.
David Adams ATLAS DIAL status David Adams BNL November 21, 2002 ATLAS software meeting GRID session.
The Replica Location Service The Globus Project™ And The DataGrid Project Copyright (c) 2002 University of Chicago and The University of Southern California.
David Adams ATLAS DIAL/ADA JDL and catalogs David Adams BNL December 4, 2003 ATLAS software workshop Production session CERN.
ATLAS is a general-purpose particle physics experiment which will study topics including the origin of mass, the processes that allowed an excess of matter.
David Adams ATLAS ADA: ATLAS Distributed Analysis David Adams BNL June 7, 2004 BNL Technology Meeting.
David Adams ATLAS Virtual Data in ATLAS David Adams BNL May 5, 2002 US ATLAS core/grid software meeting.
David Adams ATLAS ATLAS Distributed Analysis David Adams BNL September 30, 2004 CHEP2004 Track 5: Distributed Computing Systems and Experiences.
D. Adams, D. Liko, K...Harrison, C. L. Tan ATLAS ATLAS Distributed Analysis: Current roadmap David Adams – DIAL/PPDG/BNL Dietrich Liko – ARDA/EGEE/CERN.
STAR Collaboration, July 2004 Grid Collector Wei-Ming Zhang Kent State University John Wu, Alex Sim, Junmin Gu and Arie Shoshani Lawrence Berkeley National.
David Adams ATLAS DIAL: Distributed Interactive Analysis of Large datasets David Adams BNL August 5, 2002 BNL OMEGA talk.
AliEn AliEn at OSC The ALICE distributed computing environment by Bjørn S. Nilsen The Ohio State University.
HIGUCHI Takeo Department of Physics, Faulty of Science, University of Tokyo Representing dBASF Development Team BELLE/CHEP20001 Distributed BELLE Analysis.
PROOF and ALICE Analysis Facilities Arsen Hayrapetyan Yerevan Physics Institute, CERN.
Integration of the ATLAS Tag Database with Data Management and Analysis Components Caitriana Nicholson University of Glasgow 3 rd September 2007 CHEP,
David Adams ATLAS DIAL Distributed Interactive Analysis of Large datasets David Adams BNL November 17, 2003 SC2003 Phoenix.
David Adams ATLAS ATLAS distributed data management David Adams BNL February 22, 2005 Database working group ATLAS software workshop.
Korea Workshop May GAE CMS Analysis (Example) Michael Thomas (on behalf of the GAE group)
K. Harrison CERN, 22nd September 2004 GANGA: ADA USER INTERFACE - Ganga release status - Job-Options Editor - Python support for AJDL - Job Builder - Python.
David Adams ATLAS ATLAS Distributed Analysis: Overview David Adams BNL December 8, 2004 Distributed Analysis working group ATLAS software workshop.
A proposal: from CDR to CDH 1 Paolo Valente – INFN Roma [Acknowledgements to A. Di Girolamo] Liverpool, Aug. 2013NA62 collaboration meeting.
David Adams ATLAS ATLAS-ARDA strategy and priorities David Adams BNL October 21, 2004 ARDA Workshop.
ATLAS-specific functionality in Ganga - Requirements for distributed analysis - ATLAS considerations - DIAL submission from Ganga - Graphical interfaces.
- GMA Athena (24mar03 - CHEP La Jolla, CA) GMA Instrumentation of the Athena Framework using NetLogger Dan Gunter, Wim Lavrijsen,
David Adams ATLAS Datasets for the Grid and for ATLAS David Adams BNL September 24, 2003 ATLAS Software Workshop Database Session CERN.
INFSO-RI Enabling Grids for E-sciencE Using of GANGA interface for Athena applications A. Zalite / PNPI.
1 A Scalable Distributed Data Management System for ATLAS David Cameron CERN CHEP 2006 Mumbai, India.
David Adams ATLAS ATLAS Distributed Analysis (ADA) David Adams BNL December 5, 2003 ATLAS software workshop CERN.
STAR Scheduler Gabriele Carcassi STAR Collaboration.
D.Spiga, L.Servoli, L.Faina INFN & University of Perugia CRAB WorkFlow : CRAB: CMS Remote Analysis Builder A CMS specific tool written in python and developed.
David Adams ATLAS ATLAS Distributed Analysis and proposal for ATLAS-LHCb system David Adams BNL March 22, 2004 ATLAS-LHCb-GANGA Meeting.
ATLAS Distributed Analysis DISTRIBUTED ANALYSIS JOBS WITH THE ATLAS PRODUCTION SYSTEM S. González D. Liko
David Adams ATLAS AJDL: Abstract Job Description Language David Adams BNL June 29, 2004 PPDG Collaboration Meeting Williams Bay.
David Adams ATLAS ADA: ATLAS Distributed Analysis David Adams BNL December 15, 2003 PPDG Collaboration Meeting LBL.
ATLAS DIAL: Distributed Interactive Analysis of Large Datasets David Adams Brookhaven National Laboratory February 13, 2006 CHEP06 Distributed Data Analysis.
David Adams ATLAS DIAL Distributed Interactive Analysis of Large datasets David Adams BNL May 19, 2003 BNL Technology Meeting.
David Adams ATLAS Hybrid Event Store Integration with Athena/StoreGate David Adams BNL March 5, 2002 ATLAS Software Week Event Data Model and Detector.
(on behalf of the POOL team)
Dirk Düllmann CERN Openlab storage workshop 17th March 2003
Data Analysis in Particle Physics
Production Manager Tools (New Architecture)
Presentation transcript:

David Adams ATLAS DIAL Distributed Interactive Analysis of Large datasets David Adams BNL March 25, 2003 CHEP 2003 Data Analysis Environment and Visualization

David Adams ATLAS March 25, 2003 DIAL CHEP Contents Goals of DIAL What is DIAL? Design Applications Schedulers Datasets Status Development plans GRID requirements

David Adams ATLAS March 25, 2003 DIAL CHEP Goals of DIAL 1. Demonstrate the feasibility of interactive analysis of large datasets Large means too big for interactive analysis on a single CPU 2. Set requirements for GRID services Datasets, schedulers, jobs, resource discovery, authentication, allocation, Provide ATLAS with analysis tool For current and upcoming data challenges

David Adams ATLAS March 25, 2003 DIAL CHEP What is DIAL? Distributed Data and processing Interactive Prompt response (seconds rather than hours) Analysis of Fill histograms, select events, … Large datasets Any event data (not just ntuples or tag)

David Adams ATLAS March 25, 2003 DIAL CHEP What is DIAL? (cont) DIAL provides a connection between Interactive analysis framework –Fitting, presentation graphics, … –E.g. ROOT and Data processing application –E.g. athena for ATLAS –Natural for the data of interest DIAL distributes processing Among sites, farms, nodes To provide user with desired response time

David Adams ATLAS March 25, 2003 DIAL CHEP 20036

David Adams ATLAS March 25, 2003 DIAL CHEP Design DIAL has the following components Dataset describing the data of interest –Organized into events Application –Event loop providing access to the data Task –Result to fill for each event –Code process each event Scheduler –Distributes processing and combines results

David Adams ATLAS March 25, 2003 DIAL CHEP User Analysis Job 1 Job 2 ApplicationTask Dataset 1 Scheduler 1. Create or locate 2. select3. Create or select 4. select 8. run(app,tsk,ds1) 5. submit(app,tsk,ds) 8. run(app,tsk,ds2) 6. split Dataset Dataset 2 7. create e.g. ROOT e.g. athena Result 9. fill 10. gather Result 9. fill ResultCode

David Adams ATLAS March 25, 2003 DIAL CHEP Design (cont) Sequence diagrams follow 1.User creates a task made up of Event selection Two histograms Code to fill these 2.User submits a job (application, task and dataset) to an existing scheduler 3.Grid scheduler uses site schedulers to process a job

David Adams ATLAS March 25, 2003 DIAL CHEP Create empty result Add first histogram Add event selector Add second histogram Fetch code Create task Create task XML

David Adams ATLAS March 25, 2003 DIAL CHEP Choose application Create task Select dataset Submit job Check job status

David Adams ATLAS March 25, 2003 DIAL CHEP Job submitted Assign job ID Split dataset Loop over sub-datasets Submit job for each sub-dataset

David Adams ATLAS March 25, 2003 DIAL CHEP Applications Current application specification is Name –E.g. athena Version –E.g List of shared libraries –E.g. libRawData, libInnerDetectorReco

David Adams ATLAS March 25, 2003 DIAL CHEP Applications (cont) Each DIAL compute node provides an application description database File-based –Location specified by environmental variable Indexed by application name and version Application description includes –Location of executable –Run time environment (shared lib path, …) –Command to build shared library from task code Defined by ChildScheduler –Different scheduler could change conventions

David Adams ATLAS March 25, 2003 DIAL CHEP Schedulers A DIAL scheduler provides means to Submit a job Terminate a job Monitor a job –Status –Events processed –Partial results Verify availability of an application Install and verify the presence of a task for a given application

David Adams ATLAS March 25, 2003 DIAL CHEP Schedulers (cont) Schedulers form a hierarchy Corresponding to that of compute nodes –Grid, site, farm, node Each scheduler splits job into sub-jobs and distributes these over lower-level schedulers Lowest level ChildScheduler starts processes to carry out the sub-jobs Scheduler concatenates results for its sub-jobs User may enter the hierarchy at any level

David Adams ATLAS March 25, 2003 DIAL CHEP Schedulers (cont) Schedulers communicate using client-server Between processes, nodes, sites User constructs a client scheduler specifying –Remote node –Name for remote scheduler Server process on remote machines –Starts schedulers and assigns them names –Passes requests from clients to the named scheduler Not yet implemented –Communication protocols not established

David Adams ATLAS March 25, 2003 DIAL CHEP

David Adams ATLAS March 25, 2003 DIAL CHEP Datasets Datasets specify event data to be processed Datasets provide the following List of event identifiers Content –E.g. raw data, refit tracks, cone=0.3 jets, … Means to locate the data –List of of logical files where data can be found –Mapping from event ID and content to a file and a the location in that file where the data may be found –Example follows

David Adams ATLAS March 25, 2003 DIAL CHEP

David Adams ATLAS March 25, 2003 DIAL CHEP Datasets (cont) User may specify content of interest Dataset plus this content restriction is another dataset Event data for the new dataset located in a subset of the files required for the original Only this subset required for processing

David Adams ATLAS March 25, 2003 DIAL CHEP

David Adams ATLAS March 25, 2003 DIAL CHEP Datasets (cont) Distributed analysis requires means to divide a dataset into sub-datasets Sub-dataset is a dataset Do not split data from any one event Split along file boundaries –Jobs can be assigned where files are already present –Split most likely done at grid level May assign different events from one file to different jobs to speed processing –Split likely done at farm level

David Adams ATLAS March 25, 2003 DIAL CHEP

David Adams ATLAS March 25, 2003 DIAL CHEP Status All DIAL components in place But scheduler is very simple –Only local ChildScheduler is implemented –Grid, site, farm and client-server schedulers not yet implemented Datasets implemented as a separate system Only concrete dataset is ATLAS AthenaRoot – Holds Monte Carlo generator information

David Adams ATLAS March 25, 2003 DIAL CHEP Status (cont) DIAL and dataset classes imported to ROOT ROOT can be used as user interface –All DIAL and dataset classes and methods available at command prompt –DIAL and dataset libraries must be loaded Import done with ACLiC Only preliminary testing done Need to add adapter for TH1 and any other classes of interest

David Adams ATLAS March 25, 2003 DIAL CHEP DIAL status (cont) No application integrated to process jobs Except test program dialproc can be used to count events In ATLAS natural thing is to define a DIAL algorithm to run in athena –However ATLAS is not yet able to persist reconstructed data Perhaps a ROOT backend to process ntuples? –Or is this better handled with PROOF? –Or use PROOF to implement a farm scheduler?

David Adams ATLAS March 25, 2003 DIAL CHEP Development plans (Items in red required for useful ATLAS tool) Schedulers Add client-server schedulers Farm scheduler –Allows large-scale test Site and grid schedulers –GRID integration –Interact with dataset, file and replica catalogs –Authentication and authorization

David Adams ATLAS March 25, 2003 DIAL CHEP Development plans (cont) Datasets Interface to ATLAS POOL event collections –expected in summer ROOT ntuples ?? Applications Athena for ATLAS ROOT ?? Analysis environment Import classes into LCG/SEAL? (Python) JAS? (java binding?)

David Adams ATLAS March 25, 2003 DIAL CHEP GRID requirements Identify components and services that can be shared with Other distributed interactive analysis projects –PROOF, JAS, … Distributed batch projects –Production –Analysis Non-HEP event-oriented problems –Data organized into a collection of “events” that are each processed in the same way

David Adams ATLAS March 25, 2003 DIAL CHEP GRID requirements (cont) Candidates for shared components include Dataset –Events –Content –File mapping –Splitting Job –Specification (application, task, response time) –Splitting –Merging results –Monitoring

David Adams ATLAS March 25, 2003 DIAL CHEP GRID requirements (cont) Scheduler –Available applications and tasks –Job submission –Job status including partial results Application –Specification –Installation Authentication and authorization Resource location and allocation –Data, processing and matching

David Adams ATLAS March 25, 2003 DIAL CHEP GRID requirements (cont) Difference with batch processing is latency Interactive system provides means for user to specify maximum acceptable response time All actions must take place within this time –Locate data and resources –Splitting and matchmaking –Job submission –Gathering of results Longer latency for first pass over a dataset –Record state for later passes –Still must be able to adjust to changing conditions

David Adams ATLAS March 25, 2003 DIAL CHEP Grid requirements (cont) Avoid sharp division between interactive and batch resources Share implies more available resources for both Interactive use varies significantly –Time of day –Time to the next conference –Discovery of interesting events Interactive request must be able to preempt long-running batch jobs –But allocation determined by sites, experiments, …