David Adams ATLAS DIAL: Distributed Interactive Analysis of Large datasets David Adams BNL August 5, 2002 BNL OMEGA talk.

Presentation transcript:

DIAL: Distributed Interactive Analysis of Large datasets
  David Adams, BNL (ATLAS)
  August 5, 2002
  BNL OMEGA talk

Contents
  Definitions
  Use cases
  Requirements
  Design
  Datasets
  Dataset interface
  Dataset implementation
  Status and conclusions

Definitions
  Dataset
    Collection of event data
      – Known event (beam crossing) ID’s
      – Same content (raw, reconstructed, summary, …) for each event
      – Known luminosity and selection criteria (including triggers)
    Suitable for extracting physical quantities (cross section, limit, etc.)
      – Or special data for calibration, alignment or monitoring detector performance

Definitions (cont)
  Large
    Too big to analyze from a single process
      – Today: 100 GB or more
  Analysis
    Loop over events and perform the same action on each
      – Select events
      – Visualize events
      – Fill histograms and tuples
      – Generate new event data?

Definitions (cont)
  Interactive
    Rapid response
      – Request processed in seconds, not hours
    Updates if the request is not finished quickly:
      – Partial results
      – Progress meter
        > % completed
        > Time to completion
      – Status visualization: what is being processed where
      – Able to terminate incomplete requests
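
The two progress-meter quantities above can be estimated from event counts alone. A minimal sketch, assuming a roughly constant processing rate; the function name `progress` and its arguments are illustrative and not part of DIAL:

```python
def progress(events_done, events_total, elapsed_s):
    """Return (percent completed, estimated seconds to completion or None)."""
    if events_total <= 0:
        raise ValueError("events_total must be positive")
    percent = 100.0 * events_done / events_total
    if events_done <= 0 or elapsed_s <= 0:
        return percent, None  # no measured rate yet, so no time estimate
    rate = events_done / elapsed_s  # events per second so far
    return percent, (events_total - events_done) / rate
```

The time estimate is only as good as the constant-rate assumption; a real system would probably smooth the rate over recent updates.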

Definitions (cont)
  Distributed
    Central process presents results to the user
    Processing carried out by multiple jobs
    Jobs on different machines and different sites
    Motivation:
      – Access remote data
      – Parallel processing for faster response

Use cases
  Event data specification
    User defines dataset
      – Which events and which data in each event
    Includes version of data for each event
      – E.g. jets from reco version 14.2 instead of 13.1
    Restrict visible content of each event
      – E.g. jets, not tracks
      – Reduces cost of data access
    Dataset used as input for processing
    Dataset can be recorded and recalled later
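
A dataset specification of this kind can be pictured as a set of event IDs plus a map from content name to version. The sketch below is an assumption made for illustration: `DatasetSpec` and `restrict` are hypothetical names, and Python is used for brevity rather than the C++ the talk mentions.

```python
from dataclasses import dataclass


@dataclass
class DatasetSpec:
    event_ids: frozenset   # which events
    content: dict          # content name -> version, e.g. {"jets": "14.2"}

    def restrict(self, names):
        """Return a new spec that exposes only the named content."""
        missing = set(names) - set(self.content)
        if missing:
            raise KeyError(f"content not in dataset: {sorted(missing)}")
        return DatasetSpec(self.event_ids, {n: self.content[n] for n in names})


spec = DatasetSpec(frozenset({1, 2, 3}), {"jets": "14.2", "tracks": "14.2"})
jets_only = spec.restrict(["jets"])  # same events, tracks no longer visible
```

Restricting content leaves the event list unchanged, which matches the idea that hiding tracks reduces data-access cost without changing which events are analyzed.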

Use cases (cont)
  Event loop processing
    Event selection
      – User provides algorithm to be run on each event
      – Result determines if event is included in output dataset
    Fill histogram
      – User defines histogram and provides algorithm to fill from data for one event
    Fill tuple
      – Collection of named variables
      – User provides algorithm to fill 0-N times/event
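
The three event-loop use cases can be combined in one pass over the events. This is a minimal sketch under stated assumptions: events are plain dictionaries with an `"id"` key, the histogram has fixed-width bins, and every name here (`run_event_loop`, `select`, `hist_value`, `tuple_rows`) is hypothetical rather than DIAL's interface.

```python
def run_event_loop(events, select, hist_value, nbins, lo, hi, tuple_rows):
    """Apply user-supplied algorithms to each event and collect the results."""
    selected_ids = []          # IDs of events accepted by the selection
    histogram = [0] * nbins    # fixed-width bins covering [lo, hi)
    rows = []                  # tuple rows: dicts of named variables
    width = (hi - lo) / nbins
    for event in events:
        if select(event):
            selected_ids.append(event["id"])
        x = hist_value(event)
        if lo <= x < hi:
            histogram[min(int((x - lo) / width), nbins - 1)] += 1
        rows.extend(tuple_rows(event))  # 0..N rows per event
    return selected_ids, histogram, rows


events = [
    {"id": 1, "jet_pt": [12.0, 45.0]},
    {"id": 2, "jet_pt": []},
    {"id": 3, "jet_pt": [80.0]},
]
ids, hist, rows = run_event_loop(
    events,
    select=lambda e: any(pt > 40.0 for pt in e["jet_pt"]),
    hist_value=lambda e: len(e["jet_pt"]),   # jet multiplicity
    nbins=3, lo=0, hi=3,
    tuple_rows=lambda e: [{"event": e["id"], "pt": pt} for pt in e["jet_pt"]],
)
```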

Use cases (cont)
  Single event processing
    Fetch event
      – Data for selected event returned to user
      – User may request a subset of the event data
    Visualization
      – User defines a “view”
      – User specifies an event and the associated data is used to fill the view

Use cases (cont)
  Distributed processing
    Remote processing
      – Analysis program run on the local node
      – Data is located on a remote node
      – Job processing data runs on the remote node
      – User generates requests on the local node which are run on the remote node with results returned to the local node
    Parallel processing
      – Dataset divided by event and each sub-dataset is processed in a separate process or thread

Use cases (cont)
  Distributed processing (cont)
    Multi-node processing
      – Previous processes are run on different compute nodes
    Multi-site processing
      – Previous processes are distributed over different sites
    GRID processing
      – Previous uses GRID for job specification, submission, authentication and monitoring

Requirements
  Use cases
    Satisfy the preceding use cases
  Interactivity
    Show status while a request is being processed
    Update status once/minute (adjustable)
    Return partial results on the same time scale
    Provide facility to abort a request
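
One way to picture the interactivity requirements is a processing loop that reports status and partial results at an adjustable interval and checks an abort flag before each event. This is a single-process sketch with hypothetical names (`process_with_updates`, `on_status`, `should_abort`); a distributed system would need to carry the same signals between jobs.

```python
import time


def process_with_updates(events, handle, on_status, should_abort,
                         interval_s=60.0, clock=time.monotonic):
    """Process events, reporting partial results every interval_s seconds.

    Returns (results, finished); finished is False if the request was aborted.
    """
    results = []
    last_update = clock()
    for n, event in enumerate(events, start=1):
        if should_abort():
            return results, False       # abort: return what we have so far
        results.append(handle(event))
        now = clock()
        if now - last_update >= interval_s:
            on_status(n, list(results))  # events processed and a copy of partial results
            last_update = now
    return results, True
```

The clock is passed in as a parameter so the update interval can be exercised without waiting for real time to pass.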

Requirements (cont)
  History
    Event selection
      – Identify and record the attributes (including code) for each event selection algorithm
    Dataset
      – Identify and record each dataset
      – Provide mechanism to recover the selection algorithm(s) used to construct a dataset
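
The history requirement amounts to keeping, for each dataset, a link to its parent dataset and to the selection that produced it. The sketch below assumes an in-memory registry and identifies each selection by a hash of its code; the class `History` and its methods are hypothetical and are not taken from DIAL.

```python
import hashlib


class History:
    def __init__(self):
        self.selections = {}  # selection id -> recorded attributes
        self.datasets = {}    # dataset id -> (parent dataset id, selection id)

    def record_selection(self, name, source_code):
        """Record a selection algorithm, identified by a hash of its code."""
        sel_id = hashlib.sha256(source_code.encode("utf-8")).hexdigest()[:12]
        self.selections[sel_id] = {"name": name, "code": source_code}
        return sel_id

    def record_dataset(self, dataset_id, parent_id=None, selection_id=None):
        self.datasets[dataset_id] = (parent_id, selection_id)

    def selections_for(self, dataset_id):
        """Recover the selections used to construct a dataset, oldest first."""
        chain = []
        while dataset_id is not None:
            parent_id, sel_id = self.datasets[dataset_id]
            if sel_id is not None:
                chain.append(self.selections[sel_id])
            dataset_id = parent_id
        return list(reversed(chain))
```

Walking the parent links from a derived dataset back to the original recovers the full chain of selections in the order they were applied.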

Design
  Dataset
    This description of a set of event data is the basis for all analysis
      – More on this later
  Analyzer
    User works in an analysis framework which provides the tools required to view and process histograms and tuples
    ROOT is one example

Design (cont)
  Task
    Specifies the operation to perform on each event including
      – Number of event selections to be performed
      – Histograms to be filled
      – Tuple to be filled
      – Code which makes selections and fills histograms and tuples

Design (cont)
  Application
    Description of the executable run by jobs
    Loops over events in a dataset
    Executes task on each to generate event result
    Merges successful event results to form a dataset result
    Specification includes
      – Application name
        > E.g. Athena or ROOT
      – Version or acceptable versions

Design (cont)
  Event result
    Flag indicating whether event was accepted for each event selection entry
    Histogram entries for each fill
    Tuple values for each fill
    Return status from task
      – Success or failure

Design (cont)
  Dataset result
    New dataset for each event selection
      – Old dataset plus list of ID’s for each event selection
    Filled histograms
    Filled tuples
    List of events for which task processing was unsuccessful
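
The relation between event results and the dataset result can be sketched as a merge step: successful event results contribute selected IDs, histogram entries and tuple rows, while failed events are only listed. The field and function names below are illustrative assumptions, and histograms are represented simply as lists of filled values.

```python
from dataclasses import dataclass, field


@dataclass
class EventResult:
    event_id: int
    accepted: list       # one flag per event selection
    hist_fills: dict     # histogram name -> values filled for this event
    tuple_rows: list     # tuple rows filled for this event
    success: bool = True


@dataclass
class DatasetResult:
    selected_ids: list                               # one ID list per event selection
    hist_entries: dict = field(default_factory=dict)
    tuple_rows: list = field(default_factory=list)
    failed_ids: list = field(default_factory=list)   # events where the task failed


def merge_event_results(event_results, n_selections):
    """Merge successful event results into one dataset result."""
    result = DatasetResult(selected_ids=[[] for _ in range(n_selections)])
    for er in event_results:
        if not er.success:
            result.failed_ids.append(er.event_id)
            continue
        for i, flag in enumerate(er.accepted):
            if flag:
                result.selected_ids[i].append(er.event_id)
        for name, values in er.hist_fills.items():
            result.hist_entries.setdefault(name, []).extend(values)
        result.tuple_rows.extend(er.tuple_rows)
    return result
```

Each ID list in `selected_ids`, taken together with the input dataset, corresponds to the "old dataset plus list of ID’s" form of a new dataset.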

Design (cont)
  Job scheduler
    Receives request (application, task and dataset) from analyzer
    May divide dataset into sub-datasets
    Creates or locates jobs with a matching application (and possibly task)
    Adds task to jobs if needed
    Passes a dataset to each job, invokes task and receives result
    Merges results and returns to analyzer
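
The scheduler's split, submit and merge steps can be sketched on a single machine with a thread pool standing in for jobs. This is an assumption made for illustration: the talk envisages jobs on different nodes and sites, and the names `split`, `run_job` and `schedule` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def split(event_ids, n_parts):
    """Divide a list of event IDs into at most n_parts sub-datasets."""
    n_parts = max(1, min(n_parts, len(event_ids)))
    return [event_ids[i::n_parts] for i in range(n_parts)]


def run_job(task, sub_dataset):
    """Stand-in for a job: apply the task to every event it was given."""
    return [task(event_id) for event_id in sub_dataset]


def schedule(task, event_ids, n_jobs=2):
    """Split the dataset, run one job per sub-dataset, merge the results."""
    parts = split(list(event_ids), n_jobs)
    with ThreadPoolExecutor(max_workers=len(parts)) as pool:
        partials = list(pool.map(lambda part: run_job(task, part), parts))
    return [r for partial in partials for r in partial]


squares = schedule(lambda e: e * e, range(5), n_jobs=2)
```

Because the dataset is divided by event, the merged result contains one entry per event, although not necessarily in the original event order.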

Design (cont)
  [Diagram: interactions among Analyzer, Scheduler, Application, Task, Dataset, Dataset 1, Dataset 2, Job 1 and Job 2]
  Step labels in the diagram:
    1. create
    2. create
    3. create
    4. create
    5. submit(app,tsk,ds)
    6. split
    6. create
    7. create(app,tsk) (shown twice)
    8. submit(tsk,ds1)
    8. submit(tsk,ds2)

Datasets
  Datasets provide interface and means for accessing event data
  Different types
    – Raw
    – Reconstructed
    – Summary
    – Tag
  Organized into EDO’s (event data objects)
    – Dataset does not see inside EDO
  Following plots give some examples

Datasets (cont)
  [Plot not preserved in transcript]

Datasets (cont)
  [Plot not preserved in transcript]

Datasets (cont)
  [Plot not preserved in transcript; labels: “Not allowed”, “Not allowed?”]

Dataset interface
  Event range
    Collection of event ID’s
  Content
    Collection of content ID’s
  Event data (event views)
    For each event ID-content ID pair:
      – A means to access the corresponding EDO or
      – A flag indicating the EDO is not included
    No other event data is included
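
The interface above can be sketched as a class holding the event IDs, the content IDs, and a map from (event ID, content ID) pairs to EDOs. The class below is a hypothetical illustration in Python, not the C++ interface from the talk; EDOs are treated as opaque objects, since the dataset does not see inside them.

```python
class Dataset:
    def __init__(self, event_ids, content_ids, edos):
        self.event_ids = list(event_ids)      # event range
        self.content_ids = list(content_ids)  # content
        self._edos = dict(edos)               # (event ID, content ID) -> EDO
        # No other event data is included: every EDO must belong to a
        # declared event ID and content ID.
        for event_id, content_id in self._edos:
            if event_id not in self.event_ids or content_id not in self.content_ids:
                raise ValueError(f"EDO outside dataset: {(event_id, content_id)}")

    def has_edo(self, event_id, content_id):
        """Flag indicating whether the EDO for this pair is included."""
        return (event_id, content_id) in self._edos

    def edo(self, event_id, content_id):
        """Access the EDO for this pair; raises KeyError if not included."""
        return self._edos[(event_id, content_id)]


ds = Dataset([1, 2], ["jets", "tracks"],
             {(1, "jets"): "J1", (2, "jets"): "J2", (1, "tracks"): "T1"})
```

Here event 2 has no tracks EDO, so `ds.has_edo(2, "tracks")` is false even though both the event and the content type are part of the dataset.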

Dataset interface (cont)
  [Figure not preserved in transcript]

Dataset implementation
  Datasets are used in many ways
    Inspection by humans
    I/O for processing in C++
      – And other languages
    Cataloging in DB’s
  Implementation
    Prefer something object oriented
    At present, C++ classes with XML persistence
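
XML persistence suits the uses listed above because the same text can be read by humans, parsed by programs and stored in catalogs. The round trip below is only a sketch: it uses Python's standard `xml.etree.ElementTree` rather than C++, and the element and attribute names (`dataset`, `events`, `event`, `content`, `item`) are invented for illustration and are not DIAL's schema.

```python
import xml.etree.ElementTree as ET


def dataset_to_xml(name, event_ids, content_ids):
    """Serialize a dataset description to an XML string."""
    root = ET.Element("dataset", {"name": name})
    events = ET.SubElement(root, "events")
    for event_id in event_ids:
        ET.SubElement(events, "event", {"id": str(event_id)})
    content = ET.SubElement(root, "content")
    for content_id in content_ids:
        ET.SubElement(content, "item", {"id": content_id})
    return ET.tostring(root, encoding="unicode")


def dataset_from_xml(text):
    """Rebuild (name, event IDs, content IDs) from the XML string."""
    root = ET.fromstring(text)
    event_ids = [int(e.get("id")) for e in root.find("events")]
    content_ids = [c.get("id") for c in root.find("content")]
    return root.get("name"), event_ids, content_ids


xml_text = dataset_to_xml("example", [101, 102], ["jets", "tracks"])
restored = dataset_from_xml(xml_text)
```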

Status and conclusions
  High-level design for DIAL is in place
    Described in this talk
    See
  Detailed design and first implementation of datasets is finished
    See