The ARDA Project: Prototypes for User Analysis on the GRID
Dietrich Liko / CERN IT
Frontier Science 2005

Overview

- ARDA in a nutshell
- ARDA prototypes: 4 experiments
- ARDA feedback on middleware
  - Middleware components on the development test bed
  - ARDA Metadata Catalog
- Outlook and conclusions

The ARDA project

- ARDA is an LCG project
  - Its main activity is to enable LHC analysis on the grid
  - ARDA contributes to EGEE (NA4)
- Interface with the new EGEE middleware (gLite)
  - By construction, ARDA uses the new middleware
  - Verify the components in an analysis environment
- Contribution to the experiments' frameworks (discussion, direct contribution, benchmarking, …)
  - Users are needed here, namely physicists who need distributed computing to perform their analyses
  - Provide early and continuous feedback
- The activity extends naturally to LCG
  - LCG is the production grid
  - Some gLite components are already part of LCG (see later in the presentation)

ARDA prototype overview

LHC experiment   Main focus                                    Basic prototype component/framework   Middleware
LHCb             GUI to Grid                                   GANGA/DaVinci                         gLite
ALICE            Interactive analysis                          PROOF/AliROOT                         gLite
ATLAS            High-level services                           DIAL/Athena                           gLite
CMS              Explore/exploit native gLite functionality    ORCA                                  gLite

CMS

ASAP = ARDA Support for CMS Analysis Processing
- A first version of the CMS analysis prototype, capable of creating, submitting and monitoring CMS analysis jobs on the gLite middleware, was developed by the end of 2004
- The prototype was evolved to support both RB versions deployed at the CERN testbed (the prototype task queue and the gLite 1.0 WMS)
- Submission to both RBs is currently available and completely transparent to the users (same configuration file, same functionality)
- The current LCG is also supported

Starting point for users

- The user is familiar with the experiment application needed to perform the analysis (the ORCA application for CMS)
- The user has debugged the executable on small data samples, on a local computer or on computing services (e.g. lxplus at CERN)
- How to move to larger samples, which can be located at any regional centre CMS-wide?
- The users should not be forced:
  - to change anything in the compiled code
  - to change anything in the ORCA configuration file
  - to know where the data samples are located

ASAP work and information flow

[Workflow diagram: RefDB/PubDB, ASAP UI, MonALISA, gLite, ASAP Job Monitoring service]
- The user supplies: application, application version, executable, ORCA data cards, data sample, working directory, CASTOR directory to save the output, number of events to be processed, number of events per job
- The ASAP UI builds the gLite JDL, delegates the user credentials using MyProxy and handles job submission, checking of the job status, resubmission in case of failure, fetching of the results and storing of the results to CASTOR; it reports the output file locations
- The job running on the worker node writes to the job monitoring directory; the ASAP Job Monitoring service publishes the job status on the web
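As a concrete illustration of the parameters the user supplies, the sketch below assembles a minimal, hypothetical ASAP-style task description; all names and values are illustrative assumptions and do not reproduce the actual ASAP configuration file syntax.

```python
# Hypothetical sketch of an ASAP-style task description; parameter names mirror
# the user-supplied quantities listed above, but the syntax is illustrative only.
task = {
    "application": "ORCA",
    "application_version": "ORCA_8_x",           # assumed placeholder version
    "executable": "run_analysis.sh",             # the user's unchanged executable wrapper
    "orca_data_cards": "analysis.orcarc",        # unchanged ORCA configuration file
    "data_sample": "my_dataset_name",            # data location is resolved by ASAP, not by the user
    "working_directory": "/path/to/workdir",
    "castor_output_dir": "/castor/cern.ch/user/.../output",
    "events_total": 100000,
    "events_per_job": 1000,
}

# ASAP would split the task into one grid job per chunk of events.
n_jobs = task["events_total"] // task["events_per_job"]
print(f"task splits into {n_jobs} jobs of {task['events_per_job']} events each")
```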

Job Monitoring

[Screenshot: the ASAP Monitor]

Merging the results

Integration

- Development is now coordinated with the EGEE/LCG Taskforce
- Key ASAP components will be merged and migrated into the CMS mainstream tools such as BOSS and CRAB
- Selected features of ASAP will be implemented separately:
  - Task monitor: correlation and presentation of information from different sources
  - Task manager: a control level providing disconnected operation (submission, resubmission, …)
- Further contributions:
  - Dashboard
  - MonALISA monitoring

ATLAS

- Main activities during the last year:
  - DIAL to gLite scheduler
  - Analysis jobs with the ATLAS production system
  - GANGA (principal component of the LHCb prototype, but also part of ATLAS Distributed Analysis)
- Other issues addressed:
  - AMI tests and interaction
  - ATCom production and CTB tools
  - Job submission (Athena jobs)
  - Integration of the gLite Data Management within Don Quijote
  - Active participation in several ATLAS reviews
  - First look at interactivity/resiliency issues (DIANE)
- Currently working on redefining the ATLAS Distributed Analysis strategy, on the basis of the ATLAS production system

Combined Test Beam

Example: ATLAS TRT data analysis done by PNPI St. Petersburg
[Plot: number of straw hits per layer]
- Real data processed with gLite
- Standard Athena for the test beam
- Data read from CASTOR, processed on a gLite worker node

Production system

[Diagram: the ATLAS production system]

Analysis jobs

- Characteristics:
  - Central database
  - Don Quijote data management
  - Connects to several grid infrastructures: LCG, OSG, NorduGrid
- Analysis jobs have been demonstrated together with our colleagues from the production system
- Check out the poster

DIANE

- Was already mentioned today
- Being integrated with GANGA

DIANE on gLite running Athena

Further plans

- New assessment of ATLAS Distributed Analysis after the review
  - ARDA now has a coordinating role for ATLAS Distributed Analysis
- Close collaboration with the ATLAS production system and the LCG/EGEE taskforce
- Close collaboration with GANGA and GridPP
- New player: PanDA, the OSG effort for Production and Distributed Analysis

LHCb

- The prototype is GANGA, a GUI for the Grid
- GANGA itself is a joint project between ATLAS and LHCb
- In LHCb, DIRAC (the LHCb production system) is used as a backend to run analysis jobs
- More details on the poster

What is GANGA?

[Diagram: Ganga 4 job handling]
- Applications: Gaudi, Athena, job scripts
- Backends: AtlasPROD, DIAL, DIRAC, LCG2, gLite, localhost, LSF
- Operations: prepare, configure; submit, kill; get output; update status; store and retrieve job definition
- Plus: split, merge, monitor, dataset selection

GANGA 3 - the current release

The current release (version 3) is a GUI application

Architecture for Version 4

[Diagram: Ganga 4 client architecture]
- Client: Ganga.Core with GPI, GUI and CLIP interfaces
- Job Repository and File Workspace (input/output sandbox)
- Plugin modules for applications (Athena, Gaudi) and backends (AtlasPROD, DIAL, DIRAC, LCG2, gLite, localhost, LSF)
- Monitoring scripts
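To give a feel for the GPI/CLIP layer, a minimal Ganga-style session could look like the sketch below. In a real Ganga session the job, application and backend classes are pre-loaded by the GPI; the tiny stand-in classes here only make the sketch runnable on its own, and all class and attribute names are assumptions rather than a verified Ganga 4 API.

```python
# Hedged sketch of a Ganga GPI/CLIP-style session; stand-in classes replace the
# real GPI objects, and the names are illustrative assumptions.

class Athena:                      # application plugin stand-in
    option_file = ""

class LCG:                         # backend plugin stand-in (LCG2/gLite/DIRAC/LSF/...)
    pass

class Job:                         # job object stand-in
    _next_id = 0
    def __init__(self):
        self.application, self.backend = None, None
        self.id, self.status = Job._next_id, "new"
        Job._next_id += 1
    def submit(self):
        # In Ganga, submit() hands the job to the backend plugin and stores the
        # definition in the Job Repository; here we only flip the status.
        self.status = "submitted"

j = Job()
j.application = Athena()
j.application.option_file = "MyJobOptions.py"   # hypothetical attribute name
j.backend = LCG()
j.submit()
print(j.id, j.status)              # status is kept up to date by the monitoring scripts
```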

ALICE prototype

ROOT and PROOF
- ALICE provides:
  - the UI
  - the analysis application (AliROOT)
- The Grid middleware (gLite) provides all the rest
- ARDA/ALICE is evolving the ALICE analysis system end to end: UI shell, application, middleware

Frontiersciene 2005 Dietrich Liko 23 USER SESSION PROOF PROOF SLAVES PROOF MASTER SERVER PROOF SLAVES Site A Site C Site B PROOF SLAVES Demo based on a hybrid system using 2004 prototype

ARDA shell + C/C++ API

- A C++ access library for gLite has been developed by ARDA
  - High performance
  - Protocol quite proprietary...
- Essential for the ALICE prototype, but generic enough for general use
- Using this API, grid commands have been added seamlessly to the standard shell

Current status

- Developed the gLite C++ API and API service, providing a generic interface to any grid service
- The C++ API is integrated into ROOT
  - In the ROOT CVS
  - Job submission and job status queries for batch analysis can be done from inside ROOT
- A bash interface for gLite commands with catalogue expansion has been developed
  - More powerful than the original shell
  - In use in ALICE
  - Considered a "generic" middleware contribution (essential for ALICE, interesting in general)
- First version of the interactive analysis prototype is ready
- The batch analysis model is improved:
  - Submission and status queries are integrated into ROOT
  - Job splitting based on XML query files
  - The application (AliROOT) reads files using xrootd without prestaging

Feedback to gLite

- 2004:
  - Prototype available (CERN + Madison, Wisconsin)
  - A lot of activity (4 experiment prototypes)
  - Main limitation: size. Experiment data available, but just a handful of worker nodes
  - Access granted on May 18th 2004!
- 2005:
  - Coherent move to prepare a gLite package to be deployed on the pre-production service
  - ARDA contribution: mentoring and tutorials, actual tests!
  - A lot of testing during 05Q1
  - The pre-production service is about to start!

Data Management

- A central component; early tests started in 2004
- Two main components:
  - gLiteIO (protocol + server to access the data)
  - FiReMan (file catalogue)
- Both LFC and FiReMan offer large improvements over RLS
  - LFC is the most recent LCG2 catalogue
- Still some issues remaining:
  - Scalability of FiReMan
  - Bulk entry for LFC is missing
  - More work needed to understand performance and bottlenecks
  - Need to test some real use cases
  - In general, the validation of data management tools takes time!
- Reference: presentation at ACAT 05, DESY Zeuthen, Germany

Query rate for an LFN

[Chart: entries returned per second vs. number of threads, for FiReMan single queries and bulk queries of size 1, 10, 100, 500, 1000 and 5000]

Comparison with LFC

[Chart: entries returned per second vs. number of threads, for FiReMan single-entry queries, FiReMan bulk-100 queries and LFC]
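The query-rate results in the two charts above come from multi-threaded client measurements. As an illustration of that measurement pattern (not the actual ARDA test harness), a sketch might look like the following, with query_lfn standing in for a single-entry or bulk catalogue query to FiReMan or LFC.

```python
import time
import threading

def query_lfn(batch):
    """Placeholder for one catalogue query (a single entry or a bulk of LFNs).
    In the real tests this would call the FiReMan or LFC client."""
    time.sleep(0.01)          # stand-in for the round trip to the catalogue
    return len(batch)

def worker(batches, counter, lock):
    # Each thread works through its share of query batches and counts entries returned.
    for batch in batches:
        n = query_lfn(batch)
        with lock:
            counter[0] += n

def measure(num_threads, bulk_size, total_entries=10000):
    lfns = [f"lfn:/grid/test/file_{i}" for i in range(total_entries)]
    batches = [lfns[i:i + bulk_size] for i in range(0, len(lfns), bulk_size)]
    shares = [batches[t::num_threads] for t in range(num_threads)]
    counter, lock = [0], threading.Lock()
    start = time.time()
    threads = [threading.Thread(target=worker, args=(s, counter, lock)) for s in shares]
    for t in threads: t.start()
    for t in threads: t.join()
    elapsed = time.time() - start
    print(f"{num_threads} threads, bulk {bulk_size}: {counter[0] / elapsed:.0f} entries/s")

for n in (1, 2, 5, 10):
    measure(num_threads=n, bulk_size=100)
```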

Workload Management

- A systematic evaluation of the WMS performance in terms of:
  - job submission rate (UI to RB)
  - job dispatching rate (RB to CE)
- The first measurement has been done on both the gLite prototype and LCG2 in the context of ATLAS; however, the test scenario is generic to all experiments
  - Simple helloWorld job without any InputSandbox
  - Single client, multi-threaded job submission
  - Monitoring the overall Resource Broker (RB) load as well as the CPU/memory usage of each individual service on the RB
- Continuing the evaluations on the effects of:
  - Logging and Bookkeeping (L&B) loading
  - InputSandbox
  - the gLite bulk submission feature
- Reference: Performance Test Plan.doc
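A minimal sketch of the single-client, multi-threaded submission pattern described above (with the 3000-job, 3-thread setup of the test reported on the next slide); the submission command below is a placeholder, since the exact UI command line used in these tests is not given here.

```python
import subprocess
import threading
import time

SUBMIT_CMD = ["glite-job-submit", "hello.jdl"]   # placeholder; substitute the actual UI command and JDL

def submit_jobs(n_jobs, results, lock):
    # Each thread submits its share of jobs sequentially and records success/failure.
    for _ in range(n_jobs):
        ok = subprocess.run(SUBMIT_CMD, capture_output=True).returncode == 0
        with lock:
            results.append(ok)

def run_test(total_jobs=3000, n_threads=3):
    results, lock = [], threading.Lock()
    per_thread = total_jobs // n_threads
    start = time.time()
    threads = [threading.Thread(target=submit_jobs, args=(per_thread, results, lock))
               for _ in range(n_threads)]
    for t in threads: t.start()
    for t in threads: t.join()
    elapsed = time.time() - start
    failures = results.count(False)
    print(f"submission rate: {len(results) / elapsed:.2f} jobs/s, "
          f"failure rate: {100.0 * failures / len(results):.1f} %")

if __name__ == "__main__":
    run_test()
```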

WMS Performance Test

- 3000 helloWorld jobs are submitted by 3 threads from the LCG UI in Taiwan
- Submission rate ~ 0.15 jobs/sec (6.6 sec/job)
- After about 100 sec, the first job reaches the Done status
- Failure rate ~ 20 % (RetryCount = 0)

Effects of loading the Logging and Bookkeeping

- 3000 helloWorld jobs are submitted by 3 threads from the LCG UI in Taiwan
- In parallel with job submission, the L&B is also loaded up to 50 % CPU usage in 3 stages by multi-threaded L&B queries from another UI
- Slows down the job submission rate (from 0.15 jobs/sec to … jobs/sec)
- Failure rate is stable at ~ 20 % (RetryCount = 0)

Effects of the InputSandbox

- 3000 jobs with an InputSandbox are submitted by 3 threads from the LCG UI in Taiwan
- The InputSandbox is taken from the ATLAS production job (~ 36 kB per job)
- Slows down the job submission rate (from 0.15 jobs/sec to 0.08 jobs/sec)

gLite (v1.3) Bulk Submission

- 30 helloWorld jobs are submitted by 3 threads on LCG2 and on the gLite prototype
- The comparison between LCG2 and gLite is unfair due to hardware differences between the RBs
- On gLite, bulk submission is about 3 times faster than non-bulk submission

AMGA - Metadata services on the Grid

- A simple database for use on the Grid
  - Key-value pairs
  - GSI security
- gLite has provided a prototype for the EGEE Biomed community
  - The ARDA (HEP) requirements were not all satisfied by that early version
- Discussion in LCG, EGEE and the UK GridPP Metadata group
  - Testing of existing implementations in the experiments
  - Technology investigation
  - ARDA prototype
- AMGA is now part of the gLite release
- Reference:

ARDA Implementation

- Prototype
  - Validate our ideas and expose a concrete example to interested parties
- Multiple back ends
  - Currently: Oracle, PostgreSQL, MySQL, SQLite
- Dual front ends
  - TCP streaming: chosen for performance
  - SOAP: formal requirement of EGEE; allows comparing SOAP with TCP streaming
- Also implemented as a standalone Python library
  - Data stored on the file system

Dual Front End

TCP streaming front end:
- Text-based protocol; data streamed to the client in a single connection
- Implementations
  - Server: C++, multiprocess
  - Clients: C++, Java, Python, Perl, Ruby

SOAP front end:
- Most operations are SOAP calls, based on iterators
  - A session is created; the initial chunk of data and a session token are returned
  - Subsequent requests: the client calls nextQuery() using the session token
  - The session is closed when: end of data, the client calls endQuery(), or the client times out
- Implementations
  - Server: gSOAP (C++)
  - Clients: tested WSDL with gSOAP, ZSI (Python), AXIS (Java)

A clean way to study the performance implications of the protocols…
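A minimal sketch of the iterator-style interaction described above: only the nextQuery()/endQuery() call names come from the slide, while the openQuery() name and the fake client are illustrative assumptions added so the sketch runs on its own.

```python
# Hedged sketch of the iterator-based query pattern; `client` stands in for the
# real SOAP proxy to the metadata server.

def fetch_all(client, query):
    # Opening the query returns the first chunk of results plus a session token.
    chunk, session = client.openQuery(query)       # assumed call name
    results = list(chunk)
    # Keep asking for the next chunk with the session token until the data runs out.
    while chunk:
        chunk = client.nextQuery(session)           # name taken from the slide
        results.extend(chunk)
    # Explicitly close the session (it would also expire on a client timeout).
    client.endQuery(session)                        # name taken from the slide
    return results

class FakeMetadataClient:
    """Tiny stand-in for the real SOAP proxy, so the sketch is self-contained."""
    def __init__(self, rows, chunk_size=2):
        self._rows, self._chunk, self._pos = rows, chunk_size, 0
    def openQuery(self, query):
        return self._next_chunk(), "session-token-1"
    def nextQuery(self, session):
        return self._next_chunk()
    def endQuery(self, session):
        pass
    def _next_chunk(self):
        chunk = self._rows[self._pos:self._pos + self._chunk]
        self._pos += self._chunk
        return chunk

print(fetch_all(FakeMetadataClient(["f1", "f2", "f3", "f4", "f5"]), "SELECT ..."))
```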

More data coming…

- Test protocol performance
  - No work done on the backend
  - Switched 100 Mbit/s LAN
- Language comparison
  - TCP-S with similar performance in all languages
  - SOAP performance varies strongly with the toolkit
- Protocol comparison
  - Keep-alive improves performance significantly
  - On Java and Python, SOAP is several times slower than TCP-S
- Measure scalability of the protocols
  - Switched 100 Mbit/s LAN
  - TCP-S is 3x faster than gSOAP (with keep-alive); poor performance without keep-alive
  - Around … ops/sec (both gSOAP and TCP-S), 1000 pings

Current Uses of AMGA

- Evaluated by LHCb bookkeeping
  - Migrated the bookkeeping metadata to the ARDA prototype: 20M entries, 15 GB
  - Interface found to be complete
  - ARDA prototype showing good scalability
- Ganga (LHCb, ATLAS)
  - User analysis job management system
  - Stores job status on the ARDA prototype
  - Highly dynamic metadata
- AMGA is now part of the gLite release, integrated with LFC (works side by side)

Summary

- Experiment prototypes
  - CMS: ASAP, now being integrated
  - ATLAS: DIAL, now moving to the Production System
  - LHCb: GANGA
  - ALICE: PROOF
- Feedback to the middleware
  - Data management
  - Workload management
- The AMGA metadata catalogue is now part of gLite