
1 Grid services based architectures

Growing consensus that Grid services are the right concept for building the computing grids; recent ARDA work has provoked quite a lot of interest:
–In the experiments;
–In the SC2 GTA group;
–In EGEE.
Personal opinion - this is the right concept arriving at the right moment:
–Experiments need practical systems;
–EDG is not capable of providing one;
–We need a pragmatic, scalable solution without having to start from scratch.
Some buzz words, sorry…

2 Tentative ARDA architecture

[Diagram: the tentative ARDA architecture as a set of numbered, interacting Grid services]
Services: Information Service; Authentication; Authorisation; Auditing; Grid Monitoring; Workload Management; Metadata Catalogue; File Catalogue; Data Management; Computing Element; Storage Element; Job Monitor; Job Provenance; Package Manager; DB Proxy; User Interface API; Accounting.
Some of these parts are discussed in more detail in the following slides.

3 Metadata catalogue (Bookkeeping database) (1)

LHCb Bookkeeping:
 Very flexible schema;
 Stores objects (jobs, qualities, others?);
 Available as a service (XML-RPC interface);
 The basic schema is not efficient for generic queries:
  Need to build predefined views (nightly?);
  The views fit the queries from the web form, but what about generic queries?
 Data are not available immediately after production.
Needs further development, thinking, searching…
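The trade-off on this slide can be illustrated with a toy example (not the real LHCb schema; table and parameter names are assumptions): a fully flexible one-row-per-parameter schema needs one self-join per queried parameter, while a predefined view flattens the common case.

```python
# Sketch (illustrative, not LHCb code): flexible key/value job schema vs.
# a predefined view for the queries the web form issues.
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    -- Flexible schema: one row per (job, parameter) pair.
    CREATE TABLE job_params (job_id INTEGER, name TEXT, value TEXT);
    -- A predefined view flattening the commonly queried parameters.
    CREATE VIEW jobs_view AS
        SELECT a.job_id,
               a.value AS event_type,
               b.value AS prod_site
        FROM job_params a
        JOIN job_params b ON a.job_id = b.job_id
        WHERE a.name = 'EventType' AND b.name = 'Site';
""")
db.executemany("INSERT INTO job_params VALUES (?, ?, ?)", [
    (1, "EventType", "bb-inclusive"), (1, "Site", "CERN"),
    (2, "EventType", "min-bias"),     (2, "Site", "CPPM"),
])

# A generic query against the flexible schema needs one self-join per
# parameter; the view answers the common case with a plain SELECT.
rows = db.execute(
    "SELECT job_id FROM jobs_view WHERE event_type = 'bb-inclusive'"
).fetchall()
print(rows)  # [(1,)]
```

A truly generic query (arbitrary parameter combinations) cannot be served by any fixed set of views, which is why the slide flags it as a weak point.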

4 Metadata catalogue (Bookkeeping database) (2)

Possible evolution: keep the strong points, work on the weaker ones.
Introduce a hierarchical structure:
–HEPCAL recommendation for an eventual DMC;
–AliEn experience.
Better study the sharing of parameters between the job and file objects;
Other possible ideas.
This is a critical area – worth investigating!
But… (see next slide)
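The hierarchical idea can be sketched as follows (illustrative only: the real AliEn FileCatalogue is backed by a database, not an in-memory dict, and all paths and tag names here are invented):

```python
# Sketch of an AliEn-style hierarchical catalogue: entries live in a
# directory tree, with metadata tags attached per entry.
class HierarchicalCatalogue:
    def __init__(self):
        self.entries = {}  # logical path -> metadata tags

    def register(self, path, **tags):
        self.entries[path] = tags

    def ls(self, prefix):
        """Fast path: restrict the search by directory prefix."""
        return sorted(p for p in self.entries if p.startswith(prefix))

    def find(self, prefix, **tags):
        """Slower path: scan the subtree and filter on metadata tags."""
        return sorted(
            p for p, meta in self.entries.items()
            if p.startswith(prefix)
            and all(meta.get(k) == v for k, v in tags.items())
        )

cat = HierarchicalCatalogue()
cat.register("/lhcb/prod/dc2004/job001.sim", event_type="bb-inclusive")
cat.register("/lhcb/prod/dc2004/job002.sim", event_type="min-bias")
cat.register("/lhcb/user/atsareg/test.log", event_type="min-bias")

print(cat.ls("/lhcb/prod/"))                      # both production files
print(cat.find("/lhcb/", event_type="min-bias"))  # filtered by tag
```

The directory prefix narrows the search cheaply; tag filters still require scanning the subtree, which anticipates the measurements reported two slides below.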

5 Metadata catalogue (Bookkeeping database) (3)

Manpower problems:
–For development, but also for maintenance.
We need the best possible solution:
–Evaluate other solutions: AliEn, eventually the DMC;
 Contribute to the DMC development;
–Keep the LHCb Bookkeeping as a standard service:
 Replaceable if necessary;
 Allows a fair test of other solutions.

6 Metadata catalogue (Bookkeeping database) (4)

Some work has started in Marseille:
–AliEn FileCatalogue installed and populated with the information from the LHCb Bookkeeping;
–Some query efficiency measurements done;
–The results are not yet conclusive:
 Clearly fast if the search follows the hierarchy of directories;
 Not so fast if more tags are included in the query;
 A very poor machine was used at CPPM – not fair to compare with the CERN Oracle server.
–Work has started on providing a single interface to both the AliEn FileCatalogue and the LHCb Bookkeeping.
How to continue:
–CERN group – possibilities to contribute?
–The CPPM group will continue to follow this line, but resources are limited;
–Collaboration with other projects is essential.

7 File Catalogue (Replica database) (1)

The LHCb Bookkeeping was not conceived with replica management in mind – it was added later.
A File Catalogue is needed for many purposes:
–Data;
–Software distribution;
–Temporary files (job logs, stdout, stderr, etc.);
–Input/Output sandboxes;
–Etc., etc.
Absolutely necessary for DC2004.
The File Catalogue must provide controlled access to its data (private group and user directories).
In fact we need a full analogue of a distributed file system.
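A minimal sketch of what "controlled access" plus replica tracking means in practice. Everything here is an assumption for illustration (the permission model, class and method names, SE names); it is not LHCb or AliEn code.

```python
# Sketch of a replica catalogue with per-directory access control, in the
# spirit of "a full analogue of a distributed file system".
class ReplicaCatalogue:
    def __init__(self):
        self.replicas = {}  # LFN -> list of (SE, PFN) physical replicas
        self.owners = {}    # directory -> owning group

    def set_owner(self, directory, group):
        self.owners[directory] = group

    def _allowed(self, lfn, group):
        # Walk up the path; the closest owned ancestor decides.
        parts = lfn.split("/")
        for i in range(len(parts) - 1, 0, -1):
            d = "/".join(parts[:i]) + "/"
            if d in self.owners:
                return self.owners[d] == group
        return True  # no owner registered: world-accessible

    def add_replica(self, lfn, se, pfn, group):
        if not self._allowed(lfn, group):
            raise PermissionError(lfn)
        self.replicas.setdefault(lfn, []).append((se, pfn))

    def list_replicas(self, lfn):
        return self.replicas.get(lfn, [])

cat = ReplicaCatalogue()
cat.set_owner("/lhcb/user/atsareg/", "atsareg")
cat.add_replica("/lhcb/user/atsareg/hist.root", "CERN-SE",
                "srm://cern.ch/lhcb/hist.root", group="atsareg")
print(cat.list_replicas("/lhcb/user/atsareg/hist.root"))
```

The point of the sketch: once private user and group directories exist, every catalogue write must be checked against ownership, which is exactly what makes the catalogue resemble a file system rather than a flat replica table.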

8 File Catalogue (Replica database) (2)

We should look around for possible solutions:
–Existing ones (AliEn, RLS):
 Will have Grid services wrapping soon;
 Will eventually comply with the ARDA architecture;
 Large development teams behind them (RLS, EGEE?).
This should be coupled with the whole range of data management tools:
–Browsers;
–Data transfers, both scheduled and on demand;
–I/O API (POOL, user interface).
This is a huge enterprise, and we should rely on one of the available systems.

9 File Catalogue (Replica database) (3)

The suggestion is to start with the deployment of the AliEn FileCatalogue and data management tools:
–Partly done;
–Pythonify the AliEn API:
 This will allow developing GANGA and other application plugins;
 Should be easy, as the C++ API (almost) exists.
–Should be interfaced with the DIRAC workload management (see below);
–Who? The CPPM group; others are very welcome;
–Where? Install the server at CERN.
Follow the evolution of the File Catalogue Grid services (the RLS team will not yield easily!).
This is a huge enterprise; we should rely on one of the available systems.
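The "single interface" idea mentioned on the previous slide could look roughly like this: a small Python front-end that GANGA and other plugins program against, with the concrete catalogue (AliEn FileCatalogue, LHCb Bookkeeping) as a swappable backend. All class and method names below are invented for illustration; the real AliEn C++ API differs.

```python
# Sketch of a catalogue front-end with interchangeable backends.
class FileCatalogueBackend:
    """Minimal interface a backend must implement."""
    def get_replicas(self, lfn):
        raise NotImplementedError

class InMemoryBackend(FileCatalogueBackend):
    """Stand-in for a real AliEn or Bookkeeping client."""
    def __init__(self, data):
        self.data = data
    def get_replicas(self, lfn):
        return self.data.get(lfn, [])

class Catalogue:
    """Single front-end; the backend can be swapped without touching
    application (e.g. GANGA plugin) code."""
    def __init__(self, backend):
        self.backend = backend
    def get_replicas(self, lfn):
        return self.backend.get_replicas(lfn)

alien_like = InMemoryBackend({"/lhcb/data/f1": ["CERN-SE", "CPPM-SE"]})
cat = Catalogue(alien_like)
print(cat.get_replicas("/lhcb/data/f1"))  # ['CERN-SE', 'CPPM-SE']
```

This keeps the Bookkeeping "replaceable if necessary", as argued on slide 5: only a new backend class is needed, not changes to the plugins.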

10 Workload management (1)

The present production service is OK for the simulation production tasks.
We need more:
–Data reprocessing in production (planned);
–User analysis (sporadic);
–Flexible policies:
 Quotas;
 Accounting;
–Flexible job optimizations (splitting, input prefetching, output merging, etc.);
–Flexible job preparation (UI) tools;
–Various job monitors (web portals, GANGA plugins, report generators, etc.);
–Job interactivity;
–…
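One of the optimizations listed above, job splitting, is simple to sketch (illustrative only; the function name and parameters are assumptions, not DIRAC code):

```python
# Sketch: split a job over its input files, one of the "flexible job
# optimizations" a workload management optimizer could apply.
def split_job(input_files, files_per_subjob):
    """Return sub-job input lists of at most files_per_subjob files each."""
    return [
        input_files[i:i + files_per_subjob]
        for i in range(0, len(input_files), files_per_subjob)
    ]

inputs = [f"/lhcb/prod/dc2004/file{n:03d}.sim" for n in range(5)]
subjobs = split_job(inputs, files_per_subjob=2)
print(len(subjobs))  # 3 sub-jobs: 2 + 2 + 1 files
```

Input prefetching and output merging would be further optimizer steps applied before and after the sub-jobs run.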

11 Workload management (2)

Possibilities to choose from:
1. Develop the existing service;
2. Use another existing service;
3. Start developing a new one.
Suggestion – a mixture of all three choices:
–Start developing the new workload management service, using the existing agent-based infrastructure and borrowing some ideas from the AliEn workload management:
 Already started, actually (V. Garonne);
 First prototype expected next week;
 Will also try an OGSI wrapper for it (Ian Stokes-Rees);
–Keep the existing service as a jobs provider for the new one.

12 Workload management architecture

[Diagram] Clients (GANGA, the Production service, a command-line UI) submit jobs to the Job Receiver of the Workload Management service, which stores them in the Job DB; Optimizers sort them into a Job Queue; the Match Maker serves matching jobs to Agents (Agent 1-3) running at the sites, which pass them to the local CEs (CE 1-3).
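The flow in the diagram can be sketched as a deliberately minimal, single-process illustration of the pull model (not DIRAC code; field names and the trivial "optimizer" are assumptions):

```python
# Sketch: Job Receiver -> Job DB -> Optimizer -> Job Queue -> Match Maker,
# with site Agents pulling work for their CE.
class WorkloadManager:
    def __init__(self):
        self.job_db = []     # Job Receiver stores jobs here
        self.job_queue = []  # Optimizer feeds the queue

    def receive(self, job):
        self.job_db.append(job)

    def optimize(self):
        # Trivial "optimizer": order waiting jobs by priority.
        self.job_queue = sorted(self.job_db, key=lambda j: -j["priority"])

    def match(self, ce_site):
        # Match Maker: hand the agent the first job its CE can run.
        for job in self.job_queue:
            if job["site"] in (ce_site, "ANY"):
                self.job_queue.remove(job)
                return job
        return None

wm = WorkloadManager()
wm.receive({"name": "reco-1", "site": "CERN", "priority": 5})
wm.receive({"name": "sim-1", "site": "ANY", "priority": 1})
wm.optimize()

# An agent at CPPM pulls: only the site-agnostic job matches.
print(wm.match("CPPM")["name"])  # sim-1
```

The key design choice the diagram encodes is that Agents pull jobs rather than the service pushing them, so a site only receives work it has declared itself able to run.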

13 Workload management (3)

Technology:
–JDL job description;
–Condor ClassAd library for matchmaking;
–MySQL for the Job DB and Job Queues;
–SOAP (OGSI) external interface;
–SOAP and/or Jabber internal interfaces;
–Python as the development language;
–Linux as the deployment platform.
Dependencies:
–File Catalogue and data management tools:
 Input/Output sandboxes;
–CE:
 DIRAC CE;
 EDG CE wrapper.
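A sketch of the role the Condor ClassAd library plays above: a job ad and a resource ad each carry attributes plus a Requirements expression evaluated against the other ad, and a match requires both to hold. The expressions here are plain Python rather than the real ClassAd language, and all attribute names are illustrative.

```python
# Sketch of ClassAd-style symmetric matchmaking.
def matches(job_ad, resource_ad):
    """Both Requirements must evaluate to True for a match."""
    job_ok = eval(job_ad["Requirements"],
                  {"other": resource_ad, "self": job_ad})
    res_ok = eval(resource_ad["Requirements"],
                  {"other": job_ad, "self": resource_ad})
    return job_ok and res_ok

job = {
    "Owner": "lhcbprod",
    "DiskNeeded": 500,  # MB
    "Requirements": "other['FreeDisk'] >= self['DiskNeeded']",
}
ce = {
    "Site": "CPPM",
    "FreeDisk": 2000,
    "Requirements": "other['Owner'] in ('lhcbprod', 'atsareg')",
}
print(matches(job, ce))  # True
```

The symmetry is what makes the mechanism suit both sides of the slide's policy list: the job expresses its needs (disk, software), while the CE expresses site policy (which owners or quotas it accepts).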

14 Conclusions

Most experiment-dependent services are to be developed within the DIRAC project:
–MetaCatalog (Job Metadata Catalog);
–Workload management (with experiment-specific policies and optimizations);
–These can eventually be our contribution to the common pool of services.
Get the other services from the emerging Grid services market:
–Security/Authentication/Authorization, FileCatalog, DataMgmt, SE, CE, Information,…
Aim at having DC2004 done with the new (ARDA) services-based architecture:
–Should be ready for deployment in January 2004.