Baseline Services Group Status
Madison OSG Technology Roadmap, Baseline Services, 4th May 2005
Based on Ian Bird's presentation at Taipei
Markus Schulz, IT/GD, CERN

LCG Baseline Services Working Group 2 Overview
- Introduction & Status
  - Goals etc.
  - Membership
  - Meetings
- Status of discussions
  - Baseline services
  - SRM
  - File Transfer Service
  - Catalogues and …
- Future work
- Outlook

LCG Baseline Services Working Group 3 Goals
- Experiments and regional centres agree on baseline services
  - Support the computing models for the initial period of LHC
  - Thus must be in operation by September
- The services concerned are those that
  - supplement the basic services (e.g. provision of operating system services, local cluster scheduling, compilers, …)
  - and which are not already covered by other LCG groups, such as the Tier-0/1 Networking Group or the 3D Project
- Needed as input to the LCG TDR; report needed by end April 2005
- Define services with targets for functionality & scalability/performance metrics
  - Feasible within the next 12 months, i.e. for post SC4 (May 2006), & fall-back solutions where not feasible
- When the report is available the project must negotiate, where necessary, work programmes with the software providers
- Expose experiment plans and ideas

Not a middleware group – focus on what the experiments need & how to provide it.
What is provided by the project, what by experiments?
Where relevant an agreed fall-back solution should be specified – but fall-backs must be available for the SC3 service in 2005.

LCG Baseline Services Working Group 4 Group Membership
- ALICE: Latchezar Betev
- ATLAS: Miguel Branco, Alessandro de Salvo
- CMS: Peter Elmer, Stefano Lacaprara
- LHCb: Philippe Charpentier, Andrei Tsaregorodtsev
- ARDA: Julia Andreeva
- Apps Area: Dirk Düllmann
- gLite: Erwin Laure
- Sites: Flavia Donno (It), Anders Waananen (Nordic), Steve Traylen (UK), Razvan Popescu, Ruth Pordes (US)
- Chair: Ian Bird
- Secretary: Markus Schulz

LCG Baseline Services Working Group 5 Communications
- Mailing list:
- Web site:
  - Including terminology – it was clear we all meant different things by “PFN”, “SURL” etc.
- Agendas (under PEB):
- Presentations, minutes and reports are public and attached to the agenda pages

LCG Baseline Services Working Group 6 Overall Status
- Initial meeting was 23rd Feb; meetings have been held ~weekly (6 meetings)
  - Introduction – discussion of what baseline services are
  - Presentation of experiment plans/models on storage management, file transfer, catalogues
  - SRM functionality and reliable file transfer – set up sub-groups on these topics
  - Catalogue discussion – overview by experiment
  - Catalogues continued – in-depth discussion of issues
  - [Preparation of this report], plan for next month
- A lot of the discussion has been in getting a broad (common/shared) understanding of what the experiments are doing/planning and need
  - Not as simple as agreeing a service and writing down the interfaces!

LCG Baseline Services Working Group 7 Baseline services
We have reached the following initial understanding on what should be regarded as baseline services (see discussion in the following slides):
- Storage management services
  - Based on SRM as the interface
  - gridftp
- Reliable file transfer service
  - (X) File placement service – perhaps later
- Grid catalogue services
- Workload management
  - CE and batch systems seen as essential baseline services; (?) WMS not necessarily by all
- Grid monitoring tools and services
  - Focussed on job monitoring – basic level in common, WLM-dependent part
- VO management services
  - Clear need for VOMS – limited set of roles, subgroups
- Applications software installation service
- From discussions add:
  - Posix-like I/O service: local files, and include links to catalogues
  - VO agent framework
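
For concreteness, the agreed set can also be written down as data. The sketch below is a hypothetical Python summary of the list above; the keys, flags and notes are illustrative paraphrases of the slide, not an official schema.

```python
# Hypothetical encoding of the baseline-service list above (illustrative, not an official schema).
# "baseline" marks services regarded as baseline by all; "notes" carries the caveats from the slide.
BASELINE_SERVICES = {
    "storage_management":     {"baseline": True,  "notes": "SRM as the interface; gridftp underneath"},
    "reliable_file_transfer": {"baseline": True,  "notes": "file placement service perhaps later"},
    "grid_catalogue":         {"baseline": True,  "notes": "catalogue models differ per experiment"},
    "ce_and_batch_systems":   {"baseline": True,  "notes": "seen as essential by all"},
    "workload_management":    {"baseline": False, "notes": "WMS not necessarily needed by all"},
    "grid_monitoring":        {"baseline": True,  "notes": "job monitoring; WLM-dependent part"},
    "vo_management":          {"baseline": True,  "notes": "VOMS with a limited set of roles/subgroups"},
    "software_installation":  {"baseline": True,  "notes": "applications software installation"},
    "posix_like_io":          {"baseline": True,  "notes": "local files, links to catalogues"},
    "vo_agent_framework":     {"baseline": True,  "notes": "long-lived VO-specific agents at sites"},
}

if __name__ == "__main__":
    for name, info in BASELINE_SERVICES.items():
        flag = "baseline" if info["baseline"] else "optional"
        print(f"{name:24s} {flag:9s} {info['notes']}")
```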

LCG Baseline Services Working Group 8 SRM
- The need for SRM seems to be generally accepted by all
- Jean-Philippe Baud presented the current status of SRM “standard” versions
- Sub-group formed (1 person per experiment + J-P) to look at defining a common subset of functionality
  - ALICE: Latchezar Betev
  - ATLAS: Miguel Branco
  - CMS: Peter Elmer
  - LHCb: Philippe Charpentier
- Expect to define an “LCG-required” SRM functionality set that must be implemented for all LCG sites (defined feature sets, like SRM-basic, SRM-advanced, don't fit)
- May in addition have a set of optional functions
- Input to Storage Management workshop

LCG Baseline Services Working Group 9 Status of SRM definition
- SRM v1.1 insufficient – mainly lack of pinning
- SRM v3 not required – and timescale too late
- Require Volatile and Permanent space; Durable not practical
- Global space reservation: reserve, release, update (mandatory LHCb, useful ATLAS, ALICE). CompactSpace not needed
- Permissions on directories mandatory
  - Prefer based on roles and not DN (SRM integrated with VOMS desirable, but timescale?)
- Directory functions (except mv) should be implemented asap
- Pin/unpin high priority
- srmGetProtocols useful but not mandatory
- Abort, suspend, resume request: all low priority
- Relative paths in SURL important for ATLAS, LHCb; not for ALICE
- Duplication between srmCopy and an fts – need 1 reliable mechanism
- Group of developers/users started regular meetings to monitor progress
- CMS input/comments not included yet
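
To make the shape of an “LCG-required” subset concrete, here is a minimal sketch of it as a hypothetical Python interface. The method names only paraphrase the operations listed above (space reservation, pinning, directory functions, role-based permissions); they are not the actual SRM calls or signatures.

```python
from abc import ABC, abstractmethod

class LCGRequiredSRM(ABC):
    """Hypothetical sketch of an 'LCG-required' SRM subset; method names are illustrative only."""

    # Global space reservation: mandatory for LHCb, useful for ATLAS and ALICE.
    @abstractmethod
    def reserve_space(self, space_type: str, size_bytes: int) -> str:
        """space_type is 'volatile' or 'permanent'; 'durable' was judged not practical."""

    @abstractmethod
    def release_space(self, space_token: str) -> None: ...

    @abstractmethod
    def update_space(self, space_token: str, new_size_bytes: int) -> None: ...

    # Pinning is the main gap in SRM v1.1 and a high priority here.
    @abstractmethod
    def pin(self, surl: str, lifetime_seconds: int) -> None: ...

    @abstractmethod
    def unpin(self, surl: str) -> None: ...

    # Directory functions (all except mv) should be implemented as soon as possible.
    @abstractmethod
    def mkdir(self, directory: str) -> None: ...

    @abstractmethod
    def rmdir(self, directory: str) -> None: ...

    @abstractmethod
    def ls(self, directory: str) -> list: ...

    # Permissions on directories are mandatory, preferably by role rather than by DN.
    @abstractmethod
    def set_permission(self, directory: str, role: str, perms: str) -> None: ...


class OptionalSRM:
    """Functions seen as useful but not mandatory, or low priority."""

    def get_protocols(self) -> list:                    # useful, not mandatory
        raise NotImplementedError

    def abort_request(self, request_id: str) -> None:   # abort/suspend/resume: low priority
        raise NotImplementedError
```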

LCG Baseline Services Working Group 10 Reliable File Transfer
- James Casey presented the thinking behind and status of the reliable file transfer service (in gLite)
- Interface proposed is that of the gLite FTS
  - Agree that this seems a reasonable starting point
- James has discussed with each of the experiment reps on details and how this might be used
- Discussed in Storage Management Workshop in April
- Members of sub-group
  - ALICE: Latchezar Betev
  - ATLAS: Miguel Branco
  - CMS: Lassi Tuura
  - LHCb: Andrei Tsaregorodtsev
  - LCG: James Casey

Terminology: fts = generic file transfer service; FTS = gLite implementation.
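
As an illustration of the kind of interface being proposed, the sketch below models a generic fts client: submit a job made of source/destination SURL pairs, then poll until a terminal state. The class, methods and endpoint are hypothetical and stand in for, rather than reproduce, the gLite FTS API.

```python
import time
import uuid
from dataclasses import dataclass, field

@dataclass
class TransferJob:
    """One fts job: a list of (source SURL, destination SURL) pairs plus an overall state."""
    pairs: list                                   # [(src_surl, dst_surl), ...]
    job_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    state: str = "Submitted"                      # Submitted -> Active -> Done | Failed

class FileTransferClient:
    """Hypothetical client for a reliable file transfer service (names illustrative)."""

    def __init__(self, endpoint: str):
        self.endpoint = endpoint                  # e.g. the service URL a site would publish
        self._jobs = {}                           # stand-in for the server-side FIFO request queue

    def submit(self, pairs) -> str:
        job = TransferJob(list(pairs))
        self._jobs[job.job_id] = job
        return job.job_id

    def status(self, job_id: str) -> str:
        # A real service would report per-file states; here the whole job advances at once.
        job = self._jobs[job_id]
        job.state = {"Submitted": "Active", "Active": "Done"}.get(job.state, job.state)
        return job.state

    def wait(self, job_id: str, poll_seconds: float = 0.1) -> str:
        while True:
            state = self.status(job_id)
            if state in ("Done", "Failed"):
                return state
            time.sleep(poll_seconds)

if __name__ == "__main__":
    fts = FileTransferClient("https://fts.example.org:8443/fts")      # hypothetical endpoint
    jid = fts.submit([("srm://t0.example.org/castor/raw/file1",
                       "srm://t1.example.org/dpm/raw/file1")])
    print(jid, fts.wait(jid))
```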

LCG Baseline Services Working Group 11 File transfer – experiment views
Propose gLite FTS as proto-interface for a file transfer service (see note drafted by the sub-group):
- CMS:
  - Currently PhEDEx is used to transfer to CMS sites (incl. Tier2); satisfies CMS needs for production and data challenges
  - Highest priority is to have the lowest layer (gridftp, SRM) and other local infrastructure available and production quality. Remaining errors handled by PhEDEx
  - Work on a reliable fts should not detract from this, but integrating it as a service under PhEDEx is not a considerable effort
- ATLAS:
  - DQ implements an fts similar to this (gLite) and works across 3 grid flavours
  - Accept current gLite FTS interface (with current FIFO request queue). Willing to test prior to July
  - Interface – DQ feeds requests into the FTS queue
  - If these tests are OK, would want to integrate experiment catalog interactions into the FTS

LCG Baseline Services Working Group 12 FTS summary – cont.
- LHCb:
  - Have a service with similar architecture, but with request stores at every site (queue for operations)
  - Would integrate with FTS by writing agents for VO-specific actions (e.g. catalog); need VO agents at all sites
  - Central request store OK for now; having them at Tier 1s would allow scaling
  - Like to use in Sept for data created in the challenge; would like resources in May(?) for integration and creation of agents
- ALICE:
  - See the fts layer as a service that underlies data placement. Have used aiod for this in DC04
  - Expect gLite FTS to be tested with other data management services in SC3 – ALICE will participate
  - Expect the implementation to allow for experiment-specific choices of higher-level components like file catalogues

LCG Baseline Services Working Group 13 File transfer service – summary
- Require base storage and transfer infrastructure (gridftp, SRM) to become available at high priority and demonstrate sufficient quality of service
- All see value in a more reliable transfer layer in the longer term (relevance between 2 SRMs?)
  - But this could be srmCopy
- As described, the gLite FTS seems to satisfy current requirements, and integrating it would require modest effort
- Experiments differ on the urgency of an fts due to differences in their current systems
- Interaction with the fts (e.g. catalog access) – either in the experiment layer or integrated into the FTS workflow
- Regardless of the transfer system deployed – need for experiment-specific components to run at both Tier1 and Tier2
- Without a general service, inter-VO scheduling, bandwidth allocation, prioritisation, rapid address of security issues etc. would be difficult
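
The choice between handling catalogue interactions in the experiment layer or inside the FTS workflow can be illustrated with a short sketch of the first option: the experiment framework drives the transfer and registers the new replicas in its own catalogue afterwards. All names are hypothetical; FileTransferClient refers to the client sketched earlier.

```python
# Sketch of the "experiment layer" option: the experiment framework drives the fts and
# registers the new replicas in its own catalogue afterwards. All names are hypothetical;
# FileTransferClient is the client sketched earlier.

class ExperimentCatalogue:
    """Tiny in-memory stand-in for an experiment replica catalogue."""
    def __init__(self):
        self.replicas = {}                        # lfn -> set of SURLs

    def add_replica(self, lfn: str, surl: str) -> None:
        self.replicas.setdefault(lfn, set()).add(surl)


def replicate_and_register(fts, catalogue, transfers):
    """transfers: [(lfn, source SURL, destination SURL), ...]"""
    job_id = fts.submit([(src, dst) for _lfn, src, dst in transfers])
    if fts.wait(job_id) != "Done":
        raise RuntimeError(f"transfer job {job_id} failed")
    for lfn, _src, dst in transfers:              # VO-specific post-transfer step
        catalogue.add_replica(lfn, dst)
```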

LCG Baseline Services Working Group 14 fts – open issues
- Interoperability with other fts' interfaces
- srmCopy vs file transfer service
- Backup plan and timescale for component acceptance?
  - Timescale for decision for SC3 – end April
  - All experiments currently have an implementation
- How to send a file to multiple destinations?
- What agents are provided by default, as production agents, or as stubs for expts to extend?
- VO-specific agents at Tier 1 and Tier 2
  - This is not specific to fts

LCG Baseline Services Working Group 15 Catalogues
- Subject of discussions over 3 meetings, with iteration in between
  - LHCb and ALICE: relatively stable models
  - CMS and ATLAS: models still in flux
- Generally:
  - All experiments have different views of catalogue models
  - Experiment-dependent information is in experiment catalogues
  - All have some form of collection (datasets, …)
    - CMS – define fileblocks as ~TB units of data management; datasets point to files contained in fileblocks
  - All have role-based security
  - May be used for more than just data files
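
As a concrete picture of the mappings the models have in common, the sketch below uses hypothetical Python structures: a collection (a dataset, or a CMS-style ~TB fileblock) groups logical file names, each LFN carries a GUID, and each GUID maps to the SURLs of its replicas at sites.

```python
from dataclasses import dataclass, field

@dataclass
class LogicalFile:
    lfn: str                                      # logical file name
    guid: str                                     # immutable unique identifier
    replicas: dict = field(default_factory=dict)  # site name -> SURL

@dataclass
class FileBlock:
    """CMS-style unit of data management, roughly a TB of files."""
    name: str
    files: list = field(default_factory=list)     # list of LogicalFile

@dataclass
class Dataset:
    """Experiment-level collection; points at fileblocks (or directly at files)."""
    name: str
    blocks: list = field(default_factory=list)

# Example wiring (all names and paths hypothetical):
f = LogicalFile("/cms/reco/2005/evt_0001.root", "a1b2c3d4",
                {"T1_EXAMPLE": "srm://se.t1.example.org/cms/evt_0001.root"})
ds = Dataset("Reco2005", blocks=[FileBlock("Reco2005#block01", files=[f])])
print(ds.blocks[0].files[0].replicas["T1_EXAMPLE"])
```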

LCG Baseline Services Working Group 16 Catalogues …
- Tried to draw the understanding of the catalogue models (see following slides)
  - Very many issues and discussions arose during this iteration
- Experiments updated drawings using common terminology to illustrate workflows
- Drafted a set of questions to be answered by all experiments to build a common understanding of the models
  - Mappings: what, where, when
  - Workflows and needed interfaces
  - Query and update scenarios
  - Etc …
- Status: ongoing

LCG Baseline Services Working Group 17 ALICE
[Diagram: AliEn catalogue relations – one central catalogue instance (high reliability) containing LFN, GUID, SE index and SURL; WMS and DMS; worker nodes resolve input files (LFN, GUID, SURL) and register output files (LFN, GUID, SE index, SURL) through the AliEn API / ALICE service.]
Comments:
- Schema shows only FC relations; the DMS implementation is hidden
- Ownership of files is set in the FC; underlying storage access management is assured by a ‘single channel entry’
- No difference between ‘production’ and ‘user’ jobs
- All jobs will have at least one input file in addition to the executable
- Synchronous catalog update required
Job flow diagram shown in:

LCG Baseline Services Working Group 18 LHCb FC LFN -> SURL LHCb FC LFN -> SURL LHCb FC LFN -> SURL LCG API LHCb (XML-RPC) LHCb Service/Agent SE One central instance, Local wanted DIRAC WMS No meta data, Only size. Date, etc. Query LFN  SURL DIRAC Transfer Agent data WN POOL WN POOL WN POOL LHCb BKDB Physics->LFN Metadata, provenance, LFNs  jobs  LFNs data updates Job provenance Output files LFNs (XML) Output files LFN  GUID  SURL DIRAC Job Agent Input files LFNs FC excerpt LHCb XML DIRAC BK Agent


LCG Baseline Services Working Group 20 ATLAS interactions with catalogues
[Diagram: a central Dataset Catalogues Infrastructure (ATLAS DQ, with many internal catalogues) sits behind a POOL FC API, an ATLAS API and other-grid APIs via an ATLAS service; Local Replica Catalogues at each site export LFN & GUID -> SURL; worker nodes (POOL) register files in the local catalogue, register datasets in and query the central infrastructure; data resides on the SE.]
- Attempt to reuse the same Grid catalogues for dataset catalogues (reuse the mapping provided by the interface as well as the backend)
- On each site: fault-tolerant service with multiple back ends, internal space management, user-defined metadata schemas
- Accept different catalogues and interfaces for different Grids, but expect to impose the POOL FC interface
- Dataset catalogues infrastructure: datasets, internal space management, replication, metadata, monitoring

LCG Baseline Services Working Group 21 Dataset Catalogues Infrastructure (prototype)
- Baseline requirement
- Possible interfaces?

LCG Baseline Services Working Group 22 Summary of catalogue needs
- ALICE:
  - Central (AliEn) file catalogue
  - No requirement for replication
- LHCb:
  - Central file catalogue; experiment bookkeeping
  - Will test Fireman and LFC as file catalogue – selection on functionality/performance
  - No need for replication or local catalogues until the single central model fails
- ATLAS:
  - Central dataset catalogue – will use grid-specific solution
  - Local site catalogues (this is their ONLY basic requirement) – will test solutions and select on performance/functionality (different on different grids)
- CMS:
  - Central dataset catalogue (expected to be experiment provided)
  - Local site catalogues – or – mapping LFN -> SURL; will test various solutions
  - No need for distributed catalogues
  - Interest in replication of catalogues (3D project)

LCG Baseline Services Working Group 23 Some points on catalogues
- All want access control
  - At directory level in the catalogue
  - Directories in the catalogue for all users
  - Small set of roles (admin, production, etc.)
- Access control on storage
  - Clear statements that the storage systems must respect a single set of ACLs in identical ways no matter how the access is done (grid, local, Kerberos, …)
  - Users must always be mapped to the same storage user no matter how they address the service
- Interfaces
  - Needed catalogue interfaces:
    - POOL
    - WMS (e.g. Data Location Interface / Storage Index – if they want to talk to the RB)
    - gLite-I/O or other Posix-like I/O service
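
A minimal sketch of directory-level, role-based access control of the kind described above, assuming a small fixed role set; the class and calls are invented for illustration and do not correspond to any particular catalogue implementation.

```python
ROLES = {"admin", "production", "user"}          # small, fixed role set as discussed above

class CatalogueACL:
    """Hypothetical directory-level ACLs keyed by role (not by individual DN)."""

    def __init__(self):
        self._acl = {}                           # directory -> {role: "rwx"-style perms}

    def set_acl(self, directory: str, role: str, perms: str) -> None:
        if role not in ROLES:
            raise ValueError(f"unknown role: {role}")
        self._acl.setdefault(directory, {})[role] = perms

    def check(self, directory: str, role: str, op: str) -> bool:
        # Walk up the path so permissions on a parent directory apply to its children.
        path = directory
        while path:
            perms = self._acl.get(path, {}).get(role)
            if perms is not None:
                return op in perms
            path = path.rsplit("/", 1)[0]
        return False

# The same ACLs must be respected identically by the storage system, whichever access
# route (grid, local, Kerberos, ...) is used -- that consistent mapping is the SE's job.
acl = CatalogueACL()
acl.set_acl("/lhcb/production", "production", "rwx")
acl.set_acl("/lhcb/production", "user", "r")
print(acl.check("/lhcb/production/dc04", "user", "w"))   # False: users may only read here
```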

LCG Baseline Services Working Group 24 VO-specific agents
- VO-specific services/agents appeared in the discussions of fts, catalogs, etc.
- This was the subject of several long discussions – all experiments need the ability to run “long-lived agents” on a site
  - E.g. LHCb DIRAC agents; ALICE: synchronous catalogue update agent
  - At Tier 1 and at Tier 2
- How do they get machines for this, who runs them, can we make a generic service framework?
- GD will test with LHCb a CE without a batch queue as a potential solution
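
A sketch of the shape of such a long-lived VO agent (in the spirit of the DIRAC agents or the ALICE synchronous catalogue-update agent): a loop that polls a VO request store and dispatches VO-specific actions at the site. All names are hypothetical; the point is what a generic service framework, or a CE without a batch queue, would need to host.

```python
import time

class RequestStore:
    """Tiny stand-in for a VO request store (DIRAC-style central or per-site queue)."""
    def __init__(self, requests=None):
        self._requests = list(requests or [])

    def next_request(self):
        return self._requests.pop(0) if self._requests else None


def run_vo_agent(store, handlers, poll_seconds=1.0, max_idle=3):
    """Long-lived loop: poll the store and dispatch VO-specific actions at the site."""
    idle = 0
    while idle < max_idle:                       # a real agent would run indefinitely
        req = store.next_request()
        if req is None:
            idle += 1
            time.sleep(poll_seconds)
            continue
        idle = 0
        action = handlers.get(req["type"])
        if action is None:
            print("skipping unknown request", req)
            continue
        action(req)                              # e.g. catalogue update, transfer retry


if __name__ == "__main__":
    store = RequestStore([{"type": "register",
                           "lfn": "/lhcb/prod/file1",
                           "surl": "srm://se.example.org/lhcb/file1"}])
    handlers = {"register": lambda r: print("registering", r["lfn"], "->", r["surl"])}
    run_vo_agent(store, handlers, poll_seconds=0.01)
```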

LCG Baseline Services Working Group 25 Summary
- Will be hard to fully conclude on all areas in 1 month
  - Focus on the most essential pieces
  - Produce a report covering all areas – but some may have less detail
- Seems to be some interest in continuing this forum in the longer term
  - In-depth technical discussions
  - …