RLS Production Services
Maria Girone, PPARC-LCG, CERN, LCG-POOL and IT-DB Physics Services
10th GridPP Meeting, CERN, 3rd June 2004
Outline: What is the RLS - RLS and POOL - Service Overview - Experience in Data Challenges - Towards a Distributed RLS - Summary

What is the RLS
The LCG Replica Location Service (LCG-RLS) is the central Grid File Catalog, responsible for maintaining a consistent list of accessible files (physical and logical names) together with their relevant file metadata attributes. The RLS (and POOL) refers to files via a unique and immutable file identifier (FileID), generated at creation time, which provides a stable inter-file reference.
[Diagram: logical file names LFN1…LFNn and physical file names PFN1…PFNn all mapped to one FileID, together with its file metadata (jobid, owner, …)]
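To make the catalog model concrete, here is a minimal in-memory sketch; it is not the RLS API, and all names, paths and metadata keys are invented for illustration. Files are keyed by an immutable GUID to which logical names, physical replicas and metadata are attached, so references between files stay stable even if names change.

```python
import uuid

class CatalogEntry:
    def __init__(self):
        self.guid = str(uuid.uuid4())   # immutable FileID, assigned once at creation time
        self.lfns = set()               # logical file names
        self.pfns = set()               # physical replica locations
        self.metadata = {}              # file-level metadata (jobid, owner, ...)

catalog = {}                            # guid -> CatalogEntry

def register_file(lfn, pfn, **metadata):
    """Create a new catalog entry and return its stable GUID."""
    entry = CatalogEntry()
    entry.lfns.add(lfn)
    entry.pfns.add(pfn)
    entry.metadata.update(metadata)
    catalog[entry.guid] = entry
    return entry.guid

def add_replica(guid, pfn):
    """Attach an additional physical replica to an existing entry."""
    catalog[guid].pfns.add(pfn)

# Hypothetical usage: register a file and a second replica of it.
guid = register_file("lfn:/cms/dc04/file001",
                     "srm://castor.cern.ch/cms/dc04/file001",
                     jobid=42, owner="cms")
add_replica(guid, "srm://gridka.de/cms/dc04/file001")
```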

POOL and the LCG-RLS
POOL is the LCG Persistency Framework (see the talk from Radovan Chytracek). The LCG-RLS is one of the three POOL file catalog implementations: an XML-based local file catalog, a MySQL-based shared catalog, and the RLS-based Grid-aware file catalog. A complete production chain deploys several of these, cascading changes from isolated worker nodes (XML catalog) up to the RLS service; DC04 used the MySQL catalog at Tier1 and the RLS at Tier0. For RLS deployment at Tier1 sites, see the talk from James Casey.
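As a rough illustration of such a chain (this is not the actual POOL API; the contact-string prefixes, host names and the Catalog class below are assumptions), a job could pick the back-end from a contact string and later cascade its locally registered entries up to the central catalog:

```python
class Catalog:
    """Stand-in for a catalog back-end; only stores entries in memory."""
    def __init__(self, contact):
        self.contact = contact
        self.entries = []                 # (guid, lfn, pfn) tuples

    def publish(self, entry):
        self.entries.append(entry)

BACKENDS = {
    "xmlcatalog_":   "local XML file catalog on the worker node",
    "mysqlcatalog_": "shared MySQL catalog, e.g. at a Tier1",
    "edgcatalog_":   "Grid-aware RLS catalog at Tier0",
}

def open_catalog(contact):
    """Select the back-end from the contact-string prefix."""
    for prefix in BACKENDS:
        if contact.startswith(prefix):
            return Catalog(contact)
    raise ValueError("unknown catalog type: " + contact)

def cascade(src, dst):
    """Copy every entry registered in the lower-level catalog up to the next one."""
    for entry in src.entries:
        dst.publish(entry)

# Hypothetical usage: register on the worker node, then push up to the central RLS.
local = open_catalog("xmlcatalog_file:PoolFileCatalog.xml")
tier0 = open_catalog("edgcatalog_http://rls.example.cern.ch")
local.publish(("guid-0001", "lfn:/cms/dc04/file001", "srm://castor.cern.ch/file001"))
cascade(local, tier0)
```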

RLS Service Goals
The RLS is a critical service for the correct operation of the Grid. The goals are minimal downtime for both scheduled and unscheduled interruptions, a good level of availability at both the iAS and DB level, and meeting the requirements of the Data Challenges in terms of performance (look-up/insert rate) and capacity (total number of GUID-PFN mappings and file-level metadata entries). Currently, performance is not limited by the service itself. A further goal is to prepare for future needs and to increase reliability and manageability.

RLS Service Overview
The service currently deploys the LRC and RMC middleware components from EDG; the distributed Replica Location Index is not deployed in LCG-2, so for now a central service is run at CERN. The RLS uses Oracle Application Server (iAS) and Database (DB), with a dedicated farm node (iAS) per VO and a shared disk server (DB) for the production VOs. A similar set-up is used for testing and software certification.
[Diagram: per-VO RLS application servers for ALICE, ATLAS, CMS, LHCb and DTEAM in front of a shared production RLS DB, with separate certification and test application servers and databases, plus a spare DB.]
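The layout in the diagram can be summarised as a simple configuration sketch (host names are invented placeholders, not the real machine names):

```python
# One dedicated iAS farm node per production VO, one shared DB disk server for all
# production VOs, and parallel certification and test set-ups plus a spare DB.
RLS_DEPLOYMENT = {
    "production": {
        "app_servers": {vo: f"rls-{vo.lower()}-ias" for vo in
                        ("ALICE", "ATLAS", "CMS", "LHCb", "DTEAM")},
        "database": "rls-prod-db",          # shared disk server
    },
    "certification": {"app_servers": {"shared": "rls-cert-ias"},
                      "database": "rls-cert-db"},
    "test":          {"app_servers": {"shared": "rls-test-ias"},
                      "database": "rls-test-db"},
    "spare":         {"database": "rls-spare-db"},
}
```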

Handling Interventions
At a high level the service is ‘run like an experiment’: an on-call team with a primary responsible and a backup, documented procedures, training for the on-call personnel, daily meetings, and a list of experts to call in case the standard actions do not work. Interventions are planned in advance; the most frequent are security patches. For the iAS, the service can transparently switch to a new box using a DNS alias change, which is used for both scheduled and unscheduled interruptions; for the DB, a short interruption is needed to move to the ‘stand-by’ DB. The total up-time achieved is 99.91%. We are looking at standard Oracle solutions for high availability: iAS clusters, DB clusters and Data Guard (for data protection).

Experience in Data Challenges
The RLS was used for the first time in production during the CMS Data Challenge DC04 (3M PFNs and their file metadata stored); ATLAS and LHCb are ramping up. The service was stable throughout DC04, and looking up file information by GUID seems sufficiently fast. There were, however, clear problems with respect to the performance of the RLS, partially due to the normal “learning curve” on all sides in using a new system: bulk operations were missing in the deployed RLS version, and cross-catalog queries are not efficient by RLS design. Several solutions were produced ‘in flight’ (EDG-based tools, POOL workarounds). Support for bulk operations is now addressed by IT-GD (in edg-rls v2.2.7); POOL will support it in the next release (POOL V1.7).
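A toy cost model (all the numbers below are assumptions, not measurements from DC04) illustrates why missing bulk operations matter: registering mappings one request at a time pays a full network round-trip per entry, while a bulk call amortises that cost over a whole batch.

```python
ROUND_TRIP_MS = 20          # assumed client-server round-trip latency
PER_ENTRY_MS = 1            # assumed server-side cost per mapping

def per_entry_cost(n_mappings):
    """One request per GUID-PFN mapping."""
    return n_mappings * (ROUND_TRIP_MS + PER_ENTRY_MS)

def bulk_cost(n_mappings, batch_size=1000):
    """One request per batch of mappings."""
    n_batches = -(-n_mappings // batch_size)        # ceiling division
    return n_batches * ROUND_TRIP_MS + n_mappings * PER_ENTRY_MS

n = 3_000_000   # order of the DC04 catalog size quoted above
print("per-entry: %.1f h" % (per_entry_cost(n) / 3.6e6))   # ~17.5 h
print("bulk:      %.1f h" % (bulk_cost(n) / 3.6e6))        # ~0.9 h
```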

Towards a Distributed RLS
The RLS in LCG-2 still lacks consistent replication between multiple catalog servers: the EDG RLI component has not been deployed as part of LCG, and a central single catalog is expected to result in scalability and availability problems. As part of DC04 (in parallel to production) a joint evaluation with CMS of Oracle asynchronous database replication was carried out. We tested a minimal (two-node) multi-master system between CERN and CNAF, with catalog inserts/updates propagated in both directions. First results: the RLS application could be deployed with only minor changes; no stability or performance problems have been observed so far; and network problems and temporary server unavailability were handled gracefully. Unfortunately, the set-up could not be tested in full production mode during DC04 due to lack of time and resources.
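The idea behind the two-node multi-master set-up can be sketched as follows; this is a conceptual illustration, not Oracle's replication mechanism, and the class names, paths and the simple last-writer-wins conflict rule are assumptions. Each node applies local changes immediately, queues them, and ships the queue to its peer asynchronously, so both catalogs converge even across temporary outages.

```python
import time

class CatalogNode:
    def __init__(self, name):
        self.name = name
        self.entries = {}     # guid -> (pfn, timestamp)
        self.outbox = []      # changes not yet shipped to the peer

    def local_update(self, guid, pfn):
        change = (guid, pfn, time.time())
        self._apply(change)
        self.outbox.append(change)        # propagated later, asynchronously

    def _apply(self, change):
        guid, pfn, ts = change
        current = self.entries.get(guid)
        if current is None or ts >= current[1]:   # simple last-writer-wins rule
            self.entries[guid] = (pfn, ts)

    def sync_to(self, peer):
        """Ship queued changes to the peer; safe to retry after an outage."""
        for change in self.outbox:
            peer._apply(change)
        self.outbox = []

# Hypothetical usage: inserts on either side propagate in both directions.
cern, cnaf = CatalogNode("CERN"), CatalogNode("CNAF")
cern.local_update("guid-0001", "srm://castor.cern.ch/cms/file001")
cnaf.local_update("guid-0002", "srm://castor.cnaf.infn.it/cms/file002")
cern.sync_to(cnaf)
cnaf.sync_to(cern)     # both catalogs now hold both mappings
```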

Next Generation RLS
The LCG Grid Deployment group is currently working with the experiments to gather requirements for the next-generation RLS, taking into account the experience from DC04. One option is to build on the DC04 work and move to replicated rather than distributed catalogs. We still need to prove stability and performance with production access patterns, scaling to a sufficient number of replicas (4-6 Tier1 sites?), and automated resolution of the catalog conflicts that may arise as a consequence of asynchronous replication. We propose to continue the evaluation, possibly using Oracle Streams, in the context of the Distributed Database Deployment activity in the LCG deployment area.

Summary
The Replica Location Service is a central part of the LCG infrastructure, with strong requirements in terms of reliability of the service and a significant contribution from GridPP-funded people. The LCG-RLS middleware and service have passed their first production test, and good service stability was achieved. Experience in the Data Challenges has proven essential for improving the performance and scalability of the RLS middleware. The Oracle replication tests are expected to provide important input for defining a replicated RLS and for the handling of distributed metadata in general.

The RLS Supported Configuration
A “Local Replica Catalogue” (LRC) contains the GUID → PFN mapping for all local files. A “Replica Metadata Catalogue” (RMC) contains the GUID → LFN mapping for all local files, plus all file metadata information. A “Replica Location Index” (RLI), not deployed in LCG-2, allows files at other sites to be found; all LRCs are configured to publish to all remote RLIs.
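A simplified sketch of how these three components fit together (not the EDG implementation; class names, sites and paths below are illustrative):

```python
class LRC:
    """Local Replica Catalogue: GUID -> PFNs of the files held at this site."""
    def __init__(self, site):
        self.site = site
        self.pfns = {}                      # guid -> set of local PFNs
    def add(self, guid, pfn):
        self.pfns.setdefault(guid, set()).add(pfn)
    def publish_to(self, rli):
        rli.record(self.site, self.pfns.keys())

class RMC:
    """Replica Metadata Catalogue: GUID -> LFNs and file-level metadata."""
    def __init__(self):
        self.lfns = {}                      # guid -> set of logical file names
        self.metadata = {}                  # guid -> dict of metadata attributes

class RLI:
    """Replica Location Index: which sites hold replicas of which GUIDs."""
    def __init__(self):
        self.index = {}                     # guid -> set of site names
    def record(self, site, guids):
        for guid in guids:
            self.index.setdefault(guid, set()).add(site)
    def sites_with(self, guid):
        return self.index.get(guid, set())

# Hypothetical usage: two sites register local replicas and publish to the index.
cern, ral, rli, rmc = LRC("CERN"), LRC("RAL"), RLI(), RMC()
cern.add("guid-0001", "srm://castor.cern.ch/cms/file001")
ral.add("guid-0001", "srm://dcache.gridpp.rl.ac.uk/cms/file001")
rmc.lfns["guid-0001"] = {"lfn:/cms/file001"}
for lrc in (cern, ral):                      # all LRCs publish to all remote RLIs
    lrc.publish_to(rli)
print(rli.sites_with("guid-0001"))           # {'CERN', 'RAL'}
```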