Replica Management Services in the European DataGrid Project. Work Package 2, European DataGrid.

Outline
– The need for the European DataGrid and replica management
– Overview of the replica management services
– Performance evaluation of the services
– Future work: replica management in EGEE
– Conclusion

Why do we need a Grid?
– Experiments produce data at hundreds of MB/s, amounting to several PB of data per year
– Equivalent to about 2 million CDs of data per year, needing around 20,000 PCs per experiment to analyse
– The answer: distributed Grid computing

The European DataGrid
– Ran from January 2001 to March 2004
– Aim: to develop a Grid infrastructure for data-intensive scientific applications
  – High-energy physics, biology and Earth observation, producing several PB of data per year
– Developed Grid middleware for job, data and fabric management, and for information and monitoring

Grid Architecture
(Diagram showing the scope of the EDG middleware and the scope of EDG-WP2 within the overall Grid architecture)

Data Management
Requirements:
– Enable secure access to massive amounts of data in a global name space
– Move and replicate data at high speed from one geographical site to another
1st generation: GDMP + edg-replica-manager
– Used Globus for secure file transfer
– C++ based; gave basic replication functionality and cataloging

Data Management
2nd generation: uses web services
– An easy and standardised way to connect distributed services via XML
Services include:
– Replica Manager Client: the main user interface
– Replica Location Service: stores the physical locations of replicas
– Replica Metadata Catalog: stores logical file name mappings and metadata attributes
– Replica Optimization Service: provides optimised access to replicas
– Security: HTTPS + Globus' GSI

Replica Location Service
– Implementation of the RLS framework co-developed with Globus
– Maps a unique identifier (GUID) to multiple replicas (SURLs)
– Local Replica Catalog (LRC) with a distributed index (RLI)
(Diagram: each LRC holds GUID:SURL mappings and sends soft-state updates to the RLI, which holds GUID:LRC mappings)
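To make the LRC/RLI split concrete, here is a minimal in-memory sketch of the two catalog levels. The class and method names are illustrative, not the actual EDG WP2 API, and soft-state expiry of stale entries is omitted.

```java
import java.util.*;

// Illustrative sketch of the two-level RLS catalog structure.
// Class and method names are hypothetical, not the EDG WP2 API.
class LocalReplicaCatalog {
    // GUID -> physical replica locations (SURLs) held at this site
    private final Map<String, Set<String>> guidToSurls = new HashMap<>();

    void addMapping(String guid, String surl) {
        guidToSurls.computeIfAbsent(guid, g -> new HashSet<>()).add(surl);
    }

    Set<String> listReplicas(String guid) {
        return guidToSurls.getOrDefault(guid, Collections.emptySet());
    }

    Set<String> knownGuids() {
        return guidToSurls.keySet();
    }
}

class ReplicaLocationIndex {
    // GUID -> the LRCs that claim to hold replicas of that file
    private final Map<String, Set<String>> guidToLrcs = new HashMap<>();

    // Soft-state update: an LRC periodically republishes the GUIDs it holds;
    // in the real system stale entries expire if not refreshed (omitted here).
    void softStateUpdate(String lrcId, Set<String> guids) {
        for (String guid : guids) {
            guidToLrcs.computeIfAbsent(guid, g -> new HashSet<>()).add(lrcId);
        }
    }

    Set<String> lookupLrcs(String guid) {
        return guidToLrcs.getOrDefault(guid, Collections.emptySet());
    }
}
```

A lookup for a GUID then goes to the index first (which LRCs know this GUID?) and to those LRCs second (which SURLs does each hold?).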

Replica Metadata Catalog
– GUIDs are unfriendly and non-intuitive
  – e.g. guid:131f9940-f501-11d c9a66
– Use user-definable Logical File Names instead
  – e.g. lfn:cal-test-data a
– The RMC stores LFN:GUID mappings (n:1)
– It can also store ~10 metadata attributes, e.g. file owner and file size
– Together with the RLS this gives the complete LFN:GUID:SURL view
(Diagram: the RMC maps LFN to GUID, the RLS maps GUID to SURL)
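A matching sketch for the RMC side, again with illustrative names rather than the real edg-rmc API: many LFN aliases map to one GUID, and a handful of attributes can be attached to each GUID.

```java
import java.util.*;

// Illustrative sketch of the Replica Metadata Catalog mappings.
// Names are hypothetical, not the EDG WP2 API.
class ReplicaMetadataCatalog {
    // Many LFNs (aliases) may map to the same GUID (n:1)
    private final Map<String, String> lfnToGuid = new HashMap<>();
    // A small number (~10) of attributes per GUID, e.g. file owner, file size
    private final Map<String, Map<String, String>> guidAttributes = new HashMap<>();

    void addAlias(String guid, String lfn) {
        lfnToGuid.put(lfn, guid);
    }

    Optional<String> getGuid(String lfn) {
        return Optional.ofNullable(lfnToGuid.get(lfn));
    }

    void setAttribute(String guid, String name, String value) {
        guidAttributes.computeIfAbsent(guid, g -> new HashMap<>()).put(name, value);
    }

    Map<String, String> getAttributes(String guid) {
        return guidAttributes.getOrDefault(guid, Collections.emptyMap());
    }
}
```

Resolving an LFN to physical files then chains the two catalogs: LFN to GUID in the RMC, GUID to SURLs in the RLS.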

Replica Optimization Service
– Gives optimised access to replicas by choosing the replica with the quickest access, based on network measurements
– Automatically replicates files to the sites on which they are needed
– Simulation research (OptorSim) continues to investigate more complex replica management strategies
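The selection step amounts to choosing the replica whose source site has the lowest estimated transfer cost to the destination. The sketch below assumes a pluggable cost estimate standing in for real network monitoring data; all names are illustrative.

```java
import java.util.*;

// Sketch of the kind of choice a Replica Optimization Service makes:
// pick the source replica with the lowest estimated transfer cost to the
// destination Storage Element. Names and the cost interface are illustrative.
class ReplicaSelector {
    interface NetworkCost {
        double estimateMs(String sourceHost, String destSE); // e.g. from monitoring
    }

    static Optional<String> bestReplica(List<String> surls, String destSE, NetworkCost cost) {
        return surls.stream()
                .min(Comparator.comparingDouble((String surl) ->
                        cost.estimateMs(hostOf(surl), destSE)));
    }

    // Crude SURL -> host extraction, e.g. "srm://se1.example.org/path" -> "se1.example.org"
    static String hostOf(String surl) {
        String rest = surl.substring(surl.indexOf("://") + 3);
        int slash = rest.indexOf('/');
        return slash < 0 ? rest : rest.substring(0, slash);
    }
}
```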

Replica Manager
– Client-side tool that acts as the user interface to the services (although the services can also be accessed directly)
– Coordinates the service interactions
– Interfaces with external services:
  – information services (MDS, R-GMA)
  – storage services (SRM, EDG-SE)
  – file transfer services (GridFTP)

Implementation
– Servers written in Java; clients auto-generated (Java, C++ etc.) from WSDL
– Web services run on Apache Axis inside a Java servlet engine (Tomcat/Oracle AS)
– MySQL/Oracle used as the back-end database to store persistent information
– RLS already used in production for LCG (Oracle AS/DB)
  – CMS Data Challenge 04: 2 million entries stored
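As a rough illustration of the relational back end, a catalog server could persist its mappings along these lines; the table layout, JDBC URL and credentials are assumptions made for the sketch, not the actual EDG schema.

```java
import java.sql.*;

// Illustrative sketch only: persisting GUID:SURL mappings in a MySQL table.
// Table name, columns and connection details are hypothetical.
class CatalogStore {
    public static void main(String[] args) throws SQLException {
        try (Connection c = DriverManager.getConnection(
                "jdbc:mysql://localhost:3306/rls", "rls", "secret")) {
            try (Statement s = c.createStatement()) {
                s.execute("CREATE TABLE IF NOT EXISTS t_lrc ("
                        + "guid VARCHAR(64) NOT NULL, "
                        + "surl VARCHAR(255) NOT NULL, "
                        + "PRIMARY KEY (guid, surl))");
            }
            try (PreparedStatement ps = c.prepareStatement(
                    "INSERT IGNORE INTO t_lrc (guid, surl) VALUES (?, ?)")) {
                ps.setString(1, "guid:example-0001");
                ps.setString(2, "srm://se1.example.org/data/file-0001");
                ps.executeUpdate();
            }
        }
    }
}
```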

Service Interactions
"Make a replica of the file specified by LFN to SE2"
(Diagram: the User Interface calls the Replica Manager, which interacts with the Replica Metadata Catalog, Replica Optimization Service, Replica Location Service and Storage Elements 1 and 2)
1. replicateFile(LFN, SE2)
2. getGuid(LFN)
3. listReplicas(GUID)
4. listBestFile(SURLs, SE2)
5. copyFile(SE1, SE2)
6. registerFile(GUID, SURL)
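In code form, the coordination performed by the Replica Manager for this use case might look like the sketch below. The interfaces and method names simply mirror the step labels on the slide; they are not the real EDG client API.

```java
import java.util.List;

// Sketch of the replicateFile(LFN, SE2) coordination; step numbers match the slide.
// All interfaces are stand-ins for the actual services.
interface MetadataCatalog     { String getGuid(String lfn); }
interface LocationService     { List<String> listReplicas(String guid);
                                void registerFile(String guid, String surl); }
interface OptimizationService { String listBestFile(List<String> surls, String destSE); }
interface StorageService      { String copyFile(String sourceSurl, String destSE); }

class ReplicaManagerSketch {
    private final MetadataCatalog rmc;
    private final LocationService rls;
    private final OptimizationService ros;
    private final StorageService storage;

    ReplicaManagerSketch(MetadataCatalog rmc, LocationService rls,
                         OptimizationService ros, StorageService storage) {
        this.rmc = rmc; this.rls = rls; this.ros = ros; this.storage = storage;
    }

    /** "Make a replica of the file specified by LFN to SE2" */
    String replicateFile(String lfn, String destSE) {
        String guid = rmc.getGuid(lfn);                     // 2. getGuid(LFN)
        List<String> surls = rls.listReplicas(guid);        // 3. listReplicas(GUID)
        String source = ros.listBestFile(surls, destSE);    // 4. listBestFile(SURLs, SE2)
        String newSurl = storage.copyFile(source, destSE);  // 5. copyFile(SE1, SE2)
        rls.registerFile(guid, newSurl);                    // 6. registerFile(GUID, SURL)
        return newSurl;
    }
}
```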

RLS performance
– In production use, only a single LRC has been used so far
– Performance tested using the Java and C++ APIs to insert and query GUID:SURL mappings
– Excellent query performance; the C++ client is more stable than the Java one
(Charts: Java vs C++ insert; C++ query)

RLS performance
– Using the Java API with multiple concurrent threads
– Throughput peaks at ~20 threads; again, stable query performance
(Charts: inserting 500,000 mappings; 5 insert and 5 query threads)
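A throughput test of this kind can be sketched with a fixed thread pool driving inserts against the catalog client. The client itself is abstracted behind a callback here, since the real Java API is not reproduced; the GUIDs and SURLs generated are dummy values.

```java
import java.util.concurrent.*;
import java.util.function.BiConsumer;

// Sketch of a concurrent insert benchmark; the RLS client is represented by
// a generic callback rather than the real edg client classes.
class RlsLoadTest {
    static long runMs(int threads, int opsPerThread,
                      BiConsumer<String, String> insertMapping) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        long start = System.nanoTime();
        for (int t = 0; t < threads; t++) {
            final int id = t;
            pool.submit(() -> {
                for (int i = 0; i < opsPerThread; i++) {
                    String guid = "guid:test-" + id + "-" + i;
                    String surl = "srm://se1.example.org/data/file-" + id + "-" + i;
                    insertMapping.accept(guid, surl); // one GUID:SURL insert per call
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.HOURS);
        return (System.nanoTime() - start) / 1_000_000; // elapsed milliseconds
    }

    public static void main(String[] args) throws InterruptedException {
        // Stand-in for the real client call; here the insert is a no-op.
        long ms = runMs(20, 1000, (guid, surl) -> { /* rlsClient.addMapping(guid, surl) */ });
        System.out.println("elapsed: " + ms + " ms");
    }
}
```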

Security
– Security adds significant overheads!
– The problem is caused by opening a new connection for each transaction
– Could be reduced by using bulk operations
(Table: RLS inserts, secure client time (s) vs insecure client time (s))
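Why bulk operations help can be seen from a simple cost model: with one HTTPS connection per insert the handshake cost is paid on every call, whereas a bulk call pays it once and amortises it over all inserts. The numbers below are purely illustrative.

```java
// Back-of-the-envelope model of secure-call overhead; all figures are
// illustrative, not measurements from the EDG services.
class BulkCostModel {
    static double perCallTotalMs(int inserts, double handshakeMs, double opMs) {
        return inserts * (handshakeMs + opMs);   // a new secure connection per insert
    }

    static double bulkTotalMs(int inserts, double handshakeMs, double opMs) {
        return handshakeMs + inserts * opMs;     // one connection, many inserts
    }

    public static void main(String[] args) {
        System.out.printf("per-call: %.0f ms, bulk: %.0f ms%n",
                perCallTotalMs(1000, 50, 2), bulkTotalMs(1000, 50, 2));
    }
}
```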

RMC performance
– Tested with multiple LFNs per GUID and multiple metadata attributes
– Scales well with the number of LFNs per GUID and the number of attributes
(Charts: C++ query; Java insert)

RMC Performance
Command Line Interface: edg-rmc addAlias
(Table of time (s) per operation: start-up script and JVM start-up; parse command and options; get RMC service locator; get RMC object; call to the rmc.addAlias() method; end at 3.0 s total)
– Very slow compared to API calls (two orders of magnitude slower)
– Recommended for testing an installation only

The Future of EDG Services
– gLite: middleware (re)engineering and integration
  – using many concepts and much experience from EDG
  – but geared towards a service-oriented architecture
– EGEE: building production-quality Grids
Lessons learned from EDG:
– Less is more: stability and usability are most important
– The user interface and documentation are difficult to get right first time
– Need easy integration of different providers

EGEE Data Management Services
– Replica Manager -> Data Scheduler + Transfer Fetcher + File Placement Service + File Transfer Service
(From "EGEE Middleware Architecture and Planning (Release 1.0)", DJRA1.1)

EGEE Data Management Services
– RLS + RMC -> Combined Catalog interface to: File Catalog + Replica Catalog (+ Metadata Catalog)
(From "EGEE Middleware Architecture and Planning (Release 1.0)", DJRA1.1)

Conclusion
– EDG WP2 has developed a set of integrated replica management services
– They can cope with demanding Grid conditions
  – already used in a production environment
– Many of the concepts are now being taken forward into the EGEE project