SE Experiences Supporting Multiple Interfaces to Mass Storage
J Jensen
HEPiX Storage, Edinburgh, 27-28 May 2004

Outline
- Objectives
- Achievements
- Experiences, lessons learned
  - Protocols, interfaces
  - Technical
- Future
- Conclusion

Objectives
- Implement uniform interfaces to mass storage
  - Independent of the underlying storage system
- SRM
  - A uniform interface, but much of it is optional
- Develop back-end support for mass storage systems
  - Provide missing features: directory support?
- Publish information

Objectives – SRM
- SRM 1 provides asynchronous get and put (client flow sketched below)
  - get (put) returns a request id
  - getRequestStatus returns the status of the request
  - When the status is Ready, it contains the Transfer URL, aka TURL
  - The client changes the status to Running
  - The client downloads (uploads) the file from (to) the TURL
  - The client changes the status to Done
- Files can be pinned and unpinned
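To make the flow above concrete, here is a minimal client-side sketch in Python. The `client` methods and dictionary keys are hypothetical stand-ins for the SOAP operations (get, getRequestStatus, setFileStatus) and their return fields; this is not a real SRM client API.

```python
import time

def srm1_fetch(client, surl, download):
    """Fetch one file via the SRM 1 asynchronous get flow sketched above.

    `client` is assumed to expose get / get_request_status / set_file_status,
    thin wrappers over the corresponding SOAP operations; `download` pulls
    the file from a TURL, e.g. with a GridFTP client.
    """
    request_id = client.get([surl])                     # async: returns a request id
    while True:
        status = client.get_request_status(request_id)  # poll the request
        f = status["files"][0]
        if f["state"] == "Ready":
            break
        if f["state"] == "Failed":
            raise RuntimeError(f.get("error", "SRM request failed"))
        time.sleep(2)                                   # back off before polling again
    client.set_file_status(request_id, f["file_id"], "Running")  # claim the TURL
    download(f["turl"])                                 # data transfer, e.g. GridFTP
    client.set_file_status(request_id, f["file_id"], "Done")     # release the cache pin
```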

Objectives – SRM
- The SRM 1 interface requires web services
  - SOAP messages over HTTP
  - Also SOAP over GSI-authenticated HTTP: HTTPG
- Data transfer
  - GridFTP is mandatory
    - Requires GSI
    - Also requires grid-mapfiles

Achievements
- In EDG we developed the EDG Storage Element
  - Uniform interface to mass storage and disk
  - Interfaces with the EDG Replica Manager
  - Also client command line tools
  - The interface was based on SRM, but simplified
    - Synchronous
    - A trade-off between getting it done soon and getting it right the first time
    - Additional functionality such as directory functions
  - Highly modular system

Achievements – SE
[Architecture diagram: request and handler process management drives a chain of steps over time (look up user in the user database, look up file metadata, access control, MSS access) through a thin-layer interface to the mass storage system]

Achievements – SE
- The request contains the sequence of handler names
- As each handler processes the request, it calls a library that moves its name to an audit section (sketched below)
- The library allows easy access to global data
- Handlers may also keep handler-specific data in the XML
- Storing the XML output from each handler as the request is processed makes the SE easy to debug
[Diagram: the request XML with a "sequence" of handlers still to run, an "audit" trail of handlers already run, and a global data section]
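A toy sketch of the handler-chain idea, not the EDG SE code itself: the request carries the list of handlers still to run ("sequence"), the handlers already run ("audit"), and shared "global" data. A plain Python dict stands in for the XML document, and the two handlers are invented examples.

```python
def run_request(request, handlers):
    """Run each named handler in turn, moving its name to the audit trail."""
    while request["sequence"]:
        name = request["sequence"].pop(0)   # next handler in the chain
        handlers[name](request)             # may read/write request["global"]
        request["audit"].append(name)       # record that it has run
    return request

# Example use with two trivial handlers
def lookup_user(req):
    req["global"]["local_user"] = "dteam001"

def check_access(req):
    req["global"]["allowed"] = True

request = {"sequence": ["lookup_user", "check_access"], "audit": [], "global": {}}
run_request(request, {"lookup_user": lookup_user, "check_access": check_access})
```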

Achievements – SE
- The choice of architecture was right
  - Multiple interfaces can be supported with the same core
  - New functionality can be added fairly easily
    - Adding or removing access control is more or less a matter of adding the appropriate handler
  - Easy to debug
    - Messages passed between handlers can be examined easily
  - Disadvantage: not as fast as a monolithic core
    - Speed can be improved by making handlers persistent

Definition
- A current definition of a Storage Element
- A Storage Element must provide:
  - GridFTP for data transfer
  - An SRM 1.0 or 1.1 web services interface
    - The difference is SRMCopy, i.e. third-party copying
    - Which the client can always do anyway using GridFTP
  - Information published via MDS in the GLUE schema
    - This includes the end point
- Interoperability via the WSDL document

Experiences – technical
- Initially we had to use Java for web services (Tomcat)
- Interfacing Java to Unix processes is not always painless:
  - Java does not appear to work well with Unix pipes
  - We had to write a socket-to-pipe daemon (a sketch of the idea follows below)
[Diagram: the parent process fork()s a child which exec()s the new process, with pipes for stdin, stdout and stderr]
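The Java side can then talk to local handler processes over a TCP socket instead of pipes. Below is a minimal Python sketch of that socket-to-pipe idea, purely to illustrate the shape of the component; it is not the actual daemon, and the port number and worker command are made-up placeholders.

```python
#!/usr/bin/env python3
# Toy socket-to-pipe relay: accept a TCP connection, start the worker
# process, feed the bytes from the socket to its stdin and send its
# stdout back over the socket.
import socket
import subprocess

LISTEN_PORT = 9999            # assumption: any free local port
WORKER = ["/bin/cat"]         # assumption: stand-in for the real handler binary

def serve_once():
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind(("127.0.0.1", LISTEN_PORT))
    srv.listen(1)
    conn, _ = srv.accept()
    with conn, subprocess.Popen(WORKER, stdin=subprocess.PIPE,
                                stdout=subprocess.PIPE) as proc:
        request = conn.recv(65536)            # one request, for simplicity
        out, _ = proc.communicate(request)    # write to stdin, read stdout
        conn.sendall(out)                     # reply over the socket
    srv.close()

if __name__ == "__main__":
    serve_once()
```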

Experiences – technical
- gSOAP is pretty good these days
  - Now uses std::string for strings
- But how to do GSI?
  - We use Ben Couturier's plugin
    - Clean
    - No knowledge of the Globus API is needed to use it
  - The official GSI code was recently rewritten
    - It now uses the Globus GSSAPI
    - Previously it used the Globus IO module
    - Free, but you must register your address

Experiences – development
- We use RFIO for data access to both CASTOR and HPSS
  - Unfortunately the two RFIO libraries are binary incompatible, yet installed in the same location
  - Slight differences in rfio.h, too
- ADS (the RAL Atlas Datastore)
  - Stress testing found limitations in the pathtape interface
  - All users in a VO map to the same ADS user
    - On the EDG testbed we hit limits on concurrent writes

Experiences – development
- Look for opportunities for component reuse
  - Used or improved components from other EDG work packages
  - Almost all parts of the data transfer components were developed externally
- Prefer Open Source
  - We often need to look at the source to debug, or to supplement the documentation
- Prototype implementations live longer than expected
  - The SE's metadata system was implemented as a prototype
  - It has now been replaced with an improved system

Experiences – protocols
- SRM 1 (and the EDG SE interface) is a control interface
- Separating control from data transfer is useful
- Can be used for load balancing, redirection, etc.
- Easy to add new data transfer protocols
- However, files in the cache must be released by the client, or they time out
[Diagram: the client talks to the SE over the control channel and transfers data over a separate data channel]

Experiences – SRM 1
- WSDL definitely helped interoperability
- Nevertheless we often saw, and still see, incompatibilities between the various implementations
- Many parts of the protocol are open to interpretation
  - E.g. SRMCopy: how does the client know that the copy has finished? And what the error was if it failed?

Experiences – protocols
Confusingly many names for files (examples below):
- LFN: Logical File Name
- GUID: used by the Replica Manager
- SFN: Site File Name
  - aka PFN, the physical file name
- SURL: the SFN as a URL
- StFN: Storage File Name
- TURL: Transfer URL
[Diagram relating these names to the user, the Replica Manager, the replica catalogue, the Storage Element, the SE disk cache and mass storage]
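To make the zoo of names concrete, here is one invented file under each naming scheme; the host, paths and GUID are made up, and the exact forms varied between implementations:

```
LFN:   lfn:/grid/dteam/run42.dat
GUID:  guid:38ed3f60-c402-11d7-a6b0-f53ee5a37e1d
SFN:   se.ac.uk/dteam/run42.dat
SURL:  srm://se.ac.uk/dteam/run42.dat
StFN:  /flatfiles/dteam/run42.dat
TURL:  gsiftp://se.ac.uk/flatfiles/dteam/run42.dat
```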

Experiences – file access
- Requirement: clients must be able to access files in the MSS independently
  - Not all access can be controlled by the SE
  - The SE cannot guarantee that it has the file...
  - ...nor that the file hasn't been modified
    - Need to keep checksums of the data
- Also hard to guarantee reservations...
  - ...unless the MSS can guarantee space for the SE

Experiences – GridFTP
- Requires grid-mapfiles (example below)
  - Pooled accounts (contributed by Andrew McNab) help make this scalable
- Observed data transfer rates: 1-6 MB/s
  - But we have seen 30 MB/s with specially tuned settings
- 0.7 seconds to transfer a 200-byte file
  - The time is spent negotiating the secure connection
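For reference, a grid-mapfile is a plain-text list mapping certificate subject DNs either to a fixed local account or, with the pooled-accounts mechanism, to a pool indicated by a leading dot. The DNs and account names below are invented:

```
"/C=UK/O=eScience/OU=CLRC/L=RAL/CN=some user"   dteam001
"/C=UK/O=eScience/OU=SomeSite/CN=another user"  .dteam
```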

Experiences – NFS
- Used by jobs on worker nodes
  - The SE is NFS-mounted on the WNs
- Who converts the TURL to a local file name? (see the sketch below)
  - E.g. put the mount point into the filename
- Potential need for several copies of a file in the disk cache
  - ...due to ownership and access control issues
  - ...and who gets the file? Who releases it? How does the client get the filename? What if the file times out before the job runs?
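A toy illustration of the "put the mount point into the filename" idea in Python: the worker node rewrites a file-style TURL into a path under its NFS mount. The URL scheme, host, paths and mount point are made-up examples, not the SE's actual convention.

```python
from urllib.parse import urlparse

NFS_MOUNT = "/mnt/se"   # assumption: where the SE disk cache is NFS-mounted on the WN

def turl_to_local_path(turl: str) -> str:
    """Map e.g. file://se.ac.uk/flatfiles/dteam/run42.dat
    to /mnt/se/flatfiles/dteam/run42.dat on the worker node."""
    parsed = urlparse(turl)
    return NFS_MOUNT + parsed.path

print(turl_to_local_path("file://se.ac.uk/flatfiles/dteam/run42.dat"))
```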

Experiences – protocols
- SRM messages are well designed
- Each has a request status:
  - Pending, Failed, Running, Done
- And an array of file objects, each with:
  - A status: Pending, Ready, Running, Failed, Done
  - A TURL, if applicable
  - File metadata (size etc.)

Example response:
  Status=Running
  File1: SFN=se.ac.uk/foo   status=Ready    TURL=gsiftp://se.ac.uk/…
  File2: SFN=se.ac.uk/bar   status=Failed   Error=Access denied
  File3: SFN=se.ac.uk/nain  status=Pending

Experiences – technical
- The dCache SRM server/client
  - Standard: an SRM SHOULD work with these
  - ...but not Open Source
  - Many idiosyncrasies
    - It fetches the WSDL file before sending commands, so the SOAP server has to be a web server as well; HTTP/1.1 is not entirely trivial to implement correctly, and we can't just use Apache because of GSI
    - It doesn't check all errors, e.g. if a request fails
    - It doesn't set the status to Running

Experiences – security
- Delegation is not ideal
  - The server must be trusted fully; there are no fine-grained capabilities
  - Proxy certificates are incompatible with normal certificates (but on the IETF track)
- Incompatibility between GSI (HTTPG) and HTTPS
  - We used HTTPS initially, hooking the SE core into Apache
    - Easy to develop and debug; reliable, well-understood technology
    - Can debug with curl etc.
  - Special clients are needed for GSI
  - Andrew McNab proposes G-HTTPS: compatible with HTTPS, but can still do delegation

Future – SRM 2.1
- Frozen-ish as of this spring
- Provides lots of new functionality...
  - ...much of which is optional...
  - ...e.g. directory support, space reservation
  - SRM 2 basic will (probably) be functionally equivalent to SRM 1
- Space types and file types: Volatile, Durable, Permanent
  - Slightly complicated semantics

Future – SRM 2.1
- Guaranteed space reservation is hard
  - Unless you have infinite space
- Define all state transitions for the protocol
  - To avoid the ambiguities of SRM 1
- WSRF is currently being discussed in the SRM community

Future – SRB
- There exists an SRB SRM interface
  - Or not?
  - Depending on whom you ask
- Build an SRM SRB interface?
  - E.g. build a handler for the EDG SE
  - Need to consider data transfer: GridFTP vs SRB

DICOM server support
[Diagram: a DICOM server feeding the Grid Storage Element (WP10/DM2), with data encrypted and anonymised on the way in, keys kept in a key store and patient metadata in a metadata store]
- Access control on the metadata is required; different ACLs for different types of metadata
- Biomed applications in EGEE: a difficult task, not done in EDG

Conclusion
- We have built a good framework
- SRM 1 is a good choice
  - But file access is not trivial for the client
- The technology is maturing
  - But not as quickly as most people would like
  - Still frequent technology and requirements changes
  - Prototypes go into production
  - "Get it right" vs "build it now"
- Lots of idiosyncrasies; some end up getting promoted to standard
- Still many challenges ahead, e.g. reservations