Dynamic Federations: Seamless aggregation of standard-protocol-based storage endpoints (EMI INFSO-RI-261611). Fabrizio Furano, Patrick Fuhrmann, Paul Millar.

Dynamic Federations: Seamless aggregation of standard-protocol-based storage endpoints
Fabrizio Furano, Patrick Fuhrmann, Paul Millar, Daniel Becker, Adrien Devresse, Oliver Keeble (Presenter), Ricardo Brito da Rocha, Alejandro Alvarez
Credits to ShuTing Liao (ASGC)

Sept 2012, F. Furano - Dynamic federations
Storage Federations: Motivations
– Currently data lives on islands of storage:
  – catalogues are the maps
  – FTS/gridFTP are the delivery companies
  – experiment frameworks populate the islands
  – jobs are directed to places where the needed data is, or should be
– Almost all data lives on more than one island
– Strict locality assumes perfect storage (unlikely to impossible) and perfect experiment workflows and catalogues (unlikely)
– Strict locality has some limitations: a single missing file can derail a whole job or series of jobs; failover to data on another island could help
– Replica catalogues impose limitations, too: e.g. synchronization is difficult, and so is performance
– Quest for direct, Web-like forms of data access
– Great plus: other use cases may be fulfilled, e.g. site caching, sharing storage amongst sites

Dynamic Federations, Lyon, Sept 2012
Storage federations
What's the goal?
– Make different storage clusters be seen as one
– Make global file-based data access seamless
How should this be done?
– Dynamically:
  – easy to set up and maintain
  – no complex metadata persistency
  – no DB babysitting (keep that for the experiment's metadata)
  – no replica catalogue inconsistencies, by design
– Light configuration constraints on participating storage
– Using standards:
  – no strange APIs, everything looks familiar
  – global direct access to global data

The basic idea
Storage/MD endpoint 1 holds /dir1/file1 and /dir1/file2; storage/MD endpoint 2 holds /dir1/file2 and /dir1/file3. After aggregation we see a single /dir1 containing /dir1/file1, /dir1/file2 (with 2 replicas) and /dir1/file3.
– All the metadata interactions are hidden
– NO persistency needed here, just efficiency and parallelism
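
The aggregation on this slide can be sketched as a pure merge of per-endpoint listings. This is a minimal Python illustration under stated assumptions, not UGR code: the endpoint names and the `merge_listings` helper are hypothetical, and each listing is assumed to be already translated into the federation's namespace.

```python
from collections import defaultdict


def merge_listings(listings):
    """Merge per-endpoint listings into one federated view (hypothetical helper).

    `listings` maps an endpoint name to the iterable of paths it reports.
    The result maps every path to the sorted list of endpoints holding a
    replica of it. Nothing is persisted: the view is rebuilt from whatever
    the endpoints answer.
    """
    replicas = defaultdict(set)
    for endpoint, paths in listings.items():
        for path in paths:
            replicas[path].add(endpoint)
    return {path: sorted(eps) for path, eps in sorted(replicas.items())}


if __name__ == "__main__":
    view = merge_listings({
        "endpoint1": {"/dir1/file1", "/dir1/file2"},
        "endpoint2": {"/dir1/file2", "/dir1/file3"},
    })
    for path, eps in view.items():
        print(path, eps)
    # /dir1/file1 ['endpoint1']
    # /dir1/file2 ['endpoint1', 'endpoint2']
    # /dir1/file3 ['endpoint2']
```

The replica count falls out of the merge for free: a path reported by two endpoints simply carries two endpoint names.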

Dynamic HTTP Federations
Federation
– Simplicity, redundancy, storage/network efficiency, elasticity, performance
– Dynamic: does everything on the fly, no DB indexing glue needed
Focus on HTTP/DAV
– Standard clients everywhere
– One protocol for everything: a single protocol for WAN and LAN
– Transparent redirection
Use cases
– Easy, direct job/user data access, WAN friendly
– Access to missing files after the job starts
– Friend sites can share storage
– Diskless sites
– Cache integration (future)

What is federated?
We federate (meta)data repositories that are ‘compatible’ in:
– HTTP interface
– Name space (modulo simple prefixes), including catalogues
– Permissions (they don't contradict across sites)
– Content (the same key or filename means the same file [modulo translations])
Metadata is discovered dynamically and transparently, and the federation:
– looks like a unique, very fast file metadata system
– properly presents the aggregated metadata views
– redirects clients to the geographically closest endpoint (the local SE is preferred; the system can also load a “Geo” plugin)
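
The redirection policy above (local SE first, otherwise the geographically closest endpoint) can be sketched as follows. This is an illustrative Python sketch, not the actual “Geo” plugin: the `Replica` record, the `choose_replica` helper, the example URLs and the approximate site coordinates are all assumptions, and a real deployment would have to obtain the client's location from something like a GeoIP lookup.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Replica:
    url: str
    site: str
    lat: float  # degrees north
    lon: float  # degrees east


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance on a sphere with Earth's mean radius (6371 km)."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlam / 2) ** 2
    return 2 * 6371.0 * math.asin(min(1.0, math.sqrt(a)))  # clamp guards rounding


def choose_replica(replicas, client_lat, client_lon, client_site=None):
    """Pick the replica to redirect a client to.

    A replica on the client's own site (the local SE) wins outright;
    otherwise the geographically closest replica is chosen.
    """
    if not replicas:
        raise LookupError("no replica available")
    for r in replicas:
        if client_site is not None and r.site == client_site:
            return r
    return min(replicas, key=lambda r: great_circle_km(client_lat, client_lon, r.lat, r.lon))


if __name__ == "__main__":
    # Hypothetical URLs; coordinates are approximate.
    replicas = [
        Replica("https://cern.example.org/atlas/hand-shake.JPG", "CERN", 46.23, 6.05),
        Replica("https://desy.example.org/atlas/hand-shake.JPG", "DESY", 53.58, 9.88),
        Replica("https://asgc.example.org/atlas/hand-shake.JPG", "ASGC", 25.04, 121.61),
    ]
    print(choose_replica(replicas, 35.68, 139.69).site)  # client in Tokyo -> ASGC
    print(choose_replica(replicas, 48.86, 2.35).site)    # client in Paris -> CERN
```

Great-circle distance is only a proxy for network proximity; the sketch uses it because it needs no measurements, at the cost of ignoring actual link quality.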

What is federated?
Technically, TODAY we can aggregate:
– SEs with DAV/HTTP interfaces: dCache, DPM (future: Xrootd? EOS? StoRM?)
– Catalogues with DAV/HTTP interfaces: LFC supported (future: experiment catalogues could be integrated)
– Cloud DAV/HTTP/S3 services
– Anything else that happens to have an HTTP interface… (e.g. caches)
– Native LFC and DPM databases

Why HTTP/DAV?
It's everywhere
– A very widely adopted technology
It has the right features
– Redirection, WAN friendly
Convergence
– Transfers and data access
– No other protocols required
We (humans) like browsers: they give an experience of simplicity
– Open to direct access and integrated web apps

DPM/HTTP
DPM has invested significantly in HTTP as part of the EMI project:
– New HTTP/DAV interface
– Parallel WAN transfers
– 3rd-party copy
– Solutions for replica fallback: “Global access” and metalink
– Performance evaluations: experiment analyses, HammerCloud, synthetic tests, ROOT tests
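
Replica fallback with metalink can be sketched on the client side: given a Metalink 4 document (RFC 5854) listing the replicas of a file, try them in priority order until one answers. This is a hedged Python sketch. Exactly how DPM emits metalink is not described here, `fetch` is a hypothetical stand-in for an HTTP GET, and the example URLs are made up.

```python
import xml.etree.ElementTree as ET

ML_NS = {"ml": "urn:ietf:params:xml:ns:metalink"}  # Metalink 4 (RFC 5854)


def replica_urls(metalink_xml):
    """Return the URLs of the first <file> entry, best priority first (1 = highest)."""
    root = ET.fromstring(metalink_xml)
    file_el = root.find("ml:file", ML_NS)
    if file_el is None:
        return []
    ranked = []
    for url_el in file_el.findall("ml:url", ML_NS):
        url = (url_el.text or "").strip()
        if url:
            ranked.append((int(url_el.get("priority", "999999")), url))
    ranked.sort(key=lambda item: item[0])  # stable: equal priorities keep document order
    return [url for _, url in ranked]


def fetch_with_fallback(metalink_xml, fetch):
    """Try each replica in turn; `fetch` is any callable that raises OSError on failure."""
    failures = []
    for url in replica_urls(metalink_xml):
        try:
            return url, fetch(url)
        except OSError as exc:
            failures.append(f"{url}: {exc}")
    raise OSError("all replicas failed: " + "; ".join(failures))


if __name__ == "__main__":
    doc = """<metalink xmlns="urn:ietf:params:xml:ns:metalink">
  <file name="hand-shake.JPG">
    <url priority="2">https://site-b.example.org/hand-shake.JPG</url>
    <url priority="1">https://site-a.example.org/hand-shake.JPG</url>
  </file>
</metalink>"""

    def flaky_fetch(url):
        # Hypothetical stand-in for an HTTP GET: site-a is down.
        if "site-a" in url:
            raise ConnectionError("site-a unreachable")
        return b"...data..."

    print(fetch_with_fallback(doc, flaky_fetch)[0])  # https://site-b.example.org/hand-shake.JPG
```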

Demo
We have set up a stable demo testbed, using HTTP/DAV:
– Head node in DESY:
– a DPM instance at CERN
– a DPM instance at ASGC (Taiwan)
– a dCache instance in DESY
– a cloud storage account by Deutsche Telekom
The feeling it gives is surprising:
– On average, metadata performance is higher than when contacting the endpoints directly
– We see the directories as merged, as if it were only one system
There's one test file in 3 sites, i.e. 3 replicas:
– /myfed/atlas/fabrizio/hand-shake.JPG
– Clients in the EU get the one from DESY/DT/CERN
– Clients in Asia get the one from ASGC
There's a directory whose content is interleaved between CERN and DESY:
There's a directory where all the files are in two places:

Figure: HTTP example. Components: Client; Frontend (Apache2+DMLite); Aggregator (UGR) with a DMLite plugin, a DAV/HTTP plugin and an HTTP plugin; back ends: LFC, LFC or DB, SEs. The links are labelled "Plain DAV/HTTP".
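
The plugin layout of this example (one aggregator, several back-end plugins) can be sketched as a common interface that every plugin implements, with the aggregator merging their answers. This is a hypothetical Python sketch, not the UGR or DMLite plugin API: `LocationPlugin`, `locate`, the stub classes and the URLs are invented for illustration, and the stubs answer from in-memory data instead of making real DAV or LFC requests.

```python
from abc import ABC, abstractmethod


class LocationPlugin(ABC):
    """Hypothetical plugin interface: one plugin per kind of back end."""

    @abstractmethod
    def locate(self, path):
        """Return the replica URLs this back end knows for `path` (maybe empty)."""


class DavEndpointStub(LocationPlugin):
    """Stand-in for a DAV/HTTP plugin: answers from a fixed set of paths."""

    def __init__(self, base_url, paths):
        self.base_url = base_url.rstrip("/")
        self.paths = frozenset(paths)

    def locate(self, path):
        return [self.base_url + path] if path in self.paths else []


class CatalogueStub(LocationPlugin):
    """Stand-in for a catalogue plugin (e.g. an LFC): path -> registered replicas."""

    def __init__(self, entries):
        self.entries = {p: list(urls) for p, urls in entries.items()}

    def locate(self, path):
        return list(self.entries.get(path, []))


class Aggregator:
    """Asks every loaded plugin and merges the answers, dropping duplicates.

    Plugins are queried one after the other here for brevity; the design
    described in these slides queries endpoints in parallel.
    """

    def __init__(self, plugins):
        self.plugins = list(plugins)

    def locate(self, path):
        merged = []
        for plugin in self.plugins:
            for url in plugin.locate(path):
                if url not in merged:
                    merged.append(url)
        return merged


if __name__ == "__main__":
    fed = Aggregator([
        DavEndpointStub("https://se1.example.org/data", {"/dir1/file1"}),
        CatalogueStub({"/dir1/file1": ["https://se2.example.org/x/file1"]}),
    ])
    print(fed.locate("/dir1/file1"))
```

Keeping back-end specifics behind one small interface is what lets storage elements and catalogues be federated side by side.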

Design and performance
Full parallelism
– Composes the aggregated metadata views on the fly, by managing parallel tasks of information location
– Never stacks up latencies!
– The endpoints are treated in a completely independent way
– No limit to the number of outstanding clients/tasks
– No global locks/serialisations!
– Thread pools and producer/consumer queues are used extensively (e.g. to stat N items in M endpoints while X clients wait for some items)
Aggressive metadata caching
– The metadata caching keeps the performance high: peak raw cache performance is ~500K to 1M hits/s per core
– A relaxed, hash-based, in-memory partial name space
– Juggles info in order to always contain what's needed: entries are kept in an LRU fashion, which gives a fast first-level namespace cache
– Stalls clients only for the minimum time necessary to juggle their information bits
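
The two ideas on this slide, a parallel fan-out that never stacks up latencies and an LRU-managed in-memory cache, can be illustrated together. This is a simplified Python sketch under stated assumptions, not UGR's C++ implementation: `ParallelStat`, `LRUCache` and the per-endpoint stat callables are hypothetical, and the real cache's partial-namespace handling and expiry are not modelled.

```python
import threading
import time
from collections import OrderedDict
from concurrent.futures import ThreadPoolExecutor


class LRUCache:
    """Thread-safe LRU map: the least recently used entry is evicted when full."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._data = OrderedDict()
        self._lock = threading.Lock()

    def get(self, key, default=None):
        with self._lock:
            if key not in self._data:
                return default
            self._data.move_to_end(key)  # mark as most recently used
            return self._data[key]

    def put(self, key, value):
        with self._lock:
            self._data[key] = value
            self._data.move_to_end(key)
            if len(self._data) > self.capacity:
                self._data.popitem(last=False)  # evict the oldest entry


class ParallelStat:
    """Stat a path on every endpoint concurrently and cache the merged answer.

    `endpoints` maps a name to a callable(path) that returns a metadata dict,
    or None if the path is absent. With enough workers, total latency is
    roughly that of the slowest endpoint, not the sum over endpoints.
    """

    def __init__(self, endpoints, capacity=100_000, workers=32):
        self.endpoints = dict(endpoints)
        self.cache = LRUCache(capacity)
        self.pool = ThreadPoolExecutor(max_workers=workers)

    def stat(self, path):
        cached = self.cache.get(path)
        if cached is not None:
            return cached
        futures = {name: self.pool.submit(fn, path) for name, fn in self.endpoints.items()}
        merged = {}
        for name, future in futures.items():
            try:
                info = future.result()
            except OSError:
                info = None  # an unreachable endpoint is just skipped
            if info is not None:
                merged[name] = info
        self.cache.put(path, merged)  # negative answers ({}) are cached too; no expiry here
        return merged
```

All endpoint queries are submitted before any result is awaited, so a slow endpoint delays only its own answer rather than queueing behind the others.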

Server architecture
Incoming clients are distributed across:
– different machines (DNS alias)
– different processes (Apache config)
Clients are served by the UGR: they can browse/stat, or be redirected for action. The architecture is multi/many-core friendly and uses a fast parallel caching scheme.
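
The split between browsing/stat and redirection can be sketched as a small decision function. This is a hypothetical Python illustration, not the UGR or Apache module code: `handle_request`, `locate` and `choose` are invented names, and only the status and headers are modelled. In this sketch data is never proxied: a GET ends in an HTTP 302 redirect to one replica.

```python
from http import HTTPStatus


def handle_request(method, path, locate, choose):
    """Decide how the front end answers one client request (hypothetical logic).

    `locate(path)` returns the list of replica URLs known to the federation;
    `choose(urls)` picks one of them (e.g. the closest). Returns a pair
    (status, headers); response bodies are omitted in this sketch.
    """
    replicas = locate(path)
    if not replicas:
        return HTTPStatus.NOT_FOUND, {}
    if method == "GET":
        # Data access: redirect the client straight to one replica.
        return HTTPStatus.FOUND, {"Location": choose(replicas)}
    if method == "HEAD":
        # Stat: answered by the federation itself from the aggregated metadata.
        return HTTPStatus.OK, {}
    if method == "PROPFIND":
        # Browsing: a WebDAV Multi-Status listing, also served without redirect.
        return HTTPStatus.MULTI_STATUS, {"Content-Type": "application/xml; charset=utf-8"}
    return HTTPStatus.METHOD_NOT_ALLOWED, {"Allow": "GET, HEAD, PROPFIND"}
```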

Name translation
A sophisticated name translation scheme is key to being able to federate almost any source of metadata:
– UGR implements algorithmic translations and can accommodate non-algorithmic ones as well
– A plugin could also query an external service (e.g. an LFC or a private DB)
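
Algorithmic translation usually reduces to prefix substitution between the federation's namespace and each endpoint's, while non-algorithmic cases need an explicit lookup. Below is a hypothetical Python sketch of both, not UGR's actual translation code or configuration syntax. The class names and the example paths (including the DPM-style `/dpm/example.org/home/atlas` prefix) are assumptions.

```python
class PrefixTranslator:
    """Algorithmic name translation: swap a global prefix for an endpoint prefix."""

    def __init__(self, global_prefix, endpoint_prefix):
        self.global_prefix = global_prefix.rstrip("/")
        self.endpoint_prefix = endpoint_prefix.rstrip("/")

    @staticmethod
    def _swap(path, old, new):
        # Match whole path components only, so "/a/bc" is not under "/a/b".
        if path != old and not path.startswith(old + "/"):
            raise ValueError(f"{path!r} is not under {old!r}")
        return new + path[len(old):]

    def to_endpoint(self, global_path):
        return self._swap(global_path, self.global_prefix, self.endpoint_prefix)

    def to_global(self, endpoint_path):
        return self._swap(endpoint_path, self.endpoint_prefix, self.global_prefix)


class TableTranslator:
    """Non-algorithmic translation: an explicit lookup table (e.g. loaded from a DB)."""

    def __init__(self, table):
        self.table = dict(table)
        self.reverse = {v: k for k, v in self.table.items()}

    def to_endpoint(self, global_path):
        return self.table[global_path]  # KeyError if the name is unknown

    def to_global(self, endpoint_path):
        return self.reverse[endpoint_path]


if __name__ == "__main__":
    t = PrefixTranslator("/myfed/atlas", "/dpm/example.org/home/atlas")
    print(t.to_endpoint("/myfed/atlas/fabrizio/hand-shake.JPG"))
    # /dpm/example.org/home/atlas/fabrizio/hand-shake.JPG
```

Both translators expose the same two methods, so a location plugin can use either without knowing which kind of mapping it was configured with.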

Design and performance
Horizontally scalable deployment
– Multithreaded
– DNS balanceable
High-performance DAV client implementation
– Wraps DAV calls into a POSIX-like API, sparing the caller the work of composing requests and parsing responses
– Performance is prioritized: uses libneon with session caching
– Compound list/stat operations are supported
– Loaded by the core as a “location” plugin
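
What such a wrapper does for a stat or list call can be sketched by parsing the standard WebDAV (RFC 4918) PROPFIND response into POSIX-like records. The client described here is built on libneon; this Python sketch only illustrates the response-parsing side, and `StatResult` and `parse_multistatus` are hypothetical names. Sending the request itself (method PROPFIND, header `Depth: 0` for a stat or `Depth: 1` for a listing) is left to any HTTP library.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from email.utils import parsedate_to_datetime
from urllib.parse import unquote

DAV = "{DAV:}"  # WebDAV XML namespace in ElementTree's {uri}tag notation


@dataclass
class StatResult:
    path: str
    is_dir: bool
    size: int     # bytes
    mtime: float  # seconds since the Unix epoch


def parse_multistatus(xml_text):
    """Turn a WebDAV 207 Multi-Status body into a list of stat-like results.

    One PROPFIND with "Depth: 1" on a directory returns the directory itself
    plus one <response> per child, which is what makes a compound list+stat
    possible in a single round trip.
    """
    results = []
    for resp in ET.fromstring(xml_text).iter(DAV + "response"):
        href = unquote(resp.findtext(DAV + "href", default="").strip())
        prop = None
        for propstat in resp.findall(DAV + "propstat"):
            if " 200 " in (propstat.findtext(DAV + "status") or ""):
                prop = propstat.find(DAV + "prop")
                break
        if prop is None:
            continue  # no successfully returned properties for this entry
        is_dir = prop.find(f"{DAV}resourcetype/{DAV}collection") is not None
        size = int(prop.findtext(DAV + "getcontentlength") or 0)
        modified = prop.findtext(DAV + "getlastmodified")
        mtime = parsedate_to_datetime(modified).timestamp() if modified else 0.0
        results.append(StatResult(href, is_dir, size, mtime))
    return results
```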

A performance test
– Two endpoints: DESY and CERN (poor VM)
– One UGR frontend at DESY
– Swarm of test clients at CERN
– 10K files in a 4-level-deep directory; the files exist on both endpoints
The test (written in C++) invokes Stat only once per file, using many parallel clients doing stat() at maximum pace from 3 machines.
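
A client-side harness in the spirit of this test can be sketched in a few lines. It is a hypothetical Python analogue, not the C++ test used for the measurement: `make_tree_paths`, `run_stat_test` and the `/myfed/test` tree layout are assumptions, `stat` is any callable you plug in (for instance one issuing an HTTP HEAD against the federation), and Python threads will not reproduce the rates of a C++ client running on 3 machines.

```python
import itertools
import time
from concurrent.futures import ThreadPoolExecutor


def make_tree_paths(prefix="/myfed/test", fanout=10, levels=4):
    """fanout**levels file paths spread over (levels - 1) directory levels."""
    paths = []
    for *dirs, f in itertools.product(range(fanout), repeat=levels):
        paths.append(prefix + "".join(f"/d{d}" for d in dirs) + f"/f{f}")
    return paths


def run_stat_test(stat, paths, clients=64):
    """Call `stat` exactly once per path from `clients` parallel workers.

    Returns (succeeded, failed, calls_per_second). `stat` is any callable
    that raises OSError when a path cannot be stat'ed.
    """
    def attempt(path):
        try:
            stat(path)
            return True
        except OSError:
            return False

    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=clients) as pool:
        outcomes = list(pool.map(attempt, paths))
    elapsed = max(time.perf_counter() - start, 1e-9)
    succeeded = sum(outcomes)
    return succeeded, len(paths) - succeeded, len(paths) / elapsed
```

With the defaults, `make_tree_paths()` yields the 10K files of the test, three directory levels plus the file level.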

Figure: The result, WAN access

Get started
Get it here:
What you can do with it:
– Easy, direct job/user data access, WAN friendly
– Access missing files after the job starts
– Friend sites can share storage
– Diskless sites
– Federating catalogues, combining catalogue-based and catalogue-free data

Next steps
– Release our beta, as the nightlies are good
– More massive tests, with many endpoints, possibly distant; we are now looking for partners
– Precise performance measurements
– Refine the handling of the ‘death’ of the endpoints
– Immediate sensing of changes in the endpoints' content, e.g. add, delete: SEMsg in EMI2 SYNCAT would be the right thing in the right place
– Some more practical experience (getting used to the idea, using SQUIDs, CVMFS, EOS, clouds, ...)

References
– Wiki page and packages:
– CHEP papers:
  – Federation:
  – DPM & dmlite:
  – HTTP/dav:

Conclusions
– Dynamic Federations: an efficient, persistency-free, easily manageable approach to federating remote storage endpoints
– HTTP-based, standard, WAN and cloud friendly
– Interoperating with, and augmenting, the xrootd federations is desirable and productive
– Work in progress; the status is very advanced: demoable, installable, documented

Thank you. Questions?
EMI is partially funded by the European Commission under Grant Agreement INFSO-RI.