EMI INFSO-RI
Dynamic Federations
Seamless aggregation of standard-protocol-based storage endpoints
Fabrizio Furano, Patrick Fuhrmann, Paul Millar, Daniel Becker, Adrien Devresse, Oliver Keeble (Presenter), Ricardo Brito da Rocha, Alejandro Alvarez
Credits to ShuTing Liao (ASGC)

Dynamic Federations, Lyon, Sept 2012

Dynamic HTTP Federations
Federation
– Simplicity, redundancy, storage/network efficiency, elasticity, performance
HTTP
– Standard clients everywhere
– One protocol for everything
    Single protocol for WAN and LAN
– Transparent redirection
Use cases
– Easy, direct job/user data access, WAN friendly
– Access missing files after job starts
– Friend sites can share storage
– Diskless sites
– Cache integration (future)

Storage federations
What’s the goal?
– Make different storage clusters be seen as one
– Make global file-based data access seamless
How should this be done?
– Dynamically
    easy to set up/maintain
    no complex metadata persistency
    no DB babysitting (keep it for the experiment’s metadata)
    no replica catalogue inconsistencies, by design
– Light constraints on participating storage
– Using standards
    No strange APIs, everything looks familiar
Global access to global data

What is federated?
We federate (meta)data repositories that are ‘compatible’
– HTTP interface
– Name space (modulo simple prefixes)
    Including catalogues
– Permissions (they don’t contradict across sites)
– Content (same key or filename means same file [modulo translations])
Dynamically and transparently discovering metadata
– looks like a unique, very fast file metadata system
– properly presenting the aggregated metadata views
– redirecting clients to the geographically closest endpoint
    Local SE is preferred
    The system can also load a “Geo” plugin

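As an illustration of geography-aware redirection, the sketch below picks the replica site nearest to a client by great-circle distance. It is not the UGR “Geo” plugin: the site coordinates are rough values assumed for the example, and the names `SITES`, `great_circle_km` and `closest_replica` are hypothetical.

```python
import math

# Approximate (latitude, longitude) in degrees; assumed values for illustration only.
SITES = {
    "CERN": (46.23, 6.05),
    "DESY": (53.58, 9.88),
    "ASGC": (25.04, 121.61),
}

def great_circle_km(a, b):
    """Haversine distance in km between two (lat, lon) points given in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))

def closest_replica(client_pos, replica_sites):
    """Return the name of the replica site nearest to the client position."""
    return min(replica_sites, key=lambda s: great_circle_km(client_pos, SITES[s]))

# A client near Tokyo and a client near Paris, both with three candidate replicas.
replicas = ["CERN", "DESY", "ASGC"]
print(closest_replica((35.68, 139.69), replicas))
print(closest_replica((48.85, 2.35), replicas))
```

A real deployment would also need the client's location (for example from its IP address), which this sketch takes as given.
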
What is federated?
Technically TODAY we can aggregate:
– SEs with DAV/HTTP interfaces
    – dCache, DPM
    Future: Xrootd? EOS? Storm?
– Catalogues with DAV/HTTP interfaces
    LFC supported
    Future: Experiment catalogues could be integrated
– Cloud DAV/HTTP/S3 services
– Anything else that happens to have an HTTP interface…
    Caches
– Native LFC and DPM databases

Why HTTP/DAV?
It’s everywhere
– A very widely adopted technology
It has the right features
– Redirection, WAN friendly
Convergence
– Transfers and data access
– No other protocols required
We (humans) like browsers, they give an experience of simplicity
– Integrated web apps

DPM/HTTP
DPM has invested significantly in HTTP as part of the EMI project
– New HTTP/DAV interface
– Parallel WAN transfers
– 3rd party copy
– Solutions for replica fallback
    “Global access” and metalink
– Performance evaluations
    Experiment analyses
    Hammercloud
    Synthetic tests
    Root tests

DPM i/o interface comparison
DPM and Random I/O
– Longstanding issue with ATLAS DPM usage
– Bad RFIO performance forced ‘download first’
– We’re doing a thorough evaluation of the current alternatives to RFIO
    XROOT vs HTTP vs RFIO
    NFS to be included when it’s writable
    In collaboration with ASGC
– Results will make you happy
PRELIMINARY
LAN / Chunk Size: / File Size: 2G
Protocol  N. Reads  Read Size    Read Time
HTTP      500       22,773,
HTTP      1000      46,027,
XROOT     500       22,773,
XROOT     1000      46,027,
RFIO      500       22,773,
RFIO      1000      46,027,

DPM i/o interface comparison
PRELIMINARY
WAN / Chunk Size: / File Size: 2G
Protocol  N. Reads  Read Size    Read Time
HTTP      500       22,773,
HTTP      1000      46,027,
XROOT     500       22,773,
XROOT     1000      46,027,
RFIO      500       22,773,112
RFIO      1000      46,027,143

Demo
We have set up a stable demo testbed, using HTTP/DAV
– Head node in DESY:
– a DPM instance at CERN
– a DPM instance at ASGC (Taiwan)
– a dCache instance in DESY
– a Cloud storage account by Deutsche Telekom
The feeling it gives is surprising
– Metadata performance is on average higher than when contacting the endpoints directly
We see the directories as merged, as if it were only one system
There’s one test file in 3 sites, i.e. 3 replicas.
– /myfed/atlas/fabrizio/hand-shake.JPG
– Clients in EU get the one from DESY/DT/CERN
– Clients in Asia get the one from ASGC

The basic idea
Storage/MD endpoint 1: /dir1/file1, /dir1/file2
Storage/MD endpoint 2: /dir1/file2, /dir1/file3
Aggregation (we see this): /dir1, /dir1/file1, /dir1/file2 (with 2 replicas), /dir1/file3
All the metadata interactions are hidden
NO persistency needed here, just efficiency and parallelism

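The merge shown on this slide can be sketched in a few lines. This is only an illustration of the idea, not UGR code: the endpoint names and the `aggregate` function are hypothetical, and the listings are hard-coded where a real aggregator would fetch them from the endpoints.

```python
from collections import defaultdict

def aggregate(listings):
    """Merge per-endpoint listings into one view: path -> sorted list of endpoints holding it."""
    merged = defaultdict(set)
    for endpoint, paths in listings.items():
        for path in paths:
            merged[path].add(endpoint)
    return {path: sorted(eps) for path, eps in sorted(merged.items())}

# The two endpoints from the slide, with hypothetical names.
listings = {
    "endpoint1": ["/dir1/file1", "/dir1/file2"],
    "endpoint2": ["/dir1/file2", "/dir1/file3"],
}
view = aggregate(listings)
for path, endpoints in view.items():
    print(path, len(endpoints), "replica(s)")
```

Nothing is stored between calls, which matches the slide's point that the merged view needs no persistency.
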
Example
[Diagram, May 2012: Client; Frontend (Apache2+DMLite); Aggregator (UGR) with plugins DMLite, DAV/HTTP and HTTP; endpoints LFC, SE, LFC or DB, SE; links labelled Plain DAV/HTTP]

Server architecture
Clients come and are distributed through:
    different machines (DNS alias)
    different processes (Apache config)
Clients are served by the UGR. They can browse/stat or be redirected for action.
The architecture is multi/manycore friendly and uses a fast parallel caching scheme

Name translation
A sophisticated scheme of name translation is key to federating almost any source of metadata
– UGR implements algorithmic translations and can accommodate non-algorithmic ones as well
– A plugin could also query an external service (e.g. an LFC or a private DB)

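The simplest algorithmic translation is a prefix swap. The sketch below shows one under stated assumptions: the function name `translate` and the endpoint prefix `/dpm/example.org/home` are hypothetical, and UGR's actual translation rules are not described in these slides.

```python
def translate(fed_path, fed_prefix, endpoint_prefix):
    """Map a federation path to an endpoint path by swapping prefixes.

    Returns None when fed_path does not live under fed_prefix.
    """
    prefix = fed_prefix.rstrip("/")
    if fed_path != prefix and not fed_path.startswith(prefix + "/"):
        return None
    return endpoint_prefix.rstrip("/") + fed_path[len(prefix):]

# The federation path comes from the demo slide; the endpoint prefix is made up.
print(translate("/myfed/atlas/fabrizio/hand-shake.JPG", "/myfed", "/dpm/example.org/home"))
```

A non-algorithmic translation would replace the string manipulation with a lookup, for example in a table or an external catalogue.
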
Design and performance
Full parallelism
– Composes on the fly the aggregated metadata views by managing parallel tasks of information location
    Never stacks up latencies!
The endpoints are treated in a completely independent way
– No limit to the number of outstanding clients/tasks
– No global locks/serialisations!
– Thread pools and producer/consumer queues used extensively (e.g. to stat N items in M endpoints while X clients wait for some items)
Aggressive metadata caching
– The metadata caching keeps the performance high
    Peak raw cache performance is ~500K to 1M hits/s per core
– A relaxed, hash-based, in-memory partial name space
– Juggles info in order to always contain what’s needed
    Keep the entries in an LRU fashion and we have a fast 1st level namespace cache
– Stalls clients only for the minimum time necessary to juggle their information bits

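The two ideas on this slide, querying all endpoints in parallel and keeping the answers in an LRU cache, can be sketched as follows. This is a toy model and not the UGR implementation: `LRUCache`, `FakeEndpoint` and `locate` are hypothetical names, and the endpoints are simulated with a fixed sleep instead of real network calls.

```python
import time
from collections import OrderedDict
from concurrent.futures import ThreadPoolExecutor

class LRUCache:
    """Small in-memory LRU map: path -> list of endpoints holding the file."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()

    def get(self, key):
        if key not in self.items:
            return None
        self.items.move_to_end(key)  # mark as most recently used
        return self.items[key]

    def put(self, key, value):
        self.items[key] = value
        self.items.move_to_end(key)
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)  # evict the least recently used entry

class FakeEndpoint:
    """Stand-in for a remote storage endpoint with a fixed lookup latency."""
    def __init__(self, name, files, latency):
        self.name, self.files, self.latency = name, set(files), latency
        self.calls = 0  # not thread-safe; fine here, one call per endpoint at a time

    def stat(self, path):
        self.calls += 1
        time.sleep(self.latency)  # simulated network round trip
        return path in self.files

def locate(path, endpoints, cache, pool):
    """Return names of endpoints holding path; on a cache miss, ask all endpoints in parallel."""
    cached = cache.get(path)
    if cached is not None:
        return cached
    # Submit every lookup before waiting, so latencies overlap instead of adding up.
    futures = [(ep.name, pool.submit(ep.stat, path)) for ep in endpoints]
    found = [name for name, fut in futures if fut.result()]
    cache.put(path, found)
    return found

endpoints = [
    FakeEndpoint("DESY", ["/dir1/file2"], 0.05),
    FakeEndpoint("CERN", ["/dir1/file2", "/dir1/file3"], 0.05),
]
cache = LRUCache(capacity=1000)
with ThreadPoolExecutor(max_workers=8) as pool:
    first = locate("/dir1/file2", endpoints, cache, pool)   # goes to both endpoints
    second = locate("/dir1/file2", endpoints, cache, pool)  # served from the cache
print(first, second)
```

Because all lookups are submitted before any result is awaited, the wait on a miss is roughly the slowest endpoint's latency rather than the sum of all of them.
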
Design and performance
Horizontally scalable deployment
– Multithreaded
– DNS balanceable
High performance DAV client implementation
– Wraps DAV calls into a POSIX-like API, sparing callers the difficulty of composing requests/responses
– Performance is prioritised: uses libneon with session caching
– Compound list/stat operations are supported
– Loaded by the core as a “location” plugin

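To give a feel for what wrapping DAV calls into a stat-like API involves, the sketch below parses a WebDAV Multi-Status (PROPFIND) response into simple per-entry records. It uses the Python standard library rather than libneon, the function name `parse_multistatus` and the sample response are made up for the example, and it covers only two standard DAV properties.

```python
import xml.etree.ElementTree as ET

DAV = "{DAV:}"  # XML namespace used by standard WebDAV elements

def parse_multistatus(xml_text):
    """Turn a WebDAV 207 Multi-Status body into {href: {"size": int or None, "is_dir": bool}}."""
    entries = {}
    for resp in ET.fromstring(xml_text).iter(DAV + "response"):
        href = resp.findtext(DAV + "href")
        size = resp.findtext(f".//{DAV}getcontentlength")
        is_dir = resp.find(f".//{DAV}resourcetype/{DAV}collection") is not None
        entries[href] = {"size": int(size) if size else None, "is_dir": is_dir}
    return entries

# Hand-written sample of what a server might return for a directory listing.
SAMPLE = """
<D:multistatus xmlns:D="DAV:">
  <D:response>
    <D:href>/dir1/</D:href>
    <D:propstat>
      <D:prop><D:resourcetype><D:collection/></D:resourcetype></D:prop>
      <D:status>HTTP/1.1 200 OK</D:status>
    </D:propstat>
  </D:response>
  <D:response>
    <D:href>/dir1/file1</D:href>
    <D:propstat>
      <D:prop><D:resourcetype/><D:getcontentlength>1024</D:getcontentlength></D:prop>
      <D:status>HTTP/1.1 200 OK</D:status>
    </D:propstat>
  </D:response>
</D:multistatus>
""".strip()

listing = parse_multistatus(SAMPLE)
for href, info in listing.items():
    print(href, info)
```

One response carrying several entries is what makes a compound list/stat operation possible: a single round trip yields the metadata for a whole directory.
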
A performance test
Two endpoints: DESY and CERN (poor VM)
One UGR at DESY
10K files in a 4-level-deep directory tree
– Files exist on both endpoints
The test (written in C++) invokes stat() only once per file, using many parallel clients calling stat() at the maximum pace from 3 machines

The result, WAN access

Get started
Get it here: ds
What you can do with it:
– Easy, direct job/user data access, WAN friendly
– Access missing files after job starts
– Friend sites can share storage
– Diskless sites
– Federating catalogues
    Combining catalogue-based and catalogue-free data

Dynamic Feds cf XROOTD feds
XROOTD federations are focused on the “redirection” concept
– Very light at the meta-manager, just redirect clients away as soon as possible
    If not possible, the penalty is 5 seconds per jump
– Global listing is implemented in the client, slowish, hiccup-prone
– Some details do not yet match very well with quick geography-aware redirections
Dynamic Federations support both the “redirection” concept and the “browsing” concept by design
– Much more centred on the meta-manager
    We can’t touch the clients
– Cache metadata for the clients, in-memory
– Designed for scalability, performance and features
– Extendable plugin architecture, geography-aware redirection
– Can speak any protocol, our focus is on HTTP-based things

Next steps
Release our beta, as the nightlies are good
More massive tests, with many endpoints, possibly distant
– We are now looking for partners
Precise performance measurements
Refine the handling of the ‘death’ of the endpoints
Immediate sensing of changes in the endpoints’ content, e.g. add, delete
– SEMsg in EMI2 SYNCAT would be the right thing in the right place
Some more practical experience (getting used to the idea, using SQUIDs, CVMFS, EOS, clouds, ...)

HTTP for xrootd
An XROOTD federation gives the goodies/hooks of the XROOTD framework
– This involves also many other components and groups of people
    Monitoring of the FAX is a perfect example
IT-GT (CERN) will produce an HTTP plugin for XROOTD
    Double-headed data access
– Discussions started
    Effort will be scoped early Oct
    Will involve xrootd framework enhancements too
Federate the same clusters also via HTTP
Pure HTTP/DAV endpoints can join normally
Let users enjoy HTTP or XROOTD? HTTP and XROOTD?

Conclusions
Dynamic Federations: an efficient, persistency-free, easily manageable approach to federate remote storage endpoints
HTTP, standard, WAN and cloud friendly
Interoperating with and augmenting the xrootd federations is desirable and productive
Work in progress, status is very advanced, demoable, installable, documented.

Thank you
Questions?
EMI is partially funded by the European Commission under Grant Agreement INFSO-RI