Summary of Data Management Jamboree
Ian Bird, WLCG Workshop, Imperial College, 7th July 2010

Summary – 1
General points
– Today's use of the available resources, particularly network bandwidth and disk storage, is probably not optimal, and reasonable steps can be taken to make the existing resources more effective.
– Using the network to access data when it is not available locally is a reasonable way to improve the efficiency of data access (e.g. 95% of the files may be available at a site; the other 5% can be retrieved or accessed remotely, presumably with local caches too).
Storage
– It was clear that the traditional HEP use of tape (write once, read very often) is no longer supportable. Tape clearly cannot support the almost-random access patterns typical of all but organised production scenarios, and there is a distinct performance mismatch between tape and its consumers.
– The (tape) archives should be separated from the caching layers and treated as true archives, where data is written and rarely read. The exceptions are when entire datasets are brought online by a production manager, or when data has been lost from disk and needs to be recovered.
– This should simplify the interfaces to both the archive layer and the caches. The interface to the archive can be based on a simple subset of SRM, enabling explicit movement of data between archive and cache, including bulk operations (a minimal sketch of such an interface follows below).
– This will also allow the use of industrial solutions for managing the archives, and puts the large tape systems into the operational mode for which they were designed (as opposed to the way HEP traditionally uses them).
– This also removes the direct connection between cache and archive, preventing general users from triggering random tape access.
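As a concrete illustration, here is a minimal sketch of what such a reduced archive interface might look like: explicit, dataset-level (bulk) archive and stage operations, with no path for an ordinary user job to trigger a tape recall. All class, method, and dataset names are hypothetical; the real interface would be a simple subset of SRM.

from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ArchiveRequest:
    """A bulk request to move a whole dataset between archive and cache."""
    dataset: str
    files: List[str]
    status: Dict[str, str] = field(default_factory=dict)  # per-file state


class TapeArchive:
    """Sketch of the reduced 'SRM subset' the summary refers to."""

    def archive(self, request: ArchiveRequest) -> ArchiveRequest:
        """Write a dataset from the cache layer to tape (write once)."""
        for lfn in request.files:
            request.status[lfn] = "ARCHIVED"
        return request

    def stage(self, request: ArchiveRequest) -> ArchiveRequest:
        """Bring a whole dataset back online, e.g. for a production manager
        or to recover data lost from disk; never triggered by end users."""
        for lfn in request.files:
            request.status[lfn] = "STAGED"
        return request


if __name__ == "__main__":
    # Hypothetical dataset and file names, purely for illustration.
    req = ArchiveRequest("data10_periodA", ["/grid/expt/f1.root", "/grid/expt/f2.root"])
    print(TapeArchive().stage(req).status)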

Summary – 2
Data Access Layer
– Improving the efficiency of user access to data, particularly for analysis, was one of the major drivers behind this workshop.
– It should no longer be assumed that a job will find all the files it needs at a site. Missing data should be accessed remotely or fetched and cached locally; which is the better mechanism should be investigated (see the sketch after this slide).
– It cannot be assumed that a catalogue is completely up to date, and hence it is important that missing data be fetched on demand.
– Effective caching mechanisms can reduce or optimise the existing usage of disk space. Several caching mechanisms and ideas were discussed. Particularly interesting are proxy caches, where a proxy at a site manages the interaction between requestors and remote access. It is important that the caching tools and the caching algorithms be treated separately: a variety of algorithms may be required in different circumstances and should be supported by the tools.
– A combination of organised pre-placement of data and caching may be one option.
– Alternatively, it could be that no pre-placement is needed and that data popularity defines what gets into the caches.
– The desired model of data access by the application is that of a file system, i.e. the job should be able to open, read, etc. without concerning itself with the underlying mechanisms. Complexities such as SRM should not be visible to the user.
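A minimal sketch of the "open locally, otherwise fetch and cache" behaviour described above. The redirector hostname, cache directory, and file path are hypothetical, and xrdcp is used only as one possible remote-copy tool; a site proxy cache could equally well sit behind the same call.

import os
import subprocess

REDIRECTOR = "root://redirector.example.org"   # hypothetical federation entry point
CACHE_DIR = "/data/cache"                      # hypothetical local cache area


def open_data(lfn: str):
    """Return an open file object for 'lfn', fetching it on demand if missing."""
    local_path = os.path.join(CACHE_DIR, lfn.lstrip("/"))
    if not os.path.exists(local_path):
        os.makedirs(os.path.dirname(local_path), exist_ok=True)
        # The catalogue may be stale, so fall back to remote access for missing files.
        subprocess.check_call(["xrdcp", f"{REDIRECTOR}/{lfn}", local_path])
    return open(local_path, "rb")


if __name__ == "__main__":
    # The job just opens and reads; no SRM or transfer machinery is visible to it.
    with open_data("/store/user/example/ntuple.root") as f:
        header = f.read(64)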

Summary – 3
Data Transfer
There are several data transfer use cases that must be supported:
– A reliable way to move data from a job to an archive, asynchronously: a job can finish, hand its output file to a service, and be sure that the data will be delivered somewhere and catalogued. (The tools should perhaps separate the two operations, i.e. catalogue updates should not be part of the transfer tool; see the sketch below.)
– A mechanism to support organised data placement.
– A mechanism to support data transfer into caches.
– Support for remote access to data not at a site, or to trigger a transfer of data to a local cache.
Several specific requirements for improving FTS were made, which should also apply to any other transfer mechanism:
– Well-defined recovery from failure.
– The ability to do partial transfers, or to transfer partial files, particularly to recover from failure.
– The ability to manipulate chunks of data (many files, datasets) as a single entity.
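Below is a hedged sketch of the first use case: the job hands its output to a service and exits, while the transfer and the catalogue registration happen asynchronously and as two separate steps, with a simple retry on failure. The transfer and register functions are placeholders, not real FTS or catalogue APIs.

import queue
import threading


def transfer(source: str, destination: str) -> None:
    """Placeholder for the actual copy (an FTS job, xrdcp, GridFTP, ...)."""
    print(f"copied {source} -> {destination}")


def register(lfn: str, replica: str) -> None:
    """Placeholder for the catalogue update, deliberately kept separate."""
    print(f"catalogued {lfn} at {replica}")


class OutputShipper:
    def __init__(self) -> None:
        self._work: queue.Queue = queue.Queue()
        threading.Thread(target=self._run, daemon=True).start()

    def submit(self, source: str, destination: str, lfn: str) -> None:
        """Called by the job just before it exits; returns immediately."""
        self._work.put((source, destination, lfn, 0))

    def _run(self) -> None:
        while True:
            source, destination, lfn, attempt = self._work.get()
            try:
                transfer(source, destination)
                register(lfn, destination)
            except Exception:
                if attempt < 3:                  # well-defined retry on failure
                    self._work.put((source, destination, lfn, attempt + 1))
            finally:
                self._work.task_done()


if __name__ == "__main__":
    shipper = OutputShipper()
    # Hypothetical source, destination, and LFN, for illustration only.
    shipper.submit("/tmp/job_output.root", "archive.example.org:/store/out.root", "/grid/expt/out.root")
    shipper._work.join()    # wait for the background worker in this demo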

Summary – 4
Namespaces, Catalogues, Authorization, etc.
– What is needed are dynamic catalogues that reflect the changing contents of storage and are synchronised (asynchronously) with the storage systems.
– The experiment computing models must recognise that this information may not be 100% accurate; again, this comes back to the need to recover by remotely accessing (or bringing from remote sites) the missing files.
– Technologies for achieving the catalogue synchronisation could be based on something like LFC together with a message bus, or perhaps the "catalogue" could actually be a dynamic information system (e.g. a message bus, a distributed hash table, a Bloom filter, etc.). A sketch of message-driven synchronisation follows below.
– There is a need for a global namespace, based on both LFNs and GUIDs depending on usage.
– ACLs should be implemented once (e.g. in the catalogue), and back doors into the storage systems must not be allowed.
– The AliEn FC was proposed as a general solution for a central catalogue, with many other features.
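The following sketch illustrates message-driven catalogue synchronisation: storage systems publish add/remove events to a bus, and a catalogue-side consumer applies them asynchronously, so the catalogue tracks but may lag the true storage content. The bus is modelled with an in-process queue and all names are illustrative; in practice this could be, for example, LFC plus an ActiveMQ broker.

import json
import queue

bus: queue.Queue = queue.Queue()   # stands in for a message broker topic
catalogue: dict = {}               # LFN -> {"guid": ..., "replicas": set of sites}


def publish(event: dict) -> None:
    """Storage systems publish add/remove events as they happen."""
    bus.put(json.dumps(event))


def consume_one() -> None:
    """Catalogue-side consumer; applied asynchronously, so it may lag storage."""
    event = json.loads(bus.get())
    entry = catalogue.setdefault(event["lfn"], {"guid": event.get("guid"), "replicas": set()})
    if event["action"] == "add":
        entry["replicas"].add(event["site"])
    elif event["action"] == "remove":          # e.g. a disk failed at the site
        entry["replicas"].discard(event["site"])


# Hypothetical LFN, GUID, and site name, for illustration only.
publish({"action": "add", "lfn": "/grid/expt/f1.root", "guid": "abcd-1234", "site": "SITE-A"})
consume_one()
print(catalogue)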

Summary – 5
Global Home Directory facility
– Such a facility seems to be required. The AliEn FC has this as an integral part; other solutions could be based on commercial or open-source technology (drop boxes, Amazon S3, etc.).
Demonstrators
– To be fleshed out, with plans presented at the WLCG workshop next week.

Demonstrators proposed
1. Brian Bockelman: xrootd-enabled filesystems (HDFS, POSIX, dCache and others) at some volunteer sites, with a global redirector at one location, allowing the system to cache as needed for Tier 3 use (see the access sketch after this list).
2. Massimo Lamanna: a very similar proposal with the same use cases for ATLAS, also including job brokering. Potential to collaborate with 1)?
3. Graeme Stewart: PanDA Dynamic Data Placement.
4. (name?): LHCb/DIRAC, with very similar ideas to 3).
5. Gerd Behrman: ARC caching technology; propose to improve the front-end so it can work without the ARC Control Tower, and to decouple the caching tool from the CE.
6. Jean-Philippe Baud: catalogue synchronisation with storage using the ActiveMQ message broker: a) add files, catalogue them and propagate to other catalogues; b) remove entries when files are lost, e.g. if a disk fails; c) remove a dataset from a central catalogue and propagate to other catalogues.
7. Simon Metson: DAS for CMS. Aim to have a demo in the summer.
8. Oscar Koeroo: demonstrate that Cassandra (from Apache) can provide a complete cataloguing and messaging system.
9. Pablo Saiz: based on the AliEn FC; a comparison of functionality and a demonstration of use in another experiment.
10. Jeff Templon: demonstrate the Coral Content Delivery Network essentially as-is, with proposed metrics for success.
11. Peter Elmer: show how workflow management maps onto the available hardware (relevant to the use of multi-core hardware).
12. Dirk Duellmann/Rene Brun: prototype proxy cache based on xrootd; can be used now to test several things.
13. Jean-Philippe Baud + Gerd Behrman + Andrei Maslennikov + DESY: use of NFS 4.1 as the access protocol.
14. Jens Jensen + (other name?): simple ideas to immediately speed up the use of SRM and to quickly improve the lcg-cp utility.
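To make demonstrators 1) and 2) more concrete, here is a sketch of how a client could access a file through a single global redirector. It uses the XRootD Python bindings purely as an illustration (they postdate this workshop), and the redirector hostname and the file path are hypothetical.

from XRootD import client
from XRootD.client.flags import OpenFlags

REDIRECTOR = "root://global-redirector.example.org"   # hypothetical entry point
PATH = "/store/data/run123/ntuple.root"               # hypothetical path

# Ask the federation whether the file is visible anywhere at all.
fs = client.FileSystem(REDIRECTOR)
status, statinfo = fs.stat(PATH)
if status.ok:
    print(f"file visible in the federation, size {statinfo.size} bytes")

# Open through the redirector; the actual data server is chosen transparently,
# and a site-level proxy cache could sit in between for Tier 3 use.
f = client.File()
status, _ = f.open(f"{REDIRECTOR}/{PATH}", OpenFlags.READ)
if status.ok:
    status, data = f.read(offset=0, size=1024)   # read the first kilobyte
    f.close()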