Evolution of storage and data management
Ian Bird
GDB: 12th May 2010

Discussion on evolving WLCG Storage and Data Management
Background:
– Experiments’ concern with performance and scalability of access to data
    Particularly for analysis
– Concern over long term sustainability of existing solutions
– Recognize:
    Evolution of technology (networking, file systems, etc.)
    Huge amounts of disk available to experiments (on aggregate!)
    Other communities exist that: have solved similar problems; will have similar problems soon
– Short term concerns:
    Performance, scalability, bottlenecks (e.g. SRM)
– The MONARC model assumed networking was a problem
    It is actually something we should invest more in
Ian.Bird@cern.ch

Scope
Focus on analysis use cases
– Production is in hand
Start from viewpoint of user access to data
– Hide details of back-end storage systems, etc.
Timescale for solutions in production
– 2013 run
– But with incremental working prototypes that address some of the short term concerns
Should not be seen as a reason for new development projects
– Must use available tools as far as possible
– Must keep long term sustainability in mind
Working model:
– More network-centric than now
– Few large archival repositories – long term data curation
– “Cloud” of storage making optimal use of available capacity
Should be a single effort of the community
– There may be funding opportunities – but we should make sure that they help us in our goals and in obtaining what we need
Must be driven by the real needs of the experiments...
– But we must learn the lesson of SRM – too many unrealistic “requirements” and no consensus on implementation

Technical areas
Data archives and storage cloud
– Simplify so that “tape” is really an archive
– Allow remote data access when needed
– Look into peer-peer technologies
Data access layer
– E.g. Xrootd/GFAL + some intelligence to determine when to cache/when to use remote access
Output datasets
– Still need a service for (asynchronous) movement of datasets to an archive
Global home directory facility
– Model is a global file system. Industrial solutions available?
“Catalogues”
– Still need to locate data. Issue of consistency between different storage systems.
Authorization mechanisms
– For access to files in storage systems (archive + cloud), and quotas etc.
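The slide does not say what the "intelligence" in the data access layer would look like. As a purely illustrative sketch, one simple policy would compare how much of a file a job expects to read, and how often it will be reused, against the free space in a local cache. Everything here (the `AccessRequest` fields, `choose_access_mode`, and the 0.5 threshold) is a hypothetical assumption for illustration and is not part of Xrootd or GFAL.

```python
from dataclasses import dataclass


@dataclass
class AccessRequest:
    """Hypothetical description of one analysis job's access to one file."""
    file_size_bytes: int        # total size of the file
    expected_read_bytes: int    # bytes the job expects to read from it
    expected_reuse_count: int   # how many times the file is likely to be read at this site


def choose_access_mode(req: AccessRequest,
                       cache_free_bytes: int,
                       read_fraction_threshold: float = 0.5) -> str:
    """Return "cache" or "remote" under an assumed, simplistic policy.

    - If the file does not fit in the free cache space, read it remotely.
    - If it will be reused, or most of it will be read, copy it to the cache.
    - Otherwise (sparse, one-off reads) use remote access.
    """
    if req.file_size_bytes <= 0:
        raise ValueError("file_size_bytes must be positive")
    if req.file_size_bytes > cache_free_bytes:
        return "remote"
    read_fraction = req.expected_read_bytes / req.file_size_bytes
    if req.expected_reuse_count > 1 or read_fraction >= read_fraction_threshold:
        return "cache"
    return "remote"
```

A real policy would presumably also weigh network conditions and storage load; this sketch only shows the shape of the decision between caching and remote access.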

Next steps
Jamboree
– 3-day workshop to look at existing tools in each of the areas: June 16-18 in Amsterdam
Elaboration of a more concrete plan and timelines; set up working groups
– Could be at WLCG workshop in London, July 7-9
Develop demonstrator prototypes in each area
– testing of components/technologies
Experiment testing
– integration into frameworks

Jamboree agenda – 1
10:00 | Welcome & Introduction to the Jamboree - goals of the workshop | Ian
10:15 | Presentation of a strawman for a new model of data access and management - should include background and rationale behind this | Experiment person (Ian F/Kors?)
11:00 | Status of networking (cf. expectation in 2001), prospects including intercontinental and far (e.g. Asia) connectivity | David Foster
11:45 | Back-end storage - to be used as a true archive: implications/simplifications possible. Should discuss the need to not know about or have to access back-end storage (tape). ==> What should be the interface to storage? | Dirk Duellmann?
12:30 | Lunch
14:00 | Needs from a data access layer - how analysis use needs to access data, etc. | Experiment person
14:45 | Data transfer use cases: peer-peer (many-many) or point-point. On demand vs scheduled. | Experiment person
15:30 | Coffee
15:45 | Namespaces, authorization needs, quotas, catalogues - what is needed? | Experiment person
16:30 | The need for global home directory service - missing so far. What are the use cases? | Experiment person
17:00 | Conclusions - summary of discussion points, points arising. Progress on model? Short term improvements possible?

Agenda – 2
Technology
9:00 | File systems. Summary of work on Hadoop, Lustre, GPFS, etc. | Hepix WG?
9:45 | Xrootd - outlook | Fabrizio Furano
10:30 | Coffee
10:45 | FTS, LFC - what is OK, what is not? What is still useful? | ??
11:30 | Alien FC - why is it interesting? | Pablo Saiz
12:15 | P2P technologies - can we simply adapt them? | ??
13:00 | Lunch
14:00 | ROOT - outlook and developments | Rene Brun
Site Experiences
14:45 | NDGF/ARC | Josva Kleist
15:15 | GridKa - xrootd with tape back-end | ??
15:45 | Coffee
16:00 | CNAF - GPFS/TSM/Storm | ??
16:30 | Hadoop at a Tier 2 | Brian Bockelman?
17:00 | Summary and conclusions - are there areas for potential early demonstrators?

Agenda – 3
9:00 | Conclusions of the discussions | Summaries from secretaries of Days 1 and 2
10:00 | Next steps: points to address:
        - Draft an outline plan with timelines;
        - Create working groups - agree mandates and convenors.
        Set expectations for July workshop
11:00 | Summary: draft summary of the workshop and conclusions for presentation to wider community
12:00 | Close
