Evolution of storage and data management

Slides:



Advertisements
Similar presentations
HEPiX Edinburgh 28 May 2004 LCG les robertson - cern-it-1 Data Management Service Challenge Scope Networking, file transfer, data management Storage management.
Advertisements

CERN Summary Ian Bird eInfrastructure Workshop 9 December, 2003.
Chapter 6 Prototyping, RAD, and Extreme Programming
Ian M. Fisk Fermilab February 23, Global Schedule External Items ➨ gLite 3.0 is released for pre-production in mid-April ➨ gLite 3.0 is rolled onto.
Ian Bird WLCG Workshop Okinawa, 12 th April 2015.
Ian Bird WLCG Management Board CERN, 17 th February 2015.
Ian Bird LHCC Referees’ meeting; CERN, 11 th June 2013 March 6, 2013
October 24, 2000Milestones, Funding of USCMS S&C Matthias Kasemann1 US CMS Software and Computing Milestones and Funding Profiles Matthias Kasemann Fermilab.
LCG Milestones for Deployment, Fabric, & Grid Technology Ian Bird LCG Deployment Area Manager PEB 3-Dec-2002.
Ian Bird LHCC Referee meeting 23 rd September 2014.
Workshop summary Ian Bird, CERN WLCG Workshop; DESY, 13 th July 2011 Accelerating Science and Innovation Accelerating Science and Innovation.
Ian Bird GDB; CERN, 8 th May 2013 March 6, 2013
EGEE-II INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks IPv6 test methodology Mathieu Goutelle (CNRS.
From the Transatlantic Networking Workshop to the DAM Jamboree to the LHCOPN Meeting (Geneva-Amsterdam-Barcelona) David Foster CERN-IT.
Summary of Data Management Jamboree Ian Bird WLCG Workshop Imperial College 7 th July 2010.
Ian Bird GDB CERN, 9 th September Sept 2015
Fourth IABIN Council Meeting Support to Building the Inter-American Biodiversity Information Network.
Procedure to follow for proposed new Tier 1 sites Ian Bird CERN, 27 th March 2012.
Evolution of storage and data management Ian Bird GDB: 12 th May 2010.
Future computing strategy Some considerations Ian Bird WLCG Overview Board CERN, 28 th September 2012.
Collaboration to Clarify the Costs of Curation CERN Costs Workshop Activities and Approaches to Cost Modelling in the 4C Project 13 – 14 January 2014 Germán.
Data Placement Intro Dirk Duellmann WLCG TEG Workshop Amsterdam 24. Jan 2012.
Ian Bird WLCG Workshop, Barcelona, 9 th July July 2014 Ian Bird; Barcelona1.
Evolving Security in WLCG Ian Collier, STFC Rutherford Appleton Laboratory Group info (if required) 1 st February 2016, WLCG Workshop Lisbon.
Data management demonstrators Ian Bird; WLCG MB 18 th January 2011.
Ian Bird CERN, 17 th July 2013 July 17, 2013
U.S. ATLAS Facility Planning U.S. ATLAS Tier-2 & Tier-3 Meeting at SLAC 30 November 2007.
Ian Bird WLCG Networking workshop CERN, 10 th February February 2014
Ian Bird Overview Board; CERN, 8 th March 2013 March 6, 2013
LHC Computing – the 3 rd Decade Jamie Shiers LHC OPN meeting October 2010.
44222: Information Systems Development
CMS: T1 Disk/Tape separation Nicolò Magini, CERN IT/SDC Oliver Gutsche, FNAL November 11 th 2013.
LHCC Referees Meeting – 28 June LCG-2 Data Management Planning Ian Bird LHCC Referees Meeting 28 th June 2004.
WLCG Status Report Ian Bird Austrian Tier 2 Workshop 22 nd June, 2010.
Evolution of WLCG infrastructure Ian Bird, CERN Overview Board CERN, 30 th September 2011 Accelerating Science and Innovation Accelerating Science and.
Outcome should be a documented strategy Not everything needs to go back to square one! – Some things work! – Some work has already been (is being) done.
Computing Fabrics & Networking Technologies Summary Talk Tony Cass usual disclaimers apply! October 2 nd 2010.
LHCbComputing Update of LHC experiments Computing & Software Models Selection of slides from last week’s GDB
Ian Bird LCG Project Leader Summary of EGI workshop.
EGI-InSPIRE RI EGI-InSPIRE EGI-InSPIRE RI John Gordon EGI Virtualisation and Cloud Workshop Amsterdam 12 th May 2011.
School on Grid & Cloud Computing International Collaboration for Data Preservation and Long Term Analysis in High Energy Physics.
Hall D Computing Facilities Ian Bird 16 March 2001.
HEPiX Virtualisation working group Andrea Chierici INFN-CNAF Workshop CCR 2010.
WLCG Jamboree – Summary of Day 1 Simone Campana, Maria Girone, Maarten Litmaath, Gavin McCance, Andrea Sciaba, Dan van der Ster.
Ian Bird, CERN WLCG Project Leader Amsterdam, 24 th January 2012.
Implications of IoT for Emerging Economies – Strategic Significance, Pitfalls, Challenges and Opportunities Rajeev Tatkar.
Status of Task Forces Ian Bird GDB 8 May 2003.
DAaM summary Ian Bird MB 29th June 2010.
Ian Bird WLCG Workshop San Francisco, 8th October 2016
Computing models, facilities, distributed computing
Working Group 4 Facilities and Technologies
Methodology: Aspects: cost models, modelling of system, understanding of behaviour & performance, technology evolution, prototyping  Develop prototypes.
Scientific Computing Strategy (for HEP)
It’s not all about the tool!
for the Offline and Computing groups
WLCG: TDR for HL-LHC Ian Bird LHCC Referees’ meting CERN, 9th May 2017.
Taming the protocol zoo
Final summary Ian Bird Amsterdam, DAaM 18th June 2010.
Welcome slide.
EGI UMD Storage Software Repository (Mostly former EMI Software)
R&D for HL-LHC from the CWP
LCG Operations Centres
Strategy document for LHCC – Run 4
WLCG Collaboration Workshop;
Connecting the European Grid Infrastructure to Research Communities
Input on Sustainability
System Review – The Forgotten Implementation Step
Investing in Source Water Protection
Archive Management System Project Update
Expanding the PHENIX Reconstruction Universe
Presentation transcript:

Evolution of storage and data management Ian Bird Jamboree, Amsterdam: 16-18th June 2010

Discussion on evolving WLCG Storage and Data Management Background: Experiments’ concern with performance and scalability of access to data Particularly for analysis Concern over long term sustainability of existing solutions Recognize: Evolution of technology (networking, file systems, etc) Huge amounts of disk available to experiments (on aggregate!) Other communities exist that: have solved similar problems; will have similar problems soon Short term concerns: Performance, scalability, bottlenecks (e.g. SRM) The MONARC model assumed networking was a problem It is actually something we should invest more in Ian.Bird@cern.ch

Scope Focus on analysis use cases Production is in hand ... Or maybe not; should benefit from developments Start from viewpoint of user access to data Hide details of back-end storage systems and etc. Timescale for solutions in production – 2013 run But with incremental working prototypes that address some of the short term concerns Should not be seen as a reason for new development projects Must use available tools as far as possible – must keep long term sustainability in mind Also care with choosing “working” tools that need huge integration investment Working model: More network-centric than now Can have fewer large archival repositories – long term data curation “Cloud” of storage making optimal use of available capacity Should be a single effort of the community Common low-level basic tools – complexity at the experiment level? There may be funding opportunities – but we should make sure that they help us in our goals and obtaining what we need Must be driven by the real needs of the experiments ... But we must learn the lesson of SRM – too many unrealistic “requirements” and no consensus on implementation Ian.Bird@cern.ch

Technical areas Data archives and storage cloud Data access layer Simplify so that “tape” is really an archive Allow remote data access when needed Look into peer-peer technologies Data access layer E.g. Xrootd/GFAL + some intelligence to determine when to cache/when to use remote access Output datasets Still need a service for (asynchronous) movement of datasets to an archive Global home directory facility Model is a global file system. Industrial solutions available? “Catalogues” Still need to locate data. Issue of consistency between different storage systems. Authorization mechanisms For access to files in storage systems (archive + cloud), and quotas etc. Ian.Bird@cern.ch

Scope ... HEP (actually WLCG) should agree what it needs and what it can do... Only then should we look for commonalities with other communities Only then should we seek other funding opportunities But ... We should recognise that other communities do have (or soon will have) significant data management challenges And ... We should not need to have HEP-specific solutions (except where we need them...) Ian.Bird@cern.ch

Scope ... again Discussions here will have implications for job management (and other middleware) Ongoing work on virtualisation, etc (workshop next week) Pilot jobs (and pilot factories), VMs and cloud management software mean things can be done differently now: Must concern ourselves with long term supportability of middleware and etc. We will not digress into job management discussions here, but note issues and connections Organise a workshop on job management later to address these issues for the future? Ian.Bird@cern.ch

Jamboree Agenda Today Background documents – input to supplement: Explain strawman Outline thoughts in each major area Background documents – input to supplement: Tomorrow Jamboree – existing technologies, potential solutions Discussion!!! Identify areas for work; identify demonstrator prototypes Friday Summarise – agree work plans and timescales E.g. Prototypes 6 months; decision 1 year; prototype prod 2 years; full deployment 3 years Write a short summary document – for use in discussions with ... Agree demonstrators and prototypes Ian.Bird@cern.ch

Next steps Elaboration of a more concrete plan and timelines; Friday ... set up working groups; propose prototypes Report back in July at WLCG workshop Develop demonstrator prototypes in each area – testing of components/technologies Experiment testing – integration into frameworks Ian.Bird@cern.ch