Data & Storage Management TEGs: Summary of Recommendations
Wahid Bhimji, Brian Bockelman, Daniele Bonacorsi, Dirk Duellmann
GDB, CERN, 18th April 2012

WLCG Technical Evolution Groups (see also John's talk)
"Reassess the implementation of the grid infrastructures that we use in the light of the experience with LHC data, and technology evolution…", achieving commonalities between experiments where possible.
Several groups; the most relevant here are:
– Data Management (chairs: Brian Bockelman, Dirk Duellmann)
– Storage Management (chairs: Wahid Bhimji, Daniele Bonacorsi)

Process
Phases: information gathering → synthesis / exploration / orientation → refinement.
– Information gathering: initial questionnaire; defined topics [TopicsDataStorageTEG]; soon the Data and Storage TEGs effectively merged. Questions to the experiments: experiment presentations and Twikis [ALICE; ATLAS; CMS; LHCb]; storage middleware presentations.
– Synthesis / exploration: a face-to-face session for each topic plus broader discussions.
– Refinement: developed the layer diagram (overarching picture) and recommendations under each topic; see the final and draft report and the "emerging" recommendations.
Timeline: Nov 2011 – Jan 2012 (face-to-face) – Feb 2012 (GDB) – Apr 2012.

Layer Diagram
Is it simply Data = Experiment and Storage = Site?? Certainly not the case now… A layer diagram to map out architecture and responsibilities.

Data Placement (diagram)

Placement with Federation (diagram)

Storage Element Components (diagram)

Examples of current SEs (diagram)

Examples of (possible) future SEs (diagram)

Recommendations and Observations

Placement & Federations
The current option for federation is only xrootd.
– There is activity in HTTP that should be supported (e.g. DPM); NFS 4.1 is possible but not near happening for this.
– R1: HTTP plugin to xrootd.
Activity in ALICE, CMS and ATLAS; all anticipate < 10% of traffic this way.
– R2: Monitoring of federation network bandwidth; breakdown of what features experiments expect.
– R3: Topical working groups on open questions (see next slide).
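
To make the fall-back behaviour concrete, here is a minimal sketch of the "try the placed copy, fall back to the federation" read pattern, assuming HTTP access as in the DPM activity above. The endpoint URLs, proxy path and CA directory are illustrative assumptions, not real services.

```python
# Sketch only: prefer the locally placed replica, fall back to the federation.
# Endpoint URLs and the X.509 proxy path are hypothetical; real deployments
# would use the site's HTTP/WebDAV (e.g. DPM) and federation endpoints.
import requests

LOCAL_URL = "https://se.example-site.org/dpm/vo/data/file.root"      # hypothetical
FEDERATION_URL = "https://federation.example.org/vo/data/file.root"  # hypothetical
PROXY = "/tmp/x509up_u1000"                    # assumed user proxy location
CA_PATH = "/etc/grid-security/certificates"    # assumed grid CA directory

def open_replica(url):
    """Issue a ranged GET to check that the replica is readable."""
    resp = requests.get(url, cert=PROXY, verify=CA_PATH,
                        headers={"Range": "bytes=0-1023"},
                        allow_redirects=True, timeout=30)
    resp.raise_for_status()
    return resp.content

try:
    data = open_replica(LOCAL_URL)          # prefer the locally placed copy
except requests.RequestException:
    data = open_replica(FEDERATION_URL)     # fall back to the federation redirector

print("read %d bytes" % len(data))
```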

Topical Working Groups
Launch working groups to follow up a list of technical topics in the context of the GDB:
– Detailing the process of publishing new data into the read-only placement layer.
– Investigating a stricter separation of read-only and read-write data for increased scaling, stricter consistency guarantees and possibly the definition of pure read-only cache implementations with reduced service levels (i.e. relevant for higher-tier sites).
– Feasibility of moving a significant fraction of the current (read-only) data to world-readable access, avoiding the protocol overhead of fully authenticated protocols (assuming auditing to protect against denial-of-service attacks).
– Investigating federation as a repair mechanism for placed data; questions to answer would be: Who initiates repair? Which inter-site trust relationship needs to be in place? How proactive is this repair (e.g. regular site checksum scans, or repair & redirect after a checksum mismatch)? How is the space accounting done? How do we address the repair of missing metadata?
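
As an illustration of the last open question (checksum scans with repair via the federation), a minimal sketch follows; the catalogue lookup and the federation re-fetch are hypothetical placeholders, and adler32 is assumed as the file checksum.

```python
# Sketch of a periodic site checksum scan with "repair from federation" on
# mismatch. catalogue_checksum() and refetch_from_federation() are
# hypothetical placeholders for the experiment catalogue lookup and the
# federation copy tool.
import os
import zlib

def adler32(path, blocksize=1 << 20):
    """Compute the adler32 checksum of a local file."""
    value = 1
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(blocksize), b""):
            value = zlib.adler32(block, value)
    return "%08x" % (value & 0xFFFFFFFF)

def scan_and_repair(namespace_root, catalogue_checksum, refetch_from_federation):
    for dirpath, _dirs, files in os.walk(namespace_root):
        for name in files:
            path = os.path.join(dirpath, name)
            expected = catalogue_checksum(path)   # hypothetical catalogue query
            if expected is None:
                continue                          # missing metadata: a separate repair question
            if adler32(path) != expected:
                # Corrupt or truncated replica: pull a fresh copy via the federation.
                refetch_from_federation(path)     # hypothetical repair action
```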

Point-to-point Protocols
GridFTP is ubiquitous and must be supported in the medium term.
– R4: GridFTP: use recent versions; exploit session reuse.
xrootd is the currently used alternative.
– R5: Ensure xrootd is well supported on all systems.
HTTP is again a serious option (DPM and dCache tests).
– R6: HTTP: continue tests; explore at scale.
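
For R6, a hedged sketch of the kind of HTTP/WebDAV third-party copy being exercised in the DPM and dCache tests; the SE URLs and proxy path are assumptions, and the exact headers and delegation mechanism differ between implementations, so this illustrates the verb rather than giving a recipe.

```python
# Minimal sketch of an HTTP/WebDAV third-party copy request between two SEs.
import requests

SOURCE = "https://se-a.example.org/vo/data/file.root"   # hypothetical source SE
DEST = "https://se-b.example.org/vo/data/file.root"     # hypothetical destination SE
PROXY = "/tmp/x509up_u1000"                             # assumed proxy location
CA_PATH = "/etc/grid-security/certificates"             # assumed grid CA directory

resp = requests.request(
    "COPY",                        # WebDAV COPY verb: server-to-server transfer
    SOURCE,
    headers={"Destination": DEST},
    cert=PROXY,
    verify=CA_PATH,
    timeout=300,
)
print(resp.status_code, resp.reason)
```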

Managed Transfer (FTS)
FTS is the only tool and is used for more than transfer, though experiments will go their own way if need be.
– R7: Update the FTS3 workplan to include use of replicas, HTTP transfers and staging from archive (a submission sketch follows below).
– R8: Cross-experiment test of FTS3 features.
Management of Catalogues and Namespaces
– R9: LFC not needed (by LHC) in the medium term: it could be repurposed, and useful tools (e.g. for consistency checking) should work with other catalogues.
– Also: storage system quotas are not needed (handled by the experiments).
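
A sketch of what an experiment-side FTS3 submission could look like, based on the later FTS3 REST "easy" Python bindings; the endpoint, SURLs and option names (e.g. bring_online for staging from archive, cf. R7) are assumptions and may differ from the deployed API version.

```python
# Sketch of a transfer submission through FTS3's REST interface.
# Treat names and options as illustrative; consult the FTS3 documentation
# for the exact API of the deployed version.
import fts3.rest.client.easy as fts3

ENDPOINT = "https://fts3.example.org:8446"              # hypothetical FTS3 server
SOURCE = "gsiftp://se-a.example.org/vo/data/file.root"  # hypothetical source replica
DEST = "gsiftp://se-b.example.org/vo/data/file.root"    # hypothetical destination

context = fts3.Context(ENDPOINT)
transfer = fts3.new_transfer(SOURCE, DEST)
# bring_online asks FTS to stage the source from archive first (cf. R7);
# the parameter name follows later fts3 bindings and is an assumption here.
job = fts3.new_job([transfer], bring_online=28800, retry=3)
job_id = fts3.submit(context, job)
print("submitted job", job_id)
```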

Separation of archives and disk pools/caches
All experiments will split archive (tape) and cache (disk pools): ATLAS, LHCb and ALICE do already; CMS plans to this year.
– R10: "HSM" still to be supported to manage the disk buffer.
A large separate disk pool managed through transfer offers advantages:
– Performance: lots of spindles.
– Practicality: need not be at the same site.
– R11: FTS should support staging (see R7); experiment workflows should support this transfer model.
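
A workflow sketch for this transfer model: stage a file out of the archive, wait until it is online, then place it on the (possibly remote) disk pool via a managed transfer. request_stage(), is_online() and submit_transfer() are hypothetical placeholders for the archive and FTS tooling.

```python
# Stage-then-place workflow sketch for the split archive/disk model.
import time

def stage_then_place(archive_surl, disk_pool_surl,
                     request_stage, is_online, submit_transfer,
                     poll_interval=60, timeout=24 * 3600):
    """Bring a file online from archive, then copy it to the disk pool."""
    token = request_stage(archive_surl)          # hypothetical bringOnline request
    deadline = time.time() + timeout
    while not is_online(archive_surl, token):    # poll the archive disk buffer
        if time.time() > deadline:
            raise RuntimeError("staging timed out for %s" % archive_surl)
        time.sleep(poll_interval)
    # The disk pool need not be at the archive site: placement is a transfer.
    return submit_transfer(archive_surl, disk_pool_surl)
```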

Storage Interfaces: SRM and Clouds
SRM is ubiquitous and needed in the short term, buried in experiment frameworks; there are practical advantages from a common layer. BUT: not all functions are needed or implemented; there are performance concerns; industry is not using it (and is developing alternatives); experiment frameworks are adapting for alternatives.

SRM: looked at each functional component and which are used (see the big table in the report for details).
Functional group / usage observation:
– Storage Capacity Management: for space management, only space querying is used (LHCb, ATLAS), not dynamic reservation, moving between spaces etc.
– File Locality Management: for service classes, in the medium term spacetokens could be replaced by namespace endpoints (no orthogonality required); for archives, bringOnline (and pinning) is needed and has no replacement.
– Transfer protocol negotiation: the data access interface (get tURL from SURL) is needed by LHCb, but alternatives exist, e.g. algorithmic or rule-based lookup (a sketch follows below); load balancing and backpressure are needed, but alternatives exist (and backpressure is not implemented in SRM).
– Transfer and Namespace: FTS and lcg-utils at least should support alternatives.
Also looked at alternatives: some used by WLCG currently (GridFTP, xrootd), some in industry (S3, WebDAV, CDMI); mapped to functions (see the big table in the report for details).
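
To illustrate the "algorithmic or rule-based lookup" alternative to the SRM tURL call, a minimal sketch with made-up site rules:

```python
# Rule-based SURL -> tURL translation by prefix substitution.
# The rules below are illustrative only; real sites define their own mapping.
SITE_RULES = {
    "srm://se.example-site.org/dpm/example-site.org/home":
        "root://xrootd.example-site.org//dpm/example-site.org/home",
}

def surl_to_turl(surl, rules=SITE_RULES):
    """Translate a storage URL to a transfer URL without an SRM round trip."""
    for srm_prefix, access_prefix in rules.items():
        if surl.startswith(srm_prefix):
            return access_prefix + surl[len(srm_prefix):]
    raise KeyError("no rule for %s" % surl)

# Example (hypothetical paths):
# surl_to_turl("srm://se.example-site.org/dpm/example-site.org/home/vo/file")
#   -> "root://xrootd.example-site.org//dpm/example-site.org/home/vo/file"
```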

Storage Interfaces: Recommendations
– R12: Archive sites: maintain SRM, as there is no replacement.
For non-archive sites there is no alternative yet for everything, but experiments are already looking at integrating alternatives.
– R13: A working group should evaluate suitability, targeting the subset of used functions identified in the report, ensuring alternatives are scalable and supportable and are supported by FTS and lcg_utils for interoperability.
– R14: The working group should monitor and evaluate emerging developments in storage interfaces (e.g. clouds) so experiments work together on long-term solutions.

Storage Performance (experiment I/O usage, LAN protocols, evolution of storage)
– R15: Benchmarking and I/O requirement gathering: develop benchmarks; experiments forecast their IOPS and bandwidth needs; storage supports measurement of these (a sketch follows below).
– R16: Protocol support and evolution: experiments can use anything ROOT supports, but move towards fewer protocols and direct access; ROOT, HTTP direct access and NFS 4.1 should be developed.
– R17: I/O error management and resilience: explicitly determine storage error types and ensure application handling.
– R18: Storage technology review: incorporating vendors; spreading information between sites.
– R19: High-throughput computing research: not restricted to current data formats (ROOT); Hadoop-style processing or the NextBigThing™.
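
For R15, a sketch of the two simplest measurements such a benchmark would gather: sequential bandwidth and small random-read IOPS against a single file. The test file path and block sizes are arbitrary choices for illustration.

```python
# Minimal I/O measurement sketch: sequential bandwidth and random-read IOPS.
import os
import random
import time

def sequential_bandwidth(path, blocksize=4 << 20):
    """Read the whole file sequentially and return MB/s."""
    size = os.path.getsize(path)
    start = time.time()
    with open(path, "rb") as f:
        while f.read(blocksize):
            pass
    return size / max(time.time() - start, 1e-6) / 1e6

def random_read_iops(path, reads=1000, blocksize=64 << 10):
    """Issue small reads at random offsets and return operations per second."""
    size = os.path.getsize(path)
    start = time.time()
    with open(path, "rb") as f:
        for _ in range(reads):
            f.seek(random.randrange(0, max(1, size - blocksize)))
            f.read(blocksize)
    return reads / max(time.time() - start, 1e-6)

# print(sequential_bandwidth("/storage/testfile"), random_read_iops("/storage/testfile"))
```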

Storage Operations: site-run services (monitoring, accounting etc.)
– R20: Site involvement in protocol and requirement evolution, i.e. site representatives on the storage interface working group to ensure proposals are manageable by them.
– R21: Expectations on data availability and handling of data losses: experiments should state data loss expectations (in the MoU) and reduce dependence on "cache" data; common site policies for data handling (examples in the report).
– R22: Improved activity monitoring: both popularity and access patterns (a sketch follows below).
– R23: Storage accounting: support the StAR accounting record.
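
For R22, a sketch of aggregating an access log into file-popularity and access-pattern summaries. The log format assumed here (whitespace-separated timestamp, client, path, bytes) is an illustration only; real storage systems each have their own log or monitoring feed.

```python
# Aggregate an access log into popularity and per-client access counts.
from collections import Counter

def summarise_access_log(lines):
    popularity = Counter()     # accesses per file
    clients = Counter()        # accesses per client host
    bytes_read = Counter()     # volume read per file
    for line in lines:
        fields = line.split()
        if len(fields) < 4:
            continue                          # skip malformed lines
        _ts, client, path, nbytes = fields[:4]
        popularity[path] += 1
        clients[client] += 1
        bytes_read[path] += int(nbytes)
    return popularity.most_common(10), clients.most_common(10), bytes_read

# with open("/var/log/storage/access.log") as f:   # hypothetical log location
#     top_files, top_clients, volume = summarise_access_log(f)
```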

Security
There is a separate document with the Security TEG. Areas that need attention in the near term:
– R25: Removal of "backdoors" from CASTOR.
– R26: Checks of the actual permissions implemented by Storage Elements (a sketch follows below).
– R27: Tackling the issues with data ownership listed in the document (e.g. ex-VO members; files owned by the VO rather than an individual).
POOL Persistency
Recently LHCb moved away, so POOL is now ATLAS-specific software; ATLAS also plans a move, so:
– R24: POOL development is not required in the medium term.
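
For R26, a sketch of a permission probe over HTTP: a path that should require authentication is fetched anonymously and with a proxy, and the check passes only if anonymous access is refused. The endpoint, proxy path and CA directory are assumptions; a real check would iterate over a sample of namespace paths and their expected ACLs.

```python
# Probe whether a Storage Element really enforces authentication on a path.
import requests

CA_PATH = "/etc/grid-security/certificates"   # assumed grid CA directory

def anonymous_refused(url):
    """True if an unauthenticated request is rejected."""
    try:
        resp = requests.get(url, verify=CA_PATH, timeout=30)
    except requests.exceptions.SSLError:
        return True                            # TLS-level client-certificate requirement
    return resp.status_code in (401, 403)

def check_requires_auth(url, proxy="/tmp/x509up_u1000"):
    """True if anonymous access is refused but authenticated access works."""
    authed = requests.get(url, cert=proxy, verify=CA_PATH, timeout=30)
    return anonymous_refused(url) and authed.ok

# print(check_requires_auth("https://se.example-site.org/dpm/vo/private/file.root"))
```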