Data management in grid. Comparative analysis of storage systems in WLCG.

Presentation transcript:

Really Two Data Problems
The amount of data
- High-performance tools are needed to manage the huge raw volume of data: store it, move it
- Measured in terabytes, petabytes, and ???
The number of data files
- High-performance tools are needed to manage the huge number of filenames
- A very large number of filenames is expected soon
- A collection that large of anything is a lot to handle efficiently

Data Questions on the Grid
Questions you want Grid tools to address:
- Where are the files I want?
- How do I move data/files to where I want them?

Data intensive applications
Medical and biomedical:
- Image processing (digital X-ray image analysis)
- Simulation for radiation therapy
Climate studies
Physics:
- High energy and other accelerator physics
- Theoretical physics, lattice calculations of all sorts
Material sciences

LHC as a data source
- 500 MB/sec
- 15 PB/year
- 15 years
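The three figures on this slide can be cross-checked with a little arithmetic. The sketch below assumes decimal units (1 MB = 10^6 bytes, 1 PB = 10^15 bytes) and, as a simplification that is not stated on the slide, continuous data taking for a 365-day year. The function names are illustrative only.

```python
# Back-of-the-envelope check of the LHC data-rate figures.
# Assumptions (not from the slide): decimal units, continuous running all year.

SECONDS_PER_YEAR = 365 * 24 * 3600


def petabytes_per_year(rate_mb_per_s):
    """Convert a sustained rate in MB/s into PB accumulated over one year."""
    bytes_per_year = rate_mb_per_s * 1e6 * SECONDS_PER_YEAR
    return bytes_per_year / 1e15


def total_petabytes(pb_per_year, years):
    """Total volume after a number of years at a constant yearly volume."""
    return pb_per_year * years


if __name__ == "__main__":
    print(f"500 MB/s sustained: {petabytes_per_year(500):.1f} PB/year")
    print(f"15 PB/year for 15 years: {total_petabytes(15, 15)} PB")
```

Under these assumptions, 500 MB/sec comes out close to the 15 PB/year quoted on the slide, and 15 years at that rate gives a total in the low hundreds of petabytes.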

A Model Architecture for Data Grids
[Diagram. Labels: Application/Data Management System; Metadata Catalog; Attribute Specification; Logical Collection and Logical File Name; Replica Catalog; Multiple Locations; Replica Selection; Selected Replica; Performance Information and Predictions; MDS; SRM commands; Replica Location 1; Replica Location 2; Replica Location 3; Tape Library; Disk Array; Disk Cache]

SRM: Main concepts
- Space reservations
- Dynamic space management
- Pinning files in spaces
- Support for the abstract concept of a file name: Site URL
- Temporary assignment of file names for transfer: Transfer URL
- Directory management and authorization
- Transfer protocol negotiation
- Support for peer-to-peer requests
- Support for asynchronous multi-file requests
- Support for abort, suspend, and resume operations
- Non-interference with local policies
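To make the first three concepts more concrete (space reservation, dynamic space management, pinning), here is a toy in-memory model. It is not the SRM v2.2 interface: the class `ToySpace`, its methods, and the eviction rule are all hypothetical and are chosen only to illustrate the idea that a pinned file is protected for a lifetime, after which its space may be reclaimed.

```python
class ToySpace:
    """Toy model of a reserved space holding pinned files (hypothetical, not an SRM API)."""

    def __init__(self, capacity_bytes):
        self.capacity = capacity_bytes   # size of the reservation
        self.files = {}                  # name -> (size, pin expiry time)

    def used(self):
        return sum(size for size, _ in self.files.values())

    def release_expired(self, now):
        """Remove files whose pin lifetime has run out; return their names."""
        expired = [name for name, (_, expiry) in self.files.items() if expiry <= now]
        for name in expired:
            del self.files[name]
        return expired

    def put(self, name, size, pin_seconds, now):
        """Store a file and pin it; reclaim expired pins only when space is short."""
        if self.used() + size > self.capacity:
            self.release_expired(now)
        if self.used() + size > self.capacity:
            raise RuntimeError("not enough space in reservation")
        self.files[name] = (size, now + pin_seconds)


if __name__ == "__main__":
    space = ToySpace(capacity_bytes=100)
    space.put("a", 60, pin_seconds=10, now=0)
    space.put("b", 60, pin_seconds=10, now=20)  # "a" is unpinned by now and is evicted
    print(sorted(space.files))
```

Time is passed in explicitly as `now` so that the behaviour is deterministic. A real storage manager would also apply the site's own policies, which the last bullet above says SRM must not interfere with.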

Storage properties
- Access Latency (ONLINE, NEARLINE, OFFLINE)
- Retention Policy (REPLICA, OUTPUT, CUSTODIAL)

Use cases
- Access Latency (ONLINE, NEARLINE, OFFLINE)
- Retention Policy (REPLICA, OUTPUT, CUSTODIAL)
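The two properties combine into a small number of storage classes. The sketch below models them as enumerations. The shorthand names T1D0, T1D1, and T0D1 do not appear on the slides; they follow the commonly used WLCG convention (tape copies / disk copies) and should be read as an assumption here, as should the helper `needs_staging`.

```python
from enum import Enum


class AccessLatency(Enum):
    ONLINE = "ONLINE"
    NEARLINE = "NEARLINE"
    OFFLINE = "OFFLINE"


class RetentionPolicy(Enum):
    REPLICA = "REPLICA"
    OUTPUT = "OUTPUT"
    CUSTODIAL = "CUSTODIAL"


# Assumed shorthand (not from the slides): TxDy = x tape copies, y disk copies.
STORAGE_CLASSES = {
    "T1D0": (RetentionPolicy.CUSTODIAL, AccessLatency.NEARLINE),
    "T1D1": (RetentionPolicy.CUSTODIAL, AccessLatency.ONLINE),
    "T0D1": (RetentionPolicy.REPLICA, AccessLatency.ONLINE),
}


def needs_staging(latency):
    """A file that is not ONLINE must be brought to disk before it can be read."""
    return latency is not AccessLatency.ONLINE


if __name__ == "__main__":
    for name, (retention, latency) in STORAGE_CLASSES.items():
        print(name, retention.value, latency.value, "staging:", needs_staging(latency))
```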

Logical File Name (LFN)
Also called a User Alias. When the LCG File Catalog is used, LFNs are organized in a hierarchical, directory-like structure and have the following format: lfn:/grid/ / /
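A file catalog answers the question "where are the files I want?" by mapping a logical name to the physical replicas. The sketch below uses a plain dictionary as a stand-in for a catalog. The LFN, the host names under example.org, and the function names are all hypothetical and are not taken from the slides or from any catalog client API.

```python
# Stand-in for a file catalog: one logical file name maps to several replicas.
# Every name and URL here is made up for illustration.
REPLICA_CATALOG = {
    "lfn:/grid/examplevo/run1/events.root": [
        "srm://se1.example.org:8444/examplevo/run1/events.root",
        "srm://se2.example.org:8444/examplevo/run1/events.root",
    ],
}


def list_replicas(lfn):
    """Return all known replica locations (SURLs) for a logical file name."""
    if not lfn.startswith("lfn:/"):
        raise ValueError(f"not a logical file name: {lfn!r}")
    return list(REPLICA_CATALOG.get(lfn, []))


def pick_replica(lfn, preferred_host):
    """Prefer a replica on the given host; otherwise fall back to the first one."""
    replicas = list_replicas(lfn)
    if not replicas:
        raise LookupError(f"no replicas registered for {lfn}")
    for surl in replicas:
        if f"//{preferred_host}:" in surl:
            return surl
    return replicas[0]


if __name__ == "__main__":
    print(pick_replica("lfn:/grid/examplevo/run1/events.root", "se2.example.org"))
```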

Site URL and Transfer URL
Provide: Site URL (SURL)
- URL known externally, e.g. in Replica Catalogs
- e.g. srm://ibm.cnaf.infn.it:8444/dteam/test
Get back: Transfer URL (TURL)
- Path can be different from SURL (SRM internal mapping)
- Protocol chosen by SRM based on request protocol preference
- e.g. gsiftp://ibm139.cnaf.infn.it:2811//gpfs/sto1/dteam/test
One SURL can have many TURLs
- Files can be replicated in multiple storage components
- Files may be in near-line and/or on-line storage
- In a light-weight SRM (a single file system on disk), the SURL may be the same as the TURL except for the protocol
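The SURL to TURL step can be sketched as a lookup plus protocol negotiation. The example below reuses the two URLs from the slide. The table `SUPPORTED_PROTOCOLS`, the function `surl_to_turl`, and the idea that the mapping is a simple path prefix are assumptions made for illustration; a real SRM keeps this mapping internal.

```python
from urllib.parse import urlsplit

# Hypothetical server-side table: protocol -> (transfer host, port, storage root).
# The gsiftp entry is built from the example URLs on the slide.
SUPPORTED_PROTOCOLS = {
    "gsiftp": ("ibm139.cnaf.infn.it", 2811, "/gpfs/sto1"),
}


def surl_to_turl(surl, preferred_protocols):
    """Pick the first client-preferred protocol the server supports and build a TURL."""
    parts = urlsplit(surl)
    if parts.scheme != "srm":
        raise ValueError(f"not a site URL: {surl!r}")
    for protocol in preferred_protocols:
        if protocol in SUPPORTED_PROTOCOLS:
            host, port, root = SUPPORTED_PROTOCOLS[protocol]
            # The physical path differs from the SURL path (internal mapping).
            return f"{protocol}://{host}:{port}/{root}{parts.path}"
    raise LookupError("no transfer protocol in common with the client")


if __name__ == "__main__":
    print(surl_to_turl("srm://ibm.cnaf.infn.it:8444/dteam/test", ["rfio", "gsiftp"]))
```

The client lists protocols in order of preference, and the server answers with a TURL for the first one it can serve, which is the negotiation described in the second group of bullets.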

July Lecture 4: Grid Data Management
Third party transfer
- Controller can be separate from source and destination
[Diagram. Labels: Client; Server; Site A; Site B; control channels; data channel]

Going fast: parallel streams
- Use several data channels
[Diagram. Labels: Server; Site A; Site B; control channel; data channels]
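The idea of several data channels can be illustrated without any network code. In the sketch below a file is cut into fixed-size blocks, the blocks are dealt out round-robin to a number of "streams", and each stream is simulated by a thread copying its blocks into the destination buffer. This is a local simulation under stated assumptions, not GridFTP itself; the function names and the round-robin assignment are illustrative choices.

```python
from concurrent.futures import ThreadPoolExecutor


def plan_blocks(total_size, block_size, n_streams):
    """Assign blocks to streams round-robin; one list of (offset, length) per stream."""
    plans = [[] for _ in range(n_streams)]
    for index, offset in enumerate(range(0, total_size, block_size)):
        length = min(block_size, total_size - offset)
        plans[index % n_streams].append((offset, length))
    return plans


def parallel_copy(source, n_streams=4, block_size=1024):
    """Copy bytes using several simulated data channels (threads)."""
    dest = bytearray(len(source))

    def run_stream(blocks):
        # Each stream writes only its own, non-overlapping byte ranges.
        for offset, length in blocks:
            dest[offset:offset + length] = source[offset:offset + length]

    with ThreadPoolExecutor(max_workers=n_streams) as pool:
        list(pool.map(run_stream, plan_blocks(len(source), block_size, n_streams)))
    return bytes(dest)


if __name__ == "__main__":
    data = bytes(range(256)) * 20
    print(parallel_copy(data, n_streams=3, block_size=100) == data)
```

Because every block carries its own offset, blocks may arrive in any order and from any stream, and the destination can still reassemble the file.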

Interoperability in SRM v2.2
[Diagram. Labels: Client; User/application; SRB (iRODS); CASTOR; DPM; Disk; BeStMan; xrootd; dCache; SDSC; SINICA; LBNL; EGEE; BNL; SLAC]

Total Online Space Share

Popularity

CASTOR Architecture
[Diagram. Components: RFIO Client; STAGER; NAME server; VDQM server; VOLUME manager; CUPV; MSGD; RFIOD (disk mover); DISK POOL; RTCPD (tape mover); TPDAEMON (PVR)]

Basic dCache Design

DPM
- Very important to back up!
- Store physical files
- Namespace, authorization, replicas
- DPM config, all requests (SRM, transfers…)
- Standard storage interface
- Can all be installed on a single machine

EOS: What is it...
Easy-to-use standalone disk-only storage for user and group data with an in-memory namespace
- Few ms read/write open latency
- Focused on end-user analysis with chaotic access
- Based on the XROOT server plugin architecture
- Adopts ideas implemented in Hadoop, XROOT, Lustre et al.
- Runs on low-cost hardware; no high-end storage
- At CERN: complementary to CASTOR

EOS: Access Protocol
EOS uses XROOT as its primary file access protocol
- The XROOT framework allows flexibility for enhancements
Protocol choice is not the key to performance as long as it implements the required operations
- Client caching matters most
Actively developed, towards full integration in ROOT (rewrite of the XRootD client at CERN)
SRM and GridFTP are provided as well
- BeStMan, GridFTP-to-XROOT gateway

Thank you
- Grid, Storage and SRM. OSG.
- Managed Data Storage and Data Access Services for Data Grids. M. Ernst, P. Fuhrmann, T. Mkrtchyan (DESY); J. Bakken, I. Fisk, T. Perelmutov, D. Petravick (Fermilab)
- dCache. Dmitry Litvintsev, Fermilab. OSG Storage Forum, September 21, 2010
- GridFTP: File Transfer Protocol in Grid Computing Networks. Caitlin Minteer
- Light weight Disk Pool Manager status and plans. Jean-Philippe Baud, IT-GD, CERN
- Storage and Data Management in EGEE. Graeme A Stewart, David Cameron, Greig A Cowan and Gavin McCance
- and many others