EGEE-II INFSO-RI-031688 Enabling Grids for E-sciencE www.eu-egee.org Data Grid Services/SRB/SRM & Practical Hai-Ning Wu Academia Sinica Grid Computing.


Data Grid Services/SRB/SRM & Practical
Hai-Ning Wu, Academia Sinica Grid Computing

Outline
Introduction
Characteristics of data grid
Storage Resource Management (SRM)
Storage Resource Broker (SRB)
–SRB Practical
Summary

Data Storage at Large Scale
Historically, data has been STORED rather than MANAGED
The amount of data grows so rapidly that traditional storage architectures are no longer suitable
Data are distributed across multiple types of sources, which makes them hard to integrate and raises the barriers between users and storage systems

Challenges of Data Storage
Large scale of data
Distributed storage over the network
Heterogeneous data resources
Managing data efficiently and safely
Long-term preservation

The Solution: Data Grid
Data virtualization
–Manipulates data at a high level
–Hides low-level details
Provides a uniform interface to access distributed data storage systems

Virtualization
Data virtualization
Trust virtualization
Data grids are used to manage shared collections that are distributed across multiple sites and multiple storage systems

Data Grid - The Idea
(Diagram: client users send a request for data to the data grid system, which returns the data found.)
Details are hidden. The data grid system finds out where the data are located.

The Goals of Data Grid
Automating the preservation process
Mitigating the risk of data loss
Supporting retrieval and access
A uniform interface to manage digital entities stored in any type of storage system
Providing access through a wide variety of access mechanisms
Supporting technology evolution

Data Grid Transparencies
Find data without knowing the identifier
–Descriptive attributes
Access data without knowing the location
–Logical name space
Access data without knowing the type of storage
–Storage repository abstraction
Retrieve data using your preferred API
–Access abstraction
Provide transformations for any data collection
–Data behavior abstraction
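The storage repository abstraction can be illustrated with a small sketch. It is not taken from any data grid product: the class names (StorageDriver, MemoryDriver, DirectoryDriver) and the replicate helper are hypothetical, chosen only to show how one interface can hide two different kinds of storage.

```python
import tempfile
from abc import ABC, abstractmethod
from pathlib import Path


class StorageDriver(ABC):
    """Hypothetical common interface; callers never see the storage type."""

    @abstractmethod
    def put(self, name: str, data: bytes) -> None: ...

    @abstractmethod
    def get(self, name: str) -> bytes: ...


class MemoryDriver(StorageDriver):
    """Keeps objects in a dictionary (stands in for, say, an archive)."""

    def __init__(self):
        self._blobs = {}

    def put(self, name, data):
        self._blobs[name] = data

    def get(self, name):
        return self._blobs[name]


class DirectoryDriver(StorageDriver):
    """Keeps objects as files in one directory (flat names only)."""

    def __init__(self, root):
        self._root = Path(root)

    def put(self, name, data):
        (self._root / name).write_bytes(data)

    def get(self, name):
        return (self._root / name).read_bytes()


def replicate(name, source, target):
    """Copy an object between any two drivers using only the shared interface."""
    target.put(name, source.get(name))


workdir = tempfile.mkdtemp()
memory, disk = MemoryDriver(), DirectoryDriver(workdir)
memory.put("sample.dat", b"hello grid")
replicate("sample.dat", memory, disk)
print(disk.get("sample.dat"))
```

The replicate function never checks which driver it was given, which is the point of the abstraction: a new storage type only needs a new driver class.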

Data Grid Components
Federated client-server architecture
–Servers can talk to each other independently of the client
Infrastructure-independent naming
–Logical names for users, resources, files, applications
Collective ownership of data
–Collection-owned data, with infrastructure-independent access control lists
Context management
–Record state information in a metadata catalog from data grid services such as replication
Abstractions for dealing with heterogeneity

Data Grid Architecture
(Layered diagram; labels recovered from the slide)
Application
Access interfaces: C, C++, Java Libraries; Unix Shell; Linux I/O; DLL / Python, Perl; Java, NT Browsers; OAI, WSDL, OGSA; HTTP
Federation Management
Consistency & Metadata Management / Authorization-Authentication, Audit
Logical Name Space; Latency Management; Digital Component Transport; Metadata Transport
Standard Storage System Operations Interface; Standard Database Interface
Archives: Tape, HPSS, ADSM, UniTree, DMF, CASTOR, ADS
ORB
File Systems: Unix, NT, Mac OSX
Databases: DB2, Oracle, Sybase, SQLserver, Postgres, mySQL, Informix
Databases: DB2, Oracle, Sybase, Postgres, mySQL, Informix

SRM
The Data Grid Interface for EGEE Grid Middleware

gLite Services

SE
Storage Element
–The Storage Element is the service that allows a user or an application to store data
–Data Channel Protocols
▪ File Transfer and File I/O

SRM (Storage Resource Management)
What is SRM?
–SRM is a protocol for managing storage resources (it is NOT a file access protocol!)
–Provides a uniform interface to heterogeneous storage elements for computing applications and client users
–Does not transfer files itself
–Provides space management
–Manages the lifetime of files

SRM & Grid

Grid Files
–Files in the Grid can be referred to by different names:
▪ Logical File Name (LFN): An alias created by a user to refer to some item of data. For example, /grid/gilda/gridcamp/testFile.txt
▪ Grid Unique IDentifier (GUID): A non-human-readable unique identifier for an item of data. For example, 37afd0cc-c53b-4795-a873-6a9dde35a9cc
▪ Site URL (SURL): The location of an actual piece of data on a storage system. For example, srm://dpm01.grid.sinica.edu.tw/dpm/grid.sinica.edu.tw/home/twgrid/generated/ /file4c4a5a6f-878d-4ef3-a73d-941ae
▪ Transport URL (TURL): A temporary locator of a replica plus an access protocol, understood by an SE. For example, gsiftp://dpm01.grid.sinica.edu.tw/dpm01.grid.sinica.edu.tw:/path1/twgrid/ /file4c4a5a6f-878d-4ef3-a73d-941ae
–While the GUIDs and LFNs identify a file irrespective of its location, the SURLs and TURLs contain information about where a physical replica is located and how it can be accessed.
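The four kinds of name can be told apart by shape alone, as the following sketch suggests. It is a heuristic written for illustration, not part of any gLite tool: the classify function and its rules are assumptions (srm:// means SURL, any other URL scheme means TURL, an absolute path means LFN), and the SURL and TURL examples use made-up hosts.

```python
import re
from urllib.parse import urlsplit

# Shape of the GUID shown on the slide: 8-4-4-4-12 hexadecimal digits.
GUID_RE = re.compile(
    r"^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$",
    re.IGNORECASE,
)


def classify(name: str) -> str:
    """Guess which kind of grid file name this is (by shape only)."""
    if GUID_RE.match(name):
        return "GUID"
    scheme = urlsplit(name).scheme
    if scheme == "srm":
        return "SURL"
    if scheme:
        return "TURL"  # assumption: any other URL scheme is a transport protocol
    if name.startswith("/"):
        return "LFN"   # assumption: LFNs are absolute catalogue paths
    return "unknown"


def is_location_independent(name: str) -> bool:
    """LFNs and GUIDs identify a file irrespective of where replicas live."""
    return classify(name) in {"LFN", "GUID"}


examples = [
    "/grid/gilda/gridcamp/testFile.txt",                     # LFN from the slide
    "37afd0cc-c53b-4795-a873-6a9dde35a9cc",                  # GUID from the slide
    "srm://se.example.org/dpm/example.org/home/demo/file1",  # made-up SURL
    "gsiftp://se.example.org/storage/demo/file1",            # made-up TURL
]
for name in examples:
    print(classify(name), is_location_independent(name), name)
```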

LFC
File Catalogue (LFC)
–The mappings between LFNs, GUIDs and SURLs are kept in a File Catalogue service, while the files themselves are stored in Storage Elements.
–The only file catalogue officially supported in WLCG/EGEE is the LCG File Catalogue (LFC).
(Diagram: mapping by the "LFC" catalogue server)
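A toy model may help picture the mappings a file catalogue keeps. The sketch below is an in-memory stand-in, not the LFC client API: the class ToyFileCatalogue, its method names, and the example hosts are all hypothetical, and it assumes that several LFNs may alias one GUID and that one GUID may have several SURLs.

```python
import uuid


class ToyFileCatalogue:
    """In-memory stand-in for a file catalogue; not the LFC client API."""

    def __init__(self):
        self._guid_by_lfn = {}    # assumption: many LFNs may point at one GUID
        self._surls_by_guid = {}  # assumption: one GUID may have several replicas

    def register(self, lfn, surl):
        """Create a new entry: a fresh GUID, one LFN, one replica."""
        guid = str(uuid.uuid4())
        self._guid_by_lfn[lfn] = guid
        self._surls_by_guid[guid] = [surl]
        return guid

    def add_alias(self, existing_lfn, new_lfn):
        """Give the same file a second logical name."""
        self._guid_by_lfn[new_lfn] = self._guid_by_lfn[existing_lfn]

    def add_replica(self, lfn, surl):
        """Record another physical copy of the file."""
        self._surls_by_guid[self._guid_by_lfn[lfn]].append(surl)

    def guid_of(self, lfn):
        return self._guid_by_lfn[lfn]

    def replicas_of(self, lfn):
        return list(self._surls_by_guid[self._guid_by_lfn[lfn]])


lfc = ToyFileCatalogue()
guid = lfc.register("/grid/demo/run1.dat", "srm://se-a.example.org/demo/run1.dat")
lfc.add_alias("/grid/demo/run1.dat", "/grid/demo/latest.dat")
lfc.add_replica("/grid/demo/latest.dat", "srm://se-b.example.org/demo/run1.dat")
print(lfc.replicas_of("/grid/demo/run1.dat"))
```

Both logical names resolve to the same GUID, so a replica added under one name is visible under the other.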

Upload a file to an SE
CASE 1: User needs to store data in an SE (from a UI)
1. Create a new LFN entry in LFC, return a SURL.
2. srmPrepareToPut (SURL)
3. Transfer the file
4. srmPutDone (SURL)

Upload a file to an SE
CASE 2: Application needs to store data in an SE (from a WN)
1. Create a new LFN entry in LFC, return a SURL.
2. srmPrepareToPut (SURL)
3. Transfer the file
4. srmPutDone (SURL)
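The four upload steps are the same in both cases, and their order can be sketched with stubs. Nothing below talks to a real catalogue or SRM endpoint: ToyCatalogue and RecordingSRM are hypothetical stand-ins that only record what was called, and the sketch assumes the prepare call hands back a TURL for the data channel.

```python
class ToyCatalogue:
    """Hypothetical stand-in for the file catalogue."""

    def __init__(self, se_prefix):
        self.entries = {}
        self._se_prefix = se_prefix

    def new_entry(self, lfn):
        surl = self._se_prefix + lfn  # made-up SURL layout
        self.entries[lfn] = surl
        return surl


class RecordingSRM:
    """Hypothetical stand-in for an SRM endpoint; it only records calls."""

    def __init__(self):
        self.calls = []
        self.stored = {}

    def prepare_to_put(self, surl):  # stands in for srmPrepareToPut
        self.calls.append(("srmPrepareToPut", surl))
        return surl.replace("srm://", "gsiftp://", 1)  # made-up TURL

    def transfer(self, turl, data):  # stands in for the data channel
        self.calls.append(("transfer", turl))
        self.stored[turl] = data

    def put_done(self, surl):  # stands in for srmPutDone
        self.calls.append(("srmPutDone", surl))


def upload(catalogue, srm, lfn, data):
    surl = catalogue.new_entry(lfn)  # 1. create the LFN entry, get a SURL
    turl = srm.prepare_to_put(surl)  # 2. srmPrepareToPut (SURL)
    srm.transfer(turl, data)         # 3. transfer the file
    srm.put_done(surl)               # 4. srmPutDone (SURL)
    return surl


catalogue = ToyCatalogue("srm://se.example.org/demo")
srm = RecordingSRM()
surl = upload(catalogue, srm, "/grid/demo/out.dat", b"payload")
print([step for step, _ in srm.calls])
```

Note that the SRM stand-in never moves bytes in prepare_to_put or put_done, in line with the point that SRM manages storage but does not transfer files itself.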

Download files from an SE
CASE 3: User needs to retrieve (onto the UI) data stored in an SE
1. Query the file catalog to retrieve the SURL from the LFN.
2. srmPrepareToGet (SURL)
3. Transfer the file (read)
4. srmReleaseFile (SURL)

Download files from an SE
CASE 4: Application needs to copy data locally (onto the WN) and use them.
1. Query the file catalog to retrieve the SURL from the LFN.
2. srmPrepareToGet (SURL)
3. Transfer the file (read)
4. srmReleaseFile (SURL)
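One practical concern in the download sequence is making sure the release step runs even when the transfer fails. The sketch below shows one way to arrange that with a context manager. It is only an illustration: ToySRM, its in_use set, and the helper names are hypothetical, and a plain dictionary stands in for the file catalogue.

```python
from contextlib import contextmanager


class ToySRM:
    """Hypothetical stand-in for an SRM endpoint holding files in memory."""

    def __init__(self, files):
        self._files = files  # SURL -> bytes
        self.in_use = set()  # SURLs prepared but not yet released

    def prepare_to_get(self, surl):  # stands in for srmPrepareToGet
        if surl not in self._files:
            raise FileNotFoundError(surl)
        self.in_use.add(surl)
        return surl.replace("srm://", "gsiftp://", 1)  # made-up TURL

    def read(self, turl):  # stands in for the data channel
        return self._files[turl.replace("gsiftp://", "srm://", 1)]

    def release_file(self, surl):  # stands in for srmReleaseFile
        self.in_use.discard(surl)


@contextmanager
def prepared_for_get(srm, surl):
    """Steps 2 and 4 wrapped so the release always happens."""
    turl = srm.prepare_to_get(surl)
    try:
        yield turl
    finally:
        srm.release_file(surl)


def download(catalogue, srm, lfn):
    surl = catalogue[lfn]                       # 1. LFN -> SURL lookup
    with prepared_for_get(srm, surl) as turl:   # 2. prepare, 4. release on exit
        return srm.read(turl)                   # 3. transfer the file (read)


catalogue = {"/grid/demo/in.dat": "srm://se.example.org/demo/in.dat"}
srm = ToySRM({"srm://se.example.org/demo/in.dat": b"payload"})
data = download(catalogue, srm, "/grid/demo/in.dat")

# Simulate a transfer that fails midway and check the file is still released.
try:
    with prepared_for_get(srm, catalogue["/grid/demo/in.dat"]):
        raise IOError("simulated transfer failure")
except IOError:
    released_after_failure = not srm.in_use

print(data, released_after_failure)
```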

SRB
Storage Resource Broker

Storage Resource Broker
Developed at San Diego Supercomputer Center
A distributed file management system (Data Grid), based on a client-server architecture
A uniform interface to heterogeneous data storage resources, based on their attributes rather than just their names or physical locations
Supports many data storage systems
Provides various types of client interfaces on different platforms

SRB Physical Structure
(Diagram: an MCAT-enabled SRB server at location X, with an Oracle client connected to an Oracle RDBMS; SRB servers at SRB locations A, B, and D, each with a storage driver and storage space.)

SRB Practical - inQ
Download inQ
Unzip inQ350.zip
Execute inQ.exe

inQ – Login
Name: srbusr + your number
Host: tap07.grid.sinica.edu.tw
Domain: ASGC
Port: 6833
Authorization: ENCRYPT1
Password: the same as your user name

SRB Client Tool - inQ

SRB Demonstrations
Use inQ to upload, download, and remove files.
Use Scommands to upload, download, and remove files.
–Sinit: log in to the SRB system
▪ Syntax: Sinit
–Sls: list directory content
▪ Syntax: Sls
–Sput: upload a file to the SRB server
▪ Syntax: Sput filename
–Sget: download a file from the SRB server
▪ Syntax: Sget filename
–Srm: remove a file stored on the SRB server
▪ Syntax: Srm filename
–Sreplicate: replicate data to another resource
▪ Syntax: Sreplicate filename
–Sexit: log out of the SRB system
▪ Syntax: Sexit
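The Scommands above can be strung together into one practice session. The sketch below only builds and prints the command lines, using exactly the syntax listed on the slide and no extra flags. The helper names srb_session and run are hypothetical, and the commands are executed only if dry_run is switched off and the Scommands are actually installed.

```python
import shutil
import subprocess


def srb_session(filename):
    """Return the Scommand invocations for one practice cycle on a file."""
    return [
        ["Sinit"],                 # log in
        ["Sput", filename],        # upload
        ["Sls"],                   # list the collection
        ["Sreplicate", filename],  # replicate to another resource
        ["Sget", filename],        # download
        ["Srm", filename],         # remove
        ["Sexit"],                 # log out
    ]


def run(commands, dry_run=True):
    """Print each command; execute it only if asked to and it is installed."""
    for argv in commands:
        if dry_run or shutil.which(argv[0]) is None:
            print("would run:", " ".join(argv))
        else:
            subprocess.run(argv, check=True)


run(srb_session("testFile.txt"))
```

Keeping dry_run=True makes the script safe to try on a machine without an SRB client; the printed lines are the same commands one would type by hand.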

Data Grid Applications
Digital Archiving
–Long-term preservation
–Heterogeneous backup
Digital Library
–Data sharing
Scientific Computing

SRB Use Case
Build Data Grid Management System
–Data Grid services in Academia Sinica
–NDAP cross-organization data backup project

SRB Data Grid Services in Academia Sinica (1)
Objective
–To provide Grid services for long-term preservation and unified data access
Data Collection Status
–File size: ~60 TB
–File count: ~3.5 million

SRB Data Grid Services in Academia Sinica (2)

NDAP Partners for Long-term Data Preservation
Academia Sinica (AS)
National Palace Museum (NPM)
National Taiwan University (NTU)
National Museum of History (NMH)
Academia Historica (DRNH)
National Central Library (NCL)
National Museum of Natural Science (NMNS)
Taiwan Historica (TH)

Data grid for NDAP LTP service

Summary
Data grids provide a new solution for large-scale storage with the following features:
–Distributed data storage
–Efficient and safe management of data
–A uniform interface to heterogeneous systems
–Flexibility to adopt new storage technology

SRM & SRB
SRM
–Used in gLite middleware
–A uniform interface between different SEs and grid middleware
SRB
–Developed by SDSC
–Supports many backend storage systems
–Widely used data grid software

SRM & SRB
SRM and SRB cannot interoperate without a common standard for communication
A bridge between SRM and SRB is being constructed to:
–Integrate SRB into the gLite environment
–Bind resources from the two important data grid systems
This project is currently being developed by ASGC

SRM & SRB

iRODS
A next-generation data grid system after SRB, developed by SDSC
A rule-oriented data grid system
More flexibility for data management
Current version: iRODS 1.0

iRODS Workshop
Time – Tue 8 April 2008
Location – 2nd Conference Room, 3F
For more information, please check the ISGC 2008 website


Thanks for your attention!