OGF24 15 September 2008 Data Area Overview Erwin Laure David E. Martin Data Area Directors.

1 OGF24 15 September 2008 Data Area Overview Erwin Laure David E. Martin Data Area Directors

2 Data Area Overview. Erwin Laure, erwin.laure@cern.ch; David E. Martin, martinde@us.ibm.com. Data Area Goals: the Data Area groups explore different aspects of data handling on grids: access, transport, and management. The overall data architecture was developed by the OGSA Data Architecture group: http://www.ogf.org/documents/GFD.121.pdf

3 Data Access. Goals: locate and provide seamless access to data stored on Grids. Data Access and Integration Services (DAIS-WG): base specs published for database access (GFD 74, 75, 76); implementation in OMII-UK; now working on data access services for RDF data resources. Grid File Systems (GFS-WG): naming spec published, the Resource Namespace Service (GFD101); working on a resource catalog; prototypes from SDSC, UVA, and Univ. of Tsukuba. Data Format Description Language (DFDL-WG): XML-based language for describing the structure of binary and textual files and data streams; simplifying the concepts and trying to remove complexity to shorten the draft spec; prototypes from LANL and IBM. Byte IO (ByteIO-WG): Web Service interface for providing "POSIX-like" file functionality (GFD 87, 88); spec has finished its comment period, small changes still needed; production version from UVA, will be in OMII.

4 Data Transport. OGSA Data Movement Interface (OGSA-DMI-WG): discover and negotiate proper data transport protocols and manage data transport (GFD134); working on interoperability. GridFTP WG (GridFTP-WG): Grid-enabled FTP protocol; spec published 3 years ago (GFD20); many production implementations; an experience report is needed for full standard status.

5 Data Management. Grid Storage Management (GSM-WG): Storage Resource Manager (SRM) to provide a common interface to storage resources (GFD129); several interoperating implementations in production use; working on the 3.0 spec. Information Dissemination (INFOD-WG): model for information dissemination, with a focus on query-like operations; base specs published (GFD110); looking at candidates for follow-on work. Storage Networking Community Group (SN-CG): led by Vincent Franceschini, Chair of the SNIA Board; portal to SNIA work; follow-on to the EGA Data Provisioning WG.

6 Data Grid Specifications and Use Cases. Material provided by Andrew Grimshaw (grimshaw@virginia.edu).

7 Outline: Background (The Rule of 3s), Specifications, Implementations

8 Classic three-layer view. Access Layer: interfaces, e.g. FUSE, SAGA, NFS, CIFS. Grid Services Layer: standard port types (RNS, ByteIO, WS-DAI, SRM). Resource Provisioning Layer: files, databases, instruments.

9 Classic 3-layer name scheme. Human names: RNS file name 1 … RNS file name n. Abstract name: WS-name EPR (EPI, rebinding). Addresses: file replica 1, file replica 2, … file replica m. WS-Names are WS-Addresses with optional EPI and resolver EPR. This is essentially a table.
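
The three naming layers can be pictured as two lookup tables chained together. The following is only a minimal Python sketch of that idea, not the WS-Naming or RNS interface: the class `NameTables`, the `urn:example:epi:` identifier, and the replica URLs are all hypothetical names chosen for illustration.

```python
class NameTables:
    """Toy model of the three naming layers as two chained tables."""

    def __init__(self):
        self.rns = {}       # layer 1: human name (RNS path) -> abstract name (EPI)
        self.resolver = {}  # layer 2: abstract name (EPI) -> list of replica addresses

    def resolve(self, path):
        """Follow human name -> abstract name -> current replica addresses."""
        epi = self.rns[path]
        return list(self.resolver[epi])

    def rebind(self, epi, old_address, new_address):
        """Swap one replica address; human names and the EPI stay unchanged."""
        addresses = self.resolver[epi]
        addresses[addresses.index(old_address)] = new_address


tables = NameTables()
EPI = "urn:example:epi:1234"  # hypothetical abstract name

# Several human names may point at the same abstract name.
tables.rns["/home/alice/data.bin"] = EPI
tables.rns["/projects/shared/data.bin"] = EPI
tables.resolver[EPI] = [
    "https://site-a.example/replica1",
    "https://site-b.example/replica2",
]

# A replica migrates: only the address table changes.
tables.rebind(EPI, "https://site-a.example/replica1", "https://site-c.example/replica1")
addresses = tables.resolve("/projects/shared/data.bin")
print(addresses)
```

Because clients hold the human name or the abstract name rather than an address, the rebinding is invisible to them, which is the transparency the slide attributes to the abstract-name layer.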

10 Outline: Background (The Rule of 3s), Specifications, Implementations

11 Six specs. RNS: directory service that maps human names (strings) to abstract names or addresses (EPRs); operations are insert, delete, and list; can build directed graphs, including trees; leaves can be almost anything: web pages, ByteIO endpoints, DMI endpoints, BES resources; RNS 1.1 is under development. WS-Naming: a profile on WS-Addressing that supports identity, abstract-name-to-address mapping, and rebinding of addresses, giving migration, failure, and replication transparency. ByteIO: think POSIX file/stream: read, write, stat. WS-DAI: query interface onto structured data, e.g. relational databases or XML databases. SRM: management of data stores. BES: accepts JSDL documents and executes them.
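
To make the RNS operations concrete, here is a rough in-memory Python sketch of a directory offering insert, delete, and list. It is an illustration under assumptions, not the RNS Web Service interface: `RnsDirectory`, its method names, and the `epr:...` placeholder strings are invented for this example.

```python
class RnsDirectory:
    """Toy in-memory stand-in for an RNS directory."""

    def __init__(self):
        self._entries = {}  # human-readable name -> sub-directory or EPR string

    def insert(self, name, target):
        if name in self._entries:
            raise KeyError(f"entry already exists: {name}")
        self._entries[name] = target

    def delete(self, name):
        del self._entries[name]

    def list_entries(self):
        return sorted(self._entries)

    def lookup(self, path):
        """Walk a slash-separated path and return what the last name maps to."""
        node = self
        for part in path.strip("/").split("/"):
            node = node._entries[part]
        return node


root = RnsDirectory()
home = RnsDirectory()
shared = RnsDirectory()
root.insert("home", home)
root.insert("shared", shared)

# Leaves can be any kind of endpoint; these EPR strings are placeholders.
home.insert("results.dat", "epr:byteio:results")
home.insert("cluster", "epr:bes:cluster-1")

# The same leaf may be linked from two directories, giving a graph rather than a tree.
shared.insert("results.dat", home.lookup("results.dat"))

home.delete("cluster")
print(root.list_entries())                 # ['home', 'shared']
print(home.list_entries())                 # ['results.dat']
print(root.lookup("/shared/results.dat"))  # epr:byteio:results
```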

12 Outline: Background (The Rule of 3s), Specifications, Implementations

13 There are several implementations (not a complete list!). Specifications compared: RNS, ByteIO, WS-Naming, WS-DAI, SRM. Genesis II: yes. gFarm: yes; planned. EGEE/gLite: experimental prototype; planned?; used by some user communities; yes. NeSC Edinburgh: yes. Globus: yes (just rebinding); yes. There are over a dozen OGSA-BES/HPC-BP implementations.

14 Let’s see what you can do with these specifications. Imagine: an access layer that consists of a Grid-aware FUSE file system driver for Linux (both Genesis II and gFarm have these) or a Grid-aware Installable File System (IFS) for Windows (Genesis II has one, G-ICING); a provisioning layer that proxies Windows/Unix files and directories into the Grid as RNS and ByteIO endpoints, and relational databases as WS-DAI endpoints; OGSA-BES endpoints that also support the RNS specification, allowing jobs to be started simply by copying a JSDL file “into” the directory; a WS-Trust STS endpoint that also supports RNS.

15 Users can access Grid resources simply by copying files, dragging and dropping, etc. Applications don’t need to be rewritten to access the Grid.

16 You don’t have to imagine

17 Windows Grid-aware IFS

18 Linux Grid-aware FUSE

19 Using RNS to name non-file-system components. BES resources are also RNS directories. We can schedule a job on a resource simply by “dropping” it into the directory.
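
As a rough illustration of "dropping" a job into a directory, the Python sketch below treats an insert of a JSDL document as a job submission. It is a toy model under assumptions: `BesDirectory`, the `job-N` identifiers, and the bookkeeping are hypothetical, and nothing is actually executed. Only the JSDL 1.0 namespace and the `JobDefinition` root element are taken from the JSDL specification.

```python
import xml.etree.ElementTree as ET


class BesDirectory:
    """Toy BES resource exposed as a directory: inserting a JSDL file submits a job."""

    def __init__(self):
        self._jobs = {}  # job id -> bookkeeping record

    def insert(self, filename, jsdl_text):
        root = ET.fromstring(jsdl_text)
        # Accept only documents whose root element is a JSDL JobDefinition.
        if not root.tag.endswith("JobDefinition"):
            raise ValueError(f"{filename} is not a JSDL job definition")
        job_id = f"job-{len(self._jobs) + 1}"  # hypothetical id scheme
        self._jobs[job_id] = {"source": filename, "state": "Pending"}
        return job_id

    def list_entries(self):
        """Listing the directory shows the submitted jobs."""
        return sorted(self._jobs)

    def state(self, job_id):
        return self._jobs[job_id]["state"]


JSDL = """<jsdl:JobDefinition xmlns:jsdl="http://schemas.ggf.org/jsdl/2005/11/jsdl">
  <jsdl:JobDescription>
    <jsdl:Application/>
  </jsdl:JobDescription>
</jsdl:JobDefinition>"""

cluster = BesDirectory()
job_id = cluster.insert("simulate.jsdl", JSDL)
print(job_id, cluster.state(job_id))  # job-1 Pending
print(cluster.list_entries())         # ['job-1']
```

The point of the design is that a file copy is the only client-side operation, so any tool that can write into a mounted directory can submit work.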

20 Use SRM to abstract from storage implementations. Participants: Client, SRM, Storage. 1. The client asks the SRM for the file, providing an SURL (Site URL). 2. The SRM asks the storage system to provide the file. 3. The storage system notifies the availability of the file and its location. 4. The SRM returns a TURL (Transfer URL), i.e. the location from where the file can be accessed. 5. The client interacts with the storage using the protocol specified in the TURL. Annotation: could use RNS and give back a ByteIO endpoint.
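
The five steps above can be sketched in Python as follows. This is a simplified model and not the SRM interface: the classes `StorageSystem` and `StorageResourceManager`, their methods, the host names, the paths, and the choice of `gsiftp` as transfer protocol are all assumptions made for illustration.

```python
from urllib.parse import urlsplit


class StorageSystem:
    """Hypothetical storage back end that knows where files physically live."""

    def __init__(self, host, files):
        self.host = host
        self._files = files  # site path -> internal location on this host

    def stage(self, site_path):
        # Steps 2-3: make the file available and report its location.
        return self.host, self._files[site_path]


class StorageResourceManager:
    """Hypothetical SRM front end that turns an SURL into a TURL."""

    def __init__(self, storage, protocol="gsiftp"):
        self.storage = storage
        self.protocol = protocol

    def request(self, surl):
        # Step 1: the client hands over an SURL; only its path part is used here.
        site_path = urlsplit(surl).path
        host, location = self.storage.stage(site_path)
        # Step 4: return a TURL naming the protocol and the actual location.
        return f"{self.protocol}://{host}{location}"


storage = StorageSystem("disk07.example.org", {"/data/run1.dat": "/pool3/0001/run1.dat"})
srm = StorageResourceManager(storage)

turl = srm.request("srm://srm.example.org/data/run1.dat")
print(turl)  # gsiftp://disk07.example.org/pool3/0001/run1.dat

# Step 5: the client picks its transfer client from the TURL's scheme.
print(urlsplit(turl).scheme)  # gsiftp
```

The client never learns how the storage system is organized; it only sees the stable SURL and the short-lived TURL.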

21 WS-DAI endpoints that support RNS. To execute a query, copy a text file with the SQL into the directory that represents the database. The results of the query are accessible as a file (they can be read, “cat’d”, or loaded into Excel as a CSV), or they can subsequently be queried as well.
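
A small Python sketch of the "copy an SQL file into the database directory" idea follows. It uses an in-memory SQLite database in place of a WS-DAI endpoint, so it only mimics the user-visible behaviour: `DatabaseDirectory`, the `.csv` naming convention for results, and the `runs` table are assumptions for this example.

```python
import csv
import io
import sqlite3


class DatabaseDirectory:
    """Toy directory view of a database: inserting SQL text yields a CSV result file."""

    def __init__(self, connection):
        self._conn = connection
        self._results = {}  # result file name -> CSV text

    def insert(self, filename, sql_text):
        cursor = self._conn.execute(sql_text)
        buffer = io.StringIO()
        writer = csv.writer(buffer, lineterminator="\n")
        writer.writerow([column[0] for column in cursor.description])  # header row
        writer.writerows(cursor.fetchall())
        result_name = filename + ".csv"  # hypothetical naming convention
        self._results[result_name] = buffer.getvalue()
        return result_name

    def read(self, name):
        """Reading the result entry returns the query output, like 'cat' on a file."""
        return self._results[name]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE runs (id INTEGER, site TEXT)")
conn.executemany("INSERT INTO runs VALUES (?, ?)", [(1, "CERN"), (2, "UVA")])

db_dir = DatabaseDirectory(conn)
result_name = db_dir.insert("query1.sql", "SELECT id, site FROM runs ORDER BY id")
print(result_name)  # query1.sql.csv
print(db_dir.read(result_name))
```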

22 Mapping data into the Grid. Diagram: data publishers; data clients (Linux, Windows). Links directories and files from source location to data grid directory and user-specified name. Presents a unified view of the data across platforms, locations, domains, etc. The data publisher controls authorization policy.

23 Moral of the story. RNS allows us to place arbitrary resources into a traditional directed graph/tree structure. FUSE/IFS map RNS namespaces into the local file system. Users can interact with the grid without knowing anything about grids.

24 Data Area Future. From Data Area Gaps Analysis: High-level Data Movement; Caching and Replication; Integrated Data Management; Transactions in a Grid. Recent Interest: Storage Provisioning Virtualization; Provenance, Integrity, Policy; Link to Digital Libraries. Dependencies: OGSA; Security: IETF, OASIS; Management: DMTF, WSDM/WS-Man Convergence; WS-*: OASIS and W3C, WS-RF/WS-T Convergence.

