Published by Arline Jacobs. Modified over 9 years ago.
1
Globus – Part II Sathish Vadhiyar
2
Globus Information Service
3
MDS
- Metacomputing Directory Service, later Monitoring and Discovery Service
- For publishing and accessing system and application data
- Access to MDS information can be restricted by using GSI
- Interacts with local information services – hourglass mechanism
- Provides caching so that information that is still up to date is not transferred again, which lessens network overhead
4
MDS
- Integrates existing systems while providing a uniform and extensible data model
- Uniform API
- Adopts its data representation, API, query language and protocol from the LDAP directory service
- Uses 2 protocols:
  - GRIP – for providing information about entities
  - GRRP – for registering entities
- LDAP query language supports:
  - Search
  - Enquiry
  - Subscription
5
MDS Architecture
- GIIS – Grid Index Information Service
- GRIS – Grid Resource Information Service
6
MDS
- Support for multiple information service providers; information providers are specified on a per-attribute basis
- MDS data:
  - System information: architecture, OS
  - Network information
  - Load status
- Additional information sent to GIIS by the GRAM reporter:
  - Job status
  - Queue information
- Information can be viewed through a web browser or web client commands
7
MDS
- Contains entries; each entry is associated with one or more attribute:value pairs
- Each entry is identified by a distinguished name
- Object classes are associated with entries and describe their object types
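As a rough illustration of the entry model above, the following sketch keeps a few entries in memory and searches them by distinguished name, object class and attribute value. It is not the MDS or LDAP API; the `Entry` class, the `search` function, the host names and the attribute names are all assumptions made up for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    """One directory entry: a distinguished name, object classes, attributes."""
    dn: str
    object_classes: list
    attributes: dict = field(default_factory=dict)


def search(entries, base_dn, object_class=None, **required):
    """Return entries at or under base_dn that match the class and attribute values."""
    results = []
    for entry in entries:
        # An entry is in scope if its DN is the base itself or ends with ", <base>".
        in_scope = entry.dn == base_dn or entry.dn.endswith(", " + base_dn)
        if not in_scope:
            continue
        if object_class and object_class not in entry.object_classes:
            continue
        # Every requested attribute must contain the requested value.
        if all(value in entry.attributes.get(name, [])
               for name, value in required.items()):
            results.append(entry)
    return results


# Hypothetical entries; names and values are illustrative only.
directory = [
    Entry("o=Grid", ["organization"]),
    Entry("hn=hostA.example.org, o=Grid", ["computeResource"],
          {"architecture": ["x86"], "os": ["Linux"]}),
    Entry("hn=hostB.example.org, o=Grid", ["computeResource"],
          {"architecture": ["sparc"], "os": ["Solaris"]}),
]

linux_hosts = search(directory, "o=Grid", object_class="computeResource", os="Linux")
print([e.dn for e in linux_hosts])
```

Each attribute maps to a list because a directory attribute may carry more than one value.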
8
Distinguished name example
9
Another Example
10
Distinguished names for Networks
11
Globus Data Grid
12
Data Grid Challenges:
- Petabytes and terabytes of data
- Query management over this huge volume of data
- Cache management
- Providing gigabit/sec QoS
- Coscheduling data transfers and computation
- Selection of dataset replicas
- Maximizing use of scarce storage, computation and network resources
13
Data Grid Motivation
Application requirements:
1. A reliable, secure, high-performance data transfer protocol
2. Management of multiple copies of files and collections of files
14
Data Grid Architecture
15
GridFTP
- Secure file transfer over the Grid
- Multiple data channels for parallel transfers – multiple TCP streams used in parallel to improve aggregate bandwidth
- Partial file transfers
- Third-party (direct server-to-server) transfers – transfers between 2 servers mediated by a third-party client, obtained by adding GSSAPI security to the third-party transfers already in the FTP standard
- GSSAPI operations authenticate the third party to the source and destination machines of the data transfer
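The combination of partial file transfers and parallel streams can be sketched as follows: a requested byte range is split into contiguous sub-ranges, each fetched by its own worker and written into place at the destination. This is only a simulation over an in-memory byte string; it does not use GridFTP or TCP, and `split_ranges`, `fetch` and `parallel_partial_copy` are hypothetical names introduced for this example.

```python
from concurrent.futures import ThreadPoolExecutor


def split_ranges(offset, length, streams):
    """Split [offset, offset + length) into up to `streams` (start, size) ranges."""
    if length <= 0 or streams <= 0:
        return []
    streams = min(streams, length)
    base, extra = divmod(length, streams)
    ranges = []
    start = offset
    for i in range(streams):
        # The first `extra` ranges take one additional byte each.
        size = base + (1 if i < extra else 0)
        ranges.append((start, size))
        start += size
    return ranges


def fetch(source, start, size):
    """Stand-in for one data channel: return the block and where it belongs."""
    return start, source[start:start + size]


def parallel_partial_copy(source, offset, length, streams):
    """Copy `length` bytes starting at `offset`, using one worker per range."""
    dest = bytearray(max(length, 0))
    with ThreadPoolExecutor(max_workers=max(1, streams)) as pool:
        futures = [pool.submit(fetch, source, start, size)
                   for start, size in split_ranges(offset, length, streams)]
        for future in futures:
            start, block = future.result()
            # Place each block at its position relative to the requested offset.
            dest[start - offset:start - offset + len(block)] = block
    return bytes(dest)


source = bytes(range(256))
copied = parallel_partial_copy(source, offset=100, length=50, streams=4)
print(copied == source[100:150])
```

Because every block carries its own start offset, blocks can arrive in any order and still be reassembled correctly.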
16
GridFTP contd…
- Authenticated data channels – both GSI and Kerberos security
- Reusable data channels
- Striped data transfers
- 2 libraries:
  - globus_ftp_control_library – implements the control channel API
  - globus_ftp_client_library – implements the GridFTP API
- Plugin mechanisms for fault tolerance, performance monitoring, and extended data processing
17
Globus Replica Management Architecture
- Replica management:
  - For better performance or availability of accesses
  - Mainly for access to “published” resources – read-only model
- Architecture:
  - Lower-level replica catalog API
  - Higher-level replica management API
18
Replica catalog
- Provides a mapping between logical names of files/locations and physical objects on storage systems
- Stores 3 kinds of entries:
  - Logical collection – user-defined collections of files – file aggregation
  - Location entries – physical locations of files
  - Logical files – globally unique names
- Replica catalog API provides operations on the replica catalog
- Replica management API provides session management, catalog creation, file maintenance, access control
- Implemented with LDAP
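The three kinds of catalog entries can be pictured with a small in-memory model: a logical collection holds logical file names, and each location entry records which of those files are physically present under a storage URL prefix. This is a sketch under assumptions, not the Globus replica catalog API (which the slides say is implemented with LDAP); the class, its methods and the example URLs are hypothetical.

```python
class ReplicaCatalog:
    """Toy mapping from logical file names to physical copies."""

    def __init__(self):
        self.collections = {}  # collection name -> set of logical file names
        self.locations = {}    # (collection, location) -> (url prefix, set of files)

    def create_collection(self, name, logical_files):
        """Register a list of logical files as a logical collection."""
        self.collections[name] = set(logical_files)

    def register_location(self, collection, location, url_prefix, files):
        """Register a complete or partial replica of a collection."""
        unknown = set(files) - self.collections[collection]
        if unknown:
            raise ValueError("not in collection: %s" % sorted(unknown))
        self.locations[(collection, location)] = (url_prefix, set(files))

    def find_replicas(self, collection, logical_file):
        """Return the physical URLs of every registered copy of a logical file."""
        urls = []
        for (coll, _location), (prefix, files) in self.locations.items():
            if coll == collection and logical_file in files:
                urls.append(prefix + "/" + logical_file)
        return sorted(urls)


# Hypothetical collection, locations and URLs, for illustration only.
catalog = ReplicaCatalog()
catalog.create_collection("climate-1998", ["jan.dat", "feb.dat", "mar.dat"])
catalog.register_location("climate-1998", "siteA",
                          "gsiftp://storage-a.example.org/data",
                          ["jan.dat", "feb.dat", "mar.dat"])
catalog.register_location("climate-1998", "siteB",
                          "gsiftp://storage-b.example.org/cache",
                          ["jan.dat"])

print(catalog.find_replicas("climate-1998", "jan.dat"))
print(catalog.find_replicas("climate-1998", "mar.dat"))
```

The second location holds only one file, which mirrors the idea of a partial replica of a logical collection.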
19
Replica management
Globus Replica Management integrates the Globus Replica Catalog (for keeping track of replicated files) and GridFTP (for moving data) and provides replica management capabilities for data grids. The globus_replica_management library provides client functions that allow files to be registered with the replica management service, published to replica locations, and moved among multiple locations. The goal is to manage the copying and placement of files in a distributed computing system so as to improve the performance of data analysis.
20
Replica management service - functions
- Registration of files with the replica management service
- Creation and deletion of replicas of previously registered files
- Enquiries concerning the location and performance characteristics of replicas
- Replica selection based on performance characteristics
21
Replica management
- Replica management API – combines storage system operations with calls to low-level catalog API functions
- The replica management system controls where and when copies are created and provides information about copies
- But it does not ensure file consistency
22
RM API
- Session management:
  - Session handles and attributes
  - Restart
  - Rollback
- Catalog creation and file management:
  - Creating catalog entries
  - Registering files
  - Publishing files
  - Copying, deleting files
- Future ideas:
  - Incorporating advance reservation
  - Automatic replica selection and creation
- Data grid projects: http://www.globus.org/datagrid/projects.html
23
Replica Catalog Illustration
24
Replica Selection in Globus Data Grid (Vazhkudai et al.)
- Replica selection uses MDS for information regarding the characteristics of storage systems
- LDAP information is organized as a DIT (Directory Information Tree)
- Each storage resource in the Data Grid incorporates a GRIS
- LDAP can execute shell scripts in the background to obtain various dynamic attributes such as availableSpace, mountPoint etc.
- Static attributes such as seek times can be entered by the system administrator
- Attributes such as data transfer rates across networks to clients can be obtained from past performance, i.e., historical data
- ClassAds can also be used for expressing storage attributes
25
Directory for Storage GRIS
26
Metadata Specification
27
Performance Data Specification
28
Steps in Replica Management
1. Application queries metadata, expressing the desired characteristics of logical files
2. A logical file is returned
3. Application queries the replica catalog for replica instances of the logical file
4. Storage broker helps to choose a particular replica
29
Replica Selection
30
Storage Architecture steps
1. Application presents classAds describing its replica requirements to the storage broker (SB)
2. SB does search:
   1. Queries replica catalogs for the list of all replicas
   2. Queries the individual GRIS of each replica about its characteristics
   3. Collects all information and proceeds to matching
3. Match:
   1. Converts replica capabilities to replica classAds
   2. Matches application classAds to replica classAds
4. Accesses the file using GridFTP
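The matching step can be sketched in a few lines: the application states a requirement and a ranking preference, each replica advertises its attributes, and the broker keeps the replicas that satisfy the requirement and orders them by rank. This only imitates the idea of ClassAd matchmaking with plain Python dictionaries and functions; it is not the ClassAd language or the storage broker's implementation. `availableSpace` comes from the slides, while `transferRate`, the URLs and the numbers are assumptions for illustration.

```python
def match_replicas(request, replica_ads):
    """Keep the replicas that satisfy the request, best rank first."""
    candidates = [ad for ad in replica_ads if request["requirements"](ad)]
    return sorted(candidates, key=request["rank"], reverse=True)


# Hypothetical replica descriptions, as a broker might assemble them
# from the replica catalog and from each storage system's GRIS.
replica_ads = [
    {"url": "gsiftp://storage-a.example.org/data/jan.dat",
     "availableSpace": 50, "transferRate": 12.0},
    {"url": "gsiftp://storage-b.example.org/cache/jan.dat",
     "availableSpace": 5, "transferRate": 40.0},
    {"url": "gsiftp://storage-c.example.org/pool/jan.dat",
     "availableSpace": 80, "transferRate": 25.0},
]

# Hypothetical application request: need at least 10 units of free space,
# and prefer the replica with the highest historical transfer rate.
request = {
    "requirements": lambda ad: ad["availableSpace"] >= 10,
    "rank": lambda ad: ad["transferRate"],
}

ranked = match_replicas(request, replica_ads)
best = ranked[0]
print(best["url"])
```

The fastest replica is excluded here because it fails the space requirement, so the broker falls back to the next best match, which would then be fetched with GridFTP.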
31
Globus References / sources / credits
- Grid Information Services for Distributed Resource Sharing. K. Czajkowski, S. Fitzgerald, I. Foster, C. Kesselman. Proceedings of the Tenth IEEE International Symposium on High-Performance Distributed Computing (HPDC-10), IEEE Press, August 2001.
- Usage of LDAP in Globus. I. Foster, G. von Laszewski. This short note describes the use of LDAP in the Globus toolkit. It answers three questions: What is LDAP? Where is it used? and Why is it used in Globus?
- A Directory Service for Configuring High-Performance Distributed Computations. S. Fitzgerald, I. Foster, C. Kesselman, G. von Laszewski, W. Smith, S. Tuecke. Proc. 6th IEEE Symposium on High-Performance Distributed Computing, pp. 365-375, 1997. Describes the Metacomputing Directory Service used to maintain information about Globus components.
32
Globus References / sources / credits
- The Data Grid: Towards an Architecture for the Distributed Management and Analysis of Large Scientific Datasets. A. Chervenak, I. Foster, C. Kesselman, C. Salisbury, S. Tuecke. Journal of Network and Computer Applications, 23:187-200, 2001 (based on conference publication from Proceedings of NetStore Conference 1999).
- Secure, Efficient Data Transport and Replica Management for High-Performance Data-Intensive Computing. B. Allcock, J. Bester, J. Bresnahan, A. Chervenak, I. Foster, C. Kesselman, S. Meder, V. Nefedova, D. Quesnel, S. Tuecke. IEEE Mass Storage Conference, 2001. Presents the design and performance characteristics of two fundamental technologies for data management.
- Replica Selection in the Globus Data Grid. S. Vazhkudai, S. Tuecke, I. Foster. Proceedings of the First IEEE/ACM International Conference on Cluster Computing and the Grid (CCGRID 2001), pp. 106-113, IEEE Computer Society Press, May 2001. Discusses a high-level replica selection service that uses information regarding replica location and user preferences to guide selection from among storage replica alternatives.
33
JUNK !!
34
RFT (Reliable File Transfer)
- Treats movement of multiple files as a single job
- Accepts transfer requests and reliably manages them
- OGSI compliant
- Transfers data reliably between two GridFTP servers
- Uses Grid Service Handles (GSH)
- Acts as a proxy for the user: acts as the client on the user’s behalf for third-party transfers
35
RFT
- Client submits a SOAP description of the data transfer job
- Maintains checkpoints in databases
- Supports both “push” and “pull” mechanisms
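The idea of treating several file movements as one job and checkpointing progress can be sketched as below: after each file completes, the set of finished files is saved, so a restarted job skips what is already done. This is not RFT's implementation; a JSON file stands in for the database, and `run_transfer_job`, `flaky_copy` and the file names are hypothetical.

```python
import json
import os
import tempfile


def run_transfer_job(files, copy_one, checkpoint_path):
    """Transfer files in order, recording each completed file in a checkpoint."""
    done = set()
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as fh:
            done = set(json.load(fh))
    for name in files:
        if name in done:
            continue  # already transferred in an earlier run
        copy_one(name)  # may raise; the checkpoint then still reflects prior progress
        done.add(name)
        with open(checkpoint_path, "w") as fh:
            json.dump(sorted(done), fh)
    return done


attempts = []


def flaky_copy(name):
    """Simulated transfer that fails the first time it sees b.dat."""
    attempts.append(name)
    if name == "b.dat" and attempts.count("b.dat") == 1:
        raise IOError("simulated network failure")


checkpoint = os.path.join(tempfile.mkdtemp(), "job.json")
files = ["a.dat", "b.dat", "c.dat"]

try:
    run_transfer_job(files, flaky_copy, checkpoint)
except IOError:
    pass  # the job is interrupted after a.dat has been checkpointed

finished = run_transfer_job(files, flaky_copy, checkpoint)
print(attempts)
print(sorted(finished))
```

On the second run a.dat is not copied again, because the checkpoint written before the failure already lists it.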
36
Data Grid Replica Services
- Need for metadata services. Various kinds:
  - Application metadata
  - Replica metadata
  - System configuration metadata
- Replica management:
  - For better performance or availability of accesses
  - Mainly for access to “published” resources – read-only model
37
Replica Catalog
- Provides mappings between logical names for files or collections and one or more copies of those objects on physical systems
- Services provided by the replica catalog:
  - Registering a list of files as a logical collection
  - Registering the physical location of a complete or partial replica of a logical collection
  - Registering information about a particular logical file in a logical collection
  - Modifying the contents of registered entities of the catalog
  - Responding to queries of the catalog
- The Globus Replica Catalog supports replica management by providing mappings between logical names for files and one or more copies of the files on physical storage systems