The Data Grid: Towards an Architecture for the Distributed Management and Analysis of Large Scientific Datasets
Caitlin Minteer & Kelly Clynes


2 The Data Grid
• Large dataset sizes
• Geographic distribution of users and resources
• Computationally intensive analysis
• No integrating architecture yet exists for applying existing technologies in a coordinated way across large-scale application domains

3 The Data Grid
• Data grid applications frequently must operate in wide-area, multi-institutional, diverse environments

4 Design Architecture for The Data Grid
Mechanism Neutrality
• Designed to be as independent as possible of low-level mechanisms
• Defines interfaces that encapsulate the peculiarities of specific storage systems
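Here is a minimal Python sketch of mechanism neutrality. It assumes a hypothetical `StorageDriver` interface and a URL-scheme registry, neither of which comes from the slides. Applications name data only by URL, and each registered driver hides the details of one storage system.

```python
from abc import ABC, abstractmethod
from urllib.parse import urlparse


class StorageDriver(ABC):
    """Uniform interface; each driver hides one storage system's peculiarities."""

    @abstractmethod
    def read(self, path: str) -> bytes: ...

    @abstractmethod
    def write(self, path: str, data: bytes) -> None: ...


class MemoryDriver(StorageDriver):
    """Hypothetical stand-in for a real storage system (in-memory only)."""

    def __init__(self) -> None:
        self._files: dict[str, bytes] = {}

    def read(self, path: str) -> bytes:
        return self._files[path]

    def write(self, path: str, data: bytes) -> None:
        self._files[path] = data


# Registry keyed by URL scheme: callers never touch driver classes directly.
DRIVERS: dict[str, StorageDriver] = {}


def register(scheme: str, driver: StorageDriver) -> None:
    DRIVERS[scheme] = driver


def _resolve(url: str) -> tuple[StorageDriver, str]:
    parsed = urlparse(url)
    if parsed.scheme not in DRIVERS:
        raise ValueError(f"no driver registered for scheme {parsed.scheme!r}")
    return DRIVERS[parsed.scheme], parsed.netloc + parsed.path


def read(url: str) -> bytes:
    driver, path = _resolve(url)
    return driver.read(path)


def write(url: str, data: bytes) -> None:
    driver, path = _resolve(url)
    driver.write(path, data)
```

Supporting a new storage technology then means registering one more driver, for example `register("mem", MemoryDriver())`. Application code that calls `read` and `write` does not change.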

5 Design Architecture for The Data Grid
Policy Neutrality
• Structured so that design decisions with significant performance implications are exposed to the user rather than hidden inside black-box implementations

6 Design Architecture for The Data Grid
Compatibility with Grid Infrastructure
• Takes advantage of fundamental Grid infrastructure
• Compatible with lower-level Grid mechanisms

7 Design Architecture for The Data Grid
Uniformity of Information Infrastructure
• The data grid's metadata is accessed through the same data model and interface as the underlying Grid information infrastructure
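One way to picture a uniform information model is to store every kind of metadata as a flat attribute-value entry and read it through a single query function. The sketch below assumes this directory-style model. The entries, attribute names, and `query` function are hypothetical illustrations, not part of the slides.

```python
from typing import Any

# Every kind of metadata (application, replica, system configuration) is
# stored as a flat attribute-value entry. These entries are made-up examples.
Entry = dict[str, Any]

ENTRIES: list[Entry] = [
    {"kind": "application", "name": "run42", "instrument": "detector-A"},
    {"kind": "replica", "name": "run42", "location": "siteA:/data/run42"},
    {"kind": "replica", "name": "run42", "location": "siteB:/cache/run42"},
    {"kind": "system", "name": "siteA", "capacity_tb": 500},
]


def query(entries: list[Entry], **filters: Any) -> list[Entry]:
    """Return the entries whose attributes equal every given filter value."""
    return [e for e in entries if all(e.get(k) == v for k, v in filters.items())]
```

Because all metadata shares one model, `query(ENTRIES, kind="replica", name="run42")` and `query(ENTRIES, kind="system")` use the same interface.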

8 Design Architecture for The Data Grid
These four principles lead to a layered architecture. Lower layers provide high-performance access to a basic set of mechanisms but do not enforce specific usage policies. In data grids, the focus on simple, policy-independent mechanisms will encourage and enable wide use without limiting the range of applications that can be built.

9 Core Grid Data Services
Two fundamental services are required in a data grid architecture:
• Data access
• Metadata access

10 Data Access
Provides mechanisms for accessing, managing, and initiating third-party transfers of data stored in storage systems
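In a third-party transfer, a client asks one storage system to send data straight to another, so the file contents never pass through the client. The Python sketch below models this with hypothetical in-memory `StorageSite` and `Client` classes. It shows only the control flow, not a real transfer protocol.

```python
class StorageSite:
    """Hypothetical in-memory site that can push a file to a peer site."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.files: dict[str, bytes] = {}
        self.bytes_sent = 0

    def send_to(self, path: str, peer: "StorageSite", dest_path: str) -> None:
        data = self.files[path]
        peer.files[dest_path] = data  # data moves directly site-to-site
        self.bytes_sent += len(data)


class Client:
    """Initiates transfers but never holds the file contents itself."""

    def third_party_transfer(self, src: StorageSite, path: str,
                             dst: StorageSite, dest_path: str) -> None:
        # Only a control request passes through the client.
        src.send_to(path, dst, dest_path)
```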

11 Metadata Access
Provides mechanisms for accessing and managing information about data stored in storage systems

12 Data Abstraction: Storage System
• The basic grid component is the storage system, which provides functions for creating, destroying, reading, writing, and manipulating file instances
• File instances are the basic unit of information in a storage system
• A storage system can be implemented with any storage technology that supports the required access functions
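The storage system abstraction can be written as an abstract interface over file instances. In the sketch below, `StorageSystem`, `InMemoryStorage`, and the offset/length method signatures are assumptions chosen for illustration. Any backend that implements the four functions qualifies as a storage system.

```python
from abc import ABC, abstractmethod


class StorageSystem(ABC):
    """Functions a storage system exposes on file instances (hypothetical API)."""

    @abstractmethod
    def create(self, name: str) -> None: ...

    @abstractmethod
    def destroy(self, name: str) -> None: ...

    @abstractmethod
    def read(self, name: str, offset: int = 0, length: int = -1) -> bytes: ...

    @abstractmethod
    def write(self, name: str, data: bytes, offset: int = 0) -> None: ...


class InMemoryStorage(StorageSystem):
    """A toy backend; a real one might wrap a disk or tape system."""

    def __init__(self) -> None:
        self._instances: dict[str, bytearray] = {}

    def create(self, name: str) -> None:
        if name in self._instances:
            raise FileExistsError(name)
        self._instances[name] = bytearray()

    def destroy(self, name: str) -> None:
        del self._instances[name]  # raises KeyError if the instance is absent

    def read(self, name: str, offset: int = 0, length: int = -1) -> bytes:
        buf = self._instances[name]
        end = len(buf) if length < 0 else offset + length
        return bytes(buf[offset:end])

    def write(self, name: str, data: bytes, offset: int = 0) -> None:
        buf = self._instances[name]
        if offset > len(buf):
            buf.extend(b"\0" * (offset - len(buf)))  # zero-fill any gap
        buf[offset:offset + len(data)] = data  # overwrite and/or extend
```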

13 Data Access
• Storage system access functions must be integrated with the security environment of each site to which remote access is required
• Applications should be able to give storage systems hints about access patterns, network performance, etc., which the storage system can use to optimize performance
• Data movement functions must be able to detect and report errors
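The last two requirements, performance hints and error detection, are combined in the sketch below. It assumes a hypothetical `transfer` function with a `block_size` hint and uses a SHA-256 checksum as the detection method; both are illustrative choices, not details from the slides. The `channel` parameter stands in for the network and lets a test inject corruption.

```python
import hashlib
from typing import Callable, Optional


class TransferError(Exception):
    """Reported to the caller when a transfer fails verification."""


def transfer(src: dict, dst: dict, name: str,
             channel: Callable[[bytes], bytes] = lambda b: b,
             hints: Optional[dict] = None) -> None:
    hints = hints or {}
    # Hypothetical hint: the block size the mover should use when chunking I/O.
    block = int(hints.get("block_size", 65536))
    if block <= 0:
        raise ValueError("block_size must be positive")
    data = src[name]
    expected = hashlib.sha256(data).hexdigest()
    received = b"".join(channel(data[i:i + block])
                        for i in range(0, len(data), block))
    # Detect corruption and report it instead of silently storing bad data.
    if hashlib.sha256(received).hexdigest() != expected:
        raise TransferError(f"checksum mismatch for {name!r}")
    dst[name] = received
```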

14 Metadata
• Used to manage the data grid itself
• Information about file instances, the contents of file instances, and the various storage systems contained in the grid
• The metadata service provides a way to publish and access this information

15 Application Metadata
Describes the contents and structure of the data:
• Content represented by the file
• Circumstances under which the data was obtained
• Other information useful to applications that process the data

16 Replica Metadata
• Used to manage the replication of data objects
• Includes information for mapping file instances to particular storage system locations

17 System Configuration Metadata
Describes the fabric of the grid itself, i.e., network connectivity and details about storage systems, such as:
• Capacity
• Usage policy
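The three metadata categories can be modeled as simple records. All class and field names below, such as `provenance`, `locations`, and `capacity_bytes`, are hypothetical choices for illustration. They are not a schema from the slides.

```python
from dataclasses import dataclass, field


@dataclass
class ApplicationMetadata:
    logical_name: str
    description: str  # content represented by the file
    provenance: dict = field(default_factory=dict)  # how the data was obtained


@dataclass
class ReplicaMetadata:
    logical_name: str
    # Each entry maps the file to a "storage-system:path" location.
    locations: list = field(default_factory=list)


@dataclass
class StorageSystemConfig:
    name: str
    capacity_bytes: int
    used_bytes: int = 0
    usage_policy: str = "open"  # hypothetical policy label

    def free_bytes(self) -> int:
        return self.capacity_bytes - self.used_bytes
```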

18 Additional Requirements
The metadata service must operate efficiently in a distributed environment and must be:
• Scalable
• Robust
• Able to assert local control over information

19 Hierarchical Distributed System
Because of these requirements, the metadata service must be a hierarchical, distributed system in order to:
• Achieve scalability
• Avoid single points of failure
• Facilitate local control over data
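The sketch below shows hierarchical lookup with a hypothetical `MetadataNode` class, where each site publishes and controls its own entries. A query is answered locally when possible and otherwise forwarded to the parent. It shows only this upward search path. A real hierarchical service would also need referrals to child or sibling servers and replication of upper levels to avoid single points of failure.

```python
from typing import Optional


class MetadataNode:
    """One level of a hypothetical metadata hierarchy (site -> region -> root)."""

    def __init__(self, name: str, parent: Optional["MetadataNode"] = None) -> None:
        self.name = name
        self.parent = parent
        self.entries: dict[str, str] = {}

    def publish(self, key: str, value: str) -> None:
        # Each node controls its own entries; nothing is pushed upward.
        self.entries[key] = value

    def lookup(self, key: str) -> Optional[tuple[str, str]]:
        """Search this node, then its ancestors; return (node name, value) or None."""
        node: Optional[MetadataNode] = self
        while node is not None:
            if key in node.entries:
                return node.name, node.entries[key]
            node = node.parent
        return None
```

Because local entries shadow those published higher up, a site can override shared information without coordinating with the root.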

20 Higher-Level Data Grid Components
Two representative types of components:
• Replica management
• Replica selection

21 Replica Management
• The replica manager creates copies of file instances, or replicas, within specified storage systems
• A replica can offer better performance or availability for access to or from a particular location
• The replica manager maintains a repository, or catalog, of replicas
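Here is a minimal sketch of a replica manager layered on the core services: it copies a file instance into a specified storage system and records the new location in a catalog. The `ReplicaManager` class, its dictionary-based "storage systems", and the rule of copying from the lexicographically first known replica are all assumptions made for illustration.

```python
class ReplicaManager:
    """Hypothetical replica manager over in-memory 'storage systems'."""

    def __init__(self, systems: dict) -> None:
        self.systems = systems  # system name -> {path: file contents}
        self.catalog: dict = {}  # logical name -> set of (system, path)

    def register(self, logical: str, system: str, path: str) -> None:
        self.catalog.setdefault(logical, set()).add((system, path))

    def replicate(self, logical: str, target: str, path: str) -> None:
        sources = self.catalog.get(logical)
        if not sources:
            raise KeyError(f"no registered replica of {logical!r}")
        src_sys, src_path = sorted(sources)[0]  # any existing copy would do
        self.systems[target][path] = self.systems[src_sys][src_path]
        self.register(logical, target, path)  # keep the catalog consistent

    def locations(self, logical: str) -> set:
        return set(self.catalog.get(logical, set()))
```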

22 Replica Selection and Data Filtering
• Replica selection is a high-level service provided in the data grid
• It chooses among replicas to optimize criteria such as speed, cost, and security
• Replicas may be local or accessed remotely
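Replica selection can be sketched as a filter followed by a scoring step. The example assumes a simple weighted sum of estimated transfer time and access cost, after discarding replicas that fail a security requirement. The `Replica` fields, the weights, and the scoring formula are hypothetical, not the method from the slides. Letting the caller supply the weights keeps the selection policy in the caller's hands.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    location: str
    bandwidth_mbps: float  # estimated achievable bandwidth to the client
    cost_per_gb: float  # hypothetical access charge
    secure: bool  # whether the site meets the caller's security policy


def select_replica(replicas, size_gb, time_weight=1.0, cost_weight=1.0,
                   require_secure=False):
    """Return the replica with the lowest weighted time + cost score."""
    candidates = [r for r in replicas if r.secure or not require_secure]
    if not candidates:
        raise LookupError("no replica satisfies the security requirement")

    def score(r):
        seconds = size_gb * 8000 / r.bandwidth_mbps  # 1 GB = 8000 Mb (decimal)
        return time_weight * seconds + cost_weight * r.cost_per_gb * size_gb

    return min(candidates, key=score)
```

For a 10 GB file, a secure local replica at 1000 Mb/s and no charge scores 80. A faster remote replica at 10000 Mb/s and 0.5 per GB scores 8 + 5 = 13 and is chosen by default. The local replica wins when security is required or cost is weighted heavily.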

23 Summary
Architecture of the Data Grid
• Mechanism Neutrality
• Policy Neutrality
• Compatibility with Grid Infrastructure
• Uniformity of Information Infrastructure
Data Services
• Data Access
• Metadata Access
Replica Management

