Published by Gillian Adams. Modified over 9 years ago.
1
Virtual Data Grid Architecture
Ewa Deelman, Ian Foster, Carl Kesselman, Miron Livny
2
ARGONNE CHICAGO Ian Foster
GriPhyN Summary
The GriPhyN research agenda aims at IT advances that will enable groups of scientists distributed worldwide to harness Petascale processing, communication, and data resources to transform raw experimental data into scientific discoveries. The goals of the GriPhyN project are to achieve the fundamental IT advances required to realize Petascale Virtual Data Grids and to demonstrate, evaluate, and transfer these research results via the creation of a Virtual Data Toolkit to be used by the four major physics experiments and other projects.
3
Major Points
• Project has two complementary & supporting elements
  - IT research project: will be judged on contributions to knowledge
  - CS/application partnership: will also be judged on successful transfer to experiments
• Two associated unifying concepts
  - Virtual data as the central intellectual concept
  - Toolkit as a central deliverable and technology transfer vehicle
4
Virtual Data as a Key Intellectual Challenge and Unifying Concept
“These characteristics combine to enable the definition and delivery of a potentially unlimited virtual space of data products derived from other data. In this virtual space, requests can be satisfied via direct retrieval of materialized products and/or computation, with local and global resource management, policy, and security constraints determining the strategy used.”
5
Virtual Data (cont'd)
“The concept of virtual data recognizes that all except irreproducible raw experimental data need ‘exist’ physically only as the specification for how they may be derived. The grid may materialize zero, one, or many copies of derivable data depending on probable demand and the relative costs of computation, storage, and transport.”
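The trade-off in the quote above (demand versus the costs of computation, storage, and transport) can be illustrated with a small sketch. The cost model, the `DerivedProduct` fields, and the `should_materialize` function are all hypothetical and are not taken from the slides; the sketch only covers the "zero or one copy" case and assumes all costs are expressed in the same arbitrary unit.

```python
from dataclasses import dataclass


@dataclass
class DerivedProduct:
    """A derivable data product with illustrative (hypothetical) costs."""
    name: str
    compute_cost: float       # cost to derive the product once
    storage_cost: float       # cost to keep one copy for the planning period
    transfer_cost: float      # cost to ship the stored copy to one requester
    expected_requests: float  # probable demand over the planning period


def should_materialize(p: DerivedProduct) -> bool:
    """Return True if keeping one stored copy is cheaper than recomputing.

    Option A: recompute the product for every request.
    Option B: compute once, store the result, and transfer it per request.
    """
    recompute_each_time = p.expected_requests * p.compute_cost
    compute_once_and_store = (
        p.compute_cost + p.storage_cost + p.expected_requests * p.transfer_cost
    )
    return compute_once_and_store < recompute_each_time


# Hypothetical numbers: a frequently requested product and a rarely requested one.
popular = DerivedProduct("popular", compute_cost=10, storage_cost=5,
                         transfer_cost=1, expected_requests=20)
rare = DerivedProduct("rare", compute_cost=10, storage_cost=5,
                      transfer_cost=1, expected_requests=1)
print(should_materialize(popular), should_materialize(rare))
```

A fuller model would also choose how many copies to keep and where to place them; that is left out here to keep the comparison readable.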
6
(Simple) Virtual Data Example
• (LIGO) “Gravitational strain for 2 minutes around each of 200 gamma-ray bursts over the last year”
• For each requested data value, need to
  - Determine if it is materialized; if so, where; if not, how to compute it
  - Plan data movements and computations required to obtain all results
  - Execute this plan
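The three steps listed on this slide (determine, plan, execute) can be sketched as follows. This is a minimal illustration, not the GriPhyN design: the `Catalog` and `Derivation` classes, the product names, and the transform name `extract_windows` are assumptions invented for the example, and "execution" only records what would be done.

```python
from dataclasses import dataclass, field


@dataclass
class Derivation:
    """Hypothetical recipe: which transform produces a product, from which inputs."""
    transform: str
    inputs: list


@dataclass
class Catalog:
    """Hypothetical virtual data catalog."""
    replicas: dict = field(default_factory=dict)     # product -> list of locations
    derivations: dict = field(default_factory=dict)  # product -> Derivation


def plan(product, catalog, steps=None, seen=None):
    """Build an ordered list of fetch/compute steps for one requested product."""
    if steps is None:
        steps, seen = [], set()
    if product in seen:              # already planned as an input of another step
        return steps
    seen.add(product)

    locations = catalog.replicas.get(product)
    if locations:                    # materialized: retrieve an existing copy
        steps.append(("fetch", product, locations[0]))
        return steps

    recipe = catalog.derivations.get(product)
    if recipe is None:
        raise LookupError(f"{product} is neither materialized nor derivable")
    for name in recipe.inputs:       # not materialized: plan its inputs first
        plan(name, catalog, steps, seen)
    steps.append(("compute", product, recipe.transform))
    return steps


def execute(steps, catalog, site="local"):
    """Pretend to run the plan; newly computed products become materialized."""
    log = []
    for action, product, detail in steps:
        if action == "compute":
            catalog.replicas.setdefault(product, []).append(site)
        log.append(f"{action} {product} ({detail})")
    return log


# Hypothetical catalog loosely modeled on the LIGO request above.
catalog = Catalog(
    replicas={"raw_strain": ["archive"], "grb_list": ["archive"]},
    derivations={
        "strain_near_grb": Derivation("extract_windows", ["raw_strain", "grb_list"])
    },
)
first_plan = plan("strain_near_grb", catalog)
execute(first_plan, catalog)
second_plan = plan("strain_near_grb", catalog)  # now satisfied by direct retrieval
```

After the first execution the derived product is recorded as materialized, so the second request is planned as a single fetch rather than a recomputation.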
7
GriPhyN Goals
“Explore concept of virtual data and its applicability to data-intensive science,” i.e.,
1. Transparency with respect to location
  - Known concept; but how to realize in a large-scale, performance-oriented Data Grid?
2. Transparency with respect to materialization
  - To determine: is this useful?
3. Automated management of computation
  - Issues of scale, transparency
8
Primary GriPhyN R&D Components
[Figure: component diagram. Labels recovered from the slide:]
• Users: Production Team, Individual Investigator, Other Users
• Interactive User Tools
• Virtual Data Tools; Request Planning and Scheduling Tools; Request Execution Management Tools
• Resource Management Services; Security and Policy Services; Other Grid Services
• Transforms; Raw data source; Distributed resources (code, storage, computers, and network)
• Performance Estimation and Evaluation
9
Data Grid Reference Architecture: Purpose
• Identify primary components of a Data Grid architecture (part vocabulary, part requirements definition, part strategy)
• Suggest potential implementation approaches
• Identify principal areas in which uncertainty exists and hence research is required
10
Observations on Architecture
• We need an architecture so that we can
  - Coordinate our own activities
  - Coordinate with other Data Grid projects
  - Explain to others (experiments, NSF, CS community) what we are doing
• An architecture must:
  - Facilitate CS research activities by simplifying evaluation of alternatives
  - Not preclude experimentation with (radically) alternative approaches
11
Documents
• A Data Grid Reference Architecture
• Representing Virtual Data: A Catalog Architecture for Location and Materialization Transparency
• Virtual Data Research Challenges
• Requirements documents from CMS, LIGO, SDSS
12
Data Grid Reference Architecture
[Figure: architecture diagram. Labels recovered from the slide: User Applications; Request Formulation; Request Manager; Request Planner; Request Executor; Virtual Data Catalogs; Storage Systems; Code Repositories; Computers; Networks]
13
Relationship Between Components
[Figure: diagram relating Virtual Data, Data Grids, and Grids]
14
Layered Grid Architecture
[Figure: Grid layers shown alongside the Internet Protocol Architecture. Grid layers as listed on the slide:]
• Fabric: “Controlling things locally”: access to, & control of, resources
• Connectivity: “Talking to things”: communication (Internet protocols) & security
• Resource: “Sharing single resources”: negotiating access, controlling use
• Collective: “Managing multiple resources”: ubiquitous infrastructure services
• User: “Specialized services”: user- or application-specific distributed services
• Application
Internet Protocol Architecture layers: Application, Transport, Internet, Link
15
GriPhyN Data Grid Reference Architecture
[Figure: layered diagram. Labels recovered from the slide, grouped by layer:]
• Application: Discipline-Specific Data Grid Application
• Collective: Request Management, Catalogs, Replica Management, Community Policy, …
• Resource: Access to data, access to computers, access to network performance data, …
• Connectivity: Communication, service discovery (DNS), authentication, delegation
• Fabric: Storage Systems, Compute Systems, Networks, Code Repositories, …
16
Existing Components
• Globus Toolkit
  - MDS-2 information service: access to static & dynamic configuration & state information
  - GRAM resource access protocol
  - GridFTP data access and transfer protocol
  - Replica catalog, replica management
  - Grid Security Infrastructure: single sign-on
• Condor, Condor-G resource management
• SRB catalog services
17
Globus Data Grid Components
[Figure: data flow diagram. Labels recovered from the slide:]
• Components: Application; Metadata Catalog; Replica Catalog; Replica Selection; MDS; NWS
• Flows: Attribute Specification; Logical Collection and Logical File Name; Multiple Locations; Performance Information & Predictions; Selected Replica; GridFTP commands
• Storage: Replica Location 1, Replica Location 2, Replica Location 3 (Tape Library, Disk Cache, Disk Array, Disk Cache)
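One plausible reading of this diagram is a lookup chain: attributes select logical file names in a metadata catalog, a replica catalog maps each logical name to several physical locations, and performance predictions pick the replica to transfer. The sketch below illustrates that chain with plain dictionaries. It does not use any Globus, MDS, or NWS API; the function names, the `gsiftp://…example.org` URLs, and the bandwidth figures are all hypothetical.

```python
def find_logical_files(metadata_catalog, **attributes):
    """Return logical file names whose metadata match every given attribute."""
    return sorted(
        lfn
        for lfn, attrs in metadata_catalog.items()
        if all(attrs.get(key) == value for key, value in attributes.items())
    )


def select_replica(lfn, replica_catalog, predicted_mbps):
    """Pick the physical location with the highest predicted bandwidth."""
    locations = replica_catalog.get(lfn, [])
    if not locations:
        raise LookupError(f"no replica registered for {lfn}")
    # Locations without a prediction are treated as the slowest option.
    return max(locations, key=lambda url: predicted_mbps.get(url, 0.0))


# Hypothetical catalogs and predictions.
metadata_catalog = {
    "run42.dat": {"experiment": "CMS", "year": 2001},
    "run43.dat": {"experiment": "LIGO", "year": 2001},
}
replica_catalog = {
    "run42.dat": [
        "gsiftp://tape.example.org/run42.dat",
        "gsiftp://disk.example.org/run42.dat",
    ],
}
predicted_mbps = {
    "gsiftp://tape.example.org/run42.dat": 5.0,
    "gsiftp://disk.example.org/run42.dat": 80.0,
}

matches = find_logical_files(metadata_catalog, experiment="CMS")
chosen = select_replica(matches[0], replica_catalog, predicted_mbps)
```

In the diagram the selected replica would then be retrieved with GridFTP commands; the transfer itself is outside the scope of this sketch.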
18
Catalog Architecture
19
Short-Term (2001) Developments
• Deployment of, and experimentation with, basic tools: data movement, data location, computation management
  - Already started in CMS and LIGO
• Requirements definition for experiments
  - Already started with documents from CMS, LIGO, SDSS
• Virtual data catalog prototype
• Prototyping of other elements TBD
• Work breakdown with EDG, PPDG
20
Goals for this Meeting
1. Identify major areas in which the Data Grid Reference Architecture (DGRA) needs improvement
2. Identify how each CS research thrust contributes to this refinement process, and on what schedule
  • Research, software, and/or experiments
3. Identify how each application area will contribute to evaluating DGRA ideas
  • Experiments conducted