The Data Grid: Towards an Architecture for the Distributed Management and Analysis of Large Scientific Datasets
Caitlin Minteer & Kelly Clynes

The Data Grid
- Large dataset size
- Geographic distribution of users and resources
- Computationally intensive analysis
- No existing data management architecture meets the performance demands that this combination creates in large-scale application domains

The Data Grid
- Data grid applications must frequently operate in wide-area, multi-institutional, diverse environments

Design Architecture for the Data Grid
Mechanism Neutrality
- Designed to be as independent as possible of low-level mechanisms
- Achieved by defining interfaces that encapsulate the peculiarities of specific storage systems

Design Architecture for the Data Grid
Policy Neutrality
- Structured so that design decisions with significant performance implications are exposed to the user

Design Architecture for the Data Grid
Compatibility with Grid Infrastructure
- Takes advantage of fundamental Grid infrastructure
- Compatible with lower-level Grid mechanisms

Design Architecture for the Data Grid
Uniformity of Information Infrastructure
- The same data model and interface are used to access the grid's metadata

Design Architecture for the Data Grid
- These four principles lead to the development of a layered architecture
- Lower layers provide high-performance access to an orthogonal set of basic mechanisms
- In data grids, the focus on simple, policy-independent mechanisms will encourage and enable wide use without limiting the range of applications that can be built on them

Core Grid Data Services
Two fundamental services are required in the data grid architecture:
- Data Access
- Metadata Access

Data Access
- Provides mechanisms for accessing, managing, and initiating third-party transfers of data stored in storage systems

Metadata Access
- Provides mechanisms for accessing and managing information about data stored in storage systems

Data Abstraction: Storage System
- The basic grid component is the storage system, which provides functions for creating, destroying, reading, writing, and manipulating file instances
- File instances are the basic unit of information in a storage system
- A storage system can be implemented by any storage technology that supports the required access functions
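
The storage system abstraction on this slide can be pictured as a small interface. The sketch below is only an illustration: the class names `StorageSystem` and `InMemoryStorage`, the method names, and the in-memory backend are assumptions made for this example and do not come from the presentation or from any Grid toolkit.

```python
from abc import ABC, abstractmethod


class StorageSystem(ABC):
    """Hypothetical interface listing the access functions named on the slide."""

    @abstractmethod
    def create(self, name: str) -> None: ...

    @abstractmethod
    def destroy(self, name: str) -> None: ...

    @abstractmethod
    def read(self, name: str) -> bytes: ...

    @abstractmethod
    def write(self, name: str, data: bytes) -> None: ...


class InMemoryStorage(StorageSystem):
    """Toy backend: a dict stands in for a real storage technology."""

    def __init__(self) -> None:
        self._files = {}  # file instance name -> contents

    def create(self, name: str) -> None:
        if name in self._files:
            raise FileExistsError(name)
        self._files[name] = b""

    def destroy(self, name: str) -> None:
        del self._files[name]  # raises KeyError if the instance is unknown

    def read(self, name: str) -> bytes:
        return self._files[name]

    def write(self, name: str, data: bytes) -> None:
        if name not in self._files:
            raise FileNotFoundError(name)
        self._files[name] = data
```

Any other backend (a file system, a tape archive, a database) would qualify as a storage system in this sense as long as it implements the same four functions.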

Data Access
- Storage system access functions must be integrated with the security environment of each site to which remote access is required
- Applications should be able to give storage systems hints about access patterns, network performance, and so on, which the storage system can use to optimize performance
- Data movement functions must be able to detect and report errors
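
As a rough illustration of the last two points, the sketch below shows a data mover that accepts an application hint and returns a report instead of failing silently. Everything here is hypothetical: `AccessHints`, `TransferReport`, `third_party_transfer`, and the chunk sizes are invented for the example, and plain dicts stand in for the source and destination storage systems. Site security integration is not modeled.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class AccessHints:
    """Hypothetical hints an application might pass along with a request."""
    sequential: bool = True


@dataclass
class TransferReport:
    """Outcome of a transfer, including any errors that were detected."""
    ok: bool
    bytes_moved: int = 0
    errors: List[str] = field(default_factory=list)


def third_party_transfer(source: Dict[str, bytes], dest: Dict[str, bytes],
                         name: str,
                         hints: Optional[AccessHints] = None) -> TransferReport:
    hints = hints or AccessHints()
    if name not in source:
        return TransferReport(ok=False,
                              errors=[f"source has no file instance {name!r}"])
    data = source[name]
    # Assumption: a sequential-access hint lets this toy mover use larger chunks.
    chunk = 1024 if hints.sequential else 64
    buf = bytearray()
    for start in range(0, len(data), chunk):
        buf += data[start:start + chunk]
    dest[name] = bytes(buf)
    # Verify the copy so that corruption is reported rather than ignored.
    if dest[name] != data:
        return TransferReport(ok=False, bytes_moved=len(buf),
                              errors=["destination copy differs from source"])
    return TransferReport(ok=True, bytes_moved=len(buf))
```

Returning a report object rather than a bare boolean leaves the decision about how to react to a failure with the caller, which is in keeping with the policy neutrality principle described earlier.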

Metadata
- Needed for management of the data grid itself
- Covers information about file instances, the contents of file instances, and the various storage systems contained in the grid
- The metadata service provides a way to publish and access this information

Application Metadata
Describes the contents and structure of the data:
- Content represented by the file
- Circumstances under which the data was obtained
- Other information useful to applications that process the data

Replica Metadata
- Used to manage replication of data objects
- Includes information for mapping file instances to particular storage system locations
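
The mapping described here can be sketched as a small catalog. This is a minimal illustration, not the catalog of any real Grid toolkit: the class `ReplicaCatalog`, its method names, and the example site names are assumptions.

```python
from collections import defaultdict


class ReplicaCatalog:
    """Hypothetical catalog: file name -> set of (storage system, path) pairs."""

    def __init__(self):
        self._locations = defaultdict(set)

    def register(self, file_name, storage_system, path):
        self._locations[file_name].add((storage_system, path))

    def unregister(self, file_name, storage_system, path):
        self._locations[file_name].discard((storage_system, path))

    def lookup(self, file_name):
        # Sorted so that callers see a stable order.
        return sorted(self._locations.get(file_name, ()))


catalog = ReplicaCatalog()
catalog.register("run42.dat", "site-a", "/data/run42.dat")
catalog.register("run42.dat", "site-b", "/archive/2000/run42.dat")
locations = catalog.lookup("run42.dat")
```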

System Configuration Metadata
Describes the fabric of the grid itself, i.e. network connectivity and details about storage systems:
- Capacity
- Usage policy

Additional Requirements
The metadata service must:
- Operate efficiently in a distributed environment
- Be scalable
- Be robust
- Allow sites to assert local control over their information

Hierarchical Distributed System
Because of these requirements, the metadata service must be a hierarchical, distributed system in order to:
- Achieve scalability
- Avoid single points of failure
- Facilitate local control over data
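
One way to picture a hierarchical metadata service is as a tree in which each node owns its own entries and delegates everything else to its children. The sketch below runs in a single process, so it only illustrates the naming and delegation structure; in a real deployment each node would presumably be a separate server. `MetadataNode` and its methods are hypothetical names.

```python
class MetadataNode:
    """One node of a hypothetical metadata hierarchy (a site, an institution, ...)."""

    def __init__(self, name):
        self.name = name
        self.entries = {}    # metadata owned and controlled by this node
        self.children = {}   # sub-nodes, keyed by the next path component

    def _descend(self, path, create):
        node = self
        for component in path:
            if component not in node.children:
                if not create:
                    return None
                node.children[component] = MetadataNode(component)
            node = node.children[component]
        return node

    def publish(self, path, **attributes):
        # Store attributes at the node addressed by path, creating it if needed.
        self._descend(path, create=True).entries.update(attributes)

    def query(self, path):
        # Return a copy of the entries at path, or None if no such node exists.
        node = self._descend(path, create=False)
        return None if node is None else dict(node.entries)


root = MetadataNode("grid")
root.publish(["site-a", "run42.dat"], size_gb=10)
```

Because each subtree holds only its own entries, a site can manage its part of the hierarchy independently, and a query for one site never touches the entries of another.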

Higher-Level Data Grid Components
Two representative components:
- Replica management
- Replica selection

Replica Management
- The replica manager creates copies of file instances, or replicas, within specified storage systems
- Replicas offer better performance or availability for access to or from a particular location
- The replica manager maintains a repository, or catalog, of replicas
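
The two duties of the replica manager (copying a file instance and recording where the copies are) can be sketched together. This is a toy model under stated assumptions: `ReplicaManager` and `replicate` are hypothetical names, and dicts stand in for both the storage systems and the catalog.

```python
class ReplicaManager:
    """Hypothetical replica manager over dict-based stand-in storage systems."""

    def __init__(self, storage_systems):
        # storage system name -> {file name -> contents}
        self.storage_systems = storage_systems
        # file name -> set of storage system names that hold a copy
        self.catalog = {}

    def replicate(self, file_name, source, target):
        data = self.storage_systems[source][file_name]  # KeyError if missing
        self.storage_systems[target][file_name] = data
        # Record both locations only after the copy has been made.
        self.catalog.setdefault(file_name, set()).update({source, target})


systems = {"site-a": {"run42.dat": b"events"}, "site-b": {}}
manager = ReplicaManager(systems)
manager.replicate("run42.dat", source="site-a", target="site-b")
```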

Replica Selection and Data Filtering
- Replica selection is a high-level service provided in the data grid
- It chooses a replica that optimizes a desired performance criterion:
  - Speed
  - Cost
  - Security
- Replicas may be local or accessed remotely
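
A selection policy over the three criteria might look like the sketch below. The policy itself is an assumption made for illustration (filter on security and cost, then pick the fastest); the presentation does not prescribe one. `Replica`, `select_replica`, and all field names and numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Replica:
    location: str
    bandwidth_mbps: float  # estimated bandwidth to the requester
    cost_per_gb: float     # access charge, in arbitrary units
    secure: bool           # whether the transfer path meets security needs


def select_replica(replicas, size_gb, require_secure=False, max_cost=None):
    """Return the fastest replica that satisfies the constraints, or None."""
    candidates = [r for r in replicas if r.secure or not require_secure]
    if max_cost is not None:
        candidates = [r for r in candidates
                      if r.cost_per_gb * size_gb <= max_cost]
    if not candidates:
        return None
    # Estimated transfer time in seconds: gigabytes -> megabits, then / Mbps.
    return min(candidates, key=lambda r: size_gb * 8000 / r.bandwidth_mbps)


replicas = [
    Replica("local-disk", bandwidth_mbps=1000, cost_per_gb=0.0, secure=False),
    Replica("remote-archive", bandwidth_mbps=100, cost_per_gb=0.1, secure=True),
]
fastest = select_replica(replicas, size_gb=10)
secure_only = select_replica(replicas, size_gb=10, require_secure=True)
```

With these made-up numbers, `fastest` is the local replica and `secure_only` is the remote one. A different application could swap in a cost-first or security-first policy without changing the catalog or the storage interface.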

Summary
- Architecture of the Data Grid
  - Mechanism Neutrality
  - Policy Neutrality
  - Compatibility with Grid Infrastructure
  - Uniformity of Information Infrastructure
- Data Services
  - Data Access
  - Metadata Access
- Replica Management