A Replica Location Service

The Globus Project
USC Information Sciences Institute
Argonne National Laboratory

Motivation
- In a Data Grid, it may be desirable to create remote, read-only copies (replicas) of storage elements (files):
  - to reduce the latency of data accesses;
  - to increase robustness.
- We therefore need a mechanism for locating replicas.
- Replica Location Problem: given a unique logical identifier for data content, determine the physical locations of one or more copies of that content.
- Replica Location Service (RLS): a Data Grid component that maintains and provides access to information about the physical locations of copies.
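
To make the replica location problem concrete, the sketch below models one site's catalog as a mapping from a logical identifier (an LFN, or logical file name) to the set of physical file names (PFNs) of its copies. It is a minimal in-memory sketch, not the RLS interface: the class name `LocalReplicaCatalog`, its methods and the example names are all hypothetical.

```python
class LocalReplicaCatalog:
    """Hypothetical per-site catalog: LFN -> physical file names (PFNs)."""

    def __init__(self) -> None:
        self._mappings: dict[str, set[str]] = {}

    def add(self, lfn: str, pfn: str) -> None:
        # Register one physical copy of the logical content.
        self._mappings.setdefault(lfn, set()).add(pfn)

    def remove(self, lfn: str, pfn: str) -> None:
        # Forget a copy, e.g. after the site deletes the replica.
        pfns = self._mappings.get(lfn)
        if pfns is not None:
            pfns.discard(pfn)
            if not pfns:
                del self._mappings[lfn]

    def lookup(self, lfn: str) -> list[str]:
        # The replica location query: given an LFN, list its known copies.
        return sorted(self._mappings.get(lfn, set()))


# Usage with made-up names:
lrc = LocalReplicaCatalog()
lrc.add("lfn://example/run42/events.dat", "gsiftp://site-a.example.org/data/events.dat")
lrc.add("lfn://example/run42/events.dat", "file:///scratch/events.dat")
print(lrc.lookup("lfn://example/run42/events.dat"))
```

Keeping a set per LFN means registering the same copy twice is harmless, and returning a sorted list gives callers a stable order.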

A Replica Location Service Framework
- Applications may operate at different scales, have different resources, and have different tolerances to inconsistent RLS information.
- We define a flexible RLS framework that allows users to make tradeoffs among consistency, space overhead, reliability, update costs and query costs.
- Through different combinations of five essential elements, the framework supports a variety of RLS designs.

RLS Requirements
- Support read-only files; mutable files require greater consistency and must use a separate mechanism.
- Scale of a Data Grid (e.g., High Energy Physics):
  - 200 replica sites
  - 50 million logical files in total
  - 500 million physical files (replicas) in total
  - 20 million physical files at a replica site
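
Taken at face value, the scale figures imply a few averages worth keeping in mind. The short calculation below only divides the slide's numbers: it gives 10 replicas per logical file and 2.5 million physical files per site on average. The 20 million figure is eight times that per-site average, so reading it as describing the larger sites is an interpretation, not something the slide states.

```python
# Scale figures from the slide (High Energy Physics example).
replica_sites = 200
logical_files = 50_000_000
physical_files = 500_000_000
files_at_a_site = 20_000_000

avg_replicas_per_lfn = physical_files / logical_files  # copies per logical file
avg_files_per_site = physical_files / replica_sites    # physical files per site
site_vs_average = files_at_a_site / avg_files_per_site  # how far 20M exceeds the mean

print(avg_replicas_per_lfn, avg_files_per_site, site_vs_average)
```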

RLS Requirements (cont.)
Data Grid performance (e.g., High Energy Physics):
- Average query response time: 10 milliseconds
- Maximum query response time: 5 seconds
- Maximum query rate: 10 to 100 per second
- Maximum update/insertion rate: 5 to 20 per second
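
One way to read the performance targets together is Little's law (mean number of requests in progress = arrival rate times mean time in the system). This is standard queueing theory, not something the slides invoke, and it assumes a steady state. Applied to the stated 10 ms average response time, it suggests roughly 0.1 to 1 query in progress at a time across the 10 to 100 per second range.

```python
# Performance targets from the slide (High Energy Physics example).
avg_response_s = 10 / 1000      # 10 ms average query response time
query_rates_per_s = (10, 100)   # range of maximum query rates

# Little's law: mean queries in progress = arrival rate * mean response time.
queries_in_progress = [rate * avg_response_s for rate in query_rates_per_s]
print(queries_in_progress)
```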

RLS Requirements (cont.)
Security issues:
- Authorization: verify that users are allowed to perform the requested operations.
- Privacy: knowledge of the existence, location and content of data must be controlled.
- Integrity: prevent an adversary from tampering with the replica location results returned by RLS queries.
- The RLS protects information about the existence and location of data.
- Individual storage systems protect the privacy and integrity of data contents.

RLS Requirements (cont.)
Consistency:
- Relaxed consistency: the RLS is not required to maintain strict consistency.
- Strict consistency would require that the RLS always return a complete and accurate list of the copies of the specified content.
- This is difficult or impossible to achieve in a Grid: local sites may delete replicas or become disconnected without warning.

RLS Requirements (cont.)
Reliability:
- No single point of failure: the failure or inaccessibility of any one RLS site must not render the entire service inoperable.
- Decoupling of local and global state: failure or inaccessibility of remote RLS components should not affect local access to local replicas.
- Checksums

A Flexible RLS Framework
Five essential elements:
- Reliable local state
- Unreliable global state
- Soft state mechanisms for maintaining global state
- Compression of state updates
- Membership protocol
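
The soft state element can be illustrated with a global index whose entries expire unless they are refreshed. In the sketch below, which is an assumption about how such a mechanism might look rather than the framework's specified protocol, each local site periodically reports the LFNs it holds, and the index forgets any claim not renewed within a timeout. A site that disconnects without warning therefore ages out of the global state on its own. The class name, the timeout and the injectable clock are hypothetical.

```python
import time
from typing import Callable


class SoftStateIndex:
    """Hypothetical global index holding soft state: an entry saying that an
    LRC holds an LFN lives only until `timeout` seconds after its last refresh."""

    def __init__(self, timeout: float, clock: Callable[[], float] = time.monotonic) -> None:
        self.timeout = timeout
        self.clock = clock
        self._deadlines: dict[str, dict[str, float]] = {}  # LFN -> {LRC: deadline}

    def update(self, lrc: str, lfns: list[str]) -> None:
        # A periodic update from one LRC renews every LFN it lists.
        deadline = self.clock() + self.timeout
        for lfn in lfns:
            self._deadlines.setdefault(lfn, {})[lrc] = deadline

    def query(self, lfn: str) -> list[str]:
        # Drop expired claims lazily, then report the LRCs still believed
        # to hold the LFN (they may still be wrong: consistency is relaxed).
        now = self.clock()
        holders = self._deadlines.get(lfn, {})
        for lrc in [name for name, deadline in holders.items() if deadline <= now]:
            del holders[lrc]
        return sorted(holders)


# Usage with a fake clock so the expiry is visible:
now = [0.0]
index = SoftStateIndex(timeout=30.0, clock=lambda: now[0])
index.update("lrc-a", ["lfn-x"])
index.update("lrc-b", ["lfn-x"])
now[0] = 20.0
index.update("lrc-b", ["lfn-x"])  # only lrc-b refreshes
now[0] = 40.0
print(index.query("lfn-x"))       # ['lrc-b']: lrc-a has aged out
```

Expiring lazily at query time keeps the sketch short; a long-running index would more likely sweep expired entries on a timer.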

Example 1: A Centralized, Nonredundant Global Index
- All updates are sent to a single, centralized GRIN.
- Not scalable: all queries are serviced by a single index.
- Not reliable: the index is a single point of failure.

Example 2: An RLS with LFN Partitioning, Redundancy and Bloom Filter Compression
- Updates are sent to specific, redundant GRINs chosen according to the LFN.
- More scalable and more reliable than a single index.
- Limited storage and communication costs.
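
Example 2 combines two ideas that can be sketched separately: choosing which redundant GRINs receive updates for an LFN by hashing the LFN, and compressing each update into a Bloom filter (a bit array that answers "possibly present" or "definitely absent"). The sketch assumes SHA-256 for hashing, a redundancy of 2, and filters of 8192 bits with 4 hash functions; none of these parameters, nor the function and class names, come from the slides.

```python
import hashlib


def _hash64(text: str) -> int:
    # First 8 bytes of SHA-256 as an unsigned integer.
    return int.from_bytes(hashlib.sha256(text.encode("utf-8")).digest()[:8], "big")


def grins_for(lfn: str, grins: list[str], redundancy: int = 2) -> list[str]:
    """Hypothetical partitioning rule: hash the LFN to a starting GRIN and
    also use the following ones, giving `redundancy` distinct index nodes."""
    start = _hash64(lfn) % len(grins)
    return [grins[(start + i) % len(grins)] for i in range(min(redundancy, len(grins)))]


class BloomFilter:
    """Lossy set summary: no false negatives, occasional false positives."""

    def __init__(self, num_bits: int = 8192, num_hashes: int = 4) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: str) -> list[int]:
        # Double hashing: derive k bit positions from two 64-bit hash values.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: str) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


def build_updates(lfns: list[str], grins: list[str]) -> dict[str, BloomFilter]:
    """One compressed update per GRIN, covering only the LFNs it indexes."""
    updates = {g: BloomFilter() for g in grins}
    for lfn in lfns:
        for g in grins_for(lfn, grins):
            updates[g].add(lfn)
    return updates


# Usage with made-up names: one LRC summarizing three files for three GRINs.
grins = ["grin-0", "grin-1", "grin-2"]
names = ["lfn-a", "lfn-b", "lfn-c"]
updates = build_updates(names, grins)
print({g: [n for n in names if f.might_contain(n)] for g, f in updates.items()})
```

Because a Bloom filter can report false positives, a GRIN built this way may occasionally direct a query to a site that does not hold the file, which the relaxed consistency requirement already allows for.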

Example 3: An RLS with Redundancy, Compression and Partitioning of Logical Collections
- Collection information, rather than per-file information, is sent to the GRINs (a lossy form of compression).
- Advantage: the index can be partitioned intelligently based on file contents or on creation or access patterns.
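
The collection-based scheme can be sketched as a GRIN that records which sites hold which logical collections, plus a hand-written table that assigns collections to GRINs. Everything here is hypothetical: the naming convention that makes an LFN's directory part its collection, the partition table grouping collections by experiment, and all names.

```python
def collection_of(lfn: str) -> str:
    # Hypothetical naming convention: an LFN's collection is its directory part.
    return lfn.rsplit("/", 1)[0]


class CollectionIndex:
    """Hypothetical GRIN that indexes logical collections instead of files."""

    def __init__(self) -> None:
        self._holders: dict[str, set[str]] = {}  # collection -> LRC names

    def update(self, lrc: str, collections: list[str]) -> None:
        # The LRC reports only which collections it holds, not each file.
        for coll in collections:
            self._holders.setdefault(coll, set()).add(lrc)

    def query(self, lfn: str) -> list[str]:
        # Lossy answer: every LRC holding the collection is returned, even
        # one that holds only part of it.
        return sorted(self._holders.get(collection_of(lfn), set()))


# Hypothetical partitioning table: collections are assigned to GRINs by
# content (here, by experiment) rather than by hashing individual LFNs.
partition = {"cms/run42": "grin-cms", "atlas/run7": "grin-atlas"}
grins = {name: CollectionIndex() for name in set(partition.values())}


def send_update(lrc: str, collections: list[str]) -> None:
    # Route each collection summary to the GRIN responsible for it.
    for coll in collections:
        grins[partition[coll]].update(lrc, [coll])


def locate(lfn: str) -> list[str]:
    return grins[partition[collection_of(lfn)]].query(lfn)


send_update("lrc-a", ["cms/run42"])
send_update("lrc-b", ["cms/run42", "atlas/run7"])
print(locate("cms/run42/events.dat"))  # ['lrc-a', 'lrc-b']
```

The update traffic now scales with the number of collections rather than files, at the cost of answers that name every site holding the collection.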

Example 4: Hierarchical Index with Partitioning, Bloom Compression and Redundancy
- GRINs can exchange soft state updates with one another.
- This allows a large variety of global index configurations.
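
A hierarchical index can be sketched as GRINs that forward summaries of what they know to a parent GRIN, so a query at the top descends only into the children that reported the LFN. This is a minimal sketch under stated assumptions: summaries are sent as plain sets rather than Bloom filters, soft state expiry is left out, and the class and method names are hypothetical.

```python
from collections.abc import Iterable


class IndexNode:
    """Hypothetical GRIN in a hierarchy. For each LFN it records who reported
    it: an LRC name at the bottom level, or a child GRIN's name higher up."""

    def __init__(self, name: str, children: Iterable["IndexNode"] = ()) -> None:
        self.name = name
        self.children = {child.name: child for child in children}
        self._reported_by: dict[str, set[str]] = {}

    def receive(self, sender: str, lfns: set[str]) -> None:
        # A state update from an LRC or from a child GRIN.
        for lfn in lfns:
            self._reported_by.setdefault(lfn, set()).add(sender)

    def known_lfns(self) -> set[str]:
        # The summary forwarded upward; a compressed form such as a Bloom
        # filter could be sent instead of the raw set.
        return set(self._reported_by)

    def refresh_from_children(self) -> None:
        # One round of updates flowing up the hierarchy (expiry of stale
        # entries is omitted to keep the sketch short).
        for child in self.children.values():
            child.refresh_from_children()
            self.receive(child.name, child.known_lfns())

    def query(self, lfn: str) -> list[str]:
        # Follow child GRINs that reported the LFN; other senders are LRCs.
        found: set[str] = set()
        for sender in self._reported_by.get(lfn, set()):
            child = self.children.get(sender)
            found.update(child.query(lfn) if child else [sender])
        return sorted(found)


# Usage with made-up names: two leaf GRINs under one root.
leaf1, leaf2 = IndexNode("grin-1"), IndexNode("grin-2")
root = IndexNode("grin-root", children=[leaf1, leaf2])
leaf1.receive("lrc-a", {"lfn-x", "lfn-y"})
leaf2.receive("lrc-b", {"lfn-y"})
root.refresh_from_children()
print(root.query("lfn-y"))  # ['lrc-a', 'lrc-b']
```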