Presenter: Dipesh Gautam. Introduction; Why Data Grid?; High Level View; Design Considerations; Data Grid Services; Topology; Grids and Cloud.

Presentation transcript:

Presenter: Dipesh Gautam

• Introduction
• Why Data Grid?
• High Level View
• Design Considerations
• Data Grid Services
• Topology
• Grids and Cloud
• Convergence of Grid and Cloud
• Vertical RDBMS
• Benefits of column-oriented layout

• Data Grid: an architecture or set of services that gives individuals or groups of users the ability to access and operate on large amounts of geographically distributed data.
• The data may be replicated throughout the grid, outside its original administrative domain.
• Interaction between users and the data is handled and controlled by the data grid middleware.

• Large dataset sizes
• Geographic distribution of users and resources
• Computationally intensive analysis
• No other existing architecture lets us apply these technologies in large-scale application domains


• Mechanism neutrality
  ◦ Designed to be as independent as possible of low-level mechanisms
  ◦ Defines interfaces that encapsulate the peculiarities of specific storage systems
• Compatibility with Grid infrastructure
  ◦ Takes advantage of fundamental Grid infrastructure
  ◦ Compatible with lower-level Grid mechanisms
• Uniformity of information infrastructure
  ◦ The same data model and interface are used to access the grid's metadata

• The middleware provides the following services:
  ◦ Universal namespace
  ◦ Data transport service
  ◦ Data access service
  ◦ Data replication service
  ◦ Resource management system (RMS)

• A number of systems and networks are connected within a grid
• The separate systems within the grid use different file naming conventions
• Physical file names alone do not solve the problem of locating the data
• The universal namespace provides logical file names
• The Storage Resource Broker provides a service that maps between logical and physical file names
• When a logical file name is requested, all matching physical file names are returned and the end user chooses the appropriate replica
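The logical-to-physical mapping described above can be sketched as a small in-memory catalog. This is only an illustration under stated assumptions: the `ReplicaCatalog` class, the `lfn:` prefix, and the `example.org` URLs are hypothetical and are not the Storage Resource Broker's actual interface.

```python
from collections import defaultdict


class ReplicaCatalog:
    """Toy catalog mapping one logical file name to many physical replicas."""

    def __init__(self):
        self._replicas = defaultdict(list)

    def register(self, logical_name, physical_name):
        """Record that physical_name holds a copy of logical_name."""
        self._replicas[logical_name].append(physical_name)

    def lookup(self, logical_name):
        """Return every known physical replica for a logical name."""
        return list(self._replicas.get(logical_name, []))


catalog = ReplicaCatalog()
# Hypothetical sites with different naming conventions for the same data.
catalog.register("lfn:/climate/run42.dat", "gsiftp://site-a.example.org/data/run42.dat")
catalog.register("lfn:/climate/run42.dat", "gsiftp://site-b.example.org/store/0042.bin")

replicas = catalog.lookup("lfn:/climate/run42.dat")
chosen = replicas[0]  # the end user (or a policy) picks one replica
```

A real catalog would also track replica location and freshness so that the choice of replica can be informed rather than arbitrary.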

• The data transport service is the middleware service for data transfer
• Treating each requested transfer as atomic makes the service fault tolerant
  ◦ Data transfer is resumed after each interruption until all requested data is received
  ◦ Several recovery strategies are possible:
    ▪ Restarting the entire transmission from the beginning
    ▪ Resuming from the point of interruption, e.g., GridFTP sends data from the last acknowledged byte instead of restarting the entire transfer
• Provides low-level access and connections between hosts for file transfer
• Provides I/O functions that let users see remote files as if they were local to their system
• Provides a high-level abstraction of data access and transfer between different systems, hiding the complexity and presenting the user with a unified data source
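The "resume from the point of interruption" strategy can be sketched with a simulated transfer. This is a minimal sketch, not GridFTP itself: the function names, the `fail_at` parameter, and the use of `len(dest)` as the last acknowledged byte are assumptions made for illustration.

```python
def transfer(source: bytes, dest: bytearray, chunk_size: int, fail_at=None) -> int:
    """Copy source into dest, starting at the last acknowledged byte.

    len(dest) stands in for the last acknowledged offset. If fail_at is
    set, a simulated interruption is raised once that offset is reached.
    """
    offset = len(dest)
    while offset < len(source):
        if fail_at is not None and offset >= fail_at:
            raise ConnectionError(f"interrupted at byte {offset}")
        chunk = source[offset:offset + chunk_size]
        dest.extend(chunk)
        offset += len(chunk)
    return offset


def transfer_until_complete(source: bytes, chunk_size: int = 4, failures=(8,)):
    """Retry after each interruption until all requested data is received."""
    dest = bytearray()
    pending = list(failures)  # offsets at which to simulate an interruption
    attempts = 0
    while len(dest) < len(source):
        attempts += 1
        fail_at = pending.pop(0) if pending else None
        try:
            transfer(source, dest, chunk_size, fail_at)
        except ConnectionError:
            continue  # resume from len(dest) instead of starting over
    return bytes(dest), attempts


payload = b"0123456789abcdefghij"
received, attempts = transfer_until_complete(payload)
```

Restarting from the beginning would correspond to clearing `dest` in the `except` branch; resuming only costs the bytes that were not yet acknowledged.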

• The data access service works with the data transport service to provide security, access control, and management of data transfers within the grid
• Provides a security service to authenticate users
• Provides an authorization service to control access, ranging from simple file permissions to Access Control Lists (ACLs) and role-based access control
• Provides an encryption service to protect the confidentiality of data in transit (e.g., SSL)
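The authorization options listed above (ACLs and role-based access control) can be illustrated with a small check. All names, users, and rules here are hypothetical, and for brevity the sketch lets a role grant its permissions on every resource, which a real grid middleware would scope more carefully.

```python
# Hypothetical ACL: resource -> {user: set of permitted actions}
ACL = {
    "lfn:/climate/run42.dat": {"alice": {"read", "write"}, "bob": {"read"}},
}

# Hypothetical role-based rules: role -> actions, and user -> roles
ROLE_PERMISSIONS = {"curator": {"read", "write"}, "guest": {"read"}}
USER_ROLES = {"carol": {"curator"}}


def is_authorized(user: str, resource: str, action: str) -> bool:
    """Allow the action if the ACL or any of the user's roles grants it."""
    if action in ACL.get(resource, {}).get(user, set()):
        return True
    return any(
        action in ROLE_PERMISSIONS.get(role, set())
        for role in USER_ROLES.get(user, set())
    )
```

Authentication (establishing who the user is) and transport encryption are separate steps that would happen before and around this check.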

• Why replication?
  ◦ Scalability
  ◦ Fast access
  ◦ User collaboration
• Replicas are often placed close to the sites where users need them
• Replication is controlled by a replica management system
• The replica management system determines where replicas are needed based on user requests
• Replicas are kept up to date by propagating changes made at one node to the other nodes in the grid

• Centralized update model: a single master replica updates all others
• Decentralized update model: all peers update each other
• The topology of node placement influences the update strategy
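The two update models can be sketched with one toy `Node` class whose only difference between models is how the `peers` lists are wired. The class, the version numbers, and the push-based propagation are assumptions for illustration, not a description of any particular replica manager.

```python
class Node:
    """Toy replica holder that pushes accepted updates to its peers."""

    def __init__(self, name):
        self.name = name
        self.data = {}   # key -> (version, value)
        self.peers = []  # nodes this node pushes updates to

    def apply(self, key, value, version):
        """Accept an update if it is newer, then forward it to every peer."""
        current = self.data.get(key)
        if current is not None and current[0] >= version:
            return  # already up to date; this also stops forwarding loops
        self.data[key] = (version, value)
        for peer in self.peers:
            peer.apply(key, value, version)


# Centralized model: only the master pushes; replicas have no peers.
master, r1, r2 = Node("master"), Node("r1"), Node("r2")
master.peers = [r1, r2]
master.apply("run42", "v1", version=1)

# Decentralized model: every peer pushes to every other peer.
a, b, c = Node("a"), Node("b"), Node("c")
for node in (a, b, c):
    node.peers = [p for p in (a, b, c) if p is not node]
b.apply("run42", "v1", version=1)  # an update may start at any peer
```

In the centralized wiring a write must enter through the master, while in the decentralized wiring the version check is what keeps peers from forwarding the same update forever.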

• Static replication
  ◦ Uses a fixed set of replica nodes, with no dynamic changes to the files being replicated
• Dynamic replication
  ◦ Based on the popularity of the data
  ◦ If the number of requests exceeds the replication threshold, a replica is placed on the server that directly serves the client, provided that storage is available
  ◦ Replicas that have a null access value are deleted dynamically
• Adaptive replication
  ◦ The dynamic threshold is computed from the request arrival rates from clients over a period of time
  ◦ Replicas that fall below the threshold and were not created in the current replication interval can be removed
• Fair-share replication
  ◦ Based on the access load and storage load of the candidate servers
  ◦ The server with the lower access load is selected for replication, because placing the replica on a server with a higher access load degrades performance for all clients
  ◦ Among candidate servers with the same access load, the server with the lower storage load is selected
• Many other replica placement strategies exist
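The threshold test of dynamic replication and the server choice of fair-share replication can be sketched as follows. The `Server` fields, their units, and the function names are assumptions for illustration; the slides do not define how the loads are measured.

```python
from dataclasses import dataclass


@dataclass
class Server:
    name: str
    access_load: int     # assumed unit: requests currently being served
    storage_load: float  # assumed unit: fraction of storage in use


def should_replicate(request_count: int, threshold: int) -> bool:
    """Dynamic replication: replicate once requests exceed the threshold."""
    return request_count > threshold


def pick_fair_share(candidates):
    """Fair-share: lowest access load wins; ties go to lowest storage load."""
    return min(candidates, key=lambda s: (s.access_load, s.storage_load))


candidates = [
    Server("site-a", access_load=10, storage_load=0.5),
    Server("site-b", access_load=5, storage_load=0.9),
    Server("site-c", access_load=5, storage_load=0.2),
]
target = pick_fair_share(candidates)
```

Sorting on the tuple `(access_load, storage_load)` encodes the two-level rule directly: storage load is only consulted when access loads are equal.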

• The resource management system is the core functionality of the data grid
• Manages all actions related to storage resources
• Fulfils user and application requests for data resources based on the type of request and on policies
• Schedules the creation of replicas
• Enforces policy and security across data grid resources, including authentication, authorization, and access control, so that systems with different administrative policies can interoperate
• Enforces system fault tolerance and stability requirements

• Various topologies have been used to address the needs of the scientific community
• Four major types of topology
  ◦ Federation topology
  ◦ Monadic topology
  ◦ Hierarchical topology
  ◦ Hybrid topology

• Federation topology: lets each institution retain control over its data
• An institution that receives a request from an authorized institution decides whether to send the data to the requester
• The federation can be loosely or tightly integrated
• Preferred by institutions that wish to share data from already existing systems

• Monadic topology: all collected data is fed into a central repository
• The central repository responds to all queries for data
• There are no replicas in this topology
• Well suited to cases where all access to the data is local or within a single region with high-speed connectivity

• Hierarchical topology: suited to collaborations that distribute data from a single source to multiple locations around the world

• Hybrid topology: any combination of the other topologies
• Suited to researchers who want to share their project results and make them readily available for collaboration, in order to further research

• Grid
  ◦ "Grid" refers to distributed computing in science and engineering
  ◦ In grid computing, virtual organizations share computer resources over a network
  ◦ Scientific research, collaboration
  ◦ Shares local resources
  ◦ Heterogeneous, real resources
  ◦ Geographically distributed, locally owned and managed
• Cloud
  ◦ "Cloud" refers to a computer network in the context of network management
  ◦ In cloud computing, anybody can access data and compute services over the internet
  ◦ Web services, business apps
  ◦ Makes huge data centers available
  ◦ Homogeneous, virtualized resources
  ◦ Geographically distributed, centrally owned and managed

• Users should consider the interoperability standards among grid and cloud service providers
• Interoperating clouds look like a grid

• Column-oriented DBMS
  ◦ Stores data column-wise instead of row-wise
  ◦ Example table:
      EmpId  Lastname  Firstname  Salary
      1      Smith     Joe        40000
      2      Jones     Mary       50000
      3      Johnson   Cathy      44000
  ◦ In a row-oriented DBMS, the values of each row are serialized together and stored as: 1, Smith, Joe, 40000; 2, Jones, Mary, 50000; 3, Johnson, Cathy, 44000;
  ◦ In a column-oriented DBMS, the values of each column are serialized together: 1, 2, 3; Smith, Jones, Johnson; Joe, Mary, Cathy; 40000, 50000, 44000;
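The two serializations of the three-employee example can be reproduced with a short sketch. The function names are hypothetical and the string output only mirrors the slide's notation; a real DBMS stores binary pages, not comma-separated text.

```python
rows = [
    (1, "Smith", "Joe", 40000),
    (2, "Jones", "Mary", 50000),
    (3, "Johnson", "Cathy", 44000),
]


def serialize_row_oriented(rows):
    """Keep the values of each row next to each other."""
    return " ".join(", ".join(str(v) for v in row) + ";" for row in rows)


def serialize_column_oriented(rows):
    """Transpose first, so the values of each column sit next to each other."""
    return " ".join(", ".join(str(v) for v in col) + ";" for col in zip(*rows))


row_layout = serialize_row_oriented(rows)
column_layout = serialize_column_oriented(rows)
```

The only difference between the two functions is the `zip(*rows)` transpose, which is the whole idea of a column-oriented layout.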

• Efficient when an aggregate must be computed over many rows but only over a notably smaller subset of the columns
• Efficient for writing a column when new values of that column are supplied for all rows at once
• Suited to Online Analytical Processing (OLAP)-like workloads, which involve a smaller number of highly complex queries over all of the data, possibly terabytes in size
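The first two benefits can be illustrated on the same three-employee data. This is a sketch under assumptions: the dictionary-of-lists `column_store` and the helper names are hypothetical stand-ins for a columnar storage engine.

```python
# Row layout: each record is stored together.
rows = [
    (1, "Smith", "Joe", 40000),
    (2, "Jones", "Mary", 50000),
    (3, "Johnson", "Cathy", 44000),
]

# Column layout: each column is stored together.
column_store = {
    "EmpId": [1, 2, 3],
    "Lastname": ["Smith", "Jones", "Johnson"],
    "Firstname": ["Joe", "Mary", "Cathy"],
    "Salary": [40000, 50000, 44000],
}


def total_salary_rows(rows):
    """Row layout: every full record is visited to read one field."""
    return sum(row[3] for row in rows)


def total_salary_columns(store):
    """Column layout: only the Salary column is read."""
    return sum(store["Salary"])


def replace_column(store, name, values):
    """Write new values of one column for all rows at once."""
    if len(values) != len(store[name]):
        raise ValueError("one value per row is required")
    store[name] = list(values)


total_from_rows = total_salary_rows(rows)
total_from_columns = total_salary_columns(column_store)
replace_column(column_store, "Salary", [41000, 51000, 45000])
```

Both totals agree, but the columnar version never touches the name columns, which is where the savings come from once tables have many columns and rows.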
