Distributed Databases and Query Processing. Distributed DB’s vs. Parallel DB’s Many autonomous processors that may participate in database operations.

Slides:



Advertisements
Similar presentations
Database Systems: Design, Implementation, and Management
Advertisements

Orange complete software solution for management of F&B HEAD OFFICE orange complete software solution for management of F&B SIS TECHNOLOGY 2012 WHITEPAPER.
Chapter 10: Designing Databases
V. Megalooikonomou Distributed Databases (based on notes by Silberchatz,Korth, and Sudarshan and notes by C. Faloutsos at CMU) Temple University – CIS.
ISOM Distributed Databases Arijit Sengupta. ISOM Learning Objectives Understand the concept and necessity of distributed databases Understand the types.
MIS 325 PSCJ. 2  Business processes can be quite complex  Process model: any abstract representation of a process  Process-modeling tools provide a.
Distributed Databases John Ortiz. Lecture 24Distributed Databases2  Distributed Database (DDB) is a collection of interrelated databases interconnected.
Peer-to-Peer Distributed Search. Peer-to-Peer Networks A pure peer-to-peer network is a collection of nodes or peers that: 1.Are autonomous: participants.
Transaction.
Advanced Database Systems September 2013 Dr. Fatemeh Ahmadi-Abkenari 1.
Distributed Locking. Distributed Locking (No Replication) Assumptions Lock tables are managed by individual sites. The component of a transaction at a.
Distributed Databases Logical next step in geographically dispersed organisations goal is to provide location transparency starting point = a set of decentralised.
ABCSG - Distributed Database 1 Data Management Distributed Database Data Replication.
1 Distributed Databases Chapter Two Types of Applications that Access Distributed Databases The application accesses data at the level of SQL statements.
Copyright © 2007 Ramez Elmasri and Shamkant B. Navathe Slide
Parallel Algorithms for Relational Operations. Models of Parallelism There is a collection of processors. –Often the number of processors p is large,
Parallel Algorithms for Relational Operations. Many processors...and disks There is a collection of processors. –Often the number of processors p is large,
 You will be able to: › Explain how databases may be stored in more than one physical location and how distribution may be carried out using different.
Integration of Applications MIS3502: Application Integration and Evaluation Paul Weinberg Adapted from material by Arnold Kurtz, David.
1 Distributed Databases Chapter What is a Distributed Database? Database whose relations reside on different sites Database some of whose relations.
©Silberschatz, Korth and Sudarshan18.1Database System Concepts Centralized Systems Run on a single computer system and do not interact with other computer.
Definition of terms Definition of terms Explain business conditions driving distributed databases Explain business conditions driving distributed databases.
Distributed Databases
WPI, MOHAMED ELTABAKH PARALLEL & DISTRIBUTED DATABASES 1.
Distributed Databases
04/20/2005Yan Huang - CSCI5330 Database Implementation – Distributed Database Systems Distributed Database Systems.
1 Distributed and Parallel Databases. 2 Distributed Databases Distributed Systems goal: –to offer local DB autonomy at geographically distributed locations.
G041: Lecture 11 Further Impacts of ICT Mr C Johnston ICT Teacher
DBS201: Introduction to Database Design
12 1 Chapter 12 Distributed Database Management Systems Database Systems: Design, Implementation, and Management, Seventh Edition, Rob and Coronel.
Module Title? DBMS Introduction to Database Management System.
Database Design – Lecture 16
Chapter 1 Introduction to Databases Pearson Education ©
04/18/2005Yan Huang - CSCI5330 Database Implementation – Distributed Database Systems Distributed Database Systems.
CORE 2: Information systems and Databases CENTRALISED AND DISTRIBUTED DATABASES.
Session-9 Data Management for Decision Support
Massively Distributed Database Systems - Distributed DBS Spring 2014 Ki-Joune Li Pusan National University.
Database Systems: Design, Implementation, and Management Ninth Edition Chapter 12 Distributed Database Management Systems.
The Worlds of Database Systems From: Ch. 1 of A First Course in Database Systems, by J. D. Pullman and H. Widom.
G063 - Distributed Databases. Learning Objectives: By the end of this topic you should be able to: explain how databases may be stored in more than one.
Distributed Database. Introduction A major motivation behind the development of database systems is the desire to integrate the operational data of an.
Ch3 Data Warehouse Dr. Bernard Chen Ph.D. University of Central Arkansas Fall 2009.
DISTRIBUTED DATABASES JORGE POMBAR. Overview Most businesses need to support databases at multiple sites. Most businesses need to support databases at.
XML Engr. Faisal ur Rehman CE-105T Spring Definition XML-EXTENSIBLE MARKUP LANGUAGE: provides a format for describing data. Facilitates the Precise.
Lecture 15- Parallel Databases (continued) Advanced Databases Masood Niazi Torshiz Islamic Azad University- Mashhad Branch
CS338Parallel and Distributed Databases11-1 Parallel and Distributed Databases Lecture Topics Multi-CPU and distributed systems Monolithic system Client–server.
Topic Distributed DBMS Database Management Systems Fall 2012 Presented by: Osama Ben Omran.
Introduction.  Administration  Simple DBMS  CMPT 454 Topics John Edgar2.
22 August, 2007Information System Design IT60105, Autumn 2007 Information System Design IT60105 Lecture 8 Use Case Diagrams.
Introduction to Distributed Databases Yiwei Wu. Introduction A distributed database is a database in which portions of the database are stored on multiple.
At the end of each day the evening manager at each store enters a code into the cash register, after which the cash register connects to the central computer.
 Distributed Database Concepts  Parallel Vs Distributed Technology  Advantages  Additional Functions  Distribution Database Design  Data Fragmentation.
King saud university Introduction to Database Systems (Chapter 1: Databases and Database Users)
E-Commerce & M-Commerce. Introduction Electronic commerce, commonly known as e- commerce, It is a type of industry where buying and selling of product.
Distributed DBMS, Query Processing and Optimization
1 Chapter 22 Distributed DBMS Concepts and Design CS 157B Edward Chen.
Distributed Database Design Bayu Adhi Tama, MTI Fasilkom-Unsri Adapted from Connolly, et al., Database Systems 4 th Edition, Pearson Education Limited,
CMS Advanced Database and Client-Server Applications Distributed Databases slides by Martin Beer and Paul Crowther Connolly and Begg Chapter 22.
1 Chapter 22 Distributed DBMSs - Concepts and Design Simplified Transparencies © Pearson Education Limited 1995, 2005.
CENG 553 Database Management Systems1 Distributed Databases.
Distributed Database Concepts
Physical Changes That Don’t Change the Logical Design
Chapter 19: Distributed Databases
Sales Order Process.
INTRODUCTION TO TRANSACTION PROCESSING SYSTEM
G063 - Distributed Databases
MANAGING DATA RESOURCES
Dr. Bernard Chen Ph.D. University of Central Arkansas Fall 2009
Distributed Database Management Systems
Presentation transcript:

Distributed Databases and Query Processing

Distributed DB’s vs. Parallel DB’s Many autonomous processors that may participate in database operations. Difference from a shared-nothing parallel system is in the assumption about the cost of communication. Shared-nothing parallel systems: message passing cost is small compared with disk accesses and other costs. Distributed systems: processors are typically physically distant, and so, the cost of message passing is high. Advantages Like parallel systems, a distributed system can use many processors to accelerate query evaluation. Further, since the processors are widely separated, we can increase resilience in the face of failures by replicating data at several sites. Disadvantage Increased complexity of many DB aspects; we have to take care to minimize the number of messages sent between sites.

Why distribute data? Organization is itself distributed among many sites Banks Many branches. Each branch will keep a database of accounts maintained at that branch. Some data are kept in the central office, such as employee records and current interest rates. Department Stores Many individual stores. Each store has a database of sales at that store and inventory at that store. Some data are kept in the central office, such as employee records and a chain-wide inventory, data about credit card customers, and information about suppliers such as unfilled orders. Libraries Consortium of universities that each hold on-line books and other documents. Search at any site will examine the catalog of documents available at all sites and deliver an electronic copy of the document to the user if any site holds it.

Horizontal Partitioning For example, the chain of stores might be imagined to have a single sales relation, such as Sales (item, date, price, purchaser) This relation does not exist physically. Rather, it is the union of a number of relations with the same schema, one at each of the stores in the chain. Local relations are called fragments.

Vertical Partitioning Example: Suppose we want to find out which sales at the Victoria store were made to customers who are more than 90 days in arrears on their credit card payments. Virtual relation: Sales (item, date, price, purchaser, lastCreditCardPayment) Real relations: Sales (item, date, price, creditCardNumber) – in one site CreditCard(purchaser, creditCardNumber, lastCreditCardPayment) – in another site

The Distributed Join Problem R(A,B)  S(B,C) Naïve Strategies 1. Send a copy of R to the site of S, and compute the join there. 2. Send a copy of S to the site of R and compute the join there. However: If the channel has low-capacity, e.g., a phone line or wireless link, then the cost of the join is primarily the time it takes to copy one of the relations. Even if communication is fast, there may be a better query plan if the shared attribute B has values that are much smaller than the values of A and C. E.g., B could be an identifier for documents or videos, while A and C are the documents or videos themselves.

Semijoin Reductions Query plan: Only the relevant part of each relation is shipped to the site of the other. For this, compute first the semijoin: R(X,Y)  < S(Y,Z) = R(X,Y)   Y (S (Y,Z)) by sending  Y (S (Y,Z)) to the site of R. In this way, R is reduced (hopefully)! Finally, send the reduced R to the site of S and compute the join.

Some arguments The semijoin plan is superior if:  Y (S (Y,Z)) is much smaller than S. –  Y (S (Y,Z)) will be small compared with S if: 1.There are many duplicates to be eliminated; 2.The components for the attributes of Z are large compared with the components of Y e.g., Z includes attributes whose values are audios, videos, or documents. R(X,Y)  < S(Y,Z) is smaller than R. – That is, R must contain many dangling tuples in its join with S.