1 Distributed Databases Chapter 21, Part B. 2 Introduction v Data is stored at several sites, each managed by a DBMS that can run independently. v Distributed.

Slides:



Advertisements
Similar presentations
Database Systems: Design, Implementation, and Management
Advertisements

Database Architectures and the Web
V. Megalooikonomou Distributed Databases (based on notes by Silberchatz,Korth, and Sudarshan and notes by C. Faloutsos at CMU) Temple University – CIS.
ISOM Distributed Databases Arijit Sengupta. ISOM Learning Objectives Understand the concept and necessity of distributed databases Understand the types.
Distributed Databases Chapter 22 By Allyson Moran.
Distributed Databases John Ortiz. Lecture 24Distributed Databases2  Distributed Database (DDB) is a collection of interrelated databases interconnected.
Distributed databases
Chapter 13 (Web): Distributed Databases
Distributed Databases
1 Minggu 12, Pertemuan 23 Introduction to Distributed DBMS (Chapter , 22.6, 3rd ed.) Matakuliah: T0206-Sistem Basisdata Tahun: 2005 Versi: 1.0/0.0.
Overview Distributed vs. decentralized Why distributed databases
6/25/20151 PSU’s CS Parallel, Distributed Access  Parallel vs Distributed DBMSs  Parallel DBMSs  Measuring success: scalability  Hardware Architectures.
Manajemen Basis Data Pertemuan 11 Matakuliah: M0264/Manajemen Basis Data Tahun: 2008.
Chapter 12 Distributed Database Management Systems
McGraw-Hill/Irwin Copyright © 2007 by The McGraw-Hill Companies, Inc. All rights reserved. Chapter 17 Client-Server Processing, Parallel Database Processing,
©Silberschatz, Korth and Sudarshan18.1Database System Concepts Centralized Systems Run on a single computer system and do not interact with other computer.
Definition of terms Definition of terms Explain business conditions driving distributed databases Explain business conditions driving distributed databases.
Distributed Databases
Distributed databases
TDD: Topics in Distributed Databases
DATABASE MANAGEMENT SYSTEMS 2 ANGELITO I. CUNANAN JR.
WPI, MOHAMED ELTABAKH PARALLEL & DISTRIBUTED DATABASES 1.
Distributed Databases
Distributed Database and Replication. Distributed Database A logically interrelated collection of shared data and a description of this data physically.
Distributed Databases and DBMSs: Concepts and Design
1 Distributed and Parallel Databases. 2 Distributed Databases Distributed Systems goal: –to offer local DB autonomy at geographically distributed locations.
Database Architectures and the Web Session 5
Database Design – Lecture 16
III. Current Trends: 1 - Distributed DBMSsSlide 1/32 III. Current Trends Part 1: Distributed DBMSs: Concepts and Design Lecture 12 (2 hours) Lecturer:
DISTRIBUTED DATABASES IN ADBMS Shilpa Seth
Session-9 Data Management for Decision Support
Massively Distributed Database Systems - Distributed DBS Spring 2014 Ki-Joune Li Pusan National University.
Lecture 5: Sun: 1/5/ Distributed Algorithms - Distributed Databases Lecturer/ Kawther Abas CS- 492 : Distributed system &
Session-8 Data Management for Decision Support
10 1 Chapter 10 Distributed Database Management Systems Database Systems: Design, Implementation, and Management, Sixth Edition, Rob and Coronel.
Database Systems: Design, Implementation, and Management Tenth Edition Chapter 12 Distributed Database Management Systems.
Database Systems: Design, Implementation, and Management Ninth Edition Chapter 12 Distributed Database Management Systems.
Implementation of Database Systems, Jarek Gryz1 Distributed Databases Chapter 21, Part B.
DDBMS Distributed Database Management Systems Fragmentation
Distributed Databases DBMS Textbook, Chapter 22, Part II.
Kjell Orsborn UU - DIS - UDBL DATABASE SYSTEMS - 10p Course No. 2AD235 Spring 2002 A second course on development of database systems Kjell.
Instructor: Marina Gavrilova. Outline Introduction Types of distributed databases Distributed DBMS Architectures and Storage Replication Synchronous replication.
ASMA AHMAD 28 TH APRIL, 2011 Database Systems Distributed Databases I.
1 Distributed Databases BUAD/American University Distributed Databases.
Databases Illuminated
CS338Parallel and Distributed Databases11-1 Parallel and Distributed Databases Lecture Topics Multi-CPU and distributed systems Monolithic system Client–server.
Distributed database system
Introduction to Query Optimization, R. Ramakrishnan and J. Gehrke 1 Introduction to Query Optimization Chapter 13.
 Parallel databases  Architecture  Query evaluation  Query optimization  Distributed databases  Architectures  Data storage  Catalog management.
Topic Distributed DBMS Database Management Systems Fall 2012 Presented by: Osama Ben Omran.
Introduction to Distributed Databases Yiwei Wu. Introduction A distributed database is a database in which portions of the database are stored on multiple.
Distributed Database Management Systems. Reading Textbook: Ch. 1, Ch. 3 Textbook: Ch. 1, Ch. 3 For next class: Ch. 4 For next class: Ch. 4 FarkasCSCE.
 Distributed Database Concepts  Parallel Vs Distributed Technology  Advantages  Additional Functions  Distribution Database Design  Data Fragmentation.
1 Chapter 22 Distributed DBMS Concepts and Design CS 157B Edward Chen.
1 Information Retrieval and Use De-normalisation and Distributed database systems Geoff Leese September 2008, revised October 2009.
DISTRIBUTED DATABASES AND DDBMS. Learning Objectives  Describe various DDBMS implementations  Explain how database design affects the DDBMS environment.
Distributed Databases
1 Chapter 22 Distributed DBMSs - Concepts and Design Simplified Transparencies © Pearson Education Limited 1995, 2005.
CENG 553 Database Management Systems1 Distributed Databases.
Distributed Databases “Fundamentals”
CHAPTER 25 - Distributed Databases and Client–Server Architectures
6/25/2018.
Distributed Database Management Systems
Parallel and Distributed Databases
Chapter 19: Distributed Databases
Introduction to Query Optimization
R*: An Overview of the Architecture
Distributed Databases
Distributed Databases
Presentation transcript:

1 Distributed Databases Chapter 21, Part B

2 Introduction v Data is stored at several sites, each managed by a DBMS that can run independently. v Distributed Data Independence: Users should not have to know where data is located (extends Physical and Logical Data Independence principles). v Distributed Transaction Atomicity: Users should be able to write Xacts accessing multiple sites just like local Xacts.

3 Recent Trends v Users have to be aware of where data is located, i.e., Distributed Data Independence and Distributed Transaction Atomicity are not supported. v These properties are hard to support efficiently. v For globally distributed sites, these properties may not even be desirable due to administrative overheads of making location of data transparent.

4 Types of Distributed Databases v Homogeneous: Every site runs same type of DBMS. v Heterogeneous: Different sites run different DBMSs (different RDBMSs or even non- relational DBMSs). DBMS1DBMS2DBMS3 Gateway

5 Distributed DBMS Architectures v Client-Server v Collaborating-Server CLIENT SERVER QUERY SERVER QUERY Client ships query to single site. All query processing at server. - Thin vs. fat clients. - Set-oriented communication, client side caching. Query can span multiple sites.

6 Storing Data v Fragmentation –Horizontal: Usually disjoint. –Vertical: Lossless-join; tids. v Replication –Gives increased availability. –Faster query evaluation. –Synchronous vs. Asynchronous. u Vary in how current copies are. TID t1 t2 t3 t4 R1 R2 R3 SITE A SITE B

7 Distributed Catalog Management v Must keep track of how data is distributed across sites. v Must be able to name each replica of each fragment. To preserve local autonomy: – v Site Catalog: Describes all objects (fragments, replicas) at a site + Keeps track of replicas of relations created at this site. –To find a relation, look up its birth-site catalog. –Birth-site never changes, even if relation is moved.

8 Distributed Queries v Horizontally Fragmented: Tuples with rating = 5 at Tokyo. –Must compute SUM (age), COUNT (age) at both sites. –If WHERE contained just S.rating>6, just one site. v Vertically Fragmented: sid and rating at Shanghai, sname and age at Tokyo, tid at both. –Must reconstruct relation by join on tid, then evaluate the query. v Replicated: Sailors copies at both sites. –Choice of site based on local costs, shipping costs. SELECT AVG(S.age) FROM Sailors S WHERE S.rating > 3 AND S.rating < 7

9 Distributed Joins v Fetch as Needed, Page NL, Sailors as outer: – Cost: 500 D * 1000 (D+S) – D is cost to read/write page; S is cost to ship page. –If query was not submitted at London, must add cost of shipping result to query site. –Can also do INL at London, fetching matching Reserves tuples to London as needed. v Ship to One Site: Ship Reserves to London. –Cost: 1000 S D (SM Join; cost = 3*( )) –If result size is very large, may be better to ship both relations to result site and then join them! SailorsReserves LONDONPARIS 500 pages1000 pages

10 Semijoin v At London, project Sailors onto join columns and ship this to Paris. v At Paris, join Sailors projection with Reserves. –Result is called reduction of Reserves wrt Sailors. v Ship reduction of Reserves to London. v At London, join Sailors with reduction of Reserves. v Idea: Tradeoff the cost of computing and shipping projection and computing and shipping projection for cost of shipping full Reserves relation. v Especially useful if there is a selection on Sailors, and answer desired at London.

11 Bloomjoin v At London, compute a bit-vector of some size k: –Hash join column values into range 0 to k-1. –If some tuple hashes to I, set bit I to 1 (I from 0 to k-1). –Ship bit-vector to Paris. v At Paris, hash each tuple of Reserves similarly, and discard tuples that hash to 0 in Sailors bit-vector. –Result is called reduction of Reserves wrt Sailors. v Ship bit-vector reduced Reserves to London. v At London, join Sailors with reduced Reserves. v Bit-vector cheaper to ship, almost as effective.

12 Distributed Query Optimization v Cost-based approach; consider all plans, pick cheapest; similar to centralized optimization. –Difference 1: Communication costs must be considered. –Difference 2: Local site autonomy must be respected. –Difference 3: New distributed join methods. v Query site constructs global plan, with suggested local plans describing processing at each site. –If a site can improve suggested local plan, free to do so.

13 Updating Distributed Data v Synchronous Replication: All copies of a modified relation (fragment) must be updated before the modifying Xact commits. –Data distribution is made transparent to users. v Asynchronous Replication: Copies of a modified relation are only periodically updated; different copies may get out of synch in the meantime. –Users must be aware of data distribution. –Current products follow this approach.

14 Data Warehousing and Replication v A hot trend: Building giant “warehouses” of data from many sites. –Enables complex decision support queries over data from across an organization. v Warehouses can be seen as an instance of asynchronous replication. –Source data typically controlled by different DBMSs; emphasis on “cleaning” data and removing mismatches ($ vs. rupees) while creating replicas.

15 Summary v Parallel DBMSs designed for scalable performance. Relational operators very well- suited for parallel execution. –Pipeline and partitioned parallelism. v Distributed DBMSs offer site autonomy and distributed administration. Must revisit storage and catalog techniques, concurrency control, and recovery issues.