Distributed Databases

Slides:



Advertisements
Similar presentations
ISOM Distributed Databases Arijit Sengupta. ISOM Learning Objectives Understand the concept and necessity of distributed databases Understand the types.
Advertisements

Distributed Databases Chapter 22 By Allyson Moran.
Distributed databases
Distributed Databases
Overview Distributed vs. decentralized Why distributed databases
6/25/20151 PSU’s CS Parallel, Distributed Access  Parallel vs Distributed DBMSs  Parallel DBMSs  Measuring success: scalability  Hardware Architectures.
Implementation of Database Systems, Jarek Gryz1 Distributed Databases Chapter 21, Part B.
Distributed Databases DBMS Textbook, Chapter 22, Part II.
Instructor: Marina Gavrilova. Outline Introduction Types of distributed databases Distributed DBMS Architectures and Storage Replication Synchronous replication.
1 Distributed Databases Chapter 21, Part B. 2 Introduction v Data is stored at several sites, each managed by a DBMS that can run independently. v Distributed.
 Distributed Database Concepts  Parallel Vs Distributed Technology  Advantages  Additional Functions  Distribution Database Design  Data Fragmentation.
1 Chapter 22 Distributed DBMSs - Concepts and Design Simplified Transparencies © Pearson Education Limited 1995, 2005.
Distributed Databases “Fundamentals”
CHAPTER 25 - Distributed Databases and Client–Server Architectures
Introduction to Database Systems Chapter 1
CF 1334 Sistem Basis Data (3 SKS)
Diskusi-08 Jelaskan dan berikan contoh penggunaan theta join, equijoin, natural join, outer join, dan semijoin The slides for this text are organized into.
Introduction to Database Systems
Diskusi-5 Sebutkan perangkat (tools) yang berpotensi mendukung kebutuhan tugas-tugas manajerial (management work) Jelaskan enam karakteristik informasi.
MODELS OF DATABASE AND DATABASE DESIGN
Tree-Structured Indexes
Storage and Indexes Chapter 8 & 9
Query-by-Example (QBE)
Diskusi-1 Bacalah materi chapter-01, lalu buatlah ringkasan yang berisi tentang : Definisi EUIS Siapa end user Dampak euis pada lingkungan kerja Perencanaan.
Latihan Answer the following questions using the relational schema from the Exercises at the end of Chapter 3: Create the Hotel table using the integrity.
Diskusi-16 Buatlah ringkasan tentang pertimbangan dalam desain yang ergonomis pada tiga perangkat utama komputer yaitu monitor, keyboard dan mouse (lihat.
Database Application Development
Permutations and Combinations
Hash-Based Indexes Chapter 11
Latihan Create a separate table with the same structure as the Booking table to hold archive records. Using the INSERT statement, copy the records from.
Tugas-05 a. Sebutkan primary key masing-masing tabel
Introduction to Query Optimization
Relational Algebra Chapter 4, Part A
The Entity-Relationship Model
Modeling Your Data Chapter 2 cs542
File Organizations Chapter 8 “How index-learning turns no student pale
COSC 6340 Projects & Homeworks Spring 2002
R*: An Overview of the Architecture
Examples of Physical Query Plan Alternatives
The Entity-Relationship Model
Database Application Development
External Sorting The slides for this text are organized into chapters. This lecture covers Chapter 11. Chapter 1: Introduction to Database Systems Chapter.
Crash Recovery, Part 2 R&G - Chapter 18
B+-Trees and Static Hashing
File Organizations and Indexing
File Organizations and Indexing
Tree-Structured Indexes
Team Project, Part II NOMO Auto, Part II IST 210 Section 4
Hash-Based Indexes R&G Chapter 10 Lecture 18
Hash-Based Indexes Chapter 10
Distributed Databases
Selected Topics: External Sorting, Join Algorithms, …
Relational Algebra Chapter 4, Sections 4.1 – 4.2
B+Trees The slides for this text are organized into chapters. This lecture covers Chapter 9. Chapter 1: Introduction to Database Systems Chapter 2: The.
Tree-Structured Indexes
Database Management Systems CSE594
Overview of Query Evaluation
Overview of Query Evaluation
Evaluation of Relational Operations: Other Techniques
Introduction to Database Systems
Relational Query Optimization (this time we really mean it)
Overview of Query Evaluation
Chapter 11 Instructor: Xin Zhang
Tree-Structured Indexes
Physical Database Design
File Organizations and Indexing
COSC 3480 Projects & Homeworks Fall 2003
Distributed Databases
Database Application Development
Relational Calculus Chapter 4, Part B
Presentation transcript:

Distributed Databases Chapter 21, Part B The slides for this text are organized into chapters. This lecture covers the material on distributed databases from Chapter 21. The material on parallel databases can be found in Chapter 21, part B. Chapter 1: Introduction to Database Systems Chapter 2: The Entity-Relationship Model Chapter 3: The Relational Model Chapter 4 (Part A): Relational Algebra Chapter 4 (Part B): Relational Calculus Chapter 5: SQL: Queries, Programming, Triggers Chapter 6: Query-by-Example (QBE) Chapter 7: Storing Data: Disks and Files Chapter 8: File Organizations and Indexing Chapter 9: Tree-Structured Indexing Chapter 10: Hash-Based Indexing Chapter 11: External Sorting Chapter 12 (Part A): Evaluation of Relational Operators Chapter 12 (Part B): Evaluation of Relational Operators: Other Techniques Chapter 13: Introduction to Query Optimization Chapter 14: A Typical Relational Optimizer Chapter 15: Schema Refinement and Normal Forms Chapter 16 (Part A): Physical Database Design Chapter 16 (Part B): Database Tuning Chapter 17: Security Chapter 18: Transaction Management Overview Chapter 19: Concurrency Control Chapter 20: Crash Recovery Chapter 21 (Part A): Parallel Databases Chapter 21 (Part B): Distributed Databases Chapter 22: Internet Databases Chapter 23: Decision Support Chapter 24: Data Mining Chapter 25: Object-Database Systems Chapter 26: Spatial Data Management Chapter 27: Deductive Databases Chapter 28: Additional Topics 1

Introduction Data is stored at several sites, each managed by a DBMS that can run independently. Distributed Data Independence: Users should not have to know where data is located (extends Physical and Logical Data Independence principles). Distributed Transaction Atomicity: Users should be able to write Xacts accessing multiple sites just like local Xacts.

Types of Distributed Databases Homogeneous: Every site runs same type of DBMS. Heterogeneous: Different sites run different DBMSs (different RDBMSs or even non-relational DBMSs). Gateway DBMS1 DBMS2 DBMS3

Distributed DBMS Architectures QUERY Client-Server CLIENT CLIENT Client ships query to single site. All query processing at server. - Thin vs. fat clients. - Set-oriented communication, client side caching. SERVER SERVER SERVER SERVER Collaborating-Server SERVER Query can span multiple sites. SERVER QUERY

Storing Data Fragmentation Horizontal: Usually disjoint. TID Storing Data t1 t2 t3 t4 Fragmentation Horizontal: Usually disjoint. Vertical: Lossless-join; tids. Replication Gives increased availability. Faster query evaluation. Synchronous vs. Asynchronous. Vary in how current copies are. R1 R3 SITE A SITE B R1 R2

Distributed Catalog Management Must keep track of how data is distributed across sites. Must be able to name each replica of each fragment. To preserve local autonomy: <local-name, birth-site> Site Catalog: Describes all objects (fragments, replicas) at a site + Keeps track of replicas of relations created at this site. To find a relation, look up its birth-site catalog. Birth-site never changes, even if relation is moved.

SELECT AVG(S.age) FROM Sailors S WHERE S.rating > 3 AND S.rating < 7 Distributed Queries Horizontally Fragmented: Tuples with rating < 5 at Shanghai, >= 5 at Tokyo. Must compute SUM(age), COUNT(age) at both sites. If WHERE contained just S.rating>6, just one site. Vertically Fragmented: sid and rating at Shanghai, sname and age at Tokyo, tid at both. Must reconstruct relation by join on tid, then evaluate the query. Replicated: Sailors copies at both sites. Choice of site based on local costs, shipping costs.

Distributed Joins Sailors Reserves LONDON PARIS Distributed Joins Sailors Reserves 500 pages 1000 pages Fetch as Needed, Page NL, Sailors as outer: Cost: 500 D + 500 * 1000 (D+S) D is cost to read/write page; S is cost to ship page. If query was not submitted at London, must add cost of shipping result to query site. Can also do INL at London, fetching matching Reserves tuples to London as needed. Ship to One Site: Ship Reserves to London. Cost: 1000 S + 4500 D (SM Join; cost = 3*(500+1000)) If result size is very large, may be better to ship both relations to result site and then join them!

Semijoin At London, project Sailors onto join columns and ship this to Paris. At Paris, join Sailors projection with Reserves. Result is called reduction of Reserves wrt Sailors. Ship reduction of Reserves to London. At London, join Sailors with reduction of Reserves. Idea: Tradeoff the cost of computing and shipping projection and computing and shipping projection for cost of shipping full Reserves relation. Especially useful if there is a selection on Sailors, and answer desired at London.

Bloomjoin At London, compute a bit-vector of some size k: Hash join column values into range 0 to k-1. If some tuple hashes to I, set bit I to 1 (I from 0 to k-1). Ship bit-vector to Paris. At Paris, hash each tuple of Reserves similarly, and discard tuples that hash to 0 in Sailors bit-vector. Result is called reduction of Reserves wrt Sailors. Ship bit-vector reduced Reserves to London. At London, join Sailors with reduced Reserves. Bit-vector cheaper to ship, almost as effective.

Distributed Query Optimization Cost-based approach; consider all plans, pick cheapest; similar to centralized optimization. Difference 1: Communication costs must be considered. Difference 2: Local site autonomy must be respected. Difference 3: New distributed join methods. Query site constructs global plan, with suggested local plans describing processing at each site. If a site can improve suggested local plan, free to do so.