Parallel and Distributed Databases

Presentation by: Mr. Krutibash Nayak, Assistant Professor, Department of Electrical Engineering, PIET

A distributed database (DDB) is a collection of multiple, logically interrelated databases distributed over a computer network. A distributed database management system (distributed DBMS) is the software system that manages the distributed database and makes the distribution transparent to users [1].

Centralized database [2]
• Data is located in one place (one server)
• All DBMS functionality is provided by that server:
  • Enforcing the ACID properties of transactions
  • Concurrency control and recovery mechanisms
  • Answering queries

Distributed databases
• Data is stored in multiple places (each running a DBMS)
• New notion of distributed transactions
• DBMS functionality is now distributed over many machines
• Need to revisit how this functionality works in a distributed environment

What is not a DDBS [3]
• A database system that resides on a timesharing computer system
• A loosely or tightly coupled multiprocessor system
• A database system that resides at one of the nodes of a network of computers; this is a centralised database on a network node

Parallel Database Management System
A database management system implemented on a tightly coupled multiprocessor (a tightly coupled system has shared memory). A parallel database system improves data-processing performance by using multiple resources, such as CPUs and disks, in parallel.

Goals of Parallel Databases
• Improve performance: System performance can be improved by connecting multiple CPUs and disks in parallel. Many small processors can also be connected in parallel.
• Improve availability of data: Data can be copied to multiple locations. For example, if a module holding a relation (a table in the database) becomes unavailable, the relation can still be served from a copy on another module.
• Improve reliability: System reliability improves when data is kept complete, accurate, and available.
• Provide distributed access to data: Companies with branches in multiple cities can access data through a parallel database system.

PARALLEL VS. DISTRIBUTED DATABASES
Distributed processing usually implies parallel processing (but not vice versa)
• Parallel processing can take place on a single machine

Assumptions about Architecture
Parallel databases:
• Machines are physically close to each other, e.g., in the same server room
• Machines are connected by dedicated high-speed LANs and switches
• Communication cost is assumed to be small
• Can use a shared-memory, shared-disk, or shared-nothing architecture
Distributed databases:
• Machines can be far from each other, e.g., on different continents
• Machines can be connected over a public network, e.g., the Internet
• Communication cost and communication problems cannot be ignored
• Usually a shared-nothing architecture

PARALLEL PROCESSING
Divide a big problem into many smaller ones to be solved in parallel
• Increases processing bandwidth (in our case, decreases query response time)
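The divide-and-combine idea can be sketched in a few lines of Python. This is only an illustration: the function names, the `status` field, and the sample rows are assumptions, and the thread pool stands in for the separate CPUs and disks of a real parallel DBMS (CPython threads do not speed up CPU-bound scans).

```python
from concurrent.futures import ThreadPoolExecutor


def count_matching(partition, predicate):
    """Scan one partition and count the rows that satisfy the predicate."""
    return sum(1 for row in partition if predicate(row))


def parallel_count(rows, predicate, workers=4):
    """Split rows into roughly equal partitions, scan them concurrently,
    and add up the partial counts."""
    size = max(1, -(-len(rows) // workers))  # ceiling division, at least 1
    partitions = [rows[i:i + size] for i in range(0, len(rows), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_counts = pool.map(
            lambda part: count_matching(part, predicate), partitions
        )
        return sum(partial_counts)


# Hypothetical sample data: every third project is still ongoing.
rows = [{"id": i, "status": "ongoing" if i % 3 == 0 else "done"}
        for i in range(1000)]
total = parallel_count(rows, lambda r: r["status"] == "ongoing")
```

Each worker scans only its own partition, and the final `sum` is the step that combines the small answers into the answer to the big problem.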

Parallel Query Optimization
Parallel query optimization is the process of analysing a query and choosing the best combination of parallel and serial access methods to yield the fastest response time for the query. In addition to the costing performed for serial query optimization, parallel optimization analyses the cost of parallel access methods for each combination of join orders, join types, and indexes. The optimizer can choose any combination of serial and parallel access methods to create the fastest query plan.
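As a rough illustration of choosing among serial and parallel methods, the sketch below enumerates every combination of access method and join type and keeps the cheapest. All method names and cost figures are invented for the example; a real optimizer derives its estimates from table statistics and also considers join orders.

```python
from itertools import product

# Hypothetical cost estimates in arbitrary units (assumptions, not measurements).
ACCESS_COST = {"serial_scan": 100.0, "parallel_scan": 30.0, "index_lookup": 20.0}
JOIN_COST = {"nested_loop": 80.0, "hash_join": 40.0, "parallel_hash_join": 15.0}
PARALLEL_STARTUP = 10.0  # assumed fixed overhead per parallel operator


def plan_cost(access, join):
    """Estimated cost of one plan: operator costs plus parallel start-up overhead."""
    cost = ACCESS_COST[access] + JOIN_COST[join]
    parallel_ops = sum(name.startswith("parallel") for name in (access, join))
    return cost + PARALLEL_STARTUP * parallel_ops


def choose_plan():
    """Enumerate every (access method, join type) pair and keep the cheapest."""
    return min(product(ACCESS_COST, JOIN_COST), key=lambda plan: plan_cost(*plan))


best = choose_plan()
```

With these made-up numbers the cheapest plan mixes a serial index lookup with a parallel hash join, which is the kind of mixed plan the text says the optimizer is free to pick.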

Distributed Database Architecture
A distributed database system allows applications to access data from local and remote databases. In Oracle's terminology [4], a homogeneous distributed database system is one in which each database is an Oracle Database, and a heterogeneous distributed database system is one in which at least one of the databases is not an Oracle Database. Distributed databases use a client/server architecture to process information requests.

Distributed Database Architecture[5]

Homogeneous Distributed Databases
In a homogeneous distributed database, all the sites use identical DBMS and operating systems. Its properties are:
• The sites use very similar software.
• The sites use identical DBMS or DBMS from the same vendor.
• Each site is aware of all other sites and cooperates with other sites to process user requests.
• The database is accessed through a single interface as if it were a single database.
Types of Homogeneous Distributed Database
There are two types of homogeneous distributed database:
• Autonomous: Each database is independent and functions on its own. The databases are integrated by a controlling application and use message passing to share data updates.
• Non-autonomous: Data is distributed across the homogeneous nodes, and a central or master DBMS coordinates data updates across the sites.

Heterogeneous Distributed Databases
In a heterogeneous distributed database, different sites have different operating systems, DBMS products and data models. Its properties are:
• Different sites use dissimilar schemas and software.
• The system may be composed of a variety of DBMSs, such as relational, network, hierarchical or object-oriented.
• Query processing is complex due to dissimilar schemas.
• Transaction processing is complex due to dissimilar software.
• A site may not be aware of other sites, so there is limited cooperation in processing user requests.
Types of Heterogeneous Distributed Databases
• Federated: The heterogeneous database systems are independent in nature and integrated together so that they function as a single database system.
• Un-federated: The database systems employ a central coordinating module through which the databases are accessed.

Distributed DBMS Architectures
DDBMS architectures are generally developed depending on three parameters:
• Distribution: the physical distribution of data across the different sites.
• Autonomy: the distribution of control of the database system and the degree to which each constituent DBMS can operate independently.
• Heterogeneity: the uniformity or dissimilarity of the data models, system components and databases.

Distributed Catalog Management
Catalogs are themselves databases, containing metadata about the distributed database system. Efficient catalog management in distributed databases is critical to ensure satisfactory performance related to site autonomy, view management, and data distribution and replication. Three popular management schemes for distributed catalogs are centralized catalogs, fully replicated catalogs, and partially replicated catalogs.

Centralized Catalogs
In this scheme, the entire catalog is stored at one single site. Owing to its central nature, it is easy to implement. On the other hand, the advantages of reliability, availability, autonomy, and distribution of processing load are adversely impacted. For read operations from non-central sites, the requested catalog data is locked at the central site and is then sent to the requesting site. On completion of the read operation, an acknowledgement is sent to the central site, which in turn unlocks this data. All update operations must be processed through the central site. This can quickly become a performance bottleneck for write-intensive applications.

Fully Replicated Catalogs
In this scheme, identical copies of the complete catalog are present at each site. This scheme facilitates faster reads by allowing them to be answered locally. However, all updates must be broadcast to all sites. Updates are treated as transactions and a centralized two-phase commit scheme is employed to ensure catalog consistency. As with the centralized scheme, write-intensive applications may cause increased network traffic due to the broadcast associated with the writes.
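A minimal sketch of a two-phase commit for a replicated catalog update follows. The `CatalogSite` class, the `update_catalog` coordinator, and the `can_commit` flag are hypothetical names for illustration; logging, timeouts, and recovery from coordinator failure, which a real protocol needs, are left out.

```python
class CatalogSite:
    """One site holding a full copy of the catalog (hypothetical class)."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit  # simulates a site that votes "no"
        self.catalog = {}
        self.pending = None

    def prepare(self, key, value):
        """Phase 1: stage the update and vote on whether it can be committed."""
        if not self.can_commit:
            return False
        self.pending = (key, value)
        return True

    def commit(self):
        """Phase 2 (success): make the staged update visible."""
        key, value = self.pending
        self.catalog[key] = value
        self.pending = None

    def abort(self):
        """Phase 2 (failure): discard any staged update."""
        self.pending = None


def update_catalog(sites, key, value):
    """Coordinator: the update is applied at every site or at none."""
    votes = [site.prepare(key, value) for site in sites]
    if all(votes):
        for site in sites:
            site.commit()
        return True
    for site in sites:
        site.abort()
    return False
```

The broadcast cost mentioned above is visible here: every update sends a prepare message and a commit or abort message to every site.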

Partially Replicated Catalogs
The centralized and fully replicated schemes restrict site autonomy since they must ensure a consistent global view of the catalog. Under the partially replicated scheme, each site maintains complete catalog information on data stored locally at that site. Each site is also permitted to cache entries retrieved from remote sites. However, there are no guarantees that these cached copies will be the most recent and updated. The system tracks catalog entries for sites where the object was created and for sites that contain copies of this object. Any changes to copies are propagated immediately to the original (birth) site. Retrieving updated copies to replace stale data may be delayed until an access to this data occurs. In general, fragments of relations across sites should be uniquely accessible. Also, to ensure data distribution transparency, users should be allowed to create synonyms for remote objects and use these synonyms for subsequent referrals.
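The caching behaviour can be sketched as follows. The `Site` class, the version counter, and the `remote_fetches` counter are assumptions made for the example; the source only states that cached entries may be stale and that refreshing them can wait until the entry is accessed.

```python
class Site:
    """A site with authoritative local entries and a cache of remote ones
    (hypothetical class for illustration)."""

    def __init__(self, name):
        self.name = name
        self.local = {}          # object name -> (version, metadata), created here
        self.cache = {}          # object name -> (version, metadata), may be stale
        self.remote_fetches = 0  # how often a full entry was pulled from a birth site

    def create(self, obj, metadata):
        self.local[obj] = (1, metadata)

    def update(self, obj, metadata):
        version, _ = self.local[obj]
        self.local[obj] = (version + 1, metadata)

    def version_of(self, obj):
        """Cheap version check answered by the birth site."""
        return self.local[obj][0]

    def lookup(self, obj, birth_site):
        """Answer locally if possible; otherwise refresh the cached copy
        only when it is missing or out of date."""
        if obj in self.local:
            return self.local[obj][1]
        cached = self.cache.get(obj)
        if cached is None or cached[0] != birth_site.version_of(obj):
            self.cache[obj] = birth_site.local[obj]
            self.remote_fetches += 1
        return self.cache[obj][1]
```

After the birth site updates an entry, a remote site's cached copy stays stale until its next `lookup`, which is the delayed refresh described above.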

Distributed Query Processing Architecture

Example: Retrieve details of all projects whose status is "Ongoing".
The global query will be: $\sigma_{status = \text{"ongoing"}}(\text{PROJECT})$
Query in New Delhi's server will be: $\sigma_{status = \text{"ongoing"}}(\text{NewD-PROJECT})$
Query in Kolkata's server will be: $\sigma_{status = \text{"ongoing"}}(\text{Kol-PROJECT})$
Query in Hyderabad's server will be: $\sigma_{status = \text{"ongoing"}}(\text{Hyd-PROJECT})$
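The example can be mimicked in Python, assuming PROJECT is horizontally fragmented by city and that the global answer is the union of the three local answers. The sample rows and function names are invented for illustration.

```python
def select_ongoing(fragment):
    """Local query: the selection status = "ongoing" applied to one fragment."""
    return [row for row in fragment if row["status"] == "ongoing"]


# Hypothetical contents of the three horizontal fragments of PROJECT.
fragments = {
    "NewD-PROJECT": [{"id": 1, "status": "ongoing"}, {"id": 2, "status": "completed"}],
    "Kol-PROJECT": [{"id": 3, "status": "ongoing"}],
    "Hyd-PROJECT": [{"id": 4, "status": "planned"}, {"id": 5, "status": "ongoing"}],
}


def global_query(fragments):
    """Run the local query at every site and union the results."""
    result = []
    for rows in fragments.values():
        result.extend(select_ongoing(rows))
    return result


ongoing_ids = sorted(row["id"] for row in global_query(fragments))
```

Because the fragments are disjoint, concatenating the local results is enough; overlapping fragments would need duplicate elimination.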

Distributed Query Optimization
The main issues for distributed query optimization are:
• Optimal utilization of resources in the distributed system
• Query trading
• Reduction of the solution space of the query

Optimal Utilization of Resources in the Distributed System
The approaches to optimal resource utilization are:
• Operation shipping: The operation is run at the site where the data is stored, not at the client site. The results are then transferred to the client site. This is appropriate for operations whose operands are available at the same site, for example Select and Project operations.
• Data shipping: The data fragments are transferred to the database server, where the operations are executed. This is used for operations whose operands are distributed across different sites. It is also appropriate in systems where communication costs are low and the local processors are much slower than the server the data is shipped to.
• Hybrid shipping: A combination of data and operation shipping. Data fragments are transferred to the high-speed processors, where the operation runs. The results are then sent to the client site.
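The trade-off between operation shipping and data shipping can be made concrete by counting how many rows cross the network for the same selection. The functions and the row-count cost measure are assumptions made for this sketch, not part of any particular DBMS.

```python
def operation_shipping(server_rows, predicate):
    """Run the selection at the data site; only matching rows are transferred."""
    result = [row for row in server_rows if predicate(row)]
    return result, len(result)  # (answer, rows sent over the network)


def data_shipping(server_rows, predicate):
    """Transfer the whole fragment first, then run the selection at the receiver."""
    shipped = list(server_rows)
    result = [row for row in shipped if predicate(row)]
    return result, len(shipped)  # (answer, rows sent over the network)


# Hypothetical fragment: 100 projects, every tenth one ongoing.
rows = [{"id": i, "status": "ongoing" if i % 10 == 0 else "done"}
        for i in range(100)]


def is_ongoing(row):
    return row["status"] == "ongoing"


op_result, op_rows_sent = operation_shipping(rows, is_ongoing)
data_result, data_rows_sent = data_shipping(rows, is_ongoing)
```

Both strategies return the same answer; for a selective Select, operation shipping moves far fewer rows, which is why it suits operations whose operands sit at one site.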

References:
1. Distributed and Parallel Database Systems: M. Tamer Ozsu
2. http://web.cs.wpi.edu/~cs561/s12/Lectures/4-5/ParallelDBs.pdf
3. http://mazsola.iit.uni-miskolc.hu/tempus/discom/doc/db/tema01a.pdf
4. https://docs.oracle.com/cd/B28359_01/server.111/b28310/ds_concepts001.htm#ADMIN12074
5. https://www.tutorialspoint.com/distributed_dbms/distributed_dbms_database_environments.htm
6. http://www.readorrefer.in/article/Distributed-Catalog-Management_11597/