Chapter 13 Client/Server Database and Distributed Database Fundamentals of Database Management Systems by Mark L. Gillenson, Ph.D. University of Memphis.

Slides:



Advertisements
Similar presentations
Distributed Databases John Ortiz. Lecture 24Distributed Databases2  Distributed Database (DDB) is a collection of interrelated databases interconnected.
Advertisements

Fundamentals of Database Management Systems by
MIS 385/MBA 664 Systems Implementation with DBMS/ Database Management Dave Salisbury ( )
Chapter 13 (Web): Distributed Databases
© Copyright 2011 John Wiley & Sons, Inc.
ACCOUNTING INFORMATION SYSTEMS
Introduction to Information Technology, 2nd Edition Turban, Rainer & Potter © 2003 John Wiley & Sons, Inc Introduction to Information Technology.
Distributed Database Management Systems
Chapter 9 : Distributed Database.
Chapter 15 – Part 2 Networks The Internal Operating System The Architecture of Computer Hardware and Systems Software: An Information Technology Approach.
Chapter 14 Organizing and Manipulating the Data in Databases
Overview Distributed vs. decentralized Why distributed databases
Data Storage and Data Processing Architectures The difficulty is in the choice George Moore, 1900.
Chapter 12 Distributed Database Management Systems
Lesson 1: Configuring Network Load Balancing
Chapter 5 The Relational Database Model: Introduction
Chapter 3: Data Modeling
5th Edition, Irv Englander
CHAPTER 2: Introduction to Systems Concepts and Systems Architecture
Definition of terms Definition of terms Explain business conditions driving distributed databases Explain business conditions driving distributed databases.
12-1 Planning for Information Technology and Systems.
DATABASE MANAGEMENT SYSTEMS 2 ANGELITO I. CUNANAN JR.
Distributed Databases
Chapter 3 Data Modeling Fundamentals of Database Management Systems by
Introduction to Databases
Chapter 7 Logical Database Design
Chapter 4 The Database Management System Concept
PowerPoint Presentation for Dennis & Haley Wixom, Systems Analysis and Design Copyright 2000 © John Wiley & Sons, Inc. All rights reserved. Slide 1 Systems.
Chapter 6 The Relational Database Model: Additional Concepts
Introduction to Databases Chapter 1: Introducing Data and Data Management.
1 Distributed and Parallel Databases. 2 Distributed Databases Distributed Systems goal: –to offer local DB autonomy at geographically distributed locations.
Database Design – Lecture 16
Slide 1 Systems Analysis and Design With UML 2.0 An Object-Oriented Approach, Second Edition Chapter 13: Physical Architecture Layer Design Alan Dennis,
PowerPoint Presentation for Dennis & Haley Wixom, Systems Analysis and Design, 2 nd Edition Copyright 2003 © John Wiley & Sons, Inc. All rights reserved.
Fundamentals of Database Management Systems, 2nd ed
Today’s Agenda  Any questions about the assignment (due Mon)?  Quiz  Quiz review  Homework for Friday:  Watch the two videos on the Coursera db website.
PowerPoint Presentation for Dennis & Haley Wixom, Systems Analysis and Design Copyright 2000 © John Wiley & Sons, Inc. All rights reserved. Slide 1 Systems.
Chapter 8 Physical Database Design
Database Systems: Design, Implementation, and Management Ninth Edition Chapter 12 Distributed Database Management Systems.
Slide 3-1 Chapter 3 Terms Electronic Commerce and Internet Technologies Introduction to Information Systems Judith C. Simon.
G063 - Distributed Databases. Learning Objectives: By the end of this topic you should be able to: explain how databases may be stored in more than one.
Copyright © 2000 John Wiley & Sons, Inc. All rights reserved
9-1 COBOL for the 21 st Century Nancy Stern Hofstra University Robert A. Stern Nassau Community College James P. Ley University of Wisconsin-Stout (Emeritus)
Slide 1-1 Chapter 1 Terms Information Systems Overview Introduction to Information Systems Judith C. Simon.
Chapter 10 Structuring Organizations Management 1e Management 1e Management 1e - 2 Management 1e Learning Objectives  Explain how.
Core Concepts of ACCOUNTING INFORMATION SYSTEMS Moscove, Simkin & Bagranoff John Wiley & Sons, Inc. Developed by: S. Bhattacharya, Ph.D. Florida Atlantic.
1 Distributed Databases BUAD/American University Distributed Databases.
Databases Illuminated
Distributed Database. Introduction A major motivation behind the development of database systems is the desire to integrate the operational data of an.
Copyright © 2000 John Wiley & Sons, Inc. All rights reserved. Reproduction or translation of this work beyond that permitted in Section 117 of the 1976.
MBA 664 Database Management Systems Dave Salisbury ( )
Introduction to Distributed Databases Yiwei Wu. Introduction A distributed database is a database in which portions of the database are stored on multiple.
 Distributed Database Concepts  Parallel Vs Distributed Technology  Advantages  Additional Functions  Distribution Database Design  Data Fragmentation.
Slide 6-1 Chapter 6 System Software Considerations Introduction to Information Systems Judith C. Simon.
Chapter 1 Database Access from Client Applications.
1 Information Retrieval and Use De-normalisation and Distributed database systems Geoff Leese September 2008, revised October 2009.
Slide 11-1 Chapter 11 Terms Information Resource Management Strategies Introduction to Information Systems Judith C. Simon.
PowerPoint Presentation for Dennis, Wixom, & Roth Systems Analysis and Design, 3rd Edition Copyright 2006 © John Wiley & Sons, Inc. All rights reserved.
Slide 6-1 Chapter 6 Terms System Software Considerations Introduction to Information Systems Judith C. Simon.
Distributed Databases
Systems Analysis and Design
5th Edition, Irv Englander
Dinesh Mirchandani University of Missouri – St. Louis
Physical Changes That Don’t Change the Logical Design
Systems Analysis and Design
Systems Analysis and Design With UML 2
Systems Analysis and Design 5th Edition Chapter 8. Architecture Design
Systems Analysis and Design
Distributed Databases
ACCOUNTING INFORMATION SYSTEMS
Presentation transcript:

Chapter 13 Client/Server Database and Distributed Database Fundamentals of Database Management Systems by Mark L. Gillenson, Ph.D. University of Memphis Presentation by: Amita Goyal Chin, Ph.D. Virginia Commonwealth University John Wiley & Sons, Inc.

13-2 Chapter Objectives  Describe the concepts and advantages of the client/server database approach.  Describe the concepts and advantages of the distributed database approach.  Explain how data can be distributed and replicated in a distributed database.

13-3 Chapter Objectives  Describe the problem of concurrency control in distributed database.  Describe the distributed join process.  Describe data partitioning in a distributed database.  Describe distributed directory management.

13-4 Local Area Network (LAN)  An arrangement of personal computers connected together by communications lines.  The PCs must be located fairly close to each other.  Allows sharing of resources such as servers, printers, etc.

13-5 Local Area Network (LAN)  The PCs on a LAN can certainly operate independently, but they can also communicate with each other.  A gateway computer on the LAN can link the LAN and its PCs to other LANs, to one or more mainframe computers, or to the Internet.

13-6 Two-Tiered Client/Server Arrangement  Clients = PCs on the LAN  Server = powerful computer on the LAN  A shared database can be stored on a LAN server so that all of the PCs on the LAN can access it.

13-7 File Server Approach  When a LAN client needs to query, update, or otherwise use a file on the server, the entire file must be sent from the server to that client.  All of the querying, updating, or other processing is then performed at the client.  If changed, the entire file is then shipped back to the server.

13-8 DBMS Server Approach  The database is located at the server.  The processing is split between the client and the server.  Result: there is much less data traffic on the network.

13-9 Two-Tier Approach  Some databases can be stored on a client PC’s own hard drive while other databases that the client might access are stored on the LAN’s server.  Software has been developed that makes the location of the data transparent to the user at the client.

13-10 Two-Tier Approach  The user issues a query at the client.  The software first checks to see if the required data is on the PC’s own hard drive. If yes, the data is retrieved from it.  If it is not on local drive, the software looks for it on the server.

13-11 Three-Tier Approach  If the software doesn’t find the data on the client PC’s hard drive or on the LAN server, it can leave the LAN through a gateway computer and look for the data on, for example, a large, mainframe computer that may be reachable from many LANs.

13-12 Three-Tier Approach  Another use of the term “Three-Tier Approach” with the following three tiers:  The client PCs  Servers known as application servers  Other servers known as database servers

13-13 The Distributed Database Concept  Instead of having one, centralized database, we are going to spread the data out among various cities on the distributed network, each of which has its own computer and data storage facilities.  All of this distributed data is still considered to be a single logical database.

13-14 The Distributed Database Concept  Location transparency - The user just issues the query, and the result is returned.  A person or process anywhere on the distributed network queries the database.  It is not necessary to know where on the network the data being sought is located.

13-15 Distributed DBMS  Distributed database management system  Sophisticated software  Manages location transparency

13-16 Distributing the Data  Headquartered in NY, a company’s database consists of 6 large tables: A, B, C, D, E, F.  With a centralized database, all 6 tables would be located in NY.

13-17 Distributing the Data  The company has major sites in Los Angeles, Memphis, New York, Paris, and Tokyo.  The first and simplest idea in distributing the data would be to disperse the six tables among the five sites, perhaps based on frequency of use of each table.

13-18 Distributing the Data  Tables A and B are kept at New York  Table C is moved to Memphis  Tables D and E are moved to Tokyo  Table F is moved to Paris.

13-19 Distributing the Data  Paris employees can now access Table F without incurring telecommunications costs associated with accessing Table F in NY.  Local autonomy - Paris employees, e.g., can take responsibility for Table F -- its security, backup and recovery, and concurrency control.

13-20 Distributing the Data: Problems  When the database was centralized at New York, a query issued at any of the sites that required a join of two or more of the tables could be handled in the standard way by the computer at New York.  The result would then be sent to the site that issued the query.  In the dispersed approach, a join might require tables located at different sites!

13-21 Replicated Tables  Duplicated tables at two or more sites on the network.  Advantages  Availability - during a site failure, data can still be accessed at a replicated location.  Local access - Replicate table at a site requiring frequent access.

13-22 Replicated Tables  Disadvantages  Security risk  Concurrency control - How do you keep data consistent when it is replicated in tables on three continents?

13-23 Full Data Replication  The maximum approach of replicating every table at every site.  Great for availability  Great for joins

13-24 Full Data Replication  Worst for concurrency control - every change to every table has to be reflected at every site.  Worst for security  Takes up a lot of disk space

13-25 Partial Replication  Have a copy of the entire database at headquarters in New York and have each table replicated exactly once at one of the other sites.

13-26 Partial Replication  Improves availability - each table is now at two sites.  Security and concurrency exposures are limited.  Joins occur at NY.

13-27 Partial Replication  New York could tend to become a bottleneck.  If a table is heavily used in both Tokyo and Los Angeles, it can only be placed at one of the two sites (plus the copy of the entire database in New York), leaving the other with speed and telecom cost problems.

13-28 Replication Principles  Place copies of tables at the sites that use them most heavily in order to minimize telecommunications costs.  Ensure that there are at least two copies of important or frequently used tables to realize the gains in availability.

13-29 Replication Principles  Limit the number of copies of any one table to control the security and concurrency issues.  Avoid any one site becoming a bottleneck.

13-30 Replication Principles

13-31 Concurrency Control in Distributed Database  The “lost update” problem.  The protections that we discussed earlier that can be put into place to handle the problem of concurrent update in a single table are not adequate to handle the new, expanded problem in distributed database systems.

13-32 Asynchronous Approach  If retrieved data does not necessarily have to be up-to-the-minute accurate, we can use “asynchronous” approaches to updating replicated data.

13-33 Asynchronous Schemes  The site where the data was updated can send a message to update to the other sites that contain a copy of the same table.  One of the sites can be chosen to accumulate all of the updates to all of the tables, and transmit changes regularly.  Each table can have one of the sites be declared the “dominant” site for that table, which periodically transmits updates to the other sites.

13-34 Synchronous Approach  If retrieved data does have to be up-to-the- minute accurate.  All data in replicated tables worldwide must always be consistent, accurate, and up-do-date.  Use two-phase commit.

13-35 Two-Phase Commit: Prepare Phase  Each computer on the network has a special log file in addition to its database tables.  The computer at the initiating site sends the updated data to the other sites that have copies of the table to be updated.  The computers at the other sites record the changes in their logs (but not in the actual database tables.)  These computers attempt to lock the database tables involved in the update.  If they are successful (the tables are not busy and can be locked) they inform the initiating site.

13-36 Two-Phase Commit: Commit Phase  If all of the other sites reported they were successful in logging the update and locking the tables, the initiating site issues instructions to transfer the update from the logs to the actual database tables.

13-37 Two-Phase Commit  Either all of the replicated files have to be updated or none of them must be updated.  A complex, costly, and time-consuming process.  The more volatile the data in the database, the less attractive is this procedure for updating replicated tables in the distributed database.

13-38 Distributed Joins  A query that is run from one of the computers in a distributed database system that requires a join of two or more tables that are not all at the same computer.  The distributed DBMS must have its own built-in expert system that is capable of figuring out an efficient way to handle a request for a distributed join.

13-39 Distributed Joins  The DBMS evaluates various options for performing a join by considering:  The number and size of the records from each table involved in the join.  The distances and costs of transmitting the records from one site to another to execute the join.  The distance and cost of shipping the result of the join back to the site that issued the query in the first place.

13-40 Partitioning  The purpose is to have records or columns of a table resident at the sites that use them the most frequently.  Horizontal Partitioning  Vertical Partitioning

13-41 Horizontal Partitioning  A relational table can be split up so that some records are located at one site, other records are located at another site, and so on.  e.g., partitioning of Table G.

13-42 Vertical Partitioning  The columns of a table are divided up among several cities on the network.  Each such partition must include the primary key attribute(s) of the table.  Makes sense when different sites are responsible for processing different functions involving an entity.

13-43 Distributed Directory Management  A distributed DBMS must include a directory that keeps track of where the database tables, the replicated copies of database tables (if any), and the table partitions (if any) are located.  When a query is presented at any site on the network, the distributed DBMS can automatically use the directory to find out where the required data is located and maintain location transparency.

13-44 Directory Location  The entire directory could be stored at only one site.  Copies of the directory could be stored at several of the sites.  A copy of the directory could be stored at every site. (This is generally the best solution.)

13-45 Distributed DBMS Summary  Centralized Database Advantages  Single site provides high degree of security, concurrency, and backup and recovery control.  No need for a distributed directory since all of the data is in one place.  No need for distributed joins since all of the data is in one place.

13-46 Distributed DBMS Summary  Centralized Database Disadvantages  All data accesses from other than the site with the database incur communications costs.  The site with the database can become a bottleneck.  Possible availability problem: if the site with the database goes down, there can be no data access.

13-47 Distributed DBMS Summary  Dispersing Tables on the Network Advantages  Local autonomy.  Reduced communications costs because each table can be located at the site that most heavily uses it.  Improved availability because portions of the database are available even if one or some of the sites are down.

13-48 Distributed DBMS Summary  Dispersing Tables on the Network Disadvantages  Several sites have to be concerned with security, concurrency, and backup and recovery.  Requires a distributed directory and the software to support location transparency.  Requires distributed joins.

13-49 Distributed DBMS Summary  Targeted Data Replication Advantages  Greatly reduced communications costs for read-only data access because copies of tables can be located at multiple sites that most heavily use them.  Greatly improved availability because if a site with a database table goes down, there may be another site with a copy of that table.

13-50 Distributed DBMS Summary  Targeted Data Replication Disadvantages  Multi-site concurrency control when data in replicated tables is updated.

13-51 Distributed DBMS Summary  Partitioned Tables Advantages  Greatest local autonomy because data at the record or column level can be stored at the site(s) that most heavily use it.  Greatly reduced communications costs because data at the record or column level can be stored at the site(s) that most heavily use it.

13-52 Distributed DBMS Summary  Partitioned Tables Disadvantages  Retrieving all or a large portion of a table may require multi-site accesses.

13-53 “Copyright 2004 John Wiley & Sons, Inc. All rights reserved. Reproduction or translation of this work beyond that permitted in Section 117 of the 1976 United States Copyright Act without express permission of the copyright owner is unlawful. Request for further information should be addressed to the Permissions Department, John Wiley & Sons, Inc. The purchaser may make back-up copies for his/her own use only and not for distribution or resale. The Publisher assumes no responsibility for errors, omissions, or damages caused by the use of these programs or from the use of the information contained herein.”