Introduction to Week 14
  - Return assignment 12-1
  - Collect assignment 3-1-6
  - Review of week 13
    - Transaction
    - Transaction log and audit trail
    - Concurrency control and potential problems
    - Lock method, deadlock, deadlock control
    - Optimistic method
    - Database backup and disaster recovery
Database Management Systems
Module 6: Advanced Database Concepts
Section 3: Distributed Database
Centralized DBMS
Distributed Database Environment
DDBMS Advantages
  - Data are located near the “greatest demand” site
  - Faster data access and data processing
  - Growth facilitation
  - Less danger of a single-point failure
  - Processor independence
DDBMS Disadvantages
  - Complexity of management and control
  - Lack of standards
  - Increased storage requirements
  - Greater difficulty in managing the data environment
  - Increased training cost
Characteristics of Distributed Database Management Systems
  - Application interface
  - Data validation and transformation
  - Query optimization
  - Backup and recovery, DB administration
  - Concurrency control and transaction management
  - Must perform all the functions of a centralized DBMS
  - Must handle all necessary functions imposed by the distribution of data and processing
  - Must perform these additional functions transparently to the end user
A Fully Distributed Database Management System
Multiple-Site Data
  - Fully distributed database management system with support for multiple data processors and transaction processors at multiple sites
  - Homogeneous DDBMS: integrates only one type of centralized DBMS over a network
  - Heterogeneous DDBMS: integrates different types of DBMS over a network; these may even support different data models (relational or hierarchical) and run under different computer systems
Heterogeneous Distributed Database Scenario
Distributed Database Transparency Features
  - Distribution transparency
  - Transaction transparency
  - Failure transparency
  - Performance transparency
  - Heterogeneity transparency
1. Distribution Transparency
  - Allows management of a physically dispersed database as though it were a centralized database
  - Three levels of distribution transparency are recognized:
    - Fragmentation transparency
    - Location transparency
    - Local mapping transparency
Fragment Locations
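The three levels of distribution transparency differ in how much the requester has to know about fragments and sites. The following is a minimal sketch, assuming a hypothetical EMPLOYEE table split into fragments E1 and E2 stored at sites named NY and ATL; the catalog layout, site names, and function names are illustrative assumptions, not part of any DBMS API.

```python
# Hypothetical catalog: fragment name -> site where the fragment is stored.
CATALOG = {"E1": "NY", "E2": "ATL"}

# Hypothetical storage: (site, fragment) -> rows held at that site.
SITES = {
    ("NY", "E1"): [{"emp_id": 1, "name": "Ann"}],
    ("ATL", "E2"): [{"emp_id": 2, "name": "Bob"}],
}


def read_local_mapping(fragment, site):
    """Local mapping transparency: the caller names both fragment and site."""
    return SITES[(site, fragment)]


def read_location_transparent(fragment):
    """Location transparency: the caller names the fragment only;
    the catalog supplies the site."""
    return read_local_mapping(fragment, CATALOG[fragment])


def read_fragmentation_transparent():
    """Fragmentation transparency: the caller sees one table;
    fragment names and sites are both hidden."""
    rows = []
    for fragment in sorted(CATALOG):
        rows.extend(read_location_transparent(fragment))
    return rows
```

Each function needs less information from the caller than the one before it, which is the sense in which fragmentation transparency is the highest level and local mapping transparency the lowest.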
2. Transaction Transparency
  - Ensures that database transactions maintain the distributed database’s integrity and consistency
Distributed Transaction Request
Distributed Concurrency Control
  - Multi-site, multiple-process operations are much more likely to create data inconsistencies and deadlocked transactions than single-site systems are
Premature COMMIT
Two-Phase Commit Protocol
  - Final COMMIT must not be issued until all sites have confirmed that they are ready to commit their parts of the transaction
  - Two-phase commit protocol requires that each individual DP’s transaction log entry be written before the database fragment is actually updated
Two-Phase Commit, Phase 1: Preparation
  - Coordinator receives a commit request.
  - Coordinator instructs all resource managers to get ready to “go either way” on the transaction.
  - Each resource manager writes all updates from that transaction to its own physical log.
  - Coordinator receives replies from all resource managers. If all are OK, it writes commit to its own log; if not, it writes rollback to its log.
Two-Phase Commit, Phase 2: Final Commit
  - Coordinator informs each resource manager of its decision by broadcasting a message to either commit or roll back (abort).
  - If the message is commit, each resource manager transfers the updates from its log to its database.
  - A failure during the commit phase puts a transaction “in limbo.” This has to be tested for and handled with timeouts or polling.
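The two phases can be sketched in a few lines of Python. This is a minimal in-memory illustration, not a production protocol: the `ResourceManager` class, its `can_prepare` flag, and the `two_phase_commit` function are hypothetical names, and the sketch deliberately leaves out the timeouts and polling needed to resolve “in limbo” transactions.

```python
class ResourceManager:
    """Hypothetical participant that holds one database fragment."""

    def __init__(self, name, can_prepare=True):
        self.name = name
        self.can_prepare = can_prepare  # simulates a site that cannot get ready
        self.log = []                   # physical log, written before the database
        self.database = {}              # committed data of this fragment

    def prepare(self, updates):
        # Phase 1: write the updates to the log, then vote.
        if not self.can_prepare:
            self.log.append(("abort-vote", dict(updates)))
            return False
        self.log.append(("prepared", dict(updates)))
        return True

    def commit(self):
        # Phase 2: transfer the logged updates into the database.
        state, updates = self.log[-1]
        if state == "prepared":
            self.database.update(updates)
            self.log.append(("committed", updates))

    def rollback(self):
        # Phase 2: discard the pending updates; the database is untouched.
        self.log.append(("rolled-back", {}))


def two_phase_commit(coordinator_log, work):
    """Run both phases. work is a list of (resource_manager, updates) pairs."""
    # Phase 1: collect a vote from every resource manager.
    votes = [rm.prepare(updates) for rm, updates in work]
    decision = "commit" if all(votes) else "rollback"
    coordinator_log.append(decision)  # decision is logged before it is broadcast
    # Phase 2: broadcast the decision.
    for rm, _ in work:
        if decision == "commit":
            rm.commit()
        else:
            rm.rollback()
    return decision
```

If a single resource manager votes no, every participant rolls back, so no site applies its part of the transaction on its own.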
3. Failure Transparency
  - Ensures that the system will continue to operate in the event of a node failure
  - Provides redundant functions in another network node
  - Limits the function lost to the single node, not the overall system
4. Performance Transparency
  - Objective of the query optimization routine is to minimize the total cost associated with the execution of a request
  - Costs associated with a request are a function of:
    - Access time (I/O) cost
    - Communication cost
    - CPU time cost
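As a rough illustration of cost-based plan selection, the sketch below adds the three cost components and picks the cheapest plan. The `PlanCost` class, the plan names, and all numbers are made-up assumptions for illustration; a real optimizer estimates these values from statistics.

```python
from dataclasses import dataclass


@dataclass
class PlanCost:
    """Hypothetical cost estimate for one execution plan (all in the same unit)."""
    name: str
    io: float             # access time (I/O) cost
    communication: float  # cost of moving data between sites
    cpu: float            # CPU time cost

    def total(self) -> float:
        # Total cost is the sum of the three components.
        return self.io + self.communication + self.cpu


def cheapest(plans):
    """Return the plan with the lowest total estimated cost."""
    return min(plans, key=lambda p: p.total())


# Illustrative numbers only.
plans = [
    PlanCost("ship whole table to requester", io=40.0, communication=300.0, cpu=10.0),
    PlanCost("filter at remote site, ship result", io=40.0, communication=20.0, cpu=15.0),
]
best = cheapest(plans)
```

In this made-up case the second plan wins because communication cost dominates, which is typical of the trade-off a distributed optimizer has to weigh.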
Query Optimization
  - Must provide distribution transparency as well as replica transparency
  - Replica transparency: hides the existence of multiple copies of data from the user
  - Query optimization techniques:
    - Manual or automatic
    - Static or dynamic
    - Statistically based or rule-based algorithms
5. Heterogeneity Transparency
  - Allows the integration of several different local DBMSs (relational, network, and hierarchical) under a common, or global, schema
  - Translates all data requests from the global schema to the local DBMS schema
Distributed Database Design
  - Design principles and concepts for centralized databases are still applicable
  - Distribution introduces additional issues:
    - Data fragmentation: how to partition the database into fragments
    - Data replication: which fragments to replicate
    - Data allocation: where to locate those fragments and replicas
Data Fragmentation
  - Breaks a single object into two or more segments, or fragments
  - Each fragment can be stored at any site over a computer network
  - Information about data fragmentation is stored in the distributed data catalog (DDC), from which it is accessed by the TP to process user requests
Data Fragmentation Strategies
  - Horizontal fragmentation: division of a relation into subsets (fragments) of tuples (rows)
  - Vertical fragmentation: division of a relation into attribute (column) subsets
  - Mixed fragmentation: combination of horizontal and vertical strategies
A Sample CUSTOMER Table
Horizontal Fragmentation
Vertically Fragmented
Mixed Fragmentation
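The three fragmentation strategies can be sketched on a small in-memory table. The rows, column names (`cus_num`, `cus_name`, `cus_state`, `cus_balance`), and helper functions below are illustrative assumptions and do not reproduce the sample CUSTOMER table from the slides.

```python
# Hypothetical CUSTOMER rows; names and values are illustrative only.
CUSTOMER = [
    {"cus_num": 10, "cus_name": "Alpha Co", "cus_state": "TN", "cus_balance": 1200.0},
    {"cus_num": 11, "cus_name": "Beta Inc", "cus_state": "FL", "cus_balance": 340.0},
    {"cus_num": 12, "cus_name": "Gamma LLC", "cus_state": "TN", "cus_balance": 0.0},
]


def horizontal_fragment(rows, predicate):
    """Horizontal fragmentation: keep a subset of rows, with all columns."""
    return [dict(r) for r in rows if predicate(r)]


def vertical_fragment(rows, columns, key="cus_num"):
    """Vertical fragmentation: keep a subset of columns for every row.
    The key column is always included so fragments can be rejoined."""
    keep = [key] + [c for c in columns if c != key]
    return [{c: r[c] for c in keep} for r in rows]


def mixed_fragment(rows, predicate, columns, key="cus_num"):
    """Mixed fragmentation: horizontal split first, then vertical."""
    return vertical_fragment(horizontal_fragment(rows, predicate), columns, key)


tn_rows = horizontal_fragment(CUSTOMER, lambda r: r["cus_state"] == "TN")
billing = vertical_fragment(CUSTOMER, ["cus_balance"])
tn_billing = mixed_fragment(CUSTOMER, lambda r: r["cus_state"] == "TN", ["cus_balance"])
```

Repeating the key in every vertical fragment is what lets the original rows be rebuilt by joining the fragments on that key.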
Data Replication
  - Storage of data copies at multiple sites served by a computer network
  - Fragment copies can be stored at several sites to serve specific information requirements
  - Can enhance data availability and response time
  - Can help to reduce communication and total query costs
Data Allocation
  - Allocation strategies:
    - Centralized data allocation: entire database is stored at one site
    - Partitioned data allocation: database is divided into several disjoint parts (fragments) and stored at several sites
    - Replicated data allocation: copies of one or more database fragments are stored at several sites
  - Data distribution over a computer network is achieved through data partitioning, data replication, or a combination of both
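The allocation strategies can be told apart by looking at a map from fragments to the sites that hold them. The sketch below assumes such a map as a plain dictionary; the function names, fragment names, and site names are hypothetical, and the read-site rule (prefer a local copy) is just one simple policy.

```python
def classify_allocation(allocation):
    """Classify an allocation map (fragment name -> list of sites with a copy)."""
    sites = {site for copies in allocation.values() for site in copies}
    if len(sites) == 1:
        return "centralized"   # everything is stored at one site
    if all(len(copies) == 1 for copies in allocation.values()):
        return "partitioned"   # disjoint fragments, one copy each
    return "replicated"        # at least one fragment has several copies


def choose_read_site(allocation, fragment, local_site):
    """Prefer a local copy of the fragment; otherwise use the first listed site."""
    copies = allocation[fragment]
    return local_site if local_site in copies else copies[0]
```

For example, `classify_allocation({"F1": ["A", "B"], "F2": ["B"]})` reports a replicated allocation, and a reader at site B can be served locally for both fragments.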
Client/Server vs. DDBMS
  - Client/server refers to the way in which computers interact to form a system
  - Features a user of resources, or a client, and a provider of resources, or a server
  - Can be used to implement a DBMS in which the client is the transaction processor (TP) and the server is the data processor (DP)
Client/Server Advantages
  - Less expensive than alternate minicomputer or mainframe solutions
  - Allows the end user to use the microcomputer’s GUI, thereby improving functionality and simplicity
  - More people with PC skills than with mainframe skills in the job market
  - PC is well established in the workplace
  - Numerous data analysis and query tools exist in the PC market to facilitate interaction with DBMSs
  - Considerable cost advantage to offloading applications development from the mainframe to powerful PCs
Client/Server Disadvantages
  - Creates a more complex environment, in which different platforms (LANs, operating systems, and so on) are often difficult to manage
  - An increase in the number of users and processing sites often paves the way for security problems
  - Spreading data access to a much wider circle of users increases the demand for people with broad knowledge of computers and software
  - Increases the burden of training and the cost of maintaining the environment
Twelve Commandments for Distributed Databases
  1. Local site independence
  2. Central site independence
  3. Failure independence
  4. Location transparency
  5. Fragmentation transparency
  6. Replication transparency
  7. Distributed query processing
  8. Distributed transaction processing
  9. Hardware independence
  10. Operating system independence
  11. Network independence
  12. Database independence
Wrap Up: ER Model, Normalization, SQL, Database Design
  - ER model
    - Entity and relationship, cardinality
    - Transforming ERD into relations
  - Normalization
    - Functional dependency
    - 1NF, 2NF, 3NF
  - SQL
    - DDL & DML
    - Create, insert, update, delete
    - Select, group by, where vs. having, join
    - Aggregate functions
    - Two or more tables: (normal) join and outer join
  - Database design
    - Conceptual, logical, physical
    - ODBC and JDBC
Wrap Up: Data Warehouse, XML, DBA, Transaction Processing, Distributed Database
  - Data warehouse
    - DSS data characteristics, database characteristics, OLAP vs. OLTP
    - Star schema: fact table and dimension table
  - XML
    - Well-formed XML, valid XML, DTD
  - DBA
    - Oracle data dictionary
    - Privileges: system & object privileges, role
  - Transaction processing
    - Transaction, transaction log
    - Concurrency control and potential problems
    - Deadlock, deadlock control
  - Distributed database
    - Characteristics, two-phase commit
    - Transparency features