DBSYSTEMS Distributed Database Systems University of Manitoba Asper School of Business 3500 DBMS Bob Travica Updated of 24
DBSYSTEMS Distributed Databases (DDB) Concept Britain Germany France Italy SELECT Sales FROM Britain.Sales UNION SELECT Sales FROM France.Sales UNION SELECT Sales FROM Italy.Sales Multiple independent systems Each has DBMS engine, queries, locking, transactinos, etc. Usually on different machines & locations Can be different hardware, OS, software. 2 of 24
DBSYSTEMS Database Zeus Database Apollo Database Athena United States England France There must be different databases as opposed to a central database (teleprocessing) Network is a part of DDB system Distributed Database System (DDBS) Concept 3 of 24
DBSYSTEMS Teleprocessing vs. DDBS DB Loc 1, central Loc 2 Loc 3 DB Loc1 DB Loc2 Teleprocessing DB System Distributed DB System 4 of 24
DBSYSTEMS DDBS Rules Golden Rule - Transparency: the user should not know or care that the database is distributed and how. User is independent of any constraint. Specifically: No reliance on a central site. Continuous operation. Location independence. Fragmentation (Partitioning) independence Replication independence. Support to distributed query processing. Support to distributed transaction management. Hardware independence. Operating system independence. Network independence. DBMS independence. (C.J. Date) 5 of 24
DBSYSTEMS DDBS Features and Benefits Each database can continue to run even if a portion fails. Performance gains in contrast to teleprocessing. Data and hardware can be moved without affecting operations or users. Operations expansion possible System expansion and upgrades possible 6 of 24
DBSYSTEMS More on DDBS Benefits Support to operations’ dispersion Work and data are segmented on departments. Work and data are geographically segmented. Improved local performance Most updates and queries are performed locally. Local control and responsibility over data. Can still combine data across the system. Scalability and expansion support Local transactions Future expansion Combine data 7 of 24
DBSYSTEMS DDBS Designs Two typical designs PartitionedReplicated Partitioned- Replicated Hybrid Teleprocessing Technical & economic requirements high low Partition Replication C B A C B A C B A C B A C B A Loc 1 Loc 2 Loc 3 8 of 24 Loc 1 Loc 2 Loc 3
DBSYSTEMS Choose hardware and DBMS vendor, and network. Set up network and DBMS connections. Choose locations for data segments. Create backup plan and strategy. Design local views (customized queries). Design stress tests (max. loads, failure cases). Creating Distributed Database System 9 of 24
DBSYSTEMS Query Processing in Partitioned DDBS Networks (LANs, WANs) infleunce the speed of data transfer. Goal is to minimize data transfers Each system must be capable of evaluating queries. Results depend heavily on how the system joins tables. Disk drive LAN WAN 10 of 24
DBSYSTEMS Query Optimization in Partitioned DBS Example NY: tbl.Customers: 1 M rows (Marketing) LA: tbl.Product: 10 M rows (Production) Chicago: tbl.Sale: 20 M rows (main Retail op’s) Query in Chicago: List customer and product data for blue products sold on 1-Mar Must join the 3 tables. Database is partitioned according to spatial dispersion of core operations (above). Only sales data are at the locale where query is run (Chicago). Bad query design : Transfer entire tables Product and Customer to Chicago 11 of 24
DBSYSTEMS Customer(C#, …) 1,000,000 rows NY Product(P#, Color…) 10,000 rows Sale(S#, C#, Date) 20,000,000 rows SaleItem(S#, P#,…) 50,000,000 rows Chicago LA returns requested customer records Better query design qry1: SELECT C# FROM Sale WHERE Date=3/1/2006; qry2: SELECT P# FROM SaleItem INNER JOIN Sale ON SaleItem.S#=Sale.S# WHERE Date=3/1/2006; qry3: SELECT * FROM Customer WHERE C# in qry1’s output; qry4: SELECT * FROM Product WHERE P# in qry2’s output AND Color like “blue”; requests customer records for certain customer IDs requests product records for certain Product IDs returns requested product records 12 of 24
DBSYSTEMS Replicated DDBS Design Rule of thumb: Keep the data close to the location at which they are used. Goals: Minimize transmissions Improve performance Replication Manager Britain: Customers & Sales Spain: Customers & Sales Britain Britain: Customers & Sales Spain: Customers & Sales Spain Periodic (batch) updates Decision support systems: Use replicated data warehouse. 13 of 24
DBSYSTEMS Design: Partitioned database The concurrency issues that apply to centralized database become even more difficult in a partitioned database Challenges: Slow network traffic Site unavailable Concurrency (more people trying to change the same data at the same time) => locks management DDBS and Concurrent Access 14 of 24
DBSYSTEMS Two-Phase Commit Ensuring data consistency across the system. In updating data, DBMS initiating update becomes Coordinator. Two phases: 1. Get Ready: Coordinator sends new data and a request for update to all DDB parts. DB partitions store new data in logs, and report update status. 2. Execute: 2.1 If all DDB parts ready for update, Coordinator requests COMMIT. 2.2 If any DDB part not ready, Coordinator requests ROLLBACK, and the update procedure repeats. DB Partition 1 (Master price list) Initiate Transaction (Reduce price of cookies for 5%) DB Partition 2 (Store A Product tbl) Partition 3 (Store B Product tbl) 1. Prepare to update. All agree? 2. Commit OK? OK! Go! Coordinator 15 of 24
DBSYSTEMS Client-Server DDB Server performs database tasks at request of clients. Clients handle front-end tasks and small data tables that are not shared. Database Server Client Front-end (Forms, Queries, Reports) Query Request Query Output Database Server Update Request Update Response 16 of 24
DBSYSTEMS Three-Tier Client-Server Clients Middleware Database Servers Front-end Databases Transactions Legacy systems Database links Business rules Code 17 of 24
DBSYSTEMS Client-Server Architecture for Internet Client Browsers DBS Middleware Database Servers Web Server Application Server Other Systems 18 of 24
DBSYSTEMS Technologies for DDB Used in E-Commerce - Open Database Connectivity (ODBC) - Microsoft’s Active Data Objects (ADO) - Client--Web Server Connectors (CGI) - Data transfer (XML, SOAP) 19 of 24
DBSYSTEMS Open Database Connectivity: ODBC Since 1992, SQL Access Group; Microsoft’s decisive support Provides SQL access to different DBMSes ODBC handles: Login to database. Send query. Interpret result. Exchange data. Oracle Database Application ODBC driver CLIENT ODBC driver SELECT … Output SQL Database ODBC driver SERVER 20 of 24
DBSYSTEMS Active Data Objects (ADO) Relational Database Server Front end (report) ADO objects Client SELECT … Output ADO - Microsoft object-based technology for accessing data Data access at records level (“cursor programming”) via classes: recordset - similar to table rowset - looser structure Non-relational Database Server Builds on concepts of OO and cursor (the space for storing records in DB systems) Used in Active Server Pages (ASP)* 21 of 24
DBSYSTEMS Client—Web Server Connectors - HTML input needs to be translated for servers down the stream - Output needs to be put back into HTML format - Common Gateway Interface (e.g., CGI programming in PERL) 22 of 24
DBSYSTEMS Communication technology - Extensible Markup Language (XML) & SOAP - SQL can extract DB data into XML documents. So, any database schema can be described => document transfer, validation and format support (e.g., supplier documents). - XML requires additional software for extracting data (parser) and formatting display (style sheets). - Note: Document Type Definition (Declaration) 23 of 24
DBSYSTEMS Extensible Markup Language (XML) & SOAP - SOAP (Simple Object Access Protocol) - protocol for exchanging structured (marked up) data in distributed environment - SOAP uses XML technologies to define an extensible messaging framework, defining message that can be exchanged over a variety of underlying protocols (started with HTTP, hence “Simple”; complex indeed). (See: ) - SOAP and XML provide foundations for Web services – distributed application software, stored across nets, running on different operating systems and devices, written in different programming languages, and made by different vendors (e.g.,.NET) ( More... )More of 24