Presentation is loading. Please wait.

Presentation is loading. Please wait.

Chapter 17: Additional Slides February 6, 2016. Outline Physical Data Management  Fragments  Distributed Query Processing  Transactions Logical Data.

Similar presentations


Presentation on theme: "Chapter 17: Additional Slides February 6, 2016. Outline Physical Data Management  Fragments  Distributed Query Processing  Transactions Logical Data."— Presentation transcript:

1 Chapter 17: Additional Slides February 6, 2016

2 Outline Physical Data Management  Fragments  Distributed Query Processing  Transactions Logical Data Management  Transparency Conceptual Data Management

3 Physical Data Management Fragments What is a Fragment?  Vertical subset (project operation)  Horizontal subset (restrict operation)  Mixed Fragment (combination of project and restrict) A fragment may be allocated to a single or multiple sites Fragments may be replicated where  there is a primary fragment as a single site  Copies of the fragment are placed at multiple sites (secondary)

4 Physical Data Management Distributed Query Processing Involves both local (intra site) and global (inter site) optimization. Multiple optimization objectives The weighting of communication costs versus local processing costs depends on network characteristics. There are many more possible access plans for a distributed query.

5 Physical Data Management Distributed Query Processing cont’d Local vs. Global query processing  In Local, Queries are performed at a central server (single site)  In Global, Must decide which sites to access for the fragments May need to move fragments from site to site Multiple optimization is needed for Global  Due to the multiple sites and access plans Many possible access plans for Global  Choosing the best one may be difficult

6 Physical Data Management Distributed Query Processing cont’d Communication Costs  Communication Time (CT)  Fixed Message Delay (MD)  Variable Transmission Time (TT) CT = MD + TT MD = Number of Messages * Delay per message TT = Number of bits/Data rate

7 Physical Data Management Distributed Query Processing cont’d Global Query Example (p. 632)  List the order number, order date, product number, product name, product price, and order quantity for eastern orders with a specified customer number, date range, and product color. Four possible access plans

8 Physical Data Management Distributed Query Processing cont’d Access Plan 1  Move the Product table to the Tulsa site where the query is processed

9 Physical Data Management Distributed Query Processing cont’d Access Plan 2  Restrict the Product table at the Denver Site  Then move result to the Tulsa site to execute the remainder of the query

10 Physical Data Management Distributed Query Processing cont’d Access Plan 3  Perform join and restrictions of Eastern- Orders and Eastern Order-lines fragments at the Tulsa site  Then move result to Denver site to join with Product Table

11 Physical Data Management Distributed Query Processing cont’d Access Plan 4  Restrict the Product table at the Denver site  Move product numbers to Tulsa and do restrict/join  Then move result back to Denver to combine with Product table to get product names

12 Physical Data Management Distributed Query Processing cont’d Obviously many different access plans can be used to answer the same query Need to investigate  actual network costs  the local processing costs at each site to determine which access plan is the best

13 Physical Data Management Transactions – 2 Phase Commit Protocol 2 Phase Commit (2PC)  Ensures that all transactions are Atomic One site is selected as a Coordinator while other sites are Participants  Each Participant site execute a different part of the transaction  Two phases: Voting Phase and Decision Phase Figure 17.18, page 634

14 Physical Data Management Transactions – 2 Phase Commit Protocol Several Complications IF  Failures during recovery and Timeouts  Log records are lost  Coordinator fails  Etc. Several methods to resolve these, but out of scope for this class

15 Logical Data Management Transparency Transparency is related to data independence. With transparency, users can write queries with no knowledge of the distribution, and distribution changes will not cause changes to existing queries and transactions. Without transparency, users must reference some distribution details in queries and distribution changes can lead to changes in existing queries.

16 Logical Data Management Fragmentation Transparency Fragmentation transparency provides the highest level of data independence. Users formulate queries and transactions without knowledge of fragments (locations, or local formats). If fragments change, queries and transactions are not affected. Table 17.6, p 626

17 Logical Data Management Location Transparency Location transparency provides a lesser level of data independence than fragmentation transparency. Users need to reference fragments in formulating queries and transactions. However, knowledge of locations and local formats is not necessary. Table 17.7, p 627

18 Conceptual Data Management Schema Integration Multiple types of schemas may exist to describe the same dataset Integrate multiple schemas into a single schema Best explained using an Exercise

19 Database III – E/R Model Entities Engineer(Engineer No, name, title, salary) Project(PNo, project name, budget, location) Client(Client Name, Address) Relationships Engineer Works_In Project : (Responsibility, Duration) Project Contract_By Client : (Contract Date) Conceptual Data Management Schema Integration - Exercise Engineering Database – Relational Schema E(eno, ename, title), p.k. = eno J(jno, jname, budget, loc, cname), p.k. = jno G(eno, jno, resp, dur), p.k. = eno, jno S(title, sal), p.k. = title Employee Database – CODASYL Schema Department(dept-name, budget, manager) Employee(e#, name, address, title, salary) Department Employs Employee (1:N relationship) First, find the common entities and relationships between schemas

20 Database III – E/R Model Entities Engineer(Engineer No, name, title, salary) Project(PNo, project name, budget, location) Client(Client Name, Address) Relationships Engineer Works_In Project : (Responsibility, Duration) Project Contract_By Client : (Contract Date) Conceptual Data Management Schema Integration - Exercise Engineering Database – Relational Schema E(eno, ename, title), p.k. = eno J(jno, jname, budget, loc, cname), p.k. = jno G(eno, jno, resp, dur), p.k. = eno, jno S(title, sal), p.k. = title Employee Database – CODASYL Schema Department(dept-name, budget, manager) Employee(e#, name, address, title, salary) Department Employs Employee (1:N relationship) Second, draw the conceptual diagram for the common entities and relationships

21 Conceptual Data Management Schema Integration No single correct solution to this exercise It may be seen that there are multiple solutions to this problem Considered a very HARD problem Often hard to find the best synonyms especially from a large set of schemas

22 Weekly Exercise Questions 2, 4, and 5


Download ppt "Chapter 17: Additional Slides February 6, 2016. Outline Physical Data Management  Fragments  Distributed Query Processing  Transactions Logical Data."

Similar presentations


Ads by Google