Distributed Database The University of California Berkeley Extension Copyright © 2011 Patrick McDermott
Impossible Dream? Rehab impossibility at DIR APL: US & HK Goals –Location Transparency –Local Autonomy
DDS Concerns
Fragmentation
Replication
Update Propagation
Catalog Management
Distributed Query Processing
Definitions
A single logical database that is spread physically across computers in multiple locations that are connected by a data communication link.
A collection of multiple logically interrelated databases distributed over a computer network.
DDBMS: A software system that manages a distributed database while making the distribution transparent to the user.
Advantages
Organizational decentralization
–Horizontal partitions by location
Economical processing
–Smaller databases based on local information
Increased reliability and availability
–Data replication as well as site autonomy
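A minimal sketch of horizontal partitioning by location, assuming two sites simulated with in-memory SQLite databases. The site names "US" and "HK", the `customer` table, and the helper functions are hypothetical and chosen only for illustration; a real DDBMS would do this routing internally.

```python
import sqlite3

# Hypothetical sites: each in-memory SQLite database stands in for a
# separate server at one location.
SITES = {loc: sqlite3.connect(":memory:") for loc in ("US", "HK")}

for conn in SITES.values():
    conn.execute(
        "CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, location TEXT)"
    )

def insert_customer(cust_id, name, location):
    """Store the row only at the site that owns its location."""
    SITES[location].execute(
        "INSERT INTO customer VALUES (?, ?, ?)", (cust_id, name, location)
    )

def local_customers(location):
    """A local query touches only the local (smaller) fragment."""
    return SITES[location].execute(
        "SELECT id, name FROM customer ORDER BY id"
    ).fetchall()

def all_customers():
    """The global relation is the union of the horizontal fragments."""
    rows = []
    for conn in SITES.values():
        rows.extend(conn.execute("SELECT id, name, location FROM customer").fetchall())
    return sorted(rows)

insert_customer(1, "Ada", "US")
insert_customer(2, "Bo", "HK")
insert_customer(3, "Cy", "US")
```

Each site answers its own queries from its own fragment, which is where the economy and autonomy come from. Only a global question such as `all_customers()` has to visit every site.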
Same or Different DDBMS?
1. Homogeneous
–All servers use identical software and all users use identical software
2. Heterogeneous
–Legacy systems
–FDBMS: “Federated database system”
–Different software at the user or server level
Is there a Global Schema?
How much Local Autonomy?
Heterogeneous Heterodoxy
Differences in Data Models
–If the different legacy systems hold data in varied models (formats), it is difficult to represent the information in a global schema or to process it in a single language.
Differences in Constraints
–Different Referential Integrity constraints and triggers may have been used in the databases.
–The Global Schema must deal with the potential conflicts among constraints.
–Also different Validation requirements
Heterogeneous Processing
Semantic Heterogeneity
–Two or more databases can use identical names for different sets of information.
–This leads to problems if assumptions are made about what a name means in each database.
Differences in Query Languages
–Each autonomous system may use a different query language, or a different version of the same language.
–SQL comes in multiple versions
The way a query is written can have a drastic effect on network traffic and database speed.
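The effect of query formulation on network traffic can be sketched by counting the rows a remote site has to ship back. This is an illustration under stated assumptions: the remote site is simulated with an in-memory SQLite database, and the `orders` table, its contents, and the `fetch` helper are all hypothetical.

```python
import sqlite3

# Hypothetical remote site, simulated with an in-memory SQLite database.
remote = sqlite3.connect(":memory:")
remote.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, region TEXT, amount REAL)")
remote.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(i, "HK" if i % 10 == 0 else "US", float(i)) for i in range(1, 1001)],
)

def fetch(sql, params=()):
    """Run a query at the remote site; return the rows and how many were shipped."""
    rows = remote.execute(sql, params).fetchall()
    return rows, len(rows)

# Plan A: ship the whole table, then filter at the requesting site.
all_rows, shipped_a = fetch("SELECT id, region, amount FROM orders")
hk_a = [row for row in all_rows if row[1] == "HK"]

# Plan B: ship the predicate to the data, so only matching rows travel.
hk_b, shipped_b = fetch(
    "SELECT id, region, amount FROM orders WHERE region = ?", ("HK",)
)

print(shipped_a, shipped_b)
```

Both plans produce the same answer, but Plan A moves every row across the link while Plan B moves only the rows that satisfy the predicate.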
Desirable Transparencies
Location: The command used to perform a task is independent of the location of the data and the location of the system where the command was issued.
Naming: Once an object is named, no additional network information is needed to access it.
Replication: The user is unaware of how many copies of the data exist or where they are saved.
Fragmentation: Data may be partitioned vertically, horizontally, or in a mix of both, with users completely unaware of the fragmentation.
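One way to picture location, naming, and fragmentation transparency together is a toy global catalog that maps a logical name to its physical fragments. The sketch below assumes a vertically fragmented `employee` relation spread over two sites simulated with in-memory SQLite databases; the catalog layout, table names, and `read` function are hypothetical and not taken from any particular DDBMS.

```python
import sqlite3

# Hypothetical sites, each simulated with an in-memory SQLite database.
site_a = sqlite3.connect(":memory:")
site_b = sqlite3.connect(":memory:")

# Vertical fragments of one logical relation employee(id, name, salary):
# both fragments repeat the key so the rows can be rejoined.
site_a.execute("CREATE TABLE emp_public (id INTEGER PRIMARY KEY, name TEXT)")
site_b.execute("CREATE TABLE emp_pay (id INTEGER PRIMARY KEY, salary INTEGER)")
site_a.executemany("INSERT INTO emp_public VALUES (?, ?)", [(1, "Ada"), (2, "Bo")])
site_b.executemany("INSERT INTO emp_pay VALUES (?, ?)", [(1, 120), (2, 95)])

# Toy global catalog: logical name -> list of (site, physical table, non-key columns).
CATALOG = {
    "employee": [
        (site_a, "emp_public", ["name"]),
        (site_b, "emp_pay", ["salary"]),
    ]
}

def read(logical_name):
    """Return rows of the logical relation as dicts, rejoining fragments on id."""
    merged = {}
    for conn, table, columns in CATALOG[logical_name]:
        sql = f"SELECT id, {', '.join(columns)} FROM {table}"
        for row in conn.execute(sql):
            record = merged.setdefault(row[0], {"id": row[0]})
            record.update(zip(columns, row[1:]))
    return [merged[key] for key in sorted(merged)]

print(read("employee"))
```

The caller names only `employee`. Which sites hold the data and how the relation is split are details kept in the catalog, so fragments could be moved or re-split without changing the call.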