DISTRIBUTED DATABASES AND DDBMS
Learning Objectives
- Understand the concept of “Distributed Data”
- Describe various Distributed Data and DDBMS implementations
- Explain how database design affects the DDBMS environment
- Apply DDBMS principles to solve problems

Definitions
- Distributed Database: A single logical database that is spread physically across computers in multiple locations that are connected by a data communications link
- Decentralized Database: A collection of independent databases on non-networked computers
They are not the same thing!
What are we talking about here?
Key Questions:
- Are components of the application in more than one place?
- Are the data in more than one place?
- Does the app use more than one DBMS or “system” for data management?
- Which facets, if any, are transparent to users?

Why distribute your app or data?
It’s hard. It’s complex. So why do it?
- Scalability
- Redundancy
Application Complexity
- Monolithic: Everything works within, and is contained in, one computer. Ex. MS Word
- Distributed: Various working pieces are in different physical places, working over a computer network. Ex. Google Docs

Data Distribution
- Single Site Data (Simple): All data stored in and retrieved from one place on a network. Ex. WordPress
- Multi-Site Data (Complex): Various parts of the data come from various sites on a network. Ex. MySlice, DNS

Data Complexity
- Homogeneous (Easier): All data associated with the application is stored in the same DBMS. Ex. WordPress
- Heterogeneous (More Difficult): Various data components of the application are stored in different DBMSes. Ex. SU Blackboard, Facebook
Multisite Data DBMS Options
- Horizontal Partitioning: Distributing data by row
- Vertical Partitioning: Distributing data by table or column
- Replication: Copying data either on a schedule or in real time
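The three options can be sketched on a small in-memory table. This is only an illustration, not a DDBMS: the sample rows, the column names (id, name, region, credit_limit), the site names, and the helper functions are all assumptions made for the example.

```python
# Hypothetical sample data: one logical Customers table.
customers = [
    {"id": 1, "name": "Ada", "region": "NA", "credit_limit": 5000},
    {"id": 2, "name": "Bo", "region": "EU", "credit_limit": 3000},
    {"id": 3, "name": "Cy", "region": "NA", "credit_limit": 1000},
]

def horizontal_partition(rows, key):
    """Split by row: each site holds the complete rows for one key value."""
    sites = {}
    for row in rows:
        sites.setdefault(row[key], []).append(row)
    return sites

def vertical_partition(rows, pk, column_groups):
    """Split by column: each site holds some columns plus the primary key."""
    return {
        site: [{col: row[col] for col in (pk, *cols)} for row in rows]
        for site, cols in column_groups.items()
    }

def replicate(rows, site_names):
    """Copy every row to every site."""
    return {site: [dict(row) for row in rows] for site in site_names}

by_row = horizontal_partition(customers, "region")
by_col = vertical_partition(
    customers, "id", {"sales": ["name", "region"], "finance": ["credit_limit"]}
)
copies = replicate(customers, ["NA", "EU"])
```

Note that the vertical split repeats the primary key at every site; without it the original rows could not be reassembled.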
Summary: The taxonomy
App
- Monolithic or Distributed
- Single Site or Multi Site
- Homogeneous or Heterogeneous
Multi Site
- Horizontally Partitioned
- Vertically Partitioned
- Replicated
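One way to make the taxonomy concrete is to record the answers as a small data structure that rejects combinations the taxonomy does not allow (a partitioning or replication choice only applies to multi-site data). The class name, field names, and the CRM classification used here are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Optional

APPS = {"monolithic", "distributed"}
DATA = {"single-site", "multi-site"}
DBMS = {"homogeneous", "heterogeneous"}
OPTIONS = {"horizontal", "vertical", "replicated"}

@dataclass(frozen=True)
class Classification:
    app: str                      # one of APPS
    data: str                     # one of DATA
    dbms: str                     # one of DBMS
    option: Optional[str] = None  # one of OPTIONS, multi-site data only

    def __post_init__(self):
        if self.app not in APPS or self.data not in DATA or self.dbms not in DBMS:
            raise ValueError("unknown taxonomy value")
        if self.data == "multi-site" and self.option not in OPTIONS:
            raise ValueError("multi-site data needs a partitioning/replication option")
        if self.data == "single-site" and self.option is not None:
            raise ValueError("single-site data has no partitioning/replication option")

    def describe(self):
        data = self.data if self.option is None else f"{self.data} ({self.option})"
        return f"{self.app} app, {data} data, {self.dbms} DBMS"

# Hypothetical classification of a CRM system split by region on one DBMS.
crm = Classification("distributed", "multi-site", "homogeneous", "horizontal")
summary = crm.describe()
```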
Homogeneous == Same DBMS
User’s View of Db: CRM Db (Customers, Sales Staff, Orders)
Actual Implementation (same DBMS, Oracle, at both sites):
- N. America: Customers, Sales Staff
- Europe: Orders
Heterogeneous == Multiple DBMS
User’s View of Db: CRM Db (Customers, Sales Staff, Orders)
Actual Implementation:
- N. America: Customers, Sales Staff (Oracle)
- Europe: Orders (MySQL)
- Europe: Orders Invoices (File System)
Example of Replication
User’s View of Db: CRM Db (Customers, Sales Staff, Orders)
Actual Implementation:
- N. America (Master): All Customers, All Sales Staff, All Orders
- Europe (Replica): All Customers, All Sales Staff, All Orders
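A minimal sketch of the master/replica arrangement, assuming N. America is the master and Europe the replica. The Site and ReplicatedTable classes and the real_time flag are hypothetical names; the flag only contrasts copying on every write with copying on demand or on a schedule.

```python
class Site:
    def __init__(self, name):
        self.name = name
        self.rows = {}  # primary key -> row

class ReplicatedTable:
    def __init__(self, master, replicas, real_time):
        self.master = master
        self.replicas = replicas
        self.real_time = real_time

    def write(self, key, row):
        # All writes go to the master first.
        self.master.rows[key] = dict(row)
        if self.real_time:
            self.synchronize()

    def synchronize(self):
        # Overwrite each replica with a full copy of the master.
        for replica in self.replicas:
            replica.rows = {k: dict(v) for k, v in self.master.rows.items()}

na = Site("N. America")  # master (assumed)
eu = Site("Europe")      # replica (assumed)

orders = ReplicatedTable(na, [eu], real_time=False)
orders.write(1, {"customer": "Ada", "total": 40})
stale = dict(eu.rows)    # replica has not been refreshed yet
orders.synchronize()     # e.g. triggered by a nightly job
```

With real_time=False the replica can serve stale data between synchronizations; with real_time=True every write pays the cost of the copy.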
Example of Horizontal Partitioning
User’s View of Db: CRM Db (Customers, Sales Staff, Orders)
Actual Implementation:
- N. America: NA Customers, NA Sales Staff, NA Orders
- Europe: E Customers, E Sales Staff, E Orders
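The same idea can be sketched with Python's built-in sqlite3 module, using two in-memory databases to stand in for the N. America and Europe sites. The table layout, the region codes "NA" and "EU", and the function names are assumptions for the example; a real DDBMS would do this routing internally.

```python
import sqlite3

def make_site():
    """Create one site: an in-memory database with its own customers table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, region TEXT)"
    )
    return conn

# Actual implementation: one database per region.
sites = {"NA": make_site(), "EU": make_site()}

def insert_customer(cust_id, name, region):
    """Route each row to the site that owns its region."""
    conn = sites[region]
    conn.execute("INSERT INTO customers VALUES (?, ?, ?)", (cust_id, name, region))
    conn.commit()

def all_customers():
    """User's view: one logical table, built by combining every site's rows."""
    rows = []
    for conn in sites.values():
        rows.extend(conn.execute("SELECT id, name, region FROM customers"))
    return sorted(rows)

insert_customer(1, "Ada", "NA")
insert_customer(2, "Bo", "EU")
insert_customer(3, "Cy", "NA")
```

A query that only concerns one region can be answered by a single site; a query over all customers has to visit every site.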
Example of Vertical Partitioning
User’s View of Db: ERP System (Financials, Customer Service, Prod. Support, Human Resources)
Actual Implementation:
- N. America: Financials, Human Resources
- Europe: Customer Service, Prod. Support
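Partitioning by table can be sketched as a catalog that maps each table to the site that stores it. The module-to-site mapping follows the example above; the sample rows and the CATALOG, SITES, and query names are hypothetical.

```python
# Catalog: which site stores which table.
CATALOG = {
    "Financials": "N. America",
    "Human Resources": "N. America",
    "Customer Service": "Europe",
    "Prod. Support": "Europe",
}

# Each site stores only its own tables (hypothetical sample rows).
SITES = {
    "N. America": {
        "Financials": [{"account": 1000, "balance": 250}],
        "Human Resources": [{"employee": "Ada", "role": "Analyst"}],
    },
    "Europe": {
        "Customer Service": [{"ticket": 17, "status": "open"}],
        "Prod. Support": [{"case": 3, "product": "Widget"}],
    },
}

def query(table):
    """Return a table's rows; the caller never names a site."""
    site = CATALOG[table]
    return SITES[site][table]

financials = query("Financials")
tickets = query("Customer Service")
```

Because callers go through query() and never name a site, moving a table only requires updating the catalog.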
5 Typical Distributed Databases
- Centralized with Single Site Data
- Replicated with Snapshots (on demand or on a schedule)
- Replicated with Synchronization (in real time)
- Integrated Partitions (partitioning in the data center)
- Independent Partitions (geographically distributed partitioning)
Transparency
- Location Transparency: User/application does not need to know where data resides
- Replication Transparency: User/application does not need to know about duplication of data
- Failure Transparency: Either all or none of the actions of a transaction are committed
Transparency is difficult but important. The greater the distribution of data, the greater the need for transparency to offset the complexity.
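The all-or-none rule behind failure transparency can be sketched as a simplified prepare-then-commit protocol across sites. This is an illustration under stated assumptions, not a full two-phase commit: the Participant class, the fail_on_prepare flag, and run_transaction are hypothetical names, and failures during the commit step itself are ignored.

```python
class Participant:
    """One site taking part in a distributed transaction."""

    def __init__(self, name, fail_on_prepare=False):
        self.name = name
        self.fail_on_prepare = fail_on_prepare  # simulates a site failure
        self.committed = {}
        self.pending = None

    def prepare(self, changes):
        if self.fail_on_prepare:
            return False
        self.pending = dict(changes)  # staged, not yet visible
        return True

    def commit(self):
        self.committed.update(self.pending)
        self.pending = None

    def rollback(self):
        self.pending = None

def run_transaction(work):
    """work is a list of (participant, changes). Commit everywhere or nowhere."""
    prepared = []
    for site, changes in work:
        if not site.prepare(changes):
            for done in prepared:
                done.rollback()
            return False
        prepared.append(site)
    for site in prepared:
        site.commit()
    return True

na = Participant("N. America")
eu = Participant("Europe", fail_on_prepare=True)
ok = run_transaction([(na, {"order:1": "shipped"}), (eu, {"invoice:1": "sent"})])
```

Because Europe cannot prepare, the change staged in N. America is rolled back and neither site ends up with a partial result.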
Applying The Concepts Via Example:
- Monolithic or Distributed?
- Single Site or Multi Site data?
- If multi-site: H / V Partitioned or Replicated?
- Homogeneous or Heterogeneous?
- Location Transparency?
- Replication Transparency?
- Failure Transparency?
DISTRIBUTED DATABASES AND DDBMS
Questions?