Distributed Database Management Systems

Reading
Textbook: Ch. 4
Farkas CSCE Spring 2011

Design Issues
Placing of data and programs (DBMS and application)
Network issues

Level of Sharing
No sharing
Data sharing
Data and program sharing
Heterogeneous environment!

Top-Down Design
Global conceptual schema distribution
  – Fragmentation
  – Replication
  – Allocation
Figure 3.2
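
Correctness of Fragmentation
1. Completeness: $R$ is decomposed into $F_R = \{R_1, \dots, R_n\}$, and every data item of $R$ appears in at least one $R_i$
2. Reconstruction: $R = \nabla R_i,\ \forall R_i \in F_R$, where $\nabla$ is union for horizontal fragments and join for vertical fragments
3. Disjointness:
  – Horizontal: $\nexists\, d_j \in R_i$ such that $d_j \in R_k$ where $k \neq i$
  – Vertical: same as horizontal for non-primary-key attributes
1 & 2: Lossless-join (normalization)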
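The three criteria can be checked mechanically for horizontal fragments. The sketch below is only an illustration: it assumes a relation is a Python set of tuples, and the relation `EMP`, its attributes, and the helper names are hypothetical, not taken from the textbook.

```python
def union_all(fragments):
    """Union of all fragments (the reconstruction operator for horizontal fragments)."""
    result = set()
    for fragment in fragments:
        result |= fragment
    return result


def is_complete(relation, fragments):
    """Completeness: every tuple of the relation appears in some fragment."""
    return relation <= union_all(fragments)


def reconstructs(relation, fragments):
    """Reconstruction: the union of the fragments gives back exactly the relation."""
    return union_all(fragments) == relation


def is_disjoint(fragments):
    """Disjointness: no tuple appears in more than one fragment."""
    seen = set()
    for fragment in fragments:
        if seen & fragment:
            return False
        seen |= fragment
    return True


# Hypothetical relation EMP(eno, name, city), fragmented horizontally by city.
EMP = {(1, "Ann", "NYC"), (2, "Bob", "LA"), (3, "Cy", "NYC")}
EMP_NYC = {row for row in EMP if row[2] == "NYC"}
EMP_LA = {row for row in EMP if row[2] == "LA"}
FRAGMENTS = [EMP_NYC, EMP_LA]

print(is_complete(EMP, FRAGMENTS), reconstructs(EMP, FRAGMENTS), is_disjoint(FRAGMENTS))
```

Dropping `EMP_LA` from the list would violate completeness, and listing `EMP_NYC` twice would violate disjointness.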

Data Directory
Global vs. local conceptual schemas
  – How to search?
  – Where to store?
  – Single vs. multiple copies?

Current Research
Allocation: new requirements, technology, etc.
Where to store the fragments?
Dynamic environment
  – Usage pattern
  – Application characteristics
  – Network changes
  – Security

Bottom-Up Approach
Multi-database systems
How to integrate them into 1 database?
  – Interoperability

Database Integration
Physical integration
  – Materialized database: data warehouses
  – Extract-transform-load (ETL) tools
Logical integration
  – Virtual (not materialized) integration
  – Enterprise Information Integration

Data Warehouses
On-line Analytical Processing (OLAP) applications:
  – Decision support systems
  – Trend analysis and forecasting
Complex queries, large databases
Materialized view maintenance
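Materialized view maintenance means keeping a stored query result consistent with the base data as it changes. A minimal sketch of the incremental approach follows, assuming a hypothetical view that stores total sales per region; the class and function names are illustrative only, and real warehouses handle far more general views.

```python
class SalesByRegionView:
    """Materialized view: total sales amount per region, maintained incrementally."""

    def __init__(self, base_rows):
        self.total = {}
        self.count = {}
        for region, amount in base_rows:
            self.apply_insert(region, amount)

    def apply_insert(self, region, amount):
        # Apply only the delta instead of recomputing the whole view.
        self.total[region] = self.total.get(region, 0) + amount
        self.count[region] = self.count.get(region, 0) + 1

    def apply_delete(self, region, amount):
        self.total[region] -= amount
        self.count[region] -= 1
        if self.count[region] == 0:
            # No base rows left for this region, so it leaves the view.
            del self.total[region]
            del self.count[region]


def recompute(base_rows):
    """Full recomputation of the view, used here only as a reference."""
    totals = {}
    for region, amount in base_rows:
        totals[region] = totals.get(region, 0) + amount
    return totals


base = [("east", 100), ("west", 50)]
view = SalesByRegionView(base)

base.append(("east", 25))
view.apply_insert("east", 25)
base.remove(("west", 50))
view.apply_delete("west", 50)

print(view.total == recompute(base))
```

The row count per region is kept alongside the total so that a region can be dropped from the view when its last base row is deleted.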

Logical Integration
No materialized global database
Virtual integration: data remains at the local (operational) databases
Global conceptual schema may not contain everything from local schemas
Autonomous and heterogeneous local systems

Bottom-Up Design
Global Conceptual Schema (GCS or mediated schema)
  – Defined first: local conceptual schemas (LCS) are mapped to the GCS
  – Defined during integration: the GCS is built by integrating the LCSs, and the corresponding mappings from the LCSs to the GCS are developed

GCS Defined First
Local-as-view (LAV) systems
  – Each LCS is treated as a view over the GCS
  – Query results: constrained to the objects in the local DBs, while the GCS definition may be richer
  – Potentially incomplete answers
Global-as-view (GAV) systems
  – GCS is defined as a set of views over the LCSs
  – View definition defines how to derive elements of the GCS
  – Query results: constrained to the GCS, while the local DBs might be richer
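The contrast between the two styles can be sketched with a toy example. Everything here is an assumption made for illustration: the local databases `hr_db` and `sales_db`, the global relation Employee(name, city), and the function names are hypothetical, and real LAV systems rewrite queries using the view definitions rather than scanning every source.

```python
# Hypothetical local databases, each with its own attribute names.
hr_db = [{"ename": "Ann", "town": "NYC"}, {"ename": "Bob", "town": "LA"}]
sales_db = [{"full_name": "Cy", "location": "NYC"}]


def gav_employee():
    """GAV: the global relation Employee(name, city) is a view over the LCSs."""
    rows = [{"name": r["ename"], "city": r["town"]} for r in hr_db]
    rows += [{"name": r["full_name"], "city": r["location"]} for r in sales_db]
    return rows


# LAV: each source is described as a view over the global relation.
# This single source only holds the NYC part of Employee.
nyc_source = [{"name": "Ann", "city": "NYC"}, {"name": "Cy", "city": "NYC"}]
LAV_SOURCES = [("SELECT * FROM Employee WHERE city = 'NYC'", nyc_source)]


def lav_answer(predicate):
    """Answer a global query using only what the described sources contain."""
    return [row for _view, rows in LAV_SOURCES for row in rows if predicate(row)]


print(sorted(r["name"] for r in gav_employee()))
print(lav_answer(lambda r: r["city"] == "LA"))
```

The LAV query for employees in LA comes back empty even though the global schema allows such tuples, which is the incomplete-answer case noted on the slide.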

Design Tasks
Schema translation
Schema generation
Figure 4.3

Intermediate Canonical Representation
Expressive enough to incorporate all concepts in the local databases
Simple, intuitive, practical, etc.
Example: E/R model, relational model, graph/tree models, etc.
Tools

Schema Generation
Schema matching: syntax and semantics
Integration of common schema elements
Schema mapping
See Examples 4.1 and 4.2

Schema Matching
Defined or discovered (e.g., web data)
Rules:
  – Correspondence between 2 elements
  – Predicate whether the correspondence holds or not
  – Similarity value between the 2 elements
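A matching rule with these three parts can be sketched as follows. This is only a name-based illustration: string similarity from Python's `difflib` stands in for a real matcher, and the 0.6 threshold, the `Match` record, and the attribute names are assumptions.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass
class Match:
    left: str          # element of the first schema
    right: str         # element of the second schema
    similarity: float  # similarity value between the 2 elements
    holds: bool        # predicate: does the correspondence hold?


def match_elements(left, right, threshold=0.6):
    """Build one correspondence from a case-insensitive name comparison."""
    similarity = SequenceMatcher(None, left.lower(), right.lower()).ratio()
    return Match(left, right, similarity, similarity >= threshold)


def match_schemas(schema1, schema2, threshold=0.6):
    """Return every pairwise correspondence whose predicate holds."""
    candidates = (match_elements(a, b, threshold) for a in schema1 for b in schema2)
    return [m for m in candidates if m.holds]


# Hypothetical attribute lists from two local schemas.
for m in match_schemas(["ENO", "ENAME"], ["ENO", "TITLE"]):
    print(m.left, m.right, round(m.similarity, 2))
```

Name similarity alone misses synonyms and accepts homonyms, so a practical matcher would combine it with instance-level and structural evidence.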

Finding Correspondence
Difficult process due to schema heterogeneity
Can it be automated?
  – Insufficient schema and instance information
  – Unavailability of schema documentation
  – Subjectivity of matching

Matching Algorithm Issues
Schema vs. instance matching
  – Concept match
  – Data instance: semantic inconsistencies
Element-level vs. structure-level mapping
  – Element name semantics
  – Multiple attribute mapping?
Matching cardinality
  – One-to-one, one-to-many, many-to-many

Semantic Schema Heterogeneity
Semantic: meaning, interpretation, and intended use of data
  – Synonyms, homonyms, hypernyms
  – Different ontologies
  – Imprecise wording

Structural Schema Heterogeneity
  – Type conflict: attribute vs. entity
  – Dependency conflict: mapping cardinality inconsistencies
  – Key conflict: different primary keys
  – Behavioral conflict: modeling assumptions, e.g., referential integrity, deletion, etc.

Schema Integration
Binary
N-ary

Schema Mapping
How the data from local databases can be mapped to the GCS
Mapping creation
Mapping maintenance

Mapping Creation
Input: LCS, GCS, M (schema matches)
Output: $Q = \{Q_1, \dots, Q_k\}$ such that
  – $DB_{GCS} = Q(DB_{LCS})$
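As a rough illustration of this input/output contract, the sketch below derives one query per local database from its schema matches and applies the queries to populate the global database. It assumes every match is a simple one-to-one attribute renaming; `create_mapping`, `apply_mappings`, and the sample schemas are hypothetical.

```python
def create_mapping(matches):
    """Turn schema matches (local attribute -> GCS attribute) into a query Q_i."""
    def query(local_rows):
        return [{g: row[l] for l, g in matches.items()} for row in local_rows]
    return query


def apply_mappings(queries, local_dbs):
    """DB_GCS = Q(DB_LCS): run each Q_i on its local database and collect the rows."""
    global_rows = []
    for name, query in queries.items():
        global_rows.extend(query(local_dbs[name]))
    return global_rows


# Hypothetical local databases and their matches M against GCS Employee(name, city).
LOCAL_DBS = {
    "hr": [{"ename": "Ann", "town": "NYC"}],
    "sales": [{"full_name": "Cy", "location": "LA"}],
}
M = {
    "hr": {"ename": "name", "town": "city"},
    "sales": {"full_name": "name", "location": "city"},
}

Q = {db: create_mapping(matches) for db, matches in M.items()}
DB_GCS = apply_mappings(Q, LOCAL_DBS)
print(DB_GCS)
```

One-to-many or many-to-many matches would need queries with joins or value transformations rather than plain renaming.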

Security Objectives
Confidentiality
Integrity
Availability

Question 1
How do distributed databases impact the security objectives?
  – Confidentiality in traditional vs. distributed DBs
  – Integrity in traditional vs. distributed DBs
  – Availability in traditional vs. distributed DBs

Integrity
Correctness criteria
  – Top-down design
  – Bottom-up design

Availability
What are the issues related to availability when dealing with:
  – Top-down design
  – Bottom-up design

Confidentiality
(will be covered in the 2nd part of the semester, but…)
Centralized vs. distributed security policy
  – Top-down design
  – Bottom-up design

Next Class
Semantics-based Database Integration