Measuring Referential Integrity in Distributed Databases
Dhara Shah


Introduction
A distributed database consists of multiple databases residing at different locations that communicate over the Internet. When similar content arrives from different sources, referential integrity can be violated. Goal: identify referential integrity problems in order to detect and avoid inconsistent or incomplete data. This is a promising alternative for detecting and fixing data quality issues in scientific databases.

Assumptions
- All databases have the same tables but different content.
- Rows may have null values for the primary key.
- Metadata has already been integrated.
- Content may be inconsistent because of both local and global issues.
- Updates are broadcast independently and asynchronously.

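Column Metrics
All metrics are measured on a scale of [0, 1], where 1 is optimal. The prefix l marks local metrics (computed within a single database) and g marks global metrics (computed over the distributed database); rcom measures referential completeness and rcon referential consistency. $K$ is a foreign key of $T_i$ referencing $T_j$, and $\bowtie_{K,F}$ joins on both $K$ and $F$.
$\mathrm{lrcom}(T_i.K) = |T_i \bowtie_K T_j| \,/\, |T_i|$
$\mathrm{grcom}(T_i.K) = |T_i \bowtie_K T_j| \,/\, |T_i|$
$\mathrm{lrcon}(T_i.F) = |T_i \bowtie_{K,F} T_j| \,/\, |T_i|$
$\mathrm{grcon}(T_i.K, T_i.F) = |T_i \bowtie_{K,F} T_j| \,/\, |T_i|$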
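The column metrics can be sketched in Python. This is a minimal, hypothetical sketch: tables are lists of dicts, and the function and column names (`referential_completeness`, `referential_consistency`, `store_id`, `city`) are illustrative. It assumes the referenced key in $T_j$ is unique, so $|T_i \bowtie_K T_j|$ equals the number of $T_i$ rows that find a match, and it treats null keys as never matching, following SQL join semantics. The sketch does not model the local/global distinction; the caller decides which content of $T_j$ to pass in.

```python
from typing import Any, Iterable, Mapping

Row = Mapping[str, Any]


def referential_completeness(ti: Iterable[Row], tj: Iterable[Row], k: str, pk: str) -> float:
    """|Ti ⋈_K Tj| / |Ti|: share of Ti rows whose foreign key k matches a key in Tj."""
    rows = list(ti)
    if not rows:
        return 1.0  # assumption: an empty referencing table has no violations
    # Null keys are excluded, so a null foreign key can never match (SQL semantics).
    keys = {row[pk] for row in tj if row[pk] is not None}
    matched = sum(1 for row in rows if row[k] in keys)
    return matched / len(rows)


def referential_consistency(ti: Iterable[Row], tj: Iterable[Row], k: str, pk: str, f: str) -> float:
    """|Ti ⋈_{K,F} Tj| / |Ti|: share of Ti rows matching Tj on both the key and attribute f."""
    rows = list(ti)
    if not rows:
        return 1.0  # assumption: an empty referencing table has no violations
    # Assumes pk is unique in Tj, so each key maps to exactly one value of f.
    f_by_key = {row[pk]: row[f] for row in tj if row[pk] is not None}
    matched = sum(
        1
        for row in rows
        if row[k] in f_by_key and row[f] is not None and row[f] == f_by_key[row[k]]
    )
    return matched / len(rows)


# Hypothetical example tables.
stores = [{"store_id": 1, "city": "Houston"}, {"store_id": 2, "city": "Austin"}]
sales = [
    {"sale_id": 10, "store_id": 1, "city": "Houston"},  # complete and consistent
    {"sale_id": 11, "store_id": 2, "city": "Dallas"},   # complete, inconsistent city
    {"sale_id": 12, "store_id": 3, "city": "Austin"},   # orphan foreign key
    {"sale_id": 13, "store_id": None, "city": None},    # null foreign key
]

print(referential_completeness(sales, stores, "store_id", "store_id"))  # 0.5
print(referential_consistency(sales, stores, "store_id", "store_id", "city"))  # 0.25
```

Two of the four sales rows reference an existing store, so completeness is 0.5; only one of those also agrees on `city`, so consistency is 0.25.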

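Table Metrics
For a table $T_i$ replicated across databases $D_1, \dots, D_n$, with foreign keys $K_1, \dots, K_k$ and attributes $F_1, \dots, F_f$:
$\mathrm{gcur}(T_i) = \dfrac{|D_1.T_i \cap D_2.T_i \cap \cdots \cap D_n.T_i|}{|D_1.T_i \cup D_2.T_i \cup \cdots \cup D_n.T_i|}$
$\mathrm{grcom}(T_i) = \dfrac{\sum_{j=1}^{k} |T_i|\,\mathrm{grcom}(T_i.K_j)}{k\,|T_i|}$
$\mathrm{grcon}(T_i) = \dfrac{\sum_{j=1}^{f} |T_i|\,\mathrm{grcon}(T_i.F_j)}{f\,|T_i|}$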
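A sketch of the table metrics, assuming each copy $D_1.T_i, \dots, D_n.T_i$ is given as an iterable of row tuples and that two rows count as the same only when every value matches. The names `global_currency` and `table_metric` are hypothetical. Because $|T_i|$ appears in every term of the numerator and in the denominator, grcom($T_i$) and grcon($T_i$) reduce to the plain mean of the column scores, which is what `table_metric` computes.

```python
from statistics import mean
from typing import Hashable, Iterable, Sequence


def global_currency(copies: Sequence[Iterable[Hashable]]) -> float:
    """gcur(Ti) = |D1.Ti ∩ ... ∩ Dn.Ti| / |D1.Ti ∪ ... ∪ Dn.Ti|."""
    sets = [set(copy) for copy in copies]
    if not sets:
        raise ValueError("at least one copy of the table is required")
    union = set().union(*sets)
    if not union:
        return 1.0  # assumption: all copies empty means they agree
    common = set.intersection(*sets)
    return len(common) / len(union)


def table_metric(column_scores: Sequence[float]) -> float:
    """Σ_j |Ti|·m(Ti.Kj) / (k·|Ti|) simplifies to the mean of the k column scores."""
    if not column_scores:
        return 1.0  # assumption: a table with no foreign keys has nothing to violate
    return mean(column_scores)


# Hypothetical copies of one table at three sites; rows are (id, value) tuples.
d1 = [(1, "a"), (2, "b"), (3, "c")]
d2 = [(1, "a"), (2, "B"), (3, "c")]  # row 2 was updated at this site only
d3 = [(1, "a"), (3, "c"), (4, "d")]  # row 2 missing, row 4 present only here

print(global_currency([d1, d2, d3]))  # 0.4
print(table_metric([1.0, 0.5]))       # 0.75
```

Only two rows appear identically at all three sites, out of five distinct rows overall, giving 2/5 = 0.4.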

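Database Metrics
For a database $D_i$ (or the whole distributed database $D$) with tables $T_1, \dots, T_m$:
$\mathrm{lrcom}(D_i) = \dfrac{\sum_{j=1}^{m} |T_j|\,\mathrm{lrcom}(T_j)}{\sum_{j=1}^{m} |T_j|}$
$\mathrm{lrcon}(D_i) = \dfrac{\sum_{j=1}^{m} |T_j|\,\mathrm{lrcon}(T_j)}{\sum_{j=1}^{m} |T_j|}$
$\mathrm{grcom}(D) = \dfrac{\sum_{j=1}^{m} |T_j|\,\mathrm{grcom}(T_j)}{\sum_{j=1}^{m} |T_j|}$
$\mathrm{grcon}(D) = \dfrac{\sum_{j=1}^{m} |T_j|\,\mathrm{grcon}(T_j)}{\sum_{j=1}^{m} |T_j|}$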
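The database metrics are size-weighted averages of the table-level scores, so large tables dominate the result. A minimal sketch, where the function name `database_metric` and the input shape (a list of `(row_count, table_score)` pairs) are hypothetical:

```python
from typing import Sequence, Tuple


def database_metric(tables: Sequence[Tuple[int, float]]) -> float:
    """Σ_j |Tj|·m(Tj) / Σ_j |Tj| over the m tables of one database."""
    total_rows = sum(size for size, _ in tables)
    if total_rows == 0:
        return 1.0  # assumption: an empty database has no violations
    return sum(size * score for size, score in tables) / total_rows


# Hypothetical database: 100 rows scoring 1.0 and 300 rows scoring 0.5.
print(database_metric([(100, 1.0), (300, 0.5)]))  # 0.625
```

An unweighted mean of the same two scores would be 0.75; weighting by $|T_j|$ pulls the result toward the larger, less complete table.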

Query Optimization
Local metrics (single database):
- For tables with several FKs, aggregate by each FK before joining.
- Create a secondary index on each FK.
Global metrics (distributed database):
- Transfer n-1 copies to a central site.
- Compute the metrics at one site, then update them incrementally.
- Compute the metrics for each pair of tables linked by an FK.
- When a join involves two tables at different sites, transfer the smaller table.
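The two local optimizations (grouping by the FK before the join, and a secondary index on the FK) can be sketched with Python's built-in `sqlite3` module. The tables `stores` and `sales`, the index name, and the function `local_completeness` are hypothetical; the query computes lrcom for the single FK `sales.store_id`. Grouping first means the join processes one row per distinct FK value instead of one row per referencing row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE stores (store_id INTEGER PRIMARY KEY, city TEXT);
    CREATE TABLE sales (sale_id INTEGER PRIMARY KEY, store_id INTEGER, city TEXT);
    -- Secondary index on the foreign key column.
    CREATE INDEX idx_sales_store_id ON sales (store_id);
    """
)
conn.executemany("INSERT INTO stores VALUES (?, ?)", [(1, "Houston"), (2, "Austin")])
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [(10, 1, "Houston"), (11, 2, "Dallas"), (12, 3, "Austin"), (13, None, None)],
)


def local_completeness(conn: sqlite3.Connection) -> float:
    # Aggregate sales by store_id first, then join the grouped result with stores;
    # the NULL group never matches, so null FKs count as incomplete.
    query = """
        SELECT COALESCE(SUM(g.n), 0) * 1.0 / (SELECT COUNT(*) FROM sales)
        FROM (SELECT store_id, COUNT(*) AS n FROM sales GROUP BY store_id) AS g
        JOIN stores AS s ON s.store_id = g.store_id
    """
    (value,) = conn.execute(query).fetchone()
    # SQLite returns NULL for division by zero, i.e. when sales is empty.
    return 1.0 if value is None else value


print(local_completeness(conn))  # 0.5
```

With several FKs, the same pattern would be repeated with one grouped subquery per FK column.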

Applications
Applications with scientific databases:
- Central database: needs a fast connection and must be available at all times.
- Local database: more flexible and faster, but may have more referential errors.
Program:
- Uses a logical data model (LDM) to calculate the metrics.
- Has a graphical user interface and lists explanations of why errors happened.

Conclusion Related work:  MOCHA: middleware system to integrate distributed data sources. Metrics that measure absolute and relative error w/ respect to referential integrity. Measures completeness and consistency. Raises new issues such as distributed query optimizations.

Citation
Authors: Carlos Ordonez, Javier Garcia-Garcia, Zhibo Chen
Title: Measuring Referential Integrity in Distributed Databases
Venue: CIMS '07, Proceedings of the ACM First Workshop on CyberInfrastructure: Information Management in eScience
Publication date: November 2007
Pages: 61-66