Data Quality Class 5

Goals
– Project
– Data Quality Rules (Continued)
– Example Use of Data Quality Rules

Data Quality Rules Classes
1) Null value rules
2) Value rules
3) Domain membership rules
4) Domain Mappings
5) Relation rules
6) Table, Cross-table, and Cross-message assertions
7) In-Process directives
8) Operational Directives
9) Other rules

Representing Data Quality Rules
Data is divided into 2 sets:
– conformers
– violators
Sets can be represented using SQL.
Create SQL statements that represent the violating set.

Using SQL
– Direct queries
– Embedded queries: using ODBC/JDBC, validation scripts can be created in C, C++, Java, Visual Basic, etc.
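The following is a minimal sketch of an embedded validation query that makes the conformer/violator split concrete. It uses Python's built-in sqlite3 module as a stand-in for an ODBC/JDBC connection. The `orders` table and the rule `quantity >= 0` are hypothetical.

```python
import sqlite3

# An in-memory SQLite database stands in for an ODBC/JDBC data source.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, quantity INTEGER)")
conn.executemany("INSERT INTO orders (id, quantity) VALUES (?, ?)",
                 [(1, 5), (2, -3), (3, 0), (4, 12)])

# Hypothetical rule: quantity >= 0. The rule is written as a query that
# returns the VIOLATING set; the conformers are every other record.
violators = [row[0] for row in conn.execute(
    "SELECT id FROM orders WHERE NOT (quantity >= 0) ORDER BY id")]
all_ids = [row[0] for row in conn.execute("SELECT id FROM orders ORDER BY id")]
conformers = [i for i in all_ids if i not in violators]

print("violators:", violators)
print("conformers:", conformers)
```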

Null Value Representations
Maintain a table of null representation types and names:
    create table nullreps (
        name        varchar(30),
        nulltype    char(1),
        description varchar(1024),
        source      varchar(512),
        nullval     varchar(100),
        nullrepid   integer
    );
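This sketch shows one way a null-representation table might be used, assuming SQLite through Python's sqlite3 module. The table layout follows the slide. The sample representations ("N/A", "UNKNOWN"), their one-letter `nulltype` codes, and the `customers` table are hypothetical. The join finds values that are stored as text but actually stand for a null, and tags each one with the name of its representation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Layout from the slide; the rows inserted below are hypothetical.
conn.execute("""CREATE TABLE nullreps (
    name        VARCHAR(30),
    nulltype    CHAR(1),
    description VARCHAR(1024),
    source      VARCHAR(512),
    nullval     VARCHAR(100),
    nullrepid   INTEGER)""")
conn.executemany("INSERT INTO nullreps VALUES (?, ?, ?, ?, ?, ?)", [
    ("not applicable", "A", "field does not apply", "manual", "N/A", 1),
    ("unknown", "U", "value exists but is unknown", "manual", "UNKNOWN", 2),
])
conn.execute("CREATE TABLE customers (id INTEGER, phone VARCHAR(20))")
conn.executemany("INSERT INTO customers VALUES (?, ?)",
                 [(1, "555-0100"), (2, "N/A"), (3, None), (4, "UNKNOWN")])

# Records whose phone holds a registered null representation.
# Row 3 (a real NULL) never matches the join, so it needs a separate null test.
rows = conn.execute("""
    SELECT c.id, r.name
    FROM customers AS c JOIN nullreps AS r ON c.phone = r.nullval
    ORDER BY c.id""").fetchall()
print(rows)
```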

Null Value Rules
Allows nulls
– If the rule is “allows nulls” without any additional characterization, nothing is necessary.
– If the rule is “allows nulls,” but only of a specific type, check for real nulls (and possibly blanks and spaces):
    SELECT * from <table> WHERE <table>.<attribute> is NULL;

Null Value Rules
Does not allow nulls
– Check for nulls (and possibly blanks and spaces):
    SELECT * from <table> WHERE <table>.<attribute> is NULL;
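Here is a sketch of the null test that also catches blanks and spaces, again using SQLite through Python's sqlite3. The `customers.last_name` column is a hypothetical example. In SQLite, `TRIM` with a single argument strips spaces, so an empty or all-space string compares equal to `''`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, last_name VARCHAR(40))")
conn.executemany("INSERT INTO customers VALUES (?, ?)",
                 [(1, "Smith"), (2, None), (3, ""), (4, "   "), (5, "Jones")])

# Hypothetical rule: customers.last_name does not allow nulls.
# IS NULL catches real nulls; TRIM(...) = '' catches empty and all-space strings.
sql = """SELECT id FROM customers
         WHERE last_name IS NULL OR TRIM(last_name) = ''
         ORDER BY id"""
violators = [row[0] for row in conn.execute(sql)]
print(violators)
```

For an “allows nulls, but only of a specific type” rule, the same query reports the records holding a real null or a blank instead of one of the permitted representations.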

Value Rules
A value rule is specified as some set of constraints. It makes use of operators and functions:
– +, -, *, /, <, <=, >, >=, !=, ==, AND, OR
– User-defined functions
Example:
– value >= 0 AND value <= 100

Value Rules 2
The validation test is the opposite of the constraint; use De Morgan’s laws.
– If the constraint was “value >= 0 AND value <= 100”, use:
    SELECT * from <table> where <table>.<attribute> < 0 OR <table>.<attribute> > 100;
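The De Morgan rewrite can be checked mechanically. This sketch uses SQLite through Python's sqlite3 and a hypothetical `scores` table. It runs both the directly negated constraint and its De Morgan form, and both return the same violating set.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (id INTEGER, value INTEGER)")
conn.executemany("INSERT INTO scores VALUES (?, ?)",
                 [(1, 50), (2, -1), (3, 100), (4, 101), (5, 0)])

# Constraint: value >= 0 AND value <= 100
# De Morgan: NOT (A AND B) == (NOT A) OR (NOT B)  ->  value < 0 OR value > 100
demorgan_sql = "SELECT id FROM scores WHERE value < 0 OR value > 100 ORDER BY id"
negated_sql = "SELECT id FROM scores WHERE NOT (value >= 0 AND value <= 100) ORDER BY id"

demorgan = [r[0] for r in conn.execute(demorgan_sql)]
negated = [r[0] for r in conn.execute(negated_sql)]
# A NULL value satisfies neither form, so nulls need their own null value rule.
print(demorgan, negated)
```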

Domain Membership
Domains are stored in a database table.
The test for domain membership of an attribute makes sure that every value of the attribute is represented in the domain table.

Domain Reference Tables
    create table domainref (
        name        varchar(30),
        dtype       char(1),
        description varchar(1024),
        source      varchar(512),
        domainid    integer
    );

Domain Reference Tables
    create table domainvals (
        domainid integer,
        value    varchar(128)
    );

Domain Membership
Test for membership of attribute foo in the domain named bar:
    SELECT * from <table> where foo not in
        (SELECT value from domainvals where domainid =
            (SELECT domainid from domainref where domainref.name = 'bar'));
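Below is a runnable sketch of the membership test, assuming SQLite through Python's sqlite3. The domain named `colors`, its values, the `dtype` code, and the `products.color` attribute are hypothetical stand-ins for `bar` and `foo`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE domainref (name VARCHAR(30), dtype CHAR(1), description VARCHAR(1024),
                        source VARCHAR(512), domainid INTEGER);
CREATE TABLE domainvals (domainid INTEGER, value VARCHAR(128));
CREATE TABLE products (id INTEGER, color VARCHAR(20));
INSERT INTO domainref VALUES ('colors', 'E', 'allowed product colors', 'catalog', 1);
INSERT INTO domainvals VALUES (1, 'red'), (1, 'green'), (1, 'blue');
INSERT INTO products VALUES (1, 'red'), (2, 'purple'), (3, 'blue'), (4, 'gren');
""")

# The slide's query with foo = products.color and bar = 'colors'.
# Caveats: a NULL color is never reported by NOT IN, and a NULL in
# domainvals.value would make NOT IN return no rows at all.
sql = """
SELECT id, color FROM products
WHERE color NOT IN (SELECT value FROM domainvals
                    WHERE domainid = (SELECT domainid FROM domainref
                                      WHERE domainref.name = 'colors'))
ORDER BY id"""
violators = conn.execute(sql).fetchall()
print(violators)
```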

Domain Assignment
The values in the attribute define the domain:
– Find all the values not already in the domain
– Update the domain tables with those values

Domain Assignment 2
    SELECT * from <table> where foo not in
        (SELECT value from domainvals where domainid =
            (SELECT domainid from domainref where domainref.name = 'bar'));
For each value in this set, create a record with (the value, the domain id for “bar”) and insert it into domainvals.
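The two steps of domain assignment can be sketched as follows, using SQLite through Python's sqlite3. The `carriers` domain, its id 7, and the `shipments` table are hypothetical. The sketch deliberately departs from the slide in one respect: it selects `DISTINCT` values and skips NULLs, so each new value is inserted into `domainvals` only once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE domainref (name VARCHAR(30), dtype CHAR(1), description VARCHAR(1024),
                        source VARCHAR(512), domainid INTEGER);
CREATE TABLE domainvals (domainid INTEGER, value VARCHAR(128));
CREATE TABLE shipments (id INTEGER, carrier VARCHAR(30));
INSERT INTO domainref VALUES ('carriers', 'E', 'observed carriers', 'shipments', 7);
INSERT INTO domainvals VALUES (7, 'UPS');
INSERT INTO shipments VALUES (1, 'UPS'), (2, 'FedEx'), (3, 'DHL'), (4, 'FedEx');
""")

domain_id = conn.execute(
    "SELECT domainid FROM domainref WHERE name = ?", ("carriers",)).fetchone()[0]

# Step 1: find the distinct attribute values that are not yet in the domain.
new_values = [r[0] for r in conn.execute("""
    SELECT DISTINCT carrier FROM shipments
    WHERE carrier IS NOT NULL
      AND carrier NOT IN (SELECT value FROM domainvals WHERE domainid = ?)
    ORDER BY carrier""", (domain_id,))]

# Step 2: insert one (domainid, value) record per new value.
conn.executemany("INSERT INTO domainvals (domainid, value) VALUES (?, ?)",
                 [(domain_id, v) for v in new_values])
conn.commit()

domain = sorted(r[0] for r in conn.execute(
    "SELECT value FROM domainvals WHERE domainid = ?", (domain_id,)))
print(new_values, domain)
```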

Mapping Membership
Similar to domain membership, except:
– Must include domain membership tests for both values
– The value pair must also be looked up in the mapping tables
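These slides do not define the mapping tables. This sketch therefore assumes a hypothetical `mappingvals(mappingid, sourcevalue, targetvalue)` table alongside `domainvals`, with hypothetical domain ids 1 and 2 and mapping id 10. A record violates the rule if either value falls outside its domain or the pair is not listed in the mapping. The example uses SQLite through Python's sqlite3.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical layout: one row per (source value, target value) pair in a mapping.
conn.executescript("""
CREATE TABLE domainvals (domainid INTEGER, value VARCHAR(128));
CREATE TABLE mappingvals (mappingid INTEGER, sourcevalue VARCHAR(128),
                          targetvalue VARCHAR(128));
CREATE TABLE addresses (id INTEGER, state_code VARCHAR(2), state_name VARCHAR(40));
INSERT INTO domainvals VALUES (1, 'NY'), (1, 'NJ'), (2, 'New York'), (2, 'New Jersey');
INSERT INTO mappingvals VALUES (10, 'NY', 'New York'), (10, 'NJ', 'New Jersey');
INSERT INTO addresses VALUES
  (1, 'NY', 'New York'),   -- valid pair
  (2, 'NJ', 'New York'),   -- both values in their domains, but not a mapped pair
  (3, 'PA', 'New York');   -- source value not in its domain
""")

sql = """
SELECT a.id FROM addresses AS a
WHERE a.state_code NOT IN (SELECT value FROM domainvals WHERE domainid = 1)
   OR a.state_name NOT IN (SELECT value FROM domainvals WHERE domainid = 2)
   OR NOT EXISTS (SELECT 1 FROM mappingvals AS m
                  WHERE m.mappingid = 10
                    AND m.sourcevalue = a.state_code
                    AND m.targetvalue = a.state_name)
ORDER BY a.id"""
violators = [r[0] for r in conn.execute(sql)]
print(violators)
```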

Completeness
Defines when a record is complete.
– Ex: IF (Orders.Total > 0.0), Complete With {Orders.Billing_Street, Orders.Billing_City, Orders.Billing_State, Orders.Billing_ZIP}
Format:
– Condition
– List of fields that must be complete

Completeness 2
Equivalent to a set of null tests guarded by the condition:
    Select * from <table> where <condition> and (<field 1> is null or ... or <field n> is null);
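This sketch turns a completeness rule (a condition plus a field list) into its violator query and runs it against a hypothetical `Orders` table in SQLite through Python's sqlite3. `completeness_sql` is a hypothetical helper. It pastes the rule text directly into SQL, so it assumes the rule set comes from a trusted source.

```python
import sqlite3

def completeness_sql(table, condition, fields):
    """Violators of: IF (condition) Complete With {fields}."""
    null_tests = " OR ".join(f"{f} IS NULL" for f in fields)
    return f"SELECT * FROM {table} WHERE ({condition}) AND ({null_tests})"

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE Orders (id INTEGER, Total REAL, Billing_Street TEXT,
                Billing_City TEXT, Billing_State TEXT, Billing_ZIP TEXT)""")
conn.executemany("INSERT INTO Orders VALUES (?, ?, ?, ?, ?, ?)", [
    (1, 25.0, "1 Main St", "Springfield", "IL", "62701"),
    (2, 40.0, "9 Elm St", None, "IL", "62702"),   # incomplete billing address
    (3, 0.0, None, None, None, None),             # condition false: not checked
])

sql = completeness_sql(
    "Orders", "Orders.Total > 0.0",
    ["Orders.Billing_Street", "Orders.Billing_City",
     "Orders.Billing_State", "Orders.Billing_ZIP"])
violators = [row[0] for row in conn.execute(sql + " ORDER BY id")]
print(violators)
```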

Exemption
Defines which fields may be missing.
– Ex: IF (Orders.Item_Class != “CLOTHING”) Exempt {Orders.Color, Orders.Size}
Format:
– Condition
– List of fields that may be missing

Exemption 2
If the condition is true, the fields may be null.
Therefore, if the condition is false, the fields may not be null.
Equivalent to a test for the opposite of the condition combined with a test for nulls.
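Here the exemption example is translated into a violator query, as a sketch on a hypothetical `Orders` table using SQLite through Python's sqlite3. If `Item_Class` is NULL, `NOT (Item_Class <> 'CLOTHING')` evaluates to unknown, so that row is not reported.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Orders (id INTEGER, Item_Class TEXT, Color TEXT, Size TEXT)")
conn.executemany("INSERT INTO Orders VALUES (?, ?, ?, ?)", [
    (1, "CLOTHING", "blue", "M"),
    (2, "CLOTHING", None, "L"),      # clothing must have a color
    (3, "HARDWARE", None, None),     # exempt: condition is true
])

# Rule: IF (Item_Class != 'CLOTHING') Exempt {Color, Size}
# Violators: the condition is false AND at least one listed field is null.
sql = """SELECT id FROM Orders
         WHERE NOT (Item_Class <> 'CLOTHING')
           AND (Color IS NULL OR Size IS NULL)
         ORDER BY id"""
violators = [r[0] for r in conn.execute(sql)]
print(violators)
```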

Consistency
Defines a relationship between attributes based on field content.
– Ex: IF (Employees.title == “Staff Member”) Then (Employees.Salary >= AND Employees.Salary < 30000)
Format:
– Condition
– Assertion

Consistency 2
If the condition is true, the assertion must be true.
Equivalent to a test for cases where the condition is true and the assertion is false:
    Select * from <table> where <condition> and not (<assertion>);
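This is a sketch of the consistency test on a hypothetical `Employees` table, using SQLite through Python's sqlite3. The lower salary bound in the slide's example cannot be recovered, so only the `Salary < 30000` half of the assertion is checked.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employees (id INTEGER, title TEXT, Salary INTEGER)")
conn.executemany("INSERT INTO Employees VALUES (?, ?, ?)", [
    (1, "Staff Member", 25000),
    (2, "Staff Member", 45000),   # condition true, assertion false
    (3, "Manager", 45000),        # condition false: not checked
])

# Condition: title = 'Staff Member'; assertion: Salary < 30000
# (the slide's lower bound is missing, so it is left out here).
sql = """SELECT id FROM Employees
         WHERE title = 'Staff Member' AND NOT (Salary < 30000)
         ORDER BY id"""
violators = [r[0] for r in conn.execute(sql)]
print(violators)
```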

Derivation
A prescriptive form of the consistency rule.
Details how one attribute’s value is determined based on other attributes.
– Ex: IF (Orders.NumberOrdered > 0) Then { Orders.Total = (Orders.NumberOrdered * Orders.Price) * 1.05 }
Format:
– Condition
– Assignment

Derivation 2
The assigned fields must be updated if the condition is true:
– Find all records where the condition is true
– Generate update SQL calls with the updated values
– Execute the updates
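The three derivation steps are sketched here with Python's sqlite3 against a hypothetical `Orders` table. The sketch applies the slide's example rule `Total = (NumberOrdered * Price) * 1.05`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE Orders (id INTEGER PRIMARY KEY, NumberOrdered INTEGER,
                Price REAL, Total REAL)""")
conn.executemany("INSERT INTO Orders VALUES (?, ?, ?, ?)",
                 [(1, 2, 10.0, None), (2, 0, 10.0, None), (3, 4, 2.5, 0.0)])

# 1. Find all records where the condition (NumberOrdered > 0) is true.
rows = conn.execute(
    "SELECT id, NumberOrdered, Price FROM Orders WHERE NumberOrdered > 0").fetchall()

# 2. Generate one parameterized UPDATE per record, carrying the derived value.
updates = [((n * price) * 1.05, order_id) for order_id, n, price in rows]

# 3. Execute the updates.
conn.executemany("UPDATE Orders SET Total = ? WHERE id = ?", updates)
conn.commit()

totals = dict(conn.execute("SELECT id, Total FROM Orders"))
print(totals)
```

When the assignment can be expressed in SQL, the three steps collapse into a single statement: `UPDATE Orders SET Total = (NumberOrdered * Price) * 1.05 WHERE NumberOrdered > 0;`. The row-by-row form stays closer to the slide and also works for derivations computed outside the database.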

Functional Dependence
Functional dependence between columns X and Y:
– For any two records R1 and R2 in a table, if field X of R1 and field X of R2 contain the same value x, then field Y of R1 and field Y of R2 must also contain the same value y.
In other words, attribute Y is determined by attribute X.

Functional Dependence 2
Rule format:
– Attribute X determines Attribute Y
The validation test makes sure that the functional dependence criterion is met.
This means that if we extract the X values from the set of all distinct (X, Y) value pairs, that set should have no duplicates.

Functional Dependence 3
    Create view FD as select distinct X, Y from <table>;
    Select count(*) from FD;
    Select count(distinct X) from <table>;
These should be the same numbers.
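Below is a runnable version of the count comparison, assuming SQLite through Python's sqlite3 and a hypothetical rule “zip determines city”. The last query goes beyond the slide: grouping the view by X lists the X values that map to more than one Y.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE addresses (zip TEXT, city TEXT)")
conn.executemany("INSERT INTO addresses VALUES (?, ?)", [
    ("10001", "New York"), ("10001", "New York"),
    ("07030", "Hoboken"), ("07030", "Hobokn"),   # same zip, two different cities
])

# Hypothetical rule: zip determines city.
conn.execute("CREATE VIEW FD AS SELECT DISTINCT zip, city FROM addresses")
pairs = conn.execute("SELECT count(*) FROM FD").fetchone()[0]
keys = conn.execute("SELECT count(DISTINCT zip) FROM addresses").fetchone()[0]
# Note: count(DISTINCT zip) ignores NULL zips while FD keeps them,
# so NULLs in X need their own rule before this comparison is meaningful.
holds = pairs == keys
print(pairs, keys, holds)

# Which X values break the dependence?
offenders = [r[0] for r in conn.execute(
    "SELECT zip FROM FD GROUP BY zip HAVING count(*) > 1")]
print(offenders)
```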

Primary Key/Uniqueness
A set of attributes defined as a primary key must uniquely identify a record.
This can also be viewed as a uniqueness constraint.
Format:
– {attribute list} is PRIMARY
– {attribute list} is UNIQUE

Primary Key
Test to make sure that the number of distinct records with the expected key is the same as the number of records:
    Select count(*) from <table>;
    Select count(distinct <key>) from <table>;
These numbers should be the same.
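This sketch applies the primary key test to a hypothetical composite key `{student_id, course_id}`, using SQLite through Python's sqlite3. SQLite's `count(DISTINCT ...)` accepts only a single expression, so the sketch counts distinct composite keys through a subquery.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE enrollments (student_id INTEGER, course_id TEXT, grade TEXT)")
conn.executemany("INSERT INTO enrollments VALUES (?, ?, ?)", [
    (1, "CS101", "A"), (1, "CS102", "B"), (2, "CS101", "C"), (1, "CS101", "B"),
])

# Expected key (hypothetical): {student_id, course_id} is PRIMARY.
total = conn.execute("SELECT count(*) FROM enrollments").fetchone()[0]
distinct_keys = conn.execute("""SELECT count(*) FROM
    (SELECT DISTINCT student_id, course_id FROM enrollments)""").fetchone()[0]
is_key = total == distinct_keys
print(total, distinct_keys, is_key)
```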

Uniqueness
If there is a separate known primary key, test for multiple record occurrences with the same set of values that should have been unique:
    SELECT t1.<key>, t2.<key> FROM <table> AS t1, <table> AS t2 WHERE t1.<attribute> = t2.<attribute> and t1.<key> <> t2.<key>;
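Here is the self-join test, sketched on a hypothetical `users` table in which `id` is the known primary key and `email` should be unique (SQLite through Python's sqlite3). The sketch uses `t1.id < t2.id` instead of the slide's `<>`, so each duplicate pair is reported once rather than twice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [
    (1, "a@example.com"), (2, "b@example.com"), (3, "a@example.com")])

# Rule (hypothetical): {email} is UNIQUE; id is the separate known primary key.
# Self-join: same email, different primary key.
sql = """SELECT t1.id, t2.id FROM users AS t1, users AS t2
         WHERE t1.email = t2.email AND t1.id < t2.id
         ORDER BY t1.id, t2.id"""
duplicates = conn.execute(sql).fetchall()
print(duplicates)
```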

Foreign Key
When the values in field f of table T are chosen from the key values in field g of table S, field T.f is said to be a foreign key referencing S.g.
If f is a foreign key, each of its values must exist as a key in table S, column g (referential integrity).

Foreign Key 2
Similar to the primary key test.
The test makes sure that all values in the foreign key field exist in the target table:
    Select * from T where T.f not in (Select distinct S.g from S);
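This sketch runs the foreign key test with T = `Orders`, f = `customer_id`, S = `Customers`, and g = `id`. All of these names are hypothetical, and the example uses SQLite through Python's sqlite3. If S.g could ever contain NULL, `NOT IN` would return no rows at all; a `NOT EXISTS` subquery avoids that pitfall.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customers (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE Orders (id INTEGER PRIMARY KEY, customer_id INTEGER);
INSERT INTO Customers VALUES (1, 'Ann'), (2, 'Bob');
INSERT INTO Orders VALUES (10, 1), (11, 3), (12, 2), (13, 4);
""")

# Orders whose customer_id has no matching Customers.id (orphaned references).
sql = """SELECT id, customer_id FROM Orders
         WHERE customer_id NOT IN (SELECT DISTINCT id FROM Customers)
         ORDER BY id"""
orphans = conn.execute(sql).fetchall()
print(orphans)
```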

Use of Data Quality Rules
– Data Validation
– Root Cause Analysis
– Message Transformation
– Data-driven GUIs
– Metadata Collection

Data Validation
Translate the rule set into select statements.
Create a program that:
– Loads the select statements into an array, indexed by a unique integer
– Connects to the database via ODBC
– Iterates through the array, executing each select statement and collecting its results

Data Validation 2
– Each type of rule has an expected result; check each result against it
– Output the result of each statement to an output file, tagged by rule identifier
– Tally the results to yield an overall percentage of valid records to total records
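The validation program can be sketched in a few dozen lines. The sketch makes several assumptions. SQLite through Python's sqlite3 replaces the ODBC connection. A dict keyed by integer rule id plays the role of the array. Every rule's select returns the ids of violating records, so the expected result is always the empty set. The output "file" is any object with a `write` method. The rule set and the `Orders` table are hypothetical.

```python
import io
import sqlite3

# Hypothetical rule set: rule id -> SELECT returning ids of VIOLATING records.
# The expected result for every rule here is the empty set.
RULES = {
    1: "SELECT id FROM Orders WHERE customer IS NULL OR TRIM(customer) = ''",
    2: "SELECT id FROM Orders WHERE quantity < 0 OR quantity > 100",
    3: "SELECT id FROM Orders WHERE quantity > 0 AND total IS NULL",
}

def validate(conn, rules, out):
    """Run every rule, write tagged results to `out`, return (% valid, per-rule counts)."""
    violating_ids = set()
    counts = {}
    for rule_id in sorted(rules):
        ids = [row[0] for row in conn.execute(rules[rule_id])]
        counts[rule_id] = len(ids)
        status = "PASS" if not ids else "FAIL"
        out.write(f"rule {rule_id}: {status} {len(ids)} violation(s) {ids}\n")
        violating_ids.update(ids)
    total = conn.execute("SELECT count(*) FROM Orders").fetchone()[0]
    pct_valid = 100.0 * (total - len(violating_ids)) / total if total else 100.0
    return pct_valid, counts

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Orders (id INTEGER, customer TEXT, quantity INTEGER, total REAL)")
conn.executemany("INSERT INTO Orders VALUES (?, ?, ?, ?)", [
    (1, "Ann", 2, 20.0), (2, "", 1, 10.0), (3, "Bob", 500, None), (4, "Cy", 3, 30.0)])

report = io.StringIO()
pct_valid, counts = validate(conn, RULES, report)
print(report.getvalue() + f"{pct_valid:.1f}% valid")
```

A record is counted once in the percentage even if it violates several rules. The per-rule counts are the tallies that root cause analysis starts from.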

Root Cause Analysis
Root cause analysis can be started by looking at the counts of violated rules.
Use the most frequently violated rule as a starting place.

Message Transformation
Electronic Data Interchange:
– Use DQ rules to validate incoming messages
– Use DQ rules (derivations, mappings) to transform incoming messages into an internal format

Data-driven GUIs
Data dependence is specified in a collection of rules.
Generate equivalence classes of data values based on the dependence specification.

Data-driven GUIs
– First, look for all independent attributes: this is class 0
– For class i, collect all attributes that depend on class (i – 1)
– The GUI is constructed to request data iteratively from classes 0..n
– Based on the data collected at step j, the rules associated with the actual values are applied, determining which values are requested at step j + 1
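This sketch computes the classes under one interpretation of the slide. An attribute with no inputs is class 0, and any other attribute is one more than the highest class among the attributes it depends on. The dependency map is read off the Exemption, Derivation, and Completeness examples above: Color and Size depend on Item_Class, Total on NumberOrdered and Price, and the billing fields on Total. Treating those rules as GUI dependencies is an assumption, and `dependency_classes` and `gui_steps` are hypothetical helpers.

```python
def dependency_classes(depends_on):
    """Assign each attribute a class: 0 if it depends on nothing,
    otherwise 1 + the highest class among the attributes it depends on."""
    classes = {}
    in_progress = set()

    def visit(attr):
        if attr in classes:
            return classes[attr]
        if attr in in_progress:
            raise ValueError(f"cyclic dependency involving {attr}")
        in_progress.add(attr)
        deps = depends_on.get(attr, [])
        classes[attr] = 1 + max(visit(d) for d in deps) if deps else 0
        in_progress.discard(attr)
        return classes[attr]

    for attr in depends_on:
        visit(attr)
    return classes

def gui_steps(classes):
    """Group attributes by class: GUI step j requests the attributes of class j."""
    steps = [[] for _ in range(max(classes.values(), default=-1) + 1)]
    for attr in sorted(classes):
        steps[classes[attr]].append(attr)
    return steps

# Attribute -> attributes it depends on (assumed from the rule examples).
DEPENDS_ON = {
    "Color": ["Item_Class"],
    "Size": ["Item_Class"],
    "Total": ["NumberOrdered", "Price"],
    "Billing_Street": ["Total"],
    "Billing_ZIP": ["Total"],
}
steps = gui_steps(dependency_classes(DEPENDS_ON))
for j, attrs in enumerate(steps):
    print(f"step {j}: {attrs}")
```

A cycle in the rules would make the layering undefined, so the sketch raises an error rather than looping.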

Metadata Collection
– Use domain and mapping derivation rules to collect metadata
– Use other rules as documentation of business operations