PCLec08 / 1
8. Integrity
This session is directed at the many and various aspects of ‘integrity’. Integrity processes are included to ensure that the data in the database, and the information derived from that data, is clear, complete and accurate.
We will also take another quick look at ‘business rules’.

Integrity
Components of a data model:
- Structure
- Operators
- Integrity

Dangers to Data
A DBMS must protect data from several dangers:
- Accidents (programming errors and miskeying): these are integrity issues.
- Malicious use: a security issue.
- Hardware and software failures: these are concurrency and restart issues.

Definitions of Integrity
Data integrity requires the database to be an accurate reflection of the real world: data should be valid and complete.
Integrity issues may have been handled outside the database, in the application code, and possibly in multiple programs.
Codd (1985) states that integrity constraints specific to a particular relational database must be definable in the relational data sublanguage (SQL) and stored in the database dictionary (catalog), not in the application programs.

A DBMS-Enforced Integrity
Employee table (Empno, Emp_name, Age, Salary), with
  EMPNO NUMBER(6,0) NOT NULL
Attempt to add an employee without an EMPNO value:
  INSERT INTO EMPLOYEE (EMP_NAME, AGE, SALARY) VALUES ('Smith', 22, 35000)
This statement is rejected by the DBMS, but what would happen if the user entered (35000, 'Smith', 22)?
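
A minimal sketch of the same rejection, assuming SQLite (through Python's standard `sqlite3` module) as a stand-in for Oracle; the lower-case table and column names are adapted from the slide, and `INTEGER` replaces `NUMBER(6,0)`.

```python
import sqlite3

# In-memory SQLite database standing in for the Oracle EMPLOYEE table.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE employee (
        empno    INTEGER NOT NULL,  -- Oracle version: NUMBER(6,0) NOT NULL
        emp_name TEXT,
        age      INTEGER,
        salary   NUMERIC
    )
""")


def try_insert(sql: str, params: tuple) -> str:
    """Run an INSERT and report whether the DBMS accepted it."""
    try:
        conn.execute(sql, params)
        return "accepted"
    except sqlite3.IntegrityError as exc:
        return f"rejected: {exc}"


# No EMPNO is supplied, so the NOT NULL constraint rejects the row.
missing_key = try_insert(
    "INSERT INTO employee (emp_name, age, salary) VALUES (?, ?, ?)",
    ("Smith", 22, 35000),
)
# With EMPNO present, the same row is accepted.
with_key = try_insert(
    "INSERT INTO employee (empno, emp_name, age, salary) VALUES (?, ?, ?, ?)",
    (1, "Smith", 22, 35000),
)
print(missing_key)
print(with_key)
```

The rejection comes from the DBMS, not from the program, so every application gets it without extra code. Types catch only part of the problem with the reordered values (35000, 'Smith', 22): Oracle would refuse 'Smith' for a numeric AGE column (an ordinary, non-STRICT SQLite table would not), but no type check stops 35000 being stored as a name or 22 as a salary.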
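
An Application-Enforced Integrity
A further constraint, written as a segment of program code:
  IF AGE > 16 AND AGE < 99 THEN OK ELSE REJECT 'Age Invalid'
The two conditions must be joined with AND: with OR, every age would pass, because any value is either greater than 16 or less than 99.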
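
The corrected check as a small application-side function: a sketch in Python, where `check_age` is a hypothetical name. The bounds are exclusive, as in the slide.

```python
def check_age(age: int) -> str:
    """Application-side rule: accept ages strictly between 16 and 99."""
    # Both bounds must hold at once, so a chained comparison (an AND) is used.
    if 16 < age < 99:
        return "OK"
    return "Age Invalid"


for age in (15, 16, 17, 98, 99, 120):
    print(age, check_age(age))
```

Because the rule lives in the program, any other program or ad hoc SQL statement that writes AGE bypasses it.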

Integrity Enforcement
Integrity enforcement is usually split between the DBMS and the application programs.
Using application programs for integrity assertions has disadvantages:
- Programming is more complex.
- Integrity constraints may be repeated.
- Change management is difficult.
- Constraints may contradict one another.
- Ad hoc operations may bypass the constraints.

Integrity as a Role of the DBMS
- Integrity rules must be considered at design time.
- Transactions must be monitored for violations, and appropriate actions taken.
- Rules should be few and non-overlapping, and should not impact performance too much (this is not an invitation to exclude rules).

Classifications of Integrity
There are various possible classifications. Date (Vol. 2) distinguishes:
- Domain integrity (attribute based)
- Relational integrity (table based)

Codd’s CURED (or CRUDE) Classification
- Type E: Entity integrity
- Type R: Referential integrity
- Type D: Domain integrity (a user-defined datatype)
- Type C: Column integrity (linked to domain integrity)
- Type U: User-defined integrity

Data Integrity
Some terms you will encounter:
- Entity integrity
- Referential integrity
- Functional dependency: a constraint between a determinant and the attributes it determines. For each value of the determinant there is only one value for each of the attributes it determines.
- Multivalued dependency
- Join dependency
- Domain constraints
- Cardinality constraints
- User-defined constraints
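
A functional dependency can be checked mechanically against a set of rows. The sketch below is a hypothetical Python helper (`fd_holds`) with made-up sample rows: it records the first dependent value seen for each determinant value and reports a violation as soon as a different one turns up.

```python
from typing import Dict, Iterable, Mapping, Sequence, Tuple


def fd_holds(rows: Iterable[Mapping[str, object]],
             determinant: Sequence[str],
             dependents: Sequence[str]) -> bool:
    """Return True if determinant -> dependents holds in the given rows."""
    seen: Dict[Tuple[object, ...], Tuple[object, ...]] = {}
    for row in rows:
        key = tuple(row[c] for c in determinant)
        value = tuple(row[c] for c in dependents)
        # A second, different dependent value for the same determinant value
        # means the dependency does not hold.
        if seen.setdefault(key, value) != value:
            return False
    return True


employees = [
    {"empno": "e1", "ename": "red", "dept": "d1"},
    {"empno": "e2", "ename": "blue", "dept": "d2"},
    {"empno": "e1", "ename": "red", "dept": "d2"},  # same empno, different dept
]
print(fd_holds(employees, ["empno"], ["ename"]))          # True
print(fd_holds(employees, ["empno"], ["ename", "dept"]))  # False
```

A check like this can only show that a sample of data is consistent with a dependency; whether the dependency is actually a rule is a design decision.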

Data Integrity
General principle: data complies with a set of rules.
Rules location: rules are best embodied in the DBMS. If they are contained in an application, there is the danger of saturating the network and degrading performance. This is particularly so in client/server computing. But are ALL the rules applicable to ALL users?
Constraints: in the declarative approach, integrity constraints are ‘declared’ as part of a table specification. The ANSI SQL-89 and SQL-92 standards (and later revisions) include specifications for integrity constraint syntax and behaviour.
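
For contrast with the application-side age check, here is the same rule declared as part of the table specification. A minimal sketch using SQLite through Python's `sqlite3` module as a stand-in for an SQL-92-style DBMS; the table and column names are adapted from the earlier EMPLOYEE example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The age rule is declared once, in the table specification, so every program
# and every ad hoc statement that writes to EMPLOYEE is subject to it.
conn.execute("""
    CREATE TABLE employee (
        empno    INTEGER PRIMARY KEY,
        emp_name TEXT NOT NULL,
        age      INTEGER CHECK (age > 16 AND age < 99)
    )
""")

conn.execute("INSERT INTO employee VALUES (1, 'Smith', 22)")  # satisfies the CHECK
try:
    conn.execute("INSERT INTO employee VALUES (2, 'Jones', 120)")  # violates it
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print(rejected)
```

One subtlety: a NULL age passes the CHECK, because the condition evaluates to unknown rather than false; declare the column NOT NULL as well if an age is mandatory.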

Domain Integrity
A domain is a conceptual pool of values from which one or more attributes draw their actual values. For example, the attribute employee_age draws its values from the domain ‘age range’.
Two values can only be compared if they come from the same domain.
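
The "same domain" rule for comparisons can be sketched by tagging each value with its domain. This is a hypothetical Python class (`DomainValue`, and the domain names `age_range` and `salary_band`, are invented for illustration); it refuses to compare values drawn from different domains instead of silently answering.

```python
class DomainValue:
    """A value tagged with the domain it is drawn from (illustrative only)."""

    def __init__(self, domain: str, value: object) -> None:
        self.domain = domain
        self.value = value

    def __eq__(self, other: object) -> bool:
        if not isinstance(other, DomainValue):
            return NotImplemented
        # Comparison is only meaningful within a single domain.
        if self.domain != other.domain:
            raise TypeError(f"cannot compare {self.domain} with {other.domain}")
        return self.value == other.value


employee_age = DomainValue("age_range", 30)
retirement_age = DomainValue("age_range", 65)
salary_band = DomainValue("salary_band", 30)  # same raw value, different domain

print(employee_age == retirement_age)  # False
try:
    employee_age == salary_band
except TypeError as exc:
    print(exc)  # cannot compare age_range with salary_band
```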

Primary Key Integrity (based on Oracle)
A primary key has these properties:
- unique value (no duplicates permitted in the table)
- not null
- may be made up of multiple columns (a composite key) if required
- may be referenced by one or more foreign keys
- may be limited to a small range of values (the CHECK option)
- may be limited to a large range of values (the ‘exists’ option)
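
A minimal sketch of these properties using SQLite through Python's `sqlite3` module as a stand-in for Oracle: a composite primary key on the SHIPPING table from the Enable, Disable slides, with a CHECK limiting container_no to a small range. The range 1 to 500 is an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# NOT NULL is spelled out: unlike Oracle, SQLite lets most PRIMARY KEY columns hold NULL.
conn.execute("""
    CREATE TABLE shipping (
        ship_no      INTEGER NOT NULL,
        container_no INTEGER NOT NULL CHECK (container_no BETWEEN 1 AND 500),
        PRIMARY KEY (ship_no, container_no)
    )
""")


def accepted(row: tuple) -> bool:
    """Try to insert a (ship_no, container_no) row; True if the DBMS allows it."""
    try:
        conn.execute("INSERT INTO shipping VALUES (?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False


results = [
    accepted((1, 1)),     # first occurrence of the key
    accepted((1, 1)),     # duplicate composite key value
    accepted((1, None)),  # part of the key is NULL
    accepted((1, 999)),   # outside the CHECK range
    accepted((2, 1)),     # same container number, different ship
]
print(results)  # [True, False, False, False, True]
```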

Foreign Key Properties
A foreign key:
- may be unique (a 1:1 relationship)
- may be made up of multiple columns
- may be limited to a range of values (a domain, as for primary keys)
- may be null, or not null, as required
- will reference a primary key (or keys)
- may be subject to cascade update, delete or insert

A Domain Definition
DOMAIN GENDER
- Data type: character
- Length: 6 bytes
- Allowable values: Male, Female, Null
- Storage format: uppercase
- Operations allowed (inherited operators): String, Unstring, =
- Input editing: nil
- Extra functions: Is_Male, Is_Female, What_Gender
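
One way to read this definition as code: a Python sketch in which the function names are Python spellings of the slide's Is_Male, Is_Female and What_Gender. Their exact behaviour, especially What_Gender returning a display form, is an assumption; NULL is represented by `None`.

```python
from typing import Optional

# Domain GENDER: allowable stored values (NULL is represented by None).
GENDER_VALUES = {"MALE", "FEMALE"}  # both fit the 6-byte length


def to_gender(raw: Optional[str]) -> Optional[str]:
    """Validate a value against domain GENDER and return its stored form."""
    if raw is None:
        return None              # Null is an allowable value
    stored = raw.upper()         # storage format: uppercase (no other input editing)
    if stored not in GENDER_VALUES:
        raise ValueError(f"{raw!r} is not in domain GENDER")
    return stored


def is_male(g: Optional[str]) -> bool:
    return g == "MALE"


def is_female(g: Optional[str]) -> bool:
    return g == "FEMALE"


def what_gender(g: Optional[str]) -> str:
    # Hypothetical reading of What_Gender: a display form of the stored value.
    return "Unknown" if g is None else g.capitalize()


print(what_gender(to_gender("female")))  # Female
```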

Timing Constraints
When should an integrity constraint be checked?
- TC: test the constraint no later than the end of the current relational request.
- TT: test at the end of the transaction (terminal test).
  START TRANSACTION
    UPDATE EMPLOYEE SET SALARY = SALARY * 1.1        TC
    DELETE FROM EMPLOYEE WHERE SALARY > 1000
  END TRANSACTION                                    TT
In Oracle, the timing is a property of the constraint: a constraint declared DEFERRABLE can be INITIALLY IMMEDIATE (checked at the end of each statement) or INITIALLY DEFERRED (checked at commit). There is also a SET CONSTRAINTS ... IMMEDIATE | DEFERRED statement for changing the mode within a transaction.
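
SQLite (used here through Python's `sqlite3` module as a stand-in for Oracle) lets a foreign key be declared DEFERRABLE INITIALLY DEFERRED, which gives TT behaviour; an ordinary foreign key is checked at the end of each statement, in the spirit of TC. The tables are hypothetical. In Oracle the same choice is made with DEFERRABLE INITIALLY DEFERRED on the constraint, or with SET CONSTRAINTS ... DEFERRED inside the transaction.

```python
import sqlite3

# Autocommit mode (isolation_level=None) so BEGIN/COMMIT are issued explicitly.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.execute("CREATE TABLE dept (deptno TEXT PRIMARY KEY)")
# TT-style: checked when the transaction commits.
conn.execute("""
    CREATE TABLE emp_tt (
        empno  TEXT PRIMARY KEY,
        deptno TEXT REFERENCES dept (deptno) DEFERRABLE INITIALLY DEFERRED
    )
""")
# TC-style (the default): checked at the end of each statement.
conn.execute("""
    CREATE TABLE emp_tc (
        empno  TEXT PRIMARY KEY,
        deptno TEXT REFERENCES dept (deptno)
    )
""")

# 1. TC: the orphan row is rejected by the INSERT itself.
try:
    conn.execute("INSERT INTO emp_tc VALUES ('e1', 'd1')")
    tc_rejected = False
except sqlite3.IntegrityError:
    tc_rejected = True

# 2. TT: the child row may come first, provided the parent exists by COMMIT.
conn.execute("BEGIN")
conn.execute("INSERT INTO emp_tt VALUES ('e1', 'd1')")
conn.execute("INSERT INTO dept VALUES ('d1')")
conn.execute("COMMIT")

# 3. TT: the parent never arrives, so the failure is reported at COMMIT.
conn.execute("BEGIN")
conn.execute("INSERT INTO emp_tt VALUES ('e2', 'd9')")
try:
    conn.execute("COMMIT")
    tt_rejected = False
except sqlite3.IntegrityError:
    tt_rejected = True
    if conn.in_transaction:  # undo the failed transaction's changes
        conn.execute("ROLLBACK")

print(tc_rejected, tt_rejected)
```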

A Few Examples
A transaction is a unit of work.
1. Single table: the transaction affects one row only and does not affect any domain setting.
2. Single table: the transaction affects multiple rows and will affect domain settings.
When should a domain integrity breach be reported: at the first row, the second, or at the end of the process? When should the transaction be aborted? Should a log of these occurrences/rows be kept?

A Few Examples
3. Single table: bulk loading.
Should the load process be stopped when the first breach is detected? Or should the failing row be ‘diverted’ to a log file? Should the failures be counted? Should there be a limit beyond which the process is stopped?
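
One possible answer to these questions, sketched in Python: a hypothetical `bulk_load` function that diverts failing rows to a reject log with their line numbers, counts them, and stops once a failure limit is exceeded (a limit of 0 means "stop at the first breach"). The validation rule and data are made up, and a real loader would also have to decide whether rows already loaded are kept or rolled back.

```python
from typing import Callable, List, Sequence, Tuple

Row = Tuple[str, int]


def bulk_load(rows: Sequence[Row],
              is_valid: Callable[[Row], bool],
              max_failures: int) -> Tuple[List[Row], List[Tuple[int, Row]], bool]:
    """Load rows, diverting failures to a reject log.

    Returns (loaded rows, reject log of (line number, row), stopped flag).
    """
    loaded: List[Row] = []
    rejects: List[Tuple[int, Row]] = []
    for line_no, row in enumerate(rows, start=1):
        if is_valid(row):
            loaded.append(row)
            continue
        rejects.append((line_no, row))   # divert the row instead of loading it
        if len(rejects) > max_failures:  # over the limit: stop the load
            return loaded, rejects, True
    return loaded, rejects, False


def valid_age(row: Row) -> bool:
    return 16 < row[1] < 99


data = [("Smith", 22), ("Jones", 150), ("Brown", 30), ("Green", -1), ("White", 40)]
loaded, rejects, stopped = bulk_load(data, valid_age, max_failures=5)
print(len(loaded), rejects, stopped)
```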

Slightly More Complex: 1:M
It is possible that multiple rows in tables A, B and C will be affected by a single transaction.
[Diagram: one transaction affecting tables A, B and C]

An Algorithm for Integrity Checks
1. Determine the constraints that apply to the request.
2. Inspect their timing types.
3. Before the end of the relational request, run the TC types.
4. Append the TT types to the end of the transaction.
5. Before the end of the transaction, run the TT types.
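
The algorithm can be simulated in a few lines. This Python sketch is hypothetical (`Constraint`, `run_transaction` and the salary rules are invented for illustration): each request replaces a table's rows, TC constraints are checked straight after the request, and TT constraints are queued and checked once, before commit. It replays the salary transaction from the Timing Constraints slide with made-up salaries and a rule that no salary may exceed 1000.

```python
import copy
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Rows = List[Dict[str, float]]


@dataclass
class Constraint:
    name: str
    table: str
    timing: str                      # "TC" (end of request) or "TT" (end of transaction)
    holds: Callable[[Rows], bool]


Request = Tuple[str, Callable[[Rows], Rows]]  # (table, function producing the new rows)


def run_transaction(db: Dict[str, Rows], requests: List[Request],
                    constraints: List[Constraint]) -> str:
    work = copy.deepcopy(db)          # changes reach db only if the transaction commits
    deferred: List[Constraint] = []
    for table, apply in requests:
        work[table] = apply(work[table])
        # Determine the constraints that apply to this request and inspect their timing.
        for c in constraints:
            if c.table != table:
                continue
            if c.timing == "TC":      # run TC types before the request ends
                if not c.holds(work[table]):
                    return f"aborted: {c.name} ({c.timing})"
            elif c not in deferred:   # append TT types to the end of the transaction
                deferred.append(c)
    for c in deferred:                # run TT types before the transaction ends
        if not c.holds(work[c.table]):
            return f"aborted: {c.name} ({c.timing})"
    db.update(work)
    return "committed"


raise_pay = ("employee", lambda rows: [{**r, "salary": r["salary"] * 1.1} for r in rows])
drop_high = ("employee", lambda rows: [r for r in rows if r["salary"] <= 1000])
positive = Constraint("salary > 0", "employee", "TC",
                      lambda rows: all(r["salary"] > 0 for r in rows))


def cap(timing: str) -> Constraint:
    return Constraint("salary <= 1000", "employee", timing,
                      lambda rows: all(r["salary"] <= 1000 for r in rows))


db = {"employee": [{"empno": 1, "salary": 500}, {"empno": 2, "salary": 950},
                   {"empno": 3, "salary": 990}]}
print(run_transaction(copy.deepcopy(db), [raise_pay, drop_high], [positive, cap("TT")]))
print(run_transaction(copy.deepcopy(db), [raise_pay, drop_high], [positive, cap("TC")]))
```

With the cap as TT the transaction commits, because the DELETE removes the rows the UPDATE pushed over 1000; declared as TC, the same cap aborts the transaction straight after the UPDATE.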

Foreign Key Rules
For each foreign key, three questions need to be answered:
- Can the foreign key accept nulls?
- What should happen on an attempt to delete the target of a foreign key reference?
- What should happen on an attempt to update the target of a foreign key reference (the primary key)?
Example tables:
  Employee: Empno (e1, e2, e3), Ename (red, blue, brown), Worksfordept (d1, d2)
  Dept: Deptno (d1, d2, d3), Dname (Pay, Tax, Art)

Foreign Key Rules
When should foreign key rules be checked?
  Dept (Deptno, Dname, Budget)
  Emp (Empno, Ename, Salary, WorksforDeptno)
    WorksforDeptno REFERENCES Dept: delete cascades, update cascades
  Depend (Empno, Dependname, Date-of-birth)
    Empno REFERENCES Emp: delete cascades, update cascades
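
A minimal sketch of these three tables in SQLite (through Python's `sqlite3` module, standing in for Oracle), showing that cascades chain: deleting a department deletes its employees, and those deletes cascade on to their dependents. The sample rows are made up. These foreign keys are not deferred, so each cascade happens as part of the statement that causes it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
    CREATE TABLE dept (deptno TEXT PRIMARY KEY, dname TEXT, budget NUMERIC);
    CREATE TABLE emp (
        empno          TEXT PRIMARY KEY,
        ename          TEXT,
        salary         NUMERIC,
        worksfordeptno TEXT REFERENCES dept (deptno)
                           ON DELETE CASCADE ON UPDATE CASCADE
    );
    CREATE TABLE depend (
        empno         TEXT NOT NULL REFERENCES emp (empno)
                          ON DELETE CASCADE ON UPDATE CASCADE,
        dependname    TEXT NOT NULL,
        date_of_birth TEXT,
        PRIMARY KEY (empno, dependname)
    );
    INSERT INTO dept VALUES ('d1', 'Pay', 1000), ('d2', 'Tax', 2000);
    INSERT INTO emp VALUES ('e1', 'red', 500, 'd1'), ('e2', 'blue', 600, 'd2');
    INSERT INTO depend VALUES ('e1', 'Ann', '2010-01-01'), ('e2', 'Bob', '2012-05-05');
""")

# Update cascade: renumbering d2 carries the new key into emp.
conn.execute("UPDATE dept SET deptno = 'd9' WHERE deptno = 'd2'")
# Delete cascade: removing d1 deletes e1, and that delete cascades on to e1's dependents.
conn.execute("DELETE FROM dept WHERE deptno = 'd1'")

emps = conn.execute("SELECT empno, worksfordeptno FROM emp").fetchall()
deps = conn.execute("SELECT empno, dependname FROM depend").fetchall()
print(emps, deps)
```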

Foreign Keys and Referencing Action
  CREATE TABLE SUPPLIER ( etc.
    PRIMARY KEY (Sno) )
  CREATE TABLE PART ( etc.
    PRIMARY KEY (Pno) )
  CREATE TABLE SUPPLIER_PART ( etc.
    PRIMARY KEY (Sno, Pno),
    FOREIGN KEY (Sno) REFERENCES SUPPLIER (Sno)
      ON DELETE RESTRICT ON UPDATE CASCADE,
    FOREIGN KEY (Pno) REFERENCES PART (Pno)
      ON DELETE RESTRICT ON UPDATE CASCADE )
e.g. SUPPLIER (Sno, Sname), PART (Pno, Pname), SUPPLIER_PART (Sno, Pno, Qty)
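
A runnable version of the SUPPLIER/PART example, as a sketch in SQLite through Python's `sqlite3` module (standing in for Oracle); the sample rows are made up. It shows RESTRICT blocking the delete of a supplier that still has SUPPLIER_PART rows, and CASCADE carrying a changed supplier number into those rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
    CREATE TABLE supplier (sno TEXT PRIMARY KEY, sname TEXT);
    CREATE TABLE part     (pno TEXT PRIMARY KEY, pname TEXT);
    CREATE TABLE supplier_part (
        sno TEXT NOT NULL,
        pno TEXT NOT NULL,
        qty INTEGER,
        PRIMARY KEY (sno, pno),
        FOREIGN KEY (sno) REFERENCES supplier (sno)
            ON DELETE RESTRICT ON UPDATE CASCADE,
        FOREIGN KEY (pno) REFERENCES part (pno)
            ON DELETE RESTRICT ON UPDATE CASCADE
    );
    INSERT INTO supplier VALUES ('s1', 'Smith');
    INSERT INTO part VALUES ('p1', 'Nut');
    INSERT INTO supplier_part VALUES ('s1', 'p1', 300);
""")

# RESTRICT: s1 still has SUPPLIER_PART rows, so it cannot be deleted.
try:
    conn.execute("DELETE FROM supplier WHERE sno = 's1'")
    delete_blocked = False
except sqlite3.IntegrityError:
    delete_blocked = True

# CASCADE: renumbering the supplier carries the new key into SUPPLIER_PART.
conn.execute("UPDATE supplier SET sno = 's9' WHERE sno = 's1'")
print(delete_blocked, conn.execute("SELECT * FROM supplier_part").fetchall())
```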

Foreign Keys and Referencing Action
The relation that each foreign key identifies is defined, and the foreign key clause also contains other information.
ON DELETE: when the target record of a foreign key reference (e.g. a SUPPLIER row) is deleted, the DBMS performs one of these operations:
- CASCADE: all matching SUPPLIER_PART records are also deleted.
- RESTRICT: the delete is permitted only if there are no matching SUPPLIER_PART records.
- SET NULL: the foreign key values are all set to NULL (only if nulls are allowed; in SUPPLIER_PART, Sno is part of the primary key, so it cannot be set to NULL).

Foreign Keys and Referencing Action
ON UPDATE: when the primary key of the target record of a foreign key reference is updated:
- CASCADE
- RESTRICT
- SET NULL
These options behave like their delete counterparts.
Note that the design decisions embodied in this pseudo-SQL represent constraint information which reflects the nature, or business rules, of the organisation being modelled.

Possible Referential Integrity Processes
1. Limited insert: if an incoming foreign key does not exist as a primary key in the referenced table, ABORT the transaction and REPORT.
2. Limited update: if an incoming foreign key does not exist as a primary key in the referenced table, TERMINATE the process.
3. Restricted delete: if there are referencing foreign keys in a referencing table, TERMINATE the delete process on the referenced table.

Possible Referential Integrity Processes
4. Restricted update: if there are referencing foreign keys in a referencing table, INHIBIT the update operation on the referenced key.
5. Cascade delete: when a row of the referenced table is deleted, DELETE all referencing rows in the referencing table(s) as well.
6. Cascade update: when a referenced key is updated, UPDATE the foreign keys on all referencing rows in the referencing table(s) to the new value.

Possible Referential Integrity Processes
7. Nullify delete: carry out a DELETE on the referenced table by setting the matching foreign keys in the referencing table(s) to NULL (watch data types).
8. Nullify update: set the matching foreign keys in the referencing table to NULL. This invalidates any reference to the old value of the referenced key (which must not itself be NULL).
9. Default update: invalidate references to updated referenced keys by setting the matching foreign keys in the referencing table(s) to a DEFAULT value.

Possible Referential Integrity Processes
10. Default delete: invalidate references to the deleted referenced key value(s) by setting the matching foreign key values to a DEFAULT value.
11. Warning delete: permit the deletion, BUT warn the user of the unattached foreign keys which are now present in the referencing table(s).
12. Warning update: permit the update, BUT warn the user of the unattached foreign keys which are now present in the referencing table(s).
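
Several of these processes can be simulated on in-memory data to show exactly which rows each one touches. This Python sketch is hypothetical (`delete_referenced` and the action names are invented for illustration, and the sample rows are made up); it covers the delete-side processes 3, 5, 7, 10 and 11, and the update-side processes would follow the same pattern. It assumes that a DEFAULT value must itself be an existing, surviving referenced key, so that process 10 does not create a new dangling reference.

```python
from typing import Dict, List, Optional, Set

ACTIONS = {"restrict", "cascade", "nullify", "default", "warning"}


def delete_referenced(parents: Set[str], children: List[Dict[str, Optional[str]]],
                      fk: str, key: str, action: str,
                      default: Optional[str] = None) -> List[str]:
    """Delete `key` from the referenced table, applying one referential action.

    Returns warning messages (only the 'warning' action produces any).
    Raises ValueError if the delete is refused.
    """
    if action not in ACTIONS:
        raise ValueError(f"unknown action {action!r}")
    referencing = [row for row in children if row[fk] == key]
    if action == "restrict" and referencing:                  # process 3
        raise ValueError(f"{key!r} is still referenced; delete refused")
    if action == "default" and (default == key or default not in parents):
        raise ValueError(f"default {default!r} is not a surviving referenced key")

    parents.discard(key)
    if action == "cascade":                                   # process 5
        children[:] = [row for row in children if row[fk] != key]
    elif action == "nullify":                                 # process 7
        for row in referencing:
            row[fk] = None
    elif action == "default":                                 # process 10
        for row in referencing:
            row[fk] = default
    elif action == "warning":                                 # process 11
        return [f"{row} now refers to missing key {key!r}" for row in referencing]
    return []


def sample():
    """Fresh copies of a small, made-up Dept/Employee example."""
    depts = {"d1", "d2", "d3"}
    emps = [{"empno": "e1", "dept": "d1"}, {"empno": "e2", "dept": "d2"}]
    return depts, emps


depts, emps = sample()
delete_referenced(depts, emps, "dept", "d1", "nullify")
print(emps)  # [{'empno': 'e1', 'dept': None}, {'empno': 'e2', 'dept': 'd2'}]
```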

Some Integrity Schema Examples
  create table monash1 (
    city             varchar2(13) not null,
    studydate        date not null,
    noonreading      number(4,1),
    midnightreading  number(4,1),
    rainfall         number,
    unique (city, studydate)
  );
This creates a table with the candidate key (city, studydate). There may be a number of UNIQUE constraints.

Some Integrity Schema Examples
  create table monash1 (
    city             varchar2(13) not null,
    studydate        date not null,
    noonreading      number(4,1),
    midnightreading  number(4,1),
    rainfall         number,
    primary key (city, studydate)
  );
This creates a table with the primary key (city, studydate): each combination of city and studydate can appear only once. A table can have only one primary key, although it may also have a number of UNIQUE constraints.

Some Integrity Schema Examples
  create table monash1 (
    city             varchar2(13) not null,
    studydate        date not null,
    noonreading      number(4,1),
    midnightreading  number(4,1),
    rainfall         number,
    constraint pk_citystudy primary key (city, studydate)
  );
This creates a table with the primary key (city, studydate) and names the constraint pk_citystudy in the data dictionary (e.g. the USER_CONSTRAINTS view).

Enable, Disable
Oracle permits constraints to be disabled and enabled. For example:
  alter table shipping add primary key (ship_no, container_no);
This identifies the composite primary key as ship_no + container_no, and ensures that no two rows have the same pair of values.
The DISABLE option defines the constraint but does not enforce it. The ENABLE option re-establishes enforcement of the constraint.

Enable, Disable
The formats of the disable and enable clauses are:
  disable { { unique (column [, column] ...) | primary key | constraint constraint } [cascade]
          | all triggers }
  enable  { { unique (column [, column] ...) | primary key | constraint constraint }
            [using index [initrans integer] [maxtrans integer]
                         [tablespace tablespace] [storage storage]]
          | all triggers }

Triggers
Oracle triggers add processing power to the DBMS for events which affect a database.
In the following example a trigger is set which ensures that changes to employee records only take place during business hours on working days (security?). See if you agree...

Triggers
  create trigger emp_permit_change
    before delete or insert or update on emp
  declare
    dummy integer;  -- declared but not used in this excerpt
  begin
    /* if today is a Saturday or Sunday, then return an error;
       'dy' gives the abbreviated day name in the session's date language */
    if (to_char(sysdate, 'dy') = 'sat' or to_char(sysdate, 'dy') = 'sun') then
      raise_application_error(-20501, 'May not change employee table during the weekend');
    end if;

Triggers
Perhaps we need this as well:
    /* outside 08:00 to 17:59, return an error */
    if (to_number(to_char(sysdate, 'hh24')) < 8 or to_number(to_char(sysdate, 'hh24')) >= 18) then
      raise_application_error(-20502, 'May only change employee table during working hours');
    end if;
  end;
This raises an interesting point: what happens with flexible time and enterprise bargaining?
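
The trigger's rule can also be written as a plain function, which makes edge cases such as 18:00 easy to test outside the database. This is a hypothetical Python sketch (`change_permitted` is not part of the Oracle example); it uses `datetime.weekday()`, which, unlike `to_char(sysdate, 'dy')`, does not depend on the session's date language.

```python
from datetime import datetime
from typing import Optional, Tuple


def change_permitted(now: datetime) -> Optional[Tuple[int, str]]:
    """Mirror the trigger's checks: None if a change is allowed, else (code, message)."""
    if now.weekday() >= 5:  # Monday is 0, so 5 and 6 are Saturday and Sunday
        return (-20501, "May not change employee table during the weekend")
    if now.hour < 8 or now.hour >= 18:  # same bounds as the hh24 test
        return (-20502, "May only change employee table during working hours")
    return None


print(change_permitted(datetime(2024, 6, 3, 10, 0)))  # a Monday morning: None
```

Moving the working hours into a table, rather than hard-coding 8 and 18, is one way to approach the flexible-time question.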

Something different...

Business Rules and Data Modelling
Business rules are necessary to ensure that data in a database accurately reflects the conditions which apply to that data in the real-world environment.
The following overheads introduce some additional material on this subject.

Business Rules and Data Modelling
Business rules are at the core of commercial applications. If systems ‘obey’ the business rules, then:
- data will be correct
- applications will function
- users and management will be happy
Which leads us to ask:
- What is a business rule?
- Where are business rules declared?
- Where are they enforced?

Business Rules and Data Modelling
Four proposed levels of business rules:
1. Single-attribute (column) format definitions, enforced by the database. For example:
- The Payment column is an amount interpreted as dollars and cents.
- The Surname column is a text field expressed in the ASCII character set.
- The Amount_on_Hand column must never be less than 0.

Business Rules and Data Modelling
2. Multiple key column relationships. For example:
- The Brand Name column in the Brand table has a many-to-one relationship with the Manufacturer Name in the Manufacturer table.
- The Product foreign key in the Sales table has a many-to-one relationship with the Product primary key in the Product table.

Business Rules and Data Modelling
3. Relationships between entities. These are declared on the entity-relationship diagram, but are not directly enforced by the database (for example, because the relationship is many-to-many). For example:
- The Employee is a sub-type of Person.
- The Supplier supplies the Customer.

Business Rules and Data Modelling
4. Complex business logic. This relates to business processes. It may be enforced at data entry time by a complex application, for a rule such as this:
“When an insurance policy has been committed but has not yet been approved by the underwriter, the administration date can be NULL; but when the policy has been underwritten, the administration date must be present (NOT NULL) and must be more recent than the agreement date.”
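
This particular rule happens to be expressible as a table-level CHECK constraint, so it can be declared rather than coded in the application. A minimal sketch in SQLite through Python's `sqlite3` module; the table, the column names and the 0/1 `underwritten` flag are assumptions, and dates are stored as ISO 8601 text so that string comparison follows date order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE policy (
        policy_no           INTEGER PRIMARY KEY,
        agreement_date      TEXT NOT NULL,  -- 'YYYY-MM-DD'
        underwritten        INTEGER NOT NULL CHECK (underwritten IN (0, 1)),
        administration_date TEXT,
        -- Not yet underwritten: administration_date may be NULL.
        -- Underwritten: the date must be present and later than the agreement date.
        CHECK (underwritten = 0
               OR (administration_date IS NOT NULL
                   AND administration_date > agreement_date))
    )
""")


def accepted(row: tuple) -> bool:
    """Try to insert a policy row; True if the CHECK constraints allow it."""
    try:
        conn.execute("INSERT INTO policy VALUES (?, ?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False


results = [
    accepted((1, "2024-01-10", 0, None)),          # committed, not yet underwritten
    accepted((2, "2024-01-10", 1, None)),          # underwritten, date missing
    accepted((3, "2024-01-10", 1, "2024-01-05")),  # underwritten, date too early
    accepted((4, "2024-01-10", 1, "2024-02-01")),  # underwritten, date valid
]
print(results)  # [True, False, False, True]
```

Rules that depend on other rows or other tables cannot be written as a CHECK like this, which is where triggers or application logic come back in.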

Business Rules and Data Modelling
From this it can be stated that:
- The core database software manages only the first two levels: the single-column format definitions and the multiple-column key relationships.
- Level 3 (relationships between entities) and Level 4 (complex business logic) should also be enforced, as there is much valuable business content at these levels (or should that be ‘essential’?).

Business Rules and Data Modelling
Entity-relationship modelling (E/R modelling) seems to be a comprehensive language for mapping and describing relationships between entities.
E/R modelling is a diagrammatic technique which specifies one-to-one, many-to-one and many-to-many relationships among data elements. It is a logical model.

Business Rules and Data Modelling
Computer Associates’ Erwin converts an E/R diagram into data definition language (DDL) declarations. These declarations define keys and join constraints.
You can follow this up, and use an Erwin example at
Gershwin, which you have probably met, is a simpler E/R modelling tool.

Business Rules and Data Modelling
E/R modelling is a useful technique for beginning the process of understanding and enforcing business rules, but it does not guarantee completeness.
E/R models are incomplete in that the diagrams represent only what the designer decided to stress, or was aware of. There is no test of an E/R diagram to determine whether the designer has specified all possible one-to-one, one-to-many or many-to-many relationships.

Business Rules and Data Modelling
E/R models are not unique: a given set of data relationships can be represented by a number of E/R diagrams.
Many real data relationships are many-to-many. The E/R model does not enforce M:N relationships, which may involve various conditions and degrees of correlation that it would be useful (and perhaps essential) to include as business rules. E/R modelling provides no extensions to the basic many-to-many declaration.

Business Rules and Data Modelling
Many E/R models are ideal, not real: many corporate models are based on ‘how things should be’. This is very useful in understanding the business, BUT it causes problems when the model must be populated with real data.
E/R models are rarely models of real data. There aren’t any tools for trawling over real data sets and developing E/R models. The E/R model is invariably constructed first and the data is ‘fitted’ into the model, which means we need to clean the data before it becomes resident in the model.

Business Rules and Data Modelling
E/R models lead to complex schemas which militate against the objectives of information delivery.
For example, the E/R diagram which underpins Oracle Financials (a current applications package) requires approximately 2,000 tables, and SAP’s model can require 10,000 tables.
All of this tends to work against the objectives of easy-to-understand models and high performance.

Business Rules and Data Modelling
Chris Date (An Introduction to Database Systems, 7th edition) has this to say:
‘the E/R model is incapable of dealing with integrity constraints or “business rules” except for a few special cases. Declarative rules are too complex to be captured as part of the business model and must be defined separately by the analyst/developer’.