CS530 Database Architecture Models and Design Prof. Ian HORROCKS Dr. Robert Stevens Wednesday - Practical Tables.

CS530 - Ian Horrocks and Robert Stevens 27/09/

In this Section…
• Topics Covered
  – Functional Dependencies
  – Normalisation
  – SQL Data Definition and Manipulation
  – SQL Query
• Examples Classes

Informal guidelines
• Semantics of the attributes
  – the relation is easy to explain
  – the relation doesn’t mix concepts
• Reducing the redundant values in tuples
• Choosing attribute domains that are atomic
• Reducing the null values in tuples
• Disallowing spurious tuples

Functional Dependencies

Functional Dependency
• An attribute A is functionally dependent on a set of attributes X (written X → A) if and only if
  – the value of A is determined solely by the values of X
  – the values of X uniquely determine a value of A
• Example: child → mother
  – The value of child implies the value of mother
  – The value of mother does NOT imply the value of child, so mother → child does not hold
  – child is the determinant; mother is the dependent (determined) attribute
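As an illustration of the definition, the sketch below tests whether a dependency X → A is contradicted by a given set of tuples. The function name `fd_holds` and the sample people are assumptions made up for this example; note that an instance can only refute a dependency, it cannot prove that the dependency holds in general.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if no two rows agree on lhs but differ on rhs.

    rows is a list of dicts (one per tuple); lhs and rhs are lists of
    attribute names. Hypothetical helper, not part of the course material.
    """
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # The first value seen for a key is remembered; a later, different
        # value for the same key contradicts lhs -> rhs.
        if seen.setdefault(key, value) != value:
            return False
    return True


# Invented sample data: two children share one mother.
people = [
    {"child": "Ann", "mother": "Mary"},
    {"child": "Bob", "mother": "Mary"},
    {"child": "Cat", "mother": "Sue"},
]

child_determines_mother = fd_holds(people, ["child"], ["mother"])
mother_determines_child = fd_holds(people, ["mother"], ["child"])
```

With this data, child → mother is consistent with every tuple, while mother → child is refuted because Mary appears with two different children.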

Our case study example
[EER diagram labels: STUDENT, SCHOOL, YEAR, ENROL, YEARREG, REG, TUTOR, YEARTUTOR, STAFF, COURSE, APPRAISAL, TEACH; attributes studno, name (given, family), hons, slot, labmark, exammark, courseno, subject, equip, name, year, faculty, appraiser, appraisee, roomno]

STUDENT(studno, givenname, familyname, hons, tutor, slot, year)
  studno → studno, givenname, familyname, hons, tutor, slot, year
ENROL(studno, courseno, labmark, exammark)
  studno, courseno → labmark, exammark
COURSE(courseno, subject, equip)
  courseno → courseno, subject, equip
STAFF(lecturer, roomno, appraiser)
  lecturer → lecturer, roomno, appraiser
  roomno → lecturer, appraiser, roomno
YEAR(year, yeartutor)
  year → year, yeartutor
  yeartutor → year, yeartutor
SCHOOL(hons, faculty)
  hons → hons, faculty
TEACH(courseno, lecturer)
  courseno, lecturer → courseno, lecturer

More Examples of Functional Dependency
[Dependency diagrams over the attributes part_number, part_description, quantity_in_stock and studno, courseno, labmark, name, tutor, roomno, subject]

Use functional dependencies to … check that a relation is legal or good, e.g. keys
• K is a superkey of relation R if K → R, i.e. whenever t1[K] = t2[K] then t1[R] = t2[R]
• K functionally determines all attributes in a tuple in R
STUDENT(studno, name, hons, tutor, slot, year)
  studno → studno, name, hons, tutor, slot, year

Use functional dependencies to … check that a relation is legal or good, e.g. remove redundancy
• Partial Dependency
  studno, courseno → subject   (studno, courseno, subject)
• Transitive Dependency
  studno → year and year → yeartutor, so studno → yeartutor   (studno, yeartutor)
• Base functional dependencies: F
• Set of logically implied functional dependencies: the CLOSURE F+
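A common way to reason about F+ without listing it is to compute the closure of a set of attributes under F. The following is a minimal sketch of that standard fixed-point computation; the function name `closure` and the encoding of dependencies as (left side, right side) pairs of sets are assumptions for this example. The dependencies are taken from the case study.

```python
def closure(attrs, fds):
    """Return the set of attributes determined by attrs under fds.

    fds is a list of (lhs, rhs) pairs, each a set of attribute names.
    """
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Fire a dependency when its whole left side is already known
            # and it still contributes something new.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


F = [
    ({"studno"}, {"year"}),
    ({"year"}, {"yeartutor"}),
    ({"courseno"}, {"subject"}),
]

studno_closure = closure({"studno"}, F)
pair_closure = closure({"studno", "courseno"}, F)
```

Here `studno_closure` contains yeartutor, which is the transitive dependency studno → yeartutor, and `pair_closure` contains subject even though subject depends on courseno alone, which is the partial dependency.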

Normalisation (in Brief)

Normalisation Overview
• Stops information repeating over tables
• Uses Functional Dependency
• Uses a number of ‘forms’ (1NF, 2NF, 3NF, BCNF, 4NF, 5NF, DKNF)
• We shall go to 3rd; look at Background for more.
• After you’ve built 10 DBs you’ll just ‘know’; it’ll become more craft than engineering.

UN-Normalised Data
[Example table not preserved in the transcript]
• To make it 1NF
  – Remove Repeating Groups

1st NF (the Key)
[Example table not preserved in the transcript]
• To make it 2NF
  – Remove Part-Key Dependencies
  – Every non-primary-key attribute is fully functionally dependent on the primary key.

2nd NF (the Whole Key)
[Example tables: Track Table, CD Table]
• To make it 3NF
  – No transitive dependencies
• e.g. A → B and B → C, therefore A → C
• AH06 → Sony Music and Sony Music → UK

3rd NF (and nothing but the key)
[Example tables: Track Table, CD Table, Company Table]

Boyce-Codd Normal Form
A relation scheme R is in BCNF if, for all functional dependencies that hold on R of the form X → Y where R ⊇ X and R ⊇ Y, at least one of the following holds:
• X → Y is trivial
• X is a superkey for the scheme R, i.e. X → R
Every attribute must depend on the key, the whole key and nothing but the key.
• Other Normal Forms: 1NF, 2NF and 3NF use the primary key only
• BCNF is generalised for candidate keys

Round Up
If column N is functionally dependent (FD) on another column M, then every value of M must uniquely define the value of N: M → N
Student(id, name, staffID, time)
  Student(jbr, Joe Brown, har, 12-13)
  Student(spl, Sam Plant, gou, 14-15)
  Student(spl, Sam Plant, har, 12-13)
Candidate dependencies: id → name, id → staffID, id → time
• name is functionally dependent on id
• staffID is NOT functionally dependent on id
• time is NOT functionally dependent on id
Meets(spl, har, 12-13) – meeting once
Meets(spl, har, 12-13) – meeting many

Background - Next SQL - Slide 76

Further Notes on Normalisation

Normalisation
Given a relation R with a set of functional dependencies F, and a key K, we must identify independent attributes.
The key identifies all the attributes, but…
• if an attribute only depends on part of the key, then it is independent of the rest of it: the attribute is partially dependent on the key
• if an attribute only depends on the key transitively, then it really depends directly on another attribute and is independent of the key: the attribute is transitively dependent on the key

Use functional dependencies to … check constraints on the set of legal relations
F:
  studno → name, tutor
  tutor → roomno
  roomno → tutor
  courseno → subject
  studno, courseno → labmark
F+ also contains:
  studno, courseno → name   (partial)
  studno → roomno   (transitive)

Consequences of redundancy
• Wasted space
• Potential performance cost
• Potential inconsistency
• Inability to represent data

Use functional dependencies to … check the EER model mapping correctness
[EER diagram labels: Reader, Book, Return History (m:n); attributes readerid, name, fine, date, bookid, title]
  readerid → readerid
  readerid → name
  bookid → bookid
  bookid → title
Many:many relationships that could be weak entity types because they have hidden partial keys.
ReturnHistory(readerid, bookid, date, fine)
  readerid, bookid → date ?
  readerid, bookid → fine ?

Using Functional Dependencies to... check EER mappings
Attributes on wrong entities
[EER diagram labels: COURSE, ENROL (m:n), STUDENT, TEACH (n:m), STAFF; attributes name, studno, subject, courseno, labmark, staffname, roomno, salary]
• STUDENT(studno, name, labmark)
  studno → name
  studno → labmark ?
• COURSE(courseno, subject, roomno)
  courseno → subject
  courseno → roomno ?
• STAFF(staffname, salary)
  staffname → salary
  where is staffname → roomno ?

Using Functional Dependencies to... check EER mappings
Wrong cardinalities on a relationship type
[EER diagram labels: COURSE, ENROL (n:1), STUDENT; attributes name, studno, subject, courseno]
• STUDENT(studno, name)
  studno → name
• COURSE(courseno, subject, studno)
  courseno → subject
  courseno → studno ?

Using Functional Dependencies to... check EER mappings
Missing 1:many relationship type and entity type, or missing multi-valued attribute
[EER diagram labels: COURSE; attributes subject, roomno, courseno, lecturer]
• COURSE(courseno, subject, lecturer, roomno)
  courseno → subject
  courseno → lecturer ?
  courseno → roomno
  lecturer → roomno

Functional Dependencies are hidden in EER Model
[EER diagram labels: STUDENT, ENROL, TUTOR, STAFF, COURSE; attributes studno, name, slot, labmark, courseno, subject, name, roomno; cardinalities 1, m, n, m]

Using the EER Model and Functional Dependencies
1. Draw EER model
2. Map EER schema to relational schema
3. For every relation
  – List the functional dependencies: what determines every attribute?
  – Check that every relation is in BCNF
    • does the key really solely and uniquely identify each attribute?
    • if it’s not in BCNF, then why?
    • fix the problem: normalise and/or trace back to the EER model
4. Are there any functional dependencies missing?
5. Optimise the relational schema

Database design
• Extended Entity Relationship
  – Top Down
  – Conceptual/Abstract View
• Functional Dependencies
  – Bottom Up
  – Implementation View
  – The Determinancy Approach
  – Synthesise relations
1. List all attributes
2. Consider the relationships between them
  • those which determine the values of others are entities
  • those whose values are determined by other items are attributes

Use functional dependencies to… Synthesise relations
studno → givenname, studno → familyname, studno → hons, studno → tutor, studno → slot, studno → year
  give STUDENT(studno, givenname, familyname, hons, tutor, slot, year)
studno, courseno → labmark and studno, courseno → exammark
  give ENROL(studno, courseno, labmark, exammark)
courseno → subject and courseno → equip
  give COURSE(courseno, subject, equip)
lecturer → roomno, lecturer → appraiser, roomno → lecturer, roomno → appraiser
  give STAFF(lecturer, roomno, appraiser)
year → yeartutor and yeartutor → year
  give YEAR(year, yeartutor)
hons → faculty
  gives SCHOOL(hons, faculty)

er.…
TEACH(courseno, lecturer)
  courseno, lecturer   (a determinant with no dependent attributes)
TEACH(courseno, lecturer, num_of_lectures)
  courseno, lecturer → num_of_lectures

Complementary Approaches
• Disadvantages of EER top-down
  1. Not all entity types are represented by nouns or noun-phrases, e.g. association entity types
  2. Not all nouns and noun-phrases correspond to entities, e.g. single attribute entities
• Disadvantages of determinancy bottom-up
  1. Long-winded
  2. Hides overall picture of data model

The Steps of Normalisation
• Take one dependency at a time
• Treat each relation separately and independently
• Iterative process

Use functional dependencies to…
• Systematically create legal relations
• Derive relations which avoid anomalies in
  – Insertion
  – Deletion
  – Modification
  – Accessing
• Ensure single valued-ness of facts represented in attributes in keyed relations
• Ensure the removal of redundancy in a relation
NORMALISE relations

Normalisation
• Given
  – a universal relation that is unnormalised
  – a set of functional dependencies on the attributes in the relation
• Produce a set of relations where each relation is normalised for the functional dependencies on the attributes in the relation
• Three approaches:
  1. Relational synthesis
  2. Step-wise normalisation
  3. Using BCNF decomposition

The Process of Normalisation
• Usually four steps giving rise to
  – First Normal Form (1NF)
  – Second Normal Form (2NF)
  – Third Normal Form (3NF)
  – Boyce-Codd Normal Form (BCNF)
  – Fourth Normal Form (4NF)
• At each step we consider relationships between the functional dependencies of a relation’s attributes
• Normalisation is a:
  – framework
  – series of tests
UNNORMALISED ENTITY
  step 1: remove repeating groups
1st NORMAL FORM
  step 2: remove partial dependencies
2nd NORMAL FORM
  step 3: remove transitive dependencies
3rd NORMAL FORM / Boyce-Codd Normal Form
  step 4: remove multi-valued dependencies
4th NORMAL FORM

CS530 - Ian Horrocks and Robert Stevens27/09/ First Normal Form  Attributes form Repeating Groups  When a group of attributes has multiple values then we say there is a repeating group of attributes in the relation  An relation is in 1NF if there are no repeating groups of attribute types  Any un-normalised relation is transformed to 1NF –Remove all repeating attribute groups –Repeating attribute groups become new relations in their own right –The key of the original relation must be an attribute (but not necessarily a key) of the derived relation.

CS530 - Ian Horrocks and Robert Stevens27/09/ First Normal Form : Repeating Groups STUDENT (studno, name, tutor, roomno) studno → name, tutor tutor → roomno, roomno → tutor STUDENT_DETAILS (studno, name, tutor, roomno, {courseno, labmark, subject}) studno → name, tutor courseno → subject tutor → roomno, roomno → tutor studno, courseno → labmark ENROL (studno, courseno, subject, labmark) courseno → subject studno, courseno → labmark

CS530 - Ian Horrocks and Robert Stevens27/09/ Benefits from First Normal Form  Any ‘hidden’ relations (entities) are identified  Process results in separation of different objects  BUT anomalies may still exist ENROL (studno, courseno, subject, labmark) –subject appears on every enrolment occurrence. –This may result in anomalies when updating or deleting tuples –The problem in example is that subject is functionally dependent only on courseno which is only part of the key

CS530 - Ian Horrocks and Robert Stevens27/09/ Second Normal Form  A relation is in 2NF if it is in 1NF and each non identifying attribute depends upon the whole key (identifier)  Any relation in 1NF is transformed to 2NF –Identify functional dependencies –Re-write relations so that each non-identifying attribute is functionally dependent on the whole of the key –Decompose ENROL into two relations ENROL (studno, courseno, subject, labmark) courseno → subject studno, courseno → labmark ENROL’ (studno, courseno, labmark) studno, courseno → labmark COURSE (courseno, subject) courseno → subject

CS530 - Ian Horrocks and Robert Stevens27/09/ Second Normal Form STUDENT(studno, name, tutor, roomno) studno → name, tutor tutor → roomno roomno → tutor ENROL’ (studno, courseno, labmark) studno, courseno → labmark COURSE (courseno, subject) courseno → subject

CS530 - Ian Horrocks and Robert Stevens27/09/ Third Normal Form  An relation is in 3NF if it is in 2NF and all non- identifying attributes are independent  Any relation in 2NF is transformed in 3NF  Determine functional dependencies between non identifying attributes  Decompose relation into new relations STUDENT (studno, name, tutor, roomno) studno → name, tutor tutor → roomno roomno → tutor STUDENT (studno, name, tutor) studno → name, tutor TUTOR (tutor, roomno) tutor → roomno roomno → tutor

CS530 - Ian Horrocks and Robert Stevens27/09/ Student Relational Schema in 3NF  STUDENT (studno, name, tutor) studno → name, tutor  TUTOR (tutor, roomno) tutor → roomno roomno → tutor  ENROL (studno, courseno, labmark) studno, courseno → labmark  COURSE (courseno, subject) courseno → subject

Decomposition: Lossless or Non-additive Join
• R is a relational scheme, F is a set of functional dependencies on R. R1 and R2 form a decomposition of R.
• The decomposition of R is non-additive if at least one of the following functional dependencies is in F+
  R1 ∩ R2 → R1
  R1 ∩ R2 → R2
• More generally, a decomposition of R into R1, ..., Rm is non-additive if for every state r of R that satisfies F
  π_R1(r) ⋈ ... ⋈ π_Rm(r) = r
  where ⋈ is the natural join

Decomposition: Lossless or Non-additive Join
ENROL (studno, courseno, subject, labmark)
  courseno → subject
  studno, courseno → labmark
is decomposed into
ENROL’ (studno, courseno, labmark)
  studno, courseno → labmark
COURSE (courseno, subject)
  courseno → subject
• ENROL’ ∩ COURSE = courseno
• courseno → subject
• (courseno, subject) = COURSE

CS530 - Ian Horrocks and Robert Stevens27/09/ Lossless or Non-additive Join  STUDENT1 (tutor = tutor)TUTORS = STUDENT studno → name studno → tutor tutor → roomno roomno → tutor studno → name studno → tutor tutor → roomno roomno → tutor

CS530 - Ian Horrocks and Robert Stevens27/09/ Spurious Tuples Lossless or Non-additive Join TEACH TEACH’ LECTURES

CS530 - Ian Horrocks and Robert Stevens27/09/ Decomposition Algorithm: Decomposition D, relation R  set D := { R } ;  while there is a relation schema Q in D that is not in BCNF do  begin –choose a relation schema Q in D that is not in BCNF; –find a functional dependency X→Y in Q that violates BCNF;  violation means that (X) + fails to find all of Q, so X can’t be a key. –replace Q in D by two schemas  R1 (Q - (Y) + ∪ X) –leave copy of X in relation to be the foreign key for R2 and  R2 (X ∪ (Y) + ) –new relation for functional dependency and its closure, X will be the primary key  end;
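A rough sketch of a BCNF decomposition loop follows. It is the common textbook variant rather than a literal transcription of the slide: a violation is found by brute force over attribute subsets (fine for small schemas only), and the split uses the closure of X restricted to Q. The function names and the encoding of dependencies are assumptions; the schema and dependencies are those of the beer example used later in these notes.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attributes determined by attrs under fds, a list of (lhs, rhs) sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def find_violation(schema, fds):
    """Return (X, X+ within schema) for some X violating BCNF, else None."""
    attrs = sorted(schema)
    for size in range(1, len(attrs)):
        for combo in combinations(attrs, size):
            x = set(combo)
            determined = closure(x, fds) & schema
            # Violation: X determines something new but not the whole schema.
            if determined != x and determined != schema:
                return x, determined
    return None


def bcnf_decompose(schema, fds):
    done, todo = [], [frozenset(schema)]
    while todo:
        q = todo.pop()
        violation = find_violation(set(q), fds)
        if violation is None:
            done.append(q)
            continue
        x, determined = violation
        todo.append(frozenset(determined))            # R2: X and what it determines
        todo.append(frozenset(q - (determined - x)))  # R1: the rest, keeping X
    return set(done)


R = {"beer", "brewery", "strength", "city", "region", "warehouse", "quantity"}
F = [
    ({"beer"}, {"brewery", "strength"}),
    ({"brewery"}, {"city"}),
    ({"city"}, {"region"}),
    ({"beer", "warehouse"}, {"quantity"}),
]
result = bcnf_decompose(R, F)
```

In general the relations produced depend on which violating dependency is picked first, and a BCNF decomposition is not guaranteed to preserve every dependency.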

Lossless or Non-additive Join
[Diagram labels: X, Y, Z; X Y; X Z; XY; foreign key]

Decomposition: Dependency Preservation
• When an update is made to a database, we should be able to check that the update satisfies all functional dependencies.
• It is desirable to design relational database schemes that allow updates to be validated without the computation of joins.
• This allows independent manipulation of relations.

Dependency Preservation
• The union of dependencies that hold on the individual relations in decomposition D must be equivalent to F.
• Given F on R, π_F(Ri), where Ri ⊆ R, is the set of dependencies X → Y in F+ such that the attributes in X ∪ Y are all contained in Ri
• Decomposition D = {R1, R2, ..., Rm} of R is dependency preserving w.r.t. F if
  (π_F(R1) ∪ ... ∪ π_F(Rm))+ = F+
• The restriction Fi of the functional dependencies to a relation Ri is the set of FDs that involve only attributes of that relation. It is possible that
  F1 ∪ ... ∪ Fn ≠ F
  but we require
  (F1 ∪ ... ∪ Fn)+ = F+

CS530 - Ian Horrocks and Robert Stevens27/09/ Dependency Preservation  STUDENT (studno, name, tutor, roomno, appraiser) studno → name, tutor tutor → roomno, appraiser roomno → tutor, appraiser  STUDENT1 (studno, name, tutor) studno → name, tutor  TUTOR (studno, roomno, appraiser) studno → roomno, appraiser This is in Boyce-Codd Normal Form and is a lossless (nonadditive) join decomposition but we have lost....  tutor → roomno, appraiser roomno → tutor, appraiser

Dependency Preservation
STUDENT’ ⋈ TUTOR = STUDENT
Dependencies on STUDENT:
  studno → name
  studno → tutor
  tutor → roomno
  tutor → appraiser
  roomno → tutor
  roomno → appraiser
  studno → appraiser
  studno → roomno
Dependencies on the decomposed relations:
  studno → name
  studno → tutor
  studno → appraiser
  studno → roomno

Designing a relational schema
• Build a relational database
  – without redundancy: normalisation
  – without loss of information or gain of data: lossless join decomposition
  – without losing dependency integrity: dependency preservation

Multi-valued Dependencies and Fourth Normal Form

CS530 - Ian Horrocks and Robert Stevens27/09/ Multi-valued Dependencies  a course has many lecturers  a course has many texts  lecturers and texts are independent  a lecturer teaches many courses  a text is used by many courses ­ lecturer and text are independent sets ­ for each courseno there is an associated set of lecturers ­ for each courseno there is an associated set of texts ­ the sets are independent.

CS530 - Ian Horrocks and Robert Stevens27/09/ Multi-valued Dependencies courseno →→ lecturer courseno →→ text This is in BCNF key is {courseno,lecturer,t ext} courseno, lecturer,text →courseno, lecturer,text  trivial dependencies

CS530 - Ian Horrocks and Robert Stevens27/09/ Multi-valued Dependencies Each TEXT is associated with all the LECTURERS that teach a COURSE The attribute TEXT contains redundant values. If TEXT were deleted from rows 1, 2 & 3 the values could be deduced from rows 4,5 & 6

CS530 - Ian Horrocks and Robert Stevens27/09/ Multivalued Dependencies courseno →→ lecturer courseno →→ text  if (c,l,t) and (c,l’,t’) appear then  (c,l,t’) and (c,l’,t) appear also  tuple (c,l,t) appears if c can be taught by l using text t  for each course all possible combinations of lecturer and text appear
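The swapping condition above can be checked mechanically on a set of tuples. The sketch below is a brute-force check over all pairs of rows; the function name `mvd_holds` and the values c, l1, l2, t1, t2 are placeholders made up for this example.

```python
def mvd_holds(rows, x, y):
    """Check X →→ Y on a relation instance given as a list of dicts."""
    if not rows:
        return True
    rest = set(rows[0]) - set(x) - set(y)  # attributes outside X and Y
    present = {frozenset(r.items()) for r in rows}
    for t1 in rows:
        for t2 in rows:
            if all(t1[a] == t2[a] for a in x):
                # Keep X and Y from t1, take the remaining attributes from t2;
                # the mixed tuple must also be in the relation.
                mixed = dict(t1)
                mixed.update({a: t2[a] for a in rest})
                if frozenset(mixed.items()) not in present:
                    return False
    return True


# All lecturer/text combinations for one course.
teach = [
    {"courseno": "c", "lecturer": lecturer, "text": text}
    for lecturer in ("l1", "l2")
    for text in ("t1", "t2")
]
full_ok = mvd_holds(teach, ["courseno"], ["lecturer"])

# Dropping one combination, here (c, l2, t2), breaks the dependency.
partial_ok = mvd_holds(teach[:-1], ["courseno"], ["lecturer"])
```

As with functional dependencies, an instance can only show that a multi-valued dependency is violated; it cannot establish that it holds for every legal state.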

CS530 - Ian Horrocks and Robert Stevens27/09/ Multi-Valued Dependencies  Whenever X →→ Y holds in R so does X →→(R - (XY)).  a MVD is trivial if Y ⊂ X or X ∪ Y = R. i.e. the two attributes form the whole relation  non-trivial MV dependencies need at least 3 attributes.

CS530 - Ian Horrocks and Robert Stevens27/09/ Fourth Normal Form  A relation R is in 4NF if it is in 3NF and there are no multi-valued dependencies between its attribute types  A relation R is in 4NF iff whenever there exists a non- trivial multi-valued dependency in F + for R X →→ Y  X is a superkey for R, i.e. all attributes are functionally dependent on X.  Any relation in 3NF is transformed in 4NF –Detect any multi-valued dependencies –Decompose relation

CS530 - Ian Horrocks and Robert Stevens27/09/ Fourth Normal Form courseno →→ lecturer courseno →→ text trivial dependenciesonly

CS530 - Ian Horrocks and Robert Stevens27/09/ Lossless join decomposition into 4NF  Algorithm: Decomposition D, relation R 1.set D := { R } ; 2. while there is a relation schema Q in D that is not in 4NF do begin choose a relation schema Q in D that is not in 4NF; find a non-trivial MVD X →→ Y in Q that violates 4NF; replace Q in D by two schemas (Q -Y) and (X ∪ Y) end;

CS530 - Ian Horrocks and Robert Stevens27/09/ Fourth Normal Form EER modelling  Leads to correctly normalised relational schema COURSE STAFF TEXT teaches recommendation m n m n name texttitle courseno

CS530 - Ian Horrocks and Robert Stevens27/09/ Fourth Normal Form EER modelling  Leads to relational schema that is not in 4NF COURSE STAFF TEXT Course-Staff-Text m p n name courseno texttitle

CS530 - Ian Horrocks and Robert Stevens27/09/ Conclusions  Data Normalisation is a technique that ensures the basic properties of the relational model –no duplicate tuples –no nested relations  Data normalisation is sometimes used as the only technique for database design— implementation view  A more appropriate approach is to complement conceptual modelling with data normalisation

Lossless or Non-additive Join
Algorithm: Decomposition D, relation R
1. set D := {R} ;
2. while there is a relation schema Q in D that is not in BCNF do
   begin
     choose a relation schema Q in D that is not in BCNF;
     find a functional dependency X → Y in Q that violates BCNF;
     replace Q in D by two schemas
       R1 (Q - Y): leave a copy of X in the relation to be the foreign key for R2, and
       R2 (X ∪ Y): new relation for the functional dependency and its closure; X will be the primary key
   end;

Example

Normalisation Example
BEER_DATABASE
[Example table not preserved in the transcript]
Additional Notes: Warehouses are shared by breweries. Each beer is unique to the brewer. Each brewery is based in a city.

Minimal Sets of Functional Dependencies
• A set of functional dependencies F is minimal if:
  1. Every dependency in F has a single determined attribute A
  2. We cannot remove any dependency from F and still have a set of dependencies equivalent to F
  3. We cannot replace any dependency X → A in F with a dependency Y → A, where Y ⊂ X, and still have a set of dependencies that is equivalent to F
I.e. a canonical form with no redundancies
(beer, brewery, strength, city, region, warehouse, quantity)
• beer → brewery
• beer → strength
• brewery → city
• city → region
• beer, warehouse → quantity
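The three conditions suggest a procedure for reducing a set of dependencies to a minimal one: split right-hand sides, drop left-hand-side attributes that are not needed, then drop dependencies implied by the rest. Below is a minimal sketch under that reading; the function names are assumptions, and the extra dependency beer → city is a hypothetical redundant input added only to show something being removed.

```python
def closure(attrs, fds):
    """Attributes determined by attrs under fds, a list of (lhs, rhs) sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def minimal_cover(fds):
    """Return a minimal cover as a set of (frozenset lhs, attribute) pairs."""
    def as_fds(items):
        return [(set(lhs), {a}) for lhs, a in items]

    # 1. One determined attribute per dependency.
    g = [(frozenset(lhs), a) for lhs, rhs in fds for a in sorted(rhs)]
    # 2. Drop left-hand-side attributes that are not needed.
    for i, (lhs, a) in enumerate(g):
        for b in sorted(lhs):
            smaller = g[i][0] - {b}
            if smaller and a in closure(smaller, as_fds(g)):
                g[i] = (smaller, a)
    g = list(dict.fromkeys(g))  # remove duplicates, keep order
    # 3. Drop dependencies implied by the remaining ones.
    for fd in list(g):
        rest = [other for other in g if other != fd]
        if fd[1] in closure(fd[0], as_fds(rest)):
            g = rest
    return set(g)


F = [
    ({"beer"}, {"brewery", "strength"}),
    ({"brewery"}, {"city"}),
    ({"city"}, {"region"}),
    ({"beer", "warehouse"}, {"quantity"}),
    ({"beer"}, {"city"}),  # hypothetical redundant dependency
]
cover = minimal_cover(F)
```

A set of dependencies can have more than one minimal cover; which one this sketch returns depends on the order in which attributes and dependencies are examined.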

Relational Synthesis Algorithm into 3NF:
(beer, brewery, strength, city, region, {warehouse, quantity})
set D := { R } ; P. 426, P
1. Find a minimal cover G for F
2. For each determinant X of a functional dependency that appears in G, create a relation schema {X ∪ A1 ∪ A2 … ∪ Am} in D, where X → A1, X → A2, …, X → Am are the only dependencies in G with X as the determinant;
3. Place any remaining (unplaced) attributes in a single relation to ensure the attribute preservation property, so we don’t lose anything.
4. If none of the relations contains a key of R, create one more relation that contains attributes that form a key for R.
• beer → brewery and beer → strength give (beer, brewery, strength)
• brewery → city gives (brewery, city)
• city → region gives (city, region)
• beer, warehouse → quantity gives (beer, warehouse, quantity)
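The synthesis steps can be sketched as follows, assuming the minimal cover has already been found and is given as (determinant, single attribute) pairs. The function names and this encoding are assumptions for the example; the input is the beer cover from the previous slide.

```python
def closure(attrs, fds):
    """Attributes determined by attrs under fds, a list of (lhs, rhs) sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def synthesise_3nf(attributes, cover):
    """Build relation schemas from a minimal cover of (lhs, attribute) pairs."""
    attributes = set(attributes)
    # Step 2: one relation per determinant, holding all it determines.
    groups = {}
    for lhs, a in cover:
        groups.setdefault(frozenset(lhs), set(lhs)).add(a)
    relations = [frozenset(r) for r in groups.values()]
    # Step 3: attributes not placed anywhere go into one extra relation.
    placed = set().union(*relations) if relations else set()
    if attributes - placed:
        relations.append(frozenset(attributes - placed))
    # Step 4: make sure some relation contains a key of the whole schema.
    fds = [(set(lhs), {a}) for lhs, a in cover]
    if not any(closure(r, fds) >= attributes for r in relations):
        key = set(attributes)
        for a in sorted(attributes):
            if closure(key - {a}, fds) >= attributes:
                key -= {a}
        relations.append(frozenset(key))
    return set(relations)


R = {"beer", "brewery", "strength", "city", "region", "warehouse", "quantity"}
G = [
    ({"beer"}, "brewery"),
    ({"beer"}, "strength"),
    ({"brewery"}, "city"),
    ({"city"}, "region"),
    ({"beer", "warehouse"}, "quantity"),
]
schemas = synthesise_3nf(R, G)
```

For this input no extra key relation is needed, because (beer, warehouse, quantity) already contains the key {beer, warehouse}.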

CS530 - Ian Horrocks and Robert Stevens27/09/ Step-wise normalisation: (beer, brewery, strength, city, region, {warehouse, quantity})  beer→ brewery, strength partial dependency  brewery → city transitive dependency  city → region transitive dependency  beer, warehouse, → quantity repeating group 1NF remove repeating group (beer, brewery, strength, city, region, {warehouse, quantity}) (beer, warehouse, quantity) beer, warehouse, → quantity (beer, brewery, strength, city, region) beer→ brewery, strength transitive dependency brewery → city transitive dependency city → region

(beer, brewery, strength, city, region)
• beer → brewery, strength
• brewery → city (transitive dependency)
• city → region (transitive dependency)
• 2NF: no partial dependencies
• 3NF/BCNF: no transitive dependencies
Take the most indirect transitive dependencies first.
(beer, brewery, strength, city, region) splits into:
– (city, region)
    city → region
– (beer, brewery, strength, city)
    beer → brewery, strength
    brewery → city
(beer, brewery, strength, city) splits into:
– (brewery, city)
    brewery → city
– (beer, brewery, strength)
    beer → brewery, strength

Using the BCNF decomposition algorithm
(beer, brewery, strength, city, region, warehouse, quantity)
• beer → brewery, strength (partial dependency)
• brewery → city (transitive dependency)
• city → region (transitive dependency)
• beer, warehouse → quantity
Go directly to BCNF: take a violating dependency and form a relation from it.
First choose a direct transitive dependency and its closure.
(beer, brewery, strength, city, region, warehouse, quantity), using brewery → city, splits into:
– (brewery, city, region)
    brewery → city
    city → region (transitive dependency)
– (beer, brewery, strength, warehouse, quantity)
    beer → brewery, strength (partial dependency)
    beer, warehouse → quantity

Using the BCNF decomposition algorithm
(beer, brewery, strength, city, region, warehouse, quantity)
• beer → brewery, strength (partial dependency)
• brewery → city (transitive dependency)
• city → region (transitive dependency)
• beer, warehouse → quantity
Take a violating dependency and form a relation from it.
First the partial dependency and its closure.
(beer, brewery, strength, city, region, warehouse, quantity), using beer → brewery, strength, splits into:
– (beer, brewery, strength, city, region)
    beer → brewery, strength
    brewery → city (transitive dependency)
    city → region (transitive dependency)
    normalise as before...
– (beer, warehouse, quantity)
    beer, warehouse → quantity
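
The "take a violating dependency and its closure" step can be sketched as a small recursive procedure. This is one common textbook formulation under stated assumptions, not necessarily the exact procedure used in the lecture: violations are found by brute force over attribute subsets (fine for slide-sized examples only), and the names `find_violation` and `bcnf_decompose` are illustrative.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attributes determined by `attrs` under the dependencies `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def find_violation(relation, fds):
    """Return (X, closure of X within relation) for a BCNF violation, or None."""
    for size in range(1, len(relation)):
        for combo in combinations(sorted(relation), size):
            x = set(combo)
            determined = closure(x, fds) & relation
            # X violates BCNF if it determines something new but is not a superkey.
            if determined != x and determined != relation:
                return x, determined
    return None


def bcnf_decompose(relation, fds):
    """Split on a violating determinant until every relation is in BCNF."""
    violation = find_violation(relation, fds)
    if violation is None:
        return [relation]
    x, determined = violation
    rest = x | (relation - determined)
    return bcnf_decompose(determined, fds) + bcnf_decompose(rest, fds)


FDS = [
    (("beer",), ("brewery", "strength")),
    (("brewery",), ("city",)),
    (("city",), ("region",)),
    (("beer", "warehouse"), ("quantity",)),
]
R = {"beer", "brewery", "strength", "city", "region", "warehouse", "quantity"}

result = bcnf_decompose(R, FDS)
for relation in result:
    print(sorted(relation))
```

With this search order the first split is on beer and its closure, as on the slide above; the remaining splits then follow the transitive dependencies.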

Keys and Indexes, Data Definition, Relational Manipulation and Data Control Using SQL

Keys
• Superkey
– a set of attributes whose values together uniquely identify a tuple in a relation
• Candidate Key
– a superkey for which no proper subset is a superkey, i.e. a key that is minimal
– there can be more than one for a relation
• Primary Key
– a candidate key chosen to be the main key for the relation
– one for each relation
• Keys can be composite

e.g.: Staff(lecturer, roomno, appraiser)
SK = {lecturer, roomno, appraiser}, {lecturer, roomno}, {lecturer, appraiser}, {roomno, appraiser}, {lecturer} and {roomno}
CK = {lecturer} and {roomno}
PK = {lecturer}
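
The Staff example can be reproduced by enumerating attribute subsets. The slide does not state the functional dependencies, so the sketch below assumes lecturer → roomno, lecturer → appraiser and roomno → lecturer (each lecturer has one room and each room one lecturer); the function names are illustrative.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attributes determined by `attrs` under the dependencies `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def superkeys(attributes, fds):
    """Every attribute subset whose closure covers the whole relation."""
    attrs = sorted(attributes)
    return [
        set(combo)
        for size in range(1, len(attrs) + 1)
        for combo in combinations(attrs, size)
        if closure(combo, fds) >= set(attrs)
    ]


def candidate_keys(attributes, fds):
    """Superkeys with no proper subset that is also a superkey."""
    sks = superkeys(attributes, fds)
    return [k for k in sks if not any(other < k for other in sks)]


STAFF = ["lecturer", "roomno", "appraiser"]
# Assumed dependencies (not given on the slide).
STAFF_FDS = [
    (("lecturer",), ("roomno",)),
    (("lecturer",), ("appraiser",)),
    (("roomno",), ("lecturer",)),
]

print(superkeys(STAFF, STAFF_FDS))
print(candidate_keys(STAFF, STAFF_FDS))
```

Under these assumptions the enumeration yields the six superkeys and the two candidate keys listed above; choosing {lecturer} as primary key is a design decision, not something the dependencies dictate.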

Foreign Key
• a (set of) attribute(s) in a relation that exactly matches a (primary) key in another relation
– the names of the attributes don’t have to be the same, but they must be of the same domain
– a foreign key in a relation A matching a primary key in a relation B represents a many:one relationship between A and B
Student(studno, name, tutor, year)
Staff(lecturer, roomno, appraiser)

Data Definition and Manipulation

Languages of DBMS
• Data Definition Language (DDL)
– define the logical schema (relations, views etc.) and the storage schema, stored in a Data Dictionary
• Data Manipulation Language (DML)
– Manipulative: populate schema, update database
– Retrieval: querying the content of a database
• Data Control Language (DCL)
– permissions, access control etc.

Data Definition: Creating tables
    create table accountants as
      (select studno, name, tutor, year
       from student
       where hons = 'ca');
• Can specify column names, default values and integrity constraints (except referential)
• Datatypes and lengths are derived from the query
• Not null constraints are passed on from the query tables

Defining a Relation
    create table student (
      studentno number(8) primary key,
      givenname char(20),
      surname char(20),
      hons char(3) check (hons in ('cis', 'cs', 'ca', 'pc', 'cm', 'mcs')),
      tutorid number(4),
      yearno number(1) not null,
      constraint year_fk foreign key (yearno) references year (yearno),
      constraint super_fk foreign key (tutorid) references staff (staffid));

Data Definition: Create Table
    create table enrol (
      studno number(8),
      courseno char(5),
      primary key (studno, courseno),
      cluster (studno),
      labmark number(3) check (labmark between 0 and 100),
      exammark number(3) check (exammark between 0 and 100),
      constraint stud_fk foreign key (studno) references student,
      constraint course_fk foreign key (courseno) references course);

Data Definition: Altering Relations
    alter table student add (address char(20) default null);
    alter table student modify (name not null);
• the second statement won’t work if there are any nulls in the name column

Data Manipulation: Insert Operator
insert (cs310, elec, sun) into course;
    insert into course (courseno, subject, equip) values ('cs310', 'elec', 'sun');
    insert into course values ('cs310', 'elec', NULL);
    insert into table [(column, ...)] { values (value, ...) | select-statement }

Inserting Tuples into a Relation
    insert into weak_students (studno, name, courseno, exammark)
      select s.studno, name, courseno, exammark
      from enrol, student s
      where exammark <= 40 and enrol.studno = s.studno;

Insertion Anomalies
• An insert operation might violate the uniqueness and minimality properties of the primary key, or the referential integrity constraint
• insert (cs250, databases, sun) into course
Insertion anomalies can be corrected by:
– rejecting the insertion
– correcting the reason for rejecting the update
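
Both kinds of insertion anomaly can be observed with a small in-memory database. This sketch uses Python's sqlite3 module rather than the Oracle dialect of the slides, with simplified COURSE and TEACH tables and hypothetical rows; SQLite only enforces foreign keys once `PRAGMA foreign_keys = ON` has been issued.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves foreign keys unenforced by default
conn.execute("create table course (courseno text primary key, subject text, equip text)")
conn.execute("create table teach (courseno text references course (courseno), lecturer text)")
conn.execute("insert into course values ('cs250', 'databases', 'sun')")


def try_insert(sql):
    """Run one insert and report whether the DBMS accepted or rejected it."""
    try:
        conn.execute(sql)
        return "accepted"
    except sqlite3.IntegrityError as exc:
        return f"rejected: {exc}"


results = {
    "new course": try_insert("insert into course values ('cs310', 'elec', 'sun')"),
    # Same primary key as an existing tuple: violates uniqueness.
    "duplicate key": try_insert("insert into course values ('cs250', 'databases', 'sun')"),
    # References a course that does not exist: violates referential integrity.
    "dangling foreign key": try_insert("insert into teach values ('cs999', 'woods')"),
    "valid foreign key": try_insert("insert into teach values ('cs250', 'woods')"),
}

for label, outcome in results.items():
    print(f"{label}: {outcome}")
```

Here the DBMS takes the first corrective option, rejecting the insertion; correcting the cause (for example inserting the missing course first) is left to the user.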

Data Manipulation: Update Operator
• Modifies a tuple or tuples of a relation
• Does not violate constraints as long as the modified attributes are not primary keys or foreign keys
• Update of a primary key corresponds to a deletion followed by an insertion
• Update of a foreign key attribute is legal only if the new value corresponds to an existing tuple in the referenced relation or is null
    update enrol set labmark = labmark * 1.1 where courseno = 'cs250';
    update table set column = expression [where search-condition]

Data Manipulation: Delete Operator
• Deletes a tuple or a set of tuples from a relation
• Might violate the referential integrity constraint
• Anomalies can be overcome by
– rejecting the deletion
– cascading the deletion (delete tuples that reference the deleted tuple)
– modifying the referencing attribute values
    delete from table [where search-condition]
    delete from course where equip = 'pc';
    delete from student where year = '3' and (hons != 'mi' and hons <> 'si');
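
The three ways of handling a deletion that would break referential integrity correspond to the `on delete` actions of a foreign key. The sketch below uses Python's sqlite3 module, not the Oracle dialect of the slides; the tables are simplified and the rows hypothetical.

```python
import sqlite3


def delete_course(on_delete):
    """Delete a referenced course under the given foreign-key action."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute("create table course (courseno text primary key, subject text, equip text)")
    conn.execute(
        "create table enrol (studno integer, "
        f"courseno text references course (courseno) on delete {on_delete})"
    )
    conn.execute("insert into course values ('cs250', 'databases', 'pc')")
    conn.execute("insert into enrol values (1, 'cs250')")
    try:
        conn.execute("delete from course where equip = 'pc'")
    except sqlite3.IntegrityError:
        return "rejected"
    # Show what happened to the referencing tuples.
    return conn.execute("select studno, courseno from enrol").fetchall()


print(delete_course("restrict"))  # rejecting the deletion
print(delete_course("cascade"))   # cascading the deletion
print(delete_course("set null"))  # modifying the referencing attribute values
```

Which action is appropriate depends on the meaning of the relationship: cascading silently removes enrolment records, so rejecting is the safer default.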

Delete Operator
    delete from student
    where studno in
      (select s.studno
       from enrol e, teach t, student s
       where t.lecturer = 'woods'
         and t.courseno = e.courseno
         and e.studno = s.studno);

Data Control: Data Sharing and Security
• Permissions, access control etc.
    create view myyear as
      select * from student
      where year in (select year from student where name = user)
      with check option

Data Control: Data Sharing and Security
    grant privilege, privilege2… | all on table | view to userID | roleID
    grant select on student to bloggsf;
• Grant can be attached to any combination of select, insert, update, delete, alter
• Restricting access to parts of a table can be effected by using the view and grant commands
• Privileges can be withdrawn with the revoke command

Synonyms for Objects
    select name from CAROLE.student;
    create [public] synonym synonym_name for table | view;
    create synonym student for CAROLE.student;
    drop synonym mystudent;

The Role of the Data Dictionary
• A set of tables and views to be used by the RDBMS as a reference guide to the data stored in the database files
• Every user retrieves data from views stored in the Data Dictionary
• The Data Dictionary stores:
– user names of those permitted to access the database
– names of tables, space definitions, views, indexes, clusters, synonyms etc.
– rights and privileges that have been granted

Examples Class

Relational Query Languages

Query Operators
• Relational Algebra
– tuple (unary): Selection, Projection
– set (binary): Union, Intersection, Difference
– tuple (binary): Join, Division
• Additional Operators
– Outer Join, Outer Union

A Retrieval DML Must Express
• Attributes required in a result
– target list
• Criteria for selecting tuples for that result
– qualifier
• The relations that take part in the query
– set generators
• Independent of the instances in the database
• Expressions are in terms of the database schema

Relational Algebra

SQL Retrieval Statement
    SELECT [all|distinct] {*|{table.*|expr [alias]|view.*} [,{table.*|expr [alias]}]...}
    FROM table [alias] [,table [alias]]...
    [WHERE condition]
    [CONNECT BY condition [START WITH condition]]
    [GROUP BY expr [,expr]...] [HAVING condition]
    [{UNION|UNION ALL|INTERSECT|MINUS} SELECT...]
    [ORDER BY {expr|position} [ASC|DESC] [,{expr|position} [ASC|DESC]]...]
    [FOR UPDATE OF column [,column]... [NOWAIT]]

π Project Operator
• selects a subset of the attributes of a relation
Result = π (attribute list) (relation name)
– the attributes in the list are drawn from the specified relation; if the key attribute is in the list then card(result) = card(relation)
– the resulting relation has only the attributes in the list, in the same order as they appear in the list
– degree(result) = number of attributes in the attribute list
– no duplicates in the result

π Project Operator
π tutor (STUDENT)

π Project Operator: SELECT
    select * from student;
    select tutor from student;

σ Select Operator
• selects a subset of the tuples in a relation that satisfy a selection condition
Result = σ (selection condition) (relation name)
– the selection condition is a boolean expression specified on the attributes of the specified relation
– the result is a relation that has the same attributes as the source relation
– each clause uses one of the usual comparison operators '<', '<=', '>=', etc.
– clauses can be arbitrarily connected with the boolean operators AND, NOT, OR
– degree(result) = degree(relation); card(result) <= card(relation)

σ Select Operator
σ name='bloggs' (STUDENT)

Retrieve the tutor who tutors Bloggs
π tutor (σ name='bloggs' (STUDENT))
    select tutor from student where name = 'bloggs';
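
Selection and projection compose exactly as the algebra expression suggests. The following minimal sketch models a relation as a list of dictionaries; the function names `select` and `project` and the STUDENT rows are hypothetical.

```python
def select(relation, predicate):
    """σ: keep the tuples that satisfy the selection condition."""
    return [row for row in relation if predicate(row)]


def project(relation, attributes):
    """π: keep only the listed attributes, in list order, without duplicates."""
    result = []
    for row in relation:
        projected = {a: row[a] for a in attributes}
        if projected not in result:
            result.append(projected)
    return result


# Hypothetical instance of STUDENT (only three attributes shown).
STUDENT = [
    {"studno": 1, "name": "bloggs", "tutor": "goble"},
    {"studno": 2, "name": "jones", "tutor": "goble"},
    {"studno": 3, "name": "smith", "tutor": "kahn"},
]

# π tutor (σ name='bloggs' (STUDENT))
print(project(select(STUDENT, lambda s: s["name"] == "bloggs"), ["tutor"]))
# π tutor (STUDENT): duplicates are removed, so the cardinality drops.
print(project(STUDENT, ["tutor"]))
```

Note that the SQL form `select tutor from student` keeps duplicate rows unless `distinct` is specified, whereas the algebraic projection never does.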

SQL retrieval expressions
    select studentno, name from student
      where hons != 'ca' and (tutor = 'goble' or tutor = 'kahn');
    select * from enrol where labmark > 50;
    select * from enrol where labmark between 30 and 50;

    select * from enrol where labmark in (0, 100);
    select * from enrol where labmark is null;
    select * from student where name like 'b%';
    select studno, courseno, exammark + labmark total
      from enrol where labmark is not NULL;
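
The difference between `in`, `between` and the null tests is easiest to see on a few rows. This sketch runs conditions like those above against an in-memory SQLite database through Python's sqlite3 module; the ENROL rows are hypothetical and the helper `studnos` is an illustrative assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table enrol (studno integer, courseno text, labmark integer, exammark integer)")
conn.executemany(
    "insert into enrol values (?, ?, ?, ?)",
    [
        (1, "cs250", 0, 55),
        (2, "cs250", 40, 60),
        (3, "cs250", 100, 70),
        (4, "cs250", None, 65),  # lab mark not yet recorded
    ],
)


def studnos(condition):
    """Student numbers of the enrol tuples that satisfy `condition`."""
    sql = f"select studno from enrol where {condition} order by studno"
    return [row[0] for row in conn.execute(sql)]


in_list = studnos("labmark in (0, 100)")         # a list of values, not a range
in_range = studnos("labmark between 30 and 50")  # an inclusive range
missing = studnos("labmark is null")
equals_null = studnos("labmark = null")          # a comparison with null is never true
totals = conn.execute(
    "select studno, exammark + labmark total from enrol "
    "where labmark is not null order by studno"
).fetchall()

print(in_list, in_range, missing, equals_null, totals)
```

The `is not null` guard in the last query matters because arithmetic on a null mark would itself yield null.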

Cartesian Product Operator
Definition:
• The cartesian product of two relations R1(A1, A2, ..., An) with cardinality i and R2(B1, B2, ..., Bm) with cardinality j is a relation R3 with degree k = n + m, cardinality i*j and attributes (A1, A2, ..., An, B1, B2, ..., Bm)
• The result, denoted by R1 × R2, is a relation that includes all the possible combinations of tuples from R1 and R2
• Used in conjunction with other operations
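
The degree and cardinality claims in the definition can be checked on a toy instance. This sketch models relations as lists of dictionaries with disjoint attribute names; the rows and the function name `cartesian_product` are hypothetical.

```python
from itertools import product


def cartesian_product(r1, r2):
    """R1 × R2: every tuple of r1 concatenated with every tuple of r2."""
    return [{**a, **b} for a, b in product(r1, r2)]


# Hypothetical instances: degree 2 with cardinality 2, and degree 2 with cardinality 3.
STUDENT = [
    {"studno": 1, "name": "bloggs"},
    {"studno": 2, "name": "jones"},
]
COURSE = [
    {"courseno": "cs250", "subject": "databases"},
    {"courseno": "cs310", "subject": "elec"},
    {"courseno": "cs150", "subject": "prog"},
]

r3 = cartesian_product(STUDENT, COURSE)
print(len(r3))     # cardinality i * j
print(len(r3[0]))  # degree n + m
```

Most of the combined tuples are meaningless on their own, which is why the product is normally followed by a selection (giving a join).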

Cartesian Product Example

× Cartesian Product

θ Join Operator
Definition: The join of two relations R1(A1, A2, ..., An) and R2(B1, B2, ..., Bm) is a relation R3 with degree k = n + m and attributes (A1, A2, ..., An, B1, B2, ..., Bm), containing the combined tuples that satisfy the join condition
Result = R1 ⋈ (θ join condition) R2
– θ stands for the usual comparison operators '<', '<=', '>=', etc.
– comparing terms in the θ clauses can be arbitrarily connected with the boolean operators AND, NOT, OR
• The result is a concatenated set, but only for those tuples where the condition is true
• It does not require union compatibility of R1 and R2
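
A θ-join is a Cartesian product filtered by the join condition. In the sketch below the condition is an arbitrary Python predicate over a pair of tuples; the function name `theta_join` and the STUDENT and STAFF rows are hypothetical.

```python
def theta_join(r1, r2, condition):
    """R1 ⋈ (condition) R2: concatenate the tuple pairs for which the condition holds."""
    return [{**a, **b} for a in r1 for b in r2 if condition(a, b)]


STUDENT = [
    {"studno": 1, "name": "bloggs", "tutor": "goble"},
    {"studno": 2, "name": "jones", "tutor": "kahn"},
]
STAFF = [
    {"lecturer": "goble", "roomno": "r1"},
    {"lecturer": "kahn", "roomno": "r2"},
    {"lecturer": "woods", "roomno": "r3"},
]

# Equi-join: θ is '='.
with_tutor = theta_join(STUDENT, STAFF, lambda s, t: s["tutor"] == t["lecturer"])
# A join with a different θ: staff who are not the student's tutor.
not_tutor = theta_join(STUDENT, STAFF, lambda s, t: s["tutor"] != t["lecturer"])

print(with_tutor)
print(len(not_tutor))
```

In the equi-join result the tutor and lecturer columns always hold the same value; removing one of them gives the natural join.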

θ Join Operator

More joins

Natural Join Operator
• Of all the types of θ-join, the equi-join is the only one that yields a result in which the compared columns are redundant to each other (possibly different names but the same values)
• The natural join is an equi-join in which one of the redundant columns (simple or composite) is omitted from the result
• Relational join is the principal algebraic counterpart of queries that involve the existential quantifier ∃
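
A natural join can be sketched as an equi-join on the attributes the two relations share, keeping a single copy of each shared column. The version below assumes the join attributes have the same name in both relations; the function name and the rows are hypothetical.

```python
def natural_join(r1, r2):
    """Equi-join on all common attribute names, keeping one copy of each."""
    result = []
    for a in r1:
        for b in r2:
            common = a.keys() & b.keys()
            if all(a[c] == b[c] for c in common):
                result.append({**a, **b})  # the shared columns collapse into one
    return result


ENROL = [
    {"studno": 1, "courseno": "cs250"},
    {"studno": 2, "courseno": "cs310"},
]
COURSE = [
    {"courseno": "cs250", "subject": "databases"},
    {"courseno": "cs310", "subject": "elec"},
    {"courseno": "cs150", "subject": "prog"},  # nobody enrolled: drops out of the join
]

joined = natural_join(ENROL, COURSE)
print(joined)
```

The result has degree 3 rather than 4 because courseno appears only once; courses with no matching enrolment do not appear at all.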

Self Join: Joins on the same relation
π (lecturer, roomno, appraiser, approom) (staff ⋈ (appraiser = lecturer) staff)
    select e.lecturer, e.roomno, m.lecturer appraiser, m.roomno approom
    from staff e, staff m
    where e.appraiser = m.lecturer
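
The self-join query can be run as written against a small STAFF instance. The sketch uses Python's sqlite3 module with hypothetical rows, and adds an `order by` clause (not on the slide) only to make the output order predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table staff (lecturer text primary key, roomno text, appraiser text)")
conn.executemany(
    "insert into staff values (?, ?, ?)",
    [
        ("kahn", "IT206", "goble"),
        ("goble", "IT212", "woods"),
        ("woods", "IT204", None),  # has no appraiser, so finds no partner tuple
    ],
)

rows = conn.execute(
    """
    select e.lecturer, e.roomno, m.lecturer appraiser, m.roomno approom
    from staff e, staff m
    where e.appraiser = m.lecturer
    order by e.lecturer
    """
).fetchall()

for row in rows:
    print(row)
```

The aliases e and m let the same table play two roles, the member of staff being appraised and the appraiser.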

Exercise
Get student’s name, all their courses, subject of course, labmark for course, lecturer of course and lecturer’s roomno for 'ca' students
University Schema
• STUDENT(studno, name, hons, tutor, year)
• ENROL(studno, courseno, labmark, exammark)
• COURSE(courseno, subject, equip)
• STAFF(lecturer, roomno, appraiser)
• TEACH(courseno, lecturer)
• YEAR(yearno, yeartutor)

Set Theoretic Operators
• Union, Intersection and Difference
• Operands need to be union compatible for the result to be a valid relation
Definition: Two relations R1(A1, A2, ..., An) and R2(B1, B2, ..., Bm) are union compatible iff:
n = m and dom(Ai) = dom(Bi) for 1 ≤ i ≤ n

∪ Union Operator
Definition: The union of two relations R1(A1, A2, ..., An) and R2(B1, B2, ..., Bm) is a relation R3(C1, C2, ..., Cn) such that dom(Ci) = dom(Ai) = dom(Bi) for 1 ≤ i ≤ n
• The result R1 ∪ R2 is a relation that includes all tuples that are in R1 or in R2 or in both, without duplicate tuples
• The resulting relation might have the same attribute names as the first or the second relation

Retrieve all staff that lecture or tutor
Lecturers = π (lecturer) (TEACH)
Tutors = π (tutor) (STUDENT)
Lecturers ∪ Tutors

∩ Intersection Operator
Definition: The intersection of two relations R1(A1, A2, ..., An) and R2(B1, B2, ..., Bm) is a relation R3(C1, C2, ..., Cn) such that dom(Ci) = dom(Ai) ∩ dom(Bi) for 1 ≤ i ≤ n
• The result R1 ∩ R2 is a relation that includes only those tuples in R1 that also appear in R2
• The resulting relation might have the same attribute names as the first or the second relation

Retrieve all staff that lecture and tutor
Lecturers = π (lecturer) (TEACH)
Tutors = π (tutor) (STUDENT)
Lecturers ∩ Tutors

− Difference Operator
Definition: The difference of two relations R1(A1, A2, ..., An) and R2(B1, B2, ..., Bm) is a relation R3(C1, C2, ..., Cn) such that dom(Ci) = dom(Ai) − dom(Bi) for 1 ≤ i ≤ n
• The result R1 − R2 is a relation that includes all tuples that are in R1 and not in R2
• The resulting relation might have the same attribute names as the first or the second relation

Retrieve all staff that lecture but don’t tutor
Lecturers = π (lecturer) (TEACH)
Tutors = π (tutor) (STUDENT)
Lecturers − Tutors
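
The three Lecturers/Tutors queries map directly onto set operations once both projections yield union-compatible relations (here, single-column relations of staff names). The TEACH and STUDENT rows below are hypothetical.

```python
# Hypothetical instances, reduced to the attributes the queries need.
TEACH = [
    {"courseno": "cs250", "lecturer": "woods"},
    {"courseno": "cs310", "lecturer": "goble"},
    {"courseno": "cs150", "lecturer": "kahn"},
]
STUDENT = [
    {"studno": 1, "tutor": "goble"},
    {"studno": 2, "tutor": "kahn"},
    {"studno": 3, "tutor": "capon"},
]

# π lecturer (TEACH) and π tutor (STUDENT): sets, so duplicates disappear.
lecturers = {row["lecturer"] for row in TEACH}
tutors = {row["tutor"] for row in STUDENT}

lecture_or_tutor = lecturers | tutors         # Lecturers ∪ Tutors
lecture_and_tutor = lecturers & tutors        # Lecturers ∩ Tutors
lecture_but_dont_tutor = lecturers - tutors   # Lecturers − Tutors

print(sorted(lecture_or_tutor))
print(sorted(lecture_and_tutor))
print(sorted(lecture_but_dont_tutor))
```

In Oracle-style SQL the same three queries would use the `UNION`, `INTERSECT` and `MINUS` keywords between two `select` statements.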

Relational Algebra Recap

Relational Algebra
• Relational Algebra
– tuple (unary)
    Selection: Result = σ (selection condition) (relation name)
    Projection: Result = π (attribute list) (relation name)
    e.g. π tutor (σ name='bloggs' (STUDENT))
– set (binary)
    Union: Lecturers ∪ Tutors
    Intersection: Lecturers ∩ Tutors
    Difference: Lecturers − Tutors
– tuple (binary)
    Join: Result = R1 ⋈ (θ join condition) R2
    Division: (A / B), not supported as a primitive operator

Relational Algebra
– Additional Operators
• Outer Join & Outer Union (+)
    pads with nulls
    includes all tuples, not just the matching ones as with Natural / Equi-join
– Aggregation Functions
• grouping attributes ƒ function list (relation name)
• How many courses is a student enrolled for?
• studno ƒ COUNT courseno (ENROL)
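
Both additional operators are short to express over relations held as lists of dictionaries. The sketch below counts courses per student (studno ƒ COUNT courseno (ENROL)) and pads unmatched students with nulls in a left outer join; the rows and the function name `left_outer_join` are hypothetical.

```python
from collections import Counter


def left_outer_join(r1, r2, attr, r2_attrs):
    """Join r1 and r2 on `attr`, keeping unmatched r1 tuples padded with None."""
    result = []
    for a in r1:
        matches = [b for b in r2 if b[attr] == a[attr]]
        if matches:
            result.extend({**a, **b} for b in matches)
        else:
            result.append({**a, **{name: None for name in r2_attrs}})
    return result


STUDENT = [
    {"studno": 1, "name": "bloggs"},
    {"studno": 2, "name": "jones"},
    {"studno": 3, "name": "smith"},  # not enrolled on anything
]
ENROL = [
    {"studno": 1, "courseno": "cs250"},
    {"studno": 1, "courseno": "cs310"},
    {"studno": 2, "courseno": "cs250"},
]

# studno ƒ COUNT courseno (ENROL)
courses_per_student = Counter(row["studno"] for row in ENROL)
print(dict(courses_per_student))

# STUDENT left outer join ENROL: student 3 survives with a null courseno.
padded = left_outer_join(STUDENT, ENROL, "studno", ["courseno"])
for row in padded:
    print(row)
```

Counting over ENROL alone omits students with no enrolments; counting over the outer join instead would report them with a count of zero non-null course numbers.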

For Next Lecture
I expect you to have skim-read the notes for the next lecture before it’s delivered. The sequence of skim read, lecture delivery and SAQ will make revision a whole lot easier.