University of Milano Bicocca, Italy Carlo Batini Course in Data Base Design Conceptual Design: qualities and methods
Introduction to strategies and qualities Motivating examples
Requirements “not toy example” Students of a University attend courses, and pass them. Courses have a last name, a first name and a year of enrollment. Students have an ID, a Last Name and First Name. For Chinese Students we want to represent City of Birth and Region of Birth. For Foreign Students we want to represent the Country of Birth and the Continent of Birth. A Course passed by a Student has a grade and a date. Courses are taught by professors. Professors have a last name and a first name. They may be Associate Professors or Full professors, only for Full Professors we are interested to the City of Birth, with name and region. Every professor belongs to a Department. Departments have a Name and an address.
Example of complex step that cannot be performed“one shot” Students of a University attend courses, and pass them. Courses have a last name, a first name and a year of enrollment. Students have an ID, a Last Name and First Name. For Chinese Students we want to represent City of Birth and Region of Birth. For Foreign Students we want to represent the Country of Birth and the Continent of Birth. A Course passed by a Student has a grade and a date. Courses are taught by professors. Professors have a last name and a first name. They may be Associate Professors or Full professors, only for Full Professors we are interested to the City of Birth, with name and region. Every professor belongs to a Department. Departments have a Name and an address. Database Design 1. Student (Student Id, Last Name, First Name, Type) 2. Course (Course Id, Year of Enrollment) 3. Enrolled in (Student Id, Course Id) 4. Passed (Student Id, Course Id, Grade, Date) 5. Professor (Last Name, First Name, Type, Department Name) 6. Department (Name, Address) 7. Full Professor (Last Name, First Name, City of Birth) 8. City (Name, Region) 9. Foreign Student (Student Id, Country) 10. Chinese Student (Student Id, City of Birth) 11. Country (Name, Continent)
First strategy: two phases instead of one phase (divide et impera)
Phase 1 Phase 2 Phase 1 - Conceptual Design Phase 2 - Logical Design Students of a University attend courses, and pass them. Courses have a last name, a first name and a year of enrollment. Students have an ID, a Last Name and First Name. For Chinese Students we want to represent City of Birth and Region of Birth. For Foreign Students we want to represent the Country of Birth and the Continent of Birth. A Course passed by a Student has a grade and a date. Courses are taught by professors. Professors have a last name and a first name. They may be Associate Professors or Full professors, only for Full Professors we are interested to the City of Birth, with name and region. Every professor belongs to a Department. Departments have a Name and an address. First Name Last Name Born in one many ID Teach Name Region Associate Professor Full Professor Belongs to Address Year of Enrollment Passed Enrolled in Grade Date Chinese St. Foreign St. Born Continent Department Country Student City Course First Name Last Name Born in one many ID Teach Name Region Associate Professor Full Professor Belongs to Address Year of Enrollment Passed Enrolled in Grade Date Chinese St. Foreign St. Born Continent Department Country Student City Course Phase 1 - Conceptual Design Phase 1 1. Student (Student Id, Last Name, First Name, Type) 2. Course (Course Id, Year of Enrollment) 3. Enrolled in (Student Id, Course Id) 4. Passed (Student Id, Course Id, Grade, Date) 5. Professor (Last Name, First Name, Type, Department Name) 6. Department (Name, Address) 7. Full Professor (Last Name, First Name, City of Birth) 8. City (Name, Region) 9. Foreign Student (Student Id, Country) 10. Chinese Student (Student Id, City of Birth) 11. Country (Name, Continent) Phase 2 - Logical Design Phase 2
Not so easy…. Phase 1 - Conceptual Design Students of a University attend courses, and pass them. Courses have a last name, a first name and a year of enrollment. Students have an ID, a Last Name and First Name. For Chinese Students we want to represent City of Birth and Region of Birth. For Foreign Students we want to represent the Country of Birth and the Continent of Birth. A Course passed by a Student has a grade and a date. Courses are taught by professors. Professors have a last name and a first name. They may be Associate Professors or Full professors, only for Full Professors we are interested to the City of Birth, with name and region. Every professor belongs to a Department. Departments have a Name and an address. Not so easy…. Phase 1 - Conceptual Design First Name Last Name Born in one many ID Teach Name Region Associate Professor Full Professor Belongs to Address Year of Enrollment Passed Enrolled in Grade Date Chinese St. Foreign St. Born Continent Department Country Student City Course
Second strategy: proceed step by step (manage complexity)
Need for strategies in conceptual design Teach one many Associate Professor Full Professor Related Chinese St. Foreign St. Student Course Students of a University attend courses, and pass them. Courses have a last name, a first name and a year of enrollment. Students have an ID, a Last Name and First Name. For Chinese Students we want to represent City of Birth and Region of Birth. For Foreign Students we want to represent the Country of Birth and the Continent of Birth. A Course passed by a Student has a grade and a date. Courses are taught by professors. Professors have a last name and a first name. They may be Associate Professors or Full professors, only for Full Professors we are interested to the City of Birth, with name and region. Every professor belongs to a Department. Departments have a Name and an address. Step 1 Step 2 First Name Last Name Born in one many ID Teach Name Region Associate Professor Full Professor Belongs to Address Year of Enrollment Passed Enrolled in Grade Date Chinese St. Foreign St. Born Continent Department Country Student City Course Teach one many Associate Professor Full Professor Related Chinese St. Foreign St. Student Course
The issue of schema quality
Different schemas for the same requirements…. Students of a University attend courses, and pass them. Courses have a last name, a first name and a year of enrollment. Students have an ID, a Last Name and First Name. For Chinese Students we want to represent City of Birth and Region of Birth. For Foreign Students we want to represent the Country of Birth and the Continent of Birth. A Course passed by a Student has a grade and a date. Courses are taught by professors. Professors have a last name and a first name. They may be Associate Professors or Full professors, only for Full Professors we are interested to the City of Birth, with name and region. Every professor belongs to a Department. Departments have a Name and an address. First Name Last Name Born in one many ID Teach Name Region Associate Professor Full Professor Belongs to Address Year of Enrollment Passed Enrolled in Grade Date Chinese St. Foreign St. Born Continent Department Country Student City Course Schema 2 Student ID Last Name First Name Country of Birth Continent of Birth City of Birth Course Id Course Name …… ……. Student Schema 1
The same in the relational model Students of a University attend courses, and pass them. Courses have a last name, a first name and a year of enrollment. Students have an ID, a Last Name and First Name. For Chinese Students we want to represent City of Birth and Region of Birth. For Foreign Students we want to represent the Country of Birth and the Continent of Birth. A Course passed by a Student has a grade and a date. Courses are taught by professors. Professors have a last name and a first name. They may be Associate Professors or Full professors, only for Full Professors we are interested to the City of Birth, with name and region. Every professor belongs to a Department. Departments have a Name and an address. 1. Student (Student Id, Last Name, First Name, Type) 2. Course (Course Id, Year of Enrollment) 3. Enrolled in (Student Id, Course Id) 4. Passed (Student Id, Course Id, Grade, Date) 5. Professor (Last Name, First Name, Type, Department Name) 6. Department (Name, Address) 7. Full Professor (Last Name, First Name, City of Birth) 8. City (Name, Region) 9. Foreign Student (Student Id, Country) 10. Chinese Student (Student Id, City of Birth) 11. Country (Name, Continent) Schema 2 1. Student (Student Id, Last Name, First Name, Type, Enrolled Course Id, Year of Enrollment, Passed course Id, Grade, Date, Professor Last Name, Professor First Name, Professor Type, Department Name, Department Address, Professor City of Birth, Region, Foreign Student Country of Birth, Foreign Student Continent, Chinese Student City of Birth, Chinese Student Region of Birth) Schema 1
Motivating example: find “errors” in schemas Students of a University attend courses, and pass them. Courses have a last name, a first name and a year of enrollment. Students have an ID, a Last Name and First Name. For Chinese Students we want to represent City of Birth and Region of Birth. For Foreign Students we want to represent the Country of Birth and the Continent of Birth. A Course passed by a Student has a grade and a date. Courses are taught by professors. Professors have a last name and a first name. They may be Associate Professors or Full professors, only for Full Professors we are interested to the City of Birth, with name and region. Every professor belongs to a Department. Departments have a Name and an address. ? ? Schema 2 Schema 1 Last Name Last Name ID First Name Name Year of Enrollment ID Name Year of Enrollment many many one many Student Enrolled in Course Student Enrolled in Course many many one many Passed Passed Teach one Teach City of Birth one Country Grade Date Foreign St. Chinese St. Grade Date Address Name many First Name many Born many one one First Name Born many one Professor many one Professor many Department Belongs to Located at Department Belongs to Last Name one one Last Name Country Name Name Scientific Area Full Professor Associate Professor Name Continent Address Name Name many one many Region City Born in Region City Born in many
Contents Qualities of a conceptual schema Strategies for conceptual design
Schema Quality Dimensions @C.Batini, F. Cabitza, G. Pasi, R. Schettini, 2008
Correctness with respect to the model Concerns the correct use of the categories of the model in representing requirements. Example – In the Entity Relationship model we may represent the logical link between persons and their first names using the two entities Person and FirstName and a relationship between them. The schema is not correct wrt the model since an entity should be used only when the concept has a unique existence in the real world and has an identifier. Student (1,n) has (1,1) First Name @C.Batini, F. Cabitza, G. Pasi, R. Schettini, 2008
Correctness with respect to requirements Concerns the correct representation of the requirements in terms of the model categories. Example - In an organization each department is headed by exactly one manager and each manager may head exactly one department. If we represent Manager and Department as entities, the Relationship between them should be one-to-one; in this case, the Schema is correct wrt requirements. If we Use a one-to-many relationship, the schema is incorrect. Manager (1,n) heads (1,1) Department @C.Batini, F. Cabitza, G. Pasi, R. Schettini, 2008
Minimality/Redundancy A schema is minimal if every part of the requirements is represented only once in the schema. In other words, it is not possible to eliminate some element from the schema without compromising the information content. Student 1,n Attends 1,n Course Assigned to 1,? Teaches 1,n Instructor 1,n @C.Batini, F. Cabitza, G. Pasi, R. Schettini, 2008
@C.Batini, F. Cabitza, G. Pasi, R. Schettini, 2008 Completeness Measures the extent to which a conceptual schema includes all the conceptual elements necessary to meet specified requirements. It is possible that the designer has not included certain characteristics present in the requirements in the schema, e.g., attributes related to an entity Person; in this case, the schema is incomplete. Students have a code, a name, a place of birth. @C.Batini, F. Cabitza, G. Pasi, R. Schettini, 2008
@C.Batini, F. Cabitza, G. Pasi, R. Schettini, 2008 Completeness Measures the extent to which a conceptual schema includes all the conceptual elements necessary to meet specified requirements. It is possible that the designer has not included certain characteristics present in the requirements in the schema, e.g., attributes related to an entity Person; in this case, the schema is incomplete. Students have a code, a name, a place of birth. Student Id Student Student Name @C.Batini, F. Cabitza, G. Pasi, R. Schettini, 2008
@C.Batini, F. Cabitza, G. Pasi, R. Schettini, 2008 Pertinence Measures how many unnecessary conceptual elements are included in the conceptual schema. In the case of a schema that is not pertinent, the designer has Gone too far in modeling the requirements, and has included too many concepts. Students have a code and a name. @C.Batini, F. Cabitza, G. Pasi, R. Schettini, 2008
Pertinence Measures how many unnecessary conceptual Students have a code and a name. Student Id Student Name Student Place of Birth Measures how many unnecessary conceptual elements are included in the conceptual schema. In the case of a schema that is not pertinent, the designer has Gone too far in modeling the requirements, and has included too many concepts.
How to check and achieve Completeness and Pertinence
Exercise 4.1 - Completeness Students have Student Id, Given Name, Surname, Date of Birth, City of Birth, Country of Birth. Courses have Course Id, Name, Year of Enrollment, Number of hours. Student pass Exams, with Grade and Date. Name Student Student ID City Given Name Country born many one Surname many Exam Grade many Course Course ID Course Name Year of Enrollment Number of hours
Exercise 4.1 - Completeness Students have Student Id, Given Name, Surname, Date of Birth, City of Birth, Country of Birth. Courses have Course Id, Name, Year of Enrollment, Number of hours. Student pass Exams, with Grade and Date. Name Student Student ID City Country born Given Name many one Surname many Exam Grade many Course Course ID Course Name Year of Enrollment Number of hours
Exercise 4.1 - Completeness Students have Student Id, Given Name, Surname, Date of Birth, City of Birth, Country of Birth. Courses have Course Id, Name, Year of Enrollment, Number of hours. Student pass Exams, with Grade and Date. Name Student Student ID City Country born Given Name many one Surname many Exam Grade many Course Course ID Course Name Year of Enrollment Number of hours
Exercise 4.1 - Completeness Students have Student Id, Given Name, Surname, Date of Birth, City of Birth, Country of Birth. Courses have Course Id, Name, Year of Enrollment, Number of hours. Student pass Exams, with Grade and Date. Name Student Student ID City Country born Given Name many one Surname many Exam Grade many Course Course ID Course Name Year of Enrollment Number of hours
Exercise 4.1 - Completeness Students have Student Id, Given Name, Surname, Date of Birth, City of Birth, Country of Birth. Courses have Course Id, Name, Year of Enrollment, Number of hours. Student pass Exams, with Grade and Date. Name Student Student ID City Country born Given Name many one Surname many Exam Grade The schema is lacking: - Date of Birth of Students - Date of Exams many Course Course ID Course Name Year of Enrollment Number of hours
Exercise 4.2 – Pertinence – Do yourself Students have Student Id, Given Name, Surname, Date of Birth, City of Birth, Country of Birth. Courses have Course Id, Name, Year of Enrollment. Student pass Exams, with grade. Name Student Student ID City Given Name Country born many one Surname many Exam Grade many Course Course ID Course Name Year of Enrollment Number of hours
Readability Intuitively, a schema is readable whenever it represents the meaning of the reality represented by the schema in a clear way for its intended use. This simple, qualitative definition is not easy to translate in a more formal way, since the evaluation expressed by the word clearly conveys some elements of subjectivity. In models, such as the Entity Relationship model, that provide a graphical representation of the schema, readability concerns both the diagram and the schema itself.
Unreadable schema Employee Department Floor Purchase Vendor Warehouse Of Order Worker Engineer City Born Warehouse Warranty Type In Employee Vendor Acquires Works Head Floor Located Department Produces Item Manages
Diagrammatic Readability as embedding in a grid Floor Located Manages Department Head Employee Born City Works Produces Vendor Worker Engineer Item In Warranty Type Warehouse Order Purchase Acquires Of
A Readable schema
Diagrammatic readability With regard to the diagrammatic representation, readability can be expressed objectively by a number of aesthetic criteria that human beings adopt in drawing diagrams: crossings between lines should be minimized, graphic symbols should be embedded in a grid, lines should be made of horizontal or vertical segments, The number of bends in lines should be minimized, the total area of the diagram should be minimized, and, finally, Parents in generalization hierarchies should be positioned at a higher level in the diagram in respect to children. The children entities in the generalization hierarchy should be symmetrical with respect to the parent entity.
Our example Embedded in a grid No crossings in lines All lines are horizontal or vertical Childs in generalizations symmetric w.r.t. parent Parent upon childs Purchase Of Order Worker Engineer City Born Warehouse Warranty Type In Employee Vendor Acquires Works Head Floor Located Department Produces Item Manages Floor Located Manages Depart ment Head Employee Born City Works Produces Vendor Worker Engineer Item In Type Warehouse Warranty Acquires Order Of Purchase
A Readable schema Department Employee Item Order Purchase Worker Of Worker Engineer City Born Warranty Type Employee Vendor Acquires Works Head Floor Located Department Produces Item Warehouse In Manages
@C.Batini, F. Cabitza, G. Pasi, R. Schettini, 2008 Compactness The second issue addressed by readability is the compactness of schema representation. Among the different conceptual schemas that equivalently represent a certain reality, we prefer the one or the ones that are more compact, because compactness favors readability. @C.Batini, F. Cabitza, G. Pasi, R. Schettini, 2008
Transformation that preserves information content and enhances compactness Employee Vendor Worker Engineer Born Born City Born @C.Batini, F. Cabitza, G. Pasi, R. Schettini, 2008
Transformation the preserves information content and enhances compactness Worker Engineer City Born Employee Vendor @C.Batini, F. Cabitza, G. Pasi, R. Schettini, 2008
Normalization in the ER model Employee-Project Employee Id Surname Given Name Project Id % of time of Employee in Project Project Name
Normalization Employee Project Employee Id Surname Given Name Assigned to (1,n) Employee Id Surname Given Name Project Id % of time of Employee in Project Project Name
Unnormalized ER schema Normalization Normalized ER schema Employee-Project Employee # Salary Project # Budget Role Employee Project Assigned to 1,n Unnormalized ER schema
Strategies in Conceptual Design
Bottom up Start from attributes, then choose entities, then relationships, finally is a and generalization hierarchies
Inside out Start from the most important entity, and assign attributes. Then move to adjacent entities, relationships and is-a and generalization hierachies, until all the schema is represented.
Top down Start from one entity, then refine the entity into a more articulated schema, then for each concept of the schema apply further refinements until all the schema is produced.
Top down Final refinement Requirements in natural language Initial Second refinement Requirements in natural language Initial schema
We compare three strategies Bottom up Inside Out Top down
Bottom up Course Student Foreign St. Chinese St. Country We are in China. Students are of two types, Foreign Students and Chinese Students. For both of them we want to represent ID, First Name and Last Name. Furthermore, for Foreign Students we want to represent the Country where they are born with Name and Continent. Students are related to courses, Courses have Name and Year of Enrollment. Students are enrolled in courses; furthermore they pass Course with Grade and Date. Name Year of Enrollment Enrolled in many ID Last Name First Name Country Chinese St. Foreign St. Course Student Passed many Grade Date ID Last Name First Name ID Last Name First Name Born many one Name Continent
Inside Out Student Course Foreign St. Chinese St. Country We are in China. Students are of two types, Foreign Students and Chinese Students. For both of them we want to represent ID, First Name and Last Name. Furthermore, for Foreign Students we want to represent the Country where they are born with Name and Continent. Students are related to courses, Courses have Name and Year of Enrollment. Students are enrolled in courses; furthermore they pass Course with Grade and Date. Name Year of Enrollment Enrolled in many ID Last Name First Name Student Course Passed many Grade Date Chinese St. Foreign St. Born many one Country Name Continent
Top down Student Career Student Career Course Student Course Student Name Year of Enrollment Course Student Career Passed Enrolled in many Student Course Student ID Last Name First Name Chinese St. Foreign St. Student Student Course Related Grade Date Passed many one Country Foreign St. Born Name Continent Country Name Year of Enrollment Course Foreign St. many one Country Born Chinese St. Foreign St. Student Student Career Course Related Related Passed Enrolled in many Student ID Last Name First Name Grade Date Passed Name Continent Country
Strategies compared Strategies / Pros and cons Pros Cons Bottom up Intuitive and «easy» Forces restructurings Oil stain Compromise between bottom up and top down Some restructuring Top down Need for an holistic view – difficult to achieve No restructuring
Lesson Exercise 4.2
Exercise 4.2 Study the following data requirements for a hospital database and produce a conceptual schema using these strategies: The bottom up strategy The inside out strategy The top-down strategy The hospital database stores data about patients, their admission and discharge from hospital’s departments, and their treatments. For each patient, we know the name, address, sex, social security number, and insurance code. For each department, we know the department’s name, its location, the name of the doctor who heads it, the number of beds available, and the number of beds occupied. Each patient goes through multiple treatments during hospitalization; for each treatment, we store its name, duration, and the possible reactions to it the patient may have.
Inside out Strategy
Patient schema – Inside Out Strategy Name Address Sex SSN Insurance Code Location Name Name of head # of beds available # of beds occupied Patient Department Passed many Admission date Discharge receive many Duration Name Reactions (1,n) Treatment
Patient schema – Bottom up Strategy The hospital database stores data about patients, their admission and discharge from hospital’s departments, and their treatments. For each patient, we know the name, address, sex, social security number, and insurance code. For each department, we know the department’s name, its location, the name of the doctor who heads it, the number of beds available, and the number of beds occupied. Each patient goes through multiple treatments during hospitalization; for each treatment, we store its name, duration, and the possible reactions to it the patient may have. Name Address Sex SSN Insurance Code Name Location Name of head # of beds available # of beds occupied Admission date Discharge date Name Reactions (1,n) Duration
Bottom up Strategy
Patient schema – Bottom up Strategy - 1 Name Address Sex SSN Insurance Code Name Location Name of head # of beds available # of beds occupied Admission date Discharge Duration Name Reactions (1,n)
Patient schema – Bottom up Strategy - 2 Name Address Sex SSN Insurance Code Name Location Name of head # of beds available # of beds occupied Patient Department Admission date Discharge Duration Name Reactions (1,n) Treatment
Patient schema – Bottom up Strategy - 3 Admitted many Patient Department Treatment Name Address Sex SSN Insurance Code Location Name of head # of beds available # of beds occupied Reactions (1,n) Duration Admission date Discharge Receives
Patient – Inside Out - 1 Patient Name Address Sex SSN Insurance Code
Patient – Inside Out - 2 Patient Department Admitted Name Address Sex SSN Insurance Code Name Location Name of head # of beds available # of beds occupied Patient Department many many Admitted Admission date Discharge
Patient – Inside Out - 3 Patient Department Treatment Admitted Name Address Sex SSN Insurance Code Name Location Name of head # of beds available # of beds occupied Patient Department many many Admitted Admission date Discharge many Receives Duration many Name Reactions (1,n) Treatment
Patient schema – Top down Strategy
Patient schema – Top down Strategy Therapies
Patient schema – Top down Strategy Patient and Treatments Department Admitted many
Patient schema – Top down Strategy Name Address Sex SSN Insurance Code Name Location Name of head # of beds available # of beds occupied Patient and Treatments Department Admitted many
Patient schema – Top down Strategy Name Address Sex SSN Insurance Code Name Location Name of head # of beds available # of beds occupied Patient and Treatments Department Admitted many Admission date Discharge
Patient schema – Top down Strategy Name Address Sex SSN Insurance Code Name Location Name of head # of beds available # of beds occupied Patient Department Admitted many Admission date Discharge Receives many Treatment
Patient schema – Top down Strategy Name Address Sex SSN Insurance Code Name Location Name of head # of beds available # of beds occupied Patient Department Admitted many Admission date Discharge Receives many Name Reactions (1,n) Treatment
Patient schema – Top down Strategy Name Address Sex SSN Insurance Code Name Location Name of head # of beds available # of beds occupied Patient Department Admitted many Admission date Discharge Receives many Duration Name Reactions (1,n) Treatment