CAS CS 460/660 Entity/Relationship Model
Review What a DBMS offer? Efficient data organization Efficient data retrieval Data integrity
Review-the big picture Data Modelling Relational E-R Query Languages SQL Relational Algebra Relational Calculus Storing Data File Indexes Buffer Pool Management Query Optimization External Sorting Join Algorithms Query Plans, Cost Estimation Data Integrity Data Organization Data retrieval
Databases Model the Real World “Data Model” translates real world things into structures computers can store Many models: Relational, E-R, O-O, Network, Hierarchical, etc. Relational (more next time) Rows & Columns Keys & Foreign Keys to link Relations Students Enrolled
Problems with Relational Model CREATE TABLE Students (sid CHAR(20), name CHAR(20), login CHAR(10), age INTEGER, gpa FLOAT) CREATE TABLE Enrolled (sid CHAR(20), cid CHAR(20), grade CHAR(2)) With complicated schemas, it may be hard for a person to understand the structure from the data definition. Enrolled Students
One Solution: The E-R Model Instead of relations, it has: Entities and Relationships These are described with diagrams both structure, notation more obvious to humans since lot name Employee ssn dname budget did Department Works_In
Steps in Database Design Requirements Analysis user needs; what must database do? Conceptual Design high level descr (often done w/ER model) Logical Design translate ER into DBMS data model Schema Refinement consistency, normalization Physical Design indexes, disk layout Security Design who accesses what, and how
Example: DBA for Bank of America Requirements Specification Determine the requirements of clients ( Database to store information about customers, accounts, loans, branches, transactions, …) Conceptual Design Express client requirements in terms of E/R model. Confirm with clients that requirements are correct. Specify required data operations Logical Design Convert E/R model to relational, object-based, XML-based,… Physical Design Specify file organizations, build indexes
ER Model Basics Entity: Employees ssn name lot Entity: Real-world thing, distinguishable from other objects. Noun phrase (e.g., Bob Smith, Comm Ave Branch, Account 1234, etc) Entity described by set of attributes. Entity Set: A collection of similar entities. E.g., all employees. All entities in an entity set have the same set of attributes. (Until we consider hierarchies, anyway!) Each attribute has a domain. 3
ER Model Basics (Contd.) since lot name Employees ssn dname budget did Departments Works_In Relationship: Association among two or more entities. E.g., Bob Smith works in Pharmacy department. relationships can have their own attributes. Verb phrases (e.g., works_at, enrolled_in, etc) Relationship Set: Collection of similar relationships. An n-ary relationship set R relates n entity sets E1 ... En ; each relationship in R involves entities e1 E1, ..., en En 4
E/R Data Model An Example Works_At Employees Branches essn phone bname bcity since seniority ename children Entity Set Relationship Set Attribute Employee Works_For phone Works_For manager worker Lots of notation to come.
E/R Data Model Types of Attributes Works_At Employees Branches essn phone bname bcity since seniority ename children Works_For manager worker ename Default children Multivalued Derived seniority
E/R Data Model Types of relationships Works_At Employees Branches essn phone bname bcity since seniority ename children Works_For manager worker Works_For Many-to-One (n:1) Works_At Many-to-Many (n:m)
E/R Data Model Recursive relationships Works_At Employees Branches essn phone bname bcity since seniority ename children Works_For Recursive relationships: Must be declared with roles manager worker manager Employee Works_For worker
E/R Data Model Design Issue #1: Entity Sets vs. Attributes An Example: Employees can have multiple phones (b) (a) Uses Employees vs Employees Phones phone_no phone_loc no loc To resolve, determine how phones are used 1. Can many employees share a phone? (If yes, then (b)) 2. Can employees have multiple phones? (if yes, then (b), or (a) with multivalued attributes) 3. Else (a), perhaps with composite attributes Employees phone no loc
E/R Data Model Design Issue #2: Entity Sets vs. Relationship Sets An Example: How to model bank loans Loans Borrows Customers Branches Customers Loans vs cssn cname bname bcity cssn cname lno amt (a) lno amt (b) To resolve, determine how loans are issued 1. Can there be more than one customer per loan? If yes, then (a). Otherwise, loan info must be replicated for each customer (wasteful, potential update anomalies) 2. Is loan a noun or a verb? Both, but more of a noun to a bank. (hence (a) probably more appropriate)
E/R Data Model Design Issue #3: Relationship Cardinalities An Example: Borrows ? ? Customers Loans Variations on Borrows: 1. Can a customer hold multiple loans? 2. Can a loan be jointly held by more than 1 customer?
E/R Data Model Design Issue #3: Relationship Cardinalities Borrows ? ? Customers Loans Cardinalities of Borrows: Borrows Borrows Borrows Borrows
E/R Data Model Design Issue #3: Relationship Cardinalities (cont) In general... 1:1 n:1 1:n n:m
E/R Data Model Design Issue #4: N-ary vs Binary Relationship Sets An Example: Works_At Works_at Employees Branches Ternary: Choose n-ary when possible! (Avoids redundancy, update anomalies) Depts (Joe, Moody, Acct) Î Works_At vs Binary: WAE WAB Employees WA Branches WAD (Joe, w3) Î WAE (Moody, w3) Î WAB (Acct, w3) Î WAD Depts
E/R Data Model Keys Key = set of attributes identifying individual entities or relationships Employees essn ename eaddress ephone A. Superkey: any attribute set that distinguishes identities e.g., {essn}, {essn, ename, eaddress} B. Candidate Key: “minimal superkey” (can’t remove attributes and preserve “keyness”) e.g., {essn}, {ename, eaddress} C. Primary Key: candidate key chosen as the key by a DBA e.g., {essn} (denoted by underline)
E/R Data Model Relationship Set Keys Works_At Employees Branches ... ... essn ename bname bcity since Q: What attributes needed to represent relationships in Works_At? e1 b1 e2 b2 e3 A: {essn, bname, since}
E/R Data Model Relationship Set Keys (cont.) Works_At Employees Branches ... ... essn ename bname bcity since Q: What are the candidate keys of Works_At? e1 b1 e2 b2 e3 A: {essn}
E/R Data Model Relationship Set Keys (cont.) Works_At Employees Branches ... ... essn ename bname bcity since Q: What are the candidate keys if Works_At is...? a. 1:n A: {bname} b. n:m A: {essn, bname} Assumption: employees have <= 1 record per branch c. 1:1 A: {essn}, {bname}
E/R Data Model Relationship Set Keys (cont.) General Rules for Relationship Set Keys R E1 E2 ... ... P (E1) P (E2) If R is: R 1:1 1:n n:1 n:m Candidate Keys P (E1) or P (E2) P (E2) P (E1) P (E1) È P (E2)
E/R Data Model Existence Dependencies and Weak Entity Sets Idea: Existence of one entity depends on another Example: Loans and Loan Payments Loan_Pmt Payments Loans lno lamt pno pamt pdate Weak Entity Set Identifying Relationship Total Participation
E/R Data Model Existence Dependencies and Weak Entity Sets Loan_Pmt Payments Loans lno lamt pno pamt pdate Weak Entity Sets existence of payments depends upon loans have no superkeys: different payment records (for different loans) can be identical instead of keys, discriminators: discriminate between payments for given loan (e.g., pno)
E/R Data Model Existence Dependencies and Weak Entity Sets Loan_Pmt Payments Loans lno lamt pno pamt pdate Identifying Relationships We say: Loan is identifying owner in Loan_Pmt Payment is weak entity set in Loan_Pmt Payment is existence dependent on Loan
E/R Data Model Existence Dependencies and Weak Entity Sets Loan_Pmt Payments Loans lno lamt pno pamt pdate Total Participation All elements of Payment appear in Loan_Pmt
E/R Data Model Existence Dependencies and Weak Entity Sets ... atta1 attam attb1 ... attbn Q. Is {attb1, …, attbn} a superkey of E2? A: No Q. Name a candidate key of E2 A: {atta1, attb1}
E/R Data Model Extensions to the Model: Specialization and Generalization An Example: Customers can have checking and savings accts Checking ~ Savings (many of the same attributes) Old Way: Has1 Customers Savings Accts acct_no balance interest Has2 Checking Accts acct_no balance overdraft
E/R Data Model Extensions to the Model: Specialization and Generalization An Example: Customers can have checking and savings accts Checking ~ Savings (many of the same attributes) New Way: acct_no balance Has superclass Customers Accounts Isa Savings Accts Checking Accts interest overdraft subclasses
E/R Data Model Extensions to the Model: Specialization and Generalization Subclass Distinctions: 1. User-Defined vs. Condition-Defined User: Membership in subclasses explicitly determined (e.g., Employee, Manager < Person) Condition: Membership predicate associated with subclasses - e.g: Person Isa Child Adult Senior age < 18 age³ 18 age³ 65
E/R Data Model Extensions to the Model: Specialization and Generalization Subclass Distinctions: 2. Overlapping vs. Disjoint Overlapping: Entities can belong to >1 entity set (e.g., Adult, Senior) Disjoint: Entities belong to exactly 1 entity set (e.g., Child) Person Isa Child Adult Senior age < 18 age³ 18 age³ 65
E/R Data Model Extensions to the Model: Specialization and Generalization Subclass Distinctions: 3. Total vs. Partial Membership Total: Every entity of superclass belongs to a subclass e.g., Partial: Some entities of superclass do not belong to any subclass (e.g., if Adults condition is age³ 21 ) Person Isa Child Adult Senior age < 18 age³ 18 age³ 65
E/R Data Model Extensions to the Model: Aggregation E/R: No relationships between relationships E.g.: Associate loan officers with Borrows relationship set Borrows Customers Loans ? Loan_Officer Employees Associate Loan Officer with Loan? What if we want a loan officer for every (customer, loan) pair?
E/R Data Model Extensions to the Model: Aggregation E/R: No relationships between relationships E.g.: Associate loan officers with Borrows relationship set Borrows Customers Loans Loan_Officer Employees Associate Loan Officer with Borrows? Must First Aggregate
E/R Data Model Summary Entities, Relationships (sets) Both can have attributes (simple, multivalued, derived, composite) Cardinality or relationship sets (1:1, n:1, n:m) Keys: superkeys, candidate keys, primary key DBA chooses primary key for entity sets Automatically determined for relationship sets Weak Entity Sets, Existence Dependence, Total/Partial Participation Specialization and Generalization (E/R + inheritance) Aggregation (E/R + higher-order relationships)
These things get pretty hairy! Many E-R diagrams cover entire walls! A modest example:
A Cadastral E-R Diagram cadastral: showing or recording property boundaries, subdivision lines, buildings, and related details Source: US Dept. Interior Bureau of Land Management, Federal Geographic Data Committee Cadastral Subcommittee http://www.fairview-industries.com/standardmodule/cad-erd.htm