2003.10.02 - SLIDE 1IS 202 – FALL 2003 Prof. Ray Larson & Prof. Marc Davis UC Berkeley SIMS Tuesday and Thursday 10:30 am - 12:00 pm Fall 2003

Presentation transcript:

SLIDE 1IS 202 – FALL 2003 Prof. Ray Larson & Prof. Marc Davis UC Berkeley SIMS Tuesday and Thursday 10:30 am - 12:00 pm Fall 2003 SIMS 202: Information Organization and Retrieval Lecture 12: Database Design

SLIDE 2IS 202 – FALL 2003 Lecture Overview Review –Databases and Database Design –Database Life Cycle –ER Diagrams Database Design Normalization Discussion Questions

SLIDE 3IS 202 – FALL 2003 Lecture Overview Review –Databases and Database Design –Database Life Cycle –ER Diagrams Database Design Normalization Discussion Questions

SLIDE 4IS 202 – FALL 2003 Models (1) [Diagram: conceptual requirements for Applications 1-4, Conceptual Model, Logical Model, Internal Model, and an External Model for each of Applications 1-4]

SLIDE 5IS 202 – FALL 2003 Database System Life Cycle 1 Design; 2 Physical Creation; 3 Conversion; 4 Integration; 5 Operations; 6 Growth, Change, & Maintenance

SLIDE 6IS 202 – FALL 2003 Another View of the Life Cycle 1 Design; 2 Physical Creation; 3 Conversion; 4 Integration; 5 Operations; 6 Growth, Change

SLIDE 7IS 202 – FALL 2003 Database Design Process [Diagram: conceptual requirements for Applications 1-4, Conceptual Model, Logical Model, Internal Model, and an External Model for each of Applications 1-4]

SLIDE 8IS 202 – FALL 2003 Entity An Entity is an object in the real world (or even imaginary worlds) about which we want or need to maintain information –Persons (e.g.: customers in a business, employees, authors) –Things (e.g.: purchase orders, meetings, parts, companies) Employee

SLIDE 9IS 202 – FALL 2003 Attributes Attributes are the significant properties or characteristics of an entity that help identify it and provide the information needed to interact with it or use it (This is the Metadata for the entities) [Diagram: Employee with attributes Name (Last, Middle, First), SSN, Age, Birthdate, Projects]

SLIDE 10IS 202 – FALL 2003 Relationships Relationships are the associations between entities They can involve one or more entities and belong to particular relationship types –One to One –One to Many –Many to Many

SLIDE 11IS 202 – FALL 2003 Relationships Class Attends Student Part Supplies project parts Supplier Project

SLIDE 12IS 202 – FALL 2003 Types of Relationships Concerned only with cardinality of relationship Truck Assigned EmployeeProject Assigned EmployeeProject Assigned Employee 11 n n 1 m Chen ER notation

SLIDE 13IS 202 – FALL 2003 More Complex Relationships Project Evaluation Employee Manager 1/n/n 1/1/1 n/n/1 Project Assigned Employee 4(2-10) 1 SSNProjectDate Manages Employee Manages Is Managed By 1 n

SLIDE 14IS 202 – FALL 2003 Weak Entities Owe existence entirely to another entity [Diagram: Order Contains Order-line; attributes shown: Invoice #, Part#, Rep#, Quantity, Invoice#]

SLIDE 15IS 202 – FALL 2003 Supertype and Subtype Entities Clerk Is one of Sales-rep Invoice Other Employee Sold Manages

SLIDE 16IS 202 – FALL 2003 Many to Many Relationships Employee Project Is Assigned Project Assignment Assigned SSN Proj# SSN Proj# Hours
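
The pattern on this slide can be sketched as follows, assuming SQLite through Python's standard sqlite3 module. The many-to-many Is Assigned relationship between Employee and Project becomes a Project Assignment table keyed on the pair (SSN, Proj#). The column names ssn, proj_no, and hours follow the slide; the name and title columns and all sample rows are hypothetical.

```python
import sqlite3

# In-memory database; table and column names beyond SSN, Proj#, Hours are illustrative.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee (ssn TEXT PRIMARY KEY, name TEXT);
CREATE TABLE project (proj_no INTEGER PRIMARY KEY, title TEXT);
-- Associative table: one row per (employee, project) pairing.
CREATE TABLE project_assignment (
    ssn TEXT REFERENCES employee(ssn),
    proj_no INTEGER REFERENCES project(proj_no),
    hours REAL,
    PRIMARY KEY (ssn, proj_no)
);
""")
conn.executemany("INSERT INTO employee VALUES (?, ?)",
                 [("111", "Ann"), ("222", "Bob")])
conn.executemany("INSERT INTO project VALUES (?, ?)",
                 [(1, "Alpha"), (2, "Beta")])
conn.executemany("INSERT INTO project_assignment VALUES (?, ?, ?)",
                 [("111", 1, 10.0), ("111", 2, 5.0), ("222", 1, 8.0)])


def projects_for(ssn):
    """Titles of all projects one employee is assigned to."""
    rows = conn.execute(
        "SELECT p.title FROM project_assignment a "
        "JOIN project p ON p.proj_no = a.proj_no "
        "WHERE a.ssn = ? ORDER BY p.title", (ssn,))
    return [r[0] for r in rows]


def employees_on(proj_no):
    """Names of all employees assigned to one project."""
    rows = conn.execute(
        "SELECT e.name FROM project_assignment a "
        "JOIN employee e ON e.ssn = a.ssn "
        "WHERE a.proj_no = ? ORDER BY e.name", (proj_no,))
    return [r[0] for r in rows]


print(projects_for("111"))
print(employees_on(1))
```

Each side of the relationship is now one-to-many with the assignment table, and Hours has a natural home because it describes the pairing rather than either entity alone.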

SLIDE 17IS 202 – FALL 2003 Lecture Overview Review –Databases and Database Design –Database Life Cycle –ER Diagrams Database Design Normalization Discussion Questions

SLIDE 18IS 202 – FALL 2003 Database Design Process [Diagram: conceptual requirements for Applications 1-4, Conceptual Model, Logical Model, Internal Model, and an External Model for each of Applications 1-4]

SLIDE 19IS 202 – FALL 2003 Database Design Process [Diagram: conceptual requirements for Applications 1-4, Conceptual Model, Logical Model, Internal Model, and an External Model for each of Applications 1-4]

SLIDE 20IS 202 – FALL 2003 Requirements Analysis Conceptual Requirements –Systems Analysis Process Examine all of the information sources used in existing applications Identify the characteristics of each data element –Numeric –Text –Date/time –Etc. Examine the tasks carried out using the information Examine results or reports created using the information

SLIDE 21IS 202 – FALL 2003 Database Design Process [Diagram: conceptual requirements for Applications 1-4, Conceptual Model, Logical Model, Internal Model, and an External Model for each of Applications 1-4]

SLIDE 22IS 202 – FALL 2003 Conceptual Design Conceptual Model –Merge the collective needs of all applications –Determine what Entities are being used Some object about which information is to be maintained –What are the Attributes of those entities? Properties or characteristics of the entity What attributes uniquely identify the entity? –What are the Relationships between entities? How do the entities interact with each other?

SLIDE 23IS 202 – FALL 2003 Developing a Conceptual Model Overall view of the database that integrates all the needed information discovered during the requirements analysis Elements of the Conceptual Model are represented by diagrams, Entity-Relationship or ER Diagrams, that show the meanings and relationships of those elements independent of any particular database systems or implementation details Can also be represented using other modeling tools (such as UML)

SLIDE 24IS 202 – FALL 2003 Database Design Process [Diagram: conceptual requirements for Applications 1-4, Conceptual Model, Logical Model, Internal Model, and an External Model for each of Applications 1-4]

SLIDE 25IS 202 – FALL 2003 Logical Design Logical Model –How is each entity and relationship represented in the Data Model of the DBMS Hierarchic? Network? Relational? Object-Oriented?

SLIDE 26IS 202 – FALL 2003 Database Design Process [Diagram: conceptual requirements for Applications 1-4, Conceptual Model, Logical Model, Internal Model, and an External Model for each of Applications 1-4]

SLIDE 27IS 202 – FALL 2003 Physical Design Internal Model –Choices of index file structure –Choices of data storage formats –Choices of disk layout

SLIDE 28IS 202 – FALL 2003 Database Design Process [Diagram: conceptual requirements for Applications 1-4, Conceptual Model, Logical Model, Internal Model, and an External Model for each of Applications 1-4]

SLIDE 29IS 202 – FALL 2003 Database Application Design External Model –User views of the integrated database –Making the old (or updated) applications work with the new database design

SLIDE 30IS 202 – FALL 2003 Terms and Concepts Key –An attribute or set of attributes used to identify or locate records in a file Primary Key –An attribute or set of attributes that uniquely identifies each record in a file Candidate Key –An attribute or set of attributes that might be used as a primary key
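
As a rough illustration of these definitions, the sketch below searches sample records for minimal attribute sets whose values are unique in every row. The function names and the sample data are hypothetical. Uniqueness in a sample only suggests a candidate key; whether an attribute set really is a key depends on the meaning of the data, not on the rows that happen to be present.

```python
from itertools import combinations


def is_unique(rows, attrs):
    """True if no two rows share the same values for attrs."""
    seen = set()
    for row in rows:
        key = tuple(row[a] for a in attrs)
        if key in seen:
            return False
        seen.add(key)
    return True


def candidate_keys(rows, attributes):
    """Minimal attribute sets that uniquely identify every row in the sample."""
    keys = []
    for size in range(1, len(attributes) + 1):
        for attrs in combinations(attributes, size):
            # Skip supersets of a key already found: they are not minimal.
            if any(set(k) <= set(attrs) for k in keys):
                continue
            if is_unique(rows, attrs):
                keys.append(attrs)
    return keys


# Hypothetical employee records.
employees = [
    {"ssn": "111", "email": "a@example.org", "name": "Ann"},
    {"ssn": "222", "email": "b@example.org", "name": "Ann"},
    {"ssn": "333", "email": "c@example.org", "name": "Cy"},
]

print(candidate_keys(employees, ["ssn", "email", "name"]))
```

Here both ssn and email qualify as candidate keys, and the designer picks one of them as the primary key; name does not qualify because two rows share the value "Ann".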

SLIDE 31IS 202 – FALL 2003 Lecture Overview Review –Databases and Database Design –Database Life Cycle –ER Diagrams Database Design Normalization Discussion Questions

SLIDE 32IS 202 – FALL 2003 Normalization Normalization theory is based on the observation that relations with certain properties are more effective in inserting, updating and deleting data than other sets of relations containing the same data Normalization is a multi-step process beginning with an “unnormalized” relation

SLIDE 33IS 202 – FALL 2003 Normal Forms First Normal Form (1NF) Second Normal Form (2NF) Third Normal Form (3NF) Boyce-Codd Normal Form (BCNF) Fourth Normal Form (4NF) Fifth Normal Form (5NF)

SLIDE 34IS 202 – FALL 2003 Normalization First Normal Form: functional dependency of nonkey attributes on the primary key; atomic values only. Second Normal Form: full functional dependency of nonkey attributes on the primary key. Third Normal Form: no transitive dependency between nonkey attributes. Boyce-Codd and higher: all determinants are candidate keys; single multivalued dependency

SLIDE 35IS 202 – FALL 2003 Unnormalized Relations First step in normalization is to convert the data into a two-dimensional table In unnormalized relations data can repeat within a column (The following is a highly contrived example that actually bears only a slight resemblance to the current implementation of the Phone/Photo project database)

SLIDE 36IS 202 – FALL 2003 Unnormalized Relations

SLIDE 37IS 202 – FALL 2003 First Normal Form To move to First Normal Form a relation must contain only atomic values at each row and column –No repeating groups –A column or set of columns is called a Candidate Key when its values can uniquely identify the row in the relation
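
A minimal sketch of the move to First Normal Form, assuming the unnormalized data arrives as Python dictionaries in which one field holds a repeating group (a list). The function name, field names, and picture numbers are hypothetical; the person names are borrowed from the lecture's example.

```python
def to_first_normal_form(records, repeating_field, atomic_field):
    """Expand a repeating group into one row per value so every cell is atomic."""
    flat = []
    for record in records:
        # Attributes that are already atomic are copied onto every output row.
        shared = {k: v for k, v in record.items() if k != repeating_field}
        for value in record[repeating_field]:
            flat.append({**shared, atomic_field: value})
    return flat


# Unnormalized: the "pictures" cell holds several values at once.
unnormalized = [
    {"person": "John White", "pictures": [1, 2]},
    {"person": "Mary Jones", "pictures": [3]},
]

flat = to_first_normal_form(unnormalized, "pictures", "picture_no")
for row in flat:
    print(row)
```

After flattening, the pair (person, picture_no) identifies each row. A person with an empty picture list produces no rows at all, which is one way to see the insertion anomaly that 1NF relations still suffer from.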

SLIDE 38IS 202 – FALL 2003 First Normal Form

SLIDE 39IS 202 – FALL 2003 1NF Storage Anomalies Insertion: A new Person who has not yet taken a picture has no Picture #; since Picture # is part of the key, we can’t insert that Person. Insertion: If People are known and likely to be photographed, but haven’t been yet, there is no way to include them in the database. Update: If a Person changes status (e.g., Mary Jones becomes a Student), we have to change multiple rows in the database. Deletion (type 1): Deleting a Person record may also delete all info about People in the pictures. Deletion (type 2): When there are functional dependencies (like Object and Object_features), changing one item eliminates other information.

SLIDE 40IS 202 – FALL 2003 Second Normal Form A relation is said to be in Second Normal Form when every nonkey attribute is fully functionally dependent on the primary key –That is, every nonkey attribute needs the full primary key for unique identification
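
One way to make "fully functionally dependent" concrete is to test sample rows for partial dependencies, that is, nonkey attributes determined by only part of a composite key. The sketch below is an illustration under assumptions: the helper names, the attribute names (person_id, picture_no, status, location), and the data are all hypothetical, and a dependency that holds in a sample is only evidence, not proof, that it holds in general.

```python
from itertools import combinations


def holds(rows, lhs, rhs):
    """True if the functional dependency lhs -> rhs holds in these rows."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[a] for a in rhs)
        # setdefault returns the first value seen for this key.
        if seen.setdefault(key, val) != val:
            return False
    return True


def partial_dependencies(rows, primary_key, nonkey):
    """Nonkey attributes determined by a proper subset of the primary key."""
    found = []
    for attr in nonkey:
        for size in range(1, len(primary_key)):
            for subset in combinations(primary_key, size):
                if holds(rows, subset, (attr,)):
                    found.append((subset, attr))
    return found


rows = [
    {"person_id": 1, "picture_no": 1, "status": "Student", "location": "Lab"},
    {"person_id": 1, "picture_no": 2, "status": "Student", "location": "Cafe"},
    {"person_id": 2, "picture_no": 1, "status": "Staff", "location": "Cafe"},
    {"person_id": 2, "picture_no": 2, "status": "Staff", "location": "Lab"},
]

violations = partial_dependencies(
    rows, ("person_id", "picture_no"), ("status", "location"))
print(violations)
```

In this sample, status depends on person_id alone, so the relation is not in 2NF until status moves to a table keyed on person_id; location needs the whole key and can stay.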

SLIDE 41IS 202 – FALL 2003 Second Normal Form Person Table

SLIDE 42IS 202 – FALL 2003 Second Normal Form People Table

SLIDE 43IS 202 – FALL 2003 Second Normal Form Picture Table

SLIDE 44IS 202 – FALL 2003 1NF Storage Anomalies Removed Insertion: Can now enter new Persons who haven’t yet taken pictures Insertion: Can now enter People who haven’t been photographed Deletion (type 1): If Charles Brown withdraws his photos the corresponding tuples from Person and Picture tables can be deleted without losing information on David Rosen Update: If John White takes a third picture, and has changed status (e.g., graduate), we only need to change the Person table in one place

SLIDE 45IS 202 – FALL 2003 2NF Storage Anomalies Insertion: Cannot enter the fact that a particular object has a particular feature unless it is associated with a particular picture Deletion: If John White describes some other object that Beth Little has while shopping, we lose the fact that the bookbag is blue Update: If the features of an object change we have to update multiple occurrences of object features

SLIDE 46IS 202 – FALL 2003 Third Normal Form A relation is said to be in Third Normal Form if there are no transitive functional dependencies between nonkey attributes –When one nonkey attribute can be determined by one or more other nonkey attributes, there is said to be a transitive functional dependency. The Object_Feature column in the Picture table is determined by the Object –Object_Feature is transitively functionally dependent on Object, so Picture is not in 3NF
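
The decomposition that removes this transitive dependency can be sketched as two projections of the Picture table. This is an illustration only: the project helper, the picture_no column name, and the sample rows are hypothetical, while Object, Object_Feature, and the blue bookbag come from the lecture's example.

```python
def project(rows, attrs):
    """Relational projection: keep only attrs and drop duplicate rows."""
    result = []
    for row in rows:
        projected = {a: row[a] for a in attrs}
        if projected not in result:
            result.append(projected)
    return result


# 2NF Picture table: object_feature depends on object, not on picture_no.
picture = [
    {"picture_no": 1, "object": "bookbag", "object_feature": "blue"},
    {"picture_no": 2, "object": "bookbag", "object_feature": "blue"},
    {"picture_no": 3, "object": "bicycle", "object_feature": "red"},
]

# 3NF: Picture keeps the object; the feature moves to its own Object table.
picture_3nf = project(picture, ["picture_no", "object"])
object_table = project(picture, ["object", "object_feature"])

print(picture_3nf)
print(object_table)
```

The fact that the bookbag is blue is now stored once, in the Object table, so it survives the deletion of any single picture and needs only one update if the feature changes.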

SLIDE 47IS 202 – FALL 2003 Third Normal Form Picture Table

SLIDE 48IS 202 – FALL 2003 Third Normal Form Object Table

SLIDE 49IS 202 – FALL 2003 2NF Storage Anomalies Removed Insertion: We can now enter the fact that an object has a particular feature Deletion: If John White describes some other object that Beth Little has while shopping, we don’t lose the fact that the bookbag is blue Update: The features for each object appear only once

SLIDE 50IS 202 – FALL 2003 Boyce-Codd Normal Form Most 3NF relations are also BCNF relations A 3NF relation can fail to be in BCNF only if: –Candidate keys in the relation are composite keys (they are not single attributes) –There is more than one candidate key in the relation, and –The keys are not disjoint, that is, some attributes in the keys are common

SLIDE 51IS 202 – FALL 2003 Most 3NF Relations Are Also BCNF – Is This One?

SLIDE 52IS 202 – FALL 2003 BCNF Relations

SLIDE 53IS 202 – FALL 2003 Additional Issues Why separate Person and People? –They are really all People/Persons in different roles Shouldn’t a picture have a unique ID regardless of Who is in it? Can’t we have multiple people in the same picture, multiple objects, etc.? Can’t objects have multiple characteristics?

SLIDE 54IS 202 – FALL 2003 BCNF Relations

SLIDE 55IS 202 – FALL 2003 BCNF Added Capabilities Can now have a picture with no (identified) people in it Can have multiple objects, activities, and people associated with each picture

SLIDE 56IS 202 – FALL 2003 Fourth Normal Form Any relation is in Fourth Normal Form if it is BCNF and any multivalued dependencies are trivial Eliminate non-trivial multivalued dependencies by projecting into simpler tables
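
A small sketch of "projecting into simpler tables", using Python sets of tuples as relations. It assumes a hypothetical relation (picture, person, object) in which the people and the objects appearing in a picture are independent of each other, so that picture multidetermines person and picture multidetermines object. All names and rows are illustrative.

```python
# Every person in a picture is paired with every object in it:
# two independent multivalued facts stored in one relation.
picture_contents = {
    (1, "Ann", "bookbag"),
    (1, "Ann", "bicycle"),
    (1, "Bob", "bookbag"),
    (1, "Bob", "bicycle"),
    (2, "Cy", "hat"),
}

# Project onto the two independent facts.
picture_person = {(pic, person) for pic, person, _ in picture_contents}
picture_object = {(pic, obj) for pic, _, obj in picture_contents}

# Natural join on the picture number recreates the original relation.
rejoined = {
    (pic, person, obj)
    for pic, person in picture_person
    for pic2, obj in picture_object
    if pic == pic2
}

print(sorted(picture_person))
print(sorted(picture_object))
print(rejoined == picture_contents)
```

In the projected tables, adding a new object to picture 1 takes one row, whereas the combined relation would need one row per person in that picture to stay consistent.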

SLIDE 57IS 202 – FALL 2003 Fifth Normal Form A relation is in 5NF if every join dependency in the relation is implied by the keys of the relation Implies that relations that have been decomposed in previous NF can be recombined via natural joins to recreate the original relation

SLIDE 58IS 202 – FALL 2003 Fifth Normal Form Relations People Table

SLIDE 59IS 202 – FALL 2003 Normalizing to Death Normalization splits database information across multiple tables To retrieve complete information from a normalized database, the JOIN operation must be used JOIN tends to be expensive in terms of processing time, and very large joins are very expensive
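
To show what retrieving complete information from normalized tables looks like, here is a minimal sketch assuming SQLite through Python's standard sqlite3 module. The table layout, column names, statuses, and locations are hypothetical; the person names are borrowed from the lecture's example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person (
    person_id INTEGER PRIMARY KEY,
    name TEXT,
    status TEXT
);
CREATE TABLE picture (
    picture_no INTEGER PRIMARY KEY,
    person_id INTEGER REFERENCES person(person_id),
    location TEXT
);
INSERT INTO person VALUES (1, 'John White', 'Student'), (2, 'Mary Jones', 'Staff');
INSERT INTO picture VALUES (10, 1, 'Lab'), (11, 1, 'Cafe'), (12, 2, 'Lab');
""")

# The person's name and status live in one table and the picture details in
# another, so a complete row needs a JOIN on the shared person_id.
complete = conn.execute("""
    SELECT pic.picture_no, p.name, p.status, pic.location
    FROM picture AS pic
    JOIN person AS p ON p.person_id = pic.person_id
    ORDER BY pic.picture_no
""").fetchall()

for row in complete:
    print(row)
```

Each additional normalized table adds another join to queries like this one, which is the cost the slide warns about; designs that are read far more often than they are updated sometimes accept some redundancy to avoid it.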

SLIDE 60IS 202 – FALL 2003 Lecture Overview Review –Databases and Database Design –Database Life Cycle –ER Diagrams Database Design Normalization Discussion Questions

SLIDE 61IS 202 – FALL 2003 Questions: Brooke Maury Discussion Questions on Hoffer & McFadden: If the goal of the relational database model is to encode a ‘conceptual’ design into a logical design, is it possible that improved technology and the development of new modeling techniques will supplant the RDBMS? Specifically, what impact will XML and the development of document engineering have on organizing information in multiple normalized tables? Conversely, what does the relational model have that would be lost if a conceptual design was encoded in another model?

SLIDE 62IS 202 – FALL 2003 Questions: Brooke Maury The drive to develop the RDBMS was in part motivated by a need to minimize the space required and improve the performance of database systems by removing redundancies. What impact will very inexpensive data storage and computing power have on the relational database model and the third normal form especially?

SLIDE 63IS 202 – FALL 2003 Questions: Shane Ahern Discussion Questions for "Logical Database Design and the Relational Model" Is the normalization process described really necessary? When I design a database schema, I find that by thinking of tables in terms of the entities they represent (employees, sales, events), I avoid most of the problems that the normalization process seeks to address (e.g., salesperson and region in a Sales table: salesperson is clearly a distinct entity from sales). If the formal process described in the article is not followed, are there potential pitfalls that might lead to problems with your database schema?

SLIDE 64IS 202 – FALL 2003 Questions: Shane Ahern The article points out that "the relational model does not yet directly support supertype/subtype relationships." Once the tables in a relational database have been decomposed to third normal form, the database is efficient from a systems point of view, but the tables no longer offer a representation of the data that is intuitive to humans. The object-oriented model more accurately mirrors the way we think about the concepts that we wish to store in databases. So perhaps object-oriented database systems are worth considering. What about XML databases?

SLIDE 65IS 202 – FALL 2003 Questions: Arthur Law The three models that we have been presented with (the Entity Relationship Model, the NIAM Model, and the Object Oriented Model) all enforce a specific thought process in organizing and relating items in a database. With all of our recent discussion of computers understanding natural language, are these methods now out of date with how we should be organizing information? Should we use artificial intelligence or learning algorithms to statistically determine the relationship between entities, or is there still value in using these models?

SLIDE 66IS 202 – FALL 2003 Questions: Arthur Law The models were developed approximately one decade apart, and a quick Google search shows that companies are using databases built on each of the three. However, as new models arise there doesn't seem to be much interest in migrating from one data model to another, which makes sense given that an organization using a given model probably finds that it works. Now with the proliferation of XML, we see more information being shared between organizations, so are we fated for an expensive and lengthy translation process between databases? Or should all DB administrators be responsible for upgrading to the latest model?

SLIDE 67IS 202 – FALL 2003 Lecture Overview Review –Databases and Database Design –Database Life Cycle –ER Diagrams Database Design Normalization Discussion Questions Next Time/Readings

SLIDE 68IS 202 – FALL 2003 Next Time Guest Lecture – Bob Glushko on XML and “Document Engineering” Readings on Class website No assigned discussion questions (but bring your questions on the readings)