Normalization & The Relational Model
SIMS 202: Information Organization and Retrieval
Prof. Ray Larson, UC Berkeley SIMS
IS 202 – Fall 2005, 2005.10.11

Lecture Overview
– Review
   – Databases and Database Design
   – Database Life Cycle
   – ER Diagrams
   – Database Design
– Relational Operations
– Normalization
– Discussion Questions

Models (1)
[Diagram: Applications 1 to 4, each with its own conceptual requirements and External Model, linked to a shared Conceptual Model, Logical Model, and Internal Model]

Database System Life Cycle
[Diagram: a cycle of six stages]
1. Design
2. Physical Creation
3. Conversion
4. Integration
5. Operations
6. Growth, Change, & Maintenance

Another View of the Life Cycle
[Diagram: the same stages in a different layout: Design (1), Physical Creation (2), Conversion (3), Integration (4), Operations (5), and Growth, Change (6)]

Database Design Process
[Diagram: Applications 1 to 4, each with its own conceptual requirements and External Model, linked to a shared Conceptual Model, Logical Model, and Internal Model]

Entity
An entity is an object in the real world (or even in imaginary worlds) about which we want or need to maintain information
– Persons (e.g., customers in a business, employees, authors)
– Things (e.g., purchase orders, meetings, parts, companies)
[Diagram: an Employee entity]

Attributes
Attributes are the significant properties or characteristics of an entity that help identify it and provide the information needed to interact with it or use it. (This is the metadata for the entities.)
[Diagram: the Employee entity with attributes Name (Last, Middle, First), SSN, Age, Birthdate, and Projects]

Relationships
Relationships are the associations between entities. They can involve one or more entities and belong to particular relationship types:
– One to One
– One to Many
– Many to Many

Relationships
[Diagram: a binary relationship, Student Attends Class, and a ternary relationship, Supplies project parts, among Supplier, Part, and Project]

Types of Relationships
Concerned only with the cardinality of the relationship
[Diagram in Chen ER notation: Truck Assigned Employee (1:1); Project Assigned Employee (1:n); Project Assigned Employee (m:n)]
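To make the three cardinalities concrete, here is a minimal sketch, using Python's built-in sqlite3 module, of one common way to carry them into relational tables: a UNIQUE foreign key for 1:1, a plain foreign key for 1:n, and a junction table keyed on the pair for m:n. The table and column names (Truck, Employee, Project, Assigned, emp_no, proj_no) and the sample rows are hypothetical stand-ins for the slide's entities, not a schema from the lecture.

```python
import sqlite3

def build_schema():
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces REFERENCES only when this is on
    conn.executescript("""
        CREATE TABLE Project  (proj_no INTEGER PRIMARY KEY, title TEXT);
        -- 1:n: many employees may point at the same project
        CREATE TABLE Employee (emp_no INTEGER PRIMARY KEY, name TEXT,
                               proj_no INTEGER REFERENCES Project(proj_no));
        -- 1:1: UNIQUE stops a second truck from being assigned to the same employee
        CREATE TABLE Truck    (truck_no INTEGER PRIMARY KEY,
                               emp_no INTEGER UNIQUE REFERENCES Employee(emp_no));
        -- m:n: a junction table whose key is the (project, employee) pair
        CREATE TABLE Assigned (proj_no INTEGER REFERENCES Project(proj_no),
                               emp_no  INTEGER REFERENCES Employee(emp_no),
                               PRIMARY KEY (proj_no, emp_no));
    """)
    return conn

conn = build_schema()
conn.executemany("INSERT INTO Project VALUES (?, ?)", [(1, "Alpha"), (2, "Beta")])
conn.executemany("INSERT INTO Employee VALUES (?, ?, ?)", [(10, "Ann", 1), (11, "Bob", 1)])
conn.execute("INSERT INTO Truck VALUES (100, 10)")
conn.executemany("INSERT INTO Assigned VALUES (?, ?)", [(1, 10), (2, 10), (1, 11)])

try:
    conn.execute("INSERT INTO Truck VALUES (101, 10)")  # a second truck for employee 10
    second_truck_allowed = True
except sqlite3.IntegrityError:
    second_truck_allowed = False
```

Here the 1:1 rule is enforced by the UNIQUE constraint, while employee 10 can still appear in two Assigned rows because the m:n table only forbids repeating the same pair.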

Database Design Process
[Diagram: Applications 1 to 4, each with its own conceptual requirements and External Model, linked to a shared Conceptual Model, Logical Model, and Internal Model]

Requirements Analysis
Conceptual Requirements
– Systems Analysis Process
   – Examine all of the information sources used in existing applications
   – Identify the characteristics of each data element (numeric, text, date/time, etc.)
   – Examine the tasks carried out using the information
   – Examine results or reports created using the information

Database Design Process
[Diagram: Applications 1 to 4, each with its own conceptual requirements and External Model, linked to a shared Conceptual Model, Logical Model, and Internal Model]

Conceptual Design
Conceptual Model
– Merge the collective needs of all applications
– Determine what Entities are being used
   – An entity is some object about which information is to be maintained
– What are the Attributes of those entities?
   – Properties or characteristics of the entity
   – Which attributes uniquely identify the entity?
– What are the Relationships between entities?
   – How do the entities interact with each other?

Developing a Conceptual Model
– An overall view of the database that integrates all the needed information discovered during requirements analysis
– Elements of the Conceptual Model are represented by Entity-Relationship (ER) diagrams, which show the meanings and relationships of those elements independent of any particular database system or implementation details
– Can also be represented using other modeling tools (such as UML)

Database Design Process
[Diagram: Applications 1 to 4, each with its own conceptual requirements and External Model, linked to a shared Conceptual Model, Logical Model, and Internal Model]

Logical Design
Logical Model
– How is each entity and relationship represented in the data model of the DBMS?
   – Hierarchic? Network? Relational? Object-Oriented?

Database Design Process
[Diagram: Applications 1 to 4, each with its own conceptual requirements and External Model, linked to a shared Conceptual Model, Logical Model, and Internal Model]

Physical Design
Internal Model
– Choice of index file structure
– Choice of data storage formats
– Choice of disk layout

Database Design Process
[Diagram: Applications 1 to 4, each with its own conceptual requirements and External Model, linked to a shared Conceptual Model, Logical Model, and Internal Model]

Database Application Design
External Model
– User views of the integrated database
– Making the old (or updated) applications work with the new database design

Lecture Overview
– Review
   – Databases and Database Design
   – Database Life Cycle
   – ER Diagrams
   – Database Design
– Relational Operations
– Normalization
– Discussion Questions

Relational Algebra Operations
– Restrict
– Project
– Product
– Union
– Intersect
– Difference
– Join
– Divide
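As a rough illustration, several of these operations map directly onto Python sets of tuples. This sketch covers only Product, Union, Intersect, Difference, and Divide, and assumes union-compatible operands (same attributes in the same order) where the operation requires it. The relations R, S, T and the supplier/part data used for Divide are hypothetical.

```python
from itertools import product

# Union, Intersect and Difference assume R and S are union-compatible.
R = {("A1", "B1"), ("A2", "B1")}
S = {("A2", "B1"), ("A3", "B2")}

union = R | S        # tuples in R, in S, or in both
intersect = R & S    # tuples in both R and S
difference = R - S   # tuples in R but not in S

# Product concatenates every tuple of one relation with every tuple of another.
T = {("C1",), ("C2",)}
cartesian = {r + t for r, t in product(R, T)}

def divide(dividend, divisor):
    """Divide a binary relation {(a, b)} by a unary relation {(b,)}:
    return every a that is paired with all of the b values in the divisor."""
    candidates = {a for a, _ in dividend}
    return {(a,) for a in candidates
            if all((a, b) in dividend for (b,) in divisor)}

# Hypothetical data: which suppliers supply every listed part?
supplies = {("S1", "P1"), ("S1", "P2"), ("S2", "P1")}
all_parts = {("P1",), ("P2",)}
suppliers_of_all_parts = divide(supplies, all_parts)
```

Because these are Python sets, duplicate tuples can never arise, which matches the relational model's view of a relation as a set.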

Restrict
Extracts specified tuples (rows) from a specified relation (table)
– Restrict is also known as “Select”
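A minimal sketch of Restrict, assuming a relation is represented as a list of Python dicts (one dict per tuple). The condition is an ordinary predicate; in SQL terms it plays the role of the WHERE clause. The Part rows and their column names are hypothetical.

```python
def restrict(relation, condition):
    """Return the tuples (rows) of `relation` for which `condition` is true.
    Every attribute (column) of the surviving rows is kept."""
    return [row for row in relation if condition(row)]

# Hypothetical Part rows; the lecture does not list its table contents.
parts = [
    {"part_no": 1, "name": "large red widget", "price": 9.95},
    {"part_no": 2, "name": "small blue widget", "price": 2.50},
    {"part_no": 3, "name": "large blue widget", "price": 7.25},
]

cheap_parts = restrict(parts, lambda row: row["price"] < 5)
red_widgets = restrict(parts, lambda row: row["name"] == "large red widget")
```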

Project
Extracts specified attributes (columns) from a specified relation
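A matching sketch of Project over the same list-of-dicts representation (an assumption of this example). Because a relation is a set, projecting can leave identical tuples behind; the relational operation removes them, whereas SQL's SELECT keeps them unless DISTINCT is given. The Invoice rows are hypothetical.

```python
def project(relation, attributes):
    """Keep only the named attributes (columns) of each row, then drop duplicate
    tuples, because a relation is a set. First-occurrence order is preserved."""
    seen = set()
    result = []
    for row in relation:
        values = tuple(row[a] for a in attributes)
        if values not in seen:
            seen.add(values)
            result.append(dict(zip(attributes, values)))
    return result

# Hypothetical Invoice rows: customer 7 has two invoices.
invoices = [
    {"invoice_no": 501, "cust_no": 7},
    {"invoice_no": 502, "cust_no": 8},
    {"invoice_no": 503, "cust_no": 7},
]

customers_with_invoices = project(invoices, ["cust_no"])
```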

Join
Builds a relation from two specified relations consisting of all possible concatenated pairs of tuples, one from each of the two relations, such that in each pair the two tuples satisfy some condition (e.g., equal values in a given column).

(Natural or Inner) Join example:
R          S          R JOIN S
A1 B1      B1 C1      A1 B1 C1
A2 B1      B2 C2      A2 B1 C1
A3 B2      B3 C3      A3 B2 C2
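The natural join in the example can be reproduced with a simple nested-loop sketch: pair every tuple of R with every tuple of S and keep the pairs that agree on the shared attribute. The attribute names A, B, and C are assumptions for the unlabeled columns, and real DBMSs usually use faster join algorithms than nested loops.

```python
def natural_join(r, s):
    """Nested-loop natural join of two lists of dict rows. The shared attributes
    are taken from the first row of each operand (rows are assumed uniform)."""
    if not r or not s:
        return []
    common = [a for a in r[0] if a in s[0]]
    result = []
    for x in r:
        for y in s:
            if all(x[a] == y[a] for a in common):
                result.append({**x, **y})  # each shared attribute appears once
    return result

R = [{"A": "A1", "B": "B1"}, {"A": "A2", "B": "B1"}, {"A": "A3", "B": "B2"}]
S = [{"B": "B1", "C": "C1"}, {"B": "B2", "C": "C2"}, {"B": "B3", "C": "C3"}]

joined = natural_join(R, S)
```

The S tuple (B3, C3) has no partner in R, so it does not appear in the result, which is what makes this an inner join.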

ER Diagram: Acme Widget Co.
[Diagram: entities Customer, Invoice, Line-Item, Part, Sales-Rep, and Hourly Employee; relationships Orders, Writes, Contains, and ISA; attributes include Cust#, Invoice#, Part#, Sales Rep#, Emp#, Count, Price, Quantity, and Wage]

Join Items for Relational DB
[Figure: the Customer, Invoice, Line_item, and Parts tables, linked by their shared key columns]

Relational Operations
What is the name of the customer who ordered Large Red Widgets?
– Restrict the “large red widget” row from Part as temp1
– Join temp1 with Line-item on Part# as temp2
– Join temp2 with Invoice on Invoice# as temp3
– Join temp3 with Customer on Cust# as temp4
– Project Company from temp4 as answer
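The five steps above can be traced end to end with a small, self-contained sketch. It assumes list-of-dict relations and hypothetical sample data; the column names part_no, invoice_no, and cust_no stand in for Part#, Invoice#, and Cust#, and the company names are invented for illustration.

```python
def restrict(relation, condition):
    return [row for row in relation if condition(row)]

def natural_join(r, s):
    # Join on every attribute name the two operands share (nested loops).
    if not r or not s:
        return []
    common = [a for a in r[0] if a in s[0]]
    return [{**x, **y} for x in r for y in s
            if all(x[a] == y[a] for a in common)]

def project(relation, attributes):
    result = []
    for row in relation:
        t = {a: row[a] for a in attributes}
        if t not in result:  # drop duplicates: a relation is a set
            result.append(t)
    return result

# Hypothetical sample data.
part = [{"part_no": 1, "name": "large red widget"},
        {"part_no": 2, "name": "small blue widget"}]
line_item = [{"invoice_no": 501, "part_no": 1, "quantity": 10},
             {"invoice_no": 502, "part_no": 2, "quantity": 5},
             {"invoice_no": 503, "part_no": 1, "quantity": 3}]
invoice = [{"invoice_no": 501, "cust_no": 7},
           {"invoice_no": 502, "cust_no": 8},
           {"invoice_no": 503, "cust_no": 7}]
customer = [{"cust_no": 7, "company": "Fenwick Toys"},
            {"cust_no": 8, "company": "Orbit Supply"}]

temp1 = restrict(part, lambda row: row["name"] == "large red widget")
temp2 = natural_join(temp1, line_item)   # on part_no
temp3 = natural_join(temp2, invoice)     # on invoice_no
temp4 = natural_join(temp3, customer)    # on cust_no
answer = project(temp4, ["company"])
```

Customer 7 ordered the widget on two invoices, so temp4 holds two rows, and the final Project collapses them into one.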

SQL
– Database definition and querying
   – Can be used as an interactive query language
   – Can be embedded in programs
– Relational calculus combines Restrict, Project, and Join operations in a single command: SELECT

SELECT Syntax
SELECT [DISTINCT] attr1, attr2, …, attr3
FROM rel1 r1, rel2 r2, … rel3 r3
WHERE condition1 {AND | OR} condition2
ORDER BY attr1 [DESC], attr3 [DESC]

SQL SELECT
SELECT c.COMPANY
FROM Customer c, Parts p, Invoice i, Line_Items z
WHERE c.Cust# = i.Cust#
  AND i.Invoice# = z.Invoice#
  AND z.Part# = p.Part#
  AND p.Name = 'large red widget';
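A hedged sketch of running the same query with Python's built-in sqlite3 module, using hypothetical sample rows. The # columns are renamed (Cust# to Cust_no, and so on) to avoid quoting # in identifiers. The sketch also adds DISTINCT and ORDER BY from the syntax slide: unlike the relational Project, SQL keeps duplicate rows, and this customer ordered the widget on two invoices.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customer   (Cust_no INTEGER PRIMARY KEY, Company TEXT);
    CREATE TABLE Invoice    (Invoice_no INTEGER PRIMARY KEY, Cust_no INTEGER);
    CREATE TABLE Parts      (Part_no INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE Line_Items (Invoice_no INTEGER, Part_no INTEGER, Quantity INTEGER,
                             PRIMARY KEY (Invoice_no, Part_no));
    -- Hypothetical sample data
    INSERT INTO Customer   VALUES (7, 'Fenwick Toys'), (8, 'Orbit Supply');
    INSERT INTO Invoice    VALUES (501, 7), (502, 8), (503, 7);
    INSERT INTO Parts      VALUES (1, 'large red widget'), (2, 'small blue widget');
    INSERT INTO Line_Items VALUES (501, 1, 10), (502, 2, 5), (503, 1, 3);
""")

QUERY = """
    SELECT DISTINCT c.Company
    FROM Customer c, Parts p, Invoice i, Line_Items z
    WHERE c.Cust_no = i.Cust_no
      AND i.Invoice_no = z.Invoice_no
      AND z.Part_no = p.Part_no
      AND p.Name = ?
    ORDER BY c.Company
"""
# The part name is passed as a parameter rather than pasted into the SQL text.
rows = conn.execute(QUERY, ("large red widget",)).fetchall()
```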

Lecture Overview
– Review
   – Databases and Database Design
   – Database Life Cycle
   – ER Diagrams
   – Database Design
– Relational Operations
– Normalization
– Discussion Questions

Normalization
– Normalization theory is based on the observation that relations with certain properties are more effective for inserting, updating, and deleting data than other sets of relations containing the same data
– Normalization is a multi-step process that begins with an “unnormalized” relation

Normal Forms
– First Normal Form (1NF)
– Second Normal Form (2NF)
– Third Normal Form (3NF)
– Boyce-Codd Normal Form (BCNF)
– Fourth Normal Form (4NF)
– Fifth Normal Form (5NF)

Normalization
[Diagram: the normal forms as successive steps up to Boyce-Codd and higher, each step adding a requirement]
– 1NF: atomic values only; functional dependency of nonkey attributes on the primary key
– 2NF: full functional dependency of nonkey attributes on the primary key
– 3NF: no transitive dependency between nonkey attributes
– BCNF: all determinants are candidate keys
– 4NF: a single multivalued dependency

Unnormalized Relations
– The first step in normalization is to convert the data into a two-dimensional table
– In unnormalized relations, data can repeat within a column
(The following is a highly contrived example that has only a very vague resemblance to the implementation of the Phone/Photo project database from IS202 in 2004 …)

Unnormalized Relations

First Normal Form
To move to First Normal Form, a relation must contain only atomic values at each row and column
– No repeating groups
– A column or set of columns is called a candidate key when its values can uniquely identify the row in the relation
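A minimal sketch of the 1NF step, assuming each unnormalized record stores its repeating group as a Python list in one cell. Each element becomes its own flat row, and the other attributes are copied. The names follow the lecture's example people, but the statuses, picture numbers, and the helper name to_first_normal_form are hypothetical.

```python
# Hypothetical unnormalized rows: "pictures" is a repeating group inside one cell.
unnormalized = [
    {"person": "John White", "status": "Staff", "pictures": [1, 2]},
    {"person": "Beth Little", "status": "Student", "pictures": [1]},
]

def to_first_normal_form(rows, group, column):
    """Replace the repeating group `group` with one flat row per element,
    stored under the atomic attribute `column`; other attributes are copied."""
    flat = []
    for row in rows:
        base = {k: v for k, v in row.items() if k != group}
        for value in row[group]:
            flat.append({**base, column: value})
    return flat

first_nf = to_first_normal_form(unnormalized, "pictures", "picture_no")
# The candidate key of first_nf is now the pair (person, picture_no).
```

A person whose list of pictures is empty produces no rows at all, which mirrors the 1NF insertion anomaly: with picture_no in the key, a person without a picture has nowhere to go.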

First Normal Form

1NF Storage Anomalies
– Insertion: A new person who has not yet taken a picture has no Picture #; since Picture # is part of the key, we can't insert that person
– Insertion: If a person is known and likely to be photographed but hasn't been yet, there is no way to include that person in the database
– Update: If a person changes status (e.g., Mary Jones becomes a Student), we have to change multiple rows in the database
– Deletion (type 1): Deleting a Person record may also delete all information about the People in the pictures
– Deletion (type 2): When there are functional dependencies (like Object and Object_features), changing one item eliminates other information

Second Normal Form
A relation is said to be in Second Normal Form when it is in 1NF and every nonkey attribute is fully functionally dependent on the primary key
– That is, every nonkey attribute needs the full primary key for unique identification
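One way to spot a 2NF violation is to look for a nonkey attribute that is already determined by part of a composite key. This sketch tests functional dependencies against a table instance; note that data can only refute a dependency, not prove it, so a real design decision should rest on the business rules. The rows, the key (person, picture_no), and the helper names fd_holds and partial_dependencies are hypothetical.

```python
from itertools import combinations

def fd_holds(rows, lhs, rhs):
    """True if rows that agree on every lhs attribute also agree on every rhs
    attribute in this instance (an instance can only refute a dependency)."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[a] for a in rhs)
        if seen.setdefault(key, val) != val:
            return False
    return True

def partial_dependencies(rows, key, nonkey):
    """List (proper subset of key, attribute) pairs where part of the key
    already determines a nonkey attribute: each one blocks 2NF."""
    found = []
    for size in range(1, len(key)):
        for subset in combinations(key, size):
            for attr in nonkey:
                if fd_holds(rows, list(subset), [attr]):
                    found.append((subset, attr))
    return found

# Hypothetical 1NF rows with composite key (person, picture_no).
rows = [
    {"person": "John White", "picture_no": 1, "status": "Staff", "object": "bookbag"},
    {"person": "John White", "picture_no": 2, "status": "Staff", "object": "umbrella"},
    {"person": "Beth Little", "picture_no": 1, "status": "Student", "object": "camera"},
]

violations = partial_dependencies(rows, ["person", "picture_no"], ["status", "object"])
```

Here status depends on person alone, so moving person and status into their own table is the 2NF decomposition the lecture's Person table reflects.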

Second Normal Form: Person Table

Second Normal Form: People Table

Second Normal Form: Picture Table

1NF Storage Anomalies Removed
– Insertion: We can now enter new Persons who haven't yet taken pictures
– Insertion: We can now enter People who haven't been photographed
– Deletion (type 1): If Charles Brown withdraws his photos, the corresponding tuples in the Person and Picture tables can be deleted without losing information on David Rosen
– Update: If John White takes a third picture and has changed status (e.g., graduated), we only need to change the Person table in one place

2NF Storage Anomalies
– Insertion: We cannot enter the fact that a particular object has a particular feature unless it is associated with a particular picture
– Deletion: If John White describes some other object that Beth Little has while shopping, we lose the fact that the bookbag is blue
– Update: If the features of an object change, we have to update multiple occurrences of the object's features

Third Normal Form
A relation is said to be in Third Normal Form if it is in 2NF and there are no transitive functional dependencies between nonkey attributes
– When one nonkey attribute can be determined by one or more other nonkey attributes, there is said to be a transitive functional dependency
The Object_Feature column in the Picture table is determined by the Object column
– Object_Feature is transitively functionally dependent on the key through Object, so Picture is not in 3NF
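The 3NF fix can be sketched with Python sets of tuples: split the transitive dependency Object → Object_Feature into its own table, then check that a natural join on Object rebuilds the original rows. The picture_id key and the sample rows are hypothetical stand-ins for the lecture's Picture table.

```python
# Hypothetical Picture rows: (picture_id, object, object_feature).
picture = {
    (1, "bookbag", "blue"),
    (2, "bookbag", "blue"),
    (3, "umbrella", "black"),
}

# Split off the transitive dependency Object -> Object_Feature.
picture_3nf = {(pid, obj) for pid, obj, _ in picture}
object_table = {(obj, feat) for _, obj, feat in picture}

# A natural join on Object should rebuild the original relation exactly
# (a lossless decomposition), with no rows lost and none invented.
rejoined = {(pid, obj, feat)
            for pid, obj in picture_3nf
            for obj2, feat in object_table
            if obj == obj2}
```

After the split, "the bookbag is blue" is stored once in object_table instead of once per picture, which removes the 2NF update and deletion anomalies.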

Third Normal Form: Picture Table

Third Normal Form: Object Table

2NF Storage Anomalies Removed
– Insertion: We can now enter the fact that an object has a particular feature
– Deletion: If John White describes some other object that Beth Little has while shopping, we don't lose the fact that the bookbag is blue
– Update: The features for each object appear only once

Boyce-Codd Normal Form
Most 3NF relations are also BCNF relations. A 3NF relation can fail to be in BCNF only when all of the following hold:
– The candidate keys in the relation are composite keys (they are not single attributes)
– There is more than one candidate key in the relation
– The keys are not disjoint; that is, some attributes are common to the keys
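A hedged sketch of a BCNF check: compute each determinant's attribute closure and flag any non-trivial dependency whose left side is not a superkey. The example uses a textbook-style schema chosen because it meets all three conditions above; the attribute names student, subject, and teacher and their dependencies are assumptions, not the lecture's tables. Both candidate keys, {student, subject} and {student, teacher}, are composite and share student.

```python
def closure(attrs, fds):
    """Attribute closure: every attribute determined by `attrs` under `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def bcnf_violations(schema, fds):
    """Non-trivial dependencies whose determinant is not a superkey."""
    return [(lhs, rhs) for lhs, rhs in fds
            if not rhs <= lhs and closure(lhs, fds) != schema]

schema = frozenset({"student", "subject", "teacher"})
fds = [
    (frozenset({"student", "subject"}), frozenset({"teacher"})),
    (frozenset({"teacher"}), frozenset({"subject"})),  # each teacher teaches one subject
]

violations = bcnf_violations(schema, fds)
```

The relation is in 3NF because subject is part of a candidate key, yet teacher is a determinant that is not a candidate key, so it is not in BCNF.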

Most 3NF Relations Are Also BCNF – Is This One?

BCNF Relations

Additional Issues
– Why separate Person and People?
   – They are really all People/Persons in different roles
– Shouldn't a picture have a unique ID regardless of who is in it?
– Can't we have multiple people in the same picture, multiple objects, etc.?
– Can't objects have multiple characteristics?

BCNF Relations

BCNF Added Capabilities
– We can now have a picture with no (identified) people in it
– We can have multiple objects, activities, and people associated with each picture

Fourth Normal Form
A relation is in Fourth Normal Form if it is in BCNF and every non-trivial multivalued dependency in it has a superkey as its determinant
– Eliminate non-trivial multivalued dependencies by projecting into simpler tables
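A small sketch of a multivalued dependency and its 4NF decomposition, assuming a hypothetical table that records which people and which objects appear in each picture. If people and objects are independent, every person must be stored with every object of the same picture (Picture ->> Person). All three columns form the key, so there is no non-trivial functional dependency and the table is in BCNF, but not in 4NF. Projecting it into two tables removes the forced combinations.

```python
from itertools import product

# Hypothetical (picture, person, object) rows with independent people and objects.
picture_person_object = {
    (1, "John White", "bookbag"), (1, "John White", "umbrella"),
    (1, "Beth Little", "bookbag"), (1, "Beth Little", "umbrella"),
    (2, "Mary Jones", "camera"),
}

def mvd_holds(rel):
    """Check Picture ->> Person: for each picture, the stored (person, object)
    pairs must be every person combined with every object."""
    for pic in {p for p, _, _ in rel}:
        people = {per for p, per, _ in rel if p == pic}
        objects = {obj for p, _, obj in rel if p == pic}
        stored = {(per, obj) for p, per, obj in rel if p == pic}
        if stored != set(product(people, objects)):
            return False
    return True

# 4NF decomposition: one table per independent multivalued fact.
picture_person = {(p, per) for p, per, _ in picture_person_object}
picture_object = {(p, obj) for p, _, obj in picture_person_object}

rejoined = {(p, per, obj)
            for p, per in picture_person
            for p2, obj in picture_object if p == p2}
```

Adding one more object to picture 1 in the original table would require a new row for every person in that picture; in the decomposed tables it is a single row in picture_object.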

Fifth Normal Form
A relation is in 5NF if every join dependency in the relation is implied by the candidate keys of the relation
– This implies that relations that have been decomposed in the previous normal forms can be recombined via natural joins to recreate the original relation
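A join dependency is easiest to see with a three-way decomposition, using hypothetical data in the spirit of the ternary Supplier, Part, Project relationship from earlier in the lecture. The labels S1, P1, J1, and so on are placeholders. Joining any two projections can invent a spurious tuple, but joining all three recovers exactly the original rows, so the relation satisfies the join dependency on (SP, PJ, JS). Because the only key is all three attributes, that dependency is not implied by the key, and the relation is not in 5NF until it is split into the three projections.

```python
# Hypothetical (supplier, part, project) rows.
spj = {("S1", "P1", "J2"), ("S1", "P2", "J1"), ("S2", "P1", "J1"), ("S1", "P1", "J1")}

sp = {(s, p) for s, p, _ in spj}
pj = {(p, j) for _, p, j in spj}
js = {(j, s) for s, _, j in spj}

# Joining only two projections on part creates a spurious tuple...
sp_pj = {(s, p, j) for s, p in sp for p2, j in pj if p == p2}
# ...which joining the third projection (on project and supplier) removes again.
sp_pj_js = {(s, p, j) for s, p, j in sp_pj if (j, s) in js}
```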

Fifth Normal Form Relations: People Table

Normalizing to Death
– Normalization splits database information across multiple tables
– To retrieve complete information from a normalized database, the JOIN operation must be used
– JOIN tends to be expensive in terms of processing time, and very large joins are very expensive

Lecture Overview
– Review
   – Databases and Database Design
   – Database Life Cycle
   – ER Diagrams
   – Database Design
– Relational Operations
– Normalization
– Discussion

