2002.10.17 - SLIDE 1IS 202 – FALL 2002 Prof. Ray Larson & Prof. Marc Davis UC Berkeley SIMS Tuesday and Thursday 10:30 am - 12:00 pm Fall 2002


1 2002.10.17 - SLIDE 1IS 202 – FALL 2002 Prof. Ray Larson & Prof. Marc Davis UC Berkeley SIMS Tuesday and Thursday 10:30 am - 12:00 pm Fall 2002 http://www.sims.berkeley.edu/academics/courses/is202/f02/ SIMS 202: Information Organization and Retrieval Lecture 14: Database Design

2 2002.10.17 - SLIDE 2IS 202 – FALL 2002 Lecture Overview Review –Databases and Database Design –Database Life Cycle –ER Diagrams Database Design Normalization Web-Enabled Databases

3 2002.10.17 - SLIDE 3IS 202 – FALL 2002 Lecture Overview Review –Databases and Database Design –Database Life Cycle –ER Diagrams Database Design Normalization Web-Enabled Databases

4 2002.10.17 - SLIDE 4IS 202 – FALL 2002 Models (1) [Diagram: Applications 1 to 4, each with its own Conceptual requirements and External Model, together with the Conceptual Model, Logical Model, and Internal Model]

5 2002.10.17 - SLIDE 5IS 202 – FALL 2002 Database System Life Cycle 1. Design; 2. Physical Creation; 3. Conversion; 4. Integration; 5. Operations; 6. Growth, Change, & Maintenance

6 2002.10.17 - SLIDE 6IS 202 – FALL 2002 Another View of the Life Cycle 1. Design; 2. Physical Creation; 3. Conversion; 4. Integration; 5. Operations; 6. Growth, Change

7 2002.10.17 - SLIDE 7IS 202 – FALL 2002 Database Design Process [Diagram: Applications 1 to 4, each with its own Conceptual requirements and External Model, together with the Conceptual Model, Logical Model, and Internal Model]

8 2002.10.17 - SLIDE 8IS 202 – FALL 2002 Entity An Entity is an object in the real world (or even imaginary worlds) about which we want or need to maintain information –Persons (e.g.: customers in a business, employees, authors) –Things (e.g.: purchase orders, meetings, parts, companies) Employee

9 2002.10.17 - SLIDE 9IS 202 – FALL 2002 Attributes Attributes are the significant properties or characteristics of an entity that help identify it and provide the information needed to interact with it or use it (This is the Metadata for the entities) [ER diagram: Employee entity with attributes Name (Last, Middle, First), SSN, Age, Birthdate, Projects]

10 2002.10.17 - SLIDE 10IS 202 – FALL 2002 Relationships Relationships are the associations between entities They can involve one or more entities and belong to particular relationship types –One to One –One to Many –Many to Many

11 2002.10.17 - SLIDE 11IS 202 – FALL 2002 Relationships [ER diagrams: Student Attends Class; Supplier "Supplies project parts" relationship linking Supplier, Part, and Project]

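12 2002.10.17 - SLIDE 12IS 202 – FALL 2002 Types of Relationships Concerned only with cardinality of relationship [ER diagrams in Chen notation: "Assigned" relationships between Employee and Truck and between Employee and Project, labeled with cardinalities 1, n, and m to illustrate one-to-one, one-to-many, and many-to-many]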

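13 2002.10.17 - SLIDE 13IS 202 – FALL 2002 More Complex Relationships [ER diagrams: an Evaluation relationship among Project, Employee, and Manager with cardinality labels 1/n/n, 1/1/1, n/n/1; an Assigned relationship between Project and Employee with attributes SSN, Project, Date and labels 4(2-10) and 1; a Manages relationship from Employee to Employee with roles Manages and Is Managed By, labeled 1 and n]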

14 2002.10.17 - SLIDE 14IS 202 – FALL 2002 Weak Entities Owe existence entirely to another entity [ER diagram: Order Contains Order-line; attribute labels Invoice#, Part#, Rep#, Quantity, Invoice#]
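
One way to see what "owes existence entirely to another entity" means in a relational DBMS is a cascading foreign key. The following is a minimal sketch using Python's built-in sqlite3 module; the table names, the assignment of Rep# to the order, and the sample values are assumptions made for illustration, not taken from the slide's diagram.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE orders (invoice_no INTEGER PRIMARY KEY, rep_no INTEGER);
-- Weak entity: its key includes the owner's key, and rows vanish with the owner.
CREATE TABLE order_line (
    invoice_no INTEGER NOT NULL
        REFERENCES orders(invoice_no) ON DELETE CASCADE,
    part_no    INTEGER NOT NULL,
    quantity   INTEGER NOT NULL,
    PRIMARY KEY (invoice_no, part_no)
);
INSERT INTO orders VALUES (1001, 7);
INSERT INTO order_line VALUES (1001, 55, 3), (1001, 56, 1);
""")

lines_before = conn.execute("SELECT COUNT(*) FROM order_line").fetchone()[0]
conn.execute("DELETE FROM orders WHERE invoice_no = 1001")  # remove the owning order
lines_after = conn.execute("SELECT COUNT(*) FROM order_line").fetchone()[0]
```

Deleting the order removes its order lines as well, since an order line has no meaning without its order.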

15 2002.10.17 - SLIDE 15IS 202 – FALL 2002 Supertype and Subtype Entities [ER diagram: Employee supertype connected by "Is one of" to subtypes Clerk, Sales-rep, and Other; Invoice entity; relationships Sold and Manages]

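16 2002.10.17 - SLIDE 16IS 202 – FALL 2002 Many to Many Relationships [ER diagrams: Employee "Is Assigned" Project as a many-to-many relationship (SSN, Proj#); the same relationship resolved through a Project Assignment entity with attributes SSN, Proj#, Hours]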
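
The Project Assignment entity above can be sketched as a third table whose key combines the keys of the two entities it links. This is an illustrative sketch with Python's sqlite3 module; column names follow the slide labels (SSN, Proj#, Hours), while the table names and sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE employee (ssn TEXT PRIMARY KEY, name TEXT);
CREATE TABLE project  (proj_no INTEGER PRIMARY KEY, title TEXT);
-- One row per (employee, project) pair; Hours describes the pair, not either entity.
CREATE TABLE project_assignment (
    ssn     TEXT    NOT NULL REFERENCES employee(ssn),
    proj_no INTEGER NOT NULL REFERENCES project(proj_no),
    hours   REAL,
    PRIMARY KEY (ssn, proj_no)
);
""")
conn.executemany("INSERT INTO employee VALUES (?, ?)",
                 [("111", "Ann"), ("222", "Bob")])
conn.executemany("INSERT INTO project VALUES (?, ?)",
                 [(1, "Catalog"), (2, "Archive")])
conn.executemany("INSERT INTO project_assignment VALUES (?, ?, ?)",
                 [("111", 1, 10.0), ("111", 2, 5.0), ("222", 1, 8.0)])

# An employee can have many projects, and a project can have many employees.
projects_for_ann = [row[0] for row in conn.execute(
    "SELECT proj_no FROM project_assignment WHERE ssn = ? ORDER BY proj_no",
    ("111",))]
staff_on_catalog = [row[0] for row in conn.execute(
    "SELECT ssn FROM project_assignment WHERE proj_no = ? ORDER BY ssn", (1,))]
```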

17 2002.10.17 - SLIDE 17IS 202 – FALL 2002 Lecture Overview Review –Databases and Database Design –Database Life Cycle –ER Diagrams Database Design Normalization Web-Enabled Databases

18 2002.10.17 - SLIDE 18IS 202 – FALL 2002 Database Design Process [Diagram: Applications 1 to 4, each with its own Conceptual requirements and External Model, together with the Conceptual Model, Logical Model, and Internal Model]

19 2002.10.17 - SLIDE 19IS 202 – FALL 2002 Database Design Process [Diagram: Applications 1 to 4, each with its own Conceptual requirements and External Model, together with the Conceptual Model, Logical Model, and Internal Model]

20 2002.10.17 - SLIDE 20IS 202 – FALL 2002 Requirements Analysis Conceptual Requirements –Systems Analysis Process Examine all of the information sources used in existing applications Identify the characteristics of each data element –Numeric –Text –Date/time –Etc. Examine the tasks carried out using the information Examine results or reports created using the information

21 2002.10.17 - SLIDE 21IS 202 – FALL 2002 Database Design Process [Diagram: Applications 1 to 4, each with its own Conceptual requirements and External Model, together with the Conceptual Model, Logical Model, and Internal Model]

22 2002.10.17 - SLIDE 22IS 202 – FALL 2002 Conceptual Design Conceptual Model –Merge the collective needs of all applications –Determine what Entities are being used Some object about which information is to be maintained –What are the Attributes of those entities? Properties or characteristics of the entity What attributes uniquely identify the entity? –What are the Relationships between entities? How do the entities interact with each other?

23 2002.10.17 - SLIDE 23IS 202 – FALL 2002 Developing a Conceptual Model Overall view of the database that integrates all the needed information discovered during the requirements analysis Elements of the Conceptual Model are represented by diagrams, Entity-Relationship or ER Diagrams, that show the meanings and relationships of those elements independent of any particular database systems or implementation details Can also be represented using other modeling tools (such as UML)

24 2002.10.17 - SLIDE 24IS 202 – FALL 2002 Database Design Process [Diagram: Applications 1 to 4, each with its own Conceptual requirements and External Model, together with the Conceptual Model, Logical Model, and Internal Model]

25 2002.10.17 - SLIDE 25IS 202 – FALL 2002 Logical Design Logical Model –How is each entity and relationship represented in the Data Model of the DBMS Hierarchic? Network? Relational? Object-Oriented?

26 2002.10.17 - SLIDE 26IS 202 – FALL 2002 Database Design Process [Diagram: Applications 1 to 4, each with its own Conceptual requirements and External Model, together with the Conceptual Model, Logical Model, and Internal Model]

27 2002.10.17 - SLIDE 27IS 202 – FALL 2002 Physical Design Internal Model –Choices of index file structure –Choices of data storage formats –Choices of disk layout
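
As a small illustration of one physical design choice, the sketch below adds an index and asks SQLite how it would execute the same lookup before and after. It uses Python's sqlite3 module; the employee table, its columns, and the index name are hypothetical, and the exact plan wording differs between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee (ssn TEXT PRIMARY KEY, last_name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [(str(i), f"Name{i % 50}", 20 + i % 40) for i in range(500)])

def query_plan(conn):
    """Return SQLite's plan description for a lookup by last name."""
    rows = conn.execute(
        "EXPLAIN QUERY PLAN SELECT * FROM employee WHERE last_name = ?",
        ("Name7",))
    return " ".join(row[-1] for row in rows)  # the last column holds the detail text

plan_before = query_plan(conn)  # no index on last_name yet
conn.execute("CREATE INDEX idx_employee_last ON employee (last_name)")
plan_after = query_plan(conn)   # the planner can now use the index
```

The logical design (the table and its columns) is unchanged; only the internal access path differs.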

28 2002.10.17 - SLIDE 28IS 202 – FALL 2002 Database Design Process [Diagram: Applications 1 to 4, each with its own Conceptual requirements and External Model, together with the Conceptual Model, Logical Model, and Internal Model]

29 2002.10.17 - SLIDE 29IS 202 – FALL 2002 Database Application Design External Model –User views of the integrated database –Making the old (or updated) applications work with the new database design
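
In a relational DBMS, a user view of the integrated database is commonly expressed as an SQL view. A minimal sketch with Python's sqlite3 module follows; the employee table, the directory view, and the data are assumptions chosen for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee (ssn TEXT PRIMARY KEY, name TEXT, salary REAL, dept TEXT);
INSERT INTO employee VALUES ('111', 'Ann', 50000, 'Sales'),
                            ('222', 'Bob', 60000, 'IT');
-- External model for a staff-directory application: no SSN, no salary.
CREATE VIEW directory AS SELECT name, dept FROM employee;
""")

cursor = conn.execute("SELECT * FROM directory ORDER BY name")
directory_columns = [column[0] for column in cursor.description]
directory_rows = cursor.fetchall()
```

The directory application sees only the columns it needs, while the underlying table can change without the application noticing, as long as the view is kept in step.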

30 2002.10.17 - SLIDE 30IS 202 – FALL 2002 Lecture Overview Review –Databases and Database Design –Database Life Cycle –ER Diagrams Database Design Normalization Web-Enabled Databases

31 2002.10.17 - SLIDE 31IS 202 – FALL 2002 Normalization Normalization theory is based on the observation that relations with certain properties are more effective in inserting, updating and deleting data than other sets of relations containing the same data Normalization is a multi-step process beginning with an “unnormalized” relation –Hospital example from Atre, S. Data Base: Structured Techniques for Design, Performance, and Management

32 2002.10.17 - SLIDE 32IS 202 – FALL 2002 Normal Forms First Normal Form (1NF) Second Normal Form (2NF) Third Normal Form (3NF) Boyce-Codd Normal Form (BCNF) Fourth Normal Form (4NF) Fifth Normal Form (5NF)

33 2002.10.17 - SLIDE 33IS 202 – FALL 2002 Normalization [Diagram of successive normal forms] 1NF: Functional dependency of nonkey attributes on the primary key; atomic values only. 2NF: Full functional dependency of nonkey attributes on the primary key. 3NF: No transitive dependency between nonkey attributes. Boyce-Codd and higher: All determinants are candidate keys; single multivalued dependency.

34 2002.10.17 - SLIDE 34IS 202 – FALL 2002 Unnormalized Relations First step in normalization is to convert the data into a two-dimensional table In unnormalized relations data can repeat within a column

35 2002.10.17 - SLIDE 35IS 202 – FALL 2002 Unnormalized Relations

36 2002.10.17 - SLIDE 36IS 202 – FALL 2002 First Normal Form To move to First Normal Form a relation must contain only atomic values at each row and column –No repeating groups –A column or set of columns is called a Candidate Key when its values can uniquely identify the row in the relation
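
The move from repeating groups to atomic values can be sketched in a few lines of Python. The function name, column names, and sample values below are hypothetical and only loosely modeled on the hospital example; the slide's actual table is not reproduced here.

```python
def to_first_normal_form(rows, repeating):
    """Expand parallel repeating groups so every cell holds one atomic value."""
    flat = []
    for row in rows:
        groups = [row[column] for column in repeating]
        for values in zip(*groups):  # one output row per position in the group
            new_row = {k: v for k, v in row.items() if k not in repeating}
            new_row.update(zip(repeating, values))
            flat.append(new_row)
    return flat

# Unnormalized: one patient row holds lists of surgeons and surgeries.
unnormalized = [
    {"patient_no": 1111, "patient_name": "John White",
     "surgeon_no": [145, 311],
     "surgery": ["Gallstone removal", "Kidney stone removal"]},
    {"patient_no": 2222, "patient_name": "Charles Brown",
     "surgeon_no": [145], "surgery": ["Cataract surgery"]},
]
flat = to_first_normal_form(unnormalized, ["surgeon_no", "surgery"])
```

After flattening, patient_no alone no longer identifies a row; a composite such as (patient_no, surgeon_no) is needed, which is where the candidate key discussion comes in.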

37 2002.10.17 - SLIDE 37IS 202 – FALL 2002 First Normal Form

38 2002.10.17 - SLIDE 38IS 202 – FALL 2002 1NF Storage Anomalies Insertion: A new patient has not yet undergone surgery -- hence no surgeon # -- Since surgeon # is part of the key we can’t insert Insertion: If a surgeon is newly hired and hasn’t operated yet -- there will be no way to include that person in the database Update: If a patient comes in for a new procedure, and has moved, we need to change multiple address entries Deletion (type 1): Deleting a patient record may also delete all info about a surgeon Deletion (type 2): When there are functional dependencies (like side effects and drug) changing one item eliminates other information

39 2002.10.17 - SLIDE 39IS 202 – FALL 2002 Second Normal Form A relation is said to be in Second Normal Form when every nonkey attribute is fully functionally dependent on the primary key –That is, every nonkey attribute needs the full primary key for unique identification
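
A rough way to spot violations of full functional dependency is to test whether part of a composite key already determines a nonkey attribute. The sketch below does this over sample rows; it assumes a simplified two-attribute key (patient_no, surgeon_no) and invented data. Note that a dependency observed in sample data is only a hint: real functional dependencies come from the meaning of the data, not from one table instance.

```python
from itertools import combinations

def determines(rows, lhs, rhs):
    """True if rows that agree on lhs always agree on rhs (in this sample)."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        if seen.setdefault(key, value) != value:
            return False
    return True

def partial_dependencies(rows, key, nonkey):
    """List (key subset, attribute) pairs where a proper subset of the key suffices."""
    found = []
    for attr in nonkey:
        for size in range(1, len(key)):
            for subset in combinations(key, size):
                if determines(rows, subset, [attr]):
                    found.append((subset, attr))
    return found

rows = [
    {"patient_no": 1111, "surgeon_no": 145, "patient_name": "John White",
     "surgeon_name": "David Rosen", "surgery": "Gallstone removal"},
    {"patient_no": 1111, "surgeon_no": 311, "patient_name": "John White",
     "surgeon_name": "Beth Little", "surgery": "Kidney stone removal"},
    {"patient_no": 2222, "surgeon_no": 145, "patient_name": "Charles Brown",
     "surgeon_name": "David Rosen", "surgery": "Cataract surgery"},
]
violations = partial_dependencies(
    rows, ["patient_no", "surgeon_no"],
    ["patient_name", "surgeon_name", "surgery"])
```

Here patient_name depends on patient_no alone and surgeon_name on surgeon_no alone, so both would move to their own tables in 2NF, while surgery needs the full key.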

40 2002.10.17 - SLIDE 40IS 202 – FALL 2002 Second Normal Form

41 2002.10.17 - SLIDE 41IS 202 – FALL 2002 Second Normal Form

42 2002.10.17 - SLIDE 42IS 202 – FALL 2002 Second Normal Form

43 2002.10.17 - SLIDE 43IS 202 – FALL 2002 1NF Storage Anomalies Removed Insertion: Can now enter new patients without surgery Insertion: Can now enter Surgeons who haven’t operated Deletion (type 1): If Charles Brown dies the corresponding tuples from Patient and Surgery tables can be deleted without losing information on David Rosen Update: If John White comes in for a third time, and has moved, we only need to change the Patient table

44 2002.10.17 - SLIDE 44IS 202 – FALL 2002 2NF Storage Anomalies Insertion: Cannot enter the fact that a particular drug has a particular side effect unless it is given to a patient Deletion: If John White receives some other drug because of the penicillin rash, and a new drug and side effect are entered, we lose the information that penicillin can cause a rash Update: If drug side effects change (a new formula) we have to update multiple occurrences of side effects

45 2002.10.17 - SLIDE 45IS 202 – FALL 2002 Third Normal Form A relation is said to be in Third Normal Form if there is no transitive functional dependency between nonkey attributes –When one nonkey attribute can be determined with one or more nonkey attributes there is said to be a transitive functional dependency The side effect column in the Surgery table is determined by the drug administered –Side effect is transitively functionally dependent on drug so Surgery is not 3NF
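
Removing the transitive dependency amounts to projecting the Surgery relation into two relations, one of which records each drug and its side effect once. A minimal Python sketch follows; the column names and all rows other than the penicillin/rash pairing are assumptions made for illustration.

```python
def project(rows, attrs):
    """Relational projection: keep only attrs and drop duplicate tuples."""
    result = []
    for row in rows:
        projected = {a: row[a] for a in attrs}
        if projected not in result:
            result.append(projected)
    return result

# 2NF Surgery relation: side_effect depends on drug, not on the key.
surgery_2nf = [
    {"patient_no": 1111, "surgeon_no": 145, "surgery": "Gallstone removal",
     "drug": "Penicillin", "side_effect": "Rash"},
    {"patient_no": 2222, "surgeon_no": 145, "surgery": "Cataract surgery",
     "drug": "Penicillin", "side_effect": "Rash"},
    {"patient_no": 1111, "surgeon_no": 311, "surgery": "Kidney stone removal",
     "drug": "Tetracycline", "side_effect": "Fever"},
]

surgery_3nf = project(surgery_2nf, ["patient_no", "surgeon_no", "surgery", "drug"])
drug = project(surgery_2nf, ["drug", "side_effect"])
```

The drug relation now holds the penicillin/rash fact exactly once, independent of which patients received the drug.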

46 2002.10.17 - SLIDE 46IS 202 – FALL 2002 Third Normal Form

47 2002.10.17 - SLIDE 47IS 202 – FALL 2002 Third Normal Form

48 2002.10.17 - SLIDE 48IS 202 – FALL 2002 2NF Storage Anomalies Removed Insertion: We can now enter the fact that a particular drug has a particular side effect in the Drug relation Deletion: If John White receives some other drug as a result of the rash from penicillin, the information on penicillin and rash is still maintained Update: The side effects for each drug appear only once

49 2002.10.17 - SLIDE 49IS 202 – FALL 2002 Boyce-Codd Normal Form Most 3NF relations are also BCNF relations A 3NF relation can fail to be in BCNF only if all of the following hold: –Candidate keys in the relation are composite keys (they are not single attributes) –There is more than one candidate key in the relation, and –The keys are not disjoint, that is, some attributes in the keys are common
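
The BCNF condition (every determinant is a candidate key, or more loosely a superkey) can be checked mechanically with the standard attribute-closure algorithm. The sketch below uses a common textbook relation (student, course, instructor), which is an assumption and not the example shown on the slides.

```python
def closure(attrs, fds):
    """Attribute closure: everything functionally determined by attrs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def bcnf_violations(relation, fds):
    """Nontrivial dependencies whose determinant is not a superkey."""
    return [(lhs, rhs) for lhs, rhs in fds
            if not set(rhs) <= set(lhs)
            and not set(relation) <= closure(lhs, fds)]

relation = ("student", "course", "instructor")
fds = [
    (("student", "course"), ("instructor",)),  # one instructor per student per course
    (("instructor",), ("course",)),            # each instructor teaches one course
]
violations = bcnf_violations(relation, fds)
```

The candidate keys here are (student, course) and (student, instructor): composite, more than one, and overlapping, which matches the three conditions listed above. The dependency instructor → course is the one that breaks BCNF.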

50 2002.10.17 - SLIDE 50IS 202 – FALL 2002 Most 3NF Relations Are Also BCNF – Is This One?

51 2002.10.17 - SLIDE 51IS 202 – FALL 2002 BCNF Relations

52 2002.10.17 - SLIDE 52IS 202 – FALL 2002 Fourth Normal Form Any relation is in Fourth Normal Form if it is BCNF and any multivalued dependencies are trivial Eliminate non-trivial multivalued dependencies by projecting into simpler tables
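
Projecting a relation with independent multivalued facts into simpler tables, and checking that a natural join gives back the original, can be sketched as follows. The employee/skill/language relation is a hypothetical example chosen for illustration.

```python
def relation(rows):
    """Represent a relation as a set of tuples (frozensets of attribute/value pairs)."""
    return {frozenset(row.items()) for row in rows}

def project(rel, attrs):
    """Projection onto attrs; the set removes duplicate tuples."""
    return {frozenset((a, dict(t)[a]) for a in attrs) for t in rel}

def natural_join(left, right):
    """Combine tuples that agree on all shared attributes."""
    joined = set()
    for l in left:
        for r in right:
            dl, dr = dict(l), dict(r)
            if all(dl[a] == dr[a] for a in dl.keys() & dr.keys()):
                joined.add(frozenset({**dl, **dr}.items()))
    return joined

# Skills and languages are independent facts about an employee,
# so every combination has to be stored: 2 skills x 2 languages = 4 rows.
employee_facts = relation([
    {"employee": "Ann", "skill": "SQL",  "language": "English"},
    {"employee": "Ann", "skill": "SQL",  "language": "French"},
    {"employee": "Ann", "skill": "Perl", "language": "English"},
    {"employee": "Ann", "skill": "Perl", "language": "French"},
])

emp_skill = project(employee_facts, ["employee", "skill"])
emp_language = project(employee_facts, ["employee", "language"])
rejoined = natural_join(emp_skill, emp_language)
```

The two projections hold two rows each instead of four combined rows, and joining them loses nothing.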

53 2002.10.17 - SLIDE 53IS 202 – FALL 2002 Fifth Normal Form A relation is in 5NF if every join dependency in the relation is implied by the keys of the relation Implies that relations that have been decomposed in previous NF can be recombined via natural joins to recreate the original relation

54 2002.10.17 - SLIDE 54IS 202 – FALL 2002 Normalizing to Death Normalization splits database information across multiple tables To retrieve complete information from a normalized database, the JOIN operation must be used JOIN tends to be expensive in terms of processing time, and very large joins are very expensive
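
To make the cost concrete, here is a sketch of what retrieving one complete surgery record looks like once the hospital data is split across tables. It uses Python's sqlite3 module; the schema, column names, and values are assumptions loosely following the lecture's hospital example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE patient (patient_no INTEGER PRIMARY KEY, name TEXT, address TEXT);
CREATE TABLE surgeon (surgeon_no INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE drug    (drug TEXT PRIMARY KEY, side_effect TEXT);
CREATE TABLE surgery (patient_no INTEGER, surgeon_no INTEGER, surgery_date TEXT,
                      surgery TEXT, drug TEXT,
                      PRIMARY KEY (patient_no, surgeon_no, surgery_date));
INSERT INTO patient VALUES (1111, 'John White', '15 New St.');
INSERT INTO surgeon VALUES (145, 'David Rosen');
INSERT INTO drug    VALUES ('Penicillin', 'Rash');
INSERT INTO surgery VALUES (1111, 145, '2002-01-01', 'Gallstone removal', 'Penicillin');
""")

# One logical record now needs three joins across four tables.
full_record = conn.execute("""
    SELECT p.name, s.name, g.surgery, g.drug, d.side_effect
    FROM surgery AS g
    JOIN patient AS p ON p.patient_no = g.patient_no
    JOIN surgeon AS s ON s.surgeon_no = g.surgeon_no
    JOIN drug    AS d ON d.drug = g.drug
""").fetchone()
```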

55 2002.10.17 - SLIDE 55IS 202 – FALL 2002 Lecture Overview Review –Databases and Database Design –Database Life Cycle –ER Diagrams Database Design Normalization Web-Enabled Databases

56 2002.10.17 - SLIDE 56IS 202 – FALL 2002 Overview Why use a database system for Web design and e-commerce? What systems are available? Pros and Cons of different Web database systems? Text retrieval in database systems Search engines for Intranet and Intrasite searching

57 2002.10.17 - SLIDE 57IS 202 – FALL 2002 Why Use a Database System? Simple Web sites with only a few pages don’t need much more than static HTML files

58 2002.10.17 - SLIDE 58IS 202 – FALL 2002 Simple Web Applications Server Web Server Internet Files Clients

59 2002.10.17 - SLIDE 59IS 202 – FALL 2002 Adding Dynamic Content to the Site Small sites can often use simple HTML and CGI scripts accessing data files to create dynamic content
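
A CGI script of this kind reads a data file and writes an HTTP header, a blank line, and an HTML body to standard output. The sketch below shows the core of such a script in Python; the function name, the CSV layout (item, price), and the sample row are hypothetical.

```python
import csv
import html
import io

def cgi_response(data_file):
    """Build a complete CGI response: header, blank line, then the HTML body."""
    rows = csv.DictReader(data_file)
    items = "".join(
        f"<li>{html.escape(row['item'])}: ${html.escape(row['price'])}</li>"
        for row in rows)
    body = f"<html><body><ul>{items}</ul></body></html>"
    return "Content-Type: text/html\r\n\r\n" + body

# A real script would open a file on the server; StringIO stands in for it here.
sample = io.StringIO("item,price\nShiny Blue Widget,2.53\n")
response = cgi_response(sample)
```

In an actual deployment the script would end with print(cgi_response(open(path))) for some server-side path, and the Web server would relay that output to the client.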

60 2002.10.17 - SLIDE 60IS 202 – FALL 2002 Dynamic Web Applications 1 Server CGI Web Server Internet Files Clients

61 2002.10.17 - SLIDE 61IS 202 – FALL 2002 Issues For Scaling Up Web Applications Performance Scalability Maintenance Data integrity Transaction support

62 2002.10.17 - SLIDE 62IS 202 – FALL 2002 Why Use a Database System? Database systems have concentrated on providing solutions for all of these issues for scaling up Web applications –Performance –Scalability –Maintenance –Data integrity –Transaction support While systems differ in their support, most offer some support for all of these
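
Two of these issues, data integrity and transaction support, are easy to demonstrate. In the sketch below (Python's sqlite3 module; the account table and the transfer function are hypothetical), a CHECK constraint rejects a negative balance and the transaction rolls back the half-finished update.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE account (
    id      INTEGER PRIMARY KEY,
    balance INTEGER NOT NULL CHECK (balance >= 0)  -- integrity rule lives in the DBMS
);
INSERT INTO account VALUES (1, 100), (2, 0);
""")

def transfer(conn, src, dst, amount):
    """Move amount between accounts; either both updates apply or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE id = ?", (amount, dst))
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE id = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:
        return False

def balances(conn):
    return conn.execute("SELECT balance FROM account ORDER BY id").fetchall()

overdraft_ok = transfer(conn, 1, 2, 150)   # would leave account 1 negative
after_overdraft = balances(conn)
small_ok = transfer(conn, 1, 2, 40)
after_small = balances(conn)
```

With plain data files and CGI scripts, this kind of all-or-nothing behavior would have to be written by hand.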

63 2002.10.17 - SLIDE 63IS 202 – FALL 2002 Dynamic Web Applications 2 Server database CGI DBMS Web Server Internet Files Clients database

64 2002.10.17 - SLIDE 64IS 202 – FALL 2002 Server Interfaces [Diagram, adapted from John P. Ashenfelter, Choosing a Database for Your Web Site: Web Server, Web Application Server, Web DB App, and Database. Interface labels: HTML, JavaScript, DHTML; CGI, Web Server API’s; ColdFusion, PHP, Perl, Java, ASP; SQL; ODBC, JDBC, Native DB interfaces]

65 2002.10.17 - SLIDE 65IS 202 – FALL 2002 Web Application Server Software ColdFusion PHP ASP All of these are server-side scripting languages that embed code in HTML pages

66 2002.10.17 - SLIDE 66IS 202 – FALL 2002 ColdFusion Developing WWW sites typically involved a lot of programming to build dynamic sites –E.g., pages generated as a result of catalog searches, etc. ColdFusion was designed to permit the construction of dynamic Web sites with only minor extensions to HTML through a DBMS interface

67 2002.10.17 - SLIDE 67IS 202 – FALL 2002 What ColdFusion Is Good For Putting up databases onto the Web Handling dynamic databases (frequent updates, etc.) Making databases searchable and updateable by users

68 2002.10.17 - SLIDE 68IS 202 – FALL 2002 CFML ColdFusion Markup Language Read data from and update data to databases and tables Create dynamic data-driven pages Perform conditional processing Populate forms with live data Process form submissions Generate and retrieve email messages Perform HTTP and FTP functions Perform credit card verification and authorization Read and write client-side cookies

69 2002.10.17 - SLIDE 69IS 202 – FALL 2002 Templates Assume we have a database named contents_of_my_shopping_cart.mdb -- single table called contents... Create an HTML page (uses extension.cfm), before... SELECT * FROM contents ;

70 2002.10.17 - SLIDE 70IS 202 – FALL 2002 Contents of My Shopping Cart Contents of My Shopping Cart #Item# #Date_of_item# $#Price# Templates (cont.)

71 2002.10.17 - SLIDE 71IS 202 – FALL 2002 Templates (cont.) [Rendered page] Contents of My Shopping Cart: Bouncy Ball with Psychedelic Markings, 12 December 1998, $0.25; Shiny Blue Widget, 14 December 1998, $2.53; Large Orange Widget, 14 December 1998, $3.75
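
The idea behind the template, one output line per query row with each #Field# marker replaced by that row's value, can be imitated in a few lines of Python. This is an analogy only, not CFML and not how ColdFusion is implemented; the render function is hypothetical, while the field names and rows are taken from the slides.

```python
import re

def render(template, row):
    """Replace each #Field# marker with the matching value from one row."""
    return re.sub(r"#(\w+)#", lambda m: str(row[m.group(1)]), template)

template = "#Item# #Date_of_item# $#Price#"
rows = [
    {"Item": "Bouncy Ball with Psychedelic Markings",
     "Date_of_item": "12 December 1998", "Price": "0.25"},
    {"Item": "Shiny Blue Widget",
     "Date_of_item": "14 December 1998", "Price": "2.53"},
    {"Item": "Large Orange Widget",
     "Date_of_item": "14 December 1998", "Price": "3.75"},
]

# The template is applied once per row of the query result.
page_lines = [render(template, row) for row in rows]
```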

72 2002.10.17 - SLIDE 72IS 202 – FALL 2002 CFIF and CFELSE Item: #Item#

73 2002.10.17 - SLIDE 73IS 202 – FALL 2002 Photo Browser The current photo browser uses a combination of –Javascript for expandable hierarchies –Database in MS Access –ColdFusion to search the database when one of the facets is selected The database design for the photo database currently looks like…

74 2002.10.17 - SLIDE 74IS 202 – FALL 2002 Photo Browser ER

75 2002.10.17 - SLIDE 75IS 202 – FALL 2002 Photo Database Let’s look at the photo database in the Access interface –Multi-Facet queries –Queries for multiple descriptors in the same facet (harder)
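
The difference between the two query types can be sketched against a simplified schema. Everything below is an assumption for illustration: the photo_descriptor table, its columns, and the data are hypothetical and do not reproduce the actual photo database design, and SQLite stands in for MS Access.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE photo_descriptor (photo_id INTEGER, facet TEXT, descriptor TEXT)")
conn.executemany("INSERT INTO photo_descriptor VALUES (?, ?, ?)", [
    (1, "location", "Berkeley"), (1, "subject", "dog"), (1, "subject", "beach"),
    (2, "location", "Berkeley"), (2, "subject", "dog"),
    (3, "location", "Oakland"),  (3, "subject", "dog"), (3, "subject", "beach"),
])

# Multi-facet query: one descriptor from each of two facets (self-join).
multi_facet = [row[0] for row in conn.execute("""
    SELECT a.photo_id
    FROM photo_descriptor AS a
    JOIN photo_descriptor AS b ON a.photo_id = b.photo_id
    WHERE a.facet = 'location' AND a.descriptor = 'Berkeley'
      AND b.facet = 'subject'  AND b.descriptor = 'dog'
    ORDER BY a.photo_id
""")]

# Same-facet query: the photo must carry BOTH descriptors, so a plain
# WHERE ... AND ... on one row can never match; count matches per photo instead.
same_facet = [row[0] for row in conn.execute("""
    SELECT photo_id
    FROM photo_descriptor
    WHERE facet = 'subject' AND descriptor IN ('dog', 'beach')
    GROUP BY photo_id
    HAVING COUNT(DISTINCT descriptor) = 2
    ORDER BY photo_id
""")]
```

The same-facet case is harder because no single row can satisfy both conditions at once; the query has to reason across several rows for the same photo.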

76 2002.10.17 - SLIDE 76IS 202 – FALL 2002 Assignment 7 (Database Design) Involves –Examining a Web Site (probably) using a DBMS for E-commerce to sell books –Inferring the structure and kinds of entities and attributes used in that site (book info only) –Creating your own design using ER diagrams showing the entities and relationships that you inferred

77 2002.10.17 - SLIDE 77IS 202 – FALL 2002 Next Week Introduction to Information Retrieval

