Published by Cody Bennett. Modified over 9 years ago.
1 Database Development and Data Normalization
2 What is a Database and a DBMS?
Database: a collection of data stored in a standardized format, designed to be shared by multiple users.
Database Management System (DBMS): software that defines a database, stores the data, supports a query language, produces reports, and creates data entry screens.
3 Systems Life Cycle
A series of steps used to manage the phases of development for an information system.
  Phases are not necessarily sequential.
  Each phase has a specific outcome and deliverable.
  Individual companies use customized life cycles.
Consists of four phases:
  Project Management and Planning
  Systems Analysis
  Systems Design
  Systems Implementation and Operation
4 Phases of System Life Cycle
(Chart of tasks against time.)
  Project Management & Planning: identify scope, costs, and schedule
  Analysis*: gather information from users
  Design*: define tables, relationships, forms, reports
  Application Development: create forms, reports, and help; test
  Implementation/Operation: transfer data, install, train, review
* Critical for database development
5 Application Design Steps
1. Identify the business rules
2. Define tables and relationships
3. Create input forms and reports
4. Combine into an application
(Diagram label: Database Management System)
6 Definitions
Relational database: a set of separate, related tables, with data elements that can be combined for queries and reports.
Table: a set of data elements (cells) that describe an entity, organized into a fixed number of columns (attributes) and an indeterminate number of rows.
Property (also known as attribute): a characteristic of a class or entity.
Primary key: every table has one; it ensures that each row is unique (e.g., CustomerID, Product #, ...).

Class: Employee (columns are properties, rows are objects, EmployeeID is the primary key)
EmployeeID  TaxpayerID   LastName   FirstName  HomePhone       Address
12512       888-22-5552  Cartom     Abdul      (603) 323-9893  252 South Street
15293       222-55-3737  Venetiaan  Roland     (804) 888-6667  937 Paramaribo Lane
22343       293-87-4343  Johnson    John       (703) 222-9384  234 Main Street
29387       837-36-2933  Stenheim   Susan      (410) 330-9837  8934 W. Maple
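To make the primary-key definition concrete, here is a minimal sketch that loads two of the Employee rows into a table and shows that a second row with the same EmployeeID is refused. It assumes Python's built-in sqlite3 module as a stand-in DBMS (the slides do not use it), and the helper name insert_is_rejected is hypothetical.

```python
import sqlite3

# In-memory SQLite database used purely as a stand-in DBMS (an assumption).
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Employee (
        EmployeeID INTEGER PRIMARY KEY,  -- the primary key column
        TaxpayerID TEXT,
        LastName   TEXT,
        FirstName  TEXT,
        HomePhone  TEXT,
        Address    TEXT
    )
""")
rows = [
    (12512, "888-22-5552", "Cartom", "Abdul", "(603) 323-9893", "252 South Street"),
    (15293, "222-55-3737", "Venetiaan", "Roland", "(804) 888-6667", "937 Paramaribo Lane"),
]
conn.executemany("INSERT INTO Employee VALUES (?, ?, ?, ?, ?, ?)", rows)


def insert_is_rejected(row):
    """Return True if the DBMS refuses the row (hypothetical helper)."""
    try:
        conn.execute("INSERT INTO Employee VALUES (?, ?, ?, ?, ?, ?)", row)
        return False
    except sqlite3.IntegrityError:
        return True


# Reusing EmployeeID 12512 violates the primary key, whatever the other values are.
duplicate_rejected = insert_is_rejected((12512, "000-00-0000", "Other", "Person", "", ""))
employee_count = conn.execute("SELECT COUNT(*) FROM Employee").fetchone()[0]
```

The point of the sketch is only that uniqueness is enforced by the DBMS once the key is declared, not by the application.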
7 Keys
Primary key
  Usually every table (object) has a primary key.
  Uniquely identifies a row (one-to-one).
  Can be a concatenated (or composite) key:
    Consists of multiple columns.
    Associated with repeating relationships (1 : M or M : N).
  We often create a primary key to ensure uniqueness (e.g., CustomerID, Product #, ...); this is called a surrogate key.
  Key columns are underlined.
First step
  Collect user documents.
  Identify possible keys: unique or repeating relationships.
8 Notation
Table name, followed by the table columns in parentheses; the primary key is underlined.
Customer (CustomerID, Phone, Name, Address, City, State, ZipCode)

CustomerID  Phone         LastName    FirstName  Address             City               State  Zipcode
1           502-666-7777  Johnson     Martha     125 Main Street     Alvaton            KY     42122
2           502-888-6464  Smith       Jack       873 Elm Street      Bowling Green      KY     42101
3           502-777-7575  Washington  Elroy      95 Easy Street      Smith’s Grove      KY     42171
4           502-333-9494  Adams       Samuel     746 Brown Drive     Alvaton            KY     42122
5           502-474-4746  Rabitz      Victor     645 White Avenue    Bowling Green      KY     42102
6           616-373-4746  Steinmetz   Susan      15 Speedway Drive   Portland           TN     37148
7           615-888-4474  Lasater     Les        67 S. Ray Drive     Portland           TN     37148
8           615-452-1162  Jones       Charlie    867 Lakeside Drive  Castalian Springs  TN     37031
9           502-222-4351  Chavez      Juan       673 Industry Blvd.  Caneyville         KY     42721
10          502-444-2512  Rojo        Maria      88 Main Street      Cave City          KY     42127

Does anyone see a difference between the Table and the Table Notation?
9 Identifying Key Columns
Orders
OrderID  Date    Customer
8367     5-5-04  6794
8368     5-6-04  9263
Each order has only one customer. So Customer is not part of the key.

OrderItems
OrderID  Item  Quantity
8367     229   2
8367     253   4
8367     876   1
8368     555   4
8368     229   1
Each order has many items. Each item can appear on many orders. So OrderID and Item are both part of the key.
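The key choices for Orders and OrderItems can be sketched as table definitions. This is an illustration only, assuming Python's built-in sqlite3 module as the DBMS; the composite key is declared as PRIMARY KEY (OrderID, Item).

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in DBMS (an assumption)

# One customer per order, so OrderID alone is the key.
conn.execute("CREATE TABLE Orders (OrderID INTEGER PRIMARY KEY, Date TEXT, Customer INTEGER)")
# Many items per order and many orders per item, so both columns form the key.
conn.execute("""
    CREATE TABLE OrderItems (
        OrderID  INTEGER,
        Item     INTEGER,
        Quantity INTEGER,
        PRIMARY KEY (OrderID, Item)
    )
""")
conn.executemany("INSERT INTO Orders VALUES (?, ?, ?)",
                 [(8367, "5-5-04", 6794), (8368, "5-6-04", 9263)])
conn.executemany("INSERT INTO OrderItems VALUES (?, ?, ?)",
                 [(8367, 229, 2), (8367, 253, 4), (8367, 876, 1),
                  (8368, 555, 4), (8368, 229, 1)])

# Item 229 legitimately appears on two different orders.
orders_with_item_229 = [row[0] for row in conn.execute(
    "SELECT OrderID FROM OrderItems WHERE Item = 229 ORDER BY OrderID")]

# Listing the same item twice on the same order repeats the full key and is refused.
try:
    conn.execute("INSERT INTO OrderItems VALUES (8367, 229, 5)")
    repeated_pair_rejected = False
except sqlite3.IntegrityError:
    repeated_pair_rejected = True
```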
10 Surrogate Keys
Real-world keys sometimes cause problems in a database.
Example: Customer
  Avoid phone numbers: people may not notify you when numbers change.
  Avoid SSN: there are privacy concerns, and most businesses are not authorized to ask for verification, so you could end up with duplicate values.
Often best to let the DBMS generate unique values:
  Access: AutoNumber
  SQL Server: Identity
  Oracle: Sequences (but these require additional programming)
Drawback: the numbers are not related to any business data, so the application needs to hide them and provide other lookup mechanisms.
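As a rough illustration of a DBMS-generated key, the sketch below uses SQLite's INTEGER PRIMARY KEY AUTOINCREMENT through Python's sqlite3 module; this is an assumed stand-in for the Access, SQL Server, and Oracle features named above. The function names and the second customer sharing a phone number are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Customer (
        CustomerID INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key generated by the DBMS
        Phone      TEXT,
        Name       TEXT
    )
""")


def add_customer(phone, name):
    """Insert a customer and return the generated key (hypothetical helper)."""
    cursor = conn.execute("INSERT INTO Customer (Phone, Name) VALUES (?, ?)", (phone, name))
    return cursor.lastrowid


def find_by_name(fragment):
    """Lookup by business data, so users never need to type the surrogate key."""
    return conn.execute(
        "SELECT CustomerID, Name FROM Customer WHERE Name LIKE ?",
        (f"%{fragment}%",),
    ).fetchall()


first_id = add_customer("502-666-7777", "Martha Johnson")
# Illustrative only: a second person on the same phone still gets a distinct key.
second_id = add_customer("502-666-7777", "Jack Smith")
matches = find_by_name("Smith")
```

The find_by_name lookup stands in for the "other lookup mechanisms" the slide mentions as the price of hiding generated numbers.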
11 What is, and Why Do We Need, Data Normalization?
A process in which tables of data are broken down and reconstructed to:
  Reduce vulnerability to data anomalies.
  Reduce data redundancy: if the same data is stored in two different tables and we change only one of them, the data becomes inconsistent. This is called an update anomaly.
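A small sketch of an update anomaly, using plain Python lists and dictionaries rather than a real DBMS. The customer and phone number come from the rental example later in the deck; the replacement number and the function name phones_on_file are invented for illustration.

```python
# Redundant design: the customer's phone is copied onto every rental row.
rentals = [
    {"TransID": 1, "CustomerID": 3, "Phone": "502-777-7575"},
    {"TransID": 4, "CustomerID": 3, "Phone": "502-777-7575"},
]


def phones_on_file(rows, customer_id):
    """Collect every distinct phone stored for one customer."""
    return {row["Phone"] for row in rows if row["CustomerID"] == customer_id}


# Only one copy is changed (made-up new number), so the data now disagrees with itself.
rentals[0]["Phone"] = "502-000-0000"
inconsistent = len(phones_on_file(rentals, 3)) > 1

# Normalized design: the phone is stored once, keyed by CustomerID.
customer_phone = {3: "502-777-7575"}
normalized_rentals = [{"TransID": 1, "CustomerID": 3}, {"TransID": 4, "CustomerID": 3}]
customer_phone[3] = "502-000-0000"  # a single update; every rental sees the same value
phones_seen = {customer_phone[row["CustomerID"]] for row in normalized_rentals}
```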
12 Client Billing Example
Client (ClientID, Name, Address, BusinessType)
Partner (PartnerID, Name, Speciality, Office, Phone)
PartnerAssignment (PartnerID, ClientID, DateAcquired)
Billing (ClientID, PartnerID, Date/Time, Item, Description, Hours, AmountBilled)
Each partner can be assigned many clients.
Each client can be assigned to many partners.
Generally a partner is assigned many clients; however, a client is not often assigned many partners. Because of this, the Billing table has been added.
13 Client Billing--Different Rules
Client(ClientID, Name, Address, BusinessType)
Partner(PartnerID, Name, Speciality, Office, Phone)
PartnerAssignment(PartnerID, ClientID, DateAcquired)
Billing(ClientID, PartnerID, Date/Time, Item, Description, Hours, AmountBilled)
Each client is assigned to only one partner.
  PartnerID cannot be part of the key in PartnerAssignment (a client has no more than one partner).
  So we combine the Client and PartnerAssignment tables, since they have the same key.
14 Client Billing--New Assumptions
Billing
ClientID  PartnerID  Date/Time     Item  Description      Hours  AmountBilled
115       963        8-4-04 10:03  967   Stress analysis  2      $500
295       967        8-5-04 11:15  754   New Design       3      $750
115       963        8-8-04 09:30  967   Stress analysis  2.5    $650
More realistic assumptions for a large firm:
  Each partner may work with many clients.
  Each client may work with many partners.
  Each partner and client may work together many times.
  The identifying feature is the date/time of the service.
What happens if you do not include Date/Time as a key?
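One way to explore the closing question is to load the three Billing rows under two different key definitions and count how many the DBMS accepts. This is a sketch assuming Python's sqlite3 module; the column name DateTime (for Date/Time) and the function rows_accepted are choices made for the example.

```python
import sqlite3

BILLING_ROWS = [
    (115, 963, "8-4-04 10:03", 967, "Stress analysis", 2.0, 500),
    (295, 967, "8-5-04 11:15", 754, "New Design", 3.0, 750),
    (115, 963, "8-8-04 09:30", 967, "Stress analysis", 2.5, 650),
]


def rows_accepted(key_columns):
    """Create Billing with the given primary key and count the rows that load."""
    conn = sqlite3.connect(":memory:")
    conn.execute(f"""
        CREATE TABLE Billing (
            ClientID INTEGER, PartnerID INTEGER, DateTime TEXT, Item INTEGER,
            Description TEXT, Hours REAL, AmountBilled INTEGER,
            PRIMARY KEY ({key_columns})
        )
    """)
    accepted = 0
    for row in BILLING_ROWS:
        try:
            conn.execute("INSERT INTO Billing VALUES (?, ?, ?, ?, ?, ?, ?)", row)
            accepted += 1
        except sqlite3.IntegrityError:
            pass  # the row repeats an existing key value
    conn.close()
    return accepted


# Without Date/Time, the second visit of client 115 with partner 963 cannot be stored.
without_datetime = rows_accepted("ClientID, PartnerID")
with_datetime = rows_accepted("ClientID, PartnerID, DateTime")
```

Under these assumptions the shorter key silently limits each client and partner pair to one billing entry, which contradicts the rule that they may work together many times.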
15 Sample: Video Database
(Figure labels: Repeating section; Possible keys)
16 Initial Objects
Customers
  Key: assign a CustomerID
  Sample properties: Name, Address, Phone
Videos
  Key: assign a VideoID
  Sample properties: Title, RentalPrice, Rating, Description
RentalTransaction (event/relationship)
  Key: assign a TransactionID
  Sample properties: CustomerID, Date
VideosRented (event/repeating list)
  Keys: TransactionID + VideoID
  Sample properties: VideoCopy#
17 Initial Form Evaluation
Collect forms from users.
Write down properties.
Find repeating groups ( . . . ).
Look for potential keys.
Identify computed values.
Notation makes it easier to identify and solve problems.
RentalForm(TransID, RentDate, CustomerID, Phone, Name, Address, City, State, ZipCode, (VideoID, Copy#, Title, Rent))
18 Problems with Repeating Sections
RentalForm(TransID, RentDate, CustomerID, Phone, Name, Address, City, State, ZipCode, (VideoID, Copy#, Title, Rent))

TransID  RentDate  CustomerID  LastName    Phone         Address             VideoID  Copy#  Title                  Rent
1        4/18/04   3           Washington  502-777-7575  95 Easy Street      1        2      2001: A Space Odyssey  $1.50
1        4/18/04   3           Washington  502-777-7575  95 Easy Street      6        3      Clockwork Orange       $1.50
2        4/30/04   7           Lasater     615-888-4474  67 S. Ray Drive     8        1      Hopscotch              $1.50
2        4/30/04   7           Lasater     615-888-4474  67 S. Ray Drive     2        1      Apocalypse Now         $2.00
2        4/30/04   7           Lasater     615-888-4474  67 S. Ray Drive     6        1      Clockwork Orange       $1.50
3        4/18/04   8           Jones       615-452-1162  867 Lakeside Drive  9        1      Luggage Of The Gods    $2.50
3        4/18/04   8           Jones       615-452-1162  867 Lakeside Drive  15       1      Fabulous Baker Boys    $2.00
3        4/18/04   8           Jones       615-452-1162  867 Lakeside Drive  4        1      Boy And His Dog        $2.50
4        4/18/04   3           Washington  502-777-7575  95 Easy Street      3        1      Blues Brothers         $2.00
4        4/18/04   3           Washington  502-777-7575  95 Easy Street      8        1      Hopscotch              $1.50
4        4/18/04   3           Washington  502-777-7575  95 Easy Street      13       1      Surf Nazis Must Die    $2.50
4        4/18/04   3           Washington  502-777-7575  95 Easy Street      17       1      Witches of Eastwick    $2.00

The repeating section (VideoID, Copy#, Title, Rent) causes duplication.
Storing data in this raw form would not work very well. For example, repeating sections will cause problems. Note the duplication of data. Also, what if a customer has not yet checked out a movie? Where do we store that customer’s data?
19 Problems with Repeating Sections
Customer Rentals form (not in First Normal Form):
Name, Phone, Address, City, State, ZipCode
    VideoID  Copy#  Title             Rent
1.  6        1      Clockwork Orange  1.50
2.  8        2      Hopscotch         1.50
3.
4.
5.  {Unused Space}
Storing repeating data
  At design time, do you know how many videos a customer will rent at one time?
  How much space to allocate?
    Too little: you’ll be short.
    Too much: wasted space.
Let us now put this in First Normal Form.
20 First Normal Form
1NF: remove repeating sections.
  Split into two tables.
  Bring the key from the main section and from the repeating section.
RentalForm(TransID, RentDate, CustomerID, Phone, Name, Address, City, State, ZipCode, (VideoID, Copy#, Title, Rent))
becomes
RentalForm2(TransID, RentDate, CustomerID, Phone, Name, Address, City, State, ZipCode)
RentalLine(TransID, VideoID, Copy#, Title, Rent)
Keys of RentalLine(TransID, VideoID, Copy#, ...)
  Each transaction can have many videos (key VideoID).
  Each video can be rented on many transactions (key TransID).
  For each TransID and VideoID, only one Copy# (no key on Copy#).
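The 1NF split can be sketched in a few lines of Python. The nested "Videos" list below is an assumed in-memory representation of the repeating section on the paper form, and the function name to_first_normal_form is hypothetical; only a subset of the form's columns is shown.

```python
# One rental form with its repeating section held as a nested list (assumed layout).
rental_forms = [
    {
        "TransID": 1, "RentDate": "4/18/04", "CustomerID": 3, "Name": "Washington",
        "Videos": [
            {"VideoID": 1, "Copy#": 2, "Title": "2001: A Space Odyssey", "Rent": 1.50},
            {"VideoID": 6, "Copy#": 3, "Title": "Clockwork Orange", "Rent": 1.50},
        ],
    },
]


def to_first_normal_form(forms):
    """Split each form into one RentalForm2 row and one RentalLine row per video."""
    rental_form2, rental_line = [], []
    for form in forms:
        # Main section: everything except the repeating group.
        rental_form2.append({k: v for k, v in form.items() if k != "Videos"})
        # Repeating section: carry the parent key (TransID) into every line.
        for video in form["Videos"]:
            rental_line.append({"TransID": form["TransID"], **video})
    return rental_form2, rental_line


rental_form2, rental_line = to_first_normal_form(rental_forms)
```

Carrying TransID into each RentalLine row is what lets the two tables be recombined later.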
21 First Normal Form Problems (Data)
RentalForm2
TransID  RentDate  CustID  Phone         LastName    FirstName  Address             City               State  ZipCode
1        4/18/04   3       502-777-7575  Washington  Elroy      95 Easy Street      Smith's Grove      KY     42171
2        4/30/04   7       615-888-4474  Lasater     Les        67 S. Ray Drive     Portland           TN     37148
3        4/18/04   8       615-452-1162  Jones       Charlie    867 Lakeside Drive  Castalian Springs  TN     37031
4        4/18/04   3       502-777-7575  Washington  Elroy      95 Easy Street      Smith's Grove      KY     42171

RentalLine
TransID  VideoID  Copy#  Title                  Rent
1        1        2      2001: A Space Odyssey  $1.50
1        6        3      Clockwork Orange       $1.50
2        8        1      Hopscotch              $1.50
2        2        1      Apocalypse Now         $2.00
2        6        1      Clockwork Orange       $1.50
3        9        1      Luggage Of The Gods    $2.50
3        15       1      Fabulous Baker Boys    $2.00
3        4        1      Boy And His Dog        $2.50
4        3        1      Blues Brothers         $2.00
4        8        1      Hopscotch              $1.50
4        13       1      Surf Nazis Must Die    $2.50
4        17       1      Witches of Eastwick    $2.00

1NF splits repeating groups, but we still have problems:
  Replication
  Hidden dependency: if a video has not been rented yet, then what is its title?
22 Second Normal Form Definition
Each non-key column must depend on the entire key.
  Only applies to concatenated keys.
  Some columns only depend on part of the key.
  Split those into a new table.
Dependence (definition)
  If, given a value for the key, you always know the value of the property in question, then that property is said to depend on the key.
  If you change part of a key and the questionable property does not change, then the table is not in 2NF.
RentalLine(TransID, VideoID, Copy#, Title, Rent)
  Title and Rent depend only on VideoID.
  Copy# depends on both TransID and VideoID.
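The dependence definition can be turned into a simple check against sample data. The function depends_on below is a hypothetical helper, and it comes with a caveat: sample rows can only refute a dependency, because whether one truly holds is a business rule that the data merely happens not to contradict.

```python
# A few RentalLine rows from the 1NF data: (TransID, VideoID, Copy#, Title, Rent).
COLUMNS = ("TransID", "VideoID", "Copy#", "Title", "Rent")
rental_line = [dict(zip(COLUMNS, values)) for values in [
    (1, 1, 2, "2001: A Space Odyssey", 1.50),
    (1, 6, 3, "Clockwork Orange", 1.50),
    (2, 8, 1, "Hopscotch", 1.50),
    (2, 6, 1, "Clockwork Orange", 1.50),
    (4, 8, 1, "Hopscotch", 1.50),
]]


def depends_on(rows, determinant, column):
    """Return False if two rows agree on `determinant` but differ on `column`."""
    seen = {}
    for row in rows:
        key = tuple(row[name] for name in determinant)
        if key in seen and seen[key] != row[column]:
            return False  # same key value, different property value
        seen[key] = row[column]
    return True  # no counterexample in this sample


title_on_video = depends_on(rental_line, ["VideoID"], "Title")
copy_on_video = depends_on(rental_line, ["VideoID"], "Copy#")
copy_on_full_key = depends_on(rental_line, ["TransID", "VideoID"], "Copy#")
```

In this sample Title is determined by VideoID alone (a partial dependency, so the table is not in 2NF), while Copy# needs the whole key: video 6 went out as copy 3 on one transaction and copy 1 on another.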
23 Second Normal Form Example
Title depends only on VideoID.
  Each VideoID can have only one title.
Rent depends on VideoID.
  This statement is actually a business rule. It might be different at different stores. Some stores might charge a different rent for each video depending on the day (or time).
Each non-key column depends on the whole key.
RentalLine(TransID, VideoID, Copy#, Title, Rent)
becomes
VideosRented(TransID, VideoID, Copy#)
Videos(VideoID, Title, Rent)
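A minimal sketch of this 2NF split, again in plain Python. The function name to_second_normal_form is hypothetical, and the rows are a subset of the RentalLine data; the rule that Rent depends only on VideoID is taken as given, as the slide notes it is a business rule.

```python
# RentalLine rows as (TransID, VideoID, Copy#, Title, Rent).
rental_line = [
    (1, 1, 2, "2001: A Space Odyssey", 1.50),
    (1, 6, 3, "Clockwork Orange", 1.50),
    (2, 6, 1, "Clockwork Orange", 1.50),
    (2, 8, 1, "Hopscotch", 1.50),
]


def to_second_normal_form(rows):
    """Move columns that depend only on VideoID into their own table."""
    videos_rented = [(trans_id, video_id, copy) for trans_id, video_id, copy, _, _ in rows]
    videos = {}
    for _, video_id, _, title, rent in rows:
        videos[video_id] = (title, rent)  # one entry per VideoID, however often rented
    return videos_rented, videos


videos_rented, videos = to_second_normal_form(rental_line)

# A video that has never been rented now has a place to live.
videos[5] = ("Brother From Another Planet", 2.00)
```

After the split, the title of Clockwork Orange is stored once instead of once per rental, and the hidden dependency from the 1NF tables is gone.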
24 Second Normal Form Example (Data)
VideosRented (TransID, VideoID, Copy#)
TransID  VideoID  Copy#
1        1        2
1        6        3
2        2        1
2        6        1
2        8        1
3        4        1
3        9        1
3        15       1
4        3        1
4        8        1
4        13       1
4        17       1

Videos (VideoID, Title, Rent)
VideoID  Title                        Rent
1        2001: A Space Odyssey        $1.50
2        Apocalypse Now               $2.00
3        Blues Brothers               $2.00
4        Boy And His Dog              $2.50
5        Brother From Another Planet  $2.00
6        Clockwork Orange             $1.50
7        Gods Must Be Crazy           $2.00
8        Hopscotch                    $1.50

RentalForm2 (TransID, RentDate, CustomerID, Phone, Name, Address, City, State, ZipCode) (Unchanged)
25 Second Normal Form Problems (Data)
RentalForm2 (TransID, RentDate, CustomerID, Phone, Name, Address, City, State, ZipCode)
TransID  RentDate  CustID  Phone         LastName    FirstName  Address             City               State  ZipCode
1        4/18/04   3       502-777-7575  Washington  Elroy      95 Easy Street      Smith's Grove      KY     42171
2        4/30/04   7       615-888-4474  Lasater     Les        67 S. Ray Drive     Portland           TN     37148
3        4/18/04   8       615-452-1162  Jones       Charlie    867 Lakeside Drive  Castalian Springs  TN     37031
4        4/18/04   3       502-777-7575  Washington  Elroy      95 Easy Street      Smith's Grove      KY     42171

Even in 2NF, problems remain:
  Replication
  Hidden dependency: if a customer has not rented a video yet, where do we store their personal data?
Solution: split the table.
26 Third Normal Form Definition
Each non-key column must depend on nothing but the key.
  Some columns depend on columns that are not part of the key.
  Split those into a new table.
  Example: customer name does not change for every transaction.
Dependence (definition)
  If, given a value for the key, you always know the value of the property in question, then that property is said to depend on the key.
  If you change the key and the questionable property does not change, then the table is not in 3NF.
RentalForm2(TransID, RentDate, CustomerID, Phone, Name, Address, City, State, ZipCode)
  RentDate and CustomerID depend on TransID.
  Phone, Name, Address, City, State, and ZipCode depend only on CustomerID.
27 Third Normal Form Example
Customer attributes depend only on CustomerID.
  Split them into a new table (Customer).
  Remember to leave CustomerID in the Rentals table. We need to be able to reconnect the tables.
3NF is sometimes easier to see if you identify primary objects at the start; then you would recognize that Customer was a separate object.
RentalForm2 (TransID, RentDate, CustomerID, Phone, Name, Address, City, State, ZipCode)
becomes
Rentals (TransID, RentDate, CustomerID)
Customer (CustomerID, Phone, Name, Address, City, State, ZipCode)
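The 3NF split, and the reason CustomerID must stay in Rentals, can be sketched with SQL run through Python's sqlite3 module (an assumed stand-in DBMS). The Name column holds the last names from the sample data. CREATE TABLE ... AS SELECT is used only for brevity; it does not declare keys, which a real design would add.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE RentalForm2 (
        TransID INTEGER PRIMARY KEY, RentDate TEXT, CustomerID INTEGER,
        Phone TEXT, Name TEXT, Address TEXT, City TEXT, State TEXT, ZipCode TEXT
    )
""")
conn.executemany("INSERT INTO RentalForm2 VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)", [
    (1, "4/18/04", 3, "502-777-7575", "Washington", "95 Easy Street", "Smith's Grove", "KY", "42171"),
    (2, "4/30/04", 7, "615-888-4474", "Lasater", "67 S. Ray Drive", "Portland", "TN", "37148"),
    (3, "4/18/04", 8, "615-452-1162", "Jones", "867 Lakeside Drive", "Castalian Springs", "TN", "37031"),
    (4, "4/18/04", 3, "502-777-7575", "Washington", "95 Easy Street", "Smith's Grove", "KY", "42171"),
])

# Columns that depend on TransID stay; CustomerID is kept so the tables can be reconnected.
conn.execute("CREATE TABLE Rentals AS SELECT TransID, RentDate, CustomerID FROM RentalForm2")
# Columns that depend only on CustomerID move out; DISTINCT removes the replication.
conn.execute("""
    CREATE TABLE Customer AS
    SELECT DISTINCT CustomerID, Phone, Name, Address, City, State, ZipCode FROM RentalForm2
""")

customer_count = conn.execute("SELECT COUNT(*) FROM Customer").fetchone()[0]

# Joining on CustomerID rebuilds the original rows, so no information was lost.
rebuilt = conn.execute("""
    SELECT r.TransID, r.RentDate, r.CustomerID,
           c.Phone, c.Name, c.Address, c.City, c.State, c.ZipCode
    FROM Rentals AS r JOIN Customer AS c ON c.CustomerID = r.CustomerID
    ORDER BY r.TransID
""").fetchall()
original = conn.execute("SELECT * FROM RentalForm2 ORDER BY TransID").fetchall()
```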
28 Third Normal Form Example Data
Rentals (TransID, RentDate, CustomerID)
TransID  RentDate  CustomerID
1        4/18/04   3
2        4/30/04   7
3        4/18/04   8
4        4/18/04   3

Customer (CustomerID, Phone, Name, Address, City, State, ZipCode)
CustomerID  Phone         LastName    FirstName  Address             City               State  ZipCode
1           502-666-7777  Johnson     Martha     125 Main Street     Alvaton            KY     42122
2           502-888-6464  Smith       Jack       873 Elm Street      Bowling Green      KY     42101
3           502-777-7575  Washington  Elroy      95 Easy Street      Smith's Grove      KY     42171
4           502-333-9494  Adams       Samuel     746 Brown Drive     Alvaton            KY     42122
5           502-474-4746  Rabitz      Victor     645 White Avenue    Bowling Green      KY     42102
6           615-373-4746  Steinmetz   Susan      15 Speedway Drive   Portland           TN     37148
7           615-888-4474  Lasater     Les        67 S. Ray Drive     Portland           TN     37148
8           615-452-1162  Jones       Charlie    867 Lakeside Drive  Castalian Springs  TN     37031
9           502-222-4351  Chavez      Juan       673 Industry Blvd.  Caneyville         KY     42721
10          502-444-2512  Rojo        Maria      88 Main Street      Cave City          KY     42127

VideosRented (TransID, VideoID, Copy#)
Videos (VideoID, Title, Rent)
Replaces: RentalForm2 (TransID, RentDate, CustomerID, Phone, Name, Address, City, State, ZipCode)
29 Third Normal Form Tables (3NF)
Rentals (TransID, RentDate, CustomerID)
Customer (CustomerID, Phone, Name, Address, City, State, ZipCode)
VideosRented (TransID, VideoID, Copy#)
Videos (VideoID, Title, Rent)
Relationships (1 = one, * = many):
  Customer 1 : * Rentals (on CustomerID)
  Rentals 1 : * VideosRented (on TransID)
  Videos 1 : * VideosRented (on VideoID)
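The four 3NF tables and their one-to-many relationships could be declared roughly as follows. This is a sketch assuming Python's sqlite3 module; the column name CopyNo stands in for Copy#, the column types are guesses, and the REFERENCES clauses are one way of expressing the 1 : * links.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces foreign keys when asked
conn.executescript("""
    CREATE TABLE Customer (
        CustomerID INTEGER PRIMARY KEY, Phone TEXT, LastName TEXT, FirstName TEXT,
        Address TEXT, City TEXT, State TEXT, ZipCode TEXT
    );
    CREATE TABLE Videos (VideoID INTEGER PRIMARY KEY, Title TEXT, Rent REAL);
    CREATE TABLE Rentals (
        TransID INTEGER PRIMARY KEY, RentDate TEXT,
        CustomerID INTEGER NOT NULL REFERENCES Customer (CustomerID)
    );
    CREATE TABLE VideosRented (
        TransID INTEGER REFERENCES Rentals (TransID),
        VideoID INTEGER REFERENCES Videos (VideoID),
        CopyNo  INTEGER,
        PRIMARY KEY (TransID, VideoID)
    );
""")
conn.execute("INSERT INTO Customer VALUES (3, '502-777-7575', 'Washington', 'Elroy', "
             "'95 Easy Street', 'Smith''s Grove', 'KY', '42171')")
conn.executemany("INSERT INTO Videos VALUES (?, ?, ?)",
                 [(1, "2001: A Space Odyssey", 1.50), (6, "Clockwork Orange", 1.50)])
conn.execute("INSERT INTO Rentals VALUES (1, '4/18/04', 3)")
conn.executemany("INSERT INTO VideosRented VALUES (?, ?, ?)", [(1, 1, 2), (1, 6, 3)])

# Joining along the relationships rebuilds the lines of the original rental form.
form_lines = conn.execute("""
    SELECT c.LastName, v.Title, v.Rent
    FROM Rentals AS r
    JOIN Customer AS c ON c.CustomerID = r.CustomerID
    JOIN VideosRented AS vr ON vr.TransID = r.TransID
    JOIN Videos AS v ON v.VideoID = vr.VideoID
    WHERE r.TransID = 1
    ORDER BY v.VideoID
""").fetchall()

# A rental for a customer that does not exist breaks the Customer 1 : * Rentals link.
try:
    conn.execute("INSERT INTO Rentals VALUES (2, '4/30/04', 99)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
```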
30 3NF Rules/Procedure
Split out repeating sections.
  Be sure to include a key from the parent section in the new piece so the two parts can be recombined.
Verify that the keys are correct.
  Is each row uniquely identified by the primary key?
  Are one-to-many and many-to-many relationships correct?
  Check “many” for keyed columns and “one” for non-key columns.
Make sure that each non-key column depends on the whole key and nothing but the key.
  No hidden dependencies.
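The "is each row uniquely identified by the primary key?" check can be run mechanically against sample rows. The helper duplicate_keys below is hypothetical, and an empty result only means the sample contains no counterexample.

```python
from collections import Counter

# A few VideosRented rows from the example data.
videos_rented = [
    {"TransID": 1, "VideoID": 1, "Copy#": 2},
    {"TransID": 1, "VideoID": 6, "Copy#": 3},
    {"TransID": 2, "VideoID": 2, "Copy#": 1},
    {"TransID": 2, "VideoID": 6, "Copy#": 1},
]


def duplicate_keys(rows, key_columns):
    """Return the key values that identify more than one row."""
    counts = Counter(tuple(row[name] for name in key_columns) for row in rows)
    return [key for key, count in counts.items() if count > 1]


# TransID alone repeats, so it cannot be the whole key.
trans_only = duplicate_keys(videos_rented, ["TransID"])
# TransID + VideoID identifies every row in the sample.
full_key = duplicate_keys(videos_rented, ["TransID", "VideoID"])
```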
31 Database Development and Data Normalization
Any Questions?