VCE IT Theory Slideshows - ITA By Mark Kelly McKinnon Secondary College Vceit.com Updated by Jenny Gielb Chisholm Institute of TAFE, Dandenong Database.


1 VCE IT Theory Slideshows - ITA By Mark Kelly McKinnon Secondary College Vceit.com Updated by Jenny Gielb Chisholm Institute of TAFE, Dandenong Database Normalisation Version 1

2 Contents What is normalisation? Why normalise? Normal forms 1,2,3

3 What is normalisation? Organising the data in a relational database so… – Data repetition is minimised – Data access is maximised

4 Why normalise? Removing data repetition saves lots of storage space, speeds up data access and reduces errors. Changes need only be made in one place rather than in many places. More powerful data access is possible. Allows more information to be easily stored. Allows users to get all sorts of information out of the stored data, e.g. How many widgets did we sell last month?

5 The normal forms Are called 1NF (first normal form) to 5NF, but only 1-3 matter here. Are guidelines (not laws) for structuring database tables and fields. Note: they are often applied instinctively as part of skilled database design, and are not an extra step to do after databases are created. REMEMBER – 1st and 2nd normal forms are stages on the way to the objective, which is 3rd normal form.

6 History of Data Storage Techniques Data first stored as records only, everything on one line, usually on a tape – Sequential To get to a certain record you had to read through all the other records first, and start at the beginning each time. Took forever!

7 History of Data Storage Techniques Hard disks, and indexing, allowed businesses to store data more effectively. The data can then be stored in different areas on the hard disk and an index used to access it

8 Database Indexes Indexes became very important. An index is a list that records where everything is placed on the hard disks: – The disk/platter – The track number – The section of track

9 Database Indexes This meant that data could be stored anywhere on the hard disks; it didn’t all have to be together. The index would find the required data no matter what information you entered. Also, computers were getting much faster, so accessing this data became much faster and easier, and more complex indexes could be built. Have you ever looked up the index of a recipe book? You can look up Chocolate Sponge Cake under both Chocolate and Cakes.
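The recipe-book idea can be sketched in a few lines of Python. This is only an illustration: the recipe list and the names `build_index` and `look_up` are made up for the example, and a list position stands in for the disk/track/section location that a real index would record.

```python
from collections import defaultdict

# Hypothetical "records": the position in this list stands in for a disk location.
recipes = [
    ("Chocolate Sponge Cake", ["Chocolate", "Cakes"]),
    ("Lemon Tart", ["Lemon", "Tarts"]),
    ("Chocolate Mousse", ["Chocolate", "Desserts"]),
]


def build_index(records):
    """Map each keyword to the positions of the records filed under it."""
    index = defaultdict(list)
    for position, (_title, keywords) in enumerate(records):
        for keyword in keywords:
            index[keyword].append(position)
    return index


index = build_index(recipes)


def look_up(keyword):
    """Return the recipe titles filed under a keyword, jumping straight to them."""
    return [recipes[position][0] for position in index.get(keyword, [])]


print(look_up("Chocolate"))
print(look_up("Cakes"))
```

The same record is reachable from more than one entry, and no lookup has to read every record from the beginning, which was the problem with sequential tape storage.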

10 Database Indexes

11 Hierarchical databases The first types of databases

12 Hierarchical databases Data flowed from top to bottom. To get the price of cucumber, you had to know that it was Produce. Slow, could only answer a few questions and needed complex programs to use them. Could not answer the question ‘What aisle are the lettuces in?’ Quicker to go and find a shelf-packer.

13 Relational Databases Then Edgar F. Codd proposed the relational model, a more flexible system that: allowed access to all the data from any angle, used codes to link tables together, used ‘relationships’ to show the links between tables

14 Relational Databases To answer the question – What aisle are Lebanese cucumbers in? The database uses the Item Type Code to look for the Contents Code to get the answer – Produce in Aisle 1
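A minimal sketch of that lookup, using Python's built-in `sqlite3` module. The table and column names (`item`, `item_type`, `item_type_code`) and the Dairy/Milk rows are assumptions made for the example; the slide's own tables are not reproduced in this transcript.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical layout: each item carries a code that points at its type.
    CREATE TABLE item_type (
        item_type_code INTEGER PRIMARY KEY,
        contents       TEXT,
        aisle          INTEGER
    );
    CREATE TABLE item (
        item_name      TEXT PRIMARY KEY,
        item_type_code INTEGER REFERENCES item_type(item_type_code)
    );
    INSERT INTO item_type VALUES (1, 'Produce', 1), (2, 'Dairy', 2);
    INSERT INTO item VALUES ('Lebanese cucumber', 1), ('Lettuce', 1), ('Milk', 2);
""")


def find_aisle(item_name):
    """Follow the item's type code to the matching item_type row."""
    return conn.execute(
        """
        SELECT t.contents, t.aisle
        FROM item AS i
        JOIN item_type AS t ON t.item_type_code = i.item_type_code
        WHERE i.item_name = ?
        """,
        (item_name,),
    ).fetchone()


print(find_aisle("Lebanese cucumber"))
```

Unlike the hierarchical layout, the question can start from the item name; nobody needs to know in advance that a cucumber is Produce.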

15 The Challenge The challenge is to get data into these meaningful, organised groupings. The data that you, as a programmer, will be presented with will be in a mess! If you are lucky, important information will be in spreadsheets, but it could be in files, handwritten on scraps of paper, stuck on the side of the filing cabinet, even on the back of the office toilet door!

16 Steps 1. Collect all the data 2. Find out what information the users want from the data 3. Design the database 4. Organise the data: – Break it down into meaningful groups of data – Work out your linking codes so that each table points to another one – Work out which data is being changed all the time and which data is changed rarely. As you organise the data, you usually go through stages – this is called normalising the data.

17 The Normal Forms First Normal Form (1NF) – fields split up properly Second Normal Form (2NF) – first stage of breaking up the data into meaningful groupings called tables, some codes used Third Normal Form (3NF) – data completely broken up into tables and linked by codes

18 1NF

19 First Normal Form - sets the most basic rules for an organised database The 1NF guidelines are common sense. 1. Eliminate duplicate data where possible 2. Break up fields so only one data item is in each field 3. Convert any data into correct format 4. Start to organise the data into meaningful groupings

20 Things 1NF wants No duplicate rows (records). Each row must be unique in some way. Each field entry can only contain one piece of data. – A name field containing “Fred Smith” has surname and first name, violating 1NF. – A phone number field with more than one phone number entered for a person

21 Things 1NF wants Each field entry can only contain one piece of data. Why? You cannot easily access the data embedded in the single field (e.g. grab a postcode) You can’t use embedded data for sorting You can’t use data like “2kg” as a number for calculations, sorting, summaries etc.

22 Your turn… repair this!
Customer ID | Name | Phone
111 | Fred Smith | 4566 3456
222 | Mary Jones | 4567 8900
333 | Tim Blogs | 3254 5676

23 Repaired!
Customer ID | FirstName | Surname
111 | Fred | Smith
222 | Mary | Jones
333 | Tim | Blogs
Now, customers can be sorted and searched by first name and/or surname separately. Also, the names can be used individually, like “Dear Fred” instead of “Dear Fred Smith”

24 Repair This!
Product ID | Colour | Weight
A345 | Red | 4kg
A568 | Blue | 300g
B695 | White | 1.5kg

25 Repaired!
Product ID | Colour | Weight (g)
A345 | Red | 4000
A568 | Blue | 300
B695 | White | 1500
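The weight repair can be automated. The sketch below is one possible approach, not part of the original slides: the helper name `weight_in_grams` is invented, and it assumes weights are only ever written with `g` or `kg`.

```python
import re

# Assumption: only these two units appear in the data.
GRAMS_PER_UNIT = {"g": 1, "kg": 1000}


def weight_in_grams(text):
    """Turn text such as '1.5kg' into a whole number of grams."""
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*(kg|g)\s*", text.lower())
    if match is None:
        raise ValueError(f"unrecognised weight: {text!r}")
    number, unit = match.groups()
    return round(float(number) * GRAMS_PER_UNIT[unit])


products = [("A345", "Red", "4kg"), ("A568", "Blue", "300g"), ("B695", "White", "1.5kg")]

# Same rows, but the weight is now a plain number in a single unit.
repaired = [(product_id, colour, weight_in_grams(weight))
            for product_id, colour, weight in products]

# Numbers can be sorted and totalled; the original text values could not.
lightest_first = sorted(repaired, key=lambda row: row[2])
total_grams = sum(row[2] for row in repaired)
print(lightest_first, total_grams)
```

Anything the pattern does not recognise raises an error rather than being guessed at, so odd entries get fixed by hand.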

26 Repair This! An address like “3 Fred St, Sale, 3586” has 3 pieces of data: street address, town, postcode.
Customer ID | Address
111 | 66 Lake Rd, Mentone, 3198
222 | 2/45 Richmond Lane, Richmond, 3121
333 | 135 Spring St, Melbourne, 3000

27 Repaired! Now each field can be searched & sorted and used individually (e.g. addressing envelopes)
Customer ID | Street | Suburb | Postcode
111 | 66 Lake Rd | Mentone | 3198
222 | 2/45 Richmond Lane | Richmond | 3121
333 | 135 Spring St | Melbourne | 3000
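A small sketch of the same split in Python. It assumes every address has exactly three comma-separated parts, as in the slide's data; real addresses are messier, and the function name `split_address` is invented for the example.

```python
def split_address(address):
    """Split 'street, suburb, postcode' into three separate fields."""
    # Unpacking raises ValueError if there are not exactly three parts.
    street, suburb, postcode = [part.strip() for part in address.split(",")]
    return {"street": street, "suburb": suburb, "postcode": postcode}


customers = {
    111: "66 Lake Rd, Mentone, 3198",
    222: "2/45 Richmond Lane, Richmond, 3121",
    333: "135 Spring St, Melbourne, 3000",
}

repaired = {customer_id: split_address(address)
            for customer_id, address in customers.items()}

# With the postcode in its own field, sorting by it is a one-liner.
by_postcode = sorted(repaired, key=lambda customer_id: repaired[customer_id]["postcode"])
print(by_postcode)
```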

28 2NF

29 2NF – Second Normal Form Achieving 2NF means 1NF has already been achieved Each normal form builds on the previous forms Removes more duplicate data. Deals with design problems that could threaten data integrity.

30 2NF – Second Normal Form Remove subsets of data that apply to multiple rows of a table and place them in separate tables. Create relationships between these new tables and their predecessors using unique keys.

31 CUSTOMER (raw data)
Customer ID | Name | Phone
111 | Fred Smith | 4566 3456
222 | Mary Jones | 4567 8900 (BH) 3456 2314 (AH)
333 | Tim Blogs | 3254 5676 0402 697 495

32 First normal form… Repetition removed. Fields broken up, but…
Customer ID | Last Name | First Name | Phone1 | Phone2
111 | Smith | Fred | 4566 3456 |
222 | Jones | Mary | 4567 8900 | 3456 2314
333 | Blogs | Tim | 3254 5676 | 0402 697 495

33 Problems:
– Trouble querying the table: “Which customer has phone # 3456 2314?” You have to search more than one field… messy.
– Can’t enforce validation rules to prevent duplicate phone numbers.
– Can’t enter three or more phone numbers.
– Wasted space for everyone with only one number.
– If Mary Jones got married and changed her name, changes would need to be made in more than one record. If one change were missed, the integrity of the data would be damaged. Making multiple changes like this is also time-consuming and repetitious, and the repeated data eats up storage space.
Solution: Put the phone numbers into their own table, as there can be more than one phone number for each name.

34 2nd Normal Form (2NF)
CUSTOMER
Customer ID | Last Name | First Name
111 | Smith | Fred
222 | Jones | Mary
333 | Blogs | Tim
PHONE TABLE
Customer ID | Phone
111 | 4566 3456
222 | 4567 8900
222 | 3456 2314
333 | 3254 5676
333 | 0402 697 495
Relationship: called a ‘1 to many relationship’. One customer record to many phone numbers. Also written as 1:many or 1:∞

35 Database Design 1 ∞ The design would be drawn like this

36 2nd Normal Form (2NF) Benefits: Name changes now only need to be made once. Unlimited phone numbers for everyone! No need to search multiple Phone fields. No need to search through all text to extract a particular phone number. All we need is a 1:many relationship between customer name table and customer phone table using the Customer ID as the key field.
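These benefits can be tried out with Python's built-in `sqlite3` module. This is a sketch under assumptions: the table and column names are invented, the married name 'Brown' is made up, and the `UNIQUE` constraint is just one way to express the "no duplicate phone numbers" validation rule.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY,
        last_name   TEXT,
        first_name  TEXT
    );
    -- Many phone rows can point at one customer row (1:many).
    CREATE TABLE customer_phone (
        customer_id INTEGER REFERENCES customer(customer_id),
        phone       TEXT UNIQUE          -- rejects duplicate phone numbers
    );
    INSERT INTO customer VALUES
        (111, 'Smith', 'Fred'), (222, 'Jones', 'Mary'), (333, 'Blogs', 'Tim');
    INSERT INTO customer_phone VALUES
        (111, '4566 3456'), (222, '4567 8900'), (222, '3456 2314'),
        (333, '3254 5676'), (333, '0402 697 495');
""")


def customer_for_phone(phone):
    """Answer 'which customer has this phone number?' by searching one field."""
    return conn.execute(
        """
        SELECT c.first_name, c.last_name
        FROM customer AS c
        JOIN customer_phone AS p ON p.customer_id = c.customer_id
        WHERE p.phone = ?
        """,
        (phone,),
    ).fetchone()


def phones_for(customer_id):
    """List every phone number stored for one customer, in insertion order."""
    rows = conn.execute(
        "SELECT phone FROM customer_phone WHERE customer_id = ? ORDER BY rowid",
        (customer_id,),
    )
    return [row[0] for row in rows]


print(customer_for_phone("3456 2314"))

# A name change touches exactly one row, however many phones Mary has.
conn.execute("UPDATE customer SET last_name = 'Brown' WHERE customer_id = 222")
print(customer_for_phone("3456 2314"), phones_for(222))
```

Adding a third or fourth number for a customer is just another row in `customer_phone`; no new Phone3 or Phone4 column is needed.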

37 Another example Without 2NF: flat file. With 2NF: relational. Department data is only stored once. So: Less storage space required. Department changes now only made once, not once for each worker in that dept!

38 2NF Electric Toothbrush Models
Manufacturer | Model | Model Full Name | Manufacturer Country
Forte | X-Prime | Forte X-Prime | Italy
Forte | Ultraclean | Forte Ultraclean | Italy
Dent-o-Fresh | EZBrush | Dent-o-Fresh EZBrush | USA
Kobayashi | ST-60 | Kobayashi ST-60 | Japan
Hoch | Toothmaster | Hoch Toothmaster | Germany
Hoch | X-Prime | Hoch X-Prime | Germany
The table above is a problem. Let’s say {Model Full Name} is the primary key. {Manufacturer, Model} is also a candidate key, and the {Manufacturer Country} field depends on only part of it, the {Manufacturer} field. So the country is repeated for every model, and every one of those rows must be updated if a manufacturer changes its location. To be properly 2NF, you’d need to do this…

39 2NF
Manufacturer | Manufacturer Country
Forte | Italy
Forte | Italy
Dent-o-Fresh | USA
Kobayashi | Japan
Hoch | Germany
Hoch | Germany

Model | ModelFullName
X-Prime | Forte X-Prime
Ultraclean | Forte Ultraclean
EZBrush | Dent-o-Fresh EZBrush
ST-60 | Kobayashi ST-60
Toothmaster | Hoch Toothmaster
X-Prime | Hoch X-Prime
Now the data is grouped – Manufacturer details in one table, Model details in the other, BUT how do you know which manufacturer makes which model now?

40 2NF Make the same key fields in each table
Manufacturer | Manufacturer Country
Forte | Italy
Dent-o-Fresh | USA
Kobayashi | Japan
Hoch | Germany

Manufacturer | Model | ModelFullName
Forte | X-Prime | Forte X-Prime
Forte | Ultraclean | Forte Ultraclean
Dent-o-Fresh | EZBrush | Dent-o-Fresh EZBrush
Kobayashi | ST-60 | Kobayashi ST-60
Hoch | Toothmaster | Hoch Toothmaster
Hoch | X-Prime | Hoch X-Prime
Set up the relationship between the key fields in each table

41 3NF

42 Third normal form (3NF) goes one step further. Use codes to minimise the amount of storage. Use codes as links to other tables so you can find any information. Sets up relationships between tables. Each table only needs to have fields that are dependent on the primary key. Also divides data into reference data and transaction data.

43 Using the previous example - 2NF
Manufacturer | Manufacturer Country
Forte | Italy
Dent-o-Fresh | USA
Kobayashi | Japan
Hoch | Germany

Manufacturer | Model | Model Full Name
Forte | X-Prime | Forte X-Prime
Forte | Ultraclean | Forte Ultraclean
Dent-o-Fresh | EZBrush | Dent-o-Fresh EZBrush
Kobayashi | ST-60 | Kobayashi ST-60
Hoch | Toothmaster | Hoch Toothmaster
Hoch | X-Prime | Hoch X-Prime

44 3NF To get it to 3rd normal form, replace repeating data with codes.
MCode | Manufacturer | Manufacturer Country
1 | Forte | Italy
2 | Dent-o-Fresh | USA
3 | Kobayashi | Japan
4 | Hoch | Germany

MCode | Model | ModelFullName
1 | X-Prime | Forte X-Prime
1 | Ultraclean | Forte Ultraclean
2 | EZBrush | Dent-o-Fresh EZBrush
3 | ST-60 | Kobayashi ST-60
4 | Toothmaster | Hoch Toothmaster
4 | X-Prime | Hoch X-Prime
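A rough Python sketch of how the MCode link is used. The dictionary layout and the function name `model_report` are assumptions for illustration. The sketch also builds the full model name from the manufacturer and model rather than storing it, which goes one step beyond the slide's tables.

```python
# Reference data: one row per manufacturer, keyed by MCode.
manufacturers = {
    1: ("Forte", "Italy"),
    2: ("Dent-o-Fresh", "USA"),
    3: ("Kobayashi", "Japan"),
    4: ("Hoch", "Germany"),
}

# Each model row stores only the code, not the manufacturer's details.
models = [
    (1, "X-Prime"), (1, "Ultraclean"), (2, "EZBrush"),
    (3, "ST-60"), (4, "Toothmaster"), (4, "X-Prime"),
]


def model_report():
    """Look up each model's manufacturer via MCode; return (full name, country)."""
    report = []
    for mcode, model in models:
        manufacturer, country = manufacturers[mcode]
        report.append((f"{manufacturer} {model}", country))
    return report


print(model_report())
```

If Hoch moved its operations, only the single entry `manufacturers[4]` would change, and both Hoch models would pick up the new country on the next report.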

45 Reference and Transaction Data All data can be classified as either reference data or transaction data. Reference Data is data that rarely changes and is ‘referred’ to (or used in lookups): people’s names, addresses, products. Each record starts with a unique code that is used in other tables.

46 Reference and Transaction Data Transaction Data is data that is regularly changed (edit, add or delete): when a customer buys something, when someone withdraws money, when someone wins a tournament. Usually has a unique code, a date, and information about the transaction, e.g. the purchase price and who made the purchase. Uses the codes set up in Reference Data tables.

47 3NF Field name underlining indicates key fields. You may have a gut feeling that this table is not good. But why?

48 3NF Each attribute (‘field’) should be giving information about the key field (a particular tournament + year).

49 3NF This is wrong because the DOB does not describe the key field (tournament). It describes a looked-up value (the tournament’s winner).

50 3NF FAIL It’s like your mum keeping her knickers in your sock drawer because you’re related to her. They don’t belong there!

51

52 Raw Data

53 1NF
First Name | Last Name | DOB | Tournament | Year
Chip | Masterton | 14/03/1977 | Indiana Invitational | 1999
Al | Fredrickson | 21/07/1975 | Indiana Invitational | 1998
Bob | Albertson | 28/09/1968 | Cleveland Open | 1999
Al | Fredrickson | 21/07/1975 | Des Moines Masters | 1999
Data broken up into separate fields. Date of birth converted into proper format.

54 2NF Data grouped but… data is still repeated
Players
Player Code | First Name | Last Name | DOB
1 | Chip | Masterton | 14/03/1977
2 | Al | Fredrickson | 21/07/1975
3 | Bob | Albertson | 28/09/1968
Tournament Winners
Player Code | First Name | Last Name | Tournament | Year
1 | Chip | Masterton | Indiana Invitational | 1999
2 | Al | Fredrickson | Indiana Invitational | 1998
3 | Bob | Albertson | Cleveland Open | 1999
2 | Al | Fredrickson | Des Moines Masters | 1999

55 3NF Data grouped meaningfully - Tournaments, Players, Winners. No repeating data. Codes used to link tables. Relationships created.
Tournaments
TournamentCode | Tournament
1 | Indiana Invitational
2 | Cleveland Open
3 | Des Moines Masters
Players
Player Code | First Name | Last Name | DOB
1 | Chip | Masterton | 14/03/1977
2 | Al | Fredrickson | 21/07/1975
3 | Bob | Albertson | 28/09/1968
Tournament Winners
Player Code | TournamentCode | Year
1 | 1 | 1999
2 | 1 | 1998
2 | 3 | 1999
3 | 2 |
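The three 3NF tables can be loaded into an in-memory SQLite database to see the codes at work. Treat this as a sketch: the SQL table and column names are invented, dates are stored as ISO text (`YYYY-MM-DD`) as a convenience, and the year for Bob Albertson's Cleveland Open win is taken from the 1NF table on slide 53.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tournament (
        tournament_code INTEGER PRIMARY KEY,
        tournament      TEXT
    );
    CREATE TABLE player (
        player_code INTEGER PRIMARY KEY,
        first_name  TEXT,
        last_name   TEXT,
        dob         TEXT              -- ISO date text, an assumption of this sketch
    );
    CREATE TABLE tournament_winner (
        tournament_code INTEGER REFERENCES tournament(tournament_code),
        year            INTEGER,
        player_code     INTEGER REFERENCES player(player_code),
        PRIMARY KEY (tournament_code, year)   -- one winner per tournament per year
    );
    INSERT INTO tournament VALUES
        (1, 'Indiana Invitational'), (2, 'Cleveland Open'), (3, 'Des Moines Masters');
    INSERT INTO player VALUES
        (1, 'Chip', 'Masterton', '1977-03-14'),
        (2, 'Al', 'Fredrickson', '1975-07-21'),
        (3, 'Bob', 'Albertson', '1968-09-28');
    INSERT INTO tournament_winner VALUES
        (1, 1999, 1), (1, 1998, 2), (3, 1999, 2), (2, 1999, 3);
""")


def winner_of(tournament, year):
    """Follow both codes to find who won, and their date of birth."""
    return conn.execute(
        """
        SELECT p.first_name, p.last_name, p.dob
        FROM tournament_winner AS w
        JOIN tournament AS t ON t.tournament_code = w.tournament_code
        JOIN player AS p ON p.player_code = w.player_code
        WHERE t.tournament = ? AND w.year = ?
        """,
        (tournament, year),
    ).fetchone()


print(winner_of("Des Moines Masters", 1999))
print(winner_of("Indiana Invitational", 1998))
```

Al Fredrickson won twice, yet his date of birth is stored in exactly one row of `player`, so it cannot end up with two different values.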

56 Reference and Transaction Data Transaction Data – The Tournament Winners table is regularly updated, every time someone wins a tournament. Reference Data – The Players table only changes when someone joins or leaves a tournament – The Tournaments table changes when a tournament name changes or tournaments are added or deleted.

57 Reference Data: unique code, lookup data, changed rarely
Tournaments
TournamentCode | Tournament
1 | Indiana Invitational
2 | Cleveland Open
3 | Des Moines Masters
Players
Player Code | First Name | Last Name | DOB
1 | Chip | Masterton | 14/03/1977
2 | Al | Fredrickson | 21/07/1975
3 | Bob | Albertson | 28/09/1968
Transaction Data: uses codes from reference data, has extra information about event, changes frequently
Tournament Winners table
Player Code | TournamentCode | Year
1 | 1 | 1999
2 | 1 | 1998
2 | 3 | 1999
3 | 2 |

58 Entering the data Don’t worry about the logistics of putting the codes into the data yet. This is dealt with later in the program.

59 Normalise this data Bounces Online Books
Name | Address | Book purchased | Item Cost | Date of purchase | Quantity | Total Cost
Tom Jones | 56 Latrobe Street,Melbourne, VIC 3000 | The Girl in the Hornet's Nest | $24.95 | 08/03/2011 | 1 | $24.95
Tom Jones | 65 Latrobe Street,Melbourne, VIC 3000 | Curiosity Killed the Cat | $14.95 | 08/03/2011 | 1 | $14.95
Mary Small | 236 Smith Street, Collingwood VIC 3002 | Lord of the Necklaces | $18.95 | 10/03/2011 | 2 | $37.90
Mary Small | 237 Smith Street, Collingwood VIC 3002 | The Girl in the Hornet's Nest | $24.95 | 10/03/2011 | 1 | $24.95
Fred Blogs | 45 High Street, Sydney, NSW, 2000 | The Hobby | $13.95 | 12/03/2011 | 2 | $27.90
Fred Blogs | 45 High Street, Sydney, NSW, 2000 | Lord of the Necklaces | $24.95 | 12/03/2011 | 1 | $24.95
Fred Blogs | 45 High Street, Newcastle, NSW, 2000 | The Girl in the Hornet's Nest | $24.95 | 12/03/2011 | 1 | $24.95

60 First stage - 1NF
First Name | Last Name | Address1 | Address2 | Suburb | State | Postcode | Book purchased | Item Cost | Date of purchase | Quantity | Total Cost
Tom | Jones | 56 Latrobe Street | | Melbourne | VIC | 3000 | The Girl in the Hornet's Nest | $24.95 | 08/03/2011 | 1 | $24.95
Tom | Jones | 65 Latrobe Street | | Melbourne | VIC | 3000 | Curiosity Killed the Cat | $14.95 | 08/03/2011 | 1 | $14.95
Mary | Small | 236 Smith Street | | Collingwood | VIC | 3002 | Lord of the Necklaces | $18.95 | 10/03/2011 | 2 | $37.90
Mary | Small | 236 Smith Street | | Collingwood | VIC | 3002 | The Girl in the Hornet's Nest | $24.95 | 10/03/2011 | 1 | $24.95
Fred | Blogs | 45 High Street | | Sydney | NSW | 2000 | The Hobby | $13.95 | 12/03/2011 | 2 | $27.90
Fred | Blogs | 45 High Street | | Sydney | NSW | 2000 | Lord of the Necklaces | $24.95 | 12/03/2011 | 1 | $24.95
Fred | Blogs | 45 High Street | | Sydney | NSW | 2000 | The Girl in the Hornet's Nest | $24.95 | 12/03/2011 | 1 | $24.95

61 Second Stage – 2NF
Customer table
CustomerCode | First Name | Last Name | Address1 | Address2 | Suburb | State | Postcode
116 | Tom | Jones | 56 Latrobe Street | | Melbourne | VIC | 3000
457 | Mary | Small | 236 Smith Street | | Collingwood | VIC | 3002
890 | Fred | Blogs | 45 High Street | | Sydney | NSW | 2000
Books Purchased table
CustomerCode | Book purchased | Item Cost | Date of purchase | Quantity | Total Cost
116 | The Girl in the Hornet's Nest | $24.95 | 08/03/2011 | 1 | $24.95
116 | Curiosity Killed the Cat | $14.95 | 08/03/2011 | 1 | $14.95
457 | Lord of the Necklaces | $18.95 | 10/03/2011 | 2 | $37.90
457 | The Girl in the Hornet's Nest | $24.95 | 10/03/2011 | 1 | $24.95
890 | The Hobby | $13.95 | 12/03/2011 | 2 | $27.90
890 | Lord of the Necklaces | $24.95 | 12/03/2011 | 1 | $24.95
890 | The Girl in the Hornet's Nest | $24.95 | 12/03/2011 | 1 | $24.95

62 Third Stage - 3NF
Customer Table
CustomerCode | First Name | Last Name | Address1 | Address2 | Suburb | State | Postcode
116 | Tom | Jones | 56 Latrobe Street | | Melbourne | VIC | 3000
457 | Mary | Small | 236 Smith Street | | Collingwood | VIC | 3002
890 | Fred | Blogs | 45 High Street | | Sydney | NSW | 2000
Purchases Table
CustomerCode | BookCode | Date of purchase | Quantity | Total
116 | 1 | 08/03/2011 | 1 | $24.95
116 | 15 | 08/03/2011 | 1 | $14.95
457 | 36 | 10/03/2011 | 2 | $37.90
457 | 1 | 10/03/2011 | 1 | $24.95
890 | 4 | 12/03/2011 | 2 | $27.90
890 | 36 | 12/03/2011 | 1 | $28.95
890 | 1 | 12/03/2011 | 1 | $24.95
Books Table
BookCode | Book Name | Genre | Item Cost
1 | The Girl in the Hornet's Nest | Murder Mystery | $24.95
15 | Curiosity Killed the Cat | Romance | $14.95
36 | Lord of the Necklaces | Fantasy | $18.95
4 | The Hobby | Fantasy | $13.95
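Once each book's price lives only in the Books table, totals can be worked out on demand by following the BookCode link. The sketch below makes several assumptions: the SQL names are invented, prices are held as whole cents to avoid rounding problems, and the Total column is left out and recalculated from Quantity × Item Cost instead of being stored.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE book (
        book_code       INTEGER PRIMARY KEY,
        book_name       TEXT,
        genre           TEXT,
        item_cost_cents INTEGER           -- $24.95 is stored as 2495
    );
    CREATE TABLE purchase (
        customer_code    INTEGER,
        book_code        INTEGER REFERENCES book(book_code),
        date_of_purchase TEXT,
        quantity         INTEGER
    );
    INSERT INTO book VALUES
        (1,  'The Girl in the Hornet''s Nest', 'Murder Mystery', 2495),
        (15, 'Curiosity Killed the Cat',       'Romance',        1495),
        (36, 'Lord of the Necklaces',          'Fantasy',        1895),
        (4,  'The Hobby',                      'Fantasy',        1395);
    INSERT INTO purchase VALUES
        (116, 1,  '2011-03-08', 1), (116, 15, '2011-03-08', 1),
        (457, 36, '2011-03-10', 2), (457, 1,  '2011-03-10', 1),
        (890, 4,  '2011-03-12', 2), (890, 36, '2011-03-12', 1),
        (890, 1,  '2011-03-12', 1);
""")


def spend_by_customer():
    """Total spend per customer in cents, priced from the Books table."""
    rows = conn.execute(
        """
        SELECT p.customer_code, SUM(p.quantity * b.item_cost_cents)
        FROM purchase AS p
        JOIN book AS b ON b.book_code = p.book_code
        GROUP BY p.customer_code
        ORDER BY p.customer_code
        """
    )
    return dict(rows.fetchall())


print(spend_by_customer())
```

Because the price is read from one place, two purchases of the same book can never disagree about what it costs.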

63 Reference and Transaction Data Which tables are Reference Data tables? – Customer table – Book table Which table is a Transaction data table? – Purchases table

64 The front-end screen would look something like this: Purchases data entered into the Transaction table, with drop-down lists which use data from the Reference Data tables

65 In other words Let X → A be a nontrivial FD (i.e. one where X does not contain A) and let A be a non-key attribute. Also let Y be a key of R. Then Y → X. Therefore A is not transitively dependent on Y if and only if X → Y, that is, if and only if X is a superkey. ’kay?
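The formal rule can also be checked mechanically. The sketch below is an illustration only: the function names are invented, the attribute names (`Winner`, `WinnerDOB`) are assumed stand-ins for the tournament table from the earlier slides, and the candidate keys are supplied by hand rather than discovered. It reports every dependency X → A where A is not part of any key and X is not a superkey.

```python
def closure(attributes, fds):
    """Return every attribute that `attributes` functionally determines."""
    result = set(attributes)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_superkey(attributes, all_attributes, fds):
    """A superkey determines every attribute in the table."""
    return closure(attributes, fds) == set(all_attributes)


def third_nf_violations(all_attributes, candidate_keys, fds):
    """List (X, A) pairs where X -> A breaks the 3NF rule."""
    key_attributes = set().union(*candidate_keys)
    violations = []
    for lhs, rhs in fds:
        for attribute in sorted(rhs - lhs):          # skip trivial dependencies
            if attribute in key_attributes:
                continue                             # key attributes are exempt
            if not is_superkey(lhs, all_attributes, fds):
                violations.append((sorted(lhs), attribute))
    return violations


# Hypothetical single-table design: the winner's DOB sits in the tournament table.
attributes = {"Tournament", "Year", "Winner", "WinnerDOB"}
keys = [frozenset({"Tournament", "Year"})]
fds = [
    (frozenset({"Tournament", "Year"}), frozenset({"Winner"})),
    (frozenset({"Winner"}), frozenset({"WinnerDOB"})),
]

print(third_nf_violations(attributes, keys, fds))
```

The one violation reported is `Winner → WinnerDOB`, which is the "knickers in the sock drawer" problem: the date of birth describes the winner, not the tournament and year.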

66 By Mark Kelly McKinnon Secondary College vceit.com These slideshows may be freely used, modified or distributed by teachers and students anywhere on the planet (but not elsewhere). They may NOT be sold. They must NOT be redistributed if you modify them. VCE IT THEORY SLIDESHOWS

