VCE IT Theory Slideshows - ITA By Mark Kelly McKinnon Secondary College Vceit.com Updated by Jenny Gielb Chisholm Institute of TAFE, Dandenong Database.

Presentation transcript:

VCE IT Theory Slideshows - ITA By Mark Kelly McKinnon Secondary College Vceit.com Updated by Jenny Gielb Chisholm Institute of TAFE, Dandenong Database Normalisation Version 1

Contents What is normalisation? Why normalise? Normal forms 1,2,3

What is normalisation? Organising the data in a relational database so… – Data repetition is minimised – Data access is maximised

Why normalise? Removing data repetition saves lots of storage space, speeds up data access and reduces errors. Changes need only be made in one place rather than in many places. More powerful data access is possible. Allows more information to be easily stored. Allows users to get all sorts of information out of the stored data, e.g. How many widgets did we sell last month?

The normal forms
– Are called 1NF (first normal form) to 5NF, but only 1-3 matter here.
– Are guidelines (not laws) for structuring database tables and fields.
Note: they are often applied instinctively as part of skilled database design, and are not an extra step to do after databases are created.
REMEMBER – 1st and 2nd normal forms are stages on the way to the objective, which is 3rd normal form.

History of Data Storage Techniques Data was first stored as records only, everything on one line, usually on a tape (sequential storage). To get to a certain record you had to read through all the other records first, starting at the beginning each time. Took forever!

History of Data Storage Techniques Hard disks, and indexing, allowed businesses to store data more effectively. The data can then be stored in different areas on the hard disk and an index used to access it

Database Indexes Indexes became very important. An index is a list that records where everything is placed on the hard disks: – The disk/platter – The track number – The section of track

Database Indexes This meant that data could be stored anywhere on the hard disks; it didn’t all have to be together. The index would find the required data no matter which indexed value you searched on. Also, computers were getting much faster, so accessing this data became faster and easier, and more complex indexes could be built. Have you ever looked up the index of a recipe book? You can look up Chocolate Sponge Cake under both Chocolate and Cakes.

Database Indexes

Hierarchical databases The first types of databases

Hierarchical databases Data flowed from top to bottom. To get the price of cucumber, you had to know that it was Produce. They were slow, could only answer a few questions and needed complex programs to use them. They could not answer the question ‘What aisle are the lettuces in?’ Quicker to go find a shelf-packer.

Relational Databases Then Edgar F. Codd proposed the relational model (1970), a more flexible way of organising data that: – allowed access to all the data from any angle – used codes to link tables together – used ‘relationships’ to show the links between tables

Relational Databases To answer the question – What aisle are Lebanese cucumbers in? The database uses the Item Type Code to look for the Contents Code to get the answer – Produce in Aisle 1
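A minimal sketch of that lookup, using Python's built-in sqlite3 module. The table names, column names and the code value 'P' are made up for illustration, and the slide's two codes (Item Type Code and Contents Code) are collapsed into one to keep it short; only "Lebanese cucumbers", "Produce" and "Aisle 1" come from the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical schema: one table per grouping, linked by a code.
    CREATE TABLE aisles (
        contents_code TEXT PRIMARY KEY,
        contents      TEXT,
        aisle         INTEGER
    );
    CREATE TABLE items (
        item_name     TEXT PRIMARY KEY,
        contents_code TEXT REFERENCES aisles(contents_code)
    );
    INSERT INTO aisles VALUES ('P', 'Produce', 1);
    INSERT INTO items  VALUES ('Lebanese cucumbers', 'P');
""")

def find_aisle(item_name):
    """Follow the code from the item to its aisle; None if the item is unknown."""
    return conn.execute(
        """SELECT a.contents, a.aisle
           FROM items AS i
           JOIN aisles AS a ON a.contents_code = i.contents_code
           WHERE i.item_name = ?""",
        (item_name,),
    ).fetchone()

print(find_aisle("Lebanese cucumbers"))
```

The point is that the question can start from the item itself: nobody has to know in advance that cucumbers are Produce.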

The Challenge The challenge is to get data into these meaningful, organised groupings. The data that you, as a programmer, will be presented with will be in a mess! If you are lucky, important information will be in spreadsheets, but it could be in files, handwritten on scraps of paper, stuck on the side of the filing cabinet, even on the back of the office toilet door!

Steps
1. Collect all the data
2. Find out what information the users want from the data
3. Design the database
4. Organise the data:
– Break it down into meaningful groups of data
– Work out your linking codes so that each table points to another one
– Work out which data is being changed all the time and which data is changed rarely
As you organise the data, you usually go through stages. Working through these stages is called normalising the data.

The Normal Forms
– First Normal Form (1NF) – fields split up properly
– Second Normal Form (2NF) – first stage of breaking up the data into meaningful groupings called tables, some codes used
– Third Normal Form (3NF) – data completely broken up into tables and linked by codes

1NF

First Normal Form - sets the most basic rules for an organised database. The 1NF guidelines are common sense.
1. Eliminate duplicate data where possible
2. Break up fields so only one data item is in each field
3. Convert any data into correct format
4. Start to organise the data into meaningful groupings

Things 1NF wants
No duplicate rows (records). Each row must be unique in some way.
Each field entry can only contain one piece of data.
– A name field containing “Fred Smith” has surname and first name, violating 1NF.
– A phone number field with more than one phone number entered for a person also violates 1NF.

Things 1NF wants
Each field entry can only contain one piece of data. Why?
– You cannot easily access the data embedded in the single field (e.g. grab a postcode)
– You can’t use embedded data for sorting
– You can’t use data like “2kg” as a number for calculations, sorting, summaries etc.

Your turn… repair this!
Customer ID | Name | Phone
111 | Fred Smith |
222 | Mary Jones |
333 | Tim Blogs |

Repaired!
Customer ID | FirstName | Surname
111 | Fred | Smith
222 | Mary | Jones
333 | Tim | Blogs
Now, customers can be sorted and searched by first name and/or surname separately. Also, the names can be used individually, like “Dear Fred” instead of “Dear Fred Smith”.
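The repair above can be sketched in a few lines of Python. The helper name split_name is made up here, and splitting at the last space is a simplifying assumption: real names (middle names, two-word surnames) need more care.

```python
def split_name(full_name):
    """Split 'First Last' into (first_name, surname) at the last space."""
    first_name, _, surname = full_name.strip().rpartition(" ")
    return first_name, surname

# Names as they appear in the unrepaired table.
customers = {111: "Fred Smith", 222: "Mary Jones", 333: "Tim Blogs"}
repaired = {cid: split_name(name) for cid, name in customers.items()}

# With separate fields, sorting by surname becomes trivial.
by_surname = sorted(repaired, key=lambda cid: repaired[cid][1])

print(repaired)
print(by_surname)
```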

Repair This!
Product ID | Colour | Weight
A345 | Red | 4kg
A568 | Blue | 300g
B695 | White | 1.5kg

Repaired!
Product ID | Colour | Weight (g)
A345 | Red | 4000
A568 | Blue | 300
B695 | White | 1500
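One possible way to do that conversion in Python, so that every weight ends up as a plain number in a single unit. The function name to_grams is hypothetical, and the sketch only understands the two units that appear in the table (kg and g).

```python
def to_grams(weight):
    """Convert text such as '4kg' or '300g' into a whole number of grams."""
    text = weight.strip().lower()
    if text.endswith("kg"):          # check 'kg' first, since it also ends in 'g'
        return round(float(text[:-2]) * 1000)
    if text.endswith("g"):
        return round(float(text[:-1]))
    raise ValueError(f"unrecognised weight: {weight!r}")

products = {"A345": "4kg", "A568": "300g", "B695": "1.5kg"}
weights_g = {pid: to_grams(w) for pid, w in products.items()}

print(weights_g)
print(sum(weights_g.values()))  # calculations now work
```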

Repair This! An address like “3 Fred St, Sale, 3586” has 3 pieces of data: street address, town, postcode.
Customer ID | Address
 | Lake Rd, Mentone,
 | /45 Richmond Lane, Richmond,
 | Spring St, Melbourne, 3000

Repaired! Now each field can be searched & sorted and used individually (e.g. addressing envelopes)
Customer ID | Street | Suburb | Postcode
 | Lake Rd | Mentone |
 | /45 Richmond Lane | Richmond |
 | Spring St | Melbourne | 3000
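A small sketch of the same split, using the slide's own example address. The function name split_address is made up, and it assumes the address always has exactly three comma-separated parts; messier input would need extra handling. The postcode is kept as text because postcodes can start with 0 and are never used in arithmetic.

```python
def split_address(address):
    """Split 'street, suburb, postcode' into three separate fields."""
    street, suburb, postcode = [part.strip() for part in address.split(",")]
    return {"street": street, "suburb": suburb, "postcode": postcode}

fields = split_address("3 Fred St, Sale, 3586")
print(fields)
```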

2NF

2NF – Second Normal Form
– Achieving 2NF means 1NF has already been achieved
– Each normal form builds on the previous forms
– Removes more duplicate data
– Deals with design problems that could threaten data integrity

2NF – Second Normal Form Remove subsets of data that apply to multiple rows of a table and place them in separate tables. Create relationships between these new tables and their predecessors using unique keys.

CUSTOMER (raw data)
Customer ID | Name | Phone
111 | Fred Smith |
222 | Mary Jones | (BH) (AH)
333 | Tim Blogs |

First normal form… Repetition removed. Fields broken up, but…
Customer ID | Last Name | First Name | Phone1 | Phone2
111 | Smith | Fred | |
222 | Jones | Mary | |
333 | Blogs | Tim | |

Problems:
– Trouble querying the table: “Which customer has phone # ?” Have to search more than 1 field… messy.
– Can’t enforce validation rules to prevent duplicate phone #s
– Can’t enter three or more phone numbers
– Waste of space for all people with only 1 number
– If Mary Jones got married and changed her name, changes would need to be made in more than one record. If one change were missed, the integrity of the data would be damaged. Making multiple changes like this is also time-consuming and repetitious, and the repeated data eats up storage space.
Solution: Put the phone numbers into their own table, as there can be more than one phone number for each name.

2nd Normal Form (2NF)
CUSTOMER
Customer ID | Last Name | First Name
111 | Smith | Fred
222 | Jones | Mary
333 | Blogs | Tim
PHONE TABLE
Customer ID | Phone
Relationship: called a ‘1 to many relationship’. One customer record to many phone numbers. Also written as 1:many or 1:∞

Database Design The design would be drawn like this: CUSTOMER 1 to ∞ PHONE

2nd Normal Form (2NF) Benefits:
– Name changes now only need to be made once.
– Unlimited phone numbers for everyone!
– No need to search multiple Phone fields
– No need to search through all text to extract a particular phone number
All we need is a 1:many relationship between customer name table and customer phone table using the Customer ID as the key field.
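The benefits listed above can be tried out with Python's built-in sqlite3 module. This is only a sketch: the phone numbers and the new surname 'Brown' are invented placeholders (the slides' real numbers are not reproduced here), and the lower-case table and column names are this example's own.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY,
        last_name   TEXT NOT NULL,
        first_name  TEXT NOT NULL
    );
    -- Many phone rows may point at one customer row (1:many).
    CREATE TABLE phone (
        customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
        phone       TEXT NOT NULL UNIQUE
    );
""")
conn.executemany("INSERT INTO customer VALUES (?, ?, ?)",
                 [(111, "Smith", "Fred"), (222, "Jones", "Mary"), (333, "Blogs", "Tim")])
conn.executemany("INSERT INTO phone VALUES (?, ?)",   # placeholder numbers
                 [(111, "555-0101"), (222, "555-0102"), (222, "555-0103"), (333, "555-0104")])

def owner_of(phone):
    """One field to search, however many numbers a customer has."""
    return conn.execute(
        """SELECT c.first_name, c.last_name
           FROM phone AS p JOIN customer AS c USING (customer_id)
           WHERE p.phone = ?""", (phone,)).fetchone()

# A name change touches exactly one row.
renamed = conn.execute(
    "UPDATE customer SET last_name = 'Brown' WHERE customer_id = 222").rowcount

# The UNIQUE rule rejects a phone number that is already on file.
try:
    conn.execute("INSERT INTO phone VALUES (333, '555-0101')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(owner_of("555-0103"), renamed, duplicate_rejected)
```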

Another example
Without 2NF: flat file. With 2NF: relational.
Department data is only stored once. So:
– Less storage space required
– Department changes now only made once, not once for each worker in that dept!

2NF
Electric Toothbrush Models
Manufacturer | Model | Model Full Name | Manufacturer Country
Forte | X-Prime | Forte X-Prime | Italy
Forte | Ultraclean | Forte Ultraclean | Italy
Dent-o-Fresh | EZBrush | Dent-o-Fresh EZBrush | USA
Kobayashi | ST-60 | Kobayashi ST-60 | Japan
Hoch | Toothmaster | Hoch Toothmaster | Germany
Hoch | X-Prime | Hoch X-Prime | Germany
The table above is a problem. Let’s say {Model Full Name} is the primary key. The {Manufacturer Country} field is based on the {Manufacturer} field, so the same country is repeated for every model a manufacturer makes, and every one of those rows must be updated if a manufacturer changes its location. ({Manufacturer, Model} also identifies each row, and {Manufacturer Country} depends on only part of that combination.) To be properly 2NF, you’d need to do this…

2NF
Manufacturer | Manufacturer Country
Forte | Italy
Forte | Italy
Dent-o-Fresh | USA
Kobayashi | Japan
Hoch | Germany
Hoch | Germany
Model | ModelFullName
X-Prime | Forte X-Prime
Ultraclean | Forte Ultraclean
EZBrush | Dent-o-Fresh EZBrush
ST-60 | Kobayashi ST-60
Toothmaster | Hoch Toothmaster
X-Prime | Hoch X-Prime
Now the data is grouped: Manufacturer details in one table, Model details in the other. BUT how do you know which manufacturer makes which model now?

2NF Make the same key fields in each table
Manufacturer | Manufacturer Country
Forte | Italy
Dent-o-Fresh | USA
Kobayashi | Japan
Hoch | Germany
Manufacturer | Model | ModelFullName
Forte | X-Prime | Forte X-Prime
Forte | Ultraclean | Forte Ultraclean
Dent-o-Fresh | EZBrush | Dent-o-Fresh EZBrush
Kobayashi | ST-60 | Kobayashi ST-60
Hoch | Toothmaster | Hoch Toothmaster
Hoch | X-Prime | Hoch X-Prime
Set up the relationship between the key fields in each table
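The split shown above can be sketched in plain Python: each manufacturer's country is stored once, and the Manufacturer field stays in the models table as the link. The variable names are this example's own, and the data uses the consistent spelling "Kobayashi ST-60".

```python
# The original single table: (Manufacturer, Model, Model Full Name, Manufacturer Country)
rows = [
    ("Forte", "X-Prime", "Forte X-Prime", "Italy"),
    ("Forte", "Ultraclean", "Forte Ultraclean", "Italy"),
    ("Dent-o-Fresh", "EZBrush", "Dent-o-Fresh EZBrush", "USA"),
    ("Kobayashi", "ST-60", "Kobayashi ST-60", "Japan"),
    ("Hoch", "Toothmaster", "Hoch Toothmaster", "Germany"),
    ("Hoch", "X-Prime", "Hoch X-Prime", "Germany"),
]

manufacturers = {}   # Manufacturer -> Manufacturer Country, one entry each
models = []          # (Manufacturer, Model, Model Full Name)

for manufacturer, model, full_name, country in rows:
    manufacturers[manufacturer] = country
    models.append((manufacturer, model, full_name))

# If Hoch moved, only one entry would need changing.
print(manufacturers)
print(len(models))
```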

3NF

Third normal form (3NF) goes one step further
– Use codes to minimise the amount of storage
– Use codes as links to other tables so any information can be found
– Sets up relationships between tables
– Each table only needs the fields that are dependent on its primary key
– Also divides data into reference data and transaction data

Using the previous example - 2NF
Manufacturer | Manufacturer Country
Forte | Italy
Dent-o-Fresh | USA
Kobayashi | Japan
Hoch | Germany
Manufacturer | Model | Model Full Name
Forte | X-Prime | Forte X-Prime
Forte | Ultraclean | Forte Ultraclean
Dent-o-Fresh | EZBrush | Dent-o-Fresh EZBrush
Kobayashi | ST-60 | Kobayashi ST-60
Hoch | Toothmaster | Hoch Toothmaster
Hoch | X-Prime | Hoch X-Prime

3NF To get it to 3rd normal form, replace repeating data with codes.
MCode | Manufacturer | Manufacturer Country
1 | Forte | Italy
2 | Dent-o-Fresh | USA
3 | Kobayashi | Japan
4 | Hoch | Germany
MCode | Model | ModelFullName
1 | X-Prime | Forte X-Prime
1 | Ultraclean | Forte Ultraclean
2 | EZBrush | Dent-o-Fresh EZBrush
3 | ST-60 | Kobayashi ST-60
4 | Toothmaster | Hoch Toothmaster
4 | X-Prime | Hoch X-Prime
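As a rough illustration of the coding step, the sketch below hands out MCode values in order of first appearance and swaps them in for the manufacturer names. Numbering from 1 in that order is an assumption that happens to match the slide; a real database would normally generate the codes itself.

```python
countries = {"Forte": "Italy", "Dent-o-Fresh": "USA", "Kobayashi": "Japan", "Hoch": "Germany"}
models = [
    ("Forte", "X-Prime", "Forte X-Prime"),
    ("Forte", "Ultraclean", "Forte Ultraclean"),
    ("Dent-o-Fresh", "EZBrush", "Dent-o-Fresh EZBrush"),
    ("Kobayashi", "ST-60", "Kobayashi ST-60"),
    ("Hoch", "Toothmaster", "Hoch Toothmaster"),
    ("Hoch", "X-Prime", "Hoch X-Prime"),
]

# Give each manufacturer a short numeric code: 1, 2, 3, 4.
mcode = {}
for name in countries:
    mcode[name] = len(mcode) + 1

manufacturer_table = [(mcode[name], name, country) for name, country in countries.items()]
model_table = [(mcode[name], model, full_name) for name, model, full_name in models]

print(manufacturer_table)
print(model_table)
```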

Reference and Transaction Data All data can be classified as either reference data or transaction data.
Reference Data is data that rarely changes and is ‘referred’ to (or used in lookups):
– people’s names
– addresses
– products
Starts with a unique code that is used in other tables

Reference and Transaction Data Transaction Data is data that is regularly changed (edit, add or delete):
– when a customer buys something
– when someone withdraws money
– when someone wins a tournament
Usually has a unique code, a date, and information about the transaction, e.g. the purchase price and who made the purchase. Uses the codes set up in Reference Data tables

3NF Field name underlining indicates key fields. You may have a gut feeling that this table is not good. But why?

3NF Each attribute (‘field’) should be giving information about the key field (a particular tournament + year).

3NF This is wrong because the DOB does not describe the key field (tournament). It describes a looked-up value (the tournament’s winner).

3NF FAIL It’s like your mum keeping her knickers in your sock drawer because you’re related to her. They don’t belong there!

Raw Data

1NF
First Name | Last Name | DOB | Tournament | Year
Chip | Masterton | 14/03/1977 | Indiana Invitational | 1999
Al | Fredrickson | 21/07/1975 | Indiana Invitational | 1998
Bob | Albertson | 28/09/1968 | Cleveland Open | 1999
Al | Fredrickson | 21/07/1975 | Des Moines Masters | 1999
– Data broken up into separate fields
– Date of birth converted into proper format

2NF Data grouped but … Data is still repeated
Players
Player Code | First Name | Last Name | DOB
1 | Chip | Masterton | 14/03/1977
2 | Al | Fredrickson | 21/07/1975
3 | Bob | Albertson | 28/09/1968
Tournament Winners
Player Code | First Name | Last Name | Tournament | Year
1 | Chip | Masterton | Indiana Invitational | 1999
2 | Al | Fredrickson | Indiana Invitational | 1998
3 | Bob | Albertson | Cleveland Open | 1999
2 | Al | Fredrickson | Des Moines Masters | 1999

3NF
– Data grouped meaningfully - Tournaments, Players, Winners
– No repeating data
– Codes used to link tables
– Relationships created
Tournaments
TournamentCode | Tournament
1 | Indiana Invitational
2 | Cleveland Open
3 | Des Moines Masters
Players
Player Code | First Name | Last Name | DOB
1 | Chip | Masterton | 14/03/1977
2 | Al | Fredrickson | 21/07/1975
3 | Bob | Albertson | 28/09/1968
Tournament Winners
Player Code | TournamentCode | Year
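To see the three tables working together, here is a sketch using Python's built-in sqlite3 module. The lower-case table and column names are this example's own. The Tournament Winners rows are worked out from the 2NF slide (player code, tournament and year) rather than copied from this one, and storing dates as YYYY-MM-DD text is simply a convenient choice for SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tournaments (tournament_code INTEGER PRIMARY KEY, tournament TEXT);
    CREATE TABLE players (player_code INTEGER PRIMARY KEY,
                          first_name TEXT, last_name TEXT, dob TEXT);
    CREATE TABLE tournament_winners (
        player_code     INTEGER REFERENCES players(player_code),
        tournament_code INTEGER REFERENCES tournaments(tournament_code),
        year            INTEGER,
        PRIMARY KEY (tournament_code, year)   -- one winner per tournament per year
    );
    INSERT INTO tournaments VALUES
        (1, 'Indiana Invitational'), (2, 'Cleveland Open'), (3, 'Des Moines Masters');
    INSERT INTO players VALUES
        (1, 'Chip', 'Masterton',   '1977-03-14'),
        (2, 'Al',   'Fredrickson', '1975-07-21'),
        (3, 'Bob',  'Albertson',   '1968-09-28');
    INSERT INTO tournament_winners VALUES
        (1, 1, 1999), (2, 1, 1998), (3, 2, 1999), (2, 3, 1999);
""")

# The codes let us rebuild the original flat view whenever it is needed.
winners = conn.execute("""
    SELECT t.tournament, w.year, p.first_name, p.last_name, p.dob
    FROM tournament_winners AS w
    JOIN tournaments AS t ON t.tournament_code = w.tournament_code
    JOIN players     AS p ON p.player_code     = w.player_code
    ORDER BY t.tournament, w.year
""").fetchall()

for row in winners:
    print(row)
```

Al Fredrickson won twice, but his name and date of birth are stored only once, in the players table.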

Reference and Transaction Data
Transaction Data
– The Tournament Winners table is regularly updated, every time someone wins a tournament
Reference Data
– The Players table only changes when a player joins or leaves
– The Tournaments table changes when a tournament name changes or tournaments are added or deleted.

Reference Data (unique code, lookup data, changed rarely)
Tournaments
TournamentCode | Tournament
1 | Indiana Invitational
2 | Cleveland Open
3 | Des Moines Masters
Players
Player Code | First Name | Last Name | DOB
1 | Chip | Masterton | 14/03/1977
2 | Al | Fredrickson | 21/07/1975
3 | Bob | Albertson | 28/09/1968
Transaction Data (uses codes from reference data, has extra information about event, changes frequently)
Tournament Winners table
Player Code | TournamentCode | Year

Entering the data Don’t worry about the logistics of putting the codes into the data yet. This is dealt with later in the program.

Normalise this data
Bounces Online Books
Name | Address | Book purchased | Item Cost | Date of purchase | Quantity | Total Cost
Tom Jones | 56 Latrobe Street,Melbourne, VIC 3000 | The Girl in the Hornet's Nest | $ | /03/2011 | 1 | $24.95
Tom Jones | 65 Latrobe Street,Melbourne, VIC 3000 | Curiosity Killed the Cat | $ | /03/2011 | 1 | $14.95
Mary Small | 236 Smith Street, Collingwood VIC 3002 | Lord of the Necklaces | $ | /03/2011 | 2 | $37.90
Mary Small | 237 Smith Street, Collingwood VIC 3002 | The Girl in the Hornet's Nest | $ | /03/2011 | 1 | $24.95
Fred Blogs | 45 High Street, Sydney, NSW, 2000 | The Hobby | $ | /03/2011 | 2 | $27.90
Fred Blogs | 45 High Street, Sydney, NSW, 2000 | Lord of the Necklaces | $ | /03/2011 | 1 | $24.95
Fred Blogs | 45 High Street, Newcastle, NSW, 2000 | The Girl in the Hornet's Nest | $ | /03/2011 | 1 | $24.95

First stage - 1NF
First Name | Last Name | Address1 | Address2 | Suburb | State | Postcode | Book purchased | Item Cost | Date of purchase | Quantity | Total Cost
Tom | Jones | 56 Latrobe Street | | Melbourne | VIC | 3000 | The Girl in the Hornet's Nest | $ | /03/2011 | 1 | $24.95
Tom | Jones | 65 Latrobe Street | | Melbourne | VIC | 3000 | Curiosity Killed the Cat | $ | /03/2011 | 1 | $14.95
Mary | Small | 236 Smith Street | | Collingwood | VIC | 3002 | Lord of the Necklaces | $ | /03/2011 | 2 | $37.90
Mary | Small | 236 Smith Street | | Collingwood | VIC | 3002 | The Girl in the Hornet's Nest | $ | /03/2011 | 1 | $24.95
Fred | Blogs | 45 High Street | | Sydney | NSW | 2000 | The Hobby | $ | /03/2011 | 2 | $27.90
Fred | Blogs | 45 High Street | | Sydney | NSW | 2000 | Lord of the Necklaces | $ | /03/2011 | 1 | $24.95
Fred | Blogs | 45 High Street | | Sydney | NSW | 2000 | The Girl in the Hornet's Nest | $ | /03/2011 | 1 | $24.95

Second Stage – 2NF
Customer table
CustomerCode | First Name | Last Name | Address1 | Address2 | Suburb | State | Postcode
116 | Tom | Jones | 56 Latrobe Street | | Melbourne | VIC |
 | Mary | Small | 236 Smith Street | | Collingwood | VIC |
 | Fred | Blogs | 45 High Street | | Sydney | NSW | 2000
Books Purchased table
CustomerCode | Book purchased | Item Cost | Date of purchase | Quantity | Total Cost
116 | The Girl in the Hornet's Nest | $ | /03/2011 | 1 | $
 | Curiosity Killed the Cat | $ | /03/2011 | 1 | $
 | Lord of the Necklaces | $ | /03/2011 | 2 | $
 | The Girl in the Hornet's Nest | $ | /03/2011 | 1 | $
 | The Hobby | $ | /03/2011 | 2 | $
 | Lord of the Necklaces | $ | /03/2011 | 1 | $
 | The Girl in the Hornet's Nest | $ | /03/2011 | 1 | $24.95

Third Stage - 3NF
Customer Table
CustomerCode | First Name | Last Name | Address1 | Address2 | Suburb | State | Postcode
116 | Tom | Jones | 56 Latrobe Street | | Melbourne | VIC |
 | Mary | Small | 236 Smith Street | | Collingwood | VIC |
 | Fred | Blogs | 45 High Street | | Sydney | NSW | 2000
Purchases Table
CustomerCode | BookCode | Date of purchase | Quantity | Total
 | | /03/2011 | 1 | $
 | | /03/2011 | 1 | $
 | | /03/2011 | 2 | $
 | | /03/2011 | 1 | $
 | | /03/2011 | 2 | $
 | | /03/2011 | 1 | $
 | | /03/2011 | 1 | $24.95
Books Table
BookCode | Book Name | Genre | Item Cost
1 | The Girl in the Hornet's Nest | Murder Mystery | $
 | Curiosity Killed the Cat | Romance | $
 | Lord of the Necklaces | Fantasy | $
 | The Hobby | Fantasy | $13.95

Reference and Transaction Data Which tables are Reference Data tables? – Customer table – Book table Which table is a Transaction data table? – Purchases table

The front-end screen would look something like this: Purchases data entered into the Transaction table, with drop-down lists which use data from the Reference Data tables

In other words Let X → A be a nontrivial FD (i.e. one where X does not contain A) and let A be a non-key attribute. Also let Y be a key of R. Then Y → X. Therefore A is not transitively dependent on Y if and only if X → Y, that is, if and only if X is a superkey. ’kay?
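For anyone who wants to poke at that definition, here is a small Python sketch of the superkey test it relies on. It computes the closure of a set of attributes (everything those attributes determine) and checks whether that covers the whole table. The attribute names and the two functional dependencies are assumptions based on the tournament winners example: {Tournament, Year} determines Winner, and Winner determines DOB.

```python
def closure(attrs, fds):
    """Return every attribute determined by attrs under the dependencies fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def is_superkey(attrs, all_attrs, fds):
    """X is a superkey if X determines every attribute of the table."""
    return closure(attrs, fds) >= all_attrs

R = frozenset({"Tournament", "Year", "Winner", "DOB"})
fds = [
    (frozenset({"Tournament", "Year"}), frozenset({"Winner"})),
    (frozenset({"Winner"}), frozenset({"DOB"})),
]

print(is_superkey({"Tournament", "Year"}, R, fds))  # the key determines everything
print(is_superkey({"Winner"}, R, fds))              # Winner alone does not
```

Winner → DOB is a nontrivial dependency onto a non-key attribute whose left-hand side is not a superkey, which is exactly why the DOB does not belong in the tournament table.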

By Mark Kelly McKinnon Secondary College vceit.com These slideshows may be freely used, modified or distributed by teachers and students anywhere on the planet (but not elsewhere). They may NOT be sold. They must NOT be redistributed if you modify them. VCE IT THEORY SLIDESHOWS