Database table design Single table vs. multiple tables Sen Zhang.

Why does any nontrivial relational database have many tables?
– E-R model design: the multiple constructs on an ER diagram (entities and relationships) are mapped to multiple tables.
– These slides give a less formal but more intuitive explanation.

Advantages of a single-table design
We might consider organizing all information into one flat table. The advantages of such a single-table design are obvious:
– Simple and straightforward: one table for everything.
– Everything can be found within one table. A simple SELECT statement against this one big table retrieves almost any information you need, because you never have to look at another table. In short, a single-table design simplifies querying.

Are there any disadvantages to this single-table design?

SSN    Name    CourseID   Course name        Instructor   InstructorID
8766   John    109        C language         James
       John    242        Database           Stephen
       Bill    242        Database           Stephen
       david   230        Image processing   James
       Bill    242        Database           Stephen      112

This table records which students have enrolled in which courses; the other fields carry satellite information, such as the instructor of each course. The underlined fields, SSN and courseid, together form the compound primary key of the table.
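To make the single-table design concrete, here is a minimal sketch that loads a version of this table into an in-memory SQLite database through Python's `sqlite3` module, which is an assumed stand-in for whatever DBMS the course uses. The SSNs other than 8766 (5555, 7777) are hypothetical, the InstructorID column is omitted, and the table and column names are illustrative. The point is the last query: one SELECT on one table, no join.

```python
import sqlite3

# Hypothetical sample rows modeled on the slide's table:
# (ssn, name, courseid, course_name, instructor).
# SSNs other than 8766 are made up, and InstructorID is omitted.
ROWS = [
    (8766, "John", 109, "C language", "James"),
    (8766, "John", 242, "Database", "Stephen"),
    (5555, "Bill", 242, "Database", "Stephen"),
    (7777, "David", 230, "Image processing", "James"),
]


def build_flat_table() -> sqlite3.Connection:
    """Create the single flat enrollment table with a compound primary key."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE enrollment (
            ssn         INTEGER NOT NULL,
            name        TEXT,
            courseid    INTEGER NOT NULL,
            course_name TEXT,
            instructor  TEXT,
            PRIMARY KEY (ssn, courseid)
        )
        """
    )
    conn.executemany("INSERT INTO enrollment VALUES (?, ?, ?, ?, ?)", ROWS)
    return conn


conn = build_flat_table()
# A single SELECT on a single table answers "which courses does John take,
# and who teaches them?" without any join.
johns_courses = conn.execute(
    "SELECT course_name, instructor FROM enrollment "
    "WHERE name = ? ORDER BY courseid",
    ("John",),
).fetchall()
print(johns_courses)  # [('C language', 'James'), ('Database', 'Stephen')]
```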

Why does the primary key consist of two fields?
Neither SSN nor COURSEID alone can serve as the primary key of the table:
– SSN alone does not have a unique value for each row (a student can enroll in several courses).
– COURSEID alone does not either (a course can have several students).
– So the primary key has to be a compound key, since no single attribute uniquely identifies a row. The primary key is the compound key (SSN + COURSEID).
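The argument above can be checked mechanically on a sample instance: project the rows onto a candidate set of columns and see whether any duplicates remain. This is a small sketch over hypothetical rows (the SSNs 5555 and 7777 are made up). Note that a sample can only rule a candidate key out; whether (SSN, COURSEID) really is the key is a design decision about all possible data, not something one instance proves.

```python
# Hypothetical sample instance of the flat table.
COLUMNS = ("ssn", "name", "courseid", "course_name", "instructor")
ROWS = [
    (8766, "John", 109, "C language", "James"),
    (8766, "John", 242, "Database", "Stephen"),
    (5555, "Bill", 242, "Database", "Stephen"),
    (7777, "David", 230, "Image processing", "James"),
]


def is_unique(cols):
    """True if the projection of ROWS onto cols contains no duplicates."""
    idx = [COLUMNS.index(c) for c in cols]
    projected = [tuple(row[i] for i in idx) for row in ROWS]
    return len(set(projected)) == len(projected)


print(is_unique(("ssn",)))             # False: 8766 appears in two rows
print(is_unique(("courseid",)))        # False: 242 appears in two rows
print(is_unique(("ssn", "courseid")))  # True
```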

Redundancy!
First of all, the table contains redundant data. For example, not only a student's ID but also his or her name is repeated for every course the student enrolls in. The same holds for every course: its name and instructor are repeated for every student enrolled in it.

Problems due to redundancy
Every repetition of the same information wastes storage space and can produce inconsistencies.
– Wasted space: this is obvious, and the amount is easy to calculate.
– Inconsistencies: for example, "C language" could appear as "c language" in a different row, although the two are supposed to be exactly the same. Because the same information appears in multiple places, more effort is needed to keep it consistent.
– Wasted time.
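Both problems are easy to measure. The following sketch, over hypothetical rows, counts the extra copies stored beyond the first for each student and each course, and flags keys whose repeated copies disagree. Bill's row deliberately spells the course name in lower case to reproduce the "Database" vs. "database" kind of inconsistency the slide describes; the helper names `extra_copies` and `inconsistent_values` are illustrative.

```python
from collections import defaultdict

# Hypothetical flat-table rows: (ssn, name, courseid, course_name, instructor).
# Bill's row deliberately spells the course name in lower case.
ROWS = [
    (8766, "John", 109, "C language", "James"),
    (8766, "John", 242, "Database", "Stephen"),
    (5555, "Bill", 242, "database", "Stephen"),
    (7777, "David", 230, "Image processing", "James"),
]


def extra_copies(rows, key_idx):
    """Rows that repeat an already-stored key, i.e. redundant copies."""
    return len(rows) - len({row[key_idx] for row in rows})


def inconsistent_values(rows, key_idx, value_idx):
    """Keys whose repeated copies of a dependent value disagree."""
    seen = defaultdict(set)
    for row in rows:
        seen[row[key_idx]].add(row[value_idx])
    return sorted(k for k, vals in seen.items() if len(vals) > 1)


print(extra_copies(ROWS, 0))            # 1: John's name is stored twice
print(extra_copies(ROWS, 2))            # 1: course 242's data is stored twice
print(inconsistent_values(ROWS, 2, 3))  # [242]: "Database" vs "database"
```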

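Why does it waste time?
OLTP (online transaction processing) systems are designed for optimal transaction speed. When a consumer makes a purchase online, they expect the transaction to complete instantly. With redundant data, a single logical change must be written to every copy, so a database design should record new data and changes while touching as little stored information as possible.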

Update anomalies
Furthermore, redundant data is the main cause of insertion, deletion, and modification anomalies, which together are called update anomalies: problems that arise when information is inserted, deleted, or updated.
– Insertion anomaly
– Update anomaly
– Deletion anomaly

Insertion anomaly
Because the primary key includes courseid, we cannot enter a new student until he or she has at least one course. NULLs are not allowed in the primary key, so we must have values for both SSN and COURSEID before we can create a new record.
– For example, a new student (1234) who has just enrolled in the college but has not registered for any courses yet cannot be added to the table until he or she registers for a first course, because the primary key is the compound key (SSN + courseid).
This is known as the insertion anomaly: it is difficult to insert new records into the database. On a practical level, it also means that it is difficult to keep the data up to date.
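The insertion anomaly can be reproduced directly. In this sketch (SQLite via Python's `sqlite3`, an assumed stand-in DBMS), trying to record student 1234 without a course leaves courseid NULL, and the database rejects the row. The student's name and the helper `try_add_student` are hypothetical. One SQLite-specific detail: NOT NULL is written out on the key columns because SQLite, unlike the SQL standard, otherwise tolerates NULLs in most PRIMARY KEY columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# NOT NULL is written out explicitly: SQLite, unlike the SQL standard,
# otherwise tolerates NULLs in most PRIMARY KEY columns.
conn.execute(
    """
    CREATE TABLE enrollment (
        ssn         INTEGER NOT NULL,
        name        TEXT,
        courseid    INTEGER NOT NULL,
        course_name TEXT,
        instructor  TEXT,
        PRIMARY KEY (ssn, courseid)
    )
    """
)
conn.execute(
    "INSERT INTO enrollment VALUES (8766, 'John', 109, 'C language', 'James')"
)


def try_add_student(ssn: int, name: str) -> bool:
    """Try to record a student who has not registered for any course yet."""
    try:
        # courseid is left out, so it would be NULL.
        conn.execute(
            "INSERT INTO enrollment (ssn, name) VALUES (?, ?)", (ssn, name)
        )
        return True
    except sqlite3.IntegrityError as err:
        print("rejected:", err)
        return False


added = try_add_student(1234, "New Student")  # hypothetical name
print(added)  # False
```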

Deletion anomaly
If a course is taken by only one student, and that student needs to be deleted, then not only the information about the student but also the information about the course disappears. (What we want, however, is for every course to be recorded somewhere, even if no student enrolls in it.) For example, if all of the records for student 8766 were deleted from the table, we would inadvertently lose all of the information on course 109, C language: because the only student who registers for 109 is 8766, deleting 8766 makes 109 disappear. Again, this problem arises from the compound primary key. We cannot keep 109 by simply replacing 8766 with NULL, because SSN and courseid both belong to the key, which does not allow NULLs. The same applies to any student taking only one course: if that course were deleted, the student, who is supposed to stay in the table, would have to be deleted too.
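Here is a minimal sketch of the deletion anomaly, using the same assumed SQLite setup and hypothetical rows in which only student 8766 takes course 109. Deleting the student's rows also erases every trace of the course.

```python
import sqlite3

# Hypothetical rows: only student 8766 takes course 109.
ROWS = [
    (8766, "John", 109, "C language", "James"),
    (8766, "John", 242, "Database", "Stephen"),
    (5555, "Bill", 242, "Database", "Stephen"),
    (7777, "David", 230, "Image processing", "James"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE enrollment (ssn INTEGER NOT NULL, name TEXT, "
    "courseid INTEGER NOT NULL, course_name TEXT, instructor TEXT, "
    "PRIMARY KEY (ssn, courseid))"
)
conn.executemany("INSERT INTO enrollment VALUES (?, ?, ?, ?, ?)", ROWS)


def known_courses() -> list:
    """Course IDs that still appear anywhere in the flat table."""
    rows = conn.execute("SELECT DISTINCT courseid FROM enrollment")
    return sorted(r[0] for r in rows)


before = known_courses()
conn.execute("DELETE FROM enrollment WHERE ssn = ?", (8766,))
after = known_courses()
print(before)  # [109, 230, 242]
print(after)   # [230, 242]: course 109 vanished along with student 8766
```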

Update anomaly
– If student 8766's name was misspelled and we want to correct it, multiple rows (one for every course the student has enrolled in, each carrying the wrong name) must be updated with the new information.
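The sketch below contrasts a careless correction with a complete one, again in an assumed SQLite setup. In the hypothetical data, student 8766's name is stored as "Jhon" in both of his rows. Fixing only one row leaves two spellings for the same student; the correct fix has to rewrite every row that carries the name. The helpers `make_db` and `names_for` are illustrative.

```python
import sqlite3

# Hypothetical rows in which student 8766's name is misspelled "Jhon"
# in both of his enrollment rows.
ROWS = [
    (8766, "Jhon", 109, "C language", "James"),
    (8766, "Jhon", 242, "Database", "Stephen"),
    (5555, "Bill", 242, "Database", "Stephen"),
]


def make_db() -> sqlite3.Connection:
    """Fresh in-memory copy of the flat table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE enrollment (ssn INTEGER NOT NULL, name TEXT, "
        "courseid INTEGER NOT NULL, course_name TEXT, instructor TEXT, "
        "PRIMARY KEY (ssn, courseid))"
    )
    conn.executemany("INSERT INTO enrollment VALUES (?, ?, ?, ?, ?)", ROWS)
    return conn


def names_for(conn: sqlite3.Connection, ssn: int) -> list:
    """Distinct spellings of one student's name across all rows."""
    rows = conn.execute(
        "SELECT DISTINCT name FROM enrollment WHERE ssn = ?", (ssn,)
    )
    return sorted(r[0] for r in rows)


# Careless fix: only the row for course 109 is corrected.
partial = make_db()
partial.execute(
    "UPDATE enrollment SET name = 'John' WHERE ssn = 8766 AND courseid = 109"
)
print(names_for(partial, 8766))  # ['Jhon', 'John']: now inconsistent

# Correct fix: every row carrying the name must be rewritten.
full = make_db()
cur = full.execute("UPDATE enrollment SET name = 'John' WHERE ssn = 8766")
print(cur.rowcount)            # 2
print(names_for(full, 8766))   # ['John']
```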

Update anomalies
– Traditionally, "update" is an umbrella term covering update as well as insert and delete.
– The anomalies above are analyzed mainly from the point of view of updating student information.
– All of them occur just as well when the information of concern is about courses.

To summarize
– Why is there an insertion anomaly? It may not be possible to store some information unless some other information is stored as well (because both key values must be known together, which usually they are not).
– Why is there a deletion anomaly? It may not be possible to delete some information without losing other information as well.
– Why is there an update anomaly? If one copy of repeated data is updated, an inconsistency is created unless all copies are updated in the same way.

How to address these issues?
To minimize redundancy and address the problems it causes, we can decompose the relation into multiple relations:
– Split the table into multiple tables (three here: one for students, one for courses, and one for enrollments, possibly with a fourth table for instructors, depending on how refined the design is meant to be).
– The fields that make up the primary key of the original relation are included in the new relations, where each serves as the primary key of its own table.
– The primary key of the original table remains the primary key of the new enrollment table, which now has fewer fields.
– Enforce foreign keys.
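The decomposition just described can be sketched as three tables in SQLite (via Python's `sqlite3`, an assumed stand-in DBMS), with the instructor kept as a column of course rather than split into a fourth table. SSNs other than 8766 and student 1234's name are hypothetical. The sketch also checks that the insertion and deletion anomalies disappear. Note that SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript(
    """
    CREATE TABLE student (
        ssn  INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    CREATE TABLE course (
        courseid    INTEGER PRIMARY KEY,
        course_name TEXT NOT NULL,
        instructor  TEXT
    );
    CREATE TABLE enrollment (
        ssn      INTEGER NOT NULL REFERENCES student(ssn),
        courseid INTEGER NOT NULL REFERENCES course(courseid),
        PRIMARY KEY (ssn, courseid)
    );
    """
)
conn.executemany("INSERT INTO student VALUES (?, ?)",
                 [(8766, "John"), (5555, "Bill"), (7777, "David")])
conn.executemany("INSERT INTO course VALUES (?, ?, ?)",
                 [(109, "C language", "James"), (242, "Database", "Stephen"),
                  (230, "Image processing", "James")])
conn.executemany("INSERT INTO enrollment VALUES (?, ?)",
                 [(8766, 109), (8766, 242), (5555, 242), (7777, 230)])

# Insertion anomaly gone: a student with no courses can be stored.
conn.execute("INSERT INTO student VALUES (1234, 'New Student')")

# Foreign keys reject an enrollment in a course that does not exist.
try:
    conn.execute("INSERT INTO enrollment VALUES (1234, 999)")
    fk_enforced = False
except sqlite3.IntegrityError:
    fk_enforced = True

# Deletion anomaly gone: removing student 8766 keeps course 109.
# Enrollments go first, or the foreign key would block deleting the student.
conn.execute("DELETE FROM enrollment WHERE ssn = 8766")
conn.execute("DELETE FROM student WHERE ssn = 8766")
course_109 = conn.execute(
    "SELECT course_name FROM course WHERE courseid = 109"
).fetchone()
print(fk_enforced, course_109)  # True ('C language',)
```

Each fact (a student's name, a course's name and instructor) is now stored exactly once, so the update anomaly also reduces to a single-row UPDATE.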

How to break information into multiple tables?
– For a trivial problem such as the table we are discussing, you can reach the goal by following your intuition.
– For nontrivial problems involving many fields, you should follow an established design model at the conceptual level, usually the E-R model.

Is any information lost by decomposing a single big relation into multiple relations?
No, as long as the multiple-table design is reasonably constructed. You can still retrieve all the information of interest from the multiple tables; it simply requires knowing how to write nontrivial SELECT statements that join the tables.
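A quick way to convince yourself that this decomposition loses nothing is to decompose a flat table by projection and then join the pieces back together. This sketch (SQLite via Python's `sqlite3`, hypothetical rows, foreign keys omitted for brevity) checks that the join reproduces exactly the original rows.

```python
import sqlite3

# Hypothetical flat rows: (ssn, name, courseid, course_name, instructor).
FLAT_ROWS = [
    (8766, "John", 109, "C language", "James"),
    (8766, "John", 242, "Database", "Stephen"),
    (5555, "Bill", 242, "Database", "Stephen"),
    (7777, "David", 230, "Image processing", "James"),
]

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE student (ssn INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE course (courseid INTEGER PRIMARY KEY,
                         course_name TEXT NOT NULL, instructor TEXT);
    CREATE TABLE enrollment (ssn INTEGER NOT NULL, courseid INTEGER NOT NULL,
                             PRIMARY KEY (ssn, courseid));
    """
)
# Decompose by projection; the sets drop the redundant copies.
conn.executemany("INSERT INTO student VALUES (?, ?)",
                 {(r[0], r[1]) for r in FLAT_ROWS})
conn.executemany("INSERT INTO course VALUES (?, ?, ?)",
                 {(r[2], r[3], r[4]) for r in FLAT_ROWS})
conn.executemany("INSERT INTO enrollment VALUES (?, ?)",
                 {(r[0], r[2]) for r in FLAT_ROWS})

# Rebuild the original rows with a join.
rejoined = conn.execute(
    """
    SELECT s.ssn, s.name, c.courseid, c.course_name, c.instructor
    FROM enrollment AS e
    JOIN student AS s ON s.ssn = e.ssn
    JOIN course  AS c ON c.courseid = e.courseid
    """
).fetchall()
print(sorted(rejoined) == sorted(FLAT_ROWS))  # True
```

Why this particular split is guaranteed to be lossless, rather than just lossless on this sample, is exactly what functional dependencies explain.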

Welcome to CSCI342
To make sure no information is lost, more advanced techniques need to be studied; these will be covered in CSCI342, not in this course:
– Functional dependencies
– Normal forms