Database table design Single table vs. multiple tables Sen Zhang.

1 Database table design Single table vs. multiple tables Sen Zhang

2 Why does any nontrivial relational database have many tables?
– E-R model design: the multiple constructs on an ERD are mapped to multiple tables.
– These slides give a less formal but more intuitive explanation.

3 Good things about a single-table design! Suppose we organize all information into one single flat table. The advantages of such a design are obvious:
– Simple and straightforward: one table for everything.
– Everything can be found within one table. A simple SELECT command retrieves almost all the information you need from this one big table, because you never have to look at another table. So a single-table design simplifies query answering.

4 Are there any disadvantages with this single table design?
SSN   Name   CourseID  Course name       Instructor  InstructorID
8766  John   109       C language        James       111
8766  John   242       Database          Stephen     112
1334  Bill   242       Database          Stephen     112
1111  david  230       Image processing  James       111
1345  Bill   242       Database          Stephen     112
This table tells us which students are enrolled in which courses; the other fields carry satellite information, such as the name of each course's instructor. The key fields SSN and courseid (underscored on the slide) together form the compound key of the table!

5 Why does the primary key consist of two fields? Neither SSN nor COURSEID alone suffices as the primary key of the table.
– SSN alone does not have a unique value for each row.
– Neither does COURSEID alone.
– So the primary key has to be a compound key (COURSEID + SSN), since no single attribute can uniquely identify a row.
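The uniqueness argument can be checked directly. The following is a minimal sketch using Python's built-in sqlite3 module; the table name `enrollment`, the column names, and the helper `distinct_count` are assumptions made for illustration, while the rows are the ones from the slide.

```python
import sqlite3

# Rows copied from the single-table slide
ROWS = [
    (8766, "John", 109, "C language", "James", 111),
    (8766, "John", 242, "Database", "Stephen", 112),
    (1334, "Bill", 242, "Database", "Stephen", 112),
    (1111, "david", 230, "Image processing", "James", 111),
    (1345, "Bill", 242, "Database", "Stephen", 112),
]

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE enrollment (
        ssn           INTEGER NOT NULL,
        name          TEXT,
        courseid      INTEGER NOT NULL,
        course_name   TEXT,
        instructor    TEXT,
        instructor_id INTEGER,
        PRIMARY KEY (ssn, courseid)   -- compound key
    )
""")
conn.executemany("INSERT INTO enrollment VALUES (?, ?, ?, ?, ?, ?)", ROWS)

def distinct_count(columns):
    # Hypothetical helper: counts distinct value combinations.
    # `columns` is a fixed column list from this sketch, never user input.
    query = f"SELECT COUNT(*) FROM (SELECT DISTINCT {columns} FROM enrollment)"
    return conn.execute(query).fetchone()[0]

total_rows = conn.execute("SELECT COUNT(*) FROM enrollment").fetchone()[0]
ssn_values = distinct_count("ssn")
course_values = distinct_count("courseid")
pair_values = distinct_count("ssn, courseid")

print(total_rows, ssn_values, course_values, pair_values)  # 5 4 3 5
```

Only the (ssn, courseid) pair has as many distinct values as there are rows, which is why neither field can serve as the key on its own.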

6 Redundancy! First of all, the table contains redundant data. For example, a student's SSN and name have to be repeated for every course he/she enrolls in. Similarly, a course's name and instructor have to be repeated for every student enrolled in it.

7 Problems due to redundancy. Every repetition of the same information wastes storage space and is liable to produce inconsistencies.
– Wasted space: it is obvious! The wasted space can be easily calculated!
– Inconsistencies creep in easily. For example, "C Language" could appear as "c language" in a different row, although the two are supposed to be exactly the same. Since the same information appears in multiple places, it takes more effort to keep it consistent!
– Wasted time, too!
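The inconsistency risk can be made concrete with a small sketch in Python's sqlite3 module. The table and column names are assumptions, and the fourth row is a hypothetical extra enrollment (not on the slide) whose course name was typed with different capitalization.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE enrollment (
        ssn         INTEGER NOT NULL,
        name        TEXT,
        courseid    INTEGER NOT NULL,
        course_name TEXT,
        PRIMARY KEY (ssn, courseid)
    )
""")
conn.executemany(
    "INSERT INTO enrollment VALUES (?, ?, ?, ?)",
    [
        (8766, "John", 109, "C language"),
        (8766, "John", 242, "Database"),
        (1334, "Bill", 242, "Database"),
        # Hypothetical row: same course 109, name typed differently
        (1111, "david", 109, "c language"),
    ],
)

# Courses whose name is stored in more than one spelling
conflicts = conn.execute("""
    SELECT courseid, COUNT(DISTINCT course_name)
    FROM enrollment
    GROUP BY courseid
    HAVING COUNT(DISTINCT course_name) > 1
""").fetchall()

print(conflicts)  # [(109, 2)]
```

Nothing in the single-table schema stops the second spelling from being stored, because the course name is repeated in every enrollment row instead of being recorded once.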

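8 Why waste time? OLTP (online transaction processing) systems are designed for optimal transaction speed. When a consumer makes a purchase online, they expect the transaction to complete instantaneously. A database design should therefore record new data and changes while touching as little stored information as possible; with redundant data, one logical change has to be written to many rows.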

9 Update anomalies. Furthermore, redundant data is the main cause of insertion, deletion, and updating anomalies, which together are called update anomalies. Update anomalies are problems that arise when information is inserted, deleted, or updated.
– Insertion anomaly
– Update anomaly
– Deletion anomaly

10 Insertion anomaly. Because the primary key (SSN + COURSEID) includes courseid, we cannot enter a new student until they have at least one course to study. NULLs are not allowed in the primary key, so we must have values for both SSN and COURSEID before we can create a new record.
– For example, a new student (1234) who has just enrolled in the college but has not registered for any course yet cannot be added to the table until he/she registers for a first course.
This is known as the insertion anomaly: it is difficult to insert new records into the database. On a practical level, it also means that it is difficult to keep the data up to date.
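A minimal sketch of the insertion anomaly, using Python's sqlite3 module. The table layout and the student name are assumptions for illustration. Note that `NOT NULL` is written out explicitly: SQLite, unlike standard SQL, tolerates NULLs in the primary key columns of ordinary tables unless told otherwise.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE enrollment (
        ssn         INTEGER NOT NULL,
        name        TEXT,
        courseid    INTEGER NOT NULL,  -- part of the key, so it may not be NULL
        course_name TEXT,
        PRIMARY KEY (ssn, courseid)
    )
""")

# Student 1234 has enrolled in the college but has no course yet.
# The name "New student" is a placeholder, not from the slides.
try:
    conn.execute(
        "INSERT INTO enrollment (ssn, name, courseid) VALUES (?, ?, ?)",
        (1234, "New student", None),
    )
    inserted = True
except sqlite3.IntegrityError as exc:
    inserted = False
    print("insert rejected:", exc)

stored = conn.execute("SELECT COUNT(*) FROM enrollment").fetchone()[0]
print(inserted, stored)  # False 0
```

The student cannot be recorded at all until a course id is available to complete the key.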

11 Deletion anomaly. If a course has only one student enrolled and that student needs to be deleted, then not only the information about the student but also the information about the course disappears. (What we want is for every course to be recorded somewhere, even if no student is enrolled in it.) For example, if all of the records for student '8766' were deleted from the table, we would inadvertently lose all of the information on course '109', C Language: the only student registered for 109 is 8766, so if 8766 is deleted, 109 disappears. Again, this problem arises from the compound primary key. We cannot keep 109 by replacing 8766 with NULL, because SSN and course ID both contribute to the key, which does not allow NULL. The same happens in the other direction: if a course is deleted and some student was taking only that course, the student, who is supposed to stay in the table, disappears as well.
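The loss of course 109 can be reproduced with a short sketch in Python's sqlite3 module. The table and column names and the helper `course_rows` are assumptions; the rows come from the single-table slide, reduced to the columns needed here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE enrollment (
        ssn         INTEGER NOT NULL,
        name        TEXT,
        courseid    INTEGER NOT NULL,
        course_name TEXT,
        PRIMARY KEY (ssn, courseid)
    )
""")
conn.executemany(
    "INSERT INTO enrollment VALUES (?, ?, ?, ?)",
    [
        (8766, "John", 109, "C language"),
        (8766, "John", 242, "Database"),
        (1334, "Bill", 242, "Database"),
        (1111, "david", 230, "Image processing"),
        (1345, "Bill", 242, "Database"),
    ],
)

def course_rows(courseid):
    # Hypothetical helper: how many rows still mention this course
    query = "SELECT COUNT(*) FROM enrollment WHERE courseid = ?"
    return conn.execute(query, (courseid,)).fetchone()[0]

before = course_rows(109)
conn.execute("DELETE FROM enrollment WHERE ssn = ?", (8766,))  # remove student 8766
after = course_rows(109)

print(before, after)  # 1 0
```

After the delete, no row mentions course 109, so its name and instructor are gone even though only the student was meant to be removed.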

12 Update anomaly
– If student 8766's name was misspelled and we want to correct it, multiple rows would have to be updated with the new information (one row for every course he/she is enrolled in under the wrong name!).
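A minimal sketch of the update anomaly in Python's sqlite3 module. The table layout and the corrected spelling "Jon" are assumptions for illustration. It first corrects only one row, which leaves two names on file for the same student, and then corrects every row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE enrollment (
        ssn      INTEGER NOT NULL,
        name     TEXT,
        courseid INTEGER NOT NULL,
        PRIMARY KEY (ssn, courseid)
    )
""")
conn.executemany(
    "INSERT INTO enrollment VALUES (?, ?, ?)",
    [(8766, "John", 109), (8766, "John", 242), (1334, "Bill", 242)],
)

def names_on_file(ssn):
    # Number of different names stored for one student
    query = "SELECT COUNT(DISTINCT name) FROM enrollment WHERE ssn = ?"
    return conn.execute(query, (ssn,)).fetchone()[0]

# Partial fix: only the row for course 109 is corrected
conn.execute(
    "UPDATE enrollment SET name = ? WHERE ssn = ? AND courseid = ?",
    ("Jon", 8766, 109),
)
names_after_partial = names_on_file(8766)

# Full fix: every row of the student has to be touched
cur = conn.execute("UPDATE enrollment SET name = ? WHERE ssn = ?", ("Jon", 8766))
rows_touched = cur.rowcount
names_after_full = names_on_file(8766)

print(names_after_partial, rows_touched, names_after_full)  # 2 2 1
```

The number of rows touched grows with the number of courses the student takes, although only one fact (the student's name) changed.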

13 Update anomalies
– Traditionally, "update" is an umbrella name for update as well as insert and delete.
– The anomalies above are analyzed mainly from the point of view of updating student information.
– The same anomalies occur when it is the courses that are being updated.

14 To summarize. Why is there an insertion anomaly? It may not be possible to store some information unless some other information is stored as well (both parts of the key have to be known together, but often they are not!). Why is there a deletion anomaly? It may not be possible to delete some information without losing some other information as well. Why is there an update anomaly? If one copy of repeated data is updated, an inconsistency is created unless all copies are updated in the same way.

15 How to address these issues? To minimize redundancy and address all the issues it causes, we can decompose the relation into multiple relations:
– Split the table into multiple tables (three tables here: one for student, one for course, and one for enrollment; possibly a fourth table for instructor, depending on how refined the database design is meant to be).
– Each field that contributes to the primary key of the original relation is included in one of the new relations, where it serves as that relation's primary key (SSN for student, courseid for course).
– The primary key of the original table remains the primary key of the new enrollment table, which now has fewer fields!
– Enforce foreign keys.
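One way the decomposition could look is sketched below with Python's sqlite3 module. The four table names follow the slide; the column names, the placeholder student name, and the nonexistent course id 999 are assumptions for illustration, and other splits are possible.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript("""
    CREATE TABLE student (ssn INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE instructor (instructor_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE course (
        courseid      INTEGER PRIMARY KEY,
        course_name   TEXT,
        instructor_id INTEGER REFERENCES instructor(instructor_id)
    );
    CREATE TABLE enrollment (
        ssn      INTEGER NOT NULL REFERENCES student(ssn),
        courseid INTEGER NOT NULL REFERENCES course(courseid),
        PRIMARY KEY (ssn, courseid)   -- original compound key, fewer fields
    );
""")

# No insertion anomaly: a student can be stored before taking any course
conn.execute("INSERT INTO student VALUES (?, ?)", (1234, "New student"))

conn.execute("INSERT INTO instructor VALUES (?, ?)", (111, "James"))
conn.execute("INSERT INTO course VALUES (?, ?, ?)", (109, "C language", 111))
conn.execute("INSERT INTO enrollment VALUES (?, ?)", (1234, 109))

# Foreign keys reject an enrollment in a course that does not exist
try:
    conn.execute("INSERT INTO enrollment VALUES (?, ?)", (1234, 999))
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

# No deletion anomaly: dropping the enrollment keeps the course on record
conn.execute("DELETE FROM enrollment WHERE ssn = ?", (1234,))
courses_left = conn.execute("SELECT COUNT(*) FROM course").fetchone()[0]
students_left = conn.execute("SELECT COUNT(*) FROM student").fetchone()[0]

print(rejected, courses_left, students_left)  # True 1 1
```

Each fact now lives in exactly one row: a student's name in `student`, a course's name in `course`, and the enrollment table holds only the key pairs.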

16 How to break information into multiple tables?
– For a trivial problem such as the table we are discussing, you can reach the goal by following your intuition.
– For nontrivial problems involving many fields and a lot of information, you will want to follow an established table design model at the conceptual level; usually, this is the E-R model.

17 Is any information lost by decomposing a single big relation into multiple relations? No information is lost as long as the multiple-table design is reasonably constructed. You can still retrieve all the information of interest from the multiple tables. It simply requires that you know how to write nontrivial SELECT statements.
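As a sketch of such a nontrivial SELECT, the following Python sqlite3 example splits the slide's rows into four tables and then joins them back together. The table and column names are assumptions; the point is only that the join returns exactly the original rows for this data.

```python
import sqlite3

# Rows of the original single table, copied from the slide
ROWS = [
    (8766, "John", 109, "C language", "James", 111),
    (8766, "John", 242, "Database", "Stephen", 112),
    (1334, "Bill", 242, "Database", "Stephen", 112),
    (1111, "david", 230, "Image processing", "James", 111),
    (1345, "Bill", 242, "Database", "Stephen", 112),
]

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student (ssn INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE instructor (instructor_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE course (courseid INTEGER PRIMARY KEY, course_name TEXT,
                         instructor_id INTEGER);
    CREATE TABLE enrollment (ssn INTEGER NOT NULL, courseid INTEGER NOT NULL,
                             PRIMARY KEY (ssn, courseid));
""")

# Decompose: each fact is stored once; OR IGNORE skips the repeated copies
for ssn, name, courseid, course_name, instructor, instructor_id in ROWS:
    conn.execute("INSERT OR IGNORE INTO student VALUES (?, ?)", (ssn, name))
    conn.execute("INSERT OR IGNORE INTO instructor VALUES (?, ?)",
                 (instructor_id, instructor))
    conn.execute("INSERT OR IGNORE INTO course VALUES (?, ?, ?)",
                 (courseid, course_name, instructor_id))
    conn.execute("INSERT INTO enrollment VALUES (?, ?)", (ssn, courseid))

# Recombine: join the four tables back into the single-table shape
rebuilt = conn.execute("""
    SELECT s.ssn, s.name, c.courseid, c.course_name, i.name, i.instructor_id
    FROM enrollment AS e
    JOIN student    AS s ON s.ssn = e.ssn
    JOIN course     AS c ON c.courseid = e.courseid
    JOIN instructor AS i ON i.instructor_id = c.instructor_id
""").fetchall()

lossless = sorted(rebuilt) == sorted(ROWS)
print(lossless)  # True
```

The decomposed tables hold 4 student rows, 2 instructor rows, and 3 course rows instead of five repeated copies, yet the join recovers every original row.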

18 Welcome to CSCI342. To make sure that no information is lost, more advanced and useful techniques need to be studied. They will be discussed in CSCI342, not in this course!
– Functional dependencies
– Normal forms

