Physical Database Design CIT 381 - alternate keys - named constraints - indexes.

Constraints
We have seen primary key constraints and not null constraints. We can name a constraint. NOT NULL is a column constraint, so it is named on the column itself:
CREATE TABLE Student (
    STUD_NUM      INTEGER,
    STUD_FNAME    CHAR(10),
    STUD_LNAME    CHAR(20) CONSTRAINT stud_ln NOT NULL,
    STUD_ADDRESS  CHAR(30),
    STUD_DEPT_ID  INTEGER,
    CONSTRAINT stud_pk PRIMARY KEY (STUD_NUM)
);

Why name constraints?
For easier control:
ALTER TABLE Student DROP CONSTRAINT stud_ln;
- easy to remove a constraint without rebuilding the table
SET CONSTRAINTS stud_pk DEFERRED;
- this says do not enforce the constraint until the transaction is complete (Informix)

UNIQUE
A way to specify alternate keys. Let’s add such a constraint to the Student table: say the student name forms another (candidate) key.
ALTER TABLE Student ADD CONSTRAINT stud_name_key UNIQUE (STUD_FNAME, STUD_LNAME);

Also Foreign Key Constraint
ALTER TABLE Student ADD CONSTRAINT stud_fk1 FOREIGN KEY (STUD_DEPT_ID) REFERENCES Department (DEPT_ID);
Of course, these constraints can be declared when the table is created (or added in the Relationship View of Access). Naming the constraint is optional.
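The named constraints from the slides above can be tried out with a minimal sketch. It assumes SQLite (through Python's standard sqlite3 module) rather than the DBMSs named in the slides, purely so that it runs without a server. SQLite does not accept ALTER TABLE ... ADD CONSTRAINT, so all four constraints are declared when the table is created. The Department table definition and the sample rows are illustrative assumptions, not part of the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when asked

# Department and the sample rows are hypothetical; Student follows the slides.
conn.executescript("""
CREATE TABLE Department (
    DEPT_ID INTEGER,
    CONSTRAINT dept_pk PRIMARY KEY (DEPT_ID)
);
CREATE TABLE Student (
    STUD_NUM     INTEGER,
    STUD_FNAME   CHAR(10),
    STUD_LNAME   CHAR(20) CONSTRAINT stud_ln NOT NULL,
    STUD_ADDRESS CHAR(30),
    STUD_DEPT_ID INTEGER,
    CONSTRAINT stud_pk PRIMARY KEY (STUD_NUM),
    CONSTRAINT stud_name_key UNIQUE (STUD_FNAME, STUD_LNAME),
    CONSTRAINT stud_fk1 FOREIGN KEY (STUD_DEPT_ID) REFERENCES Department (DEPT_ID)
);
INSERT INTO Department VALUES (10);
INSERT INTO Student VALUES (1, 'Ada', 'Lovelace', NULL, 10);
""")


def rejected(row):
    """Try to insert one Student row; return True if a constraint blocks it."""
    try:
        conn.execute("INSERT INTO Student VALUES (?, ?, ?, ?, ?)", row)
        return False
    except sqlite3.IntegrityError:
        return True


dup_pk = rejected((1, "Grace", "Hopper", None, 10))     # violates stud_pk
null_lname = rejected((2, "Grace", None, None, 10))     # violates stud_ln
dup_name = rejected((3, "Ada", "Lovelace", None, 10))   # violates stud_name_key
bad_dept = rejected((4, "Alan", "Turing", None, 99))    # violates stud_fk1
valid_row = rejected((5, "Alan", "Turing", None, 10))   # satisfies every constraint

print(dup_pk, null_lname, dup_name, bad_dept, valid_row)
```

Each of the first four inserts breaks exactly one constraint, so the sketch shows what each declaration protects against. Dropping or deferring a constraint by name, as on the earlier slide, depends on the DBMS and is not shown here.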

Physical Design
There are four main aspects to physical design:
- ER Model to Relational Model mapping
- Denormalization
- Indexing
- Physical storage issues (such as fragmentation)

Relational Mapping
Here we convert entity-relationship diagrams to relations (= tables):
- Entities become tables
- Relationships become foreign keys, except,…
- Many-to-many (non-specific) relationships become tables
- Data types get set, depending on the chosen DBMS (MySQL, Oracle, Access, etc.)

Denormalization
From ER Studio user guide:
“Denormalization is an unavoidable part of designing databases. No matter how elegant a logical design can appear on paper, it often breaks down in practice because of the complex or expensive queries required to make it work. Sometimes, the best remedy for the performance problems is to depart from the blueprint, the logical design. Indeed, denormalization is perhaps the most important reason for separating logical and physical designs - you need not compromise your blueprint while still addressing real-world performance problems.”

Indexing
An index is a data structure associated with a table that allows faster look-up access to that table.
- Usually it is a B-tree
- Others: hash table (common), R-tree (not common)
- Note: in DB-speak, the plural of index is indexes, not the usual indices.

Creating an index
CREATE INDEX stud_idx1 ON Student (STUD_NUM);
This creates an index on the primary key. Usually this is done by default.
If you expect queries to look at that field in descending order, consider instead
CREATE INDEX stud_idx1 ON Student (STUD_NUM DESC);

Secondary Indexes
If we expect many queries on the student last name:
CREATE INDEX stud_idx2 ON Student (STUD_LNAME);

… or if we have many queries on the (lastName, firstName) pair:
CREATE INDEX stud_idx3 ON Student (STUD_LNAME, STUD_FNAME);
If we did not have the UNIQUE constraint, we could have enforced it through the index:
CREATE UNIQUE INDEX stud_idx3 ON Student (STUD_LNAME, STUD_FNAME);
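One way to see what a secondary index changes is to ask the DBMS for its query plan before and after creating the index. The sketch below assumes SQLite through Python's sqlite3 module, which is not a DBMS named in the slides. EXPLAIN QUERY PLAN is SQLite-specific, the exact wording of its output varies between versions, and the helper name plan and the sample row are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Student ("
    " STUD_NUM INTEGER PRIMARY KEY,"
    " STUD_FNAME CHAR(10),"
    " STUD_LNAME CHAR(20))"
)


def plan(sql):
    """Return SQLite's description of how it would run the query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)  # last column holds the text


query = "SELECT * FROM Student WHERE STUD_LNAME = 'Hopper'"

plan_before = plan(query)  # no index on STUD_LNAME yet: a full table scan
conn.execute("CREATE INDEX stud_idx2 ON Student (STUD_LNAME)")
plan_after = plan(query)   # the planner can now search through stud_idx2

print("before:", plan_before)
print("after: ", plan_after)

# A unique index rejects a second row with the same (last, first) name pair.
conn.execute("CREATE UNIQUE INDEX stud_idx3 ON Student (STUD_LNAME, STUD_FNAME)")
conn.execute("INSERT INTO Student VALUES (1, 'Grace', 'Hopper')")
try:
    conn.execute("INSERT INTO Student VALUES (2, 'Grace', 'Hopper')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print("duplicate rejected:", duplicate_rejected)
```

Other systems expose the same information through their own commands, so the plan text here should be read as an illustration only.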

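B Trees
The most common indexing structure, using a tree structure:
- each node is sized to fit one disk block
- hence smaller search keys mean more keys per node, which increases fan-out
[Figure: tree whose root holds keys 13, 17, 24, 30, above leaf entries 2*, 3*, 5*, 7*, 14*, 16*, 19*, 20*, 22*, 24*, 27*, 29*, 33*, 34*, 38*, 39*]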
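The link between key size and fan-out can be made concrete with some rough arithmetic. This is a simplified model, not how any particular DBMS lays out its pages: it assumes a 4096-byte block, 8-byte child pointers, no per-node overhead, and completely full nodes. The function names and the key sizes (4 bytes for an integer, 30 bytes for a character field) are illustrative assumptions.

```python
def fan_out(block_bytes, key_bytes, pointer_bytes=8):
    """Approximate number of (key, pointer) pairs that fit in one node."""
    return block_bytes // (key_bytes + pointer_bytes)


def levels_needed(entries, fanout):
    """Smallest number of tree levels whose capacity covers all entries."""
    levels, reachable = 1, fanout
    while reachable < entries:
        reachable *= fanout
        levels += 1
    return levels


BLOCK = 4096           # assumed disk block size in bytes
ENTRIES = 10_000_000   # assumed number of indexed rows

small_key = fan_out(BLOCK, 4)    # 4096 // 12
large_key = fan_out(BLOCK, 30)   # 4096 // 38

print(small_key, levels_needed(ENTRIES, small_key))  # 341 3
print(large_key, levels_needed(ENTRIES, large_key))  # 107 4
```

Under these assumptions the smaller key saves one level, which means one fewer block read per look-up.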

Use of Indexes
- Speed up many sorts of queries
- Assist in computation of join operations
- Used in sorting a table (for ORDER BY or GROUP BY)
Downsides:
- table updates become slower: an insertion into a table requires insertion of the search key into each of its indexes
- indexes can use a lot of space, often more than the table

From ER Studio user guide:
“One purpose of indexes is to improve performance by providing a more efficient mechanism for locating data. Indexes work like a card catalog in a library: instead of searching every shelf for a book, you can find a reference to the book in the card catalog, which directs you to the book’s specific location. Logical indexes store pointers to data so that a search of all of the underlying data is not necessary. Indexes are one of the most important mechanisms for improving query performance.”

“However, injudiciously using indexes can negatively affect performance. You must determine the optimal number of indexes to place on a table, the index type and their placement in order to maximize query efficiency.”

Index Number (from guide)
“While indexes can improve read (query) performance, they slow write (insert, update, and delete) performance. This is because the indexes themselves are modified whenever the data is modified. As a result, you must be judicious in the use of indexes. If you know that a table is subject to a high level of insert, update and delete activity, you should limit the number of indexes placed on the table. Conversely, if a table is basically static, like most lookup tables, then a high number of indexes should not impair overall performance.”

Index Type (from guide)
“Generally, there are two types of queries: point queries, which return a narrow data set, and range queries, which return a larger data set. For those databases that support them, clustered indexes are better suited to satisfying range queries, or a set of index columns that have a relatively low cardinality. Non-clustered indexes are well suited to satisfying point queries.”

Bulk Loading
To insert a large amount of data into a table:
1. Drop all indexes
2. Sort the data to be inserted
3. Insert the data (sorting helps disk blocks line up)
4. Rebuild indexes
Reconstruction from scratch is often faster than one-by-one insertion.
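The four bulk-loading steps can be sketched as a small routine. As before, this assumes SQLite through Python's sqlite3 module. The function name bulk_load, the cut-down two-column Student table, and the generated rows are hypothetical. The sketch shows only the order of operations and makes no timing claim.

```python
import random
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Student (STUD_NUM INTEGER PRIMARY KEY, STUD_LNAME CHAR(20))")
conn.execute("CREATE INDEX stud_idx2 ON Student (STUD_LNAME)")


def bulk_load(conn, rows):
    """Load many (STUD_NUM, STUD_LNAME) rows using the drop/sort/insert/rebuild pattern."""
    conn.execute("DROP INDEX IF EXISTS stud_idx2")                   # 1. drop secondary indexes
    ordered = sorted(rows)                                           # 2. sort by primary key
    conn.executemany("INSERT INTO Student VALUES (?, ?)", ordered)   # 3. insert the data
    conn.execute("CREATE INDEX stud_idx2 ON Student (STUD_LNAME)")   # 4. rebuild the index once
    conn.commit()


# Generated sample data, shuffled so that step 2 has something to do.
rows = [(n, f"name{n:05d}") for n in range(1, 1001)]
random.shuffle(rows)
bulk_load(conn, rows)

row_count = conn.execute("SELECT COUNT(*) FROM Student").fetchone()[0]
index_names = [r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'index'")]
print(row_count, index_names)
```

In a real system the list of indexes to drop and rebuild would come from the catalog rather than being hard-coded, and some DBMSs offer dedicated bulk-load utilities instead.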

Fragmentation
Split the contents of the table …
- into separate locations on disk
- onto several disks
Problem: disk I/O is slow
Two types:
- vertical fragmentation: some columns here, some there
- horizontal fragmentation: some rows here, some there

Physical Placement
Put frequently joined tables on separate hard drives. This yields parallel I/O.
Alternatively, very frequently joined tables should be merged (denormalized).
Note: about 80% of CPU cycles are spent performing joins.

From ER Studio guide:
“Two key concerns of every database administrator are free space management and data fragmentation. If you do not properly plan for the volume and growth of your tables and indexes, these two administrative issues could severely impact system availability and performance. Therefore, when designing your physical model, you should consider the initial extent size and logical partition size.”
