Physical Database Design

MGIS 641 Koç University

Learning objectives
Define the term physical database design and state its major objectives.
Describe the major components of physical DB design.
Describe the purpose of data volume and usage analysis.
Describe the basic purposes of indexes.
Query optimization
Storage technology: RAID
Describe basic data distribution strategies.

Introduction
Physical database design is the last stage of the DB design process.
Its major objective is to implement the DB as a set of stored records, files, indexes, etc. that will provide adequate performance and ensure DB integrity, security and recoverability.

DB development process
Stage                 Output
Planning              Enterprise data model
Analysis              Conceptual data model
Logical DB design     Logical data model
Physical DB design    Technology model
Implementation        DB & repositories

Physical Design Process
Inputs:
  Normalized relations
  Volume estimates
  Attribute definitions
  Response time expectations
  Data security needs
  Backup/recovery needs
  Integrity expectations
  DBMS technology used
Leads to
Decisions:
  Attribute data types
  Physical record descriptions
  File organizations
  Indexes
  Query optimization

Figure 6.1 - Composite usage map (Pine Valley Furniture Company)

Figure 6.1 - Composite usage map (Pine Valley Furniture Company): data volumes

Figure 6.1 - Composite usage map (Pine Valley Furniture Company): access frequencies (per hour)

Figure 6.1 - Composite usage map (Pine Valley Furniture Company)
Usage analysis:
200 purchased parts accessed per hour
-> 80 quotations accessed from these 200 purchased part accesses
-> 70 suppliers accessed from these 80 quotation accesses

Figure 6.1 - Composite usage map (Pine Valley Furniture Company)
Usage analysis:
75 suppliers accessed per hour
-> 40 quotations accessed from these 75 supplier accesses
-> 40 purchased parts accessed from these 40 quotation accesses

Volume Analysis
Estimated number of entity instances:
  PATIENT      1000
  PHYSICIAN    50
  TREATMENT    4000
  CHARGE       10,000
  ITEM         500
Assumptions:
  Patient records are archived after an average period of 30 days.
  There are 100 beds on average.
  A patient stays for 3 days.
  A patient receives 4 treatments during a stay.
  A patient incurs 10 charges.
  The number of charges outstanding for an item is 20.
  A physician treats 20 patients in 30 days.
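The entity counts follow from simple arithmetic on the stated assumptions. A minimal Python sketch of that calculation (the variable names are illustrative and do not come from the slides):

```python
# Assumptions taken from the slide's narrative.
beds = 100                    # average number of occupied beds
stay_days = 3                 # average length of a patient stay
retention_days = 30           # patient records are archived after 30 days
treatments_per_patient = 4
charges_per_patient = 10
charges_per_item = 20         # outstanding charges per item
patients_per_physician = 20   # patients treated by one physician in 30 days

# Each bed turns over retention_days / stay_days times before archiving.
patients = beds * retention_days // stay_days
treatments = patients * treatments_per_patient
charges = patients * charges_per_patient
items = charges // charges_per_item
physicians = patients // patients_per_physician

volumes = {
    "PATIENT": patients,
    "PHYSICIAN": physicians,
    "TREATMENT": treatments,
    "CHARGE": charges,
    "ITEM": items,
}

for entity, count in volumes.items():
    print(f"{entity:<10} {count:>6}")
```

Changing any one assumption (for example the archive period) shows how sensitive the volume estimates are to it.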

Data Usage Analysis
Identify major transactions required
Analyse transactions to determine access paths and usage frequency

TRANSACTION ANALYSIS FORM
Transaction No: xxx-555          Date: 3/5/2005
Transaction Name: Create Patient Bill
Transaction Volume: Avg: 2/Hr    Max: 10/Hr
Transaction Map: diagram showing the sequence of logical DB accesses
  (1) Entry -> PATIENT (1000)
  (2) PATIENT -> CHARGE (10,000)
  (3) CHARGE -> ITEM (500)

No  Name             Type of Access   No. of References
                                      Per Trans.   Per Period
1   Entry-PATIENT    R                1            10
2   PATIENT-CHARGE   R
3   CHARGE-ITEM      R
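One way to read the form is that references per period equal references per transaction multiplied by the transaction volume for the period. The sketch below assumes the peak volume (10/Hr) is the multiplier, which is consistent with row 1 of the form (1 reference per transaction, 10 per period). The function name is hypothetical, and rows 2 and 3 are left out because the form does not give their values.

```python
def references_per_period(refs_per_transaction, transactions_per_period):
    """Estimate how often one access path is used during a period."""
    return refs_per_transaction * transactions_per_period


MAX_TRANSACTIONS_PER_HOUR = 10  # "Max: 10/Hr" on the form (assumed multiplier)

# Row 1 of the form: Entry-PATIENT, one read per transaction.
entry_patient = references_per_period(1, MAX_TRANSACTIONS_PER_HOUR)
print(entry_patient)
```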

Composite Usage Map
Composite usage map: a concise reference to the estimated volume and usage of data in the DB
[Figure: composite usage map including other types of transactions. Entity volumes: PATIENT 1000, PHYSICIAN 50, TREATMENT 4000, CHARGE 10,000, ITEM 500. Access frequencies are shown in parentheses on the access paths.]

Designing Fields
Field: the smallest unit of data in a database
Field design:
  Choosing data type
  Coding, compression, encryption
  Controlling data integrity

Choosing Data Types (Oracle)
CHAR – fixed-length character
VARCHAR2 – variable-length character (memo)
LONG – variable-length character data up to 2 GB (not a numeric type)
NUMBER – positive/negative number
DATE – date and time
BLOB – binary large object (good for graphics, sound clips, etc.)

Figure 6.2 - Example code look-up table (Pine Valley Furniture Company)
A code saves space, but costs an additional lookup to obtain the actual value.
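The trade-off can be seen in a small sketch using Python's built-in sqlite3 module. The table names, column names and sample rows here are hypothetical and are not taken from the figure: the product row stores only a short code, and a join against the look-up table is the extra step needed to recover the full value.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Look-up table: one row per code (hypothetical values).
    CREATE TABLE finish_lookup (
        code  TEXT PRIMARY KEY,
        value TEXT NOT NULL
    );
    -- The main table stores only the one-character code.
    CREATE TABLE product (
        product_no  INTEGER PRIMARY KEY,
        description TEXT NOT NULL,
        finish_code TEXT REFERENCES finish_lookup (code)
    );
    INSERT INTO finish_lookup VALUES ('A', 'Birch'), ('B', 'Maple'), ('C', 'Oak');
    INSERT INTO product VALUES (1, 'End Table', 'C'), (2, 'Coffee Table', 'A');
""")

# The join is the "additional lookup" needed to obtain the actual value.
rows = conn.execute("""
    SELECT p.product_no, p.description, f.value
    FROM product AS p
    JOIN finish_lookup AS f ON f.code = p.finish_code
    ORDER BY p.product_no
""").fetchall()

for row in rows:
    print(row)
```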

Field Data Integrity
Default value – assumed value if no explicit value is entered
Range control – allowable value limitations (constraints or validation rules)
Null value control – allowing or prohibiting empty fields
Referential integrity – for foreign-key to primary-key match-ups
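As a rough illustration, the four controls map onto standard SQL column constraints. The sketch below uses SQLite through Python's sqlite3 module; the tables, columns and limits are made up for the example, and other DBMSs may spell some details differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
conn.executescript("""
    CREATE TABLE supplier (
        supplier_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL
    );
    CREATE TABLE quotation (
        quote_id    INTEGER PRIMARY KEY,
        -- referential integrity: must match an existing supplier
        supplier_id INTEGER NOT NULL REFERENCES supplier (supplier_id),
        -- default value: used when no explicit value is given
        quantity    INTEGER NOT NULL DEFAULT 1,
        -- null value control (NOT NULL) and range control (CHECK)
        unit_price  REAL NOT NULL CHECK (unit_price > 0 AND unit_price <= 10000)
    );
    INSERT INTO supplier VALUES (1, 'Acme');
""")


def accepted(supplier_id, unit_price):
    """Return True if the DBMS accepts the row, False if a constraint rejects it."""
    try:
        conn.execute(
            "INSERT INTO quotation (supplier_id, unit_price) VALUES (?, ?)",
            (supplier_id, unit_price),
        )
        return True
    except sqlite3.IntegrityError:
        return False


results = {
    "valid row": accepted(1, 25.0),
    "price out of range": accepted(1, -5.0),
    "missing price": accepted(1, None),
    "unknown supplier": accepted(99, 25.0),
}
default_quantity = conn.execute("SELECT quantity FROM quotation").fetchone()[0]

print(results)
print("default quantity:", default_quantity)
```

Declaring the rules in the schema means every program that writes to the table is subject to the same checks.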

Handling Missing Data
Substitute an estimate of the missing value (e.g. using a formula)
Construct a report listing missing values
In programs, ignore missing data unless the value is significant
Triggers can be used to perform these operations
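The first two options can be sketched in application code. This example assumes the estimate is the mean of the known values, which is only one possible formula; the records and field names are hypothetical. In a DBMS the same logic could live in a trigger, as the slide notes.

```python
from statistics import mean

# Hypothetical records; None marks a missing value.
records = [
    {"id": 1, "weight": 12.0},
    {"id": 2, "weight": None},
    {"id": 3, "weight": 18.0},
]

# Option 2: a report listing the records with missing values.
missing_report = [r["id"] for r in records if r["weight"] is None]

# Option 1: substitute an estimate (here, the mean of the known values)
# and flag the row so the estimate is never mistaken for real data.
estimate = mean(r["weight"] for r in records if r["weight"] is not None)
filled = [
    {
        "id": r["id"],
        "weight": estimate if r["weight"] is None else r["weight"],
        "estimated": r["weight"] is None,
    }
    for r in records
]

print("missing:", missing_report)
print(filled)
```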

File Organisation
File organisation: a technique for physically arranging the records of a file on secondary storage
Important factors to consider:
  Fast data retrieval
  High throughput for processing data input and maintenance transactions
  Efficient usage of storage space
  Protection from failures or data loss
  Minimizing the need for reorganization
  Accommodating growth
  Security from unauthorized access

Indexes
Index: a table or other data structure that is used to determine the locations of rows in a table that satisfy a condition.
Example: an index on PRODUCT pairs each PRODUCT NO with a RECORD NO (e.g. PRODUCT NO 50 is at RECORD NO 3).
Index tables are small and can be kept in main memory.
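A minimal in-memory sketch of the idea, assuming a dictionary as the index structure (a real DBMS would typically use a B-tree or hash file). Only the pairing of product number 50 with record number 3 comes from the slide; the other rows are invented.

```python
# The "file": records in arrival order, numbered from 1.
product_file = [
    {"product_no": 20, "description": "Desk"},
    {"product_no": 35, "description": "Chair"},
    {"product_no": 50, "description": "Bookcase"},
    {"product_no": 10, "description": "End table"},
]

# The index: key value -> record number, much smaller than the file itself.
index = {
    row["product_no"]: record_no
    for record_no, row in enumerate(product_file, start=1)
}


def find(product_no):
    """Locate a record through the index instead of scanning the file."""
    record_no = index.get(product_no)
    if record_no is None:
        return None
    return product_file[record_no - 1]


print(index[50])
print(find(50))
```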

When to use Indexes
Larger tables
Primary key, foreign key
Nonkey attributes that are referred to in qualification, sorting, and grouping commands (e.g. WHERE, ORDER BY, GROUP BY)
Check the DBMS for the limit on the number of indexes per table
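The effect of indexing a nonkey attribute used in a WHERE clause can be observed with SQLite's EXPLAIN QUERY PLAN, here driven from Python's sqlite3 module. The table, data and index name are hypothetical, and the exact wording of the plan text varies between SQLite versions, so the sketch only checks whether the index is mentioned.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE product ("
    "product_no INTEGER PRIMARY KEY, description TEXT, finish TEXT)"
)
conn.executemany(
    "INSERT INTO product (description, finish) VALUES (?, ?)",
    [(f"Item {i}", ("Oak", "Birch", "Maple")[i % 3]) for i in range(300)],
)


def query_plan(sql):
    """Return the human-readable plan steps SQLite reports for a query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in rows)  # column 3 holds the detail text


query = "SELECT description FROM product WHERE finish = 'Oak'"

plan_before = query_plan(query)  # full table scan expected
conn.execute("CREATE INDEX idx_product_finish ON product (finish)")
plan_after = query_plan(query)   # search through the new index expected

print("before:", plan_before)
print("after: ", plan_after)
```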

Query Optimization
Wise use of indexes
Compatible data types
Simple queries
Avoid query nesting
Temporary tables for query groups
Select only needed columns
No sort without index
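The last two guidelines can be illustrated with SQLite, assuming a hypothetical supplier table. Without an index on the sort column the plan includes a temporary B-tree built just for the ORDER BY; with one, and with only the needed column selected, the rows come back already ordered from the index.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE supplier (supplier_id INTEGER PRIMARY KEY, name TEXT, city TEXT)"
)
conn.executemany(
    "INSERT INTO supplier (name, city) VALUES (?, ?)",
    [(f"Supplier {i}", f"City {i % 7}") for i in range(200)],
)


def query_plan(sql):
    """Return the plan steps SQLite reports for a query as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in rows)


# Select only the needed column, sorted on that column.
query = "SELECT city FROM supplier ORDER BY city"

plan_before = query_plan(query)  # sort step needed: no index on city
conn.execute("CREATE INDEX idx_supplier_city ON supplier (city)")
plan_after = query_plan(query)   # index already delivers rows in order

print("before:", plan_before)
print("after: ", plan_after)
```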

File Access - Storage Technology: RAID
RAID: an array of physical disk drives that appears to the database user as if it forms one logical storage unit.

RAID-0

RAID-1

RAID-5
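The slides give only the names of the three levels. As a rough illustration under the usual definitions (RAID-0 stripes blocks across disks, RAID-1 mirrors them, RAID-5 stripes data with parity), the Python sketch below models disks as lists of byte blocks. The block contents and disk counts are arbitrary, and real RAID-5 rotates the parity block across the disks, which this single-stripe example does not show.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, position by position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


blocks = [b"AAAA", b"BBBB", b"CCCC", b"DDDD"]

# RAID-0 idea: consecutive blocks alternate between two disks (no redundancy).
raid0 = {disk: blocks[disk::2] for disk in range(2)}

# RAID-1 idea: every block is written to both disks (full redundancy).
raid1 = {disk: list(blocks) for disk in range(2)}

# RAID-5 idea, one stripe: three data blocks plus one parity block.
stripe = [b"PINE", b"VALL", b"EYFC"]
parity = xor_blocks(stripe)

# Simulate losing the disk that held stripe[1]; rebuild it from the survivors.
rebuilt = xor_blocks([stripe[0], stripe[2], parity])

print(raid0)
print(rebuilt)
```

XOR parity lets any single lost block in a stripe be recomputed from the remaining blocks, at the cost of one extra block per stripe rather than a full second copy.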

Definitions
Distributed database: a single logical database that is spread physically across computers in multiple locations that are connected by a data communications link
Decentralized database: a collection of independent databases on non-networked computers
They are NOT the same thing!

Reasons for Distributed Database
Business unit autonomy and distribution
Data sharing
Data communication costs and reliability
Multiple application vendors
Database recovery
Transaction and analytic processing

Data Distribution Strategy
Basic data distribution strategies:
Centralised: all data are held at a single site. Access problems for remote users, data communication costs, and the DB fails when the central system fails.
Replicated: a full copy of the DB is assigned to more than one site. Maximises local access, but creates update problems.
Partitioned: the DB is divided into non-overlapping partitions, each assigned to a particular site. Data are closer to local users.
Hybrid: the DB is fragmented into critical and non-critical fragments; non-critical fragments are stored at only one site, the others at multiple sites, etc.

Partitioning
Horizontal partitioning: distributing the rows of a table into several separate files
  Useful when different users need access to different rows
Vertical partitioning: distributing the columns of a table into several separate files
  Useful when different users need access to different columns
  The primary key must be repeated in each file
Combinations of horizontal and vertical partitions often correspond to user schemas (user views)
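Both kinds of partitioning can be sketched on a small in-memory table. The customer rows, the choice of region as the horizontal split and the column grouping are all hypothetical; the point is that the vertical partitions can only be recombined because each one repeats the primary key.

```python
# Hypothetical table with primary key cust_id.
customers = [
    {"cust_id": 1, "name": "Ada", "region": "East", "credit_limit": 5000},
    {"cust_id": 2, "name": "Ben", "region": "West", "credit_limit": 2000},
    {"cust_id": 3, "name": "Cem", "region": "East", "credit_limit": 7500},
]

# Horizontal partitioning: whole rows are split into files by region.
horizontal = {}
for row in customers:
    horizontal.setdefault(row["region"], []).append(row)

# Vertical partitioning: columns are split into files,
# and the primary key is repeated in each file.
contact_file = [
    {"cust_id": r["cust_id"], "name": r["name"], "region": r["region"]}
    for r in customers
]
credit_file = [
    {"cust_id": r["cust_id"], "credit_limit": r["credit_limit"]}
    for r in customers
]

# Recombining vertical partitions requires a match on the repeated key.
credit_by_id = {r["cust_id"]: r for r in credit_file}
rebuilt = [{**c, **credit_by_id[c["cust_id"]]} for c in contact_file]

print(sorted(horizontal))
print(rebuilt == customers)
```

The rebuild step is also where the "slow retrievals across partitions" cost comes from: a query that needs columns from both files has to join them.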

Partitioning
Advantages of partitioning:
  Records used together are grouped together
  Each partition can be optimized for performance
  Security, recovery
  Partitions stored on different disks: reduced contention
  Takes advantage of parallel processing capability
Disadvantages of partitioning:
  Slow retrievals across partitions
  Complexity

Data Replication
Purposely storing the same data in multiple locations of the database
Improves performance by allowing multiple users to access the same data at the same time with minimum contention
Sacrifices data integrity due to data duplication
