DATABASE PHYSICAL DESIGN Chandra S. Amaravadi 1. INTRODUCTION 2.

Presentation transcript:

DATABASE PHYSICAL DESIGN Chandra S. Amaravadi 1

INTRODUCTION 2

PHYSICAL DATABASE DESIGN
Physical database design is concerned with issues revolving around database implementation:
- Implementation design
- Database storage, access & location
- File organization & constraints

THE THREE FORMS OF DATA
- External
- Conceptual / base table
- Internal / hardware level
These three levels provide logical and physical data independence.
Example base table:
Cust# | Name | Address | Balance
100 | Gordon | 110 Oak Street | $…
… | Prasad | 22 Birch place | $…
… | … | … | …

THE THREE TYPES OF MODELS
- Conceptual model: schemas. Facilities: Create table, Alter table
- Internal model: file organizations. Facilities: Create index, Drop index
- External model: views. Facilities: Create view, Drop view

DATABASE PHYSICAL DESIGN Inputs? 6

COMPONENTS OF PHYSICAL DESIGN 1. Implementation design 2. Storage, access & distribution strategies 3. File organizations 4. Specifications for integrity constraints (later) 7

IMPLEMENTATION DESIGN
Concerned with taking the results of normalization and designing tables, attributes, and data types for implementation.
- Decide on tables (de-normalization)
- Decide on primary and cross reference keys (not discussed further)
- Decide on attribute data types (not discussed further), e.g. fixed vs variable length fields, integer vs double integer
- Design reports and forms (not discussed further)
Field Name | Data type | Description | Length | Decimals
Prod# | Numeric | Unique prod code | 6 | 0
Descr | Text | Short prod description | 25 | 0
Price | Currency | Product price | 6 | 2

DECIDING ON TABLES
Denormalization is going back in the normal forms to reduce schema overhead.
Denormalization example (for 1:1):
Before: Parts(Part#, PartName, ..) and Container(ContainerID, #fin, #needed, Part#)
After: Parts(Part#, PartName, ContainerID, #fin, #needed)

DECIDING ON TABLES..
Denormalization example (for M:N): ORDERS (Ord#, Ord_dt) "are for" PRODUCTS (Prod#, Descr.), with Qty on the relationship.
What tables does normalization result in?

DENORMALIZATION
Normalized: Orders(ord#, ord_dt, ..); Product(prod.#, descr, ..); Orders for prod(prod.#, ord#, qty)
Denormalized: Orders(ord#, ord_dt, ..); Product(prod.#, ord#, descr., qty, ..)
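As a rough illustration of what the denormalized Product table holds, the sketch below folds the Orders-for-prod rows into Product. The sample rows and the function name `denormalize` are hypothetical and are not taken from the slides.

```python
# Hypothetical sample data: prod# -> descr, and (prod#, ord#, qty) rows.
products = {1: "Rotor", 2: "Scraper"}
orders_for_prod = [(1, 500, 10), (1, 501, 4), (2, 500, 7)]

def denormalize(products, orders_for_prod):
    """Build Product(prod#, ord#, descr, qty) rows from the two normalized tables."""
    return [
        {"prod#": prod, "ord#": ord_no, "descr": products[prod], "qty": qty}
        for prod, ord_no, qty in orders_for_prod
    ]

product_denorm = denormalize(products, orders_for_prod)
for row in product_denorm:
    print(row)
```

Note that the description of product 1 now appears once per order, which is the redundancy that denormalization trades for fewer tables.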

COMPONENTS OF PHYSICAL DESIGN.. 1. Implementation design 2. Storage and access strategies 3. Distribution strategies 4. File organizations 5. Specifications for integrity constraints (later) 12

STORAGE & ACCESS STRATEGIES (ALSO CALLED VOLUME & USAGE ANALYSIS)
OBJECTIVES
- Estimate storage requirements (volume analysis)
- Determine media to be used (not discussed)
- Study how data is being accessed (usage analysis)
- Use these to develop file organization (later)
Volume and usage analysis is carried out with a composite usage map.

COMPOSITE USAGE MAP
A composite usage map is simply an ER chart (without attributes) that shows the number of records and the frequency/pattern with which they are accessed.
- Used for volume & usage analysis -> file org.
- Superimposed on ER chart
- Attributes are not shown
- Shows estimated number of records (volume)
- Shows type of access (dotted lines)

VOLUME & USAGE ANALYSIS
- Equipment, Parts and PE tables
- Equipment: 100; Parts: 12,000; PE: 10,000
- 20 inquiries per hour to Equipment
- 300 inquiries per hour on Parts table
- 70% of these inquiries also need to know Equipment info.
Draw a composite usage map, estimate storage requirements and develop a suitable file organization.

COMPOSITE USAGE MAP
[Diagram: EQUIPMENT (100) and PARTS (12,000) joined by the ARE FOR relationship, PE (10,000). Access labels: 20, ????, ???]
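One of the access counts on such a map can be derived from the rates stated for this example (300 inquiries per hour on Parts, 70% of which also need Equipment info). The sketch below is only that arithmetic; the variable names are illustrative.

```python
# Rates stated in the example.
parts_inquiries_per_hour = 300
pct_also_needing_equipment = 70

# Inquiries that continue from Parts to Equipment, in whole inquiries per hour.
parts_to_equipment_per_hour = (
    parts_inquiries_per_hour * pct_also_needing_equipment // 100
)
print(parts_to_equipment_per_hour)
```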

FOR DISCUSSION How can one estimate the size of a database? 17

ESTIMATING STORAGE REQMTS. FOR PARTS AND EQUIPMENT
EQUIPMENT(Model#, Descr, Mfr., Price, HP, WT)
PARTS(Part#, Descr, Mfr, Price)
PE(Model#, Part#, Qty)
Equipment table: 33 bytes/record
Parts table: ??
PE table: ??
Total storage requirements = ??

STORAGE REQUIREMENTS
EQUIPMENT TABLE: record size 33 bytes; # of records 100; file size 33 * 100 = 3,300 bytes
PARTS TABLE: record size 25 bytes; # of records 12,000; file size 25 * 12,000 = 300,000 bytes
PE TABLE: record size 10 bytes (approx); # of records 10,000; file size 10 * 10,000 = 100,000 bytes
TOTAL STORAGE: ??????
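The volume estimate is record size times number of records, summed over the tables. A minimal sketch using the record sizes and counts given on this slide (it ignores indexes, block padding and other overhead, which a real estimate would add):

```python
# (record size in bytes, number of records) per table, as given on the slide.
tables = {
    "EQUIPMENT": (33, 100),
    "PARTS": (25, 12_000),
    "PE": (10, 10_000),
}

def file_size(record_bytes, n_records):
    """Raw data size of one table in bytes."""
    return record_bytes * n_records

sizes = {name: file_size(*spec) for name, spec in tables.items()}
total_bytes = sum(sizes.values())

for name, size in sizes.items():
    print(f"{name}: {size:,} bytes")
print(f"TOTAL: {total_bytes:,} bytes")
```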

A MORE ELABORATE EXAMPLE
- Parts are manufactured parts (40%) and purchased parts (70%)
- Parts: 1,000; Suppliers: 50; Quotations: 2,500
- Total of 200 parts inquiries
- 60 direct inquiries to purchased parts
- Of the purchased parts inquiries, 80 are also to quotation
- Of these 80, 70 are to supplier as well
- 75 direct queries to supplier
- Of these, 40 are for quotation
- All of these are also for parts

ANOTHER EXAMPLE.. A COMPOSITE USAGE MAP
[Diagram: PART (1000) is-a MANUFACTURED (400, 40%) and PURCHASED (700, 70%); QUOTATION (2500); SUPPLIER (50)]
Note: # of records are in red; the # of accesses are in blue

COMPONENTS OF PHYSICAL DESIGN.. 1. Implementation design 2. Storage & access strategies 3. Distribution strategies 4. File organizations 5. Specifications for integrity constraints (later) 22

DISTRIBUTION STRATEGIES
Distribution strategies are concerned with where the files are physically located.
1. Centralized
2. Distributed
   - Replicated (not discussed)
   - Partitioned

DISTRIBUTION STRATEGIES
- Centralized: all the data is stored in one physical location.
- Distributed: the data is stored in multiple physical locations.
- Replicated: the database is duplicated in multiple locations.
- Partitioned: the database is divided into "fragments" and each fragment is stored in a different location.

CENTRALIZED VS DISTRIBUTED
- Which is a bottleneck?
- Which causes security problems?
- Which method may be required for business reasons?
- In which setup is data more accessible?
- Which provides better performance?

CENTRALIZED STRATEGY
General principle: maximize local access, minimize remote access.
Where should we locate the database: S1, S2 or S3?
[Diagram: sites S1, S2 and S3]

DISTRIBUTED DATABASE (partitioning)
EID | Name | City
2356 | Armstrong | LA
3286 | Nickerson | SF
3356 | Forrester | MPLS
Sites: LA, SF, MPLS
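A partitioned strategy can be sketched as splitting the table into fragments on some attribute. The example below assumes City is the partitioning attribute, which the slide suggests but does not state; the function name `partition_by` is hypothetical.

```python
from collections import defaultdict

# Rows from the slide's employee table.
employees = [
    {"EID": 2356, "Name": "Armstrong", "City": "LA"},
    {"EID": 3286, "Name": "Nickerson", "City": "SF"},
    {"EID": 3356, "Name": "Forrester", "City": "MPLS"},
]

def partition_by(rows, attribute):
    """Group rows into fragments keyed by the value of the given attribute."""
    fragments = defaultdict(list)
    for row in rows:
        fragments[row[attribute]].append(row)
    return dict(fragments)

fragments = partition_by(employees, "City")
for site, rows in fragments.items():
    print(site, [r["EID"] for r in rows])
```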

COMPONENTS OF PHYSICAL DESIGN.. 1. Implementation design 2. Storage & access strategies 3. Distribution strategies 4. File organizations 5. Specifications for integrity constraints (later) 29

FILE ORGANIZATION
How records are arranged on and retrieved from secondary storage, or the mapping between ____ and ______?
[Diagram: disk tracks and sectors; File 1 holding Rec. 1, 2, ..]

DATA ACCESS (FYI)
[Diagram labels: User, Requests, DBMS, O/S, Consults directory tables (FAT/NTFS), Generates instructions to IOP, IOP, Hard drive, Partition, Database storage, RAM]

FILE ORGANIZATION: SELECTION CRITERIA
- Retrieval time (disk access)
- Access type (direct, sequential)
- Storage space
- Maintenance effort

OVERVIEW OF FILE ORGANIZATIONS Sequential Hashed Indexed ISAM VSAM 33

OVERVIEW OF FILE ORGANIZATIONS..
- Sequential: records are stored one after another in pkey sequence.
- Hashed: record address is determined by subjecting the pkey to a hashing algorithm.
- Indexed: same as sequential, except that there is an index file which places keys into a separate file for ease of searching.

THE SEQUENTIAL ORGANIZATION
- Records in pkey sequence
- Access only sequential
- Insertions/deletions in sequential order
- Simple organization, good for batch updates
Part# | Descr.
100 | Aux. motors
120 | Scrapers
124 | Rotors

THE HASHING ORGANIZATION
A type of file organization where record addresses are generated by subjecting primary keys to a hashing routine, usually by dividing by a prime#.
Pkey -> Hashing algorithm -> Hash address = REM[(Pkey)/(Prime#)] + Address of starting block
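The division-remainder routine can be written in a couple of lines. This is a minimal sketch: the function names and the starting block address of 1000 are assumptions made for illustration, not values from the slides.

```python
def hash_address(pkey, prime):
    """Remainder of pkey divided by the chosen prime."""
    return pkey % prime

def record_address(pkey, prime, start_block):
    """Remainder offset from the address of the starting block."""
    return hash_address(pkey, prime) + start_block

# Pkey 43 with prime 7 gives remainder 1; start_block=1000 is hypothetical.
print(hash_address(43, 7))
print(record_address(43, 7, start_block=1000))
```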

HASHING CONCEPTS
Following are important concepts in hashing:
- Hashing algorithm
- Hash address
- Buckets & bucket size
- Slots
- Collisions/overflows
- Load factor
- Search length
Record address = hash address + physical addr
Example: Pkey = 43; hash address = (43 remainder 7) = 1; record address = …
[Diagram: file space, with address 3432 marked]

HASHING CONCEPTS..
- Hashing algorithm: the formula used to calculate a record address
- Hash address: an address (within block) where a hashed record is stored
- Buckets: storage area for a group of records; bucket size refers to the # of slots
- Slots: storage area for an individual record
- Collision: when two records hash to the same address
- Load factor: the ratio of the # of records to the total space allocated
- Average search length: the time it takes on average to retrieve a record (usually expressed in terms of disk accesses)
- Disk access: each time the disk is accessed to get a record (one access if the record is stored at its hardware address; otherwise it depends on the record's location)

HASHING ALGORITHM
1. Choose load factor
2. Identify # of buckets to be allocated
3. Select a prime# close to this number
4. Divide each pkey by prime#
5. Remainder = record address
6. Sequentially number the buckets
7. Place each record at its address
8. If there are overflows, use open overflow
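The steps above can be sketched as a small loader. Several choices here are assumptions rather than anything the slides specify: "a prime close to this number" is read as the largest prime not exceeding the bucket count, open overflow is implemented as probing forward (with wrap-around) to the next bucket that has a free slot, and the default load factor of 0.75 is just a value inside the commonly quoted range.

```python
import math

def is_prime(n):
    """Trial division; adequate for small bucket counts."""
    return n >= 2 and all(n % d for d in range(2, math.isqrt(n) + 1))

def prime_close_to(n):
    """Assumption: take the largest prime <= n."""
    while n > 2 and not is_prime(n):
        n -= 1
    return n

def load_file(pkeys, load_factor=0.75, slots_per_bucket=1):
    if not 0 < load_factor <= 1:
        raise ValueError("load factor must be in (0, 1]")
    # Steps 1-2: the load factor determines how many buckets to allocate.
    n_buckets = math.ceil(len(pkeys) / (load_factor * slots_per_bucket))
    # Step 3: pick the prime divisor.
    prime = prime_close_to(n_buckets)
    # Step 6: buckets are numbered 0 .. n_buckets - 1.
    buckets = [[] for _ in range(n_buckets)]
    for pkey in pkeys:
        # Steps 4-5: the remainder is the home address.
        addr = pkey % prime
        # Step 8: open overflow, probing for the next bucket with a free slot.
        while len(buckets[addr]) >= slots_per_bucket:
            addr = (addr + 1) % n_buckets
        # Step 7: place the record.
        buckets[addr].append(pkey)
    return prime, buckets

prime, buckets = load_file([100, 120, 130, 140, 145, 150])
for number, contents in enumerate(buckets):
    print(number, contents)
```

With six keys and a load factor of 0.75 this allocates 8 buckets and divides by 7; because 7 is smaller than 8, the last bucket is only ever reached through overflow.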

HASHING CONCEPTS: OVERFLOWS
Collision: when two keys hash to the same address
- Open overflow (store in unallocated slots)
- Chained overflow (a separate area)

HASHING EXAMPLE
Given Part#s:
Part# | Descr. | Part# mod 7
100 | Gears | 2
120 | Scrapers | 1
130 | Aux motors | 4
140 | Crankshafts | 0
145 | Cylinder heads | 5
150 | Pistons | 3
Assume 8 buckets (0..7), 1 slot per bucket, and a disk access time of 20 ms.

HASHING EXAMPLE: FILE LOADINGS
Bucket | Record
0 | 140 Crankshaft
1 | 120 Scrapers
2 | 100 Gears
3 | 150 Pistons
4 | 130 Aux. motor
5 | 145 Cylinders
6 |
7 |
Insert: 135 Shovel? 135 mod 7 = 2
Average search length? 6 records -> 1 access; 1 record -> 2 accesses
Load factor: ? Bucket size = ?
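Taking the slide's access counts at face value (six records found in one access, one overflow record found in two), the average search length and load factor follow by direct arithmetic. The sketch below assumes 7 records in 8 single-slot buckets and a 20 ms disk access, as stated in the example; the variable names are illustrative.

```python
DISK_ACCESS_MS = 20

# Accesses needed per record: six at their home address, one in overflow.
accesses_per_record = [1] * 6 + [2]

avg_search_length = sum(accesses_per_record) / len(accesses_per_record)
avg_retrieval_ms = avg_search_length * DISK_ACCESS_MS

# Load factor = records stored / total slots allocated.
n_buckets, slots_per_bucket = 8, 1
load_factor = len(accesses_per_record) / (n_buckets * slots_per_bucket)

print(f"average search length: {avg_search_length:.2f} accesses")
print(f"average retrieval time: {avg_retrieval_ms:.1f} ms")
print(f"load factor: {load_factor:.1%}")
```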

THE HASHING ORGANIZATION: EVALUATION
- H(pkey) --> record address
- Records in hash sequence
- Need to allocate extra space
- Load factor between 60-80%
- Good for low activity (FAR) files
- Real-time and OO applns.

DISCUSSION
A parts file with Part# as the pkey includes records with the following part# values: 23, 37, 46, 48, 56, 18, 10, 71, 16, 24, 39, 47 and 69. The file uses 8 buckets numbered 0 to 7. Each bucket holds two records. Load these records into the file in the given order using the hash function h(K) = K mod 8. Calculate the average search length in terms of # of disk accesses. Assume 20 ms disk access.
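A starting point for this exercise is to see which records land in their home bucket and which overflow. The sketch below only does that step: it sets overflowing keys aside in a list rather than choosing an overflow scheme, since the number of disk accesses for those records depends on whether open or chained overflow is used. The function name `load_home_buckets` is hypothetical.

```python
keys = [23, 37, 46, 48, 56, 18, 10, 71, 16, 24, 39, 47, 69]

def load_home_buckets(keys, n_buckets=8, slots_per_bucket=2):
    """Place each key in bucket K mod n_buckets; collect keys that do not fit."""
    buckets = {b: [] for b in range(n_buckets)}
    overflow = []
    for key in keys:
        home = key % n_buckets
        if len(buckets[home]) < slots_per_bucket:
            buckets[home].append(key)
        else:
            overflow.append(key)
    return buckets, overflow

buckets, overflow = load_home_buckets(keys)
for number, contents in buckets.items():
    print(number, contents)
print("overflow:", overflow)
```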

INDEXED ORGANIZATION
A method of file organization where a subset of key values is stored in an index. Types are:
- Primary key
- Secondary key
- Clustered

THE INDEXED ORGANIZATION (ISAM)
- Records are in pkey sequence (master file)
- But are organized into groups
- Grouping information is stored in index file
- Records can be inserted at random
- Records can be accessed in sequence or at random

THE INDEXED ORGANIZATION
Index file (index set): Emp ID ………
Master file (sequence set):
name | ID#
Angela | 108
Scott | 104
Becky | 103
Jacob | 101

THE INDEXED ORGANIZATION
[Diagram: tracks grouped into CYLINDER1 and CYLINDER2]

THE ISAM ORGANIZATION
[Diagram: index set (cylinder index, track index) above the sequence set, CYLINDER1 .. CYLINDER N, with overflow tracks]
Note: Assume that the corresponding HW addresses are stored along with the pkeys

INSERTIONS IN ISAM
- Identify track where record needs to be inserted
- If the track is full, insert in overflow area
- If the track has room, insert pkey in sequence
- Update track index and cylinder index if necessary
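The insertion rules above can be sketched for a single cylinder. This is a simplification under stated assumptions: the track index holds the highest pkey on each track, a track holds at most three records, the overflow area is one shared list, and the cylinder index is left out. The function name `isam_insert` and the sample keys are hypothetical.

```python
import bisect

def isam_insert(tracks, track_index, overflow, pkey, track_capacity=3):
    """Insert pkey into its track, or into the overflow area if the track is full."""
    # Identify the track: the first one whose highest key is >= pkey,
    # falling back to the last track for keys beyond every index entry.
    t = min(bisect.bisect_left(track_index, pkey), len(tracks) - 1)
    if len(tracks[t]) >= track_capacity:
        overflow.append(pkey)            # track full: use the overflow area
    else:
        bisect.insort(tracks[t], pkey)   # room: keep pkeys in sequence
        track_index[t] = tracks[t][-1]   # refresh the track's highest key
    return t

tracks = [[100, 120, 124], [130, 140]]
track_index = [124, 140]
overflow = []

for key in (135, 110, 150):
    isam_insert(tracks, track_index, overflow, key)

print(tracks)
print(track_index)
print(overflow)
```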

ISAM: ADVANTAGES AND DISADVANTAGES
- Access is direct or sequential?
- Access time dependent on?
- Rewrite sequentially
- Retrieval time uniform
- Suitable for volatile files?
- Workhorse organization used in most apps.

SECONDARY KEY INDEX
EMPLOYEE
REC# | E_SSN | E_NAME | E_TITLE | E_SALARY
1 | … | Smith | Developer | $35,…
2 | … | Johnson | Analyst | $27,…
3 | … | Weintraub | Developer | $60,…
4 | … | Dickson | Manager | $64,…
5 | … | Holland | Analyst | $47,…
6 | … | Rao | Analyst | $71,…
7 | … | McDonald | Manager | $85,000
Index on E_TITLE:
E_TITLE | REC#
Analyst | 2, 5, 6
Manager | 4, 7
Developer | 1, 3
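A secondary key index of this kind maps each value of the secondary key to the list of record numbers holding it. A minimal sketch using the names and titles from the slide (SSN and salary are omitted; the function name is hypothetical):

```python
from collections import defaultdict

# (REC#, E_NAME, E_TITLE) in stored order.
employees = [
    (1, "Smith", "Developer"),
    (2, "Johnson", "Analyst"),
    (3, "Weintraub", "Developer"),
    (4, "Dickson", "Manager"),
    (5, "Holland", "Analyst"),
    (6, "Rao", "Analyst"),
    (7, "McDonald", "Manager"),
]

def build_secondary_index(records):
    """Map each title to the record numbers that carry it."""
    index = defaultdict(list)
    for rec_no, _name, title in records:
        index[title].append(rec_no)
    return dict(index)

title_index = build_secondary_index(employees)
print(title_index)
```

A query such as "all analysts" then reads the index entry and fetches only records 2, 5 and 6 instead of scanning the whole file.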

CLUSTERED INDEX
Also known as inverted file organization.
EMPLOYEE
Address | e_ssn | e_name | e_title | e_salary
1 | … | Johnson | Analyst | $27,…
2 | … | Holland | Analyst | $47,…
3 | … | Rao | Analyst | $71,…
4 | … | Dickson | Manager | $64,…
5 | … | McDonald | Manager | $85,…
6 | … | Weintraub | Developer | $60,…
7 | … | Smith | Developer | $35,000
Index on e_title:
E_title | Address
Analyst | 1
Manager | 4
Developer | 6
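Because the records are physically grouped by title, the clustered index needs only the address of the first record in each group. A minimal sketch, using the title Developer as it appears in the slide's index; the function name is hypothetical and SSN and salary are omitted.

```python
# (address, e_name, e_title), already stored grouped by title.
clustered_employees = [
    (1, "Johnson", "Analyst"),
    (2, "Holland", "Analyst"),
    (3, "Rao", "Analyst"),
    (4, "Dickson", "Manager"),
    (5, "McDonald", "Manager"),
    (6, "Weintraub", "Developer"),
    (7, "Smith", "Developer"),
]

def build_clustered_index(records):
    """Record the address of the first record seen for each title."""
    index = {}
    for address, _name, title in records:
        index.setdefault(title, address)
    return index

clustered_index = build_clustered_index(clustered_employees)
print(clustered_index)
```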

INDEXING STRATEGIES
- Index if you must
- Index on pkey
- Index on foreign keys
- Index on secondary key (depending on query frequency)

DISCUSSION
- What activities are part of identifying storage strategies?
- How is denormalization carried out for M:N relationships?
- How many indexes can you have per table? How many clustered indexes?
- Can we sequentially update all records in a) a hashing organization? b) an indexed organization?
- Is indexing suitable for volatile files?
- If an index consists of 3 levels of indexes with the main index in RAM, and a disk access time of 20 ms, how long on average does it take to retrieve a record?
- What problems do overflow records cause in hashing?
- A file is required to store 60,000 records; how much space is required in order to store the records using a hashing organization?

THE END! 56