DATABASE PHYSICAL DESIGN Chandra S. Amaravadi

1 DATABASE PHYSICAL DESIGN
Chandra S. Amaravadi

2 INTRODUCTION

3 PHYSICAL DATABASE DESIGN
Physical database design is concerned with issues revolving around database implementation:
- Implementation design
- Database storage, access & location
- File organization & constraints

4 THE THREE FORMS OF DATA
- External
- Conceptual / base table:
    Cust#  Name    Address         Balance
    100    Gordon  110 Oak Street  $400
    200    Prasad  22 Birch Place  $2500
    300    ...     ...             ...
- Internal / hardware level: 100... 200... 300...
These three levels provide logical and physical data independence.

5 THE THREE TYPES OF MODELS
Models and their facilities:
- External: Views (Create view, Drop view)
- Conceptual: Schemas (Create table, Alter table)
- Internal: File organizations (Create index, Drop index)

6 DATABASE PHYSICAL DESIGN
Inputs?

7 COMPONENTS OF PHYSICAL DESIGN
1. Implementation design
2. Storage, access & distribution strategies
3. File organizations
4. Specifications for integrity constraints (later)

8 IMPLEMENTATION DESIGN
Concerned with taking the results of normalization and designing tables, attributes, and data types for implementation.
- Decide on tables (denormalization)
- Decide on primary and cross-reference keys (not discussed further)
- Decide on attribute data types (not discussed further), e.g. fixed vs. variable-length fields, integer vs. double integer
- Design reports and forms (not discussed further)
Field Name  Data type  Description             Length  Decimals
Prod#       Numeric    Unique prod code        6       0
Descr       Text       Short prod description  25      0
Price       Currency   Product price           6       2

9 DECIDING ON TABLES
Denormalization means going back to a lower normal form to reduce schema overhead.
Denormalization example (for 1:1):
Before: Parts(Part#, PartName, ) and Container(ContainerID, #fin, #needed, Part#)
After: Parts(Part#, PartName, ContainerID, #fin, #needed)

10 DECIDING ON TABLES..
Denormalization example (for M:N):
ORDERS (Ord#, Ord_dt) -- Are for (Qty) -- PRODUCTS (Prod#, Descr.)
What tables does normalization result in?

11 DENORMALIZATION
Normalized:
  Orders(ord#, ord_dt, ..)
  Product(prod.#, descr, ..)
  Orders for prod(prod.#, ord#, qty)
Denormalized:
  Orders(ord#, ord_dt, ..)
  Product(prod.#, ord#, descr., qty, ..)
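The M:N denormalization above can be sketched in a few lines of Python. This is only an illustration: the column names follow the slide, but the sample rows and the function name `denormalize` are assumptions made for the example.

```python
# Hypothetical sample rows; only the column names come from the slide.
products = [
    {"prod": 1, "descr": "Gears"},
    {"prod": 2, "descr": "Rotors"},
]
orders_for_prod = [
    {"prod": 1, "ord": 500, "qty": 10},
    {"prod": 1, "ord": 501, "qty": 4},
    {"prod": 2, "ord": 500, "qty": 7},
]


def denormalize(products, orders_for_prod):
    """Fold the 'Orders for prod' table into Product(prod.#, ord#, descr., qty)."""
    descr_by_prod = {p["prod"]: p["descr"] for p in products}
    return [
        {
            "prod": row["prod"],
            "ord": row["ord"],
            "descr": descr_by_prod[row["prod"]],  # description repeats per order
            "qty": row["qty"],
        }
        for row in orders_for_prod
    ]


denormalized_product = denormalize(products, orders_for_prod)
for row in denormalized_product:
    print(row)
```

The description of product 1 now appears once per order line. That redundancy is the price paid for having one table fewer to join.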

12 COMPONENTS OF PHYSICAL DESIGN..
1. Implementation design
2. Storage and access strategies
3. Distribution strategies
4. File organizations
5. Specifications for integrity constraints (later)

13 STORAGE & ACCESS STRATEGIES
Also called volume & usage analysis.
OBJECTIVES
- Estimate storage requirements (volume analysis)
- Determine media to be used (not discussed)
- Study how data is being accessed (usage analysis)
- Use these to develop a file organization (later)
Volume and usage analysis is carried out with a composite usage map.

14 COMPOSITE USAGE MAP
A composite usage map is simply an ER chart (without attributes) that shows the number of records and the frequency/pattern with which they are accessed.
- Used for volume & usage analysis, which leads to the file organization
- Superimposed on the ER chart
- Attributes are not shown
- Shows estimated number of records (volume)
- Shows type of access (dotted lines)

15 VOLUME & USAGE ANALYSIS
- Equipment, Parts and PE tables
- Number of records: Equipment: 100; Parts: 12,000; PE: 10,000
- 20 inquiries per hour to the Equipment table
- 300 inquiries per hour on the Parts table
- 70% of these inquiries also need to know Equipment info.
Draw a composite usage map, estimate storage requirements and develop a suitable file organization.

16 COMPOSITE USAGE MAP
EQUIPMENT (100) -- ARE FOR / PE (10,000) -- PARTS (12,000)
Accesses: 20 to EQUIPMENT; remaining access counts: ????, ???

17 FOR DISCUSSION
How can one estimate the size of a database?

18 ESTIMATING STORAGE REQMTS. FOR PARTS AND EQUIPMENT
Field sizes in bytes:
EQUIPMENT(Model#: 7, Descr: 10, Mfr.: 12, Price: 2, HP: 1, WT: 1)
PARTS(Part#: 1, Descr: 10, Mfr: 12, Price: 2)
PE(Model#: 7, Part#: 1, Qty: 1)
Equipment table: 7+10+12+2+1+1 = 33 bytes/record
Parts table: ??
PE table: ??
Total storage requirements = ??

19 STORAGE REQUIREMENTS
EQUIPMENT TABLE: record size: 33 bytes; # of records: 100; file size: 33 * 100 = 3,300 bytes
PARTS TABLE: record size: 25 bytes; # of records: 12,000; file size: 25 * 12,000 = 300,000 bytes
PE TABLE: record size: 10 bytes (approx); # of records: 10,000; file size: 10 * 10,000 = 100,000 bytes
TOTAL STORAGE: ??????
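The volume analysis above is plain arithmetic: record size times number of records, summed over the tables. A minimal Python sketch follows. The field sizes and record counts come from the slides; the names (`FIELD_BYTES`, `record_size`, `file_size`) are made up for the example, and the calculation assumes no per-record or per-block overhead.

```python
# Field sizes (bytes) and record counts as given on the slides.
FIELD_BYTES = {
    "EQUIPMENT": {"Model#": 7, "Descr": 10, "Mfr": 12, "Price": 2, "HP": 1, "WT": 1},
    "PARTS": {"Part#": 1, "Descr": 10, "Mfr": 12, "Price": 2},
    "PE": {"Model#": 7, "Part#": 1, "Qty": 1},
}
RECORD_COUNTS = {"EQUIPMENT": 100, "PARTS": 12_000, "PE": 10_000}

# The slide rounds the 9-byte PE record up to "10 Bytes (approx)".
ROUNDED_RECORD_BYTES = {"PE": 10}


def record_size(table):
    """Bytes per record: sum of field sizes, or the slide's rounded figure."""
    exact = sum(FIELD_BYTES[table].values())
    return ROUNDED_RECORD_BYTES.get(table, exact)


def file_size(table):
    """Bytes per file, ignoring any storage overhead (an assumption)."""
    return record_size(table) * RECORD_COUNTS[table]


total_bytes = sum(file_size(table) for table in RECORD_COUNTS)

for table in RECORD_COUNTS:
    print(f"{table}: {record_size(table)} bytes/record, {file_size(table):,} bytes")
print(f"TOTAL: {total_bytes:,} bytes")
```

Under these assumptions the three files add up to 3,300 + 300,000 + 100,000 = 403,300 bytes. A real DBMS would add block, index and free-space overhead on top of that.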

20 A MORE ELABORATE EXAMPLE
- Parts are manufactured parts (40%) and purchased parts (70%)
- Parts: 1,000; Suppliers: 50; Quotations: 2,500
- Total of 200 parts inquiries
- 60 direct inquiries to purchased parts
- Of the purchased parts inquiries, 80 are also to quotation
- Of these 80, 70 are to supplier as well
- 75 direct queries to supplier
- Of these, 40 are for quotation
- All of these are also for parts

21 ANOTHER EXAMPLE.. A COMPOSITE USAGE MAP
Entities (# of records): PART (1000); MANUFACTURED (400, 40%) and PURCHASED (700, 70%), each is-a PART; QUOTATION (2500); SUPPLIER (50)
Access counts shown on the map: 200, 140, 60, 75, 40, 80, 70, 40, 80
Note: in the original slide the # of records is in red and the # of accesses is in blue.

22 COMPONENTS OF PHYSICAL DESIGN..
1. Implementation design
2. Storage & access strategies
3. Distribution strategies
4. File organizations
5. Specifications for integrity constraints (later)

23 DISTRIBUTION STRATEGIES
Distribution strategies are concerned with where the files are physically located.
1. Centralized
2. Distributed
   - Replicated (not discussed)
   - Partitioned

24 DISTRIBUTION STRATEGIES
- Centralized -- All the data is stored in one physical location.
- Distributed -- The data is stored in multiple physical locations.
  - Replicated -- The database is duplicated in multiple locations.
  - Partitioned -- The database is divided into “fragments” and each fragment is stored in a different location.

25 CENTRALIZED VS DISTRIBUTED
- Which is a bottleneck?
- Which causes security problems?
- Which method may be required for business reasons?
- In which setup is data more accessible?
- Which provides better performance?

26 CENTRALIZED STRATEGY
General principle: maximize local access, minimize remote access.
S1: 100; S2: 500; S3: 600
WHERE SHOULD WE LOCATE THE DATABASE? S1, S2 or S3?

28 DISTRIBUTED DATABASE
EID   Name       City
2356  Armstrong  LA
3286  Nickerson  SF
3356  Forrester  MPLS
Partitioning across sites: LA, SF, MPLS
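A partitioned database can be illustrated by splitting the employee rows into one fragment per site. The Python sketch below assumes that each record belongs at the site named in its City column. The slide only hints at this, and `partition_by_city` is a made-up helper name.

```python
from collections import defaultdict

# Rows from the slide: (EID, Name, City).
EMPLOYEES = [
    (2356, "Armstrong", "LA"),
    (3286, "Nickerson", "SF"),
    (3356, "Forrester", "MPLS"),
]


def partition_by_city(rows):
    """Group rows into fragments keyed by city (assumed to be the site)."""
    fragments = defaultdict(list)
    for eid, name, city in rows:
        fragments[city].append((eid, name, city))
    return dict(fragments)


fragments = partition_by_city(EMPLOYEES)
for site, rows in fragments.items():
    print(site, rows)
```

Each fragment would then be stored at its own location, so that most accesses from a site are local.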

29 COMPONENTS OF PHYSICAL DESIGN..
1. Implementation design
2. Storage & access strategies
3. Distribution strategies
4. File organizations
5. Specifications for integrity constraints (later)

30 FILE ORGANIZATION
How records are arranged on and retrieved from secondary storage, or the mapping between ____ and ______?
Diagram labels: Tracks; Sectors; File 1; Rec. 1, 2..

31 DATA ACCESS (FYI)
Diagram labels: User; DBMS; O/S; FAT/NTFS; Directory tables; IOP; Hard drive; Partition; RAM; Database storage
Flow annotations: Requests; Consults; Generates instructions to IOP

32 FILE ORGANIZATION
Selection criteria:
- Retrieval time (disk access)
- Access type (direct, sequential)
- Storage space
- Maintenance effort

33 OVERVIEW OF FILE ORGANIZATIONS
- Sequential
- Hashed
- Indexed
  - ISAM
  - VSAM

34 OVERVIEW OF FILE ORGANIZATIONS..
- Sequential -- Records are stored one after another in pkey sequence.
- Hashed -- Record address is determined by subjecting the pkey to a hashing algorithm.
- Indexed -- Same as sequential, except that the keys are also placed in a separate index file for ease of searching.

35 THE SEQUENTIAL ORGANIZATION
- Records in pkey sequence
- Access only sequential
- Insertions/deletions in sequential order
- Simple organization, good for batch updates
Part#  Descr.
100    Aux. motors
120    Scrapers
124    Rotors
...    ...

36 THE HASHING ORGANIZATION
A type of file organization where record addresses are generated by subjecting primary keys to a hashing routine, usually by dividing by a prime#.
Pkey -> Hashing algorithm -> Hash address
Address = REM[(Pkey)/(Prime#)] + Address of Starting Block
Starting block address shown in the diagram: 3432

37 HASHING CONCEPTS
Following are important concepts in hashing:
- Hashing algorithm
- Hash address
- Buckets & bucket size
- Slots
- Collisions/overflows
- Load factor
- Search length
Record address = hash address + physical addr
Example (file space with slots 1 2 3 4 5 6 7.. n, starting at physical address 3432):
Pkey = 43
Hash address = (43 remainder 7) = 1
Record address = 3432 + 1 = 3433
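The address calculation above fits in two small functions. This is a sketch of the division-remainder method only. The function names are invented for the example, and the prime (7) and starting address (3432) are the values used on the slide.

```python
def hash_address(pkey, prime):
    """Division-remainder hashing: the remainder of pkey divided by a prime."""
    return pkey % prime


def record_address(pkey, prime, start_address):
    """Record address = starting (physical) address + hash address."""
    return start_address + hash_address(pkey, prime)


# Values from the slide: Pkey = 43, prime = 7, starting block = 3432.
print(hash_address(43, 7))          # 1
print(record_address(43, 7, 3432))  # 3433
```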

38 HASHING CONCEPTS..
- Hashing algorithm – the formula used to calculate a record address
- Hash address – an address (within the block) where a hashed record is stored
- Buckets – storage area for a group of records; bucket size refers to the # of slots
- Slots – storage area for an individual record
- Collision – when two records hash to the same address
- Load factor – the ratio of the # of records to the total space allocated
- Average search length – the time it takes on average to retrieve a record (usually expressed in terms of disk accesses)
- Disk access – each time the disk is accessed to get a record (if the record is stored at its hardware address, one access; otherwise it depends on the record's location)

39 HASHING ALGORITHM
- Choose load factor
- Identify # of buckets to be allocated
- Select a prime# close to this number
- Divide each pkey by prime#
- Remainder = record address
- Sequentially number the buckets
- Place each record at its address
- If there are overflows, use Open

40 HASHING CONCEPTS..
Collision: when two keys hash to the same address
OVERFLOWS
- Open overflow (store in unallocated slots)
- Chained overflow (a separate area)
File space: 1 2 3 4 5 6 7 .. n

41 HASHING EXAMPLE
Given Part#s:
Part#  Descr.          Hash address
100    Gears           100 Mod 7 = 2
120    Scrapers        120 Mod 7 = 1
130    Aux motors      130 Mod 7 = 4
140    Crankshafts     140 Mod 7 = 0
145    Cylinder heads  145 Mod 7 = 5
150    Pistons         150 Mod 7 = 3
- assume 8 buckets (0..7)
- assume 1 slot per bucket
- assume disk access time of 20 ms

42 HASHING EXAMPLE..
FILE LOADINGS
Bucket  Record
0       140 Crankshaft
1       120 Scrapers
2       100 Gears
3       150 Pistons
4       130 Aux. motor
5       145 Cylinders
6
7
Insert: 135 Shovel? 135 Mod 7 = 2
Average search length? 6 records -> 1 access; 1 record -> 2 accesses
Load factor: ?
Bucket size = ?
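The file loading above can be reproduced with a short Python sketch. Two things here are assumptions rather than facts from the slides. First, "open overflow" is modelled as scanning forward (with wrap-around) from the home bucket to the first unallocated slot. Second, the search length uses the slide's own counting convention: 1 access for a record in its home bucket, 2 accesses for an overflow record. The function names are hypothetical.

```python
def load_file(keys, n_buckets=8, prime=7):
    """Load keys into single-slot buckets, using open overflow on collisions."""
    buckets = [None] * n_buckets
    for key in keys:
        home = key % prime
        for step in range(n_buckets):
            slot = (home + step) % n_buckets  # assumed forward scan with wrap-around
            if buckets[slot] is None:
                buckets[slot] = key
                break
        else:
            raise ValueError("file is full")
    return buckets


def average_search_length(buckets, prime=7):
    """Slide's convention: 1 access at the home bucket, 2 for an overflow record."""
    accesses = [
        1 if key % prime == slot else 2
        for slot, key in enumerate(buckets)
        if key is not None
    ]
    return sum(accesses) / len(accesses)


# Part#s from the slides, followed by the inserted record 135.
buckets = load_file([100, 120, 130, 140, 145, 150, 135])
load_factor = sum(key is not None for key in buckets) / len(buckets)
avg_accesses = average_search_length(buckets)

print(buckets)                    # bucket contents, index = bucket number
print(load_factor)                # records / slots
print(avg_accesses * 20, "ms")    # at 20 ms per disk access
```

With these assumptions, 135 collides with 100 in bucket 2 and ends up in bucket 6. The average search length is (6 × 1 + 1 × 2) / 7 ≈ 1.14 accesses, and the load factor after the insert is 7/8. A strict probe-by-probe count would charge more than 2 accesses for the overflow record, so the figure depends on the counting convention.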

43 THE HASHING ORGANIZATION
EVALUATION
- H(pkey) --> record address
- Records in hash sequence
- Need to allocate extra space
- Load factor between 60-80%
- Good for low activity (FAR) files
- Real-time and OO applns.

44 DISCUSSION
A parts file with Part# as the pkey includes records with the following Part# values: 23, 37, 46, 48, 56, 18, 10, 71, 16, 24, 39, 47 and 69. The file uses 8 buckets numbered 0 to 7. Each bucket holds two records. Load these records into the file in the given order using the hash function h(K) = K mod 8. Calculate the average search length in terms of # of disk accesses. Assume a 20 ms disk access time.

45 INDEXED ORGANIZATION
A method of file organization where a subset of key values is stored in an index. Types are:
- Primary key
- Secondary key
- Clustered

46 THE INDEXED ORGANIZATION (ISAM)
- Records are in pkey sequence (master file)
- But are organized into groups
- Grouping information is stored in the index file
- Records can be inserted at random
- Records can be accessed in sequence or at random

47 THE INDEXED ORGANIZATION
Index file (index set), by Emp ID range:
100-103
104-108
……….
Master file (sequence set):
ID#  name
101  Jacob
103  Becky
104  Scott
108  Angela

48 THE INDEXED ORGANIZATION
Diagram labels: TRACKS; CYLINDER1; CYLINDER2

49 THE ISAM ORGANIZATION
Index set:
  Cylinder index: 87  189  300
  Track index (CYLINDER 1): 43  69  87
  Track index (next cylinder): 136  172  189
  Track index (CYLINDER N): 250  300
Sequence set (tracks, separated by |):
  CYLINDER 1: 24 32 43 | 45 62 69 | 74 77 87
  Next cylinder: 122 136 | 141 150 172 | 175 181 189
  CYLINDER N: … | 278 281 300
Each cylinder also has overflow tracks.
Note: Assume that the corresponding HW addresses are stored along with the pkeys.
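A lookup in this structure goes through the cylinder index, then the track index, then the track itself. The Python sketch below uses the key values from the first two cylinders on the slide. It assumes that each index entry is the highest key of its cylinder or track, and it ignores overflow tracks and hardware addresses. The names (`isam_lookup` and the constants) are invented for the example.

```python
from bisect import bisect_left

# Highest key per cylinder, and per track within each cylinder (from the slide).
CYLINDER_INDEX = [87, 189]
TRACK_INDEX = [
    [43, 69, 87],
    [136, 172, 189],
]
# Sequence set: the keys stored on each track of each cylinder.
TRACKS = [
    [[24, 32, 43], [45, 62, 69], [74, 77, 87]],
    [[122, 136], [141, 150, 172], [175, 181, 189]],
]


def isam_lookup(key):
    """Return (cylinder, track) positions holding key, or None if absent."""
    cyl = bisect_left(CYLINDER_INDEX, key)  # first cylinder whose highest key >= key
    if cyl == len(CYLINDER_INDEX):
        return None  # key is larger than every indexed key
    trk = bisect_left(TRACK_INDEX[cyl], key)  # first track whose highest key >= key
    if key in TRACKS[cyl][trk]:  # sequential scan within the track
        return (cyl, trk)
    return None


print(isam_lookup(150))  # (1, 1)
print(isam_lookup(87))   # (0, 2)
print(isam_lookup(100))  # None
```

Positions are zero-based, so `(1, 1)` means the second track of the second cylinder. Because the index stores only the highest key per group, each lookup costs the same number of index steps, which is consistent with the "retrieval time uniform" property listed for ISAM.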

50 INSERTIONS IN ISAM
- Identify the track where the record needs to be inserted
- If the track is full, insert in the overflow area
- If the track has room, insert the pkey in sequence
- Update the track index and cylinder index if necessary

51 ISAM: ADVANTAGES AND DISADVANTAGES
- Access is direct or sequential?
- Access time dependent on?
- Rewrite sequentially
- Retrieval time uniform
- Suitable for volatile files?
- Workhorse organization used in most apps.

52 SECONDARY KEY INDEX
EMPLOYEE
REC#  E_SSN        E_NAME     E_TITLE    E_SALARY
1     456-34-8895  Smith      Developer  $35,000
2     459-66-6785  Johnson    Analyst    $27,000
3     467-89-8898  Weintraub  Developer  $60,000
4     478-64-8005  Dickson    Manager    $64,000
5     489-12-5575  Holland    Analyst    $47,000
6     492-93-4438  Rao        Analyst    $71,000
7     537-89-8898  McDonald   Manager    $85,000
Index on E_TITLE:
E_TITLE    REC#
Analyst    2, 5, 6
Manager    4, 7
Developer  1, 3
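A secondary key index of this kind maps each title to the record numbers that carry it. A minimal Python sketch, using the REC# and E_TITLE values from the slide (the helper name `build_secondary_index` is an assumption):

```python
from collections import defaultdict

# (REC#, E_NAME, E_TITLE) taken from the EMPLOYEE table on the slide.
EMPLOYEE = [
    (1, "Smith", "Developer"),
    (2, "Johnson", "Analyst"),
    (3, "Weintraub", "Developer"),
    (4, "Dickson", "Manager"),
    (5, "Holland", "Analyst"),
    (6, "Rao", "Analyst"),
    (7, "McDonald", "Manager"),
]


def build_secondary_index(rows):
    """Map each E_TITLE value to the list of REC#s that have it."""
    index = defaultdict(list)
    for rec_no, _name, title in rows:
        index[title].append(rec_no)
    return dict(index)


title_index = build_secondary_index(EMPLOYEE)
print(title_index["Analyst"])    # [2, 5, 6]
print(title_index["Manager"])    # [4, 7]
print(title_index["Developer"])  # [1, 3]
```

Unlike a primary key index, one key value here points to several records, because the secondary key is not unique.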

53 CLUSTERED INDEX
Also known as inverted file organization.
EMPLOYEE
Address  e_ssn        e_name     e_title    e_salary
1        459-66-6785  Johnson    Analyst    $27,000
2        489-12-5575  Holland    Analyst    $47,000
3        492-93-4438  Rao        Analyst    $71,000
4        478-64-8005  Dickson    Manager    $64,000
5        537-89-8898  McDonald   Manager    $85,000
6        467-89-8898  Weintraub  Developer  $60,000
7        456-34-8895  Smith      Developer  $35,000
Index on e_title:
E_title    Address
Analyst    1
Manager    4
Developer  6

54 INDEXING STRATEGIES
- Index if you must
- Index on pkey
- Index on foreign keys
- Index on secondary key (depending on query frequency)

55 DISCUSSION
- What activities are part of identifying storage strategies?
- How is denormalization carried out for M:N relationships?
- How many indexes can you have per table? How many clustered indexes?
- Can we sequentially update all records in a) a hashing organization? b) an indexed organization?
- Is indexing suitable for volatile files?
- If an index consists of 3 levels of indexes with the main index in RAM, and the disk access time is 20 ms, how long on average does it take to retrieve a record?
- What problems do overflow records cause in hashing?
- A file is required to store 60,000 records; how much space is required to store the records using a hashing organization?

56 THE END!

