1 CG171 - Database Implementation and Development (Physical Database Design) – Lecture 7 Storage Allocation & Data Access Methods By Dr. Akhtar Ali
1. Storage Allocation
Storage Allocation – Logical and Physical View
Storage Allocation - Physical Files Allocation

How will the database be physically stored?
• One physical file or many?
  – e.g. all data in one physical file?
  – or each table or record type in its own physical file?
  – data definitions (metadata) or indexes in separate files?
• On one disk or over several?
    » or even distributed across a network?
• What is the optimum block size for each file?
  – a large block size allows more records to be read together in one physical read
    » useful for sequential access or when related records are stored together
  – a small block size is more efficient if records are accessed in a random manner
  – block size should be chosen to accommodate the most frequently accessed physical groups of records
    » usually operating-system specific, e.g. multiples of 512 bytes for small blocks up to 4K for large blocks on Windows NT; 2K for small blocks and up to 32K for large blocks on UNIX
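The block-size trade-off can be made concrete with a little arithmetic. The sketch below is illustrative only: the record size and record count are assumed figures, not values from the lecture, and block header overhead is ignored.

```python
import math

def records_per_block(block_size: int, record_size: int) -> int:
    """Whole records that fit in one block (assumes records never span blocks)."""
    return block_size // record_size

def reads_for_full_scan(num_records: int, block_size: int, record_size: int) -> int:
    """Physical reads needed to read every record sequentially."""
    per_block = records_per_block(block_size, record_size)
    return math.ceil(num_records / per_block)

# Assumed workload: 10,000 records of 200 bytes each.
small_block_reads = reads_for_full_scan(10_000, 2 * 1024, 200)   # 2K blocks
large_block_reads = reads_for_full_scan(10_000, 32 * 1024, 200)  # 32K blocks
```

With these assumed figures the sequential scan needs far fewer physical reads with 32K blocks, whereas a random read of one record would transfer 32K to fetch 200 bytes.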
Storage Allocation – Physical Data Distribution
Storage Allocation – Physical Memory Allocation
Example – File Allocation in Oracle SQL

• CREATE DATABASE <database> DATAFILE <filespec> ...
    » specifies a .CTL file to hold all control data
    » also specifies several system files, which contain all table data unless storage areas are explicitly specified
• CREATE [TEMPORARY] TABLESPACE <tablespace> DATAFILE <filespec> ...
    » used to create separate storage for system operations or database data
    » the physical file <filespec> is mapped automatically by the DBMS to <tablespace>
    » <filespec> can include a full path, which allows network files to be used
File Allocation in Oracle SQL - continued

• Databases with explicit clauses for datafile control:
    » control the overall growth of the database's physical storage through a set of specified parameters
• Datafile parameters
  – MAXDATAFILES - limits the number of datafiles that can be opened for one database
  – AUTOEXTEND (ON or OFF) - allows additional disk space to be allocated for further data segments once the file is full
  – NEXT - the size of the next increment of disk space allocated when the file is extended
  – MAXSIZE - the upper limit to which a datafile can be extended
• Example
    CREATE DATABASE newtest
      DATAFILE 'diska:dbone.dat' SIZE 2M
      MAXDATAFILES 10
      DATAFILE 'disk1:df1.dbf' AUTOEXTEND ON,
               'disk2:df2.dbf' AUTOEXTEND ON NEXT 10M MAXSIZE 128M
Storage Allocation - Database Tables Storage

How will the database tables be physically spread?
• Entirely on disk and/or in cached memory?
    » frequent vs. infrequent data use
• In one physical storage area (block) or in several?
  – all data is static, no growth of tables projected
  – dynamic data, table growth predicted
• What size should each physical and logical storage area be?
  – initial storage size
  – size and number of the automatic extensions
  – limits for extending the storage area
Storage Allocation – Database Tables Storage - cntd
Example - Table Storage in Oracle SQL - continued

• Tables with clauses for explicit tablespace control
    » control the growth of the tablespace segments used for physical storage of database tables through a set of specified parameters
• Storage parameters (STORAGE clause)
  – INITIAL <integer> [K|M] - the size of the first extent allocated to the table
  – NEXT <integer> [K|M] - the size of the next extent allocated when the table is extended
  – MINEXTENTS - the number of extents allocated when the table is created
  – MAXEXTENTS - the maximum number of extents, including the first, that can be allocated
  – PCTINCREASE - the percentage by which the third and each subsequent extent grows over the preceding one
  – OPTIMAL <integer> [K|M] | NULL - the optimal size of a rollback segment (applies to rollback segments only)
• Example
    CREATE TABLE salgrade
      (grade NUMBER CONSTRAINT pk_salgrad PRIMARY KEY,
       losal NUMBER,
       hisal NUMBER)
      TABLESPACE human_resource
      STORAGE (INITIAL 64 NEXT 64 MINEXTENTS 1 MAXEXTENTS 5)
Storage Allocation - Records Placement

For each record type, it is necessary to specify how and where it will be stored.
• Each record type should be stored in a way which gives best performance for the most important functions
  – the most frequent, on-line functions are likely to be most important
  – infrequent or off-line (batch) functions are probably less important
  – but this also depends on the business perspective
• Analyse the types of access required by these functions:
  – e.g. store new record?
  – access an individual record directly via the primary key?
  – access a range of records sequentially in primary key sequence?
  – access a record or records from a related master record?
  – access via a secondary key?
  – access records in no particular sequence?
Storage Allocation - Records Placement - cntd

• Records may be stored contiguously, but record placement will also depend on the number of records
  – e.g. if there are only a few records then they can be stored in one physical block
  – e.g. related records can be stored together, but not if the number is large
• Records may be stored serially as they arrive
  – simply add new records to the end of the file, and extend the file when full
  – a good method for storing transaction data or archiving
    » where the main overhead is storing new records
    » but the data is infrequently accessed
• Records may be stored sequentially in primary key order for fast range search and direct match
  – allows sequential access for batch processing of similar data
• Records may be stored randomly using an algorithm on the primary key
  – allows fast access for processing of single matching data
Storage Allocation - Records Placement - cntd

• Indexed sequential - the most popular
  – the primary key index can be a very efficient ‘limit’ index
    » the index only needs to record the highest key value in each block
    » the index does not need updating when records are added or deleted, unless the highest key in a block changes
  – e.g. store Order records in Order number sequence to allow efficient production of pick lists, invoices etc.

Index (highest key value in each block):
  Block   Highest key
  B1      R4
  B2      R10
  B3      R14
  B4      R20

Database file - blocks B1, B2 etc., containing data records R1, R3 etc.:
  Block   Records
  B1      R1  R3  R4
  B2      R6  R7  R10
  B3      R11 R14
  B4      R15 R16 R18 R20

Where will record R12 be stored? Where will record R5 be stored?
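The two placement questions on this slide can be worked through with a small sketch of a limit-index lookup. The index values are those on the slide; the function name and the handling of keys above the highest limit are assumptions made for illustration.

```python
from bisect import bisect_left

# Limit index from the slide: highest key value held in each block.
LIMIT_INDEX = [(4, "B1"), (10, "B2"), (14, "B3"), (20, "B4")]

def block_for(key: int) -> str:
    """Return the block whose highest key is the first one >= key."""
    highs = [high for high, _ in LIMIT_INDEX]
    pos = bisect_left(highs, key)
    if pos == len(highs):
        # Assumption: a key above every limit goes in the last block,
        # whose index entry would then have to be raised.
        pos = len(highs) - 1
    return LIMIT_INDEX[pos][1]

r12_block = block_for(12)  # first limit >= 12 is R14
r5_block = block_for(5)    # first limit >= 5 is R10
```

Under this sketch R12 is placed in B3 and R5 in B2, and neither insertion changes a block's highest key, so the index is left untouched.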
Storage Allocation - Records Placement - cntd

• Records may be stored randomly using an algorithm on the primary key (hashing)
  – allows direct, fast access to individual records
  – no need to maintain or access an index
  – but sequential access will be very inefficient
    » it will require an index to be maintained, or the records to be sorted
  – e.g. store Customer records according to an algorithm on Cust ref
  – algorithm = divide key value by 1000 and use remainder as address

Database file (no need for an index):
  Block   Records
  B1      R1    R1001 R3001
  B2      R2002 R2
  B3      R1003 R4003 R2003
  B4      R1004 R3004 R4

Where will record R5123 be stored? How many blocks are in the file?
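The hashing rule on this slide (divide the key by 1000 and use the remainder as the address) is short enough to sketch directly. The function name is hypothetical, and the sketch assumes one block per address and ignores overflow handling.

```python
def hash_address(key: int, divisor: int = 1000) -> int:
    """Block address = remainder after dividing the key by the divisor."""
    return key % divisor

r5123_address = hash_address(5123)

# Keys that share a remainder collide in the same block, as on the slide.
block_1_keys = [k for k in (1, 1001, 3001, 2002, 2) if hash_address(k) == 1]

# A divisor of 1000 yields remainders 0..999, i.e. 1000 possible addresses.
possible_addresses = len({hash_address(k) for k in range(10_000)})
```

R5123 therefore lands at address 123, and the file needs 1000 blocks to cover every possible remainder.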
Storage Allocation - Records Placement - cntd

• Records may be stored in physical groups of related records (clusters or partitions)
  – the master record can be stored as required - serial / sequential / random
  – the detail records are then stored in the same or adjacent block(s)
  – e.g. store Order Header and Order Item records together in the same block(s)
  – related records can be read together in one physical read from disk
  – but if detail records need to be accessed independently of the master then they will have to be indexed additionally
• Both random and sequential storage require overflow facilities and periodic reorganisation

Database file - Order Headers (H) stored with their Order Items (I):
  Block   Records
  B1      H23  I23/1  I23/2
  B2      H92  I92/1  I92/2  I92/4
  B3      H16  I16/1  I16/2
  B4      H74  I74/1
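As a rough illustration of physical grouping, the sketch below gathers each Order Header with its Order Items so that they could be written to the same block. The record labels follow the slide's H23 and I23/1 style; the function name and the input layout are assumptions.

```python
from collections import defaultdict

def cluster_orders(headers, items):
    """Group each order header with its items, keyed by order number."""
    groups = defaultdict(list)
    for order_no in headers:
        groups[order_no].append(f"H{order_no}")
    for order_no, item_no in items:
        # Items arrive in any order but end up next to their header.
        groups[order_no].append(f"I{order_no}/{item_no}")
    return dict(groups)

groups = cluster_orders([23, 92], [(23, 1), (92, 1), (23, 2)])
```

Reading the group for order 23 then returns the header and both items together, which is the single physical read the slide describes.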
Clustered tables
Example - Clustered tables in Oracle SQL

For physical grouping of records into a single storage area.
• CREATE CLUSTER [<schema>.]<cluster> (<column> <datatype>, ...) [TABLESPACE <tablespace>] …
    » clusters store records from different tables sharing the same cluster key
    » clusters can be sorted or hashed for fast information retrieval
• Example: hashed cluster containing two tables
    CREATE CLUSTER personnel (deptno NUMBER(2)) HASHKEYS 20;
    CREATE TABLE dept (deptno NUMBER(2), dname VARCHAR2(9), loc VARCHAR2(9))
      CLUSTER personnel (deptno);
    CREATE TABLE emp (empno NUMBER(4), ename VARCHAR2(30), deptno NUMBER(2), phoneno INTEGER)
      CLUSTER personnel (deptno);
Partitioned tables
Example - Partitioned tables in Oracle SQL

For logical partitioning of a physical storage area into parts.
• Used for both table and index data storage
• Partitioning criteria can be physical (e.g. size) or logical (e.g. an interval of values)
• Partitions are accessible by name directly in SQL
• Example: table partitioning by the date values of an attribute
    CREATE TABLE xansactions (trade_date DATE, num_shares NUMBER(10), price NUMBER(5,2)…)
      STORAGE (INITIAL 100K NEXT 50K) LOGGING
      PARTITION BY RANGE (trade_date)
      (PARTITION sx1992 VALUES LESS THAN (TO_DATE('01-JAN-93','DD-MON-YY')) TABLESPACE ts0,
       PARTITION sx1993 VALUES LESS THAN (TO_DATE('01-JAN-94','DD-MON-YY')) TABLESPACE ts1,
       …
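To show how range partitioning routes a row, the sketch below picks a partition from the trade date. It models only the two partitions visible on the slide (the remaining ones are elided there), treats the years as 1993 and 1994, and the function name is hypothetical. VALUES LESS THAN is an exclusive upper bound, which is why bisect_right is used.

```python
from bisect import bisect_right
from datetime import date

# (exclusive upper bound, partition, tablespace) for the two partitions shown.
PARTITIONS = [
    (date(1993, 1, 1), "sx1992", "ts0"),
    (date(1994, 1, 1), "sx1993", "ts1"),
]

def partition_for(trade_date: date) -> str:
    """Return the first partition whose upper bound is strictly above the date."""
    bounds = [bound for bound, _, _ in PARTITIONS]
    pos = bisect_right(bounds, trade_date)
    if pos == len(bounds):
        raise ValueError("date is beyond the highest partition bound modelled here")
    return PARTITIONS[pos][1]

mid_1992 = partition_for(date(1992, 6, 15))
first_day_1993 = partition_for(date(1993, 1, 1))
```

A trade dated 1 January 1993 falls in sx1993 rather than sx1992, because the bound of sx1992 excludes that date.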
Index-organized tables
Example - Index-organized tables in Oracle SQL

For sequential ordering of the physical location of table records.
• The primary key of the table is ordered for fast exact-match and range searches
• All attributes are stored together with the primary key directly in the index space, so new placements or updates do not require reordering
    CREATE TABLE docindex
      (token CHAR(20),
       doc_id NUMBER,
       token_frequency NUMBER,
       token_offsets VARCHAR2(512),
       CONSTRAINT pk_idx PRIMARY KEY (token, doc_id))
      ORGANIZATION INDEX TABLESPACE ind_tbs...
Storage Allocation - Records Placement - ctnd

It may be useful to analyse the record access requirements:

Record type: CUSTOMER
Types of access: Store; Primary key (Direct, Sequential); Cust name (Direct)

  Function         On-line/Off-line   Access volume
  New Customer     On                 100/day
  Place Order      On                 1000/day
  Print Invoices   Off                5000/week
  Enquiry          On                 200/day, 100/day

Add other access types and functions as required.
Storage Allocation - Linking Related Records

For each relationship type, how will the physical access path, from one record to its related records, be implemented?
• By physical grouping (i.e. clustering)
  – i.e. by storing records together as described above
  – a relationship where the master and its detail records are stored in the same physical group is called a ‘primary’ relationship in SSADM
  – other relationships, where the master and detail records are physically separated, are known as ‘secondary’ relationships in SSADM
• By logical separation (i.e. partitioning)
  – storing records in successive partitions, e.g. splitting the year into months
  – each partition can be managed separately (storing, searching, backup, etc.)
  – each partition can also be indexed, and the indexes can also be partitioned
Storage Allocation - Linking Related Records - cntd

• Records may be stored in physical sequences (chains) by linked lists
  – the addresses of related records are stored with the data record itself
    » e.g. a Customer record might hold the address of the latest Order record for that Customer
    » each Order record could hold the address of the previous Order record for that Customer, and the address of the Customer record itself

  Customer record:   [address of latest Order record]
  Order record 1089: [address of previous Order] [address of Cust record]
  Order record 972:  [address of previous Order] [address of Cust record]
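The chain on this slide can be followed with a few lines of code. This is a sketch only: the record addresses (500, 300, 100), the customer name, and the class and function names are hypothetical, and a dictionary stands in for the disk.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

@dataclass
class Order:
    number: int
    customer_addr: int             # address of the owning Customer record
    previous_addr: Optional[int]   # address of the previous Order, None at chain end

@dataclass
class Customer:
    name: str
    latest_order_addr: Optional[int]

def orders_for(customer: Customer, order_store: Dict[int, Order]) -> List[int]:
    """Walk the chain from the latest Order back to the oldest."""
    chain = []
    addr = customer.latest_order_addr
    while addr is not None:
        order = order_store[addr]   # one physical read per link in the chain
        chain.append(order.number)
        addr = order.previous_addr
    return chain

order_store = {
    500: Order(number=1089, customer_addr=100, previous_addr=300),
    300: Order(number=972, customer_addr=100, previous_addr=None),
}
customer = Customer(name="Example Ltd", latest_order_addr=500)
order_numbers = orders_for(customer, order_store)
```

The walk needs no index, but it costs one read per Order, so long chains are slow to traverse.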
Storage Allocation - Linking Related Records - cntd

• By primary key ordering (i.e. record sorting)
  – requires an index on the foreign key in the detail record
  – gives a relatively inefficient access path for larger numbers of records
    » the index creates an overhead whenever a new detail record is added
    » finding a record via a secondary index may require several reads
  – but it is easy to add or change relationship types in the database schema
• By foreign key ordering (i.e. storage indexing)
  – the key values and addresses of detail records can be held in a small index stored directly with the master record, so they can be found quickly
    » e.g. for every Customer record create an index for their Order records
  – in a relational database, this could be done by creating a link table containing only the key values of the master and detail records:

Link table:
  Master   Detail
  M1       D2
  M1       D9
  M2       D5
  M3       D1
  etc.
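The link table on this slide can be tried out with Python's built-in sqlite3 module, used here as a stand-in for Oracle. The table name link, its column names, and the helper function are assumptions; the four rows are the ones listed on the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The link table holds only the key values of the master and detail records.
conn.execute(
    "CREATE TABLE link (master TEXT, detail TEXT, PRIMARY KEY (master, detail))"
)
conn.executemany(
    "INSERT INTO link (master, detail) VALUES (?, ?)",
    [("M1", "D2"), ("M1", "D9"), ("M2", "D5"), ("M3", "D1")],
)

def details_of(master: str) -> list:
    """Return the detail keys linked to one master key."""
    rows = conn.execute(
        "SELECT detail FROM link WHERE master = ? ORDER BY detail", (master,)
    ).fetchall()
    return [row[0] for row in rows]

m1_details = details_of("M1")
```

Because the composite primary key leads with the master key, the lookup for one master can be answered from the key index alone.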
2. Data Access Methods
Data Access - Accessing Records

How will records need to be accessed? This will have been analysed already to determine the record placement.
• Individual, direct access using the primary key value?
  – may be provided by algorithmic random or indexed sequential record placement
  – otherwise, create a hashed or sorted, unique primary key index
• Via related records?
  – master-detail and base-lookup relations
  – see ‘Linking Related Records’ above
• Sequential access in primary key order?
  – may be provided by indexed sequential record placement
  – otherwise, create a sorted, unique primary key index to read indirectly
• By secondary keys, in a group or individually?
  – create additional sorted indexes for each such key
  – create additional hashed indexes for any secondary keys where only individual, direct access is ever required
Data Access - Index Types

Indexing can be applied to both the data records (logical) and their storage (physical). The main types of index are:
• Hashed indexes
  – the key values are stored within the index using a hashing algorithm
    » allows fast direct access to data records via the hash key
    » does not allow sequential access
• Sorted indexes
  – the key values and record addresses are sorted into key sequence
  – the index usually has a tree structure (B-tree index), but it can also be a simple enumeration
  – data records can be found fairly quickly by direct access
  – the index can be used to read the data records sequentially
    » but not as efficiently as with sequential record placement
• Functional indexes
  – the key values are calculated using a pre-specified function
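The difference between hashed and sorted indexes can be illustrated with two in-memory structures. This is a sketch under stated assumptions: a Python dict stands in for the hashed index, a sorted list of (key, address) pairs for the sorted index, and the keys and addresses are hypothetical.

```python
from bisect import bisect_left, bisect_right

# Hypothetical (key, record address) pairs, deliberately out of key order.
ENTRIES = [(14, "a3"), (4, "a1"), (20, "a4"), (10, "a2")]

hashed_index = dict(ENTRIES)      # direct access only, no usable key sequence
sorted_index = sorted(ENTRIES)    # (key, address) pairs held in key sequence
sorted_keys = [key for key, _ in sorted_index]

def direct_lookup(key):
    """Hashed index: one probe finds the address of a single key."""
    return hashed_index.get(key)

def range_lookup(low, high):
    """Sorted index: addresses of all keys in [low, high], in key order."""
    start = bisect_left(sorted_keys, low)
    stop = bisect_right(sorted_keys, high)
    return [addr for _, addr in sorted_index[start:stop]]

one_record = direct_lookup(10)
key_range = range_lookup(5, 14)
```

The hashed structure answers the single-key question directly, while only the sorted structure can answer the range question without examining every entry.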
Data Access - Index Types - ctd

• B-tree indexes
  – B-tree indexes are organized into ‘tables’ (of key values and addresses)
  – i.e. a tree structure of index levels from a ‘root’ through ‘branches’ to ‘leaves’
  – the leaf tables contain the key values and addresses of the data records
  – the branch tables index the leaves or lower-level branches
  – to find a record, the root is checked, then the appropriate branches down the tree are read to find the index table containing the record address, and hence the data record itself
  – as leaf tables fill up, they are split and the branch tables are updated
  – indexes need periodic rebuilding to minimise table-splitting
  – do not create unnecessary indexes

[Figure: B-tree index with root, branches, leaves and data records]
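The root-to-leaf search described on this slide can be sketched with a small static tree. It shows lookup only, not insertion or splitting; the node classes, keys and addresses are hypothetical, and each node visited is counted as one index read.

```python
from bisect import bisect_left, bisect_right
from dataclasses import dataclass
from typing import List

@dataclass
class Leaf:
    keys: List[int]
    addrs: List[str]        # addrs[i] is the record address for keys[i]

@dataclass
class Branch:
    separators: List[int]   # separators[i] is the smallest key under children[i + 1]
    children: list

def search(node, key):
    """Descend from the root to a leaf; return (address or None, index reads)."""
    reads = 1
    while isinstance(node, Branch):
        node = node.children[bisect_right(node.separators, key)]
        reads += 1
    pos = bisect_left(node.keys, key)
    if pos < len(node.keys) and node.keys[pos] == key:
        return node.addrs[pos], reads
    return None, reads

leaves = [
    Leaf([1, 3, 4], ["a1", "a3", "a4"]),
    Leaf([6, 7, 10], ["a6", "a7", "a10"]),
    Leaf([11, 14], ["a11", "a14"]),
    Leaf([15, 16, 18, 20], ["a15", "a16", "a18", "a20"]),
]
root = Branch([11], [Branch([6], leaves[0:2]), Branch([15], leaves[2:4])])

found = search(root, 14)
missing = search(root, 5)
```

Every lookup in this three-level tree reads the root, one branch and one leaf, whether or not the key is present.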
Data Access - Processing Indexed Data

• Indexing the data records does not change the result of processing, but has a substantial impact on performance
  – a database without indexes is workable only when the number of records is small
  – data records may have more than one index for different operations
  – in principle, all the attributes in a data record could be indexed separately and/or jointly using composite indexes (fully indexed tables)
• Secondary indexes will degrade performance for updates
  – the index must be updated every time a record is added or deleted or the key value amended
  – this may involve several physical updates of the index for each record update
• Indexes can be processed as normal data records
  – i.e. partitioned data should have partitioned indexes as well
• When loading data into a database
  – remove all indexes from the schema
  – load the data
  – rebuild the indexes
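The three loading steps on this slide can be sketched with Python's built-in sqlite3 module, used here as a stand-in for Oracle. The customer table, the cust_name_ix index, the sample rows and the bulk_load function are all assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (cust_ref INTEGER PRIMARY KEY, cust_name TEXT)")
conn.execute("CREATE INDEX cust_name_ix ON customer (cust_name)")

def bulk_load(conn, rows):
    """Drop the secondary index, load the rows, then rebuild the index once."""
    conn.execute("DROP INDEX IF EXISTS cust_name_ix")      # 1. remove the index
    conn.executemany(                                      # 2. load the data
        "INSERT INTO customer (cust_ref, cust_name) VALUES (?, ?)", rows
    )
    conn.execute("CREATE INDEX cust_name_ix ON customer (cust_name)")  # 3. rebuild
    conn.commit()

bulk_load(conn, [(1, "Adams"), (2, "Baker"), (3, "Clark")])
row_count = conn.execute("SELECT COUNT(*) FROM customer").fetchone()[0]
index_count = conn.execute(
    "SELECT COUNT(*) FROM sqlite_master WHERE type = 'index' AND name = 'cust_name_ix'"
).fetchone()[0]
```

Rebuilding once after the load avoids updating the index for every inserted row, which is the overhead the slide warns about.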
Indexing options in Oracle SQL

For creating indexes, specifying different index clauses and options, and allocating storage for them.
• CREATE [UNIQUE | BITMAP] INDEX <index> ON <table> (<column>, ...) [<index clauses>]...
    » indexes the table using columns selected directly from the indexed table
• CREATE [UNIQUE | BITMAP] INDEX <index> ON <cluster> [<index clauses>] …
    » indexes the table using columns selected from a cluster of tables with common columns, to which the indexed table belongs
Indexing options in Oracle SQL - continued

• An index can be stored in the same physical files as the data records or in different ones (depending on the frequency of table updates)
• An index can be independent of, or functionally dependent on, the indexed columns (index function)
• Record placement is defined by the type of index
  – a hashed index gives hashed record placement
  – a sorted index gives logically sequential record placement
  – bitmap indexes use physical storage locators for record placement
• Additional clauses allow the records (rows) of a table, as well as their indexes, to be distributed over more than one physical file
  – either ‘randomly’ (i.e. arbitrarily, not hashed)
  – or partitioned ‘horizontally’ by key value hashing
Indexing options in Oracle SQL - continued

• Example: hashed index
    CREATE INDEX sales_idx ON sales(item)
      STORE IN (tbs1, tbs2)
• Example: bitmapped index (Oracle 8)
    CREATE BITMAP INDEX partno_ix ON lineitem (partno)
      TABLESPACE ts1
• Example: partitioned index (Oracle 8i)
    CREATE INDEX stock_ix ON stock (stock_symbol, stock_line)
      GLOBAL PARTITION BY RANGE (stock_symbol)
      (PARTITION VALUES LESS THAN ('N') TABLESPACE ts3,
       PARTITION VALUES LESS THAN (MAXVALUE) TABLESPACE ts4)