Alternative Storage Techniques
Objectives
After completing this lesson, you should be able to identify the characteristics and benefits of:
- Index-organized tables
- External tables
Storing User Data
[Figure: a regular table, an index-organized table, and an external table backed by an OS file]
Oracle9i Server offers a wide range of techniques to store user data: regular tables, index-organized tables, partitioned tables, and external tables. Partitioning is not covered in this course. This lesson concentrates on the other nonregular table storage techniques: index-organized tables and external tables.
Changing the physical storage characteristics of a table can significantly improve the performance of certain SQL statements. However, there are usually negative side effects as well. Change regular tables into index-organized tables or external tables only when the advantages outweigh the disadvantages.
Note: For more information about partitioning, attend the class Oracle9i Database: Implement Partitioning.
Index-Organized Tables: Overview
[Figure: indexed access on a regular table compared with access to an index-organized table; labels: key column, ROWID, row header, non-key columns]
An index-organized table (IOT) is like a regular table with a concatenated index on all of its columns. However, instead of maintaining two separate segments for the table and the B*-tree index, the Oracle server maintains a single B*-tree structure that contains:
- The primary key value
- The other (non-key) column values for the row
The B*-tree structure, which is based on the primary key of the table, is organized like an index. The leaf blocks in this structure contain the rows instead of the ROWIDs. This means that the rows in the IOT are always maintained in the order of the primary key. You can create additional indexes on IOTs.
Because large rows can destroy the dense and efficient storage of the B*-tree structure, you can store part of each row in another segment, which is called an overflow area. This is discussed in the following slides.
IOT Performance Characteristics
- B*-tree storage
- Full row stored
- Sorted rows
- Logical ROWIDs
- Fast, key-based access to table data
You can access an IOT by using either the primary key or a combination of columns that constitute the leading part of the primary key. IOTs provide fast, key-based access for queries involving exact match (equality operator) or range searches on the primary key.
Because the rows are ordered in the IOT, full scans on an IOT return rows in primary key sequence. Because primary key values are not duplicated (a regular table stores them in both the index segment and the data segment), IOTs use less storage.
Index organization is useful for a table that is frequently accessed using the primary key and has only a few, relatively short non-key columns.
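The access paths described above can be illustrated with a few queries. This is only a sketch: it assumes a hypothetical IOT named doc_index whose primary key is (token, doc_id); neither the table nor its columns come from the course material.

```sql
-- Hypothetical IOT: doc_index, primary key (token, doc_id).

-- Exact match on the full primary key.
SELECT token_frequency
FROM   doc_index
WHERE  token  = 'oracle'
AND    doc_id = 42;

-- Access by the leading part of the primary key only.
SELECT doc_id, token_frequency
FROM   doc_index
WHERE  token = 'oracle';

-- Range search on the leading primary key column.
SELECT token, doc_id
FROM   doc_index
WHERE  token BETWEEN 'index' AND 'oracle';
```

A predicate on doc_id alone does not use the leading part of the key, so it would not benefit from the key ordering in the same way.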
IOT Requirements
Index-organized tables:
- Must have a primary key. This is the unique identifier and is used as the basis for ordering; there is no ROWID to act as a unique identifier in the B*-tree structure.
- Cannot be part of an index cluster or a hash cluster
- Cannot include LONG columns, but can contain LOB columns
Because IOTs are B*-tree structures, they are subject to fragmentation as a result of incremental updating. You can use the ALTER TABLE … MOVE command to rebuild the IOT:
    ALTER TABLE iot_tablename MOVE [OVERFLOW...];
Specifying the optional OVERFLOW clause causes the overflow segment to be rebuilt as well. Overflow segments are explained in the following slides.
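As a minimal sketch of the rebuild command, assuming a hypothetical IOT named doc_index that was created with an overflow segment, and a tablespace named users (both names are illustrative only):

```sql
-- Rebuild only the index segment of the hypothetical IOT doc_index.
ALTER TABLE doc_index MOVE;

-- Rebuild the index segment and the overflow segment;
-- the tablespace name "users" is an assumption.
ALTER TABLE doc_index MOVE OVERFLOW TABLESPACE users;
```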
Benefits of IOTs
Index-organized tables provide fast key-based access to table data for queries involving exact match and range searches. Changes to the table data result only in updates to the index structure. Storage requirements are also reduced because key columns are not duplicated in a table and an index; the remaining non-key columns are stored in the index structure.
IOTs are particularly useful for applications that must retrieve data based on a primary key. IOTs are also suitable for modeling application-specific index structures. For example, content-based information retrieval applications containing text, image, and audio data require inverted indexes, which can be modeled effectively using IOTs.
No special considerations exist for using most SQL statements against an IOT. The structure of the table should be completely transparent to the user.
The Oracle database utilities also support IOTs. SQL*Loader (using direct path mode) can load data directly into an IOT. This kind of loading can be quicker than loading into a standard table and then building the indexes.
Creating Index-Organized Tables
    CREATE TABLE table-name
    ( column_definitions
      [,constraint_definitions] )
    ORGANIZATION INDEX
      [ TABLESPACE tablespace ]
      [ PCTTHRESHOLD integer [ INCLUDING column_name ] ]
      [ OVERFLOW segment_attr_clause ]
The ORGANIZATION INDEX clause of the CREATE TABLE statement specifies that you create an IOT. You must specify a primary key constraint when creating IOTs. If you try to create an IOT without a primary key, the following error is generated:
    ORA-25175: no PRIMARY KEY constraint found
PCTTHRESHOLD specifies the percentage of space reserved in a block for a single row in an IOT. If a row exceeds the size that is calculated based on this value, all columns after the column named in the INCLUDING clause are moved to the overflow segment. If OVERFLOW is not specified, then rows exceeding the threshold are rejected. PCTTHRESHOLD defaults to 50 and must be a value from 0 through 50.
INCLUDING column_name specifies a column at which to divide an IOT row into index and overflow portions. All columns that follow column_name are stored in the overflow data segment. If a column is not specified and a row size exceeds PCTTHRESHOLD, all columns except the primary key columns are moved to the overflow area. The column is either the name of the last primary key column or any nonprimary key column.
OVERFLOW specifies that IOT rows exceeding the specified threshold be placed in the data segment that is defined by segment_attr_clause, which specifies the tablespace, storage, and block utilization parameters.
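The clauses above can be combined as in the following sketch. The table, column, constraint, and tablespace names (doc_index, doc_index_pk, users) are hypothetical and chosen only for illustration; the threshold of 20 is likewise an arbitrary example value.

```sql
-- Hypothetical inverted-index table stored as an IOT.
CREATE TABLE doc_index (
  token            VARCHAR2(20),
  doc_id           NUMBER,
  token_frequency  NUMBER,
  token_offsets    VARCHAR2(2000),
  -- A named primary key gives the index segment a meaningful name.
  CONSTRAINT doc_index_pk PRIMARY KEY (token, doc_id)
)
ORGANIZATION INDEX
TABLESPACE users              -- assumed tablespace for the index segment
PCTTHRESHOLD 20               -- a row may use at most 20% of a block
INCLUDING token_frequency     -- columns after this one may go to overflow
OVERFLOW TABLESPACE users;    -- assumed tablespace for the overflow segment
```

With this definition, token_offsets is the only column that follows token_frequency, so it is the part of a row that can be moved to the overflow segment when the row exceeds the threshold.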
IOT Row Overflow
[Figure: the IOT tablespace holds the segment <PK name> (type: index), whose blocks contain the rows within PCTTHRESHOLD; the overflow tablespace holds the segment SYS_IOT_OVER_n (type: table), whose blocks contain the overflow row pieces]
Large rows in an IOT can destroy the dense B*-tree storage. You can overcome this problem by using an overflow area. Note that additional I/Os are needed to retrieve these large rows because each row is stored in two pieces. Performance is therefore lower than for an IOT with shorter rows that are mostly stored entirely in the IOT segment.
When an IOT is created with an OVERFLOW clause, the following three objects are created in the database:
- A logical table with the name defined in the CREATE TABLE statement
- An index with the same name as the primary key constraint
- A table to accommodate the overflow row pieces. The name of this table is SYS_IOT_OVER_n, where n is the OBJECT_ID of the IOT as seen from USER_OBJECTS.
Note: If you create an IOT without specifying an OVERFLOW clause, only the first two objects are created. Give the primary key constraint a name so that the index segment receives a meaningful name (rather than a system-generated name).
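One way to see the resulting segments is to query USER_SEGMENTS. The sketch below assumes a hypothetical IOT whose primary key constraint is named DOC_INDEX_PK; substitute the constraint name of your own table.

```sql
-- List the index segment (named after the primary key constraint)
-- and any overflow segments in the current schema.
-- DOC_INDEX_PK is a hypothetical constraint name.
SELECT segment_name
     , segment_type
     , tablespace_name
FROM   user_segments
WHERE  segment_name = 'DOC_INDEX_PK'
OR     segment_name LIKE 'SYS_IOT_OVER_%';
```

Note that the LIKE pattern returns the overflow segments of every IOT in the schema, not only the one that belongs to the named index.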
Retrieving IOT Information from the Data Dictionary
USER_TABLES: TABLE_NAME, IOT_TYPE, IOT_NAME, TABLESPACE_NAME
USER_INDEXES: TABLE_NAME, INDEX_NAME, INDEX_TYPE, PCT_THRESHOLD, INCLUDE_COLUMN
Use the following query to list the IOTs and information related to their structure:
    SELECT t.table_name      AS "IOT"
         , o.table_name      AS "Overflow"
         , i.index_name      AS "Index"
         , o.tablespace_name AS "Overflow TS"
         , i.tablespace_name AS "Index TS"
         , i.pct_threshold   AS "Threshold"
    FROM   user_tables  t
         , user_tables  o
         , user_indexes i
    WHERE  t.table_name = o.iot_name
    AND    t.table_name = i.table_name;
External Tables: Overview
[Figure: the query SELECT * FROM ex_table; reads an external table whose data resides in an OS file]
Oracle9i Database provides a way to access data in external sources as if it were in a table in the database. The Oracle database allows read-only access to data in external tables. External tables are defined as tables that do not reside in the database and can be in any format for which an access driver is provided.
Given the metadata that describes an external table, the Oracle database can expose the external data as if it resided in a regular database table. The external data can be queried directly and in parallel using SQL. You can join external table data to relational tables and perform sorts on external table data. You can also create views and synonyms for external tables. However, no DML operations (UPDATE, INSERT, or DELETE) are possible, and no indexes can be created on external tables.
External Tables: Performance Characteristics
- External tables are read-only tables.
- The metadata for an external table is created using a CREATE TABLE statement.
- Data can be stored outside the database as flat files.
- Indexes cannot be created on external tables.
- The DBMS_STATS package should be used for gathering statistics for external tables.
- Buffer cache is not used.
When you access the external table through a SQL statement, the fields of the external table can be used as you would use any column in a "normal" table. In particular, you can use the fields as arguments for any SQL built-in function, PL/SQL function, or Java function. You can thus manipulate data from the external source. For data warehousing, you can do more sophisticated transformations in this way than you can with simple data type conversions. You can also use this mechanism in data warehousing to do data cleansing.
Although you can use external tables to access data that is stored in a file outside of the database, you cannot perform DML on that external data. Indexes are not supported on external tables.
To gather statistics for an external table, use the DBMS_STATS package. The ANALYZE command is not supported.
The buffer cache is not used to store data that is retrieved from external tables; therefore, repeated access of the external data does not benefit from caching.
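A minimal sketch of gathering statistics with DBMS_STATS, assuming a hypothetical external table named EMP_EXT in the current schema. Setting estimate_percent to NULL requests a full computation; this reflects the assumption that sampling is not available for external tables.

```sql
-- Gather optimizer statistics for the hypothetical external table EMP_EXT.
BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(
    ownname          => USER,
    tabname          => 'EMP_EXT',
    estimate_percent => NULL   -- compute rather than sample (assumption: sampling unsupported)
  );
END;
/
```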
Benefits of External Tables
External tables:
- Provide a valuable means for performing basic extraction, transformation, and loading (ETL) tasks that are common in data warehousing
- Are transparent to users and applications
- Are a complement to the existing SQL*Loader functionality
- Are especially useful for environments in which:
  - The complete external source must be joined with existing database objects and transformed in a complex manner
  - The external data volume is large and used only once
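The ETL use case can be sketched as a single INSERT ... SELECT that reads the external data, joins it with an existing table, and transforms it on the way in. All object names here (sales_fact, sales_ext, products, and their columns) are hypothetical, as is the date format of the incoming file.

```sql
-- Load and transform in one pass: the external table sales_ext is
-- joined to the regular table products and cleansed during the insert.
INSERT INTO sales_fact (sale_id, product_id, sale_date, amount)
SELECT s.sale_id
     , p.product_id
     , TO_DATE(s.sale_date_txt, 'YYYY-MM-DD')   -- assumed text date format
     , NVL(TO_NUMBER(s.amount_txt), 0)          -- treat missing amounts as 0
FROM   sales_ext s
JOIN   products  p
ON     p.product_code = UPPER(TRIM(s.product_code));
```

Because the external data is read only once here, the lack of buffer caching for external tables is not a drawback for this pattern.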
Creating External Tables
    CREATE TABLE table-name
    (column_definitions)
    ORGANIZATION EXTERNAL
    (TYPE ORACLE_LOADER
     DEFAULT DIRECTORY dir_object_name
     ACCESS PARAMETERS
     (RECORDS DELIMITED BY newline
      BADFILE bad_dir:'bad_file_name.log'
      LOGFILE log_dir:'log_file_name.log'
      FIELDS TERMINATED BY ','
      MISSING FIELD VALUES ARE null
      (column_names_in_file))
     LOCATION ('file_name.dat'))
    PARALLEL
    REJECT LIMIT UNLIMITED;
The ORGANIZATION EXTERNAL clause of the CREATE TABLE statement specifies that you create an external table. Several clauses identify where the external file is located and how it is structured.
A directory object is required before you use the CREATE TABLE ... ORGANIZATION EXTERNAL statement. This directory object identifies the location of the external file on the operating system. Users need READ access on the directory that holds the data file and WRITE access on the directories that receive the log and bad files.
Note: For details about how to create an external table, see the Oracle9i Database Administrator's Guide Release 2 (9.2).
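The following sketch fills in the template with concrete values. Everything specific in it is an assumption made for illustration: the directory object ext_data_dir, its operating system path, the grantee hr, the table emp_ext, and the file names.

```sql
-- Create the directory object first (requires the CREATE ANY DIRECTORY
-- privilege); the path and grantee are hypothetical.
CREATE OR REPLACE DIRECTORY ext_data_dir AS '/u01/app/ext_data';
GRANT READ, WRITE ON DIRECTORY ext_data_dir TO hr;

-- External table over a comma-separated file emp.dat in that directory.
CREATE TABLE emp_ext (
  emp_id     NUMBER(6),
  last_name  VARCHAR2(25),
  salary     NUMBER(8,2)
)
ORGANIZATION EXTERNAL (
  TYPE ORACLE_LOADER
  DEFAULT DIRECTORY ext_data_dir
  ACCESS PARAMETERS (
    RECORDS DELIMITED BY NEWLINE
    BADFILE ext_data_dir:'emp_ext.bad'
    LOGFILE ext_data_dir:'emp_ext.log'
    FIELDS TERMINATED BY ','
    MISSING FIELD VALUES ARE NULL
    (emp_id, last_name, salary)
  )
  LOCATION ('emp.dat')
)
PARALLEL
REJECT LIMIT UNLIMITED;

-- The file can then be queried like any other table.
SELECT last_name, salary FROM emp_ext WHERE salary > 5000;
```

Here a single directory serves the data, log, and bad files, which is why both READ and WRITE are granted on it; separate directories could be used to keep the data directory read-only.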
Retrieving External Tables Information from the Data Dictionary
USER_EXTERNAL_TABLES: TABLE_NAME, TYPE_OWNER, TYPE_NAME, DEFAULT_DIRECTORY_OWNER, DEFAULT_DIRECTORY_NAME, REJECT_LIMIT, ACCESS_TYPE, ACCESS_PARAMETERS
USER_EXTERNAL_LOCATIONS: TABLE_NAME, LOCATION, DIRECTORY_OWNER, DIRECTORY_NAME
Use the following query to list the external tables and information related to their structure:
    SELECT t.table_name
         , t.type_name
         , t.default_directory_name
         , t.reject_limit
         , t.access_type
         , t.access_parameters
         , l.location
         , l.directory_name
    FROM   user_external_tables t
    JOIN   user_external_locations l
    ON     ( t.table_name = l.table_name );
Summary
In this lesson, you should have learned how to:
- Create index-organized tables (IOTs)
- Identify performance characteristics
- Identify limitations
- Control row overflow
- Create external tables
This lesson introduced you to alternative storage techniques, including index-organized tables and external tables.
An index-organized table (IOT) is like a regular table with an index on one or more of its columns. However, instead of maintaining two separate segments for the table and the B*-tree index, the Oracle server maintains a single B*-tree structure that contains the primary key value and other (non-key) column values for the row.
An external table is a data source that is located outside of the Oracle database. After the external table is set up using the CREATE TABLE ... ORGANIZATION EXTERNAL syntax, you can query this data source using a SELECT statement.