Table Compression in Oracle9i R2 Plamen Zyumbyulev INSIDE OUT,, Let someone k n o w ”
Agenda Overview Table Compression How does it work? Test Environment Space Savings Query Performance Conclusion
Table Compression Facts Table compression is useful Everyone benefits from space saving It not only saves space but can increase performance It can’t be implemented everywhere
Why Table Compression ? Table Compression increases: –I/O-subsystem capacity –I/O throughput –query scan performance (mainly FTS) –buffer cache capacity Table Compression: –reduces cost of ownership –is easy to use –requires minimal table definition changes –is transparent to applications
Overview: Table Compression Compression algorithm is based on removing data redundancy Tables and Materialized Views can be compressed –Compression can also be specified at the partition level and tablespace level –Indexes and index-organized tables are not compressed with this method (there are other methods for index and IOT compression) Compression is dependent upon the actual data DDL/DML commands are supported on compressed tables Table columns cannot neither be added nor deleted from a compressed table.
Which Applications benefit from Table Compression? Table Compression targets read intensive applications such as Decision Support and OLAP All schema designs benefit from Compression
Agenda Overview Table Compression How does it work? Test Environment Space Savings Query Performance Conclusion
How does Table Compression work? Data is compressed by eliminating duplicate values in a database block First Name Last Name Scott Smith Henry Smith Henry Scott Henry-Scott McGryen Dictionary is built per block information to uncompress data is available in each block If column values from same or different columns have the same values, they share the same symbol table entry. Only entire column values are compressed. Sequences of columns are compressed as one entity if a sequence of column values occurs multiple times in many rows.
Block Level Compression Meyer 11 Homestead Rd Meyer11 Homestead Rd Meyer 11 Homestead Rd McGryen 3 Main Street McGryen 3 Main Street McGryen 3 Main Street 1.99 Non-Compressed Block Block Header Free Space Block Header Compressed Block Meyer 11 Homestead Rd 1.99 McGryen 3 Main Street Free Space Symbol Table Invoice CustName CustAddr Sales_amt Meyer 11 Homestead Rd Meyer 11 Homestead Rd Meyer 11 Homestead Rd McGryen 3 Main Street McGryen 3 Main Street McGryen 3 Main Street 1.99
How Table Compression works All columns are considered for compression Only worthwhile compression is performed Symbol table is created within each database block depending on block content –Self tuning symbol table is created automatically by the system –No explicit declaration of symbol table entries –Compression algorithm automatically adapts to changes in data distribution
Which data is compressed Compression occurs only when data is inserted with a bulk (direct-path) insert operation. –Direct Path SQL*Loader – insert /*+ append */ … – create table … as select … – alter table move … A table can consist of compressed and uncompressed blocks transparently. Any DML operation can be applied to a table storing compressed blocks. However, conventional DML operations cause records to be stored uncompressed*.
SQL Commands For a new table: –Create with compress attribute in table definition create table … compress For an existing table: 1.Alter table to add compress attribute only new rows are compressed alter table foo compress; 2.Compress table old and new rows are compressed alter table foo move compress;
Process of Compressing a Block
Deletes, Inserts and Updates Deletes, Inserts and Updates are possible but can cause fragmentation and waste disk space when modifying compressed data. Large PCTFREE will lead to low compression ratios. Setting PCTFREE to 0 (default) is recommended for all tables storing compressed data.
Updates When a column is updated the algorithm checks whether a symbol table entry for the new value exists. –If it exists, the reference of the updated column is modified to the new symbol table entry and its reference count is increased by one. At the same time the reference count of the old value is decreased by one. –If no symbol table entry exists for the new column value, that value is inserted non-compressed into the row.
UPDATE TABLE item SET i_color = ‘green’ WHERE i_color =’blue’ If the old column value (‘green’) was also compressed and its reference count after the update operation became zero, the old symbol table entry is replaced with a new symbol table entry without touching all rows of one block. Some update operations can take advantage of compression
Deletes During delete operations all references counters of the deleted rows are decreased by one. Once a reference counter becomes zero, the corresponding symbol table entry is purged. A symbol table is never deleted from a block even if no reference into it exists because the overhead of an empty symbol table is only 4 bytes.
Agenda Overview Table Compression How does it work? Test Environment Space Savings Query Performance Conclusion
Test Environment: One very big table – 2.3 TB Table is partitioned per day. One partition is around 3,2 GB Once the data is loaded and processed it becomes read only. Most of the table access is – FTS
Agenda Overview Table Compression How does it work? Test Environment Space Savings Query Performance Conclusion
Space Savings Table Compression significantly reduces disk and buffer cache requirements Compression results mostly depend on data content on block level Definitions: Compression Factor Space Savings Non Compressed Blocks Compressed Blocks CF= Non Compressed Blocks – Compressed Blocks Non-Compressed Blocks SS= x100
What affects Compression? Column lengthlongshort Number distinct valueslowhigh Block size large small Sorted datayes no Column sequenceyesno Modified datayesno Column lengthlongshort Number distinct valueslowhigh Block size large small Sorted datayes no Column sequenceyesno Modified datayesno Table Characteristic Compression Factor high low
Estimating CF by using data samples create function compression_ratio (tabname varchar2) return number is -- sample percentage pct number := ; -- original block count (should be less than 10k) blkcnt number := 0; -- compressed block count blkcntc number; begin execute immediate ' create table TEMP_UNCOMPRESSED pctfree 0 as select * from ' || tabname || ' where rownum < 1'; while ((pct < 100) and (blkcnt < 1000)) loop execute immediate 'truncate table TEMP_UNCOMPRESSED'; execute immediate 'insert into TEMP_UNCOMPRESSED select * from ' || tabname || ' sample block (' || pct || ',10)'; execute immediate 'select count(distinct(dbms_rowid.rowid_block_number(rowid))) from TEMP_UNCOMPRESSED' into blkcnt; pct := pct * 10; end loop; execute immediate 'create table TEMP_COMPRESSED compress as select * from TEMP_UNCOMPRESSED'; execute immediate 'select count(distinct(dbms_rowid.rowid_block_number(rowid))) from TEMP_COMPRESSED' into blkcntc; execute immediate 'drop table TEMP_COMPRESSED'; execute immediate 'drop table TEMP_UNCOMPRESSED'; return (blkcnt/blkcntc); end; /
Ordered vs. Not ordered The biggest CF increase comes from ordering the data
How Data volume affects CF Compression Factor 135 Days in one partition
Ordered Data Input Data 1234 Block1 5 Block2 4 rows per block compressed CF=2.5 Each value is compressed in one block Symbol table contains Block contains 20 values rows per block compressed CF=4 Block1 Not all values fit into first block Symbol tables contains Block contains 16 values Symbol tables contains only 5 Block contains 4 values 20 values Ordered 1 column row Sorting can also improve the clustering factor of your indexes.
Not Ordered Data Input Data 20 values Not ordered 1 column row rows per block compressed CF=4 Block1 CF= Block Block Block Block Block5 4 rows per block compressed
Choosing the columns to order by Sorting on fields with very low cardinality does not necessarily yield to better compression The optimal columns to sort on seem to be those that have a table/partition-wide cardinality equal to the number of rows per block Column correlation should be considered The process is iterative
Know your data Without a detailed understanding of the data distribution it is very difficult to predict the most optimal order. Table/partition statistics are useful –dba_tables –dba_tab_partitons Looking into a particular data block is very helpful –substr(rowid, 1, 15)
Improving ordering speed Set SORT_AREA_SIZE for the session as big as possible. Use dedicated temp tbs with big extent size (multiple of SORT_AREA_SIZE + 1 block) If the sort needs more space: The data is split into smaller sort runs; each piece is sorted individually. The server process writes pieces to temporary segments on disk; these segments hold intermediate sort run data while the server works on another sort run. The sorted pieces are merged to produce the final result. If SORT_AREA_SIZE is not large enough to merge all the runs at once, subsets of the runs are merged in a number of merge passes.
Agenda Overview Table Compression How does it work? Test Environment Space Savings Query Performance Conclusion
How CF affects FTS performance Queries are executed against compressed schema and non- compressed schema Overall query speedup 65%
Query Elapsed Time Speedup The larger the compression factor the larger the elapsed time speedup Query speedup results from reduction in I/O- operations required –Speedup depends on the weakness of the I/O- subsystem –Speedup depends on how sparse the blocks are that the query accesses
Performance impact on loads and DML On system with unlimited IO bandwidth, data load may be two times longer (even more if data need to be ordered). Bulk loads are IO-bound on many systems. Deleting compressed data is 10% faster. Inserting new data is as fast as inserting into non compressed table. UPDATE operations are 10-20% slower for compressed tables on average, mainly due to some complex optimizations that have been implemented for uncompressed tables, and not yet implemented for compressed tables.
Other Performance Tests Parallel load performance (CPU)
Delete operation CPU UtilizationUpdate operation CPU Utilization Delete/Update Performance
FTS Performance Parallel Full Table Scan CPU Utilization Parallel Full Table Scan IO Performance
Table Access by ROWID
Agenda Overview Table Compression How does it work? Test Environment Space Savings Query Performance Conclusion
Best Practices Use Compression in read intensive applications Execute bulk loads (SQLLDR and Parallel Insert) to compress rows Compress older data in large Data Warehouses –Integrate Table Compression into the ‘rolling window’ paradigm: Compress all but most recent partition Compress Materialized views Only compress infrequently updated tables
Data normalization and Table Compression “Normalize till it hurts, denormalize till it works” High normalization may result in a high number of table joins (bad performance) Both data normalization and table compression reduce redundancy
Conclusion Table Compression: –reduces costs by shrinking the database footprint on disk –is transparent to applications –often improves query performance due to reduced disk I/O –increases buffer cache efficiency
A Q & Q U E S T I O N S A N S W E R S