Table Compression in Oracle9i R2
Plamen Zyumbyulev
INSIDE OUT "Let someone know"

Agenda  Overview: Table Compression  How does it work?  Test Environment  Space Savings  Query Performance  Conclusion

Table Compression Facts  Table compression is useful  Everyone benefits from space savings  It not only saves space but can also increase performance  It can't be used everywhere

Why Table Compression?  Table Compression increases: –I/O-subsystem capacity –I/O throughput –query scan performance (mainly full table scans, FTS) –buffer cache capacity  Table Compression: –reduces cost of ownership –is easy to use –requires minimal table definition changes –is transparent to applications

Overview: Table Compression  The compression algorithm is based on removing data redundancy  Tables and Materialized Views can be compressed –Compression can also be specified at the partition level and at the tablespace level –Indexes and index-organized tables are not compressed with this method (there are other methods for index and IOT compression)  Compression results depend on the actual data  DDL/DML commands are supported on compressed tables  Columns can be neither added to nor dropped from a compressed table
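A rough sketch of partition-level compression (the table, column, and partition names are hypothetical, not from the slides): compression can be declared per partition so that only historical data is stored compressed.
create table sales (
  sale_id   number,
  sale_date date,
  amount    number
)
partition by range (sale_date) (
  -- older, read-mostly data stored compressed
  partition p_2002_q1 values less than (to_date('2002-04-01','YYYY-MM-DD')) compress,
  -- current data left uncompressed for cheaper conventional DML
  partition p_2002_q2 values less than (to_date('2002-07-01','YYYY-MM-DD')) nocompress
);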

Which Applications benefit from Table Compression?  Table Compression targets read-intensive applications such as Decision Support and OLAP  All schema designs benefit from Compression

Agenda  Overview Table Compression  How does it work?  Test Environment  Space Savings  Query Performance  Conclusion

How does Table Compression work?  Data is compressed by eliminating duplicate values within a database block (example rows with First Name / Last Name: Scott Smith, Henry Smith, Henry Scott, Henry-Scott McGryen; values such as Henry, Scott and Smith repeat and need to be stored only once)  A dictionary (symbol table) is built per block  the information needed to uncompress the data is available in each block  If the same value occurs in the same or in different columns, all occurrences share the same symbol table entry; only entire column values are compressed  Sequences of columns are compressed as one entity if the same sequence of column values occurs in many rows

Block Level Compression (diagram: in the non-compressed block, each row of the Invoice table (CustName, CustAddr, Sales_amt) repeats values such as "Meyer, 11 Homestead Rd" and "McGryen, 3 Main Street, 1.99" in full; in the compressed block, a symbol table next to the block header stores each distinct value once, the rows reference those entries, and more free space remains in the block)

How Table Compression works  All columns are considered for compression  Compression is performed only where it is worthwhile  A symbol table is created within each database block depending on the block content –The self-tuning symbol table is created automatically by the system –No explicit declaration of symbol table entries is needed –The compression algorithm automatically adapts to changes in data distribution

Which data is compressed  Compression occurs only when data is inserted with a bulk (direct-path) insert operation: –Direct-path SQL*Loader –insert /*+ append */ … –create table … as select … –alter table … move …  A table can transparently consist of both compressed and uncompressed blocks  Any DML operation can be applied to a table storing compressed blocks; however, rows written through conventional DML are stored uncompressed
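A minimal sketch of a direct-path load into a compressed table (the table names sales_hist and sales_stage are hypothetical):
-- make sure the target table is marked for compression
alter table sales_hist compress;

-- direct-path insert: rows loaded this way are written into compressed blocks
insert /*+ append */ into sales_hist
select * from sales_stage;
commit;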

SQL Commands
 For a new table: create it with the compress attribute in the table definition
   create table … compress;
 For an existing table:
   1. Alter the table to add the compress attribute (only new rows are compressed):
      alter table foo compress;
   2. Compress the table (old and new rows are compressed):
      alter table foo move compress;
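As a small usage sketch (FOO is the hypothetical table from the slide), the compression attribute can be checked in the data dictionary, assuming the COMPRESSION column is present in your release:
select table_name, compression
from   user_tables
where  table_name = 'FOO';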

Process of Compressing a Block

Deletes, Inserts and Updates  Deletes, inserts and updates of compressed data are possible but can cause fragmentation and waste disk space  A large PCTFREE leads to low compression ratios; setting PCTFREE to 0 (the default for tables created with COMPRESS) is recommended for all tables storing compressed data
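A sketch of making the PCTFREE setting explicit when creating or rebuilding a compressed table (table names are hypothetical):
create table sales_hist pctfree 0 compress
as select * from sales;

-- or, for an existing table
alter table sales_hist move pctfree 0 compress;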

Updates  When a column is updated, the algorithm checks whether a symbol table entry for the new value already exists –If it exists, the updated column's reference is switched to that symbol table entry and its reference count is increased by one; at the same time, the reference count of the old value is decreased by one –If no symbol table entry exists for the new column value, the value is stored uncompressed in the row

UPDATE item SET i_color = 'green' WHERE i_color = 'blue'  Some update operations can even take advantage of compression: if the old column value ('blue') was compressed and its reference count drops to zero after the update, the old symbol table entry is simply replaced with a new entry for 'green', without touching the rows of the block

Deletes  During delete operations, the reference counters of the deleted rows' values are decreased by one; once a reference counter becomes zero, the corresponding symbol table entry is purged  A symbol table is never deleted from a block, even if no reference into it exists, because the overhead of an empty symbol table is only 4 bytes

Agenda  Overview: Table Compression  How does it work?  Test Environment  Space Savings  Query Performance  Conclusion

Test Environment:  One very big table: 2.3 TB  The table is partitioned by day  One partition is around 3.2 GB  Once the data is loaded and processed it becomes read-only  Most of the table access is via full table scans (FTS)

Agenda  Overview: Table Compression  How does it work?  Test Environment  Space Savings  Query Performance  Conclusion

Space Savings  Table Compression significantly reduces disk and buffer cache requirements  Compression results mostly depend on data content on block level  Definitions:
Compression Factor: CF = Non-Compressed Blocks / Compressed Blocks
Space Savings: SS = ((Non-Compressed Blocks - Compressed Blocks) / Non-Compressed Blocks) x 100
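A worked example with hypothetical numbers: a table that needs 1,000 blocks uncompressed and 400 blocks compressed has CF = 1000 / 400 = 2.5 and SS = ((1000 - 400) / 1000) x 100 = 60%.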

What affects Compression? Table characteristics leading to a high vs. low Compression Factor:
–Column length: long (high) vs. short (low)
–Number of distinct values: low (high) vs. high (low)
–Block size: large (high) vs. small (low)
–Sorted data: yes (high) vs. no (low)
–Repeating column sequences: yes (high) vs. no (low)
–Modified data: no (high) vs. yes (low)

Estimating CF by using data samples
create function compression_ratio (tabname varchar2)
return number is
  -- sample percentage; the starting value is missing in the transcript, 0.000099 is assumed here
  pct     number := 0.000099;
  -- original block count (should be less than 10k)
  blkcnt  number := 0;
  -- compressed block count
  blkcntc number;
begin
  execute immediate 'create table TEMP_UNCOMPRESSED pctfree 0 as select * from ' ||
                    tabname || ' where rownum < 1';
  while ((pct < 100) and (blkcnt < 1000)) loop
    execute immediate 'truncate table TEMP_UNCOMPRESSED';
    execute immediate 'insert into TEMP_UNCOMPRESSED select * from ' ||
                      tabname || ' sample block (' || pct || ',10)';
    execute immediate 'select count(distinct(dbms_rowid.rowid_block_number(rowid)))' ||
                      ' from TEMP_UNCOMPRESSED' into blkcnt;
    pct := pct * 10;
  end loop;
  execute immediate 'create table TEMP_COMPRESSED compress as select * from TEMP_UNCOMPRESSED';
  execute immediate 'select count(distinct(dbms_rowid.rowid_block_number(rowid)))' ||
                    ' from TEMP_COMPRESSED' into blkcntc;
  execute immediate 'drop table TEMP_COMPRESSED';
  execute immediate 'drop table TEMP_UNCOMPRESSED';
  return (blkcnt/blkcntc);
end;
/
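A usage sketch (SALES is a hypothetical table name); because the function creates and drops scratch tables, it is called from a PL/SQL block rather than from a plain query:
set serveroutput on
declare
  cf number;
begin
  cf := compression_ratio('SALES');
  dbms_output.put_line('Estimated compression factor: ' || round(cf, 2));
end;
/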

Ordered vs. Not ordered  The biggest CF increase comes from ordering the data
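A rough sketch of combining ordering with compression when rebuilding a table (table and column names are hypothetical):
create table sales_compressed pctfree 0 compress
as
select * from sales
order by cust_id, prod_id;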

How Data volume affects CF (chart: Compression Factor plotted against the number of days stored in one partition)

Ordered Data (diagram: 20 single-column input values loaded in sorted order; each distinct value is compressed within a single block, so the per-block symbol table stays small and more rows fit per compressed block, giving a higher compression factor in the example). Sorting can also improve the clustering factor of your indexes.

Not Ordered Data (diagram: the same 20 values loaded unordered; distinct values are scattered across the blocks, each block needs more symbol table entries, and the resulting compression factor is lower than for the ordered load)

Choosing the columns to order by  Sorting on columns with very low cardinality does not necessarily yield better compression  The optimal columns to sort on seem to be those whose table/partition-wide cardinality is close to the number of rows per block  Column correlation should also be considered  The process is iterative

Know your data  Without a detailed understanding of the data distribution it is very difficult to predict the optimal order  Table/partition statistics are useful –dba_tables –dba_tab_partitions  Looking into a particular data block is very helpful –substr(rowid, 1, 15) identifies the block a row is stored in
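A sketch of the block-level inspection mentioned above (SALES and CUST_ID are hypothetical names): the first 15 characters of an extended ROWID identify the data object, file, and block, so grouping by them shows how rows and distinct values are spread across blocks.
select substr(rowid, 1, 15)    as block_id,
       count(*)                as rows_in_block,
       count(distinct cust_id) as distinct_cust_ids
from   sales
group  by substr(rowid, 1, 15)
order  by rows_in_block desc;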

Improving ordering speed  Set SORT_AREA_SIZE for the session as large as possible  Use a dedicated temporary tablespace with a large extent size (a multiple of SORT_AREA_SIZE + 1 block)  If the sort needs more space than SORT_AREA_SIZE: –The data is split into smaller sort runs; each piece is sorted individually –The server process writes the pieces to temporary segments on disk; these segments hold intermediate sort run data while the server works on another sort run –The sorted pieces are merged to produce the final result –If SORT_AREA_SIZE is not large enough to merge all the runs at once, subsets of the runs are merged in several merge passes
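A minimal sketch of the session settings described above; the 100 MB value is an arbitrary example, not a recommendation from the slides:
alter session set workarea_size_policy = manual;
alter session set sort_area_size = 104857600;  -- 100 MB per sort for this session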

Agenda  Overview: Table Compression  How does it work?  Test Environment  Space Savings  Query Performance  Conclusion

How CF affects FTS performance  Queries are executed against a compressed schema and a non-compressed schema  Overall query speedup: 65%

Query Elapsed Time Speedup  The larger the compression factor, the larger the elapsed time speedup  The query speedup results from the reduction in required I/O operations –Speedup depends on how constrained the I/O subsystem is –Speedup depends on how sparse the blocks are that the query accesses

Performance impact on loads and DML  On a system with unlimited I/O bandwidth, data loads may take about twice as long (even more if the data needs to be ordered)  Bulk loads are I/O-bound on many systems  Deleting compressed data is about 10% faster  Inserting new data is as fast as inserting into a non-compressed table  UPDATE operations are on average 10-20% slower for compressed tables, mainly due to complex optimizations that have been implemented for uncompressed tables but not yet for compressed ones

Other Performance Tests (chart: parallel load performance, CPU utilization)

Delete/Update Performance (charts: delete operation CPU utilization; update operation CPU utilization)

FTS Performance (charts: parallel full table scan CPU utilization; parallel full table scan I/O performance)

Table Access by ROWID

Agenda  Overview: Table Compression  How does it work?  Test Environment  Space Savings  Query Performance  Conclusion

Best Practices  Use compression in read-intensive applications  Use bulk loads (SQL*Loader direct path, parallel insert) so rows are compressed  Compress older data in large Data Warehouses –Integrate Table Compression into the 'rolling window' paradigm: compress all but the most recent partition (see the sketch below)  Compress Materialized Views  Only compress infrequently updated tables
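A rough sketch of the rolling-window approach (partition and index names are hypothetical): once a partition leaves the active window, it is moved compressed and its local index partitions are rebuilt.
alter table sales move partition sales_2002_01 compress;
-- moving the partition marks its local index partitions unusable, so rebuild them
alter index sales_cust_idx rebuild partition sales_2002_01;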

Data normalization and Table Compression  “Normalize till it hurts, denormalize till it works”  High normalization may result in a high number of table joins (bad performance)  Both data normalization and table compression reduce redundancy

Conclusion  Table Compression: –reduces costs by shrinking the database footprint on disk –is transparent to applications –often improves query performance due to reduced disk I/O –increases buffer cache efficiency

Q & A: Questions & Answers