Compression and Storage Optimization IDS 11.50.xC4 Kevin Cherkauer

Compression and Storage Optimization IDS 11.50.xC4 Kevin Cherkauer

What Is Compression?
- Ability to store data rows in compressed format on disk
- Saves up to 90% of row storage space
- Ability to estimate the possible compression ratio
- Fits more data onto a page
- Fits more data into the buffer pool
- Reduces logical log usage

What Is Storage Optimization?
- Ability to consolidate the free space in a table or fragment at its end
- Ability to return this free space to the dbspace
- Space returned can then be used by any table in the dbspace

Compression Concepts
- Lempel-Ziv (LZ) based algorithm with a static dictionary, built by random sampling
- Frequently repeating patterns are replaced with 12-bit symbol numbers
- Any byte that does not match a pattern is also replaced with a 12-bit reserved symbol number
- Patterns can be up to 15 bytes long
- Maximum possible compression = 90% (15 bytes replaced with 12 bits = 1.5 bytes)
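The 90% figure follows directly from the symbol size. A minimal Python sketch of that arithmetic, using only the numbers on the slide (the function name space_saved is illustrative, not part of IDS):

```python
SYMBOL_BITS = 12          # every output symbol is 12 bits (from the slide)
MAX_PATTERN_BYTES = 15    # longest pattern one symbol can stand for (from the slide)


def space_saved(pattern_bytes: int) -> float:
    """Fraction of space saved when one symbol replaces `pattern_bytes` bytes.

    A negative result means the data grew.
    """
    if not 1 <= pattern_bytes <= MAX_PATTERN_BYTES:
        raise ValueError("pattern length must be between 1 and 15 bytes")
    symbol_bytes = SYMBOL_BITS / 8          # 12 bits = 1.5 bytes
    return 1 - symbol_bytes / pattern_bytes


if __name__ == "__main__":
    for n in (1, 2, 3, 8, 15):
        print(f"{n:2d}-byte match: {space_saved(n):+.0%}")
```

A single byte costs more than it saves, a 2-byte match already saves space, and only a full 15-byte match reaches the 90% ceiling.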

Compression Symbols
- 12 bits means 4,096 symbols
  - 256 reserved symbols for bytes that match no pattern
  - 3,840 pattern symbols
- Patterns longer than 7 bytes use up two symbol numbers
- Thus not all patterns can be compressed
- Dictionary tries to capture the "best" patterns (frequency × length)
- Non-matching bytes grow by 50% (8 bits replaced by 12 bits)
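The symbol budget can be worked out in a few lines. This is a sketch of one reading of the slide: it assumes each pattern longer than 7 bytes consumes exactly two symbol numbers and every shorter pattern consumes one. The function name short_pattern_slots is hypothetical.

```python
TOTAL_SYMBOLS = 2 ** 12                               # 4,096 twelve-bit symbol numbers
RESERVED_SYMBOLS = 256                                # one per possible unmatched byte value
PATTERN_SYMBOLS = TOTAL_SYMBOLS - RESERVED_SYMBOLS    # 3,840 left for patterns


def short_pattern_slots(long_patterns: int) -> int:
    """Symbol numbers left for patterns of 7 bytes or fewer.

    Assumption taken from the slide: a pattern longer than 7 bytes
    uses two symbol numbers, any other pattern uses one.
    """
    used = 2 * long_patterns
    if long_patterns < 0 or used > PATTERN_SYMBOLS:
        raise ValueError("too many long patterns for the symbol space")
    return PATTERN_SYMBOLS - used


if __name__ == "__main__":
    for long_count in (0, 1000, 1920):
        print(long_count, "long patterns leave", short_pattern_slots(long_count), "short slots")
```

Under this reading a dictionary holds between 1,920 patterns (all long) and 3,840 patterns (all short), which is why only the "best" patterns make it in.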

Data Effects on Compression
- Data with frequently repeating long patterns is the most compressible
- Long runs of 0's or blanks are very compressible
- Noise-like data is poorly or not at all compressible:
  - Encrypted data
  - Data already compressed by another algorithm
  - Data without long repeating patterns
- Avoid putting a "noise-like" column between other columns that have frequent patterns, because it disrupts potential column-spanning patterns
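The contrast between blank runs and noise-like data can be seen with a toy encoder. This is a simplified sketch, not the IDS algorithm: it uses a hand-picked dictionary instead of one built by random sampling, does a greedy longest match, and charges 12 bits per emitted symbol. All names here are hypothetical.

```python
SYMBOL_BITS = 12
MAX_PATTERN_BYTES = 15


def compressed_bits(data: bytes, patterns: set[bytes]) -> int:
    """Size in bits after greedy longest-match encoding against a fixed dictionary."""
    longest = min(MAX_PATTERN_BYTES, max((len(p) for p in patterns), default=1))
    bits = 0
    i = 0
    while i < len(data):
        step = 1  # an unmatched byte still costs one reserved 12-bit symbol
        for length in range(longest, 1, -1):
            if data[i:i + length] in patterns:
                step = length
                break
        bits += SYMBOL_BITS
        i += step
    return bits


def ratio_saved(data: bytes, patterns: set[bytes]) -> float:
    """Fraction of space saved; negative means the data grew."""
    if not data:
        raise ValueError("data must not be empty")
    return 1 - compressed_bits(data, patterns) / (8 * len(data))


if __name__ == "__main__":
    patterns = {b" " * 15, b"\x00" * 15}   # toy dictionary: runs of blanks and zeros
    blanks = b" " * 150                    # a long run of blanks
    noise = bytes(range(150))              # no repeats; stands in for noise-like data
    print(f"blanks: {ratio_saved(blanks, patterns):+.0%}")
    print(f"noise:  {ratio_saved(noise, patterns):+.0%}")
```

In this toy model the blank run hits the 90% ceiling, while the data with no dictionary matches grows by 50%, matching the figures on the earlier slides.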

Performance Impact of Compression
- For IO-bound workloads
  - Compression may improve performance by reducing IOs (both data page and logical log)
  - More data fits on a page, so more fits in the buffer pool
  - Log records are smaller, so less logging
- For CPU-bound workloads
  - Additional CPU is used to compress and expand rows
  - Should not be a large impact
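The IO argument comes down to rows per page. A rough Python sketch, assuming an illustrative 2,048-byte page and ignoring page headers and slot tables; the page size, row size, and function name are assumptions for the example, not figures from the talk:

```python
import math


def pages_needed(row_count: int, row_bytes: int,
                 page_bytes: int = 2048, saved: float = 0.0) -> int:
    """Pages required to hold `row_count` rows in a simplified page model.

    `saved` is the fraction of row space removed by compression (0.0 = none).
    Page headers, slot tables, and rows spanning pages are ignored.
    """
    if not 0 <= saved < 1:
        raise ValueError("saved must be in [0, 1)")
    stored_bytes = row_bytes * (1 - saved)
    rows_per_page = int(page_bytes // stored_bytes)
    if rows_per_page < 1:
        raise ValueError("row does not fit on one page in this simplified model")
    return math.ceil(row_count / rows_per_page)


if __name__ == "__main__":
    before = pages_needed(1_000_000, 200)
    after = pages_needed(1_000_000, 200, saved=0.5)
    print("pages before:", before, "pages after:", after)
```

Halving the stored row size halves the pages a scan must read, and the same buffer pool then holds twice as many rows.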

HDR, ER, CDC (DataMirror) and Compression
- All are supported on compressed tables
- HDR: tables will be compressed on the secondary if and only if they are compressed on the primary
- ER: compression status of tables is independent between source and target, as specified by the user
- CDC: compression of targets is a function of what the target database supports and what the user specifies

OAT Interface
- Compression and Storage Optimization can be managed via the OAT graphical interface
- Details in the talk following this one

Things That Cannot Be Compressed
- Out-of-row data (e.g. blobs)
- Indexes
- Catalog tables
- Temp tables
- Partition tables
- Dictionary tables
- Tables in the following databases: sysuser, sysmaster, sysutils, syscdr, syscdcv1

Dictionary Storage
- Each compressed (non-fragmented) table or table fragment has its own compression dictionary
- Dictionary consumes about 75K to 100K per fragment
  - Thus compressing tiny tables is not recommended
- All dictionaries for tables/fragments in a given dbspace are stored in a special hidden dictionary table in that dbspace
- Union of all dictionary tables is exposed by sysmaster
  - syscompdicts_full table: includes the binary dictionary; access restricted to user "informix"
  - syscompdicts view: globally accessible; omits the binary dictionary for security
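The advice against compressing tiny tables can be checked with a back-of-the-envelope calculation. A sketch in Python, assuming the dictionary costs the upper figure of 100K per fragment and reading K as 1,024 bytes; the function names and the 60% saving are illustrative:

```python
DICT_BYTES = 100 * 1024   # assumed per-fragment dictionary cost (upper end of 75K to 100K)


def net_bytes_saved(data_bytes: int, saved_fraction: float, fragments: int = 1) -> float:
    """Row space saved minus dictionary overhead; negative means a net loss."""
    return data_bytes * saved_fraction - fragments * DICT_BYTES


def break_even_bytes(saved_fraction: float) -> float:
    """Fragment size at which the savings just pay for one dictionary."""
    if not 0 < saved_fraction <= 1:
        raise ValueError("saved_fraction must be in (0, 1]")
    return DICT_BYTES / saved_fraction


if __name__ == "__main__":
    print("50K table at 60%:", net_bytes_saved(50 * 1024, 0.6))
    print("1G table at 60%:", net_bytes_saved(1024 ** 3, 0.6))
    print("break-even size at 50%:", break_even_bytes(0.5))
```

Because every fragment carries its own dictionary, a heavily fragmented small table pays the overhead several times over.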

Compression Operations
- create_dictionary
  - Creates the compression dictionary
  - Any rows inserted or updated afterward will be compressed
  - Previously existing rows will not be compressed
- compress
  - Does an implicit create_dictionary if no dictionary exists
  - Compresses all previously existing rows
  - Table is fully accessible to other queries
- estimate_compression
  - Estimates the compression ratio a brand-new dictionary could get
  - If already compressed, estimates the current compression ratio (else 0)
  - Also shows the estimated gain to be had by making a new dictionary (difference between the first and second estimates)

Compression Operations 2
- uncompress, uncompress_offline
  - Uncompress every row in the table or fragment
  - Deactivate the compression dictionary
  - "uncompress": table is fully accessible
  - "uncompress_offline": table is XLOCKed, no query access
- purge_dictionary
  - Delete old inactive dictionaries
  - Separate command because ER and DataMirror might need old dictionaries

Storage Optimization Operations
- repack, repack_offline
  - Move rows within a table or fragment to consolidate free space at the end
  - "repack": table is fully accessible
  - "repack_offline": table is XLOCKed, no query access
- shrink
  - Return any free space at the end of the table or fragment to the dbspace
  - Normally done after a repack
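Why shrink normally follows repack is easy to see in a toy model. The Python sketch below treats a table as a list of pages, each a list of rows. It only illustrates the idea: the real operations work on disk pages, and the function names and the fixed per-page capacity are assumptions.

```python
def repack(pages: list[list[str]], capacity: int) -> list[list[str]]:
    """Move rows toward the front so all free space ends up in trailing pages."""
    if capacity < 1:
        raise ValueError("capacity must be at least 1")
    rows = [row for page in pages for row in page]
    packed = [rows[i:i + capacity] for i in range(0, len(rows), capacity)]
    # The page count stays the same: repack consolidates free space, it does not release it.
    packed += [[] for _ in range(len(pages) - len(packed))]
    return packed


def shrink(pages: list[list[str]]) -> list[list[str]]:
    """Drop empty pages at the end, modelling free space returned to the dbspace."""
    end = len(pages)
    while end > 0 and not pages[end - 1]:
        end -= 1
    return pages[:end]


if __name__ == "__main__":
    table = [["r1"], [], ["r2", "r3"], ["r4"], []]
    print("shrink only:       ", len(shrink(table)), "pages")
    print("repack then shrink:", len(shrink(repack(table, capacity=2))), "pages")
```

Shrink alone can only release the free space that already sits at the end, so holes in the middle of the table stay allocated until a repack moves them there.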

Admin API Interface
- All compression and storage optimization operations are invoked via the IDS Admin API built-in UDRs
  - execute function task(…);
  - execute function admin(…);
- Example:
  execute function task("table compress repack shrink", "table_name", "database_name", "owner_name");
- Enables the OAT graphical interface
- Enables remote execution (the DBA does not need to log in directly to the target machine)
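Since the operations are ordinary SQL statements, a client script can assemble them. A minimal Python sketch that builds the statement text in the form of the slide's example; build_task_sql is a hypothetical helper, it covers only the three operations that appear in that example, and whether the server accepts every subset of them is an assumption to check against the IDS documentation.

```python
ALLOWED_OPS = ("compress", "repack", "shrink")   # only the operations in the slide's example


def build_task_sql(ops, table: str, database: str, owner: str) -> str:
    """Build an `execute function task(...)` statement as text (hypothetical helper)."""
    ops = list(ops)
    unknown = [op for op in ops if op not in ALLOWED_OPS]
    if not ops or unknown:
        raise ValueError(f"unsupported or missing operations: {unknown}")
    for name in (table, database, owner):
        if not name or '"' in name:
            raise ValueError("names must be non-empty and contain no double quotes")
    # Keep the order used on the slide: compress, then repack, then shrink.
    ordered = [op for op in ALLOWED_OPS if op in ops]
    command = "table " + " ".join(ordered)
    args = ", ".join(f'"{arg}"' for arg in (command, table, database, owner))
    return f"execute function task({args});"


if __name__ == "__main__":
    print(build_task_sql(["compress", "repack", "shrink"],
                         "table_name", "database_name", "owner_name"))
```

The helper only produces text; sending the statement to the server through a database driver is left out on purpose.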

Summary
- Compression and Storage Optimization can save disk space and thus $$$
- For IO-bound workloads, compression can also improve performance
  - Compression reduces logging
  - Compression fits more data into the buffer pool
- Storage Optimization allows space saved by compression to be reclaimed from tables and fragments of tables

Thanks!