Maximizing Performance With Informix TimeSeries

Presentation transcript:

Maximizing Performance With Informix TimeSeries Jeffrey McMahon Session E03 IBM Mon/Apr 23/1:05 4/17/20184/15/12 Session Z99

Agenda
Create basic time series data in your database.
Use container pooling to spread container usage for optimal I/O.
Best practices when loading time series data.
Best practices when purging time series data.
Monitoring loads and purges to ensure maximum throughput.

TimeSeries Table
What does a TimeSeries table look like?
[Figure: a table with the columns Meter_id and Series, labeled "Table grows". Each row holds one meter (1, 2, 3, 4, …) and its series, for example [(1-1-11 12:00, value 1, value 2, …, value N), (1-1-11 12:15, value 1, value 2, …, value N), …].]

Building a table with TimeSeries: Calendartable
Create a calendar of 15 minute intervals:
    insert into calendartable (c_name, c_calendar) values (
        'interval_15min_gmt',   -- calendar name
        'startdate(2009-01-01 00:00:00.00000), pattern({1 on, 14 off}, minute)'
    );
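To see why "{1 on, 14 off}" on a minute unit gives 15 minute intervals, the pattern can be expanded by hand. The following Python sketch only illustrates the arithmetic; the function name expand_pattern and the (length, is_on) representation are hypothetical and are not part of the TimeSeries API.

```python
from datetime import datetime, timedelta

def expand_pattern(start, pattern, unit, count):
    """Return the first `count` timestamps at which the pattern is "on".

    pattern is a list of (length, is_on) runs, e.g. [(1, True), (14, False)]
    for "{1 on, 14 off}". unit is the duration of one pattern step.
    The pattern must contain at least one "on" run.
    """
    if not any(is_on for _, is_on in pattern):
        raise ValueError("pattern needs at least one 'on' run")
    stamps = []
    step = 0
    while len(stamps) < count:
        for length, is_on in pattern:
            for _ in range(length):
                if is_on and len(stamps) < count:
                    stamps.append(start + step * unit)
                step += 1
    return stamps

# One minute "on" followed by 14 minutes "off" repeats every 15 minutes.
first = expand_pattern(datetime(2009, 1, 1), [(1, True), (14, False)],
                       timedelta(minutes=1), 4)
for stamp in first:
    print(stamp)
```

Each valid slot is one minute wide and the next one starts 15 minutes later, which is what makes the series regular at a 15 minute interval.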

Building a table with TimeSeries: Create Rowtype
Create a row type to hold your interval data:
    create row type vee_interval_type (
        reading_dt     datetime year to fraction(5),   -- must be fraction(5)!
        reading_flag   smallint,
        reading_value  decimal(14,3),
        indicator      smallint,
        code           char(40)
    );

Building a table with TimeSeries: Create Table
Create a table with a TimeSeries column:
    create table vee_interval_table (
        meter_id         bigint,
        reading_type     char(10),
        measuring_unit   char(10),
        vee_interval_ts  TimeSeries(vee_interval_type),
        primary key (meter_id)
    );

Building a table with TimeSeries: TSContainerCreate
Create a “container” to hold the TimeSeries interval data:
    execute procedure tscontainercreate (
        'container1',          -- container name
        'rootdbs',             -- dbspace (rootdbs usually isn’t used!)
        'vee_interval_type',   -- TimeSeries rowtype
        10000,                 -- first extent (KB)
        5000                   -- next extent (KB)
    );

Building a table with TimeSeries: TSCreate
Add a row to the table:
    insert into vee_interval_table values (
        1,       -- meter id (primary key)
        'dmd',   -- reading type
        'kwh',   -- unit of measurement (kwh)
        TSCreate(
            'interval_15min_gmt',          -- cal_name
            '2009-01-01 00:00:00.00000',   -- date for the earliest element possible
            0,                             -- threshold (0 = all data in container)
            0,                             -- zero
            0,                             -- nelems (in-row space to preallocate)
            'container1'                   -- container name to hold this timeseries data
        )
    );

Building a table with TimeSeries: TSContainerSetPool
Create a pool of containers called meter_pool. The pool is used to automatically assign a container to a TimeSeries (the individual containers must already exist):
    execute procedure tscontainersetpool( 'container1', 'meter_pool' );
    execute procedure tscontainersetpool( 'container2', 'meter_pool' );
    execute procedure tscontainersetpool( 'container3', 'meter_pool' );

Building a table with TimeSeries: TSContainerSetPool (continued)
Pools can be used to uniformly distribute TimeSeries across multiple containers.

Building a table with TimeSeries: TSContainerPoolRoundRobin
Automatically assign containers to TimeSeries in round-robin order using a pool:
    insert into vee_interval_table values (
        2, 'dmd', 'kwh',
        TSCreate(
            'interval_15min_gmt',
            '2009-01-01 00:00:00.00000',
            0, 0, 0,
            tscontainerpoolroundrobin( 'vee_interval_table', 'vee_interval_ts', 'vee_interval_type', 0, 'meter_pool' )
        )
    );
This works well only if each time series grows at about the same rate.
Stock market data, where series grow at very different rates, would not work well with pools.
For these cases it is better to assign time series to containers using custom logic.
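The difference between round-robin assignment and "custom logic" can be sketched outside the database. The Python below is an illustration only, not how the server implements pools: assign_round_robin simply cycles through the pool, while assign_by_load is one possible custom policy (a greedy heaviest-first placement) that assumes you can estimate a growth rate per series. Both function names and the rate figures are hypothetical.

```python
import heapq
from itertools import cycle

def assign_round_robin(series_ids, pool):
    """Hand out containers in pool order, wrapping around at the end."""
    containers = cycle(pool)
    return {series_id: next(containers) for series_id in series_ids}

def assign_by_load(expected_rates, pool):
    """Place the fastest-growing series first, always into the container
    that currently has the smallest total expected growth."""
    heap = [(0.0, name) for name in pool]
    heapq.heapify(heap)
    result = {}
    for series_id, rate in sorted(expected_rates.items(), key=lambda kv: -kv[1]):
        load, name = heapq.heappop(heap)
        result[series_id] = name
        heapq.heappush(heap, (load + rate, name))
    return result

pool = ["container1", "container2", "container3"]
even = assign_round_robin(range(1, 9), pool)

# Hypothetical growth rates: one very active series and three quiet ones.
skewed = assign_by_load({"A": 100, "B": 10, "C": 10, "D": 10}, ["c1", "c2"])
print(even)
print(skewed)
```

With evenly growing meters the round-robin result is balanced. With the skewed rates, the greedy policy leaves the busy series alone in one container and groups the quiet ones in the other, which round-robin would not do.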

Building a table with TimeSeries: Spreading the TimeSeries
    select meter_id, vee_interval_ts from vee_interval_table

    meter_id         1
    vee_interval_ts  origin(2009-01-01 00:00:00.00000), calendar(interval_15min_gmt), container(container1), threshold(0), regular, []

    meter_id         2
    vee_interval_ts  origin(2009-01-01 00:00:00.00000), calendar(interval_15min_gmt), container(container2),

    meter_id         3
    vee_interval_ts  origin(2009-01-01 00:00:00.00000), calendar(interval_15min_gmt), container(container3),

Result: vee_interval_table Table
Each container should be placed on a separate disk.
[Figure: vee_interval_table with the columns meter_id (int) and vee_interval_ts (timeseries(mtr_data)); the rows for meters 1 to 8 are spread across Container1, Container2, and Container3.]

What a Container Looks Like
Each time series has a unique ID generated internally.
For regular TimeSeries: this ID plus the offset is used to search the btree.
For irregular TimeSeries: this ID plus the timestamp is used to search the btree.
Each data page holds sorted data for exactly one time series.
[Figure: a btree with the entries MTR1 Jan 1, MTR1 Mar 3, MTR4 Jan 1, MTR7 Jan 1, MTR10 Jan 1, and MTR13 Jan 1, each pointing to a data page:
    Data for MTR1 on Jan 1, 2, 3, 4, 5, …
    Data for MTR1 on Mar 3, 4, 5, 6, 7, …
    Data for MTR4 on Jan 1, 2, 3, 4, 5, …
    Data for MTR7 on Jan 1, 2, 3, 4, 5, …
    Data for MTR10 on Jan 1, 2, 3, 4, 5, …
    Data for MTR13 on Jan 1, 2, 3, 4, 5, …]
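For a regular TimeSeries the offset is just the position of an element on the calendar, counted from the origin. The Python sketch below shows that arithmetic for the simple case of a fixed 15 minute interval. It is an assumption-laden illustration: regular_offset is a hypothetical helper, and the actual key layout inside the btree is not described in the slides.

```python
from datetime import datetime, timedelta

def regular_offset(origin, interval, stamp):
    """Return how many whole intervals lie between origin and stamp.

    Raises ValueError when stamp is before the origin or does not fall
    exactly on an interval boundary.
    """
    delta = stamp - origin
    offset, remainder = divmod(delta, interval)
    if delta < timedelta(0) or remainder:
        raise ValueError("timestamp is not a valid slot on this calendar")
    return offset

origin = datetime(2009, 1, 1)
interval = timedelta(minutes=15)

# A lookup key for a regular series would pair the series ID with the offset.
series_id = 1  # hypothetical internal ID
key = (series_id, regular_offset(origin, interval, datetime(2009, 1, 1, 12, 0)))
print(key)
```

Because the offset is a small integer that grows with time, elements of one series sort next to each other, which matches the picture of each data page holding sorted data for exactly one time series.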

Loading TimeSeries
The key to good load performance is I/O parallelism.
Use multiple containers for your TimeSeries data.
Run multiple loaders in parallel.
Don’t allow two loaders to load the same container.
You need to balance the number of loaders against the number of CPUs and (virtual) disks.

Loading TimeSeries: Other Considerations
You will most likely have to preprocess the input data for best performance:
    Pre-sort the data by ascending time and group it by primary key.
    Create N input files, one for each loader process to load.
    Ensure all the data for a particular container is assigned to the same loader process.
Create TimeSeries with threshold(0) to ensure all time series data is loaded into containers and not into the home row.
Note: Inserting data into a TimeSeries virtual table is slow!
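The preprocessing steps above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: records arrive as (meter_id, timestamp, value) tuples, and the two lookup tables (which container a meter lives in, and which loader owns a container) are supplied by you. The function name partition_for_loaders and the sample data are hypothetical.

```python
from collections import defaultdict

def partition_for_loaders(records, container_of, loader_of):
    """Split records into one batch per loader.

    records:      iterable of (meter_id, timestamp, value)
    container_of: dict mapping meter_id -> container name
    loader_of:    dict mapping container name -> loader index
    Each batch is grouped by meter_id and sorted by ascending timestamp.
    """
    batches = defaultdict(list)
    for rec in records:
        batches[loader_of[container_of[rec[0]]]].append(rec)
    for batch in batches.values():
        batch.sort(key=lambda r: (r[0], r[1]))
    return dict(batches)

raw = [
    (2, "2009-01-01 00:15", 7.0),
    (1, "2009-01-01 00:15", 3.5),
    (2, "2009-01-01 00:00", 6.0),
    (1, "2009-01-01 00:00", 3.0),
]
container_of = {1: "container1", 2: "container2"}
loader_of = {"container1": 0, "container2": 1}

batches = partition_for_loaders(raw, container_of, loader_of)
for loader, batch in sorted(batches.items()):
    print(loader, batch)
```

Each batch would then be written to its own input file. Since a container maps to exactly one loader index, no two loaders ever touch the same container.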

Loading TimeSeries
Ensure one loader per container.
If the base table is fragmented, you can create containers per table fragment.
Possibly store the container id in the home row and use it in the loads and queries: “where container_id == X”.
Otherwise, you can use the GetContainerName() UDR to determine which TimeSeries are assigned to a container.

Loading TimeSeries
The TimeSeries Loader API is the fastest way to load TimeSeries data. (These names are about to change!)
    Init
    Put       -- builds a 32k input buffer
    Flush     -- flushes current buffer
    Close
    Shutdown
Requires that you pre-sort and pre-group the data into files, as mentioned previously.

The End Result
[Figure: unsorted, ungrouped data is turned into data sorted by time and grouped by primary key, then split across four loaders. Each loader feeds exactly one container (Container1, Container2, Container3, Container4) of vee_interval_table.]

How it Works: TSBinLoad_Flush()
The 32K buffer holds consecutive records grouped by TimeSeries ID and sorted by ascending time or offset:
    … | (Id A, Time X+7, values) | (Id A, Time X+8, values) | (Id B, Time X, values) | …
[Figure: the 32K buffer is flushed through the btree for Container1 into the data page for Id A, which already holds (Time X, Values) through (Time X+6, Values), and into the data page for Id B.]
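The put-then-flush behaviour can be imitated with a small buffer class. This Python sketch shows only the buffering idea; it is not the loader API itself. The class BufferedLoader, its method names, and the flush callback are hypothetical, and the only figure taken from the slides is the 32K buffer size.

```python
class BufferedLoader:
    """Collect pre-sorted records and hand them over in batches of up to 32 KB."""

    def __init__(self, flush_fn, capacity=32 * 1024):
        self.flush_fn = flush_fn      # called with the list of buffered records
        self.capacity = capacity
        self.records = []
        self.size = 0

    def put(self, record: bytes):
        # Flush first if this record would overflow a non-empty buffer.
        if self.records and self.size + len(record) > self.capacity:
            self.flush()
        self.records.append(record)
        self.size += len(record)

    def flush(self):
        if self.records:
            self.flush_fn(self.records)
            self.records = []
            self.size = 0

    def close(self):
        self.flush()

batch_sizes = []
loader = BufferedLoader(lambda batch: batch_sizes.append(len(batch)))
for _ in range(4):
    loader.put(b"x" * 10 * 1024)   # four records of 10 KB each
loader.close()
print(batch_sizes)
```

Because the input is already grouped by series and sorted by time, each flushed batch consists of runs that land on the same or adjacent data pages, which is the point of the pre-sorting requirement.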

IBM AMT-Sybex Benchmark
https://www.ibm.com/developerworks/forums/thread.jspa?threadID=391263


Purging TimeSeries
DelClip(): deletes elements but leaves the page allocated.
DelTrim(): deletes elements and reclaims space only if deleting data at the end of the TimeSeries.
DelRange(): deletes elements and reclaims space at any location in the TimeSeries.

Purge Details
Purge is very similar to load.
Key to success is parallelism:
    Run multiple purge operations.
    Never run two purges on the same container at the same time.
Future work:
    Attach/detach container partitions.

Container Usage
Calculate the rate of container usage during loads and purges with:
    TSContainerUsage()
    TSContainerTotalUsed()
    TSContainerTotalPages()
Example:
    execute function tscontainerusage('container1');

    pages     slots       total
    3952620   586099080   16515019
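Turning two usage readings into a rate is simple arithmetic, sketched here in Python. The first reading uses the figures from the slide; the second reading, the 60 second gap, and both function names are hypothetical. The sketch also assumes that "pages" counts pages holding data and "total" counts pages allocated to the container.

```python
def usage_rate(sample_a, sample_b):
    """Pages consumed per second between two (seconds, pages) readings.

    A negative result means pages were freed, e.g. during a purge.
    """
    (t_a, pages_a), (t_b, pages_b) = sample_a, sample_b
    if t_b <= t_a:
        raise ValueError("second sample must be later than the first")
    return (pages_b - pages_a) / (t_b - t_a)

def fill_ratio(pages, total):
    """Fraction of the container's allocated pages that hold data."""
    return pages / total

# First reading from the slide; second reading is made up for illustration.
rate = usage_rate((0, 3952620), (60, 3958620))
ratio = fill_ratio(3952620, 16515019)
print(f"{rate:.1f} pages/s, {ratio:.1%} full")
```

Sampling each container at a fixed interval during a load shows quickly whether one loader is lagging behind the others.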

UDR Cache
Try to achieve one UDR per list; tune with PC_HASHSIZE / PC_POOLSIZE.
    onstat -g cac prc

    list#  id   ref_cnt  dropped?  heap_ptr         udr name
    --------------------------------------------------------------
    8      561  0        0         70000011eb4f838  test@jmcmahon:.getindex
    11     608  0        0         70000011d525438  test@jmcmahon:.clipgetcount
    11     565  0        0         70000011ed02038  test@jmcmahon:.tscreate
    11     176  0        0         70000011c467c38  test@jmcmahon:.deepcopy
    18     522  0        0         70000011dd47038  test@jmcmahon:.assign
    63     579  23       0         70000011d52a838  test@jmcmahon:.putelemnodups
    63     591  0        0         70000011eb50838  test@jmcmahon:.getnthelem
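Checking the "one UDR per list" goal by eye gets tedious on a busy server, so the count can be scripted. The Python sketch below parses output shaped like the listing on this slide; it assumes six whitespace-separated columns per data row, and the function name udrs_per_list is hypothetical.

```python
from collections import Counter

def udrs_per_list(onstat_text):
    """Count cached UDR entries per hash list number."""
    counts = Counter()
    for line in onstat_text.splitlines():
        fields = line.split()
        # Data rows have six columns and start with the numeric list number.
        if len(fields) == 6 and fields[0].isdigit():
            counts[int(fields[0])] += 1
    return counts

sample = """\
list# id ref_cnt dropped? heap_ptr udr name
--------------------------------------------------------------
8 561 0 0 70000011eb4f838 test@jmcmahon:.getindex
11 608 0 0 70000011d525438 test@jmcmahon:.clipgetcount
11 565 0 0 70000011ed02038 test@jmcmahon:.tscreate
11 176 0 0 70000011c467c38 test@jmcmahon:.deepcopy
18 522 0 0 70000011dd47038 test@jmcmahon:.assign
63 579 23 0 70000011d52a838 test@jmcmahon:.putelemnodups
63 591 0 0 70000011eb50838 test@jmcmahon:.getnthelem
"""

counts = udrs_per_list(sample)
crowded = sorted(n for n, c in counts.items() if c > 1)
print("lists holding more than one UDR:", crowded)
```

In the sample from the slide, lists 11 and 63 each hold more than one UDR, which is the situation the hash size setting is meant to reduce.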

Preload Shared Libraries
Use the PRELOAD_DLL_FILE onconfig parameter:
    PRELOAD_DLL_FILE $INFORMIXDIR/extend/TimeSeries.5.00.FC3/TimeSeries.bld
    PRELOAD_DLL_FILE $INFORMIXDIR/extend/TimeSeries.5.00.FC3/tsbloader.bld

Questions?!?

Maximizing Performance with Informix TimeSeries
Jeffrey McMahon
jmcmahon@us.ibm.com