6 IQ Indexes.


1 6 IQ Indexes

2 Indexes
A typical RDBMS uses a B-tree index. This is usually a separate structure in addition to the data, consisting of pages that hold key values and pointers. Sybase (and others) also has a special type whose structure contains both the pointers and the data pages. B-tree indexes are expensive: they consume space and time (to build), they require maintenance after a data refresh, and they are not useful on low-cardinality data columns.
December 6, 2018

3 B-Tree – an example
Let's assume a table with key values from 1 to 8 and build the B-tree. The decision at each node is now very simple. The question asked is "Is the required value less than or equal to the value in the tree?" If it is, take the left fork; otherwise go right.
Figure: index levels (keys) 4, 2, 1, 3, 6, 5, 8, 7 above the leaf level (data).
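The decision rule above can be sketched in a few lines of Python. This is a toy model, not IQ or ASE internals: each internal node holds a separator equal to the largest key in its left subtree, so "less than or equal" means "go left", and the leaves hold the data values 1 to 8. The names `build` and `search` are made up for illustration.

```python
def build(keys):
    """Build a balanced tree from sorted keys.

    Internal nodes are (separator, left, right); the separator is the largest
    key in the left subtree, so "value <= separator" means "take the left fork".
    """
    if len(keys) == 1:
        return keys[0]  # leaf level: the data itself
    mid = (len(keys) + 1) // 2
    left, right = keys[:mid], keys[mid:]
    return (left[-1], build(left), build(right))


def search(node, value):
    """Follow the <= / > forks; return (leaf or None, separators visited)."""
    path = []
    while isinstance(node, tuple):
        separator, left, right = node
        path.append(separator)
        node = left if value <= separator else right
    return (node if node == value else None), path


tree = build(list(range(1, 9)))
print(tree[0])          # 4
print(search(tree, 7))  # (7, [4, 6, 7])
```

Every lookup visits one separator per level, which is why a B-tree search costs a handful of page reads even for large tables.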

4 IQ-M Indexes
Indexes are the data (a data access method); there is no separate data store. A column is likely to have multiple indexes and always has at least one. Index selection for a column is based upon the number of discrete values (cardinality), its usage in queries and, to some extent, the column's data type.

5 Index Types – Now 10 (13 in 12.5)! - 1
Default Index (aka Fast Projection or FP): raw data compressed on disk; comes in 3 flavours. Low Fast (LF): bitmap index. High Non Group (HNG): bit-wise index. High Group (HG): G-Array (a relative of a B-tree); comes in 3 flavours.

6 Index Types – Now 10 (13 in 12.5)! - 2
Compare Index (CMP): column compare index. Word Index (WD): a sort of "free text" index. New in 12.5: the DATE, TIME and DTTM (datetime) indexes.

7 Default Index - FP
Created automatically by Create Table. Used for ad-hoc joins, string searches, certain calculations and projection of data. This index cannot be created or dropped.

8 IQ Unique Constraint - 1
IQ UNIQUE improves load processing. For example:

    CREATE TABLE department (
        dept_id   CHAR(4)     NOT NULL IQ UNIQUE(200),
        emp_lname VARCHAR(25) NOT NULL IQ UNIQUE(75000)
        -- remaining columns not shown on the slide
    );

IQ will construct a 1-byte FP index for dept_id. It won't try either a 1-byte or a 2-byte FP on emp_lname: 75,000 values are too many for either.

9 IQ Unique Constraint - 2
In terms of query performance, the 1-byte and 2-byte FP indexes can speed up the server in two ways. First, searching an FP (of any type) can be done in parallel, which may be faster than a so-called fast index (LF or HG). Second, for a LIKE operator the 1-byte and 2-byte FP can be faster than an HNG index, depending on the width of the search predicate and the width of the FP index (1 or 2 bytes).

10 IQ Unique Constraint - 3
There is no mileage in setting IQ UNIQUE(255) on every column at index/table creation time. The incremental cost of rolling over to a 2-byte FP and then to a Flat FP is very high for larger tables (above a million rows). Testing at 50 million rows indicates that the 1-byte to 2-byte rollover increases per-index load time by 200 times, and the 2-byte to Flat FP transition increases it by another 300 times. But let's think about the lookup pages…

11 IQ UNIQUE Constraint – 4
Consider using the Minimize_Storage option in Sybase IQ. This places an IQ UNIQUE(255) on every column of every table created, removing the need to specify IQ UNIQUE yourself. If the column's cardinality is <= 255, IQ places a 1-byte FP index on the column (1 byte of storage per row). If it is > 255 but <= 65,536, IQ places a 2-byte FP index on the column (2 bytes of storage per row). This may slightly hinder data loads, but improves query speeds. It may also incur a one-time slight load slowdown while a 2-byte FP is converted to a Flat FP, but this usually happens during the first load.

12 FP Index Types
Depending on cardinality and the use of IQ UNIQUE, IQ-M will initially construct an FP index in one of 3 ways: fewer than 256 distinct values gives a 1-byte FP; between 256 and 65,536 distinct values gives a 2-byte FP; more than 65,536 distinct values gives a Flat FP.
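As a minimal sketch, the thresholds on this slide can be written as a function of the column's distinct-value count. The name `fp_type` is hypothetical; the real server also takes IQ UNIQUE, Minimize_Storage and FP_Lookup_Size into account, none of which is modelled here.

```python
def fp_type(distinct_values):
    """FP flavour implied by this slide's cardinality thresholds."""
    if distinct_values < 256:
        return "1-byte FP"
    if distinct_values <= 65_536:
        return "2-byte FP"
    return "Flat FP"

for n in (200, 5_000, 75_000):
    print(n, fp_type(n))
```

With these thresholds, the IQ UNIQUE(200) column from the department example lands in a 1-byte FP and the IQ UNIQUE(75000) column in a Flat FP, matching slide 8.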

13 Flat FP Index
Used when there are more than 65,536 values in a column. The raw data is compressed (on disk).
Figure: the Color column (Red, Blue, Green, Red) stored as-is in the Flat FP.

14 The Flat FP Index Structure

15 1-byte FP Index
With fewer than 256 values in a column, a one-byte lookup table is built.
Figure: Color data values (Red, Blue, Green, ...) replaced by 1-byte codes that index a lookup table (1 = Red, 2 = Blue, 3 = Green).

16 2-byte FP Index
With between 256 and 65,536 values, the data is stored as codes into a two-byte lookup table.
Figure: Color data values (Red, Blue, Green) mapped through lookup table entries 1, 2, 3.
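The lookup-table idea behind the 1-byte and 2-byte FP is ordinary dictionary encoding. The sketch below assumes nothing about IQ's on-disk format: it uses Python's `array` module with typecode `"B"` (1-byte) or `"H"` (2-byte on common platforms) to hold one code per row, and the helper names are hypothetical. Codes start at 0 here, whereas the slides number lookup entries from 1.

```python
from array import array

def encode_fp(values):
    """Dictionary-encode a column: a lookup table of distinct values plus one
    small integer code per row (codes start at 0 in this sketch)."""
    lookup = []      # code -> value: the lookup table
    code_of = {}     # value -> code
    codes = []
    for v in values:
        if v not in code_of:
            code_of[v] = len(lookup)
            lookup.append(v)
        codes.append(code_of[v])
    if len(lookup) < 256:
        typecode = "B"   # 1-byte unsigned codes
    elif len(lookup) <= 65_536:
        typecode = "H"   # 2-byte unsigned codes on common platforms
    else:
        raise ValueError("more than 65,536 distinct values: a Flat FP would be used")
    return lookup, array(typecode, codes)

def decode_fp(lookup, codes):
    """Project the original values back out through the lookup table."""
    return [lookup[c] for c in codes]

lookup, codes = encode_fp(["Red", "Blue", "Green", "Red"])
print(lookup)                       # ['Red', 'Blue', 'Green']
print(list(codes), codes.itemsize)  # [0, 1, 2, 0] 1
```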

17 FP Indexes
During a load, IQ-M will try to build a Flat FP first, unless otherwise specified. If you specify a low-cardinality 1-byte FP at the start (by using IQ UNIQUE), it will resort to a 2-byte FP after 256 values, and then resort to a Flat FP after approximately 65,536 values or n lookup pages (see two slides on…). If you specify a 2-byte FP at the start (by using IQ UNIQUE), it will likewise resort to a Flat FP once that limit is passed.

18 FP Index Growth
As cardinality grows, an FP can change (1-byte FP to 2-byte FP, or 2-byte FP to Flat FP), but it can never revert. There is a high cost in conversion, either way. The only backwards conversion is to drop the column (not the index), recreate it and reload it, which is expensive.
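The one-way growth can be modelled as a small state machine. `GrowingFP` is a hypothetical class; it uses the same distinct-count thresholds as slide 12 and ignores the lookup-page limit, so it only illustrates that loads can promote the FP while deletes never demote it.

```python
class GrowingFP:
    """Toy model of FP rollover: 1-byte -> 2-byte -> Flat, never backwards."""
    ORDER = ("1-byte FP", "2-byte FP", "Flat FP")

    def __init__(self, start="1-byte FP"):
        self.kind = start        # e.g. what IQ UNIQUE asked for at create time
        self.distinct = set()

    def load(self, values):
        self.distinct.update(values)
        n = len(self.distinct)
        needed = "1-byte FP" if n < 256 else "2-byte FP" if n <= 65_536 else "Flat FP"
        if self.ORDER.index(needed) > self.ORDER.index(self.kind):
            self.kind = needed   # roll forward only
        return self.kind

    def delete(self, values):
        self.distinct.difference_update(values)
        return self.kind         # fewer distinct values never shrinks the FP

fp = GrowingFP()
print(fp.load(range(100)))         # 1-byte FP
print(fp.load(range(100, 300)))    # 2-byte FP
print(fp.delete(range(100, 300)))  # 2-byte FP (no reverse conversion)
```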

19 Limit for 2-byte FP - 1
FP_Lookup_Size (default 32767). This option caps the size of the lookup table, and therefore the maximum number of discrete values, that a 1- or 2-byte FP can hold. Note that the value is in Kbytes. By default, a 2-byte FP to Flat FP flip will occur when the lookup table grows beyond 32 MB. Note: if you have a 2-byte FP created before this option was introduced, it is constrained by FP_Number_Lookup_Pages instead.

20 Limit for 2-byte FP - 2
FP_Lookup_Size (default 32767). For a bigint of 8 bytes, 65,536 entries take only around 1+ MB. For a maximum-width varchar of 255 bytes, 65,536 entries take no more than 18 MB. It is critical to keep the entire ByteStore (lookup table) in memory for performance reasons.
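The 65,536-entry figures follow from simple multiplication. The sketch below computes only the raw payload (entry width times 65,536); the slide's "1+ MB" and "no more than 18 MB" presumably include page and entry overheads that are not modelled here. `raw_lookup_bytes` is a made-up helper.

```python
ENTRIES = 65_536             # distinct values a 2-byte code can address
FP_LOOKUP_SIZE_KB = 32_767   # default FP_Lookup_Size from the slide (Kbytes)

def raw_lookup_bytes(width_bytes, entries=ENTRIES):
    """Raw payload of a full lookup table; page and entry overheads ignored."""
    return width_bytes * entries

for label, width in (("bigint", 8), ("varchar(255)", 255)):
    size = raw_lookup_bytes(width)
    within = size <= FP_LOOKUP_SIZE_KB * 1024
    print(f"{label}: {size / 2**20:.2f} MiB raw, within default limit: {within}")
```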

21 Lookup Pages
If we specified IQ UNIQUE(255) for all columns, then all columns would have a pinned lookup page (or pages) in memory. If we don't have a lot of memory for the caches, we could flood memory with lookup pages. This needs further thought…

22 Use of 1 and 2-byte FP Indexes
Access paths: in newer releases, more operations can be performed on the lookup table of the FP instead of on the other indexes. These include LIKE (simple and complex) and simple predicates such as:
    YEAR(column_name) > '1995'
    SQRT(column_name) < 100
    DATEPART(hh, column_name) BETWEEN 10 AND 12
The resulting "scan" is also performed in parallel (if required).

23 Optimized FP Indexes
The IQ query engine takes even more advantage of the optimized FP indexes (1- and 2-byte FP indexes) to improve query performance. The DBA may periodically rebuild the FP indexes with a new system procedure, sp_iqrebuildindex, which rebuilds the FP index(es) for a table or column.

24 CASE - 1
CASE RESPECT is the fastest; there is a 10-20% hit going to CASE IGNORE.
Implications for FP indexes: regardless of RESPECT vs. IGNORE, all 1-byte and 2-byte FP indexes store every distinct binary value of the data. So ABC, abc and Abc are all stored, even for CASE IGNORE.

25 CASE - 2
Remember that because every case variant is stored, we can roll over from a 1-byte to a 2-byte FP, or from a 2-byte to a Flat FP, where we might not want to. A solution is to set the server to CASE RESPECT (because it is faster), then use an ETL tool to rtrim() and ucase() or lcase() all of the incoming character data.

26 CASE - 3
The HG index stores data in what is called "conditioned" mode. For a CASE IGNORE database there is only one entry per logical value (ABC = abc = Abc, etc.). For a CASE RESPECT database there has to be one entry per value (ABC != abc != Abc, etc.).

27 CASE – 4
For an LF index we hold partially conditioned values: for both CASE RESPECT and CASE IGNORE, every value has a bitmap. This can be wasteful of space. The reason for this is twofold: to allow the FP index to be recreated from the LF, and to allow for some rare cases (some GROUP BYs) where we still project values from an LF index.

28 Low Fast Index
A traditional bitmap index for low cardinality: fewer than 10,000 unique values in a column. It can be unique. It is required for performance involving joins; GROUP BY; MIN, MAX and other functions; and WHERE clause predicates (equality/inequality, ranges, IN lists).

29 Bitmap Indexes
What are bitmaps? A bitmap represents one value of a field: a bit is 1 (true) where the row holds that value and 0 (false) otherwise, and each bit position corresponds to a fixed row ID. For each discrete value there is one bitmap, whose length is the number of rows in the table.

30 Digression on Bitmaps - 1
If there are 7,000,000 rows in the table, each bitmap will be 7,000,000 bits long: 875,000 bytes (almost a megabyte). If there are 1,000 possible values in the column, this could mean approximately 1 Gbyte for the column. Is this correct?
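The arithmetic above, spelled out for uncompressed bitmaps (one bit per row, no page structure or compression, which the next slides address):

```python
rows = 7_000_000
distinct_values = 1_000

bytes_per_bitmap = -(-rows // 8)   # one bit per row, rounded up to whole bytes
column_bytes = bytes_per_bitmap * distinct_values

print(f"{bytes_per_bitmap:,} bytes per bitmap")          # 875,000 bytes per bitmap
print(f"{column_bytes / 2**30:.2f} GiB for the column")  # 0.81 GiB for the column
```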

31 Digressions on Bitmaps - 2
Yes, sort of… In IQ-M there are 4 ways of holding a bitmap page, chosen according to the page's contents: an all-zero bitmap, few 1s, 20-80% 1s, almost all 1s, or an all-1 bitmap (few 1s and almost all 1s share one storage form).

32 Bitmap "Types"
An all-zero bitmap page is not stored; there is just an entry in the block map. An all-1 bitmap page is also not stored; it has a similar entry in the block map. For the 20-80% 1s case there is a real bitmap. For nearly-all-1s or nearly-all-0s pages the data is run-length encoded.
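The choice between the four storage forms can be sketched as a density test. The 20% and 80% boundaries come from the slide; the exact cut-offs IQ applies, and the `bits_per_page` parameter, are assumptions of this sketch.

```python
def page_storage(bits_set, bits_per_page):
    """Storage form for one bitmap page, using the slide's density bands."""
    if bits_set == 0:
        return "block-map entry (all 0s)"   # page not stored
    if bits_set == bits_per_page:
        return "block-map entry (all 1s)"   # page not stored
    density = bits_set / bits_per_page
    if 0.2 <= density <= 0.8:
        return "plain bitmap"
    return "run-length encoded"             # very few 1s or very few 0s

for n in (0, 5, 500, 995, 1000):
    print(n, page_storage(n, 1000))
```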

33 Run Length Encoding
Used when there is a very sparse set of bits set (or not set). Very efficient on storage. Example: 1-50,90,102-135,1090-4573,7833,9011-11430,...
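Run-length encoding of a sparse page can be illustrated by collapsing consecutive set-bit positions into ranges, using the same notation as the example above. This shows the idea only; IQ's actual on-disk RLE format is not described in these slides, and the function names are illustrative.

```python
def to_runs(positions):
    """Collapse sorted set-bit positions into 'start-end' runs."""
    runs = []
    for p in positions:
        if runs and p == runs[-1][1] + 1:
            runs[-1][1] = p      # extend the current run
        else:
            runs.append([p, p])  # start a new run
    return ",".join(str(a) if a == b else f"{a}-{b}" for a, b in runs)

def from_runs(text):
    """Expand the run notation back into individual bit positions."""
    out = []
    for part in text.split(","):
        start, _, end = part.partition("-")
        out.extend(range(int(start), int(end or start) + 1))
    return out

bits = [*range(1, 51), 90, *range(102, 136)]
print(to_runs(bits))  # 1-50,90,102-135
```

Eighty-five set bits collapse into three short runs, which is why RLE pays off only when the page is very sparse (or very dense, encoding the zeros instead).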

34 Bitmap Indexes (continued)
Each unique value has its own bitmap. Bitmap indexes are designed for incremental additions of rows.
Query:
    select count(*) from customers where state = 'AL'
For the state column of a table, a Low Fast index would be appropriate since there are only 50 distinct values. Sybase IQ will build 50 bitmaps, one for each state. To process this query, Sybase IQ selects the bitmap for "AL" and counts the 1s.
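A minimal sketch of the Low Fast idea, assuming row IDs numbered from 0 and using Python integers as arbitrarily long bit strings. Appending rows only sets new, higher bits, which is why bitmaps suit incremental additions. The data and the `build_lf` name are illustrative.

```python
from collections import defaultdict

def build_lf(column):
    """One bitmap per distinct value; bit i is set when row i holds that value."""
    bitmaps = defaultdict(int)
    for row_id, value in enumerate(column):
        bitmaps[value] |= 1 << row_id
    return bitmaps

state = ["AL", "CA", "AL", "NY", "AL", "CA"]
lf = build_lf(state)

# select count(*) from customers where state = 'AL'
print(bin(lf["AL"]).count("1"))  # 3
```

An IN list is just an OR of bitmaps: `bin(lf["AL"] | lf["NY"]).count("1")` counts the rows in either state.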

35 High Group Index - 1
For high-cardinality data columns: more than 1000 unique values. It can enforce uniqueness, with a special internal structure for unique HG indexes. It is created automatically by Create Table for columns with a UNIQUE or PRIMARY KEY constraint (regardless of cardinality).

36 High Group Index - 2
Figure: a B-tree index over the values a, b and c, whose entries point to G-Array row-id lists (4 | 1,2 | 3,5,6). New blocks can be added into the linked list.

37 High Group Index - 3
Much faster for load. Much faster for skewed data: gives bitmaps directly to the optimizer.
Figure: B-tree index entries (a, B, C) with pointers into G-Array pages. When a page is completely filled with one value, the array is converted to a bitmap.

38 High Group Index - 4
Required for performance on high-cardinality columns used for joins, SELECT DISTINCT, COUNT DISTINCT and GROUP BY. It takes up the most space in the database, requires the longest time to load/delete, and cannot be used with certain data types.

39 High Group Transition - 1
The point at which a High Group G-Array transitions from a list of rowids to a bitmap depends on the IQ page size:

    Page Size     Transition Point
    64 Kbytes      4,096 rowids
    128 Kbytes     8,192 rowids
    256 Kbytes    16,384 rowids
    512 Kbytes    32,768 rowids

This is quite important when calculating how large indexes are likely to grow, and hence potentially which index to use for a given column (datatype/cardinality).

40 High Group Transition - 2
Remember that when the G-Array entry is a list of rowids it is, at most, one page long for each value. A bitmap is a lot bigger (for larger tables):

    Page Size     # rows in a 1-page bitmap
    64 Kbytes       640,000 rowids
    128 Kbytes      1.2 M rowids
    256 Kbytes      2.4 M rowids
    512 Kbytes      5.1 M rowids

So if you are running a 64 KB database and have a 2 million row table, when a G-Array entry flips to a bitmap it grows the G-Array part of the index by 3 pages (for each and every value that flips): the bitmap needs 4 pages (2,000,000 / 640,000, rounded up) in place of the single list page.
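The two tables can be combined to estimate index growth. This sketch copies the figures from the tables (page size in Kbytes), assumes a value flips once its rowid count exceeds the transition point, and assumes the resulting bitmap spans the whole table uncompressed; the helper names are hypothetical.

```python
import math

# Figures from the two tables above, keyed by IQ page size in Kbytes.
TRANSITION_ROWIDS = {64: 4_096, 128: 8_192, 256: 16_384, 512: 32_768}
ROWS_PER_BITMAP_PAGE = {64: 640_000, 128: 1_200_000, 256: 2_400_000, 512: 5_100_000}

def flips_to_bitmap(page_kb, rows_for_value):
    """True once a value has more rowids than one G-Array list page holds."""
    return rows_for_value > TRANSITION_ROWIDS[page_kb]

def extra_pages_after_flip(page_kb, table_rows):
    """Growth when a one-page rowid list becomes a bitmap spanning the table."""
    bitmap_pages = math.ceil(table_rows / ROWS_PER_BITMAP_PAGE[page_kb])
    return bitmap_pages - 1

print(flips_to_bitmap(64, 5_000))             # True
print(extra_pages_after_flip(64, 2_000_000))  # 3
```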

41 Specialised High Group - 1
When a column is created (or altered) to have a UNIQUE or PRIMARY KEY constraint, a unique HG index is created on it automatically. This is an HG without a G-Array.

42 Specialised High Group - 2
Provided the combination of two or more columns is unique, you can create a unique multi-column HG index on the combination of those columns. You may still create other indexes on the base columns. New in 12.5: a non-unique multi-column index (required for the referential integrity process).

43 High Non Group Index (HNG)
A bit-wise index: data is stored in binary and vertically partitioned (patented by Sybase). It cannot be unique and cannot be used with certain data types. It is used for range searches on columns of any cardinality, and for aggregation (SUM and AVG functions).

44 HNG - High Card Bit-Wise Index
Data with a large number of values is stored in binary form. The data is sliced vertically, so each bit position can be manipulated separately. Many bit positions are either all on or all off, so with compression no storage space is required for them.

45 HNG Index Processing
For the query select sum(sales) from customers, ASIQ performs the sum as follows:
    (# rows with the 1 bit on)*1 + (# rows with the 2 bit on)*2 + (# rows with the 4 bit on)*4 + (# rows with the 8 bit on)*8
    = (6*1) + (4*2) + (4*4) + (4*8) = 62
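The bit-sliced sum can be reproduced in a short sketch. The `sales` values are made up so that the 1, 2, 4 and 8 slices have 6, 4, 4 and 4 rows set, matching the slide; only non-negative integers are handled, and the function names are illustrative.

```python
def build_hng(values, bits=8):
    """Slice k is a bitmap (Python int) of the rows whose value has bit k on."""
    slices = [0] * bits
    for row_id, v in enumerate(values):
        for k in range(bits):
            if (v >> k) & 1:
                slices[k] |= 1 << row_id
    return slices

def hng_sum(slices):
    """SUM = sum over k of (number of rows with bit k on) * 2**k."""
    return sum(bin(s).count("1") << k for k, s in enumerate(slices))

sales = [15, 15, 15, 15, 1, 1]
slices = build_hng(sales, bits=4)
print([bin(s).count("1") for s in slices])  # [6, 4, 4, 4]
print(hng_sum(slices))                      # 62
```

The cost depends on the number of bits in the data type, not on the number of distinct values, which is the basis of the HNG-versus-LF guidance later in the deck.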

46 HNG Indexes with Other Indexes
Columns of any cardinality also need an HNG index if they are used with aggregates (SUM, AVG), in range searches or in root string searches, for example where cust_name like 'Stan%'. All other string searches will use the FP index.

47 Compare (CMP) Index
The CMP index is used for comparing 2 columns in the same table. It is really just 3 bitmaps: a "less than" bitmap, an "equal to" bitmap and a "greater than" bitmap. Performance of t1.col1 > t1.col2 is substantially improved, while load times are only marginally affected (<1%).
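Conceptually the CMP index can be built in one pass over the two columns. This sketch ignores NULLs and data-type conversion, which the real index must handle, and uses Python integers as bitmaps; the names are hypothetical.

```python
def build_cmp(col1, col2):
    """Three bitmaps over row ids: col1 < col2, col1 = col2, col1 > col2."""
    lt = eq = gt = 0
    for row_id, (a, b) in enumerate(zip(col1, col2)):
        bit = 1 << row_id
        if a < b:
            lt |= bit
        elif a == b:
            eq |= bit
        else:
            gt |= bit
    return lt, eq, gt

def set_rows(bitmap):
    """Row ids whose bit is on."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]

lt, eq, gt = build_cmp([5, 3, 9, 4], [2, 3, 10, 1])
print(set_rows(gt))  # t1.col1 > t1.col2 -> [0, 3]
```

Other comparisons combine the three bitmaps, e.g. `col1 >= col2` is `gt | eq`.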

48 Word (WD) Index
This is a specialised index that indexes each and every "word" in a column. It is used for char(), varchar() and long varchar() columns and is slightly faster to load than an equivalent HG index on the column. It is accessed with the CONTAINS predicate, for example:
    where t1.col1 contains ('Richard', 'Soundy')

49 Word (WD) Index 12.5
In 12.5 the WD index also supports the LIKE clause, but LIKE is only accelerated by the WD index if the token is delimited:
    like 'Richard'      -> handled by HG or LF
    like '%Richard%'    -> handled by FP
    like 'Richard%'     -> handled by HNG/HG/LF (range query)
    like '% Richard %'  -> handled by WD (note the spaces)

50 Word Index Use
The delimiters and the maximum length of the entries can be set for the WD index at index creation time. By default, the delimiters are all the ASCII characters not defined as numeric or alphabetic. LIMIT sets the maximum size of each entry:

    CREATE WD INDEX earnings_wd ON earnings_report_table(earn_col)
        DELIMITED BY ' :;.' LIMIT 25
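A minimal inverted-index sketch of the WD behaviour described on these slides: the default delimiters are every ASCII character that is not a letter or digit, and CONTAINS returns the rows holding all listed words. Case handling and the LIMIT on entry length are not modelled, and all names here are illustrative.

```python
import string
from collections import defaultdict

# Default delimiters: every ASCII character that is not a letter or a digit.
DEFAULT_DELIMITERS = set(map(chr, range(128))) - set(string.ascii_letters + string.digits)

def tokenize(text, delimiters=DEFAULT_DELIMITERS):
    """Split text into words at any delimiter character."""
    words, word = [], []
    for ch in text:
        if ch in delimiters:
            if word:
                words.append("".join(word))
                word = []
        else:
            word.append(ch)
    if word:
        words.append("".join(word))
    return words

def build_wd(column, delimiters=DEFAULT_DELIMITERS):
    """Inverted index: word -> set of row ids containing it."""
    index = defaultdict(set)
    for row_id, text in enumerate(column):
        for w in tokenize(text, delimiters):
            index[w].add(row_id)
    return index

def contains(index, *words):
    """Rows containing every listed word, like CONTAINS ('Richard', 'Soundy')."""
    sets = [index.get(w, set()) for w in words]
    return set.intersection(*sets) if sets else set()

notes = ["Richard Soundy wrote this", "Richard: IQ tips", "Soundy; Richard."]
wd = build_wd(notes)
print(contains(wd, "Richard", "Soundy"))  # {0, 2}
```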

51 DATE, TIME and DTTM Indexes - 1
There are 3 new indexes in 12.5: the DATE, TIME and DTTM indexes. They are not generated automatically; each required index needs a manual CREATE INDEX command. Each index has a complex internal structure, shown on the next page(s).

52 DATE, TIME and DTTM Indexes - 2
Components: day, month, year, hour, minute, second, day of week, quarter of year, week of year.
Figure: a matrix marking which of these components belong to the DTTM, DATE and TIME indexes.
The components of the index are persistent bitmaps, similar in construction to an LF bitmap structure.
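To illustrate how component bitmaps let a predicate such as datepart(month, ...) avoid scanning the raw dates, here is an LF-style sketch with one bitmap per component value. It is a simplified model: it covers date components only, uses ISO week numbering (IQ's convention may differ), and does not reproduce exactly which components each of DATE, TIME and DTTM holds.

```python
from collections import defaultdict
from datetime import date

COMPONENTS = {
    "day": lambda d: d.day,
    "month": lambda d: d.month,
    "year": lambda d: d.year,
    "day_of_week": lambda d: d.isoweekday(),
    "quarter": lambda d: (d.month - 1) // 3 + 1,
    "week_of_year": lambda d: d.isocalendar()[1],
}

def build_date_index(dates):
    """component -> value -> bitmap of row ids (Python ints as bitmaps)."""
    index = {name: defaultdict(int) for name in COMPONENTS}
    for row_id, d in enumerate(dates):
        for name, part in COMPONENTS.items():
            index[name][part(d)] |= 1 << row_id
    return index

dates = [date(2018, 1, 5), date(2018, 6, 1), date(2019, 1, 20), date(2018, 12, 6)]
idx = build_date_index(dates)
# where datepart(month, date_column) = 1  -> one bitmap lookup
print(bin(idx["month"][1]))  # 0b101 (rows 0 and 2)
```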

53 DATE, TIME and DTTM Indexes - 3
The query engine is fully conversant with the new indexes, and DATEPART(), DATEDIFF() etc. should now run much faster than with the existing implementations. Note also that the query engine will use the new indexes even for syntax such as:
    where datepart(month, date_column) = 1

54 Indexing Strategy
Some columns may now need 2 or 3 (or even more) indexes. Possible combinations of "basic" indexes: FP + LF; FP + LF + HNG; FP + HG; FP + HG + HNG. An unlikely combination is FP + LF + HG (the HG will always be used in preference to the LF).

55 Assigning Indexes
Every column has the default index (FP): the Create Table command builds it, and it cannot be dropped. No additional indexes are needed for columns used only for projection (in the select list), or for columns with the 'bit' datatype.

56 High Group – 1
Regardless of cardinality, put an HG on: any primary key (unless it is defined in CREATE or ALTER TABLE, where the index is created automatically); any UNIQUE column (with the same exception); and any column used in a join.

57 High Group – 2
Why do the above? It is better and generally faster, especially on low-cardinality columns. This is a change from previous thinking.

58 High Group – Warning - 1
A High Group index is the most complex index. It takes the longest to load (and delete). The load time is directly proportional to the length (width) of the data component, so smaller data (in size, not cardinality) is better. Cardinality is also an issue: the higher the number of "new" values in an incremental load, the slower the load can get. Can the data be broken into more than one column?

59 High Group – Warning - 2
For 12.5 there have been multiple changes to the HG load process that remove some of the unfortunate issues with incremental HG loads and improve overall HG load performance. These include removing multiple B-tree walks (which provided a good performance improvement) and changes to the G-Array split code, which have vastly improved incremental load performance, especially the "second" load to a table. There is also no real need to play with the G-Array options, as the loading code works best with the defaults.

60 Low Fast or High Group
If a column is not a primary key, not UNIQUE and not used in a join (ad-hoc or join index), but is still used for more than just projection, it will require another one or two indexes.

61 Low Fast or High Group
Put an HG or an LF (depending on cardinality) on the following column types: columns used in SELECT statements with the functions MIN, MAX, COUNT and COUNT(DISTINCT); columns used in WHERE clause predicates (equality/inequality, IN lists, EXISTS); columns used in GROUP BY; and columns used in ORDER BY.

62 Additional Indexes Misconception
The 1000-cardinality break point is not a fixed limit; the absolute limit for an LF is a cardinality of 9,999. There are no hard and fast rules: a 5,000-cardinality LF may work faster in one case, while in another an 800-cardinality HG may be faster. Remember that an LF is much faster to load data into than an HG, although for larger cardinalities an LF may well be larger on disk, and hence slower for query operations.

63 Regardless of Cardinality
Build an HNG index on all columns used in BETWEEN or range comparisons, with AVG() or SUM() functions, or in root string searches. Examples:
    where sold_date between '1/5/99' and '2/5/99'
    where revenue > 1000
    select sum(revenue) from ...
    where customer_name like 'Syb%'
But remember that a 1- or 2-byte FP may do the above faster!
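The rules on slides 55 to 63 can be condensed into a rough helper. This is a hypothetical `suggest_indexes` function encoding these slides' rules of thumb (including the soft 1,000-value LF/HG break point), not an IQ feature; as slide 62 stresses, the result is a starting point to test, not a hard rule.

```python
def suggest_indexes(cardinality, *, key_or_join=False, filtered=False,
                    range_or_aggregate=False, datatype="int", lf_break=1_000):
    """Rule-of-thumb index set drawn from these slides; always test in practice."""
    indexes = ["FP"]                  # every column has the default index
    if datatype == "bit":
        return indexes                # no additional index for bit columns
    if key_or_join:
        indexes.append("HG")          # PK, UNIQUE or join: HG regardless of cardinality
    elif filtered:                    # WHERE, GROUP BY, ORDER BY, MIN/MAX/COUNT
        indexes.append("LF" if cardinality < lf_break else "HG")
    if range_or_aggregate:            # BETWEEN, ranges, SUM, AVG, LIKE 'abc%'
        indexes.append("HNG")
    return indexes

print(suggest_indexes(50, filtered=True, range_or_aggregate=True))  # ['FP', 'LF', 'HNG']
print(suggest_indexes(50, key_or_join=True))                        # ['FP', 'HG']
```

A column used only in the select list gets nothing beyond the FP, matching slide 55.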

64 HNG and LF
It is also suggested that columns used in range searches have both an HNG and an LF on them, if the LF can be applied (for cardinality reasons). Why? The optimiser can (in certain circumstances) use either an HNG or an LF to satisfy a range query, so let's have both, if we can afford the load and delete overheads.

65 HNG or LF (?)
An HNG will be faster than an LF for SUM() whenever the number of distinct values is larger than the number of bits in the column's data type. An HNG will be faster than an LF for range predicates whenever the number of distinct values within the selected range exceeds ~2.5 times the number of bits in the column's data type. For very low cardinality join columns, an HG will be the fastest index.
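The two rules of thumb above, written as predicates. One plausible reading of why they hold: an HNG SUM reads one bit slice per bit of the data type, while an LF SUM reads one bitmap per distinct value. The function names are hypothetical.

```python
def hng_beats_lf_for_sum(distinct_values, datatype_bits):
    """HNG wins SUM() once cardinality exceeds the data type's bit width."""
    return distinct_values > datatype_bits

def hng_beats_lf_for_range(distinct_in_range, datatype_bits):
    """HNG wins a range predicate once the range spans > ~2.5x the bit width."""
    return distinct_in_range > 2.5 * datatype_bits

# A 32-bit integer column:
print(hng_beats_lf_for_sum(20, 32))     # False
print(hng_beats_lf_for_range(100, 32))  # True (100 > 80)
```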

66 HG vs. LF and HNG
For very low cardinality (dimension) tables, an HG may be better than both an LF and an HNG. Why? The HG may only be 2 pages long (1 for the B-tree and 1 for the G-Array), whereas the LF will be n pages long (1 for the B-tree identity page and 1 for each value), and the HNG can also be large (1 page for the B-tree identity page and 1 for each bit in the datatype).

67 HG vs. LF and HNG – 12.5 - 1
There are now 3 algorithms used for processing range predicates. They are used once the decision has been taken to use the HG to process the range. Note that when the range covers fewer than about 128 values, the HG is a lot faster at processing the range than the HNG index.

68 HG vs. LF and HNG – 12.5 - 2
Bit vector: a non-persistent bitmap (in memory only). Each G-Array in the range is taken and its bits are set on in the bit vector. This is some 4x faster than a persistent bitmap.
Sort: all rowids for the range are sorted, then a bitmap is produced.
Traditional bitmap: not used that often, only in the event of massive ranges. Note that this bitmap is persistent.
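The bit vector and sort strategies can be contrasted in a sketch. The G-Array is modelled as a plain dict from key to rowid list (a real HG walks its B-tree in key order), and the keys are made up, with rowid lists echoing the slide 36 figure; both functions produce the same set of rows.

```python
def range_bit_vector(g_array, low, high):
    """In-memory bit vector: OR in the rowids of every key within [low, high]."""
    vector = 0
    for key, rowids in g_array.items():
        if low <= key <= high:
            for r in rowids:
                vector |= 1 << r
    return vector

def range_sorted_rowids(g_array, low, high):
    """The 'sort' alternative: gather all rowids in the range and sort them."""
    return sorted(r for key, rowids in g_array.items()
                  if low <= key <= high for r in rowids)

g_array = {10: [4], 20: [1, 2], 30: [3, 5, 6]}
vector = range_bit_vector(g_array, 15, 30)
print(bin(vector))                             # 0b1101110
print(range_sorted_rowids(g_array, 15, 30))    # [1, 2, 3, 5, 6]
```

Setting bits needs no comparison-based ordering step, which is one reason the in-memory bit vector can outpace the sort approach for moderate ranges.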

69 HNG
Given the above, if you have both an HNG and an HG on a low-cardinality column, test the range queries. If they are erratic in performance (sometimes fast, sometimes slow), remove the HNG and try again. This is an optimiser "deficiency" that might not be correctable…

70 Indexing Tips - 1
Use the IQ UNIQUE constraint in Create Table. This will (usually) improve load performance; it matters most when cardinality is near the critical thresholds. Create indexes before loading data: ASIQ is NOT like ASE, indexes load very fast, so do all the index loads simultaneously.

71 Indexing Tips – 2
Choose indexes by committee: if in doubt about the use of a column, ask everybody! If in doubt, index everything; you can always drop indexes later (but beware of load times on wide HGs). Specify an HG index as UNIQUE where possible: this information is of great use to the optimiser, and the UNIQUE HG is also a very efficient index.

72 One Last Thing on Indexes
There is a bitmap on each and every column and each and every table, called the existence bitmap. The row existence bitmap tells IQ-M whether there is a row for a given RowId. The column existence bitmap tells IQ-M whether the column has a value (is not NULL).

73 A Well Forgotten Fact
There are 2 reasons IQ-M needs enumerated indexes (LF, HG, 1-byte and 2-byte FP). The primary reason, which we have discussed, is to solve the query. In IQ 12.4 and beyond there is another reason: to provide the optimiser with accurate distinct counts. The second reason is very important; we will discuss it more when we get to the query tree.

74 Database Administration – Index Maintenance System Procs
New system stored procedures for the DBA report index fragmentation and density, and rebuild indexes:
    sp_iqindexfragmentation (table | index_names): reports on the amount of empty space in index structures
    sp_iqrowdensity (table | columns): reports internal row fragmentation of FP indexes

75 Database Administration – Index Maintenance
sp_iqrebuildindex (table | indexes) rebuilds any index that has become fragmented and will recreate optimized FP indexes. Over time some optimized FP indexes may have "rolled over" to become a Flat FP; this utility will rebuild the optimized FP index if the cardinality is within limits. We need to think about how we "present" this to customers.

76 Index Descriptions - End

