About Tables and Datatypes

Introduction
This section is probably the most important in terms of the performance of an IQ system. We discuss tables and datatypes. The next section discusses the other vital part of IQ: indexes.
April 18, 2019

Tables
Actually, in IQ, tables do not really exist. Tables are implicit in the IQ Catalog Store metadata. The concept of a table only comes to the fore in SQL; at all other times IQ is a simple (hah!) column store. However, the CREATE TABLE command does have some interesting “features”.

CREATE TABLE
Of all the clauses of the CREATE TABLE command, the following are of interest:
[ GLOBAL | TEMPORARY ]
[ { IN | ON } ]
[ AT location ]
{ UNIQUE | PRIMARY KEY | REFERENCES … }

GLOBAL TEMPORARY
In IQ a temporary table can be either TEMPORARY or GLOBAL TEMPORARY. A temporary table only exists for the duration of the transaction that creates it. In a global temporary table the schema lasts forever; only the data is destroyed at transaction commit/rollback. All temporary data lives in the IQ TEMP STORE.

Temp Tables
You may no longer specify the table owner when creating temp tables. If you specify the owner, a permanent (base) table is created in the IQ Store.
Create Table dbo.#my_temp_table (…) creates a permanent table in the IQ Store.
Declare Local Temporary Table dbo.my_temp_table(…) results in a syntax error.

IN | ON
In IQ you cannot position objects (tables or indexes). The reason for the IN (or ON) clause is to allow you to create an ASA table (base or temporary). An ASA table is created using ON SYSTEM, SYSTEM being the IQ Catalog Store. This table will obey all the rules of an ASA (not an IQ) table.

AT
The AT clause is used to define a proxy table that maps to a table at a remote location. The remote server name must be defined to IQ. This is not a fast way of accessing data.
CREATE TABLE fred AT 'anotherserver;adatabase;;fred'

Constraints
Check constraints added. Includes check, unique and referential integrity constraints. Permits constraint modification without recreating a table. Constraints may be named for reuse. Supported: UNIQUE, PK, FK, CHECK, IQ UNIQUE. No DEFAULT (expected in IQ 12.7/IQ 15). New stored procedures for maintenance: sp_iqprintconstraints and sp_iqdropconstraints.

Identity Columns
IDENTITY/DEFAULT AUTOINCREMENT property. A column may be defined as IDENTITY or DEFAULT AUTOINCREMENT. Must be enabled using the IDENTITY_INSERT option. Only one column per table may be defined with this property. A new global variable retains the last value inserted. A new database option auto-indexes identity columns: the option IDENTITY_ENFORCE_UNIQUENESS is 'OFF' by default; if ON, it creates a unique HG index on identity columns. ALTER TABLE supports modifying or adding a column as IDENTITY or DEFAULT AUTOINCREMENT.

New LOB Datatypes
Char() data type may be defined to 32K(-1), the same as Sybase IQ varchar(). If defined > 255 bytes, only FP, WD and CMP indexes are permitted. Varchar() and char() may now be the same. Certainly they behave identically, except that varchar() is one byte longer (per row). "Select Into" a permanent table is now permitted (select into temporary table support since ).

DDL Locks
Concurrent DDL lock reduced to table level. This was a database lock in previous versions. You may perform multiple DDL operations in a database as long as they operate on different tables. "Begin Parallel IQ" to create multiple indexes on one table remains available. (Multi-column) key length increased from 1024 bytes to 1530 bytes (a key can still only be composed of 255 columns).

Primary Key/Foreign Key
IQ-M does not enforce Primary Key/Foreign Key relationships, but it will in 12.5 (see following slides). The optimiser does use the PK/FK relationship for query planning. Only specify this relationship if the relationship does exist. Incorrect specification can result in poor query plans (performance degradation) and possibly errors. ASA does modify a join that is defined as PK/FK to an ANSI NATURAL join; this can cause problems with orphan rows.

Key Specification
In a data warehouse the production key is not, generally, used as the warehouse key. It is more acceptable practice to use a generated key. Make this key an unsigned INT or BIGINT. This is absolutely the most efficient key datatype in IQ-M.

Primary Keys
In IQ-M a Primary Key is an ANSI standard Primary Key. It is UNIQUE. It must not be null. If specified as a table or column constraint, a specialised form of the HG index is created.

Foreign Keys
Always generate an HG index on a Foreign Key. If the relationship is 1:1, generate the Foreign Key column as UNIQUE. This will force auto-generation of a unique HG index. Again, try to specify join columns as unsigned INT or BIGINT.

Referential Integrity – 12.5 (1)
12.5 supports Primary Key/Foreign Key referential integrity on loads. The overhead on loads is minimal. The maximum reduction in load performance that has been seen is under 8% of the total load time. For RI to work there must be an HG index on both the Primary and Foreign keys, and both the Primary and Foreign keys must be defined at the table level. This is the requirement (as above) for the Non-Unique Multi-Column Index.

Referential Integrity – 12.5 (2)
The RI checking is accomplished after the sort phase for the foreign key index. At this point the keys are all in sorted sequence, so we read the Primary Key (PK) HG index (or rather, we read the Leaf Nodes of the PK HG index, which is a Unique Index and hence has no G-Array), and we walk the PK index Leaf Nodes. Because all the data is sorted, we only have to walk the Leaf Nodes once for the entire load. Hence the low overhead for Referential Integrity.
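
The single pass over sorted keys described above can be sketched as a merge-style walk. This is only an illustration in Python: the function name and the plain lists standing in for the sorted foreign keys and the PK leaf nodes are assumptions, not IQ internals.

```python
def find_orphan_keys(sorted_fk_keys, sorted_pk_keys):
    """Return foreign-key values that have no matching primary key.

    Both inputs must already be sorted ascending; primary keys are unique.
    """
    orphans = []
    pk_pos = 0  # position in the primary-key list; it never moves backwards
    for fk in sorted_fk_keys:
        # Skip primary keys that are smaller than the current foreign key.
        while pk_pos < len(sorted_pk_keys) and sorted_pk_keys[pk_pos] < fk:
            pk_pos += 1
        # Either the primary keys are exhausted or the next one is too large.
        if pk_pos == len(sorted_pk_keys) or sorted_pk_keys[pk_pos] != fk:
            orphans.append(fk)
    return orphans


orphans = find_orphan_keys([1, 1, 3, 4, 7], [1, 2, 3, 5, 6])
```

Because pk_pos only ever advances, the primary-key list is read once however many foreign keys are checked, which is the property the slide credits for the low overhead.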

A digression on Datatypes
There are some very important issues concerning datatypes. We have discussed the actions of the indexes; there are areas where an index can be forced to run slowly if the datatype is specified wrongly. Always consider the requirements for the datatype. Incorrect datatype specification is as bad as incorrect index selection.

Signed vs. Unsigned - 1
If you don’t need signed data in an int or bigint, use UNSIGNED. This will speed up access to the HNG index, sometimes doubling the performance. HNG stores negative data as 1s complement. This means SUM(), AVG() etc. run quickly, but range checks require another set of scans. If we stored the data as 2s complement, range checks would run with 1 scan, but SUM() and AVG() would be slower!

Signed vs. Unsigned - 2
Use unsigned for surrogate keys and join columns. Unsigned data comparisons (=, !=) are quicker. The caveat is that Open Client may misinterpret a value that is too large, as it does not understand large unsigned data. You can convert to signed integer, numeric, or decimal when returning data to an Open Client application. This caveat also applies to moving data between IQ servers with INSERT FROM LOCATION.
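
To make the caveat concrete, here is a small Python sketch of what happens when the bits of a large unsigned 32-bit value are read as a signed 32-bit integer. It assumes a two's complement client representation; the helper name is hypothetical, and the sketch does not claim to reproduce Open Client's exact behaviour.

```python
import struct


def as_signed_32(unsigned_value):
    """Reinterpret the bits of an unsigned 32-bit integer as a signed one."""
    packed = struct.pack("<I", unsigned_value)  # 4 bytes, unsigned
    return struct.unpack("<i", packed)[0]       # same 4 bytes, read as signed


small_key = as_signed_32(2_147_483_647)  # largest value that survives intact
large_key = as_signed_32(3_000_000_000)  # comes back as a negative number
```

Any value above 2,147,483,647 changes sign under this reinterpretation, which is why converting to numeric or decimal before returning the data is the safer route.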

Other Datatype Issues
Signed vs. unsigned does not affect the other indexes to any great degree. But the selection of datatypes does. We have already discussed keys, but some other areas are worth commenting on.

Long Varchar() - 1
A long varchar() is defined as a varchar() with a length greater than 255. If you can avoid this, please try to. Only FP and WD indexes are allowed; no enumerated indexes or HNG. We have seen a number of customers who use varchar(1024) as primary keys. Please DO NOT DO THIS!

Long Varchar() - 2
Long varchar() values are stored as 256-byte chunks, so using 4 bytes in a varchar(32000) only uses 256 bytes. By default these 256-byte chunks are memset (set to zeros to improve compression). There is an upgrade option to memset existing varchar() columns; this is worth doing, if you have the time!
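
The chunk arithmetic can be sketched in Python as follows. The helper is hypothetical and assumes that allocation is a simple round-up to whole 256-byte chunks; it ignores any per-row overhead and the effect of compression.

```python
CHUNK_BYTES = 256  # chunk size stated on the slide


def long_varchar_bytes(value_length):
    """Bytes allocated for a value of the given length, in whole chunks."""
    chunks = -(-value_length // CHUNK_BYTES)  # ceiling division
    return chunks * CHUNK_BYTES


short_value = long_varchar_bytes(4)     # the slide's 4-byte example
spill_value = long_varchar_bytes(300)   # just over one chunk
```

Under this assumption a 4-byte value occupies one chunk regardless of the declared maximum length, and a 300-byte value occupies two.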

Char() vs. Varchar()
Always, if you can, use char(). Generally this will improve performance, at the modest cost of storing a small number of extra bytes. Query performance on retrieval of char() vs. varchar() indicates that there can be a 2-3% performance hit per varchar() column, and we have seen 10% degradation on single columns.

Float, Real and Double
Unless you really need them, please do not use them. They can only have Flat FP indexes, no others. They do not store “exact” values, only approximate ones. Please try to use NUMERIC or DECIMAL instead.
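
The difference between approximate and exact storage is easy to see outside the database. This Python sketch uses the built-in binary float and the standard decimal module as stand-ins for FLOAT/DOUBLE and NUMERIC/DECIMAL; it illustrates the general behaviour of the two kinds of type, not IQ's storage format.

```python
from decimal import Decimal

# Ten payments of 0.1 each, summed as binary floating point and as decimals.
float_total = sum([0.1] * 10)
decimal_total = sum([Decimal("0.1")] * 10)

float_is_exact = float_total == 1.0               # False: rounding error creeps in
decimal_is_exact = decimal_total == Decimal("1")  # True: decimal values are exact
```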

NUMERIC and DECIMAL
Numeric and Decimal are aliases of each other. Any numeric or decimal with a precision of less than 12 will be stored as an INT (with conversions). Any numeric or decimal with a precision of between 12 and 18 will be stored as a BIGINT (with conversions).

Join Columns
You must generate the database schema with the table join columns having the same datatype. INT, UINT and BIGINT are best, but the column datatypes for each join must be the same. The conversion cost is horrendous.

Case and Collation Sequences
In terms of RAW performance, the fastest IQ database is one where CASE is set to RESPECT and the collation sequence is BINARY (ISO_bineng). This is probably not suitable for the general application of the database or warehouse server. CASE set to IGNORE is the next fastest, followed by changes in the collation sequence. The performance hits can be quite high (around 10-20%, we think!).

String Searches
String searches such as substr(col_name,1,3) are really very slow; they rely on FP searches. With low cardinality (1 and 2 byte FP) data the search is faster, but this can still be a restriction. Create a new column which holds the first 3 characters of the col_name column, then search on this. This way there is no function call, so no projection, so the optimiser can use a fast index: LF or HG (or, if it is a range query, an HNG).
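
The precomputed-prefix idea can be tried in miniature with SQLite from Python. Everything here is a stand-in: the table, column and index names are hypothetical, and SQLite's ordinary index only plays the role that an LF or HG index would play in IQ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (col_name TEXT, col_prefix TEXT)")

# Populate the prefix column once, at load time, instead of per query.
names = ["SMITH", "SMYTHE", "JONES", "SMIRNOV"]
conn.executemany(
    "INSERT INTO customer VALUES (?, substr(?, 1, 3))",
    [(name, name) for name in names],
)
conn.execute("CREATE INDEX customer_prefix_ix ON customer (col_prefix)")


def search_by_function(prefix):
    """Slow pattern: the function is applied to every row at query time."""
    rows = conn.execute(
        "SELECT col_name FROM customer "
        "WHERE substr(col_name, 1, 3) = ? ORDER BY col_name",
        (prefix,),
    )
    return [row[0] for row in rows]


def search_by_column(prefix):
    """Fast pattern: a plain equality test on the indexed prefix column."""
    rows = conn.execute(
        "SELECT col_name FROM customer WHERE col_prefix = ? ORDER BY col_name",
        (prefix,),
    )
    return [row[0] for row in rows]
```

Both functions return the same rows; only the second gives the optimiser a bare column to match against an index.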

Telephone Numbers
A classic example of the above is the telephone number:
+1 -> Country Code
301 -> Area Code
896 -> Sub Area Code
1733 -> Local Number
Make this 4 columns (actually 5, including the whole number); then searches use fast indexes.
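
A load-time split into the five columns might look like the following Python sketch. It assumes the number arrives as four space-separated parts, as in the example above; the function and column names are hypothetical.

```python
def split_phone_number(number):
    """Split a space-separated phone number into its five column values."""
    country, area, sub_area, local = number.split()
    return {
        "whole_number": number,   # the fifth column: the number as received
        "country_code": country,
        "area_code": area,
        "sub_area_code": sub_area,
        "local_number": local,
    }


row = split_phone_number("+1 301 896 1733")
```

A query for one area code then becomes an equality test on area_code rather than a substring search over the whole number.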

Date time
As with telephone numbers, try storing a datetime as a series of columns (or a dimension table). Try creating columns DD, MM, YY, HH, MM, SS, DoWeek, DoYear, Quarter, etc. This changes in 12.5 with the DATE, TIME and DTTM indexes.
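
The derived columns can be computed once, when the row is loaded. This Python sketch shows one way to do it; the key names are hypothetical (minutes are called "mi" here only to avoid clashing with the month), and numbering the day of week from Monday = 1 is an assumption.

```python
from datetime import datetime


def date_parts(ts):
    """Derive the per-column values suggested on the slide from a datetime."""
    return {
        "dd": ts.day,
        "mm": ts.month,
        "yy": ts.year % 100,
        "hh": ts.hour,
        "mi": ts.minute,
        "ss": ts.second,
        "do_week": ts.isoweekday(),         # 1 = Monday ... 7 = Sunday
        "do_year": ts.timetuple().tm_yday,  # 1 = 1 January
        "quarter": (ts.month - 1) // 3 + 1,
    }


parts = date_parts(datetime(2019, 4, 18, 13, 5, 9))
```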

Date vs. Datetime
A slightly better solution to the above can be considered in the light of the 1 and 2 byte FP indexes. Try storing the date part of a datetime as a date and the time part as hh, mm and ss. So:
Datetime -> date_col, hh_col, mm_col, ss_col
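
The benefit comes from cardinality, which this Python sketch counts for one day of per-second timestamps. The sample data is invented for the illustration, and the remark that follows assumes that a 1 byte FP covers up to 256 distinct values.

```python
from datetime import datetime, timedelta

# One day of timestamps at one-second resolution (sample data for the sketch).
start = datetime(2019, 4, 18)
timestamps = [start + timedelta(seconds=s) for s in range(86_400)]

distinct_datetimes = len(set(timestamps))

# Datetime -> date_col, hh_col, mm_col, ss_col
split_rows = [(t.date(), t.hour, t.minute, t.second) for t in timestamps]
distinct_per_column = {
    "date_col": len({row[0] for row in split_rows}),
    "hh_col": len({row[1] for row in split_rows}),
    "mm_col": len({row[2] for row in split_rows}),
    "ss_col": len({row[3] for row in split_rows}),
}
```

The single datetime column has 86,400 distinct values for the day, while none of the split columns exceeds 60, so each of them stays well inside the low-cardinality range.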

Loading Dates
There is NO default date or datetime format for loads into IQ. The format must be explicitly set for the load/insert to get the best performance. However, some formats are conversion enhanced.

Enhanced Conversion formats
Date formats:
DD/MM/YYYY
DD.MM.YYYY
DD-MM-YYYY
Time formats:
HH:NN:SS
HHNNSS
HH:NN:SS.S
HH:NN:SS.SS
HH:NN:SS.SSS
HH:NN:SS.SSSS
HH:NN:SS.SSSSS
Datetime formats:
YYYY-MM-DD HH:NN:SS
YYYYMMDD HHNNSS
YYYY-MM-DD HH:NN:SS.S
YYYY-MM-DD HH:NN:SS.SS
YYYY-MM-DD HH:NN:SS.SSS
YYYY-MM-DD HH:NN:SS.SSSS
YYYY-MM-DD HH:NN:SS.SSSSS

Date Load
So it is better to use
Col1 DATE('YYYY-MM-DD')
than
Col1 ASCII(10)
The performance enhancement can be as much as a 100-fold speed-up in loads (for small tables).

UNION
In IQ-M 12.4.3 the UNION clause has very few disadvantages. Generally UNIONs are all processed in parallel, so if you have a low user count they work well. Also, the delete question can now be solved. Do not use DISTINCT in the UNION clause, or in the SELECT statement.

UNION and Delete
If you are storing a fixed (in time) amount of data, e.g. 6 months, then every month you delete 1/6th of the data in the table. This is expensive. It is better to split the fact table into 6 one-month tables. At the end of the month you truncate the oldest table, and possibly rename the table sets. Remember that for Multiplex, table rename is DDL and hence can only be done in simplex mode!
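
The rolling-window layout can be sketched with SQLite from Python. The table names are hypothetical, and SQLite is only a stand-in: it has no TRUNCATE TABLE, so an unqualified DELETE on the oldest table plays that role here.

```python
import sqlite3

# Six one-month fact tables, oldest first (hypothetical names).
MONTH_TABLES = [f"fact_m{i}" for i in range(1, 7)]

conn = sqlite3.connect(":memory:")
for month_no, table in enumerate(MONTH_TABLES, start=1):
    conn.execute(f"CREATE TABLE {table} (month_no INTEGER, amount INTEGER)")
    conn.execute(f"INSERT INTO {table} VALUES (?, ?)", (month_no, 100 * month_no))

# Queries see the six tables as one set; UNION ALL avoids a DISTINCT pass.
union_sql = " UNION ALL ".join(
    f"SELECT month_no, amount FROM {table}" for table in MONTH_TABLES
)


def total_rows():
    return conn.execute(f"SELECT COUNT(*) FROM ({union_sql})").fetchone()[0]


rows_before = total_rows()
# Month end: empty the oldest table as a whole instead of deleting 1/6th of the
# rows from one big table.
conn.execute(f"DELETE FROM {MONTH_TABLES[0]}")
rows_after = total_rows()
```

The emptied table can then take the new month's load, with the set of tables renamed or the union rewritten so that it again reads oldest to newest.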

Cartesian Joins
These are expensive: they involve the join of every row in one table to every row in a second table.
Table A: 1,000,000 rows
Table B: 100,000 rows
Worktable: 100,000,000,000 rows
Select * from T, R where T.a = 10 -> Cartesian
Select * from T, R where T.a between R.b and T.b -> Cartesian
Select * from T, R where ABS(T.a * R.b) = T.b -> Cartesian
But
Select * from T, R where ABS(T.a * T.b) = R.b -> Not Cartesian
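
The worktable size and the first and last queries can be checked on toy data in Python. The tiny T and R tables are invented for the sketch, and the nested-loop join built on itertools.product only shows which row pairs qualify; it says nothing about how IQ executes these queries.

```python
from itertools import product

# Worktable size for the row counts on the slide.
worktable_rows = 1_000_000 * 100_000

# Toy tables (invented data): T has columns a and b, R has column b.
T = [{"a": a, "b": 1} for a in (1, 2, 3, 10)]
R = [{"b": b} for b in (1, 2, 3, 4, 5)]


def join(predicate):
    """Pair every T row with every R row and keep the pairs that qualify."""
    return [(t, r) for t, r in product(T, R) if predicate(t, r)]


# where T.a = 10: nothing links T to R, so each matching T row pairs with all of R.
filter_only = join(lambda t, r: t["a"] == 10)

# where ABS(T.a * T.b) = R.b: one side depends only on T and is compared with R.b.
equality_join = join(lambda t, r: abs(t["a"] * t["b"]) == r["b"])
```

In the first case the single matching T row comes back once for each of the five R rows; in the second each T row matches at most one R row, because the predicate is an equality between a T-only expression and an R column.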

Cursors
Avoid using cursors. They generally mean row-based processing, and IQ was designed for set-based processing. Sometimes they cannot be avoided; if used, make sure to use NO SCROLL cursors.
Open With Hold allows the cursor to remain open across transactions. If it is not used, the cursor may be closed when a commit is issued (depends on connectivity type).

Watcom SQL vs. T-SQL
IQ (ASA) is not 100% T-SQL compatible, but very close. We recommend using Watcom SQL: all system procedures are written with it, and there are many more code examples and more IQ people versed in it. Watcom SQL has some extensions that T-SQL does not: dynamic SQL, better loop control, and full cursor movement rather than just read next. Batches and procedures must be written in the same dialect; you cannot mix T-SQL with Watcom SQL.

Watcom SQL vs. T-SQL
Behavior differences include: DECLARE CURSOR, GOTO, IF, PRINT, RAISERROR, SET, WHILE (T-SQL) vs. LOOP, global variables, variable names, CALL, FOR. ASA requires variables to be declared immediately after a BEGIN.

Commit and Rollback
Use transaction control around logical units of work, even read-only queries. You should commit before a read/write batch is started, to ensure the latest version of the data is available. You should issue a commit or rollback after batch completion to release all query resources. Rollback will free memory resources in use by previous operations. For systems with a high number of connected users, freeing memory resources can aid query performance.

Custom Functions
Custom functions can be written in either SQL or Java. They are a great way to encapsulate business logic for transforming data, but they can have a significant performance impact on queries. Functions are executed in the catalog portion of the engine, so all result rows may need to be moved to ASA; this can be time-consuming for large result sets. Turn on query plans to see what impact the functions have on effective query plans.

About things - End
