© Copyright 2000-2018 TIBCO Software Inc.
Tuning For Performance in TDV
Professional Services Group

Tuning For Performance in TDV
Topics
- Goals of Performance Tuning
- Evaluating Query Plans
- TDV Join Algorithms
- Performance Considerations for TDV Procedures
- Advanced Tuning Concepts

Goals of Performance Tuning
- Minimize Network Load
  - Different parts of the data set may reside on physically separate data sources
  - Network latency is usually the biggest limiting factor for performance
  - Solution: Minimize the amount of data that comes back over the network
- Minimize Memory Utilization
  - The TDV server has finite memory and processing power, which a few badly designed requests can consume
  - Solution: Distribute work to data sources where possible. Swap queries with slow response times or large result sets to disk to free up memory

Goals of Performance Tuning
- Leverage Data Source Efficiencies (i.e. Push Down)
  - TDV does not store data internally, so all operations require data retrieval
  - Physical data sources such as databases are optimized to take advantage of native data type definitions, indexes and other system efficiencies
  - Push as much work down to the data sources as possible to minimize load on TDV and the network
- Remember: Push down is not always possible or desirable
  - Some data sources can’t be pushed to (e.g. files, web services)
  - Something in the system may make push down slower (bad index, source under load)
  - You may want to avoid push down to protect a source system from load

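The effect of push down on network load can be illustrated with a small sketch. It assumes an in-memory SQLite database as a stand-in for a remote data source; the `orders` table and both function names are hypothetical, and TDV itself is not involved.

```python
import sqlite3

# Hypothetical stand-in for a remote data source (not a TDV API).
source = sqlite3.connect(":memory:")
source.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, region TEXT, amount REAL)"
)
source.executemany(
    "INSERT INTO orders (region, amount) VALUES (?, ?)",
    [("EAST" if i % 4 == 0 else "WEST", float(i)) for i in range(1000)],
)


def filter_locally(conn):
    """No push down: fetch every row, then filter in the middle tier."""
    rows = conn.execute(
        "SELECT id, region, amount FROM orders ORDER BY id"
    ).fetchall()
    kept = [row for row in rows if row[1] == "EAST"]
    return len(rows), kept


def filter_at_source(conn):
    """Push down: the source applies the filter, so fewer rows are returned."""
    rows = conn.execute(
        "SELECT id, region, amount FROM orders WHERE region = ? ORDER BY id",
        ("EAST",),
    ).fetchall()
    return len(rows), rows


transferred_local, result_local = filter_locally(source)
transferred_pushed, result_pushed = filter_at_source(source)
print(transferred_local, transferred_pushed)
```

Both functions produce the same result rows, but the first one pulls the whole table across the connection before discarding most of it.
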
Why Evaluate Execution Plans?
- Performance of distributed queries depends very much on the query execution plan
- Execution plans are generated from query SQL and are heavily influenced by TDV query hints
- Execution plans allow you to evaluate how much work is being pushed and where unpushed operations may be a bottleneck
- A lot of TDV performance tuning is accomplished by rewriting SQL to force generation of a more optimal execution plan

Viewing Execution Plans
- Execution plans can be displayed for views, SQL script procedures and parameterized queries by clicking on the execution plan button
- You may not see the complete execution plan for a procedure until you click on the Execute and Show Statistics button
- Clicking on the Execute and Show Statistics button executes the view or procedure and reports the number of rows produced by each execution node and the time taken
- Warning: Execute and Show Statistics completely loads a result set. Large result sets can take a very long time to load!

Types of Execution Nodes – Common Nodes
- Execution plans will commonly contain at least one of these nodes
- Multiple copies of a node may be an indicator of potential pushdown

SELECT
  Internal TDV Select operation. All views will have at least one.
FETCH
  Fetch operation against an external data source. There should be at least one node for every database, flat file or XML file queried.
PROCEDURE
  Executes a stored procedure and returns all rows. There should be at least one for every packaged query, Excel file or web service accessed.

Types of Execution Nodes – Potential Pushdown
- The following nodes identify work done by TDV on the result set
- Some of these nodes can be pushed down to data sources
- Remember that push down may not be possible, especially if joining data from multiple data sources

FUNCTION
  Indicates that a TDV internal function is called on at least one column.
FILTER
  Applies a filter condition to each row.

Types of Execution Nodes – Potential Pushdown

GROUP BY
  Aggregates (i.e. groups) a result set.
ORDER BY
  Sorts (i.e. orders) a result set.
DISTINCT
  Removes duplicates from a result set.
EXCEPT
  Returns only rows that appear in the left side fetch and not in the right side fetch.

Types of Execution Nodes – Potential Pushdown

INTERSECT
  Returns only rows that appear in both the left and right side fetches.
JOIN
  Merges two result sets based on specified join criteria.
UNION
  Returns all rows that are returned from either the left or the right fetch.

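The set-oriented nodes are easiest to tell apart on a tiny data set. The sketch below uses SQLite as a stand-in SQL engine to show standard SQL semantics; the tables `left_side` and `right_side` and the helper `run` are hypothetical names, and how TDV renders these operations in a plan is not shown here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE left_side (v INTEGER)")
conn.execute("CREATE TABLE right_side (v INTEGER)")
conn.executemany("INSERT INTO left_side VALUES (?)", [(1,), (2,), (2,), (3,)])
conn.executemany("INSERT INTO right_side VALUES (?)", [(2,), (3,), (4,)])


def run(op):
    """Combine the two tables with the given set operator, sorted by value."""
    sql = f"SELECT v FROM left_side {op} SELECT v FROM right_side ORDER BY v"
    return [row[0] for row in conn.execute(sql)]


except_rows = run("EXCEPT")        # in the left side but not in the right
intersect_rows = run("INTERSECT")  # in both sides
union_rows = run("UNION")          # in either side, duplicates removed
union_all_rows = run("UNION ALL")  # in either side, duplicates kept
distinct_rows = [
    row[0] for row in conn.execute("SELECT DISTINCT v FROM left_side ORDER BY v")
]
```

Note that plain UNION removes duplicates, while UNION ALL keeps every row from both sides.
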
Join Algorithms
- Sometimes join operations must be performed on the TDV instance
  - Push down to a data source is not possible (e.g. files)
  - The join involves multiple separate data sources (data federation)
  - Push down is not desirable due to data source constraints
- TDV provides multiple join algorithms to efficiently join data
  - Where possible, TDV tries to push down some work to the data source to minimize the data returned
  - Not all join algorithms are efficient in all situations
  - Developers can use query hints to try to influence which joins are used. These may be ignored by the TDV optimizer

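To make the trade-off between join algorithms concrete, here is a minimal textbook sketch of a hash join and a sort-merge join over in-memory rows. It illustrates the general techniques only and makes no claim about how TDV implements them; the function names and sample data are hypothetical.

```python
from collections import defaultdict


def hash_join(left, right, key):
    """Build a hash table on one input, then probe it with the other."""
    table = defaultdict(list)
    for row in right:
        table[row[key]].append(row)
    return [{**l, **r} for l in left for r in table.get(l[key], [])]


def sort_merge_join(left, right, key):
    """Sort both inputs on the join key, then merge them in a single pass."""
    left = sorted(left, key=lambda row: row[key])
    right = sorted(right, key=lambda row: row[key])
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][key], right[j][key]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the run of right rows that share this key.
            j_end = j
            while j_end < len(right) and right[j_end][key] == lk:
                j_end += 1
            # Pair every left row with this key against that run.
            while i < len(left) and left[i][key] == lk:
                out.extend({**left[i], **r} for r in right[j:j_end])
                i += 1
            j = j_end
    return out


customers = [
    {"cust_id": 1, "name": "Ann"},
    {"cust_id": 2, "name": "Bob"},
    {"cust_id": 3, "name": "Cy"},
]
orders = [
    {"cust_id": 2, "order_id": 10},
    {"cust_id": 1, "order_id": 11},
    {"cust_id": 2, "order_id": 12},
    {"cust_id": 4, "order_id": 13},
]
hashed = hash_join(customers, orders, "cust_id")
merged = sort_merge_join(customers, orders, "cust_id")
```

The hash join needs memory for one whole input, while the sort-merge join needs both inputs ordered on the join key, which is cheap when the sources can deliver them already sorted.
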
Influencing Execution Plans
- The TDV optimizer uses both Rule Based Optimization and Cost Based Optimization to generate an efficient execution plan
  - Rule based optimization enables the TDV query engine to translate SQL into an efficient execution plan using internally defined rules
  - Cost based optimization uses statistical processing to provide cardinality estimates used during query plan generation
- We will look at ways that a developer can influence the optimizer to improve query performance

Influencing Execution Plans
Rule Based Optimization
- Modify join ordering
  - Force TDV to query data sources in a specific order to take advantage of faster data sources or ones that return smaller result sets, using hints or by restructuring SQL
- Remove extra join nodes
  - Complex queries might result in an execution plan that does not push a join down to a single data source
  - Where possible, modify SQL to push down join operations to the data source
- Limit the use of SQL Procedures and Packaged Queries
  - The TDV optimizer does not have the same degree of freedom to restructure code in procedures

Influencing Execution Plans
Rule Based Optimization
- Use parameterized queries for presentation layer objects
  - Parameterized queries can be used to force clients to provide filter criteria. This prevents execution of unbounded queries that return extremely large result sets
- Use SQL-92 join syntax
  - TDV evaluates SQL-92 join syntax more cleanly than other join syntaxes
  - Use INNER JOIN ... ON instead of separating tables with commas and placing the join criteria in the WHERE clause
- Structure views to enable join pruning
  - If a view contains a left or right outer join, TDV can trim references to tables or views that are not referenced in a query

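The two join styles can be compared side by side. The sketch below runs both against SQLite as a stand-in engine, with a bound parameter supplying the filter criteria; the `customers` and `orders` tables are hypothetical, and the sketch only shows that the two forms are logically equivalent, not how TDV plans them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (cust_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, cust_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
    INSERT INTO orders VALUES (10, 2, 5.0), (11, 1, 7.5), (12, 2, 1.25);
    """
)

# Older style: comma-separated tables, join criteria mixed into WHERE.
COMMA_STYLE = """
    SELECT c.name, o.order_id
    FROM customers c, orders o
    WHERE c.cust_id = o.cust_id AND o.amount > ?
    ORDER BY o.order_id
"""

# SQL-92 style: join criteria in ON, filter criteria in WHERE.
SQL92_STYLE = """
    SELECT c.name, o.order_id
    FROM customers c
    INNER JOIN orders o ON c.cust_id = o.cust_id
    WHERE o.amount > ?
    ORDER BY o.order_id
"""

comma_rows = conn.execute(COMMA_STYLE, (2.0,)).fetchall()
sql92_rows = conn.execute(SQL92_STYLE, (2.0,)).fetchall()
```

Keeping join criteria in ON and filters in WHERE makes the intent of each predicate explicit, which is the readability benefit the SQL-92 form offers regardless of engine.
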
Influencing Execution Plans
Cost Based Optimization
- Gather statistics for queried tables or views
  - Statistics allow the TDV optimizer to modify queries to improve efficiency (e.g. join reordering)
  - Statistics can be gathered for individual columns or an entire table
  - Different levels of statistics granularity are available
  - Statistics collection should be scheduled during off peak hours to avoid overloading TDV or the data sources
- Provide cardinality hints
  - Allows you to test cost based optimization without gathering statistics
  - If no cardinality information is available, TDV assumes a table has 1 million rows

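Why cardinality information matters can be shown with a toy ordering heuristic. The sketch below is not TDV's optimizer: it simply sorts join inputs by estimated row count and falls back to the 1 million row assumption mentioned above when nothing is known. All names and row counts are hypothetical.

```python
# Fallback used when no statistics or hint exist for a table
# (the 1 million row assumption stated in the slide above).
DEFAULT_CARDINALITY = 1_000_000


def estimated_rows(table, stats):
    """Return the known row count for a table, or the default assumption."""
    return stats.get(table, DEFAULT_CARDINALITY)


def order_join_inputs(tables, stats):
    """Toy heuristic: put the smallest estimated input first.

    sorted() is stable, so tables with equal estimates keep their
    original order.
    """
    return sorted(tables, key=lambda table: estimated_rows(table, stats))


tables = ["orders", "customers", "regions"]
with_stats = order_join_inputs(tables, {"orders": 5_000_000, "regions": 12})
without_stats = order_join_inputs(tables, {})
```

Without any statistics every table looks the same size, so the heuristic has no basis for reordering and the plan simply follows the order in which the SQL was written.
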
Caching
- TDV enables definition of caches on views and procedures to temporarily store returned results
- The primary purpose of caching is to protect your data sources from load
- Caching can be used to improve query performance in some cases if used correctly
  - Enable push down where the original source can’t support it
  - Precompute the results of expensive or slow joins
  - Cache to a faster data source than the original source system
- Remember: Caching does not guarantee improved performance!
- We will examine file caching, single table caching, multi table caching and incremental caching

Caching – Types of Caches
File Caching
- Stores data on a file system accessible to the TDV server
- Requires a full table scan for each access
- Best used for small data sets that need to be completely retrieved each time

Caching – Types of Caches
Single Table Database Caching
- Stores data in a single database table accessible to TDV
- Push down optimizations are generally possible
- During a refresh, TDV loads a new result set into the same storage table while serving up requests from an existing result set
- Maintenance of indexes can be problematic

Caching – Types of Caches
Multi Table Database Caching
- Maintains multiple parallel copies of a table for data storage
- TDV serves data from one copy while a refresh is loading another table
- Once a refresh finishes, TDV automatically switches to the newly loaded table
- TDV can automatically manage creating and dropping indexes on tables before and after a refresh
- TIBCO recommends a minimum of three parallel tables for storage
- Database storage requirements are higher than single table caches

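The serve-one-copy-while-loading-another idea can be sketched in a few lines. This is a simplified in-memory model under stated assumptions, not TDV's implementation: the class name `MultiTableCache`, its methods, and the use of Python lists in place of database tables are all hypothetical.

```python
class MultiTableCache:
    """Toy model of a cache that rotates between parallel storage tables."""

    def __init__(self, copies=3):
        if copies < 2:
            raise ValueError("need at least two copies to rotate")
        self.tables = [[] for _ in range(copies)]
        self.active = 0  # index of the copy currently serving reads

    def read(self):
        """Serve requests from the active copy only."""
        return list(self.tables[self.active])

    def refresh(self, load_rows):
        """Load an inactive copy, then switch to it once loading succeeds."""
        target = (self.active + 1) % len(self.tables)
        # If load_rows() raises, the active copy is left untouched.
        self.tables[target] = list(load_rows())
        self.active = target


cache = MultiTableCache()
cache.refresh(lambda: [1, 2])
first_read = cache.read()
```

Because readers never see a partially loaded table, a failed or slow refresh does not affect queries against the active copy. The price is the extra storage for the parallel copies.
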
Incremental Caching
- Incremental caches do not completely refresh once populated. Only changes to cached data are applied to the cache
- Incremental caching is only applicable to single table caches
- Incremental caching can significantly reduce the overall time and load needed for cache maintenance if:
  - The size of the data set is sufficiently large
  - Only a small subset of the data changes at any given time
- Two types of incremental caches are possible in TDV: pull-based and push-based

Incremental Caching
Pull-based Caches
- TDV actively checks for changed records at regular intervals
- Data must have some way of identifying changed records (timestamp, change code)
- Easier to implement; TDV provides hooks for procedures
- The disadvantage is that TDV has to actively check for changes, so there is always a latency period between when data is generated and when it’s in the cache

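A pull-based refresh boils down to remembering a high-water mark and applying only rows changed since then. The sketch below assumes each source row carries an integer `modified` timestamp; the function `pull_changes`, the row layout, and the dictionary cache are hypothetical and do not represent TDV's procedure hooks.

```python
def pull_changes(source_rows, cache, last_seen):
    """Apply rows modified after last_seen; return the new high-water mark."""
    newest = last_seen
    for row in source_rows:
        if row["modified"] > last_seen:
            cache[row["id"]] = row  # insert or overwrite the cached row
            newest = max(newest, row["modified"])
    return newest


source = [
    {"id": 1, "modified": 5, "value": "a"},
    {"id": 2, "modified": 9, "value": "b"},
]
cache = {}

# First poll: everything is newer than the initial mark of 0.
mark = pull_changes(source, cache, last_seen=0)

# The source changes row 1; the next poll picks up only that row.
source[0] = {"id": 1, "modified": 12, "value": "a2"}
mark = pull_changes(source, cache, last_seen=mark)
```

This simple form cannot detect deleted rows, since a deleted row has no timestamp left to compare. That is one reason the source data needs an explicit way of identifying changes, such as a change code.
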
Incremental Caching
Push-based Caches
- Requires the use of an external change data capture product such as Oracle GoldenGate, together with JMS queues
- TDV passively listens to a JMS queue for change notifications and updates caches
- Generally results in faster cache updates
- Much more complex to implement and maintain

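The push-based flow can be modeled as a consumer that applies change messages as they arrive. In this sketch Python's standard `queue.Queue` stands in for a JMS queue, and the message layout (`op`, `id`, `row`) is a hypothetical format, not the one produced by any particular change data capture product.

```python
import queue


def apply_change(cache, msg):
    """Apply one change notification to the cache."""
    if msg["op"] == "delete":
        cache.pop(msg["id"], None)
    else:  # treat anything else as an insert or update
        cache[msg["id"]] = msg["row"]


def drain(changes, cache):
    """Apply every message currently waiting; return how many were applied."""
    applied = 0
    while True:
        try:
            msg = changes.get_nowait()
        except queue.Empty:
            return applied
        apply_change(cache, msg)
        applied += 1


changes = queue.Queue()  # stand-in for a JMS queue
changes.put({"op": "upsert", "id": 1, "row": {"value": "a"}})
changes.put({"op": "upsert", "id": 2, "row": {"value": "b"}})
changes.put({"op": "delete", "id": 1})

push_cache = {}
applied_count = drain(changes, push_cache)
```

Unlike the pull-based approach, deletes arrive as explicit messages, and the cache is updated as soon as a notification is consumed rather than at the next polling interval.
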
Case Sensitivity and Trailing Spaces Mismatches
- Case sensitivity and trailing space mismatches often occur in environments with many different database systems
- Case sensitivity and trailing space mismatches only occur under the following conditions:
  - There is a mismatch between the case sensitivity and/or trailing spaces settings of TDV and those of the underlying data source
  - There is a join or WHERE clause with a CHAR or VARCHAR field in the test condition
- We will look at common default settings and the effect of a setting mismatch on execution plan behavior

Common Database Default Settings

Product          Case Sensitive    Ignore Trailing Spaces
TDV Server       False             True
MySQL
Oracle
Sybase
Informix
MS SQL Server

Underlying Datasource Setting Mismatch Effects on TDV Query Plan

Columns: TDV Setting | Underlying Datasource Setting | Effect on Joins | Effect on Where Clause

CS = true
- None
CS = false
- Prevents JOINs being pushed down to same datasource involving CHAR or VARCHAR fields.
- Performs WHERE clause string comparison in TDV in addition to pushing down to database. Manifested in query plan as FILTER nodes.
- Cannot use more efficient Sort Merge algorithm between sources with conflicting settings. Optimizer reverts to Hash Join algorithm.
- Adds UPPER() function to query plan FETCH node. May invalidate datasource indexes.
ITS = true
ITS = false
- Adds RTRIM() function to query plan FETCH node. May invalidate datasource indexes.
- Will prevent JOINs from being pushed down to same datasource involving VARCHAR fields.

Legend
  CS   case sensitive setting
  ITS  ignore trailing spaces setting

Case Sensitivity and Trailing Spaces Mismatches
- What can happen if the settings don’t match?
  - Some operations may not be pushed down to a data source
  - Certain join algorithms may not be chosen by TDV
  - Result sets may not match the expected values
- Joins and filters on non-string values are not affected
- TDV settings can be overridden on a view by view basis using optimizer hints
  - Warning: Hints should only be used at the presentation layer to avoid unexpected interactions with other hints
- TDV’s case sensitivity and ignore trailing spaces settings represent a contract with the end user
  - Any changes to these settings should be carefully considered and tested

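How the two settings change string comparison results can be shown with a small model. The sketch below mimics the UPPER() and RTRIM() normalization that appears in the mismatch table, using plain Python string methods; the function names are hypothetical and this is not how TDV evaluates comparisons internally.

```python
def normalize(value, case_sensitive, ignore_trailing_spaces):
    """Normalize a string the way a comparison under these settings would."""
    if ignore_trailing_spaces:
        value = value.rstrip(" ")  # analogous to RTRIM()
    if not case_sensitive:
        value = value.upper()  # analogous to UPPER()
    return value


def matches(a, b, case_sensitive, ignore_trailing_spaces):
    """Compare two strings under one pair of settings."""
    return normalize(a, case_sensitive, ignore_trailing_spaces) == normalize(
        b, case_sensitive, ignore_trailing_spaces
    )


# TDV Server defaults from the settings table: not case sensitive,
# trailing spaces ignored.
tdv_default = matches(
    "Smith ", "SMITH", case_sensitive=False, ignore_trailing_spaces=True
)

# A source that is case sensitive and keeps trailing spaces.
strict_source = matches(
    "Smith ", "SMITH", case_sensitive=True, ignore_trailing_spaces=False
)
```

The same pair of values matches under one pair of settings and not under the other, which is why a join or filter can return different rows depending on where the comparison is evaluated.
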
Tuning Externally Generated Queries
- Some applications such as BI tools may generate their own SQL that encounters performance issues on TDV
  - Developers may have limited or no ability to affect the structure of the generated query
  - Generated queries are very rarely human readable
  - The TDV rule based optimizer is not able to restructure such a query
- How does one tune such a query?
  - Manual analysis can still be done to identify some issues like excessive GROUP BY operations
  - Try converting the query into a TDV view and generate an execution plan
  - Turn on statistics for the source tables or provide cardinality hints

More Information
- A lot of the information presented here is a high level summary
- For more detailed information, see the following materials:
  - TDV Basic Training Modules
    - Query Engine
    - Caching
  - TDV Reference Manual
  - TDV User Guide
  - TDV Performance Tuning Whitepaper

Thank you!