The Model Clause explained
Tony Hasler, UKOUG Birmingham 2012
Tony Hasler, Anvil Computer Services Ltd.

Who is Tony Hasler?
Google Tony Hasler!
My blog contains all the material from this presentation on the front page
Additional blog entry (October) on the model clause

Health Warning about the model clause
No support for Active Session History (ASH) whilst on the CPU
Limited support for SQL Performance Monitor
No session statistics related to modelling
No trace events for diagnosis (that I can find)
Performance is poor when data spills to disk
Introduced in 10g but no new features in 11g
The main example in chapter 22 (10g) or chapter 23 (11g) of the Data Warehousing Guide (calculating mortgage amortization) is incorrect and always has been (mortgage fact ‘Payment’ should be ‘PaymentAmt’)
Perhaps this is not a flagship component!
However, the model clause:
−Enables you to do stuff that cannot otherwise be (easily) done in SQL
−Has parallelisation capabilities that can be useful in some cases

What is in this talk and what is not
In:
−What the model clause is
−What it is used for and what it isn’t
−Syntax and semantics of the main features
−Recommendations on how to use the model clause
−A real life example
Not in, due to time constraints:
−An exhaustive list of all model clause features (this can be found in the Data Warehousing Guide, Chapter 22 (10g) or Chapter 23 (11g))
−Demos

What the model clause is and what it is used for
Part of a query block, meaning it can appear after any SELECT keyword
Provides spreadsheet-like capabilities inside SQL
Can perform calculations that SQL does not otherwise implement, e.g. moving medians
=> Just because something is inefficient to execute doesn’t mean you don’t need to do it!
Iteration is possible, such as with ‘feedback loops’ (see Example 4, Calculating Using Simultaneous Equations, in the Data Warehousing Guide)
A series of calculations within the model can be sequenced manually, or dependencies can be determined by the engine (use the latter only when necessary, not because of laziness)

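To make the ‘moving median’ mentioned above concrete, here is a minimal Python sketch of the calculation itself. It is not Oracle code and says nothing about how the model clause evaluates it; the function name, the window size and the sample readings are all made up for illustration.

```python
from statistics import median

def moving_median(values, window):
    """Median of each value together with up to window-1 values before it."""
    result = []
    for i in range(len(values)):
        start = max(0, i - window + 1)      # window is shorter at the start
        result.append(median(values[start:i + 1]))
    return result

readings = [5, 1, 9, 3, 7]                  # hypothetical sample data
print(moving_median(readings, 3))
```

Each output value needs the whole window again, which is the kind of per-row recalculation the slide describes as inefficient but sometimes unavoidable.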
Model terms compared with Excel
| Excel term   | Model term |
| Rows/columns | Dimensions |
| Worksheets   | Partitions |
| Formulas     | Rules      |
| Values       | Measures   |

Comparison of Model clause terms with Excel spreadsheet terms
| Model term | Excel term     | Differences |
| PARTITION  | Worksheet      | It is possible for formulas in one Excel worksheet to reference cells in another. It is not possible for partitions in a model clause to reference each other. |
| DIMENSION  | Row and column | In Excel there are always exactly two dimensions. In a model clause there are one or more. |
| MEASURE    | A cell value   | In Excel only one value is referenced by the dimensions. The model clause allows one or more measures. |
| RULES      | Formulas       | Nested cell references are possible with models. |

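As a rough analogy only, the four terms can be pictured as a nested Python dictionary: partition, then dimension value, then named measures, with a rule applied cell by cell. This is a toy illustration, not how Oracle stores or evaluates a model; the data, the `apply_rule` helper and the measure names are hypothetical.

```python
# Toy analogy: partition -> {dimension value -> {measure name -> value}}
model = {
    "north": {1: {"sales": 10, "doubled": 0}, 2: {"sales": 20, "doubled": 0}},
    "south": {1: {"sales": 5, "doubled": 0}},
}

def apply_rule(model, target, rule):
    """Assign rule(cells, dim) to measure `target` in every cell."""
    for cells in model.values():      # a rule only ever sees one partition
        for dim in sorted(cells):     # one cell per dimension value
            cells[dim][target] = rule(cells, dim)

# Roughly analogous to the rule: doubled[ANY] = sales[CV()] * 2
apply_rule(model, "doubled", lambda cells, dim: cells[dim]["sales"] * 2)
print(model["north"][2]["doubled"])   # 40
```

Because `rule` receives only the cells of its own partition, it cannot reach into another partition, which mirrors the PARTITION row of the table.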
Overview of a query block with no model clause
Each stage logically precedes the one below it:
−Joins and outer join predicates: only scalar functions allowed
−WHERE clause (inner join predicates and selection predicates): only scalar functions allowed
−GROUP BY clause: only scalar functions allowed
−HAVING clause: scalar and aggregate functions allowed
−Analytic calculations
−SELECT list: scalar, aggregate, and analytic functions allowed
−DISTINCT
−ORDER BY clause: scalar, aggregate, and analytic functions allowed

Overview of a query block with a model clause
Each stage logically precedes the one below it:
−Joins, join predicates, selection predicates and GROUP BY clause: only scalar functions allowed
−HAVING clause: scalar and aggregate functions allowed
−Analytic calculations
−Model PARTITION, DIMENSION and MEASURES clauses: scalar, aggregate, and analytic functions allowed
−Model RULES clause: independent calculations
−SELECT list: only scalar functions on model outputs allowed
−DISTINCT
−ORDER BY clause: only scalar functions on model outputs allowed

Recommendations on use of the model clause
Use factored subqueries (the WITH clause) to avoid mixing aggregate functions and analytic functions with the model clause
Use only scalar values as inputs to the model clause
Perform all calculations in the model clause so that the final select list is just a simple list of identifiers
Be careful using the model clause in a subquery, as the CBO assumes that the cardinality of the output of a model clause is the same as the cardinality of the input

A simplified version of the real world problem
CREATE TABLE stock_holdings (
  customer_name VARCHAR2 (100),
  stock_name    VARCHAR2 (100),
  business_date DATE,
  VALUE         NUMBER
);
Calculate the moving one-year average (mean) value, the moving standard deviation, and the zscore for each customer, stock, and business date.

Mathematical terms loosely explained
The Average (mean) is a “typical value”
The Standard Deviation is a “typical difference between a value and the mean”
The Zscore is the number of standard deviations that a particular value deviates from the mean. In other words, it is a measure of how unusual a value is.

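A small worked example of the three terms, sketched in Python with made-up numbers. It assumes the population standard deviation (divide by n), and it returns 0 when the standard deviation is 0 so that a constant series never divides by zero; the function name is illustrative.

```python
from statistics import mean, pstdev

def zscore(value, population):
    """Number of population standard deviations `value` lies from the mean."""
    sd = pstdev(population)
    # A constant series has no spread, so treat every value as "not unusual".
    return 0.0 if sd == 0 else (value - mean(population)) / sd

values = [2, 4, 4, 4, 5, 5, 7, 9]    # hypothetical data: mean 5, std dev 2
print(mean(values), pstdev(values))  # 5 2.0
print(zscore(9, values))             # 2.0
```

Here 9 is two standard deviations above the mean, so it is a fairly unusual value for this series.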
Problem 1: Oracle doesn’t support moving intervals of months or years properly!!!
WITH q1 AS (
  SELECT DATE ' ' + ROWNUM mydate
  FROM DUAL
  CONNECT BY LEVEL <= 400),
q2 AS (
  SELECT mydate,
         COUNT (*) OVER (
           ORDER BY mydate
           RANGE BETWEEN INTERVAL '1' YEAR PRECEDING AND CURRENT ROW) cnt
  FROM q1)
SELECT * FROM q2 WHERE mydate = DATE ' ';
Result:
MYDATE CNT
10/01/

Model clause solution (1)
WITH q1 AS (
  SELECT DATE ' ' + ROWNUM mydate, 0 cnt
  FROM DUAL
  CONNECT BY LEVEL <= 400)
SELECT mydate, cnt
FROM q1
MODEL RETURN UPDATED ROWS
  DIMENSION BY (mydate)
  MEASURES (cnt)
  RULES (
    cnt [DATE ' '] =
      COUNT (*) [mydate BETWEEN ADD_MONTHS (CV () + 1, -12) AND CV ()]);

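The window in the rule above runs from ADD_MONTHS(current date + 1, -12) up to the current date, inclusive. The Python sketch below reproduces that arithmetic so the window length can be checked by hand. The `add_months` helper is my approximation of Oracle's documented ADD_MONTHS behaviour for dates with no time part (month-end stays month-end, otherwise the day is kept and clamped). The start date is hypothetical, since the literal on the slide did not survive.

```python
import calendar
from datetime import date, timedelta

def add_months(d, n):
    """Approximation of Oracle ADD_MONTHS for dates without a time part."""
    year, month0 = divmod(d.year * 12 + d.month - 1 + n, 12)
    month = month0 + 1
    last_src = calendar.monthrange(d.year, d.month)[1]
    last_dst = calendar.monthrange(year, month)[1]
    # Month-end maps to month-end; otherwise keep the day, clamped to the month.
    day = last_dst if d.day == last_src else min(d.day, last_dst)
    return date(year, month, day)

def days_in_moving_year(dates, current):
    """Count dates in [ADD_MONTHS(current + 1, -12), current]."""
    first = add_months(current + timedelta(days=1), -12)
    return sum(1 for d in dates if first <= d <= current)

start = date(2011, 1, 1)                                   # hypothetical
dates = [start + timedelta(days=i) for i in range(1, 1001)]
print(days_in_moving_year(dates, date(2012, 3, 15)))       # 366 (spans 29 Feb 2012)
print(days_in_moving_year(dates, date(2013, 3, 15)))       # 365
```

Adding one day before subtracting twelve months is what keeps the window to exactly one calendar year, rather than one year plus a day.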
Model clause solution (2)
CREATE TABLE business_dates (
  business_date PRIMARY KEY NOT NULL,
  business_days_in_year NOT NULL,
  first_day_in_year NOT NULL,
  business_day_number NOT NULL
)
ORGANIZATION INDEX
AS
WITH q1 AS (
  SELECT DISTINCT business_date,
         0 AS business_days_in_year,
         SYSDATE AS first_day_in_year,
         0 AS business_day_number
  FROM stock_holdings)
SELECT business_date, business_days_in_year, first_day_in_year, business_day_number
FROM q1
MODEL
  DIMENSION BY (business_date)
  MEASURES (business_days_in_year, first_day_in_year, business_day_number)
  RULES (
    first_day_in_year[ANY] = ADD_MONTHS(CV(business_date)+1, -12),
    business_days_in_year[ANY] =
      COUNT(*)[business_date BETWEEN first_day_in_year[CV()] AND CV()],
    business_day_number[ANY] = ROW_NUMBER() OVER (ORDER BY business_date));

Execution plan for previous statement
| Id | Operation                  | Name           |
|  0 | CREATE TABLE STATEMENT     |                |
|  1 | LOAD AS SELECT             | BUSINESS_DATES |
|  2 | SQL MODEL ORDERED          |                |
|  3 | VIEW                       |                |
|  4 | HASH UNIQUE                |                |
|  5 | TABLE ACCESS FULL          | STOCK_HOLDINGS |
|  6 | WINDOW (IN SQL MODEL) SORT |                |

Problem 2: avoiding data densification
Oracle recommends the use of partitioned outer joins for data densification to simplify analytic functions
The model clause can also be used for data densification
In my case, however, this would have multiplied the number of rows enormously, as customers tended to hold particular stocks for about 10% of the time
We can use the model clause to avoid this by performing multiple calculations in sequence

Model clause solution
SELECT customer_name, stock_name, business_date, first_day_in_year,
       VALUE, mov_avg, mov_stdd, zscore
FROM stock_holdings sh, business_dates bd
WHERE sh.business_date = bd.business_date
MODEL
  PARTITION BY (customer_name, stock_name)
  DIMENSION BY (sh.business_date)
  MEASURES (VALUE, business_days_in_year, first_day_in_year,
            0 AS mov_avg, 0 AS mov_stdd, 0 AS zscore,
            0 AS mov_sum, 0 AS mov_cnt)
  RULES (
    mov_sum[ANY] =
      SUM(VALUE)[business_date BETWEEN first_day_in_year[CV()] AND CV()],
    mov_cnt[ANY] =
      COUNT(*)[business_date BETWEEN first_day_in_year[CV()] AND CV()],
    mov_avg[ANY] = mov_sum[CV()] / business_days_in_year[CV()],
    mov_stdd[ANY] =
      SQRT((  (VAR_POP(VALUE)[business_date BETWEEN first_day_in_year[CV()] AND CV()]
               * mov_cnt[CV()])
            + (mov_sum[CV()]
               * (AVG(VALUE)[business_date BETWEEN first_day_in_year[CV()] AND CV()]
                  - mov_avg[CV()])))
           / business_days_in_year[CV()]),
    zscore[ANY] =
      DECODE(mov_stdd[CV()], 0, 0,
             (VALUE[CV()] - mov_avg[CV()]) / mov_stdd[CV()]));

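The mov_stdd rule is easier to trust once the algebra is spelled out. Write S for the sum of the values present in the window, Q for the sum of their squares, n for the number of rows present (mov_cnt) and N for business_days_in_year. Then VAR_POP × n = Q − S²/n, and S × (AVG − mov_avg) = S²/n − S²/N. Adding the two and dividing by N gives Q/N − (S/N)², which is the population variance of the series over all N business days with every missing day counted as 0. That reading (no row means a holding of 0) is my assumption; it is what dividing by business_days_in_year implies. The Python sketch below checks the identity numerically on made-up data.

```python
from math import isclose, sqrt
from statistics import mean, pstdev, pvariance

def sparse_moving_stats(held_values, business_days):
    """Mean and std dev over `business_days` days, using only the rows present.

    Mirrors the arithmetic of the mov_avg and mov_stdd rules; days with no
    row are implicitly treated as 0 (an assumption, see the text above).
    """
    mov_sum = sum(held_values)
    mov_cnt = len(held_values)
    mov_avg = mov_sum / business_days
    variance = (pvariance(held_values) * mov_cnt
                + mov_sum * (mean(held_values) - mov_avg)) / business_days
    return mov_avg, sqrt(variance)

held = [10.0, 12.0, 8.0]                   # hypothetical: 3 rows in the window
days = 10                                  # hypothetical business days in the year
dense = held + [0.0] * (days - len(held))  # the densified series we avoid building
avg, sd = sparse_moving_stats(held, days)
print(isclose(avg, mean(dense)), isclose(sd, pstdev(dense)))  # True True
```

The sparse calculation touches three rows where the densified one would touch ten, which is the saving that Problem 2 is about.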
Parallel execution plan
| Id   | Operation           | Name              |
|    0 | SELECT STATEMENT    |                   |
|    1 | PX COORDINATOR      |                   |
|    2 | PX SEND QC (RANDOM) | :TQ10001          |
|    3 | BUFFER SORT         |                   |
|    4 | SQL MODEL ORDERED   |                   |
|    5 | PX RECEIVE          |                   |
|    6 | PX SEND HASH        | :TQ10000          |
|    7 | NESTED LOOPS        |                   |
|    8 | PX BLOCK ITERATOR   |                   |
|    9 | TABLE ACCESS FULL   | STOCK_HOLDINGS    |
|* 10  | INDEX UNIQUE SCAN   | SYS_IOT_TOP_78729 |

Summary
The model clause allows you to build your own analytical functions and/or your own analytical windows, amongst other things
The model clause also allows you to parallelise calculations and can be useful even if the calculations are supported without the model clause
However, model clause aggregates will be slower than standard analytic functions
Performance degrades rapidly when partitions spill to disk

Questions