SQL/OLAP
Sang-Won Lee
Let’s e-Wha!
Email: swlee@ewha.ac.kr
URL: http://home.ewha.ac.kr/~swlee
Jul. 12th, 2001
SQL/OLAP ISO/IEC JTC1 SC32
Contents
  Introduction to OLAP and SQL Issues
  Current OLAP Solutions
  SQL/OLAP
  Future OLAP Trends
OLAP
  On-Line Analytical Processing
    E.F. Codd coined the term “OLAP” ([1])
  Multi-dimensional data model
  vs. On-Line Transaction Processing
  vs. Data warehouse
Multi-dimensional Data Model
  Sales(prod-id, store-id, time-id, qty, amt)
  Dimension: Product, Store, Time
  Hierarchy:
    Product -> Category -> Industry
    Store -> City -> State -> Country
    Date -> Month -> Quarter -> Year
Multi-dimensional Data Model (2)
  Operations
    roll-up/drill-down
    slice/dice
    pivot
    ranking
    comparisons
    drill-across
    etc.
  Example
    for each state, show me the top 10 products based on total sales
    what is the percentage growth of Jan-99 total sales over Jan-98 total sales?
    for each product, show me the quantity shipped and sold
OLAP Operations
  Many business operations were hard or impossible to express in SQL
    multiple aggregations
    comparisons (with aggregation)
    reporting features
  Be prepared for a serious performance penalty
  Client and middleware tools provide the necessary functionality
  OLAP server: ROLAP vs. MOLAP
Multiple Aggregations
  Create a 2-dimensional spreadsheet that shows the sum of sales by make as well as by color of car
  Each subtotal requires a separate aggregate query, padded with NULLs so that the UNION branches have the same columns

    SELECT color, make, SUM(amt) FROM sales GROUP BY color, make
    UNION
    SELECT color, NULL, SUM(amt) FROM sales GROUP BY color
    UNION
    SELECT NULL, make, SUM(amt) FROM sales GROUP BY make
    UNION
    SELECT NULL, NULL, SUM(amt) FROM sales;

  [Figure: Cross Tab of Make (Chevy, Ford) against Color (RED, WHITE, BLUE), with By Make and By Color sub-totals and the overall Sum]
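The one-query-per-subtotal approach can be tried out with a small sketch. It uses Python's standard `sqlite3` module purely as a convenient SQL engine; the sample rows are invented for illustration, and `UNION ALL` is used here because the four branches cannot produce duplicate rows for this data.

```python
import sqlite3

# Hypothetical sample data (color, make, amt); not taken from the slides.
ROWS = [
    ("RED", "Chevy", 5), ("WHITE", "Chevy", 10), ("BLUE", "Chevy", 15),
    ("RED", "Ford", 20), ("WHITE", "Ford", 25), ("BLUE", "Ford", 30),
    ("RED", "Chevy", 1),
]

# One aggregate query per level of the cross tab; NULL stands for "all values".
CROSS_TAB_SQL = """
SELECT color, make, SUM(amt) FROM sales GROUP BY color, make
UNION ALL
SELECT color, NULL, SUM(amt) FROM sales GROUP BY color
UNION ALL
SELECT NULL, make, SUM(amt) FROM sales GROUP BY make
UNION ALL
SELECT NULL, NULL, SUM(amt) FROM sales
"""


def cross_tab(rows):
    """Return {(color, make): total}, where None marks a sub-total dimension."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (color TEXT, make TEXT, amt INTEGER)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    result = {(c, m): total for c, m, total in conn.execute(CROSS_TAB_SQL)}
    conn.close()
    return result


if __name__ == "__main__":
    for key, total in cross_tab(ROWS).items():
        print(key, total)
```

With d dimensions this style needs 2^d branches, each scanning the table again, which is the cost the CUBE operator is meant to remove.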
Comparisons
  Example: last year’s sales vs. this year’s sales for each product
    requires a self-join
  VIEW:
    create or replace view v_sales as
      select prod_id, year, sum(qty) as sale_sum
      from sales
      group by prod_id, year;
  QUERY:
    select curr.prod_id, curr.year cur_year, curr.sale_sum cur_sales, last.sale_sum last_sales
    from v_sales curr, v_sales last
    where curr.prod_id = last.prod_id
      and curr.year = (last.year + 1);
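The view-plus-self-join pattern can be exercised end to end with a minimal sketch, again using Python's standard `sqlite3` module as the SQL engine. The rows are invented, the alias `prev` is chosen here for the previous-year side of the join, and a plain `CREATE VIEW` is used because SQLite has no `CREATE OR REPLACE VIEW`.

```python
import sqlite3

# Hypothetical rows (prod_id, year, qty); not taken from the slides.
ROWS = [
    ("p1", 1998, 40), ("p1", 1998, 60), ("p1", 1999, 130),
    ("p2", 1998, 200), ("p2", 1999, 150),
    ("p3", 1999, 70),  # no 1998 sales, so the inner self-join drops p3
]


def year_over_year(rows):
    """Return (prod_id, cur_year, cur_sales, last_sales) per product."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (prod_id TEXT, year INTEGER, qty INTEGER)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    # Aggregate once in a view, then join the view to itself.
    conn.execute(
        "CREATE VIEW v_sales AS "
        "SELECT prod_id, year, SUM(qty) AS sale_sum "
        "FROM sales GROUP BY prod_id, year"
    )
    result = conn.execute(
        """
        SELECT curr.prod_id, curr.year, curr.sale_sum, prev.sale_sum
        FROM v_sales AS curr
        JOIN v_sales AS prev
          ON curr.prod_id = prev.prod_id
         AND curr.year = prev.year + 1
        ORDER BY curr.prod_id
        """
    ).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in year_over_year(ROWS):
        print(row)
```

Note that the join must match on the product as well as on the year offset; otherwise every product of one year is paired with every product of the previous year.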
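The Data CUBE Relational Operator Generalizes Group By and Aggregates
  [Figure, source: [6]. Four panels of increasing dimensionality:
    Aggregate: a single Sum
    Group By (with total): sums By Color (RED, WHITE, BLUE) plus the overall Sum
    Cross Tab: Make (Chevy, Ford) against Color (RED, WHITE, BLUE), with By Make and By Color sub-totals and the overall Sum
    The Data Cube and The Sub-Space Aggregates: Make (CHEVY, FORD) by Year (1990, 1991, 1992, 1993) by Color (RED, WHITE, BLUE), with the By Make & Year, By Color & Year, By Make & Color, By Year, By Make and By Color aggregates and the overall Sum]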
Getting Sub-totals: ROLLUP Operation
  SELECT year, brand, SUM(qty)
  FROM sales
  GROUP BY ROLLUP (year, brand);

  YEAR  BRAND   SUM(qty)
  1996  Ford         250
  1996  Honda        300
  1996  Toyota       450
  1997  Ford         300
  …
  1996              1000
  1997              1200
                    2200

ROLLUP Operator
The ROLLUP operator can be used to obtain sub-totals in a query. ROLLUP grouping is an extension to the GROUP BY clause that produces sub-totals in addition to the “regular” grouped rows. The sub-totals are rows that contain further aggregates, whose values are derived by applying the same set functions that were used to obtain the grouped rows. In the example, the ROLLUP operator produces the following rows in addition to the sum of QTY for each YEAR-BRAND pair:
  Sum of QTY for each year
  Grand total of QTY for all the years
If there are 2 years (m) and 3 brands (n), as in the example, a GROUP BY operation without ROLLUP would produce at most 6 rows (m*n), while the same operation using ROLLUP could produce up to 9 rows (m*(n+1)+1).
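The row arithmetic can be checked with a small pure-Python emulation of ROLLUP. This is only a sketch of the semantics, not how a database evaluates the operator; the function name `rollup` is made up here, and the six input rows use the per-year, per-brand quantities that appear in the deck's examples.

```python
from collections import defaultdict

# (year, brand, qty) rows using the quantities shown in the deck's examples.
SALES = [
    (1996, "Ford", 250), (1996, "Honda", 300), (1996, "Toyota", 450),
    (1997, "Ford", 300), (1997, "Honda", 350), (1997, "Toyota", 550),
]


def rollup(rows, n_keys):
    """Emulate GROUP BY ROLLUP over the first n_keys columns, summing the last.

    None in a result key plays the role of the NULL that stands for "ALL".
    """
    totals = defaultdict(int)
    for row in rows:
        keys, value = row[:n_keys], row[-1]
        # Each input row feeds every prefix of its key:
        # (year, brand), then (year,), then the empty prefix (grand total).
        for depth in range(n_keys, -1, -1):
            group = keys[:depth] + (None,) * (n_keys - depth)
            totals[group] += value
    return dict(totals)


if __name__ == "__main__":
    for group, total in rollup(SALES, 2).items():
        print(group, total)
```

For m = 2 years and n = 3 brands the result has 6 regular rows, 2 per-year sub-totals and 1 grand total, which matches m*(n+1)+1 = 9.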
Getting Cross-tabs: CUBE Operation
  SELECT year, brand, SUM(amount)
  FROM sales
  GROUP BY CUBE (year, brand);

  YEAR  BRAND   SUM(AMOUNT)
  1996  Ford            250
  ...
  1996  Toyota          450
  1997  Ford            300
  ...
  1997                 1200
                       2200
        Ford            550
        Honda           650
        Toyota         1000

A CUBE grouping is an extension to the GROUP BY clause that produces a result set containing sub-totals for every possible combination of the columns or expressions in the GROUP BY clause. The aggregation of those rows is known as an n-dimensional cross-tabulation and produces the cubes discussed earlier. In the example shown, the CUBE operator produces brand-wise sub-totals in addition to all the aggregates produced by the ROLLUP operation.

Distinguishing NULLs: GROUPING Function
In the result of a ROLLUP or CUBE operation, a NULL value is used to represent the “ALL” value: in the sub-total row for a given year, a NULL appears in the brand column. This presents a problem, because a user cannot differentiate between an actual NULL and a NULL representing “ALL”. A new function, GROUPING, is used to distinguish between the two types of NULLs. GROUPING is a set function that returns 1 if the NULL in the given column of the row represents the set of all values resulting from a ROLLUP or CUBE operation. Where the NULL is a traditional NULL, the function returns 0.
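The ambiguity between the two kinds of NULL, and how a GROUPING-style flag resolves it, can be illustrated with a pure-Python emulation of CUBE. This is a sketch of the semantics only; the function name `cube_with_grouping` and the input rows (including the row with a genuinely missing brand) are invented for illustration.

```python
from collections import defaultdict
from itertools import product

# Hypothetical rows (year, brand, amount); the None brand is a real NULL in the data.
SALES = [
    (1996, "Ford", 250),
    (1996, None, 40),
    (1997, "Ford", 300),
]


def cube_with_grouping(rows, n_keys):
    """Emulate GROUP BY CUBE over the first n_keys columns, summing the last.

    Result keys are (shown_values, grouping_flags); a flag of 1 means the
    None in that position stands for "ALL", a flag of 0 means it is data.
    """
    totals = defaultdict(int)
    for row in rows:
        keys, value = row[:n_keys], row[-1]
        # Every 0/1 combination of flags is one of the 2**n groupings of the cube.
        for flags in product((0, 1), repeat=n_keys):
            shown = tuple(None if flag else key for key, flag in zip(keys, flags))
            totals[(shown, flags)] += value
    return dict(totals)


if __name__ == "__main__":
    for (shown, flags), total in cube_with_grouping(SALES, 2).items():
        print(shown, "GROUPING flags:", flags, "total:", total)
```

Two result rows display as (1996, NULL): the one with flags (0, 0) is the group of 1996 sales whose brand is unknown, while the one with flags (0, 1) is the 1996 sub-total over all brands. Only the flag tells them apart.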
Flexible Grouping: GROUPING SETS Operator
  SELECT year, brand, color, SUM(qty)
  FROM sales
  GROUP BY GROUPING SETS ((year, brand), (brand, color), ());

  YEAR  BRAND   COLOR  SUM(QTY)
  1996  Ford                250   (Year, Brand)
  1996  Honda               300   (Year, Brand)
  1996  Toyota              450   (Year, Brand)
  1997  Ford                300   (Year, Brand)
  1997  Honda               350   (Year, Brand)
  1997  Toyota              550   (Year, Brand)
        Ford    Blue        400   (Brand, Color)
        Ford    Red         150   (Brand, Color)
        Honda   Blue        650   (Brand, Color)
        Toyota  Red         700   (Brand, Color)
        Toyota  White       300   (Brand, Color)
                           2200   Grand total

The GROUPING SETS operator can be used to compute multiple unrelated groupings in a single query, without the need to use set operations.
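ROLLUP and CUBE can both be read as shorthand for particular lists of grouping sets: ROLLUP yields the prefixes of its column list, and CUBE yields every subset. The sketch below generates those lists and renders them as a GROUPING SETS clause; the helper names `rollup_sets`, `cube_sets` and `to_sql` are made up for this illustration.

```python
from itertools import combinations


def rollup_sets(columns):
    """ROLLUP(c1, ..., cn): the n+1 prefixes, longest first, ending with ()."""
    return [tuple(columns[:i]) for i in range(len(columns), -1, -1)]


def cube_sets(columns):
    """CUBE(c1, ..., cn): all 2**n subsets, largest first, in column order."""
    return [
        combo
        for size in range(len(columns), -1, -1)
        for combo in combinations(columns, size)
    ]


def to_sql(sets):
    """Render a list of grouping sets as a GROUP BY GROUPING SETS clause."""
    body = ", ".join("(" + ", ".join(s) + ")" for s in sets)
    return f"GROUP BY GROUPING SETS ({body})"


if __name__ == "__main__":
    print(to_sql(rollup_sets(["year", "brand"])))
    print(to_sql(cube_sets(["year", "brand"])))
```

Writing the sets out explicitly is what allows a query to ask for combinations such as (year, brand) and (brand, color) that neither ROLLUP nor CUBE would produce on their own.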
LAG Operator
  SELECT timekey, sales,
         LAG(sales, 12) OVER (ORDER BY timekey) AS sales_last_year,
         sales - LAG(sales, 12) OVER (ORDER BY timekey) AS sales_change
  FROM sales;

  TIMEKEY  SALES  SALES_LAST_YEAR  SALES_CHANGE
  98-1      1100                -             -
  …
  99-1      1200             1100           100
  99-2      1500             1450            50
  99-3      1700             1350           350
  99-4      1600             1700          -100
  99-5      1800             1600           200
  99-6      1500             1450            50
  99-7      1300             1250            50
  99-8      1400             1200           200
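A LAG query of this shape can be run with Python's standard `sqlite3` module, assuming the bundled SQLite is version 3.25 or later (the first release with window functions). In this sketch the time keys are written as `YYYY-MM` strings so that they sort chronologically, the January to August figures follow the slide's SALES and SALES_LAST_YEAR columns, and the September to December 1998 values are made-up fillers needed only to keep the 12-row offset aligned.

```python
import sqlite3

# Jan-Aug values follow the slide; the last four 1998 values are invented fillers.
SALES_1998 = [1100, 1450, 1350, 1700, 1600, 1450, 1250, 1200, 1000, 1000, 1000, 1000]
SALES_1999 = [1200, 1500, 1700, 1600, 1800, 1500, 1300, 1400]


def sales_with_lag():
    """Return (timekey, sales, sales_last_year, sales_change) rows in time order."""
    rows = [(f"1998-{m:02d}", s) for m, s in enumerate(SALES_1998, start=1)]
    rows += [(f"1999-{m:02d}", s) for m, s in enumerate(SALES_1999, start=1)]

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (timekey TEXT, sales INTEGER)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    # The alias sales_last_year cannot be reused in the same SELECT list,
    # so the LAG expression is repeated for the difference.
    result = conn.execute(
        """
        SELECT timekey, sales,
               LAG(sales, 12) OVER (ORDER BY timekey) AS sales_last_year,
               sales - LAG(sales, 12) OVER (ORDER BY timekey) AS sales_change
        FROM sales
        ORDER BY timekey
        """
    ).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in sales_with_lag():
        print(row)
```

LAG counts rows, not calendar months, so an offset of 12 only means "same month last year" when every month has exactly one row.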
MOVING Average
  SELECT time_id,
         AVG(SUM(qty)) OVER (ORDER BY time_id
                             RANGE INTERVAL '2' DAY PRECEDING) AS mvg_avg_sales
  FROM sales
  GROUP BY time_id;
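A value-based moving average of this kind can be approximated in SQLite through Python's standard `sqlite3` module. The sketch rests on several assumptions: SQLite 3.28 or later (needed for RANGE frames with an offset), an integer day number in `time_id` so that a numeric offset of 2 can stand in for `INTERVAL '2' DAY`, a subquery for the daily sums instead of nesting the aggregate inside the window function, and invented sample rows.

```python
import sqlite3

# Hypothetical (time_id, qty) rows; time_id is an integer day number.
# Day 4 has no sales, which is where RANGE and ROWS frames would differ.
ROWS = [(1, 10), (1, 20), (2, 60), (3, 30), (5, 90), (6, 30)]


def moving_average(rows):
    """Return (time_id, average of daily totals over the current and 2 prior days)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (time_id INTEGER, qty INTEGER)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    result = conn.execute(
        """
        SELECT time_id,
               AVG(daily_qty) OVER (
                   ORDER BY time_id
                   RANGE BETWEEN 2 PRECEDING AND CURRENT ROW
               ) AS mvg_avg_sales
        FROM (SELECT time_id, SUM(qty) AS daily_qty
              FROM sales
              GROUP BY time_id)
        ORDER BY time_id
        """
    ).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in moving_average(ROWS):
        print(row)
```

Because the frame is defined on values of `time_id` rather than on row positions, the average for day 6 covers only days 5 and 6; a ROWS frame of the same size would also pull in day 3.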