Advanced Topics: Business Intelligence Features Data Warehousing, Materialized View, Partitioning, SQL Analytics
Data Warehousing Purpose: Query and Analysis of Data Usually contains historical data derived from transaction data (OnLine Trans. Processing or OLTP) OLAP (OnLine Analytical Processing) Enable Fast execution of ad-hoc queries
Data Warehouse Architecture Data Mart: small, specialized DW for a dept or so DW Purpose: Enable analysis using Unified Data
OLAP example Assume one (real-valued) measure value eg Sales Amount and several finite dimensional attributes e.g. Item, Place, Month. Example fact: “Iced Tea was sold in Auckland in January”. Measure: $20k Maybe no fact about “Iced Tea in Auckland in August”. Mapping Item×Place×Time Sales Amount called a cube. (Think “array”.) Answer queries based on dimensions time, place group by month, country or year, state, etc.) All possible combinations need to be fast.
Logical Design: Star Schema Central “Fact” table storing “measure” attributes such as “sales” Dimension tables store information about the dimension attributes Queries: Find “sales” by product, region, customers, time (year),..
Dimensions Hierarchy of attributes Analysis based on any level in hierarchy sales on region, subregion, country_name,…
Logical Design: Snowflake Schema Snowflake: complex improvisation of star. Products: divided into categories and suppliers (more normalized) Queries: will be slower
Physical Design Tables: Can be regular tables or Partitioned Tables Indexes: Specialized indexes like bitmap Additional Objects: Materialized views, Dimensions
Partitioned Tables Behave like regular tables Query on entire table or on a specific partition Parallelism on multiple partitions Can load/modify multiple partitions concurrently
Query & Analysis Functions All SQL analysis functions SELECT SUM(amout_sold), p.pname FROM sales s, products p, times t WHERE s.product_id = p.product_id GROUP BY p.pname;
More SQL Analytic Functions Conversion Functions ASCIISTR CAST CHARTOROWID COMPOSE CONVERT DECOMPOSE HEXTORAW NUMTODSINTERVAL NUMTOYMINTERVAL RAWTOHEX RAWTONHEX ROWIDTOCHAR ROWIDTONCHAR SCN_TO_TIMESTAMP TIMESTAMP_TO_SCN TO_BINARY_DOUBLE TO_BINARY_FLOAT TO_CHAR (character) TO_CHAR (datetime) TO_CHAR (number) TO_CLOB TO_DATE TO_DSINTERVAL TO_LOB TO_MULTI_BYTE TO_NCHAR (character) TO_NCHAR (datetime) TO_NCHAR (number) TO_NCLOB TO_NUMBER ,… NUMERIC ABS ACOS ASIN ATAN ATAN2 BITAND CEIL COS COSH EXP FLOOR LN LOG MOD NANVL POWER REMAINDER ROUND (number) SIGN SIN SINH SQRT TAN TANH TRUNC (number) WIDTH_BUCKET CHARACATER CHR CONCAT INITCAP LOWER LPAD LTRIM NLS_INITCAP NLS_LOWER NLSSORT NLS_UPPER REGEXP_REPLACE REGEXP_SUBSTR REPLACE RPAD RTRIM SOUNDEX SUBSTR TRANSLATE TREAT TRIM UPPER DATE ADD_MONTHS CURRENT_DATE CURRENT_TIMESTAMP DBTIMEZONE EXTRACT (datetime) FROM_TZ LAST_DAY LOCALTIMESTAMP MONTHS_BETWEEN NEW_TIME NEXT_DAY NUMTODSINTERVAL NUMTOYMINTERVAL ROUND (date) SESSIONTIMEZONE SYS_EXTRACT_UTC SYSDATE SYSTIMESTAMP TO_CHAR (datetime) TO_TIMESTAMP TO_TIMESTAMP_TZ TO_DSINTERVAL TO_YMINTERVAL TRUNC (date) TZ_OFFSET
Analytic Functions AVG, CORR, COVAR_SAMP, COVAR_POP, COUNT, CUME_DIST, DENSE_RANK, FIRST, LAST, LAG, LEAD, MAX, MIN, PERCENTILE_RANK, RANK, ROW_NUMBER, STDDEV, SUM, VARIANCE, …
Materialized Views Queries joining fact and dimension tables Precompute the join and store as Materialized Views Denormalized and so not in 3NF Queries are automatically ‘rewritten’ using Materialized view Materialized views Facilitate fast execution of queries
Summary Fast Query and Analysis Denormalization into Materialized Views Queries implicitly ‘rewritten’ under the covers Research issues: What queries can be rewritten using the MV Pushing the updates to the MVs Refreshing the MVs