OLAP Tuning. Outline OLAP 101 – Data warehouse architecture – ROLAP, MOLAP and HOLAP Data Cube – Star Schema and operations – The CUBE operator – Tuning.

Slides:



Advertisements
Similar presentations
OLTP Compared With OLAP
Advertisements

Dimensional Modeling.
An overview of Data Warehousing and OLAP Technology Presented By Manish Desai.
Outline What is a data warehouse? A multi-dimensional data model Data warehouse architecture Data warehouse implementation Further development of data.
Copyright © Starsoft Inc, Data Warehouse Architecture By Slavko Stemberger.
Data Warehousing CPS216 Notes 13 Shivnath Babu. 2 Warehousing l Growing industry: $8 billion way back in 1998 l Range from desktop to huge: u Walmart:
OLAP Services Business Intelligence Solutions. Agenda Definition of OLAP Types of OLAP Definition of Cube Definition of DMR Differences between Cube and.
Data Warehousing M R BRAHMAM.
Chapter 13 Business Intelligence and Data Warehouses
Database Systems: Design, Implementation, and Management Tenth Edition
Data Warehousing and OLAP
Advanced Querying OLAP Part 2. Context OLAP systems for supporting decision making. Components: –Dimensions with hierarchies, –Measures, –Aggregation.
Chapter 15 Data Warehousing, OLAP, and Data Mining
Data Warehousing - 3 ISYS 650. Snowflake Schema one or more dimension tables do not join directly to the fact table but must join through other dimension.
COMP 578 Data Warehousing And OLAP Technology Keith C.C. Chan Department of Computing The Hong Kong Polytechnic University.
13 Chapter 13 The Data Warehouse Hachim Haddouti.
Lead Black Slide. © 2001 Business & Information Systems 2/e2 Chapter 7 Information System Data Management.
CSE6011 Warehouse Models & Operators  Data Models  relations  stars & snowflakes  cubes  Operators  slice & dice  roll-up, drill down  pivoting.
Chapter 13 The Data Warehouse
Ch3 Data Warehouse part2 Dr. Bernard Chen Ph.D. University of Central Arkansas Fall 2009.
DATA WAREHOUSE (Muscat, Oman).
CS346: Advanced Databases
Designing a Data Warehouse
Database Management Systems, 2 nd Edition. R. Ramakrishnan and J. Gehrke1 Decision Support Chapter 23.
Designing a Data Warehouse Issues in DW design. Three Fundamental Processes Data Acquisition Data Storage Data a Access.
Week 6 Lecture The Data Warehouse Samuel Conn, Asst. Professor
Multi-Dimensional Databases & Online Analytical Processing This presentation uses some materials from: “ An Introduction to Multidimensional Database Technology,
1 Cube Computation and Indexes for Data Warehouses CPS Notes 7.
OnLine Analytical Processing (OLAP)
Datawarehouse Objectives
OLAP & DSS SUPPORT IN DATA WAREHOUSE By - Pooja Sinha Kaushalya Bakde.
Data Warehousing.
BI Terminologies.
October 28, Data Warehouse Architecture Data Sources Operational DBs other sources Analysis Query Reports Data mining Front-End Tools OLAP Engine.
Ahsan Abdullah 1 Data Warehousing Lecture-10 Online Analytical Processing (OLAP) Virtual University of Pakistan Ahsan Abdullah Assoc. Prof. & Head Center.
Data Warehousing and OLAP. Warehousing ► Growing industry: $8 billion in 1998 ► Range from desktop to huge:  Walmart: 900-CPU, 2,700 disk, 23TB Teradata.
Winter 2006Winter 2002 Keller, Ullman, CushingJudy Cushing 19–1 Warehousing The most common form of information integration: copy sources into a single.
Ayyat IT Group Murad Faridi Roll NO#2492 Muhammad Waqas Roll NO#2803 Salman Raza Roll NO#2473 Junaid Pervaiz Roll NO#2468 Instructor :- “ Madam Sana Saeed”
1 On-Line Analytic Processing Warehousing Data Cubes.
CMPE 226 Database Systems October 21 Class Meeting Department of Computer Engineering San Jose State University Fall 2015 Instructor: Ron Mak
Data Warehousing Multidimensional Analysis
Data Mining Data Warehouses.
Business Intelligence Transparencies 1. ©Pearson Education 2009 Objectives What business intelligence (BI) represents. The technologies associated with.
OLAP On Line Analytic Processing. OLTP On Line Transaction Processing –support for ‘real-time’ processing of orders, bookings, sales –typically access.
MIS2502: Data Analytics Advanced Analytics - Introduction.
What is OLAP?.
CSE 5331/7331 F'071 CSE 5331/7331 Fall 2007 Dimensional Modeling Margaret H. Dunham Department of Computer Science and Engineering Southern Methodist University.
Data Warehousing.
Advanced Database Concepts
Database Management Systems, 2 nd Edition. R. Ramakrishnan and J. Gehrke1 Data Warehousing and Decision Support.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 Data Warehousing and Decision Support Chapter 25.
1 Database Systems, 8 th Edition Star Schema Data modeling technique –Maps multidimensional decision support data into relational database Creates.
Introduction to OLAP and Data Warehouse Assoc. Professor Bela Stantic September 2014 Database Systems.
Data Warehousing COMP3017 Advanced Databases Dr Nicholas Gibbins –
Data Warehousing and OLAP Outline u Models & operations u Implementing a warehouse u Future directions.
Data Resource Management – MGMT An overview of where we are right now SQL Developer OLAP CUBE 1 Sales Cube Data Warehouse Denormalized Historical.
CSE6011 Implementing a Warehouse  Monitoring: Sending data from sources  Integrating: Loading, cleansing,...  Processing: Query processing, indexing,...
Introduction to SQL Server Analysis Services
Data Warehousing CIS 4301 Lecture Notes 4/20/2006.
Chapter 13 Business Intelligence and Data Warehouses
On-Line Analytic Processing
Chapter 13 The Data Warehouse
On-Line Analytical Processing (OLAP)
Data Warehouse and OLAP
Data Warehousing and OLAP
Introduction of Week 9 Return assignment 5-2
DATA CUBES E0 261 Jayant Haritsa Computer Science and Automation
Data Warehouse and OLAP
Data Warehousing.
Presentation transcript:

OLAP Tuning

Outline OLAP 101 – Data warehouse architecture – ROLAP, MOLAP and HOLAP Data Cube – Star Schema and operations – The CUBE operator – Tuning the cube Data Mining Dennis Shasha and Philippe Bonnet, 2013

OLAP Online Analytical Processing – OLAP enables a user to interactively and selectively extract and view data from different points-of-view. – Typical OLAP queries Find sales for seniors in Copenhagen (selection) Find sales per age group, per city (aggregation) Find sales per age group, per country (aggregation) Find total sales (aggregation) Find sales for seniors, per country (selection, aggregation) Selections & Aggregations on Multi-dimensional Data

Data Warehouse Data Warehouse (DW) Data Warehouse (DW) Production Data Production Data Production Data Production Data Production Data Production Data Mart Data Mart Data Mart Data Mart Data Mart Data Mart Data Mart Data Mart Transactional ProcessingAnalytical Processing Infrequent updates Data in DW MUST NOT be latest, up to date version

ROLAP, MOLAP, HOLAP MOLAP – DW is a proprietary system, tailored for multi- dimensional data manipulations Relational OLAP – Multi-dimensional data mapped onto tables, and manipulations mapped onto relational queries Hybrid OLAP – Relational systems extended with specific OLAP functionalities

Star Schema Fact table Sales(Product_Id, Time_Id, City_id, Amount) Multi-dimensional data Can be represented as a (hyper-)cube – 3 dimensions: Product, Time, City – The cube contains Amount values Dimensions Product(Product_id, Name, Category, Price) Space(City_id, City, Country, Region) Time(Time_id, Week, Month, Quarter) Typically organized in a hierarchy

Drill Down and Roll Up Dimensions as aggregation hierarchy Drill down – Series of queries that moves down the aggregation hierarchy – E.g., per region, per country, per city Roll up – Series of queries that moves up the aggregation hierarchy – E.g., per week, per month, per year Same form of SQL query, different attributes – When rolling up, query results can be re-used An aggregation can be used as a basis for an aggregation one or more levels up in the hierarchy

Pivoting Data as a cube which is pivoted so that a user can “see” its various faces – Pivoting on dimensions D1, D2, D3 means grouping by attributes from these dimensions – New pivot on D3, D2, D1 Interesting in case a visualization software is used to represent 3 dimensions as x, y, z in space Interesting if there are N dimensions, and the pivot concerns a subset of these dimensions

Slicing and Dicing Slice – A value is given for a dimension attribute in the where clause – We take a “slice” of the cube Dice – Multiple values (or a range) are given for a dimension attribute in the where clause – We are dicing, i.e., reduce the size of, the original cube

Star Schema Operations Write the following sequence of queries in SQL on the sales star- schema – Original cube: Sales amount per country, per week, per category – Roll-up on time Sales amount per country, per month, per category Sales amount per country, per year, per category – Drill-down on city Sales amount per city, per year, per category – Pivot on product, time and space Sales amount per category, per year, per city – Slice on year 2012 Sales amount per category, per city for 2012 – Dice on the last three years Sales amount per category, per city for 2010,2011,2012

Star Schema Operations What are the SQL queries you need to construct the following table Product 1Product 2Product 3Total City City City Total

The CUBE Operator SELECT city_id, product_id, SUM(amount) as sum_a FROM SALES GROUP BY CUBE (city_id, product_id) Defined by Jim Gray et al. in 1996Jim Gray et al. Part of the SQL standard Supported in Oracle, DB2, SQL ServerOracleDB2SQL Server ROLAP implementation City_idProduct_idSum_a City1Product 1520 City1product2230 City1product3100 City1ALL850 City2Product 110 City2product215 City2product310 City2ALL35 City3Product City3product21200 City3product31000 City3ALL3200 ALLProduct ALLproduct21445 ALLproduct31110 ALL 4085

The ROLLUP Operator SELECT city_id, product_id, SUM(amount) as sum_a FROM SALES GROUP BY city_id, product_id with ROLLUP Part of the SQL standard Supported in Oracle, DB2, SQL Server, MySQLOracleDB2 SQL ServerMySQL City_idProduct_idSum_a City1Product 1520 City1product2230 City1product3100 City1ALL850 City2Product 110 City2product215 City2product310 City2ALL35 City3Product City3product21200 City3product31000 City3ALL3200 ALL 4085

Tuning the Cube Materialized Views – To materialize the original cube and the result of important cube manipulations (those that are re-used often) Indexes – Speeding up foreign-key/primary-key joins Dimensions as index-organized tables (clustering index on primary key) Non-clustered index on foreign key in fact table – Indexing low-cardinality attributes Bitmap index (Oracle)Oracle – SQL Server columnstore indexes SQL Server columnstore indexes Compression – Speeding up scans, reducing DW footprint on 2 nd storage Column-Oriented Representation – Great for slicing, dicing – Great for compression – Great for leveraging RAM, Processor cache Parallelism – Work on dimensions in parallel, Speeding up scans – Tuning degree of parallelism in ORACLE, DB2ORACLEDB2

Data Warehouse Appliances Exadata Data Sheet Normal Table Scan vs. Exadata Smart Scan See B.Durrett’s slides

Column Stores Columnar representation – Compression & Scan efficiency – Tailored for RAM, processor cache utilization VectorWise SQL Server ColumnStore indexes

Data Mining 101 Boundaries of Data Management, Statistics and Machine Learning Finding Patterns in Large Data Sets – Associations Many buy Product1 and Product3 together – Classification Given some predefined classes (e.g.., StaysInBusiness, GoesOutOfBusiness) train a classifier to distinguish in which class a store belongs based on its sales records Might be used for prediction – Clustering Like classfication but the classes are not given beforehand. They are discovered by the clustering algorithm.

Associations An association is a correlation between values in the same or different columns – Noted Predicate1 => Predicate2 – Example: Purchases_Diaper => Purchases_Beer Confidence and Support – Confidence (rule): percentage of where Predicate2 is true when Predicate1 is true – Support (itemset): percentage of records where all attribute values needed by the rule are present Confidence and support must be over a given threshold so that an association holds

Classification Decision tree, Neural networks – Multiple variables analysis – Learning algorithm (training set) Example: Titanic survivorsTitanic survivors Sex == M Age > 9.5 Number of Siblings > 2.5 T F Survived (36%) T T F F Survived (2%) Died (61%) Died (2%)

Clustering K-means algorithm 1.Each (of the given K) cluster is given a centroid 2.Form clusters by assigning points to cluster with closest centroid (distance is defined) 3.Recompute cluster centroid 4.Repeat 2,3 until centroids do not move Other techniques – Hierarchical clustering (e.g., BIRCH), Support Vectors

Tuning for Mining Tuning Scans – Most algorithms require several passes over the data – Parallelism & compression Statistics features in systems – Predictive Analytics in SQL Server Predictive Analytics – Data mining features of DB2 Data mining features – Oracle DataMinerDataMiner