OLAP On Line Analytic Processing

OLTP On Line Transaction Processing
– support for ‘real-time’ processing of orders, bookings, sales
– typically access to single rows of tables
– current data only - current products, available flights
– supports Operational level of organisational hierarchy

Analysis
Support for
– tactical decisions (e.g. stock ordering in view of forecast demand)
– strategic decisions (e.g. where to open a new store)
Need to ‘understand the data’ - ‘Business Intelligence’

Requirements
Operations
– Summation
– Statistical analysis - variation from standard
– Ranking and percentages
– Comparison over time - last year-to-date
‘Static data picture’ to avoid ‘inconsistent reads’ during analysis production, and to allow comparisons between analyses
Additional data - forecasting models, external data, structural data (location of tills, of products on shelves)

Analysis pre database
50% of code used to create complex bulk reports on how the company was performing
inflexible, costly, untimely
‘raw data’ too large to retain, so aggregation required - restricts year-to-year comparison to level of prior aggregation

OLAP/Data warehouse
Specialised tools, data structures to support analysis ‘on line’ i.e. on demand
Data will be a snap-shot to ensure consistency
Hold base ‘fact’ table - raw basic facts - credit card transaction, item sale, booking
+ multiple analytic dimensions
– time, product, customer, store

SuperMarket ‘Basket’ data
Fact - a single line on a till receipt
– basket no
– date
– customer no
– product code
– till no
– quantity
– price
Dimensions
– Customer, Product, Time, Till/Store

Star Schema
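The basket fact and its dimensions can be laid out as a star schema, with every dimension table joining directly to the fact table. The following is a minimal sketch using Python's built-in sqlite3 module. All table names, column names and sample values (sales_fact, product_dim, the 'Bristol' store and so on) are assumptions made for illustration and do not come from the slides.

```python
import sqlite3

# In-memory database; every table and column name here is an illustrative assumption.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer_dim (customer_no INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE product_dim  (product_code INTEGER PRIMARY KEY, name TEXT, category TEXT);
CREATE TABLE date_dim     (date_id INTEGER PRIMARY KEY, iso_date TEXT, day_of_week TEXT);
CREATE TABLE till_dim     (till_no INTEGER PRIMARY KEY, store TEXT);

-- One row per line on a till receipt; each dimension joins directly to the fact.
CREATE TABLE sales_fact (
    basket_no    INTEGER,
    date_id      INTEGER REFERENCES date_dim(date_id),
    customer_no  INTEGER REFERENCES customer_dim(customer_no),
    product_code INTEGER REFERENCES product_dim(product_code),
    till_no      INTEGER REFERENCES till_dim(till_no),
    quantity     INTEGER,
    price        REAL
);
""")

# Made-up sample rows.
conn.execute("INSERT INTO customer_dim VALUES (1, 'A. Shopper')")
conn.execute("INSERT INTO product_dim VALUES (100, 'Bata slippers size 9', 'shoes')")
conn.execute("INSERT INTO date_dim VALUES (20021211, '2002-12-11', 'Wednesday')")
conn.execute("INSERT INTO till_dim VALUES (7, 'Bristol')")
conn.execute("INSERT INTO sales_fact VALUES (5001, 20021211, 1, 100, 7, 2, 9.5)")

# A star query needs exactly one join per dimension it uses.
star_rows = conn.execute("""
    SELECT d.day_of_week, p.category, SUM(f.quantity * f.price)
    FROM sales_fact f
    JOIN date_dim d    ON d.date_id = f.date_id
    JOIN product_dim p ON p.product_code = f.product_code
    GROUP BY d.day_of_week, p.category
""").fetchall()
print(star_rows)
```

The measures (quantity, price) live only in the fact table, while descriptive attributes live in the dimension tables.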

Kinds of Attributes
Measures
– continuously varying
– interval, ratio scales
– able to be summed, ranked etc
Dimensions
– nominal scales - no ordering
– may be classified in multiple ways
Date/Time
– interval scale - i.e. ordered, with meaningful differences between values
– can be treated as dimension and classified

Product dimension
Product e.g. size 9 Bata slippers
– Product category e.g. shoes (Product group e.g. clothing)
– Size e.g. shoe9
– Range e.g. Bata
– Product class e.g. cheap

Time Dimension
Date e.g. 11 Dec 2002
– Day of Week e.g. Wednesday
– Week e.g. 49
– Month e.g. December (Qtr e.g. 3)
– Year e.g. 2002
– Promotion period e.g. Pre-Christmas Sales
– Season e.g. Autumn
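Most time-dimension attributes can be derived mechanically from the date when the dimension is populated. The sketch below shows one way to do this in Python. The function name `time_dimension_row` and the dictionary keys are hypothetical. The week number uses the Monday-first `%W` convention and the quarter is a calendar quarter; both are assumptions, and a business using a fiscal calendar would number them differently. Promotion period and season are omitted because they need business-defined lookup tables.

```python
from datetime import date

# Fixed name lists avoid any dependence on the machine's locale.
DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]
MONTH_NAMES = ["January", "February", "March", "April", "May", "June", "July",
               "August", "September", "October", "November", "December"]

def time_dimension_row(d: date) -> dict:
    """Build one row of a (hypothetical) time dimension from a calendar date."""
    return {
        "date": d.isoformat(),
        "day_of_week": DAY_NAMES[d.weekday()],
        # %W: week of year with Monday as first day; days before the first Monday are week 0.
        "week": int(d.strftime("%W")),
        "month": MONTH_NAMES[d.month - 1],
        # Calendar quarter (assumption); a fiscal quarter would need its own rule.
        "calendar_quarter": (d.month - 1) // 3 + 1,
        "year": d.year,
    }

row = time_dimension_row(date(2002, 12, 11))
print(row)
```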

Snowflake schema

Query Processing
Consider a typical analytic query
– How do sales of clothing vary by day of week in stores in the SW region?
– select product.name, dow.name, sum(qty*price)
– from sales, product, productCategory, productGroup, date, DOW
– where (the 5 join conditions)
– and (productGroup.name = 'Clothing')
– group by DOW.name, product.name
– order by DOW.name, product.name
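To make the cost of the snowflake joins concrete, the query can be run against a toy schema. The sketch below uses Python's sqlite3 module and writes out the five join conditions in full. The key column names (product_id, category_id, group_id, date_id, dow_id), the table name date_dim (used in place of date) and all sample rows are assumptions. As in the slide's query, the SW region restriction is left out; it would need a store dimension and at least one further join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Snowflaked product dimension: product -> productCategory -> productGroup.
CREATE TABLE productGroup    (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE productCategory (id INTEGER PRIMARY KEY, name TEXT, group_id INTEGER);
CREATE TABLE product         (id INTEGER PRIMARY KEY, name TEXT, category_id INTEGER);
-- Snowflaked date dimension: date_dim -> dow.
CREATE TABLE dow             (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE date_dim        (id INTEGER PRIMARY KEY, iso_date TEXT, dow_id INTEGER);
CREATE TABLE sales           (product_id INTEGER, date_id INTEGER, qty INTEGER, price REAL);

INSERT INTO productGroup    VALUES (1, 'Clothing'), (2, 'Food');
INSERT INTO productCategory VALUES (10, 'Shoes', 1), (20, 'Bakery', 2);
INSERT INTO product         VALUES (100, 'Slippers', 10), (200, 'Loaf', 20);
INSERT INTO dow             VALUES (1, 'Tuesday'), (2, 'Wednesday');
INSERT INTO date_dim        VALUES (1, '2002-12-10', 1), (2, '2002-12-11', 2);
INSERT INTO sales VALUES (100, 1, 1, 10.0), (100, 2, 2, 10.0),
                         (100, 2, 1, 10.0), (200, 2, 3, 1.0);
""")

rows = conn.execute("""
    SELECT product.name, dow.name, SUM(qty * price)
    FROM sales, product, productCategory, productGroup, date_dim, dow
    WHERE sales.product_id = product.id                 -- join 1
      AND product.category_id = productCategory.id      -- join 2
      AND productCategory.group_id = productGroup.id    -- join 3
      AND sales.date_id = date_dim.id                   -- join 4
      AND date_dim.dow_id = dow.id                      -- join 5
      AND productGroup.name = 'Clothing'
    GROUP BY dow.name, product.name
    ORDER BY dow.name, product.name
""").fetchall()
print(rows)
```

Six tables are touched to answer a question about two attributes, which is the motivation for denormalising the dimensions.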

Difficulties
Fact table is huge - must be compressed as much as possible by reducing field sizes etc
Since dimension data is smaller and more stable, OK to denormalise to reduce joins
– date - add dow.id, month.id, year, fiscalYear,…
– Balance required
– Denormalised dimensions result in ‘Starflake schema’
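Denormalising a dimension amounts to copying the attributes of its outer tables onto each row of the inner table, so that queries no longer need the extra joins. The sketch below flattens a product hierarchy with sqlite3; the table name product_flat and all column names and sample rows are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE productGroup    (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE productCategory (id INTEGER PRIMARY KEY, name TEXT, group_id INTEGER);
CREATE TABLE product         (id INTEGER PRIMARY KEY, name TEXT, category_id INTEGER);
INSERT INTO productGroup    VALUES (1, 'Clothing');
INSERT INTO productCategory VALUES (10, 'Shoes', 1);
INSERT INTO product         VALUES (100, 'Slippers', 10);

-- Denormalise: copy the category and group names onto each product row.
CREATE TABLE product_flat AS
SELECT product.id           AS id,
       product.name         AS name,
       productCategory.name AS category,
       productGroup.name    AS product_group
FROM product
JOIN productCategory ON productCategory.id = product.category_id
JOIN productGroup    ON productGroup.id = productCategory.group_id;
""")

# Filtering on product group now needs no joins inside the dimension.
flat_rows = conn.execute(
    "SELECT id, name, category, product_group FROM product_flat "
    "WHERE product_group = 'Clothing'"
).fetchall()
print(flat_rows)
```

The price is redundancy: if a category moves to a different group, every affected row of the flattened table has to be rewritten.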

Aggregation
Simple case:
– sales - date, product, cust, store, value
– How many aggregations are possible?
– Store(5) Product(10) Customer(30)

Aggregation operations
Cube - all possible aggregations
– 8 for 3 dimensions
Roll-up - aggregate in order
– e.g. Product, Store, Cust
– 4 for 3 dimensions
Slice and Dice
– Takes parts or slices of a cube of aggregations
Drill down
– Given an aggregation, expand an aggregated dimension e.g. Expand clothing sales analysis by City
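The counts of 8 and 4 follow from how the groupings are formed: a cube takes every subset of the dimensions (2^3 = 8), while a roll-up takes only the prefixes of an ordered list (3 + 1 = 4). The sketch below enumerates both and applies them to a few made-up sales rows. Function names and sample data are hypothetical, and slice and drill-down are shown in their simplest form.

```python
from collections import defaultdict
from itertools import combinations

DIMENSIONS = ("product", "store", "cust")

def cube_groupings(dims):
    """Every subset of the dimensions, from the full grouping down to the grand total."""
    return [combo for r in range(len(dims), -1, -1) for combo in combinations(dims, r)]

def rollup_groupings(dims):
    """Prefixes of the ordered dimension list: (a, b, c), (a, b), (a,), ()."""
    return [tuple(dims[:r]) for r in range(len(dims), -1, -1)]

def aggregate(rows, grouping):
    """Sum the 'value' measure for each distinct combination of the grouping columns."""
    totals = defaultdict(float)
    for row in rows:
        totals[tuple(row[d] for d in grouping)] += row["value"]
    return dict(totals)

# Made-up fact rows for illustration.
sales = [
    {"product": "slippers", "store": "Bristol", "cust": "c1", "value": 10.0},
    {"product": "slippers", "store": "Bath",    "cust": "c2", "value": 5.0},
    {"product": "socks",    "store": "Bristol", "cust": "c1", "value": 2.0},
]

cube = cube_groupings(DIMENSIONS)
rollup = rollup_groupings(DIMENSIONS)
print(len(cube), len(rollup))

by_product = aggregate(sales, ("product",))
# Drill down: expand the aggregated store dimension.
by_product_store = aggregate(sales, ("product", "store"))
# Slice: fix one dimension value, then aggregate the rest.
bristol_slice = aggregate([r for r in sales if r["store"] == "Bristol"], ("product",))
grand_total = aggregate(sales, ())
```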

Data Mining
OLAP requires a priori assumptions about the categories of interest
But a useful category may be ‘hidden’ - can it be discovered a posteriori?
– e.g. identify high risk motor insurance policies by attributes of policy - gender, age, type of vehicle, job, postcode
Rule induction (machine learning) methods can be used

Finding relationships
– looking for deviations from normal behaviour - e.g. to identify fraudulent transactions in a credit card company
– looking for deviations from average e.g. non-random combinations of goods in a basket - classic example is beer and nappies
Requires heavy aggregations, statistical selection, rule induction
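One simple way to test whether a combination of goods is non-random is to compare how often two items appear together with how often they would if they were bought independently. The slides do not name a specific measure; the sketch below uses lift (observed co-occurrence divided by expected co-occurrence) as one common choice. The baskets are invented for illustration and `pair_lift` is a hypothetical helper name.

```python
from collections import Counter
from itertools import combinations

# Invented baskets; each set is the contents of one till receipt.
baskets = [
    {"beer", "nappies", "crisps"},
    {"beer", "nappies"},
    {"bread", "milk"},
    {"beer", "nappies", "milk"},
    {"bread", "crisps"},
    {"milk", "bread", "beer"},
]

def pair_lift(baskets):
    """Return lift for every item pair that occurs together in at least one basket."""
    n = len(baskets)
    item_counts = Counter(item for basket in baskets for item in basket)
    pair_counts = Counter(
        pair for basket in baskets for pair in combinations(sorted(basket), 2)
    )
    lifts = {}
    for (a, b), together in pair_counts.items():
        # Number of baskets expected to hold both items if purchases were independent.
        expected = item_counts[a] * item_counts[b] / n
        lifts[(a, b)] = together / expected
    return lifts

lifts = pair_lift(baskets)
# Pairs with lift well above 1 co-occur more often than chance would suggest.
for pair, lift in sorted(lifts.items(), key=lambda kv: -kv[1])[:3]:
    print(pair, round(lift, 2))
```

On real data the pair counts are themselves heavy aggregations over the fact table, and a statistical test would be needed before treating a high lift as meaningful.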

ETL Extract - Transform - Load
OLAP databases are often said to be read only - but all need periodic updating: new data is extracted from sources, validated, re-organised and loaded, whilst maintaining aggregations and indexes

Extract
– new or changed data from the OLTP
– changed product and structural data
– external data
External data may be in legacy systems, remote databases, flat files

Transform
Filtering out bad transaction data
Validating against database

Load
Load new facts
Re-de-normalising if product dimensions change
Re-aggregation
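The three stages can be sketched end to end on a toy example. The code below is a minimal illustration, not a production design: the extracted rows stand in for data pulled from the OLTP system, the validation rules (positive quantity, non-negative price, known product code) are assumptions, and the aggregate table is simply rebuilt in full rather than maintained incrementally. All table, column and function names are hypothetical.

```python
import sqlite3

# Extract: stand-in for new transaction lines pulled from the OLTP system.
extracted = [
    {"product_code": 100, "qty": 2,  "price": 9.5},
    {"product_code": 100, "qty": -1, "price": 9.5},  # bad: negative quantity
    {"product_code": 999, "qty": 1,  "price": 3.0},  # bad: unknown product
    {"product_code": 200, "qty": 3,  "price": 1.0},
]

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product (product_code INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE sales (product_code INTEGER, qty INTEGER, price REAL);
CREATE TABLE sales_by_product (product_code INTEGER PRIMARY KEY, value REAL);
INSERT INTO product VALUES (100, 'Slippers'), (200, 'Loaf');
""")

def transform(rows, conn):
    """Filter out bad transactions and validate product codes against the warehouse."""
    known = {code for (code,) in conn.execute("SELECT product_code FROM product")}
    good, rejected = [], []
    for row in rows:
        ok = row["qty"] > 0 and row["price"] >= 0 and row["product_code"] in known
        (good if ok else rejected).append(row)
    return good, rejected

def load(rows, conn):
    """Append the new facts, then rebuild the aggregate table in one transaction."""
    with conn:
        conn.executemany(
            "INSERT INTO sales VALUES (:product_code, :qty, :price)", rows
        )
        conn.execute("DELETE FROM sales_by_product")
        conn.execute("""
            INSERT INTO sales_by_product
            SELECT product_code, SUM(qty * price) FROM sales GROUP BY product_code
        """)

good, rejected = transform(extracted, conn)
load(good, conn)
aggregates = conn.execute(
    "SELECT product_code, value FROM sales_by_product ORDER BY product_code"
).fetchall()
print(aggregates, len(rejected))
```

Keeping the rejected rows rather than silently dropping them makes it possible to report or repair bad source data later.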