2/10/05 Salman Azhar: Database Systems
On-Line Analytical Processing
Salman Azhar
Warehousing, Data Cubes, Data Mining
These slides use some figures, definitions, and explanations from Elmasri-Navathe’s Fundamentals of Database Systems and Molina-Ullman-Widom’s Database Systems.

Overview
- Traditional database systems are tuned to many small, simple queries
- Some newer “analytic” applications issue fewer, more time-consuming, complex queries
- New architectures have been developed to handle complex “analytic” queries efficiently

The Data Warehouse
- The most common form of data integration: copy sources into a single DB (the warehouse) and try to keep it up-to-date
- Usual method: periodic reconstruction of the warehouse, perhaps overnight
- A warehouse is essential for analytic queries

OLTP
- Most database operations involve On-Line Transaction Processing (OLTP)
- Short, simple, frequent queries and/or modifications, each involving a small number of tuples
- Examples:
  - Looking up a phone number on the web
  - Sales at cash registers
  - Selling airline tickets

OLAP
- Increasing importance of On-Line Analytical Processing (OLAP) queries
- Few but complex queries, which may run for hours
- Queries do not depend on having an absolutely up-to-date database
- Sometimes called data mining

OLAP Examples
1. Amazon analyzes purchases by its customers to recommend products of likely interest
   - Compares purchases between customers
   - Takes longer than customers are willing to wait
2. Wal-Mart looks for items with sales trends in a region or time period
   - Presents data to vendors
   - Used to determine ordering and inventory

Common Architecture
- Databases at branches handle OLTP
- Local databases are copied to a central warehouse overnight (or periodically)
- Analysts use the warehouse for OLAP
[Figure labels: OLTP, OLAP, Analysts, Transaction Users]

[Figure: data warehouse structure. Labels: Data Sources, Data Input, Staging Area, Data Warehouse, Data Marts, Data Access, User Data Access]

Star Schemas
- A star schema is a common organization for data at a warehouse
- It consists of:
  - Fact table: a very large accumulation of facts, such as sales; often “insert-only”
  - Dimension tables: smaller, generally static information about the entities involved in the facts

Star Schema
[Figure: star schema with a central fact table surrounded by five dimension tables]
- Fact table Sales_Fact: TimeKey, EmployeeKey, ProductKey, CustomerKey, ShipperKey, Sales Amount, Unit Sales, ...
- Dimension table Employee_Dim: EmployeeKey, EmployeeID, ...
- Dimension table Time_Dim: TimeKey, TheDate, ...
- Dimension table Product_Dim: ProductKey, ProductID, ...
- Dimension table Customer_Dim: CustomerKey, CustomerID, ...
- Dimension table Shipper_Dim: ShipperKey, ShipperID, ...

Example: Star Schema
- Suppose we want to record in a warehouse information about every car sale: dealer, car, buyer, day, time, price paid
- The fact table is a relation:
    Sale(dealer, model, buyer, day, time, price)

Example, Continued
- The dimension tables include information about the dealer, car, and buyer “dimensions”:
    Dealer(dealer, city, zip)
    Car(model, manufacturer)
    Buyer(buyer, city, phone)
- Recall the fact table:
    Sale(dealer, model, buyer, day, time, price)
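The car-sale star schema can be sketched in runnable form. This is a minimal illustration using Python's built-in sqlite3 module; the relation and attribute names come from the slides, while the column types, the REFERENCES clauses, and the use of SQLite itself are assumptions made for the example.

```python
import sqlite3

# In-memory SQLite database. Names follow the slides; types are assumed.
conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: small, mostly static descriptions of entities.
CREATE TABLE Dealer(dealer TEXT PRIMARY KEY, city TEXT, zip TEXT);
CREATE TABLE Car(model TEXT PRIMARY KEY, manufacturer TEXT);
CREATE TABLE Buyer(buyer TEXT PRIMARY KEY, city TEXT, phone TEXT);

-- Fact table: one row per sale, keyed by the dimension attributes.
CREATE TABLE Sale(
    dealer TEXT REFERENCES Dealer(dealer),
    model  TEXT REFERENCES Car(model),
    buyer  TEXT REFERENCES Buyer(buyer),
    day    TEXT,
    time   TEXT,
    price  REAL
);
""")

# List the fact table's columns to confirm the layout.
fact_columns = [row[1] for row in conn.execute("PRAGMA table_info(Sale)")]
print(fact_columns)
```

The fact table only grows (one insert per sale), while the dimension tables change rarely, which matches the “insert-only” remark on the Star Schemas slide.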

Dimensions and Dependent Attributes
Two classes of fact-table attributes, illustrated on Sale(dealer, model, buyer, day, time, price):
- Dimension attributes: the key of a dimension table
- Dependent attributes: a value determined by the dimension attributes of the row
- E.g., price is determined by the combination of dealer, model, buyer, day, and time

Example: Dependent Attribute
- price is determined by the combination of dimension attributes: dealer, car, buyer, and the time (the combination of the day and time attributes)

Approaches to Building Warehouses
- ROLAP = “relational OLAP”: tune a relational DBMS to support star schemas
- MOLAP = “multidimensional OLAP”: use a specialized DBMS with a model such as the “data cube”

ROLAP Techniques
- Bitmap indexes: for each key value of a dimension table (e.g., each model for relation Cars), create a bit-vector telling which tuples of the fact table have that value
- Materialized views: store the answers to several useful queries (views) in the warehouse itself
  - Stored views!
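A bitmap index can be sketched in a few lines. The example below is a toy illustration, not how any particular DBMS stores its indexes: the sample rows and the helper names build_bitmap_index and matching_rows are hypothetical, and a Python integer stands in for each bit-vector.

```python
from collections import defaultdict

# Hypothetical fact-table rows: (dealer, model, price).
sales = [
    ("Auto Nation", "Mini Cooper", 21000),
    ("Auto Nation", "325i", 32000),
    ("Car World", "Mini Cooper", 20500),
    ("Car World", "325i", 31500),
    ("Car World", "Mini Cooper", 19900),
]

def build_bitmap_index(rows, column):
    """Map each distinct value in `column` to an int used as a bit-vector:
    bit i is set when row i has that value."""
    index = defaultdict(int)
    for i, row in enumerate(rows):
        index[row[column]] |= 1 << i
    return dict(index)

def matching_rows(bitmap):
    """Return the positions of the rows whose bits are set."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]

by_dealer = build_bitmap_index(sales, 0)
by_model = build_bitmap_index(sales, 1)

# A conjunctive selection is a bitwise AND of the two bit-vectors.
hits = by_dealer["Car World"] & by_model["Mini Cooper"]
rows = matching_rows(hits)
print(rows)
```

The point of the structure is that selections on several dimensions combine with cheap bitwise operations before any fact-table row is read.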

Typical OLAP Queries
- Often, OLAP queries begin with a “star join”: the natural join of the fact table with all or most of the dimension tables
- Recall the tables:
    Sales(dealer, model, buyer, day, time, price)
    Dealers(dealer, city, zip)
    Cars(model, manufacturer)
    Buyers(buyer, city, phone)
- Example:
    SELECT *
    FROM Sales, Dealers, Cars, Buyers
    WHERE Sales.dealer = Dealers.dealer
      AND Sales.model = Cars.model
      AND Sales.buyer = Buyers.buyer;

Typical OLAP Queries
The typical OLAP query will:
1. Start with a star join
2. Select for interesting tuples, based on dimension data
3. Group by one or more dimensions
4. Aggregate certain attributes of the result

Example: OLAP Query
- For each dealer in Indianapolis, find the total sales of each car model manufactured by BMW
- Filter: city = 'Indianapolis' and manf = 'BMW' (manf abbreviates the manufacturer attribute of Cars)
- Grouping: by dealer and car
- Aggregation: sum of price
GROUP EXERCISE: Write the SQL query.
Note: Do not turn over to the next page before attempting this exercise yourself!

Example: In SQL
    SELECT dealer, model, SUM(price)
    FROM Sales NATURAL JOIN Dealers NATURAL JOIN Cars
    WHERE Dealers.city = 'Indianapolis' AND Cars.manf = 'BMW'
    GROUP BY dealer, model;
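To see the four steps (star join, filter, group, aggregate) run end to end, the query can be tried on a handful of rows. This is a sketch using Python's sqlite3 module; the sample dealers, models, buyers, and prices are invented, and the Cars column is named manf here to match the query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Dealers(dealer TEXT, city TEXT, zip TEXT);
CREATE TABLE Cars(model TEXT, manf TEXT);
CREATE TABLE Sales(dealer TEXT, model TEXT, buyer TEXT,
                   day TEXT, time TEXT, price REAL);

-- Invented sample data, for illustration only.
INSERT INTO Dealers VALUES ('Auto Nation', 'Indianapolis', '46204'),
                           ('Car World', 'Chicago', '60601');
INSERT INTO Cars VALUES ('325i', 'BMW'), ('Civic', 'Honda');
INSERT INTO Sales VALUES
  ('Auto Nation', '325i',  'Ann', '2005-02-01', '10:00', 32000),
  ('Auto Nation', '325i',  'Bob', '2005-02-02', '11:30', 31000),
  ('Auto Nation', 'Civic', 'Cy',  '2005-02-02', '15:00', 18000),
  ('Car World',   '325i',  'Dee', '2005-02-03', '09:15', 33000);
""")

# Star join, then filter on dimension data, then group and aggregate.
query = """
SELECT dealer, model, SUM(price)
FROM Sales NATURAL JOIN Dealers NATURAL JOIN Cars
WHERE Dealers.city = 'Indianapolis' AND Cars.manf = 'BMW'
GROUP BY dealer, model
"""
result = conn.execute(query).fetchall()
print(result)
```

Only the two Auto Nation sales of the 325i survive the filter, so the result has a single group whose total is 63000.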

Using Materialized Views
- A direct execution of this query from Sales and the dimension tables could take too long
- If we create a materialized view that contains enough information, we may be able to answer our query much faster

Example: Materialized View
Which views could help with our query? Key issues:
1. It must join Sales, Dealers, and Cars, at least
2. It must group by at least dealer and car
3. It must not filter out Indianapolis dealers or BMW cars
4. It must not project out city or manf

Example, Continued
Here is a materialized view that could help:
    CREATE VIEW vSales(dealer, city, car, manf, sales) AS
        SELECT dealer, city, model, manf, SUM(price) sales
        FROM Sales NATURAL JOIN Dealers NATURAL JOIN Cars
        GROUP BY dealer, city, model, manf;
- Since dealer -> city and model -> manf, adding city and manf to the GROUP BY creates no additional groups
- We need city and manf in the SELECT so that queries on the view can filter on them

Example, Concluded
Here’s our query using the materialized view vSales:
    SELECT dealer, car, sales
    FROM vSales
    WHERE city = 'Indianapolis' AND manf = 'BMW';
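The effect of the materialized view can be checked on toy data. SQLite has no materialized views, so the sketch below substitutes CREATE TABLE ... AS SELECT to store the view's rows; that substitution, the sample data, and the manf column name on Cars are all assumptions for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Dealers(dealer TEXT, city TEXT, zip TEXT);
CREATE TABLE Cars(model TEXT, manf TEXT);
CREATE TABLE Sales(dealer TEXT, model TEXT, buyer TEXT,
                   day TEXT, time TEXT, price REAL);
-- Invented sample data, for illustration only.
INSERT INTO Dealers VALUES ('Auto Nation', 'Indianapolis', '46204'),
                           ('Car World', 'Chicago', '60601');
INSERT INTO Cars VALUES ('325i', 'BMW'), ('Civic', 'Honda');
INSERT INTO Sales VALUES
  ('Auto Nation', '325i',  'Ann', '2005-02-01', '10:00', 32000),
  ('Auto Nation', '325i',  'Bob', '2005-02-02', '11:30', 31000),
  ('Auto Nation', 'Civic', 'Cy',  '2005-02-02', '15:00', 18000),
  ('Car World',   '325i',  'Dee', '2005-02-03', '09:15', 33000);

-- Stand-in for the materialized view: the aggregated rows are stored.
CREATE TABLE vSales AS
  SELECT dealer, city, model AS car, manf, SUM(price) AS sales
  FROM Sales NATURAL JOIN Dealers NATURAL JOIN Cars
  GROUP BY dealer, city, model, manf;
""")

# The query against the stored view: no join, no aggregation.
from_view = conn.execute("""
  SELECT dealer, car, sales FROM vSales
  WHERE city = 'Indianapolis' AND manf = 'BMW'
""").fetchall()

# The same question asked directly of the base tables.
direct = conn.execute("""
  SELECT dealer, model, SUM(price)
  FROM Sales NATURAL JOIN Dealers NATURAL JOIN Cars
  WHERE Dealers.city = 'Indianapolis' AND Cars.manf = 'BMW'
  GROUP BY dealer, model
""").fetchall()

print(from_view == direct)
```

A stored table like this goes stale as Sales grows; a real warehouse would refresh it, for example during the periodic reconstruction mentioned earlier in the slides.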

MOLAP and Data Cubes
- Keys of dimension tables are the dimensions of a hypercube
- Example: for the Sales data, the four dimensions are Dealers, Cars, Buyers, and time
- Dependent attributes (e.g., price) appear at the points of the cube

Defining a Cube
[Figure: cube with a Time dimension (Q1, Q2, Q3, Q4), a Products dimension (Apples, Cherries, Grapes, Melons), and a Market dimension (Atlanta, Chicago, Denver, Detroit)]

Querying a Cube
[Figure: cube with a Time dimension (Q1, Q2, Q3, Q4), a Products dimension (Apples, Cherries, Grapes, Melons), and a Markets dimension (Atlanta, Chicago, Denver, Dallas); a cell is labeled Sales Fact]

Defining a Cube Slice
[Figure: cube with a Time dimension (Q1, Q2, Q3, Q4), a Products dimension (Apples, Cherries, Grapes, Melons), and a Markets dimension (Atlanta, Chicago, Denver, Detroit)]

Working with Dimensions and Hierarchies
- Dimensions allow you to:
  - Slice
  - Dice
- Hierarchies allow you to:
  - Drill down
  - Drill up

Marginals
- The data cube also includes aggregation (typically SUM) along the margins of the cube
- The marginals include aggregations over one dimension, two dimensions, …

Example: Marginals
- Our 4-dimensional Sales cube includes the sum of price over each dealer, each car, each buyer, and each time unit (perhaps days)
- It would also have the sum of price over all dealer-model pairs, all dealer-buyer-day triples, …

Structure of the Cube
- Think of each dimension as having an additional value *
- A point with one or more *’s in its coordinates aggregates over the dimensions with the *’s
- Example: Sales(“Auto Nation”, “Mini Cooper”, *, *) holds the sum, over all buyers and all time, of the prices of the Mini Coopers bought at Auto Nation
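The * convention can be made concrete with a small in-memory cube. This is only a sketch: the facts are invented, build_cube is a hypothetical helper, and a real MOLAP system would not materialize every cell this naively.

```python
from collections import defaultdict
from itertools import product

STAR = "*"

# Invented facts: (dealer, model, buyer, day) -> price.
facts = [
    (("Auto Nation", "Mini Cooper", "Ann", "Mon"), 21000),
    (("Auto Nation", "Mini Cooper", "Bob", "Tue"), 20000),
    (("Auto Nation", "325i", "Ann", "Tue"), 32000),
    (("Car World", "Mini Cooper", "Cy", "Mon"), 19000),
]

def build_cube(facts):
    """Sum price into every cell reachable by replacing any subset of a
    fact's coordinates with *. Each fact touches 2**4 = 16 cells."""
    cube = defaultdict(int)
    for coords, price in facts:
        for cell in product(*((value, STAR) for value in coords)):
            cube[cell] += price
    return dict(cube)

cube = build_cube(facts)

# The slide's example: all buyers and all days for one dealer and model.
print(cube[("Auto Nation", "Mini Cooper", STAR, STAR)])
# The grand total sits at the all-star point.
print(cube[(STAR, STAR, STAR, STAR)])
```

Cells with no * are the original facts; every other cell is a marginal.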

Drill-Down
- Drill-down = “de-aggregate” = break an aggregate into its constituents
- Example: having determined that Auto Nation sells very few BMW cars, break down its sales by particular model

Roll-Up
- Roll-up = aggregate along one or more dimensions
- Example: given a table of how many Mini Coopers each buyer buys at each dealer, roll it up into a table giving the total number of Mini Coopers bought by each buyer
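Roll-up and drill-down are inverse views of the same table, which a short sketch can show. The counts below are invented, and roll_up and drill_down are hypothetical helper names, not operators of any particular OLAP product.

```python
from collections import Counter

# Invented data: (buyer, dealer) -> number of Mini Coopers bought.
minis = {
    ("Ann", "Auto Nation"): 2,
    ("Ann", "Car World"): 1,
    ("Bob", "Auto Nation"): 1,
}

def roll_up(table, keep):
    """Aggregate away every key position not listed in `keep`."""
    out = Counter()
    for key, count in table.items():
        out[tuple(key[i] for i in keep)] += count
    return dict(out)

def drill_down(table, position, value):
    """Break the aggregate for `value` into its constituent cells."""
    return {k: v for k, v in table.items() if k[position] == value}

# Roll up over dealers: total Mini Coopers per buyer.
per_buyer = roll_up(minis, keep=[0])
print(per_buyer)

# Drill down into Ann's total to see where it came from.
ann_detail = drill_down(minis, 0, "Ann")
print(ann_detail)
```

Drilling down needs the finer-grained table to still be available, which is why a cube keeps both the detailed cells and the marginals.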

Materialized Data-Cube Views
- Data cubes invite materialized views that are aggregations in one or more dimensions
- Dimensions need not be completely aggregated: an option is to group by an attribute of the dimension table

Example
A materialized view for our Sales data cube might:
1. Aggregate by buyer completely
2. Not aggregate at all by car
3. Aggregate by time according to the week
4. Aggregate according to the city of the dealer
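The four choices above translate into one GROUP BY. The sketch below uses Python's sqlite3 module with invented data; storing the result with CREATE TABLE ... AS SELECT stands in for a materialized view, the view name vWeekly is hypothetical, and SQLite's strftime('%Y-%W', ...) is just one possible way to derive a week from a day.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Dealers(dealer TEXT, city TEXT, zip TEXT);
CREATE TABLE Sales(dealer TEXT, model TEXT, buyer TEXT,
                   day TEXT, time TEXT, price REAL);
-- Invented sample data, for illustration only.
INSERT INTO Dealers VALUES ('Auto Nation', 'Indianapolis', '46204'),
                           ('Indy Motors', 'Indianapolis', '46220');
INSERT INTO Sales VALUES
  ('Auto Nation', '325i', 'Ann', '2005-02-01', '10:00', 32000),
  ('Auto Nation', '325i', 'Bob', '2005-02-02', '11:30', 31000),
  ('Indy Motors', '325i', 'Cy',  '2005-02-02', '15:00', 30000),
  ('Auto Nation', '325i', 'Dee', '2005-02-09', '09:15', 33000);

-- buyer: dropped entirely (aggregated completely)
-- car:   model kept as-is (not aggregated)
-- time:  day coarsened to a year-week label
-- dealer: replaced by the dealer's city
CREATE TABLE vWeekly AS
  SELECT city, model, strftime('%Y-%W', day) AS week, SUM(price) AS sales
  FROM Sales NATURAL JOIN Dealers
  GROUP BY city, model, strftime('%Y-%W', day);
""")

rows = conn.execute(
    "SELECT city, model, week, sales FROM vWeekly ORDER BY week"
).fetchall()
for row in rows:
    print(row)
```

The three sales in the first week collapse into one row across two dealers and three buyers, while the sale a week later stays separate.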

Data Mining
- Data mining is a popular term for queries that summarize big data sets in useful ways
- Examples:
  1. Clustering all Web pages by topic
  2. Finding characteristics of fraudulent credit-card use

Market-Basket Data
- An important form of mining from relational data involves market baskets: sets of “items” that are purchased together as a customer leaves a store
- A summary of basket data is frequent itemsets: sets of items that often appear together in baskets

Example: Market Baskets
If people often buy bread and butter together, the store can:
1. Put bread and butter near each other and put potato chips between the two
2. Run a sale on bread and raise the price of butter

Finding Frequent Pairs
- The simplest case is when we only want to find “frequent pairs” of items
- Assume the data is in a relation Baskets(basket, item)
- The support threshold s is the minimum number of baskets in which a pair must appear before we are interested

Frequent Pairs in SQL
    SELECT b1.item, b2.item
    FROM Baskets b1, Baskets b2
    WHERE b1.basket = b2.basket
      AND b1.item < b2.item
    GROUP BY b1.item, b2.item
    HAVING COUNT(*) >= s;
- FROM and WHERE: look for two Baskets tuples with the same basket and different items. The first item must precede the second, so we don’t count the same pair twice.
- GROUP BY: create a group for each pair of items that appears in at least one basket.
- HAVING: throw away pairs of items that do not appear at least s times.
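The frequent-pairs query can be run as written on a few baskets. This sketch uses Python's sqlite3 module; the basket contents and the threshold value s = 2 are invented, and s is passed as a bound parameter rather than spliced into the SQL text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Baskets(basket INTEGER, item TEXT)")

# Invented baskets, for illustration only.
conn.executemany("INSERT INTO Baskets VALUES (?, ?)", [
    (1, "bread"), (1, "butter"), (1, "chips"),
    (2, "bread"), (2, "butter"),
    (3, "bread"), (3, "milk"),
    (4, "butter"), (4, "milk"),
])

s = 2  # support threshold: minimum number of baskets containing the pair

frequent_pairs = conn.execute("""
    SELECT b1.item, b2.item
    FROM Baskets b1, Baskets b2
    WHERE b1.basket = b2.basket     -- same basket
      AND b1.item < b2.item         -- each unordered pair counted once
    GROUP BY b1.item, b2.item
    HAVING COUNT(*) >= ?            -- keep pairs with enough support
""", (s,)).fetchall()

print(frequent_pairs)
```

Only bread and butter appear together in two baskets; every other pair occurs once and is dropped by the HAVING clause.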

Summary
- OLAP vs. OLTP: two different worlds
- Warehousing
- Data cubes
- Data mining
- Materialized views: storing aggregate data