1 Exercise Sheet 3

Exercise 7: ROLAP Algebra
Assume that a fact table SalesCube has 3 hierarchies with attributes Year Y, Month M, Productgroup P and City C, and the measure sales. Assume that the attributes have the following cardinalities: Y = 3, M = 12, P = 500, C = 80.

Exercise 7.1: Draw the (hierarchical) aggregation network.

2 Exercise 7.2: Construct the ROLAP expression to compute the sum and average of sales for the groups {Y,P}, {Y,C} and {P}.

Exercise 7.3: Translate the ROLAP expression of Exercise 7.2 into a single SQL statement and estimate its cost (= total number of tuples read + total number of tuples written), assuming there is no optimization of this SQL statement.

Exercise 7.4: Translate the SQL statement of Exercise 7.3 into several SQL statements employing auxiliary tables for intermediate results. Try to minimize the cost.

3 Solution 7.1: Aggregation network. Node sizes (number of result tuples); since Month refines Year, M has 3 * 12 = 36 distinct values:

(M, P, C) = n  (base fact table)
(Y, P, C) = 3 * 500 * 80 = 120,000
(M, ALL, C) = 36 * 80 = 2,880
(M, P, ALL) = 36 * 500 = 18,000
(ALL, P, C) = 500 * 80 = 40,000
(Y, ALL, C) = 3 * 80 = 240
(Y, P, ALL) = 3 * 500 = 1,500
(M, ALL, ALL) = 36
(ALL, ALL, C) = 80
(ALL, P, ALL) = 500
(Y, ALL, ALL) = 3
(ALL, ALL, ALL) = 1
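The node sizes of the aggregation network follow directly from the attribute cardinalities; a small Python sketch (a sketch of the counting argument, with Month counted as 3 * 12 = 36 distinct year/month values as on the slide) enumerates the lattice:

```python
from itertools import product

# Cardinalities from the exercise; Month is counted per year (3 * 12 = 36),
# since the time hierarchy refines Year into Month.
card = {"Y": 3, "M": 36, "P": 500, "C": 80, "ALL": 1}

# Aggregation levels per dimension: time can be kept at Month, rolled up
# to Year, or to ALL; Productgroup and City each have two levels.
time_levels = ["M", "Y", "ALL"]
prod_levels = ["P", "ALL"]
city_levels = ["C", "ALL"]

# Result size of each grouping = product of the level cardinalities.
sizes = {
    (t, p, c): card[t] * card[p] * card[c]
    for t, p, c in product(time_levels, prod_levels, city_levels)
}

for node, size in sorted(sizes.items(), key=lambda kv: -kv[1]):
    print(node, size)
```

The base node (M, P, C) is excluded here because its size is the number n of fact tuples, not the product of the cardinalities.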

4 Solution 7.2: POT(SalesCube, {{Y,P}, {Y,C}, {P}}, {sum(sales), avg(sales)})

cost(Y,P) = n + 2 * 1,500
cost(Y,C) = n + 2 * 240
cost(P) = n + 2 * 500

The factor 2 accounts for writing and re-reading the intermediate results with insufficient cache.
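The cost terms can be spelled out in a few lines of Python; this is a sketch assuming each group is computed by one scan of the fact table plus a write and re-read of its result (group sizes 1,500, 240 and 500 follow from the cardinalities Y = 3, P = 500, C = 80):

```python
# Result sizes of the three groupings, from the cardinalities of the slide.
group_sizes = {"YP": 3 * 500, "YC": 3 * 80, "P": 500}

def pot_cost(n, group):
    """Tuples read + tuples written for one group-by over the fact table.

    Reads all n fact tuples; the factor 2 models writing and re-reading
    the intermediate result with insufficient cache.
    """
    return n + 2 * group_sizes[group]

n = 1_000_000  # hypothetical fact-table size, for illustration only
for g in group_sizes:
    print(g, pot_cost(n, g))
```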

5 Solution 7.3:

Select Y, 'ALL', P, 'ALL', sum(sales), avg(sales)
From SalesCube
Group By Y, P
Union
Select Y, 'ALL', 'ALL', C, sum(sales), avg(sales)
From SalesCube
Group By Y, C
Union
Select 'ALL', 'ALL', P, 'ALL', sum(sales), avg(sales)
From SalesCube
Group By P

Cost assuming n fact tuples and sufficient cache:
  3 * n     // read ops
+ 3 * 500   // {Y,P}
+ 3 * 80    // {Y,C}
+ 500       // {P}
= 3 * n + 2,240
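The UNION statement can be checked on a toy instance; a minimal sketch using SQLite, where the table layout and the sample rows are made up for illustration:

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE SalesCube (Y INT, M INT, P INT, C INT, sales REAL)")
# Hypothetical sample facts: (year, month, productgroup, city, sales)
cur.executemany(
    "INSERT INTO SalesCube VALUES (?, ?, ?, ?, ?)",
    [
        (2001, 1, 10, 1, 100.0),
        (2001, 2, 10, 2, 200.0),
        (2002, 1, 20, 1, 300.0),
    ],
)

# The three group-bys of Solution 7.3, padded with 'ALL' to a common schema.
rows = cur.execute("""
    SELECT Y, 'ALL', P, 'ALL', SUM(sales), AVG(sales) FROM SalesCube GROUP BY Y, P
    UNION
    SELECT Y, 'ALL', 'ALL', C, SUM(sales), AVG(sales) FROM SalesCube GROUP BY Y, C
    UNION
    SELECT 'ALL', 'ALL', P, 'ALL', SUM(sales), AVG(sales) FROM SalesCube GROUP BY P
""").fetchall()

# 2 {Y,P} groups + 3 {Y,C} groups + 2 {P} groups
print(len(rows))
```

Note that every branch must produce the same number of columns, which is why the unused hierarchy attributes are filled with the literal 'ALL'.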

6 Solution 7.4:

Select Y, P, C, sum(sales), avg(sales) into YPC
From SalesCube
Group By Y, P, C;

Select Y, P, sum(sales), avg(sales) into YP
From YPC
Group By Y, P;

Select P, sum(sales), avg(sales)
From YP
Group By P
Union
Select * From YP
Union
Select Y, C, sum(sales), avg(sales)
From YPC
Group By Y, C;

Cost assuming n fact tuples (size of YPC is 3 * 500 * 80 = 120,000 tuples):
  n                    // read SalesCube
+ 3 * 500 * 80         // gen YPC
+ 3 * 500 * 80         // read YPC
+ 3 * 500              // gen YP
+ 3 * 500 * 2          // read YP (twice: for {P} and the copy of YP)
+ 3 * 500 * 80         // read YPC (for {Y,C})
+ 500 + 3 * 500 + 3 * 80   // write the three results
= n + 366,740  <<  3 * n + 2,240  (for realistic sizes of n)
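The two cost estimates can be compared directly; the following sketch encodes the line items from the slides (the crossover value of n is derived, not given on the slides):

```python
def cost_single_statement(n):
    # Solution 7.3: three scans of the fact table plus writing the three
    # results (1,500 + 240 + 500 = 2,240 tuples).
    return 3 * n + 2_240

def cost_with_aux_tables(n):
    # Solution 7.4: one scan of the fact table, auxiliary tables YPC and YP.
    ypc = 3 * 500 * 80   # 120,000 tuples in YPC
    yp = 3 * 500         # 1,500 tuples in YP
    return (
        n                 # read SalesCube once
        + ypc             # write YPC
        + ypc             # read YPC to generate YP
        + yp              # write YP
        + 2 * yp          # read YP twice ({P} and the copy of YP)
        + ypc             # read YPC again for {Y,C}
        + 500 + yp + 240  # write the three results
    )

# The auxiliary-table plan wins once 2n exceeds its fixed overhead.
n = 1_000_000
print(cost_single_statement(n), cost_with_aux_tables(n))
```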

7 Exercise 8: Clustering

Exercise 8.1: Compute the NN distances for the following set of points and label the corresponding edges.

[Figure: four points A, B, C, D in the plane]

8 Solution 8.1:

NN(A,B) = 1   NN(A,C) = 2   NN(A,D) = 3
NN(B,A) = 2   NN(B,C) = 1   NN(B,D) = 3
NN(C,A) = 3   NN(C,B) = 1   NN(C,D) = 2
NN(D,A) = 3   NN(D,B) = 2   NN(D,C) = 1

Solution 8.2: Compute the mutual nearest neighbor distances for the points of Exercise 8.1:

MND(A,B) = 3   MND(A,C) = 5   MND(A,D) = 6
MND(B,C) = 2   MND(B,D) = 5   MND(C,D) = 3
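The mutual neighbor distance is the sum of the two directed NN ranks, MND(X,Y) = NN(X,Y) + NN(Y,X), so Solution 8.2 follows mechanically from the table of Solution 8.1:

```python
# NN(X,Y): rank of Y among X's nearest neighbours, taken from Solution 8.1.
nn = {
    ("A", "B"): 1, ("A", "C"): 2, ("A", "D"): 3,
    ("B", "A"): 2, ("B", "C"): 1, ("B", "D"): 3,
    ("C", "A"): 3, ("C", "B"): 1, ("C", "D"): 2,
    ("D", "A"): 3, ("D", "B"): 2, ("D", "C"): 1,
}

def mnd(x, y):
    """Mutual neighbour distance: sum of the two directed NN ranks."""
    return nn[(x, y)] + nn[(y, x)]

for pair in [("A", "B"), ("A", "C"), ("A", "D"),
             ("B", "C"), ("B", "D"), ("C", "D")]:
    print(pair, mnd(*pair))
```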

9 Solution 8.3: Minimal spanning tree

[Figure: minimal spanning tree over the points A-J]

Solution 8.4:
2 clusters: {J}, {A,B,C,D,E,F,G,H,I}
4 clusters: {J}, {A,B,C,I}, {H}, {D,E,F,G}  or  {J}, {A,B,C,H}, {I}, {D,E,F,G}
5 clusters: {J}, {A,B,C}, {I}, {H}, {D,E,F,G}
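Solution 8.4 follows the usual single-link rule: cutting the k-1 heaviest edges of the minimal spanning tree leaves k connected components, i.e. k clusters. A sketch with union-find; the edge list and weights below are hypothetical stand-ins, since they cannot be read off the figure:

```python
# Hypothetical MST over A-J (weights invented for illustration only).
mst = [
    ("A", "B", 1.0), ("B", "C", 1.2), ("C", "H", 2.0), ("C", "I", 2.1),
    ("I", "D", 3.0), ("D", "E", 1.1), ("E", "F", 1.0), ("F", "G", 1.3),
    ("G", "J", 4.0),
]

def mst_clusters(edges, k):
    """Drop the k-1 heaviest edges, return the connected components."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:          # path halving
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    kept = sorted(edges, key=lambda e: e[2])[: len(edges) - (k - 1)]
    nodes = {v for e in edges for v in e[:2]}
    for u, v, _ in kept:
        parent[find(u)] = find(v)      # union the two components
    groups = {}
    for v in nodes:
        groups.setdefault(find(v), set()).add(v)
    return sorted(groups.values(), key=lambda s: (len(s), min(s)))

print(mst_clusters(mst, 2))
```

With these invented weights the heaviest edge is the one to J, so k = 2 isolates {J} just as on the slide; with real coordinates the cut edges are read off the drawn tree.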

10 Exercise 8.5: Which clusters result from the k-means algorithm if we use the small circles as starting centroids for the clusters?

[Figure: points A-J with small circles marking the starting centroids]
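To answer Exercise 8.5 one iterates k-means from the marked centroids until the assignment stabilizes. Since the coordinates cannot be recovered from the figure, here is a generic sketch of Lloyd's algorithm with made-up 2D points and starting centroids:

```python
# Plain k-means (Lloyd's algorithm); points and starting centroids are
# invented for illustration, not taken from the slide's figure.
def kmeans(points, centroids, iters=20):
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            d = [(p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2 for c in centroids]
            clusters[d.index(min(d))].append(p)
        # Update step: centroid = mean of its cluster (kept if empty).
        centroids = [
            (sum(x for x, _ in cl) / len(cl), sum(y for _, y in cl) / len(cl))
            if cl else c
            for cl, c in zip(clusters, centroids)
        ]
    return clusters, centroids

pts = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (5.0, 5.0), (5.0, 6.0), (6.0, 5.0)]
clusters, cents = kmeans(pts, centroids=[(0.0, 0.0), (5.0, 5.0)])
print(clusters)
```

Note that k-means partitions the points by distance to the current centroids, so unlike the MST-based method of Exercise 8.4 the result depends on where the starting centroids are placed.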