1
A New OLAP Aggregation Based on the AHC Technique
DOLAP 2004
R. Ben Messaoud, O. Boussaid, S. Rabaséda
Laboratoire ERIC – Université de Lyon 2
5, avenue Pierre-Mendès-France, 69676 Bron Cedex – France
http://eric.univ-lyon2.fr
2
November 13, 2004, Ben Messaoud et al.
Complex data
Definition: data are considered complex if they are:
Multi-format: information can be carried by different kinds of data (numeric, symbolic, texts, images, sounds, videos …)
Multi-structure: structured, unstructured or semi-structured (relational databases, XML documents …)
Multi-source: data come from different sources (distributed databases, web …)
Multi-modal: the same information can be described differently (data in different languages …)
Multi-version: data are updated through time (temporal databases, periodical inventory …)
3
General context
Complex data
Huge volumes of complex data
Warehousing complex data … OLAP facts as complex objects
Analyze complex data
Current OLAP tools aren’t suited to process complex data
Data mining is able to process complex data like images, texts, videos …
Coupling OLAP and data mining
Analyze complex data on-line
New operator OpAC: Operator of Aggregation by Clustering (AHC)
[Diagram: Data mining, OLAP, Complex data, MDBMS, OpAC]
4
Outline
Complex data and general context
Related work: Coupling OLAP and data mining
Objectives of the proposed operator
Formalization of the operator
Implementation and demonstration
Conclusion and future works
5
Related work
Three approaches for coupling OLAP and data mining:
First approach: extending the query languages of decision support systems
Second approach: adapting the multidimensional environment to classical data mining techniques
Third approach: adapting data mining methods to multidimensional data
[Diagram: Data mining, OLAP, DBMS; First approach, Second approach, Third approach]
6
Related work
These works showed that:
Associating data mining with OLAP is a promising way to support rich analysis tasks
Data mining is able to extend the analysis power of OLAP
Use data mining to enhance OLAP tools in order to process complex data
OpAC: a new OLAP operator based on a data mining technique
[Diagram: Data mining, OLAP, OpAC]
7
Objectives
Classic OLAP aggregation vs. OpAC aggregation
Classic OLAP:
Summarizes numerical data into fewer values
Computes additive measures (Sum, Average, Max, Min …)
Example: Sales cube

                   Sales    Count
- Washington
    Bellingham     $700     32
    Bremerton      $400     20
    Olympia        $850     44
    Redmond        $250     9
    Seattle        $320     15
- California
    Berkeley       $820     41
    Beverly Hills  $910     50
    Los Angeles    $680     38

After roll-up to the state level:

                   Sales    Count
+ Washington       $2520    120
+ California       $2410    129
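The roll-up in the Sales cube example can be reproduced with a few lines of Python. This is only an illustrative sketch: the cell values are copied from the slide, while the `FACTS` list, the `roll_up` function and the dictionary layout are hypothetical names chosen for the example, not part of any OLAP tool.

```python
from collections import defaultdict

# Cell values copied from the Sales cube example: (state, city, sales in $, count).
FACTS = [
    ("Washington", "Bellingham", 700, 32),
    ("Washington", "Bremerton", 400, 20),
    ("Washington", "Olympia", 850, 44),
    ("Washington", "Redmond", 250, 9),
    ("Washington", "Seattle", 320, 15),
    ("California", "Berkeley", 820, 41),
    ("California", "Beverly Hills", 910, 50),
    ("California", "Los Angeles", 680, 38),
]


def roll_up(facts):
    """Sum the additive measures of every city into its parent state."""
    totals = defaultdict(lambda: {"sales": 0, "count": 0})
    for state, _city, sales, count in facts:
        totals[state]["sales"] += sales
        totals[state]["count"] += count
    return dict(totals)


state_totals = roll_up(FACTS)
for state, measures in state_totals.items():
    print(state, measures)
```

The sums work because Sales and Count are additive; nothing comparable exists for an image or a text, which is the gap the following slides address.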
8
Objectives
Classic OLAP aggregation vs. OpAC aggregation
OpAC aggregation:
What about aggregating complex objects?
How to aggregate images, texts or videos with classic OLAP tools?
Complex objects are not additive OLAP measures …
Example: Images cube

Images           Size      ASM
Orange coral     3560px    0.016
Nebraska, USA    2340px    0.021
Toco toucan      4434px    0.014
Maldives         3260px    0.012
Aggregate        ?
9
Objectives
How to aggregate complex objects?
Using a data mining technique: AHC (Agglomerative Hierarchical Clustering)
The AHC aggregates data
The hierarchical aspect of the AHC
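To make the idea concrete, here is a minimal pure-Python sketch of agglomerative hierarchical clustering. It is not the OpAC implementation: the function names, the Euclidean metric, the three linkage options and the sample points are all assumptions made for illustration. The returned merge history is what gives the result its hierarchical aspect, since each merge corresponds to one node of a dendrogram.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two coordinate tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Aggregation criteria: how pairwise distances between two clusters are combined.
LINKAGES = {
    "single": min,
    "complete": max,
    "average": lambda ds: sum(ds) / len(ds),
}


def ahc(points, metric=euclidean, linkage="average"):
    """Return the merge history as [(members_a, members_b, distance), ...]."""
    combine = LINKAGES[linkage]
    clusters = [(i,) for i in range(len(points))]  # start with singletons
    merges = []
    while len(clusters) > 1:
        best = None
        # Naive search for the closest pair of clusters.
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = combine([metric(points[p], points[q])
                             for p in clusters[i] for q in clusters[j]])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        merged = tuple(sorted(clusters[i] + clusters[j]))
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return merges


# Hypothetical 2-D individuals, identified by their index in the list.
points = [(0, 0), (0, 1), (5, 5), (5, 6), (20, 20)]
merges = ahc(points)
for a, b, d in merges:
    print(a, "+", b, "at distance", round(d, 2))
```

The naive pair search is recomputed at every step, which is acceptable for a handful of individuals but not for large cubes.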
10
Objectives
[Figure: images described by their L1Normalized values (e.g. L1Normalized for high homogeneity, L1Normalized for low entropy) across the modalities Very high, High, Medium, Low and Very low of the Entropy and Homogeneity dimensions]
11
Formalization
$D_i$: the $i$-th dimension of a data cube $C$
$h_{ij}$: the $j$-th hierarchical level of the dimension $D_i$
$g_{ijt}$: the $t$-th modality of $h_{ij}$
The set of individuals: $\{\, g_{ijt} \mid g_{ijt} \in h_{ij} \,\}$
The set of variables: $\{\, X \mid X(g_{ijt}) = \text{measure of } g_{srv} \text{ crossed with } g_{ijt} \,\}$, where $g_{srv} \in h_{sr}$, $s \neq i$ and $r$ is unique for each $s$
The dimension retained for individuals can’t generate variables
Only one hierarchical level of a dimension is allowed to generate variables
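One possible reading of these definitions, sketched in Python: the modalities of one dimension become the individuals (rows), and each modality of every other selected dimension becomes a variable (column) whose value is the measure of the corresponding crossing. Everything here is hypothetical: the fact rows, the `build_table` name, and the choice to sum the measure when several facts fall on the same crossing are assumptions, not the authors' code. Each dimension is represented by a single key, which stands in for the rule that only one hierarchical level per dimension may generate variables.

```python
from collections import defaultdict

# Hypothetical facts: one row per cube cell, one modality per dimension,
# and a single numeric measure. Names and values are made up.
FACTS = [
    {"image": "img_a", "entropy": "High", "homogeneity": "Low", "measure": 0.4},
    {"image": "img_a", "entropy": "Low", "homogeneity": "Low", "measure": 0.1},
    {"image": "img_b", "entropy": "High", "homogeneity": "High", "measure": 0.3},
]


def build_table(facts, individual_dim, variable_dims, measure="measure"):
    """Build the individuals x variables table used as clustering input."""
    if individual_dim in variable_dims:
        # The dimension retained for individuals cannot generate variables.
        raise ValueError("individual dimension cannot generate variables")
    table = defaultdict(lambda: defaultdict(float))
    variables = set()
    for fact in facts:
        for dim in variable_dims:
            var = (dim, fact[dim])  # one variable per modality of each dimension
            variables.add(var)
            # Assumption: facts on the same crossing are summed.
            table[fact[individual_dim]][var] += fact[measure]
    columns = sorted(variables)
    rows = {ind: [table[ind][var] for var in columns] for ind in sorted(table)}
    return columns, rows


columns, rows = build_table(FACTS, "image", ["entropy", "homogeneity"])
print(columns)
for individual, values in rows.items():
    print(individual, values)
```

Crossings that never occur in the facts are filled with 0.0, which is itself a modelling choice that a real implementation might handle differently.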
12
Formalization
Evaluation tools
Minimize the intra-cluster distances
Maximize the inter-cluster distances
Inter and intra-cluster inertia
$A_1, A_2, \dots, A_k$ is a partition of the individuals
$P(A_i)$ is the weight of $A_i$
$G(A_i)$ is the gravity center of $A_i$
$I_{intra}(k) = \sum_{i=1}^{k} I(A_i)$
$I_{inter}(k) = \sum_{i=1}^{k} P(A_i)\, d(G(A_i), G)$
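A small numeric sketch of the two inertia measures may help. The slide does not spell out every detail, so the following are assumptions: $d$ is taken as the squared Euclidean distance, every individual has weight $1/n$, $P(A_i)$ is the share of individuals in $A_i$, $I(A_i)$ is the weighted sum of squared distances to $G(A_i)$, and $G$ is the gravity center of all individuals. The function names and the sample partition are hypothetical.

```python
def centroid(points):
    """Gravity center of a list of coordinate tuples (equal weights)."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def sq_dist(a, b):
    """Squared Euclidean distance (assumed choice for d)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def inertias(partition):
    """Return (intra, inter) inertia for a list of clusters of points."""
    all_points = [p for cluster in partition for p in cluster]
    n = len(all_points)
    g = centroid(all_points)  # global gravity center G
    intra = 0.0
    inter = 0.0
    for cluster in partition:
        g_i = centroid(cluster)  # G(A_i)
        intra += sum(sq_dist(p, g_i) for p in cluster) / n  # I(A_i)
        inter += (len(cluster) / n) * sq_dist(g_i, g)  # P(A_i) * d(G(A_i), G)
    return intra, inter


# Hypothetical partition with two clusters of two individuals each.
partition = [[(0, 0), (0, 2)], [(4, 0), (4, 2)]]
intra, inter = inertias(partition)
total = sum(sq_dist(p, centroid([p for c in partition for p in c]))
            for c in partition for p in c) / 4
print(intra, inter, total)
```

Under these assumptions the two values add up to the total inertia of the data, so minimizing the intra-cluster inertia and maximizing the inter-cluster inertia are the same goal for a fixed number of clusters.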
13
Formalization
Individuals: modalities from the dimension of images
Variables:
L1Normalized values of images for all possible modalities of the entropy dimension
L1Normalized values of images for all possible modalities of the homogeneity dimension
[Figure: plot of inter-cluster and intra-cluster inertia, shown with the Entropy and Homogeneity modalities (Very high, High, Medium, Low, Very low)]
14
Formalization
Results:
Exploits the cube’s facts describing images to construct groups of similar complex objects
Highlights significant groups of objects by a clustering technique
Clusters (aggregates) are defined from both the dimensions and the measures of a data cube
Implementation of a prototype
15
Implementation
Prototype:
Data loading module:
Connects to a data cube on Analysis Services of MS SQL Server
Uses MDX queries to import information about the cube’s structure
Extracts the data selected by the user
Parameter setting interface:
Helps the user extract individuals and variables from the cube
Selects modalities and measures
Defines the clustering problem
Clustering module:
Allows the definition of clustering parameters such as the dissimilarity metric and the aggregation criterion
Constructs the AHC
Plots the results of the AHC on a dendrogram
16
Implementation
Images dataset:
3000 images collected from the web
Semantic annotation: description, subject and theme
Descriptors of texture like:
ENT: Entropy
CON: Contrast
L1Normalized: Medium Color Characteristic
…
Three color channels: RGB
17
Implementation
Demonstration:
18
Conclusion
OpAC is a possible way to realize on-line analysis over complex data
OpAC aggregates complex objects
Aggregates (clusters) are defined from both the dimensions and the measures of a data cube
Prototype available at: http://bdd.univ-lyon2.fr/?page=logiciel&id=5
19
Future works
The current evaluation tool may have some limitations
Use other evaluation indicators to evaluate the quality of partitions
Assist the user in finding the best number of clusters
Exploit the aggregates generated by OpAC in order to reorganize the cube’s dimensions
Get a new cube with remarkable regions
Use other data mining techniques to enhance the power of OLAP with explanation and prediction capabilities
20
The End