A New OLAP Aggregation Based on the AHC Technique DOLAP 2004 R. Ben Messaoud, O. Boussaid, S. Rabaséda Laboratoire ERIC – Université de Lyon 2 5, avenue.

Presentation transcript:

A New OLAP Aggregation Based on the AHC Technique
DOLAP 2004
R. Ben Messaoud, O. Boussaid, S. Rabaséda
Laboratoire ERIC – Université de Lyon 2
5, avenue Pierre-Mendès-France, 69676 Bron Cedex, France

November 13, 2004, Ben Messaoud et al.

Complex data
Definition: data are considered complex if they are:
- Multi-format: information can be carried by different kinds of data (numeric, symbolic, text, images, sounds, videos, …)
- Multi-structure: structured, unstructured or semi-structured (relational databases, XML documents, …)
- Multi-source: data come from different sources (distributed databases, the web, …)
- Multi-modal: the same information can be described differently (data in different languages, …)
- Multi-version: data are updated over time (temporal databases, periodic inventories, …)

General context
Complex data:
- Huge volumes of complex data
- Warehousing complex data …
- OLAP facts as complex objects
Analyze complex data:
- Current OLAP tools aren't suited to process complex data
- Data mining is able to process complex data like images, texts, videos, …
Coupling OLAP and data mining:
- Analyze complex data on-line
- New operator OpAC: Operator of Aggregation by Clustering (AHC)
[Diagram: Complex data, MDBMS, OLAP, Data mining, OpAC]

Outline
- Complex data and general context
- Related work: Coupling OLAP and data mining
- Objectives of the proposed operator
- Formalization of the operator
- Implementation and demonstration
- Conclusion and future works

Related work
Three approaches for coupling OLAP and data mining:
- First approach: extending the query languages of decision support systems
- Second approach: adapting the multidimensional environment to classical data mining techniques
- Third approach: adapting data mining methods to multidimensional data
[Diagram: Data mining, OLAP, DBMS; First approach, Second approach, Third approach]

Related work
These works showed that:
- Associating data mining with OLAP is a promising way to support rich analysis tasks
- Data mining is able to extend the analysis power of OLAP
Use data mining to enhance OLAP tools in order to process complex data
OpAC: a new OLAP operator based on a data mining technique
[Diagram: Data mining, OLAP, OpAC]

Objectives
Classic OLAP aggregation vs. OpAC aggregation
Classic OLAP:
- Summarizes numerical data in a smaller number of values
- Computes additive measures (Sum, Average, Max, Min, …)
Example: Sales cube (measures: Sales, Count)

City level (drill-down)      Sales
- Washington
    + Bellingham             $700
    + Bremerton              $400
    + Olympia                $850
    + Redmond                $250
    + Seattle                $320
- California
    + Berkeley               $820
    + Beverly Hills          $910
    + Los Angeles            $

State level (roll-up)        Sales
+ Washington                 $2520
+ California                 $2410
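The roll-up on this slide can be illustrated with a minimal Python sketch. It assumes the sales figures are listed in the same order as the cities, and it covers only the Washington rows, since the last California value is cut off in the transcript. The function name `roll_up` and the `city_to_state` mapping are hypothetical names introduced for illustration.

```python
# City-level Sales values for Washington, read off the slide
# (assumption: values appear in the same order as the cities).
city_sales = {
    "Bellingham": 700,
    "Bremerton": 400,
    "Olympia": 850,
    "Redmond": 250,
    "Seattle": 320,
}

# Hypothetical mapping from each city to its parent level (state).
city_to_state = {city: "Washington" for city in city_sales}


def roll_up(measures, parent_of):
    """Sum an additive measure from a child level up to its parent level."""
    totals = {}
    for member, value in measures.items():
        parent = parent_of[member]
        totals[parent] = totals.get(parent, 0) + value
    return totals


state_sales = roll_up(city_sales, city_to_state)
print(state_sales)  # {'Washington': 2520}
```

The sum matches the $2520 shown for Washington at the state level. It works because Sales is additive, which is the property complex objects lack.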

Objectives
Classic OLAP aggregation vs. OpAC aggregation
OpAC aggregation:
- What about aggregating complex objects?
- How to aggregate images, texts or videos with classic OLAP tools?
- Complex objects are not additive OLAP measures …
Example: Images cube

Images           Size      ASM
Orange coral     3560px    0.016
Nebraska, USA    2340px    0.021
Toco toucan      4434px    0.014
Maldives         3260px    0.012
Aggregate        ?         ?

Objectives
How to aggregate complex objects?
Using a data mining technique: AHC (Agglomerative Hierarchical Clustering)
- The AHC aggregates data
- The hierarchical aspect of the AHC
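To make the AHC idea concrete, here is a minimal pure-Python sketch of agglomerative hierarchical clustering. It is not the OpAC implementation: the Euclidean distance, the average-linkage criterion, and the toy points are all assumptions chosen for illustration. Starting from one cluster per individual, it repeatedly merges the two closest clusters and records each merge, which is the hierarchy a dendrogram displays.

```python
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def average_linkage(c1, c2, points):
    """Mean pairwise distance between two clusters of point indices."""
    total = sum(euclidean(points[i], points[j]) for i in c1 for j in c2)
    return total / (len(c1) * len(c2))


def ahc(points):
    """Return the merge history [(cluster_a, cluster_b, height), ...]."""
    clusters = [(i,) for i in range(len(points))]  # one singleton per individual
    merges = []
    while len(clusters) > 1:
        # Pick the pair of clusters with the smallest dissimilarity.
        a, b = min(
            combinations(clusters, 2),
            key=lambda pair: average_linkage(pair[0], pair[1], points),
        )
        height = average_linkage(a, b, points)
        # Replace the two merged clusters by their union.
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
        merges.append((a, b, height))
    return merges


# Toy data: two visually obvious groups of two points each.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
merges = ahc(points)
for left, right, height in merges:
    print(left, right, round(height, 3))
```

Cutting the merge history at a chosen height yields a partition. Each level of the hierarchy is therefore a candidate aggregation of the individuals.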

Objectives
[Figure: images cube with the dimensions Images, Entropy and Homogeneity; both Entropy and Homogeneity have the modalities Very high, High, Medium, Low, Very low; highlighted cells: L1Normalized for high homogeneity, L1Normalized for low entropy]

Formalization
- $D_i$: the $i$-th dimension of a data cube $C$
- $h_{ij}$: the $j$-th hierarchical level of the dimension $D_i$
- $g_{ijt}$: the $t$-th modality of $h_{ij}$
The set of individuals:
$\Omega = \{\, g_{ijt} \mid g_{ijt} \in h_{ij} \,\}$
The set of variables:
$\Sigma = \{\, X \mid X(g_{ijt}) = \text{measure of } g_{srv} \text{ crossed with } g_{ijt} \,\}$, where $g_{srv} \in h_{sr}$, $s \neq i$, and $r$ is unique for each $s$
- A dimension retained for individuals can't generate variables
- Only one hierarchical level of a dimension is allowed to generate variables

Formalization
Evaluation tools:
- Minimize the intra-cluster distances
- Maximize the inter-cluster distances
Inter- and intra-cluster inertia:
- $A_1, A_2, \ldots, A_k$ is a partition of $\Omega$
- $P(A_i)$ is the weight of $A_i$
- $G(A_i)$ is the gravity center of $A_i$
$I_{intra}(k) = \sum_{i=1}^{k} I(A_i)$
$I_{inter}(k) = \sum_{i=1}^{k} P(A_i)\, d(G(A_i), G(\Omega))$
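The two inertia indicators can be computed as in the sketch below. The slide does not define $I(A_i)$ or $d$, so the sketch makes three assumptions: individuals have uniform weights $1/n$, $d$ is the squared Euclidean distance, and $I(A_i)$ is the weighted sum of squared distances from the members of $A_i$ to $G(A_i)$. The function names are hypothetical.

```python
def centroid(points):
    """Gravity center G of a list of points (uniform weights assumed)."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))


def sq_dist(a, b):
    """Squared Euclidean distance (assumed choice for d)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def intra_inertia(partition):
    """I_intra(k): sum over clusters of I(A_i), each point weighted 1/n."""
    n_total = sum(len(cluster) for cluster in partition)
    total = 0.0
    for cluster in partition:
        g = centroid(cluster)
        total += sum(sq_dist(p, g) for p in cluster) / n_total
    return total


def inter_inertia(partition):
    """I_inter(k): sum over clusters of P(A_i) * d(G(A_i), G(Omega))."""
    all_points = [p for cluster in partition for p in cluster]
    n_total = len(all_points)
    g_all = centroid(all_points)
    return sum(
        (len(cluster) / n_total) * sq_dist(centroid(cluster), g_all)
        for cluster in partition
    )


# Toy partition of four individuals into k = 2 clusters.
partition = [[(0.0, 0.0), (0.0, 2.0)], [(4.0, 0.0), (4.0, 2.0)]]
print(intra_inertia(partition))  # 1.0
print(inter_inertia(partition))  # 4.0
```

Under these assumptions the two values always add up to the total inertia of $\Omega$ (here 5.0). Lowering the intra-cluster inertia therefore raises the inter-cluster inertia by the same amount.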

Formalization
[Figure: images cube with the Entropy and Homogeneity dimensions (modalities: Very high, High, Medium, Low, Very low), annotated with inter-cluster and intra-cluster distances]
Individuals:
- Modalities from the dimension of images
Variables:
- L1Normalized values of images for all possible modalities of the entropy dimension
- L1Normalized values of images for all possible modalities of the homogeneity dimension
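The choice of individuals and variables above amounts to turning cube facts into an individuals-by-variables table that a clustering algorithm can consume. The sketch below shows one way to do this. The fact tuples, the image names, the use of a sum to aggregate the measure, and the 0.0 fill for empty cells are all assumptions made for illustration; they are not taken from the prototype.

```python
LEVELS = ["Very low", "Low", "Medium", "High", "Very high"]

# Hypothetical facts: (image, entropy modality, homogeneity modality, L1Normalized).
facts = [
    ("img1", "High", "Low", 0.42),
    ("img2", "High", "Very low", 0.40),
    ("img3", "Low", "High", 0.10),
]


def build_matrix(facts):
    """Build one row per image and one column per (dimension, modality)."""
    columns = [("Entropy", lvl) for lvl in LEVELS]
    columns += [("Homogeneity", lvl) for lvl in LEVELS]
    index = {col: k for k, col in enumerate(columns)}
    rows = {}
    for image, entropy, homogeneity, value in facts:
        # Empty cells default to 0.0 (assumption).
        row = rows.setdefault(image, [0.0] * len(columns))
        # The measure is summed within each modality (assumed aggregation).
        row[index[("Entropy", entropy)]] += value
        row[index[("Homogeneity", homogeneity)]] += value
    return columns, rows


columns, rows = build_matrix(facts)
for image, row in rows.items():
    print(image, row)
```

Each image ends up described by ten variables, five per dimension. This respects both constraints from the formalization: the Images dimension supplies only individuals, and each other dimension contributes a single hierarchical level.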

Formalization
Results:
- Exploits the cube's facts describing images to construct groups of similar complex objects
- Highlights significant groups of objects by a clustering technique
- Clusters (aggregates) are defined from both dimensions and measures of a data cube
- Implementation of a prototype

Implementation
Prototype:
Data loading module:
- Connects to a data cube on Analysis Services of MS SQL Server
- Uses MDX queries to import information about the cube's structure
- Extracts the data selected by the user
Parameter setting interface:
- Assists the user in extracting individuals and variables from the cube
- Selects modalities and measures
- Defines the clustering problem
Clustering module:
- Allows the definition of clustering parameters such as the dissimilarity metric and the aggregation criterion
- Constructs the AHC
- Plots the results of the AHC on a dendrogram
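The two clustering parameters named for the clustering module, the dissimilarity metric and the aggregation criterion, can be treated as interchangeable functions. The sketch below illustrates that idea only. The slides do not list the options the prototype offers, so the metric names, the criterion names, and the function `cluster_dissimilarity` are hypothetical.

```python
def euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


# Dissimilarity metrics between two individuals (illustrative choices).
METRICS = {"euclidean": euclidean, "manhattan": manhattan}

# Aggregation criteria that reduce pairwise distances to one value
# (illustrative choices: single, complete and average linkage).
CRITERIA = {
    "single": min,
    "complete": max,
    "average": lambda ds: sum(ds) / len(ds),
}


def cluster_dissimilarity(c1, c2, metric="euclidean", criterion="average"):
    """Dissimilarity between two clusters under the chosen parameters."""
    dist = METRICS[metric]
    pairwise = [dist(a, b) for a in c1 for b in c2]
    return CRITERIA[criterion](pairwise)


c1 = [(0.0, 0.0), (0.0, 1.0)]
c2 = [(3.0, 0.0)]
print(cluster_dissimilarity(c1, c2, "manhattan", "complete"))  # 4.0
print(cluster_dissimilarity(c1, c2, "euclidean", "single"))    # 3.0
```

Changing either parameter changes which clusters merge first, and therefore the shape of the dendrogram. That is a plausible reason for exposing both settings to the user.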

Implementation
Images dataset: 3000 images collected from the web, described by:
- Semantic annotation: description, subject and theme
- Descriptors of texture like:
  - ENT: Entropy
  - CON: Contrast
  - L1Normalized: Medium Color Characteristic
  - …
- Three color channels: RGB

Implementation
Demonstration:

Conclusion
- OpAC is a possible way to carry out on-line analysis over complex data
- OpAC aggregates complex objects
- Aggregates (clusters) are defined from both dimensions and measures of a data cube
- Prototype available at :

Future works
- The current evaluation tool may have some limitations
  - Use other evaluation indicators to assess the quality of partitions
  - Assist the user in finding the best number of clusters
- Exploit the aggregates generated by OpAC in order to reorganize the cube's dimensions
  - Get a new cube with remarkable regions
- Use other data mining techniques to enhance the power of OLAP with explanation and prediction capabilities

The End