Heterogeneous Data Warehouse Analysis and Dimensional Integration Marius Octavian Olaru XXVI Cycle Computer Engineering and Science Advisor: Prof. Maurizio.

Presentation transcript:

Heterogeneous Data Warehouse Analysis and Dimensional Integration
Marius Octavian Olaru
XXVI Cycle, Computer Engineering and Science
Advisor: Prof. Maurizio Vincini
Co-Advisor: Prof. Sonia Bergamaschi
International Doctorate School in Information and Communication Technologies
Università degli Studi di Modena e Reggio Emilia

Outline
  Problem statement
    Definition and business motivation
  Dimension integration
    Dimension mapping
    Schema integration
    Instance integration
  Mapping quality analysis

Problem Motivation: Business Perspective
Several scenarios require managers to combine strategic information:
  Company mergers and acquisitions
  Virtual enterprises
  Networks of co-opetition (simultaneous collaboration and competition)
Take strategic decisions based on information from ALL the companies.
Integrate the Business Intelligence repositories (enterprise DWs).

Problem Statement
DW integration = combining information from two or more heterogeneous DWs and providing users with a unified view.
A specific, context-dependent data integration problem.
The context is a priori knowledge about the schema that can be exploited for schema matching, schema integration, schema resolution, etc.

Contribution
A complete, automatic methodology for the integration of heterogeneous DW dimensions:
  Mapping discovery
  Schema integration
  Instance integration

Phase 1: mapping discovery
Main observations:
  Use context-specific solutions rather than classical approaches (e.g., semantics).
  A priori knowledge about the dimension schemas is available (they are directed graphs).
  Different working groups represent information in a similar manner, according to the common understanding of the concept of interest.

Phase 1: mapping discovery
Consider the two dimensions as directed labelled graphs.
Annotate them with instance data: the cardinality ratio between aggregation levels.
Generate a common structure recurring in the initial graphs.
Use it to identify pairs of common nodes.
  Exact algorithm vs. approximate algorithm (depends on the instances)
Generate sets of complex semantic mappings.
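The annotation step can be illustrated with a minimal Python sketch. It is not the algorithm from the thesis: it only labels each roll-up edge with a cardinality ratio and pairs edges from two dimensions whose ratios are close. The function names, the level names, the member counts and the relative tolerance `eps` are all assumptions made for illustration.

```python
from itertools import product


def annotate(edges, member_counts):
    """Label each roll-up edge (child, parent) with its cardinality ratio."""
    return {
        (child, parent): member_counts[child] / member_counts[parent]
        for child, parent in edges
    }


def candidate_pairs(dim_a, dim_b, eps=0.1):
    """Pair edges whose ratios differ by at most a relative tolerance eps."""
    pairs = []
    for (edge_a, ratio_a), (edge_b, ratio_b) in product(dim_a.items(), dim_b.items()):
        if abs(ratio_a - ratio_b) <= eps * max(ratio_a, ratio_b):
            pairs.append((edge_a, edge_b))
    return pairs


# Hypothetical time dimensions from two warehouses (counts are made up).
time_a = annotate(
    [("day", "month"), ("month", "year")],
    {"day": 730, "month": 24, "year": 2},
)
time_b = annotate(
    [("date", "mese"), ("mese", "anno")],
    {"date": 1095, "mese": 36, "anno": 3},
)
matches = candidate_pairs(time_a, time_b)
```

In this toy case both warehouses show roughly 30 days per month and 12 months per year, so the differently named levels are paired by structure alone, without looking at their labels.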

Phase 1: Experimental Evaluation
Three real DWs.
Attempted the mapping of pairs of dimensions:
  Time dimensions
  Geographic dimensions
  Article dimensions
[Results table; layout lost in extraction. Labels: day, postcode, article, overlapping, ε. Values in original order: 75 %, 0,1 %, 16 %, 12 %, 0 %, 4 %]

Complex Mappings
Two mapped categories express the same concept at the same level of granularity (equi-level).
It is possible to define more complex mappings for query reformulation in distributed environments [1]:
  equi-level
  roll-up
  drill-down
  related
Semantic validation using the Combined Word Sense Disambiguation technique (CWSD).
[1] Golfarelli, M., Mandreoli, F., Penzo, W., Rizzi, S., & Turricchia, E. (2011). OLAP Query Reformulation in Peer-to-Peer Data Warehousing. Information Systems, 37(5).
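As an illustration of how complex mappings could follow from equi-level ones, the Python sketch below derives roll-up and drill-down mappings by walking the hierarchy of the second dimension. The convention used here ("a roll-up c" means that level a is finer than level c, and drill-down is the inverse) is an assumption of this sketch and may not match the cited definitions; the "related" predicate is not derived. All names are hypothetical.

```python
def ancestors(level, parents):
    """All levels reachable from `level` by following roll-up edges."""
    seen, stack = set(), list(parents.get(level, ()))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, ()))
    return seen


def derive_mappings(equi_level, parents_b):
    """Extend equi-level pairs (a, b) with roll-up / drill-down mappings.

    parents_b maps each level of the second dimension to its parent levels.
    """
    mappings = set()
    for a, b in equi_level:
        mappings.add((a, "equi-level", b))
        for coarser in ancestors(b, parents_b):
            # a is finer than every ancestor of its equi-level partner.
            mappings.add((a, "roll-up", coarser))
            mappings.add((coarser, "drill-down", a))
    return mappings


# Hypothetical hierarchy of the remote dimension: date -> mese -> anno.
parents_b = {"date": ["mese"], "mese": ["anno"]}
derived = derive_mappings([("day", "date")], parents_b)
```

A single equi-level pair (`day`, `date`) is enough here to yield five mappings, which is what makes query reformulation at coarser levels possible.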

Phase 2: Schema Integration
When writing a global query on a network of heterogeneous peers, depending on schema compatibility, the query may be:
  Executed on all the peers, because all are compatible with the query
  Executed on a subset of the peers
  Executed only on the local node
Resolution approaches:
  Allow only compatible queries vs. allow all queries
  Inform the user vs. do not inform the user
  Confusion vs. misleading results
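The three outcomes listed above can be reproduced with a small Python sketch. It assumes that level names have already been unified through the discovered mappings and that a query is characterised only by the aggregation levels it uses; both are simplifying assumptions, and the peer names and schemas are hypothetical.

```python
def compatible_peers(query_levels, peer_schemas):
    """Return the peers whose dimension schema contains every queried level."""
    needed = set(query_levels)
    return sorted(
        peer for peer, levels in peer_schemas.items() if needed <= set(levels)
    )


# Hypothetical network: each peer exposes a set of (already unified) levels.
peers = {
    "local": {"day", "month", "quarter", "year"},
    "peer_1": {"day", "month", "year"},
    "peer_2": {"day", "year"},
}

on_all = compatible_peers(["day", "year"], peers)       # every peer qualifies
on_subset = compatible_peers(["month", "year"], peers)  # peer_2 lacks "month"
local_only = compatible_peers(["quarter"], peers)       # only the local node
```

Whether the user is told that a query ran on only part of the network is a separate policy decision; the function merely makes the partition explicit.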

Phase 2: Schema Integration
Solution: import the compatible parts of remote dimension schemas.
  Uniform dimensions = uniform queries
  Increased querying capabilities for local nodes

Phase 3: Instance Integration
Two possible solutions:
  The dimension chase procedure (d-chase)
    Based on the chase algorithm for reasoning on functional dependencies
    Suitable for exactly matching attribute values
  The RELEVANT clustering approach
    Based on syntactic similarity, dominance measure and lexical similarity
    Suitable for realistic scenarios, where values may be slightly different
    E.g.: Emilia Romagna vs. E. Romagna
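To give a feel for approximate instance matching, here is a minimal Python sketch that groups attribute values by string similarity. It is not the RELEVANT algorithm: it swaps in `difflib.SequenceMatcher` from the Python standard library as the only similarity measure and uses a greedy single-pass grouping. The threshold of 0.7 and the sample values are assumptions chosen for illustration.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Case-insensitive syntactic similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def cluster_values(values, threshold=0.7):
    """Greedy clustering: put each value in the first cluster whose
    representative (its first element) is similar enough, else open a new one."""
    clusters = []
    for value in values:
        for cluster in clusters:
            if similarity(value, cluster[0]) >= threshold:
                cluster.append(value)
                break
        else:
            clusters.append([value])
    return clusters


# Hypothetical region names coming from two different warehouses.
regions = ["Emilia Romagna", "E. Romagna", "Lombardia", "Lombardy"]
clusters = cluster_values(regions)
```

With these inputs the two spellings of each region end up in the same cluster. An exact-match procedure would treat all four values as distinct, which is why a clustering approach suits realistic data better.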

Integration architectures
The proposed approach is architecture independent:
  Peer-to-peer network of DWs
  Federation of DWs
    Algorithm for constructing a global dimension from n matched dimensions
  Central DW (= union and reconciliation of all DWs)
Main advantage: flexibility

Dimension mapping properties
The quality of integrated information depends on the accuracy of the mappings.
Three dimension mapping properties [1]:
  Coherency
  Soundness
  Consistency
One dimension property [2]:
  Homogeneity: important for summarizability and for materializing dependent GROUP BY queries
[1] Cabibbo, L., & Torlone, R. (2005). Integrating heterogeneous multidimensional databases. SSDBM.
[2] Hurtado, C. A., Gutierrez, C., & Mendelzon, A. O. (2005). Capturing summarizability with integrity constraints in OLAP. ACM Transactions on Database Systems, 30(3).

Dimension mapping properties
The generated mappings are always coherent.
Soundness and consistency are always maintained.
In some cases, soundness and consistency are obtained.
The properties are conservative and non-degenerative.
Beneventano, D., Olaru, M.-O., & Vincini, M. (2013). Analyzing Dimension Mappings and Properties in Data Warehouse Integration. ODBASE 2013 (LNCS 8185).

Checking Homogeneity
Homogeneity/heterogeneity is independent
A sufficient condition for maintaining homogeneity, involving base categories, was formulated in the thesis.
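For readers unfamiliar with the property, the Python sketch below checks homogeneity under one common reading: a dimension instance is homogeneous when every member of a category has a parent in each parent category declared by the schema. This reading, the function name and the data are assumptions of the sketch; it does not implement the sufficient condition formulated in the thesis.

```python
def is_homogeneous(schema_edges, category_of, parent_of):
    """Check that every member reaches all parent categories of its category.

    schema_edges: (child category, parent category) pairs.
    category_of:  member -> category.
    parent_of:    member -> set of direct parent members.
    """
    for member, category in category_of.items():
        required = {parent for child, parent in schema_edges if child == category}
        reached = {category_of[p] for p in parent_of.get(member, ())}
        if not required <= reached:
            return False
    return True


# Hypothetical geographic dimension: city -> province -> country.
schema_edges = [("city", "province"), ("province", "country")]
category_of = {
    "Modena": "city",
    "Carpi": "city",
    "MO": "province",
    "Italy": "country",
}
regular = {"Modena": {"MO"}, "Carpi": {"MO"}, "MO": {"Italy"}}
# Same members, but one city skips the province level.
irregular = {"Modena": {"MO"}, "Carpi": {"Italy"}, "MO": {"Italy"}}

regular_ok = is_homogeneous(schema_edges, category_of, regular)
irregular_ok = is_homogeneous(schema_edges, category_of, irregular)
```

In the second instance an aggregate computed at province level would silently miss the facts attached to the city that skips that level, which is the summarizability problem homogeneity guards against.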

Conclusions
A complete methodology for the integration of heterogeneous DW dimensions.
Context-specific problem = context-specific solution:
  Mapping discovery = graph matching (+ semantics)
  Schema integration:
    Solve heterogeneities
    Increase local querying capabilities
  Instance integration:
    Exact approach: suitable for exact values (e.g., dictionaries)
    Clustering approach: suitable for real-life cases
Good quality properties

List of Relevant Publications
International Conferences:
  Bergamaschi, S., Olaru, M.-O., Sorrentino, S., & Vincini, M. (2012). Dimension matching in Peer-to-Peer Data Warehousing. In DSS 2012.
  Olaru, M.-O., & Vincini, M. (2012a). A Dimension Integration Method for a Heterogeneous Data Warehouse Environment. In ICETE 2012.
  Olaru, M.-O. (2012). Partial Multi-dimensional Schema Merging in Heterogeneous Data Warehouses. In ER PhD Symposium.
  Guerra, F., Olaru, M.-O., & Vincini, M. (2012). Mapping and Integration of Dimensional Attributes Using Clustering Techniques. In EC-WEB 2013.
  Beneventano, D., Olaru, M.-O., & Vincini, M. (2013). Analyzing Dimension Mappings and Properties in Data Warehouse Integration. ODBASE 2013 (LNCS 8185).
Book Chapters:
  Olaru, M.-O., & Vincini, M. (2014). A Data Warehouse Integration Methodology in Support of Collaborative SMEs. In Organizational Transformations through Big Data Analytics [submitted].

Thank you