Evaluating XML-Extended OLAP Queries Based on a Physical Algebra Xuepeng Yin and Torben B. Pedersen Department of Computer Science Aalborg University.

# 2 Problem
- OLAP systems are good for complex analysis queries: they are easy to use and fast, and are used in business, science, and elsewhere
- Problems with physical integration in existing OLAP systems:
  - Integrating new data requires a (partial) cube rebuild, which is too slow
  - Problems arise with dynamic data such as stock quotes, competitors' prices, and disease information
- Such data will often be available in Extensible Markup Language (XML) format, e.g., weather data, map information, and price lists

# 3 Solution
- Allow the use of external XML data as virtual dimensions for:
  - Decoration: adding extra information, e.g., type information
  - Selection: conditions on XML data
  - Grouping: categories defined by XML data
- Goal: flexible access to XML data from OLAP systems

# 4 Overview
- Contributions
- Architecture of the federation
- Linking OLAP and XML
- The federation query semantics
- The logical algebra
- The physical algebra
- Conversion from logical to physical plans
- Plan execution
- Query optimization
- The query optimizer
- Execution of an optimized plan
- Performance
- Conclusion

# 5 Contributions of This Paper
- Previous OLAP-XML federation efforts:
  - A logical algebra
  - A partial, straightforward implementation
- Problems with previous work:
  - The logical algebra does not accurately reflect query execution tasks
  - Query optimization is performed only at an abstract level
  - The implementation is very limited
- Novelties of this paper:
  - A physical algebra and simplified query semantics
  - Practical query optimization techniques
  - A full-featured, robust query engine
  - Experiments with the query engine

# 6 Architecture of the federation
- OLAP and XML components
- Auxiliary components
- Query engine:
  - Query analyzer
  - Query optimizer
  - Query evaluator

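# 7 Linking OLAP and XML
- Links: relations between a set of dimension values and a set of XML nodes
- Level expressions, such as Nation/Nlink/Population, specify a concrete use of a link; this one links nations to their populations
- Example link: Nlink = {(DK, n1), (CN, n2), (UK, n3)}

[Figure: example cube with dimensions such as Time, Orders, and Suppliers (levels Year, Quarter, Month, Customer, Order, Region, Nation, Supplier, Manufacturer, Brand, Part) and the measure Quantity; Nlink connects Nation values to XML nodes, e.g., Denmark to a population of 5.3.]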
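
A link can be modeled as a set of (dimension value, XML node) pairs, and a level expression as a lookup through that link followed by an XPath step. The sketch below assumes a small hypothetical XML document in which each linked node is a Nation element carrying an id and a Population child; the document layout, the CN population value, and the function name eval_level_expr are illustrative assumptions, not the federation's actual data or interface.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML source. Node ids n1..n3 mirror the Nlink example;
# the CN population value is invented purely for illustration.
XML_DOC = """
<Nations>
  <Nation id="n1"><Population>5.3</Population></Nation>
  <Nation id="n2"><Population>1264.5</Population></Nation>
  <Nation id="n3"><Population>19.1</Population></Nation>
</Nations>
""".strip()

# The link: a relation between Nation dimension values and XML node ids.
NLINK = {("DK", "n1"), ("CN", "n2"), ("UK", "n3")}


def eval_level_expr(root, link, xpath):
    """Evaluate a level expression shaped like Nation/Nlink/Population.

    Returns a dict mapping each linked dimension value to the list of
    numeric values found by applying `xpath` to its linked XML node.
    """
    nodes_by_id = {node.get("id"): node for node in root.iter("Nation")}
    result = {}
    for dim_value, node_id in link:
        node = nodes_by_id.get(node_id)
        if node is None:
            continue  # dangling link: no XML node with this id
        result[dim_value] = [float(e.text) for e in node.findall(xpath)]
    return result


root = ET.fromstring(XML_DOC)
populations = eval_level_expr(root, NLINK, "Population")
print(sorted(populations.items()))
```

Because a link is a relation rather than a function, one dimension value could in principle reach several XML nodes; returning a list per value keeps that case representable.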

# 8 The Federation Query Semantics
- The logical algebra: decoration, federation selection, and federation generalized projection
- The federation query language SQLXM, e.g.:
  SELECT SUM(Quantity), Brand(Part), Nation/Nlink/Population FROM TC WHERE Nation/Nlink/Population < 30 GROUP BY Brand(Part), Nation/Nlink/Population
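
One way to picture the logical algebra is as an operator tree built from its three operators. The sketch below encodes the example SQLXM query as a hypothetical plan in which the cube TC is decorated with Nation/Nlink/Population, then filtered by a federation selection, then aggregated by a federation generalized projection; the class names, the string-valued predicates, and this exact operator order are illustrative assumptions rather than the paper's notation.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Cube:
    name: str


@dataclass(frozen=True)
class Decorate:
    child: "Node"
    level_expr: str  # e.g. "Nation/Nlink/Population"


@dataclass(frozen=True)
class FedSelect:
    child: "Node"
    predicate: str


@dataclass(frozen=True)
class FedGenProj:
    child: "Node"
    group_by: Tuple[str, ...]
    aggregate: str


Node = Union[Cube, Decorate, FedSelect, FedGenProj]

# A hypothetical logical plan for the SQLXM query on the slide.
plan = FedGenProj(
    child=FedSelect(
        child=Decorate(Cube("TC"), "Nation/Nlink/Population"),
        predicate="Nation/Nlink/Population < 30",
    ),
    group_by=("Brand(Part)", "Nation/Nlink/Population"),
    aggregate="SUM(Quantity)",
)


def render(node, depth=0):
    """Return the plan as indented lines, root operator first."""
    pad = "  " * depth
    if isinstance(node, Cube):
        return [f"{pad}Cube {node.name}"]
    if isinstance(node, Decorate):
        head = f"{pad}Decorate[{node.level_expr}]"
    elif isinstance(node, FedSelect):
        head = f"{pad}FedSelect[{node.predicate}]"
    else:
        head = f"{pad}FedGenProj[{', '.join(node.group_by)}; {node.aggregate}]"
    return [head] + render(node.child, depth + 1)


print("\n".join(render(plan)))
```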

# 9 The Physical Algebra
- Includes data retrieval and data manipulation operators
- A physical plan models real execution tasks, i.e., when, where, and how data is processed
- Nine physical operators:
  - Querying the OLAP component: cube selection and cube generalized projection
  - Data transfer between components: fact-, dimension-, and XML-transfer operators
  - Temporary data manipulations: decoration, federation selection, and federation generalized projection
  - Inlining XML data: inlining
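
The nine operators and their four groups can be written down as a small catalog, for instance as a starting point for tagging each step of a physical plan with the kind of work it does. This is a hypothetical encoding; the names Category and PHYSICAL_OPERATORS do not come from the paper.

```python
from collections import Counter
from enum import Enum


class Category(Enum):
    """The four groups of physical operators named on the slide."""
    OLAP_QUERY = "querying the OLAP component"
    TRANSFER = "data transfer between components"
    TEMP = "temporary data manipulations"
    INLINING = "inlining XML data"


# Operator name -> category, using the slide's wording for the names.
PHYSICAL_OPERATORS = {
    "cube selection": Category.OLAP_QUERY,
    "cube generalized projection": Category.OLAP_QUERY,
    "fact-transfer": Category.TRANSFER,
    "dimension-transfer": Category.TRANSFER,
    "XML-transfer": Category.TRANSFER,
    "decoration": Category.TEMP,
    "federation selection": Category.TEMP,
    "federation generalized projection": Category.TEMP,
    "inlining": Category.INLINING,
}

# How many operators fall into each group.
per_category = Counter(PHYSICAL_OPERATORS.values())

for category in Category:
    names = [op for op, cat in PHYSICAL_OPERATORS.items() if cat is category]
    print(f"{category.value}: {', '.join(names)}")
```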

# 10 Querying the OLAP Component
- Cube selection:
  - Has no references to XML data
  - Performs selection over the OLAP cube
  - Intuitively, a SQL SELECT statement
- Cube generalized projection:
  - Has no references to XML data
  - Rolls up dimensions and aggregates the specified measures at the specified levels
  - Intuitively, a SQL SELECT statement with a GROUP BY clause
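
Under the relational reading given on the slide, the two OLAP-side operators map onto ordinary SQL. The sketch below uses Python's built-in sqlite3 module as a stand-in for the OLAP component; the star schema (fact, supplier_dim, part_dim) and all values are hypothetical, and a real OLAP server would be queried through its own query interface instead.

```python
import sqlite3

# In-memory stand-in for the OLAP component: a tiny star schema.
# All table names and quantities are hypothetical toy values.
olap = sqlite3.connect(":memory:")
olap.executescript("""
CREATE TABLE fact (supplier TEXT, part TEXT, quantity INTEGER);
CREATE TABLE supplier_dim (supplier TEXT, nation TEXT);
CREATE TABLE part_dim (part TEXT, brand TEXT);
INSERT INTO fact VALUES ('S1', 'P3', 17), ('S2', 'P4', 42),
                        ('S3', 'P3', 4), ('S4', 'P2', 20);
INSERT INTO supplier_dim VALUES ('S1', 'DK'), ('S2', 'DK'),
                                ('S3', 'CN'), ('S4', 'UK');
INSERT INTO part_dim VALUES ('P2', 'B2'), ('P3', 'B3'), ('P4', 'B4');
""")

# Cube selection: a plain SELECT whose condition uses dimension data only.
selected = olap.execute("""
    SELECT f.supplier, f.part, f.quantity
    FROM fact f JOIN supplier_dim s ON f.supplier = s.supplier
    WHERE s.nation = 'DK'
    ORDER BY f.supplier
""").fetchall()

# Cube generalized projection: roll suppliers up to nations and parts up
# to brands, aggregating the Quantity measure with SUM.
rolled_up = olap.execute("""
    SELECT s.nation, p.brand, SUM(f.quantity)
    FROM fact f
    JOIN supplier_dim s ON f.supplier = s.supplier
    JOIN part_dim p ON f.part = p.part
    GROUP BY s.nation, p.brand
    ORDER BY s.nation, p.brand
""").fetchall()

print(selected)
print(rolled_up)
```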

# 11 Data Transfer Between Components
- Fact-transfer:
  - Transfers the OLAP fact data to the temporary component, where the temporary facts can then be decorated
  - Intuitively, a SQL SELECT INTO statement
- Dimension-transfer:
  - Transfers dimension data to the temporary component
  - Used when higher-level dimension data is required in the temporary component
- XML-transfer:
  - Transfers XML data to the temporary component
  - Uses XPath expressions to identify the XML nodes holding decoration values
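
The fact- and XML-transfer operators can be sketched with two separate SQLite databases, one standing in for the OLAP component and one for the temporary component. SQLite has no SELECT INTO, and the data crosses database boundaries anyway, so the sketch fetches rows and re-inserts them. The helper names fact_transfer and xml_transfer, the table names, and all values are hypothetical, and ElementTree's findall supports only a limited subset of XPath.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Two in-memory databases stand in for the OLAP component and the
# temporary component; all names and values here are hypothetical.
olap = sqlite3.connect(":memory:")
temp = sqlite3.connect(":memory:")
olap.executescript("""
CREATE TABLE fact (supplier TEXT, part TEXT, quantity INTEGER);
INSERT INTO fact VALUES ('S1', 'P3', 17), ('S2', 'P4', 42),
                        ('S3', 'P3', 4), ('S4', 'P2', 20);
""")

XML_DOC = (
    "<Nations>"
    "<Nation><Code>DK</Code><Population>5.3</Population></Nation>"
    "<Nation><Code>UK</Code><Population>19.1</Population></Nation>"
    "</Nations>"
)


def fact_transfer(src, dst, query, table, columns):
    """Copy the result of `query` on `src` into a new table in `dst`.

    Table and column names are interpolated into SQL, so they must come
    from trusted code, never from user input.
    """
    rows = src.execute(query).fetchall()
    dst.execute(f"CREATE TABLE {table} ({', '.join(columns)})")
    placeholders = ", ".join(["?"] * len(columns))
    dst.executemany(f"INSERT INTO {table} VALUES ({placeholders})", rows)
    return len(rows)


def xml_transfer(xml_text, dst, table, path):
    """Load (nation, population) pairs from the nodes matched by `path`."""
    root = ET.fromstring(xml_text)
    rows = [(node.findtext("Code"), float(node.findtext("Population")))
            for node in root.findall(path)]
    dst.execute(f"CREATE TABLE {table} (nation TEXT, population REAL)")
    dst.executemany(f"INSERT INTO {table} VALUES (?, ?)", rows)
    return rows


n_facts = fact_transfer(olap, temp, "SELECT supplier, part, quantity FROM fact",
                        "t_fact", ["supplier", "part", "quantity"])
nation_pop = xml_transfer(XML_DOC, temp, "t_nation_population", "./Nation")
print(n_facts, nation_pop)
```

A dimension-transfer would follow the same pattern as fact_transfer, with a query over dimension tables instead of the fact table.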

# 12 Temporary Data Manipulations
- Decoration:
  - Decorates the cube by adding a new dimension
  - Intuitively, adds a table relating dimension values to decoration values from the XML data, e.g.:
    SELECT * FROM t_(supplier, nation) t1, t_(nation, population) t2 WHERE t1.nation = t2.nation
- Federation selection:
  - Performs selection over the cube in the temporary component
  - Intuitively, a SQL selection over the temporary tables, e.g.:
    SELECT t1.* FROM t_fact t1, t_(supplier, population) t2 WHERE t1.supplier = t2.supplier AND population < 30
- Federation generalized projection:
  - Rolls up and aggregates the cube in the temporary component
  - Intuitively, a SQL selection with a GROUP BY clause, e.g.:
    SELECT SUM(Quantity), t2.population FROM t_fact t1, t_(supplier, population) t2 WHERE t1.supplier = t2.supplier GROUP BY t2.population
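
The three example queries can be run almost verbatim against toy tables in SQLite. In this sketch the slide's t_(x, y) tables are flattened into identifiers such as t_supplier_nation, the decoration step stores its join result as t_supplier_population (the table the other two queries read), and ORDER BY clauses are added only to make the output deterministic. All values are hypothetical; the CN population in particular is invented.

```python
import sqlite3

# In-memory stand-in for the temporary component, holding toy data.
temp = sqlite3.connect(":memory:")
temp.executescript("""
CREATE TABLE t_fact (supplier TEXT, part TEXT, quantity INTEGER);
CREATE TABLE t_supplier_nation (supplier TEXT, nation TEXT);
CREATE TABLE t_nation_population (nation TEXT, population REAL);
INSERT INTO t_fact VALUES ('S1', 'P3', 17), ('S2', 'P4', 42),
                          ('S3', 'P3', 4), ('S4', 'P2', 20);
INSERT INTO t_supplier_nation VALUES ('S1', 'DK'), ('S2', 'DK'),
                                     ('S3', 'CN'), ('S4', 'UK');
INSERT INTO t_nation_population VALUES ('DK', 5.3), ('CN', 1264.5),
                                       ('UK', 19.1);
""")

# Decoration: relate each supplier to the XML decoration value of its nation.
temp.execute("""
    CREATE TABLE t_supplier_population AS
    SELECT t1.supplier, t2.population
    FROM t_supplier_nation t1, t_nation_population t2
    WHERE t1.nation = t2.nation
""")

# Federation selection: keep facts whose supplier's population is below 30.
selected = temp.execute("""
    SELECT t1.* FROM t_fact t1, t_supplier_population t2
    WHERE t1.supplier = t2.supplier AND t2.population < 30
    ORDER BY t1.supplier
""").fetchall()

# Federation generalized projection: aggregate facts by the decoration value.
grouped = temp.execute("""
    SELECT SUM(t1.quantity), t2.population
    FROM t_fact t1, t_supplier_population t2
    WHERE t1.supplier = t2.supplier
    GROUP BY t2.population
    ORDER BY t2.population
""").fetchall()

print(selected)
print(grouped)
```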

# 13 Inlining XML Data
- Comparing federated data in the temporary component is expensive
- Inlining integrates XML data into the OLAP selections
- The resulting predicate:
  - Only references dimension levels and constants
  - Can be evaluated in the OLAP component
- Example: given the XML populations DK 5.3 and UK 19.1, the predicate Nation/Nlink/Population < 30 is inlined as Nation='DK' OR Nation='UK'; CN does not satisfy the condition
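
The rewrite can be sketched as follows, assuming the level expression has already been evaluated to one XML value per dimension value (here, a Python dict). The function name inline_predicate, the operator table, the always-false fallback 1 = 0, and the CN population value are illustrative assumptions, not the engine's code; the output is written as a SQL-style predicate for readability, whereas an OLAP server would need it in its own query language.

```python
import operator

# Comparison operators allowed in the hypothetical XML predicate.
OPS = {"<": operator.lt, "<=": operator.le, ">": operator.gt,
       ">=": operator.ge, "=": operator.eq}


def inline_predicate(level, decoration, op, constant):
    """Rewrite `level/link/xpath <op> constant` as a predicate that mentions
    only the dimension level and constants.

    `decoration` maps each dimension value to its single XML value.
    """
    test = OPS[op]
    matching = sorted(v for v, x in decoration.items() if test(x, constant))
    if not matching:
        return "1 = 0"  # no dimension value qualifies: always-false predicate
    # Double any single quotes so the values are valid SQL string literals.
    return " OR ".join(
        "{}='{}'".format(level, v.replace("'", "''")) for v in matching
    )


# Toy decoration values; the CN population is invented for illustration.
populations = {"DK": 5.3, "CN": 1264.5, "UK": 19.1}
print(inline_predicate("Nation", populations, "<", 30))
```

The trade-off is visible in the code: the XML condition is evaluated once over the decoration values, and the OLAP component then only has to match plain dimension values.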

# 14 From Logical to Physical Plans

# 15 Plan Execution
[Figure: step-by-step execution of a physical plan on example data: a fact table (Quantity, ExtPrice, Supplier, Part, Order, Day), dimension data Supplier → Nation (S1 DK, S2 DK, S3 CN, S4 UK) and Part → Brand (P2 B2, P3 B3, P4 B4), XML data Nation → Population (e.g., DK 5.3, UK 19.1), the intermediate decorated tables, and the final result (Quantity, Population, Brand).]

# 16 The Query Optimizer
- Based on the Volcano optimizer
- Four optimization phases, performed in a single stage:
  - Enumeration of logically equivalent plans
  - One-to-one conversion of logical plans to physical plans
  - Cost estimation of physical plans
  - Cost-based pruning of the plan space
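
The last three phases can be illustrated with a much-simplified cost-based branch-and-bound search: each logical step has physical alternatives with estimated costs, and any partial plan whose cost already reaches the best complete plan is pruned. This is not the Volcano optimizer itself (it omits memoization, transformation rules, and enumeration of equivalent logical plans), costs are assumed to be additive, and the step names, alternatives, and cost figures are all made up.

```python
import math

# Hypothetical physical alternatives (with made-up cost estimates) for each
# step of one logical plan. A real optimizer derives costs from statistics.
ALTERNATIVES = {
    "decorate with Nation/Nlink/Population": [
        ("XML-transfer + decoration", 5.0),
    ],
    "select Nation/Nlink/Population < 30": [
        ("federation selection", 40.0),
        ("inlining + cube selection", 12.0),
    ],
    "aggregate SUM(Quantity)": [
        ("fact-transfer + federation generalized projection", 30.0),
        ("cube generalized projection + fact-transfer", 8.0),
    ],
}


def optimize(steps, alternatives):
    """Pick one physical implementation per logical step, minimizing total
    cost and pruning partial plans that cannot beat the best found."""
    best = {"plan": None, "cost": math.inf}

    def search(i, plan, cost):
        if cost >= best["cost"]:
            return  # prune: this branch cannot beat the best complete plan
        if i == len(steps):
            best["plan"], best["cost"] = list(plan), cost
            return
        # Try cheaper alternatives first so a tight bound is found early.
        for impl, step_cost in sorted(alternatives[steps[i]], key=lambda a: a[1]):
            plan.append(impl)
            search(i + 1, plan, cost + step_cost)
            plan.pop()

    search(0, [], 0.0)
    return best["plan"], best["cost"]


plan, cost = optimize(list(ALTERNATIVES), ALTERNATIVES)
print(cost, plan)
```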

# 17 An Optimized Query Plan

# 18 Execution of the Optimized Plan
[Figure: execution of the optimized plan on the same example data: intermediate tables over (Quantity, Nation, Brand), e.g., 17, DK, B3; the XML data Nation → Population (e.g., DK 5.3, UK 19.1); and the final result (Quantity, Population, Brand).]

# 19 Performance
- One experiment compared:
  - (a) Our federated solution
  - (b) Physical integration
  - (c) Federating cached XML data
- Data: 100 MB of fact data based on the TPC-H benchmark; 11 MB and 2 KB of XML data
- Result: (a) performs comparably to (b) for small amounts of data; (c) is preferable for large amounts of data

# 20 Related Work
- Generic data integration:
  - Relational, XML, semi-structured, object-oriented, and combinations thereof
  - Does not consider OLAP database properties such as automatic aggregation, dimension hierarchies, and correct aggregation
- OLAP-object federations; compared with these, our solution:
  - Offers much more general use of external data
  - Is not restricted to rigid object schemas
  - Allows irregular data
- Previous OLAP-XML federation efforts:
  - A logical algebra
  - A partial, straightforward implementation

# 21 Conclusion
- OLAP handles schema changes and dynamic data poorly
- Solutions:
  - Logical federation of OLAP and XML
  - A physical algebra that models actual execution tasks
  - Optimized query evaluation
- Experiments suggest feasibility
- Future work:
  - More optimization techniques
  - Advanced evaluation techniques
  - Cooperative development with an OLAP query tool vendor