Achieving Adaptivity for OLAP-XML Federations
Torben Bach Pedersen, Aalborg University
Joint work with Dennis Pedersen, TARGIT

Presentation transcript:


Torben Bach Pedersen · DOLAP 2003

Overview
Background: OLAP-XML federations
New challenges
  – XML data changes
  – Slow or unreliable XML sources
  – Schema changes in data sources
  – Other challenges
Integration in TARGIT architecture
Other applications of the techniques
Conclusion and future work
Related work

Data Warehousing & OLAP
Multidimensional analysis: TARGIT Analysis

OLAP
Good for complex ad hoc queries
  – Simple: natural, graphical queries
  – Fast: pre-aggregation
A number of problems with physical integration
  – Short-term and varying data needs: population, product info, ...
  – Dynamic data: stock quotes, competitor pricing, ...
  – Data with limited access: competitor product info, public databases, ...

OLAP-XML Federations
Traditional OLAP architecture:
[Diagram: Client, OLAP server, Cube]

OLAP-XML Federations
Logical integration of XML data
  – External dimensions
  – External measures
Data combined at query time
[Diagram: Client, Federation, XML, Cube]

OLAP-XML Federations
Logical integration of XML data
  – External dimensions
  – External measures
Data combined at query time
Transparent for users
Flexible: many XML sources
Quick: up and running in a few minutes
Data is always fresh
Performance often comparable to physical integration
[Diagram: Client, Federation, XML, Cube]

XPath Queries for Fetching XML
Example XML data (markup lost in the transcript): books "1984" by Orwell and "Of Mice and Men" by Steinbeck
XPath query: /Books/Book[Author="Steinbeck"]/Title
[Diagram: Client, Federation, XML, Cube; arrows labelled "XPath" and "Dimension value"]
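
The XPath lookup on this slide can be tried out with a minimal sketch. The XML layout below (a Books root with Book elements holding Title and Author children) is an assumption reconstructed from the query, since the slide's markup did not survive; the function name titles_by_author is hypothetical. Python's ElementTree evaluates paths relative to the root element, so the slide's /Books/Book[...]/Title becomes ./Book[...]/Title here.

```python
import xml.etree.ElementTree as ET

# Assumed document layout, reconstructed from the XPath query on the slide.
BOOKS_XML = """
<Books>
  <Book><Title>1984</Title><Author>Orwell</Author></Book>
  <Book><Title>Of Mice and Men</Title><Author>Steinbeck</Author></Book>
</Books>
"""


def titles_by_author(xml_text, author):
    """Return the text of every Title whose Book has the given Author child."""
    root = ET.fromstring(xml_text.strip())
    # Equivalent to /Books/Book[Author="<author>"]/Title, relative to <Books>.
    path = f"./Book[Author='{author}']/Title"
    return [title.text for title in root.findall(path)]


if __name__ == "__main__":
    # The returned titles would serve as values of an external dimension.
    print(titles_by_author(BOOKS_XML, "Steinbeck"))
```

In the federation, the values returned by such a query are what gets attached to the cube as external dimension values at query time.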

Old And New TARGIT Architecture

New Challenges
Our previous work focused on basic aspects
  – Flexibility
  – General performance
  – Implementation
New: what can go wrong? Need for adaptivity
  – XML data changes
  – XML sources slow or unreliable
  – Schema changes (XML, OLAP, federation)
We often have no control over the XML sources
A solution has broad interest: views over XML sources

XML Data Changes
Basic federation
  – XML data is integrated at query time => XML data changes are handled automatically
However, XML data is cached for performance
  – Cache timeout value ensures fresh data (set manually or automatically)
  – Cache timeout of 0 => always fetch from source
Only a few current XML databases inform about changes
  – Xyleme allows users to subscribe to changes
  – Only the delta should be transferred
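
The cache timeout rule can be sketched as follows. This is only an illustration under stated assumptions, not the federation's cache manager: the class name XmlCache, the fetch callback and the injectable clock are all hypothetical. A timeout of 0 disables reuse, so every lookup goes to the source.

```python
import time


class XmlCache:
    """Hypothetical timeout-based cache for fetched XML query results."""

    def __init__(self, timeout_seconds, clock=time.monotonic):
        self.timeout = timeout_seconds
        self.clock = clock            # injectable so the behaviour can be tested
        self._entries = {}            # query -> (time fetched, data)

    def get(self, query, fetch):
        """Return cached data if still fresh, otherwise call fetch(query)."""
        now = self.clock()
        entry = self._entries.get(query)
        if entry is not None and self.timeout > 0:
            fetched_at, data = entry
            if now - fetched_at < self.timeout:
                return data           # entry is younger than the timeout
        # Expired, missing, or timeout == 0: always go to the source.
        data = fetch(query)
        self._entries[query] = (now, data)
        return data
```

Expired entries are overwritten rather than deleted first, so an implementation along these lines could still hand out the old copy as a lower-quality fallback when the source does not answer.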

ICE: Information and Content Exchange
Protocol proposed by W3C for automatically informing about and requesting changes
  – Supported by major vendors
  – Push: subscribe to changes and keep cache up-to-date
  – Pull: request changes from source at query time

Slow and Unreliable XML Sources
Overload, maintenance, HW breakdown, attacks
  – Often we have no influence on this
Incremental presentation for user
  – What if the source is too slow or does not reply at all?
  – Inform user that the system is not working…?
Specification of alternative sources
  – Several queries per external dimension/measure
  – Increased fault tolerance, also better performance
[Diagram: Source, Server, Client]

Slow and Unreliable XML Sources
Start several queries and use the fastest
  – Always uses the fastest, but heavy load on sources
  – Use first response time as indicator for total time
Start one query at a time
  – Minimal load on sources, but slower
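
The two strategies on this slide can be contrasted in a small sketch. It is an illustration only, not the algorithm from the talk: sources are modelled as plain Python callables in a non-empty dict (an assumption), and the function names fetch_fastest and fetch_one_at_a_time are hypothetical. Note that Python threads cannot be interrupted, so "cancelling" here only drops queries that have not started yet (requires Python 3.9 or later for cancel_futures).

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def fetch_fastest(sources):
    """Start all sources at once and return (name, data) of the first success.

    sources: non-empty dict mapping a source name to a zero-argument callable.
    Fastest answer wins, at the price of loading every source.
    """
    executor = ThreadPoolExecutor(max_workers=len(sources))
    futures = {executor.submit(fn): name for name, fn in sources.items()}
    try:
        for future in as_completed(futures):
            try:
                return futures[future], future.result()
            except Exception:
                continue              # this source failed; wait for the next
        raise RuntimeError("all sources failed")
    finally:
        # Do not wait for the losers; drop any query that has not started.
        executor.shutdown(wait=False, cancel_futures=True)


def fetch_one_at_a_time(sources):
    """Try sources in the given order; minimal load on sources, but slower."""
    for name, fn in sources.items():
        try:
            return name, fn()
        except Exception:
            continue                  # fall through to the next source
    raise RuntimeError("all sources failed")
```

The parallel variant trades extra load on every source for latency, which is the trade-off the slide names; the sequential variant only contacts a second source once the first has failed.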

Slow and Unreliable XML Sources
Alternative sources of lower quality: better than no data?
Alternatives
  – Expired cache data
  – Google, Xyleme, The WayBack Machine
  – Backup-disk, tape
  – Etc.

Source            Speed       Quality
Local cache       Fastest     Fresh
Original source   Fast?       Freshest
Expired cache     Fastest     Old
Backup source     Fast/slow   Very old

Slow and Unreliable XML Sources
In practice?
Sources with equal priority chosen at random
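
Choosing at random among sources of equal priority can be sketched like this. The numeric priorities are an assumption made for illustration (the slides rank sources by speed and quality, not by number), and order_sources is a hypothetical helper, not part of the algorithm presented in the talk.

```python
import random
from itertools import groupby


def order_sources(sources, rng=random):
    """Return source names in the order they should be tried.

    sources: list of (name, priority) pairs; a lower number is tried earlier.
    Sources sharing a priority are shuffled so that load is spread over them.
    """
    by_priority = sorted(sources, key=lambda s: s[1])
    ordered = []
    for _, group in groupby(by_priority, key=lambda s: s[1]):
        tier = list(group)
        rng.shuffle(tier)             # random choice among equal priorities
        ordered.extend(tier)
    return [name for name, _ in ordered]
```

Passing a seeded random.Random instance as rng makes the order reproducible, which is convenient when testing.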

Result: Algorithm for Fetching XML Data

Experiments
1st experiment: fetching a 137 KB dimension
  – Start 8 queries; when the first 3 respond, (cancel) the last 5; when the fastest query finishes, (cancel) the remaining 2
  – Fast reply = good indication of overall speed
2nd experiment: search local cache, then Google cache
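
The staged procedure of the first experiment can be written down as a small offline simulation. This is a sketch under assumptions, not the experimental code: it takes already-known timings per source (first-response time and finish time, both hypothetical inputs) and reports which query would win and which would be cancelled at each stage. The name staged_race is made up for this example.

```python
def staged_race(timings, keep=3):
    """Simulate: keep the `keep` fastest responders, then the first to finish.

    timings: dict mapping source name -> (first_response_s, finish_s).
    Returns (winner, cancelled_after_first_response, cancelled_after_finish).
    """
    # Stage 1: rank by time to first response and keep only the quickest.
    by_first_response = sorted(timings, key=lambda name: timings[name][0])
    kept = by_first_response[:keep]
    cancelled_early = by_first_response[keep:]
    # Stage 2: among the kept queries, the first one to finish wins.
    winner = min(kept, key=lambda name: timings[name][1])
    cancelled_late = [name for name in kept if name != winner]
    return winner, cancelled_early, cancelled_late
```

With 8 sources and keep=3 this cancels 5 queries after the first responses and 2 more once the winner finishes, matching the counts on the slide. The first stage is a heuristic: a source that responds late but transfers quickly would be cancelled early.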

Schema Changes In XML Sources
How to synchronize XML views after schema change? (solution described in separate paper)
XPath view: /Bibliography/Author[AName="Orwell"]/Book/Title
[Diagram: two versions of the Bibliography schema tree, each with the elements Bibliography, Publisher, PName, Book, Author, AName, Title, Price; the nesting is not recoverable from the transcript]

Additional Challenges
Changes to federation schema
  – Cache may be invalidated
  – Discard affected cache results (unproblematic)
OLAP data changes
  – Cache may be invalidated
  – Less frequent than XML data changes => cache will often have expired anyway
OLAP schema changes
  – Federated schema may be invalidated
  – Rare and easy to detect (and correct)

Integrating Techniques – Architecture

Integrating Techniques – Query Processing
Query Evaluator splits query into XML + OLAP parts and determines query plan based on cost
Execution Engine coordinates and executes plan
Cache Manager maintains cache, e.g., through ICE
XML Component interface fetches XML data, chooses between available XML sources (Algorithm 1)
View Synchronizer handles schema changes
Metadata Manager manages info about external dimensions and measures + XML component characteristics

Other Applications
All XPath-based views on XML data
Links to parts of XML documents
  – Web pages
  – Documents (DocBook)
  – Software applications
  – and many more…
Automatic recreation of broken links
Increased fault tolerance and performance using alternative sources

Conclusion and Future Work
Operational problems in OLAP-XML federations
XML data changes
Slow and unreliable XML sources
  – Using several sources (Algorithm 1)
  – Experiment with Algorithm 1
Techniques integrated into federation architecture
Schema evolution and other challenges
Future work
  – TARGIT implementation and testing
  – Using techniques in other applications

Related Work
Data changes in XML/semistructured documents
  – Xyleme + Zhuge
Schema changes in scientific documents
  – Not XML
Adaptive/dynamic query optimization
  – Telegraph project
  – We adapt once per source, rather than per tuple
Existing work does not consider one or more of: OLAP+XML concepts, schema changes, slow and unreliable sources
Own previous OLAP-XML work is not adaptive