Johannes Kepler University Linz Department of Business Informatics Data & Knowledge Engineering Altenberger Str. 69, 4040 Linz Austria/Europe www.dke.jku.at.

Presentation transcript:

A Generic Framework for Querying and Updating Secondary XML Index Structures
Katharina Grün
IDAR 2007
Johannes Kepler University Linz, Department of Business Informatics, Data & Knowledge Engineering, Altenberger Str. 69, 4040 Linz, Austria/Europe

2 Research Methodology

3 Become aware of problem: Motivation
- Widespread use of XML
- XML databases for efficient query and update processing
- Require index structures on content and structure of documents
  - primary index structure: default index on whole document, not optimized for specific queries
  - secondary index structures: created on demand on specific document fragments, adapted to query workload
-> Framework for querying and updating secondary XML index structures (SCIENS)

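4 Become aware of problem: Running example
[Figure: example XML document, not preserved in the transcript. Surviving fragments: a truncated query ending in ' '], the query //element(resource, Report)[author='Smith'], and the labels "path:" and "labelpath:".]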

5 Become aware of problem: Challenges
- Which secondary index structures are necessary?
  - each kind of query is best supported by a different index structure
  - not possible to provide one index structure for each possible query
- How to integrate them into a common framework?
  - each secondary index can index arbitrary properties of arbitrary fragments
  - query and update processing must not depend on the specific indices defined
- How to update them when documents change?
  - document updates must be propagated to affected index structures
  - incremental index maintenance algorithm

6 Become aware of problem: Related work (1)
- XML databases
  - limited support for secondary index structures
- XML index structures
  - structure and/or content
  - mostly primary index structure
  - based on different models, proprietary structures
- Object-oriented index structures
  - proprietary structures to support queries on path navigation and/or inheritance hierarchies
- Multidimensional index structures
  - support several value dimensions
  - do not consider structure

7 Become aware of problem: Related work (2)
- Extensible indexing
  - object-relational databases
  - adapt index structures to different data types
- Indexing tasks
  - Maintain secondary indices when documents are updated (KeyX 1)
  - Select optimal index for specific query (XML Access Modules 2)
  - Suggest set of indices for query workload (KeyX 1)
-> currently no integrated approach for processing secondary index structures in an XML database
1) Hammerschmidt, B. C.: KeyX: Selective Key-Oriented Indexing in Native XML Databases. PhD Thesis, University of Lübeck.
2) Arion, A., Benzaken, V. and Manolescu, I.: XML Access Modules: Towards Physical Data Independence in XML Databases. XIME-P workshop, 2005.

8 Suggest solution: SCIENS - Ideas
-> Structure and Content Indexing with Extensible, Nestable Structures
- Which secondary index structures are necessary?
  - select a small set of index structures and adapt them to various properties
  - nest index structures to reflect hierarchical queries
- How to integrate them into a common framework?
  - provide an index model
  - common index interface to query and update indices
- How to update them when documents change?
  - index maintenance algorithm that determines updates for arbitrary indices based on update fragments and index definitions

9 Construct solution: Index structures – one dimension (1)
- Value indexing
  - hashtable or B+-tree on ' '
- Structure indexing
  - hashtable or B+-tree on path/labelpath/type
  - //resource
  - /project[1]//resource
  - /project[2]/milestone[2]/resource
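
The one-dimensional case can be sketched with plain hash tables. This is only an illustration, not the SCIENS implementation: the node ids, paths and author values are made up, and a Python dict stands in for the "hashtable" option (a B+-tree would additionally support range queries). The helper names `label_path` and `build_index` are hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical nodes (id, path, author value); ids and values are
# illustrative and do not come from the slides.
NODES = [
    ("1.1.1", "/project[1]/milestone[1]/resource[1]", "Smith"),
    ("1.2.1", "/project[1]/milestone[2]/resource[1]", "Tim"),
    ("2.2.1", "/project[2]/milestone[2]/resource[1]", "Smith"),
]


def label_path(path):
    """Drop positional predicates, e.g. /project[1]/x[2] -> /project/x."""
    return re.sub(r"\[\d+\]", "", path)


def build_index(nodes, key_of):
    """Hash index: key -> list of node ids, in input order."""
    index = defaultdict(list)
    for node in nodes:
        index[key_of(node)].append(node[0])
    return dict(index)


# Value index on the author value, structure indices on path and label path.
value_index = build_index(NODES, lambda n: n[2])
path_index = build_index(NODES, lambda n: n[1])
labelpath_index = build_index(NODES, lambda n: label_path(n[1]))

print(value_index["Smith"])
print(labelpath_index["/project/milestone/resource"])
```

The same builder serves all three properties; only the key function changes, which is the sense in which one structure is adapted to various properties.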

10 Construct solution Index structures – one dimension (2)

11 Construct solution: Index structures – multiple dimensions (1)
Table columns: property | example | index structure
- property: (value |
- example: ' ' and author='Smith'
- example: //project[]/milestone[]/resource ' '
- index structure: kdb-tree 1
1) Robinson, J.: The KDB-tree: A Search Structure for Large Multidimensional Dynamic Indexes. SIGMOD, ACM Press, 1981.

12 Construct solution: Index structures – multiple dimensions (2)
Table columns: property | example | index structure
- property: ((value | structure) ∆ (value | structure))+
- example: //project[]/milestone[]/resource ' '
- example: //resource ' '
- index structure: index nesting
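
Index nesting can be pictured as an outer index whose entries each hold an inner index on a second property. The sketch below is an assumption-laden illustration: the resources, project numbers and dates are invented, nested dicts stand in for the real structures, and `build_nested` and `search` are hypothetical names.

```python
from collections import defaultdict

# Hypothetical resources; project numbers and dates are illustrative only.
RESOURCES = [
    {"id": "1.1.1", "project": 1, "date": "2007-01-10"},
    {"id": "1.2.1", "project": 1, "date": "2007-02-01"},
    {"id": "2.1.1", "project": 2, "date": "2007-01-10"},
]


def build_nested(rows, outer_key, inner_key):
    """Outer index on one property; each entry holds an inner index."""
    outer = defaultdict(lambda: defaultdict(list))
    for row in rows:
        outer[row[outer_key]][row[inner_key]].append(row["id"])
    return {k: dict(v) for k, v in outer.items()}


def search(nested, outer_value, inner_value=None):
    """Fix the outer key; optionally narrow the result by the inner key."""
    inner = nested.get(outer_value, {})
    if inner_value is None:
        return sorted(i for ids in inner.values() for i in ids)
    return list(inner.get(inner_value, []))


# The nesting order decides which query shape is answered most directly.
hierarchy_then_date = build_nested(RESOURCES, "project", "date")
date_then_hierarchy = build_nested(RESOURCES, "date", "project")

print(search(hierarchy_then_date, 1))
print(search(date_then_hierarchy, "2007-01-10"))
```

A query that fixes only the outer property reads one outer entry, while a query that fixes only the inner property has to visit every outer entry, so the nesting order should follow the query workload.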

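13 Evaluate solution: Comparison
[Table: query time (ms) per index and query; the measured values are not preserved in the transcript.]
- Indices: I1 (date), I2 (date, hierarchy), I3 (date > hierarchy), I4 (hierarchy > date)
- Queries: Q1 (specific milestone), Q2 (specific project), Q3 (all), average
- queries and indices on milestone hierarchy and date (the example query is not preserved)
-> define index that best matches query workload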

14 Construct solution: Index framework (1)
- index
  - search function consisting of a set of index entries
  - provides interface to update and retrieve index entries
- index entry
  - maps index keys (value, type, path, ...) -> returned nodes
  - TechnicalReport, Smith -> 3.2.1, 4.3.1, ...
- index definition
  - selects nodes to be indexed
  - //element(resource, $V1)[author=$V2]
  - represented as unordered tree pattern with index variables
- index structure
  - specific data structure (hash table, prefix B+-tree, kdb-tree)
  - one index can use several index structures (index nesting)
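
A common index interface of this kind might look as follows. This is a minimal sketch under assumptions: the class and method names (`Index`, `HashIndex`, `insert`, `delete`, `search`) are hypothetical and not taken from SCIENS, and only a hash-table structure is shown.

```python
from abc import ABC, abstractmethod


class Index(ABC):
    """Hypothetical common interface: update and retrieve index entries."""

    @abstractmethod
    def insert(self, key, node):
        """Add node to the entry for key."""

    @abstractmethod
    def delete(self, key, node):
        """Remove node from the entry for key."""

    @abstractmethod
    def search(self, key):
        """Return the nodes stored under key."""


class HashIndex(Index):
    """One concrete index structure: a hash table of key -> set of nodes."""

    def __init__(self):
        self._entries = {}

    def insert(self, key, node):
        self._entries.setdefault(key, set()).add(node)

    def delete(self, key, node):
        nodes = self._entries.get(key)
        if nodes is None:
            return
        nodes.discard(node)
        if not nodes:
            del self._entries[key]  # drop entries that became empty

    def search(self, key):
        return sorted(self._entries.get(key, ()))


# Entry from the slide: (TechnicalReport, Smith) -> 3.2.1, 4.3.1
index = HashIndex()
index.insert(("TechnicalReport", "Smith"), "3.2.1")
index.insert(("TechnicalReport", "Smith"), "4.3.1")
print(index.search(("TechnicalReport", "Smith")))
```

Because callers only see `Index`, query and update processing stay independent of which structure backs a given index.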

15 Construct solution: Index framework (2)
- index configuration
  - provides mapping from index to specific index structure
  - associates with each index variable the index structure to be used
  - $T1, $E2: kdb-tree
  - $E2: hash table, $T1: B+-tree
- search configuration
  - used to access index
  - associates index key to be searched with each index variable
  - generated by index selection tool
  - $T1= Report, $E2= 'Smith'
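
The interplay of the two configurations can be sketched as data plus one small function. The representation is an assumption made for illustration: an index configuration is modelled as a list of (variables, structure) pairs, read here as outermost structure first, and `plan_lookup` is a hypothetical helper, not part of the framework described in the talk.

```python
def plan_lookup(index_config, search_config):
    """Pair each index structure with the key it is searched with.

    index_config: list of (variables, structure name), outermost first
                  (the ordering convention is an assumption of this sketch).
    search_config: dict mapping each index variable to its search key.
    """
    plan = []
    for variables, structure in index_config:
        missing = [v for v in variables if v not in search_config]
        if missing:
            raise KeyError(f"no search key for {missing}")
        plan.append((structure, tuple(search_config[v] for v in variables)))
    return plan


# The two index configurations shown on the slide.
one_kdb_tree = [(("$T1", "$E2"), "kdb-tree")]
nested = [(("$E2",), "hash table"), (("$T1",), "B+-tree")]

# The search configuration shown on the slide.
search_config = {"$T1": "Report", "$E2": "Smith"}

print(plan_lookup(one_kdb_tree, search_config))
print(plan_lookup(nested, search_config))
```

The same search configuration works against either index configuration, which is what lets an index selection tool generate keys without knowing the physical structure.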

16 Construct / evaluate solution: Index maintenance
- propagate document updates to affected indices
- steps
  1. find embeddings of index patterns in update fragments
  2. execute queries
  3. generate index entries
- [(TechnicalReport, 'Smith') -> resource] [(TechnicalReport, 'Tim') -> resource]
- up to 9 times faster than existing approach (KeyX)
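
The three steps can be illustrated on a toy update fragment for the index definition //element(resource, $V1)[author=$V2]. This is a heavily simplified sketch, not the talk's algorithm: the pattern is hard-coded rather than derived from an index definition, the schema type is approximated by a hypothetical `type` attribute, node identity by a hypothetical `id` attribute, and the fragment content is invented.

```python
import xml.etree.ElementTree as ET

# Hypothetical update fragment; the "id" and "type" attributes are
# stand-ins for node labels and schema types.
FRAGMENT = (
    "<milestone>"
    '<resource id="r1" type="TechnicalReport"><author>Smith</author></resource>'
    '<resource id="r2" type="TechnicalReport"><author>Tim</author></resource>'
    '<resource id="r3" type="Report"><title>Untitled</title></resource>'
    "</milestone>"
)


def index_entries(fragment_xml):
    """Return ((type, author), node id) pairs for one update fragment."""
    root = ET.fromstring(fragment_xml)
    entries = []
    # Step 1: find embeddings of the pattern (resource with an author child).
    for resource in root.iter("resource"):
        for author in resource.findall("author"):
            # Step 2: evaluate the index variables for this embedding.
            key = (resource.get("type"), (author.text or "").strip())
            # Step 3: generate one index entry per embedding.
            entries.append((key, resource.get("id")))
    return entries


def propagate(index, fragment_xml, inserted=True):
    """Apply the entries of an inserted or deleted fragment to a dict index."""
    for key, node in index_entries(fragment_xml):
        nodes = index.setdefault(key, set())
        if inserted:
            nodes.add(node)
        else:
            nodes.discard(node)
        if not nodes:
            del index[key]  # keep the index free of empty entries
    return index


index = propagate({}, FRAGMENT)
print(sorted(index))
```

Only the fragment is inspected, never the whole document, which is what makes the maintenance incremental; resource r3 has no author child and therefore produces no entry.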

17 Conclusion
- select secondary index structures for XML
  - extensible: various properties and operations on these properties
  - nestable: adapt indices to hierarchical queries
- integrate index structures into framework
  - hides indexing tasks from query and update processing tasks
  - provides index model (common index interface)
- index maintenance algorithm
  - propagate updates to index structures
-> flexibility to define indices that match the query workload