1 Efficient Processing of Partially Specified Twig Queries Junfeng Zhou Renmin University of China.

Presentation transcript:

1 Efficient Processing of Partially Specified Twig Queries Junfeng Zhou Renmin University of China

2 Outline Introduction Preliminary PTwigStack Conclusion

3 Outline Introduction Preliminary PTwigStack Conclusion

4 Introduction (1)
XML is used extensively as a standard for information representation and exchange.
More and more data is stored and exchanged in XML format.
Effective and efficient querying of XML data is therefore indispensable.

5 Introduction (2)
Using a standard query language (XPath or XQuery): how can we write a proper query when
–the structure or schema is not fully available, or
–information must be extracted from different data sources with different structures?
[Figure: example XML document tree with nodes bibliography(1), bib(2), bib(…), year(3), book(4), title(5), author(6), article(7), title(8), author(9), author(10) and text values 1999, XML, Joe, Mary, XML, Bob; twig query Q over book, title and author.]

6 Introduction (4)
Using a keyword-based query. For example [1]:
–Find the title and author of the publications.
[Figure: the example document tree from slide 5.]
The answer is: (5,6), (8,9,10)
[1] Y. Li, C. Yu, and H. V. Jagadish. Schema-Free XQuery. In Proceedings of VLDB 2004, pages 72-83.

7 Introduction (5)
Using a keyword-based query: what if nodes 6 and 8 are removed from the document?
–Find the title and author of the publications.
[Figure: the example document tree from slide 5 without author(6) and title(8).]
The answer is (5,9,10), which is a meaningless result.
The correct answer is (5,NULL), (NULL,9,10).

8 Introduction (6)
Using a Partially Specified Twig Query (PSTQ) [2]
–It gives users the most flexibility.
But
–no existing method can process a PSTQ efficiently.
[2] Heuristic Containment Check of Partial Tree-Pattern Queries in the Presence of Index Graphs, CIKM, 2006

9 Introduction (7)
Objective
–A concise but effective way to specify more flexible semantic constraints in a twig query
–An efficient approach that processes a PSTQ holistically, without deriving twig queries and processing them one by one
Scan once: each stream whose elements' tag appears in the twig pattern is scanned only once.
No redundant output: none of the intermediate path solutions is useless.
Bounded space complexity: the space required by the algorithm is bounded by a factor that is independent of the source document size.

10 Outline Introduction Preliminary –Holistic Twig Join –Partially Specified Twig Query PTwigStack Conclusion

11 Preliminary - Holistic Twig Join [3]
Query processing
–Output useful path solutions
–Merge all path solutions to get the final results
Data structure
–Each query node is associated with a stack and an element stream
Benefits
–No useless path solutions
[Figure: XML document with root R and elements a1, b1, a2, b2, c1; twig query Q over A, B and C.]
[3] N. Bruno, N. Koudas, and D. Srivastava: Holistic twig joins: Optimal XML pattern matching. Technical Report, Columbia University, March 2002
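The two phases listed on this slide (emit path solutions, then merge them) can be illustrated with a minimal Python sketch. It assumes a twig query whose root A has two branches, B and C, and that path solutions are plain tuples of element names. The function name `merge_path_solutions` and the sample data are hypothetical and are not taken from the slides or from the TwigStack report.

```python
from collections import defaultdict

def merge_path_solutions(paths_ab, paths_ac):
    """Join root-to-leaf path solutions that share the same root element.

    paths_ab holds (a, b) pairs for the A//B branch and paths_ac holds
    (a, c) pairs for the A//C branch; both layouts are assumptions.
    """
    # Group the A//C solutions by their root element (hash join).
    by_root = defaultdict(list)
    for a, c in paths_ac:
        by_root[a].append(c)

    # Combine each A//B solution with the A//C solutions on the same root.
    return [(a, b, c) for a, b in paths_ab for c in by_root.get(a, [])]

# Hypothetical path solutions for a small document.
paths_ab = [("a1", "b1"), ("a1", "b2"), ("a2", "b2")]
paths_ac = [("a1", "c1")]
matches = merge_path_solutions(paths_ab, paths_ac)
print(matches)
```

In this sample, the path solution (a2, b2) has no partner on the C branch, so it contributes nothing to the merged result. Avoiding such useless path solutions in the first phase is the benefit the slide attributes to the holistic approach.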

12 Preliminary - Partially Specified Twig Query [2]
[Figure: example PSTQ Q1.]
Q1 consists of two partial paths (PP), p1 and p2.
–In p1, Y is a descendant of W.
–In p2, W and A are on the same path.
–p1 shares W with p2.
–"*" means that p2 is the output path.
Compared with a twig query:
–Some nodes are specified as being on the same path as other nodes, without a precedence relationship.
Compared with a keyword-based query:
–Each part of the query can be a path expression, not just a keyword.
Benefits of using a PSTQ:
–Users can specify a query with whatever partial knowledge they have.
[2] Heuristic Containment Check of Partial Tree-Pattern Queries in the Presence of Index Graphs, CIKM, 2006

13 Preliminary - Partially Specified Twig Query
Query processing of a PSTQ: a naïve method
–Derive the twig queries
–Process each twig query
Problems of the naïve method
–The processing cost is too high
–Redundant results must be eliminated
[Figure: PSTQ Q over A, B and C, its derived twig queries Q1–Q4, and an XML document with elements a1, b1, c1.]

14 Outline Introduction Preliminary PTwigStack Conclusion

15 PTwigStack: PSTQ Expression
Extending XPath by adding an operator
–The operator (its symbol is missing from this transcript) denotes the same-path relationship
–Applied to A and B, it is equivalent to A//B or B//A
[Figure: PSTQ Q over A, B and C and its derived twig queries Q1–Q7.]
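As a rough illustration of the same-path relationship, the Python sketch below checks it for two elements. It assumes a region encoding in which every element carries `start` and `end` positions (the later slides compare `start` and `end` values, but the exact labeling scheme is not shown). The names `Element`, `is_ancestor` and `same_path` are hypothetical.

```python
from typing import NamedTuple

class Element(NamedTuple):
    name: str
    start: int  # position of the opening tag (assumed region encoding)
    end: int    # position of the closing tag

def is_ancestor(a: Element, d: Element) -> bool:
    """True when a's region strictly contains d's region, i.e. a//d holds."""
    return a.start < d.start and d.end < a.end

def same_path(x: Element, y: Element) -> bool:
    """Same-path relationship: x//y or y//x."""
    return is_ancestor(x, y) or is_ancestor(y, x)

# Hypothetical document: a1 contains b1, and c1 follows a1 as a sibling.
a1 = Element("a1", 1, 6)
b1 = Element("b1", 2, 3)
c1 = Element("c1", 7, 8)

print(same_path(a1, b1), same_path(b1, a1), same_path(a1, c1))
```

The check is symmetric by construction, which is what distinguishes the same-path relationship from the directed `//` axis.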

16 PTwigStack
Objective
–Scan once
–No redundant output
–Bounded space complexity
Problems
–Which query node should be processed first? According to a special order in the given query.
–Which element should be processed first? The element with a solution extension.
–How to guarantee that no useless path solutions are produced? An element that cannot participate in answers is not pushed into a stack.
[Figure: XML document with elements a1, a2, b1, b2, b3, c1; PSTQ Q over A, B and C and its derived twig queries Q1–Q4.]

17 PTwigStack
Problems (1)
–Which query node should be processed first?
–Depth-first order: A, B, C
[Figure: XML document with elements a1, a2, b1, b2, b3, c1; PSTQ Q over A, B and C and its derived twig queries Q1–Q4.]

18 PTwigStack
Problems (2)
–Which element should be processed first?
–The element with a Partial Solution Extension (PSE)
Partial Solution Extension
–A query node q has a PSE iff q satisfies one of the following conditions (Cq is the current element of q's stream):
If q is a leaf node, Cq is not NULL.
If q is a non-leaf node, for each q' ∈ children(q):
–If q//q', then Cq is an ancestor of Cq'.
[Figure: XML document with elements a1, a2, b1, b2, b3, c1; PSTQ Q and its derived twig queries Q1–Q4; example elements a1, c1.]

19 PTwigStack
Problems (2)
–Which element should be processed first?
–The element with a Partial Solution Extension (PSE)
Partial Solution Extension
–A query node q has a PSE iff q satisfies one of the following conditions (Cq is the current element of q's stream):
If q is a leaf node, Cq is not NULL.
If q is a non-leaf node, for each q' ∈ children(q):
–If q//q', then Cq is an ancestor of Cq'.
–If q and q' are on the same path and q' has a PSE, then Cq covers Cq', or Cq is covered by Cq', or Cq.end < Cq'.start.
[Figure: XML document with elements a1, a2, b1, b2, b3, c1; PSTQ Q and its derived twig queries Q1–Q4; example configurations of elements a1, b1, c1 and c0.]

20 PTwigStack
Problems (2)
–Which element should be processed first?
–The element with a Partial Solution Extension (PSE)
Partial Solution Extension
–A query node q has a PSE iff q satisfies one of the following conditions (Cq is the current element of q's stream):
If q is a leaf node, Cq is not NULL.
If q is a non-leaf node, for each q' ∈ children(q):
–If q//q', then Cq is an ancestor of Cq'.
–If q and q' are on the same path and q' has a PSE, then Cq covers Cq', or Cq is covered by Cq', or Cq.end < Cq'.start.
–If q and q' are on the same path and q' has no PSE, let p be a descendant of q' that has a PSE; then Cq.start < Cp.start.
[Figure: XML document with elements a1, a2, b1, b2, b3, c1; PSTQ Q and its derived twig queries Q1–Q4.]
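The same-path condition for a child that has a PSE can be written as a small predicate. The Python sketch below covers only that one condition, not the full recursive PSE test, and it assumes that "cover" means strict region containment under a `start`/`end` region encoding. The names `Region`, `covers` and `same_path_condition` are hypothetical.

```python
from typing import NamedTuple

class Region(NamedTuple):
    start: int
    end: int

def covers(x: Region, y: Region) -> bool:
    """True when x's region strictly contains y's region (assumed meaning of 'cover')."""
    return x.start < y.start and y.end < x.end

def same_path_condition(cq: Region, cq_child: Region) -> bool:
    """Cq covers Cq', or Cq is covered by Cq', or Cq ends before Cq' starts."""
    return (
        covers(cq, cq_child)
        or covers(cq_child, cq)
        or cq.end < cq_child.start
    )

print(same_path_condition(Region(1, 10), Region(2, 3)))  # Cq covers Cq'
print(same_path_condition(Region(2, 3), Region(1, 10)))  # Cq is covered by Cq'
print(same_path_condition(Region(1, 2), Region(3, 4)))   # Cq ends before Cq' starts
print(same_path_condition(Region(5, 6), Region(1, 2)))   # Cq lies entirely after Cq'
```

If regions are properly nested, as they are in a well-formed XML document, the predicate rejects only the last configuration, in which Cq lies entirely after Cq'.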

21 PTwigStack
Feature of the Partial Solution Extension
–If E has a PSE, E must have a solution extension in some twig query derived from the given PSTQ, which means that CE may participate in the final results.
Usage of the Partial Solution Extension
–Guiding the execution of PTwigStack

22 PTwigStack
Problems (3)
–How to guarantee that no useless path solutions are produced? Prevent useless elements from being pushed into a stack.
–What is a useless element? One that cannot satisfy the query requirement with the top elements of the correlated stacks or with the head element of each element stream.
[Figure: three example documents over elements a0, a1, b1, c1, with a query over A, B and C.]

23 PTwigStack
Data structure
–Stack: each query node is associated with a stack that compactly represents intermediate results
–Tag index: each query node is associated with an element stream

24 PTwigStack
PTwigStack(root)
  // the first stage
  while not end(root)
    q = getNext(root)
    clean all stacks related to q and output the relevant path solutions
    if Cq can be pushed into stack Sq
      Push(Sq, Cq)
    process other elements Cq' iteratively, where q' is a child of q in the query and Cq'.start < Cq.start
    output all possible path solutions
    Advance(Cq)
  // the second stage
  MergeAllPathSolution()
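The slide does not spell out the "clean all stacks" step. A common way to do it in stack-based structural join algorithms is to pop every entry whose region ends before the current element starts, since such an entry cannot be an ancestor of the current element or of any later one. The Python sketch below shows that idea under an assumed `(start, end)` region encoding. It is an illustration of the general technique, not the PTwigStack routine itself, and the name `clean_stack` is hypothetical.

```python
def clean_stack(stack, current):
    """Pop entries whose region ends before `current` starts.

    `stack` is a list of (start, end) tuples with the top at the end of the
    list; `current` is the (start, end) region of the element being processed.
    Returns the popped entries in the order they were removed.
    """
    popped = []
    while stack and stack[-1][1] < current[0]:
        popped.append(stack.pop())
    return popped

# Hypothetical stack of nested regions, bottom to top.
stack = [(1, 20), (2, 5), (3, 4)]
popped = clean_stack(stack, (6, 9))
print(stack, popped)
```

Only the outermost region (1, 20) still spans the current element, so it is the only entry that stays on the stack.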

25 PTwigStack (running example)
[Figure: XML document with elements a1, a2, a3, b1, b2, c1, c2; PSTQ Q over A, B and C and its derived twig queries Q1–Q4. The PTwigStack(root) pseudocode from slide 24 is repeated on slides 25–31.]
Output: (empty)
Final result: (empty)

26 PTwigStack (running example)
Elements shown: c1
Output: (empty)
Final result: (empty)

27 PTwigStack (running example)
Elements shown: a1, b1
Output: (empty)
Final result: (empty)

28 PTwigStack (running example)
Elements shown: a1, b1
Output: (empty)
Final result: (empty)

29 PTwigStack (running example)
Elements shown: a1, b1, c2
Output: a1c2
Final result: (empty)

30 PTwigStack (running example)
Elements shown: a1, b1, b2
Output: a1c2, a1b2
Final result: (empty)

31 PTwigStack (running example)
Elements shown: a1, b1
Output: a1c2, a1b2, a1b1
Final result: a1b1c2, a1b2c2

32 PTwigStack
Properties:
–Each element is scanned only once
–Each element in a stack must participate in at least one final result
–No "eliminating operation" is needed for redundant results
–Space is bounded by |Q|×L, where L is the length of the longest path in the XML source document and |Q| is the number of nodes in the given query Q
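To make the |Q|×L bound concrete, the Python sketch below computes it for a small tree. It assumes that L is measured as the number of nodes on the longest root-to-leaf path and that the document is given as nested `(tag, children)` tuples. Both the representation and the function names are hypothetical.

```python
def longest_path(tree):
    """Number of nodes on the longest root-to-leaf path of a (tag, children) tree."""
    _tag, children = tree
    return 1 + max((longest_path(child) for child in children), default=0)

def stack_space_bound(query_size, tree):
    """Upper bound |Q| x L on the total number of stack entries."""
    return query_size * longest_path(tree)

# Hypothetical fragment shaped like the bibliography example.
doc = ("bibliography", [
    ("bib", [
        ("year", []),
        ("book", [("title", []), ("author", [])]),
    ]),
])

bound = stack_space_bound(3, doc)  # a query with three nodes
print(longest_path(doc), bound)
```

The bound depends on the depth of the document and not on its total size, which is why the slides describe the space complexity as independent of the source document size.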

33 Outline Introduction Preliminary PTwigStack Conclusion

34 Conclusion
We propose a concise but effective way to express the same-path semantics by extending XPath.
We propose a new concept, Partial Solution Extension, to guide the execution of getNext.
We propose a new holistic join method to process a PSTQ with a root node.

35 Future Work
The above method cannot be applied directly to a query that is not specified with a root node, e.g.
–#[//A]//B
–#[//A//B]//C
–#[//A B]//C
Possible solutions
–Implementing a special algorithm to process a PSTQ without a specified root node (using Dewey code)
–Using ORA-SS [4] to construct a twig query with more semantic constraints (using range code)
[4] Gillian Dobbie, Wu Xiaoying, Tok Wang Ling, Mong Li Lee: ORA-SS: An Object-Relationship-Attribute Model for Semistructured Data. TR21/00, Technical Report, Department of Computer Science, National University of Singapore, December 2000.

36 Thank You ! Q & A