Querying Streaming XML Data

Layout of the presentation
- Introduction
- Common Problems faced
- Solution proposed
- Basic Building blocks of the solution
- How to build up a solution to a given query
- Features of the system

Streaming XML
- XML is the standard for information exchange.
- Some XML documents are available only in streaming format.
- Streaming is like reading data from a tape drive: the data can be read only once, in order.
- Used for stock market data, news, and network statistics.
- Predecessor systems were used to filter documents.

Structure of an XPath Query
- A query consists of a location path and an output expression (name).
- The location path consists of a closure axis (//), a node test (book), and a predicate (year>2000).
- e.g. //book[year>2000]/name
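
The split described on this slide can be sketched in a few lines of Python. This is only an illustration, not the parser used by the system: the function name `split_query` and the regular expression are assumptions, and the sketch handles plain tag names only (no `text()` output and no attributes).

```python
import re

# One location step: an axis ("/" or "//"), a tag name, and an optional [predicate].
# Hypothetical helper pattern; it covers only the simple shapes shown on the slide.
STEP = re.compile(r"(//|/)([A-Za-z_][\w.-]*)(?:\[([^\]]*)\])?")


def split_query(query):
    """Split a simple XPath query into (location steps, output expression)."""
    matches = list(STEP.finditer(query))
    # Refuse anything the simplified pattern did not fully consume.
    if "".join(m.group(0) for m in matches) != query:
        raise ValueError("unsupported query: " + query)
    steps = [m.groups() for m in matches]
    # Assumption: the last step is the output expression and carries no predicate.
    *path, (_axis, output, predicate) = steps
    if predicate is not None:
        raise ValueError("output expression must not have a predicate")
    return path, output


path, output = split_query("//book[year>2000]/name")
print(path)    # location steps as (axis, node test, predicate) tuples
print(output)  # the output expression
```

For the slide's example the single location step is the triple (`//`, `book`, `year>2000`) and the output expression is `name`.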

Features of our Approach
- Efficient.
- Easy-to-understand design.
- Design of the BPDT is tricky.

Common Problems faced
Query: /pub[year=2002]/book[price<11]/author
[Figure: example XML document with numbered lines; only fragments survive: "First", "6. A", "Second", "12. A", "13. B"]
Annotations on the document, in order of appearance:
- Element satisfies the path.
- Failure??
- Test passed. But year=2002?
- Buffer both A & B.
- Failed price<11. Remove.
- Test passed. Output.
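
The buffering problem on this slide can be made concrete with a small hand-written evaluator for this one query. It is a sketch under loud assumptions: the document `DOC` is hypothetical (the slide's listing did not survive), the function `authors` is hard-coded for `/pub[year=2002]/book[price<11]/author` and is not the BPDT construction from the talk, and `iterparse` is used for brevity even though it keeps the parsed tree in memory.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical document: the predicates are decided only AFTER the authors are seen.
DOC = """<pub>
  <book><price>12.00</price><name>First</name><author>A</author><price>10.00</price></book>
  <book><price>14.00</price><name>Second</name><author>A</author><author>B</author><price>12.00</price></book>
  <year>2002</year>
</pub>"""


def authors(xml_text):
    """Evaluate /pub[year=2002]/book[price<11]/author in one pass with two buffers."""
    output = []
    pub_buffer = []    # authors whose book passed price<11, still waiting on year=2002
    book_buffer = []   # authors of the current book, still waiting on price<11
    year_ok = price_ok = False
    path = []
    stream = io.BytesIO(xml_text.encode("utf-8"))
    for event, elem in ET.iterparse(stream, events=("start", "end")):
        if event == "start":
            path.append(elem.tag)
            if path == ["pub"]:
                year_ok, pub_buffer = False, []
            elif path == ["pub", "book"]:
                price_ok, book_buffer = False, []
            continue
        # From here on we are handling an end event for the element at `path`.
        if path == ["pub", "year"] and float(elem.text) == 2002:
            year_ok = True
            output.extend(pub_buffer)      # predicate now true: flush waiting results
            pub_buffer.clear()
        elif path == ["pub", "book", "price"] and float(elem.text) < 11:
            price_ok = True
            (output if year_ok else pub_buffer).extend(book_buffer)
            book_buffer.clear()
        elif path == ["pub", "book", "author"]:
            if price_ok and year_ok:
                output.append(elem.text)
            elif price_ok:
                pub_buffer.append(elem.text)
            else:
                book_buffer.append(elem.text)
        elif path == ["pub", "book"]:
            book_buffer.clear()            # price<11 never held: remove candidates
        elif path == ["pub"]:
            pub_buffer.clear()             # year=2002 never held: remove candidates
        path.pop()
    return output


print(authors(DOC))
```

With the hypothetical document, the author of the first book is buffered until the year is seen and is then output, while both authors of the second book are removed when that book ends without a price below 11.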

Problems caused by closure axis
Query: //pub[year=2002]//book[author]//name
[Figure: example XML document with numbered lines; only fragments survive: "X", "5. A", "Y", "Z", "12. B"]

pub[year=2002]      book[author]
Line 2   True       Line 7    False
Line 2   True       Line 10   True
Line 9   False      Line 10   True

- Fails year=2002.
- Passes year=2002.
- Let's add an author (surviving fragments become "X", "5. A", "Y", "9. B", "Z", "13. B"). Result?

Handling XML Stream
- Input: a well-formed XML stream.
- Use the SAX API to parse the XML.
- Events belong to:
  - Begin = {(a, attrs, d)}
  - End = {(/a, d)}
  - Text = {(a, text(), d)}
- XML stream: $\{e_1, e_2, \ldots, e_i, \ldots\} \mid e_i \in \text{Begin} \cup \text{End} \cup \text{Text}$
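
As a rough illustration of the event model above, the following sketch uses Python's standard `xml.sax` module to turn a document into Begin, End, and Text tuples. The class name `EventCollector`, the helper `sax_events`, the leading kind label in each tuple, and the choice of depth 1 for the root element are all assumptions made for this example.

```python
import xml.sax


class EventCollector(xml.sax.ContentHandler):
    """Record SAX callbacks as Begin/End/Text tuples that carry the element depth d."""

    def __init__(self):
        super().__init__()
        self.open_tags = []   # tags of the currently open elements
        self.events = []

    def startElement(self, name, attrs):
        self.open_tags.append(name)
        # Begin = (a, attrs, d)
        self.events.append(("begin", name, dict(attrs.items()), len(self.open_tags)))

    def endElement(self, name):
        # End = (/a, d)
        self.events.append(("end", "/" + name, len(self.open_tags)))
        self.open_tags.pop()

    def characters(self, content):
        # Text = (a, text(), d); whitespace-only chunks are skipped.
        # Note: a parser may deliver one text node in several chunks.
        if content.strip():
            self.events.append(
                ("text", self.open_tags[-1], content, len(self.open_tags))
            )


def sax_events(xml_text):
    handler = EventCollector()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.events


for event in sax_events("<pub><year>2002</year></pub>"):
    print(event)
```

Each printed tuple corresponds to one element e_i of the stream, so a query processor built on top of this only ever sees the document as a sequence of such events.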

Grammar for XPath Queries
- Q → N+ [/O]
- N → [/ | //] tag [F]
- F → [FO [OP constant]]
- FO → [lost in extraction] | tag | text()
- O → [lost in extraction] | text()
- OP → > | ≥ | = | < | ≤ | ≠ | contains
- An XPath query has the form N_1 N_2 … N_n /O.
- Cannot handle reverse axes or positional functions.

Solution to Query
Query: /pub[year=2002]/book[price<11]/author
[Figure: diagram labelled "PDA" and "PDT"]

Basic PushDown Transducer (BPDT)
- Similar to a pushdown automaton.
- Actions are defined on the transition arcs.
- A finite set of states.
- A start state.
- A set of final states.
- A set of input symbols.
- A set of stack symbols.

Building a BPDT
Query: /pub[year>2000]/book[author]/name/text()
Consider location step: /book[author]
- Book – Author: Buffer for future: begin event of Author.
- Book – Author: Remove from buffer: end event of Book.
- Book – Author: Output result if predicates are true: begin event of Author.

Basic Building Blocks XPath Expression: /tag[child]

Buffer Operations needed
- Enqueue(x): adds x to the end of the queue.
- Clear(): removes all items from the queue.
- Flush(): outputs all items in the queue in FIFO order.
- Upload(): moves all items to the end of the queue of the parent BPDT.
- No Dequeue operation is needed.
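
The four operations listed above map naturally onto a small queue class. The sketch below is one possible reading, not the system's implementation: the class name `BpdtBuffer`, the `parent` link, the `out` list argument to `flush`, and the assumption that `flush` empties the queue after writing are all choices made for this example.

```python
from collections import deque


class BpdtBuffer:
    """FIFO buffer of potential results held by one BPDT (illustrative sketch)."""

    def __init__(self, parent=None):
        self.items = deque()
        self.parent = parent  # buffer of the parent BPDT, if any (assumed link)

    def enqueue(self, x):
        """Add x to the end of the queue."""
        self.items.append(x)

    def clear(self):
        """Remove all items: the buffered candidates turned out not to be results."""
        self.items.clear()

    def flush(self, out):
        """Append all items to `out` in FIFO order, then empty the queue (assumed)."""
        out.extend(self.items)
        self.items.clear()

    def upload(self):
        """Move all items to the end of the parent's queue, keeping their order."""
        self.parent.items.extend(self.items)
        self.items.clear()


pub_buffer = BpdtBuffer()
book_buffer = BpdtBuffer(parent=pub_buffer)
book_buffer.enqueue("A")
book_buffer.upload()        # book predicate held; pub predicate still undecided
results = []
pub_buffer.flush(results)   # pub predicate held
print(results)
```

There is deliberately no dequeue method: items leave a buffer only in bulk, either discarded, output, or handed to the parent.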

Basic Building Blocks XPath Expression:

Basic Building Blocks XPath Expression: /tag[text()=val]

Basic Building Blocks XPath Expression:

Basic Building Blocks XPath Expression: /tag[child=val]

A sample BPDT Query: /pub[year>2000]

Building a solution
HPDT (hierarchical pushdown transducer) for query: //pub[year>2000]//book[author]//name/text()

HPDT Structure
Each BPDT in the HPDT has:
- Position (l, k): l = depth of the BPDT in the HPDT, k = sequence number from right to left.
  - The BPDT at position (i-1, k) has a right child BPDT at position (i, 2k), connected to its NA state.
  - The BPDT at position (i-1, k) has a left child BPDT at position (i, 2k+1), connected to its True state.
  - Position (i, 2^i - 1) means that the predicates in all higher-level BPDTs evaluate to true.
- Buffer: potential results.
- Stack: stack of element (SAX) events.
- Depth vector.
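
The position arithmetic above is easy to check with a few helper functions. This is a sketch only: the function names are hypothetical, and it assumes the root BPDT sits at position (0, 0), which the slide does not state explicitly.

```python
def right_child(position):
    """Child reached from the NA state: (i-1, k) -> (i, 2k)."""
    depth, k = position
    return (depth + 1, 2 * k)


def left_child(position):
    """Child reached from the True state: (i-1, k) -> (i, 2k+1)."""
    depth, k = position
    return (depth + 1, 2 * k + 1)


def all_predicates_true(position):
    """True when k == 2^i - 1, i.e. every step from the root went through True."""
    depth, k = position
    return k == 2 ** depth - 1


root = (0, 0)  # assumed position of the root BPDT
via_true = left_child(left_child(root))
via_na = left_child(right_child(root))
print(via_true, all_predicates_true(via_true))
print(via_na, all_predicates_true(via_na))
```

Read in binary, k records the route from the root: each step through a True state appends a 1 and each step through an NA state appends a 0, so only the all-ones value 2^i - 1 guarantees that every higher-level predicate has already been satisfied.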

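Example Query
Query: //pub[year=2002]//book[author]//name
[Figure: example XML document with numbered lines (surviving fragments: "X", "5. A", "Y", "Z", "12. B") and the transducer for root, pub, book, name, with paths from $1 to $14]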

System Features

Reference
- Feng Peng and Sudarshan S. Chawathe. XPath Queries on Streaming Data. In SIGMOD 2003.

Thank You
Questions?