Syntax-directed Transformations of XML Streams. Stefanie Scherzinger, joint work with Alfons Kemper.

Presentation transcript:

1 Syntax-directed Transformations of XML Streams Stefanie Scherzinger joint work with Alfons Kemper

2 XML Stream Processing
1. Very long XML documents.
2. Applications need to be completely main-memory based.
3. Schema information is available.
Example data: Data on the Web; Serge Abiteboul, Peter Buneman, Dan Suciu ...
Example schema: <!ELEMENT book (year,title,author,author*)>
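A minimal sketch of what single-pass, main-memory processing of a long document can look like, using the standard Java SAX API rather than TransformX. The class name `StreamingCount` and the choice of counting `book` elements are assumptions made for illustration only.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, not part of TransformX.
final class StreamingCount {
    /** Counts book elements in one pass; the handler keeps only a counter, no document tree. */
    static int countBooks(String xml) throws Exception {
        final int[] count = {0};
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                if ("book".equals(qName)) {
                    count[0]++;
                }
            }
        };
        // The input is a string here only to keep the sketch self-contained;
        // a real stream would be passed as an InputStream directly.
        SAXParserFactory.newInstance().newSAXParser()
            .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        return count[0];
    }
}
```

The state held by the handler is independent of the document length, which is the property the slide asks for.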

3 XML Query Languages
XPath: //book[year=2003]/title
XQuery: { for $x in input()//book where $x/year=2003 return {$x/title} {$x/author} }
XSLT
Schema knowledge necessary to specify query!
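For comparison, the XPath query from this slide can be run with the standard `javax.xml.xpath` API, as sketched below. This evaluation is not streaming: the implementation builds the document in memory first. The class name `XPathDemo` is an assumption for illustration.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of TransformX.
final class XPathDemo {
    /** Returns the string value of the first title selected by the slide's XPath query. */
    static String titleOf2003(String xml) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        // The query only works if the author knows that year and title are children of book.
        return xpath.evaluate("//book[year=2003]/title", new InputSource(new StringReader(xml)));
    }
}
```

Writing the path expression already requires knowing how `book`, `year`, and `title` nest, which is the slide's point about schema knowledge.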

4 TransformX Attribute Grammars
1. (Suitable) extended regular tree grammar, e.g. DTD
2. Add attribution functions (Java code)
3. Parser generator produces Java code:
   - validates the input
   - evaluates the attribution functions
4. Compile and execute

5 Extended Regular Tree Grammars
Grammar G = (Nt, T, P, bib)
Nonterminals Nt = { bib, pub, year, title, author }
Terminals T = { bib, book, article, year, title, author, PCDATA }
Productions P:
bib ::= bib( pub* )
pub ::= book( year.title.author.author* )
pub ::= article( year.title.author.author* )
year ::= year( PCDATA )
title ::= title( PCDATA )
author ::= author( PCDATA )
Example document ∈ L(G)
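The productions of this grammar can be read as content models: each terminal label constrains the sequence of child labels below it. The sketch below checks one such sequence against the slide's grammar. It is an illustration under assumptions: the class `ContentModels` is hypothetical, and it uses `java.util.regex` (a backtracking matcher), not the deterministic one-token-lookahead parsing that TransformX generates.

```java
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical helper, not part of TransformX.
final class ContentModels {
    // Content models of the slide's grammar, written over terminal labels.
    // Every child label is followed by one space so that labels cannot run together.
    private static final Map<String, Pattern> MODELS = Map.of(
        "bib", Pattern.compile("((book|article) )*"),
        "book", Pattern.compile("year title author (author )*"),
        "article", Pattern.compile("year title author (author )*"),
        "year", Pattern.compile("PCDATA "),
        "title", Pattern.compile("PCDATA "),
        "author", Pattern.compile("PCDATA "));

    /** True if the sequence of child labels is allowed below the given parent label. */
    static boolean childrenValid(String parent, List<String> children) {
        Pattern model = MODELS.get(parent);
        if (model == null) {
            return false; // label does not occur in the grammar
        }
        StringBuilder sequence = new StringBuilder();
        for (String child : children) {
            sequence.append(child).append(' ');
        }
        return model.matcher(sequence).matches();
    }
}
```

For example, `childrenValid("book", List.of("year", "title", "author"))` holds, while a `book` without any `author` is rejected.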

6 Example: Task
1. Re-label root to “books”
2. Retrieve all books, but not articles
3. For each book, output numerical identifier, title, year, and authors
input: 1999; Data on the Web; Serge Abiteboul, Peter Buneman, Dan Suciu ...
output: 1; Data on the Web; 1999; Serge Abiteboul, Peter Buneman, Dan Suciu ...
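To make the task concrete, here is a hand-written SAX version of the three steps. It is a sketch under assumptions, not TransformX output: the class `BooksTransformer` and the output element name `id` are hypothetical, and the result is collected in a string only so that the example is easy to check.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical hand-written handler; the output element name "id" is an assumption.
final class BooksTransformer extends DefaultHandler {
    private final StringBuilder out = new StringBuilder();
    private final StringBuilder text = new StringBuilder();
    private boolean inBook = false;
    private int nextId = 1;
    private String year = ""; // buffered: year arrives before title but is emitted after it

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        text.setLength(0);
        if (qName.equals("bib")) {
            out.append("<books>");                       // step 1: re-label the root
        } else if (qName.equals("book")) {
            inBook = true;                               // step 2: only books are copied
            out.append("<book><id>").append(nextId++).append("</id>");
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        String value = escape(text.toString().trim());
        text.setLength(0);
        if (qName.equals("bib")) {
            out.append("</books>");
        } else if (qName.equals("book")) {
            inBook = false;
            out.append("</book>");
        } else if (!inBook) {
            return;                                      // children of articles are dropped
        } else if (qName.equals("year")) {
            year = value;
        } else if (qName.equals("title")) {              // step 3: title first, then the year
            out.append("<title>").append(value).append("</title>");
            out.append("<year>").append(year).append("</year>");
        } else if (qName.equals("author")) {
            out.append("<author>").append(value).append("</author>");
        }
    }

    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;");
    }

    static String transform(String xml) throws Exception {
        BooksTransformer handler = new BooksTransformer();
        SAXParserFactory.newInstance().newSAXParser()
            .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        return handler.out.toString();
    }
}
```

Only the year of the current book is buffered, because the grammar guarantees that `year` precedes `title`; everything else is emitted as soon as it is read.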

7 Example: TransformX Attribute Grammar

8 definition section rules section class-member section attribution functions

10 Grammar provides
- context information
- potential for optimization

11 Extended Regular Tree Grammars
Grammar G = (Nt, T, P, bib)
Nonterminals Nt = { bib, pub, year, title, author }
Terminals T = { bib, book, article, year, title, author, PCDATA }
Productions P:
bib ::= bib( pub* )
pub ::= book( year.title.author.author* )
pub ::= article( year.title.author.author* )
year ::= year( PCDATA )
title ::= title( PCDATA )
author ::= author( PCDATA )
Example document ∈ L(G)
Abbreviation: pub* written over terminal labels is ( book ∪ article )*

12 TDLL(1) Grammars
ERTG where each right-hand side is ε or its regular expression ρ is one-unambiguous:
- a*.a is not one-unambiguous; the equivalent a.a* is
- a.b* ∪ a.c* is not one-unambiguous; the equivalent a.(b* ∪ c*) is
Consequences:
- deterministic parsing with one token lookahead
- parse tree can be unambiguously constructed with lookahead of one token
- DTDs are a dialect of TDLL(1) grammars (Lee, Mani, Murata, 2000)
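One standard way to test whether a regular expression is one-unambiguous is to build its position (Glushkov) sets and check that no two distinct positions carrying the same symbol compete in the first set or in any follow set. The sketch below does this for the four expressions on the slide. The class `OneUnambiguity` and its builder methods are assumptions for illustration; one builder instance must be used for exactly one expression, and each subexpression may be used only once.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical checker, not part of TransformX. Use one instance per expression.
final class OneUnambiguity {
    /** A subexpression together with its nullable flag and first/last position sets. */
    static final class Re {
        final boolean nullable;
        final Set<Integer> first;
        final Set<Integer> last;
        Re(boolean nullable, Set<Integer> first, Set<Integer> last) {
            this.nullable = nullable;
            this.first = first;
            this.last = last;
        }
    }

    private final List<Character> symbols = new ArrayList<>();          // symbol at each position
    private final Map<Integer, Set<Integer>> follow = new HashMap<>();  // follow set of each position

    Re sym(char c) {
        int p = symbols.size();
        symbols.add(c);
        follow.put(p, new HashSet<>());
        return new Re(false, Set.of(p), Set.of(p));
    }

    Re cat(Re a, Re b) {
        for (int p : a.last) follow.get(p).addAll(b.first);
        Set<Integer> first = new HashSet<>(a.first);
        if (a.nullable) first.addAll(b.first);
        Set<Integer> last = new HashSet<>(b.last);
        if (b.nullable) last.addAll(a.last);
        return new Re(a.nullable && b.nullable, first, last);
    }

    Re alt(Re a, Re b) {
        Set<Integer> first = new HashSet<>(a.first);
        first.addAll(b.first);
        Set<Integer> last = new HashSet<>(a.last);
        last.addAll(b.last);
        return new Re(a.nullable || b.nullable, first, last);
    }

    Re star(Re a) {
        for (int p : a.last) follow.get(p).addAll(a.first);
        return new Re(true, a.first, a.last);
    }

    /** True if no two distinct positions in the set carry the same symbol. */
    private boolean deterministic(Set<Integer> positions) {
        Set<Character> seen = new HashSet<>();
        for (int p : positions) {
            if (!seen.add(symbols.get(p))) return false;
        }
        return true;
    }

    boolean isOneUnambiguous(Re root) {
        if (!deterministic(root.first)) return false;
        for (Set<Integer> f : follow.values()) {
            if (!deterministic(f)) return false;
        }
        return true;
    }
}
```

With `OneUnambiguity g = new OneUnambiguity();`, the call `g.isOneUnambiguous(g.cat(g.star(g.sym('a')), g.sym('a')))` rejects a*.a, because after reading an `a` both positions of `a` are possible next; a fresh builder accepts a.a*.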

13 Strong One-Unambiguity strongly one-unambiguous Koch, Scherzinger, 2003.

14 Syntax in the Abstract
Attributed TDLL(1) grammar, i.e., each production
1. is of one of the four forms:
   n ::= t( ρ )
   n ::= { f $[ } t( ρ )
   n ::= t( ρ ) { f $] }
   n ::= { f $[ } t( ρ ) { f $] }
2. if ρ is an attributed regular expression, then the regular expression obtained from ρ by removing the attribution functions, written over terminal labels, must be strongly one-unambiguous

15 Example

16 Parse Tree

17 Attributed Parse Tree

18-19 Attributed Parse Tree [figure: parse tree with nodes bib, book, year, title, author]

20-25 L-attributed Grammars [figure: the same parse tree]

27 In Practice

28 In Practice

29 Class Members: accessible from within attribution functions

30 TransformX Attributes: transfer information between attribution functions

31 The TransformX Parser Generator
Translation to Java source code:
1. The validator module
   - validates the input
   - outputs attribution functions as encountered in the attributed extended parse tree
   - generated in O(|G|^3)
2. The evaluator module
   - evaluates attribution functions
   - stores attributes on a stack
   - generated in O(1)

32 Experiments
Prototype: C++ implementation, generates Java code
Experiments:
1. Validate the input
2. Output the input
3. Evaluate example
Data: Books and articles, datasets MB
Memory consumption: 12 MB

33 Conclusion & Summary
TransformX attribute grammars
- specify many queries conveniently
- often more convenient than SAX
- grammar may reveal potential for optimization
TransformX parser generator
- little runtime overhead (validation + attributes)
Prototype implementation

34 Selected Related Work
XML and Attribute Grammars
- M. Benedikt, C.Y. Chan, W. Fan, J. Freire, and R. Rastogi. “Capturing both Types and Constraints in Data Integration”. SIGMOD’03.
- M. Benedikt, C.Y. Chan, W. Fan, R. Rastogi, S. Zheng, and A. Zhou. “DTD-Directed Publishing with Attribute Translation Grammars”. VLDB’02.
- C. Koch and S. Scherzinger. “Attribute Grammars for Scalable Query Processing on XML Streams”. DBPL’03.
- F. Neven and J. Van den Bussche. “Expressiveness of Structured Document Query Languages Based on Attribute Grammars”. JACM, Jan
- S. Nishimura and K. Nakano. “XML Stream Transformer Generation Through Program Composition and Dependency Analysis”. Science of Computer Programming,
One-unambiguous Regular Languages
- A. Brüggemann-Klein and D. Wood. “One-Unambiguous Regular Languages”. Information and Computation,
Strong One-unambiguity
- C. Koch and S. Scherzinger. “Attribute Grammars for Scalable Query Processing on XML Streams”. DBPL’03.
TDLL(1) Grammars
- D. Lee, M. Mani, and M. Murata. “Reasoning about XML Schema Languages using Formal Language Theory”. Technical Report RJ Log 95071, IBM Research, Nov 2000.
Lex&Yacc
- J. R. Levine, T. Mason, D. Brown. “lex&yacc”. O’Reilly, 1992.

35 Thank you