1 Syntax-directed Transformations of XML Streams Stefanie Scherzinger joint work with Alfons Kemper.


1 1 Syntax-directed Transformations of XML Streams Stefanie Scherzinger joint work with Alfons Kemper

2 2 XML Stream Processing
Example data: 1999 Data on the Web Serge Abiteboul Peter Buneman Dan Suciu ...
DTD: <!ELEMENT book (year,title,author,author*)>
1. Very long XML documents.
2. Applications need to be completely main-memory based.
3. Schema information is available.

3 3 XML Query Languages: XPath, XQuery, XSLT
XPath: //book[year=2003]/title
XQuery: { for $x in input()//book where $x/year=2003 return {$x/title} {$x/author} }
Schema knowledge necessary to specify query!
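For comparison, the XPath query on this slide can be run with the standard javax.xml.xpath API. This is only a sketch for illustration, not part of TransformX: the class name XPathTitlesSketch and the method titlesOfYear are hypothetical, and the year is a parameter so the query can be tried on any data. This style of evaluation typically builds an in-memory tree of the whole input first, which is the cost that stream processing tries to avoid.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: evaluates //book[year=Y]/title over an XML string.
final class XPathTitlesSketch {
    static List<String> titlesOfYear(String xml, int year) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            // The query needs schema knowledge: element names book, year, title.
            NodeList hits = (NodeList) xpath.evaluate(
                    "//book[year=" + year + "]/title",
                    new InputSource(new StringReader(xml)),
                    XPathConstants.NODESET);
            List<String> titles = new ArrayList<>();
            for (int i = 0; i < hits.getLength(); i++) {
                titles.add(hits.item(i).getTextContent());
            }
            return titles;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<bib><book><year>1999</year><title>Data on the Web</title>"
                + "<author>Serge Abiteboul</author></book></bib>";
        System.out.println(titlesOfYear(xml, 1999));
    }
}
```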

4 4 TransformX Attribute Grammars
1. (Suitable) extended regular tree grammar, e.g. a DTD
2. Add attribution functions (Java code)
3. Parser generator produces Java code that validates the input and evaluates the attribution functions
4. Compile and execute

5 5 Extended Regular Tree Grammars
Grammar G = (Nt, T, P, bib)
Nonterminals Nt = { bib, pub, year, title, author }
Terminals T = { bib, book, article, year, title, author, PCDATA }
bib ::= bib( pub* )
pub ::= book( year.title.author.author* )
pub ::= article( year.title.author.author* )
year ::= year( PCDATA )
title ::= title( PCDATA )
author ::= author( PCDATA )
(example document) ∈ L(G)

6 6 Example: Task
1. Re-label root to "books"
2. Retrieve all books, but not articles
3. For each book, output numerical identifier, title, year, and authors
Input: 1999 Data on the Web Serge Abiteboul Peter Buneman Dan Suciu ...
Output: 1 Data on the Web 1999 Serge Abiteboul Peter Buneman Dan Suciu ...
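As a point of comparison for the task above, here is a minimal hand-written SAX sketch in Java. It is not TransformX output and not the authors' code. The class name BooksSaxSketch is hypothetical, and so is the output element name id, since the slide's markup did not survive. The input element names (bib, book, article, year, title, author) come from the grammar on the previous slide. Text is copied without XML escaping to keep the sketch short.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical SAX handler: relabels the root, keeps books, drops articles.
final class BooksSaxSketch extends DefaultHandler {
    private final StringBuilder out = new StringBuilder();
    private final StringBuilder text = new StringBuilder();    // text of current leaf
    private final StringBuilder authors = new StringBuilder(); // buffered author elements
    private boolean inBook = false;
    private int nextId = 1;                                    // numerical identifier
    private String year;
    private String title;

    @Override
    public void startElement(String uri, String local, String qName, Attributes atts) {
        text.setLength(0);
        if (qName.equals("bib")) {
            out.append("<books>");                             // task 1: relabel root
        } else if (qName.equals("book")) {
            inBook = true;
            authors.setLength(0);
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String local, String qName) {
        if (qName.equals("bib")) {
            out.append("</books>");
            return;
        }
        if (!inBook) {
            return;                                            // task 2: skip articles
        }
        switch (qName) {
            case "year": year = text.toString().trim(); break;
            case "title": title = text.toString().trim(); break;
            case "author":
                authors.append("<author>").append(text.toString().trim()).append("</author>");
                break;
            case "book":                                       // task 3: id, title, year, authors
                out.append("<book><id>").append(nextId++).append("</id>")
                   .append("<title>").append(title).append("</title>")
                   .append("<year>").append(year).append("</year>")
                   .append(authors).append("</book>");
                inBook = false;
                break;
            default: break;
        }
    }

    static String transform(String xml) {
        try {
            BooksSaxSketch handler = new BooksSaxSketch();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return handler.out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Because the year arrives before the title but is emitted after it, the handler has to buffer per-book state by hand. This is the kind of bookkeeping the slides suggest an attribute grammar can express more conveniently.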

7 7 Example: TransformX Attribute Grammar

8 8 definition section; rules section; class-member section; attribution functions

9 9

10 10 Grammar provides context information and potential for optimization

11 11 Extended Regular Tree Grammars
Grammar G = (Nt, T, P, bib)
Nonterminals Nt = { bib, pub, year, title, author }
Terminals T = { bib, book, article, year, title, author, PCDATA }
bib ::= bib( pub* )
pub ::= book( year.title.author.author* )
pub ::= article( year.title.author.author* )
year ::= year( PCDATA )
title ::= title( PCDATA )
author ::= author( PCDATA )
(example document) ∈ L(G)
Abbreviation: (pub*) = (book ∪ article)*

12 12 TDLL(1) Grammars
An ERTG in which the regular expression on each right-hand side is one-unambiguous.
Examples: a*.a is not one-unambiguous, but the equivalent a.a* is; a.b* ∪ a.c* is not one-unambiguous, but the equivalent a.(b* ∪ c*) is.
Consequences: deterministic parsing with one token of lookahead; the parse tree can be constructed unambiguously with a lookahead of one token.
DTDs are a dialect of TDLL(1) grammars (Lee, Mani, Murata, 2000).
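The one-unambiguity examples on this slide can be checked mechanically. The sketch below uses the Glushkov (position automaton) characterization from Brüggemann-Klein and Wood: an expression is one-unambiguous exactly when no two distinct positions carrying the same symbol compete in the first set or in any follow set. The class OneUnambiguitySketch and its builder methods sym, cat, alt and star are hypothetical names for illustration. It tests a given expression, not whether an equivalent one-unambiguous expression exists, and it does not cover the stronger notion of strong one-unambiguity.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical builder: use one instance per expression, each node only once.
final class OneUnambiguitySketch {
    /** Expression node carrying its Glushkov sets over symbol positions. */
    static final class Re {
        boolean nullable;
        final Set<Integer> first = new HashSet<>();
        final Set<Integer> last = new HashSet<>();
    }

    private final List<String> symbolAt = new ArrayList<>();           // position -> symbol
    private final Map<Integer, Set<Integer>> follow = new HashMap<>(); // position -> successors

    Re sym(String s) {
        int p = symbolAt.size();                 // fresh position for this occurrence
        symbolAt.add(s);
        follow.put(p, new HashSet<>());
        Re r = new Re();
        r.first.add(p);
        r.last.add(p);
        return r;
    }

    Re cat(Re a, Re b) {
        for (int p : a.last) follow.get(p).addAll(b.first);
        Re r = new Re();
        r.nullable = a.nullable && b.nullable;
        r.first.addAll(a.first);
        if (a.nullable) r.first.addAll(b.first);
        r.last.addAll(b.last);
        if (b.nullable) r.last.addAll(a.last);
        return r;
    }

    Re alt(Re a, Re b) {
        Re r = new Re();
        r.nullable = a.nullable || b.nullable;
        r.first.addAll(a.first);
        r.first.addAll(b.first);
        r.last.addAll(a.last);
        r.last.addAll(b.last);
        return r;
    }

    Re star(Re a) {
        for (int p : a.last) follow.get(p).addAll(a.first);
        Re r = new Re();
        r.nullable = true;
        r.first.addAll(a.first);
        r.last.addAll(a.last);
        return r;
    }

    /** True if two distinct positions in the set carry the same symbol. */
    private boolean clash(Set<Integer> positions) {
        Set<String> seen = new HashSet<>();
        for (int p : positions) {
            if (!seen.add(symbolAt.get(p))) return true;
        }
        return false;
    }

    boolean isOneUnambiguous(Re root) {
        if (clash(root.first)) return false;
        for (Set<Integer> successors : follow.values()) {
            if (clash(successors)) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        OneUnambiguitySketch g = new OneUnambiguitySketch();
        Re e = g.cat(g.star(g.sym("a")), g.sym("a"));   // a*.a
        System.out.println(g.isOneUnambiguous(e));       // prints false
    }
}
```

For a*.a, the position of the starred a is followed by both itself and the final a, so one token of lookahead cannot decide which occurrence is being matched. In a.a* each follow set holds at most one a.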

13 13 Strong One-Unambiguity: strongly one-unambiguous (Koch, Scherzinger, 2003).

14 14 Syntax in the Abstract
Attributed TDLL(1) grammar, i.e., each production
1. is of one of the four forms:
   n ::= t( ρ )
   n ::= { f $[ } t( ρ )
   n ::= t( ρ ) { f $] }
   n ::= { f $[ } t( ρ ) { f $] }
2. if ρ is an attributed regular expression, then the regular expression obtained from ρ by removing the attribution functions must be strongly one-unambiguous

15 15 Example

16 16 Parse Tree

17 17 Attributed Parse Tree

18 18 Attributed Parse Tree [figure: parse tree with nodes bib, book, year, title, author]

19 19 Attributed Parse Tree [figure: same parse tree]

20 20 L-attributed Grammars [figure: same parse tree]

21-25 [figure: same parse tree, further animation steps]

26 26

27 27 In Practice

28 28 In Practice

29 29 Class Members: accessible from within attribution functions

30 30 TransformX Attributes: transfer information between attribution functions

31 31 The TransformX Parser Generator
Translation to Java source code:
1. The validator module
   – validates the input
   – outputs attribution functions as encountered in the attributed extended parse tree
   generated in O(|G|^3)
2. The evaluator module
   – evaluates attribution functions
   – stores attributes on a stack
   generated in O(1)

32 32 Experiments
Prototype: C++ implementation, generates Java code
Experiments: 1. Validate the input 2. Output the input 3. Evaluate example
Data: books and articles, datasets of 31-122 MB
Memory consumption: 12 MB

33 33 Conclusion & Summary
TransformX attribute grammars: specify many queries conveniently; often more convenient than SAX; grammar may reveal potential for optimization
TransformX parser generator: little runtime overhead (validation + attributes)
Prototype implementation

34 34 Selected Related Work
XML and Attribute Grammars:
M. Benedikt, C.Y. Chan, W. Fan, J. Freire, and R. Rastogi. "Capturing both Types and Constraints in Data Integration". SIGMOD'03.
M. Benedikt, C.Y. Chan, W. Fan, R. Rastogi, S. Zheng, and A. Zhou. "DTD-Directed Publishing with Attribute Translation Grammars". VLDB'02.
C. Koch and S. Scherzinger. "Attribute Grammars for Scalable Query Processing on XML Streams". DBPL'03.
F. Neven and J. van den Bussche. "Expressiveness of Structured Document Query Languages Based on Attribute Grammars". JACM, Jan. 2002.
S. Nishimura and K. Nakano. "XML Stream Transformer Generation Through Program Composition and Dependency Analysis". Science of Computer Programming, 2005.
One-unambiguous Regular Languages:
A. Brüggemann-Klein and D. Wood. "One-Unambiguous Regular Languages". Information and Computation, 1998.
Strong One-unambiguity:
C. Koch and S. Scherzinger. "Attribute Grammars for Scalable Query Processing on XML Streams". DBPL'03.
TDLL(1) Grammars:
D. Lee, M. Mani, and M. Murata. "Reasoning about XML Schema Languages using Formal Language Theory". Technical Report RJ 10197 Log 95071, IBM Research, Nov. 2000.
Lex & Yacc:
J. R. Levine, T. Mason, and D. Brown. "lex & yacc". O'Reilly, 1992.

35 35 Thank you

