Querying Streaming XML Data. Layout of the presentation  Introduction  Common Problems faced  Solution proposed  Basic Building blocks of the solution.


1 Querying Streaming XML Data

2 Layout of the presentation  Introduction  Common Problems faced  Solution proposed  Basic Building blocks of the solution  How to build up a solution to a given query  Features of the system

3 Streaming XML  XML is the standard for information exchange.  Some XML documents are only available in streaming format.  Streaming is like reading data from a tape drive.  Used for stock market data, news feeds, and network statistics.  Predecessor systems were used to filter documents.

4 Structure of an XPath Query  Consists of a location path and an output expression (name).  The location path consists of a closure axis (//), a node test (book), and a predicate (year>2000).  e.g. //book[year>2000]/name

5 Features of our Approach  Efficient  Easy to understand design.  Design of BPDT is tricky

6-12 Common Problems faced (one slide, built up step by step)
  Query: /pub[year=2002]/book[price<11]/author
  XML stream (element tags were lost in extraction; surviving text by line number, 18 lines in total):
     4. 12.00
     5. First
     6. A
     7. 10.00
    10. 14.00
    11. Second
    12. A
    13. B
    14. 12.00
    16. 2002
  Annotations, in the order they are revealed:
    Element satisfies the path
    Failure??
    Test passed. But year=2002?
    Buffer both A & B
    Failed price<11. Remove
    Test passed. Output
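The buffering problem on this slide can be sketched in code. The following is a minimal, hand-written streaming evaluator for this one query only; it is not the transducer construction from the talk. The document `DOC` is an assumed reconstruction (the tag names `root`, `pub`, `book`, `price`, `name`, `author`, `year` are guesses inferred from the query and the surviving values), and `stream_authors` is a hypothetical helper name.

```python
import io
import xml.etree.ElementTree as ET

# Assumed reconstruction of the 18-line document; tag names are guesses.
DOC = """<root>
<pub>
<book>
<price>12.00</price>
<name>First</name>
<author>A</author>
<price>10.00</price>
</book>
<book>
<price>14.00</price>
<name>Second</name>
<author>A</author>
<author>B</author>
<price>12.00</price>
</book>
<year>2002</year>
</pub>
</root>"""

BOOK = ["root", "pub", "book"]

def stream_authors(xml_text):
    """Evaluate /pub[year=2002]/book[price<11]/author in one pass."""
    output, pub_buf, book_buf = [], [], []
    year_ok = price_ok = False
    path = []  # tags of the currently open elements
    source = io.BytesIO(xml_text.encode("utf-8"))
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            path.append(elem.tag)
            continue
        text = (elem.text or "").strip()
        if path == BOOK + ["author"]:
            # Route the author by what is known about the predicates so far.
            if price_ok and year_ok:
                output.append(text)
            elif price_ok:
                pub_buf.append(text)
            else:
                book_buf.append(text)
        elif path == BOOK + ["price"]:
            if not price_ok and float(text) < 11:
                price_ok = True
                # Book predicate now true: hand buffered authors upward.
                (output if year_ok else pub_buf).extend(book_buf)
                book_buf.clear()
        elif path == BOOK:
            # End of book: authors still waiting on price<11 are dropped.
            book_buf.clear()
            price_ok = False
        elif path == ["root", "pub", "year"]:
            if text == "2002":  # string comparison, enough for this sketch
                year_ok = True
                output.extend(pub_buf)
                pub_buf.clear()
        elif path == ["root", "pub"]:
            pub_buf.clear()
            year_ok = False
        path.pop()
    return output

print(stream_authors(DOC))
```

On the assumed document, author A of the first book is held until the year is seen, while A and B of the second book are buffered and then discarded when the book ends without a price below 11.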

13-15 Problems caused by closure axis (one slide, built up step by step)
  Query: //pub[year=2002]//book[author]//name
  XML stream (element tags were lost in extraction; surviving text by line number, 19 lines in total):
     4. X
     5. A
     8. Y
    11. Z
    12. B
    14. 1999
    17. 2002
  Candidate matchings:
    pub [year=2002]    book [author]
    Line 2  True       Line 7   False
    Line 2  True       Line 10  True
    Line 9  False      Line 10  True
  Annotations, in the order they are revealed: Fails year=2002. Passes year=2002.

16 Problems caused by closure axis
  Same query and table, with an author added to the stream (20 lines in total): 4. X  5. A  8. Y  9. B  12. Z  13. B  15. 1999  18. 2002
  Let's add author. Result?
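The table of candidate matchings can be reproduced with a small in-memory check. This is only a sketch: it loads the whole document rather than streaming it, the document `DOC` is an assumed reconstruction of the 19-line stream (tag names are guesses), the `id` attributes are hypothetical labels standing in for the slide's line numbers, and `matchings` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Assumed reconstruction; the id attributes only label the slide's line numbers.
DOC = """<root>
<pub id="line2">
  <book id="line3"><name>X</name><author>A</author></book>
  <book id="line7"><name>Y</name>
    <pub id="line9">
      <book id="line10"><name>Z</name><author>B</author></book>
      <year>1999</year>
    </pub>
  </book>
  <year>2002</year>
</pub>
</root>"""

def matchings(root, name_text):
    """List every (pub, book) ancestor pair, pub above book, of one <name>.

    Each row is (pub id, pub[year=2002], book id, book[author]).
    """
    rows = []

    def walk(elem, ancestors):
        if elem.tag == "name" and elem.text == name_text:
            for i, pub in enumerate(ancestors):
                if pub.tag != "pub":
                    continue
                for book in ancestors[i + 1:]:
                    if book.tag == "book":
                        rows.append((
                            pub.get("id"),
                            pub.findtext("year") == "2002",   # direct child only
                            book.get("id"),
                            book.find("author") is not None,  # direct child only
                        ))
        for child in elem:
            walk(child, ancestors + [elem])

    walk(root, [])
    return rows

rows = matchings(ET.fromstring(DOC), "Z")
for row in rows:
    print(row)
# The name belongs to the result if at least one pairing satisfies both predicates.
print(any(pub_ok and book_ok for _, pub_ok, _, book_ok in rows))
```

With closure axes a single name element can be reached through several pub/book pairings, so a streaming evaluator has to track all of them until the predicates are decided.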

17 Handling XML Stream  Input: well-formed XML stream.  Use the SAX API to parse XML.  Events belong to  Begin = {(a, attrs, d)}  End = {(/a, d)}  Text = {(a, text(), d)}  XML stream: {e_1, e_2, …, e_i, …}, where each e_i ∈ Begin ∪ End ∪ Text
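A minimal sketch of how such an event stream could be produced with Python's standard `xml.sax` module. It assumes that `a` is the element tag, `attrs` the attribute map, and `d` the depth of the element; the tuple layout and the names `EventCollector` and `sax_events` are illustrative, not taken from the talk.

```python
import xml.sax

class EventCollector(xml.sax.ContentHandler):
    """Record (kind, tag, payload, depth) tuples for each SAX callback."""

    def __init__(self):
        super().__init__()
        self.events = []
        self.stack = []  # open element names; len(stack) is the current depth

    def startElement(self, name, attrs):
        self.stack.append(name)
        self.events.append(("begin", name, dict(attrs.items()), len(self.stack)))

    def characters(self, content):
        # A parser may deliver text in several chunks; this sketch records
        # each non-blank chunk as its own text event.
        text = content.strip()
        if text:
            self.events.append(("text", self.stack[-1], text, len(self.stack)))

    def endElement(self, name):
        self.events.append(("end", name, None, len(self.stack)))
        self.stack.pop()

def sax_events(xml_text):
    handler = EventCollector()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.events

events = sax_events('<pub><book id="1"><price>10.00</price></book></pub>')
for event in events:
    print(event)
```

Carrying the depth in every event lets a consumer tell a child from a deeper descendant without building a tree.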

18 Grammar for XPath Queries  Q → N+ [/O]  N → [/ | //] tag [F]  F → [FO [OP constant]]  FO → @attribute | tag [@attribute] | text()  O → @attribute | text()  OP → > | ≥ | = | < | ≤ | ≠ | contains  XPath query of the form N_1 N_2 … N_n /O  Cannot handle reverse axes or positional functions.
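The grammar is small enough to parse with regular expressions. The sketch below splits a query into location steps and an optional output expression. It is an illustration under assumptions: `parse_query` and `parse_filter` are hypothetical names, operators may be written in ASCII (`>=`, `<=`, `!=`) as well as with the symbols above, and constants are kept as raw strings.

```python
import re

NAME = r"[A-Za-z_][\w.-]*"
# One location step N: axis, tag, optional [filter].
STEP = re.compile(rf"(//|/)({NAME})(?:\[([^\]]*)\])?")
# Output expression O at the very end of the query.
OUTPUT = re.compile(rf"/(@{NAME}|text\(\))$")
# Filter body: FO, optionally followed by OP and a constant.
FILTER = re.compile(
    rf"\s*(@{NAME}|text\(\)|{NAME}(?:@{NAME})?)\s*"
    r"(?:(>=|<=|!=|≥|≤|≠|>|<|=|contains)\s*(.+?))?\s*$"
)

def parse_filter(body):
    """Return (FO, OP, constant), or None when the step has no filter."""
    if body is None:
        return None
    match = FILTER.match(body)
    if not match:
        raise ValueError(f"unsupported filter: {body!r}")
    return match.groups()

def parse_query(query):
    """Return (steps, output) for a query of the form N1 N2 ... Nn [/O]."""
    out = OUTPUT.search(query)
    output = out.group(1) if out else None
    path = query[:out.start()] if out else query
    steps, pos = [], 0
    while pos < len(path):
        match = STEP.match(path, pos)
        if not match:
            raise ValueError(f"cannot parse location step at offset {pos}")
        axis, tag, body = match.groups()
        steps.append({"axis": axis, "tag": tag, "filter": parse_filter(body)})
        pos = match.end()
    if not steps:
        raise ValueError("a query needs at least one location step")
    return steps, output

steps, output = parse_query("//pub[year>2000]//book[author]//name/text()")
for step in steps:
    print(step)
print(output)
```

Anything outside the grammar, such as a reverse axis or a positional function, makes `parse_query` raise a `ValueError` instead of guessing.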

19 Solution to Query  Query: /pub[year=2002]/book[price<11]/author  Diagram labels: PDA, PDT

20 Basic PushDown Transducer (BPDT)  Similar to PushDown Automata  Actions defined on Transition Arcs  Finite set of states  A Start state  A set of final states  Set of input symbols  Set of Stack symbols

21 Building a BPDT  Query: /pub[year>2000]/book[author]/name/text()  Consider location step: /book[author]  Book – Author: Buffer for future: Begin event of Author.  Book – Author: Remove from Buffer: End event of Book.  Book – Author: Output result if predicates true: Begin event of Author.

22 Basic Building Blocks XPath Expression: /tag[child]

23 Buffer Operations needed  Enqueue(x): Add x to the end of the queue.  Clear(): Removes all items from the queue.  Flush(): Outputs all items in the queue in FIFO order.  Upload(): Moves all items to the end of the queue of a parent BPDT.  No Dequeue operation needed.
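The four operations can be sketched as a small queue class. This is an illustration, not the system's implementation: the class name `BpdtBuffer`, the `parent` link, and the `sink` list that receives flushed output are all assumptions.

```python
from collections import deque

class BpdtBuffer:
    """FIFO buffer of potential results, with the four operations above."""

    def __init__(self, parent=None, sink=None):
        self.items = deque()
        self.parent = parent  # buffer of the parent BPDT, if any (assumed link)
        self.sink = sink if sink is not None else []  # receives flushed output

    def enqueue(self, x):
        """Add x to the end of the queue."""
        self.items.append(x)

    def clear(self):
        """Remove all items without outputting them."""
        self.items.clear()

    def flush(self):
        """Output all items in FIFO order, leaving the queue empty."""
        self.sink.extend(self.items)
        self.items.clear()

    def upload(self):
        """Move all items to the end of the parent's queue."""
        if self.parent is None:
            raise ValueError("upload needs a parent buffer")
        self.parent.items.extend(self.items)
        self.items.clear()

output = []
parent = BpdtBuffer(sink=output)
child = BpdtBuffer(parent=parent, sink=output)
child.enqueue("A")
child.enqueue("B")
child.upload()       # child's predicate holds; parent's is still undecided
parent.enqueue("C")
parent.flush()       # parent's predicate holds; emit everything in order
print(output)
```

Items only ever leave a queue all at once (flushed, uploaded, or cleared), which is why no single-item dequeue is required.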

24 Basic Building Blocks XPath Expression: /tag[@attr=val]

25 Basic Building Blocks XPath Expression: /tag[text()=val]

26 Basic Building Blocks XPath Expression: /tag[child@attr=val]

27 Basic Building Blocks XPath Expression: /tag[child=val]

28 A sample BPDT Query: /pub[year>2000]

29 Building a solution HPDT for Query: //pub[year>2000]//book[author]//name/text()

30 HPDT Structure  Each BPDT in the HPDT has:
  Position (l, k): l = depth of the BPDT in the HPDT, k = sequence number counted from right to left.
    The BPDT at position (i-1, k) has a right child BPDT at position (i, 2k), connected to its NA state.
    The BPDT at position (i-1, k) has a left child BPDT at position (i, 2k+1), connected to its True state.
    A BPDT at position (i, 2^i - 1) means the predicates in all higher-level BPDTs evaluate to true.
  Buffer: potential results.
  Stack: stack of element (SAX) events.
  Depth vector.
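The position arithmetic can be checked with a few lines of code. This sketch assumes the root BPDT sits at position (0, 0), which is consistent with the 2^i - 1 rule for i = 0; the function names are hypothetical, and reading the bits of k as a per-level predicate history is an inference from the two child rules rather than something stated on the slide.

```python
ROOT = (0, 0)  # assumed position of the root BPDT

def right_child(pos):
    """Child reached from the NA state: (i-1, k) -> (i, 2k)."""
    level, k = pos
    return (level + 1, 2 * k)

def left_child(pos):
    """Child reached from the True state: (i-1, k) -> (i, 2k+1)."""
    level, k = pos
    return (level + 1, 2 * k + 1)

def all_predicates_true(pos):
    """True when k == 2^i - 1, i.e. every step down went through a True state."""
    level, k = pos
    return k == 2 ** level - 1

def predicate_history(pos):
    """Read k in binary, top level first: 1 = True state, 0 = NA state."""
    level, k = pos
    return [bool((k >> (level - 1 - j)) & 1) for j in range(level)]

pos = right_child(left_child(ROOT))
print(pos, predicate_history(pos), all_predicates_true(pos))
```

Under this reading, results buffered in a BPDT whose position has every bit set no longer depend on any higher-level predicate.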

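31 Example Query
  Query: //pub[year=2002]//book[author]//name
  XML stream (element tags were lost in extraction; surviving text by line number, 19 lines in total): 4. X  5. A  8. Y  11. Z  12. B  14. 1999  17. 2002
  Matching paths for the name element on line 11 (line number of each element):
    root  pub  book  name
    1     2    7     11
    1     2    10    11
    1     9    10    11
  3 paths from $1 to $14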

32 System Features

33 Reference  Feng Peng and Sudarshan S. Chawathe. XPath Queries on Streaming Data. In SIGMOD 2003.

34 Thank You ???

