1
Querying Streaming XML Data
2
Layout of the presentation
Introduction
Common problems faced
Solution proposed
Basic building blocks of the solution
How to build up a solution to a given query
Features of the system
3
Streaming XML
XML is a standard for information exchange.
Some XML documents are only available in streaming format.
Streaming is like reading data from a tape drive: the input is read sequentially.
Used for stock market data, news, and network statistics.
Predecessor systems were used to filter documents.
4
Structure of an XPath Query
A query consists of a location path and an output expression (name).
The location path consists of a closure axis (//), a node test (book), and a predicate (year>2000).
Example: //book[year>2000]/name
5
Features of our Approach
Efficient.
Easy-to-understand design.
Design of the BPDT (Basic PushDown Transducer) is tricky.
12
Common Problems faced
Query: /pub[year=2002]/book[price<11]/author
[XML listing, lines 1 to 18; element tags were not preserved. Surviving text values by line: 4. 12.00, 5. First, 6. A, 7. 10.00, 10. 14.00, 11. Second, 12. A, 13. B, 14. 12.00, 16. 2002]
Annotations, in the order they are revealed:
Element satisfies the path.
Failure??
Test passed. But year=2002?
Buffer both A & B.
Failed price<11. Remove.
Test passed. Output.
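The buffering behaviour walked through on this slide can be sketched in Python. This is a minimal hand-written evaluator for the single query /pub[year=2002]/book[price<11]/author, not the transducer construction presented later in the talk. The names `AuthorQuery`, `run`, and `DOC` are hypothetical, and the sample document is only a guess at element names that fits the query and the surviving text values.

```python
import xml.sax

# Assumed sample input: element names are guesses consistent with the query.
DOC = """<pub>
  <book><price>12.00</price><name>First</name><author>A</author>
        <price>10.00</price></book>
  <book><price>14.00</price><name>Second</name><author>A</author>
        <author>B</author><price>12.00</price></book>
  <year>2002</year>
</pub>"""


class AuthorQuery(xml.sax.ContentHandler):
    """One-pass evaluator for /pub[year=2002]/book[price<11]/author."""

    def __init__(self):
        super().__init__()
        self.path = []          # names of the currently open elements
        self.text = ""          # text collected for the current leaf element
        self.book_authors = []  # authors of the current book, not yet decided
        self.book_ok = False    # has a price < 11 been seen in this book?
        self.pub_buffer = []    # authors of accepted books, waiting for year
        self.year_ok = False    # has year = 2002 been seen in this pub?
        self.results = []

    def startElement(self, name, attrs):
        self.path.append(name)
        self.text = ""
        if self.path == ["pub"]:
            self.pub_buffer, self.year_ok = [], False
        elif self.path == ["pub", "book"]:
            self.book_authors, self.book_ok = [], False

    def characters(self, content):
        self.text += content

    def endElement(self, name):
        value = self.text.strip()
        if self.path == ["pub", "book", "author"]:
            self.book_authors.append(value)           # buffer for later
        elif self.path == ["pub", "book", "price"]:
            self.book_ok = self.book_ok or float(value) < 11
        elif self.path == ["pub", "year"]:
            if value == "2002":                       # string match, a simplification
                self.year_ok = True
                self.results.extend(self.pub_buffer)  # output buffered authors
                self.pub_buffer = []
        elif self.path == ["pub", "book"]:
            if self.book_ok:
                target = self.results if self.year_ok else self.pub_buffer
                target.extend(self.book_authors)
            self.book_authors = []                    # drop or hand over, then reset
        self.path.pop()
        self.text = ""


def run(xml_text):
    handler = AuthorQuery()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.results
```

With the assumed document, the author of the first book is held back until the year element arrives, while the two authors of the second book are buffered and then discarded when the book closes without a qualifying price.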
15
Problems caused by closure axis
Query: //pub[year=2002]//book[author]//name
[XML listing, lines 1 to 19; element tags were not preserved. Surviving text values by line: 4. X, 5. A, 8. Y, 11. Z, 12. B, 14. 1999, 17. 2002]
pub[year=2002]    book[author]
Line 2: True      Line 7: False
Line 2: True      Line 10: True
Line 9: False     Line 10: True
Annotations: Fails year=2002. Passes year=2002.
16
Problems caused by closure axis
Query: //pub[year=2002]//book[author]//name
[XML listing, lines 1 to 20; element tags were not preserved. Surviving text values by line: 4. X, 5. A, 8. Y, 9. B, 12. Z, 13. B, 15. 1999, 18. 2002]
pub[year=2002]    book[author]
Line 2: True      Line 7: False
Line 2: True      Line 10: True
Line 9: False     Line 10: True
Annotations: Fails year=2002. Passes year=2002. Let's add an author. Result?
17
Handling XML Stream
Input: a well-formed XML stream.
Use the SAX API to parse the XML.
Events belong to three sets, where a is an element tag, attrs its attributes, and d its depth:
Begin = {(a, attrs, d)}
End = {(/a, d)}
Text = {(a, text(), d)}
XML stream: {e_1, e_2, …, e_i, …} | e_i ∈ Begin ∪ End ∪ Text
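As an illustration of this event model, the following sketch uses Python's standard xml.sax module to turn parser callbacks into Begin, End, and Text tuples that carry a depth. The names `EventCollector` and `to_events` are hypothetical, and counting the document element as depth 1 is an assumption of the sketch.

```python
import xml.sax


class EventCollector(xml.sax.ContentHandler):
    """Records SAX callbacks as (kind, tag, payload, depth) tuples."""

    def __init__(self):
        super().__init__()
        self.events = []
        self.open_tags = []  # stack of currently open element names

    def startElement(self, name, attrs):
        self.open_tags.append(name)
        depth = len(self.open_tags)  # assumption: document element has depth 1
        self.events.append(("begin", name, dict(attrs.items()), depth))

    def characters(self, content):
        # A parser may deliver long text in several chunks; each non-blank
        # chunk becomes its own Text event in this sketch.
        text = content.strip()
        if text:
            self.events.append(
                ("text", self.open_tags[-1], text, len(self.open_tags)))

    def endElement(self, name):
        self.events.append(("end", name, None, len(self.open_tags)))
        self.open_tags.pop()


def to_events(xml_text):
    handler = EventCollector()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.events
```

For example, `to_events("<pub><year>2002</year></pub>")` yields a Begin event for pub at depth 1, Begin, Text, and End events for year at depth 2, and an End event for pub at depth 1.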
18
Grammar for XPath Queries
Q → N+ [/O]
N → [/ | //] tag [F]
F → '[' FO [OP constant] ']'
FO → @attribute | tag [@attribute] | text()
O → @attribute | text()
OP → > | ≥ | = | < | ≤ | ≠ | contains
An XPath query has the form N_1 N_2 … N_n /O.
Cannot handle reverse axes or positional functions.
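To make the shape of this grammar concrete, here is a small sketch that splits a query into its location steps N_1 … N_n and the optional output expression O. It is a regular-expression splitter written for illustration, not a full XPath parser: the name `parse_query` is hypothetical, predicates are returned as raw strings without checking FO and OP, and every step is assumed to start with an explicit / or //.

```python
import re

# One location step: axis, tag, optional predicate in square brackets.
STEP = re.compile(r"(//|/)([A-Za-z_][\w.-]*)(?:\[([^\]]*)\])?")
# Optional trailing output expression: /@attribute or /text().
OUTPUT = re.compile(r"/(@[A-Za-z_][\w.-]*|text\(\))$")


def parse_query(query):
    """Return (steps, output); steps is a list of (axis, tag, predicate)."""
    output = None
    m = OUTPUT.search(query)
    if m:
        output = m.group(1)
        query = query[:m.start()]
    steps, pos = [], 0
    while pos < len(query):
        m = STEP.match(query, pos)
        if not m:
            raise ValueError(f"cannot parse at offset {pos}: {query[pos:]!r}")
        steps.append(m.groups())  # predicate is None when absent
        pos = m.end()
    return steps, output
```

For the query //pub[year>2000]//book[author]//name/text(), this gives three steps with the closure axis and the output expression text().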
19
Solution to Query
Query: /pub[year=2002]/book[price<11]/author
[Figure: PDA and PDT for the query]
20
Basic PushDown Transducer (BPDT)
Similar to a pushdown automaton, with actions defined on the transition arcs.
A finite set of states
A start state
A set of final states
A set of input symbols
A set of stack symbols
21
Building a BPDT
Query: /pub[year>2000]/book[author]/name/text()
Consider location step: /book[author]
Book – Author: Buffer for future: Begin event of Author.
Book – Author: Remove from Buffer: End event of Book.
Book – Author: Output result if predicates true: Begin event of Author.
22
Basic Building Blocks
XPath Expression: /tag[child]
23
Buffer Operations needed
Enqueue(x): Adds x to the end of the queue.
Clear(): Removes all items from the queue.
Flush(): Outputs all items in the queue in FIFO order.
Upload(): Moves all items to the end of the queue of a parent BPDT.
No Dequeue operation is needed.
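The four operations can be sketched as a small Python class. The class name `BpdtBuffer`, the `parent` link, and the `out` list used as the output sink are hypothetical. Emptying the queue after Flush is an assumption; the slide only says that the items are output.

```python
class BpdtBuffer:
    """Queue of potential results held by one BPDT (illustrative sketch)."""

    def __init__(self, parent=None):
        self.items = []       # a plain list suffices: no dequeue is needed
        self.parent = parent  # buffer of the parent BPDT, if any

    def enqueue(self, x):
        """Add x to the end of the queue."""
        self.items.append(x)

    def clear(self):
        """Remove all items from the queue."""
        self.items = []

    def flush(self, out):
        """Output all items in FIFO order (assumed to empty the queue)."""
        out.extend(self.items)
        self.items = []

    def upload(self):
        """Move all items to the end of the parent BPDT's queue."""
        self.parent.items.extend(self.items)
        self.items = []
```

Because items only ever leave the queue all at once, a list with append is enough, which matches the remark that no Dequeue operation is needed.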
24
Basic Building Blocks
XPath Expression: /tag[@attr=val]
25
Basic Building Blocks
XPath Expression: /tag[text()=val]
26
Basic Building Blocks
XPath Expression: /tag[child@attr=val]
27
Basic Building Blocks
XPath Expression: /tag[child=val]
28
A sample BPDT
Query: /pub[year>2000]
29
Building a solution
HPDT (Hierarchical PushDown Transducer) for query: //pub[year>2000]//book[author]//name/text()
30
HPDT Structure
Each BPDT in the HPDT has:
Position (l, k): l = depth of the BPDT in the HPDT, k = sequence number from right to left.
The BPDT at position (i-1, k) has a right child BPDT at position (i, 2k), connected to its NA state.
The BPDT at position (i-1, k) has a left child BPDT at position (i, 2k+1), connected to its True state.
A BPDT at position (i, 2^i - 1) means the predicates in all higher-level BPDTs evaluate to true.
Buffer: potential results.
Stack: stack of element (SAX) events.
Depth vector.
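The position arithmetic can be checked with a few lines of Python. The function names are hypothetical, and placing the root BPDT at position (0, 0) is an assumption of this sketch, chosen because it is consistent with the (i, 2^i - 1) rule.

```python
def right_child(pos):
    """Position of the child attached to the NA state."""
    level, k = pos
    return (level + 1, 2 * k)


def left_child(pos):
    """Position of the child attached to the True state."""
    level, k = pos
    return (level + 1, 2 * k + 1)


def all_predicates_true(pos):
    """True when every higher-level predicate has evaluated to true."""
    level, k = pos
    return k == 2 ** level - 1


ROOT = (0, 0)  # assumed root position
```

Following only True-state links from the root visits (1, 1), (2, 3), (3, 7), which are exactly the positions of the form (i, 2^i - 1). Taking a single NA-state link anywhere on the way leaves that chain for good.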
31
Example Query
Query: //pub[year=2002]//book[author]//name
[XML listing, lines 1 to 19; element tags were not preserved. Surviving text values by line: 4. X, 5. A, 8. Y, 11. Z, 12. B, 14. 1999, 17. 2002]
root    pub    book    name
1       2      7       11
1       2      10      11
1       9      10      11
3 paths from $1 to $14
32
System Features
33
Reference
Feng Peng and Sudarshan S. Chawathe. XPath Queries on Streaming Data. In SIGMOD 2003.
34
Thank You ???