1 Approximate Counting of Frequent Query Patterns over XQuery Stream. Liang Huai Yang, Mong Li Lee, Wynne Hsu. DASFAA 2004. Speaker: Ming Jing Tsai. 2004/5/28
2 Introduction. An efficient approach to improving XML management systems: cache frequently retrieved results. Applications of frequent query patterns: search engines, XML query systems.
3 Preliminaries. S = QPT_1, QPT_2, …, QPT_N. Query pattern tree (QPT): node labels are drawn from {"*", "//"} ∪ tagset. Rooted subtree (RST): root(RST) = root(QPT), RST.V' ⊆ QPT.V, RST.E' ⊆ QPT.E.
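The rooted-subtree condition above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: a tree is written as a `(label, children)` tuple, sibling labels are taken to be distinct so that a node is identified by its path from the root, and the function name `is_rooted_subtree` and the sample trees are hypothetical, not taken from the paper.

```python
# A tree is (label, [child trees]).
# Assumption: sibling labels are distinct, so matching children by label is unambiguous.

def is_rooted_subtree(rst, qpt):
    """Return True if rst shares qpt's root and all of rst's nodes and edges occur in qpt."""
    rst_label, rst_children = rst
    qpt_label, qpt_children = qpt
    if rst_label != qpt_label:
        return False  # root(RST) must equal root(QPT)
    qpt_by_label = {child[0]: child for child in qpt_children}
    # Every edge of the RST must also be an edge of the QPT, recursively.
    return all(
        child[0] in qpt_by_label and is_rooted_subtree(child, qpt_by_label[child[0]])
        for child in rst_children
    )

# Hypothetical example trees, loosely modeled on the "book" queries.
qpt = ("book", [("title", []), ("author", [("fn", []), ("ln", [])]), ("price", [])])
rst = ("book", [("title", []), ("author", []), ("price", [])])

print(is_rooted_subtree(rst, qpt))
print(is_rooted_subtree(("book", [("section", [])]), qpt))
```

Matching children by label keeps the sketch short; a tree with repeated sibling labels would need a proper matching step instead.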
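4 QPT [Figure: three query pattern trees QPT 1, QPT 2 and QPT 3, each rooted at book, over the labels title, author, price, fn, ln and section; and a rooted subtree RST consisting of book with children title, author, price]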
5 Approximate Counting. rst.count_app ≥ (σ - ε)N. rst.count_app ≥ rst.count_true - εN. The XQuery stream is divided into buckets of width w; the current bucket is b_current.
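The two guarantees above can be made concrete with a small Python sketch. The formulas for w and b_current did not survive in the slide text, so the sketch assumes the usual lossy-counting definitions, w = ⌈1/ε⌉ and b_current = ⌈N/w⌉; these definitions and all function names here are assumptions for illustration, not something the slides confirm.

```python
import math

def bucket_width(epsilon):
    # Assumption (standard lossy counting): w = ceil(1 / epsilon).
    return math.ceil(1 / epsilon)

def current_bucket(n, epsilon):
    # Assumption (standard lossy counting): b_current = ceil(N / w).
    return math.ceil(n / bucket_width(epsilon))

def is_reported(count_app, sigma, epsilon, n):
    # Output condition from the slide: rst.count_app >= (sigma - epsilon) * N.
    return count_app >= (sigma - epsilon) * n

def count_bounds_hold(count_app, count_true, epsilon, n):
    # Error bound from the slide: count_app >= count_true - epsilon * N.
    return count_app >= count_true - epsilon * n

print(bucket_width(0.25), current_bucket(10, 0.25))
print(is_reported(count_app=3, sigma=0.5, epsilon=0.25, n=10))
```

With ε = 0.25 the stream is cut into buckets of four queries each, so after ten queries the third bucket is being filled.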
6 D-GQPT [Figure: the D-GQPT rooted at book, with numbered nodes over the labels title, author, fn, ln, section, price, title; RST 3 (book with children title, author, price); string encoding shown: 1,2,-1,3,-1,8,-1]
7 D-GQPT [Figure: the same D-GQPT after renumbering; RST 3 (book with children title, author, price); string encoding shown: 1,2,-1,4,-1,9,-1]
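The encodings on these two slides look like a pre-order listing of node ids in which -1 marks a step back up to the parent. A minimal Python sketch of such an encoder follows; the traversal rule is inferred from the two strings alone, and the function name `encode` and the child-list representation are hypothetical.

```python
def encode(node_id, children):
    """Pre-order string encoding: emit each node id, then -1 when backtracking from a child.

    children maps a node id to the ordered list of its child ids in the RST.
    Assumption: no -1 is emitted after the root itself, matching the slide strings.
    """
    out = [node_id]
    for child in children.get(node_id, []):
        out.extend(encode(child, children))
        out.append(-1)  # step back up to node_id
    return out

# Root 1 with three leaf children, using the ids shown on the first slide.
print(",".join(map(str, encode(1, {1: [2, 3, 8]}))))
# Same shape with the ids shown on the second slide.
print(",".join(map(str, encode(1, {1: [2, 4, 9]}))))
```

Two RSTs with the same shape but different node ids get different strings, which is consistent with the encoding changing between the two slides.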
8 ECTree [Figure: ECTree whose levels are connected by alternating G_join and G_rmlne expansion steps]
9 Candidate Generation. Rightmost active leaf node expansion. [Formulas: definitions of G_rmlne( ) and G_join( ), with index j = i+1, …, N]
10 Prune. Case 1, RST_{k+1} does not exist in the ECTree: set RST_{k+1}.Δ = b_current - β; if |RST_{k+1}.tidlist| < β, prune. Case 2, RST_{k+1} exists in the ECTree: set RST_{k+1}.count_app = RST_{k+1}.count_app + |RST_{k+1}.tidlist|; if RST_{k+1}.count_app + RST_{k+1}.Δ < b_current, prune. Pruning also applies to the join results with RST_{k+1} and to the subtrees induced by RST_{k+1}.
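The two pruning cases above can be sketched as a single update function in Python. This is a rough illustration, not the paper's algorithm: the ECTree is stood in for by a plain dictionary, β is passed in as a parameter because the slide does not define it, and both the initial count for a new entry (the tidlist length) and the deletion of pruned entries are assumptions. The name `update_and_prune` is hypothetical.

```python
def update_and_prune(ectree, key, tidlist, b_current, beta):
    """Apply the slide's two pruning cases to one candidate RST.

    ectree maps a candidate key to {"count_app": int, "delta": int}.
    Returns True if the candidate is pruned, False if it is kept.
    """
    entry = ectree.get(key)
    if entry is None:
        # Case 1: candidate not yet in the ECTree.
        if len(tidlist) < beta:
            return True
        # Assumption: a new entry starts with count_app = |tidlist|.
        ectree[key] = {"count_app": len(tidlist), "delta": b_current - beta}
        return False
    # Case 2: candidate already tracked; add the new occurrences.
    entry["count_app"] += len(tidlist)
    if entry["count_app"] + entry["delta"] < b_current:
        del ectree[key]  # assumption: pruned entries are removed
        return True
    return False

tree = {}
print(update_and_prune(tree, "rst_a", [1], b_current=5, beta=2))
print(update_and_prune(tree, "rst_a", [1, 2, 3], b_current=5, beta=2))
print(tree)
```

The sketch leaves out the cascading step in which join results and induced subtrees of a pruned RST are discarded as well.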
11 AppXQSMiner
12 AppXQSMiner
13 ECTree [Figure: ECTree whose levels are connected by alternating G_join and G_rmlne expansion steps]
14 Experiment. Pentium 4 2.4 GHz, 1 GB RAM, Windows XP. DBLP DTD: 98 nodes. Shakespeare's Play DTD: 23 nodes.
15 Experiment error=0.1 σ
16 Experiment error = 0.1 σ
17 Experiment sup = 0.005
18 Experiment sup = 0.005
19 Experiment error = 0.05 σ
20 Experiment error = 0.05 σ