ACM SIGMOD International Conference on Management of Data, Beijing, June 14th, 2007. Keyword Search on Relational Data Streams. Alexander Markowetz, Yin Yang, Dimitris Papadias.


1 Keyword Search on Relational Data Streams
Alexander Markowetz, Yin Yang, Dimitris Papadias
Hong Kong University of Science and Technology

2 Replacing SQL With Keywords
Query: “Tarantino, Travolta”

3 SQL queries & Operator Trees
Select * From actors A1, plays P1, movies M, plays P2, actors A2
Where A1.name = 'Tarantino' and A2.name = 'Travolta'
  and A1.aid = P1.aid and P1.mid = M.mid
  and M.mid = P2.mid and P2.aid = A2.aid;
Select * From directors D, movies M, plays P, actors A
Where D.name = 'Tarantino' and A.name = 'Travolta'
  and D.did = M.did and M.mid = P.mid and P.aid = A.aid;
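The two SQL interpretations of the keyword query can be tried out on a toy database. The following is a minimal sketch using Python's built-in `sqlite3` module; the table layout and the sample rows are assumptions made for illustration (the slide does not give a schema), and the queries select only the movie title instead of `*` to keep the output short.

```python
import sqlite3

# Hypothetical toy schema and data, not taken from the slides.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE actors(aid INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE directors(did INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE movies(mid INTEGER PRIMARY KEY, title TEXT, did INTEGER);
CREATE TABLE plays(aid INTEGER, mid INTEGER);
INSERT INTO actors VALUES (1, 'Tarantino'), (2, 'Travolta');
INSERT INTO directors VALUES (1, 'Tarantino');
INSERT INTO movies VALUES (10, 'Pulp Fiction', 1);
INSERT INTO plays VALUES (1, 10), (2, 10);
""")

# Interpretation 1: both keywords are actors in the same movie.
ACTOR_ACTOR = """
SELECT M.title FROM actors A1, plays P1, movies M, plays P2, actors A2
WHERE A1.name = ? AND A2.name = ?
  AND A1.aid = P1.aid AND P1.mid = M.mid
  AND M.mid = P2.mid AND P2.aid = A2.aid
"""

# Interpretation 2: first keyword is the director, second an actor.
DIRECTOR_ACTOR = """
SELECT M.title FROM directors D, movies M, plays P, actors A
WHERE D.name = ? AND A.name = ?
  AND D.did = M.did AND M.mid = P.mid AND P.aid = A.aid
"""

def keyword_search(k1, k2):
    """Run every SQL interpretation of the two-keyword query."""
    results = []
    for label, sql in (("actor-actor", ACTOR_ACTOR),
                       ("director-actor", DIRECTOR_ACTOR)):
        for (title,) in conn.execute(sql, (k1, k2)):
            results.append((label, title))
    return results
```

A keyword search system would have to generate such queries automatically from the schema; here they are written by hand.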

4 Data Graph G
Nodes = Tuples
Edges = “can be joined”
Supports query processing
–[Bhalotia et al., ICDE, 2002]
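As a rough illustration of the data graph, the sketch below turns tuples into nodes and adds an undirected edge wherever a foreign key value matches an existing tuple. The tuple identifiers, the sample data and the `foreign_keys` mapping are hypothetical; "can be joined" is assumed here to mean a primary-key/foreign-key match.

```python
from collections import defaultdict

# Hypothetical toy tuples: (relation, primary key) -> attributes.
tuples = {
    ("actors", 1): {"name": "Tarantino"},
    ("actors", 2): {"name": "Travolta"},
    ("movies", 10): {"title": "Pulp Fiction"},
    ("plays", 100): {"aid": 1, "mid": 10},
    ("plays", 101): {"aid": 2, "mid": 10},
}

# Assumed foreign keys: (referencing relation, attribute) -> referenced relation.
foreign_keys = {("plays", "aid"): "actors", ("plays", "mid"): "movies"}

def build_data_graph(tuples, foreign_keys):
    """Return an adjacency map: node = tuple id, edge = joinable pair."""
    graph = defaultdict(set)
    for node, attrs in tuples.items():
        graph[node]  # register every tuple as a node, even if isolated
        relation = node[0]
        for attr, value in attrs.items():
            target_relation = foreign_keys.get((relation, attr))
            if target_relation is None:
                continue
            target = (target_relation, value)
            if target in tuples:
                graph[node].add(target)
                graph[target].add(node)
    return graph

graph = build_data_graph(tuples, foreign_keys)
```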

5 MTJNT
Sub-graph of G
–Contains all keywords
–Minimal
Answers an R-KWS query
Limited to T_max nodes
–Longer joins = irrelevant results
[Hristidis et al., VLDB 2002]
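The three conditions on this slide (contains all keywords, minimal, at most T_max nodes) can be checked mechanically for a candidate set of tuples. The sketch below is one possible reading, not the authors' algorithm: it assumes that "minimal" means no single tuple can be removed while the rest stays connected and still covers all keywords, and that keyword matching is a plain substring test. The graph, the `text` map and all names are hypothetical.

```python
def is_connected(nodes, graph):
    """True if the node set induces a connected sub-graph."""
    nodes = set(nodes)
    if not nodes:
        return False
    start = next(iter(nodes))
    seen, stack = {start}, [start]
    while stack:
        current = stack.pop()
        for neighbor in graph.get(current, ()):
            if neighbor in nodes and neighbor not in seen:
                seen.add(neighbor)
                stack.append(neighbor)
    return seen == nodes

def covers(nodes, text, keywords):
    """True if every keyword occurs in the text of at least one node."""
    return all(any(k in text.get(n, "") for n in nodes) for k in keywords)

def is_mtjnt(nodes, graph, text, keywords, t_max):
    nodes = set(nodes)
    if len(nodes) > t_max:
        return False
    if not is_connected(nodes, graph) or not covers(nodes, text, keywords):
        return False
    # Minimality (assumed reading): dropping any single tuple must break
    # either connectivity or keyword coverage.
    for n in nodes:
        rest = nodes - {n}
        if rest and is_connected(rest, graph) and covers(rest, text, keywords):
            return False
    return True

# Hypothetical data graph: a1 - p1 - m - p2 - a2, plus an extra p3 - m edge.
graph = {
    "a1": {"p1"}, "p1": {"a1", "m"}, "m": {"p1", "p2", "p3"},
    "p2": {"m", "a2"}, "a2": {"p2"}, "p3": {"m"},
}
text = {"a1": "Tarantino", "a2": "Travolta", "m": "Pulp Fiction"}
keywords = ["Tarantino", "Travolta"]
path = {"a1", "p1", "m", "p2", "a2"}
```

With these toy values, `path` qualifies for `t_max = 5`, while adding the superfluous tuple `p3` violates minimality.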

6 Candidate Networks (CN)
Abstractions of MTJNT

7 Operator Trees for Candidate Networks
CN output: MTJNT
OP-Tree
–Leaves = Selections
–Inner nodes = Joins
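To make the operator tree concrete, here is a minimal sketch that evaluates the director/actor candidate network bottom-up: leaves are selections (possibly without a keyword predicate), inner nodes are hash joins. The helper names `select` and `join`, the prefixed attribute names and the sample rows are all assumptions made for this example.

```python
def select(rows, attr=None, keyword=None):
    """Leaf operator: keep rows whose attribute contains the keyword.
    Without a keyword, all rows pass (a leaf with no keyword predicate)."""
    if keyword is None:
        return list(rows)
    return [r for r in rows if keyword in str(r.get(attr, ""))]

def join(left, right, left_attr, right_attr):
    """Inner node: equi-join using a hash index on the right input."""
    index = {}
    for r in right:
        index.setdefault(r[right_attr], []).append(r)
    return [{**l, **r} for l in left for r in index.get(l[left_attr], [])]

# Hypothetical toy relations with prefixed attribute names.
directors = [{"d.did": 1, "d.name": "Tarantino"}]
movies = [{"m.mid": 10, "m.did": 1, "m.title": "Pulp Fiction"}]
plays = [{"p.aid": 1, "p.mid": 10}, {"p.aid": 2, "p.mid": 10}]
actors = [{"a.aid": 1, "a.name": "Tarantino"},
          {"a.aid": 2, "a.name": "Travolta"}]

# Operator tree for: directors{Tarantino} - movies - plays - actors{Travolta}
result = join(
    join(
        join(select(directors, "d.name", "Tarantino"), select(movies),
             "d.did", "m.did"),
        select(plays), "m.mid", "p.mid"),
    select(actors, "a.name", "Travolta"), "p.aid", "a.aid")
```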

8 Challenges of S-KWS, with regard to R-KWS:
Stream Semantics
Efficient CN-Generation
Optimized Operator Execution
Stream Specific Issues

9 Instantaneous Data Graph

10 Architecture

11 CN-Generation
Target: Avoid duplicate CN
Idea: Model CN as unique tree
Pre-order generation
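The slide does not spell out the pre-order generation scheme, so the sketch below shows a different, simpler way to recognise duplicate candidate networks: compute a canonical string for each unrooted labelled tree and compare strings. This is a swapped-in technique for illustration only, not the authors' method. Node labels are assumed to contain no parentheses, and all labels and identifiers are hypothetical.

```python
def encode(node, parent, labels, adj):
    """Canonical string of the subtree rooted at `node` (children sorted)."""
    children = sorted(
        encode(child, node, labels, adj)
        for child in adj[node] if child != parent
    )
    return "(" + labels[node] + "".join(children) + ")"

def canonical(labels, edges):
    """Canonical form of an unrooted labelled tree: the smallest rooted
    encoding over all possible roots."""
    adj = {n: [] for n in labels}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    return min(encode(root, None, labels, adj) for root in labels)

# The same CN written down twice with different node ids and edge order.
cn1 = ({0: "actors{Tarantino}", 1: "plays", 2: "movies",
        3: "plays", 4: "actors{Travolta}"},
       [(0, 1), (1, 2), (2, 3), (3, 4)])
cn2 = ({"x": "actors{Travolta}", "y": "plays", "z": "movies",
        "w": "plays", "v": "actors{Tarantino}"},
       [("z", "y"), ("y", "x"), ("w", "z"), ("v", "w")])
# A genuinely different CN.
cn3 = ({0: "directors{Tarantino}", 1: "movies", 2: "plays",
        3: "actors{Travolta}"},
       [(0, 1), (1, 2), (2, 3)])
```

Filtering duplicates after the fact costs one encoding per root per tree; generating each tree exactly once, as the slide proposes, avoids producing the duplicates in the first place.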

12 Operator Mesh

13 Demand Driven Operator Execution (I)

14 Demand Driven Operator Execution (II)

15 Dynamic Mesh Expansion
Create operators when input is available
Destroy, if no input
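The create-on-input, destroy-when-empty rule can be sketched with a small registry of operator buffers. This is only a toy model under stated assumptions: operators are identified by hypothetical string names, and an "operator" is reduced to its input buffer; the actual mesh in the talk also wires operators to one another.

```python
class LazyMesh:
    """Toy registry: an operator exists only while its buffer is non-empty."""

    def __init__(self):
        self.buffers = {}  # operator name -> list of buffered tuples

    def push(self, operator, tup):
        # Create the operator on first input.
        self.buffers.setdefault(operator, []).append(tup)

    def expire(self, operator, tup):
        buffer = self.buffers.get(operator)
        if buffer and tup in buffer:
            buffer.remove(tup)
            if not buffer:
                # Destroy the operator once it has no input left.
                del self.buffers[operator]

    def live_operators(self):
        return sorted(self.buffers)
```

For example, pushing one tuple into a hypothetical operator `"actors_join_plays"` makes it appear in `live_operators()`, and expiring that tuple removes it again.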

16 Purging Expired Tuples from Buffers
Positive / Negative
–Negative tuples remove positives
Sliding windows
1. Bottom-Up
–Similar to Pos./Neg.
2. Lazy
–Expired tuples remain in buffers
–Purged during join execution
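The lazy strategy can be illustrated with a join buffer that never purges on its own: expired tuples stay in place until the next probe, which discards them before matching. This is a minimal sketch assuming a time-based sliding window and tuples arriving in timestamp order; the class and method names are hypothetical.

```python
from collections import deque

class LazyWindowBuffer:
    """Join-side buffer for a time-based sliding window with lazy purging."""

    def __init__(self, window):
        self.window = window
        self.items = deque()  # (timestamp, join key, payload), arrival order

    def insert(self, timestamp, key, payload):
        # No purging here: expired tuples may remain in the buffer.
        self.items.append((timestamp, key, payload))

    def probe(self, now, key):
        # Purge during join execution: drop tuples that left the window.
        while self.items and self.items[0][0] <= now - self.window:
            self.items.popleft()
        return [payload for _, k, payload in self.items if k == key]

buf = LazyWindowBuffer(window=10)
buf.insert(0, "m1", "a")
buf.insert(5, "m1", "b")
buf.insert(8, "m2", "c")
size_before_probe = len(buf.items)   # tuple "a" is still buffered
matches = buf.probe(now=12, key="m1")
size_after_probe = len(buf.items)
```

The trade-off is that buffers may temporarily hold expired tuples, in exchange for not having to propagate expirations through the operators.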

17 Mesh Migration
Schema changes at runtime
Stream removal
–Trivial
Stream arrival
–Generate new operator mesh, on top of old
–Preserve intermediate results
–Observe node order

18 Window Size
PM (Partial Mesh) excels in memory, FM (Full Mesh) in speed
Forest always ranks last
–Omitted from remaining slides
[Figures: CPU Consumption; Peak Memory]

19 Keyword Frequency
No impact on number of tuples or joinability
More MTJNT
–Require CPU and memory for production and storage
FM loses memory advantage
–Tuples travel higher in mesh
[Figures: CPU Consumption; Peak Memory]

20 Number of Streaming Relations
Linear growth in mesh
–No more than 4 neighbors in schema
Relative performance as expected
[Figures: CPU Consumption; Peak Memory]

21 Join Selectivity
Quadratic increase in number of graph edges
–Increasing intermediate results and MTJNT
–Increasing CPU and memory consumption
More tuples travel up the mesh:
–PM (re-)generates more operators; hence requires more CPU
[Figures: CPU Consumption; Peak Memory]

22 Number of Keywords
Exponential growth in the complexity of the operator mesh
–Reflected in initialization time
Most operators in large meshes commonly idle
–Increased savings through PM
[Figures: CPU Consumption; Peak Memory]

23 Maximal size of results T_max
Similar impact to |K|
–Exponential growth of mesh
–Reflected by initialization phase
Exponential growth of results
[Figures: CPU Consumption; Peak Memory]

24 Demand Driven Operator Execution
Reduced CPU
–Avoided unnecessary joins
Reduced memory consumption
–Avoided storage of many intermediate results
[Figures: CPU Consumption; Peak Memory]

25 Conclusion
Benefits larger than R-KWS
More intricate than R-KWS
Contributions:
–S-KWS semantics
–CN generation
–S-KWS query processing
Full Mesh
Partial Mesh
Optimizations

26 Future Work
Advanced Demand Driven Operator Execution
Graph Based S-KWS
Parallel queries
Top-k results
Combination with R-KWS

27 Thank You for Listening
Questions?

