Keyword Search on Relational Data Streams
  Alexander Markowetz, Yin Yang, Dimitris Papadias
  Hong Kong University of Science and Technology
  ACM SIGMOD International Conference on Management of Data, Beijing, June 14th, 2007
Replacing SQL With Keywords
  Query: “Tarantino, Travolta”
SQL Queries & Operator Trees
  SELECT *
  FROM actors A1, plays P1, movies M, plays P2, actors A2
  WHERE A1.name = 'Tarantino' AND A2.name = 'Travolta'
    AND A1.aid = P1.aid AND P1.mid = M.mid
    AND M.mid = P2.mid AND P2.aid = A2.aid;

  SELECT *
  FROM directors D, movies M, plays P, actors A
  WHERE D.name = 'Tarantino' AND A.name = 'Travolta'
    AND D.did = M.did AND M.mid = P.mid AND P.aid = A.aid;
Data Graph G
  Nodes = tuples
  Edges = “can be joined”
  Supports query processing
    – [Bhalotia et al., ICDE 2002]
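A minimal sketch of how such a data graph could be built, assuming tuples are joinable exactly when they agree on a foreign-key column. The toy tuples, the `foreign_keys` list, and the function name `build_data_graph` are illustrative assumptions, not taken from the slides.

```python
from collections import defaultdict

# Hypothetical toy tuples: (relation, primary key) -> attribute values.
tuples = {
    ("actors", 1): {"aid": 1, "name": "Travolta"},
    ("actors", 2): {"aid": 2, "name": "Tarantino"},
    ("movies", 10): {"mid": 10, "title": "Pulp Fiction"},
    ("plays", 100): {"aid": 1, "mid": 10},
    ("plays", 101): {"aid": 2, "mid": 10},
}

# Assumed foreign-key links: (referencing relation, referenced relation, shared column).
foreign_keys = [("plays", "actors", "aid"), ("plays", "movies", "mid")]


def build_data_graph(tuples, foreign_keys):
    """Return an undirected adjacency map: nodes are tuples, edges mean 'can be joined'."""
    graph = defaultdict(set)
    for src_rel, dst_rel, col in foreign_keys:
        for src_id, src in tuples.items():
            if src_id[0] != src_rel:
                continue
            for dst_id, dst in tuples.items():
                if dst_id[0] == dst_rel and src[col] == dst[col]:
                    graph[src_id].add(dst_id)
                    graph[dst_id].add(src_id)
    return graph


graph = build_data_graph(tuples, foreign_keys)
```

The nested loop is quadratic in the number of tuples and is only meant to make the node and edge definitions concrete; a real system would use indexes on the join columns.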
MTJNT (Minimal Total Joining Network of Tuples)
  Sub-graph of G
    – Contains all keywords
    – Minimal
  Answer to an R-KWS query
  Limited to T_max nodes
    – Longer joins = irrelevant results
  [Hristidis et al., VLDB 2002]
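The three conditions (contains all keywords, minimal, at most T_max nodes) can be checked directly on a candidate node set. The sketch below assumes "minimal" means that no single tuple can be dropped while the rest stays connected and still covers every keyword. The graph, the `keywords_of` map, and all function names are hypothetical.

```python
def is_connected(nodes, graph):
    """True if the sub-graph induced by `nodes` is non-empty and connected."""
    nodes = set(nodes)
    if not nodes:
        return False
    start = next(iter(nodes))
    seen, stack = {start}, [start]
    while stack:
        cur = stack.pop()
        for nxt in graph.get(cur, ()):
            if nxt in nodes and nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen == nodes


def covers(nodes, keywords_of, query):
    """True if the tuples in `nodes` together contain every query keyword."""
    found = set()
    for n in nodes:
        found |= keywords_of.get(n, set())
    return query <= found


def is_mtjnt(nodes, graph, keywords_of, query, t_max):
    nodes = set(nodes)
    if len(nodes) > t_max:
        return False
    if not (is_connected(nodes, graph) and covers(nodes, keywords_of, query)):
        return False
    # Minimality: removing any single tuple must break connectivity or coverage.
    for n in nodes:
        rest = nodes - {n}
        if rest and is_connected(rest, graph) and covers(rest, keywords_of, query):
            return False
    return True


# Toy data graph: a1 - p1 - m - p2 - a2, plus director d joined to movie m.
graph = {
    "a1": {"p1"}, "p1": {"a1", "m"}, "m": {"p1", "p2", "d"},
    "p2": {"m", "a2"}, "a2": {"p2"}, "d": {"m"},
}
keywords_of = {"a1": {"tarantino"}, "d": {"tarantino"}, "a2": {"travolta"}}
query = {"tarantino", "travolta"}

actor_path = {"a1", "p1", "m", "p2", "a2"}
director_path = {"d", "m", "p2", "a2"}
```

With T_max = 5 both paths qualify, while their union fails the minimality test because `a1` can be dropped without losing a keyword.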
Candidate Networks (CN)
  Abstractions of MTJNT
Operator Trees for Candidate Networks
  CN
    – Output: MTJNT
  OP-Tree
    – Leaves = Selections
    – Inner nodes = Joins
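As an illustration, an operator tree with selections at the leaves and joins at the inner nodes might be evaluated as follows. This is a sketch over static toy relations, not the streaming operators of the talk; the `Select` and `Join` classes, the prefixed column names, and the always-true selections on relations without keywords are simplifying assumptions.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Select:
    """Leaf operator: filters the rows of one relation."""
    rows: list
    predicate: Callable

    def evaluate(self):
        return [r for r in self.rows if self.predicate(r)]


@dataclass
class Join:
    """Inner operator: nested-loop equi-join of two child operators."""
    left: object
    right: object
    left_col: str
    right_col: str

    def evaluate(self):
        right_rows = self.right.evaluate()
        return [
            {**l, **r}
            for l in self.left.evaluate()
            for r in right_rows
            if l[self.left_col] == r[self.right_col]
        ]


# Hypothetical toy relations; columns are prefixed to avoid name clashes.
directors = [{"D.did": 1, "D.name": "Tarantino"}, {"D.did": 2, "D.name": "Scott"}]
movies = [{"M.mid": 10, "M.did": 1, "M.title": "Pulp Fiction"},
          {"M.mid": 11, "M.did": 2, "M.title": "Alien"}]
plays = [{"P.mid": 10, "P.aid": 7}, {"P.mid": 11, "P.aid": 8}]
actors = [{"A.aid": 7, "A.name": "Travolta"}, {"A.aid": 8, "A.name": "Weaver"}]

# Operator tree for the CN directors - movies - plays - actors.
tree = Join(
    Join(
        Join(Select(directors, lambda r: r["D.name"] == "Tarantino"),
             Select(movies, lambda r: True), "D.did", "M.did"),
        Select(plays, lambda r: True), "M.mid", "P.mid"),
    Select(actors, lambda r: r["A.name"] == "Travolta"), "P.aid", "A.aid")

results = tree.evaluate()
```

Each row in `results` corresponds to one joined network of tuples produced by this candidate network.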
Challenges…
  …of S-KWS, with regard to R-KWS:
  Stream Semantics
  Efficient CN-Generation
  Optimized Operator Execution
  Stream Specific Issues
Instantaneous Data Graph
Architecture
CN-Generation
  Target: Avoid duplicate CNs
  Idea: Model each CN as a unique tree
  Pre-order generation
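One generic way to see why a unique tree representation removes duplicates is to give every CN a canonical string and compare strings. The sketch below is not the pre-order generation procedure from the talk. It swaps in a standard canonical encoding for unordered labeled trees (sort the child encodings, then take the smallest encoding over all possible roots). The function names and the `^keyword` label notation are assumptions.

```python
def rooted_code(tree, labels, node, parent=None):
    """Encode the subtree at `node`; sorting children makes the code order-independent."""
    child_codes = sorted(
        rooted_code(tree, labels, child, node)
        for child in tree[node]
        if child != parent
    )
    return "(" + labels[node] + "".join(child_codes) + ")"


def canonical_code(tree, labels):
    """Smallest rooted encoding over all roots: identical for isomorphic CNs."""
    return min(rooted_code(tree, labels, root) for root in tree)


# A path of five nodes, numbered 0..4 (adjacency lists).
path5 = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
path4 = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}

# The same CN written down in two different node orders.
cn1 = (path5, {0: "A^Tarantino", 1: "P", 2: "M", 3: "P", 4: "A^Travolta"})
cn2 = (path5, {0: "A^Travolta", 1: "P", 2: "M", 3: "P", 4: "A^Tarantino"})
# A genuinely different CN going through the directors relation.
cn3 = (path4, {0: "D^Tarantino", 1: "M", 2: "P", 3: "A^Travolta"})

unique = {canonical_code(tree, labels) for tree, labels in (cn1, cn2, cn3)}
```

Trying every root costs quadratic time per tree, which is acceptable for small CNs bounded by T_max. Generating each CN exactly once, as the slide proposes, avoids the comparison step altogether.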
Operator Mesh
Demand Driven Operator Execution (I)
Demand Driven Operator Execution (II)
Dynamic Mesh Expansion
  Create operators when input is available
  Destroy them if there is no input
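The create-on-input and destroy-on-empty policy can be pictured with a small registry of operators keyed by an identifier. This is only a sketch of the lifecycle rule: the `LazyMesh` class, its method names, and the idea of one flat input buffer per operator are assumptions made for illustration.

```python
class LazyMesh:
    """Keeps an operator alive only while it has buffered input."""

    def __init__(self):
        self.operators = {}  # operator id -> list of buffered input tuples

    def on_insert(self, op_id, tup):
        # The operator is created the first time input arrives for it.
        self.operators.setdefault(op_id, []).append(tup)

    def on_expire(self, op_id, tup):
        buffer = self.operators.get(op_id)
        if buffer is None:
            return
        if tup in buffer:
            buffer.remove(tup)
        # The operator is destroyed once its input buffer is empty.
        if not buffer:
            del self.operators[op_id]


mesh = LazyMesh()
mesh.on_insert("actors JOIN plays", ("a1", "p1"))
created = "actors JOIN plays" in mesh.operators
mesh.on_expire("actors JOIN plays", ("a1", "p1"))
destroyed = "actors JOIN plays" not in mesh.operators
```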
Purging Expired Tuples from Buffers
  Positive / Negative
    – Negative tuples remove positives
  Sliding windows
    1. Bottom-Up
       Similar to Pos./Neg.
    2. Lazy
       Expired tuples remain in buffers
       Purged during join execution
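A minimal sketch of the lazy variant, assuming a time-based sliding window, tuples arriving in timestamp order, and a tuple counting as expired once its age reaches the window length. The `WindowBuffer` class and its methods are hypothetical names.

```python
from collections import deque


class WindowBuffer:
    """Join input buffer that purges expired tuples only when it is probed."""

    def __init__(self, window):
        self.window = window
        self.items = deque()  # (timestamp, join key, payload), oldest first

    def insert(self, ts, key, payload):
        # No purging here: expired tuples simply stay in the buffer.
        self.items.append((ts, key, payload))

    def probe(self, now, key):
        # Lazy purge: drop expired tuples as part of join execution.
        while self.items and self.items[0][0] <= now - self.window:
            self.items.popleft()
        return [payload for _, k, payload in self.items if k == key]


buf = WindowBuffer(window=10)
buf.insert(0, "m1", "a")
buf.insert(5, "m1", "b")
buf.insert(8, "m2", "c")
size_before = len(buf.items)      # the tuple from time 0 is still buffered
matches = buf.probe(now=12, key="m1")
size_after = len(buf.items)       # the probe removed the expired tuple
```

The trade-off is that buffers may temporarily hold dead tuples, in exchange for not running a separate purge pass on every window slide.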
Mesh Migration
  Schema changes at runtime
  Stream removal
    – Trivial
  Stream arrival
    – Generate new operator mesh, on top of old
    – Preserve intermediate results
    – Observe node order
Window Size
  PM (Partial Mesh) excels in memory, FM (Full Mesh) in speed
  Forest always ranks last
    – Omitted from remaining slides
  Charts: CPU Consumption, Peak Memory
Keyword Frequency
  No impact on number of tuples or joinability
  More MTJNT
    – Require CPU and memory for production and storage
  FM loses memory advantage
    – Tuples travel higher in mesh
  Charts: CPU Consumption, Peak Memory
Number of Streaming Relations
  Linear growth in mesh
    – No more than 4 neighbors in schema
  Relative performance as expected
  Charts: CPU Consumption, Peak Memory
Join Selectivity
  Quadratic increase in number of graph edges
    – Increasing intermediate results and MTJNT
    – Increasing CPU and memory consumption
  More tuples travel up the mesh:
    – PM (re-)generates more operators; hence requires more CPU
  Charts: CPU Consumption, Peak Memory
Number of Keywords
  Exponential growth in the complexity of the operator mesh
    – Reflected in initialization time
  Most operators in large meshes are commonly idle
    – Increased savings through PM
  Charts: CPU Consumption, Peak Memory
Maximal Size of Results T_max
  Similar impact to |K|
    – Exponential growth of mesh
    – Reflected by initialization phase
  Exponential growth of results
  Charts: CPU Consumption, Peak Memory
Demand Driven Operator Execution
  Reduced CPU
    – Avoided unnecessary joins
  Reduced memory consumption
    – Avoided storage of many intermediate results
  Charts: CPU Consumption, Peak Memory
Conclusion
  Benefits larger than R-KWS
  More intricate than R-KWS
  Contributions:
    – S-KWS semantics
    – CN generation
    – S-KWS query processing
        Full Mesh
        Partial Mesh
        Optimizations
Future Work
  Advanced Demand Driven Operator Execution
  Graph Based S-KWS
  Parallel queries
  Top-k results
  Combination with R-KWS
Thank You for Listening
  Questions?