1
Continuous Data Stream Processing
MAKE Lab
Date: 2006/03/07
Post-Excellence Project, Subproject 6
2
[Figure: system architecture. Components: Clustering engine, Music metadata, Music Virtual Channel (1, 2, …, N), Music collections, Internet, V.C. player, Filtering engine, Music channel simulator, Interface, Profile monitor, Channel monitor, Favorite channel, Cluster monitor, Cluster coordinator, Peer search engine, Profile database, MusicXML database, XML Filtering engine]
3
Research Directions
Streaming Data Management: Mining, Filtering, Temporal Query Processing, Spatial Query Processing, Aggregate Query Processing
Topics: Frequent Tree Pattern Mining, Frequent Itemset Mining (sliding window), Sequence Query Matching, Episode Query Matching, Range Search, KNN Search, Top-K Search, Closed Tree Pattern Mining, Frequent Itemset Mining (landmark model)
4
Sequence Query Matching
Given a set of sequence queries (SQs), how can we continuously monitor the event stream and, as soon as a segment arrives, report it if it is an approximate answer to some query within that query's error bound?
[Figure: an event stream and a sequence query with error bound ε = 1]
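The slide does not say how the approximation error is measured. As a minimal sketch, assume the error is the number of mismatched positions (Hamming distance) between the query and the most recent window of the same length. The function name `monitor` and the sample stream are hypothetical.

```python
from collections import deque


def monitor(stream, query, eps):
    """Yield (end_index, segment) for every window of the stream that
    differs from `query` in at most `eps` positions.

    Assumption: the error measure is Hamming distance; the slides do
    not specify it.
    """
    window = deque(maxlen=len(query))
    for i, event in enumerate(stream):
        window.append(event)
        if len(window) == len(query):
            mismatches = sum(a != b for a, b in zip(window, query))
            if mismatches <= eps:
                # Report as soon as the segment is complete.
                yield i, tuple(window)


# Hypothetical event stream and query with error bound eps = 1.
matches = list(monitor("abcabdabc", "abc", eps=1))
match_ends = [end for end, _ in matches]
```

Each match is reported at the index where its last event arrives, which mirrors the requirement to answer as soon as the segment arrives.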
5
Episode Query Matching
Knowledge Discovery from Telecommunication Network Alarm Databases [ICDE96]
  If an alarm of type A occurs, then an alarm of type B occurs within 30 seconds with probability 0.8.
  If alarms of types A and B occur within 5 seconds, then an alarm of type C occurs within 60 seconds with probability 0.7.
  If an alarm of type A precedes an alarm of type B, and C precedes D, all within 15 seconds, then E will follow within 4 minutes with probability 0.6.
[Figure: episode diagrams with time windows of 5 seconds and 15 seconds]
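To make the first kind of rule concrete, the sketch below estimates the probability in "if A occurs, then B occurs within w seconds" from a time-sorted alarm log. It is an illustration only, not the method of the cited paper; the function `rule_confidence` and the sample log are hypothetical.

```python
def rule_confidence(alarms, antecedent, consequent, window):
    """Fraction of `antecedent` alarms followed by a `consequent`
    alarm within `window` seconds.

    alarms: list of (timestamp_seconds, alarm_type), sorted by time.
    """
    hits = total = 0
    for i, (t, kind) in enumerate(alarms):
        if kind != antecedent:
            continue
        total += 1
        # Scan forward only while events are still inside the window.
        for t2, kind2 in alarms[i + 1:]:
            if t2 - t > window:
                break
            if kind2 == consequent:
                hits += 1
                break
    return hits / total if total else 0.0


# Hypothetical alarm log: (time in seconds, alarm type).
log = [(0, "A"), (10, "B"), (50, "A"), (100, "B"),
       (120, "A"), (125, "C"), (140, "B")]
conf = rule_confidence(log, "A", "B", window=30)
```

In this log, two of the three A alarms are followed by a B within 30 seconds, so the estimated probability is 2/3.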
6
Top-K Query
Suppose two continuous queries are registered. Then another continuous query is registered.
[Figure: a coordinator connected to Server 1, Server 2, Server 3, and Server 4]
Queries
  Which two web documents are the most popular across the first and second servers?
  Which two web documents are the most popular across the third and fourth servers?
  Which two web documents are the most popular across the second and third servers?
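As a baseline for what each query asks, the sketch below has the coordinator sum per-document counts over the servers a query names and take the k largest. This is a naive, centralized evaluation under assumed data, not a communication-efficient protocol; `top_k` and the hit counts are hypothetical.

```python
from collections import Counter


def top_k(server_counts, servers, k):
    """Return the k most popular documents across the given servers.

    server_counts: {server_id: {document: hit_count}}
    """
    total = Counter()
    for s in servers:
        total.update(server_counts[s])  # adds counts per document
    return [doc for doc, _ in total.most_common(k)]


# Hypothetical hit counts on four servers.
counts = {
    1: {"a": 5, "b": 3, "c": 1},
    2: {"a": 1, "b": 4, "d": 2},
    3: {"c": 6, "d": 5, "a": 1},
    4: {"d": 1, "e": 9, "c": 2},
}
q12 = top_k(counts, [1, 2], k=2)
q34 = top_k(counts, [3, 4], k=2)
q23 = top_k(counts, [2, 3], k=2)
```

The queries over servers (1, 2) and (2, 3) both need the counts of server 2, which is the kind of overlap that makes sharing information between queries attractive.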
7
Main Difficulties
Heavy Communication Cost
  The server only updates its current data when necessary
Multiple Continuous Queries
  Most papers focus on one-time top-k queries or a single continuous top-k query
  Information sharing is necessary
8
Spatial Query Processing
Continuous queries for moving objects in high-dimensional space
  Range search
  KNN search
[Figure: V.C. players and a search engine, with labels: user profile, channel, recommended channel, selected channel, Vote Mechanism]
9
Problem Definition
Given a set of objects with their positions in an N-dimensional (N > 20) region.
The set of objects is highly dynamic: each object can move in an unrestricted fashion, i.e., we do not assume any pattern of motion.
Continuously monitor the results of each query point.
  Range Query
  KNN Query
10
Main Difficulties
Heavy Communication Cost
  Object updates occur only when the results of some queries might change
  Safe Region [SIGMOD05]
Incremental Update
  Efficiently maintain the effective results
Multiple Continuous Queries
  Decide the quarantine area for each query
Mixed Types of Queries
  Support both the range query and the KNN query
[Figure: queries Q1 and Q2]
11
Range Query
Query Q: (x, y), r
Cell C
  A: max < r
  B: min ≤ r ≤ max
  C: min > r
max: maximum dis(query, cell)
min: minimum dis(query, cell)
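The A/B/C labels can be computed from the minimum and maximum distance between the query point and a cell. The sketch below assumes cells are axis-aligned boxes and distances are Euclidean; neither is stated on the slide, and the helper names are hypothetical.

```python
import math


def min_max_dist(q, lo, hi):
    """Min and max Euclidean distance from point q to the box [lo, hi]."""
    dmin2 = dmax2 = 0.0
    for x, a, b in zip(q, lo, hi):
        nearest = max(a - x, 0.0, x - b)        # 0 if x lies inside [a, b]
        farthest = max(abs(x - a), abs(x - b))  # distance to the far side
        dmin2 += nearest ** 2
        dmax2 += farthest ** 2
    return math.sqrt(dmin2), math.sqrt(dmax2)


def classify(q, r, lo, hi):
    """Label a cell for range query (q, r): A, B, or C."""
    dmin, dmax = min_max_dist(q, lo, hi)
    if dmax < r:
        return "A"  # cell lies entirely inside the query range
    if dmin > r:
        return "C"  # cell lies entirely outside the query range
    return "B"      # range boundary crosses the cell


label_inner = classify((0, 0), 5, (1, 1), (2, 2))
label_edge = classify((0, 0), 5, (3, 3), (5, 5))
label_outer = classify((0, 0), 5, (6, 6), (7, 7))
```

Because the loop runs over coordinates, the same code applies unchanged in higher dimensions.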
12
Range Query (Cont.)
Moving Query (MQ)
How to maintain the result for an MQ?
13
Range Query (Cont.)
When to update?
Cell labels for (Q1, Q2, Q3) and the required action:
  A A A: no update and no recalculation
  A A B: update and recalculate for some queries
  A A C: no update and no recalculation
We only need to consider those objects marked with B
flag = 0/1
[Figure: client and server, with queries Q1, Q2, Q3]
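One way to read the rule above: an object has to report its movement only if its cell carries a B label for at least one query, and only those queries need an exact distance check. The sketch below encodes that reading; it assumes the object stays within its cell, and the function names are hypothetical.

```python
def queries_to_recheck(cell_labels):
    """Queries whose result may change when an object moves inside the cell.

    cell_labels: {query_name: "A" | "B" | "C"} for one cell.
    Assumption: the object stays within the same cell.
    """
    return [q for q, label in cell_labels.items() if label == "B"]


def needs_update(cell_labels):
    """True if an object in this cell must send its new position."""
    return bool(queries_to_recheck(cell_labels))


all_inside = {"Q1": "A", "Q2": "A", "Q3": "A"}
on_boundary = {"Q1": "A", "Q2": "A", "Q3": "B"}
one_outside = {"Q1": "A", "Q2": "A", "Q3": "C"}
```

With labels A A B, only Q3 is rechecked; with A A A or A A C, membership in every result is already decided by the cell labels.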
14
Range Query (Cont.)
[Figure: data kept for a range query Q and for a cell C. For a range query Q: a result list (e.g., O3, O5, O7) and lists of covered cells labeled A and B. For a cell C: lists of affected queries labeled A and B.]
Query Motion
15
KNN Query
Query Q: (x, y), k = 3
Object Update
[Figure: object update cases, labeled: update the order, re-computation, update the order]
16
KNN Query (Cont.)
Query Q: (x, y), k = 3
Query Q': (x', y'), r
r = d'_max
17
KNN Query (Cont.)
Query Q: (x, y), k = 3
Query Q': (x', y'), r
r = d_max + d_query
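The radius r = d_max + d_query can be checked with a small experiment. The sketch assumes d_max is the distance from the old query position to its k-th nearest neighbor and d_query is the distance the query point moved; both readings are assumptions, as are the sample points. By the triangle inequality, the old k nearest neighbors all lie within r of the new position, so the range query around the new position returns at least k objects and therefore contains the new k nearest neighbors.

```python
import math


def knn(points, q, k):
    """Brute-force k nearest neighbors of q, nearest first."""
    return sorted(points, key=lambda p: math.dist(p, q))[:k]


def range_search(points, q, r):
    """All points within distance r of q."""
    return [p for p in points if math.dist(p, q) <= r]


# Hypothetical object positions (2-D for readability).
points = [(1, 0), (0, 2), (3, 0), (0, 4), (5, 5), (-6, 0)]
k = 3
q_old, q_new = (0, 0), (1, 1)

old_result = knn(points, q_old, k)
d_max = math.dist(old_result[-1], q_old)  # distance to the k-th neighbor
d_query = math.dist(q_old, q_new)         # how far the query moved
r = d_max + d_query

candidates = range_search(points, q_new, r)
new_result = knn(points, q_new, k)
```

Only the candidates need to be ranked to obtain the new result, instead of all objects.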
18
KNN Query (Cont.)
Query Q: (x, y), k = 3
Query Q': (x', y'), r
r = d_max + d_cell
19
Tree Pattern Mining
As the trees stream in, find the subtrees that occur more than θ·N times, where N is the number of trees received so far and 0 ≤ θ ≤ 1.
[Figure: trees T1, T2, T3 stream into STMer, which outputs frequent tree patterns]
20
Closed Tree Pattern Mining
Mining closed frequent subtrees over data streams
A subtree is closed if none of its proper supertrees has the same support as it.
[Figure: example trees over nodes A, B, C, D, their frequent subtrees with support counts, and the closed subtrees among them]
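The closedness test can be sketched as a filter over already-mined frequent patterns. To keep the example short, each pattern is represented as a set of (parent, child) edges and node labels are assumed unique, so "proper supertree" reduces to "proper superset of edges". That representation, the function name, and the support values are illustrative assumptions, not the data in the figure.

```python
def closed_patterns(support):
    """Keep the patterns that have no proper supertree with equal support.

    support: {frozenset of (parent, child) edges: support count}
    Assumption: unique node labels, so supertree == edge superset.
    """
    closed = {}
    for pattern, count in support.items():
        has_equal_super = any(
            pattern < other and count == other_count  # `<` is proper subset
            for other, other_count in support.items()
        )
        if not has_equal_super:
            closed[pattern] = count
    return closed


ab = frozenset({("A", "B")})
ac = frozenset({("A", "C")})
abc = frozenset({("A", "B"), ("A", "C")})

# Hypothetical support counts for three frequent subtrees.
support = {ab: 3, ac: 2, abc: 2}
closed = closed_patterns(support)
```

Here the pattern A-C is dropped because its supertree with edges A-B and A-C has the same support, while A-B stays because its support is strictly higher than that of any supertree.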