Published by Roy Bryan. Modified over 9 years ago.
1
BMQ-Index: Shared and Incremental Processing of Border Monitoring Queries over Data Streams
Jinwon Lee
Y. Lee, S. Kang, S. Lee, H. Jin, B. Kim and J. Song
(Korea Advanced Institute of Science and Technology)
2
Outline
– Border Monitoring Query (BMQ)
– BMQ-Index
– Experiments
– Related work
– Conclusion
3
Emerging Computing Environment
Data stream monitoring
– Data sources: GPSs, sensors
– Data stream (values shown in the figure): 11, 10, 12, 13, 12, 14
– Continuous range queries: Q1: 10 < value; Q2: 11 < value < 13; ...
Application areas
– Remote Medical Service
– Disaster Prevention: Flood Warning, Earthquake Prediction, Building Monitoring, Traffic light control
– Automatic Home: Automatic Ventilation, Automatic Temperature Control, Automatic Humidity Control
– Logistics Management: Thief-proofing, Catalog Advertisement
– Location-based Service: Tracking (Friends, Employee), Vehicle Monitoring, Intelligent Transportation
4
Motivating Service Scenario #1
Stock trading
– Figure: SAMSUNG stock price over time, during 23 days from Nov. 16th to Dec. 23rd, 2005
– Expensive!! (> $640): sell
– Cheap!! (< $600): buy
Monitor stock data streams crossing the borders!!
5
Motivating Service Scenario #2
Location-based advertisement
– Send a special lunch menu to people within 1 km during lunch time!!
– Figure labels: Going out, Coming into, Coupon, Pet-Care
Monitor location data streams crossing the borders!!
6
Border Monitoring Query
To monitor data streams crossing the borders
– Essential concern in many practical applications
  Users' main interest
  Useful to automatically trigger or stop relevant actions
BMQ (Border Monitoring Query)
– A new type of continuous range query!!
– It reports only data crossing the borders of a query range (= coming into or going out of the query range)
RMQ (Region Monitoring Query)
– Conventional continuous range query
– It reports all matching data within a query range
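The difference between the two query types can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function names, the sample stream, and the choice to emit no event for the very first tuple are all assumptions made for the example. The range predicate is strict, following the slides' `11 < value < 13` notation.

```python
def in_range(v, lo, hi):
    """Strict range predicate, as in the slides' `11 < value < 13`."""
    return lo < v < hi


def rmq_report(stream, lo, hi):
    """RMQ: report every tuple that lies within the query range."""
    return [v for v in stream if in_range(v, lo, hi)]


def bmq_report(stream, lo, hi):
    """BMQ: report a tuple only when it crosses a border of the range.

    '+' marks a tuple coming into the range, '-' one going out of it.
    """
    events = []
    was_inside = None  # state is unknown before the first tuple (assumption)
    for v in stream:
        inside = in_range(v, lo, hi)
        if was_inside is not None and inside != was_inside:
            events.append(("+" if inside else "-", v))
        was_inside = inside
    return events


# Hypothetical stream evaluated against Q2: 11 < value < 13.
stream = [10, 12, 12, 12, 14]
print(rmq_report(stream, 11, 13))  # every matching tuple
print(bmq_report(stream, 11, 13))  # only the two border crossings
```

While the value stays inside the range, the RMQ keeps reporting it, whereas the BMQ stays silent until the next crossing.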
7
Problem: Scalability!!
A large number of BMQs can be issued
– Millions of stock investors will register their own queries
– Millions of stores will register their own queries
+ A huge volume of data arrives rapidly over the streams
+ Fast response is also essential for users
How can we process BMQs over data streams efficiently?
– (1) Naïve approach
  Individual BMQ processing at each data update
  Lack of scalability!!
– (2) Based on existing mechanisms for RMQ evaluation
  Shared RMQ processing by indexing queries
  Costly post-processing!!
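One plausible reading of approach (2) is sketched below: an RMQ index returns the full set of matching queries for the previous and the current value, and the border-crossed queries are recovered by comparing the two sets. The slides do not spell out the post-processing step, so this is an assumption; `rmq_matches` is a brute-force stand-in for a real RMQ index, and all names are hypothetical.

```python
import math

# The two example queries from the slides: Q1: 10 < value, Q2: 11 < value < 13.
QUERIES = {"Q1": (10, math.inf), "Q2": (11, 13)}


def rmq_matches(queries, v):
    """Stand-in for an RMQ index lookup: all queries whose range contains v."""
    return {qid for qid, (lo, hi) in queries.items() if lo < v < hi}


def bmq_via_rmq(queries, prev_v, cur_v):
    """Derive border-crossed queries by diffing two full RMQ result sets."""
    before = rmq_matches(queries, prev_v)
    after = rmq_matches(queries, cur_v)
    entered = after - before  # queries the value has come into
    exited = before - after   # queries the value has gone out of
    return entered, exited


print(bmq_via_rmq(QUERIES, 12, 14))
```

Under this reading, the set difference costs time proportional to the number of matching queries on every update, even when no border is crossed at all.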
8
Solution Approach: BMQ-Index
Shared processing
– By query indexing approach
  BMQ-Index is built on registered BMQs
  Upon a data arrival, only border-crossed queries are quickly searched for
  Achieves a high level of scalability!!
Figure: a data tuple (14) is fed to the BMQ-Index built on the registered BMQs (Q1: 10 < value; Q2: 11 < value < 13; ...), which outputs the border-crossed queries (Q1, Q2)
9
Solution Approach: BMQ-Index
Incremental processing
– By incremental access method
  Use the previous search step for the next search
  Successive searches are significantly accelerated!!
  Keep only the information needed for incremental search
  Low storage cost!!
Locality of data streams!!
Figure: a series of data tuples (10, 12, 13, 12, 14) is fed to the BMQ-Index built on the registered BMQs (Q1: 10 < value; Q2: 11 < value < 13; ...), which outputs the border-crossed queries (Q1, Q2)
10
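One-dimensional BMQ-Index (Example)
Example query: "Notify me whenever the IBM stock price is coming into or going out from my reasonable price range!!" (reasonable price range, unit: $: $10 to $30)
Figure:
– Registered BMQs: Q1 to Q5, with range borders at 0, 5, 10, 15, 20, 25, 30, 35 and 45
– Linked list: nodes ordered by border value from 0 up to ∞, each annotated with the queries whose borders it holds (+Q1, +Q2, +Q3, +Q4, +Q5 and the matching Q1 to Q5 exit entries)
– Stream Table: maps each Stream_ID (e.g. IBM) to a node pointer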
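A minimal sketch of how such an index could be built follows. Several things here are assumptions rather than content of the slides: the five query ranges are inferred from the border values and from the three search cases of the example, a sorted array stands in for the linked list, ranges are treated as `lo < v <= hi` so that every value falls in exactly one region, and all names are hypothetical.

```python
from bisect import bisect_left
from collections import defaultdict

# Query ranges inferred from the example's borders and search cases (assumption).
QUERIES = {"Q1": (0, 10), "Q2": (0, 25), "Q3": (5, 20),
           "Q4": (15, 30), "Q5": (35, 45)}


def build_bmq_index(queries):
    """Return (borders, plus, minus) for a set of one-dimensional BMQs.

    plus[b]  holds queries entered when a value moves rightwards across b.
    minus[b] holds queries exited when a value moves rightwards across b.
    """
    plus, minus = defaultdict(set), defaultdict(set)
    for qid, (lo, hi) in queries.items():
        plus[lo].add(qid)   # lower bound: entering on the way up
        minus[hi].add(qid)  # upper bound: leaving on the way up
    borders = sorted(set(plus) | set(minus))
    return borders, dict(plus), dict(minus)


def locate(borders, v):
    """Region of v, expressed as the number of borders strictly below v."""
    return bisect_left(borders, v)


borders, plus, minus = build_bmq_index(QUERIES)
# Stream table: one position per stream, here IBM at a price of 21.
stream_table = {"IBM": locate(borders, 21)}

for b in borders:
    print(b, sorted(plus.get(b, ())), sorted(minus.get(b, ())))
```

Each query appears exactly twice in this structure, once per border, and each stream needs one table entry, which is the shape of a storage cost of 2N_q + N_d.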
11
Search Operation in One-dimension (Example)
Notation: v_{t-1} is the previous data value, v_t is the current data value; the Stream Table entry for IBM points to the node of the previous value (21)
– Case 1) 21 → 23: no border-crossed query; no node traversal
– Case 2) 21 → 37: -Q2, -Q4, +Q5; traverse BMQ-Index to the right
– Case 3) 21 → 8: +Q3, -Q4, +Q1; traverse BMQ-Index to the left
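The three cases can be reproduced with a short incremental search sketch. As before, this is an illustration under assumptions: the query ranges are inferred from the example, a sorted border array replaces the linked list, ranges are treated as `lo < v <= hi`, and a query that is jumped over completely within one update is not reported. All names are hypothetical.

```python
from bisect import bisect_left
from collections import defaultdict

# Query ranges inferred from the example's borders and search cases (assumption).
QUERIES = {"Q1": (0, 10), "Q2": (0, 25), "Q3": (5, 20),
           "Q4": (15, 30), "Q5": (35, 45)}


def build(queries):
    """Sorted borders plus, per border, the queries starting/ending there."""
    plus, minus = defaultdict(set), defaultdict(set)
    for qid, (lo, hi) in queries.items():
        plus[lo].add(qid)
        minus[hi].add(qid)
    return sorted(set(plus) | set(minus)), dict(plus), dict(minus)


def search(index, pos, v):
    """Walk from the previous position to v; return (new_pos, entered, exited).

    pos is the number of borders strictly below the previous value.
    """
    borders, plus, minus = index
    entered, exited = set(), set()

    def cross(enter_ids, exit_ids):
        entered.update(enter_ids)
        for qid in exit_ids:
            if qid in entered:
                entered.discard(qid)  # entered and left in one jump: no report
            else:
                exited.add(qid)

    while pos < len(borders) and borders[pos] < v:  # traverse to the right
        b = borders[pos]
        cross(plus.get(b, ()), minus.get(b, ()))
        pos += 1
    while pos > 0 and borders[pos - 1] >= v:        # traverse to the left
        b = borders[pos - 1]
        cross(minus.get(b, ()), plus.get(b, ()))    # roles are reversed
        pos -= 1
    return pos, entered, exited


INDEX = build(QUERIES)
START = bisect_left(INDEX[0], 21)  # previous value 21, located once
for v in (23, 37, 8):
    print(v, search(INDEX, START, v))
```

The cost of one call is proportional to the number of borders between the previous and the current value, so a stream with strong locality touches few or no nodes per update.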
12
Multi-dimensional BMQ-Index

Query Table
QueryID | Range
Q1      | (b_X1, b_X3, b_Y1, b_Y4)
Q2      | (b_X2, b_X6, b_Y2, b_Y6)
Q3      | (b_X4, b_X5, b_Y3, b_Y5)

Stream Table
StreamID | V            | P_X   | P_Y
s1       | (v_X1, v_Y1) | RS-X2 | RS-Y2
s2       | (v_X2, v_Y2) | RS-X3 | RS-Y5
s3       | (v_X3, v_Y3) | RS-X5 | RS-Y4

Figure: an RS-X List (RS-X1 to RS-X7 over borders b_X0 to b_X7) and an RS-Y List (RS-Y1 to RS-Y7 over borders b_Y0 to b_Y7). Each region segment RS-Xi / RS-Yi carries a +DQSet and a -DQSet drawn from {Q1}, {Q2}, {Q3} or {}. The plane shows the rectangles of Q1, Q2 and Q3 and the stream positions v(s1), v(s2) and the successive positions v1(s3), v2(s3), v3(s3).
13
Search Operation in Multi-dimension
Overall flow, for a new data point (xc, yc)
– Per-dimension search: RS-X list.search() on xc yields ±XQSet; RS-Y list.search() on yc yields ±YQSet
– Validation through cross-check: ±XQSet is cross-checked with the Y-dimension, giving ±XBMQSet; ±YQSet is cross-checked with the X-dimension, giving ±YBMQSet
– Union of per-dimension results: ±QSet is the union of ±XBMQSet and ±YBMQSet
Performance Analysis (d-dimension)
– Search performance ((d-1)d × one-dimensional search time)
– Storage cost (d × one-dimensional storage cost)
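The overall flow above (per-dimension search, cross-check, union) can be sketched as follows. To keep the example short, `crossed_1d` is a brute-force stand-in for the RS-X / RS-Y list search rather than an incremental index; the numeric ranges simply reuse the border indices of the Query Table as coordinates. Both choices, and all names, are assumptions for illustration.

```python
# Each query: one (lo, hi) pair per dimension. Coordinates are hypothetical:
# border b_Xi / b_Yi is placed at value i.
QUERIES = {
    "Q1": ((1, 3), (1, 4)),
    "Q2": ((2, 6), (2, 6)),
    "Q3": ((4, 5), (3, 5)),
}


def crossed_1d(queries, dim, prev, cur):
    """Stand-in for one RS list search: (+QSet, -QSet) for a single dimension."""
    plus, minus = set(), set()
    for qid, ranges in queries.items():
        lo, hi = ranges[dim]
        was, now = lo < prev[dim] < hi, lo < cur[dim] < hi
        if now and not was:
            plus.add(qid)
        elif was and not now:
            minus.add(qid)
    return plus, minus


def within_other_dims(ranges, point, dim):
    """Cross-check: is the point inside the query range in all other dims?"""
    return all(lo < point[k] < hi
               for k, (lo, hi) in enumerate(ranges) if k != dim)


def search_nd(queries, prev, cur):
    """Return (entered, exited) query sets for a move from prev to cur."""
    entered, exited = set(), set()
    for dim in range(len(cur)):
        plus, minus = crossed_1d(queries, dim, prev, cur)
        # '+' candidates are validated against the current point,
        # '-' candidates against the previous point.
        entered |= {q for q in plus if within_other_dims(queries[q], cur, dim)}
        exited |= {q for q in minus if within_other_dims(queries[q], prev, dim)}
    return entered, exited


print(search_nd(QUERIES, (0.5, 2.5), (2.5, 2.5)))
print(search_nd(QUERIES, (2.5, 2.5), (2.5, 4.5)))
```

In the second move the Y-dimension search proposes +Q3, but the cross-check rejects it because the point is outside Q3's X range; only the exit from Q1 survives.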
14
Experiments
Workload generation
– Stock trading scenario (one-dimensional case)
  Data stream generation (Korea stock market [9])
  – Fluctuation level (FL): 0.01% ~ 0.1%
  – 2000 stream sources, 1000 tuples in each stream
  Query generation
  – Lower bound: randomly chosen (1 ~ 10^6)
  – Width of queries (W): 1 ~ 10 times larger than FL
  – Number of queries (N): 10,000 ~ 100,000
Comparisons
– An approach based on state-of-the-art RMQ indexes (CEI [CIKM'05] and IS-list [Information Systems'96])
Performance metrics
– Average search time per data tuple (millisecond)
– Index storage size (Mbyte)
15
Search performance
– Effects of the number of queries (W=0.1%, FL=0.01%)
– Effects of the widths of queries (N=100000, FL=0.01%)
16
Storage cost
– Effects of the number of queries (W=0.1%)
– Effects of the widths of queries (N=100000)
How often each query is stored in the index
– BMQ-Index: twice
– IS-list: log (# of queries) times
– CEI: once for every grid covered by the query range
17
Related Work
Semantics
– CQL (Continuous Query Language developed by STREAM project)
  General concept to transform a Relation to a Stream
  BMQ is a specific class of continuous range query
Shared and Incremental Processing

                         | Previous research                     | Difference
Data stream processing   | Tree-based (1-D: [2][4][5][14])       | O(log N) search performance; O(N log N) storage cost
Data stream processing   | Grid-based (1-D: [17], 2-D: [6][13])  | Better search performance than tree-based; require more storage cost
Spatio-temporal database | SINA [11] (shared and incremental)    | Disk-based algorithm; not purely incremental access method
Spatio-temporal database | GPAC [12] (incremental)               | Not for shared processing

Generally not feasible for BMQs!!
18
Conclusion
Summary
– Characterize a new type of continuous range query
  Border Monitoring Query (BMQ)
  Useful and practical in many emerging applications
– One- and multi-dimensional BMQ-Index
  Evaluates a large number of BMQs in a shared and incremental manner, thereby achieving excellent search performance and low storage cost
19
Thank you
Questions?
20
Backup slide
21
Performance Analysis
1-dimensional BMQ-Index
– Search performance (2 N_q FL)
– Storage cost (2 N_q + N_d)
d-dimensional BMQ-Index
– Search performance ((d-1)d · 2 N_q FL), only 2 times the one-dimensional cost when d=2
– Storage cost (d(2 N_q + N_d) + N_q)
N_q = Number of queries
N_d = Number of data streams
FL = Fluctuation level
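As a rough worked example, the expressions above can be evaluated with the largest workload from the experiments (N_q = 100,000 queries, N_d = 2,000 streams, FL = 0.01%). Treating FL as the fraction 10^-4 is an assumption, the slide does not state the unit in which the expressions count, and the experiments themselves cover only the one-dimensional case, so the d = 2 lines are purely illustrative.

```latex
\begin{align*}
\text{1-D search:}\quad  & 2 N_q\,FL = 2 \times 10^{5} \times 10^{-4} = 20 \\
\text{1-D storage:}\quad & 2 N_q + N_d = 2 \times 10^{5} + 2 \times 10^{3} = 202{,}000 \\
\text{2-D search:}\quad  & (d-1)\,d \cdot 2 N_q\,FL = 1 \times 2 \times 20 = 40 \\
\text{2-D storage:}\quad & d\,(2 N_q + N_d) + N_q = 2 \times 202{,}000 + 100{,}000 = 504{,}000
\end{align*}
```

The search term grows with N_q · FL rather than with N_q alone, which is why small fluctuations between successive tuples keep the per-tuple cost low.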
22
Cross checking
Algorithm
– For +XQSet: check whether v_t is located between the Y predicates
– For -XQSet: check whether v_{t-1} is located between the Y predicates
– YQSet is checked against the X-dimension in a similar manner