Adaptive Stream Filters for Entity-based Queries with Non-value Tolerance. VLDB 2005. Reynold Cheng (Speaker), Ben Kao, Alan Kwan, Sunil Prabhakar, Yicheng Tu.

Presentation transcript:

Slide 1: Adaptive Stream Filters for Entity-based Queries with Non-value Tolerance
VLDB 2005
Reynold Cheng (Speaker), Ben Kao, Alan Kwan, Sunil Prabhakar, Yicheng Tu
The Hong Kong Polytechnic University; The University of Hong Kong; Purdue University

Slide 2: Data Streams and Applications
Data Stream Management Systems (DSMS)
– Sensor networks, location-based applications
– STREAM [ABB03], STEAM [HAFME03], AURORA [ACC03], CACQ [MSH02]
Stream applications
– Telecom call records
– Network security [BO03]
– Habitat monitoring [MPS02]
– Structural health monitoring
Continuous queries

Slide 3: DSMS Model
[Diagram: a user submits a continuous query to the central processor (query processing unit) and receives a result that is refreshed if needed; streams reach the processor over the network.]
Annotations on the diagram: real-time, response time requirement; massive, fast; limited memory, CPU, network bandwidth

Slide 4: Trading Accuracy for Query Timeliness
A user may accept an answer with a carefully controlled error tolerance
– wide-area resource accounting
– load-balancing in replicated servers
The system exploits error tolerance to reduce communication and computation costs

Slide 5: Value-based Tolerance
Often assumed in the literature [OJW03, JCW04]
Maximum error is a numerical value ε specified by the user
MAX query: return the id of the sensor with the highest temperature
Guarantee: the temperature of the returned sensor is no more than ε below that of the true answer
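
The value-based guarantee for a MAX query can be illustrated with a small check. This is a minimal sketch, not code from the talk: the function name `satisfies_value_tolerance` and the sample readings are hypothetical, and it assumes the server can see all current values, which a real filter protocol avoids.

```python
def satisfies_value_tolerance(readings, returned_id, eps):
    """Return True if the returned sensor's value is at most eps below the true maximum.

    readings    -- dict mapping sensor id to its current value (hypothetical data)
    returned_id -- id that the approximate MAX query returned
    eps         -- value-based tolerance chosen by the user
    """
    true_max = max(readings.values())
    return readings[returned_id] >= true_max - eps


# Hypothetical temperatures for three sensors.
readings = {"s1": 30.0, "s2": 29.5, "s3": 25.0}
```

With these readings and eps = 1.0, returning `s2` is acceptable while returning `s3` is not.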

Slide 6: Is Selecting ε Easy?
Location-based application: a user inquires about his closest neighbor
– Should the tolerance be 0.1, 1, or 100 meters?
Sensor network collects humidity, temperature, UV index, wind speed
– Does the user know the range of error for each type?
Multi-dimensional data streams (e.g., location)
Multimedia data streams (e.g., CCTV images)

Slide 7: Is Selecting ε for a MAX Query Easy?
Suppose a user accepts an object that ranks 2nd or above.
– If ε is too small: tolerance wasted
– If ε is too large: error unacceptable
– The ideal ε ……

Slide 8: Rank-based Tolerance
Express error tolerance as a rank
Error tolerance = number of positions the returned sensor could rank below the highest one
More intuitive and easier to specify
[Figure: example with rank-based tolerance = 1]
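
Rank-based tolerance for a MAX query can be checked by counting how many sensors rank above the returned one. This is a minimal sketch under assumptions: the names `rank_below_top` and `satisfies_rank_tolerance` are hypothetical, ties are treated as sharing a rank, and the server is assumed to know every value for the purpose of the check.

```python
def rank_below_top(readings, returned_id):
    """Number of sensors whose value is strictly higher than the returned sensor's."""
    value = readings[returned_id]
    return sum(1 for other in readings.values() if other > value)


def satisfies_rank_tolerance(readings, returned_id, r):
    """Return True if the returned sensor ranks at most r positions below the top."""
    return rank_below_top(readings, returned_id) <= r


# Hypothetical temperatures for three sensors.
readings = {"s1": 30.0, "s2": 29.5, "s3": 25.0}
```

With tolerance 1, either `s1` or `s2` is an acceptable answer here, whatever the gap between their values.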

Slide 9: Non-Value Tolerance
Rank-based tolerance is a non-value tolerance
– numerical value ε not used
Fraction-based tolerance
– False positive F+(t): % of returned answers that are incorrect at time t
– False negative F-(t): % of correct answers not returned at time t
– F+(t) ≤ ε+; F-(t) ≤ ε-
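
The two fractions can be computed directly from the returned and correct answer sets. This is a minimal sketch: the function name `fraction_errors` and the example sets are hypothetical, and an empty set is treated as contributing a fraction of 0, which is an assumption the slide does not address.

```python
def fraction_errors(returned, correct):
    """Return (F_plus, F_minus) for a returned answer set and the correct answer set.

    F_plus  = fraction of returned answers that are incorrect
    F_minus = fraction of correct answers that were not returned
    """
    returned, correct = set(returned), set(correct)
    false_positives = len(returned - correct)
    false_negatives = len(correct - returned)
    f_plus = false_positives / len(returned) if returned else 0.0
    f_minus = false_negatives / len(correct) if correct else 0.0
    return f_plus, f_minus


# Hypothetical answer sets: S1 is a false positive, S7 and S8 are false negatives.
f_plus, f_minus = fraction_errors({"S1", "S2", "S3", "S4"},
                                  {"S2", "S3", "S4", "S7", "S8"})
```

The answer satisfies the tolerance when `f_plus <= eps_plus` and `f_minus <= eps_minus`.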

Slide 10: Entity-based Queries
Return sets of object ids, not numerical values [CKP03]
Rank-based queries: order of stream values decides the final answer
– e.g., top-k query, k-nearest-neighbor query
Non-rank-based queries: order of stream values is not important
– e.g., range query
Non-value tolerance matches entity-based queries!

Slide 11: Continuous Query Classification

Slide 12: Adaptive Filter [OJW03]: Initialization Phase
[Diagram: given the user-defined tolerance, the constraint assignment unit sends filter bounds [l1, u1], [l2, u2], [l3, u3] to data streams 1, 2 and 3; the query processing unit produces an approximate answer.]
Answer tolerance is met as long as no update is generated

Slide 13: Adaptive Filter: Maintenance Phase
[Diagram: data streams 1, 2 and 3 hold values v1, v2, v3 under filters [l1, u1], [l2, u2], [l3, u3]. Stream 2 sends an update when v2 > u2 or v2 < l2. The tolerance is violated, which triggers the maintenance phase: value v3 is requested, a new filter bound [l2, u2] is assigned, and the query processing unit replaces the approximate answer with a corrected approximate answer.]

Slide 14: Contributions
Apply filter bounds to rank-based / non-rank-based queries subject to rank-based / fraction-based tolerance to reduce message costs
Correctness proofs, cost analysis and experimental evaluation of each protocol

Slide 15: Filter Bound Protocols
RTP, FT-RP, FT-NRP, ZT-RP, ZT-NRP

Slide 16: Non-Rank-based Queries
Example: 1D range query, range = [10, 30]
[Figure: streams S1 to S8 plotted by ordered value, with the answer set marked]

Slide 17: Fraction-based Tolerance
[Figure: streams S1 to S8 plotted by ordered value against the range of Q = [l, u]; an update is shown, with a false positive and a false negative marked]

Slide 18: Fraction-based Tolerance
[Venn diagram: the answer actually returned, A(t), overlaps the true answer at time t]
– E+(t): returned answers that are not in the true answer
– |A(t)| - E+(t): returned answers that are in the true answer
– E-(t): true answers that are not returned
– Size of the true answer = |A(t)| - E+(t) + E-(t)

Slide 19: Initialization Phase
Given ε+ and ε-:
1. Collect current stream values
2. For streams satisfying the range query
– Calculate the number of streams (E+max) that can be false positives
– Assign false +ve filters [-∞, +∞] to E+max streams
– Assign [l, u] to the remaining ones
3. For streams failing the range query
– Calculate the number of streams (E-max) that can be false negatives
– Assign false -ve filters [+∞, +∞] to E-max streams
– Assign [l, u] to the remaining ones
Tolerance is satisfied if no new updates are received
– At any time t without update, F+(t) ≤ ε+ and F-(t) ≤ ε-
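
The initialization steps above can be sketched as follows. The slide does not state how E+max and E-max are calculated, so the formulas here are an assumption derived from the definitions of F+ and F-: with |A| streams in the initial answer, E+max = floor(ε+ · |A|) keeps F+ ≤ ε+, and E-max = floor(ε- · (|A| - E+max) / (1 - ε-)) keeps F- ≤ ε- even when every false positive filter hides an error. The function name, the arbitrary choice of which streams receive the unbounded filters, and the use of exact fractions are likewise illustrative choices, not the authors' implementation.

```python
from fractions import Fraction
from math import floor, inf


def assign_initial_filters(values, l, u, eps_pos, eps_neg):
    """Return (answer_set, filters) for a range query [l, u].

    values  -- dict mapping stream id to its current value
    eps_pos -- false positive tolerance, as a Fraction or a string such as "0.25"
    eps_neg -- false negative tolerance (must be < 1), same format
    """
    eps_pos, eps_neg = Fraction(eps_pos), Fraction(eps_neg)
    inside = [s for s, v in values.items() if l <= v <= u]
    outside = [s for s, v in values.items() if not (l <= v <= u)]
    n = len(inside)

    # Assumed bounds (see lead-in): worst case is that every unbounded filter hides an error.
    e_max_pos = floor(eps_pos * n)
    e_max_neg = min(len(outside),
                    floor(eps_neg * (n - e_max_pos) / (1 - eps_neg)))

    filters = {}
    for i, s in enumerate(inside):
        # False positive filter: the stream never reports and stays in the answer.
        filters[s] = (-inf, inf) if i < e_max_pos else (l, u)
    for i, s in enumerate(outside):
        # False negative filter: the stream never reports and stays out of the answer.
        filters[s] = (inf, inf) if i < e_max_neg else (l, u)
    return set(inside), filters
```

For example, with 10 streams inside the range, ε+ = 0.25 and ε- = 0.2, this sketch gives two streams a false positive filter and at most two streams a false negative filter.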

Slide 20: Maintenance Phase: Good Update
Insert S7 into A(tc)
F+ and F- drop
– F+(tc) < F+(t0) ≤ ε+
– F-(tc) < F-(t0) ≤ ε-
Tolerance is met
[Figure: streams S1 to S8 at time t0 and time tc, with filter [l, u] and range of Q = [l, u]]

Slide 21: Maintenance Phase: Bad Update
Remove Si from A(tc)
F+(tc) ≤ ε+ and F-(tc) ≤ ε- may not be true
Quality of answer becomes worse
Procedure Fix to maintain tolerance
[Figure: streams S1 to S8 at time t0 and time tc, with filter [l, u] and range of Q = [l, u]]

Slide 22: Fix: Consulting False Positive Filter
Select stream S4 ∈ A(tc) with [-∞, +∞] filter
Request S4 for its updated value
If V4 ∈ [l, u]
– install [l, u] filter to S4
– prove that F+(tc) ≤ ε+ and F-(tc) ≤ ε- are satisfied
If V4 ∉ [l, u], consult a false -ve filter
Worst case: 5 messages
[Figure: streams S1 to S8 with range of Q = [l, u]; the false positive filter [-∞, +∞] is marked]

Slide 23: Filter Bound Protocols for Rank-based Queries
k-NN query is a representative of NN, Min, Max
Fraction-based tolerance / k-NN query
– View a k-NN query as a range query, by using the kth nearest neighbor as the "range"
– Adapt fraction-based tolerance / range query
Rank-based tolerance / k-NN query
– Maintain knowledge about the (k+r)th and (k+r+1)st items
– Filter bound is defined by the average of the (k+r)th and (k+r+1)st items
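
The rank-based bound for a k-NN query can be sketched as a distance threshold. This is one reading of the slide, stated as an assumption: "item" is taken to mean the distance of a stream's value from the query point, and the bound is the midpoint between the (k+r)th and (k+r+1)st smallest distances. The function name and sample distances are hypothetical.

```python
def rank_tolerant_knn_bound(distances, k, r):
    """Midpoint between the (k+r)th and (k+r+1)st smallest distances.

    distances -- dict mapping stream id to its distance from the query point
    k         -- number of nearest neighbors requested
    r         -- rank-based tolerance
    """
    ordered = sorted(distances.values())
    if len(ordered) < k + r + 1:
        raise ValueError("need at least k + r + 1 streams")
    # Python lists are 0-indexed, so the (k+r)th item sits at index k + r - 1.
    return (ordered[k + r - 1] + ordered[k + r]) / 2


# Hypothetical distances of five streams from the query point.
bound = rank_tolerant_knn_bound({"a": 1, "b": 2, "c": 4, "d": 8, "e": 16}, k=2, r=1)
```

Under this reading, as long as no stream crosses the threshold, the k+r nearest streams stay the same set, so any k of them rank within r positions of the true k-NN answer.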

Slide 24: Experiments
Compare
– No filter is used at all
– Filter protocols with zero tolerance
– Our tolerance-based protocols
Measure the total number of messages required for executing a continuous query

Slide 25: Experimental Setup
Real data
– 30 days of wide-area traces of TCP connections based on TCP trace [ITA20]
Synthetic data
– Generated by CSIM 18
– Data value: uniform distribution
– Fluctuation of updates: normal distribution
– Interarrival time of updates: exponential distribution

Slide 26: Fraction-based Tolerance for Range Query with Real Data

Slide 27: Fraction-based Tolerance for Range Query with Synthetic Data

Slide 28: Conclusions
Value-based tolerance can be difficult to specify for continuous queries in stream systems
Rank-based and fraction-based tolerance
Applied to rank-based and non-rank-based queries
Filter bound protocols translate non-value tolerance to filter bounds
Experiments illustrate protocol effectiveness
Please contact Reynold Cheng for details

Slide 29: Contact Information
Reynold Cheng, Hong Kong Polytechnic University

Slide 30: Issues of Running Out of Filters
If all false positive and false negative filters run out, the system degrades to one in which no tolerance is exploited
To improve performance, the initialization phase may be executed again
Experiments over long-running queries

Slide 31: Long-Running Queries

Slide 32: Talk Outline
Non-value-based Tolerance
Filter Bound Framework
Filter Bound for Fraction-based Tolerance for Non-rank-based Queries
Experimental Results
Conclusions


Slide 35: Tolerance & Filter Bounds [OJW03]
[Diagram: the user submits a continuous query plus tolerance to the processor (query processing unit) and receives a result with an error guarantee; the constraint assignment unit sends each stream a constraint, the filter bound [l, u]; data that stays within the bound is dropped at the stream.]
Update sent only when value crosses [l, u]

Slide 36: Fraction-based Tolerance
[Venn diagram: the answer actually returned, A(t), overlaps the true answer at time t]
– E+(t): returned answers that are not in the true answer
– |A(t)| - E+(t): returned answers that are in the true answer
– E-(t): true answers that are not returned
– Size of the true answer = |A(t)| - E+(t) + E-(t)

Slide 37: Zero Tolerance
[Figure: streams S1 to S8; an update is shown]

Slide 38: Zero-Tolerance Protocol (ZT-NRP)
Given a range query [l, u]
Initialization phase
– Emit [l, u] to each stream source
Maintenance phase
– For any stream source, if its value crosses [l, u], send its new value to the server
– No message from the server is needed
Generates a lot of updates!
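
The zero-tolerance protocol can be sketched with a source that reports only when its value crosses the range and a server that updates the answer set on each report. This is a minimal illustration, not the authors' code: the class names `Source` and `Server`, the message format (stream id, value), and the interpretation of "crosses" as a change in membership of [l, u] are assumptions.

```python
class Source:
    """Stream source holding the filter [l, u] emitted during initialization."""

    def __init__(self, sid, value, l, u):
        self.sid = sid
        self.l, self.u = l, u
        self.inside = l <= value <= u  # membership at initialization time

    def observe(self, value):
        """Return an update message only if the value crossed [l, u], else None."""
        now_inside = self.l <= value <= self.u
        if now_inside == self.inside:
            return None  # no crossing, so the reading is dropped at the source
        self.inside = now_inside
        return (self.sid, value)


class Server:
    """Maintains the exact answer set of the range query [l, u]."""

    def __init__(self, l, u):
        self.l, self.u = l, u
        self.answer = set()

    def receive(self, message):
        sid, value = message
        if self.l <= value <= self.u:
            self.answer.add(sid)
        else:
            self.answer.discard(sid)


# Hypothetical run: S1 starts inside the range [10, 30].
server = Server(10, 30)
source = Source("S1", 15, 10, 30)
server.receive(("S1", 15))          # initial value collected by the server
sent = [source.observe(v) for v in (20, 35, 40, 12)]
for message in sent:
    if message is not None:
        server.receive(message)
```

Only the readings 35 and 12 generate messages in this run, since they are the ones that move S1 out of and back into the range.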

Slide 39: Fix: Consulting False Positive Filter
[Figure: streams S1 to S8; the false positive filter [-∞, +∞] is marked]

Slide 40: Fix Step 2: Consulting False -ve Filter
If S4 fails the range check (V4 ∉ [l, u])
– remove S4 from A(t)
– Select stream S7 ∉ A(tc) with [+∞, +∞] filter
– If V7 ∈ [l, u], insert S7 into the answer set
– install the [l, u] filter to S7
– Prove that F+(tc) ≤ ε+ and F-(tc) ≤ ε- are satisfied
Worst case: 5 messages
[Figure: streams S1 to S8 with range of Q = [l, u]; the false positive filter [-∞, +∞] and the false negative filter [+∞, +∞] are marked]
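
The two steps of Fix can be sketched together as follows. This is a hedged illustration rather than the procedure from the paper: the function name `fix`, the `probe` callback that stands in for requesting a stream's current value, and the choice of the first eligible stream are hypothetical. The sketch also assumes that a probed false positive stream receives the [l, u] filter whether or not its value lies in the range, which the slides state only for the in-range case.

```python
from math import inf

FALSE_POS = (-inf, inf)  # stream never reports and stays in the answer
FALSE_NEG = (inf, inf)   # stream never reports and stays out of the answer


def fix(answer, filters, probe, l, u):
    """Repair the answer after a bad update; return the number of streams probed.

    answer  -- set of stream ids currently returned (modified in place)
    filters -- dict mapping stream id to its filter bound (modified in place)
    probe   -- callable returning the current value of a stream id
    """
    probes = 0

    # Step 1: consult a false positive filter.
    fp = next((s for s in sorted(answer) if filters[s] == FALSE_POS), None)
    if fp is None:
        return probes
    probes += 1
    in_range = l <= probe(fp) <= u
    filters[fp] = (l, u)  # assumption: its status is now known, so bound it
    if in_range:
        return probes  # fp is a confirmed correct answer
    answer.discard(fp)

    # Step 2: consult a false negative filter.
    fn = next((s for s in sorted(filters)
               if s not in answer and filters[s] == FALSE_NEG), None)
    if fn is None:
        return probes
    probes += 1
    if l <= probe(fn) <= u:
        answer.add(fn)
    filters[fn] = (l, u)
    return probes
```

Each probe corresponds to a request and a reply, and each newly installed filter to a further message, so the message count in a real deployment depends on how these are batched; the sketch counts probes only.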

Slide 41: Fix Step 2: Consulting False -ve Filter
[Figure: streams S1 to S8; the false positive filter [-∞, +∞] and the false negative filter [+∞, +∞] are marked]

Slide 42: False +ve / -ve Filters Selection Heuristic