Qi Han, Matthew Ba Nguyen, Sandy Irani, Nalini Venkatasubramanian


Time Sensitive Computation of Aggregate Functions over Distributed Imprecise Data
Qi Han, Matthew Ba Nguyen, Sandy Irani, Nalini Venkatasubramanian
Distributed Systems Middleware Group
http://www.ics.uci.edu/~dsm
School of Information and Computer Science
University of California, Irvine

Motivation
Real-time applications often make decisions based on timely results aggregated over distributed data.
Example: real-time fault localization and problem diagnosis
How many nodes are overloaded? (count)
What is the total latency along the path N1 → N2 → … → Nk? (sum)
What is the bottleneck (minimum link bandwidth) along the path N1 → N2 → … → Nk? (min)
Recent advances in communication, mobile computing and embedded systems have enabled a variety of real-time distributed applications, such as stock information exchange and environmental sensing. These applications often make decisions based on timely results aggregated over distributed data from varying sources. For example, as distributed systems and networks continue to grow in size and complexity, real-time fault localization becomes more challenging. It is crucial to build an active real-time information gathering system that can ask the right question at the right time.

Challenges
Continuous stream of fast-changing source data
Diverse user requirements in terms of data accuracy and service timeliness
Effective utilization of the underlying computation, communication and storage resources
Competing goals of timeliness, accuracy and cost-effectiveness
Providing a real-time information architecture poses several challenges. First, information sources provide a continuous stream of data that can vary dynamically over time, and this information may need to be captured and stored rapidly and accurately. Second, users requiring access to this data have diverse requirements in terms of data accuracy and service timeliness. Third, the collection should be done unobtrusively. We are therefore faced with the competing goals of timeliness, accuracy and cost. Fortunately, many applications can tolerate some information imprecision and bounded delivery latency. We would like to exploit these accuracy and latency margins so that most applications receive information at the desired level of accuracy and timeliness while resource consumption is minimized.

This paper…
Previous research:
Tradeoff between accuracy and cost: storing data in ranges
Tradeoffs between accuracy, cost and timeliness for a single data item (RTSS 2003)
This paper addresses:
Tradeoffs between accuracy, cost and timeliness for aggregate functions (count, sum, min)
Probing only part of the sources might be sufficient
How the server selects an appropriate subset of sources to probe so that the overall probing cost is minimized without violating the accuracy and timeliness requirements of aggregate queries
Given the data-intensive nature of the system, a database is essential for the system to function efficiently. Range-based approach… Directly applying these approaches to distributed real-time applications will not be effective, since queries have not only accuracy constraints but also time constraints, which specify the latest time by which the results of an aggregate query are expected to be available. This paper complements previous work in RTSS 2003 by addressing how the server… Specifically,

Problem Characterization
[Figure: sources s1, s2, …, si, …, sn, where source i holds exact value Ei and the server stores its range [Li, Ui]; the link between the server and source i is labeled ci/di]
System model: the server's database stores the ranges [L1,U1], [L2,U2], …, [Ln,Un].
Source model: the cost and latency of probing each source vary (probing source i costs ci and takes di). After probing, l = u = e.
Query model: f(s1, s2, …, si, …, sn), A, D
A: accuracy constraint; the actual value differs from the returned answer by at most A
D: time constraint; T_f varies (depending on whether probing is parallel or sequential)
Example: min(s1, s2, …, sn), 3, 1 minute
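The system, source and query models can be captured in a few small types. This is a minimal sketch, assuming that the probing latency is the maximum of the di when sources are probed in parallel and their sum when probed sequentially; the names `Source`, `Query`, `probe` and `probing_latency` are hypothetical, not taken from the paper.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Source:
    """One data source as seen by the server (hypothetical representation)."""
    lo: float       # stored lower bound L_i
    hi: float       # stored upper bound U_i
    cost: float     # probing cost c_i
    latency: float  # probing latency d_i

    def probe(self, exact: float) -> None:
        # After probing, the stored range collapses to the exact value: l = u = e.
        self.lo = self.hi = exact


@dataclass
class Query:
    f: str    # aggregate function: "count", "sum" or "min"
    A: float  # accuracy constraint
    D: float  # time constraint (in seconds, by assumption)


def probing_latency(probed: List[Source], parallel: bool = True) -> float:
    """Assumed latency model: parallel probes finish with the slowest one,
    sequential probes add up."""
    if not probed:
        return 0.0
    delays = [s.latency for s in probed]
    return max(delays) if parallel else sum(delays)


# The slide's example query: min(s1, ..., sn) with A = 3 and D = 1 minute.
example = Query(f="min", A=3, D=60)
```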

Time-sensitive Computation of Aggregate Functions
Compute the function based on the stored values in the database.
If the answer meets the accuracy constraint: done.
Otherwise: select a probing set, considering only S_D, the subset of sources whose d_i ≤ D.
Batch selection: the entire set of sources to probe is selected before the probes actually occur.
Iterative selection: sources are probed one at a time.
To minimize the probing cost under the time and accuracy constraints of user queries, we first compute the function based on the stored approximations in the database. If the answer does not satisfy the accuracy constraint of the user request, we decide on a set of sources to probe for exact values in order to improve the precision of the answer. Two basic approaches to probing set selection can be applied. In batch selection, the precision constraint must be guaranteed for any possible precise values of the sources in the probing set. Iterative selection is an online approach: the function is re-evaluated after each probe, and probing stops when the answer is precise enough. The answer gradually becomes more precise over time, so the goal is to shrink the answer interval as fast as possible.
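The first steps of this procedure (evaluate f on the stored ranges, stop if the answer is accurate enough, otherwise restrict attention to S_D) can be sketched as follows. This is a minimal sketch that assumes the answer is an interval whose width is compared against A; the function names `stored_answer` and `plan` and the tuple layout are hypothetical.

```python
from typing import List, Optional, Tuple

# A source is (lo, hi, cost, latency): stored range [l_i, u_i], cost c_i, latency d_i.
Src = Tuple[float, float, float, float]


def stored_answer(f: str, sources: List[Src],
                  r: Optional[Tuple[float, float]] = None) -> Tuple[float, float]:
    """Bounds on f computed only from the stored ranges (no probing)."""
    if f == "sum":
        return sum(s[0] for s in sources), sum(s[1] for s in sources)
    if f == "min":
        return min(s[0] for s in sources), min(s[1] for s in sources)
    if f == "count":
        l, u = r  # count sources whose value falls inside r = [l, u]
        inside = sum(1 for lo, hi, _, _ in sources if l <= lo and hi <= u)
        uncertain = sum(1 for lo, hi, _, _ in sources
                        if not (l <= lo and hi <= u) and not (hi < l or lo > u))
        return inside, inside + uncertain
    raise ValueError(f"unsupported aggregate: {f}")


def plan(f: str, sources: List[Src], A: float, D: float,
         r: Optional[Tuple[float, float]] = None):
    """Return (bounds, candidates). candidates is None when the stored answer
    already meets A; otherwise it is S_D, the sources with d_i <= D."""
    lo, hi = stored_answer(f, sources, r)
    if hi - lo <= A:
        return (lo, hi), None          # accurate enough: no probing needed
    return (lo, hi), [s for s in sources if s[3] <= D]
```

Only S_D is returned as the candidate pool; how a subset of it is chosen depends on the aggregate, which the following slides treat one function at a time.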

Batch Selection of Source Probing Set for Count (BATCH_COUNT)
Problem: calculate the number of source values that fall inside the range r = [l, u]: f_count = |{si | si ∈ [l, u]}|
Example: relative to r, Inside = {s1, s2}, Outside = {s3, s4, s5}, Uncertain = {s6}
Algorithm:
If |U| > A, we must probe |U| − A sources to determine the function within the desired accuracy.
If |U ∩ S_D| ≥ |U| − A: order the sources in U ∩ S_D by increasing cost and probe the first |U| − A sources in this ordering.
Otherwise: we cannot determine the function to within the desired accuracy.
We can divide the sources into three sets: Inside (stored range entirely within r), Outside (stored range disjoint from r) and Uncertain, U (stored range overlapping a boundary of r). The count therefore lies in [|Inside|, |Inside| + |U|], so its uncertainty is |U|.
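The BATCH_COUNT steps translate almost line for line into code. This sketch assumes an integer accuracy constraint A and treats a source as uncertain whenever its stored range overlaps r without lying entirely inside it; the tuple layout and the names `classify` and `batch_count` are hypothetical.

```python
from typing import List, Optional, Tuple

# (name, lo, hi, cost, latency): stored range [l_i, u_i], probing cost c_i, latency d_i
Src = Tuple[str, float, float, float, float]


def classify(sources: List[Src], l: float, u: float):
    """Split sources into Inside, Outside and Uncertain with respect to r = [l, u]."""
    inside, outside, uncertain = [], [], []
    for s in sources:
        _, lo, hi, _, _ = s
        if l <= lo and hi <= u:
            inside.append(s)      # value is certainly in r
        elif hi < l or lo > u:
            outside.append(s)     # value is certainly not in r
        else:
            uncertain.append(s)   # stored range straddles a boundary of r
    return inside, outside, uncertain


def batch_count(sources: List[Src], l: float, u: float,
                A: int, D: float) -> Optional[List[Src]]:
    """Return the sources to probe, or None if A cannot be met by the deadline."""
    _, _, uncertain = classify(sources, l, u)
    need = len(uncertain) - A          # each probe of an uncertain source removes 1
    if need <= 0:
        return []                      # stored values already meet the accuracy constraint
    candidates = [s for s in uncertain if s[4] <= D]   # U ∩ S_D
    if len(candidates) < need:
        return None                    # cannot determine the count to within A
    candidates.sort(key=lambda s: s[3])                 # increasing probing cost
    return candidates[:need]
```

Because every uncertain source contributes exactly one unit of uncertainty, sorting by cost and taking the first |U| − A is the cheapest feasible choice.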

Batch Selection of Source Probing Set for Sum (BATCH_SUM)
If we compute the function based on the stored intervals in the database, the smallest possible sum occurs when all values are at their lower bounds, and the largest possible sum occurs when all values are at their upper bounds. The sum therefore lies in [Σ li, Σ ui], an interval of width Σ (ui − li).
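The slide gives only the bounds on the sum; it does not spell out how the probing set is chosen. Since probing si removes exactly ui − li of uncertainty, one way to pick the cheapest set is a min-cost knapsack over S_D: remove at least W − A, where W is the total width. The sketch below uses an exact dynamic program and assumes integer range widths; it is an illustrative choice of solver, not necessarily the paper's BATCH_SUM algorithm, and the names are hypothetical.

```python
from typing import List, Optional, Tuple

# (name, lo, hi, cost, latency); range widths hi - lo are assumed to be integers
Src = Tuple[str, int, int, float, float]


def sum_interval(sources: List[Src]) -> Tuple[int, int]:
    """Smallest sum: every value at its lower bound; largest: every value at its upper bound."""
    return sum(s[1] for s in sources), sum(s[2] for s in sources)


def batch_sum(sources: List[Src], A: int, D: float) -> Optional[List[str]]:
    """Cheapest set of sources in S_D whose widths add up to at least W - A,
    found with a 0/1 knapsack-style dynamic program."""
    lo, hi = sum_interval(sources)
    need = (hi - lo) - A               # uncertainty that probing must remove
    if need <= 0:
        return []
    candidates = [s for s in sources if s[4] <= D]   # S_D
    INF = float("inf")
    # best[w] = (min cost, names) of a set removing at least w units of width
    best = [(0.0, [])] + [(INF, None)] * need
    for name, l, u, cost, _ in candidates:
        width = u - l
        for w in range(need, 0, -1):   # descending so each source is used at most once
            prev_cost, prev_names = best[max(w - width, 0)]
            if prev_cost + cost < best[w][0]:
                best[w] = (prev_cost + cost, prev_names + [name])
    return best[need][1]               # None if S_D cannot remove enough width
```

Unlike count, sources differ in how much uncertainty they remove, so a cheapest-first ordering is no longer enough; that is why a knapsack-style search is used here.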

Batch Selection of Source Probing Set for Min (BATCH_MIN)

Issues in Computing min
In the case of computing count or sum, we know in advance exactly the benefit of probing any particular source, and thus can decide in advance which sources to probe:
count: probing any uncertain source decreases the uncertainty by 1
sum: probing source si decreases the uncertainty by ui − li
In the case of computing min, the number of probes required may vary depending on the values of the sources.
Example: s1 = [0,5], s2 = [1,6], A = 1
If we probe s1 and e1 = 2, then min ∈ [1,2]: done.
If we probe s1 and e1 = 5, then min ∈ [1,5], so s2 must also be probed.
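The example can be checked directly. Given stored ranges [li, ui], the minimum lies in [min li, min ui], and probing a source collapses its range to the returned value. This small sketch (function names hypothetical) reproduces both outcomes of probing s1.

```python
def min_interval(ranges):
    """Bounds on min(s_1, ..., s_n) given stored ranges [l_i, u_i]."""
    return min(l for l, _ in ranges), min(u for _, u in ranges)


def after_probe(ranges, i, value):
    """Probing source i collapses its range to the exact value (l = u = e)."""
    updated = list(ranges)
    updated[i] = (value, value)
    return updated


A = 1
ranges = [(0, 5), (1, 6)]  # s1, s2 from the example
for e1 in (2, 5):
    lo, hi = min_interval(after_probe(ranges, 0, e1))
    print(e1, (lo, hi), "done" if hi - lo <= A else "probe s2")
# prints:
# 2 (1, 2) done
# 5 (1, 5) probe s2
```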

Iterative Selection of Source Probing Set for Min
BATCH_MIN is a worst-case analysis, which assumes that the values returned always maximize the remaining uncertainty. We now consider an average-case approach in which we assume that the value of each source is uniformly distributed over its range.
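The slide states the uniform-distribution assumption but not the exact selection rule. The sketch below is one plausible greedy reading, not the paper's algorithm: at each step it probes the source whose probe minimizes the expected width of the min interval (a midpoint-rule average under the uniform assumption), and it stops when the width is at most A or the next probe would push the cumulative, sequential latency past D. It ignores probing cost. The `exact` list is a hypothetical stand-in for the values real probes would return, and all function names are hypothetical.

```python
def min_interval(ranges):
    """Bounds on the minimum given stored ranges [l_i, u_i]."""
    return min(l for l, _ in ranges), min(u for _, u in ranges)


def expected_width_after_probe(ranges, i, samples=200):
    """Expected width of the min interval after probing source i, assuming its
    value is uniform on [l_i, u_i] (midpoint-rule average over `samples` points)."""
    l, u = ranges[i]
    total = 0.0
    for k in range(samples):
        e = l + (u - l) * (k + 0.5) / samples
        trial = list(ranges)
        trial[i] = (e, e)
        lo, hi = min_interval(trial)
        total += hi - lo
    return total / samples


def iterative_min(ranges, exact, latency, A, D):
    """Probe one source at a time; sequential latencies add up against D."""
    ranges = list(ranges)
    elapsed, probed = 0.0, []
    while True:
        lo, hi = min_interval(ranges)
        if hi - lo <= A:
            break                                  # accurate enough
        choices = [i for i in range(len(ranges))
                   if ranges[i][0] < ranges[i][1]  # not yet exact
                   and elapsed + latency[i] <= D]  # still fits before the deadline
        if not choices:
            break                                  # deadline reached: return best effort
        i = min(choices, key=lambda j: expected_width_after_probe(ranges, j))
        ranges[i] = (exact[i], exact[i])           # simulated probe result
        elapsed += latency[i]
        probed.append(i)
    return min_interval(ranges), probed, elapsed
```

For the earlier example (s1 = [0,5], s2 = [1,6]), probing s1 leaves an expected width of 1.6 versus 3.4 for s2, so s1 is probed first. A cost-aware variant could rank sources by expected width reduction per unit cost instead.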

Performance Evaluation
Baseline policies compared:
GREEDY: probe all sources
LAZY: probe no sources
RANDOM
Performance metrics:
Cost
Accuracy ratio (a/A): measures how closely the answer interval a matches the accuracy constraint A
Latency ratio (d/D): measures how closely the time d spent answering the query matches the time constraint D
Accuracy satisfaction ratio: the percentage of queries whose accuracy constraints are met
Deadline satisfaction ratio: the percentage of queries whose time constraints are met
When a/A ≤ 1, the accuracy requirement is met; the smaller the ratio, the more accurate the answer.
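The metrics above can be computed from per-query records. This is a minimal sketch, assuming the ratios are averaged over all queries and that A > 0 and D > 0; the `QueryResult` record and the `summarize` function are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class QueryResult:
    a: float     # width of the returned answer interval
    A: float     # accuracy constraint (assumed > 0)
    d: float     # time spent answering the query
    D: float     # time constraint (assumed > 0)
    cost: float  # total probing cost for this query


def summarize(results: List[QueryResult]) -> Dict[str, float]:
    n = len(results)
    return {
        "avg_cost": sum(r.cost for r in results) / n,
        "avg_accuracy_ratio": sum(r.a / r.A for r in results) / n,  # <= 1 means met
        "avg_latency_ratio": sum(r.d / r.D for r in results) / n,
        # percentage of queries whose accuracy / time constraints are met
        "accuracy_satisfaction": 100.0 * sum(r.a <= r.A for r in results) / n,
        "deadline_satisfaction": 100.0 * sum(r.d <= r.D for r in results) / n,
    }
```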

Basic Performance Results for Computing Count
Not surprisingly, GREEDY achieves the best answer accuracy at the price of the highest cost and latency. In contrast, LAZY provides the coarsest answer instantly by not probing any sources. BATCH exhibits answer accuracy and latency similar to RANDOM, with slightly lower probing cost. However, more queries meet their deadlines under BATCH. This is because BATCH gives higher priority to time constraints than to accuracy constraints, i.e., the best possible answer (in terms of accuracy) is provided only if the deadline is met.

Basic Performance Results for Computing Min
The deadline satisfaction ratio of BATCH is much higher than that of RANDOM, since BATCH does not probe sources whose probing latency exceeds D. Iterative selection probes sources sequentially, so the latency of a query is the sum of the individual probing latencies. Given the same D, iterative selection can therefore probe fewer sources than BATCH, which leads to a higher accuracy ratio (a less accurate result) and a lower accuracy satisfaction ratio.

Performance of Computing Count under Varying Accuracy Constraints
When A is small, probing only the sources in S_D cannot provide a sufficiently accurate answer, which is why the first part of the curve is horizontal. As A increases, fewer probes are needed to provide an accurate answer.

Performance of Computing Count under Varying Time Constraints
The turning points in the probing-cost curve match the stages of the algorithm exactly. At first, we can only probe S_D: a smaller deadline leads to a smaller S_D, so the probing cost increases as the deadline increases. At the same time, the accuracy ratio decreases but remains larger than 1. When the deadline reaches a point where we can select a subset of S_D to probe, the probing cost decreases, since we select the sources with smaller probing costs. As the deadline increases further, no more improvement is obtained, since the same subset of sources is probed to meet the accuracy and time constraints.

Conclusions
The worst-case analysis (batch selection algorithms) provides a bound on the cost of satisfying queries regardless of the exact values of the sources.
More sophisticated models, such as a Gaussian distribution, could be used to capture changes in source values.
It would also be interesting to conduct a competitive analysis of algorithms for answering aggregate queries.
The worst-case analysis assumes that the values returned always maximize the remaining uncertainty.