Models and Issues in Data Streaming
Presented by: Ankur Jain, Department of Computer Science, 6/23/03
A list of relevant papers is available at


Outline
- Data Streaming
  - Introduction to Data Streaming and Sensor Data
  - Challenges
  - Common Applications
  - Active research projects in the area
  - Common models and issues
- Adaptive Filters for Continuous Queries over Distributed Data Streams (Widom, Olston and Jiang)
  - Presentation of the algorithm
  - Experimental results and Conclusions

Data Streams
Data input as continuous and ordered data streams
Characteristics of Data Streams:
- Sequential access
- Bounded main memory
- History/arrival order is significant
- Applications may have real time requirements
- Unpredictable/variable data arrival and characteristics
- Imprecise/noisy data
- Continuous queries (CQ)

Challenges
- Application stream rates exceed DBMS capacity
- Management of high stream rates using efficient and adaptive sampling techniques
- Building efficient query plans
- Optimize memory/disk usage

Applications
- Network monitoring and traffic engineering
- Telecom call records
  - Caller IDs, call destinations
- Network Security
  - Packet source/destination addresses
- Financial Applications
  - Stock Tickers
- Sensor Networks
  - Monitoring temperature, sound, light, motion tracking
- Manufacturing processes
  - Process monitoring
- Web logs and click streams
- Massive data sets

Issues in Sensor Data Streams
- Communication
  - High variance, limited bandwidth, frequently dropped packets
- Computation
  - Limited computational capacity and limited memory size
- Uncertainty in sensor readings
- Low Power Capacity

Data Stream Projects
- The Cougar Project (Cornell)
  - Sensors form a distributed database system
  - Cross-layer optimizations (data management layer and the routing layer)
- Telegraph Project (Berkeley)
  - Adaptive routing between the sensor nodes
  - Adaptive query processing
- STREAM (STanford stREam dAta Manager) (Stanford)
  - Building a new data stream management system
  - TRAPP (Tradeoff in Replication Precision and Performance)
    - Have adaptive filters at remote sources
  - Constructing adaptive query plans
  - Optimize communication costs
- Aurora Project (Brown/MIT)

Interesting research areas
The main issues in data streaming appear to be:
- Data integration/fusion
- Adaptive data filtering
  - Some relevant work has been done (from the STREAM project, to be presented shortly)
- Managing data on non-linear models
  - No relevant work done so far
Most of the projects consider simple stream problems such as traffic monitoring or stock tickers. Harder problems to consider:
- Data integration from multiple cameras in a parking lot
- Monitoring traffic on a stretch of a freeway
- Monitoring enemy activity over an area using multiple cameras
Need for adaptive sampling where rapid updates are available from interesting areas such as:
- Areas where there is increased enemy activity
- Freeway stretch where there is unusually slow/fast traffic
- Haphazard/suspicious vehicle movement in a parking lot

Adaptive Filters for Continuous Queries over Distributed Data Streams
Appeared in SIGMOD 2003
Chris Olston, Jing Jiang and Jennifer Widom
Stanford University

Environment in Consideration
- Some applications do not require exact precision for their queries.
- Distributed sources (sensors) at remote locations continuously send stream updates to a central stream processor.
- Users register continuous queries (CQ) with the central processor, together with quantitative precision constraints.
- The central processor installs filters at the remote locations, with bound widths that depend on the given precision constraint.

Goals
- Reduce the communication overhead incurred in the presence of rapid stream updates
- Trade precision for communication overhead at a fine granularity
- The filters should be able to adapt to changing conditions to minimize stream rates

Example Applications
- Wireless Sensor Networks
  - Monitoring environmental conditions such as light, temperature, sound etc.
- Stock quote services
- Network Traffic Monitoring
  - Network packet arrival logs at router level
- Online Auctions
- Wide Area resource accounting
- Load Balancing for replicated servers

Overview
- A bounded approximate answer is a pair of real values $L$ and $H$ that define an interval $[L, H]$
- A precision constraint $\delta \ge 0$ for a CQ is defined such that $0 \le H - L \le \delta$ at all times
- For each remote object $O$ the filter maintains a bound $[L_O, H_O]$ of width $W_O$
- If $V$ is the latest value for $O$ that passed the filter then $L_O := V - W_O / 2$ and $H_O := V + W_O / 2$
- The central stream processor keeps a cached copy of $[L_O, H_O]$ based on filtered updates from $O$'s source
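The filter rule above can be sketched in a few lines of Python. This is only an illustration: the class name `BoundFilter` and its method names are hypothetical, and values lying exactly on the boundary are assumed to be inside the bound, since the interval $[L_O, H_O]$ is closed.

```python
class BoundFilter:
    """Source-side filter for one object O (hypothetical name, for illustration)."""

    def __init__(self, initial_value, width):
        self.width = width              # W_O, the bound width
        self.recenter(initial_value)

    def recenter(self, value):
        # L_O := V - W_O / 2 and H_O := V + W_O / 2
        self.low = value - self.width / 2
        self.high = value + self.width / 2

    def offer(self, value):
        """Return True if this update must be forwarded to the stream processor."""
        if self.low <= value <= self.high:
            return False                # inside the bound: update is suppressed
        self.recenter(value)            # outside: forward it and recenter on V
        return True


f = BoundFilter(initial_value=10.0, width=4.0)   # bound is [8.0, 12.0]
print(f.offer(11.0))       # inside the bound, nothing is sent
print(f.offer(13.0))       # outside the bound, update is sent
print((f.low, f.high))     # bound recentered on 13.0
```

The stream processor would apply the same recentering to its cached copy whenever it receives a forwarded update, so both sides agree on the bound without extra messages.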

The System (architecture diagram)
- Data Sources: generate streams of updates ($V_1$ updates, $V_2$ updates, ..., $V_n$ updates)
- Filters: one per source, holding bounds $[L_1, H_1]$, ..., $[L_n, H_n]$; each intercepts its update stream and forwards the updates that fall outside its bound; the bounds shrink periodically
- Stream Processor
  - Bound Cache: maintains a copy of the bound $[L_i, H_i]$ for each object
  - CQ Evaluator: produces bounded answers
  - Precision Manager: reallocates bound width and sends growth messages (bound shrinking, selective growing)
- User: registers queries with precision constraints and receives bounded answers

Algorithm Details
- Initially the bounds can be set in any way, as long as they meet the precision constraints (e.g. by uniform allocation).
- The bounds are then reallocated adaptively among the objects participating in each query (bound shrinking and selective growing).
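As an illustration of uniform allocation, the sketch below assumes the query is an AVG or a SUM over n objects; the slide does not say which aggregates are meant, and the function names are hypothetical. By interval arithmetic, the answer bound of an AVG is the average of the object bounds, so giving every object a width of $\delta$ meets the constraint; for a SUM the widths add up, so each object gets $\delta / n$.

```python
def uniform_widths(delta, n, aggregate="AVG"):
    """Uniform initial bound widths for n objects (illustrative assumption)."""
    if n <= 0:
        raise ValueError("need at least one object")
    if aggregate == "AVG":
        per_object = delta        # average of n widths equal to delta is delta
    elif aggregate == "SUM":
        per_object = delta / n    # n widths of delta / n sum to delta
    else:
        raise ValueError("only AVG and SUM are sketched here")
    return [per_object] * n


def avg_answer_bound(bounds):
    """Answer bound [L, H] of an AVG query over a list of (low, high) bounds."""
    n = len(bounds)
    low = sum(lo for lo, _ in bounds) / n
    high = sum(hi for _, hi in bounds) / n
    return low, high


widths = uniform_widths(delta=6.0, n=3)
bounds = [(v - w / 2, v + w / 2) for v, w in zip([10.0, 20.0, 30.0], widths)]
low, high = avg_answer_bound(bounds)
print(low, high, high - low <= 6.0)
```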

Bound Shrinking (Algo. details cont.)
- Periodically, every $T$ time units, $O_i$'s bound width is decreased symmetrically at both the source and the stream coordinator as $W_i := W_i (1 - S)$, where $T$ (adjustment period) and $S$ (shrink percentage) are determined experimentally.
- Each time the bound width shrinks, the source must reapply the filter to the current data value $V_i$. If this value does not pass the filter, the source must put it on the update stream.
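A minimal sketch of one shrink step at the source follows. It assumes the bound shrinks about its center (one reading of "symmetrically") and that $S$ is given as a fraction between 0 and 1; the function names are hypothetical.

```python
def shrink_bound(low, high, shrink_fraction):
    """Shrink [low, high] about its center so that W := W * (1 - S)."""
    center = (low + high) / 2
    new_width = (high - low) * (1 - shrink_fraction)
    return center - new_width / 2, center + new_width / 2


def shrink_at_source(low, high, current_value, shrink_fraction):
    """Apply one shrink step, then reapply the filter to the current value V_i.

    Returns the new bound and a flag saying whether V_i now falls outside it
    and therefore has to be put on the update stream.
    """
    new_low, new_high = shrink_bound(low, high, shrink_fraction)
    must_send = not (new_low <= current_value <= new_high)
    return new_low, new_high, must_send


print(shrink_bound(8.0, 12.0, 0.25))
print(shrink_at_source(8.0, 12.0, 11.8, 0.25))
```

Because the stream coordinator applies the same shrink at the same period boundary, no message is needed unless the current value ends up outside the narrower bound.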

Bound Growing (Algo. details cont.)
- Each object is assigned a burden score $B_i$ based on its stream transmission cost $C_i$, its estimated stream update period $P_i$, and its current bound width $W_i$.
- Each query is assigned a burden target $T_i$, either by averaging burden scores or by invoking a linear solver.
- A deviation value $D_i$ is based on the difference between the burden score and the burden target.
- The objects are considered in order of decreasing deviation, and each object is assigned the maximum possible bound growth $\Delta W_i$.

Burden Score and Burden Target (Algo. details cont.)
- The burden score $B_i$ is computed as $B_i = C_i / (P_i \cdot W_i)$
  - $C_i$ is the cost to send a stream update of object $O_i$, and $W_i$ is the bound width
  - $P_i = T / N_i$, where $N_i$ is the number of updates of $O_i$ received by the stream coordinator in the last $T$ time units
- The burden target $T_i$ is the lowest overall burden required of the objects in the query at all times. For simple cases it is equal to the average of the burden scores of the objects in the query.
- Deviation: [equation not preserved in the transcript]
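The burden score and the simple-case burden target can be worked through numerically as below. The function names are hypothetical, and returning a score of zero when no updates arrived in the last period is an assumption made here (the update period $P_i$ is then unbounded), not something the slides state.

```python
def burden_score(cost, width, updates_received, period):
    """B_i = C_i / (P_i * W_i), with P_i = T / N_i."""
    if updates_received == 0:
        return 0.0    # assumption: no updates in the last T units means no burden
    update_period = period / updates_received     # P_i = T / N_i
    return cost / (update_period * width)


def simple_burden_target(scores):
    """Simple case from the slide: the average of the objects' burden scores."""
    return sum(scores) / len(scores)


scores = [
    burden_score(cost=1.0, width=2.0, updates_received=5, period=10.0),
    burden_score(cost=2.0, width=4.0, updates_received=10, period=10.0),
]
print(scores, simple_burden_target(scores))
```

An object that is expensive to transmit, updates often, or has a narrow bound gets a high score, which is what makes it a candidate for bound growth.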

Maximum bound growth (Algo. details cont.)
- The maximum possible amount by which the bound can be grown is: [equation not preserved in the transcript]
- For each nonzero growth value, the precision manager increases the width for $O_i$ by setting $L_i := L_i - \Delta W_i / 2$ and $H_i := H_i + \Delta W_i / 2$
- After all the growth has been allocated, the precision manager sends update messages to all sources whose bound width has been modified (grown)
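The step that applies the growth values can be sketched as follows. The growth amounts $\Delta W_i$ are taken as given inputs, since their formula is not available here; the function name and the dictionary layout are hypothetical.

```python
def apply_growth(bounds, growth):
    """Widen each bound by its growth value and list the sources to notify.

    bounds: dict mapping object id -> (low, high)
    growth: dict mapping object id -> delta_w (the growth already computed)
    """
    new_bounds = dict(bounds)
    to_notify = []
    for obj, delta_w in growth.items():
        if delta_w == 0:
            continue                    # zero growth: bound and source untouched
        low, high = bounds[obj]
        # L_i := L_i - delta_w / 2 and H_i := H_i + delta_w / 2
        new_bounds[obj] = (low - delta_w / 2, high + delta_w / 2)
        to_notify.append(obj)           # this source must receive a growth message
    return new_bounds, to_notify


bounds = {"a": (8.0, 12.0), "b": (0.0, 2.0)}
new_bounds, to_notify = apply_growth(bounds, {"a": 2.0, "b": 0.0})
print(new_bounds, to_notify)
```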

Precision Constraint Adjustments and Latency (Algo. details cont.)
- If $\delta_j$ increases, the additional bound width is allocated automatically by the bound growth algorithm.
- If $\delta_j$ decreases (stronger precision), the automatic bound shrinking will reduce the answer bound until the requested precision level is reached. For immediate improvement the precision manager needs to send explicit shrink messages.
- Source filters timestamp all updates transmitted to the stream processor.
- The precision manager timestamps all bound width updates with an adjustment period boundary.

Experiments
The performance of the proposed model was tested on network traffic volumes, which are of interest to ISPs for security, billing, and infrastructure planning. Some example queries include:
- Q1: Monitor the volume of remote login requests
- Q2: Monitor the volume of incoming traffic received within the organization
- Q3: Monitor the volume of incoming SYN packets

Results
Comparison of the overall communication cost (not including growth-message communication costs) incurred by the adaptive algorithm against uniform static allocation, with cost measured over 21 hours. The CQ monitors the average traffic level with a varying precision constraint $\delta$.

Results (cont.)
Results of comparing the idealized version of the proposed algorithm against the optimized static allocation, using a continuous AVG query over 10 data sources under uniform costs.

Conclusions
- Experimental results show that the proposed approach saves communication cost at fine granularity by individually adjusting precision constraints.
- The experiments were based on simple examples of network traffic with a few hosts.
- The values of $S$ and $T$ were determined experimentally. The effect of varying $T$ on the quality of answers is not available. Evaluating $S$ experimentally may not be feasible in all cases.
- The stream update period $P_i = T / N_i$ takes into consideration only the updates in the last $T$ time units. Considering the complete history of updates (Kalman filter) might show interesting results.