Boğaziçi University – Computer Engineering Dept. CMPE 521 An Efficient and Resilient Approach to Filtering &Disseminating Streaming Data PAPER PRESENTATION.

Presentation transcript:

PAPER PRESENTATION on An Efficient and Resilient Approach to Filtering & Disseminating Streaming Data
CMPE 521 Database Systems
Prepared by: Mürsel Taşgın, Onur Kardeş

Introduction
The Internet and the Web are increasingly used to disseminate fast-changing data. Examples of fast-changing data include sensor readings, traffic and weather information, stock prices, sports scores, and health monitoring information.

Introduction
The properties of this data: highly dynamic, streaming, aperiodic.
Users are interested not only in monitoring streaming data but also in using it for online decision making.

Introduction
[Figure: replicating the source. A SOURCE feeds Repository 1, Repository 2 and Repository 3.]

Introduction
Services like Akamai.net and IBM's edge server technology are examples of such networks of repositories. They aim to provide better service by shifting most of the work to the edge of the network, closer to the end users.
Such systems scale quite well, but if the data changes at a fast rate, the quality of service at a repository farther from the data source deteriorates.

Introduction
In general, replication can reduce the load on the sources. However, replication of time-varying data introduces new challenges: coherency, delays and scalability.

Introduction
Coherency requirement (cr): users specify the bound on the tolerable imprecision associated with each requested data item.
[Figure: SOURCE holds Microsoft: $60.85 at time 11:43. Repository 2 holds Microsoft: $60.86 at time 11:41. Repository 1 holds Microsoft: $60.89 at time 11:36. USER 1 and USER 2 read from the repositories.]

Introduction
Coherency-preserving system: the delivered data must preserve the associated coherency requirements, and the system must be resilient to failures and efficient.
Necessary changes are pushed to the users, instead of each user polling the source independently.

Introduction: Construction of an effective dissemination network of repositories
A logical overlay network of repositories is created according to:
the coherency needs of the users attached to each repository;
the expected delays at each repository.
This network is called the dynamic data dissemination graph (d3g).

Introduction: Construction of an effective dissemination network of repositories
LeLA, the previous algorithm for building a d3g, was unable to cope with a large number of data items.
A new algorithm, DiTA, is introduced to build dissemination networks that are scalable and resilient.

Introduction: Construction of an effective dissemination network of repositories
In DiTA, repositories with more stringent coherency requirements are placed closer to the source in the network, as they are likely to get more updates than the ones with looser coherency requirements.
In DiTA, a dynamic data dissemination tree (d3t) is created for each data item x.

Introduction: Construction of an effective dissemination network of repositories
[Figure: dissemination network rooted at SOURCE, with Repository 1 (c = 0.2), Repository 2 (c = 0.3), Repository 3 (c = 0.8), Repository 4 (c = 0.7), Repository 5 (c = 0.9) and Repository 6 (c = 0.7).]

Introduction: Provision for the dissemination of dynamic data in spite of failures in the overlay network
To handle repository and communication link failures, back-up parents are used.
A back-up parent is asked to deliver data with a coherency that is less stringent than that associated with the parent.

Introduction: Provision for the dissemination of dynamic data in spite of failures in the overlay network
[Figure: a Parent and a Back-up Parent in the overlay network; each node is labeled with the data items it holds (drawn from x, y, z, t, a, b, c).]

Introduction: Efficient filtering and scheduling techniques for repositories
Normally a repository receives updates and selectively disseminates them to its downstream dependents.
It is not always necessary to disseminate the exact values of the most recent updates, as long as the values presented preserve the coherency of the data.

The Basic Framework: Data Coherency and Overlay Network

The Basic Framework: Data Coherency and Overlay Network
A coherency requirement (c) is associated with a data item x, to denote the maximum permissible deviation of the user's view from the value of x at the source.
c can be specified in terms of:
time (e.g., values should never be out of sync by more than 5 seconds);
value (e.g., weather information where the temperature should never be out of sync by more than 2 degrees).

The Basic Framework: Data Coherency and Overlay Network
Each data item in the repository from which a user obtains data must be refreshed in such a way that the user-specified coherency requirements are maintained:
|U_x(t) - S_x(t)| ≤ c    (1)
|P_x(t) - S_x(t)| ≤ c    (2)
Here S_x(t), P_x(t) and U_x(t) are the values of x at the source, at the repository and at the user at time t.
The fidelity f observed by a user can be defined as the total length of time for which the above inequality holds.

The Basic Framework: Data Coherency and Overlay Network
Assume x is served by a single source and repositories R1, ..., Rn are interested in x.
These repositories in turn serve a subset of the remaining repositories, such that the resulting network is in the form of a tree rooted at the source and consisting of repositories R1, ..., Rn.
This defines a parent-dependent relationship.

The Basic Framework: Data Coherency and Overlay Network
Since a repository disseminates updates to its users and dependents, the coherency requirement of a repository should be the most stringent requirement that it has to serve.
When a data change occurs at the source, the source checks which of its direct and indirect dependents are interested in the change and pushes the change to them.
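The rule that a repository must take on the most stringent requirement it serves can be sketched as follows. This is a minimal illustration, assuming that coherency requirements are numeric tolerances where a smaller value is more stringent; the function name is hypothetical and not taken from the paper.

```python
def repository_coherency(user_crs, dependent_crs):
    """Return the coherency requirement a repository must request for one item.

    Assumption: each requirement is a numeric tolerance and a smaller
    value is more stringent, so the repository needs the minimum.
    """
    requirements = list(user_crs) + list(dependent_crs)
    if not requirements:
        raise ValueError("repository serves no users or dependents for this item")
    return min(requirements)
```

For example, a repository whose users tolerate 0.5 and 0.3 and whose only dependent tolerates 0.7 would have to receive the item at 0.3.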

Building a d3t
Start with the physical layout of the communication network in the form of a graph consisting of a set of sources, repositories and the underlying network.
The goal is to build a d3t for a data item x.
The root of the d3t is the source that serves x.
A repository P serving repository Q with data item x is called the parent of Q, and Q is called the dependent of P for x.

Building a d3t
[Figure: a d3t with the source for data item x at the root, repositories R1 and R2 in a parent/dependents relationship across Level 0, Level 1 and Level 2, and USERS attached in each repository.]

Building a d3t
A repository should ideally serve at least as many unique (dependent, data item) pairs as the number of data items served to it.
If a repository is currently serving fewer than this fixed number, then we say that the repository has the resources to serve a new dependent.
Pairs currently served by R1, with a new request marked "?":
Dependent   Data Item
R7          x
R11         y
R18         x
R9          z
R10         t
R21         x   ?

Building a d3t
[Figure: DiTA example. R10 (c=0.3) asks to join a d3t rooted at SOURCE that contains R4 (c=0.1), R5 (c=0.4), R6 (c=0.5), R7 (c=0.8), R8 (c=0.6) and R9 (c=0.7); subtrees are annotated with Max(c) values of 0.8, 0.7, 0.8, 0.6 and 0.7. At each step the question "Enough resources?" is asked, with YES and NO branches. Since c of R6 > c of R10, R10 takes the place of R6 and R6 is pushed down.]

Building a d3t
[Figure: the resulting d3t, with SOURCE, R4 (c=0.1), R5 (c=0.4), R6 (c=0.5), R8 (c=0.6), R10 (c=0.3), R9 (c=0.7) and R7 (c=0.8); subtrees are annotated with Max(c) values of 0.6, 0.8, 0.7 and 0.5.]
This algorithm is called the Data-Item-at-a-Time Algorithm (DiTA).
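The insertion step illustrated in the figures can be sketched roughly as below. This is only a loose sketch under stated assumptions, not the algorithm as specified in the paper: the fixed `capacity` per node and the rule for choosing which child to descend into (the one with the loosest requirement) are hypothetical simplifications. What it keeps from the slides is that a more stringent newcomer takes the place of a looser repository, which is then pushed down.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    name: str
    c: float                 # coherency requirement; smaller means more stringent
    capacity: int = 2        # hypothetical cap on direct dependents (must be >= 1)
    children: List["Node"] = field(default_factory=list)


def insert(node: Node, new: Node) -> None:
    """Place `new` somewhere in the subtree rooted at `node`."""
    if len(node.children) < node.capacity:
        # Enough resources: `node` serves the newcomer directly.
        node.children.append(new)
        return
    # Not enough resources. Assumption: descend towards the child
    # with the loosest coherency requirement.
    idx = max(range(len(node.children)), key=lambda i: node.children[i].c)
    child = node.children[idx]
    if new.c < child.c:
        # The newcomer is more stringent: it takes the child's place
        # and the child (with its subtree) is pushed down under it.
        node.children[idx] = new
        insert(new, child)
    else:
        insert(child, new)
```

With this rule every repository ends up below a parent whose requirement is at least as stringent as its own, which matches the idea of placing stringent repositories closer to the source.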

Building a d3t
Traces: collection procedure and characteristics
Real-world stock price streams are used.
10,000 values are polled during 1,000 traces; a new data value is obtained approximately once per second.

Building a d3t
Repositories: data, coherency and cooperation characteristics
A coherency requirement c is associated with each of the chosen data items.
The c values associated with data in a repository are a mix of stringent tolerances (varying from $0.01 to $0.05) and less stringent tolerances (varying from $0.5 to $0.99).
T% of the data items have stringent coherency requirements at each repository; the remaining (100 - T)% have less stringent coherency requirements.

Building a d3t
Physical Network: topology and delays
The router topology was generated using BRITE.
The repositories and the sources are selected randomly.
Node-to-node communication delays are derived from a Pareto distribution:
x → (1 / x^(1/α)) + x1, where α = x' / (x' - 1)
x' is the mean and x1 is the minimum delay a link can have.
In the experiments, x' = 15 ms and x1 = 2 ms.
The computational delay for dissemination is taken to be 12.5 ms.
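The delay formula can be tried out with a few lines of Python. This is a sketch, assuming that the x on the left-hand side is a uniform random number in (0, 1]; the function and parameter names are illustrative only, with defaults taken from the values quoted on the slide.

```python
import random


def pareto_link_delay(rng, mean_ms=15.0, min_ms=2.0):
    """Draw one link delay (ms) using x -> 1 / x**(1/alpha) + x1.

    Assumption: x is uniform on (0, 1]. mean_ms plays the role of x'
    and min_ms the role of x1.
    """
    alpha = mean_ms / (mean_ms - 1.0)
    u = 1.0 - rng.random()          # uniform on (0, 1], avoids division by zero
    return 1.0 / (u ** (1.0 / alpha)) + min_ms


rng = random.Random(42)
sample_delays = [pareto_link_delay(rng) for _ in range(5)]
```

Because the first term is never below 1, every sampled delay is at least x1 + 1 ms under this reading of the formula.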

Building a d3t
Metrics
The key metric is the loss in fidelity of the data.
Fidelity is the total length of time for which the inequality |P(t) - S(t)| < c holds.
The fidelity of a repository is the mean over all data items stored in that repository.
The fidelity of the system is the mean fidelity of all repositories.
The loss of fidelity is (100% - fidelity).
Another metric is the number of messages in the system (system load).
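The metrics above can be computed from sampled traces along the following lines. This is a minimal sketch, assuming the repository and source values are sampled at the same equally spaced instants, so that the fraction of samples satisfying the inequality stands in for the length of time; all function names are illustrative.

```python
from statistics import mean


def item_fidelity(repo_values, source_values, c):
    """Percentage of samples for which |P(t) - S(t)| < c holds."""
    if len(repo_values) != len(source_values) or not source_values:
        raise ValueError("traces must be non-empty and of equal length")
    in_sync = sum(1 for p, s in zip(repo_values, source_values) if abs(p - s) < c)
    return 100.0 * in_sync / len(source_values)


def repository_fidelity(item_fidelities):
    """Mean fidelity over all data items stored at one repository."""
    return mean(item_fidelities)


def system_fidelity(repository_fidelities):
    """Mean fidelity over all repositories."""
    return mean(repository_fidelities)


def loss_of_fidelity(fidelity):
    """Loss of fidelity in percent."""
    return 100.0 - fidelity
```

For instance, if the repository copy is within tolerance for three of four samples, the item fidelity is 75% and the loss of fidelity is 25%.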

Building a d3t
Performance Evaluation
For the base performance measurement, 600 routers, 100 repositories and 4 servers were used.
The total number of data items served by the servers was varied from 100 to ...
The T parameter was varied from 20 to 80.
A previous algorithm, LeLA, was used as a benchmark.

Building a d3t
Performance Evaluation
Each node in DiTA does less work than in LeLA. As a result, the dissemination tree is taller in DiTA.
So when computational delays are low but link delays are large, LeLA may perform better.
However, this happens only for negligible computational delays (0.5 ms) and very high link delays (110 ms).

Enhancing the Resiliency of the Repository Network
Active backups vs. passive backups
Passive backups may increase the load, which causes loss in fidelity. So active backup parents are used.
A backup parent serves data to a dependent Q with a coherency c_B > c.

Enhancing the Resiliency of the Repository Network
If all changes are less than c_B, the dependent cannot know when parent P fails. So P should send periodic "I'm alive" messages.
Once P fails, Q requests B to serve it the data at c.
When P recovers from the failure, Q requests B to serve the data item at c_B again.
In this approach, there is no backup for backups, so when both P and B fail, Q cannot get any updates.
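The switch-over described above can be sketched as a small state machine on the dependent Q. This is an illustration under assumptions: the slides do not say how a missing "I'm alive" message is detected, so the `timeout` parameter and the method names are hypothetical; c_B is computed as k * c, as the slides do when choosing the backup coherency.

```python
class DependentFailover:
    """Tracks which coherency the dependent Q currently requests from backup B."""

    def __init__(self, c, k, timeout):
        self.c = c                      # coherency Q gets from its parent P
        self.c_backup = k * c           # looser coherency normally requested from B
        self.timeout = timeout          # hypothetical silence threshold
        self.last_heard = 0.0
        self.parent_alive = True
        self.backup_request = self.c_backup

    def on_parent_message(self, now):
        """Called for data updates and for "I'm alive" messages from P."""
        self.last_heard = now
        if not self.parent_alive:
            # P has recovered: ask B to go back to the looser coherency c_B.
            self.parent_alive = True
            self.backup_request = self.c_backup

    def on_tick(self, now):
        """Called periodically to detect a silent parent."""
        if self.parent_alive and now - self.last_heard > self.timeout:
            # P is presumed failed: ask B to serve the data at c.
            self.parent_alive = False
            self.backup_request = self.c
```

A dependent would call `on_tick` from a timer and `on_parent_message` from its receive path; the sketch does not cover the case where B fails as well, which the slides note is not handled.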

Enhancing the Resiliency of the Repository Network
Choice of c_B Using a Probabilistic Model
For the sake of simplicity, c_B = k * c. The choice of k is important:
if k is small, the backup sends updates frequently, which incurs high computational and communication overheads;
if k is large, the dependent misses a large number of changes during a failure of the parent.

Enhancing the Resiliency of the Repository Network
Choice of c_B Using a Probabilistic Model
Assuming that the data values change with uniform probability and using a Markov chain model:
# Misses = 2k^2 - 2
2k^2 - 2 is the number of updates a dependent will miss before it detects that there is a failure.
According to the experiments, this number is rather pessimistic; it is nearly an upper limit.
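As a quick worked example of the formula, the helper below evaluates c_B = k * c and the miss estimate 2k^2 - 2 for a given k. The function names are illustrative; the formulas are the ones quoted on the slides.

```python
def backup_coherency(c, k):
    """Coherency requested from the backup parent: c_B = k * c."""
    return k * c


def estimated_misses(k):
    """Updates missed before a failure is detected: 2k^2 - 2."""
    return 2 * k ** 2 - 2
```

With k = 2 the estimate is 6 missed updates, and with k = 1 (the backup serving at the same coherency as the parent) it is 0.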

Enhancing the Resiliency of the Repository Network
Choice of backup parents
[Figure: flowchart for choosing a backup parent, drawn over a tree with nodes R, B, P, Q and C. The question "Any siblings?" is asked; on NO it is asked again, and on YES one of the siblings is chosen randomly.]

Enhancing the Resiliency of the Repository Network
Choice of backup parents
If the coherency at which Q wants x from B is tighter (numerically smaller) than the coherency at which B wants x, the parent of B is asked to serve x to Q with the required tighter coherency.
An advantage of choosing a sibling is that the change in coherency requirement is not percolated all the way to the source.
However, if a common ancestor of P and B is heavily loaded, the delay due to the load is reflected in the updates of both P and B. This might result in additional loss in fidelity.

Enhancing the Resiliency of the Repository Network
Effect of Repository Failures on Loss of Fidelity
Because these kinds of failures are memoryless, an exponential probability distribution is used for simulating them:
Pr(X > t) = e^(-λt)
with λ = λ1 for the time to failure and λ = λ2 for the time to recover.
In this approach link failures are not taken into account, so the model is incomplete.
[Figure: loss of fidelity plotted against λ2, with fast-recovery and slow-recovery regions marked.]
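The failure model can be simulated with the standard library as shown below. This is a sketch, assuming a repository simply alternates between up periods drawn with rate λ1 and down periods drawn with rate λ2; the function names and the `horizon` parameter are illustrative.

```python
import math
import random


def survival(t, lam):
    """Pr(X > t) = e^(-lam * t) for an exponentially distributed duration."""
    return math.exp(-lam * t)


def up_down_intervals(rng, lam_fail, lam_recover, horizon):
    """Alternate exponential up and down periods until `horizon`.

    Returns a list of (start, end, is_up) tuples covering [0, horizon].
    """
    t, is_up, intervals = 0.0, True, []
    while t < horizon:
        lam = lam_fail if is_up else lam_recover
        end = min(t + rng.expovariate(lam), horizon)
        intervals.append((t, end, is_up))
        t, is_up = end, not is_up
    return intervals
```

A larger λ2 gives shorter down periods on average (mean 1/λ2), which corresponds to fast recovery.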

Enhancing the Resiliency of the Repository Network
Performance Evaluation
The effect of adding resiliency is shown. k = 2 is used.
When 100 data items are used, 23% of the updates sent by backups are disseminated.
Some updates sent by backups arrived before those sent by the parents.

Enhancing the Resiliency of the Repository Network
Performance Evaluation
But when backup parents are loaded (> 400), their updates are of no use and increase the loss of fidelity.
The dependent should control this by timestamping the updates.

Enhancing the Resiliency of the Repository Network
Performance Evaluation
During the experiment, about ...% of the repositories experienced at least one failure, and the maximum number of failures in the system at any given time for λ2 = ... was around 12.
For λ2 = 0.01, the maximum number of failures was 5, and for λ2 = 0.1, it was 2.

Enhancing the Resiliency of the Repository Network
Performance Evaluation
The effect of quick recovery is shown, with λ1 = ... and λ2 = 2.
For high coherency requirements, resiliency improves fidelity even for transient failures.

Enhancing the Resiliency of the Repository Network
Performance Evaluation
However, with resiliency and a very large number of data items (e.g., 1000), fidelity drops.
At this point the cost of resiliency exceeds the benefits obtained from it, which increases the loss in fidelity.

Reducing the Delay at a Repository
Delays
1) Queueing delay: the time between the arrival of an update and the start of its processing.
2) Processing delay: check delay (deciding if the update should be processed) + computation delay (computing the update and pushing data to the dependents).
[Figure: updates of x and y enter a queue of update requests (queueing delay); each is checked to see if an update is needed, then processed and disseminated (processing delay).]

Reducing the Delay at a Repository
Question: how can we reduce the average delays to improve fidelity?
This can be done by:
a) Better filtering, i.e. reducing the processing delay in determining if an update needs to be disseminated to one or more dependents
b) Better scheduling of disseminations

Reducing the Delay at a Repository
Better Filtering: dependent ordering
For each dependent, a repository maintains the coherency requirement (cr) and the last value pushed to it:
Upper bound = last pushed value + cr
Lower bound = last pushed value - cr
The cr values of the dependents that reside at the repository are kept sorted: C1=0.7, C2=0.6, C3=0.5, C4=0.3, C5=0.1, C6=0.05.
Algorithm to find the dependents to disseminate data to: find the first dependent, i.e. the one with the largest cr, to which the update needs to be disseminated.
A rule (shown as a figure on the slide) is valid for every window. If an update violates this rule, a pseudo value is generated and used as the actual value.
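The per-dependent window check can be sketched as below. This is the plain bound check against each dependent's last pushed value, scanning dependents in decreasing order of cr; it does not implement the window rule or the pseudo-value step of the dependent-ordering algorithm, whose details are not recoverable from the slide. Class and function names are illustrative.

```python
from dataclasses import dataclass


@dataclass
class DependentState:
    name: str
    cr: float            # coherency requirement of this dependent
    last_pushed: float   # last value pushed to this dependent


def dependents_to_update(dependents, new_value):
    """Dependents whose window [last - cr, last + cr] excludes new_value,
    listed in decreasing order of cr."""
    ordered = sorted(dependents, key=lambda d: d.cr, reverse=True)
    return [d for d in ordered if abs(new_value - d.last_pushed) > d.cr]


def push(dependents, new_value):
    """Push new_value where needed and return the names that received it."""
    targets = dependents_to_update(dependents, new_value)
    for d in targets:
        d.last_pushed = new_value
    return [d.name for d in targets]
```

For example, with dependents at cr 0.7, 0.3 and 0.05 that all last received 10.0, an update to 10.4 is pushed only to the two dependents with the tighter requirements.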

Reducing the Delay at a Repository
Better Filtering
Better filtering provides:
Updates of dynamic data are sent only to end users who are actually interested in that update.
With filtering, no garbage data flows on the network (no flooding of data over the network).
This improves communication time in the network and provides better response times.
With the help of filtering, a more scalable system can be established that withstands unexpected heavy loads.

Reducing the Delay at a Repository
Better scheduling of disseminations
[Figure: two queued updates u1 and u2, each annotated with C(u), the cost (delay) of the update, and b(u), the beneficiaries of the update; the total delay of processing u_i is shown.]
Approach: instead of standard queueing of update requests, prioritization gives better performance. Each update request is scheduled according to the score b(u)/C(u).
b(u) is the number of dependents that will receive the update, and C(u) is the cost of dissemination to all dependents.
b(u) values are stored at each repository, so they are precomputed automatically.
Advantages:
Update requests that are important to many dependents are processed earlier → BUSINESS IMPORTANCE
Updates with a low ratio get delayed, and if a new update arrives the older ones are dropped, which improves performance especially in heavily loaded environments → SCALABILITY
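A scheduler driven by the b(u)/C(u) score might look like the sketch below. It assumes that "older ones are dropped" means a pending update for a data item is replaced when a newer update for the same item arrives; that reading, and the class and method names, are assumptions made for illustration.

```python
class UpdateScheduler:
    """Keeps at most one pending update per item and serves the best score first."""

    def __init__(self):
        self._pending = {}   # item -> (value, benefit, cost)

    def submit(self, item, value, benefit, cost):
        """Queue an update; benefit is b(u), cost is C(u)."""
        if cost <= 0:
            raise ValueError("cost must be positive")
        # A newer update for the same item replaces the stale pending one.
        self._pending[item] = (value, benefit, cost)

    def next_update(self):
        """Pop the pending update with the highest b(u)/C(u), or None if idle."""
        if not self._pending:
            return None
        item = max(self._pending,
                   key=lambda i: self._pending[i][1] / self._pending[i][2])
        value, _, _ = self._pending.pop(item)
        return item, value
```

An update that benefits 3 dependents at cost 2 (score 1.5) is served before one that benefits 2 dependents at cost 4 (score 0.5), regardless of arrival order.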

Reducing the Delay at a Repository
Better scheduling of disseminations
Scheduling provides:
A priority scheme based on business importance that achieves better results.
Like filtering, it improves scalability: out-of-date update requests are discarded from the queue. This saves unnecessary computations and queueing delays.

Reducing the Delay at a Repository
Experimental Results
"Dependent ordering" has a lower loss of fidelity than the "simple algorithm". "Scheduling" performs better than both (by up to 15%).
"Dependent ordering" needs fewer pushes than the "simple algorithm".
The "Scheduling" algorithm decreases computation delays because some updates are dropped from the queue when new updates arrive and the older ones become out of date.
Fidelity loss with "Scheduling" is reported numerically. Fidelity drops as the number of data items increases.
Even with large increases in the number of data items and high update rates, the loss of fidelity stays within 10%. This provides better scalability.

Reducing the Delay at a Repository
Advantages of the better performance approaches
Approach 1: Maintaining the dependents ordered by cr values
Reduces the number of checks required for processing each update.
Reduces the number of pushes.
Approach 2: Scheduling
Reduces the overall delay to the end clients by processing updates that provide a higher benefit at a lower cost.
Gives a better choice in dropping updates, as low-score updates are dropped.
Due to the lower propagation delay, it provides better scalability and degrades gracefully under unexpected heavy loads.

Related Work
A simple decision procedure is superior, because many complex algorithms and database systems take much computation time to keep a data repository up to date.
Some dynamic web data dissemination algorithms also use a push-based scheme. However, using coherency requirements improves scalability, and another important feature is that data repositories do not need to cooperate with each other to maintain coherence information (it is already up to date).
This approach deals with rapidly changing dynamic data, while some similar approaches focus on web content that changes at slower time scales.
The strongest side of this approach is that it deals with the problem of failures and forms a resilient dissemination network.

Conclusion
The key points in this architecture are:
Design of a push-based dissemination scheme for time-varying data. Not all updates are disseminated to each repository; only the updates needed to meet the coherency requirements are pushed → EFFICIENT
Design of a cooperative dissemination network. This provides a resilient network: even if a failure occurs in the network, data coherency is not completely lost → RESILIENT
Intelligent filtering, scheduling and selective dissemination reduce the overhead in the network. This provides better scalability and is a good alternative for dynamic data publishing → SCALABLE