
Boğaziçi University – Computer Engineering Dept. CMPE 521 An Efficient and Resilient Approach to Filtering & Disseminating Streaming Data PAPER PRESENTATION.


1 Boğaziçi University – Computer Engineering Dept. CMPE 521 Database Systems. PAPER PRESENTATION on "An Efficient and Resilient Approach to Filtering & Disseminating Streaming Data". Prepared by: Mürsel Taşgın, Onur Kardeş

2 Introduction. The Internet and the web are increasingly used to disseminate fast-changing data. Examples of fast-changing data: sensor readings, traffic and weather information, stock prices, sports scores, health monitoring information.

3 Introduction. The properties of this data: highly dynamic, streaming, aperiodic. Users are interested not only in monitoring streaming data but also in using it for online decision making.

4 Introduction. [Figure: Replicating the source. SOURCE feeds Repository 1, Repository 2 and Repository 3.]

5 Introduction. Services like Akamai.net and IBM's edge server technology are exemplars of such networks of repositories, which aim to provide better service by shifting most of the work to the edge of the network (closer to the end users). Such systems scale quite well, but if the data changes at a fast rate, the quality of service at a repository farther from the data source deteriorates.

6 Introduction. In general, replication can reduce the load on the sources, but replication of time-varying data introduces new challenges: coherency, delays and scalability.

7 Introduction. Coherency requirement (cr): users specify the bound on the tolerable imprecision associated with each requested data item. [Figure: SOURCE holds Microsoft: $60.85 at time 11:43; Repository 2 holds Microsoft: $60.86 at time 11:41; Repository 1 holds Microsoft: $60.89 at time 11:36; USER 1 and USER 2 are the clients.]

8 Introduction. A coherency-preserving system must deliver data that preserves the associated coherency requirements, be resilient to failures, and be efficient. Necessary changes are pushed to the users, instead of each user polling the source independently.

9 Introduction: Construction of an effective dissemination network of repositories. A logical overlay network of repositories is created according to: the coherency needs of the users attached to each repository; the expected delays at each repository. This network is called the dynamic data dissemination graph (d3g).

10 Introduction: Construction of an effective dissemination network of repositories. LeLA, the previous algorithm for building a d3g, was unable to cope with a large number of data items. A new algorithm, DiTA, is introduced to build dissemination networks that are scalable and resilient.

11 Introduction: Construction of an effective dissemination network of repositories. In DiTA, repositories with more stringent coherency requirements are placed closer to the source in the network, as they are likely to get more updates than the ones with looser coherency requirements. In DiTA, a dynamic data dissemination tree (d3t) is created for each data item x.

12 Introduction: Construction of an effective dissemination network of repositories. [Figure: SOURCE with Repository 1 (c = 0.2), Repository 2 (c = 0.3), Repository 3 (c = 0.8), Repository 4 (c = 0.7), Repository 5 (c = 0.9) and Repository 6 (c = 0.7).]

13 Introduction: Provision for the dissemination of dynamic data in spite of failures in the overlay network. To handle repository and communication link failures, back-up parents are used. A back-up parent is asked to deliver data with a coherency that is less stringent than that associated with the parent.

14 Introduction: Provision for the dissemination of dynamic data in spite of failures in the overlay network. [Figure: repositories labelled with the data items they hold (drawn from x, y, z, t, a, b, c), with one node marked Parent and another marked Back-up Parent.]

15 Introduction: Efficient filtering and scheduling techniques for repositories. Normally a repository receives updates and selectively disseminates them to its downstream dependents. It is not always necessary to disseminate the exact values of the most recent updates, as long as the values presented preserve the coherency of the data.

16 The Basic Framework: Data Coherency and Overlay Network

17 The Basic Framework: Data Coherency and Overlay Network. A coherency requirement (c) is associated with a data item x, to denote the maximum permissible deviation of the user's view from the value of x at the source. c can be specified in terms of: time (values should never be out of sync by more than 5 s); value (e.g. weather information where the temperature value should never be out of sync by more than 2 degrees).

18 The Basic Framework: Data Coherency and Overlay Network. Each data item in the repository from which a user obtains data must be refreshed in such a way that the user-specified coherency requirements are maintained: $|U_x(t) - S_x(t)| \le c_1$ and $|P_x(t) - S_x(t)| \le c_2$, where $S_x(t)$, $P_x(t)$ and $U_x(t)$ are the values of x at the source, at repository P and at user U at time t. The fidelity f observed by a user can be defined as the total length of time for which the inequality holds.

19 The Basic Framework: Data Coherency and Overlay Network. Assume x is served by a single source. Repositories R_1, ..., R_n are interested in x. These repositories in turn serve a subset of the remaining repositories, such that the resulting network is in the form of a tree rooted at the source and consisting of repositories R_1, ..., R_n. Parent-dependent relationship.

20 The Basic Framework: Data Coherency and Overlay Network. Since a repository disseminates updates to its users and dependents, the coherency requirement of a repository should be the most stringent requirement that it has to serve. When a data change occurs at the source, the source checks which of its direct and indirect dependents are interested in the change and pushes the change to them.
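The rule on this slide (a repository must hold each item at the most stringent requirement it has to serve) can be sketched as a small bottom-up computation. This is only an illustration: the `Repository` class and its field names are hypothetical, a smaller c is taken to mean a more stringent requirement, and every repository is assumed to serve at least one user or dependent.

```python
from dataclasses import dataclass, field


@dataclass
class Repository:
    # Hypothetical structure, introduced only for this sketch.
    name: str
    user_requirements: list            # c values of directly attached users
    dependents: list = field(default_factory=list)  # child repositories


def effective_requirement(repo):
    """Most stringent (smallest) c that `repo` has to serve for one data item."""
    candidates = list(repo.user_requirements)
    # A dependent must itself be served at its own effective requirement.
    candidates += [effective_requirement(d) for d in repo.dependents]
    return min(candidates)


leaf = Repository("R2", [0.3, 0.5])
root = Repository("R1", [0.4], [leaf])
print(effective_requirement(root))   # R1 must hold x at c = 0.3 to serve R2
```

Here R1's own users only need c = 0.4, but because R2 depends on it, R1 has to be refreshed at 0.3.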

21 Building a d3t. Start with a physical layout of the communication network in the form of a graph, where the graph consists of a set of sources, repositories and the underlying network. Try to build a d3t for a data item x. The root of the d3t will be the source, which serves x. A repository P serving repository Q with data item x is called the parent of Q, and Q is called the dependent of P for x.

22 Building a d3t. [Figure: a d3t with the source for data item x at Level 0 and repositories R1 and R2 at Levels 1 and 2, marked as parent and dependents; USERS are attached in each repository.]

23 Building a d3t. A repository should ideally serve at most as many unique (dependent, data item) pairs as the number of data items served to it. If a repository is currently serving fewer than this fixed number, then we say that the repository has the resources to serve a new dependent. Example for R1 (dependent: data item): R7: x; R11: y; R18: x; R9: z; R10: t; R21: x (?).

24 Building a d3t. [Figure: DiTA insertion example. Tree rooted at SOURCE with R4 (c=0.1), R5 (c=0.4), R6 (c=0.5), R7 (c=0.8), R8 (c=0.6) and R9 (c=0.7); subtrees are annotated with Max(c) values 0.8, 0.7, 0.8, 0.6 and 0.7. R10 (c=0.3) is being inserted. Decision labels: "Enough resources?" with YES and NO branches. Since c_R6 > c_R10, R10 takes the place of R6 and R6 is pushed down.]

25 Building a d3t. [Figure: the tree after insertion, rooted at SOURCE, with R4 (c=0.1), R5 (c=0.4), R6 (c=0.5), R8 (c=0.6), R10 (c=0.3), R9 (c=0.7) and R7 (c=0.8), and subtree annotations Max(c) = 0.6, 0.8, 0.7 and 0.5.] This algorithm is called the Data-item-at-a-Time Algorithm (DiTA).
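The insertion idea illustrated on the last two slides can be sketched roughly as below. This is a simplified sketch, not the paper's exact procedure: the `Node` class, the fixed `capacity` standing in for "enough resources", and the choice of which child to compare against or descend into are all assumptions made for illustration. What it keeps from the slides is that a newcomer with a more stringent c takes the place of a looser repository, which is then pushed down.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    # Hypothetical node type; a smaller c means a more stringent requirement.
    name: str
    c: float
    capacity: int = 2                  # assumed stand-in for "enough resources"
    children: list = field(default_factory=list)


def insert(parent, new):
    """Place `new` somewhere in the subtree rooted at `parent`."""
    if len(parent.children) < parent.capacity:
        parent.children.append(new)    # enough resources: attach here
        return
    # Assumption: compare against the loosest child of this node.
    loosest = max(parent.children, key=lambda n: n.c)
    if loosest.c > new.c:
        # The newcomer is more stringent: it takes the child's place and the
        # displaced child (with its subtree) is pushed down below it.
        parent.children[parent.children.index(loosest)] = new
        new.children.append(loosest)
    else:
        insert(loosest, new)           # otherwise keep descending


def respects_ordering(node):
    """True if no repository is more stringent than its parent."""
    return all(ch.c >= node.c and respects_ordering(ch) for ch in node.children)


source = Node("SOURCE", 0.0)
for name, c in [("R4", 0.1), ("R7", 0.8), ("R5", 0.4), ("R10", 0.3), ("R9", 0.7)]:
    insert(source, Node(name, c))
print([n.name for n in source.children])
```

The insertion order and capacities in the usage lines are made up, so the resulting tree is not meant to reproduce the one in the figure.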

26 Building a d3t: Traces – collection procedure and characteristics. Real-world stock price streams from http://finance.yahoo.com are used. 10,000 values are polled during 1,000 traces; approximately one new data value is obtained per second.

27 Building a d3t: Repositories – data, coherency and cooperation characteristics. A coherency requirement c is associated with each of the chosen data items. The c values associated with data in a repository are a mix of stringent tolerances (varying from $0.01 to $0.05) and less stringent tolerances (varying from $0.5 to $0.99). T% of the data items have stringent coherency requirements at each repository; the remaining (100 – T)% have less stringent coherency requirements.

28 Building a d3t: Physical network – topology and delays. The router topology was generated using BRITE (http://www.cs.bu.edu/brite). The repositories and the sources are selected randomly. Node-to-node communication delays are derived from a Pareto distribution: $x \mapsto \frac{1}{x^{1/\alpha}} + x_1$, where $\alpha = x' / (x' - 1)$ (x' and x_1 are defined on the next slide).

29 Building a d3t: Physical network – topology and delays. In the Pareto delay model, x' is the mean and x_1 is the minimum delay a link can have. In the experiments, x' = 15 ms and x_1 = 2 ms. The computational delay for a dissemination is taken to be 12.5 ms.
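As a rough illustration of how such link delays could be drawn, the sketch below applies the mapping from the slides to a uniform random number. Reading the x inside the mapping as a uniform draw from (0, 1] (inverse-transform sampling) is an assumption; the function name and its defaults are hypothetical, with the defaults taken from the experiment values above.

```python
import random


def link_delay_ms(mean_ms=15.0, min_ms=2.0, rng=random):
    """Draw one link delay using x -> 1 / x**(1/alpha) + x_1 from the slides."""
    alpha = mean_ms / (mean_ms - 1.0)      # alpha = x' / (x' - 1)
    u = 1.0 - rng.random()                 # uniform in (0, 1], never zero
    return 1.0 / u ** (1.0 / alpha) + min_ms


rng = random.Random(42)                    # seeded only to make runs repeatable
delays = [link_delay_ms(rng=rng) for _ in range(5)]
print([round(d, 2) for d in delays])
```

Because the Pareto term is never below 1, every sampled delay under this reading is at least 1 + x_1 ms, and the heavy tail occasionally produces very long delays.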

30 Building a d3t: Metrics. The key metric is the loss in fidelity of the data. Fidelity is the total length of time for which the inequality |P(t) – S(t)| < c holds. The fidelity of a repository is the mean over all data items stored in that repository. The fidelity of the system is the mean fidelity of all repositories. The loss of fidelity is (100% – fidelity). Another metric is the number of messages in the system (system load).
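The metrics above might be computed from recorded traces along the following lines. This is a sketch under stated assumptions: source and repository values are sampled at the same equally spaced instants, so the share of in-sync samples stands in for the share of time, and all function names are hypothetical.

```python
from statistics import mean


def item_fidelity(source_values, repo_values, c):
    """Percentage of samples for which |P(t) - S(t)| < c holds."""
    in_sync = sum(1 for s, p in zip(source_values, repo_values) if abs(p - s) < c)
    return 100.0 * in_sync / len(source_values)


def repository_fidelity(item_fidelities):
    """Mean fidelity over all data items stored at one repository."""
    return mean(item_fidelities)


def system_loss(repository_fidelities):
    """Loss of fidelity of the system: 100% minus the mean repository fidelity."""
    return 100.0 - mean(repository_fidelities)


source = [10.0, 10.2, 10.5, 10.4]
repo = [10.0, 10.0, 10.0, 10.4]            # the repository lags at the third sample
f = item_fidelity(source, repo, c=0.3)
print(f, system_loss([repository_fidelity([f, 100.0])]))
```

In the usage lines only the third sample violates the requirement, so the item's fidelity is 75%.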

31 Building a d3t: Performance evaluation. For the base performance measurement, 600 routers, 100 repositories and 4 servers were used. The total number of data items served by the servers was varied from 100 to 1000. The T parameter was varied from 20 to 80. A previous algorithm, LeLA, was used as a benchmark.

32 Building a d3t: Performance evaluation. Each node in DiTA does less work than in LeLA, so the dissemination tree in DiTA is taller. Hence, when computational delays are low but link delays are large, LeLA may perform better. However, this happens only for negligible computational delays (0.5 ms) and very high link delays (110 ms).

33 Enhancing the Resiliency of the Repository Network. Active backups vs. passive backups. Passive backups may increase the load, which causes loss in fidelity, so active backup parents are used. A backup parent serves data to a dependent Q with a coherency c_B > c.

34 Enhancing the Resiliency of the Repository Network. If all changes are less than c_B, the dependent cannot know when parent P fails, so P should send periodic "I'm alive" messages. Once P fails, Q requests B to serve it the data at c. When P recovers from the failure, Q requests B to serve the data item at c_B again. In this approach there is no backup for backups, so when both P and B fail, Q cannot get any updates.
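The failover protocol on this slide can be pictured as a small state machine kept by the dependent Q. The sketch below is only illustrative: the class, its method names and the timeout-based detection of a missing "I'm alive" message are assumptions, and c_B is taken as k * c, as on the following slide.

```python
class Dependent:
    """Hypothetical bookkeeping kept by dependent Q for one data item."""

    def __init__(self, c, k, timeout):
        self.c = c                      # coherency Q needs from its parent P
        self.c_backup = k * c           # looser coherency c_B asked of backup B
        self.timeout = timeout          # assumed silence threshold for P
        self.last_heard = 0.0
        self.parent_alive = True
        self.requested_from_backup = self.c_backup

    def on_parent_message(self, now):
        """Called for any update or "I'm alive" message received from P."""
        self.last_heard = now
        if not self.parent_alive:
            # P has recovered: ask B to go back to the looser c_B.
            self.parent_alive = True
            self.requested_from_backup = self.c_backup

    def check_parent(self, now):
        """Called periodically; declares P failed after a long silence."""
        if self.parent_alive and now - self.last_heard > self.timeout:
            self.parent_alive = False
            self.requested_from_backup = self.c   # ask B to serve at c


q = Dependent(c=0.1, k=2, timeout=3.0)
q.on_parent_message(now=1.0)
q.check_parent(now=5.0)                 # 4 time units of silence: P presumed failed
print(q.parent_alive, q.requested_from_backup)
```

With no backup for the backup, nothing in this sketch helps if B is also down; that limitation is the one the slide points out.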

35 Enhancing the Resiliency of the Repository Network: Choice of c_B using a probabilistic model. For the sake of simplicity, c_B = k * c. Here the choice of k is important: if k is small, the backup sends updates frequently, which incurs high computational and communication overheads; if k is large, the dependent misses a large number of changes during a failure of the parent.

36 Enhancing the Resiliency of the Repository Network: Choice of c_B using a probabilistic model. Assuming that the data values change with uniform probability, and using a Markov chain model: # misses = $2k^2 - 2$. This is the number of updates a dependent will miss before it detects that there is a failure. According to the experiments, this number is rather pessimistic; it is nearly an upper limit.
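To get a feel for the trade-off in choosing k, the formula can simply be tabulated. The helper names below are hypothetical; the formula and the relation c_B = k * c are the ones given on the slides.

```python
def expected_misses(k):
    """Updates missed before a failure is detected, per the slides: 2k^2 - 2."""
    return 2 * k ** 2 - 2


def backup_coherency(c, k):
    """Coherency requested from the backup parent: c_B = k * c."""
    return k * c


for k in (1, 2, 3, 4):
    print(k, backup_coherency(0.05, k), expected_misses(k))
```

For k = 1 the backup mirrors the parent and the formula gives 0 misses; for k = 2, the value used in the experiments, it gives 6. The count grows quadratically in k while the backup's coherency only loosens linearly.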

37 Enhancing the Resiliency of the Repository Network: Choice of backup parents. [Figure: flowchart over nodes R, B, P, Q and C with the decision "Any siblings?" (YES / NO); on YES, one of the siblings is chosen randomly.]

38 Enhancing the Resiliency of the Repository Network: Choice of backup parents. In case the coherency at which Q wants x from B is less than the coherency at which B wants x, the parent of B is asked to serve x to Q with the required tighter coherency. An advantage of choosing a sibling is that the change in coherency requirement is not percolated all the way to the source. However, if an ancestor of P and B is heavily loaded, then the delay due to the load will be reflected in the updates of both P and B. This might result in additional loss in fidelity.

39 Enhancing the Resiliency of the Repository Network: Effect of repository failures on loss of fidelity. Because the failures are memoryless, an exponential probability distribution is used for simulating them: $\Pr(X > t) = e^{-\lambda t}$, with $\lambda = \lambda_1$ for the time to failure and $\lambda = \lambda_2$ for the time to recover. In this approach link failures are not taken into account, so the model is incomplete. [Figure: $\lambda_2$ axis labelled "fast recovery" and "slow recovery".]
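A failure model of this kind can be simulated with the standard library's exponential sampler. The sketch below alternates up and down periods for a single repository; the function names, the cycle count and the seed are assumptions, and the λ values in the usage lines are only examples in the range the slides mention.

```python
import math
import random


def survival(t, lam):
    """Pr(X > t) = exp(-lam * t): probability the event has not happened by t."""
    return math.exp(-lam * t)


def simulate_availability(lam_fail, lam_recover, cycles, seed=0):
    """Fraction of time one repository is up over `cycles` fail/recover cycles."""
    rng = random.Random(seed)
    up = down = 0.0
    for _ in range(cycles):
        up += rng.expovariate(lam_fail)        # time until the next failure
        down += rng.expovariate(lam_recover)   # time until recovery
    return up / (up + down)


def expected_availability(lam_fail, lam_recover):
    """Long-run up fraction: mean up time over mean cycle time."""
    return lam_recover / (lam_fail + lam_recover)


print(simulate_availability(0.0001, 0.01, cycles=2000))
print(expected_availability(0.0001, 0.01))
```

A larger λ_2 means faster recovery, so the up fraction approaches 1 as λ_2 grows relative to λ_1.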

40 Enhancing the Resiliency of the Repository Network: Performance evaluation. The effect of adding resiliency is shown; k = 2 is used. When 100 data items are used, 23% of the updates sent by backups are disseminated. Some updates sent by backups arrived before the parents' updates.

41 Enhancing the Resiliency of the Repository Network: Performance evaluation. But when backup parents are loaded (> 400 data items), their updates are of no use and increase the loss of fidelity. The dependent should control them by time-stamping the updates.

42 Enhancing the Resiliency of the Repository Network: Performance evaluation. During the experiment, about 80 to 90% of the repositories experienced at least one failure. The maximum number of failures in the system at any given time was around 12 for λ_2 = 0.001, 5 for λ_2 = 0.01, and 2 for λ_2 = 0.1.

43 Enhancing the Resiliency of the Repository Network: Performance evaluation. The effect of quick recovery is shown, with λ_1 = 0.0001 and λ_2 = 2. For high coherency requirements, resiliency improves fidelity even for transient failures.

44 Enhancing the Resiliency of the Repository Network: Performance evaluation. However, with resiliency and a very large number of data items (e.g., 1000), fidelity drops. At this point the cost of resiliency exceeds the benefits obtained from it, which increases the loss in fidelity.

45 Reducing the Delay at a Repository. Delays: 1) Queueing delay: the time between the arrival of an update and the start of its processing. 2) Processing delay: check delay (deciding whether the update should be processed) + computation delay (computing the update and pushing data to the dependents). [Figure: updates of x and y arrive and wait in the queue of update requests (queueing delay); each is checked to see whether an update is needed, then processed and disseminated (processing delay).]

46 Reducing the Delay at a Repository. Question: how can we reduce the average delays to improve fidelity? This can be done by: a) better filtering, i.e. reducing the processing delay in determining whether an update needs to be disseminated to one or more dependents; b) better scheduling of disseminations.

47 Reducing the Delay at a Repository: Better filtering. For each dependent, a repository maintains the coherency requirement (cr) and the last value pushed to it. Upper bound = last pushed value + cr; lower bound = last pushed value - cr. Dependent ordering: the cr values of the dependents that reside at the repository are kept sorted, e.g. C1 = 0.7, C2 = 0.6, C3 = 0.5, C4 = 0.3, C5 = 0.1, C6 = 0.05. The algorithm to find the dependents to disseminate data to looks for the first dependent, with the largest cr, to which the update needs to be disseminated. A rule shown on the slide holds for every window; if an update violates that rule, a pseudo value is generated as the actual value.
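The window test described above (push only when a new value leaves the interval from last pushed value - cr to last pushed value + cr) can be sketched as a plain per-dependent check. Note that this is the unoptimised baseline: it examines every dependent, and it does not reproduce the sorted-cr shortcut or the pseudo-value mechanism, whose details are not recoverable from the slide. The function name and the dictionary layout are hypothetical.

```python
def dependents_to_update(value, dependents):
    """Return the dependents whose window the new value falls outside of.

    `dependents` maps a name to [cr, last_pushed_value]; the last pushed
    value is updated in place for every dependent that receives the push.
    """
    pushed = []
    for name, state in dependents.items():
        cr, last = state
        if abs(value - last) > cr:      # outside [last - cr, last + cr]
            state[1] = value
            pushed.append(name)
    return pushed


deps = {"R1": [0.7, 10.0], "R2": [0.3, 10.0], "R3": [0.05, 10.0]}
print(dependents_to_update(10.4, deps))   # R1 tolerates a 0.4 change
print(dependents_to_update(10.5, deps))   # only the tightest window is violated
```

In the usage lines the first update reaches R2 and R3 but not R1; the second reaches only R3, because R2 was refreshed at 10.4 and a further 0.1 change stays inside its window.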

48 Reducing the Delay at a Repository: Better filtering. Better filtering provides: updates of dynamic data are sent only to the end users who are actually interested in them. With filtering, no unnecessary data flows through the network (no flooding of data over the network). This improves communication time in the network and provides better response times. With the help of filtering, a more scalable system can be established, one that withstands unexpected heavy loads.

49 Reducing the Delay at a Repository: Better scheduling of disseminations. [Figure: updates u1 and u2, each with a cost (delay) C(u) and a number of beneficiaries b(u); total delay of processing u_i.] Approach: instead of processing the update requests in standard queue order, prioritising them gives better performance. SCORING: score = b(u)/C(u). Each update request is scheduled according to this score, where b(u) is the number of dependents that will receive the update and C(u) is the cost of dissemination to all dependents. The b(u) values are stored at each repository, so they are precomputed automatically. Advantages: update requests that are important to many dependents are processed earlier (BUSINESS IMPORTANCE). Updates with a low ratio get delayed, and if a new update arrives older ones are dropped, which improves performance especially in heavily loaded environments (SCALABILITY).
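The scoring idea can be sketched with a small scheduler that keeps at most one pending update per data item and always serves the highest b(u)/C(u) first. This is a minimal sketch: the class and method names are hypothetical, and treating "older ones are dropped" as "a newer update for the same item overwrites the pending one" is an assumption about how the dropping works.

```python
class UpdateScheduler:
    """Hypothetical per-repository queue ordered by benefit/cost score."""

    def __init__(self):
        self.pending = {}               # item -> (value, benefit, cost)

    def submit(self, item, value, benefit, cost):
        # A newer update for the same item replaces (drops) the older one.
        self.pending[item] = (value, benefit, cost)

    def next_update(self):
        """Pop the pending update with the highest b(u)/C(u), or None."""
        if not self.pending:
            return None
        best = max(self.pending,
                   key=lambda i: self.pending[i][1] / self.pending[i][2])
        value, _, _ = self.pending.pop(best)
        return best, value


s = UpdateScheduler()
s.submit("x", 10.0, benefit=2, cost=4.0)   # score 0.5
s.submit("y", 5.0, benefit=6, cost=3.0)    # score 2.0
s.submit("x", 10.5, benefit=2, cost=4.0)   # supersedes the older update of x
print(s.next_update(), s.next_update(), s.next_update())
```

The update of y is served first although it arrived later, and the stale value of x is never disseminated.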

50 Reducing the Delay at a Repository: Better scheduling of disseminations. Scheduling provides: a priority scheme and business-importance approach that achieves better results. Like filtering, it improves scalability: out-of-date update requests are discarded from the queue, which saves unnecessary computations and queueing delays.

51 Reducing the Delay at a Repository: Experimental results. "Dependent ordering" has a lower loss of fidelity than the "simple algorithm"; "Scheduling" is better than both (by up to 15%). "Dependent ordering" makes fewer pushes than the "simple algorithm". The "Scheduling" algorithm decreases computation delays because some updates are dropped from the queue when newer updates arrive and make the older ones out of date. Fidelity loss with "Scheduling" is shown with some numbers: fidelity drops with an increase in the number of data items, but even with large increases in the number of data items and high update rates, the loss of fidelity stays within 10%. This provides better scalability.

52 Reducing the Delay at a Repository: Advantages of the better-performance approaches. Approach 1, maintaining the dependents ordered by cr values: reduces the number of checks required for processing each update; reduces the number of pushes. Approach 2, scheduling: reduces the overall delay to the end clients by processing updates that provide a higher benefit at a lower cost; gives a better choice in dropping updates, as low-score updates are dropped; due to lower propagation delay, it provides better scalability and degrades gracefully under unexpected heavy loads.

53 Related Work. A simple decision procedure is superior, because many complex algorithms and database systems take much computation time to keep a data repository up to date. Some dynamic web data dissemination algorithms also use a push-based scheme. However, using coherency improves scalability, and another important feature is that data repositories don't need to cooperate with each other to maintain coherency information (it's up to date already!). This approach deals with rapidly changing dynamic data, while some similar approaches focus on web content that changes at slower time scales. The most powerful side of this approach is that it deals with the problem of failure and forms a resilient dissemination network.

54 Conclusion. The key points in this architecture are: Design of a push-based dissemination scheme for time-varying data: not all updates are disseminated to each repository; only the updates that meet the coherency requirements are pushed (EFFICIENT). Design of a cooperative dissemination network: this provides a resilient network, and even if a failure occurs in the network, data coherency is not completely lost (RESILIENT). Intelligent filtering, scheduling and selective dissemination reduce the overhead in the network; this provides better scalability and is a good alternative for dynamic data publishing (SCALABLE).

