1 Dissemination of Dynamic Data: Semantics, Algorithms, and Performance
Shetal Shah, IITB
2 More and more of the information we consume is dynamically constructed…
[Figure: a web page assembled from components: navigation, headlines, ads, and personalized content.]
3 Buying a camera? Track auctions…
4 Dynamic Data
Examples:
- Data gathered by (wireless sensor) networks: sensors that monitor light, humidity, pressure, and heat; network traffic passing via switches.
- Sports scores: a score changes by 5 points.
- Financials: the rice price changes by Rs. 10 compared to the previous day; the total value of a stock portfolio exceeds $10,000.
Such data changes rapidly and unpredictably, is time critical and value critical, and is used in on-line decision making.
5 Continual Queries
A CQ is a standing query coupled with a trigger/select condition:

CQ stock_monitor:
  SELECT stock_price FROM quotes
  WHEN stock_price - prev_stock_price > $0.5

CQ RFP_tracker:
  SELECT project_name, contact_info FROM RFP_DB
  WHERE skill_set_required ⊆ available_skills

Not every change at a source leads to a change in the result of the query.
6 Generic Architecture
[Figure: data sources (servers, sensors) connect through the network to proxies/caches/data aggregators, which serve end-hosts (wired and mobile).]
7 Where should the queries execute?
- At clients: cannot optimize across clients and links.
- At the source (where changes take place). Advantages: minimum number of refresh messages, high fidelity. Main challenge: scalability; multiple sources are hard to handle.
- At data aggregators (DAs/proxies), placed at the edge of the network. Advantages: scalability through consolidation, handles multiple data sources. Main challenge: needs mechanisms for maintaining data consistency at the DAs.
8 Coherency of Dynamic Data
Strong coherency: the client and the source are always in sync with each other. Strong coherency is expensive! So we relax it to c-coherency:
- Time domain (t-coherency): the client is never out of sync with the source by more than t time units. E.g., traffic data that is not stale by more than a minute.
- Value domain (v-coherency): the difference between the data values at the client and at the source is bounded by v at all times. E.g., we are only interested in temperature changes larger than 1 degree.
9 Coherency Requirement (c)
[Figure: temperature at the source, S(t), and at the client, U(t), with maximum incoherency c = 1 degree.]
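Stated as a formula (a restatement of the value-domain requirement above, with S(t) the value at the source and U(t) the value at the client):

$$ |U(t) - S(t)| \le c \quad \text{for all } t $$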
10 [Figure: the data/query value at the server vs. the value at the client over time; excursions outside the bounds are coherency violations.]
11 Push: the source pushes interesting changes
+ Achieves v-coherency
+ Keeps network overhead to a minimum
- Poor scalability: the source has to maintain state and keep connections open
[Figure: Source pushes to DA, which pushes to the user.]
12 Pull: pull interesting changes
Pull after a Time to Live (TTL) / Time To Next Refresh (TTR/TNR) interval.
+ Can be implemented using the HTTP protocol
+ Stateless, and hence generally scalable with respect to state space and computation
- Need to estimate when a change of interest will happen (a sketch of one such estimate follows)
- Heavy polling for stringent coherency requirements or highly dynamic data
- Network overheads higher than for push
[Figure: the user pulls from the repository, which pulls from the server.]
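A minimal sketch of one way a client could set the TTR, assuming the value keeps drifting at its recently observed rate (the function and its parameters are illustrative, not from the talk):

```python
def next_ttr(last_value, new_value, elapsed, c, ttr_min=1.0, ttr_max=60.0):
    """Guess when the next change of interest (>= c) may occur:
    if the value moved by |new - last| in `elapsed` seconds, assume
    the same drift rate and poll again when the drift could reach c."""
    rate = abs(new_value - last_value) / elapsed if elapsed > 0 else 0.0
    if rate == 0.0:
        return ttr_max                        # nothing is moving: back off
    return max(ttr_min, min(ttr_max, c / rate))
```

A stringent c or a fast-moving value drives the TTR toward ttr_min, which is exactly the heavy polling the slide warns about.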
13 Complementary Properties
Push and pull have complementary strengths: push gives high fidelity at the cost of per-client state, pull gives scalability at the cost of fidelity.
14 Dynamic Content Distribution Networks
Goal: create a scalable content dissemination network (CDN) for streaming/dynamic data.
Metric — fidelity: the percentage of time for which the coherency requirement is met (a sketch of the computation follows).
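As a rough sketch, fidelity over a logged trace could be computed like this (the trace format is an assumption for illustration):

```python
def fidelity(trace, c):
    """Fraction of observed time the coherency requirement was met.
    `trace` is a list of (duration, source_value, client_value) samples."""
    ok = sum(dt for dt, s, u in trace if abs(s - u) <= c)
    total = sum(dt for dt, s, u in trace)
    return 100.0 * ok / total if total else 100.0   # as a percentage
```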
15 Dissemination Network: Example
Data set: p, q, r. Max clients per node: 2.
[Figure: a dissemination tree rooted at the source with repositories A, B, C, and D, holding items at coherencies such as p:0.2, q:0.2, r:0.2 and p:0.4, r:0.3, q:0.3.]
16 Challenges I
- Given the data and coherency needs of repositories, how should repositories cooperate to satisfy these needs?
- How should repositories refresh the data such that the coherency requirements of their dependents are satisfied?
- How to make the repository network resilient to failures?
[VLDB 2002, VLDB 2003, IEEE TKDE]
17 Challenges II
- Given the data and the coherency available at repositories in the network, how to assign clients to repositories?
- Given the data and coherency needs of clients in the network, what data should reside in each repository, and at what coherency?
- If the client requirements keep changing, how and when should the repositories be reorganized?
[RTSS 2004, VLDB 2005]
18 Dynamics along three axes
- The data is dynamic: it changes rapidly and unpredictably.
- The data items that a client is interested in also change dynamically.
- The network is dynamic: nodes come and go.
19 Data Dissemination
20 Data Dissemination
- Different users have different coherency requirements for the same data item.
- The coherency requirement at a repository should be at least as stringent as that of its dependents (see the sketch below).
- Repositories disseminate only changes of interest.
[Figure: the example tree, with requirements such as p:0.2, q:0.2, r:0.2; p:0.4, r:0.3; q:0.4; q:0.3; a client hangs off one repository.]
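A one-line sketch of the stringency rule, assuming smaller coherency values are more stringent (as in the q:0.1 vs. q:0.2 example later); the names are illustrative:

```python
def required_coherency(own_need, dependent_needs):
    """The coherency a repository must maintain for one item: at least as
    stringent (numerically as small) as its own need and every dependent's."""
    return min([own_need, *dependent_needs])

# A repository that needs q at 0.4 but has dependents at 0.3 and 0.2
# must itself maintain q at coherency 0.2.
assert required_coherency(0.4, [0.3, 0.2]) == 0.2
```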
21 Data dissemination must be done with care: we should prevent missed updates!
[Figure: a value sequence (1, 1.2, 1.4, 1.5, 1.7, …) flows from the source through repository P to repository Q; if P filters naively, Q misses updates it should have received.]
22 Source-Based Dissemination Algorithm
For each data item, the source maintains:
- the unique coherency requirements of the repositories, and
- the last update sent for each such coherency.
For every change, the source:
- finds the maximum coherency for which the change must be disseminated,
- tags the change with that coherency, and
- disseminates (changed data, tag).
(A sketch follows.)
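A sketch of the source-side logic under my reading of the slide (the data structures and the send callback are assumptions, not the talk's code):

```python
def on_change(item, new_value, reqs, last_sent, send):
    """reqs: unique coherency requirements for `item`, e.g. [0.2, 0.3, 0.5];
    last_sent: last value sent at each coherency, {c: value};
    send: callback send(item, value, tag)."""
    # coherencies whose bound the new value violates
    violated = [c for c in reqs
                if abs(new_value - last_sent.get(c, float("inf"))) >= c]
    if not violated:
        return
    for c in violated:
        last_sent[c] = new_value           # these coherencies are now up to date
    send(item, new_value, max(violated))   # tag with the maximum such coherency
```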
23 Source-Based Dissemination Algorithm: example
[Figure: the same value sequence, now disseminated with tags so that P and Q miss no updates.]
24 Repository-Based Dissemination Algorithm
A repository P sends a change of interest, with new value v, to a dependent Q (coherency requirement c) if |v − (last value P sent to Q)| ≥ c. (A sketch follows.)
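A sketch of that check as P might run it per dependent (the names are illustrative):

```python
def maybe_forward(new_value, last_sent_to_q, c_q, send_to_q):
    """Forward to dependent Q only when Q's copy would otherwise drift
    beyond its coherency requirement c_q; returns Q's new last-sent value."""
    if abs(new_value - last_sent_to_q) >= c_q:
        send_to_q(new_value)
        return new_value
    return last_sent_to_q
```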
25 Repository-Based Dissemination Algorithm: example
[Figure: the same value sequence; P applies each dependent's check locally, so Q no longer misses updates.]
26 Building the content distribution network
Choose parents for repositories such that the overall fidelity observed by the repositories is high, i.e., reduce communication and computational delays.
27 If parents are not chosen judiciously
The result may be:
- an uneven distribution of load on the repositories,
- an increase in the number of messages in the system, and
- an increase in the loss of fidelity!
[Figure: the example network with the source and repositories A, B, C, D.]
28 DiTA
Repository N needs data item x.
- If the source has push connections available, or the source is the only node in the dissemination tree for x, N is made a child of the source.
- Else, N is inserted in the most suitable subtree: one where N's ancestors have more stringent coherency requirements and N is closest to the root.
29 Most Suitable Subtree?
- l: the smallest level in the subtree with a coherency requirement less stringent than N's.
- d: the communication delay from the root of the subtree to N.
- The subtree with the smallest (l × d) is the most suitable. Essentially, minimize communication and computational delays! (A sketch follows.)
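A tiny sketch of the selection, taking each candidate as a precomputed (l, d, subtree) triple (the representation is my assumption):

```python
def most_suitable_subtree(candidates):
    """candidates: (l, d, subtree) triples, where l is the smallest level in
    the subtree whose coherency requirement is less stringent than N's, and
    d is the communication delay from the subtree's root to N."""
    l, d, subtree = min(candidates, key=lambda t: t[0] * t[1])
    return subtree
```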
30 Example
Initially the network consists of just the source. A and B request service of q with coherency requirement 0.2. C then requests service of q with coherency requirement 0.1.
[Figure: the resulting tree; C's more stringent requirement (q:0.1) places it closer to the root, with A (q:0.2) below it.]
31 Example (continued)
D requests service of q with coherency requirement 0.2.
[Figure: a larger tree with requirements ranging from q:0.1 to q:0.5 at various nodes; DiTA must pick the subtree into which to insert D.]
32 Example (continued)
D requests service of q with coherency requirement 0.2.
[Figure: the tree after D is inserted into the most suitable subtree.]
33 Resiliency
34 Handling Failures in the Network
- We need to detect permanent and transient failures in the network, and to recover from them.
- Resiliency is obtained by adding redundancy.
- Without redundancy, failures cause a loss in fidelity.
- Adding redundancy increases cost, and can itself cause a loss of fidelity!
- So: handle failures such that the cost of adding resiliency is low.
35 Passive/Active Failure Handling
- Passive failure detection: the parent sends "I'm alive" messages at the end of every time interval. But what should the time interval be?
- Active failure handling: always be prepared for failures. For example, two repositories can serve the same data item at the same coherency to a child. This means lots of work, and a greater loss in fidelity.
36 Middle Path
A backup parent B is found for each data item that the repository needs. Let repository R want data item x with coherency c: the parent P serves R at coherency c, while B serves R at coherency k × c. At what coherency should B serve R, i.e., what should k be?
37 If a parent fails
- Detection: the child gets two consecutive updates from the backup parent with no update from the parent in between.
- Recovery: the backup parent is asked to serve at coherency c until we get an update from the parent again.
[Figure: P serves R at coherency c; backup B serves R at k × c.]
(A sketch of the child-side logic follows.)
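A child-side sketch of this detection/recovery rule (the state fields and the ask_backup_to_serve_at helper are hypothetical, introduced only for illustration):

```python
class Child:
    def __init__(self, c, k, ask_backup_to_serve_at):
        self.c, self.k = c, k
        self.ask_backup_to_serve_at = ask_backup_to_serve_at
        self.backup_streak = 0          # consecutive backup-only updates
        self.parent_suspected = False

    def on_update(self, origin):
        if origin == "backup":
            self.backup_streak += 1
            # two consecutive backup updates with none from the parent
            if self.backup_streak >= 2 and not self.parent_suspected:
                self.parent_suspected = True
                self.ask_backup_to_serve_at(self.c)           # recovery
        else:                           # an update from the parent arrived
            self.backup_streak = 0
            if self.parent_suspected:
                self.parent_suspected = False
                self.ask_backup_to_serve_at(self.k * self.c)  # back to normal
```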
38 Adding Resiliency to DiTA
- A sibling A of P is chosen as the backup parent of R; if P fails, A serves R with coherency c, so the change is local.
- If P has no siblings, a sibling of the nearest ancestor is chosen.
- Else, the source is made the backup parent.
[Figure: P serves R at coherency c; the backup serves R at k × c.]
39 Markov Analysis for k
Assumptions:
- The data changes as a random walk along a line.
- The probability of an increase is the same as that of a decrease.
- No assumptions are made about the unit of change or the time taken for a change.
Result: for any k, the expected number of misses is ≤ 2k² − 2; for k = 2, the expected number of misses is ≤ 6. (A simulation sketch follows.)
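For intuition, a Monte-Carlo sketch of this random-walk model (my construction, not the talk's analysis): the value takes ±1 steps of one coherency unit c; while the parent is down, the backup (serving at k·c) stays silent until the value drifts k units from the last value it sent, and every step at which the child's copy is out of sync by at least c is counted as a potential miss. The simulated averages should sit below the 2k² − 2 bound:

```python
import random

def simulated_misses(k, trials=10_000):
    """Average number of potential misses before the backup speaks up."""
    total = 0
    for _ in range(trials):
        pos, misses = 0, 0
        while abs(pos) < k:              # backup silent until drift reaches k
            pos += random.choice((-1, 1))
            if abs(pos) >= 1:            # child out of sync by >= c: a miss
                misses += 1
        total += misses
    return total / trials

for k in (2, 3, 4):
    print(k, round(simulated_misses(k), 2), "bound:", 2 * k * k - 2)
```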
40 Experimental Methodology
- Physical network: 4 servers, 600 routers, 100 repositories
- Communication delay: 20-30 ms; computation delay: 3-5 ms
- Real stock traces: 100-1000
- Time duration of observations: 10,000 s
- Tight coherency range: 0.01 to 0.05; loose coherency range: 0.5 to 0.99
41 Failure and Recovery Modelling
Failures and recovery are modeled on trends observed in practice (analysis of link failures in an IP backbone by G. Iannaccone et al., Internet Measurement Workshop 2002).
Recovery times: 50% < 1 min; 40% between 1 min and 20 min; 10% > 20 min.
[Figure: trend for the time between failures.]
42 In the Presence of Failures, Varying Recovery Times
The addition of resiliency does improve fidelity.
43 In the Presence of Failures, Varying the Number of Data Items (Increasing Load)
Fidelity improves with the addition of resiliency, even for a large number of data items.
44 In the Absence of Failures (Increasing Load)
Often, fidelity improves with the addition of resiliency even in the absence of failures!
45 Beyond Resiliency
- Scheduling
- Assigning clients to repositories
- Balancing load in the network
- Handling queries
46 Acknowledgements
Allister Bernard & Vivek Sharma, S. Dharmarajan, Shweta Agarwal, T. Siva, Prof. C. Ravishankar, Prof. Sohoni and Prof. Rangaraj, Prof. S. Sudarshan, Prof. Krithi Ramamritham