1 Shetal Shah, IITB Dissemination of Dynamic Data: Semantics, Algorithms, and Performance.


1 Shetal Shah, IITB Dissemination of Dynamic Data: Semantics, Algorithms, and Performance

2 More and more of the information we consume is dynamically constructed…
[Figure: a news page assembled from an ad component, several headline components, a personalized component, and a navigation component.]

3 Buying a camera? Track auctions…

4 Dynamic Data
Data gathered by (wireless sensor) networks:
  sensors that monitor light, humidity, pressure, and heat
  network traffic passing via switches
Sports scores: e.g., score changes by 5 points
Financials:
  rice price changes by Rs. 10 compared to the previous day
  total value of stock portfolio exceeds $10,000
Such data changes rapidly and unpredictably, is time critical and value critical, and is used in on-line decision making.

5 Continual Queries
A CQ is a standing query coupled with a trigger/select condition.
CQ stock_monitor:
  SELECT stock_price FROM quotes
  WHEN stock_price – prev_stock_price > $0.5;
CQ RFP_tracker:
  SELECT project_name, contact_info FROM RFP_DB
  WHERE skill_set_required ⋐ available_skills;
Not every change at a source leads to a change in the result of the query.
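As a concrete illustration, a CQ engine can be sketched as a trigger predicate attached to each source update; the class shape and all identifiers below are hypothetical, not from the talk:

```python
# Minimal continual-query sketch: a standing query fires only when its
# trigger condition holds. All names here are illustrative.

class ContinualQuery:
    def __init__(self, select, condition):
        self.select = select          # what to return when triggered
        self.condition = condition    # trigger predicate over old/new state

    def on_update(self, old, new):
        """Called for every source change; returns a result only on trigger."""
        if self.condition(old, new):
            return self.select(new)
        return None                   # change did not affect the query result

# CQ stock_monitor: fire when the price moves by more than $0.5
stock_monitor = ContinualQuery(
    select=lambda row: row["stock_price"],
    condition=lambda old, new: new["stock_price"] - old["stock_price"] > 0.5,
)

print(stock_monitor.on_update({"stock_price": 10.0}, {"stock_price": 10.2}))  # None
print(stock_monitor.on_update({"stock_price": 10.0}, {"stock_price": 10.8}))  # 10.8
```

This also shows the slide's closing point: the first update changes the source but not the query result.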

6 Generic Architecture
[Figure: data sources (servers, sensors) → proxies / caches / data aggregators → end-hosts (wired and mobile hosts), connected over the network.]

7 Where should the queries execute?
At clients: can't optimize across clients and links.
At the source (where changes take place):
  Advantages: minimum number of refresh messages, high fidelity.
  Main challenge: scalability; multiple sources are hard to handle.
At Data Aggregators (DAs/proxies), placed at the edge of the network:
  Advantages: allows scalability through consolidation; handles multiple data sources.
  Main challenge: need mechanisms for maintaining data consistency at DAs.

8 Coherency of Dynamic Data
Strong coherency: the client and source are always in sync with each other. Strong coherency is expensive!
Relax strong coherency: Δ-coherency.
Time domain, Δt-coherency: the client is never out of sync with the source by more than Δt time units. E.g., traffic data not stale by more than a minute.
Value domain, Δv-coherency: the difference in the data values at the client and the source is bounded by Δv at all times. E.g., only interested in temperature changes larger than 1 degree.
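The two relaxations can be sketched as simple predicates (a minimal illustration; function and parameter names are assumptions):

```python
# Δ-coherency checks, sketched. S(t) is the source value, U(t) the client value.

def t_coherent(last_refresh_time, now, delta_t):
    """Δt-coherency: the client copy was refreshed within the last delta_t units."""
    return now - last_refresh_time <= delta_t

def v_coherent(source_value, client_value, delta_v):
    """Δv-coherency: |S(t) - U(t)| stays bounded by delta_v."""
    return abs(source_value - client_value) <= delta_v

# Only temperature changes larger than 1 degree matter:
print(v_coherent(25.4, 24.8, delta_v=1.0))   # True: within bound, no refresh needed
print(v_coherent(26.1, 24.8, delta_v=1.0))   # False: bound violated, push an update
```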

9 Coherency Requirement (c): e.g., temperature with max incoherency = 1 degree.
[Plot: the source value S(t) and the client value U(t) over time.]

10 [Plot: the data/query value at the server and at the client over time T, with the violation bounds marked.]

11 Push: the source pushes interesting changes (source → DA → user).
+ Achieves Δv-coherency.
+ Keeps network overhead minimal.
– Poor scalability (has to maintain state and keep connections open).

12 Pull: pull interesting changes, after a Time To Live (TTL) or Time To Next Refresh (TTR / TNR) (server/repository ← user).
+ Can be implemented using the HTTP protocol.
+ Stateless, and hence generally scalable with respect to state space and computation.
– Need to estimate when a change of interest will happen.
– Heavy polling for stringent coherency requirements or highly dynamic data.
– Network overheads higher than for push.
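A pull client must guess its next TTR. One common adaptive heuristic (an assumption for illustration, not the talk's exact policy) shrinks the TTR when a change of interest was observed and grows it when the data is quiet:

```python
# Adaptive TTR sketch: poll faster when data changes fast relative to the
# coherency bound delta_v, slower otherwise. All names are illustrative.

def next_ttr(prev_ttr, observed_change, delta_v, ttr_min=1.0, ttr_max=60.0):
    """Return the next time-to-refresh in seconds."""
    if observed_change >= delta_v:
        ttr = prev_ttr / 2          # we polled too slowly; tighten
    else:
        ttr = prev_ttr * 1.5        # data is calm; relax polling
    return max(ttr_min, min(ttr, ttr_max))

print(next_ttr(8.0, observed_change=1.2, delta_v=1.0))  # 4.0
print(next_ttr(8.0, observed_change=0.1, delta_v=1.0))  # 12.0
```

The clamp to [ttr_min, ttr_max] is what makes heavy polling under stringent coherency explicit: a tight delta_v drives the TTR down to its floor.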

13 Complementary Properties

14 Dynamic Content Distribution Networks
Goal: create a scalable content dissemination network (CDN) for streaming/dynamic data.
Metric: fidelity, the % of time the coherency requirement is met.

15 Dissemination Network: Example
Data set: p, q, r. Max clients: 2.
[Figure: a dissemination tree rooted at the source, with repositories A, B, C, D serving items at coherencies p: 0.2, q: 0.2, r: 0.2, p: 0.4, r: 0.3, q: 0.3.]

16 Challenges – I
Given the data and coherency needs of repositories, how should repositories cooperate to satisfy these needs?
How should repositories refresh the data such that the coherency requirements of dependents are satisfied?
How can the repository network be made resilient to failures?
[VLDB02, VLDB03, IEEE TKDE]

17 Challenges – II
Given the data and the coherency available at repositories in the network, how should clients be assigned to repositories?
Given the data and coherency needs of clients in the network, what data should reside in each repository, and at what coherency?
If the client requirements keep changing, how and when should the repositories be reorganized?
[RTSS 2004, VLDB 2005]

18 Dynamics along three axes
Data is dynamic: data changes rapidly and unpredictably.
The data items that a client is interested in also change dynamically.
The network is dynamic: nodes come and go.

19 Data Dissemination

20 Data Dissemination
Different users have different coherency requirements for the same data item.
The coherency requirement at a repository should be at least as stringent as that of its dependents.
Repositories disseminate only changes of interest.
[Figure: a tree from the source through repositories A, B, C, D to a client, with item coherencies p: 0.2, q: 0.2, r: 0.2, p: 0.4, r: 0.3, q: 0.4, q: 0.3.]
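The stringency rule can be sketched in one line (values illustrative, echoing the q: 0.4 / q: 0.3 labels in the figure):

```python
# A repository must serve a data item at least as stringently as the tightest
# (smallest) bound among its own need and its dependents' needs.

def required_coherency(own_req, dependent_reqs):
    """Most stringent bound the repository must maintain for the item."""
    return min([own_req] + list(dependent_reqs))

# A repository that itself needs q at 0.4 but serves a dependent needing q at
# 0.3 must maintain q at 0.3:
print(required_coherency(0.4, [0.3]))  # 0.3
```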

21 Data dissemination must be done with care (source → repository P → repository Q): we should prevent missed updates!

22 Source Based Dissemination Algorithm
For each data item, the source maintains:
  the unique coherency requirements of repositories
  the last update sent for each such coherency
For every change, the source:
  finds the maximum coherency for which it must be disseminated
  tags the change with that coherency
  disseminates (changed data, tag)
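The bookkeeping above can be sketched as follows. The tag-with-the-loosest-satisfied-coherency rule follows the slide; the class shape and all identifiers are illustrative:

```python
# Source-based dissemination sketch: per data item, the source keeps the
# distinct coherency requirements and the last value sent at each, and tags
# each disseminated change with the maximum (loosest) coherency it serves.

class Source:
    def __init__(self, coherencies):
        # coherencies: the distinct requirements of repositories, e.g. [0.1, 0.3]
        self.last_sent = {c: None for c in coherencies}

    def on_change(self, value):
        """Return (value, tag) if the change must be disseminated, else None."""
        tag = None
        for c in self.last_sent:
            last = self.last_sent[c]
            if last is None or abs(value - last) >= c:
                self.last_sent[c] = value
                tag = c if tag is None else max(tag, c)  # loosest coherency served
        return (value, tag) if tag is not None else None

src = Source([0.1, 0.3])
src.on_change(100.0)                  # first value goes to everyone
print(src.on_change(100.15))          # (100.15, 0.1): only the 0.1 repositories care
print(src.on_change(100.5))           # (100.5, 0.3): change of interest to all
```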

23 Source Based Dissemination Algorithm: example (source → repository P → repository Q).

24 Repository Based Dissemination Algorithm
A repository P sends a change of interest to a dependent Q if the difference between the current value and the last value sent to Q is at least Q's coherency requirement.
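Assuming the standard filter condition (P forwards a value to Q when it differs from the last value sent to Q by at least Q's coherency requirement), a minimal sketch; all names are illustrative:

```python
# Repository-based dissemination sketch: P tracks, per dependent, the last
# value it forwarded, and pushes a new value only when the dependent's
# coherency bound is violated.

class Repository:
    def __init__(self):
        self.dependents = {}   # name -> (coherency requirement, last value sent)

    def add_dependent(self, name, c):
        self.dependents[name] = (c, None)

    def on_change(self, value):
        """Forward the change to every dependent whose bound it violates."""
        notified = []
        for name, (c, last) in self.dependents.items():
            if last is None or abs(value - last) >= c:
                self.dependents[name] = (c, value)
                notified.append(name)
        return notified

p = Repository()
p.add_dependent("Q", 0.3)
p.add_dependent("R", 0.1)
p.on_change(50.0)            # first value reaches both dependents
print(p.on_change(50.15))    # ['R']: only R's tighter 0.1 bound is violated
```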

25 Repository Based Dissemination Algorithm: example (source → repository P → repository Q).

26 Building the content distribution network
Choose parents for repositories such that the overall fidelity observed by the repositories is high, i.e., reduce communication and computational delays.

27 If parents are not chosen judiciously, the result may be:
  uneven distribution of load on repositories,
  an increase in the number of messages in the system,
  an increase in the loss of fidelity!
[Figure: a poorly structured dissemination tree over repositories A, B, C, D with coherencies p: 0.2, q: 0.2, r: 0.2, p: 0.4, r: 0.3, q: 0.3.]

28 DiTA
Repository N needs data item x.
If the source has available push connections, or the source is the only node in the dissemination tree for x:
  N is made a child of the source.
Else:
  N is inserted into the most suitable subtree, where N's ancestors have more stringent coherency requirements and N is as close to the root as possible.

29 Most Suitable Subtree?
l: the smallest level in the subtree with a coherency requirement less stringent than N's.
d: the communication delay from the root of the subtree to N.
The subtree with the smallest l × d is the most suitable.
Essentially, minimize communication and computational delays!
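The selection rule can be sketched as follows. Representing each candidate subtree as (root, per-level most-stringent coherencies, delay) is an assumption for illustration:

```python
# DiTA "most suitable subtree" sketch: among candidate subtrees, pick the one
# minimizing l * d, where l is the smallest level whose coherency requirement
# is less stringent (larger) than the newcomer's and d is the communication
# delay to the subtree. All names are illustrative.

def most_suitable_subtree(subtrees, new_coherency):
    """subtrees: list of (root_name, levels, delay); levels holds the most
    stringent coherency at each level, root first."""
    best, best_score = None, float("inf")
    for root, levels, delay in subtrees:
        # smallest level (1-based) whose requirement is looser than the newcomer's
        l = next((i + 1 for i, c in enumerate(levels) if c > new_coherency),
                 len(levels) + 1)
        score = l * delay
        if score < best_score:
            best, best_score = root, score
    return best

subtrees = [("A", [0.1, 0.3], 10), ("B", [0.2, 0.5], 4)]
print(most_suitable_subtree(subtrees, new_coherency=0.2))  # 'B'
```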

30 Example
Initially the network consists of only the source.
A and B request service of q with coherency requirement 0.2.
C then requests service of q with coherency requirement 0.1.
[Figure: the resulting tree, with A and B labelled q: 0.2 and C labelled q: 0.1.]

31 Example
D requests service of q with coherency requirement 0.2.
[Figure: the current tree, with node coherencies q: 0.1, q: 0.2, q: 0.3, q: 0.4, q: 0.5.]

32 Example
D requests service of q with coherency requirement 0.2.
[Figure: the tree after D is inserted.]

33 Resiliency

34 Handling Failures in the Network
We need to detect permanent/transient failures in the network and to recover from them.
Resiliency is obtained by adding redundancy.
Without redundancy, failures ⇒ loss in fidelity.
But adding redundancy can increase cost ⇒ possible loss of fidelity!
Handle failures such that the cost of adding resiliency is low.

35 Passive/Active Failure Handling
Passive failure detection: the parent sends "I'm alive" messages at the end of every time interval. But what should the time interval be?
Active failure handling: always be prepared for failures. For example, two repositories can serve the same data item at the same coherency to a child. This means lots of work ⇒ greater loss in fidelity.

36 Middle Path
A backup parent B is found for each data item that the repository needs.
Let repository R want data item x with coherency c from parent P.
At what coherency should B serve R? B serves R with coherency k × c.

37 If a parent fails
Detection: the child gets two consecutive updates from the backup parent (serving at k × c) with no updates from the parent (serving at c).
Recovery: the backup parent is asked to serve at coherency c till we get an update from the parent.
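The detection rule can be sketched as a small state machine on the child; the class shape and the returned message are illustrative:

```python
# Failure-detection sketch: the child declares its parent failed after two
# consecutive backup updates arrive with no parent update in between, then
# asks the backup to serve at the full coherency c.

class Child:
    def __init__(self):
        self.backup_since_parent = 0
        self.parent_failed = False

    def on_parent_update(self, value):
        self.backup_since_parent = 0     # parent is alive; reset the counter

    def on_backup_update(self, value):
        self.backup_since_parent += 1
        if self.backup_since_parent >= 2 and not self.parent_failed:
            self.parent_failed = True
            return "ask backup to serve at coherency c"
        return None

r = Child()
r.on_backup_update(10.3)        # one backup update alone proves nothing
r.on_parent_update(10.2)        # parent still alive; counter resets
r.on_backup_update(10.6)
print(r.on_backup_update(11.0)) # second consecutive backup update -> failover
```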

38 Adding Resiliency to DiTA
A sibling A of P is chosen as the backup parent of R: if P fails, A serves R with coherency c, so the change is local.
If P has no siblings, a sibling of the nearest ancestor is chosen.
Otherwise, the source is made the backup parent.

39 Markov Analysis for k
Assumptions:
  Data changes as a random walk along the line.
  The probability of an increase is the same as that of a decrease.
  No assumptions are made about the unit of change or the time taken for a change.
Expected # misses for any k ≤ 2k² – 2; for k = 2, expected # misses ≤ 6.

40 Experimental Methodology
Physical network: 4 servers, 600 routers, 100 repositories.
Communication delay: ms. Computation delay: 3–5 ms.
Real stock traces; time duration of observations: 10,000 s.
Tight coherency range: 0.01 to 0.05; loose coherency range: 0.5 to 0.99.

41 Failure and Recovery Modelling
Failures and recovery are modeled based on trends observed in practice ("Analysis of link failures in an IP backbone", G. Iannaccone et al., Internet Measurement Workshop 2002).
Recovery times: 10% > 20 min; 40% between 1 min and 20 min; 50% < 1 min.
[Figure: trend for time between failures.]

42 In the Presence of Failures, Varying Recovery Times
The addition of resiliency does improve fidelity.

43 In the Presence of Failures, Varying Data Items (Increasing Load)
Fidelity improves with the addition of resiliency even for a large number of data items.

44 In the Absence of Failures (Increasing Load)
Often, fidelity improves with the addition of resiliency, even in the absence of failures!

45 Beyond Resiliency Scheduling Assigning clients to repositories Balancing load in the network Handling queries

46 Acknowledgements Allister Bernard & Vivek Sharma S. Dharmarajan Shweta Agarwal T. Siva Prof. C. Ravishankar Prof. Sohoni and Prof. Rangaraj Prof. S. Sudarshan Prof. Krithi Ramamritham