Client Assignment in Content Dissemination Networks for Dynamic Data
Shetal Shah, Krithi Ramamritham (Indian Institute of Technology Bombay)
Chinya Ravishankar (University of California Riverside)

Dynamic Data
- Traffic data: packets through switches, vehicles on highways
- Stock prices, sports scores
- Rapid and unpredictable changes
- Time critical, value critical
- Used in online monitoring and decision making
- More and more of the data gathered from the web/Internet is dynamic

Coherency of Dynamic Data
- Strong coherency: the client and the source are always in sync, U(t) = S(t)
- Strong coherency is expensive!
- Relax strong coherency to Δ-coherency
  - Time domain: Δt-coherency
  - Value domain: Δv-coherency. The difference between the data values at the client and the source is bounded by Δv at all times, |U(t) - S(t)| ≤ Δv
- E.g.: temperature changes greater than 1 degree
- Figure: Source S(t) -> Repository R(t) -> Client U(t)

Broad Focus of Work
- Create a scalable content dissemination network (CDN) for streaming/dynamic data
- Metric: fidelity, the percentage of time the coherency requirement is met
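
The fidelity metric can be illustrated with a minimal sketch. It assumes the value-domain form of coherency (the client value stays within a tolerance of the source value) and that both traces are sampled at the same instants; the function name, the sampling scheme, and the trace values are illustrative assumptions, not part of the talk.

```python
def fidelity(source_trace, client_trace, delta_v):
    """Percentage of sampled instants at which |U(t) - S(t)| <= delta_v.

    Assumption: both traces are sampled at the same, equally spaced instants.
    """
    if len(source_trace) != len(client_trace) or not source_trace:
        raise ValueError("traces must be non-empty and equally long")
    within_bound = sum(
        1 for s, u in zip(source_trace, client_trace) if abs(u - s) <= delta_v
    )
    return 100.0 * within_bound / len(source_trace)


# Hypothetical temperature trace with a 1-degree tolerance.
source = [20.0, 20.5, 21.5, 22.0, 21.0]
client = [20.0, 20.0, 20.0, 22.0, 22.0]
print(fidelity(source, client, 1.0))
```

In this toy trace the client is out of bounds at one of the five instants (the third), so the reported fidelity is 80%.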

Basic Framework: Sources, Repositories, Clients
- Clients request different data items, specifying a coherence requirement for each item
- Repositories derive their requirements from the client requirements
- The source pushes the changes of interest to the repositories
- Repositories cooperate with each other and the source to serve clients

Example Dissemination Network
- Data set: p, q, r
- Max clients: 2
- Figure: a source and repositories R1-R4, with coherence labels (p: 0.2, q: 0.2), (r: 0.2), (p: 0.4, r: 0.3), (q: 0.3)

Challenges – I
- Given the data and coherency needs of repositories, how should repositories cooperate to satisfy these needs?
- How should repositories refresh the data so that the coherency requirements of dependents are satisfied?
- How can the repository network be made resilient to failures?
- [VLDB02, VLDB03, IEEE TKDE]

Challenges – II: Service to Clients
- Assign data to repositories: given the data and coherency needs of clients, what data at what coherency should reside in each repository?
- Assign clients to repositories: given the data and the coherency available at repositories, how should clients be assigned to the repositories?

Assigning Clients to Repositories
- The client request is satisfied
- Overheads are low: communication delay, computational delay
- Figure: client C1 must be assigned to one of the repositories R1-R4 of the example network

Overview
- The client assignment problem is NP-hard
- Solve using preferences
  - Clients and repositories order each other by preference
  - Use stable marriages
- Assign costs and do many-to-one client-repository pairing

Cost-based Client Assignment
- Assign a cost to each potential pair
- Minimum cost assignment = {1, 3, 7}
- Figure: clients and repositories joined by edges with costs 7, 6, 1, 3, 8, 9, 5

Cost-based Client Assignment
- Assign a cost to each potential pair
- Minimum cost assignment = {1, 3, 7}: load imbalance!
- Minimum weight matching: assignment = {1, 3, 8}
- Figure: clients and repositories joined by edges with costs 7, 6, 1, 3, 8, 9, 5

Client Assignment
- An assignment may contribute to delay for other assignments at the same node
- Minimum weight matching: assignment = {1, 3, 8}
- Figure: clients and repositories joined by edges with costs 7, 6, 1, 3, 8, 9, 5

Matching
- Bipartite graph G = (V1, V2, E), where each edge in E joins a vertex of V1 to a vertex of V2
- A weight on each edge
- Matching: a set of edges in which no vertex appears more than once
- Minimum weight matching: of all matchings, the one of minimum weight
- Many clients, few repositories: many-to-one matching
- Figure: bipartite graph with edge weights 7, 6, 1, 3, 8, 9, 5

Many-to-one Matching: Min Cost Network Flows
- Directed graph G = (V, E)
- Start vertex; end vertex, or sink
- Edge capacity: the maximum flow the edge can carry
- Edge cost: per unit of flow
- Intermediate vertex: inflow = outflow
- Figure: example network from Start to End with edge labels 4, 2, 3, 2, 2, 2, 5, 2, 8

A Possible Flow in the Network
- Figure: the capacity graph (edge labels 4, 2, 3, 2, 2, 5, 2, 8, 2) next to a flow graph (edge labels 2, 2, 2, 3, 2, 1, 5, 2, 2), each running from Start to End

Maximum Flow
- Value of a flow: the flow leaving the source
- Maximum flow: a flow whose value is maximum
- Cost of a flow = Σ over all edges (flow × cost per unit flow)
- Min cost flow: the maximum flow of minimum cost
- Figure: flow graph from Start to End with edge labels 2, 2, 2, 3, 2, 1, 5, 2, 2
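
The definitions of flow value, flow cost, and flow conservation can be checked on a toy network. The sketch below uses a small hypothetical graph (the vertex names, flows, and costs are made up and do not reproduce the figure from the slides); each edge is a tuple `(from, to, flow, cost_per_unit)`.

```python
def flow_value(edges, start):
    """Net flow leaving the start vertex."""
    out_flow = sum(f for u, _, f, _ in edges if u == start)
    in_flow = sum(f for _, v, f, _ in edges if v == start)
    return out_flow - in_flow


def flow_cost(edges):
    """Sum over all edges of flow * cost per unit flow."""
    return sum(f * c for _, _, f, c in edges)


def conserves_flow(edges, start, end):
    """True if inflow equals outflow at every intermediate vertex."""
    balance = {}
    for u, v, f, _ in edges:
        balance[u] = balance.get(u, 0) - f
        balance[v] = balance.get(v, 0) + f
    return all(b == 0 for node, b in balance.items() if node not in (start, end))


# Hypothetical flow: (from, to, flow, cost per unit flow)
edges = [
    ("Start", "a", 2, 1),
    ("Start", "b", 1, 2),
    ("a", "End", 2, 3),
    ("b", "End", 1, 1),
]
print(flow_value(edges, "Start"), flow_cost(edges), conserves_flow(edges, "Start", "End"))
```

Here the value of the flow is 3 and its cost is 2·1 + 1·2 + 2·3 + 1·1 = 11.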

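Client Assignment Using Network Flows
- X, Y, Z: the number of clients each repository is willing to serve
- Capacity of each client edge = 1
- Sum of the repository capacities (X + Y + Z) ≥ number of client requests
- Figure: flow network between Start and End, with edges labelled X, Y, Z for the repositories and 1 for the clients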

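Network Flows: Costs and Capacities
- Repository-client edge: capacity 1; cost is a function of the communication delays and the coherence requirement
- Cost of all other edges: 0
- Figure: flow network from Start, with repository edges labelled X, Y, Z and all other edges labelled 1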

Max Flows
- Flow out of the start node = number of client requests
- Each unit of flow makes one assignment
- Cost of a unit of flow = cost of the assignment
- Maximum flow of minimum cost => required solution
- But this could overload the repositories!
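
The construction described on the last three slides can be sketched as a small min-cost max-flow program. This is an illustration only: the talk uses the RelaxIV solver, whereas the sketch below swaps in a plain successive-shortest-paths routine; the class and function names, the integer costs, and the capacities are assumptions made for the example.

```python
from collections import deque


class MinCostFlow:
    """Successive shortest paths; integer costs are assumed."""

    def __init__(self, n):
        self.n = n
        # Each edge is [to, residual_capacity, cost, index_of_reverse_edge].
        self.graph = [[] for _ in range(n)]

    def add_edge(self, u, v, cap, cost):
        self.graph[u].append([v, cap, cost, len(self.graph[v])])
        self.graph[v].append([u, 0, -cost, len(self.graph[u]) - 1])

    def solve(self, s, t):
        flow = cost = 0
        while True:
            dist = [float("inf")] * self.n
            prev = [None] * self.n  # (previous node, edge index) on the path
            queued = [False] * self.n
            dist[s] = 0
            queue = deque([s])
            while queue:  # queue-based Bellman-Ford on the residual graph
                u = queue.popleft()
                queued[u] = False
                for i, (v, cap, c, _) in enumerate(self.graph[u]):
                    if cap > 0 and dist[u] + c < dist[v]:
                        dist[v] = dist[u] + c
                        prev[v] = (u, i)
                        if not queued[v]:
                            queued[v] = True
                            queue.append(v)
            if prev[t] is None:  # no augmenting path left: flow is maximum
                return flow, cost
            push, v = float("inf"), t
            while v != s:  # bottleneck capacity along the path
                u, i = prev[v]
                push = min(push, self.graph[u][i][1])
                v = u
            v = t
            while v != s:  # augment along the path
                u, i = prev[v]
                self.graph[u][i][1] -= push
                self.graph[v][self.graph[u][i][3]][1] += push
                v = u
            flow += push
            cost += push * dist[t]


def assign_clients(costs, capacities):
    """costs[c][r]: cost of serving client c from repository r.
    capacities[r]: number of clients repository r is willing to serve."""
    n_clients, n_repos = len(costs), len(capacities)
    start, end = 0, 1 + n_repos + n_clients
    net = MinCostFlow(end + 1)
    for r in range(n_repos):
        net.add_edge(start, 1 + r, capacities[r], 0)
    for c in range(n_clients):
        for r in range(n_repos):
            net.add_edge(1 + r, 1 + n_repos + c, 1, costs[c][r])
        net.add_edge(1 + n_repos + c, end, 1, 0)
    _, total_cost = net.solve(start, end)
    assignment = {}
    for r in range(n_repos):
        for v, cap, _, _ in net.graph[1 + r]:
            if v > n_repos and cap == 0:  # saturated repository-client edge
                assignment[v - 1 - n_repos] = r
    return assignment, total_cost


costs = [[1, 4], [2, 6], [3, 5]]  # hypothetical costs: 3 clients, 2 repositories
print(assign_clients(costs, [2, 2]))
print(assign_clients(costs, [3, 3]))
```

With capacities `[2, 2]` the third client is pushed to the second repository; with capacities `[3, 3]` every client lands on the cheapest repository, which is the overload the slide warns about.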

Considering Load: Iterative Min Cost Flows
- Load depends on the coherence requirements of the assignments
- Assignments depend on this load!
- Limit the number of requests assigned to a repository using capacity
- But this number does not translate directly into load
- It does translate into load if the coherences are close to each other

Iterative Min Cost Flows
- Split the requests into ranges
- For each range:
  - Calculate the approximate load at each repository due to the previous assignments
  - Calculate the approximate load of the assignments to be made in this range
  - Determine the capacity of each repository
  - Find the min-cost max flow

For Each Range
- Number of updates for coherence c_i is c_i^-2
- Approximate load at repository R_i: A_i; average load: A
- For n client requests, expected load = n * c_i^-2
- Number of repositories: k
- Let t_i be the number of assignments in the current range to repository R_i
- Total load at R_i will be A_i + t_i * c_i^-2
- Average load at a repository after assignment = [equation missing]
- Capacity for R_i = [equation missing]
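
The closed-form expressions for the post-assignment average load and for the capacity are not available in this transcript. The sketch below shows one reading that is consistent with the quantities defined on the slide, under two assumptions that are not confirmed by the source: the post-assignment average is (sum of A_i + n * c^-2) / k, and the capacity of R_i is the number of requests t_i that brings A_i + t_i * c^-2 up to that average. The function name and the sample numbers are hypothetical.

```python
import math


def range_capacities(current_loads, n_requests, c):
    """Capacity t_i per repository for one coherence range.

    Assumptions (not from the slides): every request in the range costs
    c ** -2 load units, and loads are balanced towards the average load
    that results once all n_requests have been assigned.
    """
    per_request = c ** -2
    k = len(current_loads)
    target = (sum(current_loads) + n_requests * per_request) / k
    # Rounding up keeps the total capacity >= n_requests;
    # repositories already above the target get capacity 0.
    return [max(0, math.ceil((target - a) / per_request)) for a in current_loads]


# Hypothetical loads A_i at three repositories, 30 requests at coherence 0.5.
print(range_capacities([100.0, 60.0, 80.0], 30, 0.5))
```

In this example each request adds 4 load units, the target load is 120, and the capacities come out as 5, 15 and 10, so the lightly loaded repository absorbs most of the new requests.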

Client Assignment Problem is NP-Hard!
- Input:
  - Sources S_1 ... S_s, repositories R_1 ... R_r, clients L_1 ... L_n, data items d_1 ... d_k
  - For each S_i and R_j: the list of data items served and the coherence requirements
  - Client request triplets and, for each, a positive value (the required minimum fidelity)
  - Distribution of each d_j
  - Communication delays and computational delays
- Question: is there an assignment of every client request to some R_k such that the coherence available at the repository is at least as stringent as the coherence the client requests, and the minimum fidelity is at least the required value?

Reduce Partition to the Client Assignment Problem
- Partition input:
  - A set S
  - For each element in S, a positive weight w_i
  - A positive integer k
- Question: is there a partition of S into k equally weighted parts?

Best Effort Service
- Client C1 requests q at coherence 0.1
- The client will be served q at coherence 0.2
- Figure: the example network (source, repositories R1-R4) with client C1 attached

Augmentation
- Client C1 requests q at coherence 0.1
- The coherence of A for q is changed to 0.1
- Figure: the example network with client C1 attached; the label (p: 0.2, q: 0.2) becomes (p: 0.2, q: 0.1)
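
The difference between the two service modes can be shown in a few lines. This is a minimal sketch: the dictionary representation of a repository and the function name `serve` are assumptions, and the rule that a client whose request is already satisfiable is served at its requested coherence is a simplification made for the example.

```python
def serve(repo, item, requested, augmentation=False):
    """Return the coherence at which the client is served.

    repo maps data item -> coherence available at the repository
    (smaller values are more stringent).
    """
    available = repo[item]
    if available <= requested:
        return requested  # the repository already meets the request
    if augmentation:
        repo[item] = requested  # tighten the repository's own coherence
        return requested
    return available  # best effort: serve at what is available


repo = {"q": 0.2}
print(serve(repo, "q", 0.1))                     # best effort
print(serve(repo, "q", 0.1, augmentation=True))  # augmentation
print(repo)
```

Best effort leaves the repository untouched and serves the client at 0.2; augmentation changes the repository's coherence for q to 0.1, which in turn raises the load on whichever node feeds that repository.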

Experimental Methodology
- Network: 1 source, 10-20 repositories, 10,000-80,000 client requests
- Real stock traces: 100-1000
- Time duration of observations: 10,000 s
- Ranges for min cost flow: {0.01-0.03, 0.03-0.07, 0.07-0.2, 0.2-1.0}
- Network flow solver: RelaxIV from www.di.unipi.it/di/groups/optimize/ORGroup.html

For comparison…
- Prior online global heuristic
- A selector node for each data item
- The selector keeps information on:
  - coherence requirements at repositories
  - delays between the nodes in the network
  - number of clients assigned to each repository
- A client is assigned to the repository where the sum of the delays is minimized
- Two flavours: GHIS, GHES
- S. Agarwal et al. Construction of a Temporal Coherency Preserving Dynamic Data Dissemination Network. RTSS’04

Performance of the Algorithms
- Workload: 50% of client requests have coherence between 0.01 and 0.09; the remainder between 0.1 and 0.99
- GHIS initially does better than MCF and GHES, but degrades rapidly: unsatisfied requests, source overloading!
- Augmentation performs very well
- GHES and MCF are comparable for a small number of repositories

MCF vs GHES (best effort)
- MCF does better as the number of repositories increases
- In fact, for some simple inputs, MCF did better than GHES by a factor of 9!
- Topology: 1 source, 10 repositories, 50 data items
- Figure legend: GHIS, GHES, MCF

Augmentation helps, but…
- As the load increases, augmentation increases the loss in fidelity
- As load increases, serving clients at less stringent coherence requirements might actually reduce the loss in fidelity!

Need to Adapt to Load: Fair vs. Biased Approaches
- Figure: two plots, Fair Approach and Biased Approach, each comparing MCF_aug and MCF
- It is better to be biased than to be fair!

Adaptive Algorithm
- For each data item, the source maintains a list of unique coherences and the number of clients for each coherence
- If the queuing delay at any source/repository crosses a threshold th1: for each data item, the source reduces the coherence of service for some clients
- If the queuing delay at any source/repository goes below a threshold th2: resume service at the desired coherency to some of the clients
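
The two-threshold control loop can be sketched as a single adaptation step. Several details are assumptions that the slides do not specify: clients with the most stringent requirements are relaxed first, service is restored only when every monitored node is below th2, and a fixed `batch` of clients is changed per step. All names are hypothetical.

```python
def adapt_step(delays, th1, th2, clients, relaxed, batch=1):
    """One step of threshold-based adaptation.

    delays:  node name -> current queuing delay
    clients: client name -> desired coherence (smaller = more stringent)
    relaxed: set of clients currently served at reduced coherence
    Returns the updated set of relaxed clients.
    """
    relaxed = set(relaxed)
    worst = max(delays.values())
    if worst > th1:
        # Overload: relax the most stringent clients not yet relaxed (assumption).
        candidates = sorted(
            (c for c in clients if c not in relaxed), key=lambda c: clients[c]
        )
        relaxed.update(candidates[:batch])
    elif worst < th2:
        # Load has dropped everywhere: restore the least stringent relaxed clients.
        restorable = sorted(relaxed, key=lambda c: clients[c], reverse=True)
        relaxed.difference_update(restorable[:batch])
    return relaxed


clients = {"c1": 0.01, "c2": 0.05, "c3": 0.5}
state = adapt_step({"S": 12, "R1": 3}, th1=10, th2=4, clients=clients, relaxed=set())
print(state)
state = adapt_step({"S": 6, "R1": 3}, th1=10, th2=4, clients=clients, relaxed=state)
print(state)
state = adapt_step({"S": 2, "R1": 3}, th1=10, th2=4, clients=clients, relaxed=state)
print(state)
```

Keeping th2 below th1 gives the loop hysteresis: between the two thresholds nothing changes, which avoids flipping clients back and forth when the delay hovers near a single cut-off.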

Performance of the Adaptive Algorithm
- Augmented adaptation performs the best!

Conclusions and Current Work
- Conclusions
  - We prove that the client assignment problem is NP-hard
  - We develop two new heuristics for the client assignment problem
  - We develop an adaptive algorithm for client assignment
- Current work
  - Investigation of the algorithms in real network settings: PlanetLab

Thank You!
