On the Placement of Web Server Replicas Yu Cai
Paper On the Placement of Web Server Replicas –Lili Qiu, Venkata N. Padmanabhan, Geoffrey M. Voelker –IEEE INFOCOM 2001
What is the paper about? The Web server replica placement problem. –A popular Web site aims to improve its performance (e.g., reduce the latency perceived by its clients) by pushing its content to hosting services. –The problem is to choose M replica sites (hosting services) among N potential sites (N > M) such that an objective function is optimized under a given traffic pattern. –The objective can be to minimize the clients' latency, the total bandwidth consumption, or, if each link is associated with a cost, an overall cost function.
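To make the objective concrete, here is a minimal sketch that scores one candidate placement, assuming every client is served by its nearest replica and the cost is the demand-weighted distance (distance may stand for latency, hops, or per-link cost). The network, `DIST`, `DEMAND` and `placement_cost` are hypothetical illustrations, not taken from the paper.

```python
# Hypothetical 4-node line network: DIST[i][j] is the distance (e.g., hops
# or latency) between nodes i and j.
DIST = [
    [0, 1, 2, 3],
    [1, 0, 1, 2],
    [2, 1, 0, 1],
    [3, 2, 1, 0],
]
# Hypothetical request rate generated by the clients attached to each node.
DEMAND = [10, 1, 1, 5]


def placement_cost(replicas, dist=DIST, demand=DEMAND):
    """Total demand-weighted distance when every client uses its nearest replica."""
    if not replicas:
        raise ValueError("at least one replica is required")
    return sum(
        w * min(dist[c][r] for r in replicas)
        for c, w in enumerate(demand)
    )


print(placement_cost({0}))     # 18: all traffic served from node 0
print(placement_cost({0, 3}))  # 2: replicas at both ends of the line
```

A placement algorithm then searches over the M-subsets of the N candidate sites for one with low cost under this kind of function.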
Contribution of the paper –Presents several placement algorithms. –Evaluates their performance through simulations on synthetic and real network topologies.
Algorithms Tree-based algorithm: –Assumes the underlying topology is a tree and models placement as a dynamic programming problem. –Originally designed for Web proxy cache placement, it is also applicable to Web replica placement. –Relies on an unrealistic assumption and performs relatively poorly on general (non-tree) network topologies.
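The dynamic-programming idea can be sketched as follows, under a simplified model that is an assumption here: the tree is rooted at the origin server, requests travel up toward the root, and each request is served by the first replica on its path (or by the root). The tree, `CHILDREN`, `WEIGHT`, `best`, `split` and `tree_placement_cost` are hypothetical; this is not the paper's exact formulation.

```python
from functools import lru_cache

# Hypothetical tree rooted at the origin server (node 0).
# CHILDREN[v] lists (child, edge_length); WEIGHT[v] is the request rate at v.
CHILDREN = {0: [(1, 1), (2, 4)], 1: [(3, 2), (4, 1)], 2: [(5, 1)],
            3: [], 4: [], 5: []}
WEIGHT = {0: 0, 1: 1, 2: 2, 3: 5, 4: 3, 5: 6}

# DEPTH[v] is the distance from the root, so the distance from an
# ancestor a down to v is DEPTH[v] - DEPTH[a].
DEPTH = {0: 0}
stack = [0]
while stack:
    v = stack.pop()
    for child, length in CHILDREN[v]:
        DEPTH[child] = DEPTH[v] + length
        stack.append(child)


@lru_cache(maxsize=None)
def best(v, k, anc):
    """Min cost of subtree(v) with at most k replicas in it, given that
    requests not intercepted inside the subtree go up to ancestor `anc`."""
    # Option 1: no replica at v, so v's own requests travel up to `anc`.
    cost = WEIGHT[v] * (DEPTH[v] - DEPTH[anc]) + split(v, 0, k, anc)
    # Option 2: a replica at v serves v and intercepts its descendants' requests.
    if k >= 1:
        cost = min(cost, split(v, 0, k - 1, v))
    return cost


@lru_cache(maxsize=None)
def split(v, i, k, anc):
    """Min cost of sharing at most k replicas among children i, i+1, ... of v."""
    kids = CHILDREN[v]
    if i == len(kids):
        return 0
    child = kids[i][0]
    return min(best(child, j, anc) + split(v, i + 1, k - j, anc)
               for j in range(k + 1))


def tree_placement_cost(m):
    """Optimal cost with m replicas; the root always holds the content."""
    return split(0, 0, m, 0)


print([tree_placement_cost(m) for m in range(3)])  # [60, 28, 13]
```

The memoized table is indexed by (node, replica budget, nearest replica above), which is why the approach is exact on trees but does not carry over directly to general graphs.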
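Algorithms Greedy algorithm –Evaluate each of the N potential sites by assuming all client traffic converges at that site. –Pick the site with the lowest cost. –Evaluate each of the remaining N-1 sites together with the sites already picked (each client served by its closest replica), and again pick the best. –Repeat until M sites have been picked.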
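The greedy steps above can be sketched as below, assuming the same demand-weighted nearest-replica cost. The 5-node line network, `DIST`, `DEMAND`, `cost` and `greedy_placement` are hypothetical.

```python
# Hypothetical inputs: DIST[i][j] is the distance between nodes i and j,
# DEMAND[c] is the request rate of the clients at node c. Every node is
# also a candidate replica site (N = 5 here).
DIST = [
    [0, 1, 2, 3, 4],
    [1, 0, 1, 2, 3],
    [2, 1, 0, 1, 2],
    [3, 2, 1, 0, 1],
    [4, 3, 2, 1, 0],
]
DEMAND = [4, 1, 1, 1, 6]


def cost(replicas):
    """Demand-weighted distance, each client going to its nearest replica."""
    return sum(w * min(DIST[c][r] for r in replicas)
               for c, w in enumerate(DEMAND))


def greedy_placement(m, candidates=range(len(DIST))):
    chosen = []
    remaining = set(candidates)
    while len(chosen) < m and remaining:
        # Try each remaining site together with the replicas picked so far
        # and keep the one giving the lowest total cost (ties: lowest index).
        site = min(remaining, key=lambda s: (cost(chosen + [s]), s))
        chosen.append(site)
        remaining.remove(site)
    return chosen


print(greedy_placement(2))        # [3, 0]
print(cost(greedy_placement(2)))  # 8
print(cost([0, 4]))               # 4
```

On this toy input greedy ends with cost 8 while {0, 4} costs 4, a reminder that greedy is a heuristic and not guaranteed to be optimal.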
Algorithms Random algorithm –Randomly pick M of the N sites. –Can be improved by introducing genetic (evolutionary) algorithms. Hotspot algorithm –Place replicas near the clients that generate the greatest load. –Can be improved by using client clustering. Super-optimal algorithm to obtain a lower bound –Its solution may not be feasible; used only for comparison.
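The random and hotspot baselines can be sketched as follows. The hotspot variant assumes that "near" means within a hop radius of the candidate site; that neighborhood definition, `radius`, and all names here are hypothetical.

```python
import random

# Hypothetical 5-node line network and per-node request rates.
DIST = [
    [0, 1, 2, 3, 4],
    [1, 0, 1, 2, 3],
    [2, 1, 0, 1, 2],
    [3, 2, 1, 0, 1],
    [4, 3, 2, 1, 0],
]
DEMAND = [4, 1, 1, 1, 6]


def cost(replicas):
    """Demand-weighted distance, each client going to its nearest replica."""
    return sum(w * min(DIST[c][r] for r in replicas)
               for c, w in enumerate(DEMAND))


def random_placement(m, rng):
    """Random: choose m distinct candidate sites uniformly at random."""
    return rng.sample(range(len(DIST)), m)


def hotspot_placement(m, radius=1):
    """Hotspot: rank sites by the load generated within `radius` of them
    (hypothetical neighborhood definition) and keep the m heaviest."""
    def nearby_load(site):
        return sum(w for c, w in enumerate(DEMAND) if DIST[site][c] <= radius)
    ranked = sorted(range(len(DIST)), key=lambda s: (-nearby_load(s), s))
    return ranked[:m]


print(hotspot_placement(2))        # [3, 4]
print(cost(hotspot_placement(2)))  # 15
print(random_placement(2, random.Random(7)))
```

Here the two hotspot picks are adjacent and both count node 4's load, which hints at why grouping clients before ranking could help.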
Simulation Use GT-ITM to generate random network topologies –Transit-stub (hierarchical) graphs. Use BGP routing tables to derive a real Internet topology –Distances measured in AS hop count.
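One way to turn BGP routing-table entries into an AS-level topology with hop-count distances is sketched below, assuming consecutive ASes on an AS path are neighbors. The AS paths and function names are hypothetical, and shortest paths in this graph may differ from the policy-driven routes BGP actually selects.

```python
from collections import defaultdict, deque

# Hypothetical AS paths as they might appear in a BGP table dump
# (vantage AS first, origin AS last). Real tables are far larger.
AS_PATHS = [
    [7018, 3356, 15169],
    [7018, 1299, 2914],
    [3356, 2914, 64500],
    [1299, 64501],
]


def build_as_graph(paths):
    """Undirected AS adjacency graph: consecutive ASes on a path are linked."""
    graph = defaultdict(set)
    for path in paths:
        for a, b in zip(path, path[1:]):
            if a != b:  # skip AS-path prepending (repeated ASes)
                graph[a].add(b)
                graph[b].add(a)
    return graph


def as_hop_counts(graph, src):
    """Breadth-first search: AS hop count from src to every reachable AS."""
    hops = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in hops:
                hops[v] = hops[u] + 1
                queue.append(v)
    return hops


hops = as_hop_counts(build_as_graph(AS_PATHS), 7018)
print(hops[64500])  # 3
```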
Simulation Web workload and client generation –Use access logs collected from real Web sites, such as MSNBC. –Cluster Web clients that are topologically close to each other. –The top 10, 100, 1000 and 3000 clusters account for 24%, 45%, 78% and 94% of the requests, respectively. –Map the clusters randomly to nodes in the simulated network.
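The slide does not say how clients are clustered. The sketch below assumes clustering by longest-prefix match against a list of routing prefixes, then computes the request share of the top-k clusters and maps clusters to random simulation nodes. The prefixes (private and documentation ranges), log entries and function names are hypothetical.

```python
import ipaddress
import random
from collections import Counter

# Hypothetical routing prefixes and per-request client IPs from an access log.
PREFIXES = [ipaddress.ip_network(p) for p in
            ("10.0.0.0/8", "10.1.0.0/16", "172.16.0.0/12", "192.168.0.0/16")]
REQUESTS = ["10.1.2.3", "10.1.9.9", "10.2.0.1", "192.168.5.5",
            "10.1.2.3", "172.20.1.1", "203.0.113.7"]


def cluster_of(ip):
    """Longest-prefix match; None if no prefix covers the client."""
    addr = ipaddress.ip_address(ip)
    matches = [net for net in PREFIXES if addr in net]
    return max(matches, key=lambda net: net.prefixlen, default=None)


def cluster_counts(requests):
    counts = Counter(cluster_of(ip) for ip in requests)
    counts.pop(None, None)  # drop requests from unmatched clients
    return counts


def top_k_share(counts, k):
    """Fraction of clustered requests issued by the k busiest clusters."""
    return sum(n for _, n in counts.most_common(k)) / sum(counts.values())


def map_to_nodes(counts, num_nodes, rng):
    """Assign every cluster to a uniformly random simulation node."""
    return {cluster: rng.randrange(num_nodes) for cluster in counts}


counts = cluster_counts(REQUESTS)
print(top_k_share(counts, 1))  # 0.5
print(map_to_nodes(counts, 100, random.Random(0)))
```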
Simulation Evaluate the effects of imperfect input data –Salt the input data with uniformly distributed random noise. Evaluate the effects of network dynamics –The input data change over time, so a placement computed from past data may no longer match current traffic.
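The imperfect-data experiment can be sketched as: compute a placement from salted input, then score it against the true input. The multiplicative uniform noise model, the `error` parameter, and all names below are assumptions for illustration.

```python
import random

# Hypothetical 5-node line network and "true" per-node request rates.
DIST = [
    [0, 1, 2, 3, 4],
    [1, 0, 1, 2, 3],
    [2, 1, 0, 1, 2],
    [3, 2, 1, 0, 1],
    [4, 3, 2, 1, 0],
]
TRUE_DEMAND = [4, 1, 1, 1, 6]


def cost(replicas, demand):
    """Demand-weighted distance, each client using its nearest replica."""
    return sum(w * min(DIST[c][r] for r in replicas)
               for c, w in enumerate(demand))


def greedy(m, demand):
    """Greedy placement computed from the given (possibly inaccurate) demand."""
    chosen = []
    remaining = set(range(len(DIST)))
    while len(chosen) < m and remaining:
        site = min(remaining, key=lambda s: (cost(chosen + [s], demand), s))
        chosen.append(site)
        remaining.remove(site)
    return chosen


def salt(values, error, rng):
    """Assumed noise model: scale each value by a factor drawn
    uniformly from [1 - error, 1 + error]."""
    return [v * rng.uniform(1 - error, 1 + error) for v in values]


rng = random.Random(42)
noisy_demand = salt(TRUE_DEMAND, 0.5, rng)
placement = greedy(2, noisy_demand)             # decided on imperfect data
print(placement, cost(placement, TRUE_DEMAND))  # judged on the true data
```

The network-dynamics experiment has the same shape if `noisy_demand` is replaced by demand measured in an earlier period.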
Conclusion The greedy algorithm performs the best –As input-data error increases or the network changes, its performance degrades only slightly.