Algorithms for Selecting Multiple Mirror Sites for Parallel Download Yu Cai 12 / 2003 UCCS
Introduction
Using the HTTP 1.1 byte-range header, a client can retrieve a specific range of data from a mirror server. This makes it possible to retrieve different parts of a document from multiple mirror sites in parallel and so increase the download speed. Selecting the mirror sites is still an open problem, however: different server selections may yield different performance.
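As a rough illustration of the byte-range idea, the sketch below splits a file of known size into one contiguous range per mirror and builds the corresponding `Range` request headers. The equal-split policy, the function name `split_ranges`, and the mirror URLs are assumptions made for this example; they are not taken from the project.

```python
def split_ranges(total_size, mirrors):
    """Assign one contiguous byte range to each mirror (equal split, an assumed policy)."""
    n = len(mirrors)
    if n == 0 or total_size < n:
        raise ValueError("need at least one mirror and one byte per mirror")
    chunk = total_size // n
    plan = []
    for i, mirror in enumerate(mirrors):
        start = i * chunk
        # HTTP byte ranges are inclusive; the last mirror also takes the remainder.
        end = total_size - 1 if i == n - 1 else start + chunk - 1
        plan.append((mirror, {"Range": f"bytes={start}-{end}"}))
    return plan


# Hypothetical mirror URLs, used only to show the resulting headers.
plan = split_ranges(1000, [
    "http://mirror-a.example/file.iso",
    "http://mirror-b.example/file.iso",
    "http://mirror-c.example/file.iso",
])
for url, headers in plan:
    print(url, headers)
```

An equal split is the simplest choice; a real downloader would more likely size each range in proportion to the expected speed of each mirror.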
Diagram of Parallel Download
Project Goal
In this project, we develop algorithms that choose the best set of mirror sites for parallel download. We implement brute-force algorithms as well as genetic algorithms, and we test them on simulated networks as well as on a real-world network.
Related Work on Algorithms
The mirror server and cache server selection problem has been studied in recent years.
Formal approach: abstract the network into a model and apply graph theory.
Common assumptions made when building the network model:
a) the network topology is known,
b) the cost associated with each path is known,
c) network connections are single and static.
Related Work on Algorithms
Selecting M replicas among N potential sites is an NP-hard problem. We need either to develop heuristic algorithms or to relax the optimality constraints so that the problem becomes solvable in polynomial time.
Existing algorithms and their complexity:
Algorithm     Complexity
tree-based    O(N^3 M^2)
greedy        O(N^2 M)
random        O(NM)
hot spot      N^2 + min(N log N, NM)
Problems to be studied
1) What is the maximum possible download speed for a given network topology? We refer to it as the “global max speed”.
2) How many mirror sites need to be chosen to achieve the global max speed, and which ones?
3) If we only want to choose a certain number of mirror sites, say 5, what is the maximum download speed we can get, and which 5 sites should we choose? We refer to this speed as the “n sites max speed”.
4) Several selections might achieve the same max speed. What are the criteria for picking “the best of the best”?
5) What is the complexity of the algorithm?
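Network Graph Model
G = (V, E)
–V: the set of nodes
–E: the set of edges/paths
The maximum download speed at node r using mirror server set S:
mds(r, S) = Σ over the children i of r of min(mds(i, S), b(i, r)),
where b(i, r) is the bandwidth of the path between i and r. For a mirror server s, mds(s, S) is the speed of s if s ∈ S, and 0 otherwise.
The maximum download speed at client c when selecting k mirror servers from set S:
k-pds(c, S) = max{ mds(c, S’) | S’ ⊆ S, S’ has k nodes }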
An Example
mds(S1, S) = 30, mds(S2, S) = 25
mds(R2, S) = min(mds(S1, S), 5) + min(mds(S2, S), 8) = min(30, 5) + min(25, 8) = 5 + 8 = 13
Similarly, mds(R3, S) = 16
mds(R1, S) = min(mds(R2, S), 40) + min(mds(R3, S), 30) = min(13, 40) + min(16, 30) = 13 + 16 = 29
mds(c, S) = mds(R1, S) = 29
3-pds(c, S) = max{ mds(c, {S1, S2, S3}), mds(c, {S1, S2, S4}), mds(c, {S1, S3, S4}), mds(c, {S2, S3, S4}) } = max{20, 22, 21, 24} = 24
The subset of mirror servers to use is 3-pdsSubset(c, S) = {S2, S3, S4}.
Similarly, 2-pds(c, S) = 17 and 2-pdsSubset(c, S) = {S2, S4}.
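The numbers in this example can be checked with a small brute-force sketch. It assumes a tree topology in which each node's speed is the sum of its children's speeds, each capped by the connecting link. The link values for S3 and S4 (7 and 9) and their server speeds (20) are not given on the slide; they are assumptions chosen so that the results match the stated values. The client c is taken to attach at R1.

```python
from itertools import combinations

# child -> (parent, link bandwidth). R2/R3 and S1/S2 links come from the example;
# the S3 and S4 links are ASSUMED values that reproduce the slide's results.
LINKS = {
    "R2": ("R1", 40), "R3": ("R1", 30),
    "S1": ("R2", 5), "S2": ("R2", 8),
    "S3": ("R3", 7), "S4": ("R3", 9),
}
# Server speeds: S1 and S2 from the example; S3 and S4 are ASSUMED.
SPEED = {"S1": 30, "S2": 25, "S3": 20, "S4": 20}


def mds(node, selected):
    """Maximum download speed at `node` when only servers in `selected` are used."""
    if node in SPEED:
        return SPEED[node] if node in selected else 0
    # Sum the children's speeds, each capped by the bandwidth of its link.
    return sum(min(mds(child, selected), bw)
               for child, (parent, bw) in LINKS.items() if parent == node)


def k_pds(root, servers, k):
    """Brute force: try every k-subset of servers and keep the fastest one."""
    best = max(combinations(sorted(servers), k),
               key=lambda subset: mds(root, set(subset)))
    return mds(root, set(best)), set(best)


print(mds("R1", set(SPEED)))      # 29
print(k_pds("R1", SPEED, 3)[0])   # 24, using S2, S3, S4
print(k_pds("R1", SPEED, 2)[0])   # 17, using S2, S4
```

The enumeration visits every k-subset, so its cost grows combinatorially with the number of candidate servers.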
Algorithm Implementation
Brute-Force Algorithm: implements the previous formulas.
–for mds
–for k-pds
Genetic Algorithm:
–fixed-length genetic algorithm
–variable-length genetic algorithm
Genetic Algorithm
1) Assign sequential server, node, and path numbers to identify each server, node, and path. Assign the initial bandwidths and server speeds.
2) Initialize the first generation of chromosomes with random lengths by filling each chromosome with server numbers.
3) Apply crossover and mutation with given probabilities. Make sure that no server is duplicated within a chromosome and that the chromosome length does not exceed the given number. Several different crossover and mutation methods are combined for better performance.
4) Fitness function: for a given chromosome S’, use the max download speed mds(c, S’) as the fitness.
5) Run for a given number of generations and output the result.
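The steps above can be sketched as a small variable-length genetic algorithm. This is an illustrative sketch only: the one-point crossover, single-gene mutation, truncation selection, and all parameter values are assumptions, not the specific operators combined in the project. The fitness used in the usage example is a simple stand-in (sum of assumed per-server rates, capped by an assumed client link) where the project would use mds(c, S’).

```python
import random


def repair(chrom, max_len):
    """Drop duplicate servers and truncate to max_len, preserving order."""
    seen, out = set(), []
    for s in chrom:
        if s not in seen:
            seen.add(s)
            out.append(s)
    return out[:max_len]


def crossover(a, b, rng):
    """One-point crossover on variable-length parents (assumed operator)."""
    return a[:rng.randint(0, len(a))] + b[rng.randint(0, len(b)):]


def mutate(chrom, servers, rng):
    """Replace one randomly chosen gene with a random server (assumed operator)."""
    chrom = list(chrom)
    if chrom:
        chrom[rng.randrange(len(chrom))] = rng.choice(servers)
    return chrom


def run_ga(servers, fitness, max_len, pop_size=30, generations=50,
           p_cross=0.8, p_mut=0.2, seed=0):
    """Return (best chromosome, its fitness); requires max_len <= len(servers)."""
    rng = random.Random(seed)
    # Step 2: random-length chromosomes filled with distinct server numbers.
    pop = [rng.sample(servers, rng.randint(1, max_len)) for _ in range(pop_size)]
    best = max(pop, key=fitness)
    for _ in range(generations):
        # Truncation selection: the fitter half become parents.
        parents = sorted(pop, key=fitness, reverse=True)[:pop_size // 2]
        pop = []
        while len(pop) < pop_size:
            a, b = rng.sample(parents, 2)
            child = crossover(a, b, rng) if rng.random() < p_cross else list(a)
            if rng.random() < p_mut:
                child = mutate(child, servers, rng)
            child = repair(child, max_len)  # Step 3: no duplicates, bounded length.
            if child:                       # Empty chromosomes are discarded.
                pop.append(child)
        best = max(pop + [best], key=fitness)
    return best, fitness(best)


# Stand-in fitness: ASSUMED per-server rates, capped by an ASSUMED client link of 25.
RATE = {1: 5, 2: 8, 3: 7, 4: 9, 5: 3, 6: 6}
fitness = lambda chrom: min(25, sum(RATE[s] for s in chrom))
best, speed = run_ga(list(RATE), fitness, max_len=3)
print(sorted(best), speed)
```

Repairing each child after crossover and mutation is one simple way to enforce the no-duplicate and maximum-length constraints; rejecting invalid children outright would be another.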
Crossover in Genetic Algorithm
Testing Results
We tested the algorithms on simulated networks as well as on a real-world network. GT-ITM (Georgia Tech Internetwork Topology Models) is used to generate network topologies of varying sizes for simulation.
Parallel Download Algorithm Performance
Future Work
Extend the work to proxy-server-based multipath connections.
Investigate more algorithms and related work.
Run more simulations and performance tests.
Develop non-heuristic algorithms.