WORKING DRAFT
Approximation Algorithm for Soft-Capacitated Connected Facility Location Problems
7th Israeli Network Seminar 2012
Prof. Danny Raz and Assaf Rappaport
17/05/2012
Data Centers Placement
Contents
▪ Data Centers
▪ Facility Location Problem
▪ Steiner Tree
▪ Connected Facility Location
▪ Google Case Study
Data centers are becoming the hosting platform for a wide spectrum of composite applications
▪ Data centers are used to run applications that handle the core business and operational data of organizations:
– SaaS – Software as a Service
– HaaS – Hardware as a Service
– PaaS – Platform as a Service
Examples of data center applications
1 services
2 Database services
3 File Servers
4 Collaboration tools
5 CRM (Customer Relationship Management)
6 ERP (Enterprise Resource Planning)
7 E-Commerce
In recent years, large investments have been made in massive data centers supporting cloud services
[Figure: a list of companies that are running at least 50,000 servers]
Source: Data Center Knowledge (DCK)
With an increasing trend towards communication-intensive applications, the bandwidth usage within and between data centers is rapidly growing
Data centers placement presents challenging optimization problems
What must be decided:
1 Number of facilities
2 Location
3 Assignment
What is given:
1 Graph with costs on edges
2 Set of locations where facilities may be placed
3 Set of demand nodes that must be assigned to an open facility
The goal is to optimally place the applications and their related data over the available infrastructure
Consider the following scenario:
▪ An application in the cloud depends on an authentication service
▪ We consider the problem of placing replicas of the authentication servers at multiple locations in the data center
Replica placement deals with the actual number and network location of the replicas
▪ Having more replicas is more expensive, so we need to model the cost
▪ We would like to minimize the network distance between an application server and the closest replica, so having more replicas helps
▪ A replica must be synchronized with the original content server in order to supply reliable service
▪ The synchronization traffic across the network depends on the number of replicas deployed in the network, the topology of the distributed update, and the rate of updates to the server's content
The general uncapacitated facility location problem (1/2)
Input
▪ Set D of clients
▪ Set F of potential facility locations
▪ A distance function
▪ A cost function
Output
▪ A subset of facilities to open, with each client assigned to an open facility
Description
▪ Set of potential facility sites where a facility can be opened
▪ Set of demand points D that must be serviced
▪ We want the facilities to be as efficient as possible, so we minimize the distance from each client to its closest facility
▪ Each facility has an opening cost that must also be minimized; otherwise every point would become a facility
▪ Minimize the sum of distances plus the sum of opening costs of the facilities
The general uncapacitated facility location problem (2/2)
[Figure: customers D, facilities F, connection distances d_ij, facility opening costs f_j]
Facility Location (FL) Problem: open a subset of facilities and connect each customer to exactly one open facility at minimal cost
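The objective above can be illustrated on a toy instance. The following is a minimal sketch, not the approximation algorithm from the talk: it enumerates every non-empty subset of facilities, which is only feasible for very small inputs. The function names and the instance data (`opening_cost`, `dist`) are hypothetical, with `dist[i][j]` taken to be the distance from client i to facility j.

```python
from itertools import combinations

def ufl_cost(open_facilities, opening_cost, dist):
    """Opening costs of the open facilities plus each client's distance
    to its closest open facility."""
    opening = sum(opening_cost[j] for j in open_facilities)
    service = sum(min(row[j] for j in open_facilities) for row in dist)
    return opening + service

def ufl_brute_force(opening_cost, dist):
    """Try every non-empty subset of facilities (exponential time)."""
    n = len(opening_cost)
    best = None
    for k in range(1, n + 1):
        for subset in combinations(range(n), k):
            cost = ufl_cost(subset, opening_cost, dist)
            if best is None or cost < best[0]:
                best = (cost, subset)
    return best

# Hypothetical instance: 3 candidate facilities, 4 clients.
opening_cost = [4, 3, 5]
dist = [[1, 5, 9],
        [2, 4, 8],
        [9, 1, 3],
        [8, 2, 1]]
best_cost, best_subset = ufl_brute_force(opening_cost, dist)
print(best_cost, best_subset)  # 13 (0, 1)
```

Opening all three facilities gives the smallest service cost (5) but a total of 17, while opening only facilities 0 and 1 gives a total of 13, which shows why the opening costs have to be part of the objective.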
Uncapacitated facility location problem - History
▪ 17th century: The Fermat-Weber Problem
– The point minimizing the sum of distances to the sample points: given a set of m points and positive multipliers, find a point that minimizes the multiplier-weighted sum of distances to the given points
▪ 1960s: Plant location problem or warehouse location problem
– Stollsteimer
– Balinski and Wolfe
– Kuehn and Hamburger
– Manne
▪ 1997: Constant-factor approximation algorithm
– Shmoys, Tardos and Aardal gave the first polynomial-time algorithm that finds a solution within a factor of 3.16 of the optimal
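For the Fermat-Weber problem, a common numerical approach is Weiszfeld's iteration, which the slides do not mention; it is used here only as an illustration. The sketch below assumes Euclidean distance and stops early if an iterate lands exactly on a sample point, which is a simplification. The names `weber_cost` and `weiszfeld` are hypothetical.

```python
import math

def weber_cost(p, points, weights):
    """Weighted sum of Euclidean distances from p to the sample points."""
    return sum(w * math.dist(p, a) for a, w in zip(points, weights))

def weiszfeld(points, weights, iterations=200, eps=1e-12):
    """Weiszfeld's iteration, started from the weighted centroid."""
    dim = len(points[0])
    total = sum(weights)
    x = tuple(sum(w * a[k] for a, w in zip(points, weights)) / total
              for k in range(dim))
    for _ in range(iterations):
        num = [0.0] * dim
        den = 0.0
        for a, w in zip(points, weights):
            d = math.dist(x, a)
            if d < eps:
                # Simplification: stop when the iterate hits a sample point.
                return x
            for k in range(dim):
                num[k] += w * a[k] / d
            den += w / d
        x = tuple(v / den for v in num)
    return x

# Hypothetical data: four corners of a square with equal multipliers.
square = [(0, 0), (4, 0), (0, 4), (4, 4)]
center = weiszfeld(square, [1, 1, 1, 1])
```

For the symmetric square the iteration stays at the center (2, 2). For asymmetric data the result generally differs from the centroid, since the centroid minimizes the sum of squared distances rather than the sum of distances.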
Steiner Tree Problem
Input
▪ Given:
– An undirected weighted graph G(V,E)
– A set of nodes S (subset of V)
Output
▪ Find the minimum cost tree that spans the nodes in S
▪ Which is the Steiner tree for the green nodes?
▪ A shortest path tree is not necessarily a Steiner tree
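To make the definition concrete, here is a minimal exact sketch for tiny graphs. It relies on the fact that an optimal Steiner tree is a minimum spanning tree of the subgraph induced by its own node set, so it tries every set of optional (non-terminal) nodes and keeps the cheapest spanning tree. This is exponential in the number of optional nodes and is not the μ-approximation algorithm referred to in the talk. All names and the example graph are hypothetical.

```python
from itertools import combinations

def mst_cost(nodes, edges):
    """Kruskal's algorithm on the subgraph induced by `nodes`.
    `edges` maps (u, v) pairs to weights. Returns None if disconnected."""
    nodes = set(nodes)
    parent = {v: v for v in nodes}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    cost, used = 0, 0
    induced = sorted((w, u, v) for (u, v), w in edges.items()
                     if u in nodes and v in nodes)
    for w, u, v in induced:
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            cost += w
            used += 1
    return cost if used == len(nodes) - 1 else None

def steiner_tree_cost(vertices, edges, terminals):
    """Minimum over all sets of optional nodes of the induced MST cost."""
    optional = [v for v in vertices if v not in terminals]
    best = None
    for k in range(len(optional) + 1):
        for extra in combinations(optional, k):
            cost = mst_cost(set(terminals) | set(extra), edges)
            if cost is not None and (best is None or cost < best):
                best = cost
    return best

# Hypothetical graph: three terminals a, b, c and one optional hub v.
edges = {("a", "b"): 2, ("b", "c"): 2, ("a", "c"): 2,
         ("a", "v"): 1, ("b", "v"): 1, ("c", "v"): 1}
terminals = {"a", "b", "c"}
print(mst_cost(terminals, edges))                          # 4
print(steiner_tree_cost({"a", "b", "c", "v"}, edges, terminals))  # 3
```

In this example a spanning tree restricted to the terminals costs 4, while routing through the extra node v costs 3, so adding a non-terminal node can reduce the cost.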
Connected Facility Location
Input
▪ Given:
– Graph $G=(V,E)$, costs $\{c_e\}$ on edges and a parameter $M \ge 1$
– $F$: set of facilities
– $D$: set of clients (demands)
– Facility $i$ has facility cost $f_i$
– $c_{ij}$: distance between $i$ and $j$ in $V$
We want to:
▪ Pick a set $A$ of facilities to open
▪ Assign each demand $j$ to an open facility $i(j)$
▪ Connect all open facilities by a Steiner tree $T$
Cost
$\text{Cost} = \sum_{i \in A} f_i + \sum_{j \in D} c_{i(j)j} + M \sum_{e \in T} c_e$
= facility opening cost + client assignment cost + cost of connecting facilities
[Figure legend: client, facility, open facility, node, Steiner tree]
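The three terms of the cost can be evaluated directly for a candidate solution. The sketch below is only a cost evaluator under assumed data structures (dictionaries keyed by facility, by (facility, client) pair, and by edge); it checks that every client is assigned to an open facility but does not verify that the given edges actually form a tree connecting the open facilities. The function name and the instance are hypothetical.

```python
def confl_cost(open_facilities, assignment, tree_edges,
               facility_cost, dist, edge_cost, M):
    """Facility opening cost + client assignment cost + M * tree cost.

    assignment maps each client j to its facility i(j).
    dist maps (facility, client) pairs to distances.
    """
    for client, facility in assignment.items():
        if facility not in open_facilities:
            raise ValueError(f"client {client} assigned to closed facility")
    opening = sum(facility_cost[i] for i in open_facilities)
    assigning = sum(dist[(i, j)] for j, i in assignment.items())
    connecting = M * sum(edge_cost[e] for e in tree_edges)
    return opening + assigning + connecting

# Hypothetical instance: two open facilities joined by one edge, M = 2.
facility_cost = {"f1": 10, "f2": 8}
dist = {("f1", "c1"): 2, ("f1", "c2"): 3, ("f2", "c3"): 1}
edge_cost = {("f1", "f2"): 4}
total = confl_cost(
    open_facilities={"f1", "f2"},
    assignment={"c1": "f1", "c2": "f1", "c3": "f2"},
    tree_edges=[("f1", "f2")],
    facility_cost=facility_cost, dist=dist, edge_cost=edge_cost, M=2,
)
print(total)  # 18 + 6 + 2 * 4 = 32
```

Because the tree cost is multiplied by M, a larger M makes it more attractive to open fewer facilities that are close to each other, even if clients then travel further.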
Soft-ConFL algorithm – the first deterministic constant approximation algorithm for the soft capacitated connected facility location problem
▪ ρ-approximation algorithm for the Uncapacitated Facility Location Problem
▪ μ-approximation algorithm for the minimum Steiner Tree Problem
▪ Add a cost λ_i to each facility: this cost is defined as twice the minimum cost of satisfying M units of demand from facility i
▪ Modify the distance function by adding:
[Figure labels: f_j, d_ij]
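The added facility cost λ_i can be sketched as follows. This is one reading of "twice the minimum cost of satisfying M units of demand from facility i", under two assumptions that are not stated on the slide: the cost of serving a unit of demand equals its distance to the facility, and a client's demand may be served partially. With those assumptions the cheapest way to serve M units is to take demand from the nearest clients first. The function name and arguments are hypothetical.

```python
def lambda_cost(distances, demands, M):
    """Twice the cheapest cost of serving M units of demand from one facility.

    distances[j]: distance from the facility to client j.
    demands[j]:   units of demand at client j (assumed splittable).
    """
    remaining = M
    cost = 0
    # Serve the nearest clients first until M units are covered.
    for d, units in sorted(zip(distances, demands)):
        served = min(units, remaining)
        cost += d * served
        remaining -= served
        if remaining == 0:
            break
    if remaining > 0:
        raise ValueError("total demand is below M")
    return 2 * cost

# Hypothetical data: three unit-demand clients at distances 5, 1 and 2.
print(lambda_cost([5, 1, 2], [1, 1, 1], M=2))  # 2 * (1 + 2) = 6
```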
Deterministic constant approximation algorithm
Proof of Lemma 1
▪ Convert into a binary tree
[Figure labels: <M 3M>]
Google data centers
▪ Google operates data centers at the following locations:
– 19 in the US
– 12 in Europe
– 1 in Russia
– 1 in South America
– 3 in Asia
▪ Not all of the locations are dedicated Google data centers
[Figure panels: Google data centers worldwide, Google data centers in the USA, Google data centers in Europe]
Google data centers – Case example
▪ 36 Google data centers
▪ How many replicas? Locations?
▪ Unified demand
▪ Unified cost
▪ Geographic distance
Google data centers: Greedy vs. CoFL
▪ Facility cost: 5,000-10,000
▪ Min SPT: 22,000
▪ Total demand: 36
[Figure panels: Greedy, CoFL]
Google data centers: Greedy vs. UFL vs. CoFL
▪ Facility cost: 5,000
▪ Min SPT: 22,000
▪ Total demand: 36
[Figure panels: Greedy, UFL, CoFL]
Google data centers: Greedy vs. UFL vs. CoFL
▪ Facility cost: 3,000
▪ Min SPT: 22,000
▪ Total demand: 36
[Figure panels: Greedy, UFL, CoFL]
▪ Facility cost: 1,000
▪ Min SPT: 22,000
▪ Total demand: 36
[Figure panel: CoFL]
▪ Facility cost: 1,000
▪ Min SPT: 22,000
▪ Total demand:
[Figure panel: CoFL. Legend: 5.6%, 8.3%, 11.1%, 13.9%]
Sites shown: Mountain View, Calif.; Pleasanton, Calif.; San Jose, Calif.; Los Angeles, Calif.; Palo Alto, Calif.; Seattle; Portland, Oregon; The Dalles, Oregon; Chicago; Atlanta, Ga. (two sites); Reston, Virginia; Ashburn, Va.; Virginia Beach, Virginia; Houston, Texas; Miami, Fla.; Lenoir, North Carolina; Goose Creek, South Carolina; Pryor, Oklahoma; Council Bluffs, Iowa; Toronto, Canada; Berlin, Germany; Frankfurt, Germany; Munich, Germany; Zurich, Switzerland; Groningen, Netherlands; Mons, Belgium; Eemshaven, Netherlands; Paris; London; Dublin, Ireland; Milan, Italy; Moscow, Russia; Sao Paulo, Brazil; Tokyo; Hong Kong; Beijing
[Figure label: 2.8%]
[Figure panels: Greedy, UFL, CoFL]
The Steiner tree problem is NP-hard
Reduction
▪ We will show that a known NP-hard problem can be solved in polynomial time if the Steiner decision problem can be solved in polynomial time
Exact cover by 3-sets is NP-hard
▪ $X = \{x_1, x_2, \ldots, x_{3p}\}$
▪ $C = \{C_1, C_2, \ldots, C_q\}$, with $C_i \subseteq X$ and $|C_i| = 3$ for $i = 1, \ldots, q$
▪ Is it possible to select mutually disjoint subsets from $C$ such that their union is $X$?
[Figure: reduction graph with a node v, set nodes C_1 to C_4 and element nodes x_1 to x_10]
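The exact cover by 3-sets question can be stated as a small brute-force check. This sketch is only meant to pin down the problem definition; it runs in exponential time and is not part of the reduction itself. It uses the observation that p subsets of size 3 whose union has 3p elements must be mutually disjoint. The function name and the instances are hypothetical.

```python
from itertools import combinations

def exact_cover_by_3_sets(X, C):
    """Return a list of mutually disjoint 3-sets from C whose union is X,
    or None if no such selection exists (brute force)."""
    X = set(X)
    if any(len(set(c)) != 3 for c in C):
        raise ValueError("every subset must have exactly 3 distinct elements")
    if len(X) % 3 != 0:
        return None
    p = len(X) // 3
    for choice in combinations(C, p):
        # p sets of size 3 covering all 3p elements are necessarily disjoint.
        if set().union(*choice) == X:
            return list(choice)
    return None

# Hypothetical instances with |X| = 6, so p = 2.
X = range(1, 7)
yes_instance = [(1, 2, 3), (3, 4, 5), (4, 5, 6), (1, 2, 6)]
no_instance = [(1, 2, 3), (3, 4, 5), (2, 5, 6)]
print(exact_cover_by_3_sets(X, yes_instance))  # [(1, 2, 3), (4, 5, 6)]
print(exact_cover_by_3_sets(X, no_instance))   # None
```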