Ana-Maria Oprescu, Thilo Kielmann (Vrije Universiteit)
Presented by Gal Cohen
Cloud Computing Seminar, CS Technion, Spring 2012
High-throughput computing jobs
No interactive deadline
Tasks are independent of each other
All tasks are ready for execution
Unknown runtimes
Execution model:
◦ Allocate resources (e.g. machines)
◦ Run each task (once) from the bag on some machine
The runtime distribution is unknown; however, some distribution exists
The total number of tasks is known
Tasks can be aborted
There are many cloud providers (EC2, Azure, Rackspace, 3Tera)
Even a single provider offers many machine types, at different prices:
◦ CPU count and speed
◦ Memory size
Upper limit (self-imposed) on the number of machines that can be allocated from a provider
A machine is charged per ATU (hour)
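To make per-ATU charging concrete, here is a minimal sketch. It assumes that a partially used ATU is billed as a full one, which the slide does not state explicitly; the function name and parameters are hypothetical.

```python
import math

def machine_cost(runtime_minutes, price_per_atu, atu_minutes=60):
    """Cost of one machine, assuming every started ATU is billed in full."""
    # Assumption: billing rounds up to whole ATUs (e.g. whole hours).
    atus = math.ceil(runtime_minutes / atu_minutes)
    return atus * price_per_atu
```

Under this assumption, a machine that runs for 61 minutes costs as much as one that runs for 120 minutes, which is why releasing machines close to an ATU boundary matters for the budget.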
The goal
◦ Run all the tasks from a given bag on cloud machines within a limited budget
◦ Minimize the makespan of the whole bag (without exceeding the budget constraint)
Assumption
◦ Each machine runs one task at a time, in FIFO order
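The goal can be illustrated with a small brute-force search. This is a simplified sketch, not the BaTS algorithm: it assumes the average task runtime on each machine type is already known, that every allocated machine runs for the whole makespan, and that started ATUs are billed in full. All names are hypothetical.

```python
import math
from itertools import product

def estimate(config, n_tasks, avg_runtime, price, atu=60):
    """Return (makespan_minutes, cost) for a tuple of machine counts per type."""
    # Aggregate throughput in tasks per minute.
    throughput = sum(a / t for a, t in zip(config, avg_runtime))
    if throughput == 0:
        return None  # no machines allocated
    makespan = n_tasks / throughput
    atus = math.ceil(makespan / atu)  # assumption: started ATUs are billed in full
    cost = sum(a * atus * c for a, c in zip(config, price))
    return makespan, cost

def best_config(n_tasks, avg_runtime, price, max_machines, budget, atu=60):
    """Pick the machine counts with the smallest makespan whose cost fits the budget."""
    best = None
    for config in product(*(range(m + 1) for m in max_machines)):
        est = estimate(config, n_tasks, avg_runtime, price, atu)
        if est is None:
            continue
        makespan, cost = est
        if cost <= budget and (best is None or makespan < best[1]):
            best = (config, makespan, cost)
    return best  # None if no configuration fits the budget
```

For example, `best_config(120, (10, 5), (1, 3), (2, 2), budget=28)` searches all combinations of up to two machines of each of two types. Exhaustive search is only practical for a handful of types and small per-provider limits.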
The scheduler (BaTS) runs outside the cloud (for free)
The scheduler receives the bag of tasks
It allocates machines from each cloud
It dispatches tasks to the allocated machines
It receives feedback on task completion
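The dispatch-and-feedback cycle can be sketched as a tiny simulation in which each machine takes the next task from the bag as soon as it reports completion. This is an illustrative model only: the task sizes, the machine speeds, and the function name are assumptions, and budget handling is left out.

```python
import heapq

def simulate_self_scheduling(task_work, machine_speed):
    """Hand each task to whichever machine becomes free first; return the makespan."""
    # Heap of (time_at_which_machine_is_free, machine_index).
    free_at = [(0.0, m) for m in range(len(machine_speed))]
    heapq.heapify(free_at)
    finish = 0.0
    for work in task_work:
        t, m = heapq.heappop(free_at)        # next machine to report completion
        done = t + work / machine_speed[m]   # runtime depends on machine speed
        finish = max(finish, done)
        heapq.heappush(free_at, (done, m))
    return finish
```

With four equal tasks and one machine twice as fast as the other, the last task can land on the slow machine and dominate the makespan, which is a small-scale version of the tail-phase problem.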
Error level, typical values: 0.10, 0.15, 0.20, 0.25
[Figure axes: required sample size (n); bag-of-tasks size (N)]
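The formula behind this figure did not survive in the text. As a stand-in, the sketch below uses Cochran's sample-size formula with a finite-population correction, a standard textbook result; whether the slides use exactly this form is an assumption, as are the default confidence value `z=1.96` and `p=0.5`.

```python
import math

def required_sample_size(bag_size, error, z=1.96, p=0.5):
    """Cochran's sample size with finite-population correction (stand-in formula)."""
    # Sample size for an infinitely large population.
    n0 = z * z * p * (1 - p) / (error * error)
    # Correct for the finite bag size N.
    n = n0 / (1 + (n0 - 1) / bag_size)
    return min(bag_size, math.ceil(n))
```

The qualitative behaviour is what matters here: the required sample grows as the error level shrinks, and it levels off as the bag grows, so large bags need only a small fraction of their tasks sampled.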
ATU cost for a machine of type i
Thus, BaTS continuously tries to avoid budget violations
In theory this is easy: as the execution continues, both the bag and the remaining budget shrink
The difficulty is estimating the size of the bag at a given moment (some machines will finish their current task before their ATU ends)
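One way to picture the estimation difficulty is to count how many more tasks the already-paid machine time can absorb. The sketch below is an assumption-laden illustration, not the BaTS estimator: it assumes a known average task time per machine and that a task is only started if it is expected to finish within the paid ATU. The tuple layout and names are hypothetical.

```python
def tasks_left_after_paid_time(tasks_in_bag, machines):
    """Estimate the bag size once all currently paid ATUs have expired.

    machines: list of (paid_minutes_left, avg_task_minutes, current_task_minutes_left)
    """
    expected = 0
    for paid_left, avg_task, current_left in machines:
        if current_left > paid_left:
            continue  # the running task will not even finish in the paid time
        # Whole tasks that still fit after the current one completes.
        expected += int((paid_left - current_left) // avg_task)
    return max(0, tasks_in_bag - expected)
```

The estimate can then be compared against what the remaining budget can pay for, to decide whether the current machine configuration is still affordable.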
“Machine speed” in each “cloud” was simulated according to 5 scenarios:
[Table columns: Cloud 2 cost; Cloud 2 speed; profitability of C2 w.r.t. C1. The per-scenario values are not recoverable.]
In each scenario, RR is compared to BaTS
RR always uses machines
BaTS initial configuration is machines and
◦ Budget B = the cost of running RR for that scenario
◦ Budget B = the cost of running only on the most “profitable” machine type (computed offline)
BaTS helps choose cloud resources suitable for an application
BaTS helps schedule within budget while still performing reasonably well
Limitations
◦ The provided tests “cheat” because the number of machines is very small
◦ The “tail phase” is not handled well (the “faster” machines are released before the “slow” ones)
◦ Guessing a proper budget
◦ Actual bags on actual clouds
◦ What about data transfer costs?
◦ Storage constraints?
◦ Another metric: maximize the profitability (or minimize the budget) while not exceeding a given makespan