Published by Augustus Watson. Modified over 9 years ago.
1
CoolAir: Temperature- and Variation-Aware Management for Free-Cooled Datacenters
Íñigo Goiri, Thu D. Nguyen, and Ricardo Bianchini
2
Typical datacenter cooling (figure): cooling tower, water chiller, air handling unit, server racks
Free cooling (figure): outside air, filters, evaporative cooler, fans, server racks
Hybrid: typical + free cooling (photo: Microsoft DC in Chicago)
3
Free cooling limitations
- High temperature
- Wide temperature variation
- High humidity
- Potentially negative impact on hardware reliability, especially disks
Figure (outside, inlet, and disk temperatures): outside temperature directly impacts inlet and disk temperatures, and daily temperature variation can be large.
4
Roadmap
- Motivation and background
- CoolAir: Managing free-cooled datacenters
  - Cooling modeling
  - Cooling management
  - Compute management
- CoolAir for Parasol
- Evaluation and general lessons
- Conclusions
5
CoolAir: Managing free-cooled datacenters
- Energy-aware management of cooling and workload
- Minimize hardware reliability issues:
  - Limit temperature and relative humidity
  - Reduce temperature variation
- Major tasks:
  1. Predict conditions and energy
  2. Select the best cooling settings
  3. Apply cooling settings
  4. Place and schedule load
Figure: CoolAir's Cooling Modeler, Cooling Manager, and Compute Manager use a weather forecast to control the datacenter's cooling and servers.
6
Cooling modeling
- Predictions based on a linear regression model
- A Cooling Learner builds the Cooling Model from the datacenter's historic data (temperature inside/outside, humidity outside, cooling setting, cooling operation)
- Model inputs: temperature outside, location in the datacenter, datacenter utilization, cooling setting
- Model outputs: temperature inside, humidity inside, cooling power
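A linear-regression cooling model of this kind can be sketched as ordinary least squares over the model inputs. This is a minimal sketch, not CoolAir's actual model. The two-feature example (outside temperature, utilization), the small normal-equation solver, and the names `fit_linear_model` and `predict` are all hypothetical. A real deployment would learn one such model per output (inside temperature, inside humidity, cooling power) from historic sensor data.

```python
from typing import List, Sequence


def _solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve the square system a @ x = b with Gaussian elimination and partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_linear_model(rows: Sequence[Sequence[float]], targets: Sequence[float]) -> List[float]:
    """Ordinary least squares via the normal equations (X^T X) w = X^T y.

    Each row holds the features of one historic sample (hypothetically:
    outside temperature, utilization, ...). A constant 1.0 is prepended,
    so w[0] is the intercept.
    """
    x = [[1.0, *r] for r in rows]
    k = len(x[0])
    xtx = [[sum(row[i] * row[j] for row in x) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * y for row, y in zip(x, targets)) for i in range(k)]
    return _solve(xtx, xty)


def predict(weights: Sequence[float], features: Sequence[float]) -> float:
    """Apply learned weights (intercept first) to one feature vector."""
    return weights[0] + sum(w * f for w, f in zip(weights[1:], features))


# Synthetic check: data generated from inside = 2 + 0.5 * outside + 3 * utilization.
history = [[20.0, 0.1], [25.0, 0.5], [30.0, 0.9], [22.0, 0.3], [28.0, 0.7]]
observed = [2 + 0.5 * t + 3 * u for t, u in history]
weights = fit_linear_model(history, observed)  # approximately [2.0, 0.5, 3.0]
```

Because the synthetic data is exactly linear, the fit recovers the generating coefficients. Real sensor data would leave residual error, which is what the accuracy figures later in the deck measure.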
7
Cooling management
- Use predictions from the cooling model
- Reduce variation with a temperature band based on the expected outside temperature
  - Maintain the temperature within the band
  - Middle of the band: forecast outside temperature + offset
- Periodically:
  - Predict environmentals and energy
  - Select the best settings using a utility function
  - Apply cooling settings
Figure (band selection example): temperature over a 24-hour day (hours 0, 6, 12, 18, 24). The band sits at the average outside temperature forecast plus an offset.
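The band placement and the utility-based selection can be sketched as follows. Every specific value here is an assumption:

- the band width and offset values
- the candidate setting names
- the `predict` callback that stands in for the cooling model
- the utility itself, which penalizes each degree outside the band and then prefers lower cooling power

CoolAir's actual utility function is defined in the paper.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Sequence, Tuple


@dataclass(frozen=True)
class Band:
    low: float
    high: float


def select_band(hourly_forecast_c: Sequence[float], offset_c: float, width_c: float) -> Band:
    """Center the band at the average forecast outside temperature plus an offset."""
    middle = sum(hourly_forecast_c) / len(hourly_forecast_c) + offset_c
    return Band(middle - width_c / 2, middle + width_c / 2)


def utility(pred_temp_c: float, pred_power_w: float, band: Band,
            power_weight: float = 0.01) -> float:
    """Hypothetical utility (higher is better).

    Each degree outside the band costs 1 point; each watt of predicted
    cooling power costs `power_weight` points.
    """
    violation = max(0.0, band.low - pred_temp_c) + max(0.0, pred_temp_c - band.high)
    return -violation - power_weight * pred_power_w


def best_setting(settings: Iterable[str],
                 predict: Callable[[str], Tuple[float, float]],
                 band: Band) -> str:
    """Pick the setting whose predicted (temperature, power) maximizes utility."""
    return max(settings, key=lambda s: utility(*predict(s), band))


forecast = [10.0, 12.0, 16.0, 20.0, 18.0, 14.0]  # hypothetical hourly outside temps (°C)
band = select_band(forecast, offset_c=10.0, width_c=4.0)  # Band(low=23.0, high=27.0)
# Hypothetical model predictions: setting -> (inside temp °C, cooling power W)
predictions = {"fan_20": (29.0, 50.0), "fan_60": (25.5, 150.0), "ac": (22.0, 900.0)}
choice = best_setting(predictions, lambda s: predictions[s], band)  # "fan_60"
```

In this example the cheapest setting overshoots the band and the AC undershoots it at high power, so the mid-speed fan setting wins. Running this selection periodically with fresh predictions gives the "predict, select, apply" loop on the slide.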
8
Compute management
- Spatial placement: distribute load to servers
  - Group servers into "pods" of similar behavior to reduce solving and modeling complexity
  - Favor pods with higher heat recirculation
    - This goes against common practice in non-free-cooled DCs
    - Lower-recirculation pods are closer to the cooling, so they experience more temperature variation
- Temporal scheduling: when to execute deferrable loads (see paper)
Figure: front view of Parasol's racks (Rack 1 and Rack 2), showing sensors, servers, and pods.
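Spatial placement that favors high-recirculation pods can be sketched as a greedy fill. The pod names, recirculation scores, and capacities are hypothetical. The sketch shows only the ordering rule stated above (fill pods with higher heat recirculation first), not CoolAir's full placement logic.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Pod:
    name: str
    recirculation: float  # hypothetical score: higher = more hot air recirculates
    capacity: int         # number of servers in the pod


def place_load(pods: List[Pod], servers_needed: int) -> Dict[str, int]:
    """Greedily assign active servers, highest-recirculation pods first.

    Lower-recirculation pods sit closer to the cooling unit and see larger
    temperature swings, so they are used only after the others are full.
    """
    plan: Dict[str, int] = {}
    remaining = servers_needed
    for pod in sorted(pods, key=lambda p: p.recirculation, reverse=True):
        if remaining <= 0:
            break
        take = min(pod.capacity, remaining)
        plan[pod.name] = take
        remaining -= take
    if remaining > 0:
        raise ValueError("not enough servers for the requested load")
    return plan


pods = [Pod("A", 0.2, 8), Pod("B", 0.9, 8), Pod("C", 0.5, 8)]
plan = place_load(pods, 12)  # {"B": 8, "C": 4}
```

Working at pod granularity keeps the search small: with 8 pods instead of 64 servers, the placement decision is a short sort rather than a per-server optimization.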
9
Roadmap
- Motivation and background
- CoolAir: Managing free-cooled datacenters
- CoolAir for Parasol
- Evaluation and general lessons
- Conclusions
10
Case study: Parasol
- Default cooling controller:
  - Outside temperature ≤ 30°C → free cooling with variable fan speed
  - Outside temperature > 30°C → AC cycling with hysteresis
Figures: external view; internal layout (top view) with the free-cooling unit, air conditioner, exhaust, cooling controller, relays, door, partition, air duct, and Racks 1 and 2 between the cold aisle and the hot aisle.
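The default controller's two regimes can be sketched as follows. The 30°C switch point comes from the slide. The fan-speed mapping, the AC setpoint, and the hysteresis width are hypothetical values chosen for illustration, not Parasol's actual settings.

```python
from dataclasses import dataclass


@dataclass
class Action:
    mode: str         # "free" or "ac"
    fan_speed: float  # fraction of max fan speed (free cooling only)
    ac_on: bool


class DefaultController:
    """Sketch of a reactive controller: free cooling up to 30 °C outside, AC above."""

    def __init__(self, switch_c: float = 30.0, ac_setpoint_c: float = 27.0,
                 hysteresis_c: float = 1.0) -> None:
        self.switch_c = switch_c
        self.ac_setpoint_c = ac_setpoint_c  # hypothetical inside target
        self.hysteresis_c = hysteresis_c    # hypothetical half-width of the dead band
        self.ac_on = False

    def step(self, outside_c: float, inside_c: float) -> Action:
        if outside_c <= self.switch_c:
            self.ac_on = False
            # Hypothetical mapping: fans speed up linearly from 20 °C to 30 °C inside.
            speed = min(1.0, max(0.1, (inside_c - 20.0) / 10.0))
            return Action("free", speed, False)
        # AC cycling with hysteresis: on at setpoint + h, off at setpoint - h,
        # otherwise keep the previous state to avoid rapid toggling.
        if inside_c >= self.ac_setpoint_c + self.hysteresis_c:
            self.ac_on = True
        elif inside_c <= self.ac_setpoint_c - self.hysteresis_c:
            self.ac_on = False
        return Action("ac", 0.0, self.ac_on)


ctl = DefaultController()
free = ctl.step(outside_c=25.0, inside_c=26.0)  # free cooling, fans at 60%
on = ctl.step(outside_c=32.0, inside_c=28.5)    # AC turns on (>= 27 + 1)
hold = ctl.step(outside_c=32.0, inside_c=27.0)  # inside the dead band: AC stays on
off = ctl.step(outside_c=32.0, inside_c=25.9)   # AC turns off (<= 27 - 1)
```

The dead band is what makes the AC "cycle". The inside temperature swings between the two thresholds, which is one source of the abrupt changes CoolAir tries to smooth out.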
11
CoolAir for Parasol
- Data collection and model learning
  - Two months of historical sensor data
  - Generated extreme settings to learn faster
  - >90% of predictions with <0.5°C error
- Cooling configurer
  - Interfaces with Parasol's "thermostat"
  - Controls fan speed and AC
- Compute configurer for Hadoop
  - Sends idle worker nodes to sleep while keeping data available
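The compute configurer's constraint (sleep idle workers, but keep every data block on at least one awake node) can be sketched as a greedy check. The block-to-node replica map and the node names are hypothetical. This is not Hadoop's API, and the paper describes the actual mechanism.

```python
from typing import Dict, List, Set


def nodes_to_sleep(idle: List[str], replicas: Dict[str, Set[str]]) -> Set[str]:
    """Greedily put idle nodes to sleep while every block keeps an awake replica.

    `replicas` maps a block id to the set of nodes holding a copy of it.
    Nodes are considered in the order given in `idle`.
    """
    asleep: Set[str] = set()
    for node in idle:
        candidate = asleep | {node}
        # Sleeping `node` is safe only if each block still has a copy on an awake node.
        if all(holders - candidate for holders in replicas.values()):
            asleep = candidate
    return asleep


replicas = {"b1": {"n1", "n2"}, "b2": {"n2", "n3"}, "b3": {"n3", "n4"}}
asleep = nodes_to_sleep(["n1", "n2", "n3"], replicas)  # {"n1", "n3"}
```

Here n2 stays awake because, once n1 sleeps, n2 holds the only remaining copy of b1.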
12
Example of CoolAir on Parasol (figure)
13
Evaluation methodology
- Parasol as the baseline system: 64 Atom servers in 8 pods across 2 racks
- Hadoop workloads: non-deferrable Facebook (see paper for others)
- Real experiments and validated simulations
- Evaluated policies (see paper for others):

  Policy     Temperature      Humidity  Energy  Spatial placement
  Baseline   Reactive <30°C   ✔         ✔       ✘
  CoolAir    Adaptive band    ✔         ✔       High recirculation
14
Baseline vs CoolAir
Figure annotations: warmer locations are less efficient; up to 4°C reduction; up to 60% reduction
15
Multiple geographical locations
Figures: power efficiency (PUE) improvement; reduction in maximum temperature range
- Improves PUE in warmer locations, where PUE is worse
- Reduces variation the most in colder locations, where variation is highest
- Sacrifices PUE slightly in cold locations (PUE improvement of -0.02 to -0.01)
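PUE is the ratio of total facility power to IT equipment power. A "PUE improvement" is therefore the baseline PUE minus the new PUE. A negative value, as in the cold locations here, means the new policy's PUE is slightly worse. The power figures below are made up for illustration.

```python
def pue(it_power_w: float, cooling_power_w: float, other_power_w: float = 0.0) -> float:
    """Power Usage Effectiveness = total facility power / IT power (>= 1.0)."""
    return (it_power_w + cooling_power_w + other_power_w) / it_power_w


def pue_improvement(baseline_pue: float, new_pue: float) -> float:
    """Positive when the new policy lowers PUE; negative when it raises it."""
    return baseline_pue - new_pue


# Hypothetical cold-location numbers: 1 kW of IT load, a little more cooling power.
baseline = pue(it_power_w=1000.0, cooling_power_w=50.0)  # 1.05
coolair = pue(it_power_w=1000.0, cooling_power_w=65.0)   # 1.065
delta = pue_improvement(baseline, coolair)               # about -0.015
```

A 15 W increase in cooling power on a 1 kW IT load shifts PUE by only 0.015. This is the scale of the cold-location sacrifice reported on the slide.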
16
Principles and lessons learned
- Variation management requires fine-grain cooling and load control
- Management challenges depend on the climate
  - Warm: managing absolute temperature costs more than managing variation
  - Cold: managing temperature variation is more critical and more successful
- The temperature band and spatial placement are key; temporal scheduling is not
- Other lessons in the paper
17
Conclusions
- CoolAir successfully manages:
  - Absolute temperature and temperature variation
  - Relative humidity
  - Energy
- CoolAir broadens the set of areas where free cooling can be used
- The principles should apply to larger datacenters
19
Motivation
- Typical cooling in datacenters
  - Chillers, cooling tower, air handlers
  - Very energy hungry
- Bring cool air from outside: free cooling
  - Reduces cooling energy
  - Typically used in cooler and drier climates
- Warmer locations: hybrid
  - Free cooling when external temperature and humidity are suitable
Photo: Microsoft DC in Chicago
20
Validation of models and simulations
- ~80% of predictions with <0.5°C error
- Figure: real behavior vs. simulation; the simulation closely tracks real behavior
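An accuracy figure such as "~80% with <0.5°C errors" is the fraction of predictions whose absolute error is below 0.5°C. The function below sketches that metric. The readings in the example are made up.

```python
from typing import Sequence


def fraction_within(predicted: Sequence[float], measured: Sequence[float],
                    tol_c: float = 0.5) -> float:
    """Fraction of samples whose absolute prediction error is below tol_c."""
    if len(predicted) != len(measured) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    hits = sum(1 for p, m in zip(predicted, measured) if abs(p - m) < tol_c)
    return hits / len(predicted)


# Hypothetical readings (°C): errors are 0.1, 0.3, 0.7, 0.2, 0.1
predicted = [25.1, 26.0, 27.4, 24.0, 30.0]
measured = [25.0, 26.3, 26.7, 24.2, 29.9]
accuracy = fraction_within(predicted, measured)  # 0.8
```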
21
Example of CoolAir on Parasol (figure)
22
Principles and lessons learned (long version)
- Absolute temperatures and variations are high in many locations
- Variability management requires fine-grain cooling and load control
- Management challenges depend on the climate
  - Warm: managing absolute temperature costs more than managing variation
  - Cold: managing temperature variation is more critical and more successful
- The temperature band and spatial placement are key; temporal scheduling is not
- Management is "easier" when allowed temperatures can be higher
- Weather forecast inaccuracy is not a problem (thanks to the temperature band)
- The energy cost of CoolAir is low even in hot climates
23
Why simulation?
- Limitations of a real system:
  - External conditions change, so runs cannot be compared
  - Parasol's cooling changes are too abrupt (simulation can model a variable-speed AC)
- Simulation yields results for a whole year at multiple locations around the world
- Average simulation errors < 6%
24
Absolute temperature violation
Figure legend: no violations, small violations, large violations; CoolAir marked
25
Maximum daily temperature variation
Figure: only the maximum is shown; CoolAir reduces variation
26
Power efficiency
Figure: PUE (Power Usage Effectiveness); higher values mean lower efficiency
- Warmer locations are less efficient
- CoolAir: small increase over the Energy version
27
Evaluated policies
Policies isolate the impact of CoolAir characteristics:

  Policy       Temperature       Humidity  Energy  Spatial placement
  Baseline     Reactive <30°C    ✔         ✔       ✘
  Temperature  Predictive <30°C  ✔         ✔       Low recirculation
  Variation    Adaptive band     ✔         ✘       High recirculation
  Energy       Predictive <30°C  ✔         ✔       Low recirculation
  CoolAir      Adaptive band     ✔         ✔       High recirculation