DynSleep: Fine-grained Power Management for a Latency-Critical Data Center Application. Chih-Hsun Chou, Daniel Wong, Laxmi N. Bhuyan
Outline. Background & Motivation (Data Center Workload Characteristics; Existing Approaches). DynSleep. Prototype with Memcached. Experimental Evaluation.
Data Center Latency-Critical Workload Characteristics. Server utilization: lightly loaded, with short-term variability. Request processing: ON/OFF execution pattern, non-deterministic. Result: poor energy efficiency at low server utilization.
Power Saving Opportunities. Target QoS is defined at peak load. Lightly utilized servers therefore have latency slack, which can be exploited for power saving. [Figure: tail latency under light load compared with the target tail latency; the gap between them is the latency slack.]
Existing Approaches.
DVFS: reduces the processing rate. Limited room for down-scaling. Limited power saving. Per-core control is not common.
Sleep states: limited by the length of idle periods.

DVFS operating points:
Frequency (GHz)    2.7   2.4   2.1   1.8   1.5   1.2
Voltage (V)        0.99  0.96  0.94  0.92  0.90  0.88
Active Power (W)   3.42  2.93  2.49  2.05  1.68  1.31
From 2.7 GHz to 1.2 GHz: 56% frequency reduction, 13% voltage reduction, 61% power reduction.

Sleep states:
State  Transition time  Target residency  Power
C0     N/A              -                 3~3.5 W
C1     1 μs             -                 1.2 W
C3     59 μs            156 μs            0.13 W
C6     89 μs            300 μs            0 W
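To make "limited by the length of idle periods" concrete, here is a minimal sketch of how a sleep state could be chosen from an idle-period length. The C3 and C6 numbers come from the sleep-state table on this slide; the C1 target residency of 1 μs is an assumption, and `SleepState` and `deepest_state` are hypothetical names, not part of any real governor.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SleepState:
    name: str
    target_residency_us: float  # minimum idle time for the state to pay off
    power_w: float


# Ordered from shallow to deep.
# C3/C6 values are from the slide; the C1 residency is an ASSUMPTION.
STATES = [
    SleepState("C1", 1.0, 1.2),
    SleepState("C3", 156.0, 0.13),
    SleepState("C6", 300.0, 0.0),
]


def deepest_state(idle_us):
    """Return the deepest state whose target residency fits in idle_us, or None."""
    best = None
    for state in STATES:
        if idle_us >= state.target_residency_us:
            best = state
    return best


if __name__ == "__main__":
    for idle in (100, 200, 300):
        print(idle, deepest_state(idle).name)
```

Under these assumptions, three scattered 100 μs idle periods each qualify only for C1, while a single 300 μs idle period qualifies for C6.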
Observations. Short idle periods cause high idle power. Traffic is variable. Fine-grained control is needed over both the time and space domains. Our Solution: DynSleep.
DynSleep: Overview. Use per-core sleep states (space domain). Postpone request service, transforming scattered idle periods into one longer period that suits a deep sleep state (reduces idle power). Dynamically determine the core wake-up time so that the target tail latency constraint is satisfied (time domain).
DynSleep: Example. [Figure: timeline starting at t=0 with three snapshots, at t=A1, t=A2, and t=A3, as requests R1, R2, and R3 arrive. Each snapshot shows the Target Tail Latency window; markers W1 and W3 appear on the time axis.]
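The postponement idea in this example can be sketched as a small calculation. This is an illustration under stated assumptions, not the authors' algorithm: requests are served in FIFO order, every request has the same estimated service time, and the core needs a fixed wake-up latency. The function name and all parameters are hypothetical.

```python
def latest_wakeup_time(arrivals_us, target_us, service_us, wake_latency_us=0.0):
    """Latest time to start waking the core so every queued request
    still finishes within target_us of its own arrival.

    arrivals_us: arrival times of the queued requests, in FIFO order.
    service_us:  ASSUMED fixed service time per request.
    """
    wakeup = float("inf")
    backlog = 0.0  # service time of this request plus everything ahead of it
    for arrival in arrivals_us:
        backlog += service_us
        deadline = arrival + target_us
        # Service must begin by (deadline - backlog) for this request to make it.
        wakeup = min(wakeup, deadline - backlog)
    return wakeup - wake_latency_us


if __name__ == "__main__":
    # One request at t=0, 686 us target, 50 us service, 89 us wake-up latency.
    print(latest_wakeup_time([0], 686, 50, 89))
    # A second arrival can pull the wake-up time earlier.
    print(latest_wakeup_time([0, 10], 686, 200))
```

Recomputing this value at each arrival is one way to keep the sleep period as long as the latency constraint allows; a real implementation would also need to clamp the result to the current time.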
DynSleep: Power consumption behavior. [Figure: power state over time for Baseline and DynSleep; states shown are Active, Shallow sleep, and Deep sleep.]
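Case Study: Memcached. Clients send requests to the Memcached server. In each worker thread, libevent monitors network sockets through epoll for request arrivals. Threads and requests are independent. [Figure: clients exchange requests (req) and results with the worker threads; each worker thread runs Libevent on a file descriptor (fd), followed by Read and Parse Data, Request Processing, and Send Response.]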
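The worker-thread flow on this slide (wait for a readable socket, read and parse, process, respond) can be sketched with Python's standard `selectors` module, which uses epoll on Linux. This is not Memcached's code: the one-line `get <key>` parser and the reply format are simplified stand-ins, and `parse_request` and `serve_ready` are hypothetical names.

```python
import selectors
import socket


def parse_request(data):
    """Parse a simplified 'get <key>' line; return the key or None."""
    parts = data.decode().split()
    if len(parts) == 2 and parts[0] == "get":
        return parts[1]
    return None


def serve_ready(sel, store, timeout=1.0):
    """Handle every socket that is readable right now; return requests handled."""
    handled = 0
    for key, _mask in sel.select(timeout):
        conn = key.fileobj
        data = conn.recv(1024)              # read
        if not data:                        # peer closed the connection
            sel.unregister(conn)
            conn.close()
            continue
        name = parse_request(data)          # parse
        if name is None:
            conn.sendall(b"ERROR\r\n")
        else:                               # process and send response
            conn.sendall(store.get(name, b"NOT_FOUND") + b"\r\n")
        handled += 1
    return handled


if __name__ == "__main__":
    client, server = socket.socketpair()
    sel = selectors.DefaultSelector()
    sel.register(server, selectors.EVENT_READ)
    client.sendall(b"get foo\r\n")
    print(serve_ready(sel, {"foo": b"bar"}), client.recv(1024))
    sel.close()
    client.close()
    server.close()
```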
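Memcached with DynSleep. Each worker is split into two separate threads: a libevent thread and a request processing thread. A core is woken up by a wakeup signal. [Figure: the Libevent Thread (Libevent, Read and Parse Data) and the Request Processing Thread (Request Processing, Send Response) are linked by Thread Communication over fd1 and fd2; the DynSleep Manager and DynSleep Calculator register or update a timer and issue the Sleep signal and Wakeup signal; clients exchange req and result with the server.]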
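The two-thread split with a timer-driven wakeup can be sketched with ordinary Python threads. This is only an analogy: a blocked thread stands in for a sleeping core, `threading.Event` stands in for the wakeup signal, and a fixed `delay_s` replaces the computed wake-up time. All names here are hypothetical.

```python
import queue
import threading


def run_demo(requests, delay_s=0.05):
    pending = queue.Queue()
    wake = threading.Event()   # stands in for the wakeup signal
    results = []
    timer = None

    def processing_thread():
        served = 0
        while served < len(requests):
            wake.wait()        # "sleep" until the timer fires
            wake.clear()
            while True:        # drain everything that queued up meanwhile
                try:
                    req = pending.get_nowait()
                except queue.Empty:
                    break
                results.append(req.upper())  # placeholder for real processing
                served += 1

    def libevent_thread():
        nonlocal timer
        for req in requests:
            pending.put(req)
            # Register or update the wakeup timer on every arrival.
            if timer is not None:
                timer.cancel()
            timer = threading.Timer(delay_s, wake.set)
            timer.start()

    worker = threading.Thread(target=processing_thread)
    listener = threading.Thread(target=libevent_thread)
    worker.start()
    listener.start()
    listener.join()
    worker.join()
    return results


if __name__ == "__main__":
    print(run_demo(["get a", "get b"]))
```

The processing thread handles the whole backlog in one active burst after a single wakeup, which is the batching effect the design relies on.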
Evaluation: Experiment Setup. A client server and a request processing server connected over 10G Ethernet. Intel Xeon E5-2697 v2 12-core processor, which supports only per-core DFS. On-chip energy sensors with a 1 kHz sampling rate.
Evaluation: Power Saving. At low to medium load, larger latency slack leads to high power saving for DynSleep. At high load, DynSleep's power saving is comparable to that of the DVFS scheme. DynSleep significantly outperforms the per-core DFS scheme.
Evaluation: Latency Distribution (load 0.3). 95th-percentile latency: Baseline 187 µs, DFS 448 µs, DynSleep 665 µs; target 686 µs. The baseline has about 500 µs of latency slack. A large gap remains under the DFS (DVFS) scheme because of its limited voltage/frequency down-scaling. DynSleep effectively closes the gap by postponing request processing.
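The slack figures follow directly from the numbers on this slide; a short worked calculation, with variable names chosen here for illustration:

```python
TARGET_US = 686  # target 95th-percentile latency from the slide

# Measured 95th-percentile latencies at load 0.3, from the slide.
P95_US = {"Baseline": 187, "DFS": 448, "DynSleep": 665}

# Remaining latency slack = target minus measured tail latency.
SLACK_US = {name: TARGET_US - p95 for name, p95 in P95_US.items()}

if __name__ == "__main__":
    for name, slack in SLACK_US.items():
        print(f"{name}: {slack} us of slack left")
```

The baseline leaves 499 µs unused, DFS leaves 238 µs, and DynSleep leaves 21 µs.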
Evaluation: Load Changes. At low load, DFS cannot fully utilize the latency slack. At high load, DFS lacks responsiveness and frequently violates the constraint. DynSleep responds instantaneously to load changes because of its request-level updates.
Conclusion. A major source of energy inefficiency is idle power, caused by non-deterministic and short idle periods. We propose DynSleep, which reshapes the idle period pattern, utilizes deep sleep states, and dynamically wakes cores up to meet the strict QoS constraint. Our Memcached prototype demonstrates up to 65% power saving.