TimeThief: Leveraging Network Variability to Save Datacenter Energy in Online Data-Intensive Applications
Balajee Vamanan (Purdue / UIC), Hamza Bin Sohail (Purdue), Jahangir Hasan (Google), T. N. Vijaykumar (Purdue)
Big data and OLDI applications
- Massive growth of big data: unstructured data doubles every two years, and most of it is Internet data
- This data is consumed by online, data-intensive (OLDI) applications; datacenters are the critical computing platforms for OLDIs
- OLDIs are important: many popular Internet applications are OLDIs (e.g., Google Search, Facebook, Twitter)
- They organize and provide access to Internet data and are vital to many big companies
Challenges to energy management
Traditional energy management does not work for OLDIs:
1. OLDIs are interactive with strict service-level agreements (SLAs), so work cannot be batched
2. Response times and inter-arrival times are short (e.g., 200 ms and ≈1 ms for Web Search), so low-power modes (e.g., p-states) are not applicable
3. OLDIs distribute data over 1000s of servers and each user query searches all servers, so the workload cannot be consolidated onto fewer servers
Takeaway: energy management for OLDIs can only slow down servers without violating SLAs
Previous work: Pegasus [ISCA'14]
- Slows down responses at low loads; response time is a more accurate signal than CPU utilization
- Shifts the latency distribution at lower loads using datacenter-wide metrics
- Achieves load-proportional energy: saves energy at lower loads, but not much at higher loads, and 0% at peak
- Datacenters operate at moderate-to-peak load during the day (diurnal pattern), so savings are desired at high loads
Takeaway: Pegasus saves substantial energy at low loads, but the savings are limited at high loads
TimeThief: contributions
1. Exploits slack from the network to slow down compute (e.g., 80% of flows take 1/5th of their budget even at peak load)
2. Reshapes the response time distribution at all loads by slowing the sub-critical leaf servers for each query
3. Leverages network signals to determine slack; does not need fine-grained clock synchronization
4. Employs Earliest Deadline First (EDF) scheduling to decouple critical queries from slowed-down, previously queued sub-critical queries
Takeaway: TimeThief identifies sub-critical queries and exploits their slack to save 12% energy even at peak load
Talk Organization
- Introduction: motivation, previous work, our contributions
- Background: OLDI architecture, network variability
- TimeThief: key ideas, slack calculation & application, EDF
- Methodology
- Results
Background: OLDI architecture
- OLDI = OnLine Data-Intensive applications (e.g., Web Search): (1) online: deadline-bound; (2) data-intensive: large, highly distributed datasets
- Partition-aggregate (tree) structure with a request-compute-reply pattern: the root sends requests down to the leaves, each leaf computes over its data partition, and the leaves reply with their data
- The root waits only until the deadline; dropped (late) replies affect answer quality (e.g., SLA of 1% misses)
(Figure: partition-aggregate tree with root, aggregators Agg 1..m, and leaves Leaf 1..n)
Takeaway: this structure leads to the tail latency problem
OLDI tail latencies and SLA budgets
- Tail latency problem: the overall response time is affected by the slowest leaves
- Deadline budgets are based on the 99th-99.9th percentiles of the individual leaves' replies for one query
- Component budgets: to optimize network and compute separately, the deadline budget is split among the nodes (compute) and the network (e.g., total = 200 ms, flow deadlines = 10-20 ms)
(Figure: example breakdown of the deadline budget across the tree)
OLDI traffic characteristics: incast
- Many children respond at around the same time, causing incast, which causes packet drops and long tails
- OLDI trees have large fan-out, which is hard to absorb in buffers:
  1. Datacenter switches use shallow buffers for cost
  2. Query multiplexing: the incasts of many in-flight queries collide
  3. Inevitable background traffic worsens incast
- Requests (parent → child) are also affected by reply incast (child → parent) due to root randomization (details in the paper)
Network variability and incast
- Network variability is significant even with root randomization
(Figure: network delay distribution, showing TCP timeouts and in-network queuing)
Takeaway: incast leads to long tails; network budgets based on tail latency are about 6-7X the median network delay
Talk Organization
- Introduction: motivation, previous work, our contributions
- Background: OLDI architecture, network variability
- TimeThief: key ideas, slack calculation & application, EDF
- Methodology
- Results
TimeThief: key ideas
- TimeThief exploits per-query, per-leaf slack (e.g., 80% of requests take 1/5th of their budget)
- Three kinds of slack: request (before compute), compute (future extension), and reply (after compute); TimeThief exploits request slack
- A mechanism to determine request slack, and machinery to exploit that slack effectively
- EDF scheduling shields tail (critical) requests; Intel's Running Average Power Limit (RAPL) sets the power state (see the sketch below)
Takeaway: TimeThief reshapes the response time distribution using per-query slack; EDF pulls the tail in
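As an illustration of the last bullet, the sketch below applies a slowdown by lowering the package power cap through the Linux powercap (intel_rapl) sysfs interface. The sysfs path, the linear mapping from slowdown factor to power limit, and all names are assumptions for illustration only; the talk states just that RAPL is used to set the power state.

```python
# Hypothetical sketch: apply a slowdown by capping package power via the
# Linux powercap (intel_rapl) sysfs interface. Paths vary across machines;
# the linear slowdown-to-power mapping is an assumption, not from the talk.
RAPL_PKG = "/sys/class/powercap/intel-rapl:0"  # package power domain


def set_power_cap(slowdown: float, max_power_uw: int, min_power_uw: int) -> None:
    """Map a slowdown factor in [0, 1] to a package power limit (microwatts).

    slowdown = 0 keeps full power; slowdown = 1 drops to the lowest cap.
    """
    cap_uw = int(max_power_uw - slowdown * (max_power_uw - min_power_uw))
    with open(f"{RAPL_PKG}/constraint_0_power_limit_uw", "w") as f:
        f.write(str(cap_uw))
```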
Determining request slack
- Could we compute slack from timestamps at the parent and the leaf? No: clock skew is on the order of the slack itself
- Instead, estimate slack from network signals:
  1. Explicit Congestion Notification (ECN): a switch marks packets if its buffer occupancy exceeds a threshold
  2. TCP timeouts: the sender marks retransmitted packets (lost earlier)
- If a leaf sees neither ECN marks nor timeouts: request_slack = network_budget - median_latency; otherwise: request_slack = 0 (sketched below)
- How much of this slack can be used to slow down compute? That depends on queuing (load); see the next slide
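A minimal sketch of that per-request estimate, assuming the leaf can tell whether the request's packets carried ECN marks or were retransmitted; the function and parameter names are illustrative.

```python
# Sketch of the request-slack rule above: no congestion signal means the
# request likely took about the median network delay, so the rest of the
# network budget is available as slack; any ECN mark or timeout means
# assume zero slack and run at full speed.
def request_slack(saw_ecn_mark: bool, saw_timeout: bool,
                  network_budget_ms: float, median_net_latency_ms: float) -> float:
    if saw_ecn_mark or saw_timeout:
        return 0.0
    return network_budget_ms - median_net_latency_ms
```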
Slowing down based on request slack
Two questions:
1. How much should the current request be slowed down? We do not know the current request's compute needs ahead of time, but the compute budget already accounts for the tail
2. How do we account for queueing (load)? Attenuate the slack: slowdown = request_slack * scale / compute_budget, where scale depends on load
A feedback controller dynamically determines scale: it monitors the response times of completed queries every 5 seconds (more details in the paper); scale += 0.05 if there is 5% room, scale -= 0.05 otherwise (sketched below)
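The sketch below puts the slowdown formula and the scale controller together; the interpretation of "5% room" as headroom under the SLA budget and the variable names are assumptions about details the slide leaves open.

```python
# Sketch of slowdown = request_slack * scale / compute_budget with the
# 5-second feedback controller that nudges scale by +/-0.05.
class ScaleController:
    def __init__(self, sla_budget_ms: float, step: float = 0.05, headroom: float = 0.05):
        self.scale = 0.0
        self.step = step
        self.headroom = headroom          # "5% room" threshold (assumed meaning)
        self.sla_budget_ms = sla_budget_ms

    def update(self, tail_response_ms: float) -> None:
        # Called every 5 s with the measured tail response time of completed queries.
        if tail_response_ms < (1.0 - self.headroom) * self.sla_budget_ms:
            self.scale = min(1.0, self.scale + self.step)   # room left: slow down more
        else:
            self.scale = max(0.0, self.scale - self.step)   # too close to SLA: back off


def slowdown(request_slack_ms: float, compute_budget_ms: float, scale: float) -> float:
    # Per-request slowdown factor: slack attenuated by the load-dependent scale.
    return request_slack_ms * scale / compute_budget_ms
```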
Earliest-deadline-first scheduling
- A sub-critical request (request_slack ≠ 0) could hurt a critical request (request_slack = 0) if the critical request is queued behind the sub-critical one
- EDF decouples critical and sub-critical requests; we compute the deadline at the leaf node as: deadline = compute_budget + request_slack
- Note: determining slack is what enables EDF (more details in the paper); a sketch follows
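A minimal sketch of EDF at a leaf using the deadline rule above and a heap keyed by deadline; the queue structure and tie-breaking are illustrative, not the paper's implementation.

```python
# Requests with zero slack (critical) get the earliest deadlines, so they are
# served before slowed-down, previously queued sub-critical requests.
import heapq
import itertools


class EdfQueue:
    def __init__(self, compute_budget_ms: float):
        self.compute_budget_ms = compute_budget_ms
        self._heap = []
        self._seq = itertools.count()   # tie-breaker for equal deadlines

    def enqueue(self, request, request_slack_ms: float) -> None:
        deadline = self.compute_budget_ms + request_slack_ms
        heapq.heappush(self._heap, (deadline, next(self._seq), request))

    def dequeue(self):
        deadline, _, request = heapq.heappop(self._heap)   # earliest deadline first
        return request
```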
Talk Organization
- Introduction: motivation, previous work, our contributions
- Background: OLDI architecture, network variability
- TimeThief: key ideas, slack calculation & application, EDF
- Methodology
- Results
Methodology
1. Compute (service time, power): real measurements, which are feasible at a small scale
2. Network: tail effects appear only in large clusters, so we use ns-3
- Workload: Web Search (Search) from CloudSuite 2.0, with a search index built from Wikipedia; 3000 queries/s at peak load; 90% load with 100 threads per leaf (4 sockets * 12 cores * 2-way SMT ≈ 100)
- Deadline budget: 125 ms overall; 25 ms network (request/reply); 75 ms compute; these budgets are in line with other papers
- SLA of 1% missed deadlines
Methodology (cont.)
- Service time and power measurements: service time is measured at low load (no queueing) at the leaf server (matches other papers); power is measured with RAPL; measurements are taken at one leaf server, and leaf servers are i.i.d.
- Network: fat-tree topology; 64 racks, 16 servers/rack; 10 Gbps links; 4 MB shared packet buffers; 200 μs unloaded round-trip time
Service time and power (figure)
Response time distribution (figure)
Takeaway: TimeThief reshapes the distribution at all loads and saves 12% energy at peak load
Power state distribution (figure)
Takeaway: even at peak load, TimeThief slows down about 80% of the time using per-query slack
Conclusion
- TimeThief exploits per-query slack: request slack now, compute slack as future work; reply slack is not easily predictable
- Reshapes the response time distribution at all loads and saves 12% energy at peak load
- Leverages network signals to estimate slack, without requiring fine-grained clock synchronization
- Employs EDF to decouple critical requests from sub-critical requests
Takeaway: TimeThief converts OLDIs' performance disadvantage of the latency tail into an energy advantage!
Thank you