
1 TimeThief: Leveraging Network Variability to Save Datacenter Energy in On-line Data-Intensive Applications
Balajee Vamanan (Purdue → UIC), Hamza Bin Sohail (Purdue), Jahangir Hasan (Google), T. N. Vijaykumar (Purdue)

2 Big data and OLDI applications
- Massive growth of big data
  - Unstructured data doubles every two years
  - Mostly Internet data
  - Consumed by online, data-intensive (OLDI) applications
  - Datacenters are critical computing platforms for OLDIs
- OLDIs are important
  - Many popular Internet applications are OLDIs (e.g., Google Search, Facebook, Twitter)
  - Organize and provide access to Internet data
  - Vital to many big companies

3 Challenges to energy management
Traditional energy management does not work:
1. OLDIs are interactive → strict service-level agreements (SLAs) → cannot batch
2. Short response times and inter-arrival times (e.g., 200 ms and ≈1 ms for Web Search) → low-power modes (e.g., P-states) not applicable
3. OLDIs distribute data over 1000s of servers and each user query searches all servers → cannot consolidate work onto fewer servers
Energy management for OLDIs can only slow down servers without violating SLAs

4 Previous work
- Pegasus [ISCA'14] slows down responses at low loads
  - Response time is a more accurate signal than CPU utilization
  - Shifts the latency distribution at lower loads (uses datacenter-wide metrics)
- Achieves load-proportional energy
  - Saves energy at lower loads
  - Does not save much at higher loads; saves 0% at peak
- Datacenters operate at moderate-to-peak load during the day (diurnal pattern) → savings desired at high loads
Pegasus saves substantial energy at low loads but the savings are limited at high loads

5 TimeThief: contributions
1. Exploits slack from the network to slow down compute
   - e.g., 80% of flows take 1/5th of their budget even at peak load
2. Reshapes the response time distribution (at all loads)
   - Slows sub-critical leaf servers for each query
3. Leverages network signals to determine slack
   - Does not need fine-grained clock synchronization
4. Employs Earliest Deadline First (EDF) scheduling
   - Decouples critical queries from slowed-down, previously queued sub-critical queries
TimeThief identifies sub-critical queries and exploits their slack to save 12% energy even at peak load

6 Talk organization
- Introduction: motivation, previous work, our contributions
- Background: OLDI architecture, network variability
- TimeThief: key ideas, slack calculation & application, EDF
- Methodology
- Results

7 Background: OLDI architecture
OLDI = OnLine Data-Intensive applications, e.g., Web Search
1. Online: deadline-bound
2. Data intensive: large datasets → highly distributed
- Partition-aggregate (tree): request-compute-reply (sketched below)
  - Request: root → leaf
  - Compute: leaf node
  - Reply: leaf → root (data)
- The root waits until the deadline; dropped (late) replies affect quality
  - e.g., SLA of 1% misses
[Figure: partition-aggregate tree — Root → Agg 1..m → Leaf 1..n; requests flow down, responses flow up; illustrates the tail latency problem]
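To make the request-compute-reply flow concrete, here is a minimal, hypothetical sketch of one partition-aggregate level in Python (a thread pool stands in for leaf servers; every name and number except the 200 ms deadline and the drop-late-replies behavior is illustrative, not taken from the paper):

    import concurrent.futures
    import random
    import time

    DEADLINE_MS = 200  # overall per-query SLA budget (from the talk)

    def leaf_compute(query):
        # Stand-in for a leaf searching its data partition; service time varies,
        # and an occasional slow leaf models the latency tail.
        time.sleep(random.uniform(0.005, 0.250))
        return f"partial result for {query!r}"

    def fan_out(query, num_leaves=16):
        # The root sends the request to every leaf, waits only until the
        # deadline, and aggregates whatever arrived; late replies are dropped,
        # degrading answer quality instead of delaying the response.
        pool = concurrent.futures.ThreadPoolExecutor(max_workers=num_leaves)
        futures = [pool.submit(leaf_compute, query) for _ in range(num_leaves)]
        done, late = concurrent.futures.wait(futures, timeout=DEADLINE_MS / 1000.0)
        pool.shutdown(wait=False)
        return [f.result() for f in done], len(late)

    replies, dropped = fan_out("web search")
    print(f"{len(replies)} replies on time, {dropped} dropped")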

8 OLDI tail latencies and SLA budgets
- Tail latency problem: the overall response time is dictated by the slower leaves
- Deadline budgets are based on the 99th-99.9th percentiles of individual leaves' replies for one query
- Component budgets: to optimize network and compute separately, the deadline budget is split among the nodes (compute) and the network
  - e.g., total = 200 ms → per-flow deadlines of 10-20 ms (worked example below)
[Figure: deadline budget split illustration (10, 20, 30, 80, 200 ms)]
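As a worked example of the component-budget split, here is a hypothetical breakdown consistent with the 200 ms total and the 10-20 ms flow deadlines quoted above; the individual component values are invented for illustration, not taken from the paper:

    TOTAL_BUDGET_MS = 200

    budgets_ms = {
        "request flows (root -> leaves)": 20,   # per-flow deadlines of 10-20 ms
        "leaf compute":                   150,
        "reply flows (leaves -> root)":   20,
        "aggregation at the root":        10,
    }

    # The component budgets must add up to the end-to-end SLA budget.
    assert sum(budgets_ms.values()) == TOTAL_BUDGET_MS
    for component, ms in budgets_ms.items():
        print(f"{component:32s} {ms:4d} ms")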

9 OLDI traffic characteristics: incast
- Many children respond at around the same time, causing incast
  - Causes packet drops and long tails
- OLDI trees have large degree → hard to absorb in buffers
  1. Datacenter switches use shallow buffers for cost
  2. Query multiplexing: many in-flight queries' incasts collide
  3. Inevitable background traffic worsens incast
- Requests (parent → child) are also affected by reply incast (child → parent) due to root randomization (details in the paper)

10 Network variability and incast
- Network variability is significant even with root randomization
- Incast leads to long tails; network budgets based on tail latency are about 6-7X the median network delay
[Figure: network delay breakdown — queuing in the network and TCP timeouts]

11 Talk organization
- Introduction: motivation, previous work, our contributions
- Background: OLDI architecture, network variability
- TimeThief: key ideas, slack calculation & application, EDF
- Methodology
- Results

12 TimeThief: key ideas
- TimeThief exploits per-query, per-leaf slack
  - e.g., 80% of requests take 1/5th of their budget
- Three kinds of slack: request (before compute), compute (future extension), reply (after compute)
  - TimeThief exploits request slack
- Mechanism to determine request slack
- Machinery to effectively exploit the slack
  - EDF scheduling to shield tail (critical) requests
  - Intel's Running Average Power Limit (RAPL) to set the power state (see the sketch below)
TimeThief reshapes the response time distribution using per-query slack; EDF pulls the tail in
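For the RAPL mechanism, the sketch below shows one way a leaf server could lower its package power limit through Linux's powercap sysfs interface; whether TimeThief drives RAPL through this interface or through the MSRs directly is an assumption here, and writing these files requires root:

    # Hypothetical helper: cap package 0 of the local machine via intel-rapl.
    RAPL = "/sys/class/powercap/intel-rapl:0"  # standard powercap path for package 0

    def set_power_limit(watts, window_us=1000):
        # Power limits are expressed in microwatts, averaging windows in microseconds.
        with open(f"{RAPL}/constraint_0_power_limit_uw", "w") as f:
            f.write(str(int(watts * 1_000_000)))
        with open(f"{RAPL}/constraint_0_time_window_us", "w") as f:
            f.write(str(window_us))

    # e.g., slow the socket down while sub-critical requests dominate the queue:
    # set_power_limit(60)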

13 Determining request slack
- Determine slack from timestamps at the parent and the leaf? No: clock skew ≈ slack
- Instead, estimate slack from network signals:
  1. Explicit Congestion Notification (ECN): switches mark packets if buffer occupancy exceeds a threshold
  2. TCP timeouts: the sender marks retransmitted packets (lost earlier)
- If a leaf sees neither ECN marks nor timeouts on a request (code sketch below):
    request_slack = network_budget - median_latency
  else:
    request_slack = 0
- How much of the slack can be used to slow down compute? Depends on queuing (load)
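The slack rule above translates almost directly into code. In this sketch the congestion signals are assumed to arrive as per-request booleans; how the leaf's TCP stack actually surfaces ECN echoes and retransmissions is implementation-specific:

    def request_slack(network_budget_ms, median_latency_ms, saw_ecn_mark, saw_timeout):
        # A congestion signal on the request flow means the request may have
        # used up its network budget, so conservatively assume no slack.
        if saw_ecn_mark or saw_timeout:
            return 0.0
        # Otherwise the request very likely finished near the median, so the
        # unused part of the network budget becomes slack for compute.
        return max(0.0, network_budget_ms - median_latency_ms)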

14 Slowing down based on request slack
Two questions:
1. How much to slow down the current request?
   - We don't know the current request's needs ahead of time, but the compute budget already accounts for the tail
2. How to account for queueing (load)? Attenuate the slack:
     slowdown = request_slack * scale / compute_budget
   where scale depends on load
- A feedback controller dynamically determines scale (sketch below)
  - The controller monitors the response times of completed queries every 5 seconds (more details in the paper):
      scale += 0.05 if there is 5% room
      scale -= 0.05 otherwise
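Putting the formula and the controller together gives roughly the following sketch; the starting value of scale, its bounds, and the interpretation of "5% room" as the tail response time sitting at least 5% below the deadline are simplifying assumptions:

    class ScaleController:
        # Feedback controller that attenuates how much of the slack is used.
        def __init__(self):
            self.scale = 0.0

        def update(self, tail_response_ms, deadline_ms):
            # Invoked every 5 seconds with the response times of completed queries.
            if tail_response_ms < 0.95 * deadline_ms:     # at least 5% room
                self.scale = min(1.0, self.scale + 0.05)  # slow down a bit more
            else:
                self.scale = max(0.0, self.scale - 0.05)  # back off

    def slowdown(request_slack_ms, compute_budget_ms, scale):
        # Fraction by which this request's compute may be stretched.
        return request_slack_ms * scale / compute_budget_ms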

15 Earliest-deadline-first scheduling
- A sub-critical request (request_slack ≠ 0) could hurt a critical request (request_slack = 0) if the critical request is queued behind the sub-critical one
- EDF decouples critical and sub-critical requests
- We compute the deadline at the leaf node as (sketch below):
    deadline = compute_budget + request_slack
- Note: determining slack is what enables EDF (more details in the paper)
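A leaf can realize this with an ordinary priority queue keyed on the per-request deadline. The sketch below is illustrative only; the paper's integration with the leaf server is more involved:

    import heapq
    import itertools
    import time

    class EdfQueue:
        def __init__(self):
            self._heap = []
            self._tie = itertools.count()  # stable ordering for equal deadlines

        def push(self, request, compute_budget_ms, request_slack_ms):
            # A critical request (slack == 0) gets the earliest deadline and so
            # cannot get stuck behind already-queued sub-critical requests.
            deadline = time.monotonic() * 1000 + compute_budget_ms + request_slack_ms
            heapq.heappush(self._heap, (deadline, next(self._tie), request))

        def pop(self):
            # Returns the queued request with the earliest absolute deadline.
            deadline, _, request = heapq.heappop(self._heap)
            return request, deadline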

16 Talk organization
- Introduction: motivation, previous work, our contributions
- Background: OLDI architecture, network variability
- TimeThief: key ideas, slack calculation & application, EDF
- Methodology
- Results

17 Methodology
1. Compute (service time, power): real measurements → feasible at a small scale
2. Network: tail effects appear only in large clusters → simulated with ns-3
- Workload: Web Search (Search) from CloudSuite 2.0
  - Search index built from Wikipedia
  - 3000 queries/s at peak load
  - 90% load with 100 threads per leaf (4 sockets * 12 cores * 2-way SMT ≈ 100)
- Deadline budget: 125 ms overall → network (request/reply): 25 ms each, compute: 75 ms
  - The budgets are in line with other papers
- SLA of 1% missed deadlines

18 Methodology (cont.)
- Service time and power measurements
  - Service time measured at low load (no queueing) at the leaf server (matches other papers)
  - Power measured with RAPL
  - Measurements taken at one leaf server; leaf servers are i.i.d.
- Network
  - Fat-tree topology: 64 racks, 16 servers/rack
  - 10 Gbps links, 4 MB shared packet buffers
  - 200 μs round-trip time (unloaded)

19 Service time and power

20 Response time distribution
TimeThief reshapes the distribution at all loads; saves 12% energy at the peak load

21 Power state distribution
Even at peak load, TimeThief slows down about 80% of the time using per-query slack

22 Conclusion
- TimeThief exploits per-query slack
  - Request slack
  - Compute slack (future work)
  - Reply slack: not easily predictable
- Reshapes the response time distribution at all loads; saves 12% energy at peak load
- Leverages network signals to estimate slack; does not require fine-grained clock synchronization
- Employs EDF to decouple critical requests from sub-critical requests
TimeThief converts OLDIs' performance disadvantage of the latency tail into an energy advantage!

23 Thank you

