Web Search Using Mobile Cores Presented by: Luwa Matthews 0.

Presentation on theme: "Web Search Using Mobile Cores Presented by: Luwa Matthews 0."— Presentation transcript:

1 Web Search Using Mobile Cores Presented by: Luwa Matthews 0

2 High-Level Summary
- Datacenters are moving towards parallel execution frameworks
- Computation is distributed among a massive number of cores rather than a few high-powered cores; workloads are memory or I/O intensive, e.g. OLTP
- Smaller cores are more power efficient, and mobile cores are designed specifically for power efficiency
- Conventional wisdom: leverage mobile cores for datacenter workloads

3 High-Level Summary
- Bing web-search is more compute intensive than most datacenter workloads because of its machine learning heuristics
- Mobile processors are not powerful enough; server processors are too power inefficient
- We could either improve the performance of mobile cores or make server processors more efficient
- Mobile processors are much closer to the ideal power-performance tradeoff and could even provide more reliability
- Small microarchitectural modifications are needed to make mobile processors more robust

4 Bing Web-search is More Compute-intensive
- Bing web-search exhibits more instruction-level parallelism, with IPC exceeding 1

5 How does web-search work?
- After crawling, words are indexed
- Each Index Serving Node (ISN) is in charge of a range of indices
- The Aggregator sends each query to multiple ISNs, which check their indices
- The ISNs return ranked pages, which the Aggregator aggregates
- The bulk of the computing load falls on the ISNs, which use complex heuristics in ranking
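The query flow on this slide can be sketched as a scatter-gather in Python. This is a minimal illustration, not Bing's implementation: the shard contents, the scores, and the names `ISN_SHARDS`, `isn_search`, and `aggregate` are all hypothetical, and real ISNs rank with far more complex heuristics than a lookup.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor

# Hypothetical toy shards: each ISN owns part of the index and maps a
# word to (score, page) pairs. The scores are made up for illustration.
ISN_SHARDS = [
    {"mobile": [(0.9, "page-a"), (0.4, "page-b")]},
    {"mobile": [(0.7, "page-c")], "cores": [(0.8, "page-d")]},
]

def isn_search(shard, word, k):
    """One ISN: return its top-k (score, page) hits for the word."""
    return heapq.nlargest(k, shard.get(word, []))

def aggregate(word, k=3):
    """Aggregator: send the query to every ISN, then merge the ranked lists."""
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(lambda s: isn_search(s, word, k), ISN_SHARDS))
    merged = [hit for partial in partials for hit in partial]
    # Keep the k best hits overall and drop the scores.
    return [page for _, page in heapq.nlargest(k, merged)]

print(aggregate("mobile"))
```

Each ISN only ever sees its own shard, so the ranking work runs in parallel across ISNs while the Aggregator does a cheap merge.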

6 Requirements for Web-search
- Robustness: QoS (θ), the minimum percentage of successful queries
- Latency: the aggregator needs a response within a certain time limit
- Throughput: the maximum incoming QPS at which the given θ can still be maintained
- Flexibility and Absolute Load: can the underlying architecture react gracefully to spikes in QPS without dramatically reducing QoS?
- Reliability and Relative Load: can the datacenter tolerate hardware failures, which are bound to happen? QoS should degrade slowly.
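To make the QoS requirement concrete, here is a small Python sketch. It assumes that a "successful" query is one answered within the aggregator's time limit, which is an interpretation of the slide rather than the authors' exact definition; the function names and the sample latencies are hypothetical.

```python
def qos_percent(latencies_ms, cutoff_ms):
    """Percentage of queries answered within the time limit.

    A latency of None marks a dropped query (assumed to count as a failure).
    """
    if not latencies_ms:
        return 100.0
    ok = sum(1 for t in latencies_ms if t is not None and t <= cutoff_ms)
    return 100.0 * ok / len(latencies_ms)

def meets_target(latencies_ms, cutoff_ms, theta):
    """True if the measured success rate is at least the QoS target theta (%)."""
    return qos_percent(latencies_ms, cutoff_ms) >= theta

# Made-up sample: five queries, one dropped, one slower than the cutoff.
sample = [10, 20, None, 35, 50]
print(qos_percent(sample, cutoff_ms=40))
```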

7 Comparing 2 Processors on Extreme Ends
- The authors focus on one ISN to see how web-search stresses each processor on the metrics listed above
- This informs us about the power-performance tradeoffs and where bottlenecks occur

8 Web-search Execution Characteristics
- Terminology: X_θ or A_θ is the maximum incoming QPS that a Xeon or Atom, respectively, can sustain without violating the θ% success rate
- Atom's performance suffers from its lack of specialized units, smaller caches, deeper pipeline, and in-order execution
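One way to read the X_θ / A_θ terminology is as the result of a load sweep: raise the incoming QPS step by step and record the highest load that still meets θ. The Python sketch below assumes that procedure; the function name and the measurement values are hypothetical and are not taken from the presentation.

```python
def max_sustainable_qps(measurements, theta):
    """Largest QPS up to which every measured load meets the target theta.

    measurements: iterable of (qps, qos_percent) pairs from a load sweep.
    Returns None if even the lowest measured load violates theta.
    """
    best = None
    for qps, qos in sorted(measurements):
        if qos < theta:
            break  # the first violation ends the sustainable range
        best = qps
    return best

# Made-up sweep for illustration only.
sweep = [(100, 99.9), (200, 99.5), (300, 97.0), (400, 90.0)]
print(max_sustainable_qps(sweep, theta=99.0))
```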

9 Web-search Power Characteristics
- Average Xeon power ~62.5 W; average Atom power ~3.2 W
- Over an order of magnitude difference in power!
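The "order of magnitude" claim follows directly from the two averages on the slide. The short Python check below uses only those two numbers; the helper `atom_more_efficient` and its throughput arguments are hypothetical and simply restate that Atom wins on performance per watt whenever Xeon's throughput advantage is smaller than its power disadvantage.

```python
XEON_AVG_POWER_W = 62.5  # average Xeon power, from the slide
ATOM_AVG_POWER_W = 3.2   # average Atom power, from the slide

power_ratio = XEON_AVG_POWER_W / ATOM_AVG_POWER_W
print(f"Xeon draws about {power_ratio:.1f}x the power of Atom")

def atom_more_efficient(xeon_qps, atom_qps):
    """Compare queries per second per watt (throughput inputs are hypothetical)."""
    return atom_qps / ATOM_AVG_POWER_W > xeon_qps / XEON_AVG_POWER_W
```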

10 So, is the extra performance worth it?
- On all metrics at the processor and core level, Atom is significantly more efficient than Xeon

11 Price of Atom’s Efficiency: Robustness
- Past 0.3X_θ, Atom struggles to maintain the QoS target
- Atom is dramatically less robust than Xeon

12 Price of Atom’s Efficiency: Latency
- Query latency distribution
- Atom latency is about 3x Xeon’s
- Atom shows wider variation in latency as load varies
- Longer latencies mean less relevant results (why?)

13 Price of Atom’s Efficiency: Query Complexity
- Complex queries (ands, ors, etc.) are broken down into simpler ones, but can’t be split among ISNs
- They are more susceptible to longer latency, maxing out queues, and dropping requests

14 How do we reduce the price of Atom’s Efficiency? Reduce platform overheads
- Consolidating cores in a hypothetical Atom reduces platform overhead/operational costs while boosting throughput

15 How do we reduce the price of Atom’s Efficiency? Microarchitectural modifications
- Notice that bottlenecks are evenly distributed across different phases

16 Proposals for Datacenter Design
- Caches the size of Xeon’s with the Atom datapath left unchanged: mitigates the effects of cache bottlenecks
- Pack more Atom cores into a single chip to offset operational costs
- Specialized processing units: the authors argue this might not be fruitful, given how uniformly the different components of search stress the memory system; only small performance increases
- Heterogeneity: heterogeneous cores with the ability to steer complex queries to Xeons offer more efficiency and robustness
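The heterogeneity proposal can be pictured as a dispatcher in front of a mixed pool of cores. The Python sketch below is only an illustration: the complexity score (a count of boolean operators, echoing the "ands, ors, etc." from the query-complexity slide), the threshold, and the function names are assumptions, not the authors' steering policy.

```python
BOOLEAN_OPERATORS = {"AND", "OR", "NOT"}

def complexity(query):
    """Hypothetical complexity score: number of boolean operators in the query."""
    return sum(1 for tok in query.split() if tok.upper() in BOOLEAN_OPERATORS)

def steer(query, threshold=2):
    """Send complex queries to a Xeon core and simple ones to an Atom core."""
    return "xeon" if complexity(query) >= threshold else "atom"

for q in ["mobile cores", "atom AND xeon OR arm NOT gpu"]:
    print(q, "->", steer(q))
```

Under this policy the Atom cores absorb the bulk of simple queries cheaply, while the few queries most likely to overrun the latency limit go to the core with more headroom.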

