Predictive Parallelization: Taming Tail Latencies in Web Search Myeongjae Jeon, Saehoon Kim, Seung-won Hwang, Yuxiong He, Sameh Elnikety, Alan L. Cox, Scott Rixner Microsoft Research, POSTECH, Rice University
Performance of Web Search
  1) Query response time: answer users quickly (e.g., in 300 ms)
  2) Response quality (relevance): provide highly relevant web pages; quality improves with the resources and time consumed
  Focus: Improving response time without compromising quality
Background: Query Processing Stages
  Query → Doc. index search (stage 1): 100s – 1000s of good matching docs
  → 2nd phase ranking: 10s of the best matching docs
  → Snippet generator: few sentences for each doc
  → Response
  For example: 300 ms latency SLA
  Focus: Stage 1
Goal
  Speeding up index search (stage 1) without compromising result quality
    Improve user experience
    Larger index serving
    Sophisticated 2nd phase
  [Diagram: Query → Doc. index search → 2nd phase ranking → Snippet generator → Response. For example: 300 ms latency SLA]
How Index Search Works
  Partition all web pages across index servers (massively parallel)
  Distribute query processing (embarrassingly parallel)
  Aggregate top-k relevant pages
  [Diagram: Query → Aggregator → Index servers, each holding one partition of all web pages and returning its top-k pages]
  Problem: A slow server makes the entire cluster slow
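The partition, distribute, and aggregate steps can be sketched as follows. This is a minimal illustration, not the production design: the partition contents, the precomputed scores, and the function names `search_partition` and `aggregate_top_k` are all hypothetical.

```python
import heapq
from typing import Dict, List, Tuple

Hit = Tuple[float, str]  # (relevance score, document id)

def search_partition(index: Dict[str, float], k: int) -> List[Hit]:
    """Return the k highest-scoring documents held by one index server."""
    return heapq.nlargest(k, ((score, doc) for doc, score in index.items()))

def aggregate_top_k(partial_results: List[List[Hit]], k: int) -> List[Hit]:
    """Merge the per-server top-k lists into one global top-k list."""
    return heapq.nlargest(k, (hit for hits in partial_results for hit in hits))

# Hypothetical partitions: each maps a document id to its score for one query.
partitions = [
    {"a": 0.9, "b": 0.2, "c": 0.5},
    {"d": 0.8, "e": 0.1},
    {"f": 0.7, "g": 0.6},
]

# Every server answers the same query; the aggregator must wait for all of them.
top = aggregate_top_k([search_partition(p, 2) for p in partitions], 2)
```

Because the aggregator needs every partial list before it can merge, one late server delays the whole response.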
Observation
  Every query is processed on every index server, so response time is determined by the slowest one.
  We need to reduce each server's tail latencies.
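A small calculation shows why per-server tails matter so much. The sketch below assumes servers are slow independently of one another, and the cluster size of 100 is a hypothetical figure, not one given in the talk.

```python
from typing import Iterable

def cluster_latency_ms(server_latencies_ms: Iterable[float]) -> float:
    """The aggregator waits for every server, so the slowest one sets the time."""
    return max(server_latencies_ms)

def prob_any_server_slow(num_servers: int, per_server_slow_prob: float) -> float:
    """Chance that at least one server is slow, assuming independence."""
    return 1.0 - (1.0 - per_server_slow_prob) ** num_servers

# Hypothetical cluster of 100 servers, each slow for 1% of queries.
p_slow_query = prob_any_server_slow(100, 0.01)
```

Under these assumptions roughly 63% of queries would hit at least one slow server, even though each server is slow only 1% of the time.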
Examples
  [Diagrams: two Aggregator / Index servers clusters, one giving a fast response and one giving a slow response because of a long query (outlier)]
  Terminate long query in the middle of processing → Fast response, but quality drop
Parallelism for Tail Reduction
  Opportunity: available idle cores; CPU-intensive workloads
  Challenge: tails are few; tails are very long
  Latency breakdown for the 99%tile.
    Breakdown   Latency
    Network     4.26 ms
    Queueing    0.15 ms
    I/O         4.70 ms
    CPU         194.95 ms
  Latency distribution
    Percentile  Latency     Scale
    50%tile     7.83 ms     x1
    75%tile     12.51 ms    x1.6
    95%tile     57.15 ms    x7.3
    99%tile     204.06 ms   x26.1
Query Parallelism for Tail Reduction
  Opportunity
    30% CPU utilization
    Available idle cores
    Few long queries
    Computationally-intensive workload
  Table. Latency breakdown for the 99%tile.
    Breakdown   Latency
    Network     4.26 ms
    Queueing    0.15 ms
    I/O         4.70 ms
    CPU         194.95 ms
  Table. Latency distribution in Bing index server.
    Percentile  Latency     Scale
    50%tile     7.83 ms     x1
    75%tile     12.51 ms    x1.6
    95%tile     57.15 ms    x7.3
    99%tile     204.06 ms   x26.1
  99%tile latency of 204.06 ms = 99% of requests have latency ≤ 204.06 ms
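The numbers in the two tables can be cross-checked with a few lines of arithmetic. The figures come from the slide; the nearest-rank `percentile` helper is an assumed definition chosen to match "99% of requests have latency ≤ the 99%tile", not necessarily the one used for the measurements.

```python
import math
from typing import Dict, List

def percentile(latencies_ms: List[float], p: int) -> float:
    """Nearest-rank percentile: smallest value with at least p% of samples at or below it."""
    ordered = sorted(latencies_ms)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]

# Values copied from the slide.
breakdown_ms: Dict[str, float] = {"Network": 4.26, "Queueing": 0.15, "I/O": 4.70, "CPU": 194.95}
percentiles_ms: Dict[int, float] = {50: 7.83, 75: 12.51, 95: 57.15, 99: 204.06}

# The four components add up to the 99%tile latency.
total_ms = sum(breakdown_ms.values())

# Share of the 99%tile latency spent on CPU.
cpu_share = breakdown_ms["CPU"] / total_ms

# "Scale" column: each percentile relative to the median.
scales = {p: round(v / percentiles_ms[50], 1) for p, v in percentiles_ms.items()}
```

The CPU component is about 95% of the 99%tile latency, which is why the talk targets computation rather than network or I/O.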
Predictive Parallelism for Tail Reduction
  Short queries: many; almost no speedup
  Long queries: few; good speedup
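One way to see why short queries gain almost nothing is a toy cost model in which parallel execution pays a fixed coordination cost. The model and the 5 ms overhead are assumptions made for illustration; the talk does not give a cost model.

```python
def parallel_time_ms(sequential_ms: float, degree: int, overhead_ms: float = 5.0) -> float:
    """Toy model: work divides evenly across threads, plus a fixed fork/merge cost."""
    if degree == 1:
        return sequential_ms
    return sequential_ms / degree + overhead_ms

def speedup(sequential_ms: float, degree: int, overhead_ms: float = 5.0) -> float:
    """Ratio of sequential time to modeled parallel time."""
    return sequential_ms / parallel_time_ms(sequential_ms, degree, overhead_ms)

# Median and 99%tile latencies from the latency distribution table.
short_speedup = speedup(7.83, 4)    # roughly 1.1x under this model
long_speedup = speedup(204.06, 4)   # roughly 3.6x under this model
```

Under this model the fixed cost dominates a short query, while a long query amortizes it across much more work.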
Predictive Parallelization Workflow
  [Diagram: query → Index server (Execution time predictor)]
  Predict (sequential) execution time of the query with high accuracy
Predictive Parallelization Workflow
  [Diagram: query → Index server (Execution time predictor → long / short → Resource manager)]
  Using predicted time, selectively parallelize long queries
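The workflow can be sketched as a predictor feeding a resource manager. Everything named here is hypothetical: `choose_degree`, `handle_query`, the maximum degree of 4, and the toy predictor. Using 80 ms as the cutoff is an assumption borrowed from the talk's definition of a long query.

```python
from typing import Callable

LONG_QUERY_THRESHOLD_MS = 80.0  # assumed cutoff; the talk calls queries > 80 ms "long"

def choose_degree(predicted_ms: float, idle_cores: int, max_degree: int = 4) -> int:
    """Resource-manager sketch: short queries run sequentially,
    long queries get as many idle cores as the cap allows."""
    if predicted_ms <= LONG_QUERY_THRESHOLD_MS:
        return 1
    return max(1, min(max_degree, idle_cores))

def handle_query(query: str, predict_ms: Callable[[str], float], idle_cores: int) -> int:
    """Predict the sequential execution time, then pick the number of threads."""
    return choose_degree(predict_ms(query), idle_cores)

def toy_predictor(query: str) -> float:
    """Stand-in predictor for illustration only: 30 ms per search term."""
    return 30.0 * len(query.split())
```

For example, `handle_query("sigir queensland qld", toy_predictor, idle_cores=8)` picks 4 threads under these assumptions, while a one-term query stays sequential.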
Predictive Parallelization
  Focus of Today’s Talk
  Predictor: prediction of long queries through machine learning
  Parallelization: parallelization of long queries with high efficiency
Brief Overview of Predictor
  Accuracy: high recall for guaranteeing 99%tile reduction
    In our workload, 4% of queries take > 80 ms
    At least 3% must be identified (75% recall)
  Cost: low prediction overhead and misprediction cost
    Prediction overhead of 0.75 ms or less, and high precision
  Existing approaches: lower accuracy and higher cost
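The 75% recall requirement follows from simple arithmetic. The sketch assumes that every long query the predictor identifies is parallelized and drops below 80 ms; that assumption is made here for illustration.

```python
long_fraction = 0.04  # queries slower than 80 ms (from the slide)
tail_budget = 0.01    # at most 1% of queries may remain long if the 99%tile is to fall below 80 ms

# Fraction of all queries that the predictor must flag as long.
must_identify = long_fraction - tail_budget

# Recall = identified long queries / all long queries.
required_recall = must_identify / long_fraction
```

Here `must_identify` is 3% of all queries and `required_recall` is 0.75, matching the slide.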
Accuracy: Predicting Early Termination
  Only a limited portion of the inverted index contributes to the top-k relevant results
  The size of that portion depends on the keyword (more exactly, on its score distribution)
  [Diagram: inverted index for “SIGIR”; web documents Doc 1 … Doc N sorted by static rank from highest to lowest; the leading docs are processed, the remainder is not evaluated]
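A generic form of early termination over a list sorted by static rank can be sketched as follows. The scoring model (static rank plus a bounded query-dependent part) and the stopping rule are assumptions chosen for illustration; the talk does not specify the termination condition Bing uses.

```python
import heapq
from typing import Callable, List, Tuple

def search_with_early_termination(
    postings: List[Tuple[str, float]],      # (doc id, static rank), highest rank first
    dynamic_score: Callable[[str], float],  # query-dependent part, in [0, max_dynamic]
    k: int,
    max_dynamic: float,
) -> Tuple[List[Tuple[float, str]], int]:
    """Return the top-k (score, doc) pairs and the number of docs evaluated."""
    top: List[Tuple[float, str]] = []  # min-heap holding the best k results so far
    evaluated = 0
    for doc, static_rank in postings:
        # Even the best possible dynamic score cannot beat the k-th result: stop.
        if len(top) == k and static_rank + max_dynamic <= top[0][0]:
            break
        evaluated += 1
        score = static_rank + dynamic_score(doc)
        if len(top) < k:
            heapq.heappush(top, (score, doc))
        elif score > top[0][0]:
            heapq.heapreplace(top, (score, doc))
    return sorted(top, reverse=True), evaluated

# Hypothetical posting list and query-dependent scores.
postings = [("d1", 1.0), ("d2", 0.75), ("d3", 0.625), ("d4", 0.25), ("d5", 0.125)]
dynamic = {"d1": 0.0, "d2": 0.5, "d3": 0.25, "d4": 0.5, "d5": 0.5}
results, evaluated = search_with_early_termination(postings, dynamic.__getitem__, 2, 0.5)
```

In this example only 3 of the 5 documents are evaluated. How far the scan goes depends on the scores encountered, which is why execution time varies with the keyword and is worth predicting.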
Space of Features (4/11/2017)
  Term features (14) [Macdonald et al., SIGIR 12]
    IDF (inverse document frequency), NumPostings
    Score (arithmetic, geometric, and harmonic means; max, var, gradient)
  Query features (6): capture query complexity
    NumTerms (before and after query rewriting)
    Relaxed
    Language
  © 2014 Microsoft Corporation. All rights reserved. Microsoft, Windows, and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.
New Features: Query
  Rich clues from queries in modern search engines
  <Fields related to query execution plan> rank=BM25F enablefresh=1 partialmatch=1 language=en location=us ….
  <Fields related to search keywords> SIGIR (Queensland or QLD)
Space of Features
  Category            Feature
  Term feature (14)   AMeanScore, GMeanScore, HMeanScore, MaxScore, EMaxScore, VarScore, NumPostings, GAvgMaxima, MaxNumPostings, In5%Max, NumThres, ProK, IDF
  Query feature (6)   English, NumAugTerm, Complexity, RelaxCount, NumBefore, NumAfter
  All features are cached to ensure responsiveness (avoiding disk access)
  Term features require a 4.47 GB memory footprint (for 100M terms)
Feature Analysis and Selection
  Accuracy gain from boosted regression tree, suggesting a cheaper subset
  Surprisingly, cheap features are enough to make the prediction
Efficiency: Cheaper subset possible?
  Query features (6): capture query complexity. Query rewriting
  Term features (14)
  IDF: inverse document frequency
Prediction Performance (80 ms threshold)
  Features           Precision (|A∩P|/|P|)   Recall (|A∩P|/|A|)   Cost
  Keyword features   0.76                    0.64                 High
  All features       0.89                    0.84
  Cheap features     0.86                    0.80                 Low
  A = actual long queries, P = predicted long queries
  Query features are important
  Using cheap features is advantageous
    Cheap features: IDF from keyword features + query features
    Much smaller overhead (90+% less)
    Similarly high accuracy as using all features
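The two metrics in the table are set ratios, which a short sketch makes explicit. The query ids below are made up for illustration and do not reproduce the reported numbers.

```python
from typing import Set, Tuple

def precision_recall(actual_long: Set[str], predicted_long: Set[str]) -> Tuple[float, float]:
    """Precision = |A∩P|/|P| and recall = |A∩P|/|A| for the 'long query' class."""
    hits = len(actual_long & predicted_long)
    precision = hits / len(predicted_long) if predicted_long else 0.0
    recall = hits / len(actual_long) if actual_long else 0.0
    return precision, recall

# Hypothetical query ids: five truly long queries, four predicted as long.
A = {"q1", "q2", "q3", "q4", "q5"}
P = {"q1", "q2", "q3", "q9"}
precision, recall = precision_recall(A, P)
```

Here three of the four predictions are correct (precision 0.75) and three of the five long queries are found (recall 0.6). Low recall leaves long queries unparallelized, while low precision spends extra cores on short ones.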
Algorithms
  Classification vs. Regression
    Comparable accuracy
    Flexibility of regression
  Algorithms
    Linear regression
    Gaussian process regression
    Boosted regression tree
Accuracy of Algorithms
  Summary
    80% of long queries (> 80 ms) identified
    0.6% of short queries mispredicted
    0.55 ms prediction time with low memory overhead
Predictive Parallelism
  Key idea
    Parallelize only long queries
    Use a threshold on predicted execution time
  Evaluation
    Compare Predictive to other baselines: Sequential, Fixed, Adaptive
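The four policies differ only in how they choose the degree of parallelism for a query. The sketch below is a simplified reading of the baselines, not the evaluated implementations: the degree cap of 4, the 80 ms threshold, the idle-core count, and the synthetic workload are all assumptions.

```python
from typing import Callable, List

Policy = Callable[[float, int], int]  # (predicted ms, idle cores) -> threads

def sequential(predicted_ms: float, idle_cores: int) -> int:
    """Never parallelize."""
    return 1

def fixed(predicted_ms: float, idle_cores: int, degree: int = 4) -> int:
    """Parallelize every query with the same degree."""
    return degree

def adaptive(predicted_ms: float, idle_cores: int, max_degree: int = 4) -> int:
    """Choose the degree from system load only, ignoring the query."""
    return max(1, min(max_degree, idle_cores))

def predictive(predicted_ms: float, idle_cores: int,
               threshold_ms: float = 80.0, max_degree: int = 4) -> int:
    """Parallelize only queries predicted to be long."""
    if predicted_ms <= threshold_ms:
        return 1
    return max(1, min(max_degree, idle_cores))

def threads_used(policy: Policy, predicted_times_ms: List[float], idle_cores: int) -> int:
    """Total threads a policy spends on a workload."""
    return sum(policy(t, idle_cores) for t in predicted_times_ms)

# Synthetic workload: 96 short queries and 4 long ones (4% long, as in the talk).
workload = [8.0] * 96 + [200.0] * 4
usage = {p.__name__: threads_used(p, workload, idle_cores=2)
         for p in (sequential, fixed, adaptive, predictive)}
```

On this workload Fixed spends 400 threads and Adaptive 200, while Predictive spends 104, close to the 100 used by Sequential. That gap is the overhead of parallelizing every query.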
99%tile Response Time
  Outperforms “Parallelize all”
  50% throughput increase
Performance: Response Time
Response Time
Related Work
  Search query parallelism
    Fixed parallelization [Frachtenberg, WWWJ 09]
    Adaptive parallelization using system load only [Raman et al., PLDI 11]
    → High overhead due to parallelizing all queries
  Execution time prediction
    Keyword-specific features only [Macdonald et al., SIGIR 12]
    → Lower accuracy and high memory overhead for our target problem
Future Work
  Misprediction
    Dynamic adaptation
    Prediction confidence
  Diverse workloads
    Analytics, graph processing,
Thank You!
  Your query to Bing is now parallelized if predicted as long.
  [Diagram: query → Execution time predictor → long / short → Resource manager]