Presentation transcript:

Delayed-Dynamic-Selective (DDS) Prediction for Reducing Extreme Tail Latency in Web Search. Saehoon Kim§, Yuxiong He*, Seung-won Hwang§, Sameh Elnikety*, Seungjin Choi§ (§ POSTECH, * Microsoft Research)

Web Search Engine Requirements. Queries demand both high quality and low latency. This talk focuses on how to achieve low latency without compromising quality.

Low Latency for All Users. Reduce tail latency (high-percentile response time); reducing average latency alone is not sufficient. Commercial search engines target the 99th-percentile latency.

Reducing End-to-End Latency. A query is fanned out from an aggregator to 40 Index Server Nodes (ISNs), and the aggregator must wait for the slowest one. To keep the 99th-percentile response time at the aggregator under 120 ms, each ISN must keep its 99.99th-percentile response time under 120 ms: a single long(-running) query at any ISN delays the whole response.
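
To see why the per-ISN target is so much stricter than the aggregator target, here is a quick back-of-the-envelope check (a sketch assuming ISN response times are independent and the aggregator waits for all 40 ISNs):

```python
# If each of n ISNs answers within the 120 ms budget with probability p,
# the aggregator (which waits for all of them) does so with probability p**n.
n_isns = 40
agg_target = 0.99                       # 99th percentile at the aggregator
per_isn = agg_target ** (1.0 / n_isns)  # required per-ISN success probability
print(f"per-ISN requirement: {per_isn:.5%}")
# -> roughly 99.975%, i.e. each ISN must control about its 99.99th percentile
```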

Reducing Tail Latency by Parallelization. Latency breakdown by resource: Network 4.26 ms, Queueing 0.15 ms, I/O 4.70 ms, CPU 194.95 ms. The opportunity for parallelization: idle cores are available and the workload is CPU-intensive.

Challenges of Exploiting Parallelism. Parallelizing all queries is inefficient under medium to high load. Parallelizing short queries gives no speedup; parallelizing long queries gives good speedup. Therefore: parallelize only long(-running) queries.

Prior Work: PREDictive Parallelization (PRED). Predict each query's execution time from features extracted at arrival (feature extraction feeding a regression prediction model), then parallelize only the predicted-long queries and execute the predicted-short queries sequentially. [Predictive Parallelization: Taming Tail Latencies in Web Search, M. Jeon et al., SIGIR'14]
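
A minimal sketch of the PRED policy under stated assumptions: the feature extractor, model, run helpers, and the long-query cutoff are hypothetical stand-ins, not the production system's API.

```python
LONG_QUERY_MS = 10.0  # assumed cutoff separating short from long queries

def schedule_pred(query, model):
    """PRED: predict execution time at arrival; parallelize predicted-long."""
    features = extract_static_features(query)    # hypothetical helper: IDF, static scores, ...
    predicted_ms = model.predict([features])[0]  # regression on static features
    if predicted_ms >= LONG_QUERY_MS:
        return run_parallel(query, workers=4)    # predicted long: speed it up
    return run_sequential(query)                 # predicted short: no benefit
```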

Requirements. Keeping the 99th-percentile latency at the aggregator <= 120 ms requires reducing the 99.99th-percentile latency at each ISN to <= 120 ms. This places two requirements on the predictor: recall of long queries must be very high, because missing even a few long queries ruins the 99.99th percentile, and precision should be high, so that fewer queries are parallelized unnecessarily. PRED's recall is 98.9%, i.e., it misses 1.1% of long queries, so PRED cannot effectively reduce the 99.99th tail latency.

Contributions. Key contributions: (1) proposes DDS (Delayed-Dynamic-Selective) prediction to achieve very high recall and good precision; (2) uses DDS prediction to effectively reduce extreme tail latency.

Overview of DDS. A query first goes through delayed prediction: it runs sequentially, and queries that finish within 10 ms are done without any prediction. For queries still running after 10 ms, dynamic prediction applies a predictor for execution time, classifying the query as long or short. A second predictor for confidence level performs selective prediction: queries whose execution time cannot be predicted confidently are treated as long.
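
The flow above, as a sketch (helper names are invented; only the 10 ms delay comes from the talk, the other thresholds are assumptions):

```python
DELAY_MS = 10.0         # initial sequential window (from the talk)
LONG_MS = 100.0         # assumed threshold for "long"
MAX_PRED_ERR_MS = 50.0  # assumed confidence cutoff on predicted L1 error

def schedule_dds(query, time_model, error_model):
    """DDS: delayed, then dynamic, then selective prediction."""
    if run_sequential(query, budget_ms=DELAY_MS):  # delayed: most short
        return                                     # queries finish here
    feats = collect_dynamic_features(query)        # NumEstMatchDocs, DynScores, ...
    pred_ms = time_model.predict([feats])[0]       # dynamic prediction
    pred_err = error_model.predict([feats])[0]     # selective: confidence level
    if pred_ms >= LONG_MS or pred_err >= MAX_PRED_ERR_MS:
        run_parallel(query, workers=4)             # long or not confident
    else:
        run_sequential(query)                      # confidently short
```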

Delayed Prediction. Run each query sequentially first: many short queries complete during this window, and the delay allows dynamic features to be collected for the rest.

Dynamic Features. What are dynamic features? Features that can only be collected at runtime. Two categories: NumEstMatchDocs, used to estimate the total number of matched documents, and DynScores, used to predict early termination.
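
For concreteness, a sketch of a dynamic-feature snapshot taken at the 10 ms mark; only NumEstMatchDocs and DynScores come from the slide, the extra field and the vector layout are invented for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class DynamicFeatures:
    """Runtime signals collected during the first 10 ms of execution."""
    num_est_match_docs: float               # NumEstMatchDocs: estimated total matches
    dyn_scores: list = field(default_factory=list)  # DynScores observed so far
    docs_processed: int = 0                 # hypothetical extra runtime counter

    def as_vector(self):
        top = sorted(self.dyn_scores, reverse=True)[:3]
        top += [0.0] * (3 - len(top))       # pad to a fixed-length vector
        return [self.num_est_match_docs, self.docs_processed, *top]
```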

Primary Factors for Execution Time: 1. The total number of matched documents. Web documents (Doc 1 ... Doc N) are sorted by static score from highest to lowest, and the inverted indexes for the query terms (e.g., "WSDM" and "2015") are processed in that order, so execution time grows with the number of matched documents.

Primary Factors for Execution Time: 2. Early termination. Because documents are processed in decreasing static-score order, processing can stop early, leaving the remaining documents not evaluated.

Early Termination. Processing walks the inverted index (e.g., for "WSDM") over documents sorted by static score, highest to lowest; documents past the stopping point are not evaluated. To predict early termination, consider the distribution of dynamic scores: a running top-3 list of (doc ID, dynamic score) is maintained as documents are processed, e.g., snapshots {Doc 1: -4.11}, {Doc 3: -4.01, Doc 1: -4.11}, {Doc 3: -4.01, Doc 8: -4.10, Doc 1: -4.11}, {Doc 3: -4.01, Doc 1: -4.11, Doc 5: -4.23}. If the minimum dynamic score in the top-k exceeds a threshold, processing stops.
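
A minimal sketch of this stop rule over postings already sorted by static score; the threshold value and the simple "min top-k score beats threshold" test mirror the slide, while a production engine's termination criterion is more involved.

```python
import heapq

def top_k_with_early_termination(postings, k=3, threshold=-4.05):
    """postings: (doc_id, dynamic_score) pairs in decreasing static-score order.
    Keeps the best k dynamic scores in a min-heap and stops early once the
    k-th best (the heap minimum) already exceeds the threshold."""
    heap = []  # min-heap of (dynamic_score, doc_id)
    for doc_id, score in postings:
        if len(heap) < k:
            heapq.heappush(heap, (score, doc_id))
        elif score > heap[0][0]:
            heapq.heapreplace(heap, (score, doc_id))
        if len(heap) == k and heap[0][0] > threshold:
            break  # remaining docs are skipped ("not evaluated")
    return sorted(heap, reverse=True)
```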

Importance of Dynamic Features. Top-10 feature importance ranked by a boosted regression tree: NumEstMatchDocs helps predict the total number of matched documents, and DynScores helps predict early termination.
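
The ranking itself is easy to reproduce with any boosted-tree library; for example, with a fitted sklearn GradientBoostingRegressor (a sketch; feature names are whatever the caller supplies):

```python
def top_features(model, feature_names, k=10):
    """Rank features of a fitted boosted regression tree by importance."""
    ranked = sorted(zip(feature_names, model.feature_importances_),
                    key=lambda pair: pair[1], reverse=True)
    return ranked[:k]
```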

Selective Prediction. Goal: find almost all long queries with good precision by identifying the outliers, i.e., long queries predicted as short, from the predicted execution time.

Selective Prediction. In addition to the predicted execution time, DDS predicts the L1 error of that estimate. Plotted against these two axes, short queries cluster at low predicted time and low predicted error, while long queries mispredicted as short stand out with a large predicted L1 error; such low-confidence queries are flagged rather than trusted as short.
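
One plausible construction of the confidence predictor, sketched on the assumption that it regresses the first model's absolute (L1) error; the paper's exact training recipe may differ.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

def train_selective_predictor(X, exec_times_ms):
    """Fit (a) an execution-time regressor and (b) a second regressor that
    predicts the first one's L1 error, used as the confidence signal."""
    time_model = GradientBoostingRegressor().fit(X, exec_times_ms)
    # In practice, out-of-fold predictions avoid optimistically small residuals.
    l1_errors = np.abs(time_model.predict(X) - exec_times_ms)
    error_model = GradientBoostingRegressor().fit(X, l1_errors)
    return time_model, error_model
```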

Overview of DDS (recap). Delayed prediction: queries finishing within 10 ms need no prediction. Dynamic prediction: queries running past 10 ms get an execution-time prediction from dynamic features. Selective prediction: queries whose prediction is not confident are treated as long.

Evaluation of Predictor Accuracy (1/3). Baseline (PRED): static features (IDF, static scores such as PageRank, etc.) with no delayed prediction. Proposed method (DDS): dynamic (plus static) features with delayed and selective prediction.

Evaluation of Predictor Accuracy (2/3). Dataset: 69,010 Bing queries from a production workload, of which 14,565 take >= 10 ms and 635 take >= 100 ms. Both predictors are boosted regression trees evaluated with 10-fold cross-validation; PRED is trained on all 69,010 queries, DDS on the 14,565 queries that outlive the 10 ms delay.
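
A sketch of this setup with sklearn's gradient-boosted trees standing in for the production learner; the 100 ms long-query cutoff is assumed to match the dataset split above, and the feature matrix X and label vector y_ms are supplied by the caller.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import cross_val_predict

LONG_MS = 100.0  # assumed long-query cutoff, matching the ">= 100ms" split

def evaluate_predictor(X, y_ms):
    """10-fold CV, then recall/precision of long-query detection."""
    pred_ms = cross_val_predict(GradientBoostingRegressor(), X, y_ms, cv=10)
    actual_long = y_ms >= LONG_MS
    predicted_long = pred_ms >= LONG_MS
    hits = np.sum(actual_long & predicted_long)
    recall = hits / max(np.sum(actual_long), 1)        # share of long queries caught
    precision = hits / max(np.sum(predicted_long), 1)  # share of flags that are long
    return recall, precision
```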

Evaluations of Predictor Accuracy (3/3) 957% Improvement over PRED

Evaluations of Predictor Accuracy (3/3) 957% Improvement over PRED Delayed

Evaluations of Predictor Accuracy (3/3) 957% Improvement over PRED Selective features You may want to add an additional animation label to show “Delay”, “Dynamic”, “Selective” Dynamic features Delayed

Simulation Results on Tail Latency Reduction. Baseline (PRED): predict query execution time before running it and parallelize predicted-long queries with 4-way parallelism. Proposed method (DDS): run each query sequentially for 10 ms, then parallelize the long or unpredictable queries with 4-way parallelism.
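
A toy latency model for the two policies, usable to replay a trace of measured sequential execution times; the parallel-efficiency factor and thresholds are assumptions, and queueing effects (which the full simulation captures) are ignored.

```python
PARALLEL_EFFICIENCY = 0.8  # assumed: 4-way speedup is sublinear

def latency_pred(seq_ms, predicted_ms, long_ms=100.0):
    """PRED: decide before running, from the up-front prediction."""
    if predicted_ms >= long_ms:
        return seq_ms / (4 * PARALLEL_EFFICIENCY)  # parallelized from the start
    return seq_ms

def latency_dds(seq_ms, predicted_ms, confident, delay_ms=10.0, long_ms=100.0):
    """DDS: 10 ms sequential, then parallelize long or unpredictable queries."""
    if seq_ms <= delay_ms:
        return seq_ms                              # finished within the delay
    if predicted_ms >= long_ms or not confident:
        remaining = seq_ms - delay_ms
        return delay_ms + remaining / (4 * PARALLEL_EFFICIENCY)
    return seq_ms
```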

ISN Response Time. [Charts: tail response time at an ISN under PRED and DDS across increasing load.] DDS sustains a 70% throughput increase.

Aggregator Response Time. DDS keeps the 99th-percentile tail latency at the aggregator within target even under high QPS.

Conclusion. Proposed a novel prediction framework combining delayed prediction, dynamic features, and selective prediction. It achieves higher precision and recall than PRED and reduces the 99th-percentile aggregator response time to <= 120 ms under high load.