Time-sensitive Personalized Query Auto-Completion
Date: 2015/01/29
Authors: Fei Cai, Shangsong Liang, Maarten de Rijke
Source: CIKM’14
Advisor: Jia-ling Koh
Speaker: Sz-Han Wang
Outline Introduction Method Experiment Conclusion
Introduction
Query auto-completion (QAC) aims to save users' time by suggesting completions for the query prefix they are typing.
Introduction
A common approach: extract past queries from the query log and rank them by their past popularity. This works reasonably well on average but is far from optimal, because query popularity changes with time and trends (examples: MH370, movie, christmas).
Introduction
QAC is also user-specific: the best completion for the same prefix can differ from one user to another.
Introduction
Goal: propose a hybrid QAC model that considers both time sensitivity and personalization, to achieve the best possible QAC effectiveness for each user.
Time sensitivity: predict a query's popularity from its periodicity and its recent trend.
Personalization: use each user's previous queries, both from the current session and from historical logs, as user-specific context.
Flow Chart
Query → Time-sensitive QAC: rank QAC candidates by predicted query popularity → top N query completion candidates
Personalized QAC: score user-specific query similarity
Hybrid QAC: rerank the top N candidates by combining the time-sensitive QAC score with the personalized QAC score → top N query completions
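The flow chart can be read as a two-stage rank-then-rerank pipeline. The sketch below is a minimal illustration under stated assumptions: `ts_score`, `p_score` and `combine` are hypothetical placeholders for the time-sensitive, personalized and hybrid scores defined on the method slides, `two_stage_qac` is a made-up name, and the example scores are invented for illustration.

```python
from typing import Callable, List


def two_stage_qac(
    candidates: List[str],
    ts_score: Callable[[str], float],
    p_score: Callable[[str], float],
    combine: Callable[[float, float], float],
    n: int,
) -> List[str]:
    """Rank by time-sensitive score, keep the top n, then rerank those n by a hybrid score."""
    # Stage 1 (time-sensitive QAC): order all candidates by predicted popularity.
    top_n = sorted(candidates, key=ts_score, reverse=True)[:n]
    # Stage 2 (hybrid QAC): rerank only the top-n candidates with the combined score.
    return sorted(top_n, key=lambda q: combine(ts_score(q), p_score(q)), reverse=True)


# Hypothetical scores, for illustration only.
ts = {"harry potter": 131.0, "harry style": 120.0, "harry winston": 80.0, "harrison ford": 10.0}
ps = {"harry potter": 0.2, "harry style": 0.1, "harry winston": 0.9, "harrison ford": 1.0}
print(two_stage_qac(list(ts), ts.get, ps.get, lambda t, p: 0.01 * t + p, n=3))
# ['harry winston', 'harry potter', 'harry style']
```

Note that "harrison ford" never reaches the rerank stage even though its personalized score is highest: the time-sensitive stage has already cut the list to N candidates.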
Outline Introduction Method Experiment Conclusion
Time-sensitive QAC
Rank QAC candidates by predicted query popularity.
Predict query popularity from the query's recent trend and its periodicity:
Time-sensitive score: $\hat{y}_{t_0+1}(q,\lambda) = \lambda \cdot \hat{y}_{t_0+1}(q)_{\text{trend}} + (1-\lambda) \cdot \hat{y}_{t_0+1}(q)_{\text{peri}}$
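Time-sensitive QAC
Predict from the recent trend (the most recent $N_{days}$ days):
$\hat{y}_{t_0+1}(q)_{\text{trend}} = \sum_{i=1}^{N_{days}} \text{norm}(w_i) \cdot \hat{y}_{t_0+1}(q,i)_{\text{trend}}$, with $w_i = f^{TD(i)-1}$ and $\text{norm}(w_i) = w_i / \sum_{j=1}^{N_{days}} w_j$
f: decay factor; TD(i): the interval from day i to the future day $t_0+1$; $\hat{y}_{t_0+1}(q,i)_{\text{trend}}$: the prediction for day $t_0+1$ made from day i (in the example, an observed count plus a trend increment).
Example: prefix "harry", candidates "harry potter", "harry winston", "harry style"; $N_{days} = 3$, $f = 0.95$; counts shown for days $t_0-4$ to $t_0$: 85, 90, 110, 70.
$TD(1) = 2$: $w_1 = 0.95^{2-1} = 0.95$ → $\text{norm}(w_1) = 0.35$
$TD(2) = 3$: $w_2 = 0.95^{3-1} = 0.9025$ → $\text{norm}(w_2) = 0.33$
$TD(3) = 4$: $w_3 = 0.95^{4-1} = 0.857375$ → $\text{norm}(w_3) = 0.32$
$\hat{y}_{t_0+1}(\text{harry potter},1)_{\text{trend}} = 90 + 15 = 105$
$\hat{y}_{t_0+1}(\text{harry potter},2)_{\text{trend}} = 85 + 40 = 125$
$\hat{y}_{t_0+1}(\text{harry potter},3)_{\text{trend}} = 70 + 55 = 125$
$\hat{y}_{t_0+1}(\text{harry potter})_{\text{trend}} = 0.35 \times 125 + 0.33 \times 130 + 0.32 \times 140 = 131.45$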
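The weighting step of the trend prediction can be checked with a short sketch. It assumes the per-day predictions $\hat{y}_{t_0+1}(q,i)_{\text{trend}}$ are already available and takes them as inputs, since the slide does not spell out how each one is extrapolated; the function names are hypothetical.

```python
from typing import List, Sequence


def normalized_decay_weights(tds: Sequence[int], f: float) -> List[float]:
    """w_i = f ** (TD(i) - 1), then norm(w_i) = w_i / sum_j w_j."""
    raw = [f ** (td - 1) for td in tds]
    total = sum(raw)
    return [w / total for w in raw]


def trend_prediction(per_day_preds: Sequence[float], tds: Sequence[int], f: float) -> float:
    """Decay-weighted average of the per-day trend predictions for day t0+1."""
    if len(per_day_preds) != len(tds):
        raise ValueError("need one TD value per per-day prediction")
    weights = normalized_decay_weights(tds, f)
    return sum(w * y for w, y in zip(weights, per_day_preds))


weights = normalized_decay_weights([2, 3, 4], f=0.95)
print([round(w, 2) for w in weights])  # [0.35, 0.33, 0.32]
print(round(trend_prediction([125, 130, 140], [2, 3, 4], f=0.95), 2))  # 131.41
```

With unrounded weights the combination comes out at about 131.41; the 131.45 on the slide comes from using the rounded weights 0.35, 0.33 and 0.32.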
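Time-sensitive QAC
Predict from periodicity: use the autocorrelation coefficient to detect q's periodicity $T_q$, then average the counts observed at the same phase in the previous M periods:
$\hat{y}_{t_0+1}(q)_{\text{peri}} = \frac{1}{M} \sum_{m=1}^{M} y_{t_0+1-m \cdot T_q}(q)$
Example: prefix "harry", candidates "harry potter", "harry winston", "harry style"; $t_0$ = 2005/07/16, so $t_0+1$ = 2005/07/17; $T_q$ = 1 year, M = 3. The counts of "harry potter" on 2004/07/17, 2003/07/17 and 2002/07/17 are 100, 95 and 120, so
$\hat{y}_{t_0+1}(\text{harry potter})_{\text{peri}} = \frac{1}{3}(100 + 95 + 120) = 105$
Autocorrelation example for the series y = (1, 2, 3, 1, 2, 3), with mean $\bar{y} = 2$ and
$r_k = \frac{\sum_{t=1}^{n-k} (y_t-\bar{y})(y_{t+k}-\bar{y})}{\sum_{t=1}^{n-k} (y_t-\bar{y})^2}$:
$r_1 = \frac{(1-2)(2-2) + (2-2)(3-2) + \dots + (2-2)(3-2)}{(1-2)^2 + (2-2)^2 + \dots + (2-2)^2} = -0.33$
$r_2 = \frac{(1-2)(3-2) + (2-2)(1-2) + \dots + (1-2)(3-2)}{(1-2)^2 + (2-2)^2 + \dots + (1-2)^2} = -0.67$
$r_3 = \frac{(1-2)(1-2) + (2-2)(2-2) + (3-2)(3-2)}{(1-2)^2 + (2-2)^2 + (3-2)^2} = 1$
The autocorrelation peaks at lag 3, so this series has period 3.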
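A minimal sketch of both periodicity steps follows. It assumes the autocorrelation normalization that reproduces the slide's numbers (both sums over $t = 1..n-k$, deviations from the full-series mean), picks the period as the lag with the largest coefficient, and stores counts in a hypothetical `date → count` dictionary; the pairing of 100, 95 and 120 with specific years is illustrative, since only their sum matters.

```python
from datetime import date, timedelta
from typing import Dict, Optional, Sequence


def autocorrelation(y: Sequence[float], k: int) -> float:
    """Lag-k coefficient normalized as in the slide example: both sums run over
    t = 1..n-k, and deviations are taken from the mean of the whole series."""
    n = len(y)
    mean = sum(y) / n
    num = sum((y[t] - mean) * (y[t + k] - mean) for t in range(n - k))
    den = sum((y[t] - mean) ** 2 for t in range(n - k))
    return num / den


def detect_period(y: Sequence[float], max_lag: int) -> int:
    """Return the lag in 1..max_lag with the highest autocorrelation."""
    return max(range(1, max_lag + 1), key=lambda k: autocorrelation(y, k))


def periodic_prediction(
    counts: Dict[date, float], target: date, m: int, period_days: Optional[int] = None
) -> float:
    """Average the counts observed 1..m periods before the target day t0+1.
    period_days=None means T_q = 1 year (same calendar day in earlier years)."""
    if period_days is None:
        past = [target.replace(year=target.year - j) for j in range(1, m + 1)]
    else:
        past = [target - timedelta(days=j * period_days) for j in range(1, m + 1)]
    return sum(counts[d] for d in past) / m


series = [1, 2, 3, 1, 2, 3]
print([round(autocorrelation(series, k), 2) for k in (1, 2, 3)])  # [-0.33, -0.67, 1.0]
print(detect_period(series, max_lag=3))  # 3

counts = {date(2004, 7, 17): 100, date(2003, 7, 17): 95, date(2002, 7, 17): 120}
print(periodic_prediction(counts, date(2005, 7, 17), m=3))  # 105.0
```

The more common textbook estimator divides by the sum of squares over the whole series, which would give $r_3 = 0.5$ here instead of 1; the lag ranking, and hence the detected period, is the same in this example. The yearly branch raises `ValueError` for a February 29 target, which a real implementation would need to handle.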
Personalized QAC
Score user-specific query similarity using a combination of two similarity scores, $Score(Q_s, q_c)$ and $Score(Q_u, q_c)$, where $Q_s$ is the set of recent queries in the current search session and $Q_u$ is the set of queries the same user issued before:
Personalized score $= \omega \cdot Score(Q_s, q_c) + (1-\omega) \cdot Score(Q_u, q_c)$
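Personalized QAC
Each session query $q_s$ is weighted by $w_s = f^{TD(s)-1}$, where f is the decay factor and TD(s) is the interval between $q_c$ and $q_s$. The similarity between candidate $q_c$ and $q_s$ is
$p(q_c \mid q_s) = \prod_{w_{ci} \in q_c} p(w_{ci} \mid q_s)^{N(w_{ci}, q_c)/|q_c|} = \prod_{w_{ci} \in q_c} p(w_{ci} \mid W(w_{ci}))^{N(w_{ci}, q_c)/|q_c|}$
where $W(w_{ci}) = \{w : w \in q_s,\ w[0] = w_{ci}[0]\}$ is the set of terms in $q_s$ that start with the same character as $w_{ci}$, $N(w_*, q_*)$ is the frequency of term $w_*$ in $q_*$, and $|q_c|$ is the number of terms in $q_c$:
$p(w_{ci} \mid W(w_{ci})) \equiv Similarity(w_{ci}, W(w_{ci})) = \frac{1}{|W(w_{ci})|} \sum_{w_j \in W(w_{ci})} Similarity(w_{ci}, w_j) = \frac{1}{|W(w_{ci})|} \sum_{w_j \in W(w_{ci})} \frac{len(common(w_{ci}, w_j))}{\min(len(w_{ci}), len(w_j))}$
Example: $q_c$ = "harry potter" with terms {harry, potter}; $q_s$ = "harry potter hogwarts" with terms {harry, potter, hogwarts}; W(harry) = {harry, hogwarts}, W(potter) = {potter}. Here len(common(harry, hogwarts)) = 2 (the shared substring "ar").
$p(harry \mid W(harry)) = \frac{1}{2} \times \left(\frac{5}{5} + \frac{2}{5}\right) = 0.7$
$p(potter \mid W(potter)) = \frac{1}{1} \times \frac{6}{6} = 1$
$p(harry\ potter \mid harry\ potter\ hogwarts) = 0.7^{1/2} \times 1^{1/2} \approx 0.84$
For the user's historical queries $Q_u$, the weight $w_u$ depends on the query count.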
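The term-level similarity can be sketched as follows. Assumptions, all labeled in the code: `common` is taken to be the longest common substring (this reproduces the 2/5 for harry vs. hogwarts), an empty $W(w_{ci})$ scores 0, and $Score(Q_s, q_c)$ is assumed to be the decay-weighted average of $p(q_c \mid q_s)$ over the session. All function names are hypothetical.

```python
from collections import Counter
from typing import List, Tuple


def lcs_substring_len(a: str, b: str) -> int:
    """Length of the longest common substring (dynamic programming)."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


def term_prob(term: str, qs_terms: List[str]) -> float:
    """p(w_ci | W(w_ci)): mean similarity to the q_s terms sharing term's first character."""
    group = {w for w in qs_terms if w[0] == term[0]}  # W(w_ci)
    if not group:
        return 0.0  # assumption: the slides do not define the empty-W case
    return sum(lcs_substring_len(term, w) / min(len(term), len(w)) for w in group) / len(group)


def query_prob(q_c: str, q_s: str) -> float:
    """p(q_c | q_s), with exponent N(w_ci, q_c) / |q_c| as in the worked example."""
    c_terms, s_terms = q_c.split(), q_s.split()
    p = 1.0
    for term, freq in Counter(c_terms).items():
        p *= term_prob(term, s_terms) ** (freq / len(c_terms))
    return p


def session_score(q_c: str, session: List[Tuple[str, int]], f: float) -> float:
    """Score(Q_s, q_c) for (q_s, TD(s)) pairs. Assumption: decay-weighted average
    of p(q_c | q_s) with w_s = f ** (TD(s) - 1) normalized to sum to 1."""
    raw = [f ** (td - 1) for _, td in session]
    total = sum(raw)
    return sum(w / total * query_prob(q_c, q_s) for w, (q_s, _) in zip(raw, session))


def personalized_score(s_session: float, s_user: float, omega: float) -> float:
    """omega * Score(Q_s, q_c) + (1 - omega) * Score(Q_u, q_c)."""
    return omega * s_session + (1 - omega) * s_user


print(round(term_prob("harry", "harry potter hogwarts".split()), 2))  # 0.7
print(round(query_prob("harry potter", "harry potter hogwarts"), 2))  # 0.84
```

The first-character filter keeps $W(w_{ci})$ small, so each candidate term is compared only with plausibly related session terms rather than with every term the user typed.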
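Hybrid QAC
Rerank QAC candidates by combining time-sensitive QAC with personalized QAC:
Hybrid score $= \gamma \cdot TSscore(q_c) + (1-\gamma) \cdot Pscore(q_c)$
Standardize TSscore and Pscore before combining them:
$TSscore(q_c) \leftarrow \frac{\hat{y}_{t_0+1}(q,\lambda) - \mu_T}{\sigma_T}$, $\quad Pscore(q_c) \leftarrow \frac{Pscore(q_c) - \mu_P}{\sigma_P}$
where $\mu_T, \sigma_T$ and $\mu_P, \sigma_P$ are the means and standard deviations of the time-sensitive and personalized scores.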
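The standardization matters because the two scores live on very different scales: a predicted popularity is a count in the hundreds, while a similarity lies between 0 and 1. A minimal sketch, assuming μ and σ are computed over the candidate list with the population standard deviation (`statistics.pstdev`); the function names are hypothetical.

```python
from statistics import mean, pstdev
from typing import Dict


def standardize(scores: Dict[str, float]) -> Dict[str, float]:
    """z-score each candidate's score: (x - mu) / sigma over the candidate list."""
    values = list(scores.values())
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return {q: 0.0 for q in scores}  # all scores equal: no spread to standardize
    return {q: (x - mu) / sigma for q, x in scores.items()}


def hybrid_scores(ts: Dict[str, float], p: Dict[str, float], gamma: float) -> Dict[str, float]:
    """gamma * TSscore(q_c) + (1 - gamma) * Pscore(q_c), both standardized first."""
    ts_z, p_z = standardize(ts), standardize(p)
    return {q: gamma * ts_z[q] + (1 - gamma) * p_z[q] for q in ts}


ts = {"harry potter": 120.0, "harry winston": 80.0}
p = {"harry potter": 0.2, "harry winston": 0.9}
print(hybrid_scores(ts, p, gamma=0.75))  # harry potter ≈ 0.5, harry winston ≈ -0.5
```

Without the z-scores, γ would have to absorb the scale difference, and almost any γ > 0 would let the popularity count dominate the ranking.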
Outline Introduction Method Experiment Conclusion
Experiment
Datasets:
AOL: sampled between 2006/03/01 and 2006/05/31
Sound and Vision: sampled between 2013/01/01 and 2013/12/31
Experiment
Evaluation metrics
Forecast accuracy: Mean Absolute Error (MAE) and Symmetric Mean Absolute Percentage Error (SMAPE)
Effectiveness of QAC rankings: Mean Reciprocal Rank (MRR)
$y_i$: the true value; $\hat{y}_i$: the predicted value
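The three metrics can be sketched as below. MAE and MRR follow their standard definitions; SMAPE has several variants in the literature, and the one shown (bounded in [0, 1]) is an assumption rather than necessarily the variant used in these experiments. Function names are hypothetical.

```python
from typing import Sequence


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean Absolute Error: (1/n) * sum |y_i - y_hat_i|."""
    return sum(abs(y - yh) for y, yh in zip(y_true, y_pred)) / len(y_true)


def smape(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """One common SMAPE variant, (1/n) * sum |y_hat_i - y_i| / (|y_i| + |y_hat_i|),
    bounded in [0, 1]. Assumption: the experiments may use a different variant."""
    total = 0.0
    for y, yh in zip(y_true, y_pred):
        denom = abs(y) + abs(yh)
        total += abs(yh - y) / denom if denom else 0.0  # both zero: count as no error
    return total / len(y_true)


def mrr(rankings: Sequence[Sequence[str]], targets: Sequence[str]) -> float:
    """Mean Reciprocal Rank: average of 1/rank of the query the user finally
    submitted in each completion list (0 when it is not in the list)."""
    total = 0.0
    for ranking, target in zip(rankings, targets):
        if target in ranking:
            total += 1.0 / (list(ranking).index(target) + 1)
    return total / len(targets)


print(mae([100, 95], [90, 100]))  # 7.5
print(smape([100], [300]))  # 0.5
print(mrr([["harry potter", "harry style"], ["harry winston"]], ["harry style", "harry potter"]))  # 0.25
```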
Experiment Query popularity prediction evaluation
Experiment
Impact of trade-off parameter (figure; values marked: 0.62, 0.83)
Experiment Performance of TS-QAC rankings Performance of Hybrid QAC rankings
Outline Introduction Method Experiment Conclusion
Conclusion
The paper combines two aspects of the QAC problem: time sensitivity and personalization.
It uses time-series analysis to predict a query's future frequency from its recent trend and periodicity.
It extends this time-sensitive QAC method with personalization.
The best performer, λ*-H-QAC, shows significant improvements over various time-sensitive QAC baselines.
Future work: use parallel processing to improve the method's efficiency, and evaluate the QAC rankings with other metrics.