ACM CIKM 2008, Oct. 26-30, Napa Valley. Mining Term Association Patterns from Search Logs for Effective Query Reformulation. Xuanhui Wang and ChengXiang Zhai.

Presentation transcript:

ACM CIKM 2008, Oct. 26-30, Napa Valley
Slide 1: Mining Term Association Patterns from Search Logs for Effective Query Reformulation
Xuanhui Wang and ChengXiang Zhai
Department of Computer Science, University of Illinois at Urbana-Champaign

Slide 2: Ineffective Queries
reduce space command latex

Slide 3: Effective Queries
squeeze space command latex

Slide 4: More Examples
If you want to wash your vehicle
–vehicle wash, auto wash
–car wash, truck wash
If you want to buy a car
–auto quotes
–auto sale quotes?
–auto insurance quotes?

Slide 5: What Makes a Query Ineffective?
Vocabulary mismatch
–reduce space command latex vs. squeeze space command latex
–auto wash vs. car wash
Lack of discrimination
–auto quotes vs. auto sale quotes
–…
How can we help improve ineffective queries?
–Term substitution
–Term addition

Slide 6: Our Contribution
We cast query reformulation as term-level pattern mining from search logs
We define two basic types of patterns at the term level and propose probabilistic methods
–Context-sensitive term substitution: auto → car | _wash, car → auto | _trade
–Context-sensitive term addition: +sale | auto_quotes
We evaluate our methods on commercial search engine logs and show their effectiveness

Slide 7: Problem Formulation
Diagram, divided into an offline part and an online part:
–Inputs: search logs, query collection
–Task 1: Contextual Models
–Task 2: Translation Models
–Task 3: Pattern Mining
–Example query: q = auto wash
–Patterns: auto → car | _wash; auto → truck | _wash; +southland | _auto wash; …
–Reformulated queries: car wash, truck wash, southland auto wash, …

Slide 8: Task 1: Contextual Models
Syntagmatic relations
Capture terms that frequently co-occur with w inside queries
Sample query collection: enterprise car rental; rental car; budget car rental; car pricing; car pictures; car accidents; …
G: General context
Model P_G(* | car), over terms such as rental, enterprise, budget, pricing, …

Slide 9: Task 1: Contextual Models
Syntagmatic relations
Capture terms that frequently co-occur with w inside queries
Sample query collection: enterprise car rental; rental car; budget car rental; car pricing; car pictures; car accidents; …
L1: 1st left context
Model P_L1(* | car), over terms such as rental, enterprise, budget, …

Slide 10: Task 1: Contextual Models
Syntagmatic relations
Capture terms that frequently co-occur with w inside queries
Sample query collection: enterprise car rental; rental car; budget car rental; car pricing; car pictures; car accidents; …
R1: 1st right context
Model P_R1(* | w): rental: 0.4, pricing: 0.2, pictures: 0.2, accidents: 0.2, …
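
The three contextual models on slides 8-10 can be illustrated with a short counting sketch. This is not the authors' implementation: it uses plain relative frequencies with no smoothing, and the split of the slide's sample text into six queries is an assumption. The names `SAMPLE_QUERIES`, `normalize`, and `contextual_models` are hypothetical.

```python
from collections import Counter

# Sample query collection as read off the slide (segmentation is an assumption).
SAMPLE_QUERIES = [
    "enterprise car rental",
    "rental car",
    "budget car rental",
    "car pricing",
    "car pictures",
    "car accidents",
]


def normalize(counts):
    """Turn raw counts into a probability distribution (empty if no counts)."""
    total = sum(counts.values())
    return {term: n / total for term, n in counts.items()} if total else {}


def contextual_models(queries, w):
    """Return (P_G, P_L1, P_R1) for word w as unsmoothed relative frequencies."""
    general, left1, right1 = Counter(), Counter(), Counter()
    for query in queries:
        terms = query.split()
        for i, term in enumerate(terms):
            if term != w:
                continue
            # General context: every other term in the same query.
            general.update(terms[:i] + terms[i + 1:])
            # L1 / R1: the term immediately to the left / right of w.
            if i > 0:
                left1[terms[i - 1]] += 1
            if i + 1 < len(terms):
                right1[terms[i + 1]] += 1
    return normalize(general), normalize(left1), normalize(right1)


p_g, p_l1, p_r1 = contextual_models(SAMPLE_QUERIES, "car")
print(p_r1)
```

On this sample the right-context model for "car" comes out as rental 0.4 and pricing, pictures, accidents 0.2 each, which agrees with the values shown on slide 10.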

Slide 11: Task 2: Translation Models
Paradigmatic relations (e.g., car and auto)
Capture terms that are substitutable with w
Similar contexts → high translation probability
Translation models: the probability of generating s's context from w's contextual model
Formula annotations: size of L1 context, size of R1 context
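
The formula on slide 11 did not survive in the transcript, so the following is only a sketch of the stated idea (similar contexts give a high translation probability), not the authors' model. It assumes a score of exp(-KL divergence) between the smoothed L1 and R1 context distributions of s and w, combined with weights proportional to the sizes of w's L1 and R1 contexts. The smoothing constant `mu`, the toy log `LOG`, and all function names are hypothetical.

```python
import math
from collections import Counter

# Toy query log (hypothetical data, not from the paper).
LOG = [
    "car wash", "auto wash", "car rental", "auto rental",
    "cheap car", "cheap auto", "pizza delivery", "hot pizza",
]


def context_counts(queries, w):
    """Count the terms immediately left and right of w across all queries."""
    left, right = Counter(), Counter()
    for query in queries:
        terms = query.split()
        for i, term in enumerate(terms):
            if term == w:
                if i > 0:
                    left[terms[i - 1]] += 1
                if i + 1 < len(terms):
                    right[terms[i + 1]] += 1
    return left, right


def smoothed(counts, vocab, mu):
    """Additive smoothing so every vocabulary term has non-zero probability."""
    total = sum(counts.values()) + mu * len(vocab)
    return {v: (counts[v] + mu) / total for v in vocab}


def kl(p, q):
    """KL divergence D(p || q) over a shared, fully smoothed vocabulary."""
    return sum(p[v] * math.log(p[v] / q[v]) for v in p)


def translation_scores(queries, w, candidates, mu=0.1):
    """Return normalized scores t(s|w) for each candidate s (assumed form)."""
    vocab = sorted({t for q in queries for t in q.split()})
    w_left, w_right = context_counts(queries, w)
    n_left, n_right = sum(w_left.values()), sum(w_right.values())
    # Weight each side by the size of w's context on that side.
    weight_left = n_left / (n_left + n_right) if n_left + n_right else 0.5
    scores = {}
    for s in candidates:
        s_left, s_right = context_counts(queries, s)
        d_left = kl(smoothed(s_left, vocab, mu), smoothed(w_left, vocab, mu))
        d_right = kl(smoothed(s_right, vocab, mu), smoothed(w_right, vocab, mu))
        scores[s] = math.exp(-(weight_left * d_left + (1 - weight_left) * d_right))
    z = sum(scores.values())
    return {s: v / z for s, v in scores.items()} if z else {}


scores = translation_scores(LOG, "car", ["auto", "pizza"])
print(scores)
```

In the toy log "auto" appears in exactly the same contexts as "car", so it receives a higher score than "pizza".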

Slide 12: Task 3.1: Pattern Mining – Term Substitution
q = [w_1 … w_{i-1} w_i w_{i+1} … w_n] → q' = [w_1 … w_{i-1} s w_{i+1} … w_n]
Substitute w_i by s
Which word s should be chosen?
–Local factor
–Global factor: translation model

Slide 13: Estimating Local Factor
Context of the candidate s: w_1 … w_{i-1} _ w_{i+1} … w_n
Independence
Ignore those terms far away
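
A minimal sketch of how the two factors from slides 12-13 might be combined when ranking candidates for one position of a query. It assumes the score is the global factor t(s|w_i) times a local factor that multiplies positional context probabilities independently and ignores terms beyond a small window. The translation values, the toy log, the smoothing constant `mu`, and the function names are all hypothetical, and the exact estimators in the paper may differ.

```python
from collections import Counter

# Toy query log and hypothetical translation probabilities t(s | "auto").
LOG = ["car wash", "car wash", "truck wash", "auto trade",
       "auto insurance", "car rental"]
TRANSLATION_FROM_AUTO = {"car": 0.5, "truck": 0.3, "vehicle": 0.2}


def positional_prob(queries, s, offset, c, vocab_size, mu=0.01):
    """Smoothed probability that term c occurs `offset` positions away from s."""
    counts = Counter()
    for query in queries:
        terms = query.split()
        for i, term in enumerate(terms):
            j = i + offset
            if term == s and 0 <= j < len(terms):
                counts[terms[j]] += 1
    return (counts[c] + mu) / (sum(counts.values()) + mu * vocab_size)


def rank_substitutions(query, i, translation, queries, window=2):
    """Rank replacements for the term at position i of `query`.

    score(s) = t(s | w_i) * product of positional context probabilities,
    using only context terms within `window` positions of i.
    """
    terms = query.split()
    vocab_size = len({t for q in queries for t in q.split()})
    scored = []
    for s, t_sw in translation.items():
        local = 1.0
        for offset in range(-window, window + 1):
            j = i + offset
            if offset == 0 or not 0 <= j < len(terms):
                continue
            local *= positional_prob(queries, s, offset, terms[j], vocab_size)
        new_query = " ".join(terms[:i] + [s] + terms[i + 1:])
        scored.append((t_sw * local, new_query))
    return sorted(scored, reverse=True)


ranked = rank_substitutions("auto wash", 0, TRANSLATION_FROM_AUTO, LOG)
for score, reformulated in ranked:
    print(f"{score:.4f}  {reformulated}")
```

The `window` parameter is one way to express "ignore those terms far away"; with the toy data the context term "wash" favors "car" and "truck" over "vehicle", which never precedes "wash" in the log.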

Slide 14: Task 3.2: Pattern Mining – Term Addition
q = [w_1 … w_{i-1} w_i … w_n] → q' = [w_1 … w_{i-1} r w_i … w_n]
Adding r before w_i
Similar to the local factor in term substitution patterns
Uniform
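
Term addition can be sketched the same way. The version below assumes a uniform prior over candidate terms, so candidates are ranked purely by a local factor built from the immediate left and right neighbors of the insertion point. The toy log, the smoothing constant `mu`, and the function names are hypothetical, and restricting the context to one neighbor on each side is a simplification.

```python
from collections import Counter, defaultdict

# Toy query log (hypothetical data, not from the paper).
LOG = [
    "auto sale quotes", "auto sale quotes", "auto insurance quotes",
    "auto sale", "home insurance quotes", "health insurance", "car sale",
]


def neighbor_models(queries):
    """For each term, count its immediate left and right neighbors."""
    left, right = defaultdict(Counter), defaultdict(Counter)
    for query in queries:
        terms = query.split()
        for a, b in zip(terms, terms[1:]):
            right[a][b] += 1  # b appears right after a
            left[b][a] += 1   # a appears right before b
    return left, right


def rank_additions(query, i, queries, mu=0.01):
    """Rank terms r to insert before position i (i == len(terms) appends)."""
    terms = query.split()
    left, right = neighbor_models(queries)
    vocab = {t for q in queries for t in q.split()}

    def prob(counter, c):
        # Additively smoothed neighbor probability.
        return (counter[c] + mu) / (sum(counter.values()) + mu * len(vocab))

    scored = []
    for r in vocab - set(terms):
        score = 1.0  # uniform prior: only the local factor matters
        if i > 0:
            score *= prob(left[r], terms[i - 1])
        if i < len(terms):
            score *= prob(right[r], terms[i])
        scored.append((score, " ".join(terms[:i] + [r] + terms[i:])))
    return sorted(scored, reverse=True)


additions = rank_additions("auto quotes", 1, LOG)
print(additions[0][1])
```

With this toy log the top refinement of "auto quotes" is "auto sale quotes", which mirrors the +sale | auto_quotes pattern from the slides.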

Slide 15: Evaluation: Data Preparation
From Microsoft Live Labs
Timeline marks: 5/1/2006, 5/20/2006, 5/31/2006
History logs (history collection): 4.4M queries, 1.6M are distinct, 1.3M user sessions
Future logs: used to construct test cases

Slide 16: Examples of Contextual Models
Left and right contexts are different
The general context mixes them together

Slide 17: Examples of Translation Models
Conceptually similar keywords have high translation probabilities
They make interactive exploratory search possible

Slide 18: Examples of Term Substitution
Substitution is context sensitive
Intuitively, reworded queries are more effective

Slide 19: Effectiveness Comparison of Term Substitution – Experiment Design
Diagram: a session with queries Q_1, Q_2, …, Q_k; results R_21, R_22, R_23, … through R_k1, R_k2, R_k3, …; and C_1, C_2, C_3
How well can a reformulated query rank C_1, C_2, and C_3 on the top?
Q_1 is reformulated into candidate queries, each shown with a ranked result list:
–dx C_3 C_1 C_2 dx …
–dx C_1 dx dx dx …
–dx C_2 dx C_3 dx …
One of the reformulations is labeled Best

Slide 20: Results
Our method reformulates queries more effectively
Figure: [Jones06] compared with our method; axis label: #Recommended Queries

Slide 21: Term Addition Patterns
Term addition patterns can refine a broad query

Slide 22: Related Work
Query suggestions [e.g., Jones06, Sahami et al06]
–Discover patterns at the query level
–Rely on external resources or training data
–Do not consider effectiveness
Query modifications in IR [Rocchio71, Anick03]
–Expand queries from returned documents
–Do not rely on search logs, mostly adding terms
Related work in the NLP community [Lin98, Rapp02]
–Finding synonyms or near synonyms
–Syntagmatic and paradigmatic relations
–Not used for query reformulation

Slide 23: Conclusions and Future Work
We propose a new way to mine search logs for patterns to address ineffective queries
–Vocabulary mismatch
–Lack of discrimination
We define and mine two basic patterns at the term level
–Context-sensitive term substitution patterns
–Context-sensitive term addition patterns
Experiments show the effectiveness of our methods
In the future,
–Use relevance judgments instead of clicks
–Exploit click information for better query reformulation

Slide 24: Thank You!

Slide 25: Offline Efficiency
Linear scalability with data size
More data

Slide 26: Enhancement by User Sessions
Improve translation models by user sessions
–t(express|idol) is very high
–american express and american idol are frequent
Method: Normalized Mutual Information, top-N thresholding
Example: w = idol, t(idols|idol) = 1

Slide 27: Formal Definitions
A query is a sequence of keywords
–q = [w_1 w_2 … w_n]
Context-sensitive term substitution
–[w → w' | c_L _ c_R]
Context-sensitive term addition
–[+w | c_L _ c_R]
Query rewording: replace a word w_i by s
–q = [w_1 … w_{i-1} w_i w_{i+1} … w_n] → q' = [w_1 … w_{i-1} s w_{i+1} … w_n]
Query refinement: add a new word r
–q = [w_1 … w_i w_{i+1} … w_n] → q' = [w_1 … w_i r w_{i+1} … w_n]
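
The formal definitions can be made concrete with a small sketch that represents the two pattern types as data and applies them to a query. This is an illustration only: the class names, field names, and helper functions are hypothetical, and the left and right contexts are modeled as tuples of adjacent terms.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SubstitutionPattern:
    """[w -> s | c_L _ c_R]: replace w by s when the contexts match."""
    w: str
    s: str
    left: tuple = ()
    right: tuple = ()


@dataclass(frozen=True)
class AdditionPattern:
    """[+r | c_L _ c_R]: insert r between the left and right contexts."""
    r: str
    left: tuple = ()
    right: tuple = ()


def context_matches(terms, gap_start, gap_end, left, right):
    """True if `left` ends exactly at gap_start and `right` starts at gap_end."""
    if gap_start < len(left):
        return False
    return (tuple(terms[gap_start - len(left):gap_start]) == left
            and tuple(terms[gap_end:gap_end + len(right)]) == right)


def apply_substitution(query, pattern):
    """Query rewording: return every query obtained by one matching replacement."""
    terms = query.split()
    return [
        " ".join(terms[:i] + [pattern.s] + terms[i + 1:])
        for i, term in enumerate(terms)
        if term == pattern.w
        and context_matches(terms, i, i + 1, pattern.left, pattern.right)
    ]


def apply_addition(query, pattern):
    """Query refinement: return every query obtained by one matching insertion."""
    terms = query.split()
    return [
        " ".join(terms[:i] + [pattern.r] + terms[i:])
        for i in range(len(terms) + 1)
        if context_matches(terms, i, i, pattern.left, pattern.right)
    ]


auto_to_car = SubstitutionPattern("auto", "car", right=("wash",))
add_southland = AdditionPattern("southland", right=("auto", "wash"))
print(apply_substitution("auto wash", auto_to_car))
print(apply_addition("auto wash", add_southland))
```

Applied to q = auto wash, the two example patterns yield "car wash" and "southland auto wash", the same reformulations listed on the problem formulation slide. The substitution pattern does not fire on "auto trade" because its right context is not "wash".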