
1 Automatic Identification of User Goals in Web Search
Uichin Lee, Zhenyu Liu, Junghoo Cho
Computer Science Department, UCLA
{uclee, vicliu,

2 Motivation
Users have different goals for Web search
– Reach the homepage of an organization (e.g., UCLA)
– Learn about a topic (e.g., simulated annealing)
– Download online music, etc.
Can we identify the user goal for a Web search automatically?
– For example, to improve and customize search results based on the identified goal

3 Two high-level user goals
Navigational query
– Reach a Web site the user already has in mind (e.g., "UCLA Library")
Informational query
– Visit multiple sites to learn about a particular topic (e.g., "Simulated Annealing")
Based on [Broder02, Rose&Levinson04]
– Navigational and informational goals are common to both studies

4 Exploiting identified user goals
Tailored weighting/ranking mechanisms
– Navigational queries: emphasize anchor text [Craswell01, Kang03] and URL path [Westerveld01]
– Informational queries: emphasize page content [Kang03] and IR techniques (query expansion, relevance feedback, pseudo-relevance feedback, etc.)
Tailored result presentation
– Informational queries: clustered search results [Etzioni99, Zeng04, Kummamuru04]
Targeted ads / answers

5 Outline
Are query goals predictable?
– Human-subject study
How can we predict user goals automatically?
– Anchor-link distribution
– User-click distribution
How effective are our features?
– Experimental evaluation

6 Are query goals "predictable"?
Search engines "see" only a few keywords
– No explicit indication of goals by users
– Can we predict the user goal simply from the keywords?
Human subject study
– 50 most popular Google queries from UCLA CS
– 28 participants (grad students) from UCLA CS
– Subjects indicated the likely goal of each query, assuming they had issued it themselves
Do most subjects agree on a particular goal?

7 Human subject study results
i(q): the percentage of participants who judged query q as informational
– e.g., i(q) is low for a navigational query such as "UCLA Library"
[Histogram of i(q) over the 50 queries: queries with i(q) near 0 or 1 have a predictable goal]

8 Human subject study results (cont'd)
[Histogram of i(q), highlighting "ambiguous queries" with intermediate i(q) values]
– 43.5% of the ambiguous queries are software names
– 30.4% are person names

9 Human subject study results (cont'd)
[Histogram of i(q) after removing software and person-name queries: the remaining queries have i(q) close to 0 or 1]
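The slides derive i(q) from participant judgments. The following minimal sketch (not from the paper; the data, bucketing thresholds, and names are hypothetical) shows one way to compute i(q) and separate predictable from ambiguous queries:

```python
from collections import defaultdict

def informational_fraction(judgments):
    """judgments: list of (query, is_informational) pairs, one per participant.
    Returns i(q), the fraction of participants judging each query informational."""
    counts = defaultdict(lambda: [0, 0])  # query -> [informational count, total]
    for query, is_informational in judgments:
        counts[query][0] += int(is_informational)
        counts[query][1] += 1
    return {q: inf / total for q, (inf, total) in counts.items()}

# Hypothetical judgments from a few participants.
judgments = [
    ("UCLA Library", False), ("UCLA Library", False), ("UCLA Library", False),
    ("Simulated Annealing", True), ("Simulated Annealing", True),
    ("Kazaa", True), ("Kazaa", False),  # split judgments: an ambiguous query
]

for q, frac in informational_fraction(judgments).items():
    # Assumed bucketing: i(q) near 0.5 is ambiguous; near 0 is navigational,
    # near 1 is informational (the exact cutoffs are not from the slides).
    if 0.3 < frac < 0.7:
        label = "ambiguous"
    else:
        label = "informational" if frac >= 0.7 else "navigational"
    print(f"{q}: i(q)={frac:.2f} ({label})")
```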

10 Human subject study: summary
The majority of queries have predictable goals
Interestingly, most ambiguous queries fall into a small set of topics (software names, person names)
– Topic-based ambiguity detection may be possible
– Ambiguous queries can then be treated differently from others

11 Outline
Are query goals predictable?
– Human-subject study
How can we predict user goals automatically?
How effective are our features?
– Experimental evaluation

12 How to predict the user goal?
"UCLA Library" vs. "Simulated Annealing"
– Navigational vs. informational
– Is semantic analysis necessary?
Our idea: use information provided implicitly by Web users
– Web-link structure
– User-click behavior

13 Web-link structure
Use the anchor-link distribution to quantify the link structure
[Diagram: pages whose anchor text is "UCLA Library" linking to destinations such as library.html and repositories.cdlib.org/uclalib/]

14 Web-link structure (cont'd)
[Figure: anchor-link distribution for the query "UCLA Library", over destinations such as library.html and repositories.cdlib.org/uclalib/]
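As a concrete illustration, here is a minimal sketch (assumed, not from the paper) of how an anchor-link distribution could be built from crawled anchor data: collect all anchors whose text matches the query, rank destinations by how many anchors point to them, and normalize the counts. The click distribution on the later slides can be built the same way, with clicked search results in place of anchor destinations. The data below is hypothetical.

```python
from collections import Counter

def anchor_link_distribution(anchors, query):
    """anchors: list of (anchor_text, destination_url) pairs from a crawl.
    Returns the fraction of matching anchors pointing to each destination,
    ordered from most- to least-linked (rank 1, 2, 3, ...)."""
    dests = Counter(url for text, url in anchors if text.lower() == query.lower())
    total = sum(dests.values())
    return [count / total for url, count in dests.most_common()]

# Hypothetical anchors: a navigational query concentrates on one destination.
anchors = [("UCLA Library", "library.html")] * 90 + \
          [("UCLA Library", "repositories.cdlib.org/uclalib/")] * 10

print(anchor_link_distribution(anchors, "UCLA Library"))
# [0.9, 0.1] -- heavily skewed toward the top destination
```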

15 Anchor-link distribution for sample queries
[Figure: Navigational ("UCLA Library") vs. Informational ("Simulated Annealing") anchor-link distributions]

16 User-click behavior
Use the click distribution to quantify past user-click behavior
[Figure: click distribution for the navigational query "UCLA Library"]

17 User-click behavior (cont'd)
[Figure: Navigational ("UCLA Library") vs. Informational ("Simulated Annealing") click distributions]

18 Capturing the "shape" of distributions
Possible numeric features for a distribution f(x):
– Mean (μ)
– Median
– Skewness: ∫ (x − μ)³ f(x) dx / σ³ (how "asymmetric" f(x) is)
– Kurtosis: ∫ (x − μ)⁴ f(x) dx / σ⁴ (how "peaked" f(x) is)
Single linear regression
– The median is the most effective measurement for both the anchor-link distribution and the click distribution
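A small sketch of how these shape statistics could be computed for a discrete distribution over ranks. This is an assumed illustration with hypothetical data; the paper may, for instance, use an interpolated median rather than the simple cumulative one below.

```python
import math

def shape_features(dist):
    """dist: normalized distribution over ranks 1..n (fractions summing to 1).
    Returns (mean, median, skewness, kurtosis) of the rank variable."""
    xs = range(1, len(dist) + 1)
    mean = sum(x * p for x, p in zip(xs, dist))
    sigma = math.sqrt(sum((x - mean) ** 2 * p for x, p in zip(xs, dist)))
    # Median: first rank where the cumulative probability reaches 0.5.
    cum, median = 0.0, len(dist)
    for x, p in zip(xs, dist):
        cum += p
        if cum >= 0.5:
            median = x
            break
    skew = sum((x - mean) ** 3 * p for x, p in zip(xs, dist)) / sigma ** 3
    kurt = sum((x - mean) ** 4 * p for x, p in zip(xs, dist)) / sigma ** 4
    return mean, median, skew, kurt

# A navigational-looking distribution: most of the mass sits at rank 1.
print(shape_features([0.9, 0.05, 0.03, 0.02]))
```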

19 Evaluation of features
Based on 30 queries from the human subject study
– Excluding software and person-name queries
– Each query is associated with a distinct user goal
Anchor-link distribution for each query
– Based on 60M pages crawled from the Web
Click distribution for each query
– Based on Google-result click behavior from UCLA CS during April–September 2004

20 Goal-prediction graph (synthetic)
[Figure: a hypothetically effective feature cleanly separates navigational from informational queries at a threshold θ]

21 Prediction graph: median of anchor-link dist.
Classify as navigational iff median < θ₁ = 1.0
– Navigational queries: the vast majority of links point to the #1 anchor destination
Prediction accuracy: 80.0%
[Figure: queries plotted by the median of the anchor-link distribution, navigational vs. informational, threshold θ₁ = 1.0]

22 Prediction graph: combining the two features
Linear combination with equal weights:
– Classify as navigational iff (median of click dist.) + (median of anchor-link dist.) < θ₁ + θ₂ (= 2.0)
Prediction accuracy: 90%
[Figure: queries plotted by the two medians, navigational vs. informational, separated by the line θ₁ + θ₂ = 2.0]
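The decision rules on slides 21 and 22 reduce to simple threshold tests. This sketch restates them in code; the thresholds and accuracies come from the slides, while the function names and example medians are hypothetical.

```python
THETA_1 = 1.0  # threshold on the median of the anchor-link distribution (slide 21)
THETA_2 = 1.0  # threshold on the median of the click distribution (slide 22)

def is_navigational_anchor_only(anchor_median):
    """Single-feature rule (slide 21): ~80% accuracy on the benchmark set."""
    return anchor_median < THETA_1

def is_navigational_combined(anchor_median, click_median):
    """Equal-weight combination (slide 22): ~90% accuracy on the benchmark set."""
    return anchor_median + click_median < THETA_1 + THETA_2

# Hypothetical medians: a navigational query concentrates both
# distributions near the top-ranked destination/result.
print(is_navigational_combined(anchor_median=0.8, click_median=0.9))  # True
print(is_navigational_combined(anchor_median=2.5, click_median=3.0))  # False
```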

23 Three features in [Kang and Kim 03]
(1) Anchor usage rate
(2) Query term distribution
(3) Term dependence
Comparison with previous work
[Figure: the three features plotted for navigational vs. informational queries]
Result
– Could not reproduce the reported results
– The three features were not very effective

24 Summary
Two effective features for goal identification
– Anchor-link distribution (Web-link structure) and click distribution (user-click behavior)
– Achieved an overall accuracy of 90% on a benchmark query set
More details in the paper

25 Future work
Evaluate on a larger and less biased query set
Handle queries with insufficient anchor/click statistics
– Learn patterns from queries whose goals are clear
Predict search intentions at a finer granularity
– Informational queries can be further classified, e.g., directed, undirected, advice, list, etc. [Rose04]
– Analyze the contents of Web pages that users have clicked/viewed
– Linguistic methods

26 Thank you Any questions?

27 Questionnaire design
1st version: direct classification by subjects
– Navigational vs. informational
– Some confusion
"Alan Kay": home page + other pages
"Have a site in mind?" vs. "plan to visit one site?"
2nd version:
1. Have a site in mind; intend to visit only that site
2. Have a site in mind, but willing to visit others
3. Have no site in mind; willing to visit anything relevant