HUMANE INFORMATION SEEKING: GOING BEYOND THE IR WAY JIN YOUNG IBM RESEARCH 1.

Slides:



Advertisements
Similar presentations
Struggling or Exploring? Disambiguating Long Search Sessions
Advertisements

Haystack: Per-User Information Environment 1999 Conference on Information and Knowledge Management Eytan Adar et al Presented by Xiao Hu CS491CXZ.
1 Evaluation Rong Jin. 2 Evaluation  Evaluation is key to building effective and efficient search engines usually carried out in controlled experiments.
Modern Information Retrieval Chapter 1: Introduction
Sensible Searching: Making Search Engines Work Dr Shaun Ryan CEO S.L.I. Systems
Overview of Collaborative Information Retrieval (CIR) at FIRE 2012 Debasis Ganguly, Johannes Leveling, Gareth Jones School of Computing, CNGL, Dublin City.
Recommender Systems – An Introduction Dietmar Jannach, Markus Zanker, Alexander Felfernig, Gerhard Friedrich Cambridge University Press Which digital.
1 Learning User Interaction Models for Predicting Web Search Result Preferences Eugene Agichtein Eric Brill Susan Dumais Robert Ragno Microsoft Research.
Web- and Multimedia-based Information Systems. Assessment Presentation Programming Assignment.
Evaluating Search Engine
Search Engines and Information Retrieval
Basic IR: Queries Query is statement of user’s information need. Index is designed to map queries to likely to be relevant documents. Query type, content,
Modern Information Retrieval Chapter 1: Introduction
Expertise Networks in Online Communities: Structure and Algorithms Jun Zhang Mark S. Ackerman Lada Adamic University of Michigan WWW 2007, May 8–12, 2007,
Information Retrieval Concerned with the: Representation of Storage of Organization of, and Access to Information items.
WebMiningResearch ASurvey Web Mining Research: A Survey By Raymond Kosala & Hendrik Blockeel, Katholieke Universitat Leuven, July 2000 Presented 4/18/2002.
Web Mining Research: A Survey
1 LM Approaches to Filtering Richard Schwartz, BBN LM/IR ARDA 2002 September 11-12, 2002 UMASS.
UCB CS Research Fair Search Text Mining Web Site Usability Marti Hearst SIMS.
Personalized Ontologies for Web Search and Caching Susan Gauch Information and Telecommunications Technology Center Electrical Engineering and Computer.
Overview of Search Engines
1 ST DATA SCIENCE MEETUP IN SEOUL JINYOUNG KIM & HEEWON JEON 1.
Section 2: Finding and Refinding Jaime Teevan Microsoft Research 1.
Retrieval and Evaluation Techniques for Personal Information Jin Young Kim 7/26 Ph.D Dissertation Seminar.
Retrieval Model and Evaluation Jinyoung Kim UMass Amherst CS646 Lecture 1.
FALL 2012 DSCI5240 Graduate Presentation By Xxxxxxx.
CS598CXZ Course Summary ChengXiang Zhai Department of Computer Science University of Illinois, Urbana-Champaign.
Challenges in Information Retrieval and Language Modeling Michael Shepherd Dalhousie University Halifax, NS Canada.
Personalization in Local Search Personalization of Content Ranking in the Context of Local Search Philip O’Brien, Xiao Luo, Tony Abou-Assaleh, Weizheng.
Enterprise & Intranet Search How Enterprise is different from Web search What to think about when evaluating Enterprise Search How Intranet use is different.
Personalization of the Digital Library Experience: Progress and Prospects Nicholas J. Belkin Rutgers University, USA
Search Engines and Information Retrieval Chapter 1.
1 Applying Collaborative Filtering Techniques to Movie Search for Better Ranking and Browsing Seung-Taek Park and David M. Pennock (ACM SIGKDD 2007)
Improving Web Search Ranking by Incorporating User Behavior Information Eugene Agichtein Eric Brill Susan Dumais Microsoft Research.
Web Search. Structure of the Web n The Web is a complex network (graph) of nodes & links that has the appearance of a self-organizing structure  The.
Personal Information Management Vitor R. Carvalho : Personalized Information Retrieval Carnegie Mellon University February 8 th 2005.
Implicit User Feedback Hongning Wang Explicit relevance feedback 2 Updated query Feedback Judgments: d 1 + d 2 - d 3 + … d k -... Query User judgment.
Personalized Search Xiao Liu
University of Malta CSA3080: Lecture 4 © Chris Staff 1 of 14 CSA3080: Adaptive Hypertext Systems I Dr. Christopher Staff Department.
Collaborative Information Retrieval - Collaborative Filtering systems - Recommender systems - Information Filtering Why do we need CIR? - IR system augmentation.
Search Engine Architecture
Searching the web Enormous amount of information –In 1994, 100 thousand pages indexed –In 1997, 100 million pages indexed –In June, 2000, 500 million pages.
Individualized Knowledge Access David Karger Lynn Andrea Stein Mark Ackerman Ralph Swick.
Personalized Course Navigation Based on Grey Relational Analysis Han-Ming Lee, Chi-Chun Huang, Tzu- Ting Kao (Dept. of Computer Science and Information.
WIRED Week 3 Syllabus Update (next week) Readings Overview - Quick Review of Last Week’s IR Models (if time) - Evaluating IR Systems - Understanding Queries.
Personalization with user’s local data Personalizing Search via Automated Analysis of Interests and Activities 1 Sungjick Lee Department of Electrical.
Chapter 8 Evaluating Search Engine. Evaluation n Evaluation is key to building effective and efficient search engines  Measurement usually carried out.
Adish Singla, Microsoft Bing Ryen W. White, Microsoft Research Jeff Huang, University of Washington.
Search Result Interface Hongning Wang Abstraction of search engine architecture User Ranker Indexer Doc Analyzer Index results Crawler Doc Representation.
Post-Ranking query suggestion by diversifying search Chao Wang.
ASSOCIATIVE BROWSING Evaluating 1 Jinyoung Kim / W. Bruce Croft / David Smith for Personal Information.
Augmenting (personal) IR Readings Review Evaluation Papers returned & discussed Papers and Projects checkin time.
User Guide Enhanced Knowledge Hub. 2 Note Accessing Knowledge Hub 1 2 Access K-Hub by selecting: 1.Knowledge Hub tab, OR 2.Knowledge Hub under My Communities.
A System for Automatic Personalized Tracking of Scientific Literature on the Web Tzachi Perlstein Yael Nir.
Predicting Short-Term Interests Using Activity-Based Search Context CIKM’10 Advisor: Jia Ling, Koh Speaker: Yu Cheng, Hsieh.
Text Information Management ChengXiang Zhai, Tao Tao, Xuehua Shen, Hui Fang, Azadeh Shakery, Jing Jiang.
ASSOCIATIVE BROWSING Evaluating 1 Jin Y. Kim / W. Bruce Croft / David Smith by Simulation.
Navigation Aided Retrieval Shashank Pandit & Christopher Olston Carnegie Mellon & Yahoo.
1 Personalizing Search via Automated Analysis of Interests and Activities Jaime Teevan, MIT Susan T. Dumais, Microsoft Eric Horvitz, Microsoft SIGIR 2005.
Chapter 8: Web Analytics, Web Mining, and Social Analytics
WHIM- Spring ‘10 By:-Enza Desai. What is HCIR? Study of IR techniques that brings human intelligence into search process. Coined by Gary Marchionini.
Recommender Systems & Collaborative Filtering
User Characterization in Search Personalization
Contextual Intelligence as a Driver of Services Innovation
Search Engine Architecture
Federated & Meta Search
Recommender Systems Copyright: Dietmar Jannah, Markus Zanker and Gerhard Friedrich (slides based on their IJCAI talk „Tutorial: Recommender Systems”)
Code search & recommendation engines
Search Engine Architecture
Date: 2012/11/15 Author: Jin Young Kim, Kevyn Collins-Thompson,
Presentation transcript:

HUMANE INFORMATION SEEKING: GOING BEYOND THE IR WAY JIN YOUNG IBM RESEARCH 1

2 You need the freedom of expression. You need someone who understands. Information seeking requires a communication.

3 Information Seeking circa 2012 Search engine accepts keywords only. Search engine doesn’t understand you.

4 Toward Humane Information Seeking Rich User Interactions Rich User Modeling Profile Context Behavior Search Browsing Filtering

5 Challenges in Rich User Interactions Filtering Browsing Search Enabling rich interactions Evaluating complex interactions

6 Challenges in Rich User Modeling Profile Context Behavior Representing the user Estimating the user model

from Query to Session Rich User Modeling HCIR Way: 7 Action Response Action Response Action Response USERSYSTEM Interaction History Filtering / Browsing Relevance Feedback … Filtering Conditions Related Items … User Model Rich User Interaction IR Way: The Providing personalized results vs. rich interactions are complementary, yet both are needed in most scenarios. No real distinction between IR vs. HCI, and IR vs. RecSys Profile Context Behavior

8 Book Search The Rest of Talk… Web Search Personal Search Improving search and browsing for known-item finding Evaluating interactions combining search and browsing User modeling based on reading level and topic Proving non-intrusive recommendations for browsing Analyzing interactions combining search and filtering

P ERSONAL S EARCH Retrieval And Evaluation Techniques for Personal Information [Thesis] 9

Why does Personal Search Matter? 10 Knowledge workers spend up to 25% of their day looking for information. – IDC Group U Personal Search

Example: Desktop Search 11 Example: Search over Social Media Ranking using Multiple Document Types for Desktop Search [SIGIR10] Evaluating Search in Personal Social Media Collections [WSDM12]

[1] Stuff I’ve seen [Dumais03] Characteristics of Personal Search Many document types Unique metadata for each type Users mostly do re-finding [1] Opportunities for personalization Challenges in evaluation 12 Most of these hold true for enterprise search!

Field Relevance Different field is important for different query-term ‘james’ is relevant when it occurs in ‘registration’ is relevant when it occurs in Building a User Model for Search Query Structured Docs [ECIR09,12] Why don’t we provide field operator or advanced UI?

Estimating the Field Relevance If User Provides Feedback Relevant document provides sufficient information If No Feedback is Available Combine field-level term statistics from multiple sources 14 content title from/to Relevant Docs content title from/to Collection content title from/to Top-k Docs + ≅

15 Retrieval Using the Field Relevance Comparison with Previous Work Ranking in the Field Relevance Model q 1 q 2... q m f1f1 f1f1 f2f2 f2f2 fnfn fnfn... f1f1 f1f1 f2f2 f2f2 fnfn fnfn w1w1 w2w2 wnwn w1w1 w2w2 wnwn q 1 q 2... q m f1f1 f1f1 f2f2 f2f2 fnfn fnfn... f1f1 f1f1 f2f2 f2f2 fnfn fnfn P(F 1 |q 1 ) P(F 2 |q 1 ) P(F n |q 1 ) P(F 1 |q m ) P(F 2 |q m ) P(F n |q m ) Per-term Field Weight Per-term Field Score sum multiply

Retrieval Effectiveness (Metric: Mean Reciprocal Rank) DQLBM25FMFLMFRM-CFRM-TFRM-R TREC 54.2%59.7%60.1%62.4%66.8%79.4% IMDB 40.8%52.4%61.2%63.7%65.7%70.4% Monster 42.9%27.9%46.0%54.2%55.8%71.6% Evaluating the Field Relevance Model 16 Fixed Field Weights Per-term Field Weights

Summary so far… Query Modeling for Structured Documents Using the estimated field relevance improves the retrieval User’s feedback can help personalize the field relevance What’s Coming Next Alternatives to keyword search: associative browsing Evaluating the search and browsing together 17

What if keyword search is not enough? 18 Registration Search first, then browse through documents!

Building the Associative Browsing Model Link Extraction 3. Link Refinement 1. Document Collection Term Similarity Temporal Similarity Topical Similarity [CIKM10,11] Click-based Training

Evaluation Challenges for Personal Search Previous Work Each based on its own user study No comparative evaluation was performed yet Building Simulated Collections Crawl CS department webpages, docs and calendars Recruit department people for user study Collecting User Logs DocTrack: a human-computation search game Probabilistic User Model: a method for user simulation 20 [CIKM09,SIGIR10,CIKM11]

DocTrack Game 21

Probabilistic User Modeling 22 Evaluation Type TotalBrowsing used Successful Simulation 63,2609,410 (14.8%)3,957 (42.0%) User Study (14.5%)15 (35.7%) Query Generation Term selection from a target document State Transition Switch between search and browsing Link Selection Click on browsing suggestions Probabilistic user model trained on log data from user study.

Parameterization of the User Model Query Generation for Search Preference for specific field Link Selection for Browsing Breadth-first vs. depth-first 23 Evaluate the system under various assumptions of user, system and the combination of both

B OOK S EARCH Understanding Book Search Behavior on the Web 24 [Submitted to SIGIR12]

Why does Book Search Matter? 25 Book Search U

Understanding Book Search on the Web OpenLibrary User-contributed online digital library DataSet: 8M records from web server log 26

Comparison of Navigational Behavior Users entering directly show different behaviors from users entering via web search engines 27 Users entering the site directly Users entering via Google

Comparison of Search Behavior 28 Rich interaction reduces the query lengths Filtering induces more interactions than search

Summary so far… Rich User Interactions for Book Search Combination of external and internal search engines Combination of search, advanced UI, and filtering Analysis using User Modeling Model both navigation and search behavior Characterize and compare different user groups What Still Keeps Me Busy… Evaluating the Field Relevance Model for book search Build a predictive model of task-level search success [1] 29 [1] Beyond DCG: User Behavior as a Predictor of a Successful Search [Hassan10]

W EB S EARCH Characterizing Web Content, User Interests, and Search Behavior by Reading Level and Topic 30 [WSDM12]

Myths on Web Search 31 Web search is a solved problem Maybe true for navigational queries, yet not for tail queries [1] Search results are already personalized Lots of localization efforts (e.g., query: pizza) Little personalization at individual user level Personalization will solve everything Not enough evidence in many cases Users do deviate from their profile [1] Web search solved? All result rankings the same? [Zaragoza10] Need for rich user modeling and interaction!

User Modeling by Reading Level and Topic Reading Level and Topic Reading Level: proficiency (comprehensibility) Topic: topical areas of interests Profile Construction Profile Applications Improving personalized search ranking Enabling expert content recommendation P(R|d 1 ) P(T|d 1 ) P(R|d 1 ) P(T|d 1 ) P(R|d 1 ) P(T|d 1 ) P(R,T|u)

Reading level distribution varies across major topical categories

Profile matching can predict user’s preference over search results Metric % of user’s preferences predicted by profile matching Results By the degree of focus in user profile By the distance metric between user and website User Group#ClicksKL R (u,s)KL T (u,s)KL RLT (u,s) ↑ Focused 5, %60.79%65.27% 147, %54.20%54.41% ↓Diverse 197, %53.36%53.63%

Comparing Expert vs. Non-expert URLs Expert vs. Non-expert URLs taken from [White’09] Higher Reading Level Lower Topic Diversity

Enabling Browsing for Web Search SurfCanyon ® Recommend results based on clicks 36 Initial results indicate that recommendations are useful for shopping domain. [Work-in-progress]

L OOKING O NWARD 37

Summary: Rich User Interactions Combining Search and Browsing for Personal Search Associative browsing complements search for known-item finding Combining Search and Filtering for Book Search Rich interactions reduce user efforts for keyword search Non-intrusive Browsing for Web Search Providing suggestions for browsing is beneficial for shopping task 38

Summary: Rich User Modeling Query (user) modeling improves ranking quality Estimation is possible without past interactions User feedback improves effectiveness even more User Modeling improves evaluation / analysis Prob. user model allows the evaluation of personal search Prob. user model explains complex book search behavior Enriched representation has additional values 39 P(R,T|u)

Where’s the Future of Information Seeking? Thank you! Any

Selected Publications Structured Document Retrieval A Probabilistic Retrieval Model for Semi-structured Data [ECIR09] A Field Relevance Model for Structured Document Retrieval [ECIR11] Personal Search Retrieval Experiments using Pseudo-Desktop Collections [CIKM09] Ranking using Multiple Document Types in Desktop Search [SIGIR10] Building a Semantic Representation for Personal Information [CIKM10] Evaluating an Associative Browsing Model for Personal Info. [CIKM11] Evaluating Search in Personal Social Media Collections [WSDM12] Web / Book Search Characterizing Web Content, User Interests, and Search Behavior by Reading Level and Topic [WSDM12] Understanding Book Search Behavior on the Web [In submission to SIGIR12] 41 More or cs.umass.edu/~jykim

O PTIONAL S LIDES 42

Bonus: My Self-tracking Efforts Life-optimization Project (2002~2006) LiFiDeA Project ( ) 43

Topic and reading level characterize websites in each category Interesting divergence for the case of users

The Great Divide: IR vs. RecSys IR Query / Document Provide relevant info. Reactive (given query) SIGIR / CIKM / WSDM RecSys User / Item Support decision making Proactive (push item) RecSys / KDD / UMAP 45 Both requires similarity / matching score Personalized search involves user modeling Most RecSys also involves keyword search Both are parts of user’s info seeking process Both requires similarity / matching score Personalized search involves user modeling Most RecSys also involves keyword search Both are parts of user’s info seeking process

Criteria for Choosing IR vs. RecSsys 46 IR RecSys User’s willingness to express information needs Lack of evidence about the user himself Confidence in predicting user’s preference Availability of matching items to recommend

The Great Divide: IR vs. CHI IR Query / Document Relevant Results Ranking / Suggestions Feature Engineering Batch Evaluation (TREC) SIGIR / CIKM / WSDM CHI User / System User Value / Satisfaction Interface / Visualization Human-centered Design User Study CHI / UIST / CSCW 47 Can we learn from each other?