CS276A Text Information Retrieval, Mining, and Exploitation Lecture 9 5 Nov 2002.

Recap: Relevance Feedback Rocchio Algorithm: Typical weights: alpha = 8, beta = 64, gamma = 64 Tradeoff alpha vs beta/gamma: If we have a lot of judged documents, we want a higher beta/gamma. But we usually don’t …
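The Rocchio update with the slide's typical weights can be sketched as follows. This is a minimal illustration on plain Python lists (the function name and toy vectors are my own, not from the slides); real systems would use sparse term-weight vectors.

```python
def rocchio(query, relevant, nonrelevant, alpha=8.0, beta=64.0, gamma=64.0):
    """Rocchio relevance feedback: move the query vector toward the
    centroid of judged-relevant docs and away from the centroid of
    judged-nonrelevant docs, using the slide's typical weights."""
    def centroid(vecs):
        n = len(vecs)
        return [sum(col) / n for col in zip(*vecs)]
    rel = centroid(relevant) if relevant else [0.0] * len(query)
    nonrel = centroid(nonrelevant) if nonrelevant else [0.0] * len(query)
    # negative term weights are usually clipped to zero
    return [max(0.0, alpha * q + beta * r - gamma * s)
            for q, r, s in zip(query, rel, nonrel)]

# toy 3-term vocabulary: one relevant, one nonrelevant judged document
q_new = rocchio([1, 0, 0], relevant=[[1, 1, 0]], nonrelevant=[[0, 0, 1]])
# -> [72.0, 64.0, 0.0]
```

With many judged documents the beta/gamma terms dominate alpha, which is the tradeoff the slide describes.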

Pseudo Feedback: initial query -> retrieve documents -> label top k docs relevant -> apply relevance feedback -> retrieve documents again
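The pseudo-feedback loop above can be sketched generically. The `search` and `expand` interfaces here are assumptions for illustration, not anything specified on the slide:

```python
def pseudo_feedback(query, search, expand, k=10, rounds=1):
    """Pseudo (blind) relevance feedback: pretend the top-k results are
    relevant, feed them back to expand the query, optionally iterate.

    search(query, k) -> ranked list of documents, and
    expand(query, top_docs) -> new query, are assumed interfaces."""
    for _ in range(rounds):
        top_k = search(query, k)      # initial retrieval
        query = expand(query, top_k)  # treat top k as relevant
    return search(query, k)           # final retrieval with expanded query
```

A toy run with documents as term sets: rank by term overlap, expand by unioning in the top documents' terms.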

Pseudo-Feedback: Performance

Today’s topics User Interfaces Browsing Visualization

The User in Information Access: User -> information need -> find starting point -> formulate/reformulate query -> send to system -> receive results -> explore results -> done? If yes, stop; if no, reformulate

The User in Information Access: the same loop; the query / send-to-system / receive-results steps are the focus of most IR!

Information Access in Context: User -> high-level goal -> information access -> analyze -> synthesize -> done? If yes, stop; if no, repeat

The User in Information Access (roadmap slide, repeated; next: finding starting points)

Starting points Source selection: HighWire Press, LexisNexis, Google! Overviews: directories/hierarchies, visual maps, clustering

HighWire Press Source Selection

Hierarchical browsing: Level 0, Level 1, Level 2

Visual Browsing: Themescape

Browsing: from a starting point, follow links from document to document until reaching an answer. Credit: William Arms, Cornell

Scatter/Gather Scatter/Gather allows the user to find a set of documents of interest through browsing: take the collection and scatter it into n clusters; pick the clusters of interest and merge them; iterate.
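The scatter -> pick -> merge -> recluster loop can be sketched abstractly. The `cluster`, `describe`, and `choose` interfaces are assumptions for illustration (the actual Cutting et al. system uses fast partitional clustering):

```python
def scatter_gather(docs, cluster, describe, choose, n=5, rounds=3):
    """Sketch of the Scatter/Gather browsing loop.

    cluster(docs, n) -> list of doc lists (scatter),
    describe(one_cluster) -> label shown to the user,
    choose(labels) -> indices of clusters the user picks (gather)."""
    for _ in range(rounds):
        clusters = cluster(docs, n)                       # scatter
        labels = [describe(c) for c in clusters]
        picked = choose(labels)                           # user picks
        docs = [d for i in picked for d in clusters[i]]   # gather/merge
        if len(docs) <= n:
            break  # subset too small to recluster usefully
    return docs
```

Each round narrows the collection, so browsing converges on a small document set without the user ever typing a query.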

Scatter/Gather

Scatter/Gather

How to Label Clusters Show titles of typical documents: titles are easy to scan (authors create them for quick scanning!), but you can only show a few titles, which may not fully represent the cluster. Show words/phrases prominent in the cluster: more likely to fully represent the cluster; use distinguishing words/phrases; but harder to scan.
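Picking "distinguishing" words can be sketched with a simple tf-idf-style score: terms frequent inside the cluster but rare in the collection overall. This is my own minimal illustration, not the labeling method of any particular system:

```python
from collections import Counter
from math import log

def label_cluster(cluster_docs, all_docs, k=3):
    """Label a cluster with its k most distinguishing terms: term
    frequency inside the cluster, weighted down by how many documents
    in the whole collection contain the term (tf-idf-style)."""
    in_cluster = Counter(t for doc in cluster_docs for t in doc)
    df = Counter(t for doc in all_docs for t in set(doc))  # document freq
    n = len(all_docs)
    score = {t: tf * log(n / df[t]) for t, tf in in_cluster.items()}
    return [t for t, _ in sorted(score.items(), key=lambda x: -x[1])[:k]]
```

Documents are token lists; a term appearing in every document scores zero, so common words never become labels.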

Visual Browsing: Hyperbolic Tree

UWMS Data Mining Workshop Study of Kohonen Feature Maps (H. Chen, A. Houston, R. Sewell, and B. Schatz, JASIS 49(7)) Comparison: Kohonen map vs. Yahoo. Task: “window shop” for an interesting home page, then repeat with the other interface. Results: starting with the map, users could repeat in Yahoo (8/11); starting with Yahoo, users were unable to repeat in the map (2/14). Credit: Marti Hearst

UWMS Data Mining Workshop Study (cont.) Participants liked: Correspondence of region size to # documents Overview (but also wanted zoom) Ease of jumping from one topic to another Multiple routes to topics Use of category and subcategory labels Credit: Marti Hearst

UWMS Data Mining Workshop Study (cont.) Participants wanted: hierarchical organization other ordering of concepts (alphabetical) integration of browsing and search correspondence of color to meaning more meaningful labels labels at same level of abstraction fit more labels in the given space combined keyword and category search multiple category assignment (sports+entertain) Credit: Marti Hearst

Browsing Effectiveness depends on: starting point; ease of orientation (are similar docs “close”? is the organization intuitive?); how adaptive the system is. Compare to physical browsing (library, grocery store).

Searching vs. Browsing Information need dependent Open-ended (find an interesting quote on the virtues of friendship) -> browsing Specific (directions to Pacific Bell Park) -> searching User dependent Some users prefer searching, others browsing (confirmed in many studies: some hate to type) You don’t need to know vocabulary for browsing. System dependent (some web sites don’t support search) Searching and browsing are often interleaved.

Searchers vs. Browsers 1/3 of users do not search at all 1/3 rarely search (or type URLs only) Only 1/3 understand the concept of search (ISP data from 2000)

Exercise Observe your own information seeking behavior WWW University library Grocery store Are you a searcher or a browser? How do you reformulate your query? Read bad hits, then minus terms Read good hits, then plus terms Try a completely different query …

The User in Information Access (roadmap slide, repeated; next: query specification)

Query Specification Recall: Relevance feedback Query expansion Spelling correction Query-log mining based Interaction styles for query specification Queries on the Web Parametric search Term browsing

Query Specification: Interaction Styles (Shneiderman 97) Command Language Form Fillin Menu Selection Direct Manipulation Natural Language Example: how does each apply to Boolean queries? Credit: Marti Hearst

Command-Based Query Specification command attribute value connector … find pa shneiderman and tw user# What are the attribute names? What are the command names? What are allowable values? Credit: Marti Hearst

Form-Based Query Specification (Altavista) Credit: Marti Hearst

Form-Based Query Specification (Melvyl) Credit: Marti Hearst

Form-based Query Specification (Infoseek) Credit: Marti Hearst

Direct Manipulation Spec. VQUERY (Jones 98) Credit: Marti Hearst

Menu-based Query Specification (Young & Shneiderman 93) Credit: Marti Hearst

Query Specification/Reformulation A good user interface makes it easy for the user to reformulate the query Challenge: one user interface is not ideal for all types of information needs

Types of Information Needs Need answer to question (who won the game?) Re-find a particular document Find a good recipe for tonight’s dinner Authoritative summary of information (HIV review) Exploration of new area (browse sites about Baja)

Queries on the Web Most Frequent on 2002/10/26

Queries on the Web (2000)

Intranet Queries (Aug 2000) 3351 bearfacts 3349 telebears 1909 extension 1874 schedule+of+classes 1780 bearlink 1737 bear+facts 1468 decal 1443 infobears 1227 calendar 989 career+center 974 campus+map 920 academic+calendar 840 map 773 bookstore 741 class+pass 738 housing 721 tele-bears 716 directory 667 schedule 627 recipes 602 transcripts 582 tuition 577 seti 563 registrar 550 info+bears 543 class+schedule 470 financial+aid Source: Ray Larson

Intranet Queries Summary of sample data from 3 weeks of UCB queries 13.2% Telebears/BearFacts/InfoBears/BearLink (12297) 6.7% Schedule of classes or final exams (6222) 5.4% Summer Session (5041) 3.2% Extension (2932) 3.1% Academic Calendar (2846) 2.4% Directories (2202) 1.7% Career Center (1588) 1.7% Housing (1583) 1.5% Map (1393) Average query length over last 4 months: 1.8 words This suggests what is difficult to find from the home page Source: Ray Larson

Query Specification: Feast or Famine A query either returns too little (famine) or too much (feast). Specifying a well-targeted query is hard; this is a bigger problem for Boolean queries.

Parametric search Each document has, in addition to text, some “meta-data” e.g., Language = French Format = pdf Subject = Physics etc. Date = Feb 2000 A parametric search interface allows the user to combine a full-text query with selections on these parameters e.g., language, date range, etc.
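Combining a full-text query with metadata selections can be sketched as a filter. Representing documents as dicts with a "text" field plus metadata fields is my own assumption for illustration; a real engine would use metadata indexes rather than a linear scan:

```python
def parametric_search(docs, text_terms=None, **params):
    """Parametric search sketch: keep documents that match every
    metadata selection (e.g. language="French", format="pdf") AND
    contain all full-text query terms."""
    hits = []
    for doc in docs:
        if any(doc.get(field) != value for field, value in params.items()):
            continue  # fails a metadata selection
        tokens = doc["text"].lower().split()
        if text_terms and not all(t.lower() in tokens for t in text_terms):
            continue  # fails the full-text part of the query
        hits.append(doc)
    return hits
```

The slide's example query would then be `parametric_search(docs, text_terms=["physics"], language="French", format="pdf")`.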

Parametric search example Notice that the output is a (large) table. Various parameters in the table (column headings) may be clicked on to effect a sort.

We can add text search.

Interfaces for term browsing

The User in Information Access (roadmap slide, repeated; next: exploring results)

Explore Results Determine: Do these results answer my question? Summarization More generally: provide context Hypertext navigation: Can I find the answer by following a link? Browsing and clustering (again) Browse to explore results

Explore Results: Context We can’t present complete documents in the result set – too much information. Present information about each doc Must be concise (so we can show many docs) Must be informative Typical information about each document Summary Context of query words Meta data: date, author, language, file name/url Context of document in collection Information about structure of document

Context in Collection: Cha-Cha

Category Labels Advantages: Interpretable Capture summary information Describe multiple facets of content Domain dependent, and so descriptive Disadvantages Do not scale well (for organizing documents) Domain dependent, so costly to acquire May mis-match users’ interests Credit: Marti Hearst

Evaluate Results Context in Hierarchy: Cat-a-Cone

Explore Results: Summarization Query-dependent summarization: KWIC (keyword in context) lines (à la Google). Query-independent summarization: summary written by author (if available); exploit genre (news stories); sentence extraction; natural language generation.
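Extracting KWIC lines is simple enough to sketch directly. The window size and the light punctuation stripping here are my own choices for illustration:

```python
def kwic(text, query_term, window=4):
    """Keyword-in-context lines: for each occurrence of the query term,
    return it with `window` words of context on either side."""
    words = text.split()
    lines = []
    for i, w in enumerate(words):
        if w.lower().strip(".,") == query_term.lower():
            left = max(0, i - window)
            lines.append(" ".join(words[left:i + window + 1]))
    return lines

kwic("the quick brown fox jumps over the lazy dog", "fox", window=2)
# -> ['quick brown fox jumps over']
```

Showing such lines for each hit lets the user judge relevance without opening the document, which is the whole point of query-dependent summarization.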

Evaluate Results Structure of document: SeeSoft

Personalization: Outride Personalized Search System Query augmentation based on interests, demographics, click stream, search history, and application usage, plus result processing. Outride schema: User x Content x History x Demographics; search engine schema: Keyword x Doc ID x Link Rank. Applies to both intranet search and web search; the user's query and result set flow through the Outride sidebar interface.

How Long to Get an Answer? Average Task Completion Time in Seconds SOURCE: ZDLabs/eTesting, Inc. October 2000

Time (Seconds) User Skill Level SOURCE: ZDLabs/eTesting, Inc. October 2000 Novices versus Experts (Average Time to Complete Task)

Performance of Interactive Retrieval

Boolean Queries: Interface Issues Boolean logic is difficult for the average user. Much research was done on interfaces facilitating the creation of Boolean queries by non-experts. Much of this research was made obsolete by the web. The current view is that non-expert users are best served with non-Boolean or simple +/- Boolean queries (pioneered by AltaVista). But Boolean queries are the standard for certain groups of expert users (e.g., lawyers).

User Interfaces: Other Issues Technical HCI issues How to use screen real estate One monolithic window or many? Undo operator Give access to history Alternative interfaces for novel/expert users Disabilities

Take-Away Don’t ignore the user in information retrieval. Finding matching documents for a query is only part of information access and “knowledge work”. In addition to core information retrieval, information access interfaces need to support Finding starting points Formulation/reformulation of queries Exploring/evaluating results

Exercise Current information retrieval user interfaces are designed for typical computer screens. How would you design a user interface for a wall-size screen?

Resources MIR Ch – 10.7 Donna Harman, Overview of the Fourth Text REtrieval Conference (TREC-4), National Institute of Standards and Technology. Cutting, Karger, Pedersen, Tukey: Scatter/Gather. ACM SIGIR. Hearst: Cat-a-Cone, an interactive interface for specifying searches and viewing retrieval results in a large category hierarchy. ACM SIGIR.