Interactions LBSC 796/CMSC 838o Daqing He, Douglas W. Oard Session 5, March 8, 2004
Slides Please download the slides from www.umiacs.umd.edu/~daqingd/lbsc796-w5.ppt or www.umiacs.umd.edu/~daqingd/lbsc796-w5.rtf
Agenda Interactions in retrieval systems: query formulation, selection, examination, and document delivery
System-Oriented Retrieval Model (diagram: Acquisition -> Collection -> Indexing -> Index; Query -> Search over the index -> Ranked List)
Whose Process Is It? Who initiates a search process? Who controls the progress? Who ends a search process?
User-Oriented Retrieval Model (diagram: the user moves through Source Selection, Query Formulation, Query, Search, Ranked List, Document Selection, Document Examination, and Document Delivery; inside the IR system, Acquisition builds the Document Collection, which Indexing turns into the Index used by Search)
Taylor's Conceptual Framework Four levels of "information needs": Visceral, what you really want to know; Conscious, what you recognize that you want to know; Formalized (e.g., TREC topics), how you articulate what you want to know; Compromised (e.g., TREC queries), how you express what you want to know to a system [Taylor 68]
Belkin's ASK Model Users are concerned with a problem, but do not clearly understand the problem itself or the information need required to solve it: an Anomalous State of Knowledge. A clarification process is needed to form a query [Belkin 80; Belkin, Oddy & Brooks 82]
What are humans good at? Sense low-level stimuli, recognize patterns, reason inductively, communicate over multiple channels, apply multiple strategies, adapt to changes or unexpected events: "fuzzy and hard things." From Ben Shneiderman's "Designing the User Interface"
What are computers good at? Sense stimuli outside the human range, calculate quickly and mechanically, store large quantities of information and recall it accurately, respond rapidly and consistently, perform repetitive actions reliably, maintain performance under heavy load and over extended time: "simple and sharply defined things," again paraphrasing George Miller. From Ben Shneiderman's "Designing the User Interface"
What should interaction be? Synergistic: humans do the things that humans are good at, computers do the things that computers are good at, and the strength of one covers the weakness of the other
Source Selection People have their own preferences, and different tasks require different sources. Possible choices: ask for help from people or from machines; browse, search, or combine the two; use a general-purpose vs. a domain-specific IR system; choose among different collections
Query Formulation (diagram: the user formulates a query, which is searched against the indexed collection)
User's Goals Identify the right query for the current need (conscious/formalized need => compromised need). How can the user achieve this goal? Infer the right query terms and the right composition of those terms
System's Goals Help the user build links between needs and queries, and help the user learn more about the system and the collection
How Does the System Achieve Its Goals? Ask more from the user: encourage long/complex queries, provide a large text entry area, use form filling or direct manipulation. Initiate interactions: ask questions related to the need, engage in a dialogue with the user. Infer from relevant items: infer from previous queries and from previously retrieved documents
Query Formulation Interaction Styles (Shneiderman 97) Command Language, Form Fillin, Menu Selection, Direct Manipulation, Natural Language Credit: Marti Hearst
Form-Based Query Specification (Melvyl) Credit: Marti Hearst
Form-based Query Specification (Infoseek) Credit: Marti Hearst
Direct Manipulation Spec. VQUERY (Jones 98) Credit: Marti Hearst
High-Accuracy Retrieval of Documents (HARD) A new track in TREC that studies the interaction between a user and a system, with only one chance to interact with the user; query formulation is still the system's task. An extended batch IR model: it acknowledges that queries are not equal to needs and allows the system to ask the user a set of clarification questions. Designed for controlled evaluation: clarification questions are generated in batch mode and only once, and a ranked list is the outcome of the search. Reasons for participation: the human factor in the IR process, and controlled evaluation is hard in a fully interactive IR experiment. (Diagram labels: Topic Statement, Search Engine, Baseline Results, Clarification Questions, Answers to Clarification Questions, HARD Results)
UMD HARD 2003 Retrieval Model The HARD retrieval process uses clarification questions to gather the user's preferences among subtopic areas (for query expansion), recently viewed relevant documents (for document reranking), preferences for sub-collections or genres, and desired result formats (for passage retrieval); ranked list merging produces the refined ranked list [He & Demner, 2003]
Dialogues in Need Negotiation (diagram: Information Need, 1. Formulate a Query, 2. Need Negotiation, 3. Find Documents Matching the Query, Search Engine, Document Collection, Search Results) In a library setting: a person with an information need formulates a query and consults an experienced human intermediary who knows the library's document collection well; the intermediary negotiates the need and then looks through the collection to find the documents that match the user's query
Personalization through the User's Search Contexts One way of achieving personalization is to include the user's search contexts in the retrieval process (diagram: an incremental learner builds contexts such as "Casablanca," "African Queen," and "Romantic Films" and supplies them to the information retrieval system) [Goker & He, 2000]
Things That Hurt Obscure ranking methods. Unpredictable effects of adding or deleting terms; only single-term queries avoid this problem. Counterintuitive statistics: for "clis", AltaVista says 3,882 docs match the query, but for "clis library" it says 27,025 docs match, because every document with either term was counted
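To make the counting behavior concrete, here is a minimal sketch, under assumed conditions (a made-up four-document mini-collection and simple whitespace matching, not AltaVista's actual index), of why a two-term query can report more matches than a one-term query when documents containing any query term are counted:

# Sketch: why "clis library" can report more matches than "clis" alone when
# the engine counts every document containing ANY query term (OR semantics).
# The mini-collection and the counts below are made up for illustration.

docs = [
    "the clis program at maryland",          # contains "clis"
    "the university library opens at nine",  # contains "library"
    "clis students use the library daily",   # contains both terms
    "campus parking information",            # contains neither term
]

def count_any(query_terms, docs):
    """Count documents containing at least one query term (OR)."""
    return sum(any(t in d.split() for t in query_terms) for d in docs)

def count_all(query_terms, docs):
    """Count documents containing every query term (AND)."""
    return sum(all(t in d.split() for t in query_terms) for d in docs)

print(count_any(["clis"], docs))             # 2
print(count_any(["clis", "library"], docs))  # 3: the count grows as terms are added
print(count_all(["clis", "library"], docs))  # 1: what users often expect instead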
Browsing the Retrieved Set (diagram: the user's query formulation produces a query, search returns a ranked list, the user selects and examines documents, and query reformulation and document reselection loop back into the process)
Indicative vs. Informative Terms often applied to document abstracts: indicative abstracts support selection (they describe the contents of a document), while informative abstracts support understanding (they summarize the contents of a document). The distinction applies to any information presentation, which may be offered for indicative or informative purposes
User's Browsing Goals Identify documents for some form of delivery (an indicative purpose). Query enrichment: relevance feedback (indicative), in which the user designates "more like this" documents and the system adds terms from those documents to the query; and manual reformulation (informative), which gives a better approximation of the visceral information need
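A minimal sketch of the "more like this" step, under simplifying assumptions (plain term counts rather than a full Rocchio-style weighted combination; the query, documents, and parameter names are illustrative only):

from collections import Counter

def expand_query(query_terms, feedback_docs, top_k=3):
    """Add the most frequent terms from user-designated "more like this"
    documents to the query (a simplified relevance-feedback step)."""
    counts = Counter()
    for doc in feedback_docs:
        counts.update(doc.lower().split())
    # Keep only terms not already in the query; take the top_k most frequent.
    new_terms = [t for t, _ in counts.most_common() if t not in query_terms]
    return list(query_terms) + new_terms[:top_k]

query = ["restoration", "ecology"]
liked = [
    "wetland restoration projects restore native marsh vegetation",
    "marsh vegetation recovery after wetland restoration",
]
print(expand_query(query, liked))
# ['restoration', 'ecology', 'wetland', 'marsh', 'vegetation']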
System's Goals Assist the user in identifying relevant documents and in identifying potentially useful terms, both for clarifying the right information need and for generating better queries
A Selection Interface Taxonomy One dimensional lists Content: title, source, date, summary, ratings, ... Order: retrieval status value, date, alphabetic, ... Size: scrolling, specified number, RSV threshold Two dimensional displays Construction: clustering, starfields, projection Navigation: jump, pan, zoom Three dimensional displays Contour maps, fishtank VR, immersive VR
Extraction-Based Summarization A robust technique that yields disfluent summaries. Four broad types: single-document vs. multi-document, and term-oriented vs. sentence-oriented. Combination of evidence for selection: salience (similarity to the query), selectivity (IDF or chi-squared), and emphasis (title, first sentence). For multi-document summaries, suppress duplication
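A minimal single-document sketch of this kind of sentence extraction, assuming whitespace tokenization and ad hoc weights for combining the three evidence sources (the 0.1 selectivity weight and the first-sentence boost are arbitrary choices, not values from the lecture):

import math
from collections import Counter

def extract_summary(sentences, query, n=2):
    """Score each sentence by query overlap (salience), rare-term weight
    (selectivity, via IDF computed over the sentences), and position
    (emphasis), then return the top-n sentences in document order."""
    tokenized = [s.lower().split() for s in sentences]
    N = len(sentences)
    df = Counter(t for toks in tokenized for t in set(toks))
    idf = {t: math.log(N / df[t]) for t in df}
    query_terms = set(query.lower().split())

    def score(i):
        toks = tokenized[i]
        salience = sum(1 for t in toks if t in query_terms)
        selectivity = sum(idf[t] for t in set(toks))
        emphasis = 1.0 if i == 0 else 0.0   # crude boost for the first sentence
        return salience + 0.1 * selectivity + emphasis

    ranked = sorted(range(N), key=score, reverse=True)
    return [sentences[i] for i in sorted(ranked[:n])]

A multi-document variant would additionally suppress sentences that are near-duplicates of ones already selected.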
Generated Summaries Fluent summaries for a specific domain. Define a knowledge structure for the domain (frames are commonly used). Analysis: process documents to fill the structure (studied separately as "information extraction"). Compression: select which facts to retain. Generation: create fluent summaries from templates for the initial candidates, then use a language model to select among the alternatives
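A toy sketch of the template-plus-language-model idea, with everything assumed for illustration (a tiny hand-written "background corpus" standing in for a real language model, and a hand-filled frame standing in for the output of information extraction):

import math
from collections import Counter

# Toy "background corpus" for a unigram language model that stands in for a
# real fluency model when choosing among candidate realizations.
background = ("an earthquake struck the region on tuesday officials reported "
              "damage and several injuries the earthquake measured five point two").split()
counts = Counter(background)
total = len(background)

def fluency(sentence):
    """Average per-word log-probability under the toy unigram model,
    with add-one smoothing for unseen words."""
    words = sentence.lower().replace(".", "").split()
    return sum(math.log((counts[w] + 1) / (total + len(counts))) for w in words) / len(words)

# A frame assumed to have been filled by an information-extraction step.
frame = {"event": "earthquake", "place": "Guatemala", "magnitude": "6.4", "day": "Tuesday"}

templates = [
    "A magnitude {magnitude} {event} struck {place} on {day}.",
    "{place} was hit on {day} by an {event} of magnitude {magnitude}.",
]
candidates = [t.format(**frame) for t in templates]
print(max(candidates, key=fluency))   # the candidate the toy model prefers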
Google's KWIC Summary For Query "University of Maryland College Park"
Teoma's Query Refine Suggestions url: www.teoma.com
Vivisimo’s Clustering Results url: vivisimo.com
Kartoo’s Cluster Visualization url: kartoo.com
Cluster Formation Based on inter-document similarity (computed using the cosine measure, for example). Heuristic methods can be fairly efficient: pick any document as the first cluster "seed"; add the most similar document to each cluster; adding the same document to two clusters joins them; then check whether each cluster should be split (does it contain two or more fairly coherent groups?). Lots of variations on this have been tried
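A minimal sketch of a heuristic clustering pass in this spirit, under assumed simplifications (each cluster is represented only by its first document's term vector as the "seed", and the 0.3 similarity threshold is an arbitrary illustrative value; the seed-joining and cluster-splitting steps from the slide are omitted):

import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two term-frequency dictionaries."""
    dot = sum(freq * b.get(term, 0) for term, freq in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def single_pass_cluster(docs, threshold=0.3):
    """Assign each document to the most similar existing cluster seed,
    starting a new cluster when nothing is similar enough."""
    vectors = [Counter(d.lower().split()) for d in docs]
    clusters = []   # each cluster: {"seed": term vector, "members": [doc indices]}
    for i, vec in enumerate(vectors):
        best, best_sim = None, 0.0
        for cluster in clusters:
            sim = cosine(vec, cluster["seed"])
            if sim > best_sim:
                best, best_sim = cluster, sim
        if best is not None and best_sim >= threshold:
            best["members"].append(i)
        else:
            clusters.append({"seed": vec, "members": [i]})
    return [cluster["members"] for cluster in clusters]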
Starfield http://www.cs.umd.edu/hcil/spotfire/
Dynamic Queries: IVEE/Spotfire/Filmfinder (Ahlberg & Shneiderman 93) http://www.cs.umd.edu/hcil/eosdis/
Constructing Starfield Displays Two attributes determine the position and can be dynamically selected from a list; numeric position attributes (date, length, rating, ...) work best. Other attributes can affect the display, shown as color, size, shape, orientation, ... Each point can represent a cluster. The display is interactively specified using "dynamic queries"
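A minimal sketch of the attribute-to-display mapping, with entirely hypothetical document metadata (year, length, retrieval score, genre) and using matplotlib only as one convenient way to draw the scatter; a real starfield interface would add dynamic-query sliders for interactive filtering:

import matplotlib.pyplot as plt

# Hypothetical retrieved-document metadata: (year, length in pages,
# retrieval score, genre). Any two numeric attributes can serve as the
# x/y position; the remaining attributes map to marker size and color.
docs = [
    (1993, 12, 0.91, "news"),
    (1997, 45, 0.74, "report"),
    (2001,  8, 0.66, "news"),
    (2003, 30, 0.88, "report"),
]
years   = [d[0] for d in docs]
lengths = [d[1] for d in docs]
sizes   = [300 * d[2] for d in docs]                                      # score -> size
colors  = ["tab:blue" if d[3] == "news" else "tab:orange" for d in docs]  # genre -> color

plt.scatter(years, lengths, s=sizes, c=colors)
plt.xlabel("Publication year")
plt.ylabel("Length (pages)")
plt.title("Starfield: each point is a retrieved document")
plt.show()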
Projection Depict many numeric attributes in two dimensions while preserving important spatial relationships. Typically based on the vector space model, which has about 100,000 numeric attributes. Approximates multidimensional scaling; heuristic approaches are reasonably fast. Often visualized as a starfield, but the dimensions lack any particular meaning
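A minimal sketch of one such projection, using a truncated SVD of a toy term-by-document matrix as a fast stand-in for multidimensional scaling (the matrix values are made up; the two output dimensions have no particular meaning, as the slide notes):

import numpy as np

def project_to_2d(term_doc_matrix):
    """Project documents (columns of a term-by-document matrix) onto the two
    leading singular directions, giving x/y coordinates for a starfield."""
    # Center each term's counts so the layout reflects relative differences.
    X = term_doc_matrix - term_doc_matrix.mean(axis=1, keepdims=True)
    U, S, Vt = np.linalg.svd(X, full_matrices=False)
    return (np.diag(S[:2]) @ Vt[:2, :]).T   # shape: (num_docs, 2)

# Toy term-by-document matrix (4 terms x 5 documents), values made up.
A = np.array([[2, 0, 1, 0, 3],
              [0, 1, 0, 2, 0],
              [1, 1, 0, 0, 2],
              [0, 3, 2, 1, 0]], dtype=float)
print(project_to_2d(A))   # five (x, y) points, one per document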
Contour Map Displays Display cluster density as terrain elevation: fit a smooth opaque surface to the data and visualize it in three dimensions. Project to 2-D and allow manipulation; use stereo glasses to create a virtual "fishtank"; or create an immersive virtual reality experience with head-mounted stereo monitors and head tracking, or a "cave" with wall projection and body tracking
ThemeView Credit to: Pacific Northwest National Laboratory
Full-Text Examination Interfaces Most use scroll and/or jump navigation; some experiments with zooming. Long documents need special features: a "best passage" function helps users get started (overlapping 300-word passages work well), and a "next search term" function facilitates browsing. Integrated functions for relevance feedback: passage selection, query term weighting, ...
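A minimal sketch of a "best passage" function under simple assumptions (whitespace tokenization, fixed 300-word windows overlapping by half, and a score that just counts query-term occurrences; the function and parameter names are illustrative):

def best_passage(text, query_terms, window=300, step=150):
    """Slide overlapping fixed-size word windows across the document and
    return the passage containing the most query-term occurrences, so the
    reader can start at the most promising point."""
    words = text.split()
    terms = {t.lower() for t in query_terms}
    best_start, best_score = 0, -1
    for start in range(0, max(len(words) - window, 0) + 1, step):
        chunk = words[start:start + window]
        score = sum(1 for w in chunk if w.lower().strip(".,;:!?") in terms)
        if score > best_score:
            best_start, best_score = start, score
    return " ".join(words[best_start:best_start + window])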
A Long Document
Document lens Robertson & Mackinlay, UIST'93, Atlanta, 1993
TileBars [Hearst 95]
SeeSoft [Eick 94]
Things That Help Show the query in the selection interface; it provides context for the display. Explain what the system has done; it is hard to control a tool you don't understand (highlight search terms, for example). Complement what the system has done; users add value by doing things the system can't. Expose the information users need to judge utility
Document Delivery (diagram: document examination yields a document, which is passed on to document delivery)
Delivery Modalities On-screen viewing: good for hypertext, multimedia, cut-and-paste, ... Printing: better resolution, portability, annotations, ... Fax-on-demand: really just another way to get to a printer. Synthesized speech: useful for telephone and hands-free applications
Take-Away Messages The IR process belongs to users; matching documents to a query is only part of the whole IR process. But IR systems can help users, and they need to support query formulation/reformulation and document selection/examination
Two Minute Paper When examining documents in the selection and examination interfaces, which type of information need (visceral, conscious, formalized, or compromised) guides the user's decisions? Please justify your answer. What was the muddiest point in today's lecture?
Alternate Query Modalities Spoken queries: used for telephone and hands-free applications; reasonable performance with limited vocabularies, but some error-correction method must be included. Handwritten queries: Palm Pilot Graffiti, touch screens, ...; fairly effective if some form of shorthand is used, since ordinary handwriting often has too much ambiguity