What Happens After the Search?
User Interface Ideas for Information Retrieval Results
Marti A. Hearst, Xerox PARC

Search Is Only Part of the Information Analysis Process
[Diagram: Repositories -> Workspace -> Goals]

Outline
A. Background: Search and Short Queries; the Role of Graphics in Retrieval Results
B. Throw Light on Retrieval Results by Showing Context
   1. Context of query terms in docs (TileBars)
   2. Inter-document context (Scatter/Gather)
C. Initial Attempts at Evaluation
D. Conclusions

Search Results (Scope of This Work)
- "Ad hoc" searches
  - Unanticipated, not tried previously
  - As opposed to filtering or monitoring
- External collections
  - Personal information spaces probably require special consideration
- Naïve users
  - As opposed to intermediaries
- Full-text, general document collections

Search Goal Types
ANSWER A QUESTION
- How old is Bob Dole?
- Who wrote Dante's Inferno?
FIND A PARTICULAR DOCUMENT
- Dan Rose's Home Page
- IRS Form 1040 Schedule D
MAKE AN INFORMED OPINION / ASSESS A TIMELY SITUATION
- What are the tradeoffs and side effects of this treatment?
- Should I wait for the new CD technology?
- How will Apple's new CEO affect sales next quarter?
GET BACKGROUND INFORMATION
- How to plant annual bulbs in Northern California
- What aspects of 3D compact disk technology are patented?

What Is the Goal of the Search?
Different goal types require different collections, different search techniques, and different retrieval result display strategies. E.g., a question should receive an answer rather than a document.
Focus of this work:
- General, ill-defined queries
- General collections
- Naïve, or inexperienced, users

Problems with Short Queries
TOO MANY DOCUMENTS
- If only a few words are supplied, there is little basis upon which to decide how to order the documents
- The fewer words there are, the less they serve to mutually disambiguate one another

Why Short Queries?
THE USERS DON'T KNOW
- What they want
- How to express what they want
- How what they want is expressed in the collection
LONG QUERIES CAN BACKFIRE (see the sketch below)
- If ANDing terms, get empty results
- If ORing terms, get matches on useless subsets of terms
- If using Similarity Search, can't specify important terms
- Some search engines can't handle long queries
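
A minimal sketch of the AND/OR backfire (the toy documents and query are my own, not from the talk):

    docs = {
        "d1": {"apple", "ceo", "sales"},
        "d2": {"apple", "pie", "recipe"},
        "d3": {"quarterly", "sales", "forecast"},
    }
    query = {"apple", "ceo", "sales", "quarter", "forecast"}

    # ANDing: a document must contain every query term -> nothing survives
    and_hits = [d for d, terms in docs.items() if query <= terms]
    # ORing: any shared term counts -> d2 matches on "apple" alone
    or_hits = [d for d, terms in docs.items() if query & terms]

    print(and_hits)  # []
    print(or_hits)   # ['d1', 'd2', 'd3']
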
Balancing Text and Graphics
THE CHALLENGE: HOW TO COMBINE GRAPHICAL AND TEXTUAL REPRESENTATIONS USEFULLY?
Graphics and animation are very useful for summarizing complex data. However, text content is difficult to graph.

"Fixing" Short Queries: Help Users Understand Result Sets
TWO APPROACHES, FROM TWO DIRECTIONS
- Context of query terms (within documents)
- Inter-document context
Both show info about many docs simultaneously.

Showing Context of Query Terms
Existing approaches:
- Lists of titles + ranks
- This augmented with other meta-information
- This augmented with how often each search term occurred
- Graphical display of which subset of query terms occurred

Brief Summaries

List Query Terms

Idea: Show Which Terms Occur, and How Often
- Problem: Which words did what?
- Solution: One symbol per term (see the sketch below)
[Figure: per-term frequency bars for terms A-D. Term B was most frequent, followed by Term A and Term D. Term C did not appear.]
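
A minimal sketch of the "one symbol per term" display (my own illustration, not the talk's implementation): one row of symbols per query term, with the row length showing how often that term occurred in the retrieved document.

    def term_bars(title, term_counts):
        # One line per query term; repeat a symbol once per occurrence.
        lines = [title]
        for term, count in term_counts.items():
            lines.append(f"  {term}: " + "#" * count)
        return "\n".join(lines)

    print(term_bars("Result 1", {"A": 3, "B": 7, "C": 0, "D": 2}))
    # Result 1
    #   A: ###
    #   B: #######   (Term B was most frequent)
    #   C:           (Term C did not appear)
    #   D: ##
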
Represent Document Structure
- Recognize the structure of the document
- Represent this structure graphically
- Simultaneously display a representation of query term frequencies and doc structure
- Term distribution becomes explicit
- Many docs' info can be seen simultaneously (see the sketch below)
[Figure: TileBar grids for terms A-D over document segments 1-5]
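
An ASCII sketch of the TileBars idea (a rough illustration under my own assumptions, not Hearst's implementation): one row per query term, one column per document segment, with a darker symbol where the term is more frequent in that segment.

    def tilebar(doc_segments, query_terms):
        # Rows = query terms, columns = segments; ' ' absent, '#' frequent.
        shades = " .:#"
        rows = []
        for term in query_terms:
            cells = []
            for seg in doc_segments:
                freq = seg.lower().split().count(term)
                cells.append(shades[min(freq, len(shades) - 1)])
            rows.append(f"{term:>8} |{''.join(cells)}|")
        return "\n".join(rows)

    segments = [
        "the patent covers disk encoding",
        "compact disk disk layers and disk media",
        "marketing and sales outlook",
        "patent disputes over disk formats",
    ]
    print(tilebar(segments, ["disk", "patent", "sales"]))
    #     disk |.# .|   (concentrated in segment 2)
    #   patent |.  .|   (scattered)
    #    sales |  . |   (one segment only)
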
Add Structure to the Query
Problem: Only room for a few terms
Solution: Structure the query
- A list of topics
  - Can be category labels, lists of synonyms, etc.
- Translated into Conjunctive Normal Form (see the sketch below)
  - User doesn't need to know this
  - No need for special syntax
  - Allows for a variety of ranking algorithms
- Creates a feedback loop between query structure and graphical representation
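
A minimal sketch of topics-to-CNF matching (the data structures are my assumption; the slide only states the idea): each topic is a list of synonyms, and a document matches when every topic is satisfied by at least one of its terms, i.e., an AND of ORs.

    topics = [
        ["disk", "cd", "cd-rom"],   # topic 1: the medium
        ["patent", "patented"],     # topic 2: legal status
    ]

    def matches_cnf(doc_text, topics):
        words = set(doc_text.lower().split())
        # AND across topics, OR within each topic
        return all(any(term in words for term in topic) for topic in topics)

    print(matches_cnf("the cd-rom format is patented", topics))  # True
    print(matches_cnf("cd sales rose last quarter", topics))     # False
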
Graphical Landscapes
Problems:
- No Titles!
- No Frequencies!
- No Document Lengths!
- No Nearness/Overlap!
- Each document classified only one way!
[Figure: landscape display scattering documents A-E into fixed clusters]

"Fixing" Short Queries: Other Techniques
- Term Expansion
- Relevance Feedback
- Category Information

Short Queries: Imprecisely Understood Goals
A tack to be pursued in future: identify the goal type, then
- Suggest a relevant collection
- Suggest a search strategy
- Suggest links to sources of expertise
- Create information sources tailored to the goal type
A more general, but less powerful, tack: provide useful descriptions of the space of retrieved information

Dealing with Short Queries
- Using text analysis to find context
- Finding a useful mix of text and graphics
  - TileBars: query-document context; shows structure of document and query; compact, so many docs can be compared at once
  - Scatter/Gather clustering: inter-document context; shows summary information textually; uses state/animation for relationships among clusters
- Add simple structure to the query format
- Future work: incorporate into a Workspace / SenseMaking environment (e.g., Information Visualizer, Card et al.)

Background: A Brief History of IR
- Card catalogs: Boolean search on title words and subject codes
- Abstracts and newswire: Vector Space Model, probabilistic ranking, and "soft" Boolean
- Full text (NIST TREC): vector space and probabilistic methods on very long queries
- WWW: Boolean+ search on short queries

Naïve Users Write Short Queries
- 88% of queries on the THOMAS system (Congressional bills) used <= 3 words (Croft et al. 95)
- Average query length is 7 words on the MEAD news system (Lu and Keefer 94)
- Most systems perform poorly on short queries on full-text collections, compared to long queries (Jing and Croft 94, Voorhees 94)

The Vector Space Model
- Represent each document as a term vector
  - If a term does not appear in the doc, its value = 0
  - Otherwise, record the frequency or weight of the term
- Represent the query as a similar term vector
- Compare the query vector to every document vector (see the sketch below)
  - Usually some variation on the inner product
  - Various strategies for different aspects of normalization
Probabilistic models: approximately the same idea, but they try to predict the relevance of a document given a query
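
A minimal sketch of the model as this slide describes it (raw term frequencies with cosine normalization; the weighting choice is an illustrative assumption):

    import math
    from collections import Counter

    def term_vector(text):
        # Term -> frequency; absent terms are implicitly 0 via Counter.
        return Counter(text.lower().split())

    def cosine(q, d):
        dot = sum(q[t] * d[t] for t in q)  # inner product
        norm = math.sqrt(sum(v * v for v in q.values())) * \
               math.sqrt(sum(v * v for v in d.values()))
        return dot / norm if norm else 0.0

    docs = ["apple ships new ceo", "apple apple pie recipe", "ceo sales forecast"]
    q = term_vector("apple ceo")
    ranked = sorted(docs, key=lambda d: cosine(q, term_vector(d)), reverse=True)
    print(ranked[0])  # "apple ships new ceo" scores highest
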
Conclusions
In a general search situation:
- We can organize large retrieval result collections for user viewing and manipulation (Scatter/Gather)
- We can show, compactly and informatively, the patterns of distribution of query terms in retrieved documents (TileBars)
- We still need more powerful ways to reveal the context and structure of retrieval results
- Future: get a better understanding of user goals in order to build better interfaces

Term Overlap
- Problem: Several query terms appear...
- ... but have nothing to do with one another (a proximity check like the sketch below can catch this)
Example: "Out, damned spot! ... Throw physics to the dogs, I'll none of it. ... He has kill'd me, Mother. Run away, I pray you!"
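
A minimal sketch of such a proximity check (my illustration; the talk only names the problem): every query term occurs in the passage, yet no small window contains them all, so the match is likely spurious.

    def terms_cooccur(doc_text, terms, window=5):
        # True if every term occurs within `window` words of some
        # occurrence of the first term.
        words = doc_text.lower().split()
        positions = [[i for i, w in enumerate(words) if t in w] for t in terms]
        if any(not p for p in positions):
            return False  # some term is missing entirely
        return any(
            all(any(abs(p - q) <= window for q in ps) for ps in positions)
            for p in positions[0]
        )

    macbeth = ("out damned spot ... throw physics to the dogs i'll none of it "
               "... he has kill'd me mother run away i pray you")
    print(terms_cooccur(macbeth, ["spot", "physics", "run"]))  # False
    print(terms_cooccur(macbeth, ["physics", "dogs"]))         # True
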