What Happens After the Search?
User Interface Ideas for Information Retrieval Results
Marti A. Hearst, Xerox PARC

Search Is Only Part of the Information Analysis Process
[Diagram: Repositories -> Workspace -> Goals]

Outline
A. Background: Search and Short Queries; the Role of Graphics in Retrieval Results
B. Throw Light on Retrieval Results by Showing Context
   1. Context of query terms in docs (TileBars)
   2. Inter-document context (Scatter/Gather)
C. Initial Attempts at Evaluation
D. Conclusions

Search Results (Scope of This Work)
- "Ad hoc" searches
  - Unanticipated, not tried previously
  - As opposed to filtering or monitoring
- External collections
  - Personal information spaces probably require special consideration
- Naïve users
  - As opposed to intermediaries
- Full-text, general document collections

Search Goal Types
ANSWER A QUESTION
- How old is Bob Dole?
- Who wrote Dante's Inferno?
FIND A PARTICULAR DOCUMENT
- Dan Rose's Home Page
- IRS Form 1040 Schedule D
MAKE AN INFORMED OPINION / ASSESS A TIMELY SITUATION
- What are the tradeoffs and side effects of this treatment?
- Should I wait for the new CD technology?
- How will Apple's new CEO affect sales next quarter?
GET BACKGROUND INFORMATION
- How to plant annual bulbs in Northern California
- What aspects of 3D compact disk technology are patented?

What Is the Goal of the Search?
Different goal types require different collections, different search techniques, and different retrieval result display strategies. E.g., a question should receive an answer rather than a document.
Focus of this work:
- General, ill-defined queries
- General collections
- Naïve, or inexperienced, users

Problems with Short Queries
TOO MANY DOCUMENTS
- If only a few words are supplied, there is little basis upon which to decide how to order the documents
- The fewer words there are, the less they serve to mutually disambiguate one another

Why Short Queries?
THE USERS DON'T KNOW
- What they want
- How to express what they want
- How what they want is expressed in the collection
LONG QUERIES CAN BACKFIRE (see the sketch below)
- If ANDing terms, get empty results
- If ORing terms, get matches on useless subsets of terms
- If using Similarity Search, can't specify important terms
- Some search engines can't handle long queries
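
A minimal sketch of the AND/OR backfire (the toy documents and query are my own, not from the talk):

    docs = {
        "d1": {"apple", "ceo", "sales"},
        "d2": {"apple", "pie", "recipe"},
        "d3": {"quarterly", "sales", "forecast"},
    }
    query = {"apple", "ceo", "sales", "quarter", "forecast"}

    # ANDing: a document must contain every query term -> nothing survives
    and_hits = [d for d, terms in docs.items() if query <= terms]
    # ORing: any shared term counts -> d2 matches on "apple" alone
    or_hits = [d for d, terms in docs.items() if query & terms]

    print(and_hits)  # []
    print(or_hits)   # ['d1', 'd2', 'd3']
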
Balancing Text and Graphics
THE CHALLENGE: HOW TO COMBINE GRAPHICAL AND TEXTUAL REPRESENTATIONS USEFULLY?
Graphics and animation are very useful for summarizing complex data. However, text content is difficult to graph.

"Fixing" Short Queries: Help Users Understand Result Sets
TWO APPROACHES, FROM TWO DIRECTIONS
- Context of query terms (within documents)
- Inter-document context
Both show info about many docs simultaneously.

Showing Context of Query Terms
Existing approaches:
- Lists of titles + ranks
- This augmented with other meta-information
- This augmented with how often each search term occurred
- Graphical display of which subset of query terms occurred

Brief Summaries

List Query Terms

Idea: Show Which Terms Occur, and How Often
- Problem: Which words did what?
- Solution: One symbol per term (see the sketch below)
[Figure: per-term frequency bars for terms A-D. Term B was most frequent, followed by Term A and Term D. Term C did not appear.]
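
A minimal sketch of the "one symbol per term" display (my own illustration, not the talk's implementation): one row of symbols per query term, with the row length showing how often that term occurred in the retrieved document.

    def term_bars(title, term_counts):
        # One line per query term; repeat a symbol once per occurrence.
        lines = [title]
        for term, count in term_counts.items():
            lines.append(f"  {term}: " + "#" * count)
        return "\n".join(lines)

    print(term_bars("Result 1", {"A": 3, "B": 7, "C": 0, "D": 2}))
    # Result 1
    #   A: ###
    #   B: #######   (Term B was most frequent)
    #   C:           (Term C did not appear)
    #   D: ##
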
Represent Document Structure
- Recognize the structure of the document
- Represent this structure graphically
- Simultaneously display a representation of query term frequencies and doc structure
- Term distribution becomes explicit
- Many docs' info can be seen simultaneously (see the sketch below)
[Figure: TileBar grids for terms A-D over document segments 1-5]
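
An ASCII sketch of the TileBars idea (a rough illustration under my own assumptions, not Hearst's implementation): one row per query term, one column per document segment, with a darker symbol where the term is more frequent in that segment.

    def tilebar(doc_segments, query_terms):
        # Rows = query terms, columns = segments; ' ' absent, '#' frequent.
        shades = " .:#"
        rows = []
        for term in query_terms:
            cells = []
            for seg in doc_segments:
                freq = seg.lower().split().count(term)
                cells.append(shades[min(freq, len(shades) - 1)])
            rows.append(f"{term:>8} |{''.join(cells)}|")
        return "\n".join(rows)

    segments = [
        "the patent covers disk encoding",
        "compact disk disk layers and disk media",
        "marketing and sales outlook",
        "patent disputes over disk formats",
    ]
    print(tilebar(segments, ["disk", "patent", "sales"]))
    #     disk |.# .|   (concentrated in segment 2)
    #   patent |.  .|   (scattered)
    #    sales |  . |   (one segment only)
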
Add Structure to the Query
Problem: Only room for a few terms
Solution: Structure the query
- A list of topics
  - Can be category labels, lists of synonyms, etc.
- Translated into Conjunctive Normal Form (see the sketch below)
  - User doesn't need to know this
  - No need for special syntax
  - Allows for a variety of ranking algorithms
- Creates a feedback loop between query structure and graphical representation
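
A minimal sketch of topics-to-CNF matching (the data structures are my assumption; the slide only states the idea): each topic is a list of synonyms, and a document matches when every topic is satisfied by at least one of its terms, i.e., an AND of ORs.

    topics = [
        ["disk", "cd", "cd-rom"],   # topic 1: the medium
        ["patent", "patented"],     # topic 2: legal status
    ]

    def matches_cnf(doc_text, topics):
        words = set(doc_text.lower().split())
        # AND across topics, OR within each topic
        return all(any(term in words for term in topic) for topic in topics)

    print(matches_cnf("the cd-rom format is patented", topics))  # True
    print(matches_cnf("cd sales rose last quarter", topics))     # False
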
Graphical Landscapes
Problems:
- No Titles!
- No Frequencies!
- No Document Lengths!
- No Nearness/Overlap!
- Each document classified only one way!
[Figure: landscape display scattering documents A-E into fixed clusters]

"Fixing" Short Queries: Other Techniques
- Term Expansion
- Relevance Feedback
- Category Information

Short Queries: Imprecisely Understood Goals
A tack to be pursued in future: identify the goal type, then
- Suggest a relevant collection
- Suggest a search strategy
- Suggest links to sources of expertise
- Create information sources tailored to the goal type
A more general, but less powerful, tack: provide useful descriptions of the space of retrieved information

Dealing with Short Queries
- Using text analysis to find context
- Finding a useful mix of text and graphics
  - TileBars: query-document context; shows structure of document and query; compact, so many docs can be compared at once
  - Scatter/Gather clustering: inter-document context; shows summary information textually; uses state/animation for relationships among clusters
- Add simple structure to the query format
- Future work: incorporate into a Workspace / SenseMaking environment (e.g., Information Visualizer, Card et al.)

Background: A Brief History of IR
- Card catalogs: Boolean search on title words and subject codes
- Abstracts and newswire: Vector Space Model, probabilistic ranking, and "soft" Boolean
- Full text (NIST TREC): vector space and probabilistic methods on very long queries
- WWW: Boolean+ search on short queries

Naïve Users Write Short Queries
- 88% of queries on the THOMAS system (Congressional bills) used <= 3 words (Croft et al. 95)
- Average query length is 7 words on the MEAD news system (Lu and Keefer 94)
- Most systems perform poorly on short queries on full-text collections, compared to long queries (Jing and Croft 94, Voorhees 94)

The Vector Space Model
- Represent each document as a term vector
  - If a term does not appear in the doc, its value = 0
  - Otherwise, record the frequency or weight of the term
- Represent the query as a similar term vector
- Compare the query vector to every document vector (see the sketch below)
  - Usually some variation on the inner product
  - Various strategies for different aspects of normalization
Probabilistic models: approximately the same idea, but they try to predict the relevance of a document given a query
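
A minimal sketch of the model as this slide describes it (raw term frequencies with cosine normalization; the weighting choice is an illustrative assumption):

    import math
    from collections import Counter

    def term_vector(text):
        # Term -> frequency; absent terms are implicitly 0 via Counter.
        return Counter(text.lower().split())

    def cosine(q, d):
        dot = sum(q[t] * d[t] for t in q)  # inner product
        norm = math.sqrt(sum(v * v for v in q.values())) * \
               math.sqrt(sum(v * v for v in d.values()))
        return dot / norm if norm else 0.0

    docs = ["apple ships new ceo", "apple apple pie recipe", "ceo sales forecast"]
    q = term_vector("apple ceo")
    ranked = sorted(docs, key=lambda d: cosine(q, term_vector(d)), reverse=True)
    print(ranked[0])  # "apple ships new ceo" scores highest
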
Conclusions
In a general search situation:
- We can organize large retrieval result collections for user viewing and manipulation (Scatter/Gather)
- We can show, compactly and informatively, the patterns of distribution of query terms in retrieved documents (TileBars)
- We still need more powerful ways to reveal the context and structure of retrieval results
- Future: get a better understanding of user goals in order to build better interfaces

Term Overlap
- Problem: Several query terms appear...
- ... but have nothing to do with one another (a proximity check like the sketch below can catch this)
Example: "Out, damned spot! ... Throw physics to the dogs, I'll none of it. ... He has kill'd me, Mother. Run away, I pray you!"
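
A minimal sketch of such a proximity check (my illustration; the talk only names the problem): every query term occurs in the passage, yet no small window contains them all, so the match is likely spurious.

    def terms_cooccur(doc_text, terms, window=5):
        # True if every term occurs within `window` words of some
        # occurrence of the first term.
        words = doc_text.lower().split()
        positions = [[i for i, w in enumerate(words) if t in w] for t in terms]
        if any(not p for p in positions):
            return False  # some term is missing entirely
        return any(
            all(any(abs(p - q) <= window for q in ps) for ps in positions)
            for p in positions[0]
        )

    macbeth = ("out damned spot ... throw physics to the dogs i'll none of it "
               "... he has kill'd me mother run away i pray you")
    print(terms_cooccur(macbeth, ["spot", "physics", "run"]))  # False
    print(terms_cooccur(macbeth, ["physics", "dogs"]))         # True
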