
1 What Happens After the Search? User Interface Ideas for Information Retrieval Results Marti A. Hearst Xerox PARC

2 Search is only part of the Information Analysis Process
[Diagram: Repositories, Workspace, Goals]

3 Outline
A. Background: Search and Short Queries; the role of graphics in retrieval results
B. Throw light on retrieval results by showing context
  1. Context of query terms in docs (TileBars)
  2. Inter-document context (Scatter/Gather)
C. Initial attempts at evaluation
D. Conclusions

4 Search Results (Scope of this work)
- “Ad hoc” searches
  - Unanticipated, not tried previously
  - As opposed to filtering, monitoring
- External collections
  - Personal information spaces probably require special consideration
- Naïve users
  - As opposed to intermediaries
- Full text, general document collections

5 Search Goal Types
ANSWER A QUESTION
- How old is Bob Dole?
- Who wrote Dante’s Inferno?
FIND A PARTICULAR DOCUMENT
- Dan Rose’s Home Page
- IRS Form 1040 Schedule D
MAKE AN INFORMED OPINION / ASSESS A TIMELY SITUATION
- What are the tradeoffs and side effects of this treatment?
- Should I wait for the new CD technology?
- How will Apple’s new CEO affect sales next quarter?
GET BACKGROUND INFORMATION
- How to plant annual bulbs in Northern California
- What aspects of 3D compact disk technology are patented?

6 What is the Goal of the Search?
Different goal types require different collections, different search techniques, and different retrieval-result display strategies. E.g., a question should receive an answer rather than a document.
Focus of this work:
- General, ill-defined queries
- General collections
- Naïve, or inexperienced, users

7 Problems with Short Queries
TOO MANY DOCUMENTS
- If only a few words are supplied, there is little basis upon which to decide how to order the documents
- The fewer words there are, the less they serve to mutually disambiguate one another

8 Why Short Queries?
THE USERS DON’T KNOW
- What they want
- How to express what they want
- How what they want is expressed in the collection
LONG QUERIES CAN BACKFIRE
- If ANDing terms, get empty results
- If ORing terms, get matches on useless subsets of terms
- If using similarity search, can’t specify important terms
- Some search engines can’t handle long queries
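To make the backfire concrete, here is a small illustration (my own toy corpus, not from the talk) of strict Boolean semantics: ANDing all of a six-term query returns nothing, while ORing returns documents that match only a single, unhelpful term.

```python
# Toy corpus and query terms are invented for illustration only.
terms = {"osteoporosis", "calcium", "treatment", "elderly", "women", "prevention"}

docs = {
    "d1": {"calcium", "supplements", "for", "children"},
    "d2": {"prevention", "of", "sports", "injuries"},
    "d3": {"osteoporosis", "treatment", "in", "elderly", "women"},
}

# AND: a document must contain all six terms -- no document does, so the result is empty.
and_hits = [d for d, toks in docs.items() if terms <= toks]

# OR: a document needs only one term -- all three documents match,
# including d1 and d2, which share just one term with the query.
or_hits = [d for d, toks in docs.items() if terms & toks]
```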

9 Balancing Text and Graphics
THE CHALLENGE: HOW TO COMBINE GRAPHICAL AND TEXTUAL REPRESENTATIONS USEFULLY?
Graphics and animation are very useful for summarizing complex data; however, text content is difficult to graph.
[Figure: a small example graph of a function g1(t) over a region R1]

10 “Fixing” Short Queries: Help Users Understand Result Sets
TWO APPROACHES, FROM TWO DIRECTIONS
- Context of query terms (within documents)
- Inter-document context
Show info about many docs simultaneously

11 Showing Context of Query Terms
Existing approaches:
- Lists of titles + ranks
- The above, augmented with other meta-information
- The above, augmented with how often each search term occurred
- Graphical display of which subset of query terms occurred

12 Brief Summaries

13 List Query Terms

14 Idea: Show Which Terms Occur How Often
- Problem: Which words did what?
- Solution: One symbol per term
[Figure: one frequency bar per query term A-D. Term B was most frequent, followed by Term A and Term D; Term C did not appear.]
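A minimal sketch (my own, not the talk's code) of the “one symbol per term” idea: count how often each query term occurs in a document and draw a labelled bar per term. The tokenizer, symbol scheme, and example text are illustrative assumptions.

```python
import re
from collections import Counter

def term_frequencies(doc_text, query_terms):
    """Count how often each query term occurs in a document (case-insensitive)."""
    counts = Counter(re.findall(r"[a-z']+", doc_text.lower()))
    return {term: counts[term.lower()] for term in query_terms}

def render_bars(freqs, symbols="ABCD"):
    """One labelled bar per term, so the user can see which words did what."""
    for symbol, (term, freq) in zip(symbols, freqs.items()):
        print(f"{symbol} {term:<12} {'#' * freq} ({freq})")

doc = ("Recycling of plastics is cheaper than recycling glass; "
       "plastics recycling plants are still rare.")
render_bars(term_frequencies(doc, ["plastics", "recycling", "automobile", "glass"]))
# Output pattern matches the slide's example: B most frequent, then A and D, C absent.
```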

15 Represent Document Structure
- Recognize the structure of the document
- Represent this structure graphically
- Simultaneously display a representation of query term frequencies and doc structure
- Term distribution becomes explicit
- Many docs’ info can be seen simultaneously
[Figure: a grid with one row per query term (A-D) and one column per document segment (1-5)]
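A minimal sketch of the kind of display this slide describes: chop the document into segments, then print one row per query term with a mark wherever that term appears in a segment. Fixed-size blocks stand in for real passage detection (TileBars uses subtopic segments), and the function names are mine.

```python
import re

def fixed_blocks(tokens, size=50):
    """Split a token stream into fixed-size blocks (a stand-in for subtopic passages)."""
    return [set(tokens[i:i + size]) for i in range(0, len(tokens), size)]

def show_term_grid(doc_text, query_terms, block_size=50):
    """Print one row per term and one column per segment: '#' where the term occurs."""
    tokens = re.findall(r"[a-z']+", doc_text.lower())
    segments = fixed_blocks(tokens, block_size)
    width = max(len(t) for t in query_terms)
    for term in query_terms:
        row = "".join("#" if term.lower() in seg else "." for seg in segments)
        print(f"{term:<{width}} |{row}|")
```

Reading across a row shows where in the document a term occurs; reading down a column shows which terms co-occur in the same passage, which is exactly the distribution information a flat frequency count hides.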

16 Add Structure to the Query
Problem: Only room for a few terms
Solution: Structure the query
- A list of topics
  - Can be category labels, lists of synonyms, etc.
- Translated into Conjunctive Normal Form
  - User doesn’t need to know this
  - No need for special syntax
  - Allows for a variety of ranking algorithms
- Creates a feedback loop between query structure and graphical representation
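As a sketch of the CNF reading described above (the topic lists, token-level matching, and toy ranking are my own assumptions), each topic's synonym list becomes an OR-clause and the whole query is the AND of those clauses:

```python
import re

# Each inner list is one topic (a category label expanded into synonyms).
query = [
    ["osteoporosis", "bone"],
    ["treatment", "therapy", "drug"],
    ["women", "female"],
]

def tokens(text):
    return set(re.findall(r"[a-z']+", text.lower()))

def matches_cnf(doc_text, topics):
    """CNF match: every topic (AND) must contribute at least one synonym (OR)."""
    toks = tokens(doc_text)
    return all(any(syn in toks for syn in topic) for topic in topics)

def topic_coverage(doc_text, topics):
    """A simple ranking signal: how many of the topics the document covers."""
    toks = tokens(doc_text)
    return sum(any(syn in toks for syn in topic) for topic in topics)
```

Because the user just supplies one topic per line, no special query syntax is needed, and the same topic rows can label the rows of the graphical display, giving the feedback loop mentioned above.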

17 Graphical Landscapes
Problems:
- No titles!
- No frequencies!
- No document lengths!
- No nearness/overlap!
- Each document classified only one way!
[Figure: a landscape-style layout with documents labelled A-E placed in regions]

18 “Fixing” Short Queries: Other Techniques
- Term expansion
- Relevance feedback
- Category information

19 Short Queries: Imprecisely Understood Goals
A tack to be pursued in future: identify the goal type, then
- Suggest a relevant collection
- Suggest a search strategy
- Suggest links to sources of expertise
- Create information sources tailored to the goal type
A more general, but less powerful, tack: provide useful descriptions of the space of retrieved information

20 Dealing with Short Queries
- Using text analysis to find context
- Finding a useful mix of text and graphics
  - TileBars: query-document context; shows structure of document and query; compact, so many docs can be compared at once
  - Scatter/Gather clustering: inter-document context; shows summary information textually; uses state/animation for relationships among clusters
- Add simple structure to the query format
- Future work: incorporate into a Workspace / SenseMaking environment (e.g., Information Visualizer, Card et al.)

21 Background: A Brief History of IR
- Card catalogs: Boolean search on title words and subject codes
- Abstracts and newswire: vector space model, probabilistic ranking, and “soft” Boolean
- Full text (NIST TREC): vector space and probabilistic methods on very long queries
- WWW: Boolean+ search on short queries

22 Naïve Users Write Short Queries
- 88% of queries on the THOMAS system (Congressional bills) used <= 3 words (Croft et al. 95)
- Average query length is 7 words on the MEAD news system (Lu and Keefer 94)
- Most systems perform poorly on short queries over full-text collections, compared to long queries (Jing and Croft 94, Voorhees 94)

23 The Vector Space Model
- Represent each document as a term vector
  - If a term does not appear in the doc, its value = 0
  - Otherwise, record the frequency or weight of the term
- Represent the query as a similar term vector
- Compare the query vector to every document vector
  - Usually some variation on the inner product
  - Various strategies for different aspects of normalization
Probabilistic models: approximately the same idea, but they try to predict the relevance of a document given a query
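A minimal sketch of the model just described: raw term-frequency vectors compared with a length-normalized inner product (cosine). Real systems add tf-idf weighting and other normalizations; the tokenizer and example texts are my own.

```python
import math
import re
from collections import Counter

def tokenize(text):
    return re.findall(r"[a-z']+", text.lower())

def cosine(query_text, doc_text):
    """Inner product of term-frequency vectors, normalized by the vector lengths."""
    q, d = Counter(tokenize(query_text)), Counter(tokenize(doc_text))
    dot = sum(q[t] * d[t] for t in q)  # terms absent from the document contribute 0
    norm = math.sqrt(sum(v * v for v in q.values())) * math.sqrt(sum(v * v for v in d.values()))
    return dot / norm if norm else 0.0

docs = ["how to plant annual bulbs", "annual report on bulb sales", "planting spring bulbs"]
ranked = sorted(docs, key=lambda d: cosine("plant annual bulbs", d), reverse=True)
```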

24 Conclusions
In a general search situation:
- We can organize large retrieval-result collections for user viewing and manipulation (Scatter/Gather)
- We can show, compactly and informatively, the patterns of distribution of query terms in retrieved documents (TileBars)
- We still need more powerful ways to reveal the context and structure of retrieval results
- Future: get a better understanding of user goals in order to build better interfaces

25 Term Overlap
- Problem: Several query terms appear…
- … but have nothing to do with one another.
Example passages (query terms present but unrelated):
  “Out, damned spot! …”
  “… Throw physics to the dogs, I’ll none of it. …”
  “He has kill’d me, Mother. Run away, I pray you!”
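A small sketch of how a system might detect this failure mode (sentence windows and the “scattered” test are my own illustrative choices, not something from the talk): every query term occurs somewhere, but no two of them ever occur together.

```python
import re

def sentence_hits(doc_text, query_terms):
    """For each sentence, the set of query terms that appear in it."""
    wanted = {t.lower() for t in query_terms}
    hits = []
    for sent in re.split(r"[.!?]+", doc_text.lower()):
        present = wanted & set(re.findall(r"[a-z']+", sent))
        if present:
            hits.append(present)
    return hits

def terms_are_scattered(doc_text, query_terms):
    """True if every query term appears, yet no single sentence holds more than one of them."""
    hits = sentence_hits(doc_text, query_terms)
    seen = set().union(*hits) if hits else set()
    return seen == {t.lower() for t in query_terms} and all(len(h) == 1 for h in hits)
```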

