SLIDE 1 IS 202 – FALL 2002 Lecture 24: Interfaces for Information Retrieval Prof. Ray Larson & Prof. Marc Davis UC Berkeley SIMS Tuesday and Thursday 10:30 am - 12:00 pm Fall SIMS 202: Information Organization and Retrieval
SLIDE 2 Lecture Overview Review and Continuation –Introduction to HCI –Why Interfaces Don’t Work –Early Visions: Memex Interfaces for Information Retrieval II –Collection Selection –Query Specification –Query Results –Query Reformulation Credit for some of the slides in this lecture goes to Marti Hearst and Warren Sack
SLIDE 4 Human-Computer Interaction (HCI) Human –The end-users of a program –The others in the organization Computer –The machines the programs run on Interaction –The users tell the computers what they want –The computers communicate results
SLIDE 5 Shneiderman on HCI Well-designed interactive computer systems –Promote positive feelings of success, competence, and mastery –Allow users to concentrate on their work, exploration, or pleasure, rather than on the system or the interface
SLIDE 6 Shneiderman’s Design Principles Provide informative feedback Permit easy reversal of actions Support an internal locus of control Reduce working memory load Provide alternative interfaces for expert and novice users
SLIDE 7 How to Design and Build UIs Task analysis Rapid prototyping Evaluation Implementation Design Prototype Evaluate Iterate at every stage!
SLIDE 8 Information Visualization Utility –Inherently visual data –Making the abstract concrete –Making the invisible visible Techniques –Icons –Color highlighting –Brushing and linking –Panning and zooming –Focus-plus-context –Magic lenses –Animation
SLIDE 10 Why Interfaces Don’t Work Because… –We still think of using the interface –We still talk of designing the interface –We still talk of improving the interface “We need to aid the task, not the interface to the task.” “The computer of the future should be invisible.”
SLIDE 11 Norman on Design Priorities 1. The user: what does the person really need to have accomplished? 2. The task: analyze the task. How best can the job be done, taking into account the whole setting in which it is embedded, including the other tasks to be accomplished, the social setting, the people, and the organization? 3. As much as possible, make the task dominate; make the tools invisible. 4. Then, get the interaction right, making the right things visible, exploiting affordances and constraints, providing the proper mental models, and so on: the rules of good design for the user, written about many, many times in many, many places.
SLIDE 13 “What Dr. Bush Foresees” Cyclops Camera Worn on forehead, it would photograph anything you see and want to record. Film would be developed at once by dry photography. Microfilm It could reduce Encyclopaedia Britannica to volume of a matchbox. Material cost: 5¢. Thus a whole library could be kept in a desk. Vocoder A machine which could type when talked to. But you might have to talk a special phonetic language to this mechanical supersecretary. Thinking machine A development of the mathematical calculator. Give it premises and it would pass out conclusions, all in accordance with logic. Memex An aid to memory. Like the brain, Memex would file material by association. Press a key and it would run through a “trail” of facts.
SLIDE 14 Memex
SLIDE 15 Memex Detail
SLIDE 16 Cyclops Camera
SLIDE 17 Vocoder: “Supersecretary”
SLIDE 18 Investigator at Work “One can now picture a future investigator in his laboratory. His hands are free, and he is not anchored. As he moves about and observes, he photographs and comments. Time is automatically recorded to tie the two records together. If he goes into the field, he may be connected by radio to his recorder. As he ponders over his notes in the evening, he again talks his comments into the record. His typed record, as well as his photographs, may be both in miniature, so that he projects them for examination.”
SLIDE 19 Memex “A memex is a device in which an individual stores all his books, records, and communications, and which is mechanized so that it may be consulted with exceeding speed and flexibility. It is an enlarged intimate supplement to his memory.”
SLIDE 20 Associative Indexing “[…] associative indexing, the basic idea of which is a provision whereby any item may be caused at will to select immediately and automatically another. This is the essential feature of memex. The process of tying two items together is the important thing.”
SLIDE 21 The WWW circa 1945 “It is exactly as though the physical items had been gathered together from widely separated sources and bound together to form a new book. But it is more than this; for any item can be joined into numerous trails, the trails can bifurcate, and they can give birth to side trails.” “Wholly new forms of encyclopaedias will appear, ready-made with a mesh of associative trails running through them, ready to be dropped into the memex and there amplified.”
SLIDE 22 Selection “The heart of the problem, and of the personal machine we have here considered, is the task of selection. And here, in spite of great progress, we are still lame. Selection, in the broad sense, is still a stone adze in the hands of a cabinetmaker.” Source: “Memex Revisited” (Bush 1965)
SLIDE 23 Interaction Paradigms for IR Direct manipulation –Query specification –Query refinement –Result selection Delegation –Agents –Recommender systems –Filtering
SLIDE 24 The “Adaptive” Memex “In an adaptive Memex, the owner has delegated to the machine the ability to propose or effect changes in the stored information. By analogy to business practice, the Memex is said to be functioning as an agent (Kay, 1984). The machine is playing an autonomous role within a restricted charter: to attempt a more effective organization of the information based on observations of actual use and topical similarities.”
SLIDE 26 Task = Information Access The standard interaction model for information access 1) Start with an information need 2) Select a system and collections to search on 3) Formulate a query 4) Send the query to the system 5) Receive the results 6) Scan, evaluate, and interpret the results 7) Stop, or 8) Reformulate the query and go to Step 4
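The interaction model above is essentially a loop, which can be sketched in Python. This is only an illustration under stated assumptions: the function `information_access_loop`, its callback parameters, the `max_rounds` cutoff, and the toy search over three short documents are all hypothetical, and step 2 (choosing a system and collection) is not modeled.

```python
from typing import Callable, List, Optional


def information_access_loop(
    need: str,
    search: Callable[[str], List[str]],
    is_satisfied: Callable[[List[str]], bool],
    reformulate: Callable[[str, List[str]], Optional[str]],
    max_rounds: int = 5,
) -> List[str]:
    """Steps 3-8 of the slide: formulate, send, evaluate, stop or reformulate."""
    query: Optional[str] = need              # step 3 (assumption: the need is the first query)
    results: List[str] = []
    for _ in range(max_rounds):              # assumption: give up after max_rounds
        if query is None:                    # nothing left to try
            break
        results = search(query)              # steps 4-5
        if is_satisfied(results):            # steps 6-7
            break
        query = reformulate(query, results)  # step 8
    return results


# Toy stand-ins for a real system (all hypothetical).
DOCS = ["memex trails", "kohonen feature maps", "scatter gather clustering"]


def toy_search(query: str) -> List[str]:
    """Return documents containing every query word."""
    words = query.split()
    return [d for d in DOCS if all(w in d.split() for w in words)]


def drop_first_word(query: str, _results: List[str]) -> Optional[str]:
    """Crude reformulation: relax the query by dropping its first word."""
    rest = " ".join(query.split()[1:])
    return rest or None


results = information_access_loop(
    "document clustering", toy_search, lambda r: len(r) > 0, drop_first_word
)
print(results)
```

In this toy run the first query matches nothing, so the loop reformulates once and then stops. A real interface would put a person, not a callback, in the evaluate and reformulate steps.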
SLIDE 27 HCI Questions for IR Where does a user start? –Faced with a large set of collections, how can a user choose one to begin with? How will a user formulate a query? How will a user scan, evaluate, and interpret the results? How can a user reformulate a query?
SLIDE 28 HCI for IR: Collection Selection Question 1: Where does the user start?
SLIDE 29 Starting Points for Search Faced with a prompt or an empty entry form … how to start? –Lists of sources –Overviews Clusters Category Hierarchies/Subject Codes Co-citation links –Examples, Wizards, and Guided Tours –Automatic source selection
SLIDE 30 List of Sources Have to guess based on the name Requires prior exposure/experience
SLIDE 31 Old Lexis-Nexis Interface
SLIDE 32 Overviews Supervised (manual) category overviews –Yahoo! –HiBrowse –MeSHBrowse Unsupervised (automated) groupings –Clustering –Kohonen feature maps
SLIDE 33 Yahoo! Interface
SLIDE 34 Example: MeSH and MedLine MeSH category hierarchy –Medical Subject Headings –~18,000 labels –Manually assigned –~8 labels/article on average –Average depth: 4.5 –Max depth: 9 Top level categories: anatomy, diagnosis, related disc, animals, psych, technology, disease, biology, humanities, drugs, physics
SLIDE 35 MeSHBrowse (Korn & Shneiderman 95)
SLIDE 36 HiBrowse (Pollitt 97)
SLIDE 37 Summary: Category Labels Advantages: –Interpretable –Capture summary information –Describe multiple facets of content –Domain dependent, and so descriptive Disadvantages –Do not scale well (for organizing documents) –Domain dependent, so costly to acquire –May mis-match users’ interests
SLIDE 38 Text Clustering
What clustering does:
–Finds overall similarities among groups of documents
–Finds overall similarities among groups of tokens
–Picks out some themes, ignores others
How clustering works:
–Cluster entire collection
–Find cluster centroid that best matches the query
Problems with clustering:
–It is expensive
–It doesn’t work well
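The matching step (find the cluster centroid that best matches the query) can be sketched in Python. This is a minimal illustration, not the method of any system named in the lecture. It assumes the clusters have already been computed, uses raw term counts as vectors, and scores with cosine similarity. The names `vectorize`, `centroid`, `cosine`, and `best_cluster`, and the toy clusters, are hypothetical.

```python
import math
from collections import Counter


def vectorize(text):
    """Bag-of-words vector: term -> raw count (assumption: no stemming or weighting)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def centroid(vectors):
    """Average of the member vectors of one cluster."""
    total = Counter()
    for v in vectors:
        total.update(v)
    return {t: c / len(vectors) for t, c in total.items()}


def best_cluster(query, clusters):
    """Return the name of the cluster whose centroid is closest to the query."""
    q = vectorize(query)
    centroids = {
        name: centroid([vectorize(d) for d in docs])
        for name, docs in clusters.items()
    }
    return max(centroids, key=lambda name: cosine(q, centroids[name]))


# Hypothetical, hand-made clusters standing in for a clustered collection.
clusters = {
    "astronomy": ["star galaxy telescope", "star constellation sky"],
    "film": ["movie star actor", "film star award"],
}
print(best_cluster("galaxy star", clusters))
```

Clustering the whole collection up front is where the cost noted on the slide comes from. Once centroids exist, the match itself is one similarity computation per cluster.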
SLIDE 39 Scatter/Gather Cutting, Pedersen, Tukey & Karger 92, 93, Hearst & Pedersen 95 How it works –Cluster sets of documents into general “themes”, like a table of contents –Display the contents of the clusters by showing topical terms and typical titles –User chooses subsets of the clusters and re-clusters the documents within –Resulting new groups have different “themes” Originally used to give collection overview Evidence suggests more appropriate for displaying retrieval results in context
SLIDE 40 S/G Example: Query on “star”
Encyclopedia text (number of documents, cluster theme):
–14 sports
–8 symbols
–47 film, tv
–68 film, tv (p)
–7 music
–97 astrophysics
–67 astronomy (p)
–12 stellar phenomena
–10 flora/fauna
–49 galaxies, stars
–29 constellations
–7 miscellaneous
Clustering and re-clustering is entirely automated
SLIDE 41 Scatter/Gather Interface
SLIDE 42 Another Use of Clustering Use clustering to map the entire huge multidimensional document space into a number of small clusters “Project” these onto a 2D graphical representation –Group by doc: SPIRE, Kohonen maps –Group by words: Galaxy of News, HotSauce, Semio
SLIDE 43 “ThemeScapes” Clustering
SLIDE 44 Kohonen Feature Maps on Text
SLIDE 45 Study of Kohonen Feature Maps H. Chen, A. Houston, R. Sewell, and B. Schatz, JASIS 49(7) Comparison: Kohonen Map and Yahoo Task: –“Window shop” for interesting home page –Repeat with other interface Results: –Starting with map could repeat in Yahoo (8/11) –Starting with Yahoo unable to repeat in map (2/14)
SLIDE 46 What Study Participants Liked Correspondence of region size to number of documents Overview (but also wanted zoom) Ease of jumping from one topic to another Multiple routes to topics Use of category and subcategory labels
SLIDE 47 What Study Participants Wanted Hierarchical organization Other ordering of concepts (alphabetical) Integration of browsing and search Correspondence of color to meaning More meaningful labels Labels at same level of abstraction Fit more labels in the given space Combined keyword and category search Multiple category assignment (sports+entertain)
SLIDE 48 Summary: Clustering Advantages: –Get an overview of main themes –Domain independent Disadvantages: –Many of the ways documents could group together are not shown –Not always easy to understand what they mean –Can’t see what documents are about –Documents forced into one position in semantic space –Hard to view titles Perhaps more suited for pattern discovery –Problem: often only one view on the space
SLIDE 50 HCI for IR: Query Specification Question 2: How will a user specify a query?
SLIDE 51 Query Specification Interaction styles (Shneiderman 97) –Command language –Form fill –Menu selection –Direct manipulation –Natural language What about gesture, eye-tracking, or implicit inputs like reading habits?
SLIDE 52 Command-Based Query Specification COMMAND ATTRIBUTE value CONNECTOR … –FIND PA shneiderman AND TW interface What are the ATTRIBUTE names? What are the COMMAND names? What are allowable values?
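The COMMAND ATTRIBUTE value CONNECTOR pattern can be made concrete with a small parser sketch in Python. It is hypothetical throughout: only `FIND`, `PA`, `TW`, and `AND` come from the slide's example, while the extra connectors `OR` and `NOT`, the single-word values, and the names `Clause` and `parse_command` are assumptions for illustration.

```python
from typing import List, NamedTuple, Optional, Tuple


class Clause(NamedTuple):
    connector: Optional[str]  # None for the first clause
    attribute: str
    value: str


# Vocabulary the user must already know; this is the usability problem
# the slide's three questions point at.
COMMANDS = {"FIND"}
ATTRIBUTES = {"PA", "TW"}
CONNECTORS = {"AND", "OR", "NOT"}  # OR and NOT are assumed, not from the slide


def parse_command(line: str) -> Tuple[str, List[Clause]]:
    """Parse 'COMMAND ATTRIBUTE value [CONNECTOR ATTRIBUTE value ...]'."""
    tokens = line.split()
    if not tokens or tokens[0].upper() not in COMMANDS:
        raise ValueError("unknown or missing COMMAND")
    command = tokens[0].upper()
    clauses: List[Clause] = []
    connector: Optional[str] = None
    i = 1
    while i < len(tokens):
        if i + 1 >= len(tokens):
            raise ValueError("expected ATTRIBUTE followed by a value")
        attribute = tokens[i].upper()
        if attribute not in ATTRIBUTES:
            raise ValueError(f"unknown ATTRIBUTE: {tokens[i]}")
        clauses.append(Clause(connector, attribute, tokens[i + 1]))
        i += 2
        if i < len(tokens):
            connector = tokens[i].upper()
            if connector not in CONNECTORS:
                raise ValueError(f"unknown CONNECTOR: {tokens[i]}")
            i += 1
            if i >= len(tokens):
                raise ValueError("query ends with a dangling CONNECTOR")
    if not clauses:
        raise ValueError("no ATTRIBUTE value clause given")
    return command, clauses


print(parse_command("FIND PA shneiderman AND TW interface"))
```

Every `ValueError` branch corresponds to something the user has to memorize in advance, which is why form-based and menu-based alternatives are worth considering.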
SLIDE 53 Form-Based Query Specification
SLIDE 54 Form-Based Query Specification
SLIDE 55 Direct Manipulation Query Specification
SLIDE 56 Menu-Based Query Specification
SLIDE 57 Natural Language Query AskJeeves
SLIDE 59 HCI for IR: Viewing Results Question 3: How will a user scan, evaluate, and interpret the results?
SLIDE 60 Display of Retrieval Results Goal: –Minimize time/effort for deciding which documents to examine in detail Idea: –Show the roles of the query terms in the retrieved documents, making use of document structure
SLIDE 61 Putting Results in Context Interfaces should –Give hints about the roles terms play in the collection –Give hints about what will happen if various terms are combined –Show explicitly why documents are retrieved in response to the query –Summarize compactly the subset of interest
SLIDE 62 Putting Results in Context Visualizations of query term distribution –KWIC, TileBars, SeeSoft, Virtual Shakespeare Visualizing shared subsets of query terms –InfoCrystal, VIBE Table of contents as context –SuperBook, Cha-Cha
SLIDE 63 KWIC (Keyword in Context)
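A keyword-in-context display shows each occurrence of a query term together with a few words on either side. The Python sketch below is one simple way to produce such lines. The function name `kwic`, the word-based window, and the `width` parameter are assumptions for illustration, not a description of any particular system.

```python
import re


def kwic(text, keyword, width=3):
    """Return (left context, keyword, right context) for each occurrence.

    Assumption: context is measured in words, `width` words per side.
    """
    tokens = re.findall(r"\w+", text)
    hits = []
    for i, token in enumerate(tokens):
        if token.lower() == keyword.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            hits.append((left, token, right))
    return hits


text = "A memex is a device in which an individual stores all his books"
for left, word, right in kwic(text, "device", width=2):
    # Right-align the left context so the keyword lines up in a column.
    print(f"{left:>20}  [{word}]  {right}")
```

Aligning the keyword in a fixed column is what lets a reader scan many hits quickly and judge how the term is being used.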
SLIDE 64 TileBars Graphical representation of term distribution and overlap Simultaneously indicate: –Relative document length –Query term frequencies –Query term distributions –Query term overlap
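The four properties listed above can all be read off a small grid with one row per query term and one column per document segment. The Python sketch below builds such a grid under a loud assumption: it cuts the document into fixed-size word windows, which is a stand-in for whatever segmentation a real TileBars display uses. The names `tilebar`, `overlap`, and `render` are hypothetical.

```python
import re


def tilebar(text, query_terms, segment_size=5):
    """Map each query term to its per-segment frequencies.

    Assumption: segments are fixed windows of `segment_size` words.
    """
    tokens = [t.lower() for t in re.findall(r"\w+", text)]
    segments = [tokens[i:i + segment_size]
                for i in range(0, len(tokens), segment_size)]
    return {term: [seg.count(term.lower()) for seg in segments]
            for term in query_terms}


def overlap(grid):
    """For each segment, True if every query term occurs in it."""
    rows = list(grid.values())
    return [all(counts) for counts in zip(*rows)]


def render(grid):
    """Text rendering: darker character = higher term frequency."""
    shades = " .:#"
    return "\n".join(
        f"{term:<12}|" + "".join(shades[min(c, 3)] for c in row) + "|"
        for term, row in grid.items()
    )


doc = ("dbms reliability dbms index query "
       "banking loan rate bank credit "
       "dbms reliability backup log recovery")
grid = tilebar(doc, ["dbms", "reliability"])
print(render(grid))
print(overlap(grid))
```

Row length stands for relative document length, cell shade for term frequency, the pattern along a row for distribution, and columns where every row is shaded for overlap.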
SLIDE 65 TileBars Example
Query terms: DBMS (Database Systems), Reliability
What roles do they play in retrieved documents?
–Mainly about both DBMS & reliability
–Mainly about DBMS, discusses reliability
–Mainly about, say, banking, with a subtopic discussion on DBMS/Reliability
–Mainly about high-tech layoffs
SLIDE 66 TileBars Example
SLIDE 67 SeeSoft (Eick & Wills 95)
SLIDE 68 David Small: Virtual Shakespeare
SLIDE 69 Other Approaches Show how often each query term occurs in sets of retrieved documents –VIBE (Korfhage ’91) –InfoCrystal (Spoerri ’94)
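Displays of this kind need to know how many retrieved documents contain each combination of query terms. A minimal Python sketch of that counting step follows. It illustrates only the underlying tally, not the VIBE or InfoCrystal layouts themselves, and the name `term_subset_counts`, the whitespace tokenization, and the sample documents are assumptions.

```python
from collections import Counter


def term_subset_counts(docs, query_terms):
    """Count retrieved documents by the exact subset of query terms they contain."""
    terms = [t.lower() for t in query_terms]
    counts = Counter()
    for doc in docs:
        words = set(doc.lower().split())
        present = frozenset(t for t in terms if t in words)
        if present:  # skip documents matching no query term
            counts[present] += 1
    return counts


retrieved = [
    "dbms reliability report",
    "dbms tuning",
    "reliability of bridges",
    "dbms reliability and recovery",
    "banking news",
]
counts = term_subset_counts(retrieved, ["DBMS", "reliability"])
for subset, n in sorted(counts.items(), key=lambda kv: -kv[1]):
    print(sorted(subset), n)
```

With n query terms there are up to 2^n - 1 non-empty subsets, so the number of regions to draw grows quickly as terms are added.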
SLIDE 70 VIBE (Olson et al. 93, Korfhage 93)
SLIDE 71 InfoCrystal (Spoerri 94)
SLIDE 72 Problems with InfoCrystal Can’t see proximity or frequency of terms within documents Quantities not represented graphically More than 4 terms hard to handle No help in selecting terms to begin with
SLIDE 73 Cha-Cha (Chen & Hearst 98) Shows “Table-of-Contents”-like view, like SuperBook Focus+Context using hyperlinks to create the TOC Integrates Web Site structure navigation with search
SLIDE 75 HCI for IR: Query Reformulation Question 4: How can a user reformulate a query?
SLIDE 76 Query Reformulation Thesaurus expansion –Suggest terms similar to query terms Relevance feedback –Suggest terms (and documents) similar to retrieved documents that have been judged to be relevant –“More like this” interaction
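Thesaurus expansion can be illustrated with a few lines of Python. The sketch assumes a thesaurus is available as a plain dictionary from a term to an ordered list of related terms. The `THESAURUS` contents, the function name `suggest_terms`, and the `max_per_term` limit are invented for the example.

```python
# Hypothetical hand-made thesaurus: term -> related terms, best first.
THESAURUS = {
    "interface": ["ui", "front-end", "display"],
    "retrieval": ["search", "lookup"],
}


def suggest_terms(query, thesaurus, max_per_term=2):
    """Suggest related terms for each query word; the user decides what to add."""
    suggestions = {}
    for term in query.lower().split():
        related = thesaurus.get(term, [])
        suggestions[term] = related[:max_per_term]
    return suggestions


print(suggest_terms("Interface retrieval evaluation", THESAURUS))
```

Returning suggestions instead of silently rewriting the query keeps the user in control of what is added.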
SLIDE 77 Relevance Feedback Modify existing query based on relevance judgements –Extract terms from relevant documents and add them to the query –And/or re-weight the terms already in the query Two main approaches: –Automatic (pseudo-relevance feedback) –Users select relevant documents Users/system select terms from an automatically generated list
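The slide does not name a specific algorithm. As a stand-in, the Python sketch below uses the classic Rocchio formula, which both adds terms from relevant documents and re-weights the terms already in the query. The weights `alpha`, `beta`, and `gamma` are commonly quoted defaults chosen here as an assumption, and the function name `rocchio` and the toy vectors are hypothetical.

```python
from collections import Counter


def rocchio(query, relevant, nonrelevant=(), alpha=1.0, beta=0.75, gamma=0.15):
    """Rocchio update: q' = alpha*q + beta*mean(relevant) - gamma*mean(nonrelevant).

    Vectors are dicts of term -> weight. Terms whose new weight is not
    positive are dropped (an assumption; other treatments are possible).
    """
    updated = Counter()
    for term, weight in query.items():
        updated[term] += alpha * weight
    for docs, factor in ((relevant, beta), (nonrelevant, -gamma)):
        if not docs:
            continue
        for doc in docs:
            for term, weight in doc.items():
                updated[term] += factor * weight / len(docs)
    return {term: w for term, w in updated.items() if w > 0}


query = {"star": 1.0}
relevant = [{"star": 1, "galaxy": 2}, {"star": 1, "telescope": 2}]
nonrelevant = [{"star": 1, "film": 2}]
new_query = rocchio(query, relevant, nonrelevant)
print(sorted(new_query.items(), key=lambda kv: -kv[1]))
```

For the automatic (pseudo-relevance) variant, the top-ranked results would simply be passed in as `relevant` without asking the user. For the user-driven variants, the judged documents or the selected terms would determine the inputs.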
SLIDE 78 Relevance Feedback Interface
SLIDE 79 Revealing Internals Opaque (black box) –(Like web search engines) Transparent –(See used terms after Relevance Feedback) Penetrable –(Choose suggested terms before Relevance Feedback) Which do you think worked best?
SLIDE 80 Effectiveness Results Subjects using Relevance Feedback showed 17% - 34% better performance than without Relevance Feedback Subjects in the penetrable case did 15% better as a group than those in the opaque and transparent cases
SLIDE 81 Summary: Relevance Feedback Iterative query modification can improve precision and recall for a standing query In at least one study, users were able to make good choices by seeing which terms were suggested for Relevance Feedback and selecting among them So … “more like this” can be useful!
SLIDE 82 Summary: HCI for IR Focus on the task, not the tool Be aware of –User abilities and differences –Prior work and innovations –Design guidelines and rules-of-thumb Iterate, iterate, iterate It is very difficult to design good UIs It is very difficult to evaluate search UIs Better interfaces in future should produce better IR experiences
SLIDE 83 Next Time Web Search Architecture and Crawling (Avi Rappoport, President of SearchTools.com)