SIMS 202 Information Organization and Retrieval Prof. Marti Hearst and Prof. Ray Larson UC Berkeley SIMS Tues/Thurs 9:30-11:00am Fall 2000.

Last Time
• Starting Points for Search
  – Lists
  – Overviews
    » Categories

Today and Next Time
• Starting points (cont.)
  – Clusters
  – Examples as starting points
  – Automated Source Selection
• UIs for Query Specification
• UIs for Putting Results in Context
• UIs to Support the Search Process

Starting Points for Search
• Faced with a prompt or an empty entry form … how to start?
  – Lists of sources
  – Overviews
    » Clusters
    » Category Hierarchies/Subject Codes
    » Co-citation links
  – Examples, Wizards, and Guided Tours
  – Automatic source selection

Category Combinations
• HiBrowse problem:
  – Search is not integrated with browsing of categories
  – Only see the subset of categories selected (and the corresponding number of documents)

Cat-a-Cone: Multiple Simultaneous Categories
• Key ideas:
  – Separate documents from category labels
  – Show both simultaneously
• Link the two for iterative feedback
• Distinguish between:
  – Searching for documents vs.
  – Searching for categories

Cat-a-Cone Interface

Cat-a-Cone
• Catacomb (definition 2b, online Webster's): "A complex set of interrelated things"
• Makes use of earlier PARC work on 3D + animation:
  – Rooms (Henderson and Card 86)
  – IV: Cone Tree (Robertson, Card, Mackinlay 93)
  – Web Book (Card, Robertson, York 96)

[Figure, built up over several slides: Collection, Retrieved Documents, and Category Hierarchy, linked by query terms, search, and browse operations]

ConeTree for Category Labels
• Browse/explore category hierarchy
  – by search on label names
  – by growing/shrinking subtrees
  – by spinning subtrees
• Affordances
  – learn meaning via ancestors, siblings
  – disambiguate meanings
  – all categories simultaneously viewable

Virtual Book for Result Sets
  – Categories on page (retrieved document) linked to categories in tree
  – Flipping through book pages causes some subtrees to expand and contract
  – Most subtrees remain unchanged
  – Book can be stored for later re-use
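The page-flipping behaviour can be sketched as a set difference over ancestor paths: the nodes that must be open are the categories assigned to the current page plus all of their ancestors, so only the nodes that differ between two pages expand or contract. This is a minimal illustration under that assumption, not Cat-a-Cone's implementation (which animated a 3D ConeTree); the `PARENT` table and category names are hypothetical, loosely modelled on MeSH-style labels.

```python
# Hypothetical category tree, stored as child -> parent links (None marks a top-level category).
PARENT = {
    "Diseases": None,
    "Musculoskeletal Diseases": "Diseases",
    "Bone Diseases": "Musculoskeletal Diseases",
    "Osteoporosis": "Bone Diseases",
    "Chemicals and Drugs": None,
    "Hormones": "Chemicals and Drugs",
    "Estrogens": "Hormones",
}


def path_to_root(category):
    """Return the category followed by all of its ancestors."""
    path = []
    while category is not None:
        path.append(category)
        category = PARENT[category]
    return path


def expanded_nodes(page_categories):
    """Every node that must be open for the page's categories to be visible."""
    nodes = set()
    for category in page_categories:
        nodes.update(path_to_root(category))
    return nodes


def flip_page(current_categories, next_categories):
    """Return (to_expand, to_collapse); nodes in neither set stay unchanged."""
    before = expanded_nodes(current_categories)
    after = expanded_nodes(next_categories)
    return after - before, before - after


to_expand, to_collapse = flip_page(["Osteoporosis"], ["Osteoporosis", "Estrogens"])
print(sorted(to_expand))    # ['Chemicals and Drugs', 'Estrogens', 'Hormones']
print(sorted(to_collapse))  # []
```

The Osteoporosis branch stays open across the flip, which mirrors the point that most subtrees remain unchanged.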

Improvements over Standard Category Interfaces
• Integrate category selection with viewing of categories
• Show all categories + context
• Show relationship of retrieved documents to the category structure

Text Clustering
• Finds overall similarities among groups of documents
• Finds overall similarities among groups of tokens
• Picks out some themes, ignores others

S/G (Scatter/Gather) Example: query on "star" in encyclopedia text
(number of documents, cluster label)
  14 sports
  8 symbols
  47 film, tv
  68 film, tv (p)
  7 music
  97 astrophysics
  67 astronomy (p)
  12 stellar phenomena
  10 flora/fauna
  49 galaxies, stars
  29 constellations
  7 miscellaneous
Clustering and re-clustering is entirely automated.

Using Clustering in Document Ranking
• Cluster entire collection
• Find cluster centroid that best matches the query
• This has been explored extensively
  – it is expensive
  – it doesn't work well
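To make the query-to-centroid step concrete, here is a minimal sketch assuming the collection has already been clustered and that documents are represented as raw term counts compared with cosine similarity; both are assumptions, since the slide does not specify a representation. As the slide notes, this approach is expensive and has not worked well, so the sketch only illustrates the mechanism. The clusters and documents are hypothetical.

```python
import math
from collections import Counter


def centroid(docs):
    """Average term-count vector of a cluster's documents."""
    total = Counter()
    for doc in docs:
        total.update(doc)
    return {term: count / len(docs) for term, count in total.items()}


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def best_cluster(clusters, query_terms):
    """Return the name of the cluster whose centroid best matches the query."""
    query = Counter(query_terms)
    scores = {name: cosine(query, centroid(docs)) for name, docs in clusters.items()}
    return max(scores, key=scores.get)


# Hypothetical pre-computed clusters for documents mentioning "star".
clusters = {
    "astronomy": [Counter("star galaxy telescope".split()),
                  Counter("star constellation orbit".split())],
    "film": [Counter("star actor film".split()),
             Counter("film director award".split())],
}
print(best_cluster(clusters, ["star", "telescope"]))  # astronomy
```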

Two Queries: Two Clusterings
Query 1: AUTO, CAR, ELECTRIC
  8 control drive accident …
  25 battery california technology …
  48 import j. rate honda toyota …
  16 export international unit japan
  3 service employee automatic …
Query 2: AUTO, CAR, SAFETY
  6 control inventory integrate …
  10 investigation washington …
  12 study fuel death bag air …
  61 sale domestic truck import …
  11 japan export defect unite …
• The main differences are the clusters that are central to the query

Another Use of Clustering
• Use clustering to map the entire huge multidimensional document space into a huge number of small clusters.
• "Project" these onto a 2D graphical representation
  – Group by doc: SPIRE/Kohonen maps
  – Group by words: Galaxy of News/HotSauce/Semio

Clustering Multi-Dimensional Document Space (image from Wise et al 95)

Kohonen Feature Maps on Text (from Chen et al., JASIS 49(7))

UWMS Data Mining Workshop
Study of Kohonen Feature Maps
• H. Chen, A. Houston, R. Sewell, and B. Schatz, JASIS 49(7)
• Comparison: Kohonen map and Yahoo
• Task:
  – "Window shop" for an interesting home page
  – Repeat with the other interface
• Results:
  – Starting with the map, could repeat in Yahoo (8/11)
  – Starting with Yahoo, unable to repeat in the map (2/14)

Study (cont.)
• Participants liked:
  – Correspondence of region size to number of documents
  – Overview (but also wanted zoom)
  – Ease of jumping from one topic to another
  – Multiple routes to topics
  – Use of category and subcategory labels

Study (cont.)
• Participants wanted:
  – hierarchical organization
  – other ordering of concepts (alphabetical)
  – integration of browsing and search
  – correspondence of color to meaning
  – more meaningful labels
  – labels at the same level of abstraction
  – to fit more labels in the given space
  – combined keyword and category search
  – multiple category assignment (sports + entertainment)

Visualization of Clusters
  – Huge 2D maps may be an inappropriate focus for information retrieval
    » Can't see what documents are about
    » Documents forced into one position in semantic space
    » Space is difficult to use for IR purposes
    » Hard to view titles
  – Perhaps more suited for pattern discovery
    » problem: often only one view on the space

Summary: Clustering
• Advantages:
  – Get an overview of main themes
  – Domain independent
• Disadvantages:
  – Many of the ways documents could group together are not shown
  – Not always easy to understand what they mean
  – Different levels of granularity

Automated Source Selection
• Compare the query against summaries of what is contained in the collection
  – GLOSS (Tomasic et al. 97)
    » Predict which of several sources is most likely
    » Based on how many instances of each query term occur in the collection
  – SavvySearch (Howe & Dreilinger 97, in reader)
    » Predict which of several search engines is likely to produce a good answer to a given query
    » Based on number of pages returned and amount of time users spend on retrieved pages
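GLOSS-style selection works from per-collection summaries rather than the documents themselves. The sketch below assumes each summary is the collection size plus a document frequency for each term, and estimates how many documents match a conjunctive query by treating terms as independent; this estimator is an illustrative assumption, not necessarily the exact formula used by Tomasic et al. The source names and counts are hypothetical.

```python
def estimate_matches(size, doc_freqs, query_terms):
    """Estimated number of documents containing every query term,
    assuming terms occur independently of one another."""
    if size == 0:
        return 0.0
    estimate = float(size)
    for term in query_terms:
        estimate *= doc_freqs.get(term, 0) / size
    return estimate


def rank_sources(summaries, query_terms):
    """Order sources by their estimated number of matching documents."""
    scored = [(name, estimate_matches(size, dfs, query_terms))
              for name, (size, dfs) in summaries.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical summaries: (collection size, {term: document frequency}).
summaries = {
    "medical": (1000, {"bone": 100, "loss": 50}),
    "news": (2000, {"bone": 20, "loss": 400}),
}
print([name for name, _ in rank_sources(summaries, ["bone", "loss"])])  # ['medical', 'news']
```

Only the summaries are consulted, so sources can be ranked before any query is sent to them.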

Query Specification

• Interaction Styles (Shneiderman 97)
  – Command Language
  – Form Fill-in
  – Menu Selection
  – Direct Manipulation
  – Natural Language
• Example:
  – How does each apply to Boolean queries?

Command-Based Query Specification
• command attribute value connector …
  – find pa shneiderman and tw user#
• What are the attribute names?
• What are the command names?
• What are allowable values?
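The questions on this slide are exactly what a command-language parser has to answer. Below is a minimal sketch for the `find attribute value connector …` pattern, assuming `pa` means personal author, `tw` means title word, and a trailing `#` marks truncation (prefix matching); these mappings are assumptions for illustration, and connectors are applied strictly left to right with no precedence.

```python
# Hypothetical attribute codes: "pa" = personal author, "tw" = title word.
ATTRIBUTES = {"pa": "author", "tw": "title"}
CONNECTORS = {"and", "or"}


def parse(command):
    """Parse 'find ATTR VALUE [CONNECTOR ATTR VALUE ...]' into clauses and connectors."""
    tokens = command.lower().split()
    if not tokens or tokens[0] != "find":
        raise ValueError("query must start with the command 'find'")
    clauses, connectors = [], []
    i = 1
    while i < len(tokens):
        if clauses:
            if tokens[i] not in CONNECTORS:
                raise ValueError(f"expected a connector, got {tokens[i]!r}")
            connectors.append(tokens[i])
            i += 1
        if i + 1 >= len(tokens):
            raise ValueError("attribute is missing a value")
        attr, value = tokens[i], tokens[i + 1]
        if attr not in ATTRIBUTES:
            raise ValueError(f"unknown attribute {attr!r}")
        clauses.append((ATTRIBUTES[attr], value))
        i += 2
    if not clauses:
        raise ValueError("no search clauses given")
    return clauses, connectors


def clause_matches(record, field, value):
    """Whole-word match, or prefix match when the value ends with '#'."""
    words = record.get(field, "").lower().split()
    if value.endswith("#"):
        return any(word.startswith(value[:-1]) for word in words)
    return value in words


def evaluate(record, clauses, connectors):
    """Apply connectors strictly left to right (no operator precedence)."""
    result = clause_matches(record, *clauses[0])
    for connector, clause in zip(connectors, clauses[1:]):
        hit = clause_matches(record, *clause)
        result = (result and hit) if connector == "and" else (result or hit)
    return result


clauses, connectors = parse("find pa shneiderman and tw user#")
book = {"author": "Ben Shneiderman", "title": "Designing the User Interface"}
print(evaluate(book, clauses, connectors))  # True
```

Every error message corresponds to one of the slide's questions: unknown command, unknown attribute, or missing value.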

Form-Based Query Specification (Altavista)

Form-Based Query Specification (Melvyl)

Form-based Query Specification (Infoseek)

Direct Manipulation Spec. VQUERY (Jones 98)

Menu-based Query Specification (Young & Shneiderman 93)

Context

Putting Results in Context
• Visualizations of Query Term Distribution
  – KWIC, TileBars, SeeSoft
• Visualizing Shared Subsets of Query Terms
  – InfoCrystal, VIBE, Lattice Views
• Table of Contents as Context
  – Superbook, Cha-Cha, DynaCat
• Organizing Results with Tables
  – Envision, SenseMaker
• Using Hyperlinks
  – WebCutter

Putting Results in Context
• Interfaces should
  – give hints about the roles terms play in the collection
  – give hints about what will happen if various terms are combined
  – show explicitly why documents are retrieved in response to the query
  – summarize compactly the subset of interest

KWIC (Keyword in Context)
• An old standard, ignored by internet search engines
  – used in some intranet engines, e.g., Cha-Cha
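A KWIC display shows each occurrence of a keyword with a fixed window of surrounding text, aligned so the keyword always falls in the same column. A minimal sketch, assuming a character-based window (`width` is a hypothetical parameter) and case-insensitive whole-word matching:

```python
import re


def kwic(text, keyword, width=30):
    """One line per occurrence of keyword, with `width` characters of context on each side."""
    text = " ".join(text.split())  # collapse newlines and repeated spaces
    pattern = r"\b" + re.escape(keyword) + r"\b"
    lines = []
    for match in re.finditer(pattern, text, flags=re.IGNORECASE):
        left = text[max(0, match.start() - width):match.start()]
        right = text[match.end():match.end() + width]
        # Right-align the left context so every keyword starts in the same column.
        lines.append(f"{left:>{width}}[{match.group(0)}]{right}")
    return lines


for line in kwic("Bone loss is common. Preventing bone loss requires study.", "bone", width=12):
    print(line)
```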

Display of Retrieval Results
• Goal: minimize time/effort for deciding which documents to examine in detail
• Idea: show the roles of the query terms in the retrieved documents, making use of document structure

TileBars
• Graphical representation of term distribution and overlap
• Simultaneously indicate:
  – relative document length
  – query term frequencies
  – query term distributions
  – query term overlap
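The core of a TileBar is a grid: one row per query term set, one tile per document segment, with darker tiles for more hits, so row length reflects document length and dark tiles lining up across rows show overlap. The sketch below is a text-only approximation under stated assumptions: it cuts the document into fixed-size word segments rather than topic-based segments, and it stands in for gray levels with the characters ` .:#`. Segment size and shading scheme are hypothetical choices.

```python
def tilebar(text, term_sets, segment_size=50, shades=" .:#"):
    """One row of tiles per term set; one tile per fixed-size segment of the document.

    With the default shades, a tile encodes how many words in that segment
    match the term set: 0 -> ' ', 1 -> '.', 2 -> ':', 3 or more -> '#'.
    """
    words = [w.strip('.,;:!?()"\'').lower() for w in text.split()]
    segments = [words[i:i + segment_size] for i in range(0, len(words), segment_size)]
    rows = []
    for terms in term_sets:
        wanted = {t.lower() for t in terms}
        counts = [sum(w in wanted for w in segment) for segment in segments]
        rows.append("".join(shades[min(c, len(shades) - 1)] for c in counts))
    return rows


doc = "DBMS " * 3 + "filler " * 7 + "reliability " + "filler " * 9
for row in tilebar(doc, [["DBMS", "database"], ["reliability"]], segment_size=10):
    print("|" + row + "|")
# |# |
# | .|
```

Each term set is a disjunction of synonyms, which fits the faceted queries TileBars were designed for.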

Example
Query terms: What roles do they play in retrieved documents?
• Query terms: DBMS (Database Systems), Reliability
• Documents might be:
  – Mainly about both DBMS & reliability
  – Mainly about DBMS, discusses reliability
  – Mainly about, say, banking, with a subtopic discussion on DBMS/Reliability
  – Mainly about high-tech layoffs

Exploiting Visual Properties
  – Variation in gray scale saturation imposes a universal, perceptual order (Bertin et al. '83)
  – Varying shades of gray show varying quantities better than color (Tufte '83)
  – Differences in shading should align with the values being presented (Kosslyn et al. '83)

Key Aspect: Faceted Queries
• Conjunct of disjuncts
• Each disjunct is a concept
  – osteoporosis, bone loss
  – prevention, cure
  – research, Mayo clinic, study
• User does not have to specify which are main topics, which are subtopics
• Ranking algorithm gives higher weight to overlap of topics
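One simple way to give higher weight to topic overlap is to rank first by the number of facets (disjuncts) a document covers and only then by total hits. The sketch below does this with a lexicographic sort key; it is an illustrative ranking rule under that assumption, not the exact algorithm used with TileBars, and the sample documents are hypothetical.

```python
import re


def facet_hits(text, facet):
    """Count occurrences of any term (word or phrase) in one facet."""
    text = text.lower()
    return sum(len(re.findall(r"\b" + re.escape(term.lower()) + r"\b", text)) for term in facet)


def facet_score(text, facets):
    """(number of facets matched, total hits): facet coverage dominates raw frequency."""
    hits = [facet_hits(text, facet) for facet in facets]
    covered = sum(1 for h in hits if h > 0)
    return covered, sum(hits)


def rank(docs, facets):
    return sorted(docs, key=lambda name: facet_score(docs[name], facets), reverse=True)


facets = [["osteoporosis", "bone loss"], ["prevention", "cure"], ["research", "Mayo clinic", "study"]]
docs = {
    "a": "Osteoporosis osteoporosis osteoporosis bone loss",
    "b": "A Mayo Clinic study on prevention of bone loss",
    "c": "Research on a cure for osteoporosis",
}
print(rank(docs, facets))  # ['b', 'c', 'a']
```

Document "a" has as many hits as "b" but touches only one concept, so it ranks last.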

Main Topic Context
• Potential problem with TileBars: given retrieved documents in which no query terms are well distributed, the user does not know the context in which the query terms are used
• Solution: accompany TileBars with a main topic display

TileBars Summary
• Compact, graphical representation of term distribution for full-text retrieval results
  – simultaneously display term frequency, distribution, overlap, and doc length
  – allow for simple user-determined ordering strategies
• Part of a larger effort: user-centric, content-sensitive information access

TileBars Summary
• Preliminary user studies
  – users understand them
  – find them helpful in some situations
  – sometimes terms need to be disambiguated

SeeSoft: Showing Text Content using a linear representation and brushing and linking (Eick & Wills 95)

Query Term Subsets
• Show which subsets of query terms occur in which subsets of retrieved documents
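Grouping retrieved documents by the exact subset of query terms they contain is the computation behind displays such as InfoCrystal. A minimal sketch, assuming whitespace tokenization and single-word query terms; the documents are hypothetical.

```python
from collections import defaultdict


def term_subsets(docs, query_terms):
    """Map each subset of query terms to the documents containing exactly that subset."""
    groups = defaultdict(list)
    for name, text in docs.items():
        words = set(text.lower().split())
        present = frozenset(term for term in query_terms if term.lower() in words)
        groups[present].append(name)
    return dict(groups)


docs = {
    "d1": "dbms reliability in practice",
    "d2": "dbms tuning for banking",
    "d3": "high-tech layoffs continue",
}
# Print the largest subsets first.
for subset, names in sorted(term_subsets(docs, ["dbms", "reliability"]).items(),
                            key=lambda item: -len(item[0])):
    print(sorted(subset), names)
```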

Other Approaches
• Show how often each query term occurs in retrieved documents
  – VIBE (Korfhage '91)
  – InfoCrystal (Spoerri '94)
  – Problems:
    » can't see overlap of terms within docs
    » quantities not represented graphically
    » more than 4 terms hard to handle
    » no help in selecting terms to begin with

InfoCrystal (Spoerri 94)

VIBE (Olson et al. 93, Korfhage 93)

Superbook (Remde et al. 87)
• Next-generation hypermedia book
• Functions:
  – Word Lookup:
    » Show a list of query words, stems, and word combinations
  – Table of Contents: dynamic fisheye view of the hierarchical topics list
    » Search words can be highlighted here too
  – Page of Text: show selected page and highlighted search terms
• Hypertext features linking through search words rather than page links

Superbook (

DynaCat (Pratt 97)
• Decide on important question types in advance
  – What are the adverse effects of drug D?
  – What is the prognosis for treatment T?
• Make use of MeSH categories
• Retain only those types of categories known to be useful for this type of query
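The filtering step can be sketched as a lookup from query type to the category types worth showing, followed by grouping results under the surviving category labels. The query types, category types, and labels below are hypothetical stand-ins for DynaCat's MeSH-based knowledge, not its actual tables; note that a document may appear under several categories.

```python
from collections import defaultdict

# Hypothetical knowledge base: which category types are useful for each query type.
USEFUL_TYPES = {
    "adverse effects of drug": {"Diseases", "Signs and Symptoms"},
    "prognosis for treatment": {"Diseases", "Outcomes"},
}


def organize(results, query_type):
    """Group result titles under category labels whose type suits the query type.

    results: list of (title, [(category_type, category_label), ...]).
    """
    keep = USEFUL_TYPES[query_type]
    groups = defaultdict(list)
    for title, categories in results:
        for category_type, label in categories:
            if category_type in keep:
                groups[label].append(title)
    return dict(groups)


results = [
    ("Paper A", [("Signs and Symptoms", "Nausea"), ("Chemicals and Drugs", "Tamoxifen")]),
    ("Paper B", [("Diseases", "Endometrial Neoplasms"), ("Signs and Symptoms", "Nausea")]),
]
print(organize(results, "adverse effects of drug"))
# {'Nausea': ['Paper A', 'Paper B'], 'Endometrial Neoplasms': ['Paper B']}
```

The drug category "Tamoxifen" is dropped because drug labels are not useful for organizing answers to an adverse-effects question.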

DynaCat (Pratt, Hearst, & Fagan 99)

DynaCat Study
• Design
  – Three queries
  – 24 cancer patients
  – Compared three interfaces
    » ranked list, clusters, categories
• Results
  – Participants strongly preferred categories
  – Participants found more answers using categories
  – Participants took the same amount of time with all three interfaces

Cha-Cha (Chen & Hearst 98)
• Shows a "table-of-contents"-like view, like Superbook
• Takes advantage of human-created structure within hyperlinks to create the TOC
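One way to derive a table-of-contents context from hyperlinks is to compute, for each page, the shortest link path from a root page (for example, the site's home page) and show that path beside each search hit. The breadth-first sketch below assumes the link graph has already been crawled into a dictionary; the page names are hypothetical, and this illustrates the idea rather than reproducing Cha-Cha's code.

```python
from collections import deque


def shortest_paths(links, root):
    """Breadth-first search from root; maps each reachable page to its shortest link path."""
    paths = {root: [root]}
    queue = deque([root])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in paths:
                paths[target] = paths[page] + [target]
                queue.append(target)
    return paths


# Hypothetical intranet link graph: page -> pages it links to.
links = {
    "home": ["research", "people"],
    "research": ["ir-group", "db-group"],
    "people": ["hearst"],
    "ir-group": ["hearst"],
}
paths = shortest_paths(links, "home")
for hit in ["hearst", "db-group"]:
    print(" > ".join(paths[hit]))
# home > people > hearst
# home > research > db-group
```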

Supporting the Process
• Interfaces to support the process of information seeking
  – Standard Model
    » Infogrid
    » Superbook
  – Berry Picking Model
    » SketchTrieve
    » DLITE
  – Retaining Search History

How to Present the Search Process?
• What sequence of operations is allowed?
• Which GUI layout style is used?
  – One window
  – Overlapping windows
  – Tiled windows
  – Monolithic layout
    » One big window containing specialized internal windows that always occupy the same position and function

InfoGrid/Protofoil (Rao et al. 92)
Slide by Shankar Raman
• A general search interface architecture
  – Itemstash: retrieved docs
  – Search Event: current query
  – History: history of queries
  – Result Item: view selected docs + metadata

Infogrid (design mockup) (Rao et al. 92)

Infogrid Design Mockups (Rao et al. 92)

Protofoil (Rao et al. 94)

Monolithic Layouts: Protofoil Layout (Hypothetical), Superbook Layout

SuperBook (Egan et al. 89)
• Experimented with many variations of the layout and interaction sequence.
  – Several studies have shown that too many different options are worse than an interface that is too restrictive.
• Considered different screen sizes
  – Monolithic layout favored, however...
  – Sequence of interactions is what matters
  – Smaller screen can force designers to consider the interaction sequence carefully

Supporting the Information Seeking Process
• Two recent, similar approaches that focus on supporting the process
  – SketchTrieve (Hendry & Harper 97)
  – DLITE (Cousins 97)

Informal Interface
• Informal does not necessarily mean less useful
• Show how the search is
  – unfolding or evolving
  – expanding or contracting
• Prompt the user to
  – reformulate and abandon plans
  – backtrack to points of task deferral
  – make side-by-side comparisons
  – define and discuss problems

DLITE
Slide by Shankar Raman
• UI to a digital library
• Direct manipulation interface to a distributed information system
  – must show network and remote server status
• Workcenter approach
  – lots of handy tools for one task
  – experts create workcenters
  – contents persistent
  – concurrently shareable across sites
• Web browser used to display document or collection metadata

DLITE (Cousins 97)
Slide by Shankar Raman
• Drag-and-drop interface
• Reify queries, sources, retrieval results
• Animation to keep track of activity

Components/Tools in DLITE
Slide by Shankar Raman
• Documents (search results or local documents)
• Collections of components (e.g., result sets)
• Queries: translator used to apply the same query to many sources
• Services: search services, summarization, OCR, translation …
• People (for access control, payment …)

Interaction
Slide by Shankar Raman
• Pointing at an object brings up a tooltip showing its metadata
• Activating an object triggers a component-specific action
  – 5 types for the result set component
• Drag-and-drop data onto a program
• Animation used to show what happens with drag-and-drop (e.g., "waggling")

Comments
Slide by Shankar Raman
• Users seem to have lots of problems with flexibility (result set icon activation)
• Workcenter: customization, acts as a reminder
• Animation used to track progress, (partial) results

Keeping Track of History
• Examples
  – List of prior queries and results (standard)
  – Graphical hierarchy for web browsing
  – "Slide sorter" view, snapshots of earlier interactions

PadPrints (Hightower et al. 98)
Slide by Shankar Raman
• Tree-based history of recently visited web pages
  – history map placed to the left of the browser window
• Zoomable, can shrink sub-hierarchies
• Node = title + thumbnail
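A tree-based history differs from a linear Back stack in one key way: when the user goes back and follows a new link, the old branch is kept as a sibling instead of being discarded. A minimal sketch of that bookkeeping, with hypothetical class and method names; PadPrints' zooming and page thumbnails are omitted.

```python
class HistoryNode:
    def __init__(self, url, title, parent=None):
        self.url = url
        self.title = title
        self.parent = parent
        self.children = []


class TreeHistory:
    """Browsing history kept as a tree: new pages become children of the page they were reached from."""

    def __init__(self, url, title):
        self.root = HistoryNode(url, title)
        self.current = self.root

    def visit(self, url, title):
        # Following a link already in the tree moves to that node instead of duplicating it.
        for child in self.current.children:
            if child.url == url:
                self.current = child
                return child
        node = HistoryNode(url, title, parent=self.current)
        self.current.children.append(node)
        self.current = node
        return node

    def back(self):
        # Back moves to the parent; unlike a linear stack, no branch is discarded.
        if self.current.parent is not None:
            self.current = self.current.parent
        return self.current

    def outline(self, node=None, depth=0):
        """Indented text rendering of the history tree."""
        if node is None:
            node = self.root
        lines = ["  " * depth + node.title]
        for child in node.children:
            lines.extend(self.outline(child, depth + 1))
        return lines


history = TreeHistory("http://example.org/", "Home")
history.visit("http://example.org/a", "Page A")
history.back()
history.visit("http://example.org/b", "Page B")
history.visit("http://example.org/b/1", "Page B1")
print("\n".join(history.outline()))
```

With a linear Back stack, visiting Page B after going back from Page A would drop Page A; here it survives as a sibling.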

PadPrints (Hightower et al. 98)

Initial User Study of PadPrints
Slide by Shankar Raman
• 13.4% unable to find recently visited pages
• Only 0.1% use the History button, 42% use Back
• Problems with the history list (according to the authors)
  – incomplete, lose out on every branch
  – textual (not necessarily a problem!)
  – pull-down menu cumbersome: cannot see history along with the current document

Second User Study of PadPrints
Slide by Shankar Raman
• Changed the task to involve revisiting web pages
  – CHI database, National Park Service website
• Only correctly answered questions considered
  – 20-30% fewer pages accessed
  – faster response time for tasks that involve revisiting pages
  – slightly better user satisfaction ratings

Summary: UIs for Information Access
• The part of the system that the user sees and interacts with
• Better interfaces in the future should produce better search experiences
• UIs for search should
  – Help users keep track of what they have done
  – Suggest next choices
  – Support the process of search
• It is very difficult to design good UIs
• It is very difficult to evaluate search UIs