Marti Hearst, SIMS 247, Lecture 20: Visualizing Text & Text Collections (cont.), April 2, 1998.

Presentation transcript:

Today
Visualizing Collection Overviews (cont.)
Visualizing Query Specifications
  – Selecting Term Subsets
  – Viewing Metadata
Visualizing Retrieval Results
  – Show Hyperlink Structure (WebCutter)
  – Term Hit Distribution (TileBars, SeeSoft)
  – Group by Shared Metadata (Cat-a-Cone)

Showing Collection Overviews
From last time:
  – Show documents as icons
  – Link together or place near one another according to:
      inter-document similarity
      hyperlink structure
      citation structure
  – Advantages:
      can see large grouping patterns
  – Disadvantages:
      what do the groups mean?
      documents usually belong in multiple groups
      groups are often somewhat arbitrary
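
Placing documents near one another by inter-document similarity needs some similarity measure. The slides do not name one, so the sketch below assumes cosine similarity over raw term counts, a common choice; the function name and sample documents are hypothetical.

```python
import math
from collections import Counter


def cosine_similarity(text_a: str, text_b: str) -> float:
    """Cosine of the angle between two raw term-count vectors."""
    a = Counter(text_a.lower().split())
    b = Counter(text_b.lower().split())
    # Only terms present in both documents contribute to the dot product.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is treated as similar to nothing
    return dot / (norm_a * norm_b)


# Hypothetical three-document collection.
docs = {
    "d1": "text visualization of document clusters",
    "d2": "visualization of document collections",
    "d3": "reliability of database systems",
}
# Pairwise scores; a layout step would place high-scoring pairs close together.
pairs = {(p, q): cosine_similarity(docs[p], docs[q])
         for p in docs for q in docs if p < q}
```

A layout algorithm would take these pairwise scores as input. The sketch stops before the layout step, which is where the "what do the groups mean?" problem arises.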

Mapping Documents onto Landscapes (Chalmers 96)

Visualizing Query Term Specification
Query term intersection
  – VIBE
  – InfoCrystal
Incremental term specification
  – LyberWorld
  – WSJ online interface

Visualizing Query Term Intersection
VIBE
  – establish points of interest (POI) on a 2D plane
  – these correspond to terms or concepts
  – position documents according to their intersections among POI
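
One way to read "position documents according to their intersections among POI" is as a weighted average of the POI coordinates. The sketch below assumes that reading; the function name, the coordinates, and the per-POI scores are hypothetical and are not taken from the VIBE system itself.

```python
from typing import Dict, Optional, Tuple

Point = Tuple[float, float]


def vibe_position(poi_xy: Dict[str, Point],
                  scores: Dict[str, float]) -> Optional[Point]:
    """Place a document at the score-weighted average of the POI positions.

    poi_xy maps each point of interest to its 2D location.
    scores maps POI names to the document's (assumed) relevance to that POI.
    Returns None when the document matches no POI and so has no position.
    """
    total = sum(scores.get(p, 0.0) for p in poi_xy)
    if total == 0:
        return None
    x = sum(poi_xy[p][0] * scores.get(p, 0.0) for p in poi_xy) / total
    y = sum(poi_xy[p][1] * scores.get(p, 0.0) for p in poi_xy) / total
    return (x, y)


# Hypothetical layout with three points of interest.
pois = {"A": (0.0, 0.0), "B": (10.0, 0.0), "C": (0.0, 10.0)}
halfway = vibe_position(pois, {"A": 1.0, "B": 1.0})   # equally about A and B
near_a = vibe_position(pois, {"A": 3.0, "B": 1.0})    # mostly about A
```

Under this assumption, a document that matches two POIs equally sits midway between them, and a document dominated by one POI is pulled toward it.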

VIBE (Olsen et al. 93) [Figure: points of interest labeled A, B, C, D]

Visualizing Query Term Intersection
InfoCrystal
  – convert and extend Venn diagrams
  – show how many docs contain each subset of up to five terms
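
The counts behind such a display can be sketched as follows. This assumes each count covers the documents that contain exactly that subset of query terms, as in a Venn-diagram region; the function name and whitespace tokenization are illustrative assumptions.

```python
from collections import Counter
from typing import Iterable, List


def subset_counts(docs: Iterable[str], terms: List[str]) -> Counter:
    """Count documents per exact subset of query terms they contain."""
    counts: Counter = Counter()
    for text in docs:
        words = set(text.lower().split())
        present = frozenset(t for t in terms if t in words)
        if present:  # documents matching no query term are not displayed
            counts[present] += 1
    return counts


# Hypothetical four-document collection and three query terms.
sample_docs = ["a b", "a", "a b c", "d"]
counts = subset_counts(sample_docs, ["a", "b", "c"])
```

With up to five terms there are at most 2^5 - 1 = 31 non-empty subsets, so every region can be shown at once.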

InfoCrystal (Spoerri 93) [Figure: terms A, B, C, D; callouts mark the # of docs containing A, the # of docs containing B and D, and the # of docs containing A, C, and B; example count shown: 34]

Hyperlink Relevant Documents: WebCutter (Maarek & Shaul 97)
Show documents as icons or text labels
Choose a starting point for search
Find documents that are linked to starting point and are most relevant to query
Continue searching most promising links
Show link structure graphically
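
The "continue searching most promising links" step resembles a best-first traversal. The sketch below is a generic best-first search over a hypothetical in-memory link graph with a hypothetical relevance score; it is not Maarek & Shaul's actual algorithm.

```python
import heapq
from typing import Callable, Dict, List, Tuple


def best_first_crawl(links: Dict[str, List[str]],
                     score: Callable[[str], float],
                     start: str,
                     limit: int) -> Tuple[List[str], Dict[str, str]]:
    """Visit up to `limit` pages, always expanding the best-scoring one next.

    Returns the visit order and a parent map (page -> page it was reached
    from), which is enough to draw the discovered link structure.
    """
    visited: List[str] = []
    parent: Dict[str, str] = {}
    seen = {start}
    # heapq is a min-heap, so scores are negated to pop the highest first.
    frontier = [(-score(start), start)]
    while frontier and len(visited) < limit:
        _, page = heapq.heappop(frontier)
        visited.append(page)
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                parent[target] = page
                heapq.heappush(frontier, (-score(target), target))
    return visited, parent


# Hypothetical link graph and query-relevance scores.
graph = {"s": ["a", "b"], "a": ["c"], "b": ["d"]}
relevance = {"s": 1.0, "a": 0.2, "b": 0.9, "c": 0.5, "d": 0.8}
order, parent = best_first_crawl(graph, relevance.get, "s", limit=4)
```

Here the crawl follows the strong branch (`b`, then `d`) before returning to the weaker page `a`.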

WebCutter (Maarek & Shaul 97)

TileBars: Viewing Retrieval Results
Goal: minimize time/effort for deciding which documents to examine in detail
Idea: show the roles of the query terms in the retrieved documents, making use of document structure

TileBars
Graphical Representation of Term Distribution and Overlap
Simultaneously Indicate:
  – relative document length
  – query term frequencies
  – query term distributions
  – query term overlap
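
The four properties listed above can all be read off one small grid: one row per query term set, one column per document segment, each cell holding a hit count. The sketch below assumes fixed-size word windows as a stand-in for real document structure; the function names and window size are hypothetical.

```python
from typing import List, Set


def tilebar(text: str, term_sets: List[Set[str]],
            segment_words: int = 50) -> List[List[int]]:
    """Return a grid of hit counts: rows are term sets, columns are segments.

    The number of columns reflects relative document length; each cell would
    be drawn as a gray tile, darker for higher counts.
    """
    words = text.lower().split()
    segments = [words[i:i + segment_words]
                for i in range(0, len(words), segment_words)]
    return [[sum(w in terms for w in seg) for seg in segments]
            for terms in term_sets]


def overlap_columns(grid: List[List[int]]) -> List[int]:
    """Indices of segments in which every term set has at least one hit."""
    return [i for i, col in enumerate(zip(*grid)) if all(col)]


# Hypothetical six-word document, split into two three-word segments.
doc = "dbms reliability dbms banking banking banking"
grid = tilebar(doc, [{"dbms"}, {"reliability"}], segment_words=3)
both = overlap_columns(grid)
```

In this toy document both term sets hit only the first segment, so it would show one dark column followed by a blank one.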

TileBars Example
Query terms: DBMS (Database Systems), Reliability
What roles do they play in retrieved documents?
  – Mainly about both DBMS & reliability
  – Mainly about DBMS, discusses reliability
  – Mainly about, say, banking, with a subtopic discussion on DBMS/Reliability
  – Mainly about high-tech layoffs

Exploiting Visual Properties
  – Variation in gray scale saturation imposes a universal, perceptual order (Bertin et al. ’83)
  – Varying shades of gray show varying quantities better than color (Tufte ’83)
  – Differences in shading should align with the values being presented (Kosslyn et al. ’83)

Represent Software as One-Dimensional Text
SeeSoft (Eick 94)
  – Originally for software development
  – Show lines of code graphically
      how often modified
      written by whom
      highlight search terms
  – Extend to text
      show locations of search terms
      show recurring features, e.g., characters in a story
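
The "show locations of search terms" idea reduces each line of text to a single mark. A minimal sketch, assuming case-insensitive substring matching and a one-character-per-line miniature; the function names are hypothetical and this is not the SeeSoft implementation.

```python
from typing import List


def line_hits(text: str, term: str) -> List[int]:
    """1-based numbers of the lines that contain the search term."""
    needle = term.lower()
    return [n for n, line in enumerate(text.splitlines(), start=1)
            if needle in line.lower()]


def miniature(text: str, term: str) -> str:
    """One character per line: '#' where the term occurs, '.' elsewhere."""
    hits = set(line_hits(text, term))
    return "".join("#" if n in hits else "."
                   for n in range(1, len(text.splitlines()) + 1))


# Hypothetical three-line story; track where the character Alice appears.
story = "Alice met Bob\nBob left\nAlice slept"
alice_lines = line_hits(story, "alice")
alice_strip = miniature(story, "alice")
```

A graphical version would draw each mark as a thin colored row, so thousands of lines fit on one screen.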

SeeSoft: Changes of Lines of Code over Time (Eick 94)

SeeSoft: Characters in Stories (Eick 94)

SeeDiff: Compare Differences between Two Files (Eick and Ball)

Alternative Way to Group Documents: Category Metadata
Last time we saw ways to visualize
  – clusters of documents
  – clusters of words taken from documents
Clusters are data-driven
  – depend on what documents were clustered
  – can find main themes
  – sometimes are hard to understand
Alternative: human-generated categories

What is Category Metadata for?
“Normalizing” natural language
  – distinguish homonyms
  – group synonyms together
Organizing information
  – for search
  – for browsing/navigation
Examples:
  – Yahoo directory
  – ACM keyword hierarchy

Example: MeSH and MedLine
MeSH Medical Category Hierarchy
  – ~18,000 labels
  – manually assigned
  – ~8 labels/article on average
  – avg depth: 4.5, max depth 9
Top Level Categories:
  anatomy    diagnosis    related disc
  animals    psych        technology
  disease    biology      humanities
  drugs      physics

What Categories Do
Summarize a document according to pre-defined main topics
Compress the many ways of representing a concept into one
Identify which subset of attributes are salient for a collection

Clusters vs. Categories
CLUSTERS                  CATEGORIES
Tailored to data          Pre-assigned
Overall themes            Particular attributes
Require interpretation    Familiar terminology

Large Category Sets
Problems for User Interfaces
  – Too many categories to browse
  – Too many docs per category
  – Docs belong to multiple categories
  – Need to integrate search
  – Need to show the documents

Multiple Categories per Document
Drug    Symptom    Anatomy
D1      S1         A1
D2      S2         A2
D3      S3         A3
Medical articles contain combinations of these concept types

How to Group the Category Types? A Lattice is Infeasible
[Diagram: documents labeled [D1 S3 A1] [D3 S2 S3] [D1 D2 S2 A2] …, above a lattice of category combinations:]
  Dx Sx Ax
  Dx Sx A1    Dx S1 Ax    D1 Sx Ax
  Dx S1 A1    D1 S1 Ax    D1 Sx A1
  D1 S1 A1
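
A rough count suggests why the lattice does not scale. This is an interpretation, not a calculation from the slides: it assumes each concept type contributes either one of its specific labels or the wildcard "x", so a type with n labels offers n + 1 choices.

```python
import math
from typing import Iterable


def lattice_nodes(labels_per_type: Iterable[int]) -> int:
    """Number of lattice nodes when each type is a specific label or 'x'."""
    return math.prod(n + 1 for n in labels_per_type)


# One label per type (D1, S1, A1) gives the eight nodes from Dx Sx Ax
# down to D1 S1 A1.
small = lattice_nodes([1, 1, 1])

# Three labels per type (D1..D3, S1..S3, A1..A3).
medium = lattice_nodes([3, 3, 3])

# Ways to choose 8 distinct labels out of 18,000; far too many to draw.
mesh_label_sets = math.comb(18_000, 8)
```

The last figure uses the approximate MeSH numbers quoted earlier in the lecture (~18,000 labels, ~8 per article) and ignores the hierarchy, so it is only an order-of-magnitude illustration.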

Cat-a-Cone: Interactive Category Interface (Hearst & Karadi 97)
Key: Separate representation of documents from categories
  – Place categories in 3D animated Tree
  – Collect retrieved documents into a re-usable “Book”
  – Link categories from Book to Tree
  – Innovative query specification

Cat-a-Cone: Integrate Navigation and Search (Hearst & Karadi 97)
Interface that smoothly integrates
  – search over multiple categories
  – search over document contents
  – browsing of multiple categories
  – browsing of retrieved documents
Iterative, Interactive

[Diagram: Collection, Retrieved Documents, and Category Hierarchy, connected by "search", "browse", and "query terms"]

Cat-a-Cone (Hearst & Karadi 97)

ConeTree for Category Labels
Browse/explore category hierarchy
  – by search on label names
  – by growing/shrinking subtrees
  – by spinning subtrees
Affordances
  – learn meaning via ancestors, siblings
  – disambiguate meanings
  – all cats simultaneously viewable

Virtual Book for Result Sets
  – Categories on Page (Retrieved Document) linked to Categories in Tree
  – Flipping through Book Pages causes some Subtrees to Expand and Contract
  – Most Subtrees remain unchanged
  – Book can be Stored for later Re-Use
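
The Book-to-Tree link can be sketched as a set computation: the nodes to show expanded for a page are its categories plus all their ancestors, and flipping pages changes only the difference between two such sets. The child-to-parent dictionary, the category names, and the function name are hypothetical; the Cat-a-Cone implementation is not described at this level in the slides.

```python
from typing import Dict, Iterable, Optional, Set


def nodes_to_expand(parent: Dict[str, Optional[str]],
                    page_categories: Iterable[str]) -> Set[str]:
    """Categories on the page plus every ancestor up to the root."""
    expanded: Set[str] = set()
    for category in page_categories:
        node: Optional[str] = category
        # Stop early when an already-expanded ancestor is reached.
        while node is not None and node not in expanded:
            expanded.add(node)
            node = parent.get(node)
    return expanded


# Hypothetical fragment of a category hierarchy (child -> parent).
tree = {
    "root": None,
    "drugs": "root", "aspirin": "drugs",
    "anatomy": "root", "heart": "anatomy",
}
page_1 = nodes_to_expand(tree, ["aspirin"])
page_2 = nodes_to_expand(tree, ["aspirin", "heart"])
newly_opened = page_2 - page_1   # subtrees that expand on the page flip
```

Because consecutive pages usually share most ancestors, the difference is small, which matches the observation that most subtrees remain unchanged.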

Cat-a-Cone (Hearst & Karadi 97)
Catacomb: (definition 2b, online Websters) “A complex set of interrelated things”
Makes use of earlier PARC work on 3D+animation:
  – Rooms (Henderson and Card 86)
  – IV: Cone Tree (Robertson, Card, Mackinlay 93)
  – Web Book (Card, Robertson, York 96)

Summary: Cat-a-Cone
Interface that smoothly integrates
  – search over multiple categories
  – search over document contents
  – browsing of multiple categories
  – browsing of retrieved documents
Iterative, Interactive
Retain partial results in a workspace

Summary: Visualizing Text
Text is difficult to visualize
  – represents abstract concepts
  – many combinations of these abstract concepts
Main visualization approaches:
  – collection overviews based on 2D or 3D views of document clusters
  – graphical displays of relationships to query terms (for information access)
  – graphical displays of relationships to category subsets
Open Questions:
  – How to walk the border between useful and gratuitous graphics?
  – Is anything better than showing titles?