SIMS 296a-3: UI Background Marti Hearst Fall ‘98.

1 SIMS 296a-3: UI Background Marti Hearst Fall ‘98

2 Marti Hearst UCB SIMS, Fall 98. Interface Topics Today (other topics will be covered later): supporting the dynamic, continuing process of search; search starting points.

3 Human Information Seeking Behavior

4 Standard Model. Assumptions: maximizing precision and recall simultaneously; the information need remains static; the value is in the resulting document set.
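The first assumption refers to the two standard set-based measures. A minimal sketch, assuming relevance judgments are available as a set of document IDs (the function name `precision_recall` and the sample IDs are hypothetical):

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a single query.

    precision = |retrieved & relevant| / |retrieved|
    recall    = |retrieved & relevant| / |relevant|
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


relevant = {"d1", "d2", "d3", "d4"}
print(precision_recall(["d1", "d2"], relevant))                          # (1.0, 0.5)
print(precision_recall(["d1", "d2", "d3", "d5", "d6", "d7"], relevant))  # (0.5, 0.75)
```

Growing the retrieved set can only keep or raise recall, while precision may fall, which is why maximizing both at once is a strong assumption.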

5 (Figure: the standard retrieval model. Labeled parts: user’s information need, query text input, parse, collections, pre-process, index, rank or match, query reformulation.)

6 “Berry-Picking” as an Information Seeking Strategy (Bates 90). Standard IR model: the information need remains the same throughout the search session, and the goal is to produce a perfect set of relevant docs. Berry-picking model: the query is continually shifting; users may move through a variety of sources; new information may yield new ideas and new directions; the value of search is in the bits and pieces picked up along the way.

7 A sketch of a searcher… “moving through many actions towards a general goal of satisfactory completion of research related to an information need.” (after Bates 90) (Figure: a path through successive queries Q0 to Q5.)

8 Implications. Interfaces should make it easy to store intermediate results. Interfaces should make it easy to follow trails with unanticipated results. Difficulties with evaluation.

9 Supporting the Information Seeking Process. Two recent, similar approaches that focus on supporting the process: SketchTrieve (Hendry & Harper 97) and DLITE (Cousins 97).

10 Informal Interface. Informal does not mean less useful. Show how the search is unfolding or evolving, and whether it is expanding or contracting. Prompt the user to reformulate and abandon plans, backtrack to points of task deferral, make side-by-side comparisons, and define and discuss problems.

11 SketchTrieve: An Informal Interface (Hendry & Harper 96, 97). A “spreadsheet” for information access. Make use of layout, space, and locality for comprehension and explanation, and for search planning. A data-flow notation for information seeking: link sources to queries, link both to retrieved documents, and align results in space for comparison.

12 SketchTrieve: Connecting Results with Next Query

13 DLITE (Cousins 97). Drag-and-drop interface. Reify queries, sources, and retrieval results. Animation to keep track of activity.

14 Starting Points for Search. Faced with a prompt or an empty entry form … how to start? Lists of sources; overviews; clusters; category hierarchies/subject codes; co-citation links; examples; automatic source selection.

15 List of Sources. Have to guess based on the name. Requires prior exposure/experience.

17 Overviews in the User Interface. Unsupervised groupings: clustering, Kohonen feature maps. Supervised categories: Yahoo!, Superbook, HiBrowse, Cat-a-Cone. Combinations: DynaCat, SONIA.

18 Text Clustering. Finds overall similarities among groups of documents. Finds overall similarities among groups of tokens. Picks out some themes, ignores others.

19 Text Clustering. Clustering is “the art of finding groups in data” (Kaufman and Rousseeuw). (Figure: documents plotted in a two-dimensional space with axes Term 1 and Term 2.)

21 Document/Document Matrix
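A document/document matrix records the similarity of every pair of documents and is the usual input to the clustering methods that follow. The slide does not say which similarity measure is used; this sketch assumes bag-of-words term counts compared by cosine similarity, and all function names are hypothetical.

```python
import math
from collections import Counter


def term_vector(text):
    """Bag-of-words term counts (lowercased, whitespace-tokenized)."""
    return Counter(text.lower().split())


def cosine(u, v):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * v[term] for term, count in u.items())  # missing terms count as 0
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def doc_doc_matrix(docs):
    """Symmetric n x n matrix of pairwise cosine similarities."""
    vectors = [term_vector(d) for d in docs]
    return [[cosine(a, b) for b in vectors] for a in vectors]


docs = ["star galaxy telescope", "galaxy star cluster", "film star actor"]
for row in doc_doc_matrix(docs):
    print([round(x, 2) for x in row])
```

The two astronomy documents share two of three terms (similarity 2/3), while each shares only "star" with the film document (1/3).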

22–24 Agglomerative Clustering (figure built up across three slides over documents labeled A through I)
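Agglomerative clustering works bottom up: every document starts in its own cluster, and the two most similar clusters are merged repeatedly. A minimal sketch over a precomputed similarity matrix, assuming average-link merging (the slides do not name a linkage criterion); `agglomerative` and the sample matrix are hypothetical.

```python
def agglomerative(sim, k=1):
    """Average-link agglomerative clustering.

    sim: symmetric n x n similarity matrix (list of lists).
    Repeatedly merges the two most similar clusters until k remain.
    Returns (clusters, merges): the final clusters as sorted index lists,
    and the merge history, which encodes the dendrogram.
    """
    clusters = [[i] for i in range(len(sim))]
    merges = []

    def avg_sim(a, b):
        # Average-link: mean pairwise similarity between the two clusters' members.
        return sum(sim[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > k:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                s = avg_sim(clusters[x], clusters[y])
                if best is None or s > best[0]:
                    best = (s, x, y)
        s, x, y = best
        merges.append((clusters[x], clusters[y], s))
        merged = sorted(clusters[x] + clusters[y])
        del clusters[y]  # y > x, so index x is still valid afterwards
        clusters[x] = merged
    return clusters, merges


# Documents 0 and 1 are alike, as are 2 and 3.
sim = [
    [1.0, 0.9, 0.1, 0.2],
    [0.9, 1.0, 0.2, 0.1],
    [0.1, 0.2, 1.0, 0.8],
    [0.2, 0.1, 0.8, 1.0],
]
clusters, merges = agglomerative(sim, k=2)
print(clusters)  # [[0, 1], [2, 3]]
```

Running with k=1 continues until a single cluster remains; the merge list then records the full tree.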

25 K-Means Clustering. (1) Create a pair-wise similarity measure. (2) Find K centers using agglomerative clustering: take a small sample, then group bottom up until K groups are found. (3) Assign each document to the nearest center, forming new clusters. (4) Repeat step 3 as necessary.
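The four steps above can be sketched as follows. Assumptions not stated on the slide: documents are dense term-weight vectors compared by cosine similarity; "group bottom up" merges the two groups with the most similar centroids; step 3 also recomputes each center from its new members, as in standard k-means; and the default sample size of about sqrt(k·n) is an assumption, similar to the Buckshot seeding described in the Scatter/Gather papers. All names are hypothetical.

```python
import math
import random


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def centroid(vectors):
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def seed_centers(docs, k, sample_size, rng):
    """Step 2: bottom-up grouping of a small random sample until k groups remain."""
    groups = [[v] for v in rng.sample(docs, sample_size)]
    while len(groups) > k:
        cents = [centroid(g) for g in groups]
        # Merge the pair of groups whose centroids are most similar.
        _, x, y = max(
            (cosine(cents[i], cents[j]), i, j)
            for i in range(len(groups))
            for j in range(i + 1, len(groups))
        )
        groups[x] = groups[x] + groups[y]
        del groups[y]
    return [centroid(g) for g in groups]


def kmeans(docs, k, sample_size=None, iterations=10, seed=0):
    rng = random.Random(seed)
    if sample_size is None:
        sample_size = min(len(docs), max(k, int(math.sqrt(k * len(docs)))))
    centers = seed_centers(docs, k, sample_size, rng)
    assignment = None
    for _ in range(iterations):
        # Step 3: assign each document to its most similar center.
        new_assignment = [
            max(range(k), key=lambda c: cosine(d, centers[c])) for d in docs
        ]
        if new_assignment == assignment:
            break  # Step 4: stop repeating once assignments no longer change.
        assignment = new_assignment
        # Recompute each center from its members (an empty cluster keeps its old center).
        for c in range(k):
            members = [d for d, a in zip(docs, assignment) if a == c]
            if members:
                centers[c] = centroid(members)
    return assignment, centers


docs = [[1, 0], [0.9, 0.1], [1, 0.2], [0, 1], [0.1, 0.9], [0.2, 1]]
assignment, centers = kmeans(docs, 2, sample_size=len(docs))
print(assignment)
```

Cluster labels depend on the sampled order, so only the grouping (first three documents versus last three) is meaningful, not which group is numbered 0.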

26 The Cluster Hypothesis. “Closely associated documents tend to be relevant to the same requests.” (van Rijsbergen 1979) “… I would claim that document clustering can lead to more effective retrieval than linear search [which] ignores the relationships that exist between documents.” (van Rijsbergen 1979)

27 Clustering as Categorization. “In a traditional library environment … the items are classified first into subject areas, and a search is restricted to items within a few chosen subject classes. The same device can also be used … [to construct] groups of related documents and confining the search to certain groups only.” (Salton 71)

28 Clustering as Categorization. “… In experiments we often want to vary the cluster representatives at search time. … Of course, were we to design an operational classification, the cluster representatives would be constructed once and for all at cluster time.” (van Rijsbergen 79)

29 Scatter/Gather (Cutting, Pedersen, Tukey & Karger 92, 93; Hearst & Pedersen 95). Cluster sets of documents into general “themes”, like a table of contents. Display the contents of the clusters by showing topical terms and typical titles. The user chooses subsets of the clusters and re-clusters the documents within them. The resulting new groups have different “themes”.
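The display and selection steps can be sketched as follows. Assumptions not taken from the slides: documents are (title, text) pairs; "topical terms" means the most frequent terms in a cluster; "typical titles" are the documents closest by cosine to the cluster centroid. The names `summarize` and `gather` are hypothetical, and re-clustering the gathered pool would be handed to whatever routine performs the scatter step.

```python
import math
from collections import Counter


def summarize(cluster, n_terms=3, n_titles=2):
    """Summarize one cluster of (title, text) documents.

    Topical terms: the terms with the highest total count in the cluster.
    Typical titles: the documents closest (by cosine) to the cluster centroid.
    """
    vectors = [Counter(text.lower().split()) for _, text in cluster]
    centroid = Counter()
    for v in vectors:
        centroid.update(v)  # summed counts point in the same direction as the mean

    def cos(u, v):
        dot = sum(c * v[t] for t, c in u.items())
        nu = math.sqrt(sum(c * c for c in u.values()))
        nv = math.sqrt(sum(c * c for c in v.values()))
        return dot / (nu * nv) if nu and nv else 0.0

    terms = [t for t, _ in centroid.most_common(n_terms)]
    ranked = sorted(range(len(cluster)), key=lambda i: cos(vectors[i], centroid), reverse=True)
    titles = [cluster[i][0] for i in ranked[:n_titles]]
    return terms, titles


def gather(clusters, chosen):
    """Pool the documents of the clusters the user selected, ready to be re-scattered."""
    return [doc for i in chosen for doc in clusters[i]]


clusters = [
    [("Stars in film", "star film actor star"), ("Movie stars", "film star movie")],
    [("Galaxies", "star galaxy telescope"), ("Stellar phenomena", "star galaxy nova")],
]
print(summarize(clusters[0], n_terms=2, n_titles=1))  # (['star', 'film'], ['Stars in film'])
print(len(gather(clusters, [1])))  # 2
```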

30 (Figure: diagram with labeled parts query, collection, cluster, rank.)

31 S/G Example: query on “star” (encyclopedia text). Clusters: 14 sports; 8 symbols; 47 film, tv; 68 film, tv (p); 7 music; 97 astrophysics; 67 astronomy (p); 12 stellar phenomena; 10 flora/fauna; 49 galaxies, stars; 29 constellations; 7 miscellaneous. Clustering and re-clustering is entirely automated.

35 Two Queries: Two Clusterings. The main differences are the clusters that are central to the query. Query AUTO, CAR, ELECTRIC: 8 control drive accident …; 25 battery california technology …; 48 import j. rate honda toyota …; 16 export international unit japan; 3 service employee automatic … Query AUTO, CAR, SAFETY: 6 control inventory integrate …; 10 investigation washington …; 12 study fuel death bag air …; 61 sale domestic truck import …; 11 japan export defect unite …

36 Publication History of Scatter/Gather. 1991: patents filed. SIGIR 92: initial algorithm introduced. SIGIR 93: optimizations presented. AAAI FS 95: examples of use on retrieval results. TREC 95: use in interactive track experiments. CHI 96: experiments providing evidence that users learn collection structure. SIGIR 96: evidence that clustering can improve ranking for a TREC-like scenario. (Publication timing may lag significantly behind when the work was done.)

37 Another Use of Clustering. Use clustering to map the entire huge multidimensional document space into a huge number of small clusters, then “project” these onto a 2D graphical representation.

38 Clustering Multi-Dimensional Document Space (image from Wise et al 95)

39 Concept “Landscapes”. (Figure: map regions labeled Pharmacology, Anatomy, Legal, Disease, Hospitals.) Built using Kohonen feature maps (Xia Lin; H.C. Chen).
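The slide says only that these maps were built with Kohonen feature maps; the sketch below is a generic textbook self-organizing map, not Lin's or Chen's system. Assumptions: Euclidean distance, a Gaussian neighborhood, linearly decaying learning rate and radius, and hypothetical names and parameters throughout. Each grid cell holds a weight vector; every input pulls its best-matching cell and that cell's grid neighbors toward itself, so similar documents end up in nearby cells.

```python
import math
import random


def best_cell(weights, v):
    """Grid cell whose weight vector is closest (Euclidean) to v."""
    return min(weights, key=lambda cell: sum((a - b) ** 2 for a, b in zip(weights[cell], v)))


def train_som(vectors, rows, cols, epochs=50, lr0=0.5, seed=0):
    """Train a small Kohonen self-organizing map.

    vectors: list of equal-length document vectors.
    Returns a dict mapping grid cell (r, c) to its weight vector.
    """
    rng = random.Random(seed)
    dim = len(vectors[0])
    weights = {(r, c): [rng.random() for _ in range(dim)]
               for r in range(rows) for c in range(cols)}
    radius0 = max(rows, cols) / 2
    steps = epochs * len(vectors)
    t = 0
    for _ in range(epochs):
        for v in rng.sample(vectors, len(vectors)):  # shuffled pass over the inputs
            frac = t / steps
            lr = lr0 * (1 - frac)                     # learning rate decays linearly
            radius = max(radius0 * (1 - frac), 0.5)   # neighborhood shrinks over time
            bmu = best_cell(weights, v)
            for cell, w in weights.items():
                d2 = (cell[0] - bmu[0]) ** 2 + (cell[1] - bmu[1]) ** 2
                h = math.exp(-d2 / (2 * radius ** 2))  # Gaussian neighborhood on the grid
                for i in range(dim):
                    w[i] += lr * h * (v[i] - w[i])
            t += 1
    return weights


docs = [[1, 0, 0], [0.9, 0.1, 0], [0, 0, 1], [0, 0.1, 0.9]]
weights = train_som(docs, 3, 3)
print({i: best_cell(weights, v) for i, v in enumerate(docs)})
```

A labeled "landscape" could then be drawn by naming each cell after the highest-weighted term in its vector.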

40 Visualization of Clusters. Huge 2D maps may be an inappropriate focus for information retrieval: you can’t see what documents are about; documents are forced into one position in semantic space; the space is difficult to use for IR purposes; titles are hard to view. Perhaps more suited for pattern discovery. Problem: often only one view on the space.

41 Using Clustering in Document Ranking. Cluster the entire collection, then find the cluster centroid that best matches the query. This has been explored extensively: it is expensive, and it doesn’t work well.
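The cluster-based ranking approach described on this slide can be sketched in a few lines. Assumptions: the collection has already been clustered, documents are plain strings compared by cosine over word counts, and only the best-matching cluster is returned; `cluster_search` and the sample data are hypothetical.

```python
import math
from collections import Counter


def cosine(u, v):
    dot = sum(c * v[t] for t, c in u.items())
    nu = math.sqrt(sum(c * c for c in u.values()))
    nv = math.sqrt(sum(c * c for c in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def cluster_search(query, clusters):
    """Cluster-based retrieval.

    clusters: list of lists of document strings (the pre-clustered collection).
    Picks the cluster whose centroid best matches the query and returns
    that cluster's documents ranked by similarity to the query.
    """
    q = Counter(query.lower().split())
    best_cluster, best_score = None, -1.0
    for docs in clusters:
        centroid = Counter()
        for d in docs:
            centroid.update(d.lower().split())
        score = cosine(q, centroid)
        if score > best_score:
            best_cluster, best_score = docs, score
    return sorted(best_cluster, key=lambda d: cosine(q, Counter(d.lower().split())), reverse=True)


clusters = [["star galaxy", "galaxy nova"], ["film star", "movie actor"]]
print(cluster_search("galaxy nova", clusters))  # ['galaxy nova', 'star galaxy']
```

The up-front cost is clustering the entire collection, and any relevant document outside the chosen cluster is never returned.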

42 Using Clustering in Interfaces. Alternative (Scatter/Gather): cluster the top-ranked documents and show cluster summaries to the user. Seems useful: experiments show relevant docs tend to end up in the same cluster, and users seem able to interpret and use the cluster summaries some of the time. More computationally feasible.
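The alternative on this slide, ranking first and then clustering only the top of the ranking, can be sketched as follows. This is a hypothetical sketch: ranking is plain cosine over word counts, and a simple single-pass (leader) clustering stands in for whatever clustering the actual system used; `rank_then_cluster`, `top_n`, and `threshold` are illustrative names.

```python
import math
from collections import Counter


def vec(text):
    return Counter(text.lower().split())


def cosine(u, v):
    dot = sum(c * v[t] for t, c in u.items())
    nu = math.sqrt(sum(c * c for c in u.values()))
    nv = math.sqrt(sum(c * c for c in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def rank_then_cluster(query, docs, top_n=10, threshold=0.5):
    """Rank docs against the query, then cluster only the top_n matching results.

    Clustering is a single-pass (leader) method: each document joins the first
    cluster whose leader it resembles at >= threshold, else starts a new cluster.
    Returns a list of (summary terms, member documents) pairs.
    """
    q = vec(query)
    ranked = sorted(docs, key=lambda d: cosine(q, vec(d)), reverse=True)
    top = [d for d in ranked[:top_n] if cosine(q, vec(d)) > 0]
    clusters = []  # each entry: (leader vector, member list)
    for d in top:
        v = vec(d)
        for leader, members in clusters:
            if cosine(leader, v) >= threshold:
                members.append(d)
                break
        else:
            clusters.append((v, [d]))
    summaries = []
    for _, members in clusters:
        counts = Counter()
        for m in members:
            counts.update(vec(m))
        summaries.append(([t for t, _ in counts.most_common(3)], members))
    return summaries


docs = ["star film actor", "star movie film", "star galaxy nova",
        "star galaxy telescope", "banana bread"]
for terms, members in rank_then_cluster("star", docs, top_n=4):
    print(terms, members)
```

Only the `top_n` retrieved documents are clustered, so the clustering cost does not grow with the size of the collection.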

43 Summary: Clustering. Advantages: get an overview of main themes. Disadvantages: many of the ways documents could group together are not shown; not always easy to understand what they mean; different levels of granularity.

44 Clustering. Advantages: sometimes discover meaningful themes; data-driven, so reflect emphases present in the collection of documents; can differentiate heterogeneous collections; domain independent. Disadvantages: variability in quality of results; only one view on documents’ themes; not good at differentiating homogeneous collections; require interpretation; may mis-match users’ interests.

45 Incorporating Categories into the Interface. Yahoo is the standard method. Problems: hard to search, meant to be navigated; only one category per document (usually).

47 Integrated Browsing & Search. Search for category labels; browse category labels; search within the document collection; browse resulting documents in the book.

48 Example: MeSH and MedLine. MeSH category hierarchy: ~18,000 labels, manually assigned; ~8 labels/article on average; average depth 4.5, maximum depth 9. Top-level categories: anatomy, diagnosis, related disc, animals, psych, technology, disease, biology, humanities, drugs, physics.
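Statistics like labels per article and label depth can be computed directly from label assignments. The slide does not say how depth was measured; this sketch assumes depth is the number of dot-separated segments in a MeSH tree number. The function names and the tree-number strings in the example are illustrative only and make no claim about actual MeSH entries.

```python
def depth(tree_number):
    """Depth of a MeSH-style tree number, counted as dot-separated segments."""
    return len(tree_number.split("."))


def label_stats(articles):
    """articles: list of lists of tree numbers assigned to each article."""
    all_labels = [t for labels in articles for t in labels]
    return {
        "labels_per_article": len(all_labels) / len(articles),
        "avg_depth": sum(map(depth, all_labels)) / len(all_labels),
        "max_depth": max(map(depth, all_labels)),
    }


# Illustrative tree-number strings in MeSH format, not real assignments.
articles = [["C14.280.647", "D27.505"], ["A01.456.505.631"]]
print(label_stats(articles))
# {'labels_per_article': 1.5, 'avg_depth': 3.0, 'max_depth': 4}
```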

49 Large Category Sets. Problems for user interfaces: too many categories to browse; too many docs per category; docs belong to multiple categories; need to integrate search; need to show the documents. We’ll discuss this more next week.

50 Category Labels. Advantages: interpretable; capture summary information; describe multiple facets of content; domain dependent, and so descriptive. Disadvantages: do not scale well (for organizing documents); domain dependent, so costly to acquire; may mis-match users’ interests.

51 Other Starting-Point Approaches. Co-citation links; examples and guided tours.

52 Next Week. Interfaces for Subject Codes/Category Hierarchies. Leader: Alison Brandt.

