Interaction LBSC 734 Module 4 Doug Oard. Agenda Where interaction fits Query formulation Selection part 1: Snippets  Selection part 2: Result sets Examination.

Presentation transcript:

Interaction LBSC 734 Module 4 Doug Oard

Agenda
–Where interaction fits
–Query formulation
–Selection part 1: Snippets
–Selection part 2: Result sets
–Examination

The Cluster Hypothesis “Closely associated documents tend to be relevant to the same requests.” van Rijsbergen 1979

Single Link: Group the two clusters whose most similar members are closest
Complete Link: Group the two clusters whose least similar members are closest
Group Average: Group the two clusters with the most similar centroids
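The three linkage criteria can be compared on a toy example. This is a minimal sketch, assuming documents are represented as small term-weight vectors and compared by cosine similarity; the function names and the sample vectors are illustrative and not from the slides. Following the slide's wording, the group-average case is computed here by comparing cluster centroids.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def centroid(cluster):
    """Component-wise mean of the vectors in a cluster."""
    n = len(cluster)
    return [sum(col) / n for col in zip(*cluster)]

def single_link(c1, c2):
    """Cluster similarity = similarity of the MOST similar cross-cluster pair."""
    return max(cosine(u, v) for u in c1 for v in c2)

def complete_link(c1, c2):
    """Cluster similarity = similarity of the LEAST similar cross-cluster pair."""
    return min(cosine(u, v) for u in c1 for v in c2)

def centroid_similarity(c1, c2):
    """Cluster similarity = similarity of the two cluster centroids."""
    return cosine(centroid(c1), centroid(c2))

# Two hypothetical clusters of two-term document vectors.
cluster_a = [[1.0, 0.0], [1.0, 1.0]]
cluster_b = [[0.0, 1.0], [1.0, 1.0]]

print(single_link(cluster_a, cluster_b))
print(complete_link(cluster_a, cluster_b))
print(centroid_similarity(cluster_a, cluster_b))
```

The same pair of clusters looks nearly identical under single link, unrelated under complete link, and somewhere in between when centroids are compared, which is why the choice of criterion changes which clusters get merged.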

Clustered Results

Diversity Ranking
Query ambiguity
–UPS: United Parcel Service
–UPS: Uninterruptible power supply
–UPS: University of Puget Sound
Query aspects
–United Parcel Service: store locations
–United Parcel Service: delivery tracking
–United Parcel Service: stock price

Scatter/Gather
System clusters documents into “themes”
–Displays clusters by showing topical terms and typical titles
User chooses a subset of the clusters
System re-clusters the documents in the selected clusters
–New clusters have different, more refined, “themes”
Marti A. Hearst and Jan O. Pedersen. (1996) Reexamining the Cluster Hypothesis: Scatter/Gather on Retrieval Results. Proceedings of SIGIR 1996.

Scatter/Gather Example
Query = “star”
symbols: 8 docs
film, tv: 68 docs
–sports: 14 docs
–film, tv: 47 docs
–music: 7 docs
astrophysics: 97 docs
–stellar phenomena: 12 docs
–galaxies, stars: 49 docs
–constellations: 29 docs
–miscellaneous: 7 docs
astronomy: 67 docs
flora/fauna: 10 docs

Hierarchical Agglomerative Clustering
Start with each document in its own cluster
Until there is only one cluster:
–Determine the two most similar clusters c_i and c_j
–Replace c_i and c_j with a single cluster c_i ∪ c_j
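The loop above can be sketched in a few lines. This is an illustrative sketch only, assuming each "document" is a single number and that cluster similarity is single link (the negated distance between the closest members); the names `hac`, `single_link_similarity`, and `target_clusters` are hypothetical. The naive search over all cluster pairs at every step is what makes the method computationally costly on large result sets.

```python
def single_link_similarity(c1, c2):
    """Higher is more similar: negated distance between the closest members."""
    return -min(abs(x - y) for x in c1 for y in c2)

def hac(points, similarity, target_clusters=1):
    """Agglomerative clustering: repeatedly merge the two most similar clusters.

    Returns the final clusters and the list of merges in the order performed.
    """
    clusters = [[p] for p in points]  # start with each document in its own cluster
    merges = []
    while len(clusters) > target_clusters:
        # Determine the two most similar clusters c_i and c_j.
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                s = similarity(clusters[i], clusters[j])
                if best is None or s > best[0]:
                    best = (s, i, j)
        s, i, j = best
        # Replace c_i and c_j with their union.
        merged = clusters[i] + clusters[j]
        merges.append((clusters[i], clusters[j], s))
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters, merges

docs = [1.0, 2.0, 9.0, 10.0, 25.0]
two_clusters, history = hac(docs, single_link_similarity, target_clusters=2)
print(sorted(sorted(c) for c in two_clusters))
```

Stopping at `target_clusters=1` reproduces the slide's "until there is only one cluster" condition; stopping earlier is one simple way to pick a level of granularity from the merge history.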

Kartoo’s Cluster Visualization

Summary: Clustering
Advantages:
–Provides an overview of main themes in search results
–Makes it easier to skip over similar documents
Disadvantages:
–Not always easy to understand the theme of a cluster
–Documents can be clustered in many ways
–Correct level of granularity can be hard to guess
–Computationally costly

Open Directory Project

SWISH: Faceted Browsing
Query: jaguar
–List Display
–Category Display
Chen and Dumais, Bringing Order to the Web: Automatically Categorizing Search Results, CHI 2000

Text Classification
Obtain a training set with ground truth labels
Use “supervised learning” to train a classifier
–This is equivalent to learning a query
–Many techniques: kNN, SVM, decision tree, …
Apply classifier to new documents
–Assigns labels according to patterns learned in training

Example: k Nearest Neighbor (kNN)
Select k most similar labeled documents
Have them “vote” on the best label:
–Each document gets one vote, or
–More similar documents get a larger vote
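Both voting schemes can be sketched as follows. This is a minimal illustration, assuming documents are short term-weight vectors compared by cosine similarity; the function `knn_classify`, its `weighted` flag, and the toy training set are hypothetical and not from the slides.

```python
import math
from collections import defaultdict

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def knn_classify(query, labeled, k=3, weighted=False):
    """Label `query` by a vote among its k most similar labeled documents.

    labeled: list of (vector, label) pairs (the training set).
    weighted=False: each neighbor gets one vote.
    weighted=True: each neighbor's vote equals its similarity to the query.
    """
    neighbors = sorted(labeled, key=lambda item: cosine(query, item[0]), reverse=True)[:k]
    votes = defaultdict(float)
    for vector, label in neighbors:
        votes[label] += cosine(query, vector) if weighted else 1.0
    return max(votes, key=votes.get)

# Toy training set: one very close "cars" document, two distant "cats" documents.
training = [
    ([1.0, 0.0], "cars"),
    ([0.1, 0.9], "cats"),
    ([0.2, 0.8], "cats"),
]
new_doc = [1.0, 0.0]

print(knn_classify(new_doc, training, k=3, weighted=False))
print(knn_classify(new_doc, training, k=3, weighted=True))
```

With equal votes the two distant "cats" neighbors outvote the single identical "cars" document; with similarity-weighted votes the close neighbor dominates, which is the motivation for the second scheme.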

Visualization: ThemeView Pacific Northwest National Laboratory

WebTheme

An Interface Taxonomy
List (one-dimensional)
–Navigation: Pagination, continuous scrolling, …
–Content: Title, source, date, summary, ratings, …
–Order: “Relevance,” date, alphabetic, …
Screen (two-dimensional)
–Construction: Clustering, classification, scatterplot, …
–Navigation: Jump, pan, zoom
Virtual reality (three-dimensional)
–Navigation: “Fishtank” VR, immersive VR

Selection Recap
Summarization
–Query-biased snippets work well
Clustering
–Basis for “diversity ranking”
Classification
–Basis for “faceted browsing”
Visualization
–Useful for exploratory search

Agenda
–Where interaction fits
–Query formulation
–Selection part 1: Snippets
–Selection part 2: Result sets
–Examination