INFO624 -- Week 9 Effective Information Retrieval Dr. Xia Lin Assistant Professor College of Information Science and Technology Drexel University.


INFO624 - Week 9: Effective Information Retrieval. Dr. Xia Lin, Assistant Professor, College of Information Science and Technology, Drexel University

Effective Information Retrieval: the system's perspective
- Fast indexing and retrieval algorithms
  - Inverted indexes, tree structures, hash tables
- Semantic indexing and mapping
  - Subject indexing
  - Latent semantic indexing
- Intelligent information retrieval
  - Knowledge representation
  - Logical inference
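
To make the inverted-index idea concrete, here is a minimal sketch (not from the slides; the toy collection, whitespace tokenizer, and Boolean AND matching are simplifying assumptions):

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

def search(index, query):
    """Return ids of documents containing every query term (Boolean AND)."""
    terms = query.lower().split()
    if not terms:
        return set()
    result = index.get(terms[0], set()).copy()
    for term in terms[1:]:
        result &= index.get(term, set())
    return result

# Hypothetical toy collection
docs = {1: "latent semantic indexing", 2: "inverted index structures", 3: "semantic mapping"}
index = build_inverted_index(docs)
print(search(index, "semantic indexing"))   # -> {1}
```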

Effective Information Retrieval: the user's perspective
- Iteration
- Relevance feedback
- Use of user profiles
- Graphical display of search results
- Browsing/interactive searching
We can't change the user; we should make the system adapt to the user's needs.

Iteration
Most searching needs to be done iteratively. From the user's point of view:
- The first query often does not retrieve what the user wants
- The user needs to see the output of previous queries to construct the next query
- The user often needs to reconstruct his or her information needs after reading/browsing search results

Iteration – the user's strategies
Modify queries repeatedly based on some goal:
- Start with high precision ("pearl growing")
  - Use a specific query first
  - Broaden the query to include more relevant documents
- Start with high recall ("onion peeling")
  - Use a very broad query
  - Improve precision gradually
- Start with known items
  - Find documents similar to the known items
- Browsing/interactive searching

Iteration – the system's strategies
If the system can "learn" from the user's activities, it can likely retrieve better results that meet the user's needs:
- Relevance feedback
- User profiles
The system should also provide better output representations to help the user:
- Browse
- Conduct interactive searches

Relevance Feedback
Feedback: the user provides information that the system can use to modify its next search or next display.
Relevance feedback: users let the system know
- which documents are relevant to their information needs
- which concepts or terms are related to their information needs
- what weights they would like the system to put on each relevant document/term

Relevance Feedback – the system's strategy
- The system should invite the user to select relevant documents/terms from the retrieved results before the second retrieval is conducted
- The system should use information from the user's feedback to conduct the next search

Designing IR systems with relevance feedback
Collect relevance feedback through:
- Binary vs. scaled judgments
- Positive and negative feedback
Apply relevance feedback to:
- The query
- The profile
- The document
- The retrieval algorithm
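
One standard way to apply feedback to the query is Rocchio-style query modification. The sketch below is illustrative only: the term-weight dictionaries and the alpha/beta/gamma parameters are hypothetical, and the slides do not prescribe this particular formula.

```python
def rocchio(query_vec, relevant_vecs, nonrelevant_vecs, alpha=1.0, beta=0.75, gamma=0.15):
    """Move the query vector toward relevant documents and away from non-relevant ones."""
    terms = set(query_vec)
    for v in relevant_vecs + nonrelevant_vecs:
        terms |= set(v)
    new_query = {}
    for t in terms:
        rel = sum(v.get(t, 0.0) for v in relevant_vecs) / max(len(relevant_vecs), 1)
        nonrel = sum(v.get(t, 0.0) for v in nonrelevant_vecs) / max(len(nonrelevant_vecs), 1)
        weight = alpha * query_vec.get(t, 0.0) + beta * rel - gamma * nonrel
        if weight > 0:                       # keep only positively weighted terms
            new_query[t] = weight
    return new_query

# Hypothetical term-weight vectors for a query and judged documents
q = {"visualization": 1.0, "retrieval": 1.0}
rel = [{"visualization": 0.8, "interface": 0.6}]
nonrel = [{"retrieval": 0.3, "database": 0.9}]
print(rocchio(q, rel, nonrel))
```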

User Profiles
A user profile is information about the user's information needs that the IR system can use to modify its search process.
Simple user profiles:
- A list of terms that the user selects to represent his or her information needs
- A list of terms with weights

Extended user profiles:
- More complex term structures
- Information use patterns
  - Levels of interest
- The user's background information
- The user's browsing behaviors
  - What pages the user has visited last week, last month, …
  - From which page to which page …

Use of User Profiles: Selective Dissemination of Information (SDI)
- The system regularly runs the search to get any new information that matches the user's profiles
- The user can set up several profiles
- Once they are set up, the queries are always the same
- The user can set the frequency of the update searches

SDI
Advantages of SDI:
- Automatic retrieval of new information for the user
- Set up a profile once, use it for retrieval many times
- The user can change the profiles or the search frequency as needed
Disadvantages of SDI:
- The query based on the profile is static
- Timing problems
  - "Information in need is information indeed": something I am very interested in may not arrive at the time I want to read it
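
A minimal sketch of the SDI idea, for illustration: stored profiles run as standing queries against each batch of newly arrived documents. The profile format and the simple term-overlap matching threshold are assumptions, not part of the slides.

```python
def match_profile(profile_terms, document_text, threshold=2):
    """Alert if at least `threshold` profile terms appear in the new document."""
    doc_terms = set(document_text.lower().split())
    hits = sum(1 for t in profile_terms if t.lower() in doc_terms)
    return hits >= threshold

def run_sdi(profiles, new_documents):
    """Run every stored profile against the batch of new documents (one update cycle)."""
    alerts = []
    for user, terms in profiles.items():
        for doc_id, text in new_documents.items():
            if match_profile(terms, text):
                alerts.append((user, doc_id))
    return alerts

# Hypothetical profiles and a batch of newly indexed documents
profiles = {"alice": ["information", "visualization"], "bob": ["latent", "semantic"]}
new_docs = {101: "information visualization for retrieval interfaces",
            102: "latent semantic indexing revisited"}
print(run_sdi(profiles, new_docs))   # -> [('alice', 101), ('bob', 102)]
```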

Using profiles during the search
Modify the query:
- When the user sends a query, the system automatically adds some terms to the query from the user's profile
- When the user sends a query, the system checks whether the query terms are in the user's profile; if so, it increases the weights of those terms
Organize the search results:
- When the user sends a query, the system uses the profile information to organize the search results (e.g., clustering, ranking)
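
As an illustration of query modification with a profile (a sketch under assumed data structures, not a method prescribed by the slides), a weighted-term profile can both boost query terms it contains and contribute a few expansion terms:

```python
def apply_profile(query_terms, profile, expand_top_n=2, boost=1.5):
    """Return a weighted query: boost terms found in the profile, then add its top terms."""
    weighted = {}
    for term in query_terms:
        weighted[term] = boost if term in profile else 1.0
    # Add the profile's highest-weighted terms that are not already in the query
    extras = sorted((t for t in profile if t not in weighted),
                    key=lambda t: profile[t], reverse=True)[:expand_top_n]
    for term in extras:
        weighted[term] = profile[term]
    return weighted

# Hypothetical weighted-term user profile
profile = {"visualization": 0.9, "interface": 0.7, "retrieval": 0.4}
print(apply_profile(["retrieval", "evaluation"], profile))
# -> {'retrieval': 1.5, 'evaluation': 1.0, 'visualization': 0.9, 'interface': 0.7}
```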

Browsing
Browsing is an act of human information seeking:
- a mental process of identifying and choosing information
- a dynamic process that varies in time and depends on intermediate results
- a part of the process of decision making, problem solving, etc.

Browsing for Information Retrieval
- A kind of searching process in which the initial search criteria or goals are only partly defined
  - general-purpose web browsing
- An art of not knowing what one wants until one finds it
  - visual recognition
  - content recognition

Browsing for Information Retrieval
- A learning activity that emphasizes structures and interactive processes
  - exploratory
  - movements based on feedback
- A process of finding and navigating in an unknown or unfamiliar information space
  - becoming aware of new contents
  - finding unexpected results

Search or Browse?
Would you rather search using a search engine, or browse from page to page (or through a hierarchy)? That depends. On what?

Factors of browsing
Purposes:
- Fact retrieval
- Concept formation or interpretation
- Current awareness
Tasks:
- Well-defined tasks
- Ill-defined tasks
- Number of items to browse

Factors of browsing
Individual characteristics:
- Motivation
- Experience and knowledge
- Cognitive styles
Context:
- Subject disciplines
- Organizational schemes
- Nature of text/information
Medium:
- Does the system support browsing?

IR systems that support browsing
Good navigation tools:
- Easy to move from one item to another
  - links
  - good structures
  - fast access
- Easy to backtrack
  - correct any errors
  - make new selections

IR systems that support browsing
Good displays:
- easy to read
- meaningful ordering of retrieval results
- graphical presentation
Meaningful content organization:
- contextual hierarchical structures
- grouping of related items
- contextual landmarks

"Why just browse when you can fly?"
HotSauce is an innovative 3D fly-through interface for navigating information spaces. It was developed, largely as a one-man effort, by Ramanathan V. Guha while at Apple Research in the mid-1990s. HotSauce was a specific 3D spatialization of the Meta Content Framework (MCF), also developed by Guha.

HotSauce

Why surf alone?
What if you had an assistant always looking ahead for you [when browsing the web]?
- The assistant could warn you if the page was irrelevant, or alert you if that link or some other link merited your attention.
- The assistant could save you time and frustration.
(CACM, 44(8), p. 71, 2001)

Information Agents
Software that applies user profiles, dynamically and intelligently, to search tasks:
- Searches distributed, possibly heterogeneous information resources on the user's behalf
- Gathers and integrates search results using Artificial Intelligence techniques
- Accepts the user's feedback and uses it to modify the user profiles and search strategies

Architecting Browsable Websites
Design site structures:
- Metaphor exploration
  - Organizational metaphors
  - Functional metaphors
  - Visual metaphors
- Define navigation
  - Global navigation
  - Local navigation
- Design documents

Interactive Systems
"When an interactive system is well-designed, the interface almost disappears, enabling users to concentrate on their work, exploration, or pleasure." – Ben Shneiderman

Design Principles
Offer informative feedback:
- Relationships between the query and the documents retrieved
- Relationships among retrieved documents
- Relationships between metadata and documents
Reduce working memory load:
- Keep track of choices made during the search process
- Allow the user to return to temporarily abandoned strategies or jump from one strategy to another
- Retain information and context across search sessions

Provide alternative interfaces for novice and expert users:
- Simplicity vs. power

Output Presentation for Search Engines
Two major issues:
- What information to present?
- How to organize the output items?
Information in the output display (traditional databases):
- Document reference numbers (unique numbers)
- Citations (author, title, source)
- Document surrogates (citation plus abstract and/or indexing terms)
- Full text

On the web:
- Title, URL
- First few sentences / related sentences / summaries
- Dates / page sizes
- Degree of relevance
- Special links ("find similar one")
- Types of links
- Related categories
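
To make the idea of a web result surrogate concrete, a small illustrative sketch follows; the field names, the placeholder URL, and the naive snippet rule are assumptions rather than anything specified in the slides.

```python
from dataclasses import dataclass

@dataclass
class ResultSurrogate:
    title: str
    url: str
    snippet: str      # first few sentences or query-related sentences
    date: str
    relevance: float  # degree of relevance as estimated by the engine

def make_snippet(text, query_terms, max_len=120):
    """Naive snippet: the first sentence mentioning any query term, truncated."""
    for sentence in text.split("."):
        if any(t.lower() in sentence.lower() for t in query_terms):
            return sentence.strip()[:max_len]
    return text[:max_len]

doc_text = "This page surveys visualization interfaces. It also reviews retrieval models."
hit = ResultSurrogate(
    title="Visualization interfaces survey",
    url="http://example.org/survey",          # hypothetical placeholder URL
    snippet=make_snippet(doc_text, ["visualization"]),
    date="2001-08-17",
    relevance=0.92,
)
print(hit.snippet)
```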

What other information might you wish to have in the retrieval output?
- Citations (or links from this document)?
- Critiques or evaluations?
- Access information (how many times it was accessed in the last 6 months)?
- Links to this document?
- Author contact information?
- Why the documents were retrieved?

Output organization: linear
A list of documents, listed by:
- best match
- alphabetical order
- date
- order of selected fields (authors, titles, web sites)

Linear display
- Practical and most popular
  - easy to generate
  - users know how to use it
- Does not show relationships among documents!
  - Document relationships are more complex than a linear ordering
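
For illustration only (the result records and field names are hypothetical), a linear display simply sorts one result list by different keys:

```python
results = [  # hypothetical retrieved set with engine-assigned scores
    {"title": "Visualizing search results", "score": 0.92, "date": "2001-05-10"},
    {"title": "Browsing large hierarchies", "score": 0.81, "date": "1999-11-02"},
    {"title": "Relevance feedback revisited", "score": 0.87, "date": "2000-03-21"},
]

by_best_match = sorted(results, key=lambda r: r["score"], reverse=True)   # best match first
by_date = sorted(results, key=lambda r: r["date"], reverse=True)          # newest first
by_title = sorted(results, key=lambda r: r["title"].lower())              # alphabetical

for r in by_best_match:
    print(f'{r["score"]:.2f}  {r["title"]}')
```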

Hierarchical display
- Separates data into different levels or branches
- Branches can be expanded/collapsed
- Shows more data in less space
- Shows the organization of the data

Graphical displays
- Show more complex relationships
- Use location, color, dimension, etc. to represent documents, terms, or concepts
- Provide more interactive functions

What is IV?
The use of computer-supported, interactive, visual representations of abstract data:
- to assist navigation in large information spaces
- to reveal complex information structures
- to amplify cognition
[Diagram: system-centered view vs. user-centered view]

IV and IR
- Both need to process a large amount of information
- Both are tools to assist the cognitive process of finding, learning, and understanding information
- Both face the challenge of "uncertainty"
  - Neither is an "exact science"
- Both are subject to human interpretation

VIRI -- Visual Information Retrieval Interfaces
2-dimensional graphical displays:
- use graphical objects (icons, dots, etc.) to represent documents
- use geographical relationships to indicate document relationships
- use colors to group/differentiate documents
- use animation to assist interaction
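
As a hedged illustration of how documents might be laid out in a 2D space, the sketch below uses TF-IDF vectors plus multidimensional scaling from scikit-learn; this particular pipeline and the mini-collection are my assumptions, not the method the slides describe.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.manifold import MDS
from sklearn.metrics.pairwise import cosine_distances

docs = [  # hypothetical mini-collection
    "visual interfaces for information retrieval",
    "relevance feedback in retrieval systems",
    "3d fly-through navigation of information spaces",
    "user profiles and personalized search",
]

tfidf = TfidfVectorizer().fit_transform(docs)
dist = cosine_distances(tfidf)                      # pairwise document distances
coords = MDS(n_components=2, dissimilarity="precomputed",
             random_state=0).fit_transform(dist)    # 2D layout preserving distances

for text, (x, y) in zip(docs, coords):
    print(f"({x:+.2f}, {y:+.2f})  {text[:40]}")
```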

Concept Visualization
- AltaVista LiveTopic
- HiBrowse interface
- SemioMap
- Hyperbolic trees
- Visual Thesaurus
- Visual Concept Explorer

Alta Vista’s LiveTopic

ConceptSpace

HiBrowse Interface

SemioMap

Inxight.com

Topic Maps
- Highwire:

Visual Thesaurus

Visual Concept Explorer

Concept Mapping

MedLine Search

IBM – Visualization Space
This information system understands the user. It "hears" users' voice commands and "sees" their gestures and body positions. Interactions are natural, more like human-to-human interactions.

Visual Search Engines
- TheBrain
- Mooter
- Kartoo
- MapStan
- Grokker
- TouchGraph
- Starrynight
- NewsLink

WebBrain

Mooter:

Kartoo:

MapStan:

Grokker

Touchgraph

Starrynight, from RHIZOME

Galaxy of News (Rennison, 1995)

Map of Information Scientists

Author Mapping

AuthorLink

NewsLink
Integration and cross-mapping:
- mapping on topics, displaying by people
- mapping on people, displaying by organization
- etc.
NewsLink:

Discussion: Information Visualization
- What works and what does not?

VIRI
Advantages:
- More representational power
  - shows more information in a limited screen space
  - many different ways to group documents
  - can put both keywords and documents in the same 2-dimensional space
- Provides a good overview
- Provides more interaction

VIRI
Disadvantages:
- Difficult to generate
- Not always easy to understand
- May not be specific enough
- Hard to use

Evaluation of IR Systems
Using recall & precision:
- Conduct query searches
  - Try many different queries
  - Results may depend on the sample of queries
- Compare results on precision & recall
  - Recall & precision need to be considered together
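
For reference, a minimal sketch of computing precision and recall for a single query, following the standard set-based definitions (the retrieved and relevant document ids are hypothetical):

```python
def precision_recall(retrieved, relevant):
    """Precision = hits / retrieved; Recall = hits / relevant, where hits are their intersection."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical judgment data for one query
retrieved = [3, 7, 12, 19, 24]
relevant = [3, 12, 31, 42]
p, r = precision_recall(retrieved, relevant)
print(f"precision={p:.2f}, recall={r:.2f}")   # precision=0.40, recall=0.50
```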

How to calculate recall
Determine recall for the whole collection:
- Take a random sample to estimate it
- Use a broad query to select a sample collection for the estimation
- Use "seed" documents
Use relative recall:
- Use two or more expert searches as the base
- Use one system as the base to estimate recall on other systems
Use a small test collection:
- Use experts to judge the relevance of every document
- Prepare special collections
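
A short sketch of the relative-recall estimate, for illustration: the "base" here is simply the pooled union of relevant items found by two hypothetical expert searches, matching one of the strategies listed above.

```python
def relative_recall(system_retrieved_relevant, base_pools):
    """Recall relative to the pool of relevant items found by the base searches."""
    pool = set().union(*base_pools)   # union of relevant items found by expert/base searches
    if not pool:
        return 0.0
    return len(set(system_retrieved_relevant) & pool) / len(pool)

# Hypothetical relevant-document ids found by two expert searches and by the system under test
expert_a = {1, 4, 9, 15}
expert_b = {4, 9, 22}
system = {4, 9, 15, 30}
print(relative_recall(system, [expert_a, expert_b]))   # 3 of 5 pooled items -> 0.6
```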

Functionalities
Precision and recall are particularly useful for evaluating searching/indexing algorithms and system features:
- Compare P & R with and without fuzzy search
- Compare P & R with different types of indexing options
- Compare P & R across systems with the same features
Precision and recall are query-oriented, not system-oriented.

Evaluation without P & R
The emphasis should be on the user and the interaction:
- Be specific about data collection
  - How are data collected and indexed?
  - Is there quality control for the data collection?
- Be creative with the test questions and methods
  - Not just questionnaires
- Be selective about subject groups

Quality Evaluation
Data quality:
- Coverage of the database
  - It will not be found if it is not in the database
- Completeness and accuracy of data
- Indexing methods and indexing quality
  - It will not be found if it is not indexed
  - indexing types
  - currency of indexing (is it updated often?)
  - indexing sizes

Interface Considerations
A user-friendly interface:
- How long does it take for a user to learn advanced features?
- How well can the user explore or interact with the query output?
- How easy is it to customize output displays?

User Satisfaction
- The final test is the user!
- User satisfaction is more important than precision and recall
Measuring user satisfaction:
- Surveys
- Usage statistics
- User experiments

User Experiments
Observe and collect data on:
- System behaviors
- User search behaviors
- User-system interaction
Interpret experimental results:
- for system comparisons
- for understanding users' information-seeking behaviors
- for developing new retrieval systems/interfaces