1
INFO624 -- Week 9 Effective Information Retrieval Dr. Xia Lin Assistant Professor College of Information Science and Technology Drexel University
2
Effective Information Retrieval: the system's perspective. Fast indexing and retrieval algorithms (inverted indexing, tree structures, hash tables); semantic indexing and mapping (subject indexing, latent semantic indexing); intelligent information retrieval (knowledge representation, logical inferences).
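To make the indexing idea concrete, here is a minimal sketch of an inverted index in Python. The toy collection and the simple whitespace tokenizer are illustrative assumptions; production systems store weighted postings with positions.

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each term to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

def search(index, query):
    """Conjunctive (AND) query: intersect the postings of each query term."""
    postings = [index.get(t, set()) for t in query.lower().split()]
    return set.intersection(*postings) if postings else set()

docs = {
    1: "fast indexing and retrieval",
    2: "semantic indexing and mapping",
    3: "intelligent information retrieval",
}
index = build_inverted_index(docs)
print(search(index, "indexing retrieval"))   # {1}
```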
3
Effective Information Retrieval: the user's perspective. Iteration; relevance feedback; use of user profiles; graphical display of search results; browsing/interactive searching. We can't change the user; we should make the system adapt to the user's needs.
4
Iteration Most searching needs to be done iteratively. From the user's point of view: the first query often does not retrieve what the user wants; the user needs to see the output of previous queries to construct the next query; the user often needs to reformulate his/her information needs after reading/browsing search results.
5
Iteration – User's strategies Modify queries repeatedly based on some goal. Starting with high precision: use a specific query first, then broaden it to include more relevant documents ("pearl growing"). Starting with high recall: use a very broad query, then improve precision gradually ("onion peeling"). Starting with known items: find documents similar to the known items. Browsing/interactive searching.
6
Iteration – System's strategies If the system can "learn" from the user's activities, it can likely retrieve better results that meet the user's needs: relevance feedback, user profiles. The system should also provide better output representations to help the user browse and conduct interactive searches.
7
Relevance Feedback Feedback: the user provides information that the system can use to modify its next search or next display. Relevance feedback: users let the system know which documents are relevant to their information needs, which concepts or terms are related to their information needs, and what weights they would like the system to put on each relevant document/term.
8
Relevance Feedback – System's strategy The system should invite the user to select relevant documents/terms from the retrieved results before the second retrieval is conducted. The system should use information from the user's feedback to conduct the next search.
9
Designing IR systems with relevance feedback Collect relevance feedback through: binary vs. scaled judgments; positive and negative feedback. Apply relevance feedback to: the query, the profile, the documents, the retrieval algorithm (see the sketch below).
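One common way to apply positive and negative feedback to the query is Rocchio-style query modification. The sketch below is not taken from the slides: it assumes queries and documents are term-weight dictionaries, and the constants alpha, beta, and gamma are illustrative defaults.

```python
from collections import defaultdict

def rocchio(query, relevant, nonrelevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Move the query vector toward relevant documents and away from
    non-relevant ones (Rocchio-style relevance feedback)."""
    new_query = defaultdict(float)
    for term, w in query.items():
        new_query[term] += alpha * w
    for doc in relevant:
        for term, w in doc.items():
            new_query[term] += beta * w / len(relevant)
    for doc in nonrelevant:
        for term, w in doc.items():
            new_query[term] -= gamma * w / len(nonrelevant)
    # Terms whose weights become negative are usually dropped.
    return {t: w for t, w in new_query.items() if w > 0}

q = {"retrieval": 1.0}
rel = [{"retrieval": 0.8, "feedback": 0.6}]
nonrel = [{"retrieval": 0.2, "hardware": 0.9}]
print(rocchio(q, rel, nonrel))
```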
10
User Profiles A user profile is information about the user's information needs that an IR system can use to modify its search process. Simple user profiles: a list of terms that the user selects to represent his/her information needs; a list of terms with weights.
11
Extended user profiles More complex term structures; information use patterns; levels of interest; the user's background information; the user's browsing behaviors (which pages the user has visited last week, last month, …; from which page to which page …).
12
Use of user profiles Selective Dissemination of Information (SDI): the system regularly runs the search to get any new information that matches the user's profiles. The user can set up several profiles; once they are set up, the queries are always the same. The user can set the frequency of the update searches.
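A minimal sketch of the SDI loop described above, under the assumption that a profile is a set of terms plus an update frequency and that matching is simple term overlap (both are illustrative simplifications):

```python
import datetime

profiles = {
    "alice": {"terms": {"information", "visualization"}, "frequency_days": 7},
}

def sdi_run(profiles, new_docs, last_run):
    """Match documents added since the last run against each stored profile."""
    alerts = {}
    for user, profile in profiles.items():
        hits = [d for d in new_docs
                if d["added"] > last_run
                and profile["terms"] & set(d["text"].lower().split())]
        if hits:
            alerts[user] = hits
    return alerts

new_docs = [{"added": datetime.date(2024, 5, 2),
             "text": "new trends in information visualization"}]
print(sdi_run(profiles, new_docs, last_run=datetime.date(2024, 5, 1)))
```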
13
SDI Advantages of SDI: automatic retrieval of new information for the user; set up a profile once and use it for retrieval many times; the user can change the profiles or the search frequency as needed. Disadvantages of SDI: the query based on the profile is static; timing problems ("information in need is information indeed": something I am very interested in, but it did not arrive at the time I wanted to read it).
14
Using profiles during the search Modify the query: when the user sends a query, the system automatically adds some terms to the query from the user's profile; the system also checks whether any query terms appear in the user's profile and, if so, increases their weights (see the sketch below). Organize the search results: the system uses the profile information to organize the search results (e.g., clustering, ranking).
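A sketch of the query-modification idea, assuming the profile is a plain term-to-weight dictionary; the boost factor and the number of expansion terms are illustrative parameters, not values from the lecture.

```python
def modify_query(query, profile, boost=1.5, expand_top=3):
    """Boost query terms found in the user's profile and add the
    highest-weighted profile terms not already in the query."""
    modified = {}
    for term, w in query.items():
        modified[term] = w * boost if term in profile else w
    extra = sorted((t for t in profile if t not in query),
                   key=lambda t: profile[t], reverse=True)[:expand_top]
    for term in extra:
        modified[term] = profile[term]
    return modified

profile = {"visualization": 0.9, "interfaces": 0.7, "drexel": 0.2}
print(modify_query({"information": 1.0, "visualization": 1.0}, profile))
```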
15
Browsing Browsing is an act of human information seeking: a mental process of identifying and choosing information; a dynamic process that varies in time and depends on intermediate results; a part of the process of decision making, problem solving, etc.
16
Browsing for Information Retrieval A kind of searching process in which the initial search criteria or goals are only partly defined (general-purpose web browsing). An art of not knowing what one wants until one finds it (visual recognition, content recognition).
17
Browsing for Information Retrieval A learning activity that emphasizes structure and interactive process (exploratory movements based on feedback). A process of finding and navigating in an unknown or unfamiliar information space (becoming aware of new content, finding unexpected results).
18
Search or Browse? Would you rather search using a search engine or browse from page to page (or through a hierarchy)? What does it depend on?
19
Factors of browsing Purposes: fact retrieval; concept formation or interpretation; current awareness. Tasks: well-defined tasks; ill-defined tasks; number of items to browse.
20
Factors of browsing Individual characteristics: motivation; experience and knowledge; cognitive styles. Context: subject disciplines; organizational schemes; nature of the text/information. Medium: does the system support browsing?
21
IR systems that support browsing Good navigation tools: easy to move from one item to another (links, good structures, fast access); easy to backtrack (correct any errors, make new selections).
22
IR systems that support browsing Good displays: easy to read; meaningful ordering of retrieval results; graphical presentation. Meaningful content organization: contextual hierarchical structures; grouping of related items; contextual landmarks.
23
"Why just browse when you can fly?" HotSauce is an innovative 3D fly-through interface for navigating information spaces. It was developed, largely as a one-man effort, by Ramanathan V. Guha while at Apple Research in the mid-1990s. HotSauce was a specific 3D spatialization of the Meta Content Framework (MCF), also developed by Guha.
24
HotSauce
25
Why surf alone? What if you had an assistant always looking ahead for you [when browsing the web]? The assistant could warn you if a page was irrelevant, or alert you if that link or some other link merited your attention. The assistant could save you time and frustration. (CACM, 44(8), p. 71, 2001)
26
Information Agents Software that applies user profiles, dynamically and intelligently, to search tasks. Information agents search distributed, possibly heterogeneous information resources on the user's behalf; gather and integrate search results using artificial intelligence techniques; and accept the user's feedback, using it to modify the user profiles and search strategies.
27
Architecting browsable websites Design site structures: metaphor exploration (organizational metaphors, functional metaphors, visual metaphors); define navigation (global navigation, local navigation); design document.
28
Interactive Systems "When an interactive system is well-designed, the interface almost disappears, enabling users to concentrate on their work, exploration, or pleasure." – Ben Shneiderman
29
Design principles Offer informative feedback: relationships between the query and the documents retrieved; relationships among retrieved documents; relationships between metadata and documents. Reduce working memory load: keep track of choices made during the search process; allow the user to return to temporarily abandoned strategies or jump from one strategy to another; retain information and context across search sessions.
30
Provide alternative interfaces for novice and expert users: simplicity vs. power.
31
Output presentation for search engines Two major issues: what information to present, and how to organize the output items. Information in the output display – traditional databases: document reference numbers (unique numbers); citations (author, title, source); document surrogates (citation plus abstract and/or indexing terms); full text.
32
On the web: title, URL; first few sentences/related sentences/summaries; dates/page sizes; degree of relevance; special links ("find similar ones"); types of links; related categories.
33
What other information might you wish to have in the retrieval output? Citations (or links from this document)? Critiques or evaluations? Access information (how many times it was accessed in the last 6 months)? Links to this document? Author contact information? Why the documents were retrieved?
34
Output organization Linear: a list of documents ordered by best match, alphabetical order, date, or order of selected fields (authors, titles, web sites).
35
Linear display Practical and most popular: easy to generate; users know how to use it. But it does not show relationships among documents! Document relationships are more complex than a linear order.
36
Hierarchical display Separates data into different levels or branches; branches can be expanded/collapsed; shows more data in less space; shows the organization of the data.
37
Graphical displays Show more complex relationships; use location, colors, dimensions, etc. to represent documents, terms, or concepts; provide more interactive functions.
38
What is IV? The use of computer-supported, interactive, visual representations of abstract data: to assist navigation in large information spaces; to reveal complex information structures; to amplify cognition. (System-centered view vs. user-centered view.)
39
IV and IR Both need to process a large amount of information. Both are tools to assist the cognitive process of finding, learning, and understanding information. Both face the challenge of "uncertainty": neither is an "exact science," and both are subject to human interpretation.
40
VIRI -- Visual Information Retrieval Interfaces 2-dimensional graphical display: use graphical objects (icons, dots, etc.) to represent documents; use spatial relationships to indicate document relationships; use colors to group/differentiate documents; use animation to assist interaction.
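As an illustration of such a 2-dimensional display, the sketch below projects TF-IDF document vectors down to two dimensions so that nearby points correspond to similar documents. The use of scikit-learn and matplotlib is an assumption for the example; the slides do not prescribe any toolkit.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
import matplotlib.pyplot as plt

docs = [
    "relevance feedback improves retrieval",
    "user profiles personalize search",
    "visual interfaces support browsing",
    "graphical displays of search results",
]

# Vectorize the documents, then project the term space to 2 dimensions.
vectors = TfidfVectorizer().fit_transform(docs)
points = TruncatedSVD(n_components=2).fit_transform(vectors)

plt.scatter(points[:, 0], points[:, 1])
for (x, y), label in zip(points, ["d1", "d2", "d3", "d4"]):
    plt.annotate(label, (x, y))   # nearby points = similar documents
plt.show()
```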
41
Concept visualization AltaVista LiveTopic; HiBrowse interface; SemioMap; hyperbolic trees; Visual Thesaurus; Visual Concept Explorer.
42
Alta Vista’s LiveTopic
43
ConceptSpace
44
HiBrowse Interface
45
SemioMap
46
Inxight.com
47
Topic Maps Highwire: http://www.highwire.org
48
Visual Thesaurus
49
Visual Concept Explorer
50
Concept Mapping
52
MedLine Search
53
IBM – Visualization Space This information system understands the user. It "hears" users' voice commands and "sees" their gestures and body positions. Interactions are natural, more like human-to-human interactions.
54
Visual search engines TheBrain; Mooter; Kartoo; MapStan; Grokker; TouchGraph; Starrynight; NewsLink.
55
WebBrain: http://www.webbrain.com/
56
Mooter: http://www.mooter.com/
57
Kartoo: http://www.kartoo.com/
58
MapStan: http://search.mapstan.com/
59
Grokker http://www.groxis.com/service/grok/g_products.html
60
TouchGraph: http://www.touchgraph.com/
62
Starrynight from RHIZOME
63
Galaxy of News Rennison 95
64
Galaxy of News Rennison 95
65
Map of Information Scientists
66
Author Mapping
67
AuthorLink
68
NewsLink Integration and cross mapping: mapping on topics, displaying by people; mapping on people, displaying by organization; etc. NewsLink: http://project.cis.drexel.edu/lexislink/
69
Discussion Information visualization: what works and what does not?
70
VIRI Advantages: more representational power (show more information in a limited screen space; many different ways to group documents; can put both keywords and documents in the same 2-dimensional space); provide a good overview; provide more interaction.
71
VIRI Disadvantages: difficult to generate; not always easy to understand; may not be specific enough; hard to use.
72
Evaluation of IR systems Using recall & precision: conduct query searches (try many different queries; results may depend on the sample of queries); compare the precision & recall results (recall & precision need to be considered together).
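For reference, precision and recall for a single query can be computed from the retrieved set and the judged relevant set; the document IDs below are invented for illustration.

```python
def precision_recall(retrieved, relevant):
    """Precision = |retrieved ∩ relevant| / |retrieved|;
    recall    = |retrieved ∩ relevant| / |relevant|."""
    hits = len(set(retrieved) & set(relevant))
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

print(precision_recall(retrieved=["d1", "d2", "d3"],
                       relevant=["d2", "d3", "d5", "d7"]))  # (0.667, 0.5)
```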
73
How to calculate recall Determine recall for the whole collection: take a random sample to estimate; use a broad query to select a sample collection for the estimation; use "seed" documents. Use relative recall: use two or more expert searches as the base; use one system as the base to estimate recall for other systems. Use a small test collection: use experts to judge the relevance of every document; prepare special collections.
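A small example of the relative-recall strategy above: the union of results from several expert searches (or other systems) is treated as a stand-in for the full relevant set. The pooled sets here are invented for illustration.

```python
def relative_recall(system_relevant, pooled_relevant):
    """Recall relative to a pooled 'relevant set' built from
    several expert searches or other systems."""
    pool = set().union(*pooled_relevant) if pooled_relevant else set()
    return len(set(system_relevant) & pool) / len(pool) if pool else 0.0

expert_a = {"d1", "d2", "d4"}
expert_b = {"d2", "d3"}
print(relative_recall({"d1", "d2"}, [expert_a, expert_b]))   # 2 / 4 = 0.5
```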
74
Functionalities Precision and recall are particularly useful for evaluating searching/indexing algorithms and system features: compare P & R with and without fuzzy search; compare P & R with different types of indexing options; compare P & R across systems with the same features. Precision and recall are query-oriented, not system-oriented.
75
Evaluation without P & R The emphasis should be on the user and the interaction. Be specific about data collection: how are data collected and indexed? Is there quality control for the data collection? Be creative with the test questions and methods (not just questionnaires). Be selective with subject groups.
76
Quality evaluation Data quality: coverage of the database (it will not be found if it is not in the database); completeness and accuracy of the data. Indexing methods and indexing quality: it will not be found if it is not indexed; indexing types; currency of indexing (is it updated often?); indexing sizes.
77
Interface considerations User-friendly interface: how long does it take for a user to learn advanced features? How well can the user explore or interact with the query output? How easy is it to customize output displays?
78
User satisfaction The final test is the user! User satisfaction is more important than precision and recall. Measuring user satisfaction: surveys; usage statistics; user experiments.
79
User experiments Observe and collect data on: system behaviors; user search behaviors; user–system interaction. Interpret experiment results: for system comparisons; for understanding users' information seeking behaviors; for developing new retrieval systems/interfaces.