Query Expansion
By: Sean McGettrick



What is Query Expansion?
Query expansion is the process by which a search engine adds search terms to a user's weighted search.
The goal is to improve precision and/or recall.
Example: user query “car”; expanded query “car cars automobile automobiles auto”, etc.
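A minimal sketch of the “car” example, assuming a small hand-built thesaurus stored as a Python dict. The `THESAURUS` contents and the `expand_query` name are hypothetical, for illustration only. Each query term is kept and followed by its related terms, with duplicates removed.

```python
# Hypothetical hand-built thesaurus: each entry maps a term to related terms
# (plural forms and synonyms), as in the "car" example on the slide.
THESAURUS = {
    "car": ["cars", "automobile", "automobiles", "auto"],
}


def expand_query(query, thesaurus):
    """Return the query terms, each followed by its thesaurus entries, without duplicates."""
    expanded = []
    for term in query.lower().split():
        for candidate in [term] + thesaurus.get(term, []):
            if candidate not in expanded:
                expanded.append(candidate)
    return expanded


print(" ".join(expand_query("car", THESAURUS)))
# car cars automobile automobiles auto
```

Terms missing from the thesaurus pass through unchanged, so `expand_query("red car", THESAURUS)` simply starts with "red".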

Classes of Query Expansion
Human- and/or computer-generated thesauri
Relevance feedback
Automatic query expansion

Query Expansion Issues
Two major issues: which terms to include, and which terms to weight more heavily.
Concept-based vs. term-based query expansion: is it better to expand based on the individual terms in the query, or on the overall concept of the query?
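To make the term-based vs. concept-based question concrete, here is a sketch over a hypothetical term-similarity table (`SIM`, `VOCAB` and both function names are made up for illustration). Term-based expansion picks the best neighbour of each query term separately. The concept-based variant, loosely in the spirit of Qiu (1993) but greatly simplified, scores each candidate against the query as a whole.

```python
# Hypothetical term-term similarity scores, e.g. taken from a co-occurrence thesaurus.
SIM = {
    ("java", "coffee"): 0.8,
    ("java", "programming"): 0.7,
    ("java", "island"): 0.6,
    ("language", "programming"): 0.9,
    ("language", "grammar"): 0.7,
}
VOCAB = ["coffee", "programming", "island", "grammar"]


def sim(a, b):
    """Symmetric lookup; unknown pairs have similarity 0."""
    return SIM.get((a, b), SIM.get((b, a), 0.0))


def term_based(query, k=1):
    """Expand each query term independently with its k most similar candidates."""
    added = []
    for q in query:
        ranked = sorted(VOCAB, key=lambda t: sim(q, t), reverse=True)
        added += [t for t in ranked[:k] if t not in added]
    return added


def concept_based(query, k=1):
    """Score each candidate by its total similarity to the whole query; keep the top k."""
    ranked = sorted(VOCAB, key=lambda t: sum(sim(q, t) for q in query), reverse=True)
    return ranked[:k]


query = ["java", "language"]
print(term_based(query))     # ['coffee', 'programming']
print(concept_based(query))  # ['programming']
```

For the query "java language", term-based expansion adds "coffee" (the best neighbour of "java" on its own) as well as "programming", while the concept-based version adds only "programming", which fits both terms.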

Relevance of Query Expansion
Query expansion is very important on the web, where the amount of information is always increasing.
In 1999, Google indexed 135 million pages; at the time of this presentation it indexed over 3 billion.
Search engine users follow specific trends in their searches:
Queries of 2 to 3 words.
Broad search terms.
Reluctance to expand their queries, either by refining search terms or by using Boolean operators.

Thesauri
What is a thesaurus in the IR world?
“Any data structure that defines semantic relatedness between words.” (Schutze and Pedersen, 1997)
Often more complex than a normal thesaurus.
Thought to be too broad to be useful.

The Need For Thesauri
It was naturally assumed that pulling words from a thesaurus would increase:
The number of documents retrieved.
Possibly precision.
The car example: “car” vs. “car, auto, automobile, vehicle, sedan, etc.”
Which would retrieve the larger number of documents?
Is larger necessarily better?
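The “is larger necessarily better?” question can be checked on a toy collection. In this sketch the documents, the relevance judgments in `RELEVANT`, and simple Boolean OR matching are all assumptions made for illustration. Expanding “car” with the slide's related terms raises recall but also lets non-relevant documents in.

```python
# Toy collection (hypothetical); documents 1-3 are assumed relevant to the need "cars".
DOCS = {
    1: "the car would not start",
    2: "buying a used automobile",
    3: "auto insurance for a sedan",
    4: "a vehicle for the moon mission",
    5: "the sedan chair carried by porters",
}
RELEVANT = {1, 2, 3}


def retrieve(terms):
    """Boolean OR retrieval: a document matches if it contains any query term."""
    return {d for d, text in DOCS.items() if set(text.split()) & set(terms)}


def precision_recall(retrieved):
    hits = len(retrieved & RELEVANT)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(RELEVANT)
    return precision, recall


for query in (["car"], ["car", "auto", "automobile", "vehicle", "sedan"]):
    found = retrieve(query)
    p, r = precision_recall(found)
    print(query, sorted(found), f"P={p:.2f} R={r:.2f}")
# ['car'] [1] P=1.00 R=0.33
# ['car', 'auto', 'automobile', 'vehicle', 'sedan'] [1, 2, 3, 4, 5] P=0.60 R=1.00
```

The expanded query retrieves every relevant document, but “vehicle” and “sedan” also match a moon vehicle and a sedan chair, so precision drops.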

Human & Automatically Generated Thesauri
The earliest work began in the 1950s (H.P. Luhn).
Thesaurofacet: a detailed list of engineering terms.
Thesauri have largely been used in industries such as medicine, aerospace, and other technological fields.

Drawbacks of Handcrafted Thesauri
Cost: both development and maintenance; the cost often outweighs the benefit.
Time: thesauri often take a long time to develop, and it is hard for them to keep up with the pace of scientific and technological development.

Automatically Generated Thesauri
The need grew from the limitations of handcrafted thesauri.
They remove the cost of paying experts to generate thesauri.

Automatically Generated Thesauri
Three steps:
1. Extract word co-occurrences.
2. Define word similarities, based on word co-occurrence or lexical relationships.
3. Cluster words based on their similarities.
This approach has not proven very successful: as late as 1990, many industries were still using handcrafted thesauri.
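The three steps can be sketched on a toy corpus. The choices here are assumptions, not from the slides: co-occurrence means appearing in the same document, similarity is the Dice coefficient, and clustering is single-link with a fixed threshold. The corpus and the 0.6 threshold are hypothetical.

```python
from itertools import combinations

DOCS = [
    "car engine repair",
    "automobile engine repair",
    "car automobile dealer",
    "apple pie recipe",
    "apple fruit recipe",
]


def build_thesaurus(docs, threshold=0.6):
    # Step 1: extract co-occurrences by recording which documents each word appears in.
    postings = {}
    for i, doc in enumerate(docs):
        for word in set(doc.split()):
            postings.setdefault(word, set()).add(i)

    # Step 2: word similarity as the Dice coefficient over shared documents.
    def dice(a, b):
        return 2 * len(postings[a] & postings[b]) / (len(postings[a]) + len(postings[b]))

    # Step 3: single-link clustering with union-find; join pairs above the threshold.
    parent = {w: w for w in postings}

    def find(w):
        while parent[w] != w:
            w = parent[w]
        return w

    for a, b in combinations(sorted(postings), 2):
        if dice(a, b) >= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for w in postings:
        clusters.setdefault(find(w), set()).add(w)
    return [c for c in clusters.values() if len(c) > 1]


print(sorted(sorted(c) for c in build_thesaurus(DOCS)))
# [['apple', 'fruit', 'pie', 'recipe'], ['automobile', 'car', 'dealer'], ['engine', 'repair']]
```

“car” and “automobile” end up together only because both co-occur with “dealer”, and “pie” and “fruit” are merged through shared neighbours even though they never co-occur, one reason such thesauri have had limited success.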

Relevance Feedback
Began in the 1960s.
Showed significant improvement in recall and precision over early query expansion work.
The basic process is as follows:
1. The user creates an initial query, which returns an initial result set.
2. The user selects the documents that are relevant to their search.
3. The system re-weights and/or expands the query based on the terms in those documents.
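The three-step loop can be sketched in the vector space model. The slide does not name a re-weighting formula, so this sketch assumes Rocchio's formula with commonly used weights (alpha=1, beta=0.75, gamma=0.15). Raw term counts as vectors and the toy documents are also assumptions.

```python
from collections import Counter


def vectorize(text):
    """Raw term-frequency vector (an assumption; real systems usually use tf-idf)."""
    return Counter(text.lower().split())


def cosine(a, b):
    dot = sum(w * b.get(t, 0) for t, w in a.items())
    norm_a = sum(w * w for w in a.values()) ** 0.5
    norm_b = sum(w * w for w in b.values()) ** 0.5
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rocchio(query, relevant, nonrelevant, alpha=1.0, beta=0.75, gamma=0.15):
    """q' = alpha*q + beta*centroid(relevant) - gamma*centroid(nonrelevant)."""
    new = Counter()
    for t, w in query.items():
        new[t] += alpha * w
    for docs, coef in ((relevant, beta), (nonrelevant, -gamma)):
        for d in docs:
            for t, w in d.items():
                new[t] += coef * w / len(docs)
    # Negative weights are usually dropped.
    return {t: w for t, w in new.items() if w > 0}


docs = [vectorize(d) for d in ("jaguar car speed", "jaguar cat habitat", "car speed engine")]
query = vectorize("jaguar")                        # step 1: initial query
print([round(cosine(query, d), 2) for d in docs])  # initial result scores
# Step 2 (simulated): the user marks document 0 relevant and document 1 non-relevant.
new_query = rocchio(query, [docs[0]], [docs[1]])   # step 3: re-weight and expand
print([round(cosine(new_query, d), 2) for d in docs])
```

Document 2, which never mentions “jaguar”, now gets a non-zero score through the added terms “car” and “speed”, while “cat” and “habitat” receive negative weights and are dropped.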

Relevance Feedback Models
There are many different types of models, depending on the methods and theories behind them:
Vector space.
Probabilistic.
Boolean.
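For the probabilistic model, one well-known way to turn relevance judgments into term weights is the Robertson/Sparck Jones relevance weight. The slide does not specify a formula, so this is a swapped-in illustration, and the counts in the example are hypothetical.

```python
import math


def rsj_weight(N, n, R, r):
    """Robertson/Sparck Jones relevance weight with 0.5 smoothing.

    N: documents in the collection, n: documents containing the term,
    R: documents judged relevant, r: relevant documents containing the term.
    """
    p_odds = (r + 0.5) / (R - r + 0.5)              # odds the term occurs in a relevant doc
    q_odds = (n - r + 0.5) / (N - n - R + r + 0.5)  # odds it occurs in a non-relevant doc
    return math.log(p_odds / q_odds)


# Hypothetical counts: 1,000 documents, 10 of them judged relevant.
print(round(rsj_weight(1000, 20, 10, 8), 2))    # rare term found in most relevant docs
print(round(rsj_weight(1000, 900, 10, 10), 2))  # very common term
```

A term that appears in most relevant documents but few others gets a much higher weight than one that appears almost everywhere. With no judgments (R = r = 0) the weight reduces to an IDF-like value, log((N - n + 0.5) / (n + 0.5)).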

“Ide dec-hi” Method
In this method, all of the top-ranked relevant documents are used, along with the single highest-ranked non-relevant document.
The non-relevant document is used as a point in the vector space away from which the feedback query is moved.
Up to 160% improvement over non-expanded queries.
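Written as a formula, Ide dec-hi adds every relevant document vector to the query and subtracts only the top-ranked non-relevant one: Q' = Q + sum(R_i) - S. The sketch below uses sparse term-to-weight dicts. Dropping non-positive weights and the toy vectors are assumptions.

```python
from collections import Counter


def ide_dec_hi(query, relevant, nonrelevant_ranked):
    """Q' = Q + sum of relevant vectors - highest-ranked non-relevant vector.

    Vectors are term->weight dicts; nonrelevant_ranked is in rank order.
    Terms whose weight falls to zero or below are dropped (an assumption).
    """
    new = Counter(query)
    for doc in relevant:
        new.update(doc)                      # add every relevant document
    if nonrelevant_ranked:
        new.subtract(nonrelevant_ranked[0])  # subtract only the top non-relevant one
    return {t: w for t, w in new.items() if w > 0}


query = {"jaguar": 1}
relevant = [{"jaguar": 1, "car": 2}, {"car": 1, "engine": 1}]
nonrelevant = [{"jaguar": 1, "cat": 2}, {"zoo": 1}]
print(ide_dec_hi(query, relevant, nonrelevant))
# {'jaguar': 1, 'car': 3, 'engine': 1}
```

The second non-relevant document (“zoo”) has no effect, since only the highest-ranked one is subtracted.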

Interactive Query Expansion
Uses a thesaurus.
After the initial query is submitted, the system returns a list of associated and relevant words derived from both the result set and a thesaurus.
Useful, but more research is needed.
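A sketch of the suggestion step, assuming candidates are the most frequent non-query words in the top results plus the thesaurus entries for the query terms. The stopword list, thesaurus and documents are hypothetical, and the user's choice is simulated with a fixed list.

```python
from collections import Counter

STOPWORDS = {"the", "a", "of", "for", "and", "in"}  # hypothetical stopword list
THESAURUS = {"car": ["automobile", "vehicle"]}      # hypothetical thesaurus


def suggest_terms(query_terms, top_docs, thesaurus, k=3):
    """Suggest words from the result set and from the thesaurus for the user to pick."""
    counts = Counter(
        w
        for doc in top_docs
        for w in doc.lower().split()
        if w not in STOPWORDS and w not in query_terms
    )
    from_results = [w for w, _ in counts.most_common(k)]
    from_thesaurus = [s for t in query_terms for s in thesaurus.get(t, []) if s not in query_terms]
    # Merge both sources, keeping order and removing duplicates.
    return list(dict.fromkeys(from_results + from_thesaurus))


top_docs = ["the car engine and the car battery", "car engine repair", "used car dealer"]
suggestions = suggest_terms(["car"], top_docs, THESAURUS)
print(suggestions)
chosen = ["engine", "automobile"]  # simulated user selection
expanded = ["car"] + [t for t in suggestions if t in chosen]
print(expanded)  # ['car', 'engine', 'automobile']
```

The system only proposes terms; the user decides which ones actually enter the query.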

Pseudo-relevance Feedback
Grew from the problems involved in implementing relevance feedback systems: users do not like to give manual feedback to the system.

Pseudo-relevance Feedback Process
The system returns an initial set of documents.
The system assumes that the top n documents are relevant to the query.
The system takes terms from these documents to re-weight the query.
Relies largely on the system's ability to retrieve relevant documents initially.
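The process can be sketched with a deliberately simple scorer that counts query-term occurrences. The scorer, the stopword list, n=2, k=2 and the documents are all assumptions for illustration.

```python
from collections import Counter

STOPWORDS = {"the", "a", "of", "for", "and", "in", "to"}  # hypothetical stopword list


def score(query, doc):
    """Toy relevance score: number of query-term occurrences in the document."""
    words = doc.lower().split()
    return sum(words.count(t) for t in query)


def pseudo_relevance_feedback(query, docs, n=2, k=2):
    ranked = sorted(docs, key=lambda d: score(query, d), reverse=True)  # initial retrieval
    top = ranked[:n]  # assumed relevant, with no input from the user
    counts = Counter(
        w for d in top for w in d.lower().split() if w not in STOPWORDS and w not in query
    )
    return list(query) + [w for w, _ in counts.most_common(k)]  # add the k most frequent terms


docs = [
    "jaguar car dealer offers new jaguar models",
    "jaguar car engine and car dealer",
    "the jaguar is a big cat",
    "car dealer prices",
]
print(pseudo_relevance_feedback(["jaguar"], docs))
# ['jaguar', 'car', 'dealer']
```

Because the top n documents are trusted blindly, an off-topic document that ranks highly (for example one about the animal) would contribute its terms in exactly the same way.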


Automatic Query Expansion
The process of expanding queries automatically using computer-generated thesauri.
Works somewhat like pseudo-relevance feedback.
Less useful in practice, but still widely researched.

Term Co-occurrence Measures
The process of developing relationships between words based on their co-occurrence in documents.
Clustering:
Documents that share a significant number of terms are grouped together.
A thesaurus is then generated from the terms in these categories.
Categories are sometimes too narrow or too broad.
Does not account for synonyms.
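A sketch of the clustering approach: documents are grouped by single-link clustering on the Jaccard overlap of their word sets, and each cluster's thesaurus class keeps the terms found in at least two of its documents. The similarity measure, the 0.3 threshold, the `min_docs` filter and the corpus are all assumptions.

```python
from itertools import combinations


def jaccard(a, b):
    return len(a & b) / len(a | b)


def cluster_documents(docs, threshold=0.3):
    """Single-link clustering: documents sharing enough terms end up in one cluster."""
    sets = [set(d.split()) for d in docs]
    cluster_of = list(range(len(docs)))
    for i, j in combinations(range(len(docs)), 2):
        if jaccard(sets[i], sets[j]) >= threshold:
            old, new = cluster_of[j], cluster_of[i]
            cluster_of = [new if c == old else c for c in cluster_of]  # merge the two clusters
    groups = {}
    for idx, c in enumerate(cluster_of):
        groups.setdefault(c, []).append(sets[idx])
    return list(groups.values())


def thesaurus_classes(clusters, min_docs=2):
    """One thesaurus class per cluster: terms occurring in at least min_docs of its documents."""
    classes = []
    for members in clusters:
        terms = {t for s in members for t in s if sum(t in m for m in members) >= min_docs}
        if len(terms) > 1:
            classes.append(terms)
    return classes


DOCS = [
    "car engine repair shop",
    "car engine oil change",
    "engine oil filter car",
    "apple pie recipe",
    "apple pie baking recipe",
]
print(sorted(sorted(c) for c in thesaurus_classes(cluster_documents(DOCS))))
# [['apple', 'pie', 'recipe'], ['car', 'engine', 'oil']]
```

“car” would never be grouped with “automobile” unless the two happened to share documents, which is the synonym problem listed above.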

Lexical Co-occurrence Measures
Instead of the frequency of terms in a document, these measures look at the proximity of words within a document, so the context of words becomes important.
Some performance improvement has been shown on small document collections.
Not quite as good as relevance feedback, but better than pseudo-relevance feedback.
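A sketch of a proximity-based count: word pairs are counted only when they occur within a fixed window of each other, rather than anywhere in the same document. The window size and the example sentence are assumptions.

```python
from collections import Counter


def window_cooccurrence(text, window=2):
    """Count unordered word pairs that occur within `window` positions of each other."""
    words = text.lower().split()
    pairs = Counter()
    for i, w in enumerate(words):
        for v in words[i + 1 : i + 1 + window]:
            if v != w:
                pairs[tuple(sorted((w, v)))] += 1
    return pairs


pairs = window_cooccurrence("new car dealer sells new car parts", window=1)
print(pairs.most_common(1))
# [(('car', 'new'), 2)]
```

At the document level every pair in this sentence co-occurs (for example “dealer” and “parts”); with a window of 1 only adjacent words count, so the measure reflects local context.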

Current State of Query Expansion
Query expansion technology has reached something of a plateau, due to the limiting factors of relevance feedback and word co-occurrence.
Current research is attempting to refine previous research in the field.

Where To Go From Here?
Grammatically based thesauri:
Syntactic relationships between words.
Words placed into classes.
Some improvement on small document collections, but failed on larger ones.
AI searching: mostly theory.
Intelligent agents:
Could be customized to reflect the specific needs of the user.
The next logical step in IR, but still far from commercial use.
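A simplified sketch of the grammatical approach, loosely after Grefenstette's use of syntactic context: each word is described by the (relation, word) contexts a parser finds for it, and words are compared by the overlap of those contexts. No parser is included; the triples are hypothetical hand-written parser output, and plain Jaccard overlap is an assumption.

```python
# Hypothetical (word, relation, context word) triples, as a parser might produce.
TRIPLES = [
    ("car", "obj-of", "drive"), ("car", "mod", "fast"), ("car", "obj-of", "park"),
    ("automobile", "obj-of", "drive"), ("automobile", "mod", "fast"),
    ("banana", "obj-of", "eat"), ("banana", "mod", "ripe"),
]


def contexts(triples):
    """Map each word to the set of syntactic contexts it appears in."""
    ctx = {}
    for word, rel, other in triples:
        ctx.setdefault(word, set()).add((rel, other))
    return ctx


def most_similar(word, triples):
    """Return the word whose syntactic contexts overlap most with those of `word`."""
    ctx = contexts(triples)

    def jaccard(a, b):
        return len(a & b) / len(a | b)

    scores = {w: jaccard(ctx[word], c) for w, c in ctx.items() if w != word}
    return max(scores, key=scores.get)


print(most_similar("car", TRIPLES))  # automobile
```

“car” and “automobile” never appear together here, but they share the contexts “drive” and “fast”, so unlike plain co-occurrence this approach can relate synonyms.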

Works Cited
Attardi, G., S. Di Marco and F. Sebastiani. Automated Generation of Category-Specific Thesauri for Interactive Query Expansion.
Grefenstette, G. Use of Syntactic Context to Produce Term Association Lists for Text Retrieval. In Proceedings of the 15th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Copenhagen, Denmark, ed. N. Belkin, P. Ingwersen and A. M. Pejtersen. New York: ACM Press.
Ide, E. New Experiments in Relevance Feedback. In G. Salton (ed.), The SMART Retrieval System: Experiments in Automatic Document Processing. Englewood Cliffs, NJ: Prentice-Hall.
Qiu, Y. Concept Based Query Expansion. In Proceedings of SIGIR-93, 16th ACM International Conference on Research and Development in Information Retrieval.
Schutze, H. and J. Pedersen. A Cooccurrence-based Thesaurus and Two Applications to Information Retrieval. Information Processing and Management 33, no. 3.
Walker, D. Query Expansion Using Thesauri.