MetaSearch
R. Manmatha, Center for Intelligent Information Retrieval, Computer Science Department, University of Massachusetts, Amherst.



Introduction
MetaSearch / Distributed Retrieval
– Well-defined problem. Language Models are a good way to solve these problems.
– Grand Challenge: Massively Distributed Multi-lingual Retrieval

MetaSearch
Combine results from different search engines.
– Single Database
– Or Highly Overlapped Databases.
  » Example: the Web.
– Multiple Databases or Multi-lingual databases.
Challenges
– Incompatible scores even if the same search engine is used for different databases.
  » Collection differences and engine differences.
– Document scores depend on the query. Combination on a per-query basis makes training difficult.
Current solutions involve learning how to map scores between different systems.
– An alternative approach involves aggregating ranks.

Current Solutions for MetaSearch – Single Database Case
Solutions
– Reasonable solutions involve mapping scores by simple normalization, by equalizing score distributions, or by training.
– Rank-based methods, e.g. Borda counts, Markov chains.
– Mapped scores are usually combined using linear weighting.
– Performance improvement is about 5 to 10%.
– Search engines need to be similar in performance.
  » May explain why simple normalization schemes work.
Other Approaches
– A Markov chain approach has been tried. However, results on standard datasets are not available for comparison.
– It shouldn’t be difficult to try more standard LM approaches.
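The "simple normalization" and "linear weighting" steps on this slide can be sketched as follows. This is a minimal illustration, not the method evaluated in the talk: it assumes min-max normalization as the score mapping, and the engine names, scores, and equal weights are made up for the example.

```python
from collections import defaultdict


def min_max_normalize(scores):
    """Map one engine's raw scores for a single query onto [0, 1]."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All scores equal: no ordering information, so treat them alike.
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def combine_linear(runs, weights):
    """Fuse normalized scores with a weighted sum (one weight per engine).

    A document missing from an engine's list contributes 0 for that engine.
    """
    fused = defaultdict(float)
    for engine, scores in runs.items():
        for doc, s in min_max_normalize(scores).items():
            fused[doc] += weights[engine] * s
    # Highest fused score first; ties broken by document id.
    return sorted(fused.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical result lists for one query; the raw score scales differ.
runs = {
    "engine_a": {"d1": 12.0, "d2": 9.0, "d3": 3.0},
    "engine_b": {"d2": 0.9, "d3": 0.5, "d4": 0.1},
}
weights = {"engine_a": 0.5, "engine_b": 0.5}
ranking = combine_linear(runs, weights)
print([doc for doc, _ in ranking])
```

Normalizing per query, as here, needs no training data, which is one reason such schemes are attractive when scores depend on the query.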

Challenges – MetaSearch for Single Databases
Can one effectively combine search engines which differ a lot in performance?
– Improve performance even using poorly performing engines? How?
– Or use a resource-selection-like approach to eliminate poorly performing engines on a per-query basis.
Techniques from other fields.
– Techniques in economics and social sciences for voter aggregation may be useful (Borda count, Condorcet, etc.).
LM approaches
– Will possibly improve performance by characterizing the scores at a finer granularity than, say, score distributions.
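As an illustration of the voter-aggregation idea, the sketch below applies a Borda count to ranked lists, treating each search engine as a voter. The handling of documents an engine did not return (sharing the leftover points equally) is one common convention and an assumption here; the rankings are invented.

```python
from collections import defaultdict


def borda_fuse(rankings):
    """Fuse ranked lists with a Borda count.

    With n candidate documents overall, the document at 0-based position p
    in a list earns n - p points from that list. Documents a list omits
    share the points left over for the unfilled positions equally.
    """
    candidates = set().union(*rankings)
    n = len(candidates)
    points = defaultdict(float)
    for ranking in rankings:
        for pos, doc in enumerate(ranking):
            points[doc] += n - pos
        unranked = candidates - set(ranking)
        if unranked:
            leftover = sum(n - pos for pos in range(len(ranking), n))
            for doc in unranked:
                points[doc] += leftover / len(unranked)
    # Most points first; ties broken by document id.
    return sorted(candidates, key=lambda d: (-points[d], d))


# Hypothetical top-3 lists from three engines for the same query.
rankings = [
    ["d1", "d2", "d3"],
    ["d2", "d3", "d4"],
    ["d2", "d1", "d4"],
]
fused = borda_fuse(rankings)
print(fused)
```

Because only ranks are used, no score mapping between engines is needed, at the cost of discarding whatever information the score magnitudes carry.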

Multiple Databases
Two main factors determine variation in document scores
– Search engine scoring functions.
– Collection variations which essentially change the IDF.
Effective score normalization requires
– Disregarding databases which are unlikely to have the answer
  » Resource Selection.
– Normalizing out collection variations on a per-query basis.
– Mostly ad hoc normalizing functions.
Language Models.
– Resource Descriptions already provide language models for collections.
– Could use these to factor out collection variations.
– Tricky to do this for different search engines.
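One way to read "resource descriptions already provide language models" is sketched below: each collection is summarized by term counts, turned into a unigram language model, and collections are ranked by the smoothed likelihood of the query. The Jelinek-Mercer style smoothing, the `lam` value, and the toy term counts are assumptions made for illustration, not details taken from the slides.

```python
import math
from collections import Counter


def unigram_lm(term_counts):
    """Maximum-likelihood unigram model from raw term counts."""
    total = sum(term_counts.values())
    return {t: c / total for t, c in term_counts.items()}


def rank_collections(query_terms, descriptions, lam=0.5):
    """Rank collections by log P(query | collection language model).

    Each collection model is interpolated with a background model built
    from all resource descriptions, so unseen terms do not zero the score.
    """
    background_counts = Counter()
    for counts in descriptions.values():
        background_counts.update(counts)
    background = unigram_lm(background_counts)

    scores = {}
    for name, counts in descriptions.items():
        lm = unigram_lm(counts)
        score = 0.0
        for term in query_terms:
            p = lam * lm.get(term, 0.0) + (1 - lam) * background.get(term, 0.0)
            if p == 0.0:
                # Term unknown to every collection: it cannot discriminate.
                continue
            score += math.log(p)
        scores[name] = score
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical resource descriptions (term -> count) for two collections.
descriptions = {
    "news": {"election": 30, "market": 20, "protein": 1},
    "biomed": {"protein": 40, "gene": 30, "market": 2},
}
ranked = rank_collections(["protein", "gene"], descriptions)
print(ranked[0][0])
```

Collections near the bottom of such a ranking are candidates for being disregarded before any score merging takes place.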

Multi-lingual Databases
Normalizing scores across multiple databases.
– Difficult problem.
Possibility:
– Create language models for each database.
– Use simple translation models to map across databases.
– Use this to normalize scores.
– Difficult.
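The mapping step in the possibility above could look roughly like this, assuming a word-to-word translation table t(e | f) is available: a unigram model over source-language terms f is projected into the target language as P(e) = Σ_f t(e | f) P(f). The table and probabilities below are invented, and how the projected model would then be used to normalize scores is left open, as it is on the slide.

```python
def translate_lm(source_lm, translation_table):
    """Project a unigram model through a word translation table.

    source_lm:         {source_term: P(source_term)}
    translation_table: {source_term: {target_term: t(target | source)}}
    Source terms without a translation are dropped, and the result is
    renormalized so the projected probabilities sum to 1.
    """
    target = {}
    for f, p_f in source_lm.items():
        for e, t in translation_table.get(f, {}).items():
            target[e] = target.get(e, 0.0) + t * p_f
    total = sum(target.values())
    if total == 0.0:
        return {}
    return {e: p / total for e, p in target.items()}


# Hypothetical French collection model and French-to-English table.
french_lm = {"chat": 0.6, "chien": 0.4}
table = {
    "chat": {"cat": 1.0},
    "chien": {"dog": 0.8, "hound": 0.2},
}
english_lm = translate_lm(french_lm, table)
print(english_lm)
```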

Distributed Web Search
Distribute web search over multiple sites/servers.
– Localized / Regional.
– Domain dependent.
– Possibly no central coordination.
– Server Selection / Database Selection with/without explicit queries.
Research Issues
– Partial representations of the world.
– Trust, Reliability.
Peer to peer.

Challenges
Formal Methods for Resource Descriptions, Ranking, Combination
– Example: Language Modeling
– Beyond collections as big documents
Multi-lingual retrieval
– Combining the outputs of systems searching databases in many languages.
Peer to Peer Systems
– Beyond broadcasting simple keyword searches.
– Non-centralized
– Networking considerations, e.g. availability, latency, transfer time.
Distributed Web Search Data, Web Data.