Query Caching in Agent-based Distributed Information Retrieval

Presentation transcript:

Query Caching in Agent-based Distributed Information Retrieval
Hemali Majithia, CADIP, UMBC
October 24, 2002
Query Caching in Agent Based DIR, September 27, 2002

Problem Definition
  DIR (IR) systems access their collections to perform searches and answer queries.
  Query resolution on large corpora is expensive in terms of time and resources.
  Similar queries produce similar results.
  Repetitive and redundant searching of the collections.
  Resource wastage and inefficiency.
  Solution: "CACHING QUERIES"

Solution
  Caching mechanism
    Cache new queries along with their results.
    Answer future similar queries using the cached queries.
  New query: a query that has not been answered before.
  Similar query: a query that is identical or similar to a query already in the cache.
  Emphasis: if similar queries exist in the cache, results can be retrieved from those previously searched queries; an exact match is not required.
  Retrieval ∝ linear time ∝ collection size

Caching Mechanism
  Two-level caching mechanism
    First level → Exact match
    Second level → Inverted index of the queries
  Caching algorithm
    Least Recently Used (LRU)
    Least Frequently Used (LFU)
    Lowest Relative Value (LRV)
  Similarity metric
    Cosine similarity
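The two-level lookup described on this slide can be sketched roughly as follows. This is a minimal illustration, not the CARROT-II implementation: the class name `TwoLevelQueryCache`, the whitespace tokenizer, the raw term-frequency vectors, the `capacity` and `threshold` parameters, and the choice of LRU over LFU or LRV are all assumptions made for the example.

```python
import math
from collections import Counter, OrderedDict, defaultdict


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class TwoLevelQueryCache:
    """Hypothetical sketch: exact match first, then an inverted index of queries."""

    def __init__(self, capacity: int = 100, threshold: float = 0.8):
        self.capacity = capacity        # assumed size limit (at least 1)
        self.threshold = threshold      # assumed similarity cut-off
        self.entries = OrderedDict()    # query -> results, least recently used first
        self.index = defaultdict(set)   # term -> cached queries containing that term

    @staticmethod
    def _normalize(query: str) -> str:
        # Assumed normalization: lowercase and collapse whitespace.
        return " ".join(query.lower().split())

    def get(self, query: str):
        key = self._normalize(query)
        # First level: exact match.
        if key in self.entries:
            self.entries.move_to_end(key)
            return self.entries[key]
        # Second level: the inverted index yields cached queries sharing a term.
        vec = Counter(key.split())
        candidates = set()
        for term in vec:
            candidates |= self.index.get(term, set())
        best, best_sim = None, 0.0
        for cand in candidates:
            sim = cosine(vec, Counter(cand.split()))
            if sim > best_sim:
                best, best_sim = cand, sim
        if best is not None and best_sim >= self.threshold:
            self.entries.move_to_end(best)
            return self.entries[best]
        return None  # miss: the caller searches the collection, then calls put()

    def put(self, query: str, results) -> None:
        key = self._normalize(query)
        if key in self.entries:
            self.entries[key] = results
            self.entries.move_to_end(key)
            return
        if len(self.entries) >= self.capacity:
            # LRU eviction: drop the oldest entry and remove it from the index.
            old, _ = self.entries.popitem(last=False)
            for term in set(old.split()):
                self.index[term].discard(old)
                if not self.index[term]:
                    del self.index[term]
        self.entries[key] = results
        for term in set(key.split()):
            self.index[term].add(key)
```

With this sketch, a cache holding "distributed information retrieval" would also answer "distributed retrieval" when the threshold is 0.8 or lower, since the two term vectors have a cosine similarity of about 0.82. Whether that trade of precision for speed is acceptable depends on the threshold chosen.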

Caching in CARROT–II
  [Figure: query flow across Node I and Node II, showing the primary cache, the secondary cache, the Query Agent, and the C2 Agents.]
  Steps labeled in the figure:
    1. User query
    2. Lookup
    3. MISS
    4. Query forwarded
    5. Miss
    6. Query forwarded to best C2
    7. Lookup
    8. HIT
    9. Update cache
    10. Results returned
    11. Response

Metrics for Evaluation of Caching Mechanism
  Efficiency
    Round Trip Time (RTT) = total time to answer the queries fired at the system
    Hit rate = measured for each agent cache and as a total hit rate
    Cost of caching = the overhead caused by caching (assuming that the hit rate is 0)
  Effectiveness
    Precision = fraction of retrieved documents that are relevant
    Recall = fraction of relevant documents that are retrieved
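As a small worked example of the effectiveness and hit-rate definitions above, the following sketch computes them from sets of document identifiers and simple counters. The function names and the sample identifiers are hypothetical; the slides do not say how the measurements were actually collected.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query, given document identifiers."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # relevant documents that were retrieved
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def hit_rate(cache_hits: int, lookups: int) -> float:
    """Fraction of cache lookups that were answered from the cache."""
    return cache_hits / lookups if lookups else 0.0


# Example: 4 documents retrieved, 3 relevant overall, 2 in common.
p, r = precision_recall(["d1", "d2", "d3", "d4"], ["d1", "d3", "d5"])
print(p, r)
print(hit_rate(3, 4))
```

Answering a query from a similar cached query rather than from a fresh search can change the retrieved set, which is why precision and recall are tracked alongside round trip time and hit rate.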