Query Caching in Agent-based Distributed Information Retrieval
Hemali Majithia, CADIP, UMBC
October 24, 2002
Problem Definition
- DIR (IR) systems access their collections to perform searches and answer queries.
- Query resolution on large corpora is expensive in terms of time and resources.
- Similar queries produce similar results.
- The collections are therefore searched repetitively and redundantly.
- The result is resource wastage and inefficiency.
- Solution: caching queries.
Solution
- Caching mechanism
  - Cache new queries along with their results.
  - Answer future similar queries using the cached queries.
- New query: a query that has not been answered before.
- Similar query: a query that is identical or similar to a query already in the cache.
- Emphasis: if similar queries exist in the cache, their results can be reused from the previously searched queries; an exact match is not required.
- Retrieval: time linear in collection size.
Caching Mechanism
- Two-level caching mechanism
  - First level: exact match
  - Second level: inverted index of the queries
- Caching algorithm
  - Least Recently Used (LRU)
  - Least Frequently Used (LFU)
  - Lowest Relative Value (LRV)
- Similarity metric
  - Cosine similarity
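The two-level mechanism on this slide can be illustrated with a minimal sketch. It is not the CARROT-II implementation: the class name `TwoLevelQueryCache`, the whitespace tokenizer, the raw term-frequency vectors, and the similarity threshold of 0.8 are all assumptions made for illustration, and LRU is used only because it is the simplest of the three listed policies.

```python
import math
from collections import Counter, OrderedDict, defaultdict


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class TwoLevelQueryCache:
    """Hypothetical query cache: exact match first, then similar match."""

    def __init__(self, capacity: int = 100, threshold: float = 0.8):
        self.capacity = capacity
        self.threshold = threshold      # assumed similarity cut-off
        self.entries = OrderedDict()    # query -> results, oldest first (LRU)
        self.index = defaultdict(set)   # term -> cached queries containing it

    @staticmethod
    def _normalize(query: str) -> str:
        return " ".join(query.lower().split())

    def get(self, query: str):
        key = self._normalize(query)
        # First level: exact match on the normalized query string.
        if key in self.entries:
            self.entries.move_to_end(key)
            return self.entries[key]
        # Second level: the inverted index yields cached queries that share
        # at least one term; the most similar one is used if close enough.
        vec = Counter(key.split())
        candidates = set()
        for term in vec:
            candidates |= self.index.get(term, set())
        best, best_sim = None, 0.0
        for cand in candidates:
            sim = cosine(vec, Counter(cand.split()))
            if sim > best_sim:
                best, best_sim = cand, sim
        if best is not None and best_sim >= self.threshold:
            self.entries.move_to_end(best)
            return self.entries[best]
        return None  # miss: the caller searches the collection

    def put(self, query: str, results) -> None:
        key = self._normalize(query)
        if key in self.entries:
            self.entries[key] = results
            self.entries.move_to_end(key)
            return
        if self.entries and len(self.entries) >= self.capacity:
            # LRU eviction: drop the least recently used query and
            # remove it from the inverted index as well.
            old, _ = self.entries.popitem(last=False)
            for term in set(old.split()):
                self.index[term].discard(old)
                if not self.index[term]:
                    del self.index[term]
        self.entries[key] = results
        for term in set(key.split()):
            self.index[term].add(key)
```

In this sketch a query such as "agent based information retrieval" would be answered from a cached "agent based retrieval" entry when the threshold is set low enough, which is the "similar rather than exact match" behaviour the slides emphasise. Swapping in LFU or LRV would change only the eviction step in `put`.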
Caching in CARROT–II
[Diagram: Node I and Node II, with a Query Agent, several C2 Agents, a primary cache, and a secondary cache. Numbered steps shown in the diagram:]
1. User query
2. Lookup
3. MISS
4. Query forwarded
5. Miss
6. Query forwarded to best C2
7. Lookup
8. HIT
9. Update cache
10. Results returned
11. Response
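The numbered steps in the diagram suggest a fall-through lookup: a cache is consulted, a miss forwards the query onward, and a later hit updates the cache that missed before the results are returned. The sketch below shows that general pattern only. Which component owns which cache is not recoverable from the slide, so the tier names, the function `tiered_lookup`, and the use of plain dictionaries as caches are assumptions.

```python
from typing import Dict, List, Optional, Tuple

Results = List[str]


def tiered_lookup(
    query: str, tiers: List[Tuple[str, Dict[str, Results]]]
) -> Tuple[Optional[Results], List[str]]:
    """Try each cache in order; on a hit, update the caches that missed."""
    trace: List[str] = []
    missed: List[Dict[str, Results]] = []
    for name, cache in tiers:
        trace.append(f"lookup {name}")
        if query in cache:
            trace.append(f"HIT {name}")
            results = cache[query]
            for earlier in missed:
                earlier[query] = results  # "update cache" step
            trace.append("results returned")
            return results, trace
        trace.append(f"MISS {name}")
        missed.append(cache)  # the query is forwarded to the next tier
    return None, trace  # every cache missed; the collection must be searched


# Usage with two hypothetical tiers.
first_cache: Dict[str, Results] = {}
agent_cache: Dict[str, Results] = {"query caching": ["doc7"]}
results, trace = tiered_lookup(
    "query caching",
    [("first cache", first_cache), ("best C2 agent cache", agent_cache)],
)
```

After this call the first cache also holds the entry, so a repeat of the same query would hit without being forwarded.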
Metrics for Evaluation of Caching Mechanism
- Efficiency
  - Round Trip Time (RTT): total time to answer the queries fired at the system
  - Hit rate: measured for each agent cache and in total
  - Cost of caching: the overhead caused by caching (assuming that the hit rate is 0)
- Effectiveness
  - Precision: fraction of retrieved documents that are relevant
  - Recall: fraction of relevant documents that are retrieved
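The effectiveness and hit-rate definitions above can be written out directly. The following is a small sketch with made-up document identifiers and counts. The function names are illustrative, and returning 0.0 for an empty denominator is a convention chosen here, not something the slides specify.

```python
from typing import Iterable, Set


def precision(retrieved: Iterable[str], relevant: Set[str]) -> float:
    """Fraction of retrieved documents that are relevant."""
    retrieved_set = set(retrieved)
    if not retrieved_set:
        return 0.0
    return len(retrieved_set & relevant) / len(retrieved_set)


def recall(retrieved: Iterable[str], relevant: Set[str]) -> float:
    """Fraction of relevant documents that are retrieved."""
    if not relevant:
        return 0.0
    return len(set(retrieved) & relevant) / len(relevant)


def hit_rate(hits: int, lookups: int) -> float:
    """Fraction of cache lookups answered from the cache."""
    return hits / lookups if lookups else 0.0


# Made-up example: 4 documents retrieved, 3 relevant, 2 in common.
retrieved_docs = ["d1", "d2", "d3", "d4"]
relevant_docs = {"d1", "d3", "d5"}
p = precision(retrieved_docs, relevant_docs)  # 2 / 4
r = recall(retrieved_docs, relevant_docs)     # 2 / 3
h = hit_rate(hits=30, lookups=120)            # 30 / 120
```

Comparing precision and recall for answers served from the cache against answers from a full search of the collection is one way to check whether reusing results for similar queries costs effectiveness.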