Published by Gloria Jordan. Modified over 8 years ago.
1
On Caching Search Engine Query Results
Evangelos Markatos
http://archvlsi.ics.forth.gr/OS/os.html
Computer Architecture and VLSI Systems Division
Institute of Computer Science
Foundation for Research and Technology Hellas
Heraklion, Crete, Greece
2
CARV ICS, FORTH
Outline
  Introduction - The Problem:
    Web Caching has focused on static data: an ever-decreasing percentage of URL requests
  Caching Dynamic Data
    Search Engine Query Results
  There exists significant locality of reference
    i.e. different people ask the same queries
  Medium-sized caches can exploit this locality
  Conclusions
3
Caching static data is not enough anymore
  Web Caching has focused on static documents (files)
    html pages, images, videos
  BUT:
    40% of http requests are to dynamic data [Wolman 99]
    up from 7% in 1997
    it will probably increase in the future
4
Caching Search Engine Query Results
  Queries represent:
    14% of all URL requests (1 out of 7)
    30-50% of non-image URL requests (1 out of 3)
  Caching Query Results may
    increase overall hit rate
    reduce network traffic
    reduce search engine overload
    reduce client latency
5
Caching Query Results
  Where?
    At the client side: little reuse, small hit rates
    At the proxy: medium reuse
    At the (Web/database) server, using inverse proxies (accelerators):
      maximum reuse, highest hit rates
      controlled environment
      close interaction with database
6
Caching at the Web Server
  Avoids re-evaluation of the query
    reduces computation overhead
      forking processes to process queries
      processing of database buffers
    reduces I/O (DB index and data) requests
  Main memory caching
    avoids disk requests
7
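Caching at the Web Server
  [Diagram: a query request first reaches the query cache. On a hit ("yes") the cache returns the query reply. On a miss ("no") the request goes on to the search engine's database server, which produces the query reply.]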
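The hit/miss flow in the diagram can be sketched as a small look-aside cache placed in front of the search engine. This is only an illustration, not the implementation used in the talk: `QueryCache`, `lookup`, and `fake_engine` are hypothetical names, the cache key (keywords plus page number) follows the talk's definition of a query as a single page of results, and the store is unbounded because replacement is covered on later slides.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical cache key: keyword string plus result-page number.
QueryKey = Tuple[str, int]


class QueryCache:
    """Answers repeated queries without re-invoking the search engine."""

    def __init__(self, search_engine: Callable[[str, int], str]) -> None:
        self._search_engine = search_engine  # stand-in for the database server
        self._store: Dict[QueryKey, str] = {}
        self.hits = 0
        self.misses = 0

    def lookup(self, keywords: str, page: int = 1) -> str:
        key = (keywords, page)
        if key in self._store:       # "Hit? yes": reply straight from the cache
            self.hits += 1
            return self._store[key]
        self.misses += 1             # "Hit? no": forward to the search engine
        reply = self._search_engine(keywords, page)
        self._store[key] = reply     # remember the reply for later requests
        return reply


# Usage with a fake engine that records how often it is really invoked.
calls: List[QueryKey] = []


def fake_engine(keywords: str, page: int) -> str:
    calls.append((keywords, page))
    return f"results for {keywords!r}, page {page}"


cache = QueryCache(fake_engine)
cache.lookup("dogs")
cache.lookup("dogs")           # served from the cache
cache.lookup("dogs", page=2)   # a different page is a different query
```

In this run the engine is invoked twice for three requests, which is the computation and I/O saving the previous slide describes.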
8
The Traces
  1M queries from EXCITE
  927,010 are keyword-based queries
  FORMAT:
    uid        keywords
    user-id1   dogs (first page)
    user-id1   dogs (second page)
    user-id1   dogs & cats (first page)
    user-id2   california (first page)
  Definition: Query is a single page of results of a keyword-based search
9
Locality of Reference: Are there any popular Queries?
  Although people have a wide variety of interests, there exist some very popular query topics
  Most popular query: 2219 accesses
  1000th most popular: 27 accesses
10
What % of requests goes to popular Queries?
  The 100 most popular queries account for 2.5% of the accesses
  The 1000 most popular queries account for 7% of the accesses
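A share of this kind can be computed from a trace by counting how often each query occurs and summing the counts of the k most frequent ones. The sketch below assumes the trace has already been reduced to a list of query keys. The function name `top_k_share` and the toy trace are hypothetical and are not data from the EXCITE trace.

```python
from collections import Counter
from typing import Hashable, Iterable


def top_k_share(queries: Iterable[Hashable], k: int) -> float:
    """Fraction of all accesses that go to the k most frequent queries."""
    counts = Counter(queries)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    top = sum(count for _, count in counts.most_common(k))
    return top / total


# Toy trace: "a" occurs 3 times, "b" twice, five other queries once each.
toy_trace = ["a", "a", "a", "b", "b", "c", "d", "e", "f", "g"]
share_of_top_2 = top_k_share(toy_trace, 2)  # (3 + 2) / 10
```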
11
Cache Placement
  All query requests are cached
  All queries have the same size
    1 page of results at a time (~ 4Kbytes)
  All queries are served by one server
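Because every cached query is assumed to occupy one result page of about 4 Kbytes, a cache size in bytes translates directly into a number of cacheable queries. The arithmetic below is a rough sketch that assumes binary units (1 Kbyte = 1024 bytes). The slides do not say which convention was used, and `entries_for` is a hypothetical helper name.

```python
PAGE_BYTES = 4 * 1024  # ~4 Kbytes per cached result page (from the slide)


def entries_for(cache_bytes: int, page_bytes: int = PAGE_BYTES) -> int:
    """Number of equally sized result pages that fit in the cache."""
    return cache_bytes // page_bytes


# The 256 Mbytes cache mentioned in the conclusions, in binary units.
entries_256mb = entries_for(256 * 1024 * 1024)
```

Under these assumptions a 256 Mbytes cache holds 65,536 result pages.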
12
Cache Replacement
  Cache Replacement using
    LRU (least recently used)
      keeps a queue sorted on the access time
      new accesses move to the head of the queue
      tail of the queue may be evicted
    SLRU
      much like LRU but:
        accessing non-cached URLs puts them in the middle (not head) of sorted queue
      frequently accessed queries are given better chances of staying in the cache
13
LRU
  [Figure: state of the LRU queue at time steps 1 to 4 as items are accessed, ordered from the hot (MRU) end to the cold (LRU) end]
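The LRU policy described on the Cache Replacement slide can be sketched with Python's `collections.OrderedDict`. This is a minimal sketch, not the simulator used for the results in this talk. The class and method names are hypothetical, and capacity is counted in entries because all queries are assumed to have the same size.

```python
from collections import OrderedDict
from typing import Hashable, List


class LRUCache:
    """LRU queue: the last item is the hot (MRU) end, the first is the cold end."""

    def __init__(self, capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self._queue: "OrderedDict[Hashable, None]" = OrderedDict()

    def access(self, key: Hashable) -> bool:
        """Record an access and return True on a hit, False on a miss."""
        if key in self._queue:
            self._queue.move_to_end(key)      # move to the head (MRU end)
            return True
        if len(self._queue) >= self.capacity:
            self._queue.popitem(last=False)   # evict the tail (LRU end)
        self._queue[key] = None               # new entries start at the head
        return False

    def keys_mru_first(self) -> List[Hashable]:
        return list(reversed(self._queue))


lru = LRUCache(capacity=2)
lru_results = [lru.access(q) for q in ["dogs", "cats", "dogs", "birds", "cats"]]
```

In this run only the third access is a hit: "birds" evicts "cats", so the final access to "cats" misses again.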
14
SLRU
  [Figure: state of the SLRU queue at time steps 1 to 4 as items are accessed, ordered from the hot (MRU) end to the cold (LRU) end]
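One common way to realize "insert in the middle of the queue" is to split the queue into two segments: a protected (hot) segment and a probationary (cold) segment. The sketch below follows that two-segment formulation as an assumption. The talk does not give the segment sizes or the exact promotion rule, and the names `SLRUCache`, `protected_size`, and `probation_size` are hypothetical.

```python
from collections import OrderedDict
from typing import Hashable


class SLRUCache:
    """Segmented LRU: new keys enter the cold segment, re-used keys become hot."""

    def __init__(self, protected_size: int, probation_size: int) -> None:
        if protected_size <= 0 or probation_size <= 0:
            raise ValueError("segment sizes must be positive")
        self.protected_size = protected_size
        self.probation_size = probation_size
        self._protected: "OrderedDict[Hashable, None]" = OrderedDict()  # hot half
        self._probation: "OrderedDict[Hashable, None]" = OrderedDict()  # cold half

    def __contains__(self, key: Hashable) -> bool:
        return key in self._protected or key in self._probation

    def _insert_probation(self, key: Hashable) -> None:
        if len(self._probation) >= self.probation_size:
            self._probation.popitem(last=False)   # evict the tail of the whole queue
        self._probation[key] = None               # head of the cold half = middle

    def access(self, key: Hashable) -> bool:
        """Record an access and return True on a hit, False on a miss."""
        if key in self._protected:
            self._protected.move_to_end(key)      # back to the head of the queue
            return True
        if key in self._probation:
            del self._probation[key]              # second access: promote to hot
            self._protected[key] = None
            if len(self._protected) > self.protected_size:
                demoted, _ = self._protected.popitem(last=False)
                self._insert_probation(demoted)   # coldest hot key drops to middle
            return True
        self._insert_probation(key)               # miss: enter in the middle
        return False


slru = SLRUCache(protected_size=1, probation_size=2)
slru_results = [slru.access(q) for q in ["a", "a", "b", "c", "d", "a"]]
```

With three entries in total, the twice-accessed query "a" survives the burst of one-off queries "b", "c", "d". A plain LRU queue of the same total size would have evicted it when "d" arrived.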
15
Cache Effectiveness
  Hit Rate increases sharply with cache size
  Max Hit Rate: 25%
  Frequency of reference important for small caches
16
Using Warm Caches
  Use warm caches (1.6 Gbytes in size)
    hit rate: calculated only for the last 50K requests
    max hit rate: 29%
    1 out of 3.5 queries can be found in the cache
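Measuring with a warm cache means replaying the whole trace but counting hits only over the final requests, so that cold-start misses do not lower the reported hit rate. The sketch below assumes an LRU cache whose capacity is counted in entries. `warm_hit_rate` and its parameters are hypothetical names, and the toy trace is not taken from the EXCITE data.

```python
from collections import OrderedDict
from typing import Hashable, Sequence


def warm_hit_rate(trace: Sequence[Hashable], capacity: int, measured: int) -> float:
    """Replay `trace` through an LRU cache; score only the last `measured` requests."""
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    cache: "OrderedDict[Hashable, None]" = OrderedDict()
    first_scored = max(0, len(trace) - measured)
    hits = 0
    scored = 0
    for position, query in enumerate(trace):
        hit = query in cache
        if hit:
            cache.move_to_end(query)          # refresh recency
        else:
            if len(cache) >= capacity:
                cache.popitem(last=False)     # evict the least recently used
            cache[query] = None
        if position >= first_scored:          # warm-up requests are not scored
            scored += 1
            hits += int(hit)
    return hits / scored if scored else 0.0


toy = ["a", "b", "c", "a", "b", "c"]
warm_rate = warm_hit_rate(toy, capacity=3, measured=3)   # only the second half
cold_rate = warm_hit_rate(toy, capacity=3, measured=6)   # whole trace
```

On this toy trace the warm measurement is 1.0 while the cold-start measurement is 0.5, because the first three requests are unavoidable misses.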
17
Static Caching
  Don’t cache the recent queries
  Cache the popular ones
    no cache pollution
    no cache replacement overhead
  BUT:
    may miss recent queries, e.g. due to an earthquake
    yesterday’s popular queries may not be popular anymore
18
Static Caching: Performance
  Static Caching:
    calculate the popular queries of the first half of the traces
    cache them throughout the second half
  Static Caching is good for small caches
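The evaluation procedure on this slide (pick the popular queries from the first half of the trace, then keep them fixed while replaying the second half) can be sketched as follows. The function name `static_cache_hit_rate` and the toy trace are hypothetical, and capacity is counted in entries because all queries are assumed to have the same size.

```python
from collections import Counter
from typing import Hashable, Sequence


def static_cache_hit_rate(trace: Sequence[Hashable], cache_entries: int) -> float:
    """Hit rate on the second half of `trace` for a cache fixed after the first half."""
    half = len(trace) // 2
    training, evaluation = trace[:half], trace[half:]
    # The cache content is chosen once and never replaced afterwards.
    popular = {query for query, _ in Counter(training).most_common(cache_entries)}
    if not evaluation:
        return 0.0
    hits = sum(1 for query in evaluation if query in popular)
    return hits / len(evaluation)


toy_queries = ["dogs", "cats", "dogs", "birds", "dogs", "cats", "fish", "dogs"]
rate_one_entry = static_cache_hit_rate(toy_queries, cache_entries=1)
rate_three_entries = static_cache_hit_rate(toy_queries, cache_entries=3)
```

The query "fish" appears only in the second half, so no static cache built from the first half can serve it. This is the "may miss recent queries" drawback noted on the previous slide.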
19
Related Work
  Alta-Vista traces [Silverstein 98]
    1 billion-long query trace
    avg. number of accesses per query: 4 (75% hit rate)
  Active Caching [Zhang98, Meira99]
    Cache at the proxy
    execute a server-provided “cachelet” on hit
  Query Containment [Luo00]
    evaluate subqueries from cached queries
    “dogs and cats” is contained in “dogs”
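The containment example can be illustrated for purely conjunctive keyword queries: a query that adds keywords to a cached query can only match a subset of the cached results. The sketch below is an illustration under strong assumptions and is not the method of [Luo00]. It assumes that the cache holds the complete result set of the broader query together with each document's terms, and all names (`keywords`, `is_contained`, `answer_from_cache`, the document ids) are hypothetical.

```python
from typing import Dict, FrozenSet, List, Optional, Set


def keywords(query: str) -> FrozenSet[str]:
    """Keyword set of a conjunctive query such as 'dogs and cats' or 'dogs & cats'."""
    return frozenset(query.lower().replace("&", " ").split()) - {"and"}


def is_contained(narrow: str, cached: str) -> bool:
    """True if every result of `narrow` must also be a result of `cached`."""
    return keywords(cached) <= keywords(narrow)


def answer_from_cache(
    narrow: str, cached: str, cached_docs: Dict[str, Set[str]]
) -> Optional[List[str]]:
    """Filter the cached results by the extra keywords, or None if not contained."""
    if not is_contained(narrow, cached):
        return None
    extra = keywords(narrow) - keywords(cached)
    return [doc_id for doc_id, terms in cached_docs.items() if extra <= terms]


# Hypothetical complete result set cached for the query "dogs".
dogs_results = {
    "d1": {"dogs", "cats"},
    "d2": {"dogs"},
    "d3": {"dogs", "cats", "fish"},
}
narrowed = answer_from_cache("dogs and cats", "dogs", dogs_results)
```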
20
Conclusions
  Queries have locality of reference
    30% in our trace (75% in AV trace)
  Medium-size caches are effective
    256 Mbytes result in 20% hit rate
    even higher (30%) for warm caches
  Both frequency and recency count
  Static Caching is effective
    for small cache sizes
21
On Caching Search Engine Query Results
Evangelos Markatos
http://archvlsi.ics.forth.gr/OS/os.html
Computer Architecture and VLSI Systems Division
Institute of Computer Science
Foundation for Research and Technology Hellas
Heraklion, Crete, Greece
22
Temporal Locality
  1,639 queries resubmitted in less than 100 time units
  14K queries resubmitted in less than 1K time units
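Counts like these can be obtained by recording, for every repeated query, the distance to its previous submission. The slide does not define a "time unit". The sketch below assumes one time unit per request in the trace, and the function names and toy trace are hypothetical.

```python
from typing import Dict, Hashable, List, Sequence


def resubmission_distances(trace: Sequence[Hashable]) -> List[int]:
    """Distance, in requests, between each repeated query and its previous submission."""
    last_seen: Dict[Hashable, int] = {}
    distances: List[int] = []
    for position, query in enumerate(trace):
        if query in last_seen:
            distances.append(position - last_seen[query])
        last_seen[query] = position
    return distances


def resubmitted_within(trace: Sequence[Hashable], limit: int) -> int:
    """Number of resubmissions that arrive less than `limit` time units later."""
    return sum(1 for d in resubmission_distances(trace) if d < limit)


toy_requests = ["a", "b", "a", "c", "b", "a"]
toy_distances = resubmission_distances(toy_requests)
close_repeats = resubmitted_within(toy_requests, 3)
```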
23
Freshness of cached data
  Dynamic Caching may return stale data
  But:
    our caching lasts for a few/several hours
    search engine data are several weeks old
    search engines don't archive the entire web every day
  Thus:
    Caching does not make the returned data noticeably more stale
24
Popular queries
   1  sex
   2  sex (second page)
   3  yahoo
   4  playboy
   5  chat
   6  porn
   7  princess diana
   8  adult-related
   9  sex (third page)
  10  adult-related
  11  adult-related
  12  jokes
  13  hotmail
  14  chat rooms
  15  music
25
DB Caching
  [Labrinidis and Roussopoulos 00]
  Web server caching
    is 1-2 orders of magnitude better than db caching
    gets better with load update rates