On Caching Search Engine Query Results
Evangelos Markatos
http://archvlsi.ics.forth.gr/OS/os.html
Computer Architecture and VLSI Systems

Presentation transcript:

On Caching Search Engine Query Results
Evangelos Markatos
http://archvlsi.ics.forth.gr/OS/os.html
Computer Architecture and VLSI Systems Division
Institute of Computer Science
Foundation for Research and Technology Hellas
Heraklion, Crete, Greece

CARV ICS, FORTH

Outline
- Introduction: the problem
  - Web caching has focused on static data, an ever-decreasing percentage of URL requests
- Caching dynamic data
  - Search engine query results
- There exists significant locality of reference, i.e. different people ask the same queries
- Medium-sized caches can exploit this locality
- Conclusions

Caching static data is not enough anymore
- Web caching has focused on static documents (files)
  - HTML pages, images, videos
- BUT: 40% of HTTP requests are to dynamic data [Wolman 99]
  - up from 7% in 1997
  - it will probably increase in the future

Caching Search Engine Query Results
- Queries represent:
  - 14% of all URL requests (1 out of 7)
  - 30-50% of non-image URL requests (1 out of 3)
- Caching query results may:
  - increase overall hit rate
  - reduce network traffic
  - reduce search engine overload
  - reduce client latency

Caching Query Results: Where?
- At the client side
  - little reuse, small hit rates
- At the proxy
  - medium reuse
- At the (Web/database) server
  - using reverse proxies (accelerators)
    - maximum reuse, highest hit rates
    - controlled environment
    - close interaction with the database

Caching at the Web Server
- Avoids re-evaluation of the query
  - reduces computation overhead
    - forking processes to process queries
    - processing of database buffers
  - reduces I/O (DB index and data) requests
- Main memory caching
  - avoids disk requests

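Caching at the Web Server
[Figure: inside the search engine, a query request first reaches the query cache. On a hit ("yes") the query reply is returned from the cache; on a miss ("no") the query goes to the database server, which produces the reply.]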
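The hit/miss flow in the figure can be sketched roughly as follows. This is a minimal illustration, not the system described in the talk: the class name `QueryCacheFrontEnd` and the `backend` callable (standing in for the database server) are hypothetical, and the cache is unbounded here, so replacement is left out.

```python
from typing import Callable, Dict


class QueryCacheFrontEnd:
    """Hypothetical front end: check the query cache before the database."""

    def __init__(self, backend: Callable[[str], str]) -> None:
        self.backend = backend            # stands in for the database server
        self.cache: Dict[str, str] = {}   # query -> cached reply (unbounded)
        self.hits = 0
        self.misses = 0

    def handle(self, query: str) -> str:
        # Hit: answer from the cache without re-evaluating the query.
        if query in self.cache:
            self.hits += 1
            return self.cache[query]
        # Miss: evaluate the query at the backend and remember the reply.
        self.misses += 1
        reply = self.backend(query)
        self.cache[query] = reply
        return reply
```

A second request for the same query is served from the cache, so the backend is called only once per distinct query.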

The Traces
- 1M queries from EXCITE
- 927,010 are keyword-based queries
- FORMAT:
    uid        keywords
    user-id1   dogs (first page)
    user-id1   dogs (second page)
    user-id1   dogs & cats (first page)
    user-id2   california (first page)
- Definition: a query is a single page of results of a keyword-based search

Locality of Reference: Are there any popular queries?
- Although people have a wide variety of interests, there exist some very popular query topics
- Most popular query: 2219 accesses
- 1000th most popular: 27 accesses

What % of requests goes to popular queries?
- The 100 most popular queries account for 2.5% of the accesses
- The 1000 most popular queries account for 7% of the accesses
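A share of this kind could be computed from a trace along the following lines. This is only a sketch: the function name `top_k_share` is hypothetical, and each trace entry is assumed to be any hashable query key, for example a (keywords, page) pair matching the definition of a query given earlier.

```python
from collections import Counter
from typing import Hashable, Iterable


def top_k_share(queries: Iterable[Hashable], k: int) -> float:
    """Fraction of all accesses that go to the k most popular queries."""
    counts = Counter(queries)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # most_common(k) returns the k queries with the highest access counts.
    top = sum(count for _, count in counts.most_common(k))
    return top / total


# Toy trace: "dogs" page 1 is requested three times, "cats" page 1 once.
trace = [("dogs", 1)] * 3 + [("cats", 1)]
share = top_k_share(trace, 1)
```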

Cache Placement
- All query requests are cached
- All queries have the same size
  - 1 page of results at a time (~4 Kbytes)
- All queries are served by one server

Cache Replacement
- Cache replacement using:
  - LRU (least recently used)
    - keeps a queue sorted on the access time
    - new accesses move to the head of the queue
    - the tail of the queue may be evicted
  - SLRU
    - much like LRU, but accessing non-cached URLs puts them in the middle (not the head) of the sorted queue
    - frequently accessed queries are given better chances of staying in the cache

LRU
[Figure: the LRU queue, ordered from hot (MRU) to cold (LRU), shown as queries are accessed at times 1 to 4.]
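A minimal sketch of the LRU queue described above, assuming every cached query occupies one slot (consistent with the equal-size assumption). The class name `LRUCache` and its `access` method are illustrative names, not taken from the talk.

```python
from collections import OrderedDict
from typing import Hashable


class LRUCache:
    """Illustrative LRU queue: last item is the head (MRU), first is the tail (LRU)."""

    def __init__(self, capacity: int) -> None:
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self.entries: "OrderedDict[Hashable, bool]" = OrderedDict()

    def access(self, key: Hashable) -> bool:
        """Record an access; return True on a hit, False on a miss."""
        if key in self.entries:
            self.entries.move_to_end(key)      # move to the head of the queue
            return True
        self.entries[key] = True               # new entries start at the head
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)   # evict the tail (least recent)
        return False
```

With a capacity of 2, the sequence a, b, a, c evicts b, because a was touched more recently.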

SLRU
[Figure: the SLRU queue, ordered from hot (MRU) to cold (LRU), shown as queries are accessed at times 1 to 4.]
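One common way to realise "insert in the middle" is a segmented queue with two halves, sketched below. The slides do not say where the middle lies or how entries are demoted, so the 50/50 split (`protected_fraction`), the class name `SLRUCache`, and the demotion rule are all assumptions made for illustration.

```python
from collections import OrderedDict
from typing import Hashable


class SLRUCache:
    """Illustrative segmented LRU: new keys enter the lower (probation) segment."""

    def __init__(self, capacity: int, protected_fraction: float = 0.5) -> None:
        self.protected_cap = max(1, int(capacity * protected_fraction))
        self.probation_cap = capacity - self.protected_cap
        if self.probation_cap < 1:
            raise ValueError("capacity too small for two segments")
        self.protected: "OrderedDict[Hashable, bool]" = OrderedDict()  # upper half
        self.probation: "OrderedDict[Hashable, bool]" = OrderedDict()  # lower half

    def access(self, key: Hashable) -> bool:
        """Record an access; return True on a hit, False on a miss."""
        if key in self.protected:
            self.protected.move_to_end(key)    # back to the head of the queue
            return True
        if key in self.probation:
            del self.probation[key]
            self._promote(key)                 # re-referenced: move to upper half
            return True
        self._insert_probation(key)            # miss: enter at the middle
        return False

    def _promote(self, key: Hashable) -> None:
        self.protected[key] = True
        if len(self.protected) > self.protected_cap:
            demoted, _ = self.protected.popitem(last=False)
            self._insert_probation(demoted)    # demote to the middle, not evict

    def _insert_probation(self, key: Hashable) -> None:
        self.probation[key] = True
        if len(self.probation) > self.probation_cap:
            self.probation.popitem(last=False)  # evict the tail of the queue
```

In this sketch a query that has been requested twice sits in the upper segment, so a burst of one-off queries can only push out other one-off queries.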

Cache Effectiveness
- Hit rate increases sharply with cache size
- Max hit rate: 25%
- Frequency of reference is important for small caches

Using Warm Caches
- Use warm caches (1.6 Gbytes in size)
  - hit rate: calculated only for the last 50K requests
  - max hit rate: 29%
  - 1 out of 3.5 queries can be found in the cache
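Measuring the hit rate only over the tail of the trace might look like the sketch below: the whole trace is replayed so the cache warms up, but hits are counted only for the last `measured_tail` requests. The function name, the parameter names, and the use of plain LRU replacement are assumptions for illustration.

```python
from collections import OrderedDict
from typing import Hashable, Sequence


def warm_hit_rate(trace: Sequence[Hashable], capacity: int, measured_tail: int) -> float:
    """Replay the full trace through an LRU cache; score only the final requests."""
    cache: "OrderedDict[Hashable, bool]" = OrderedDict()
    start = max(0, len(trace) - measured_tail)   # first request that is scored
    hits = 0
    for i, query in enumerate(trace):
        hit = query in cache
        if hit:
            cache.move_to_end(query)
        else:
            cache[query] = True
            if len(cache) > capacity:
                cache.popitem(last=False)        # evict the least recently used
        if hit and i >= start:
            hits += 1
    measured = len(trace) - start
    return hits / measured if measured else 0.0
```

On the toy trace a, b, a, b with room for two queries, scoring all four requests gives 0.5, while scoring only the last two gives 1.0, since the first two requests are compulsory misses that warm the cache.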

Static Caching
- Don’t cache the recent queries
- Cache the popular ones
  - no cache pollution
  - no cache replacement overhead
- BUT:
  - may miss recent queries, e.g. due to an earthquake
  - yesterday’s popular queries may not be popular anymore

Static Caching: Performance
- Static caching:
  - calculate the popular queries of the first half of the trace
  - cache them throughout the second half
- Static caching is good for small caches
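The evaluation procedure above can be sketched as follows, assuming the trace is a plain sequence of query keys and the cache holds a fixed number of equal-sized entries. The function name `static_cache_hit_rate` is hypothetical.

```python
from collections import Counter
from typing import Hashable, Sequence


def static_cache_hit_rate(trace: Sequence[Hashable], capacity: int) -> float:
    """Fill the cache from the first half of the trace; score the second half."""
    half = len(trace) // 2
    training, evaluation = trace[:half], trace[half:]
    # The cache content is fixed: the `capacity` most popular training queries.
    cached = {query for query, _ in Counter(training).most_common(capacity)}
    if not evaluation:
        return 0.0
    hits = sum(1 for query in evaluation if query in cached)
    return hits / len(evaluation)


# Toy trace: "a" is the most popular query in the first half.
toy_trace = ["a", "a", "b", "c", "a", "d", "a", "b"]
rate = static_cache_hit_rate(toy_trace, 1)
```

Because the cached set never changes during the second half, there is no replacement work at all, which matches the "no cache replacement overhead" point made for static caching.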

Related Work
- Alta-Vista traces [Silverstein 98]
  - 1 billion-long query trace
  - avg. number of accesses per query: % hit rate
- Active Caching [Zhang98, Meira99]
  - cache at the proxy
  - execute a server-provided “cachelet” on hit
- Query Containment [Luo00]
  - evaluate subqueries from cached queries
  - “dogs and cats” is contained in “dogs”

Conclusions
- Queries have locality of reference
  - 30% in our trace (75% in AV trace)
- Medium-size caches are effective
  - 256 Mbytes result in a 20% hit rate
  - even higher (30%) for warm caches
- Both frequency and recency count
- Static caching is effective
  - for small cache sizes


Temporal Locality
- 1,639 queries resubmitted in less than 100 time units
- 14K queries resubmitted in less than 1K time units
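Counts like these could be derived from a trace as sketched below. The slides do not define a "time unit"; this sketch assumes one time unit per request in the trace, and the function name `resubmissions_within` is hypothetical.

```python
from typing import Dict, Hashable, Sequence


def resubmissions_within(trace: Sequence[Hashable], max_distance: int) -> int:
    """Count requests that repeat a query seen fewer than max_distance requests ago."""
    last_seen: Dict[Hashable, int] = {}   # query -> position of its latest request
    count = 0
    for position, query in enumerate(trace):
        if query in last_seen and position - last_seen[query] < max_distance:
            count += 1
        last_seen[query] = position
    return count


# "a" repeats after 2 requests, "c" after 1, and "a" again after 3.
toy = ["a", "b", "a", "c", "c", "a"]
close_repeats = resubmissions_within(toy, 3)
```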

Freshness of cached data
- Dynamic caching may return stale data
- But:
  - our caching lasts for a few/several hours
  - search engine data are several weeks old
    - search engines do not archive the entire web every day
- Thus:
  - caching does not return more stale data

Popular queries
1. sex
2. sex (second page)
3. yahoo
4. playboy
5. chat
6. porn
7. princess diana
8. adult-related
9. sex (third page)
10. adult-related
11. adult-related
12. jokes
13. hotmail
14. chat rooms
15. music

DB Caching
- [Labrinidis and Roussopoulos 00]
- Web server caching
  - is 1-2 orders of magnitude better than DB caching
  - gets better with load update rates