ICP and the Squid Web Cache Duane Wessels, k Claffy August 13, 1997 Presented by 宮春富, Yuan Ze University Systems Laboratory, 2000/01/26
Outline ⊙ Introduction ⊙ Internet Cache Protocol ⊙ Implementation of ICP in Squid ⊙ ICP Delays
What is Caching ⊙ Caching has proven a useful technique for reducing the latency that end users experience on the Web. ⊙ Caching is effective because many Web documents are requested more than once. ⊙ A cache is intermediate storage that holds copies of popular Web documents close to the end users.
HTTP ⊙ An HTTP request comprises three major parts: a request method, a URL, and a set of request headers. ⊙ An HTTP reply consists of a numeric result code, a set of reply headers, and an optional reply body. ⊙ GET -> download; POST -> upload. ⊙ Max-age directive: age refers to the time elapsed since the origin server provided the data.
Cache Hierarchy ⊙ A set of child caches shares a common parent cache. ⊙ A simple hierarchy is not appropriate for all situations. ⊙ ICP provides a quick and efficient method of intercache communication and offers a mechanism for establishing complex cache hierarchies. (Figure: PARENT and CHILD caches)
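The max-age idea can be illustrated with a minimal sketch. It assumes the age and the max-age value are both in seconds and that a cached copy is acceptable while its age does not exceed max-age; the function names `parse_max_age` and `is_acceptable` are hypothetical and are not taken from Squid.

```python
from typing import Optional


def parse_max_age(cache_control: str) -> Optional[int]:
    """Return the max-age value (seconds) from a Cache-Control header value, or None."""
    for directive in cache_control.split(","):
        name, _, value = directive.strip().partition("=")
        if name.lower() == "max-age" and value.isdigit():
            return int(value)
    return None


def is_acceptable(age_seconds: int, max_age_seconds: int) -> bool:
    """A copy is acceptable while its age does not exceed max-age."""
    return age_seconds <= max_age_seconds
```

For example, a copy that is 120 seconds old would be acceptable under `max-age=3600`, but not under `max-age=60`.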
ICP Message Format ⊙ A cache queries its peers by sending each one an ICP_QUERY message. ⊙ Each peer replies with either an ICP_HIT or an ICP_MISS. ⊙ Other codes: ICP_DENIED, ICP_HIT_OBJ. (Figure: message format diagram, bits 0 to 31)
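The message layout is not reproduced on the slide, so the sketch below assumes the 20-byte ICP version 2 header described in RFC 2186 (opcode, version, message length, request number, options, option data, sender host address) and the opcode numbers listed there. The helper names `build_query` and `parse_header` are hypothetical.

```python
import struct

# Opcode numbers as given in RFC 2186 (assumed here; not shown on the slide).
ICP_QUERY = 1
ICP_HIT = 2
ICP_MISS = 3
ICP_DENIED = 22
ICP_HIT_OBJ = 23
ICP_VERSION = 2

# Network byte order: opcode (8 bits), version (8), message length (16),
# request number (32), options (32), option data (32), sender address (32).
HEADER = struct.Struct("!BBHIII4s")


def build_query(request_number: int, url: str) -> bytes:
    """Build an ICP_QUERY: header, 4-byte requester address, null-terminated URL."""
    payload = b"\x00" * 4 + url.encode("ascii") + b"\x00"
    length = HEADER.size + len(payload)
    header = HEADER.pack(ICP_QUERY, ICP_VERSION, length,
                         request_number, 0, 0, b"\x00" * 4)
    return header + payload


def parse_header(message: bytes) -> dict:
    """Decode the fixed-size header at the start of an ICP message."""
    opcode, version, length, reqnum, options, option_data, sender = \
        HEADER.unpack_from(message)
    return {"opcode": opcode, "version": version, "length": length,
            "request_number": reqnum, "options": options,
            "option_data": option_data, "sender": sender}
```

The fixed-size binary header is what lets a cache parse a message with a single unpack call; the address fields are left as zeros in this sketch.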
ICP Transport ⊙ ICP could use either TCP or UDP as the underlying delivery protocol. ⊙ UDP is simpler to implement because each cache needs to maintain only a single UDP socket. ⊙ ICP is intended to be an unreliable protocol, so TCP would actually be detrimental.
ICP vs. HTTP ⊙ One advantage: a cache can quickly parse and interpret an ICP message. ⊙ Two disadvantages: ICP does not match HTTP (a query carries only the URL, not the request headers); ICP increases the request latency by at least the network round-trip time to a neighbor cache.
ICP Query Algorithm ⊙ Squid can restrict which ICP_QUERY messages it sends to which peers. ⊙ The cache_host_domain option specifies which domains to query a given peer for. ⊙ Another Squid configuration parameter, hierarchy_stoplist, excludes certain requests from the ICP query algorithm.
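The effect of these two options can be sketched as a simple filter. This is an illustration only, not Squid's code: it assumes the stoplist is a list of substrings matched against the URL and that each peer has a list of domain suffixes. The peer names, the table contents, and the function `peers_to_query` are all hypothetical.

```python
from urllib.parse import urlsplit

# Hypothetical settings, standing in for hierarchy_stoplist and
# per-peer cache_host_domain entries.
HIERARCHY_STOPLIST = ["cgi-bin", "?"]
PEER_DOMAINS = {
    "parent.example.net": [".edu"],
    "sibling.example.net": [".com", ".net"],
}


def peers_to_query(url: str) -> list:
    """Return the peers that should receive an ICP_QUERY for this URL."""
    # Requests matching the stoplist bypass the ICP query algorithm entirely.
    if any(word in url for word in HIERARCHY_STOPLIST):
        return []
    host = urlsplit(url).hostname or ""
    # Query only the peers whose configured domains match the URL's host.
    return [peer for peer, domains in PEER_DOMAINS.items()
            if any(host.endswith(domain) for domain in domains)]
```

With these example settings, a request for `http://www.example.edu/index.html` would be sent only to the parent, and any URL containing `cgi-bin` would not trigger ICP queries at all.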
Processing an ICP query ⊙ Extract and parse the URL. (On failure: ICP_INVALID) ⊙ Check local access controls. (If refused: ICP_DENIED) ⊙ Look up the given URL. (If not found: ICP_MISS) ⊙ If the object is small enough, return an ICP_HIT_OBJ message. ⊙ Otherwise, return an ICP_HIT message.
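The steps above can be sketched as a single decision function. This is a minimal illustration under stated assumptions: the cache is modeled as a dictionary from URL to object bytes, access control is reduced to a boolean, and the size limit `MAX_HIT_OBJ_BYTES` is a made-up value, since the slide does not say what "small enough" means.

```python
from urllib.parse import urlsplit

MAX_HIT_OBJ_BYTES = 1024  # hypothetical threshold for "small enough"


def process_icp_query(url: str, client_allowed: bool, cache: dict) -> str:
    """Return the name of the ICP reply for one incoming query."""
    parts = urlsplit(url)
    if not parts.scheme or not parts.netloc:
        return "ICP_INVALID"          # URL could not be parsed
    if not client_allowed:
        return "ICP_DENIED"           # local access controls refuse the peer
    body = cache.get(url)
    if body is None:
        return "ICP_MISS"             # object is not in the cache
    if len(body) <= MAX_HIT_OBJ_BYTES:
        return "ICP_HIT_OBJ"          # small object: send it in the reply
    return "ICP_HIT"
```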
Collect ICP replies ⊙ Squid collects replies until it receives an ICP_HIT or until all ICP_MISS replies have arrived. ⊙ On receiving an ICP_HIT, Squid begins retrieving the object from that peer. ⊙ If an ICP_HIT_OBJ reply arrives first, Squid simply takes the object data from the ICP message payload. ⊙ If no hit reply arrives, Squid retrieves the object from the parent.
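The reply-collection rule can be sketched as follows. The sketch assumes replies are presented as (peer, opcode) pairs in arrival order and that the sequence simply ends if the timeout expires before every peer has answered; the function name `choose_source` and the return values are hypothetical.

```python
def choose_source(replies, peer_count: int, parent: str):
    """Decide where to get the object from, given replies in arrival order.

    Returns ("payload", peer) when the object came inside an ICP_HIT_OBJ,
    otherwise ("fetch", host) naming the host to retrieve it from.
    """
    misses = 0
    for peer, opcode in replies:
        if opcode == "ICP_HIT_OBJ":
            return ("payload", peer)   # object data is already in the reply
        if opcode == "ICP_HIT":
            return ("fetch", peer)     # first hit wins; stop waiting
        if opcode == "ICP_MISS":
            misses += 1
            if misses == peer_count:
                break                  # every peer has answered with a miss
    return ("fetch", parent)           # no hit: fall back to the parent
```

Returning on the first hit is what keeps the added latency close to one round-trip time to the fastest peer that has the object.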
Detecting Unreachable Peers ⊙ If one of the peers becomes unreachable, the chance of suffering the two-second timeout increases significantly. ⊙ We designate a peer as dead when it fails to reply to 20 consecutive ICP queries. ⊙ We still send ICP_QUERY messages to dead peers; we just don't expect to receive replies from them.
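A minimal sketch of this dead-peer rule follows. It simplifies by counting every query sent since the last reply as unanswered, so queries still in flight are included in the count; the class `PeerState` and the function `expected_replies` are hypothetical names.

```python
DEAD_THRESHOLD = 20  # consecutive unanswered queries, as stated on the slide


class PeerState:
    """Tracks consecutive unanswered ICP queries for one peer."""

    def __init__(self) -> None:
        self.unanswered = 0

    def query_sent(self) -> None:
        # Dead peers are still queried, so this keeps being called for them.
        self.unanswered += 1

    def reply_received(self) -> None:
        # Any reply resets the count and revives the peer.
        self.unanswered = 0

    @property
    def dead(self) -> bool:
        return self.unanswered >= DEAD_THRESHOLD


def expected_replies(peers) -> int:
    """Number of replies to wait for: dead peers are not counted."""
    return sum(1 for peer in peers if not peer.dead)
```

Because dead peers keep receiving queries, a single reply from one of them is enough to bring it back into the set of peers whose answers are awaited.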
More Network Failure ⊙ Squid will return an ICP_MISS_NOFETCH message instead of an ICP_MISS. ⊙ This feature allows the parent to continue serving hits while taking itself out of the peer selection process for misses. (Figure: INTERNET, PARENT, ROUTER, CHILD; labels A and B)
ICP_HIT_OBJ ⊙ One problem is that it makes the UDP packet considerably larger. ⊙ Another problem is that these replies take more time to generate. ⊙ The payload must consist of the URL followed by the object data.
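To illustrate why these replies are larger and slower to build, the sketch below assembles and parses an ICP_HIT_OBJ payload. It assumes the layout given in RFC 2186 (null-terminated URL, a 16-bit object size, then the object data); the size field is not mentioned on the slide, and the function names are hypothetical.

```python
import struct


def build_hit_obj_payload(url: str, data: bytes) -> bytes:
    """Assemble the payload: null-terminated URL, 16-bit size, object data."""
    if len(data) > 0xFFFF:
        raise ValueError("object too large for a 16-bit size field")
    return url.encode("ascii") + b"\x00" + struct.pack("!H", len(data)) + data


def parse_hit_obj_payload(payload: bytes):
    """Split a payload back into (url, object data)."""
    end = payload.index(b"\x00")                      # end of the URL
    url = payload[:end].decode("ascii")
    (size,) = struct.unpack_from("!H", payload, end + 1)
    start = end + 3                                   # skip null byte and size
    return url, payload[start:start + size]
```

The whole object has to be read from the cache and copied into the datagram before the reply can be sent, which is where the extra generation time comes from.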
ICP Delay ⊙ We don't claim that these measurements prove that hierarchical caching with ICP improves performance. ⊙ We suspect the outcome depends on the regional and/or local network situation. ⊙ We used a special program that alternates between sending ICMP echo requests and ICP_QUERY messages.