Published by Rafe Johnson. Modified over 8 years ago.
2
Internet Cache Protocol
Erez Tal, Assaf Oren, Avner Cohen
Submission Date: 5/2/01
Guides: Ran Wolff and Itai Dabran
3
Introduction
“Confusion creates jobs.” Hoffsted’s Employment Principle
Technology has advanced greatly since the first bits were transmitted over what we now call “The Internet”. However, one major problem remains: getting what we need fast. Although the bandwidth of communication lines has widened substantially, it cannot keep up with the rapid growth in the number of people who use the Internet, nor with the size of the content they want to download.
4
Why Proxy?
“A good solution can be successfully applied to almost any problem.” Big Al’s Law
One of the common solutions for optimizing Internet performance is the Internet Cache, or “Proxy”. Like caches in other systems, the proxy is a mediator between the client, which is usually a personal computer connected to the Internet, and the server, in our case an HTTP server. The proxy has large-scale storage devices that hold copies of web pages and objects that one of the clients might request.
5
Why Proxy?
When a client requests a web page, the proxy first checks whether a local copy exists. If it finds a valid copy of the page, it returns the page to the client immediately. Only when no local copy is found does it connect to the origin server (the server that originally holds the page) and fetch the page for the client.
[Diagram labels: Client, Proxy, The Internet, Origin Web Server, HTTP]
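The lookup described on this slide can be sketched as follows. This is a minimal illustration, not the project's code: the class name `CachingProxy`, the `fetch_from_origin` callback and the hit/miss counters are assumptions made for the example, and validity checks (expiry, revalidation) are left out.

```python
from typing import Callable, Dict


class CachingProxy:
    """Toy proxy: serve from the local cache, otherwise fetch from the origin."""

    def __init__(self, fetch_from_origin: Callable[[str], bytes]) -> None:
        self._fetch = fetch_from_origin      # hypothetical origin fetcher
        self._cache: Dict[str, bytes] = {}   # URL -> cached body
        self.hits = 0
        self.misses = 0

    def get(self, url: str) -> bytes:
        body = self._cache.get(url)
        if body is not None:
            # Local copy found: answer without leaving the local network.
            self.hits += 1
            return body
        # No local copy: go to the origin server and keep a copy.
        self.misses += 1
        body = self._fetch(url)
        self._cache[url] = body
        return body
```

Only the first request for a URL reaches the origin; later requests for the same URL are answered locally.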
6
Major Advantages
Getting a page from the local proxy of the ISP (Internet Service Provider) is much faster than getting it from the origin server, which might be far away.
The number of requests that go beyond the local network decreases dramatically. This lets the ISP pay less for bandwidth and reduces “traffic jams” on the global communication lines of the Internet.
7
Difficulties and Problems
“New systems generate new problems.” The Fundamental Theorem of Systemantics
Adding our proxy to the equation has many advantages, but it also creates many problems. In this project we focused on two major issues.
8
Difficulties and Problems
Our ISP might serve a great number of clients. If a proxy is installed, the clients will send all their requests through it. The proxy system, however powerful, might have a hard time handling a large number of concurrent requests. In addition, the communication lines leading to it will jam easily, and its storage devices will fill up quickly.
9
Difficulties and Problems
Handling a client request must be as fast as possible. Otherwise, the proxy will be jammed with requests, and the time needed to deal with each incoming request will grow steadily. This isn’t what we meant when we thought of this solution!
One of the major time-consuming activities that the proxy performs for each request is searching its huge database for the requested object. This search time must be reduced in order to get good performance.
10
The Solution
“The only important information in a hierarchy is who knows what.” Gate’s Law
The solution we implemented for these problems has two parts:
ICP – Internet Cache Protocol
The main part is adding a bi-directional messaging protocol to our proxy server. ICP allows an ISP to operate several different proxy servers simultaneously, while keeping things simple for the clients that use the proxy service.
11
The Solution
Operating more than one proxy decreases the load on each proxy, while ICP allows them to share the web pages and objects they hold. If a proxy in such a cluster doesn’t have a requested object locally, it asks the other proxies in the cluster whether they have the object before fetching it from the origin server. This way we get a parallel, distributed system that, as far as the client knows, functions much like a regular proxy server.
[Diagram labels: Client, Proxy Cluster, The Internet, Origin Web Server, HTTP, ICP]
12
The Solution
Bloom Filters
The problem of quickly determining whether a proxy holds a specific object in its cache becomes even more serious as the topology of the proxy cluster gets more complicated. A Bloom Filter is a data structure that allows the proxy to return replies rapidly.
The Bloom Filter is a bit array that maps the existence of web objects in the local cache. When an object is stored, several bits chosen by a hash function are set. To determine whether a requested object exists locally, the proxy checks only these bits instead of searching through a complicated data structure.
13
The Solution
In a more sophisticated implementation, this bit array might be sent frequently to each proxy in the cluster. This way a proxy can determine whether other proxies in the cluster contain an object without keeping constant communication open with them, which is much faster.
14
ICP – The Protocol
An HTTP request is received by a proxy from a client.
The local cache reports “MISS”.
The proxy sends an ICP Query packet to the other proxies in the cluster (depending on its configuration).
The proxies that receive this query check their local caches and send back to the querying proxy an ICP Reply, which usually reports “HIT” or “MISS”.
The querying proxy decides, according to the replies and to internal timeouts, whether to get the requested object from another proxy or from the origin server.
The object is fetched and usually cached locally.
15
ICP – The Protocol
Example: “HIT” in a sibling proxy
[Diagram labels: Client, Proxy Cluster, The Internet, Origin Web Server; HTTP GET; local MISS; ICP QUERY; REPLY: MISS, HIT, MISS; HTTP]
16
ICP – The Protocol
Example: “MISS” in all siblings
[Diagram labels: Client, Proxy Cluster, The Internet, Origin Web Server; HTTP GET; local MISS; ICP QUERY; REPLY: MISS; SECHO; HTTP]
17
ICP – The Protocol
ICP Packet fields:
Opcode (HIT, MISS, QUERY, etc.)
Version
Length
Request Number
Options
Option Data
Sender Address
Payload (URL)
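The packet layout above can be sketched with Python's `struct` module. The field widths, the opcode numbers and the extra 4-byte requester address in front of the URL in a query follow ICP version 2 as specified in RFC 2186, which the slides do not cite, so treat them as an assumption about the format used in this project. The function names are invented for the example.

```python
import struct

# 20-byte header in network byte order: opcode (1), version (1), length (2),
# request number (4), options (4), option data (4), sender address (4).
ICP_HEADER = struct.Struct("!BBHIII4s")
ICP_VERSION = 2
OP_QUERY, OP_HIT, OP_MISS = 1, 2, 3   # opcode values assumed from RFC 2186
NO_ADDRESS = b"\x00" * 4


def _pack(opcode: int, request_number: int, payload: bytes) -> bytes:
    length = ICP_HEADER.size + len(payload)
    header = ICP_HEADER.pack(opcode, ICP_VERSION, length,
                             request_number, 0, 0, NO_ADDRESS)
    return header + payload


def build_query(request_number: int, url: str) -> bytes:
    # Query payload: 4-byte requester address, then a NUL-terminated URL.
    return _pack(OP_QUERY, request_number,
                 NO_ADDRESS + url.encode("ascii") + b"\x00")


def build_reply(opcode: int, request_number: int, url: str) -> bytes:
    # HIT/MISS payload: just the NUL-terminated URL.
    return _pack(opcode, request_number, url.encode("ascii") + b"\x00")


def parse(packet: bytes):
    """Return (opcode, request_number, url) for a query or reply packet."""
    opcode, _version, length, request_number, _opts, _optdata, _sender = \
        ICP_HEADER.unpack_from(packet)
    if length != len(packet):
        raise ValueError("length field does not match packet size")
    payload = packet[ICP_HEADER.size:]
    if opcode == OP_QUERY:
        payload = payload[4:]          # skip the requester address
    return opcode, request_number, payload.rstrip(b"\x00").decode("ascii")
```

The request number is what lets the querying proxy match a reply to the query it sent.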
18
ICP – Topologies
ICP allows defining complex topologies of ICP servers. An ICP server may be defined as a “parent” of another ICP server. Such a definition allows the querying server to request an object from its parent servers even when they reply with a MISS. This will be an HTTP request, so the parent server will handle it as a regular client request.
[Diagram labels: Client, Proxy Cluster, The Internet, Origin Web Server; HTTP; ICP QUERY; REPLY: MISS; HTTP]
19
ICP – Topologies
ICP servers may use a multicast UDP port to send and receive ICP queries. ICP replies are always sent in unicast mode.
[Diagram labels: Proxy Cluster, Multicast Address, QUERY, REPLY]
20
Implementation - Schema
[Diagram. Components: Proxy Server (HTTP Client, Bloom Filtered Cache), ICP Handler, ICP Sender, ICP Listener, Query Handler, Query Queue, Response Structure, Neighbors Config, Statistics Database. Labels: Local HIT?; Original URL; Alternative URL; Query Neighbors; Reply; Queries; Replies; Replies for Current Request; ICP Packets from Neighbors; HTTP or ICP Server; ICP Servers; ICP PROXY]
21
The Proxy
Each HTTP client request is handled by a new thread, which runs the following algorithm (for cacheable URLs):
If (HIT in local cache) then
    Send back the requested object to the client.
Else
    Start ICP Handler.
    Repeat
        Get alternative URL from ICP Handler.
        Fetch object header from URL.
    Until (object is found in URL).
    Fetch the entire object and transfer it to the client.
    If (object is cacheable)
        Cache object.
22
The ICP Handler
Initiated by the proxy for each cacheable URL that is missing from the local cache.
Initialization:
Get a number for this request.
Open a new replies-queue in the Response Structure for this request number.
Send ICP Queries to every suitable neighbor using the ICP Sender.
Start the ICP process timeout.
23
The ICP Handler
On each request from the proxy thread, return a URL using the following algorithm:
If (original URL already returned)
    Return original URL and set status flag to ENDICP.
If (original URL should be returned next)
    Return original URL and set status flag to RETORIGIN.
Repeat
    If a timeout is over, get the best URL so far.
    Get a reply from the Response Structure (wait until timeout if empty).
    If (first MISS or DECHO from a parent proxy)
        Save URL as first miss from a parent proxy.
        Set first-miss timeout.
Until (HIT received)
Return the URL.
24
The ICP Handler
Timeouts:
The algorithm stops and returns the best URL so far when any one of the following timeouts expires:
Overall process timeout – set at initialization.
SECHO timeout – set when the origin server sends back an SECHO packet.
First-Miss-Parent timeout – set when the first MISS/DECHO is received from a parent proxy server.
The best URL so far?
Usually, this is the original URL. However, once the first miss from a parent proxy has arrived, we try requesting the object from that parent first.
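The reply loop and its timeouts can be sketched as below. This is a simplified illustration under stated assumptions: the `Reply` tuple, the function name `choose_url` and its parameters are invented for the example, the SECHO timeout and the ENDICP/RETORIGIN status flags are left out, and each reply is assumed to already carry the URL to use when fetching through that peer.

```python
import queue
import time
from typing import NamedTuple


class Reply(NamedTuple):
    opcode: str       # "HIT", "MISS" or "DECHO"
    peer_url: str     # URL to use if the object is fetched through this peer
    is_parent: bool


def choose_url(replies: "queue.Queue[Reply]", original_url: str,
               overall_timeout: float, first_miss_timeout: float) -> str:
    best = original_url                     # best URL so far
    deadline = time.monotonic() + overall_timeout
    parent_miss_seen = False
    while True:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return best                     # a timeout is over
        try:
            reply = replies.get(timeout=remaining)
        except queue.Empty:
            return best                     # nothing arrived before the deadline
        if reply.opcode == "HIT":
            return reply.peer_url           # a HIT ends the wait immediately
        if (reply.is_parent and reply.opcode in ("MISS", "DECHO")
                and not parent_miss_seen):
            # First miss from a parent: prefer it over the origin server and
            # shorten the wait with the first-miss timeout.
            parent_miss_seen = True
            best = reply.peer_url
            deadline = min(deadline, time.monotonic() + first_miss_timeout)
```

A HIT returns at once; otherwise the function falls back to the first parent that answered, or to the original URL when no parent answered in time.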
25
Handling ICP Packets
The ICP Listener:
Listens on the ICP port (e.g. 3130) in order to receive ICP packets. Each received packet is stored according to its opcode:
ICP Queries – in the Query Queue
ICP Replies (HIT, MISS, etc.) – in the Response Structure
The Query Queue
A simple FIFO queue with a limited size for storing queries from other proxy servers.
The Response Structure
A hash table of queues, which holds a queue for each request sent by the ICP Handler. The unique request number is the key.
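The two structures can be sketched with Python's standard `queue` module and a dictionary. The class name `PacketStore` and its methods are invented for the example; dropping a query when the Query Queue is full, and dropping a reply whose request number is unknown, are assumptions, since the slides do not say what happens in those cases.

```python
import queue

QUERY = "QUERY"


class PacketStore:
    """Query Queue (bounded FIFO) plus Response Structure (dict of queues)."""

    def __init__(self, max_queries: int = 100) -> None:
        self.query_queue = queue.Queue(maxsize=max_queries)
        self.responses = {}   # request number -> queue of replies

    def open_request(self, request_number: int) -> None:
        # Called by the ICP Handler when it starts a new request.
        self.responses[request_number] = queue.Queue()

    def store(self, opcode: str, request_number: int, packet: bytes) -> bool:
        """Store a received packet by opcode; return False if it was dropped."""
        if opcode == QUERY:
            try:
                self.query_queue.put_nowait(packet)
                return True
            except queue.Full:
                return False          # assumption: drop when the queue is full
        replies = self.responses.get(request_number)
        if replies is None:
            return False              # assumption: no open request, drop reply
        replies.put(packet)
        return True
```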
26
The Query Handler
A multi-threaded handler for queries from other proxies. Each thread runs the following algorithm:
Get a query from the Query Queue.
If (querying server has permission to query)
    Check the local cache for the existence of the requested URL, and return HIT or MISS accordingly.
Else
    Return DENY unless the deny counter exceeds a specified limit. This reduces the chance of a denial of service.
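One pass of this algorithm can be sketched as a plain function. The function name, its parameters and the choice to stay silent (return `None`) once the limit is exceeded are assumptions for the example; the reply name `DENIED` follows the packet name used on the ICP Sender slide.

```python
from typing import Dict, Optional, Set


def handle_query(url: str, sender: str, cache: Set[str], allowed: Set[str],
                 deny_counts: Dict[str, int], deny_limit: int) -> Optional[str]:
    """Return the reply opcode for one query, or None to send nothing."""
    if sender in allowed:
        return "HIT" if url in cache else "MISS"
    # Unauthorized sender: count the denial and stop answering past the limit,
    # so that a misconfigured or hostile peer cannot make us flood it with replies.
    deny_counts[sender] = deny_counts.get(sender, 0) + 1
    if deny_counts[sender] > deny_limit:
        return None
    return "DENIED"
```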
27
The ICP Sender
A service module for sending ICP packets to other ICP servers simultaneously. For each configured ICP server, the ICP Sender holds a queue of ICP packets that should be sent to it. Multiple threads take the packets from the queues and send them to their destinations. In addition, one such queue is held for SECHO packets sent to origin servers, and another for DENIED packets sent to improperly configured proxy servers.
[Diagram labels: ICP Packet; queues Dest #1, Dest #2, DENIED, SECHO; packets sent to Dest #2]
28
The Statistics Database
For each configured or non-configured ICP server, the following information is kept:
IP, ICP Port, HTTP Port, Server Type (ICP/NON ICP), Relation (Sibling/Parent), Communication Type (Unicast/Multicast).
Absolute counters of ICP packets sent to and received from the ICP server, for each ICP opcode. These counters count events since the system started working.
Recent counters of ICP packets sent to and received from the ICP server, for each ICP opcode. These counters are decreased periodically by the Statistics Descend thread, so they approximate the number of recent events. They are used to decide whether to change the status of an ICP server.
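The two kinds of counters can be sketched as below. The slides do not say how the recent counters are decreased, so multiplying them by a constant factor on every period is an assumption, as are the class and method names.

```python
from typing import Dict


class PeerCounters:
    """Per-opcode counters for one ICP server: absolute and recent."""

    def __init__(self) -> None:
        self.absolute: Dict[str, int] = {}    # events since system start
        self.recent: Dict[str, float] = {}    # decayed, approximates recent events

    def record(self, opcode: str) -> None:
        self.absolute[opcode] = self.absolute.get(opcode, 0) + 1
        self.recent[opcode] = self.recent.get(opcode, 0.0) + 1.0

    def decay(self, factor: float = 0.5) -> None:
        # Would be called once per period by a background thread
        # (the "Statistics Descend" thread in the slides).
        for opcode in self.recent:
            self.recent[opcode] *= factor
```

With a factor of 0.5, an event recorded three periods ago still contributes 0.125 to the recent counter, while the absolute counter is never reduced.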
29
The Statistics Database
A list of domains handled by this ICP server. Only URLs of domains from this list will be requested from this server, unless the list is empty, which means that this server handles all domains.
Handle Peer Queries flag – REGULAR mode or DENY mode, set by the system administrator in the configuration. An ICP server in DENY mode that sends ICP queries may be changed to NOREPLY mode.
Send Query State flag – TRUE if queries should be sent to this ICP server. The system administrator may set this flag in the configuration. If a server consistently fails to reply, or sends a high percentage of DENIED or ERROR packets, the flag may be changed to FALSE.
This module also displays several statistics on screen.
30
Bloom Filters
This proxy server implementation uses Bloom Filters as a faster way to check whether a web object exists locally. However, this check is less accurate.
When inserting a URL, several bits in the bit array of the Bloom Filter are set. These bits are determined by applying a hash function to the URL.
[Diagram: the URL http://www.icp.com and the bits it sets in the bit array]
31
Bloom Filters
Checking for the existence of a URL is now simple. The same hash function is applied to the URL, and the matching bits are checked.
[Diagram: the URL http://www.icp.com checked against the bit array]
As we’ve already mentioned, this check isn’t accurate. Many web objects might be mapped to the same bits, so we might get a false hit when the bits of the object we’re checking were already set by other objects. However, we will never get a false miss.
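Insertion and lookup can be sketched as below. The array size, the number of hash positions and the use of salted SHA-256 digests as the hash function are assumptions for the example; the slides do not say which hash function the project uses.

```python
import hashlib
from typing import Iterator


class BloomFilter:
    def __init__(self, size_bits: int = 1024, num_hashes: int = 5) -> None:
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits)   # one byte per bit, for clarity

    def _positions(self, url: str) -> Iterator[int]:
        # Derive several positions from one URL by salting the hash input.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{url}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, url: str) -> None:
        for pos in self._positions(url):
            self.bits[pos] = 1

    def might_contain(self, url: str) -> bool:
        # True may be a false hit; False is always correct (no false misses).
        return all(self.bits[pos] for pos in self._positions(url))
```

A URL that was added always tests positive, which is the "no false miss" property; a URL that was never added usually tests negative, but may test positive if other URLs happened to set all of its bits.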
32
Bloom Filters
There might be an even bigger problem with Bloom Filters. How do we delete a URL from the database and update the bit array? Since more than one URL might have set the bits of the deleted URL, we won’t know whether to reset a given bit or not.
The solution we implemented is to keep another array, of the same size as the bit array. This array of integers keeps the number of URLs that set each bit. On delete, we decrease the matching counters in this array, and when a counter goes down to 0, we can reset the matching bit.
[Diagram: the bit array alongside the counter array]
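The counter array can be sketched as below. As before, the array size, the number of hash positions and the salted SHA-256 hash are assumptions for the example, and `remove` is assumed to be called only for URLs that were actually inserted.

```python
import hashlib
from typing import Iterator, List


class CountingBloomFilter:
    def __init__(self, size_bits: int = 1024, num_hashes: int = 5) -> None:
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits)          # the bit array
        self.counts: List[int] = [0] * size_bits  # URLs that set each bit

    def _positions(self, url: str) -> Iterator[int]:
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{url}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, url: str) -> None:
        for pos in self._positions(url):
            self.counts[pos] += 1
            self.bits[pos] = 1

    def remove(self, url: str) -> None:
        # Only valid for a URL that was added earlier.
        for pos in self._positions(url):
            self.counts[pos] -= 1
            if self.counts[pos] == 0:
                self.bits[pos] = 0   # no remaining URL uses this bit

    def might_contain(self, url: str) -> bool:
        return all(self.bits[pos] for pos in self._positions(url))
```

Deleting one URL leaves the bits shared with other URLs set, so the remaining URLs still test positive.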