The Squid caching proxy
Chris Wichura

What is Squid?
  A caching proxy for
    – HTTP, HTTPS (tunnel only)
    – FTP
    – Gopher
    – WAIS (requires additional software)
    – WHOIS (Squid version 2 only)
  Supports transparent proxying
  Supports proxy hierarchies (ICP protocol)
  Squid is not an origin server!

Other proxies
  Freeware
    – Apache 1.2+ proxy support (abysmally bad!)
  Commercial
    – Netscape Proxy
    – Microsoft Proxy Server
    – Network Appliance NetCache (shares some code history with Squid in the distant past)
    – CacheFlow
    – Cisco Cache Engine

What is a proxy?
  Firewall device; internal users communicate with the proxy, which in turn talks to the big bad Internet
    – Gate private address space (RFC 1918) into publicly routable address space
  Allows one to implement policy
    – Restrict who can access the Internet
    – Restrict what sites users can access
    – Provides detailed logs of user activity

What is a caching proxy?
  Stores a local copy of objects fetched
    – Subsequent accesses by other users in the organization are served from the local cache, rather than the origin server
    – Reduces network bandwidth
    – Users experience faster web access

How proxies work (configuration)
  User configures web browser to use proxy instead of connecting directly to origin servers
    – Manual configuration for older PC-based browsers, and many UNIX browsers (e.g., Lynx)
    – Proxy auto-configuration file for Netscape 2.x+ or Internet Explorer 4.x+
        Far more flexible caching policy
        Simplifies user configuration, help desk support, etc.

How proxies work (user request)
  User requests a page:
  Browser forwards request to proxy
  Proxy optionally verifies user's identity and checks policy for right to access uniforum.chi.il.us
  Assuming right is granted, proxy fetches page and returns it to user

Squid's page fetch algorithm
  Check cache for existing copy of object (lookup based on MD5 hash of URL)
  If it exists in cache
    – Check object's expire time; if expired, fall back to origin server
    – Check object's refresh rule; if expired, perform an If-Modified-Since against origin server
    – If object still considered fresh, return cached object to requester

Squid's page fetch algorithm (continued)
  If object is not in cache, expired, or otherwise invalidated
    – Fetch object from origin server
    – If 500 error from origin server, and expired object available, return expired object
    – Test object for cacheability; if cacheable, store local copy
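
The decision flow on these two slides can be sketched in a few lines of Python. This is an illustrative model only, not Squid's actual code: `CachedObject`, `fetch`, the `fresh_until` field (a stand-in for whatever the refresh rules compute), the callback parameters, and the returned status labels are all hypothetical names, and treating every 5xx status as "500 error" is an assumption.

```python
import hashlib
import time
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple


@dataclass
class CachedObject:
    body: bytes
    expires_at: float   # hard expiry (epoch seconds), e.g. from an Expires header
    fresh_until: float  # hypothetical stand-in for the refresh-rule outcome


def cache_key(url: str) -> str:
    """Cache lookup key: the MD5 hash of the URL."""
    return hashlib.md5(url.encode("utf-8")).hexdigest()


def fetch(
    url: str,
    cache: Dict[str, CachedObject],
    origin_fetch: Callable[[str], Tuple[int, Optional[CachedObject]]],
    not_modified: Callable[[str, CachedObject], bool],
    is_cacheable: Callable[[CachedObject], bool],
    now: Optional[float] = None,
) -> Tuple[bytes, str]:
    now = time.time() if now is None else now
    key = cache_key(url)
    cached = cache.get(key)

    # Object is in the cache and not past its expire time.
    if cached is not None and now < cached.expires_at:
        if now < cached.fresh_until:
            return cached.body, "HIT"          # still fresh: serve from cache
        if not_modified(url, cached):          # If-Modified-Since check
            return cached.body, "REVALIDATED"

    # Not cached, expired, or invalidated: go to the origin server.
    status, fetched = origin_fetch(url)
    if 500 <= status < 600 and cached is not None:
        return cached.body, "STALE"            # origin failed: serve expired copy
    if fetched is None:
        return b"", "ERROR"
    if is_cacheable(fetched):
        cache[key] = fetched                   # store local copy
    return fetched.body, "MISS"
```

Passing `now` explicitly makes the flow easy to exercise with fixed timestamps; a real proxy would also update the stored freshness times after a successful revalidation, which this sketch leaves out.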

Cacheable objects
  HTTP
    – Must have a Last-Modified: tag
    – If origin server required HTTP authentication for request, must have Cache-Control: public tag
    – Ideally also has an Expires or Cache-Control: max-age tag
    – Content provider decides what header tags to include
        Web servers can auto-generate some tags, such as Last-Modified and Content-Length, under certain conditions
  FTP
    – Squid sets Expires time to fetch timestamp + 2 days

Non-cacheable objects
  HTTPS, WAIS
  HTTP
    – No Last-Modified: tag
    – Authenticated objects
    – Cache-Control: private, no-cache, and no-store tags
    – URLs with cgi-bin or ? in them
    – POST method (form submission)
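
The rules on the two slides above can be collapsed into one predicate. The following Python is a rough sketch of those rules as stated here, not of Squid's real cacheability test; the function name `is_cacheable` and its parameters are made up for illustration.

```python
from typing import Mapping


def is_cacheable(method: str, url: str, headers: Mapping[str, str],
                 authenticated: bool = False) -> bool:
    """Apply the slide's cacheability rules to one response."""
    scheme = url.split(":", 1)[0].lower()
    if scheme == "ftp":
        return True                      # FTP: the proxy assigns its own Expires
    if scheme != "http":
        return False                     # HTTPS, WAIS, etc. are not cached

    if method.upper() == "POST":
        return False                     # form submissions
    if "cgi-bin" in url or "?" in url:
        return False                     # presumed dynamic content

    hdrs = {name.lower(): value for name, value in headers.items()}
    if "last-modified" not in hdrs:
        return False

    # Split "public, max-age=60" into the directive names {"public", "max-age"}.
    directives = {
        part.strip().split("=", 1)[0].lower()
        for part in hdrs.get("cache-control", "").split(",")
        if part.strip()
    }
    if directives & {"private", "no-cache", "no-store"}:
        return False
    if authenticated and "public" not in directives:
        return False
    return True
```

For example, `is_cacheable("GET", "http://example.com/a.html", {"Last-Modified": "..."})` is true, while the same response fetched with HTTP authentication is rejected unless it carries `Cache-Control: public`.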

Implications for content providers
  Caching is a good thing for you!
  Make cgi and other dynamic content generators return Last-Modified and Expires/Cache-Control tags whenever possible
    – If at all possible, also include a Content-Length tag to enable use of persistent connections
  Consider using Cache-Control: public, must-revalidate for authenticated web sites
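
As a minimal sketch of what "return Last-Modified and Expires/Cache-Control tags" looks like in a dynamic content generator, the Python below builds the headers for a generated page. The function name `cache_friendly_headers` and its parameters are hypothetical; only the header names come from the slide.

```python
import time
from email.utils import formatdate
from typing import Dict, Optional


def cache_friendly_headers(body: bytes, last_modified: float, max_age: int,
                           now: Optional[float] = None) -> Dict[str, str]:
    """Headers that let a caching proxy store and later revalidate the page."""
    now = time.time() if now is None else now
    return {
        # HTTP dates are always expressed in GMT.
        "Last-Modified": formatdate(last_modified, usegmt=True),
        "Expires": formatdate(now + max_age, usegmt=True),
        "Cache-Control": f"max-age={max_age}",
        # A known length lets the connection stay open for the next request.
        "Content-Length": str(len(body)),
    }
```

A script would print these before the body; `last_modified` would typically be the modification time of the data the page was generated from.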

Implications for content providers (continued)
  If you need a page hit counter, make one small object on the page non-cacheable.
  FTP sites, due to lack of Last-Modified timestamps, are inherently non-cacheable. Put (large) downloads on your web site instead of on, or in addition to, an FTP site.

Implications for content providers (continued)
  Microsoft's IIS with ASP generates non-cacheable pages by default
  Other scripting suites (e.g., Cold Fusion) also require special work to make pages cacheable
  Squid doesn't implement support for the Vary: tag yet; it considers such objects non-cacheable
  Squid currently treats Cache-Control: must-revalidate as Cache-Control: private

Transparent proxying
  Router forwards all traffic destined for port 80 to the proxy machine using a route policy
  Pros
    – Requires no explicit proxy configuration in the user's browser

Transparent proxying
  Cons
    – Route policies put excessive CPU load on routers on many (Cisco) platforms
    – Kernel hacks to support it on the proxy machine are still unstable
    – Often leads to mysterious page retrieval failures
    – Only proxies HTTP traffic on port 80; not FTP or HTTP on other ports
    – No redundancy in case of failure of the proxy

Transparent proxying
  Recommendation: Don't use it!
    – Create a proxy auto-configuration file and instruct users to point at it
    – If you want to force users to use your proxy, either
        Block all traffic to port 80, or
        Use a route policy to redirect port 80 traffic to an origin web server and return a page explaining how to configure the various web browsers to access the proxy

Squid hardware requirements
  UNIX operating system (NT is not currently supported, nor has anyone announced work on a port)
  128M RAM minimum recommended (scales by user count and size of disk cache)
  Disk
    – 512M to 1G for small user counts
    – 16G to 24G for large user counts
    – Squid 2.x is optimized for JBOD, not RAID

File system recommendations
  Use Veritas vxfs if you have it
  Disable last accessed time updates (for example, noatime mount option on Linux)
  Consider increasing sync frequency
  If using UFS
    – Optimize for space instead of time

Installing Squid (overview)
  Get distribution from
  Increase maximum file descriptors available per process before configuring Squid
  Run configure script with desired compile-time options
  Run make; make install
  Edit squid.conf file
  Run squid -z to initialize cache directory structure
  Start Squid daemon
  Test
  Migrate users over to proxy

Squid distributions (versions)
  1.x and 1.NOVM.x
    – No longer supported
    – Entire cache lost if even one disk in cache fails
    – Doesn't understand Cache-Control: tag
    – Other problems
    – Bottom line: don't use them

Squid distributions (versions)
  2.0, 2.1, 2.2
    – Redesigned, much improved disk storage algorithm
    – Understands Cache-Control: tag
    – Better LRU/refresh rule engine
    – Supports proxy authentication
    – See documentation for full list of enhancements
  Recommendation: 2.1 is fairly stable, but move to 2.2 when 2.2STABLE is released

Squid compile-time configuration
  --prefix=/var/squid
  --enable-asyncio
    – Only stable on Solaris and bleeding edge Linux
    – Can actually be slower on lightly loaded proxies
  --enable-dlmalloc
  --enable-icmp
  --enable-ipf-transparent for transparent proxy support on some systems (*BSD)

Squid compile-time configuration
  --enable-snmp if desired
  --enable-delay-pools if desired
  --enable-cachemgr-hostname= if using an alias for proxy or building on a different machine from the target proxy machine
  --enable-cache-digest and/or --enable-carp if using cache hierarchies

squid.conf runtime settings
  Default squid.conf file is heavily commented! Read it!
  Must set
    – cache_dir (one per disk)
    – cache_peer (one per peer) if participating in a hierarchy
    – cache_mem (8-16M preferred, even for large caches)
    – acl rules (default rules mostly work, but must reflect your address space)

squid.conf runtime settings
  Recommendations
    – ipcache_size, fqdncache_size to 4096
    – log_fqdn off (use Apache's logresolve offline)
    – Increase dns_children, redirect_children, authenticate_children based on usage statistics (see cachemgr.cgi front-end)
    – Tweak refresh_pattern rules (Danger Will Robinson! -- I suggest starting with examples found in the squid mailing list archives)

squid.conf runtime settings
  Recommendations (continued)
    – quick_abort_min 128 KB, quick_abort_max 4096 KB, quick_abort_pct 75
        Tailor based on your bandwidth to the Internet
        By default, squid will complete retrieval of any object requested, regardless of size; can burn considerable amounts of bandwidth!
  Too many other options in squid.conf to cover here; you really should read all the embedded comments!
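
To make the three quick_abort settings concrete, here is a small Python model of the decision the proxy faces when a client disconnects partway through a download. It follows the behavior described in the squid.conf comments as I understand them (finish if little remains, abort if a lot remains, otherwise finish only if most of the object has already arrived); the function name and the exact order of the checks are assumptions, so verify against the comments in your own squid.conf.

```python
def should_finish_download(received_kb: int, expected_kb: int,
                           min_kb: int = 128, max_kb: int = 4096,
                           pct: int = 75) -> bool:
    """Decide whether to keep fetching after the client has gone away.

    Defaults mirror the values recommended on the slide.
    """
    remaining_kb = expected_kb - received_kb
    if remaining_kb < min_kb:
        return True    # almost done: finishing is cheap
    if remaining_kb > max_kb:
        return False   # too much left: abort and save bandwidth
    # In between: finish only if more than pct percent has already arrived.
    return received_kb * 100 > pct * expected_kb
```

With the recommended values, a 1000 KB object abandoned at 900 KB is completed, while a 10000 KB object abandoned at 1000 KB is dropped.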

squid.conf ACL example
  acl manager proto cache_object
  acl localhost src /32
  acl managerhost src /32
  acl managerhost src /32
  acl managerhost src /32
  acl cawtech src /24
  acl cawtech-internal src /16
  acl all src /

squid.conf ACL example
  acl SSL_ports port
  acl gopher_ports port 70
  acl wais_ports port 210
  acl whois_ports port 43
  acl www_ports port
  acl ftp_ports port 21
  acl Safe_ports port
  acl CONNECT method CONNECT
  acl FTP proto FTP
  acl HTTP proto HTTP
  acl WAIS proto WAIS
  acl GOPHER proto GOPHER
  acl WHOIS proto WHOIS

squid.conf ACL example
  http_access deny manager !localhost !managerhost
  http_access deny CONNECT !SSL_ports
  http_access deny HTTP !www_ports !Safe_ports
  http_access deny FTP !ftp_ports !Safe_ports
  http_access deny GOPHER !gopher_ports !Safe_ports
  http_access deny WAIS !wais_ports !Safe_ports
  http_access deny WHOIS !whois_ports !Safe_ports
  http_access allow localhost
  http_access allow cawtech
  http_access allow cawtech-internal
  http_access deny all
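
Reading an http_access list correctly depends on two rules: the ACL names on one line are ANDed together (with ! negating a name), and the first line whose conditions all hold decides the request. The Python below is a toy evaluator for that logic, using a subset of the rules above. `RULES`, `http_access`, and the idea of passing in the set of ACL names a request matches are all illustrative devices, and the final fallback to "deny" is a simplification that never triggers here because the list ends in deny all.

```python
from typing import List, Set, Tuple

Rule = Tuple[str, List[str]]  # (action, ACL names; a leading "!" negates)

RULES: List[Rule] = [
    ("deny", ["manager", "!localhost", "!managerhost"]),
    ("deny", ["CONNECT", "!SSL_ports"]),
    ("allow", ["localhost"]),
    ("allow", ["cawtech"]),
    ("deny", ["all"]),
]


def http_access(rules: List[Rule], matched: Set[str]) -> str:
    """Return the action of the first rule whose conditions all hold.

    `matched` is the set of ACL names the request satisfies.
    """
    for action, conditions in rules:
        if all(
            (name[1:] not in matched) if name.startswith("!") else (name in matched)
            for name in conditions
        ):
            return action
    return "deny"  # simplification; unreachable when the list ends in "deny all"
```

For instance, a cache manager request from an ordinary cawtech client matches `{"all", "cawtech", "manager"}` and is denied by the first rule before the later allow line is ever consulted, which is why rule order matters.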

Creating a proxy auto-configuration file
  Associate the .pac extension with MIME type application/x-ns-proxy-autoconfig
  Create JavaScript file and place on origin web server (suggest style URL)
  See Netscape documentation at elnotes/demo/proxy-live.html

Sample proxy auto-configuration
  function FindProxyForURL(url, host)
  {
      if (isPlainHostName(host) || dnsDomainIs(host, ".cawtech.com"))
          return "DIRECT";
      if ((url.substring(0, 5) == "http:") ||
          (url.substring(0, 6) == "https:") ||
          (url.substring(0, 4) == "ftp:") ||
          (url.substring(0, 7) == "gopher:"))
          return "PROXY proxy.cawtech.com:3128; DIRECT";
      return "DIRECT";
  }
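
Because a PAC file only runs inside a browser, it can help to check the routing logic offline first. The Python port below mirrors the sample function for that purpose. It is a sketch with two stated assumptions: the http: and https: prefixes are inferred from the substring lengths 5 and 6 in the sample, and the plain-hostname and domain tests are simplified stand-ins for the browser's isPlainHostName and dnsDomainIs helpers. The name `find_proxy_for_url` is made up.

```python
PROXY = "PROXY proxy.cawtech.com:3128; DIRECT"

# Assumption: "http:" and "https:" are inferred from the sample's
# substring lengths; "ftp:" and "gopher:" appear in the sample itself.
PROXIED_PREFIXES = ("http:", "https:", "ftp:", "gopher:")


def find_proxy_for_url(url: str, host: str) -> str:
    """Python mirror of the sample FindProxyForURL logic."""
    # Unqualified names and internal hosts bypass the proxy.
    if "." not in host or host.endswith(".cawtech.com"):
        return "DIRECT"
    if url.startswith(PROXIED_PREFIXES):
        return PROXY
    return "DIRECT"
```

The trailing "; DIRECT" in the proxy string tells the browser to fall back to a direct connection if the proxy does not respond.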

Managing Squid
  I recommend the Calamaris.pl logfile analysis script, available at
  Use modified MRTG with Squid's SNMP support (see SNMP section in Squid FAQ for details)

Advanced topics briefly covered
  HTTP accelerator mode
    – Squid fronts a web server (or farm)
    – Particularly useful if server generates cacheable dynamic content, but generation is expensive
  Delay pools
  Cache hierarchies
    – Allows clustering and redundancy
    – World-wide hierarchies: NLANR, etc.

Squid resources
  Official web site:
    – Distributions
    – Mailing list archives and subscription info
    – FAQ
    – Link to Henrik's web page for current patches and experimental features
    – Link to the Cache Now! web site (of particular interest to origin site implementers)
    – Lots of great information!