Scalability and planning for growth 1WUCM1
Content management issues
Structural
– Naming (e.g. file, URL) policy
– File and directory naming needs:
    invent, design or borrow a scheme
    an easy means to implement the scheme
    a way to check whether the scheme is being adhered to
    a way to fix breaches of the scheme
– Names are difficult to 'fix' at a later stage
– Poor design will cause maintenance grief
Content
– Update policy: when? by whom?

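A way to check whether a naming scheme is being adhered to can be as small as a script run over the list of site paths. The sketch below assumes a hypothetical scheme (lowercase letters, digits and hyphens, plus a lowercase extension on files); the patterns and the function name `breaches` are illustrative and not part of these notes.

```python
import re
from pathlib import PurePosixPath

# Hypothetical scheme: directories use lowercase letters, digits and hyphens;
# file names follow the same rule and end in a lowercase extension.
DIR_RE = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")
FILE_RE = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*\.[a-z0-9]+")


def breaches(paths):
    """Return the relative paths that do not follow the assumed scheme."""
    bad = []
    for p in paths:
        parts = PurePosixPath(p).parts
        if not parts:
            bad.append(p)  # an empty path cannot satisfy the scheme
            continue
        *dirs, name = parts
        dirs_ok = all(DIR_RE.fullmatch(d) for d in dirs)
        if not (dirs_ok and FILE_RE.fullmatch(name)):
            bad.append(p)
    return bad
```

Running such a check on a regular schedule gives the list of breaches to fix before the names spread into links.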
Content update policy
Without control, a large web system will quickly spawn:
– inconsistencies (between A and B)
– errors (A is wrong)
– inaccessible data (A cannot be reached)
– etc.
Update strategies:
– update on demand
– regular update schedule
– hybrid (on-demand with regular clean-up)
Consider a content management tool

Possible server organisation

Apache configuration issues 1
Apache directives with performance implications:
– KeepAlive on|off (Apache 1.2 onwards; Apache 1.1 took a number)
    Enables persistent connections, so one connection can serve several requests
– KeepAliveTimeout seconds
    Max time to wait for the next request on a persistent connection
– MaxKeepAliveRequests number
    Max number of requests allowed on one persistent connection – avoids hogging
– HostNameLookups [on|off|double]
    ‘on’ puts the hostname in the log instead of the IP address, at the cost of a DNS lookup
– MaxClients number
    Limits the number of requests handled at once by the server
– MaxRequestsPerChild number
    Each Apache child process handles this many requests and then dies (to tidy up memory leaks)
– ThreadsPerChild number
    Only relevant on Win32. Default 50; may need increasing for many simultaneous hits (a Microsoft issue)

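How the three keep-alive directives interact can be modelled in a few lines. This is a sketch of the rule only, not Apache's code. It assumes the behaviour of Apache 1.2 and later, where KeepAlive is an on/off switch and MaxKeepAliveRequests caps the requests served on one connection (0 meaning unlimited). The class and function names and the numeric values are illustrative.

```python
from dataclasses import dataclass


@dataclass
class KeepAliveSettings:
    keep_alive: bool = True             # KeepAlive on|off
    max_keep_alive_requests: int = 100  # MaxKeepAliveRequests (0 = unlimited)
    keep_alive_timeout: float = 15.0    # KeepAliveTimeout, in seconds


def connection_stays_open(cfg, requests_served, idle_seconds):
    """Model of whether a persistent connection is kept for another request."""
    if not cfg.keep_alive:
        return False
    limit = cfg.max_keep_alive_requests
    if limit and requests_served >= limit:
        return False  # request cap reached, so one client cannot hog the connection
    return idle_seconds <= cfg.keep_alive_timeout
```

A short timeout frees server processes sooner; a long one saves busy clients from reconnecting.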
Apache configuration issues 2
Other Apache directives:
– UseCanonicalName on|off|dns
    Relates to DNS names
– FollowSymLinks
    An Option. Without it, Apache wastes time checking each part of the path for symbolic links; enabling it is a security risk
– Logging of all kinds slows Apache down
– .htaccess files add overhead (read on each request)
– Large configuration files also slow Apache, so thinning here is a good idea

General server configuration issues
CGI programs influence the performance of the website:
– Consider FastCGI or mod_perl to speed matters up
– Writing efficient code is always important
Other tricks
– Force popular files to be memory resident
    The operating system may do that for you
– Force secure transfers to have more bandwidth

Proxy server performance issues
An Apache proxy can:
– Cache for speed
– Filter for security or decency
Apache's proxy functionality is encapsulated in mod_proxy
To switch (forward) proxying on or off in mod_proxy, use the directive
– ProxyRequests on|off

Proxy customisation
To block particular sites from your clients:
ProxyBlock baddomain.co.uk badword
This blocks the specific site or domain, and any site whose name contains ‘badword’

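The matching rule behind ProxyBlock can be sketched as a substring test on the requested host name. This is a simplified model, assuming matching is done on the host part of the URL only; the function name `is_blocked` and the example URLs in the usage note are hypothetical.

```python
from urllib.parse import urlsplit

# The two entries from the ProxyBlock line in the notes.
BLOCKED = ["baddomain.co.uk", "badword"]


def is_blocked(url, blocked=BLOCKED):
    """True if the host part of the URL contains any blocked word or domain."""
    host = (urlsplit(url).hostname or "").lower()
    return any(item.lower() in host for item in blocked)
```

Under this model `http://www.baddomain.co.uk/page` and `http://badword-shop.example.com/` are refused, while `http://www.example.org/` passes.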
Hiding servers with a proxy
Suppose there are two extra servers, parallel to the main server
Add the ProxyPass directive to the main server configuration file
ProxyPass /users/
ProxyPass /secure/
This makes users.tech.port.ac.uk and secure.tech.port.ac.uk appear as directories on the main server, e.g.

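The effect of ProxyPass is a prefix mapping from a path on the main server to a URL on a hidden server. The sketch below models that mapping only. The host names come from the notes, but the `http://` scheme, the root target paths and the function name `map_request` are assumptions.

```python
# Assumed targets: the notes name the hosts but not the full URLs.
PROXY_PASS = {
    "/users/": "http://users.tech.port.ac.uk/",
    "/secure/": "http://secure.tech.port.ac.uk/",
}


def map_request(path, rules=PROXY_PASS):
    """Return the back-end URL for a request path, or None if served locally."""
    for prefix, target in rules.items():
        if path.startswith(prefix):
            # Replace the matched prefix with the hidden server's URL.
            return target + path[len(prefix):]
    return None
```

A request for `/users/alice/index.html` (a hypothetical path) would be fetched from the users server, while `/index.html` stays on the main server.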
Still not enough performance?
Two further possibilities to boost performance:
– Replace the server hardware with a more powerful machine
– Add more servers and distribute the load of client requests amongst them

Benefits of multiple servers
Server machines can be cheaper and easily replaceable
Individual servers can fall over without the website becoming unavailable
Increase capacity by adding another server and synchronising the data
No need to alter or reconfigure any of the existing servers

Clustering 1
Cannot just add extra servers
– Each would need a different IP address
The set of servers needs to be established as a cluster so that:
– To external clients it appears as one big, fast server with one domain name
– Clients are not aware that the load is being shared by a cluster of servers
– Content on the multiple servers is kept synchronised

Clustering 2
Two basic ways of approaching clustering:
1. DNS load sharing
2. Web server clustering

DNS load sharing
The most common approach is round-robin DNS distribution
It works by specifying multiple IP addresses for the same host name (using BIND syntax)
IN A
IN A
IN A

DNS load sharing [Source: O’Reilly Books]

Round-Robin DNS sharing 1
Each DNS request for the host name returns the next IP address in sequence
Set a short time-to-live (TTL), e.g. 60 seconds
A lower TTL would
– Improve web server load sharing
– But increase the load on the DNS server
The attraction of round-robin DNS is its simplicity

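The rotation and the TTL trade-off can be sketched together. This is a simplified model: a real name server typically rotates the order of the whole record set rather than returning a single address. The class name, the function name and the 192.0.2.x documentation addresses used in the usage note are hypothetical.

```python
from itertools import cycle


class RoundRobinDNS:
    """Hands out the configured addresses for one host name in turn."""

    def __init__(self, addresses):
        self._rotation = cycle(addresses)

    def resolve(self):
        return next(self._rotation)


def max_lookups_per_hour(ttl_seconds):
    """Upper bound on lookups per hour from one caching resolver.

    A resolver that caches the answer for ttl_seconds asks again at most
    once per TTL, so halving the TTL doubles the possible DNS load.
    """
    return 3600 // ttl_seconds
```

With three addresses, four successive calls to `resolve()` return the first, second, third and then the first address again. A 60 second TTL allows up to 60 lookups per hour from each caching resolver, against one per hour for a 3600 second TTL.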
Round-Robin DNS sharing 2
Not true load balancing, only load sharing
The round-robin takes no account of:
– which servers are loaded
– which are free
– which are actually up and running
Round-robin DNS makes keeping state for a user more difficult
– A user may get a different server from last time

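The weakness listed above shows up in a small simulation: a server that is down still receives its share of clients. A minimal sketch, with illustrative server names and a hypothetical function `share_requests`:

```python
from collections import Counter
from itertools import cycle


def share_requests(servers, up, n_requests):
    """Hand out n_requests round-robin; return (served counts, failed count)."""
    served = Counter()
    failed = 0
    rotation = cycle(servers)
    for _ in range(n_requests):
        server = next(rotation)
        if server in up:
            served[server] += 1
        else:
            failed += 1  # round-robin keeps sending clients to the dead server
    return served, failed
```

With three servers and one of them down, a third of the requests fail, because the rotation has no knowledge of server health.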
Hardware load balancing
Needs a specialist piece of hardware to redirect requests
For example:
– LocalDirector and DistributedDirector were products from Cisco
– These rewrite IP headers to redirect a connection to a local server

Clustering with Apache 1
Apache provides a way to cluster servers using the features of mod_rewrite and mod_proxy together
This avoids the DNS caching problems and the cost of hardware solutions
Needs one machine as a proxy server, handling requests to several back-end servers on which the website is actually loaded

Clustering with Apache 2
E.g. the proxy takes the master name and the back-end servers might be www1 to www6
Wainwright (1999) sets out a method of setting up Apache in two parts:
– Use mod_rewrite to randomly select a back-end server for the client request
– Use mod_proxy’s ProxyPassReverse directive to disguise the URL of the back-end server

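The two parts of the method can be modelled outside Apache to show what each one does. This is a sketch of the idea only, not Wainwright's configuration: the domain `example.com`, the front-end name and all function names are hypothetical, and only the back-end names www1 to www6 come from the notes.

```python
import random

BACKENDS = [f"www{i}" for i in range(1, 7)]  # www1 .. www6, as in the notes
DOMAIN = "example.com"      # hypothetical domain
FRONT = "www.example.com"   # hypothetical master name held by the proxy


def pick_backend(rng=random):
    """Part 1: choose a back-end server at random for this request."""
    return f"{rng.choice(BACKENDS)}.{DOMAIN}"


def forward_url(path, rng=random):
    """URL the proxy fetches on behalf of the client."""
    return f"http://{pick_backend(rng)}{path}"


def reverse_location(location):
    """Part 2: rewrite a back-end URL in a redirect so the client sees the proxy."""
    for name in BACKENDS:
        prefix = f"http://{name}.{DOMAIN}/"
        if location.startswith(prefix):
            return f"http://{FRONT}/" + location[len(prefix):]
    return location  # not one of our back-ends, leave unchanged
```

Random selection spreads requests evenly on average but, like round-robin DNS, takes no account of how loaded each back-end is.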
Summary
Configuration issues for scalability and performance
Proxy servers – filter and cache
DNS (round-robin) clustering
Hardware clustering
Proxy-based clustering