Download presentation
Presentation is loading. Please wait.
1
Apache 2.0 Filters Greg Ames Jeff Trawick
2
Agenda Why filters? Filter data structures and utilites
An example Apache filter Filter types Configuration directives
3
Agenda cont... Frequently used Apache filters Pitfalls mod_ext_filter
Output filters Input filters Pitfalls ways to avoid them mod_ext_filter Debugging hints The big filter list
4
Why filters? Same idea as Unix command line filters: ps ax | grep "apache.*httpd" | wc -l Allows independent, modular manipulations of HTTP data stream if a 1.3 CGI creates SSI or PHP tags, they aren't parsed CGI created SSI tags can be parsed in 2.0 possible Zend issue w/PHP at present I use the ps... commands to count how many Apache server processes are running at the moment. The beauty of the filter concept is that I can easily combine several general programs to achieve the effect of a more specialized program. One of the main goals of Apache 2.0 is to incorporate that concept, and let modules manipulate each others output independently.
5
data can be manipulated independently from how it's generated
The general idea SSI gzip Here we have two different sources of data, called "handlers", feeding two different filter chains. On the left, the data generated by the CGI is filtered by mod_include, so any SSI tags will be expanded. On the right, a static file is the data source. This time the data is filtered by mod_include and then by mod_deflate which will use gzip to compress it. The post-processing is independent of the source of the data. data can be manipulated independently from how it's generated
6
Bucket brigades a complex data stream that can be passed thru layered I/O without unnecessary copying SPECWeb99 uses dynamically generated headers and trailers around static files header and trailer live in memory based buckets file bucket contains the fd End Of Stream metadata bucket is the terminator Apache filters use bucket brigades to represent data streams. There are three basic categories of buckets: memory based file based (i.e. file, pipe, socket) metadata A brigade is a list of buckets. This example shows the three categories of buckets being used to serve dynamic pages for the SPECWeb99 benchmark. The file bucket does not contain the contents of the file, just the apr_file_t structure needed to process it.
7
Filter utilities and structures
ap_register_[input|output]_filter creates ap_filter_rec_t ap_add_[input|output]_filter[_handle] creates ap_filter_t ap_pass_brigade passes a bucket brigade to the next output filter ap_get_brigade gets a bucket brigade from the next input filter ap_register_input_filter and ap_register_output filter tell the Apache core that the filter exists and returns a handle, typically while the modules are being initialized. ap_add_input_filter and ap_add_output_filter insert an instance of the filter into the appropriate runtime filter chain when an HTTP request arrives. You can use the _handle variant of the add functions for better performance. ap_pass_brigade hands off the data to the next output filter. ap_get_brigade gets a bucket brigade from the next downstream input filter.
8
ap_filter_t struct ap_filter_t {
ap_filter_rec_t *frec; <- description void *ctx; <- context (instance variables) ap_filter_t *next; request_rec *r; <- HTTP request info conn_rec *c; <- connection info }; created by ap_add_[input|output]_filter[_handle] the blue boxes on previous slide This is the entire structure used to describe an instance of an Apache filter while it's running. Pretty simple. The next slide describes the ap_filter_rec_t. The "ctx" (context) field allows a filter to stash things between invocations when there are multiple bucket brigades. The request_rec pointer is null for connection and network filters.
9
ap_filter_rec_t The ap_filter_rec_t is created when the filter is registered at Apache initialization time. The ftype field determines where the filter will be inserted in the filter chain. If you are looking at filters with a debugger, be aware that the name field is always lower case, while name parameter passed to the filter registration function is quite often in upper case for Apache core filters.
10
An example Apache filter
mod_case_filter lives in modules/experimental mission: convert data to upper case mod_case_filter is a very simple filter which illustrates some of the basic concepts.
11
mod_case_filter static apr_status_t CaseFilterOutFilter(ap_filter_t *f, apr_bucket_brigade *pbbIn) { request_rec *r = f->r; conn_rec *c = r->connection; apr_bucket *pbktIn; apr_bucket_brigade *pbbOut; pbbOut=apr_brigade_create(r->pool, c->bucket_alloc); part 1 of 3 At this point the filter has an input bucket brigade and an empty output brigade.
12
mod_case_filter... APR_BRIGADE_FOREACH(pbktIn,pbbIn) <-- iterates thru all the buckets { const char *data; apr_size_t len; char *buf; apr_size_t n; apr_bucket *pbktOut; if(APR_BUCKET_IS_EOS(pbktIn)) { /* terminate output brigade */ apr_bucket *pbktEOS=apr_bucket_eos_create(c->bucket_alloc); APR_BRIGADE_INSERT_TAIL(pbbOut,pbktEOS); continue; } part 2 of 3 The APR_BRIGADE_FOREACH macro iterates thru the buckets in the input brigade. When it sees an input EOS bucket, it creates an EOS bucket and appends it to the end of the output bucket brigade.
13
mod_case_filter... /* read */ <-- morphs bucket into memory based type apr_bucket_read(pbktIn,&data,&len,APR_BLOCK_READ); /* write */ buf = apr_bucket_alloc(len, c->bucket_alloc); <-- allocates buffer for(n=0 ; n < len ; ++n) buf[n] = apr_toupper(data[n]); pbktOut = apr_bucket_heap_create(buf, len, apr_bucket_free, <-- creates bucket c->bucket_alloc); APR_BRIGADE_INSERT_TAIL(pbbOut,pbktOut); <-- adds to output brigade } return ap_pass_brigade(f->next,pbbOut); <-- passes brigade to next output filter part 3 of 3 This code reads the current input bucket, creates a buffer, and copies in an upper case version of the input. Then it creates a heap bucket which refers to the buffer and appends it to the end of the output bucket brigade. When the APR_BRIGADE_FOREACH loop exits, the output bucket brigade is passed to the next output filter.
14
Filter types determines where filter is inserted
AP_FTYPE_RESOURCE (SSI, PHP, case filter) AP_FTYPE_CONTENT_SET (deflate, cache) AP_FTYPE_PROTOCOL (HTTP) AP_FTYPE_ TRANSCODE (chunk) AP_FTYPE_NETWORK (core_in, core_out) see include/util_filter.h for details The filter type is specified when the filter is registered. This determines the position of the filter in the chain. In addition to filters which operated on individual server resources like mod_case_filter, content set filters can operate on all the output for a request, including output created by multiple subrequests. Protocol and transcode filters operate on headers usually. Network filters work with sockets or socket buckets. There is also an AP_FTYPE_CONNECTION which is unused at present. The SSL filter is inserted between the connection and network levels.
15
Filter configuration directives
16
SetOutputFilter activates the named filter in this scope: <Directory /www/data/ > SetOutputFilter INCLUDES </Directory> SetInputFilter is the same for input WIth this configuration, the INCLUDES filter will be added to the output filter chain for every request for a file or subdirectory under this directory.
17
AddOutputFilter like SetOutputFilter, but uses extension: <Directory /www/moredata/> AddOutputFilter INCLUDES;DEFLATE html </Directory> AddInputFilter is the same for input The AddOutputFilter adds the specified filters only when the file extension matches. When multiple filters of the same type are coded, the leftmost filter is invoked first.
18
RemoveOutputFilter resets filter to extension mappings <Directory /www/moredata/dont_parse> RemoveOutputFilter html </Directory> removes .html filters in this subdirectory RemoveInputFilter is the same for input In this case, we have a subdirectory where we want to get rid of the association of .html files with the INCLUDES filter, created by the config on the previous slide.
19
Frequently used Apache filters
This decribes a set of filters used by the Apache core for every request.
20
Output filters <-- removes itself if no Range: header
<-- calculates total content length, if practical <-- generates headers from headers_out table The byte range filter handles Range: bytes=xx-yy headers which ask for partial content. We see this a lot with .pdf files, and with some download managers. The content length filter calculates the length of the entire request, unless chunking is in effect or if such a calculation would hold up the output. The HTTP header filter reads the r->headers_out table and inserts the actual headers into the data stream. The core output filter is the end of the filter chain. It invokes the APR code which writes to the network. <-- writes to the network <
21
Input filters <-- sets socket timeouts
<-- reads socket buckets; length cop The default input filter chain in simpler. The net_time filter sets the socket timeout value. Typically the keepalive timeout will be much shorter than the normal timeout. The core_input filter reads from socket buckets. It is also responsible for enforcing the boundaries between pipelined requests, and between the HTTP headers and the optional input request body. Sometimes it searches for LFs; at other times it reads the number of bytes specified in a Content-Lenth or a Chunk header.
22
Pitfalls yes, there are some pitfalls
23
Excessive memory consumption
what will happen if this filter is fed a 2G file and the client has a 9600 bps modem? solution: limit your buffer sizes, or don't buffer call ap_pass_brigade periodically ap_pass_brigade will block when socket buffers are full and downstream filters are well behaved In this scenario, it is important to slow down the output to match the client's speed (or lack thereof). I call this network back pressure. If you pick a reasonable upper limit for your buffer size, call ap_pass_brigade when the buffer fills up, and the downstream filters are well behaved, this problem takes care of itself. We use 8k buffers frequently in the Apache core. Many filters don't buffer the input data. mod_include splits mmap buckets when it sees its tags, and inserts new buckets into the data stream.
24
Holding up streaming output
a CGI writes a "search in progress" message ...then does a lengthy database query filters are called before all the output exists filter sees a pipe bucket naïve approach will block reading the pipe client won't see the message A filter will receive a pipe bucket representing the output of the CGI. When first read, the partial response will be received. When next read, the filter will block if the bucket was read in blocking mode. So there is no opportunity to send the partial response to the client since the filter is now blocked reading the pipe.
25
Holding up streaming output...
Specific solution (stolen from protocol.c::ap_content_length_filter): read_type = nonblocking; for each bucket e in the input brigade: . if e is eos bucket, we're done; . rv = e->read(); . read_type = non_blocking; . if rv is APR_SUCCESS then process the data received; . if rv is APR_EAGAIN: . . add flush bucket to end of brigade containing data already processed; . . ap_pass_brigade(data already processed); . . read_type = blocking; . if rv is anything else: . . log error and return failure; ap_pass_brigade(data already processed); High-level solution: If the bucket being processed has indeterminate length (i.e., length field is -1), first do a non-blocking read. If EAGAIN is received from the read, indicating that no data is currently available, pass any data already processed to the next filter and makethe next read blocking read since the filter should expect that no more data is available for a while.
26
Avoiding the pitfalls with a simple well-behaved output filter
Make sure that the filter doesn't: consume too much virtual memory by touching a lot of storage before passing partial results to the next filter break streaming output by always doing blocking reads on pipe buckets break streaming output by not sending down a flush bucket when the filter discovers that no more output is available for a while busy-loop by always doing non-blocking reads on pipe buckets Here are the design goals for a well-behaved filter. Insuring that the filter can stream is somewhat more complex than limiting memory consumption.
27
Design for simple well-behaved output filter
read_mode = nonblock; output brigade = empty; output bytes = 0; foreach bucket in input brigade: . if eos: . . move bucket to end of output brigade; ap_pass_brigade(); . . return; . if flush: . . output brigade = empty; output bytes = 0; . . continue with next bucket . read bucket; . if rv is APR_SUCCESS: . . read_mode = non_block; . . process_bucket(); /* see next slide */ . if rv is EAGAIN: . . add flush bucket to end of output brigade; ap_pass_brigade(); . . read_mode = block; . if output bytes > 8K: . . ap_pass_brigade(); ap_pass_brigade(); This pseudo-code incorporates the streaming logic borrowed from the content length filter and logic to limit worst case memory consumption.
28
Design for simple well-behaved output filter...
process_bucket: if len is 0: . move bucket to output brigade; while len > 0: . n = min(len, 8K); . get new buffer to hold output of processing n bytes; . process the data; . get heap bucket to represent new buffer; . add heap bucket to end of output brigade; . if output bytes > 8K: . . ap_pass_brigade(); . . output brigade = empty; output bytes = 0; . len = len – n;
29
mod_ext_filter Allows command-line filters to act as Apache filters
Useful tool when implementing native Apache filters Quick prototype of desired function Can trace a native filter being tested mod_ext_filter allows using existing command line filter programs in the Apache 2 environment. Since it launches an external program instance (fork/exec on Unix) it won't be as fast as a native Apache filter. It sets the standard CGI environment variables before invoking the program. It does not automatically invoke the shell.
30
Simple mod_ext_filter example "tidy" up the HTML
ExtFilterDefine tidy-filter \ cmd="/usr/local/bin/tidy" <Location /manual/mod> SetOutputFilter tidy-filter ExtFilterOptions LogStderr </Location> tidy is a program which cleans up HTML. This example shows how you can invoke it dynamically. Note that the full path to the program is specified since the shell isn't used.
31
mod_ext_filter and prototyping
Use a normal Unix filter to transform the response body mod_ext_filter: perform some header transformations mod_headers: add any required HTTP header fields mod_setenvif: set environment variables to enable/disable the filter To get a prototype filter up and running quickly, you can use mod_ext_filter to process the response body. It allows you to modify certain HTTP header fields, as shown on the next slide. You can add other headers by using mod_headers, and switch the filter on and off by using mod_setenvif to conditionally set environment variables.
32
mod_ext_filter header transformations
Content-Type set to value of outtype parameter (unchanged otherwise) ExtFilterDefine foo outtype=xxx/yyy Content-Length preserved or removed (default), based on the presence of preservescontentlength parameter ExtFilterDefine foo preservescontentlength
33
Using mod_header to add headers
ExtFilterDefine gzip mode=output \ cmd=/bin/gzip <Location /gzipped> SetOutputFilter gzip Header set Content-Encoding gzip </Location> This slide shows how you can use mod_headers along with mod_ext_filter to set other HTTP headers.
34
Enabling a filter via envvar
ExtFilterDefine nodirtywords \ cmd="/bin/sed s/damn/darn/g" \ EnableEnv=sanitize_output <Directory /> SetOutputFilter nodirtywords SetEnvIf Remote_Host ceo.mycompany.com \ sanitize_output </Directory> You can switch an external filter on or off by using mod_setenvif to set environment variables.
35
Tracing another filter
# Trace the data read and written by another filter # Trace the input: ExtFilterDefine tracebefore EnableEnv=trace_this_client \ cmd="/usr/bin/tee /tmp/tracebefore" # Trace the output: ExtFilterDefine traceafter EnableEnv=trace_this_client \ cmd="/usr/bin/tee /tmp/traceafter" <Directory /usr/local/docs> SetEnvIf Remote_Addr trace_this_client SetOutputFilter tracebefore;some_other_filter;traceafter </Directory> This example shows how to insert two instances of an external "tee" filter around another filter. When a request arrives from the specified client, the output data is copied to a file before and after the other filter is invoked.
36
General Apache debugging trick
use the prefork MPM w/ 2 Listen statements ps axO ppid,wchan | grep httpd The process to get the next connection is in poll It will look unique (Linux: do_pol / schedu) attach debugger...gdb bin/httpd <pid> easy to script saves having to restart with -x can quickly detach/reattach I use this technique frequently to debug Apache without having to restart it. If there are two or more Listen directives, Apache serializes the accepts. One process will be allowed to poll. ps's wchan option shows you where a process is blocked in the kernel, which is different for poll vs. the various accept mutex choices. This will show you which process will get the next HTTP request. At present, the debuggers I use are easier to use with the prefork MPM than the threaded MPMs.
37
Filter debugging hints
.gdbinit dump_filters dump_brigade dump_bucket Breakpoints core_[input|output]_filter ap_[get|pass]_brigade If you copy the .gdbinit file from the root of the Apache 2.0 source directory into your home directory, it adds a few gdb commands that help when debugging filters. dump_filters r->output_filters displays the current output filter chain, assuming that r points to a request rec. dump_brigade displays a bucket brigade; dump_bucket displays a single bucket. A breakpoint in the core_input/output_filter will show the filter chain after all the filters have been called. A breakpoint in ap_get/pass_brigade will fire each time a new filter is called.
38
The big filter list doesn't include filters already covered mod_cache
has three filters based on 1.3 mod_proxy cache mod_charset_lite translates character encodings essential on EBCDIC platforms mod_cache moves functionality out of mod_proxy and expands on it. You can now use it to cache files when no proxy is configured. mod_charset_lite was inspired by mod_charset for Apache 1.3, which was written to deal with multiple Russian character sets. As the name implies, it doesn't have all of mod_charset's features. Basically you give it two charset names, and it invokes iconv to transform the data between those charsets.
39
big list... mod_deflate mod_include CHUNK mod_logio gzip/ungzip SSI
manages HTTP outbound chunking mod_logio allows logging of total bytes sent/recv'd, incl. headers mod_deflate gzips and ungzips HTTP content. mod_include was converted to a filter in Apache 2.0, so that it can process data from arbitrary sources. The CHUNK filter adds HTTP chunk headers to the outbound data stream. mod_logio is very new. It lets you log the total number of bytes sent and received, including headers. This might be important for a site that bills according to bandwith usage.
40
big list... mod_header proxy_ftp mod_ssl
lets you set arbitrary HTTP headers proxy_ftp filter to send HTML directory listing mod_ssl low level interfaces to SSL libraries mod_headers, proxy_ftp, and mod_ssl were converted to filters in Apache In the case of mod_ssl, this should allow us to change socket layer processing with minimal disruption to mod_ssl.
41
big list... mod_bucketeer subrequest filter old_write
debugging tool for filters/buckets/brigades separates, flushes, and passes data stream uses control characters to specify where subrequest filter eats EOS bucket used to end subreq output stream old_write feeds buffered output of ap_rput* into filter chain mod_bucketeer is a debugging tool. You can feed it a text file with imbedded control characters, and it will break up the data into separate buckets, insert flush buckets, and pass the date downstream at arbitrary points. The subrequest filter separates filters from a subrequest from the main request. It consumes an EOS bucket which is used to terminate the subrequest output data stream. The old_write filter takes the buffered output from the ap_rput* family of functions and sends it down the output filter chain.
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.