Download presentation
Presentation is loading. Please wait.
1
Transitioning to mod_perl Handlers
Geoffrey Young
2
Overview Apache and mod_perl 101 mod_perl Handler Basics
mod_perl Handler API Basics Using the Apache Framework Advanced mod_perl API Features Transition Strategies
3
Apache's Pre-fork Model
Apache parent process forks multiple child processes httpd (parent) httpd (child) httpd (child) httpd (child) httpd (child)
4
Roles and Responsibilities
httpd parent processes no actual requests all requests are served by the child processes requests are handled by any available child process, not necessarily the one that handled the previous request remember, the HTTP is stateless by default
5
Children are Individuals
httpd (parent) httpd (child) httpd (child) httpd (child) httpd (child)
6
Nice, Responsible Children
each httpd child processes one incoming request at a time only when the request is over is the child free to serve the next request over time httpd child processes are terminated and replaced with fresh children
7
Request Phases Apache breaks down request processing into separate, logical parts called phases each request is stepped through the phases until... all processing is complete somebody throws an "error" developers are given the chance to hook into each phase to add custom processing
8
Apache Request Cycle client request logging URI-based init content
URI translation fixups file-based init MIME setting resource control
9
So What? most Apache users don't worry about the request cycle too much... ...but they do use modules that plug into it
10
... for instance mod_rewrite:
client request URI-based init mod_rewrite: RewriteRule /favicon.ico$ /images/favicon.ico URI translation
11
... for instance mod_auth: AuthUserFile .htpasswd client request
URI-based init URI translation file-based init mod_auth: AuthUserFile .htpasswd resource control
12
... for instance mod_cgi: SetHandler cgi-script client request
URI-based init content URI translation fixups file-based init MIME setting resource control
13
That's great, but... breaking down the request into distinct phases has many benefits gives each processing point a role that can be easily managed and programmed makes Apache more like an application framework rather than a content engine but you have to code in C
14
Enter mod_perl mod_perl offers an interface to each phase of the request cycle opens up the Apache API to Perl code allows you to program targeted parts of the request cycle using Perl we like Perl
15
Apache Request Cycle client request logging URI-based init content
URI translation fixups file-based init MIME setting resource control
16
mod_perl Interface client request PerlCleanupHandler PerlLogHandler
PerlPostReadRequestHandler logging URI-based init PerlHandler PerlTransHandler content URI translation PerlFixupHandler PerlHeaderParserHandler fixups file-based init PerlAccessHandler PerlAuthenHandler PerlAuthzHandler PerlTypeHandler MIME setting resource control
17
How? mod_perl embeds an entire perl interpreter into Apache
18
Why? create a persistent perl environment open up the Apache API
increase performance * (for the dynamic parts) open up the Apache API increase developer options because it's cool
19
What is mod_perl? to the masses, mod_perl is just faster CGI a la Apache::Registry leads to comparisons to FastCGI, PerlEx, and other CGI enhancement engines not an accurate description of mod_perl, so comparisons aren't really valid
20
What is mod_perl? mod_perl is the Perl interface to the Apache API
a C extension module, just like mod_cgi or mod_rewrite creates a persistent perl environment embedded within Apache
21
What's the Big Deal? mod_perl allows you to interact with and directly alter server behavior gives you the ability to "program within Apache's framework instead of around it" let's you do it in Perl instead of C allows you to intercept basic Apache functions and replace them with your own (sometimes devious) Perl substitutes
22
Ok, so what's a handler? the term handler refers to processing that occurs during any of the Apache runtime phases this includes the request-time phases as well as the other parts of the Apache runtime, such as restarts the use of "handler" for all processing hooks is mod_perl specific
23
Why use handlers?
24
Why use handlers? gives each processing point a role that can be easily managed and programmed process request at the proper phase meaningful access to the Apache API break up processing into smaller parts modularize and encapsulate processing makes Apache more like an application framework rather than a content engine CPAN
25
Registry is just a handler
Apache::Registry is merely an (incredibly clever and amazing) mod_perl handler its performance gains are made possible due to what mod_perl really is let's take a peek inside...
26
Apache::Registry Client side Server side
Server side mod_perl intercepts content generation for Apache/Registry.pm calls Apache::Registry::handler(Apache->request) inserts wizardry returns response to client
27
Wizardry, you say? the wizardry is basically just putting the CGI script into it's own package package Apache::ROOT::perl_2dbin::foo_2epl; sub handler { BEGIN { $^W = 1; }; ... your script here... } 1; because the perl interpreter is persistent the (compiled) package is already in memory when called
28
"The dream is always the same"
the basic process for mod_perl is the same for the other request phases as for content generation (eg, Registry) Apache passes control to mod_perl mod_perl passes control to your Perl handler your Perl subroutine defines the status mod_perl passes status back to Apache Apache continues along
29
Anatomy of a Handler a mod_perl handler is just an ordinary Perl module visible including mod_perl extra paths contains a package declaration has at least one subroutine typically the handler() subroutine
30
Let's create a handler... Object: to protect our name-based virtual hosts from proper but bad HTTP/1.0 requests
31
HTTP/1.0 and Host HTTP/1.0 does not require a Host header
assumes a “one host per IP" configuration this limitation "breaks" name-based virtual host servers for browsers that follow HTTP/1.0 to the letter most send the Host header, so all is well
32
Let's create a handler... Object: to protect our name-based virtual hosts from unacceptable HTTP/1.0 requests Method: intercept the request prior to content-generation and return an error unless... there is a Host header the request is an absolute URI
33
package Cookbook::TrapNoHost;
use Apache::Constants qw(DECLINED BAD_REQUEST); use Apache::URI; use strict; sub handler { my $r = shift; # Valid requests for name based virtual hosting are: # requests with a Host header, or # requests that are absolute URIs. unless ($r->headers_in->get('Host') || $r->parsed_uri->hostname) { $r->custom_response(BAD_REQUEST, "Oops! Did you mean to omit a Host header?\n"); return BAD_REQUEST; } return DECLINED; 1;
34
Setup add TrapNoHost.pm to @INC add to httpd.conf that's it!
ServerRoot/lib/perl/Cookbook/TrapNoHost.pm add to httpd.conf PerlModule Cookbook::TrapNoHost PerlTransHandler Cookbook::TrapNoHost that's it!
35
Apache Request Cycle client request client request logging
URI-based init content URI translation fixups file-based init MIME setting resource control
36
Intercept the Request client request URI-based init URI translation
HTTP/ Bad Request Date: Tue, 04 Jun :17:52 GMT Server: Apache/ dev (Unix) mod_perl/1.27_01-dev Perl/v5.8.0 Connection: close Content-Type: text/html; charset=iso Oops! Did you mean to omit a Host header?
37
Intercept the Request client request logging URI-based init
URI translation
38
Key Concepts the Apache request object, $r the Apache::Table class
the Apache::URI class return values and the Apache::Constants class error responses and $r‑>custom_response()
39
Apache Request Object passed to handlers or available via Apache->request() $r = shift; # passed to handler() $r = Apache->request(); provides access to the Apache class, which provides access to request attributes singleton-like constructor, always returning the same object
40
Apache::Table the Apache::Table class provides the underlying API for the following request attributes... $r->headers_in() $r->headers_out() $r->err_headers_out() $r->dir_config() $r->subprocess_env() $r->notes() Apache::Request::param() Apache::Request::parms() in old releases
41
Apache::Table to manipulate Apache::Table objects, you use the provided methods get() set() add() unset() do() merge() new() clear()
42
Apache Table Properties
Apache tables have some nice properties case insensitive allow for multi-valued keys they also have one important limitation can contain only simple strings use pnotes() for Perl scalars
43
keeps the Net flowin'... both case-insensitivity and multiple values are key to manipulating headers $r->headers_out->set('Set-Cookie' => 'name=foo'); $r->headers_out->add('set-cookie' => 'name=bar'); = $r->headers_out->get('Set-cookie');
44
Table Iteration Apache::Table objects use a special idiom when you need to operate on every item in a table my $input = $apr->param; # Apache::Table object $input->do(sub { my ($key, $value) $log->info("input: name = $key, value = $value"); 1; });
45
More Apache::Table Fun
Tired PerlSetVar MyVar "um, ok" my $ok = $r->dir_config('MyVar'); Wired PerlAddVar MyVar "cool" = $r->dir_config->get('MyVar');
46
Trickery handle "what if?" cases gratuitous exploitation
$r->headers_in->set('Set-Cookie' => 'name=foo'); my $sub = $r->lookup_uri('/scripts/foo.html'); gratuitous exploitation # configure "PerlSetVar Filter On" on-the-fly $r->dir_config->set(Filter => 'On');
47
All About URIs when Apache parses the incoming request, it puts parts of the URI into the request record for just digging out the request URI you typically want the request attribute $r->uri() my $uri = $r->uri; # $uri is "/index.html" sometimes you need all the URI parts, like the scheme or port
48
Apache::URI mod_perl provides the Apache::URI utility class for handling URIs allows for manipulating the current URI as well as constructing a new URI unfortunately, it has two distinct interfaces that contain very subtle differences our code uses those differences to our advantage
49
Manipulating URIs Apache::URI offers the parse() method as a constructor my $uri = Apache::URI->parse($r); you can also grab an Apache::URI object from the request itself my $uri = $r->parsed_uri; to get the URI string back, call the unparse() method $uri->path('/new/location'); $r->headers_out->set(Location => $uri->unparse);
50
Apache::URI methods once you have an Apache::URI object, you can peek at or modify any part of the URI using its accessor methods - password() - port() - scheme() - path() - query() - more... unfortunately, although the objects have the same methods, they don't always behave the same...
51
parsed_uri() parsed_uri() returns an object populated based on $r->uri() and other parts of the request record since the URI has undergone translation, the path_info() method will return useful information Given: <Location /foo> my $path = $r->parsed_uri->path; # "/foo" my $info = $r->parsed_uri->path_info; # "/bar"
52
Apache::URI->parse($r)
Apache::URI->parse($r) only looks at the incoming URI and makes no assumptions about the underlying filesystem Given: <Location /foo> my $path = Apache::URI->parse($r)->path; # "/foo/bar" my $info = Apache::URI->parse($r)->path_info; # undef
53
parsed_uri() unless the request is an absolute URI the scheme, host, and port fields will be absent proxy requests are absolute URIs Given:
54
Apache::URI->parse($r)
Apache::URI->parse($r) uses other bits of information to construct a self-referential URI Given: my $scheme = Apache::URI->parse($r)->scheme; # "http" my $host = Apache::URI->parse($r)->hostname; "
55
it all boils down to... the differences between Apache::URI->parse($r) and $r->parsed_uri() are very confusing use Apache::URI->parse($r) for creating a self-referential URI that needs to point to the same server use $r->parsed_uri() for accessing request attributes
56
Apache::Constants Apache::Constants class provides over 90 runtime constants use Apache::Constants qw(DECLINED BAD_REQUEST); the most common are: OK SERVER_ERROR REDIRECT DECLINED
57
Return Values handlers are expected to return a value
the return value of the handler defines the status of the request Apache defines three "good" return values OK – all is well DECLINED – forget about me DONE – we're finished, start to log All other values are "errors" and trigger the ErrorDocument cycle
58
When handlers turn bad... "error" return codes are not always errors
instead, they indicate a new route for the request
59
Custom Responses Apache has default responses for most of the possible errors it also provides the ErrorDocument directive for fine-tuning server behavior you can specify error behavior on-the-fly with custom_response()
60
custom_response() the custom_response() method has several interfaces... # set the response for 500 $r->custom_response(SERVER_ERROR, '/error.html') # capture the current 500 setting my $errordoc = $r->custom_response(SERVER_ERROR) # reset 500 back to the Apache default $r->custom_response(SERVER_ERROR, undef);
61
Key Concepts the Apache request object, $r the Apache::Table class
the Apache::URI class return values and the Apache::Constants class error responses and $r‑>custom_response()
62
Just the Beginning almost every part of Apache's default behavior can be replaced, rewritten, or extended using mod_perl in many cases you can do much more than you can with Apache and C due to Perl's ability to fold, spindle, and mutilate everyday items
63
User Authentication Apache default authentication mechanism is mod_auth uses a password file generated using Apache's htpasswd utility geoff:zzpEyL0tbgwwk
64
User Authentication configuration placed in .htaccess file or httpd.conf AuthUserFile .htpasswd AuthName "my site" AuthType Basic Require valid-user
65
Who Uses Flat Files? flat files are limiting, hard to manage, difficult to integrate, and just plain boring we can use the Apache API and Perl to replace flat files with our own authentication mechanism
66
How Authentication Works
client requests a document GET /perl-status HTTP/1.1 Accept: text/xml, image/png, image/jpeg, image/gif, text/plain Accept-Charset: ISO , utf-8;q=0.66, *;q=0.66 Accept-Encoding: gzip, deflate, compress;q=0.9 Accept-Language: en-us Connection: keep-alive Host: Keep-Alive: 300 User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US) server denies request HTTP/ Authorization Required WWW-Authenticate: Basic realm="my site" Keep-Alive: timeout=15, max=100 Connection: Keep-Alive Transfer-Encoding: chunked Content-Type: text/html; charset=iso
67
How Authentication Works
client sends a new request GET /perl-status HTTP/1.1 Accept: text/xml, image/png, image/jpeg, image/gif, text/plain Accept-Charset: ISO , utf-8;q=0.66, *;q=0.66 Accept-Encoding: gzip, deflate, compress;q=0.9 Accept-Language: en-us Authorization: Basic Z2VvZmY6YWZha2VwYXNzd29yZA== Connection: keep-alive Host: Keep-Alive: 300 User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US) server sends document HTTP/ OK Keep-Alive: timeout=15, max=99 Connection: Keep-Alive Transfer-Encoding: chunked Content-Type: text/html
68
Do it in Perl since mod_perl gives us the ability to intercept the request cycle before Apache, we can authenticate using Perl instead Apache provides an API, making the job easy mod_perl provides access to the Apache API
69
package My::Authenticate;
use Apache::Constants qw(OK DECLINED AUTH_REQUIRED); use strict; sub handler { my $r = shift; # Let subrequests pass. return DECLINED unless $r->is_initial_req; # Get the client-supplied credentials. my ($status, $password) = $r->get_basic_auth_pw; return $status unless $status == OK; # Perform some custom user/password validation. return OK if authenticate_user($r->user, $password); # Whoops, bad credentials. $r->note_basic_auth_failure; return AUTH_REQUIRED; }
70
Configuration change to AuthUserFile .htpasswd AuthName "my site"
AuthType Basic Require valid-user to PerlAuthenHandler My::Authenticate
71
The Choice is Yours how you decide to authenticate is now up to you
sub authenticate_user { my ($user, $pass) return $user eq $pass; } are you seeing the possibilities yet?
72
The Power of CPAN over 25 Apache:: shrink-wrapped modules on CPAN for authentication SecureID Radius SMB LDAP NTLM
73
To Infinity and Beyond! this example only covered Basic authentication via popup box the same techniques can be used to authenticate via a login form plus cookies, munged URLs, or hidden fields extended to use Digest authentication as well
74
Content Generation client request logging URI-based init content
URI translation fixups file-based init MIME setting resource control
75
Sample PerlHandler create a handler that uses CPAN module HTML::Clean to "clean" outgoing documents alter the handler to take advantage of more advanced mod_perl features
76
use Apache::Constants qw(OK DECLINED); use Apache::File;
package TPC::Clean; use Apache::Constants qw(OK DECLINED); use Apache::File; use HTML::Clean; use strict; sub handler { my $r = shift; my $fh = Apache::File->new($r->filename) or return DECLINED; my $dirty = do {local $/; <$fh>}; my $h = HTML::Clean->new(\$dirty); $h->level(3); $h->strip; $r->send_http_header('text/html'); print ${$h->data}; return OK; } 1;
77
Configuration add directives to httpd.conf to mirror DocumentRoot
Alias /clean /usr/local/apache/htdocs <Location /clean> SetHandler perl-script PerlHandler TPC::Clean </Location>
78
Results original: 202 bytes clean: 145 bytes <html> <body>
<form method="GET" action="/foo"> Text: <input type="text" name="foo"><br> <input type="submit"> </form> <strong>hi there </strong> </body> </html> clean: 145 bytes <html><body><form method="GET" action="/foo"> Text: <input type="text" name="foo"><br><input type="submit"></form><b>hi there </b></body></html>
79
Key Concepts Apache::File class $r->filename local $/
80
Apache::File Apache::File is similar to IO::File
my $fh = Apache::File->new($r->filename); while (my $line = <$fh>) { ... } it also adds a number of methods to the Apache class for use when manipulating files over HTTP
81
$r->filename() contains the result of URI-to-filename translation
if you just want to stat() the requested file, use $r->finfo and the special filehandle _ if (-f $r->finfo && -M _ < $timeout) { ... }
82
local $/; typical Perl idiom bad under mod_perl
my $file = <$fh>; #slurp my $file = do {local $/; <$fh>}; bad under mod_perl but we do it anyway with justification
83
The Choice is Yours! Content filtering with Apache::Filter?
Using proper cache-friendly headers?
84
Magic filtered content generation impossible with Apache 1.3
can't send CGI output through mod_ssi reason for output filters in Apache 2.0 mod_perl has had output filtering for years possible due to Perl's TIEHANDLE interface
85
TIEHANDLE in mod_perl mod_perl tie()s STDOUT to the Apache class prior to the content generation phase you can tie() STDOUT as well and override mod_perl's default behavior very useful with stacked handlers
86
Stacked Handlers for each phase of the request, mod_perl will run any registered Perl handlers for that phase you can register more than one Perl handler per phase whether all handlers are called depends on the syntax of the phase itself in Apache
87
Stacked Handlers some phase run until the handler list is exhausted
PerlLogHandler My::DBLogger PerlLogHandler My::FileLogger some phases run until one handler returns OK PerlTransHandler My::TranslateHTML PerlTransHandler My::TranslateText all phases terminate on "error"
88
Stacked Content Handlers
for the content generation phase, running multiple Perl handlers can be incredibly powerful Apache::Filter implements a simple interface for pipelining content handlers uses TIEHANDLE in the background
89
Now for the Fun Part modify our handler to work either standalone or as part of a handler chain easy using Apache::Filter
90
Apache::Filter Changes
my $fh = Apache::File->new($r->filename) or return DECLINED; to: my $fh = undef; if (lc $r->dir_config('Filter') eq 'on') { $r = $r->filter_register; ($fh, my $status) = $r->filter_input; return $status unless $status == OK } else { $fh = Apache::File->new($r->filename) or return NOT_FOUND;
91
Configuration change to Alias /clean /usr/local/apache/htdocs
<Location /clean> SetHandler perl-script PerlHandler My::Clean </Location> to PerlModule Apache::Filter PerlSetVar Filter On
92
Key Concepts Apache::Filter API
93
Apache::Filter Apache::Filter adds methods to the Apache class
do not have to use Apache::Filter; do have to PerlModule Apache::Filter because filtering content is tricky, the interface is quirky $r = $r->filter_register;
94
Apache::Filter to get at filtered input, call $r->filter_input()
returns an open filehandle on the input stream if the first filter, the filehandle is for $r->filename() all filters use the same API, regardless of their position in the chain
95
So What? new Apache::Filter aware code works the same as the standalone module can be used as part of a PerlHandler chain can be any part of the chain
96
Compressing Output use Apache::Compress available from CPAN
uses Compress::Zlib to compress output Apache::Filter aware
97
Configuration change to Alias /clean /usr/local/apache/htdocs
<Location /clean> SetHandler perl-script PerlHandler My::Clean PerlSetVar Filter On </Location> to PerlHandler My::Clean Apache::Compress
98
Stacked Power http://perl.apache.org/index.html straight HTML
35668 bytes My::Clean 28162 bytes (79%) Apache::Compress 8177 bytes (23%) My::Clean + Apache::Compress 7458 bytes (21%)
99
Caveats using Apache::Filter is actually a bit more complex than this... see the recent version of Apache::Clean to get an idea
100
Cache Headers we often think of dynamic content as "could be different on any given access" "dynamic" content can also be static with clearly defined factors that can change its meaning by properly managing HTTP/1.1 cache headers, we can reduce strain on our servers
101
Conditional GET Request
HTTP/1.1 allows for a conditional GET request clients are allowed to use cached content based on information about the resource information is provided by both the client and the server
102
Conditional GET Request
for static documents, Apache takes care of making our response cache-friendly since the file is on disk, Apache can determine when the file was last changed with static files, local modification is the only factor still too many rules to keep straight Apache provides an API to use so we don't have to think too much
103
Now for the Fun Part modify our handler to be "cache friendly"
send 304 when the document hasn't changed properly handle If-* header comparisons
104
How do you define change?
when dynamically altering static documents there are a number of factors to consider when the file changes on disk when the code changes when the options to the code changes all of these affect the "freshness" of the document
105
Code Changes in order to determine when the code itself changes, we need to mark the modification time of the package at request time, we call an API to compare the package modification to the If-Modified-Since header on reloads, we regenerate the package modification time
106
use Apache::Constants qw(OK DECLINED); use Apache::File;
package My::Clean; use Apache::Constants qw(OK DECLINED); use Apache::File; use HTML::Clean; use strict; # Get the package modification time... (my $package = __PACKAGE__) =~ s!::!/!g; my $package_mtime = (stat $INC{"$package.pm"})[9];
107
Configuration Changes
in order to determine when the options to the code change, we need to mark the modification time of httpd.conf at request time, we call an API to compare the configuration modification to the If-Modified-Since header on restarts, we regenerate the configuration modification time
108
use Apache::Constants qw(OK DECLINED); use Apache::File;
package My::Clean; use Apache::Constants qw(OK DECLINED); use Apache::File; use HTML::Clean; use strict; # Get the package modification time... (my $package = __PACKAGE__) =~ s!::!/!g; my $package_mtime = (stat $INC{"$package.pm"})[9]; # ...and when httpd.conf was last modified my $conf_mtime = (stat Apache->server_root_relative('conf/httpd.conf'))[9]; # When the server is restarted we need to # make sure we recognize config file changes and propigate # them to the client to clear the client cache if necessary. Apache->server->register_cleanup(sub { $conf_mtime = (stat Apache->server_root_relative('conf/httpd.conf'))[9]; });
109
Resource Changes in order to determine when the resources changes, we need to mark the modification time of $r->filename at request time, we call an API to compare the resource modification to the If-Modified-Since header resource modification is checked on each request
110
my $fh = Apache::File->new($r->filename) or return DECLINED;
sub handler { my $r = shift; my $fh = Apache::File->new($r->filename) or return DECLINED; my $dirty = do {local $/; <$fh>}; my $h = HTML::Clean->new(\$dirty); $h->level(3); $h->strip; $r->update_mtime($package_mtime); $r->update_mtime((stat $r->finfo)[9]); $r->update_mtime($conf_mtime); $r->set_last_modified; $r->set_etag; $r->set_content_length(length ${$h->data}); # only send the file if it meets cache criteria if ((my $status = $r->meets_conditions) == OK) { $r->send_http_header('text/html'); } else { return $status; print ${$h->data}; return OK;
111
Key Concepts Apache::File update_mtime() and time comparisons
headers to never modify directly meets_conditions() for header comparison
112
Apache::File when you use Apache::File; methods are added to the Apache class update_mtime() set_content_length() set_last_modified() set_etag() meets_conditions() set_byterange() each_byterange()
113
update_mtime() used to update the apparent modification time of the resource compares the current resource mtime to the set attempt and keeps the most recent just pile calls to update_mtime() up and let Apache do the work
114
Special Headers some headers you don't want to manipulate with headers_out() Apache provides a special set of APIs for these the ones added by Apache::File
115
set_last_modified() sets the Last-Modified header
uses mtime slot in the request record by default can pass it a properly formatted time
116
set_etag() sets the ETag header
ETag is supposed to be unique for every version of a resource that ever existed use only with static resources
117
set_content_length()
sets the Content-Length header ...
118
meets_conditions() does all the nasty comparisons between the client and server headers returns OK or HTTP_NOT_MODIFIED sets the headers_out() as appropriate you need to set all your outgoing headers before calling meets_conditions()
119
Fine Manuals Writing Apache Modules with Perl and C
mod_perl Developer's Cookbook mod_perl Pocket Reference mod_perl Guide mod_perl at the ASF
120
These Slides http://www.modperlcookbook.org/slides/TPC6-tutorial.ppt
Similar presentations
© 2024 SlidePlayer.com. Inc.
All rights reserved.