Blocking Bad Robots

Add a handler for the access-control phase to block undesirables.

Here we block access by certain user agents based on a pattern match file.

Configuration Entry

 <Location />
   PerlAccessHandler Apache::BlockAgent
   PerlSetVar BlockAgentFile /home/www/conf/bad_agents.txt
 </Location>

Pattern Match File

   ^teleport pro\/1\.28
   ^nicerspro
   ^mozilla\/3\.0 \(http engine\)
   ^netattache
   ^crescent internet toolpak http ole control v\.1\.0
   ^go-ahead-got-it
   ^wget
   ^devsoft's http component v1\.0
   ^www\.pl
   ^digout4uagent

Script III.3.1: Apache::BlockAgent

 package Apache::BlockAgent;
 # block browsers that we don't like
   
 use strict 'vars';
 use Apache::Constants ':common';
 use IO::File;
 my %MATCH_CACHE;
 my $DEBUG = 0;
   
 sub handler {
     my $r = shift;
       
     return DECLINED unless my $patfile = $r->dir_config('BlockAgentFile');
     return FORBIDDEN unless my $agent = $r->header_in('User-Agent');
     return SERVER_ERROR unless my $sub = get_match_sub($r,$patfile);
     return OK if $sub->($agent);
     $r->log_reason("Access forbidden to agent $agent",$r->filename);
     return FORBIDDEN;
 }
   
 # This routine creates a pattern matching subroutine from a
 # list of pattern matches stored in a file.
 sub get_match_sub {
     my ($r,$filename) = @_;
     my $mtime = -M $filename;
   
     # try to return the sub from cache
     return $MATCH_CACHE{$filename}->{'sub'} if
         $MATCH_CACHE{$filename} && 
             $MATCH_CACHE{$filename}->{'mod'} <= $mtime;
   
     # if we get here, then we need to create the sub
     return undef unless my $fh = new IO::File($filename);
     chomp(my @pats = <$fh>); # get the patterns into an array
     my $code = "sub { \$_ = shift;\n";
     foreach (@pats) {
         next if /^#/
         $code .= "return undef if /$_/i;\n";
     }
     $code .= "1; }\n";     
     warn $code if $DEBUG;
   
     # create the sub, cache and return it
     my $sub = eval $code;
     unless ($sub) {
         $r->log_error($r->uri,": ",$@);
         return undef;
     }
     @{$MATCH_CACHE{$filename}}{'sub','mod'}=($sub,$modtime);
     return $MATCH_CACHE{$filename}->{'sub'};
 }
   
 1;

What it Looks Like

http://localhost/conference/forbidden.html
<< Previous
Contents >> Next >>

Lincoln D. Stein, lstein@cshl.org
Cold Spring Harbor Laboratory
Last modified: Mon Aug 17 10:49:31 EDT 1998