Blocking bad bots with Fail2ban

Fail2ban is a versatile security tool. While it is primarily used to prevent brute-force attacks against SSH, it can also protect other services.

Some bots scan the internet and send thousands of requests to web servers in the hope of finding vulnerabilities. This post discusses blocking such bots with Fail2ban.

We assume that you are using Apache as a web server. However, these instructions can be easily adjusted for nginx or any other web server.

However, keep in mind that Fail2ban is not a Web Application Firewall (WAF) and cannot stop malicious requests as they come in. Fail2ban acts by monitoring logs, so at least one malicious attempt must be logged before it can take action.

What is a bad bot, anyway?

In this post, we will focus on blocking bots that do one of the following things:

  • Scan the website for an open proxy
  • Send a GET request with parameters containing SQL injection payloads
  • Send a GET request with parameters containing Shellshock payloads

Of course, you can block other kinds of attacks as well. However, we will restrict ourselves to the above three cases for this article.

Installing Fail2ban

Fail2ban is available in the repositories of most distributions.

To install it on Debian/Ubuntu, run the following:

sudo apt-get update
sudo apt-get install fail2ban

On CentOS, first enable the EPEL repository and install Fail2ban from it; then enable and start the fail2ban service.

sudo yum -y install epel-release
sudo yum -y install fail2ban
sudo systemctl enable fail2ban
sudo systemctl start fail2ban

Fail2ban basics

At the heart of Fail2ban is a set of jails. Put simply, a jail tells Fail2ban to watch a set of logs and to apply a filter to them each time they change. If the number of matches from a single host reaches the maximum allowed by the jail (maxretry) within a given time window (findtime), the action specified in the jail is taken.

Thus, you need to define two things: a filter, and a jail. The jail will be configured to look at Apache’s logs to detect malicious requests.
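The counting logic described above can be sketched in a few lines of Python. This is only a toy model for illustration, not Fail2ban's actual implementation: the class name JailSketch, its methods, the address 203.0.113.5 and the timestamps are all made up, and the limits are example values.

```python
from collections import defaultdict, deque


class JailSketch:
    """Toy model of a jail: ban a host after max_retry matches within find_time seconds."""

    def __init__(self, max_retry, find_time, ban_time):
        self.max_retry = max_retry
        self.find_time = find_time
        self.ban_time = ban_time
        self.hits = defaultdict(deque)  # host -> timestamps of recent matches
        self.banned_until = {}          # host -> time at which the ban expires

    def is_banned(self, host, now):
        return self.banned_until.get(host, 0) > now

    def record_match(self, host, now):
        """Register one filter match; return True if this match triggers a ban."""
        if self.is_banned(host, now):
            return False
        window = self.hits[host]
        window.append(now)
        # Forget matches that fall outside the find_time window.
        while window and now - window[0] > self.find_time:
            window.popleft()
        if len(window) >= self.max_retry:
            self.banned_until[host] = now + self.ban_time
            window.clear()
            return True
        return False


# Hypothetical host and timestamps (in seconds).
jail = JailSketch(max_retry=3, find_time=360, ban_time=360)
results = [jail.record_match("203.0.113.5", t) for t in (0, 100, 200)]
print(results)                              # [False, False, True]
print(jail.is_banned("203.0.113.5", 300))   # True
print(jail.is_banned("203.0.113.5", 600))   # False
```

Three matches spread over more than find_time seconds would not trigger a ban in this model, because the oldest match drops out of the window before the third one arrives.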

Defining the filters

A filter is simply a collection of Python regular expressions that are matched against a log. Here, we need to define filters for the criteria described above.

But first, let us have a look at an entry in Apache’s logs (the client address that normally begins the entry is not shown):

- - [17/Jan/2017:14:10:41 +0000] "GET /robots.txt HTTP/1.1" 200 3494 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http:

Notice that the request line GET /robots.txt HTTP/1.1 is enclosed in double quotes. When designing such rules yourself, take care to ensure that only the request line matches. Otherwise, you risk blocking legitimate users.
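To see where the request line sits relative to the other fields, here is a minimal Python sketch that splits an entry in Apache's combined log format. The sample entry is hypothetical (the address comes from a range reserved for documentation and the user agent is made up), and the pattern is a simplification that does not handle escaped quotes inside a field.

```python
import re

# Hypothetical entry in Apache's combined log format.
line = ('203.0.113.5 - - [17/Jan/2017:14:10:41 +0000] '
        '"GET /robots.txt HTTP/1.1" 200 3494 "-" "ExampleBot/1.0"')

# host, identity, user, [timestamp], "request line", status, size,
# "referer", "user agent"
COMBINED = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"$'
)

m = COMBINED.match(line)
print(m.group("host"))     # 203.0.113.5
print(m.group("request"))  # GET /robots.txt HTTP/1.1
print(m.group("agent"))    # ExampleBot/1.0
```

The request line, the referer and the user agent are all quoted, which is why a rule that only looks for "something inside double quotes" can match more than the request.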

SQL injection payloads generally contain strings of the form union select(...) or select concat (...). Thus, you could try to match this pattern with the following regular expression:

(?i)^<HOST> -.*"[^"]+(?:union[^"]+select[^"]*|select[^"]+concat[^"]*)(?:%%2[8C]|[,(])

The <HOST> part defines the location of the IP address in the log entry, and the (?i) states that the regular expression is case insensitive.

The [^"] character classes stop the match from crossing a double quote, so the whole pattern has to fall inside a single quoted field of the log entry, such as the request line. The (?:%%2[8C]|[,(]) group requires that the union select or select concat be followed by a comma (,) or an opening parenthesis, either literally or in percent-encoded form (%2C or %28). The percent sign is doubled because Fail2ban's configuration files treat a single % as the start of an interpolation.
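A quick way to get a feel for this expression is to try it in Python. The sketch below is an approximation of what Fail2ban does, not its real code: the helper compile_failregex is hypothetical, it replaces the <HOST> tag with a simple IPv4 pattern and turns %% back into %, and the log entries are invented for the test.

```python
import re

# The regular expression as it would appear in a Fail2ban filter.
FAILREGEX = r'(?i)^<HOST> -.*"[^"]+(?:union[^"]+select[^"]*|select[^"]+concat[^"]*)(?:%%2[8C]|[,(])'


def compile_failregex(failregex):
    # Assumption: approximate Fail2ban's preprocessing by undoing the %%
    # escape and swapping the <HOST> tag for a simple IPv4 pattern.
    host = r'(?P<host>\d{1,3}(?:\.\d{1,3}){3})'
    return re.compile(failregex.replace('%%', '%').replace('<HOST>', host))


sqli = compile_failregex(FAILREGEX)

# Hypothetical log entries; the address is from a documentation range.
PREFIX = '203.0.113.5 - - [17/Jan/2017:14:10:41 +0000] '
plain = PREFIX + '"GET /item.php?id=1+UNION+SELECT+1,2,3 HTTP/1.1" 200 3494 "-" "Mozilla"'
encoded = PREFIX + '"GET /item.php?id=1%20union%20select%20concat%28user%29 HTTP/1.1" 200 3494 "-" "Mozilla"'
benign = PREFIX + '"GET /robots.txt HTTP/1.1" 200 3494 "-" "Mozilla"'

print(bool(sqli.search(plain)))          # True
print(bool(sqli.search(encoded)))        # True
print(bool(sqli.search(benign)))         # False
print(sqli.search(plain).group('host'))  # 203.0.113.5
```

Running a handful of legitimate URLs from your own site through the same check is a cheap way to spot false positives before the rule goes live.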

Bots scanning for open proxies often send requests of the form:

- - [17/Jan/2017:14:10:41 +0000] "GET HTTP/1.1" 400 3494 "-" "Mozilla"
- - [17/Jan/2017:14:10:44 +0000] "CONNECT" 400 3499 "-" "Mozilla"

A regular expression such as the one below can match these easily:

(?i)^<HOST> -.*"(?:(?:GET|POST|HEAD) https?:|CONNECT [a-z0-9.-]+:[0-9]+)

The alternative (?:GET|POST|HEAD) https?: matches requests of the first type, whereas CONNECT [a-z0-9.-]+:[0-9]+ matches requests of the second type.
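The same kind of check can be run for the open-proxy expression. As before, this is a sketch under assumptions: compile_failregex is a hypothetical stand-in for Fail2ban's handling of <HOST>, and the log entries, including the example.com targets, are invented.

```python
import re

FAILREGEX = r'(?i)^<HOST> -.*"(?:(?:GET|POST|HEAD) https?:|CONNECT [a-z0-9.-]+:[0-9]+)'


def compile_failregex(failregex):
    # Assumption: a simple IPv4 pattern stands in for Fail2ban's <HOST> tag.
    host = r'(?P<host>\d{1,3}(?:\.\d{1,3}){3})'
    return re.compile(failregex.replace('%%', '%').replace('<HOST>', host))


proxy = compile_failregex(FAILREGEX)

# Hypothetical log entries.
PREFIX = '203.0.113.5 - - [17/Jan/2017:14:10:41 +0000] '
absolute_url = PREFIX + '"GET http://example.com/ HTTP/1.1" 400 3494 "-" "Mozilla"'
connect = PREFIX + '"CONNECT example.com:443 HTTP/1.1" 400 3499 "-" "Mozilla"'
# A normal request whose referer happens to be a full URL.
benign = PREFIX + '"GET /index.html HTTP/1.1" 200 3494 "http://example.com/" "Mozilla"'

print(bool(proxy.search(absolute_url)))  # True
print(bool(proxy.search(connect)))       # True
print(bool(proxy.search(benign)))        # False
```

The benign entry is not matched because the URL in the referer field is not preceded by a request method.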

Bots scanning for Shellshock often send out requests like:

- - [17/Jan/2016:16:00:00 +0000] "GET /cgi-bin/printenv.cgi HTTP/1.0" 200 1 "-" "() { test;};echo \"Content-type: text/plain\"; echo; echo; /bin/rm -rf /var/www/"

A regular expression like this will match the pattern for Shellshock:

<HOST> -.*"\(\)\s*\{[^;"]+[^}"]+}\s*;

Here, we are checking for the () { <command>; } pattern, and the \s* accounts for any whitespace that may be present in the malicious request.
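The Shellshock expression can be tried against the sample request above in the same way. In this sketch the client address is a made-up documentation address, the benign entry is invented, and compile_failregex is a hypothetical helper that only approximates how Fail2ban expands <HOST>.

```python
import re

FAILREGEX = r'<HOST> -.*"\(\)\s*\{[^;"]+[^}"]+}\s*;'


def compile_failregex(failregex):
    # Assumption: a simple IPv4 pattern stands in for Fail2ban's <HOST> tag.
    host = r'(?P<host>\d{1,3}(?:\.\d{1,3}){3})'
    return re.compile(failregex.replace('%%', '%').replace('<HOST>', host))


shellshock = compile_failregex(FAILREGEX)

# The payload from the sample entry, with a hypothetical client address.
attack = (r'203.0.113.5 - - [17/Jan/2016:16:00:00 +0000] '
          r'"GET /cgi-bin/printenv.cgi HTTP/1.0" 200 1 "-" '
          r'"() { test;};echo \"Content-type: text/plain\"; echo; echo; /bin/rm -rf /var/www/"')
benign = (r'203.0.113.5 - - [17/Jan/2016:16:00:00 +0000] '
          r'"GET /cgi-bin/printenv.cgi HTTP/1.0" 200 1 "-" "Mozilla/5.0 (compatible)"')

print(bool(shellshock.search(attack)))  # True
print(bool(shellshock.search(benign)))  # False
```

Note that the payload here arrives in the user agent field, so this expression is deliberately not limited to the request line.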

Combining them, we can now write our filter:

[Definition]
failregex = <HOST> -.*"\(\)\s*\{[^;"]+[^}"]+}\s*;
            (?i)^<HOST> -.*"[^"]+(?:union[^"]+select[^"]*|select[^"]+concat[^"]*)(?:%%2[8C]|[,(])
            (?i)^<HOST> -.*"(?:(?:GET|POST|HEAD) https?:|CONNECT [a-z0-9.-]+:[0-9]+)

ignoreregex =

The ignoreregex option allows you to whitelist entries. You can add ignored regexes in the same way as for failregex.

Save the above filter into /etc/fail2ban/filter.d/badbot.local
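Fail2ban's configuration files follow the INI style that Python's configparser also understands, so the standard library can be used to sanity-check the filter text before saving it. This is only an approximation (Fail2ban uses its own reader on top of this format), and it assumes the options sit under a [Definition] section, which is what Fail2ban filter files use.

```python
import configparser

# The filter text, assumed to live under a [Definition] section.
FILTER_TEXT = r'''
[Definition]
failregex = <HOST> -.*"\(\)\s*\{[^;"]+[^}"]+}\s*;
            (?i)^<HOST> -.*"[^"]+(?:union[^"]+select[^"]*|select[^"]+concat[^"]*)(?:%%2[8C]|[,(])
            (?i)^<HOST> -.*"(?:(?:GET|POST|HEAD) https?:|CONNECT [a-z0-9.-]+:[0-9]+)

ignoreregex =
'''

parser = configparser.ConfigParser()
parser.read_string(FILTER_TEXT)

# Indented continuation lines become one multi-line value,
# one regular expression per line.
failregexes = parser.get("Definition", "failregex").splitlines()
print(len(failregexes))                         # 3
# Interpolation turns the escaped %% back into a single %.
print("%2[8C]" in failregexes[1])               # True
print("%%" in failregexes[1])                   # False
print(repr(parser.get("Definition", "ignoreregex")))  # ''
```

A single unescaped % in a failregex would make this parse fail with an interpolation error, which is a useful early warning.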

Defining the jail

Having defined the filter, it’s now time to define the jail. Here, we ban an IP address for six minutes if it sends three such requests within a span of six minutes.

Add this to your /etc/fail2ban/jail.local file. The section header names the jail; badbot is used here to match the filter name:

[badbot]
enabled   = true
port      = http,https
filter    = badbot
logpath   = /var/log/apache*/*access.log
maxretry  = 3
banaction = iptables-multiport
findtime  = 360
bantime   = 360
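It is worth checking that the logpath wildcard actually covers your access logs. The sketch below uses Python's pathlib to test the pattern against a few candidate paths. It assumes that Fail2ban expands logpath as a shell-style glob, and the candidate paths are typical defaults (Debian/Ubuntu keep Apache logs under /var/log/apache2, CentOS under /var/log/httpd) rather than paths taken from any particular server.

```python
from pathlib import PurePosixPath

LOGPATH = "/var/log/apache*/*access.log"

# Typical default locations; adjust to what exists on your system.
candidates = [
    "/var/log/apache2/access.log",
    "/var/log/apache2/other_vhosts_access.log",
    "/var/log/apache2/error.log",
    "/var/log/httpd/access_log",
]

# PurePosixPath.match compares the pattern component by component.
matched = [p for p in candidates if PurePosixPath(p).match(LOGPATH)]
for path in matched:
    print(path)
# /var/log/apache2/access.log
# /var/log/apache2/other_vhosts_access.log
```

The CentOS-style path is not matched, so on such a system the logpath would need to be changed, for example to point at /var/log/httpd/*access_log.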

Having configured the jail, restart Fail2ban for the changes to take effect. Depending on your distribution, use one of these commands:

sudo systemctl restart fail2ban
sudo service fail2ban restart

Fail2ban will now ban hosts whose requests match these filters once they reach the retry limit. You can also adapt these rules for another web server, or extend them to other kinds of attacks that you see in your logs.

