Linux Stateless Firewalling for High Performance

I’m currently doing a fun bit of consulting on high performance Linux with a great company called Strongarm. I’ve written a post on their blog about we went about adapting a standard linux firewall to make it much more efficient and less resilient to DDoS attack. In short, remove the connection tracking modules and easily do it yourself – but watch out for hidden traps especially on the AWS EC2 platform because it uses jumbo frames!

I’ve archived the content below:

When we were building Strongarm, we came across an interesting challenge that we hadn’t seen addressed before: how to make a Linux stateless firewall that guarantees performance and resilience. Below, I’ll explain how we went about exploring and eventually solving this problem and offer some specific tips you can apply if you are trying to achieve something similar.

Stateful Connection Tracking and Its Issues

I love Linux, and in particular its firewalling capability. The excellent iptables utility has so many extensions and features it’s hard to keep track of everything that it can do. However, one of the issues I’ve seen many times on high performance systems is that, while the Linux firewall behaves excellently in many common situations, there are some loads where it can severely degrade performance and even cripple your server.

By default, as soon as you load iptables, stateful connection tracking is enabled. This allows you to build a firewall that will totally protect your computer as simply as:

iptables -A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
iptables -P INPUT DROP

The first command allows any connections that are already established to be able to continue when the responses come in, and the second one says to drop anything else by default. This will let you talk to anyone else on the network, but not let anyone else talk to you. There are obviously much more complex firewalling setups, but unless you explicitly disable connection tracking, it will always be there keeping a list of which connections have been established to or from your computer and what their current states are.

Usually this is not a problem, but under high performance situations or in the event of  an attack, you can sometimes see the dreaded “ip_conntrack: table full, dropping packet” error in the kernel logs. This probably means that someone tried to connect to your server but couldn’t. In other words, it seems to at least some part of the internet that your server has gone offline! The purpose of this is to prevent DoS attacks. Linux limits the number of connections that it tracks so that it doesn’t use all of the system’s memory. You can see what your connection limit is on newer kernels by running:

$ cat /proc/sys/net/nf_conntrack_max
262144

(If you’re having this issue right now, as a temporary workaround, you can write a larger value, perhaps echoing 5000000 into that file to raise the limit. For a longer-term solution, please read on.)

Being able to track 250,000 connections simultaneously might seem like quite a lot, but what exactly is a connection? The answer is relatively easy to define for a stateful protocol like TCP, however in a stateless protocol such as UDP or ICMP, all you can do is say, “We didn’t see a packet in the past X seconds – I guess we don’t have a connection anymore.” How many seconds by default?

$ cat /proc/sys/net/netfilter/nf_conntrack_udp_timeout
30

Now, protocols such as UDP or ICMP are easily spoofed (i.e. the source address can be easily faked). Since UDP has 65,000 ports, this basically means that if you can send roughly 50,000 UDP packets (one per port) from 5 servers or spoofed addresses in a period of less than 30 seconds, you will cause the Linux connection tracking table to overflow and effectively block any other connections for that time. This is pretty easy to do. A UDP packet only needs to be 28 bytes, and 28 * 250,000 = 7Mb, or 60Mbit of data. In other words, on a 100Mbps connection, or from 10 people’s 10Mbps connections, you can send enough data in about 0.5 seconds to take a server offline for 29 seconds. Oops!

It’s not just deliberate attacks that can cause this, either. Many services that simply get lots of connections‚especially DNS servers (because they work over UDP)—can easily hit this limit because they’ve been left with stateful connection tracking enabled by default. Fortunately, this is quite straightforward to disable, and as long as you do it the correct way, there should be no downside to it. Moreover, as you might imagine, tracking a number of connections can take quite a lot of processing power, which is another reason to disable stateful connection tracking. In a simple test of UDP traffic we were able to achieve a 20% performance increase by disabling stateful connection tracking on our firewall.

There are, of course, some situations where you can’t use a stateless firewall. The main scenario in which a stateless firewall won’t work arises when you need to NAT traffic. This can be the case, for example, when you are configuring the server as a router. In this case, the kernel needs to keep track of all the connections flowing through the router. However, for many situations, you can convert your firewall to be stateless with very little hassle. Below, we’ll explain how to do this.

How to Convert a Stateful Firewall to Stateless

Let’s take a slightly more complex example than above for a web server:

# Allow any established connections, dropping everything else
iptables -A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
iptables -A OUTPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
iptables -P INPUT DROP
iptables -P OUTPUT DROP

# Allow remote ssh and http access
iptables -A INPUT -p tcp --dport 22 -j ACCEPT # ssh
iptables -A INPUT -p tcp --dport 80 -j ACCEPT # http

# Allow DNS lookups to be initiated from this server
iptables -A OUTPUT -p udp --dport 53 -j ACCEPT # dns
iptables -A OUTPUT -p tcp --dport 53 -j ACCEPT # dns

Simple enough to understand (hopefully).

The first thing to do is to allow TCP (stateful) connections to keep working as they always have, but without tracking their state. We can do this by changing our “-m state” lines to look something like:

iptables -A INPUT -p tcp \! --syn -j ACCEPT
iptables -A OUTPUT -p tcp \! --syn -j ACCEPT

This means, “If the TCP connection is already established, let it through,” (i.e. it doesn’t have the SYN flag set). This will mean that all TCP connections work exactly as before.

However, because we don’t have a state for UDP connections, we have to flip the rules around. For example, in the above code, we are saying, “Allow outbound connections to port 53,” so we now need to add a rule that also states, “Allow inbound connections from port 53.” This means you will need to add the following rule:

iptables -A INPUT -p udp --sport 53 -j ACCEPT # dns responses

One other thing that recently caught us by surprise is that you need to allow certain types of ICMP signalling traffic:

iptables -A INPUT -p icmp --icmp-type destination-unreachable -j ACCEPT

(Your stateless firewall will work fine without this 99% of the time. However, at Strongarm, we hit an issue on EC2 with our servers when using jumbo frames (packet size 9000) that were trying to communicate with the internet (packet size 1500) over the https protocol. EC2 tried to tell us using ICMP to make our packets smaller (the pmtu protocol), but because we were dropping it all automatically, we didn’t receive those packets, and so we couldn’t speak https with the internet. D’oh!)

Finally, we add in the magic that tells conntrack to not run, using the special “raw” table:

iptables -t raw -I PREROUTING -j NOTRACK
iptables -t raw -I OUTPUT -j NOTRACK

And you’re done. For an extra few lines of firewall code, you can achieve a 20% performance improvement in packet processing speed, lower memory usage, greater resistance to withstand DoS attacks, and much better scalability.

Now let’s put this all together to give you a final script that you can use to build an awesome firewall:

# Stateless firewall!
iptables -t raw -I PREROUTING -j NOTRACK
iptables -t raw -I OUTPUT -j NOTRACK

# Allow any established connections, dropping everything else
iptables -A INPUT -p tcp \! --syn -j ACCEPT
iptables -A OUTPUT -p tcp \! --syn -j ACCEPT
iptables -P INPUT DROP
iptables -P OUTPUT DROP

iptables -A INPUT -p icmp --icmp-type destination-unreachable -j ACCEPT # icmp routing messages

# Allow remote ssh and http access
iptables -A INPUT -p tcp --dport 22 -j ACCEPT # ssh
iptables -A INPUT -p tcp --dport 80 -j ACCEPT # http

# Allow DNS lookups to be initiated from this server
iptables -A OUTPUT -p udp --dport 53 -j ACCEPT # dns
iptables -A OUTPUT -p tcp --dport 53 -j ACCEPT # dns
iptables -A INPUT -p udp --sport 53 -j ACCEPT # dns responses