Category Archives: AWS

Protecting an Open DNS Resolver

As another piece of work for the excellent Strongarm anti-malware team, we recently converted the service so that it can be used to get instant protection wherever you are. Part of this involved converting the core (customized) DNS server into an open resolver. This is usually strongly advised against, as you can unwittingly become part of some very serious Denial of Service attacks. In this blog post, however, I show you how to implement some pretty simple restrictions and limitations that prevent this, so you can run an open DNS resolver without running that risk.

Here’s a copy of the article:

One of the challenges of running an open DNS resolver is that it can be used in a number of different attacks, compared to a server that only allows access from a known set of IPs. One of the most well known is the DNS amplification attack. As this article explains, “The fact that a DNS reply may be many times larger than a DNS query allows the attacker to achieve amplification by spoofing a relatively small query that is known to generate a large answer in response”. That means that if I can send a DNS question that takes 50 bytes, and I send it pretending to be the computer that I want to attack, and the answer to that question is 1000 bytes, then I have effectively multiplied the traffic that I can attack with by 20 times. Especially as DNSSEC (Domain Name System Security Extensions) becomes more common, the RRSIG and DNSKEY record types can contain a lot of data, making them particularly useful in this type of attack.

In this post, I’d like to present a couple of ways to easily protect your open DNS resolver from being involved in malware attacks like the DNS amplification attack.

Configuring a DNS Resolver

Many DNS servers, such as PowerDNS, and frontends, such as dnsdist, have built-in or user-configurable protections against some types of attacks. In the case of dnsdist, the load balancer sits in front of the DNS servers and monitors the traffic going to and from them in order to blacklist hosts that are abusing the platform.

However, when configuring this within Strongarm’s servers, we wanted maximum scalability and flexibility in our DNS infrastructure, so we decided not to use dnsdist but instead take a pure networking approach. Here are a few steps you can take to protect your DNS infrastructure, whether you use a DNS load balancer or servers interfacing directly with the internet.

The first step you can take in protecting your server is to ensure that ANY queries cannot be used in an attack. An ANY query returns all the records for a particular domain, so it naturally returns more data than a standard query. This is usually easy to restrict with an option like ‘any-to-tcp’ in PowerDNS, which tells the recursive server that, when it receives an ANY query over UDP, it should automatically send back a small truncated reply meaning “retry over TCP”.
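
With the PowerDNS Recursor, for example, this is a one-line setting in recursor.conf:

# recursor.conf: answer ANY queries over UDP with a truncated
# response, referring the client to TCP
any-to-tcp=yes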

To understand why this helps prevent attacks we need to understand the following three things.

  1. An ANY query will usually return larger responses as it asks for all records under a particular domain.
  2. 99% of the time, an ANY query is not legitimate traffic. Usually, a host will only want a specific type of record such as A or MX.
  3. Whereas it’s easy to spoof UDP traffic, it’s virtually impossible to spoof TCP, because establishing a TCP connection requires a 3-way handshake: the client says, “I’d like to open a connection”; the server replies, “Okay, you’d like to open a connection, it’s now open”; and the client confirms, “Thanks, the connection is now open”. An attacker can spoof the initial packet, but when the server sends its reply, the host whose address was spoofed will respond “What?! I didn’t ask to open a connection!” and the handshake goes no further.

Putting this all together, we can see that this is a very effective measure against abuse of an open DNS resolver: legitimate clients fall back to using TCP, and attackers simply give up. We can’t require TCP for all connections, because doing every DNS lookup over TCP would noticeably slow down internet browsing, but we can easily require it for connections that have a high probability of being attack traffic.
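
You can check this behaviour with dig (replace your-resolver with your server’s address; +ignore tells dig to show the truncated UDP reply rather than silently retrying over TCP):

# UDP ANY query; the reply should come back with the tc flag set
dig @your-resolver example.com ANY +ignore

# Without +ignore, dig notices the truncation and retries over TCP,
# returning the full answer
dig @your-resolver example.com ANY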

In a similar vein, another useful option in many DNS servers is the ability to limit the size of a response packet sent over UDP. Typically, you would configure this to say, “If the response is more than X bytes, send a truncated reply so that the client retries over TCP.”
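
In the PowerDNS Recursor, for instance, this is the udp-truncation-threshold setting; the value below is only an illustration:

# recursor.conf: UDP answers larger than this many bytes are truncated,
# pushing the client to retry over TCP
udp-truncation-threshold=1232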

Firewall Limiting of Potential Attack Traffic

In addition to the above, we implemented a pure firewall-based approach to throttling attack traffic. To do this, we needed to configure our firewall to be stateless, as described in a previous post.

Unlike dnsdist or other frontend servers, this approach can be deployed either on a single server or on a frontend router covering multiple resolvers. It should also be much more efficient, as all processing happens in-kernel via netfilter rather than passing through a userspace program that may crash or limit the rate at which data can be processed. As we showed in a previous post, netfilter is very efficient at packet processing.

We start by creating an ‘ipset’ of the IPs that we have currently blacklisted. We’ll use the ‘timeout’ option so that after an IP is added to this blacklist, it automatically expires after the given time. We’ll also limit the set to a maximum of 100,000 IPs so that an attacker cannot use the blacklist itself to take our server offline:

ipset create throttled-ips hash:ip timeout 600 family inet maxelem 100000

Then, if an IP is on this list, we’ll block it from doing any UDP traffic to our server:

iptables -t raw -A PREROUTING -p udp -m set --match-set throttled-ips src -j DROP

Now for the clever part: we’ll look for DNS responses over a certain threshold packet size (700 bytes) and start monitoring the rate at which we’re sending them to any single destination:

iptables -N LARGE_DNS_PACKET_TRACKING # Create the destination chain
iptables -A OUTPUT -p udp --sport 53 \
        -m length --length 700:0xffff \
        -j LARGE_DNS_PACKET_TRACKING

This points to a new iptables chain called “LARGE_DNS_PACKET_TRACKING” which we’ll set up as follows:

iptables -A LARGE_DNS_PACKET_TRACKING -m hashlimit --hashlimit-mode dstip --hashlimit-dstmask 32 \
   --hashlimit-upto 50kb/min --hashlimit-burst 10 --hashlimit-name large-dns-packets --hashlimit-htable-max 100000 \
   -j ACCEPT

This first rule allows up to 50kb of large DNS responses per minute to a single IP (the 32 means a /32, i.e. an individual IP address), and always lets an initial burst of 10 large response packets through. Again, it tracks at most 100,000 IPs in order to avoid creating an attack vector against our server.

After a host goes over this threshold, its packets fall through to the next rule in the chain:

iptables -A LARGE_DNS_PACKET_TRACKING -j SET --add-set throttled-ips dst --timeout 600 --exist

This is where the magic happens. If the rate of large responses to a client breaches the threshold set above, this rule adds that client’s IP to the ipset we created earlier, meaning it will be blocked for 10 minutes. Finally, let’s note this in the system log and then drop the packet:

iptables -A LARGE_DNS_PACKET_TRACKING -j LOG --log-prefix "DNS-amplification protection: "
iptables -A LARGE_DNS_PACKET_TRACKING -j DROP

Conclusions

With the right protection in place, it’s not such a bad thing to run an open DNS resolver on the internet. If you look in your server’s configuration manual, you should find a few options that can also help prevent attacks. Additionally, we recommend setting up a firewall-based system like the one detailed above so that you can limit the amount of attack traffic you send out. Otherwise, you may easily find your server being disconnected by your ISP for taking part in an attack.

Testing the Performance of the Linux Firewall

Over on the Strongarm blog I’ve got an in-depth article about testing the performance of the Linux firewall. We knew it was fast, but how fast? The answer is perhaps surprisingly “significantly faster than the kernel’s own packet handling” – blocking packets to an unbound UDP port was 2.5× faster than accepting them, and in that case a single-core EC2 server managed to process almost 300k PPS. We also tested the performance of blocking DDoS attacks using ipset and/or string matching.

I’ve archived the article below:

As we mentioned in a previous post, at Strongarm, we love the Linux firewall. While a lot of effort has gone into making the firewall highly flexible and suitable for nearly every use case, there is very little information out there about the actual performance of the firewall. There are many studies on the performance of the Linux network stack, but they don’t get to the heart of the performance of the firewall component itself.

With the core part of the Strongarm product focused on making highly reliable DNS infrastructure, we thought we’d take a look at the Linux firewall, especially as it relates to UDP performance.

Preparing to Test

To begin, we wrote a small C program that sends 128-byte UDP packets very efficiently to the loopback device on a single source/destination port. We tested this with one of the latest Ubuntu 16.04 4.4-series kernels on a small single-core EC2 instance. We ran each test multiple times over a period of 30 seconds to ensure that the CPU was pegged at 100% throughout, verifying that we were CPU-bound. We then averaged the performance results to present the values shown here, although, because this was all software-based on the same box, there was very low deviation from the average between test runs.

Here are the results we found and some potential conclusions you can draw from them to optimize your own firewall.

NOTRACK Performance

First, we ran the packet generator with no firewall rules installed but with the connection tracking modules loaded. We saw a rate of 87k PPS (Packets Per Second). Then, we added the following rules to disable connection tracking on the UDP packets (i.e. converting our firewall to be stateless):

iptables -t raw -I PREROUTING -j NOTRACK
iptables -t raw -I OUTPUT -j NOTRACK

With this, we saw the firewall perform at 106k PPS, or a 20% speedup, as I first reported in a previous blog post. We left connection tracking disabled for all subsequent tests.

Conclusion: Where possible, use a stateless firewall.

Maximum Possible Performance

We are primarily interested in the performance of blocking UDP based attacks, so we focused on testing various DROP rules with connection state tracking disabled.

First, to get a baseline for packet dropping, we wrote a simple rule to drop packets and put it as the first entry at the very first point of the iptables processing stack, the PREROUTING chain of the “raw” table:

iptables -t raw -I PREROUTING -p udp -i lo -j DROP

This produced a baseline performance of 280k PPS for the firewall when dropping packets! That’s roughly 2.6 times the 106k PPS we achieved when simply accepting the packets with a blank firewall in place. This shows that by putting blocks earlier in the process, you can bypass kernel work, resulting in a significant speedup.

We then did the same drop, but on the default ‘filter’ table:

iptables -I INPUT -p udp -i lo -j DROP

This produced a performance of 250k PPS, or roughly a 12% performance drop.

Conclusion: Drop unexpected packets in your firewall rather than letting them travel further down the network stack. It matters less whether you put the rules in the ‘raw’ table or the default ‘filter’ table, but raw does give slightly better performance.

When we redid this with the ‘raw’ table and inserted 5 non-matching rules, the performance dropped to 250k PPS, or 12% less compared to the baseline.
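
As a rough sketch of that test, the filler rules looked something like this (the ports are arbitrary, hypothetical values; anything that never matches the generator’s traffic will do):

# Assuming the NOTRACK rules are already in place, append five rules
# that never match the test traffic, then the real DROP as rule six
for port in 40001 40002 40003 40004 40005; do
    iptables -t raw -A PREROUTING -p udp --dport $port -j DROP
done
iptables -t raw -A PREROUTING -p udp -i lo -j DROP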

Additionally, we concluded from this that you should try to put your most common rules towards the top of your firewall, as position does make a difference. However, it is not too significant.

ipset Performance

If you’re under attack and you insert a rule in the firewall for every single IP address you wish to block, these rules are processed one-at-a-time for each packet. So instead of doing this, you can use the ipset command to create a lookup table of IPs. You can specify how these are to be stored, but if you use the most efficient option (a hash) you can make operations on sets of IPs scale very well. At least, this is the theory, so let’s see how it performs in practice:

ipset create blacklist hash:ip counters family inet maxelem 2000000
iptables -t raw -A PREROUTING -m set --match-set blacklist src -j DROP

First, we just added in the loopback’s IP address so it would behave the same as the baseline. This produced a result of 255k PPS, or 10% slower than the baseline of blocking all traffic on this interface. Then, we added another 65k entries to the blacklist using the following small script:

perl -E 'for $a (0..255){ for $b (0..255) { say "add blacklist 1.1.$a.$b" } }' |ipset restore

The result was again 255k PPS. We added another 200k entries and the performance stayed the same.

Conclusion: Using ipsets to perform operations on groups of IPs is very efficient and scales well, especially for blacklisting groups of IP addresses.

String Matching Performance

One of the more advanced abilities of the Linux firewall is filtering packets based on their contents (packet inspection). I have often seen DNS DDoS attacks performed against a particular domain. In such cases, matching that domain’s string and blocking all packets containing it is one of the easiest ways to stop an attack in its tracks. But I assumed that performing a string search on every packet would be pretty inefficient:

iptables -t raw -I PREROUTING -p udp -m string --algo kmp --hex-string 'foo.com' -j DROP

(Note that this only matches our fake test traffic. Real DNS packets encode domain names with length bytes rather than dot separators.)

The string matching module offers two algorithms, kmp and bm. With the UDP generator sending 128-byte packets, bm handled 255k PPS (10% below baseline) and kmp handled 235k PPS (17% below).
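
The bm variant is the same rule with only the algorithm flag swapped:

iptables -t raw -I PREROUTING -p udp -m string --algo bm --hex-string 'foo.com' -j DROP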

Conclusion: String-based packet filtering was much more efficient than we expected and a great way to stop DDoS attacks where a common string can be found. You’ll want to use the ‘bm’ algorithm for best performance.

Hashlimit Performance

Finally, we tested the performance of hashlimit, which is a great first layer of defense against DDoS attacks. The rule below says, “Limit each /16 block on the internet (65,536 addresses) to sending only 1000 PPS of UDP traffic”:

iptables -t raw -A PREROUTING -p udp -m hashlimit --hashlimit-mode srcip --hashlimit-srcmask 16 --hashlimit-above 1000/sec --hashlimit-name drop-udp-bursts -j DROP

Bearing in mind that our server only handles 106k PPS of traffic that the firewall accepts, we would expect a small performance drop simply because we’re now letting 1k PPS through, so we’ll use a baseline of, say, 275k PPS. With this rule in place, we saw the server process 225k PPS (1k PPS accepted, the rest dropped), about 20% lower performance, in exchange for quite a bit of protection against a DDoS attack.

Conclusion: Consider using the hashlimit module to do basic automatic anti-DDoS protection on your servers. The 20% overhead is generally worth it.

Summary of Results

Method                                   Performance (PPS)   Cost (%)
Accepting all packets
  Connection tracking disabled           106k                -
  Connection tracking enabled            87k                 20%
Rejecting all packets
  Dropping at very start of raw table    280k                -
  Dropping after 5 rules                 250k                12%
  Dropping in normal (filter) table      250k                12%
  ipset (1-250k entries)                 255k                10%
  String matching: bm                    255k                10%
  String matching: kmp                   235k                17%
  Hashlimit (1k PPS accepted)            225k                20%

Drawing Final Conclusions

Linux iptables is incredibly flexible and, for the most part, very scalable. Even on one of the smallest single-core virtual machines in EC2, the Linux firewall can filter over 250k PPS. In fact, if you put rules right at the entry of the netfilter stack, you can reject packets over twice as fast as the kernel itself can process them. Even the modules with greater algorithmic complexity, such as hashlimit and string matching, performed much better than we expected.

Linux Stateless Firewalling for High Performance

I’m currently doing a fun bit of consulting on high performance Linux with a great company called Strongarm. I’ve written a post on their blog about how we went about adapting a standard Linux firewall to make it much more efficient and more resilient to DDoS attacks. In short: remove the connection tracking modules and do what little tracking you need yourself – but watch out for hidden traps, especially on the AWS EC2 platform because it uses jumbo frames!

I’ve archived the content below:

When we were building Strongarm, we came across an interesting challenge that we hadn’t seen addressed before: how to make a Linux stateless firewall that guarantees performance and resilience. Below, I’ll explain how we went about exploring and eventually solving this problem and offer some specific tips you can apply if you are trying to achieve something similar.

Stateful Connection Tracking and Its Issues

I love Linux, and in particular its firewalling capability. The excellent iptables utility has so many extensions and features it’s hard to keep track of everything that it can do. However, one of the issues I’ve seen many times on high performance systems is that, while the Linux firewall behaves excellently in many common situations, there are some loads where it can severely degrade performance and even cripple your server.

By default, as soon as you load iptables, stateful connection tracking is enabled. This allows you to build a firewall that will totally protect your computer as simply as:

iptables -A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
iptables -P INPUT DROP

The first command allows any connections that are already established to be able to continue when the responses come in, and the second one says to drop anything else by default. This will let you talk to anyone else on the network, but not let anyone else talk to you. There are obviously much more complex firewalling setups, but unless you explicitly disable connection tracking, it will always be there keeping a list of which connections have been established to or from your computer and what their current states are.

Usually this is not a problem, but under high load or in the event of an attack, you can sometimes see the dreaded “ip_conntrack: table full, dropping packet” error in the kernel logs. This probably means that someone tried to connect to your server but couldn’t, so to at least some part of the internet, your server appears to have gone offline! The limit exists to prevent DoS attacks: Linux caps the number of connections it tracks so that the table doesn’t consume all of the system’s memory. You can see what your connection limit is on newer kernels by running:

$ cat /proc/sys/net/nf_conntrack_max
262144

(If you’re having this issue right now, a temporary workaround is to write a larger value into that file, as shown below. For a longer-term solution, please read on.)
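
# Temporary workaround: raise the conntrack table limit
# (this change is lost on reboot)
echo 5000000 > /proc/sys/net/nf_conntrack_max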

Being able to track around 260,000 connections simultaneously might seem like quite a lot, but what exactly is a connection? For a stateful protocol like TCP, the answer is relatively easy to define; for a stateless protocol such as UDP or ICMP, all the kernel can do is say, “We haven’t seen a packet in the past X seconds – I guess we don’t have a connection anymore.” How many seconds, by default?

$ cat /proc/sys/net/netfilter/nf_conntrack_udp_timeout
30

Now, protocols such as UDP or ICMP are easily spoofed (i.e. the source address can be easily faked). Since UDP has roughly 65,000 ports, if you can send about 50,000 UDP packets (one per port) from each of 5 servers or spoofed addresses within 30 seconds, you will overflow the Linux connection tracking table and effectively block any other connections for that time. This is pretty easy to do: a UDP packet can be as small as 28 bytes, and 28 bytes × 250,000 packets is only about 7MB, or 56Mbit, of data. In other words, on a 100Mbps connection, or from ten people’s 10Mbps connections, you can send enough data in just over half a second to take a server offline for the next 30 seconds. Oops!

It’s not just deliberate attacks that can cause this, either. Many services that simply handle lots of connections (especially DNS servers, because they work over UDP) can easily hit this limit because they’ve been left with stateful connection tracking enabled by default. Fortunately, this is quite straightforward to disable, and as long as you do it the correct way, there should be no downside. Moreover, as you might imagine, tracking that many connections takes quite a lot of processing power, which is another reason to disable stateful connection tracking. In a simple test of UDP traffic, we achieved a 20% performance increase by disabling stateful connection tracking on our firewall.

There are, of course, some situations where you can’t use a stateless firewall. The main one is when you need to NAT traffic, for example when the server is acting as a router, because the kernel must then keep track of every connection flowing through it. For many other setups, though, you can convert your firewall to be stateless with very little hassle, as we’ll explain below.
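
For instance, a typical masquerading rule like the following relies on conntrack, so it can’t be combined with the NOTRACK approach described below (eth0 here is just an assumed external interface):

# NAT all outbound traffic through eth0; requires connection tracking
iptables -t nat -A POSTROUTING -o eth0 -j MASQUERADE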

How to Convert a Stateful Firewall to Stateless

Let’s take a slightly more complex example than above for a web server:

# Allow any established connections, dropping everything else
iptables -A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
iptables -A OUTPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
iptables -P INPUT DROP
iptables -P OUTPUT DROP

# Allow remote ssh and http access
iptables -A INPUT -p tcp --dport 22 -j ACCEPT # ssh
iptables -A INPUT -p tcp --dport 80 -j ACCEPT # http

# Allow DNS lookups to be initiated from this server
iptables -A OUTPUT -p udp --dport 53 -j ACCEPT # dns
iptables -A OUTPUT -p tcp --dport 53 -j ACCEPT # dns

Simple enough to understand (hopefully).

The first thing to do is to allow TCP (stateful) connections to keep working as they always have, but without tracking their state. We can do this by changing our “-m state” lines to look something like:

iptables -A INPUT -p tcp \! --syn -j ACCEPT
iptables -A OUTPUT -p tcp \! --syn -j ACCEPT

This means, “If a TCP packet is not a new connection request (i.e. it doesn’t have just the SYN flag set), let it through.” All established TCP connections will therefore keep working exactly as before.
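
For reference, --syn is shorthand for a specific flag combination, so the ACCEPT rule above could equally be written as:

# Match any TCP packet that is not a bare connection request
iptables -A INPUT -p tcp ! --tcp-flags SYN,RST,ACK,FIN SYN -j ACCEPT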

However, because we don’t have a state for UDP connections, we have to flip the rules around. For example, in the above code, we are saying, “Allow outbound connections to port 53,” so we now need to add a rule that also states, “Allow inbound connections from port 53.” This means you will need to add the following rule:

iptables -A INPUT -p udp --sport 53 -j ACCEPT # dns responses

One other thing that recently caught us by surprise is that you need to allow certain types of ICMP signalling traffic:

iptables -A INPUT -p icmp --icmp-type destination-unreachable -j ACCEPT

(Your stateless firewall will work fine without this 99% of the time. However, at Strongarm we hit an issue on EC2 when our servers, which used jumbo frames (MTU 9000), tried to talk https to the internet (MTU 1500). EC2 tried to tell us via ICMP to make our packets smaller (path MTU discovery), but because we were dropping all ICMP automatically, we never received those messages, and so we couldn’t speak https with the internet. D’oh!)

Finally, we add in the magic that tells conntrack to not run, using the special “raw” table:

iptables -t raw -I PREROUTING -j NOTRACK
iptables -t raw -I OUTPUT -j NOTRACK

And you’re done. For an extra few lines of firewall code, you get a 20% improvement in packet processing speed, lower memory usage, greater resistance to DoS attacks, and much better scalability.

Now let’s put this all together to give you a final script that you can use to build an awesome firewall:

# Stateless firewall!
iptables -t raw -I PREROUTING -j NOTRACK
iptables -t raw -I OUTPUT -j NOTRACK

# Allow any established connections, dropping everything else
iptables -A INPUT -p tcp \! --syn -j ACCEPT
iptables -A OUTPUT -p tcp \! --syn -j ACCEPT
iptables -P INPUT DROP
iptables -P OUTPUT DROP

iptables -A INPUT -p icmp --icmp-type destination-unreachable -j ACCEPT # icmp routing messages

# Allow remote ssh and http access
iptables -A INPUT -p tcp --dport 22 -j ACCEPT # ssh
iptables -A INPUT -p tcp --dport 80 -j ACCEPT # http

# Allow DNS lookups to be initiated from this server
iptables -A OUTPUT -p udp --dport 53 -j ACCEPT # dns
iptables -A OUTPUT -p tcp --dport 53 -j ACCEPT # dns
iptables -A INPUT -p udp --sport 53 -j ACCEPT # dns responses
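
One final note: rules added with iptables live only in kernel memory. To keep them across reboots you need something distribution-specific; on Debian/Ubuntu, for example, the iptables-persistent package restores a ruleset saved like this at boot:

# Save the current ruleset where iptables-persistent expects to find it
iptables-save > /etc/iptables/rules.v4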

Automounting swap on local SSDs on Amazon EC2

Many instance types on EC2 (AWS) now come with local SSDs attached. The excellent Ubuntu 14.04 image boots brilliantly on these and automatically formats and mounts any local SSD storage. However, because the contents of these SSDs are lost when the instance stops or gets migrated, you still need to use persistent EBS storage for most purposes.

If you want to enable swap on the box, add the following to /etc/rc.local – it will create and enable a 2GB swap file on the local SSD at each boot:

# Create a 2GB file of zeroes on the local SSD
dd if=/dev/zero of=/mnt/swapfile bs=1M count=2048
# Swap files should only be accessible by root
chmod 600 /mnt/swapfile
# Format the file as swap and enable it
mkswap /mnt/swapfile
swapon /mnt/swapfile
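
After the next boot, you can verify that the swap is active:

swapon -s    # lists active swap files/devices
free -m      # the Swap line should show around 2048 MB total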

I’ve not yet figured out what process formats and mounts these local disks at bootup; it may well be easier to hook the swap setup into that instead.