Tracking down Lua JSON decoding issues

I’ve recently been doing quite a bit of Lua scripting for a client wanting some PowerDNS customizations. I’ve actually grown to quite like Lua: even though it’s a very simple language, you can do some quite complex programming with it reasonably straightforwardly. I think it could perhaps be compared to a stripped-down version of Perl, which is also a language that I very much like because of its incredible flexibility.

Anyway, as part of this work we wanted to look up incoming IP addresses in a table of non-overlapping IP address ranges. For high performance I recommended LMDB, as I’ve used it extensively before and I know that, for all its quirks and its tendency to crash if you mishandle any aspect of its API, it is very fast, has low overhead, scales very well across multiple cores, and can do pretty much anything you ask of it.

So basically the problem was “how do we store an IP range as an indexed key in LMDB” (which is just a key-value database where all keys are b-tree indexed). In the future we may want to support IPv6, and we may also want to support IP ranges which cannot be expressed in subnet-mask notation. The solution I came up with was to store the first IP of the range in raw binary format (ie 4 bytes for IPv4, or 16 bytes for IPv6) as the key, and to store the end IP address as part of the value. To see whether a given IP falls within a range, you open a cursor on the table and seek to the position of the IP you are looking up. If you get a direct hit, then it has found the first IP of a range and you know it is valid. If you do not get a direct hit, you seek back to the previous entry (this is a great feature of LMDB and is found in surprisingly few indexed key-value data store APIs, even though it should be very simple to implement). You then take the value of that entry, extract the end IP of the range, and check whether the requested IP lies between the start and end of the range.
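
The production code for this was Lua, but the cursor trick is easier to show than to describe, so here is a minimal sketch of the same lookup using Python’s py-lmdb binding; the database path and the end_ip_from_value() helper are hypothetical stand-ins:

import socket

import lmdb   # the py-lmdb binding

env = lmdb.open("/var/lib/ip-ranges")   # hypothetical database path

def lookup(ip_string):
    key = socket.inet_aton(ip_string)        # 4 raw big-endian bytes for IPv4
    with env.begin() as txn:
        cur = txn.cursor()
        found = cur.set_range(key)           # first key >= our IP, if any
        if found and cur.key() == key:
            pass                             # direct hit: this IP starts a range
        elif found:
            if not cur.prev():               # otherwise step back to the previous range start
                return None
        elif not cur.last():                 # our IP sorts after every stored key
            return None
        start, value = cur.key(), cur.value()
        end = end_ip_from_value(value)       # hypothetical: extract the end IP from the stored value
        return value if start <= key <= end else None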

Because we wanted a very flexible and easily extensible data storage format for the values in this table, we decided to encode it all as JSON. Lua has a number of JSON decoders; lua-cjson seemed pretty quick and easy, and was also available as a pre-built Ubuntu package, so we went with that. As we were storing the key’s IP address in raw binary notation, we figured it would make the code path simplest if we stored the end IP address in the same manner. So we did this, wrote a test suite with some non-public IPv4 addresses (10.xxx and 127.xxx), verified that it was all working correctly, and then launched the code.

A few days later we started getting some complaints from customers that some IP addresses in their network ranges were not being identified correctly. But when we added the exact same details into the test suite with our private IP ranges, it clearly worked fine.

Finally I started trying to use the exact IP addresses that the customers were reporting issues with in the test scripts and discovered that there was actually a problem. Basically, whenever a component of the address was greater than 127 and the code did not go down the direct hit code path (ie the address was part of a subnet larger than a /32 and not the first entry) the decoded end IP address would be incorrect. Very strange! So, our test code which was using ranges like 127.0.0.1-127.1.2.3 worked fine, but an IP range like 1.0.0.0-1.0.129.0 would fail!

Looking more closely at the cjson Lua documentation I saw the line “cjson.decode will deserialise any UTF-8 JSON string into a Lua value or table”. Reading through the C code, I saw that the routines were hard-coded to treat any JSON-escaped \uXXXX value greater than 127 as a Unicode code point to be UTF-8 encoded. This is because Lua uses the platform’s underlying char[] to store strings, so each character in a string is usually only 8 bits; to store wider characters they have to be encoded as a multi-byte sequence, which is what UTF-8 is for. With our encoding we knew that every part of the string would fit into 8 bits, but there was no way to tell the decoder this. Because cjson aims to be a fast module this is all hard-coded, and there was no way that I could see to easily work around the UTF-8 decoding. We tried some other Lua JSON modules but they either had the same problem or were orders of magnitude slower than cjson.
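
Python keeps text and raw bytes separate where Lua does not, but the same effect is easy to reproduce with its json module: a raw byte above 127 that goes through a \uXXXX escape comes back as a two-byte UTF-8 sequence, which is exactly what was corrupting our end addresses. This is just an illustration, not the original Lua code:

import json
import socket

raw = socket.inet_aton("1.0.129.0")           # b'\x01\x00\x81\x00' - 4 bytes
blob = json.dumps(raw.decode("latin-1"))      # '"\u0001\u0000\u0081\u0000"'
decoded = json.loads(blob).encode("utf-8")    # the bytes a byte-oriented decoder ends up storing
print(decoded)                                # b'\x01\x00\xc2\x81\x00' - now 5 bytes
# The 0x81 byte has become the two-byte UTF-8 sequence 0xC2 0x81, so the decoded
# "end IP" no longer lines up with the 4-byte keys in the database.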

Eventually a colleague suggested just hex-encoding the end IP address prior to including it in the JSON data, which was the simplest solution we could find. It should also reduce the storage required for an IP address: assuming around 50% of the bytes end up encoded as a \uXXXX escape sequence in the JSON, an average IPv4 address would take about 14 bytes in the database, whereas hex encoding takes a fixed 8 bytes per IPv4 address.
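
The hex approach also round-trips cleanly through any JSON library; a quick sketch of the idea in Python (the production encoder and decoder were not Python):

import json
import socket

raw = socket.inet_aton("1.0.129.0")
record = json.dumps({"end_ip": raw.hex()})     # the hex form is always 8 characters for IPv4
end_ip = socket.inet_ntoa(bytes.fromhex(json.loads(record)["end_ip"]))
print(end_ip)                                  # 1.0.129.0 - no UTF-8 surprises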

If the encoding program had been written in Perl we could probably have used some of the features of the JSON::XS module (specifically, the utf8 flag) to write characters directly as bytes into the string, which, although perhaps not technically valid JSON, should from my reading of the Lua module have bypassed the UTF-8 handling of escaped values. However, we weren’t using Perl in our encoding routines so this wasn’t possible.

Angular 4 API service with automatic retries and Ionic 3 integration

In the bad old days of the web, you’d submit a form and if there was a problem with your internet connection it would lose the form data and display an error page in the browser. These days you don’t need to worry about this quite so much, but handling errors when sending AJAX form submissions or other API requests is still a tricky area. Fortunately, the way that Angular 4 uses Observables makes retrying requests quite a bit easier.

In the app I was building for a client recently, we wanted the default process flow to be as follows. Any API request should display a spinner (via Ionic 3) and send the request to the server. If we got an application-level error such as a login failure, it should return that error to the caller. If the network connection timed out, it should automatically retry a couple of times. For other errors, such as an internal server (ie API-side) error or no connection at all, it should fail straight away, but then display a popup prompting the user to retry or cancel the request (eg ‘Turn your internet connection on and hit retry’) rather than making them hit a form resubmit button again.

As Observables remember all the data and options they were created with, it’s pretty easy to retry the request, and there are a number of bits of code on the internet for this. However, I couldn’t find any good examples written in a reusable fashion, with the option of prompting the user without forgetting the request. So, here is an example of how you can do this within the Ionic framework; it should work in general for anything based on Observables, especially under Angular 2+. Below I’ll walk through some of the harder parts of this code.

Create the API service (app/api.service.ts) looking like:

import { Injectable } from '@angular/core';
import { Http, Headers, Response } from '@angular/http';

import { Observable } from 'rxjs/Observable';
import { Subject } from 'rxjs/Subject';
import 'rxjs/add/operator/map';
import 'rxjs/add/operator/scan';
import 'rxjs/add/operator/do';
import 'rxjs/add/operator/delay';
import 'rxjs/add/operator/retryWhen';
import 'rxjs/add/operator/finally';
import 'rxjs/add/operator/delayWhen';
import 'rxjs/add/operator/timeout';
import 'rxjs/add/observable/throw';
import 'rxjs/add/operator/onErrorResumeNext';
 
@Injectable()
export class APIService {
  public inprogress_requests : Subject<any> = new Subject();

  public error_handler : (message :string, err :any) => Observable<any> = (message, err) => Observable.throw(err);

  private requests_active :number = 0;

  constructor (private http: Http) {}
 
  request(path: String, data = {}, options :any = {}): Observable<any> {
    if( !options.headers )
        options.headers = new Headers();
    options.headers.append('Content-Type', 'application/json');

    // `config` here is an app-configuration service injected elsewhere (not shown)
    let base_url = this.config.baseApiUrl();

    let timeout = options.timeout || 10000;
    let max_retries = 'retries' in options ? options.retries : 3;

    let url = `${base_url}/api/${path}`;
    let stringified_data = JSON.stringify(data);
    let request = this.http.post(
            url, stringified_data,
            {
                headers: options.headers,
            }
        )

        // Add a timeout, and retry the request up to max_retries times after a
        // short delay (but only if it was a timeout error)
        .timeout(timeout)
        .retryWhen((errors) =>
            errors.scan( ( errorCount, err ) => {
                if( errorCount >= max_retries || err.name != 'TimeoutError' )
                    throw err;
                return errorCount + 1;
            }, 0)
            .delay(500)
        );

    if( options.interceptor )
        request = request.do( options.interceptor );

    request = request.map(this.extractData);

    // User-visible error handler now
    if( options.auto_fail )
        request = request.onErrorResumeNext();  // enable mergeMap etc to keep working
    else
        request = request.retryWhen( (errors) =>
            errors.delayWhen( (error) => {
                let message = this.log_error(error);
                console.error( `URL was: ${url}, request body: ${stringified_data}` );
                return this.error_handler(message, error);
            })
        );

    if( !options.nonblocking ) {
        this.add_blocking_request( options.loading_msg ? { reason: options.loading_msg } : {} );
        request = request.finally( () => this.finish_blocking_request() );
    }

    return request;
  }

  private add_blocking_request(details :any = {}) {
    // Re-issue a request if the details have been updated
    if( this.requests_active++ == 0 || Object.keys(details).length ) {
        details.active = true;
        this.inprogress_requests.next( details );
    }
  }

  private finish_blocking_request() {
    if( --this.requests_active == 0 )
        this.inprogress_requests.next( { active: false } );
  }

  private extractData(res: Response) {
    // Decode errors will be handled automatically by Observable
    let body = res.json();
    return body || {};
  }

  private log_error(error: Response | any) {
    let errMsg: string;
    if (error instanceof Response) {

      // Ignore any decode errors
      let body :any = {};
      try {
        body = error.json() || {};
      } catch(e) {}

      const err = body.error || JSON.stringify(body);
      errMsg = `${error.status} - ${error.statusText || ''} ${err}`;

      // No internet, probably
      if( error.status == 0 ) {
        console.error(errMsg);
        errMsg = 'Your internet connection is offline. Please connect and hit retry';
      }
    } else {
      errMsg = error.message ? error.message : error.toString();
    }
    console.error(errMsg);
    return errMsg;
  }
}

Let’s walk through some potentially confusing bits of this service.

The main request observable is held in the request variable, and we chain further operators onto it (saving the result back into request) according to the options passed in. Initially we just give the request a timeout – set this to several multiples of the maximum time you expect the API to respond in, otherwise you may get multiple resubmissions of the same request if the API gets a bit laggy.

Then, we come to this piece of code:

        .retryWhen((errors) =>
            errors.scan( ( errorCount, err ) => {
                if( errorCount >= max_retries || err.name != 'TimeoutError' )
                    throw err;
                return errorCount + 1;
            }, 0)
            .delay(500)
        );

This keeps a running count of the errors that have occurred; each time the request errors it first checks how many times we have already retried, and that the error was actually a timeout (as opposed to, say, an internal server error). If so, it waits 500ms and retries; otherwise it re-throws the error, which causes the Observable to continue as an error response.

    if( options.auto_fail )
        request = request.onErrorResumeNext();  // enable mergeMap etc to keep working

If the user passes an auto_fail option to the request, we want the request to fail silently (perhaps we are just sending some usage stats to the server and we don’t want errors popping up about them). This makes the Observable complete without raising an error whether or not the request actually succeeded, so that it doesn’t short-circuit anything else downstream.

However, under normal circumstances we want to raise a frontend error:

        request = request.retryWhen( (errors) =>
            errors.delayWhen( (error) => {
                let message = this.log_error(error);
                console.error( `URL was: ${url}, request body: ${stringified_data}` );
                return this.error_handler(message, error);
            })
        );

This code shells out to an external function (the error_handler function reference, which can be set somewhere in the main code that bootstraps the app) with the error, and expects it to return something Observable-like (such as a Subject) that emits if the whole of the above work should be retried, or errors if it should not. This is a bit messy – you could argue for having multiple different instances of the API service depending on whether you want this functionality or not – but because the API is a global service and we want a standard piece of retry code I structured it like this. However, because it needs to interact with the frontend, I set the handler elsewhere, as I’ll show in a bit.

Finally, we want to wrap most requests with some code to display a spinner (optionally with a message), unless it is a non-blocking request:

    if( !options.nonblocking ) {
        this.add_blocking_request( options.loading_msg ? { reason: options.loading_msg } : {} );
        request = request.finally( () => this.finish_blocking_request() );
    }

The add_blocking_request and finish_blocking_request functions emit a message (via the this.inprogress_requests Subject) when the first request becomes active or when the last active request finishes, which avoids the spinner popping off and on again every time a request is retried or a sub-request is triggered.

Finally, in the main app component’s constructor we hook into this service to do the UI-facing work (app/app.component.ts in Ionic – this part is Ionic-specific, but you should be able to replace it with your own framework easily enough). Firstly, the spinner:

    // Loader needs creating each time it is displayed in ionic...
    let loader; 
    api.inprogress_requests.subscribe(
        details => {
            if( loader )
                loader.dismiss();
            loader = null;  
                            
            if( details.active ) {
                let loader_options :any = {};
                if( details.reason )
                    loader_options.content = details.reason;
                loader = loadingCtrl.create(loader_options);
                loader.present();
            }           
        }           
    );

Simple enough – if there is a loader get rid of it, and if there should be one then create it with the message. This enables us to update the displayed message easily enough, although I’ve not really used that functionality much in the code I’ve written.

Finally, let’s look at the dialogs presented to the user to prompt retries. The handler should be simple enough to follow – it presents different dialogs and messages depending on exactly what the error was. Note that we return a Subject, which we effectively use like a Promise to handle the asynchronous nature of the user’s interaction with the dialog:

    // Handle errors with a popup and offer retry functionality
    api.error_handler =
        (message, error) => {
            // Just in case it is the first request..
            this.hide_splashscreen();

            let retry_subject = new Subject();
            let retry;

            // Unauthorized
            if( error.status == 401 ) {
                retry = alertCtrl.create({
                    title: 'Logged Out',
                    message: `You have been logged out and need to log in again`,
                    buttons: [
                        {
                            text: 'OK',
                            handler: () => {
                                retry.dismiss().then( () => this.navCtrl.push('login') );
                                retry_subject.error( error );
                                retry_subject.complete();

                                return false;
                            }
                        },
                    ]
                });
            } else {
                let title = 'Server Error';
                let display_message = `We got an error from the remote server: ${message}. Do you want to retry?`;

                // 400's are nicer errors - not a server code issue but a user input problem most likely
                if( error.status == 400 || error.status == 0 ) {    // 0 = no internet
                    display_message = `${message}. Do you want to retry?`;
                    title = "Error";
                }
                retry = alertCtrl.create({
                    title,
                    message: display_message,
                    buttons: [
                        {
                            text: 'No',
                            handler: () => {
                                retry.dismiss();
                                retry_subject.error( error );
                                retry_subject.complete();
                                return false;
                            }
                        },
                        {
                            text: 'Retry',
                            handler: () => {
                                retry.dismiss();
                                retry_subject.next( 1 );
                                retry_subject.complete();
                                return false;
                            }
                        }
                    ]
                });
            }
            retry.present();
            return retry_subject;
        };

Programming ESP8266 from the CHIP

The CHIP is a powerful $9 computer. I saw them online and ordered 5 of them some time ago as part of a potential home automation project, and because it’s always useful to have some small linux devices around with GPIO ability. I’ve recently been playing a lot with ESP8266 devices (more on this in some future blog posts), and I’ve been using the CHIP to program them via a breadboard and the serial port header connectors (exposed as ttyS0) and esptool.py. So far so good.

However, I want to put the CHIP devices into small boxes around the house and use something like find-lf for internal location tracking, based on the wifi signals emitted from phones and other devices, to figure out who’s in which room. Whilst the CHIP has two wifi devices (wlan0, wlan1), it doesn’t allow one to run in monitor mode while the other is connected to an AP. This means we need an extra wifi card to run in monitor mode, and as I had a number of ESP8266s lying around, I thought I’d write a small program to just print MAC and RSSI (signal strength) via the serial port.
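
On the CHIP side, collecting that data is then just a matter of reading lines from the serial port. Here’s a rough sketch using pyserial; the “MAC RSSI” line format is whatever your ESP8266 firmware prints, so treat the parsing as purely illustrative:

import serial   # pyserial, eg 'pip install pyserial'

# /dev/ttyS2 is the CHIP's second UART described below; the baud rate must
# match whatever the ESP8266 firmware uses.
port = serial.Serial("/dev/ttyS2", 115200, timeout=5)

while True:
    line = port.readline().decode("ascii", errors="replace").strip()
    if not line:
        continue
    try:
        mac, rssi = line.split()     # assumes an "aa:bb:cc:dd:ee:ff -67" line format
    except ValueError:
        continue                     # skip boot messages and partial lines
    print(mac, int(rssi))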

As these devices will be in sealed boxes I don’t want to have to go fiddling around with connectors on a breadboard to update the ESP8266 firmware, so I came up with a minimal design to allow reprogramming the ESP8266 on the fly from the CHIP (it should work on anything with a few GPIO ports). Obviously the ESP8266 does have OTA update functionality, but as these devices will be in monitor mode I can’t use that. As the CHIP works at 3.3v, the same as the ESP8266, this was pretty straightforward, involving six cables and two resistors, though there were a few steps and gotchas to be aware of first.

The main issue preventing this from working is that when the CHIP first boots up, the U-Boot bootloader listens for input for 2 seconds on ttyS0 (the serial port exposed on the header, not the USB one). When power first comes on, the ESP8266 always outputs some bootloader messages via the serial port, which means the CHIP would never boot. Fortunately the processor has a number of UARTs, and a second one can optionally be exposed via the headers. You can read all about the technical details on this thread. In short, to expose the second serial port you need to download this dtb from dropbox and use it to replace /boot/sun5i-r8-chip.dtb. You then need to download this small program to enable the port, and run it on every boot-up. This worked fine for me on the 4.4.13-ntc-mlc kernel. You can then use the pins listed here to connect to the TX/RX of the ESP8266 serial port, and it won’t affect the boot-up of the CHIP.

The other nice thing about using ttyS2 rather than ttyS0 is that hardware flow-control lines (RTS, CTS) are exposed, which I had hoped could be integrated into esptool to handle the reset automatically. Unfortunately it looks like esptool drives different flow-control lines to signal the ESP8266 bootloader mode/reboot, so I had to connect these pins to GPIOs and trigger them from there instead.

After doing this, wire the ESP8266 (I’m using the ESP-12 board, but should be the same for any other boards) to the CHIP in the following manner:

ESP8266 pin    CHIP connector
VCC            3.3v
GND            GND
CH_PD / EN     XIO-P6
GPIO0          XIO-P7, via a resistor (eg 3.3k)
GPIO15         GND, via a resistor (eg 3.3k)
TX             LCD-D3
RX             LCD-D2

Note that on some ESP boards TX/RX are the wrong way round so if you don’t see anything try flipping the cables around.


I then wrote a small program (called restart_esp.py) to trigger different mode reboots of the ESP8266 from the CHIP:

import CHIP_IO.GPIO as GPIO
import time
import sys

# XIO-P6 drives the ESP8266's CH_PD/EN pin, XIO-P7 drives GPIO0 (see wiring above)
pin_reset = "XIO-P6"
pin_gpio0 = "XIO-P7"

def start_bootloader():
        # Hold GPIO0 low while pulsing EN low->high so the chip boots into the serial bootloader
        GPIO.output(pin_gpio0, GPIO.LOW)
        GPIO.output(pin_reset, GPIO.LOW)
        time.sleep(0.1)
        GPIO.output(pin_reset, GPIO.HIGH)

def start_normal():
        # GPIO0 high during the EN pulse gives a normal boot from flash
        GPIO.output(pin_gpio0, GPIO.HIGH)
        GPIO.output(pin_reset, GPIO.LOW)
        time.sleep(0.1)
        GPIO.output(pin_reset, GPIO.HIGH)

GPIO.setup(pin_reset, GPIO.OUT)
GPIO.setup(pin_gpio0, GPIO.OUT)
if sys.argv[1] == 'bootloader':
        print("Bootloader")
        start_bootloader()
else:
        print("Normal start")
        start_normal()

GPIO.cleanup()

Then you can easily flash your ESP8266 from the CHIP using a command like:

python restart_esp.py bootloader; \
esptool.py -p /dev/ttyS2 write_flash --flash_mode dio 0 firmware.bin; \
python restart_esp.py normal

433MHz wireless on the C.H.I.P

I recently got a few C.H.I.P devices to play with – at $9/computer with wireless and bluetooth, how can you go wrong? As part of some messing around with home automation I also bought some wireless ceiling fan controllers, which came with a remote control, and took one apart in order to change it from using a 433MHz controller to being controllable over wifi (to be addressed in another post). Replacing the 433MHz receiver chip with an ESP8266 left me with a remote control and a 4-pin receiver chip to play with, so I thought I’d look to see what codes it was sending.

Unfortunately this was a bit tricky, because only Python GPIO libraries are available on the CHIP and their timing is too imprecise to read the signal reliably. There is also static etc mixed in, so you really need a proper library such as rc-switch to decode the messages. However, the CHIP_IO Python library is backed by some well-organised C code, which I figured could be used easily enough to port rc-switch to the CHIP. The result was a small patch set which can be found at the following links – hopefully they get accepted upstream:

  1. https://github.com/xtacocorex/CHIP_IO/pull/70
  2. https://github.com/sui77/rc-switch/pull/150
  3. https://github.com/ninjablocks/433Utils/pull/38

Unfortunately the CHIP’s XIO-P* ports are too slow to receive the data signal, and only a handful of GPIO ports have event support so I guess you’re stuck with using AP-EINT1, AP-EINT3, PWM1 or I2S-MCLK/I2S-DI to drive this part.

Percent signs in crontab

As this little-known ‘feature’ of cron has now bitten me several times, I thought I should write a note about it, both so I’m more likely to remember it in future and so that other people can learn about it. I remember a few years ago when I was working for Webfusion we had some cronjobs to maintain the databases, and there was an error message that kept popping up which we wanted to remove periodically. We set up a command looking something like:

0 * * * * mysql ... -e 'delete from log where message like "error to remove%"'

but it was not executing. Following on from that, today I had some code to automatically create snapshots of a certain btrfs filesystem (though for serious snapshotting I recommend the excellent, if somewhat tricky to set up, snapper tool):

0 5 * * 0 root /sbin/btrfs subvol snap -r /home/ /home/.snapshots/$(date +%Y-%m-%d)

But it was not executing… Looking at the syslog output we see that cron is running a truncated version of it:

May 14 05:00:02 localhost /USR/SBIN/CRON[8019]: (root) CMD (/sbin/btrfs subvol snap -r /home/ /home/.snapshots/$(date +)

Looking in the crontab manual we see:

Percent-signs (%) in  the  command,  unless  escaped
with backslash (\), will be changed into newline characters,
and all data after the first % will be sent to the command
as standard input.

D’oh. Fortunately the fix is simple:

0 5 * * 0 root /sbin/btrfs subvol snap -r /home/ /home/.snapshots/$(date +\%Y-\%m-\%d)

I’m yet to meet anyone who is using this feature to pipe data into a process run from crontab. I’m also yet to meet even very experienced sysadmins who have noticed this behaviour, which makes it a pretty good interview question for a know-it-all sysadmin candidate!
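
For completeness, the feature does have a legitimate use: everything after the first unescaped % is fed to the command on standard input, with any further % signs becoming newlines. So a (purely illustrative) entry like this would mail a fixed two-line body every Monday morning:

0 9 * * 1 mail -s "Weekly reminder" ops@example.com%Hello,%Please check the backups.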

Making a BTRFS read-only snapshot writable

For the past few years I’ve been using btrfs on most filesystems that I create; whilst it’s pretty slow on rotating disk media, now that most of my hardware is SSD-based there’s not much of a performance penalty (as long as you’re not using quotas to track filesystem usage). The massive advantage is the ability to keep a proper snapshot history (unlike any LVM snapshotting hacks that you may suggest) going back a long time with very little overhead. With a tool like snapper (which admittedly is tricky to get set up) you can automatically rotate your snapshots and easily recover any files that you accidentally changed or deleted. Alongside always using git for code repositories, this has saved my skin repeatedly!

Anyway, by default snapper creates read-only snapshots. But while trying to diagnose some database server file corruption I recently experienced, I wanted to change a btrfs snapshot from read-only to read-write so I could update some files. After spending a while looking around in the manual and on Stack Overflow, I couldn’t see any way to do this with the kernel/toolchain versions I was using.

Then, the solution struck me. Simply create a read-write snapshot of the read-only snapshot and work off that. Sometimes it’s very easy to look at the more complicated way of doing things and forget about some of the easier solutions that there might be!
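
In command form that’s just (the paths are placeholders):

btrfs subvolume snapshot /path/to/readonly-snapshot /path/to/writable-copy

Without the -r flag the new snapshot is created read-write, so you can edit files in the copy while leaving the original read-only snapshot untouched.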

UPDATE: with newer versions of the btrfs tools you can toggle the read-only status of a snapshot by running the following command against the subvolume directory:

btrfs property set -ts /path/to/snapshot ro false

Protecting an Open DNS Resolver

As another piece of work for the excellent Strongarm anti-malware team, we recently converted the service so that it can be used to get instant protection wherever you are. Part of this involved converting the core (customized) DNS server into an open resolver. This is usually strongly advised against, as you can unwittingly become part of some very serious denial-of-service attacks; however, in this post I show how to implement some pretty simple restrictions and rate limits to prevent this from happening, so you can run an open DNS resolver without running that risk.

Here’s a copy of the article:

One of the challenges of running an open DNS resolver is that it can be used in a number of different attacks, compared to a server that is only allowed access from a known set of IPs. One of the most well known is the DNS amplification attack. As this article explains, “The fact that a DNS reply may be many times larger than a DNS query allows the attacker to achieve amplification by spoofing a relatively small query that is known to generate a large answer in response”. That means that if I can send a DNS question that takes 50 bytes, and I send it pretending to be the computer that I want to attack, and the answer to that question is 1000 bytes, then I have effectively multiplied the traffic that I can attack with by 20 times. Especially as DNSSEC (Domain Name System Security Extensions) becomes more common, the RRSIG and DNSKEY record types can contain a lot of data that can be used in this type of attack.

In this post, I’d like to present a couple of ways to easily protect your open DNS resolver from being involved in malware attacks like the DNS amplification attack.

Configuring a DNS Resolver

Many DNS servers, such as PowerDNS, and frontends such as dnsdist, have built-in or user-configurable abilities to limit some types of attacks. In the case of dnsdist, the load balancer sits in front of the DNS servers and monitors the traffic going to and from them in order to blacklist hosts that are abusing the platform.

However, when configuring this within Strongarm’s servers, we wanted the ultimate scalability and flexibility on our DNS infrastructure, so we decided not to use dnsdist but instead use a pure networking approach. Here are a few steps that you can take to protect your DNS infrastructure no matter whether you use a DNS loadbalancer or servers interfacing directly to the internet.

The first step you can take in protecting your server is to ensure that ANY queries cannot be used in an attack. An ANY query returns all the records of a particular domain so naturally it returns more data than a standard query. This is usually easy to configure with an option like ‘any-to-tcp’ in PowerDNS. This setting says that if the recursive server receives an ANY query, it will automatically send back a small redirect: “TCP is required”.
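
With the PowerDNS Recursor, for example, this is a one-line setting in recursor.conf (check the documentation for your version, as names and defaults can change between releases):

# recursor.conf: answer ANY queries with a truncated reply so clients retry over TCP
any-to-tcp=yes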

To understand why this helps prevent attacks we need to understand the following three things.

  1. An ANY query will usually return larger responses as it asks for all records under a particular domain.
  2. 99% of the time, an ANY query is not legitimate traffic. Usually, a host will only want a specific type of record such as A or MX.
  3. Whereas it’s easy to spoof UDP traffic, it’s virtually impossible to spoof TCP. This is because establishing a TCP connection requires a 3-way handshake. For example, if the client says “I’d like to open a connection”, and the server says “Okay, you’d like to open a connection, it’s now open”, then the client says, “Thanks, the connection is now open”. While you can spoof the initiation of the connection, when the server says “Okay, you’d like to open a connection, it’s now open,” the host that has been spoofed will reply “What?! I didn’t ask to open a connection!” and it won’t go any further.

Putting this all together, we can see that this can be a very effective preventative measure for abusing an open DNS resolver. Legitimate clients will fall back to using TCP and attackers will simply give up. We can’t use this for all connections because having to do every DNS lookup over TCP would noticeably slow down internet browsing speed, but we can do this easily enough on connections that have a high probability of being attack traffic.

In a similar vein, another useful option for many DNS servers is the ability to limit the size of a return packet over UDP. Typically, you would configure this to say, “If the return packet is more than X bytes, send a TCP redirect and only allow this over TCP.”
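
Taking the PowerDNS Recursor as an example again, I believe the relevant setting is udp-truncation-threshold, which truncates any UDP response larger than the configured number of bytes so that the client retries over TCP; treat the name and value below as an assumption to check against your version’s documentation:

# recursor.conf: truncate UDP answers larger than this many bytes
udp-truncation-threshold=1232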

Firewall Limiting of Potential Attack Traffic

In addition to doing the above, we implemented a pure firewall-based approach to throttling attack traffic. To do this, we needed to configure our firewall to be stateless, as we described in a previous post.

As opposed to dnsdist or other frontend servers, this allows you to deploy either on a single server or on a frontend router that covers multiple resolvers. It should also be much more efficient, as all processing occurs in-kernel via netfilter rather than having to pass through a userspace program, which may crash or be limited in the speed at which it can process data. As we showed in a previous post, this approach is very efficient at packet processing.

We start by creating an ‘ipset’ of the IPs that we have currently blacklisted. We’ll use the ‘timeout’ option to specify that after an IP has been added to this blacklist it will automatically expire after a certain time. We’ll also limit the set to a maximum of 100,000 IPs so that an attacker cannot use it to take our server offline:

ipset create throttled-ips hash:ip timeout 600 family inet maxelem 100000

Then, if an IP is on this list, we’ll block it from doing any UDP traffic to our server:

iptables -t raw -A PREROUTING -p udp -m set --match-set throttled-ips src -j DROP

Now for the clever part: we’ll look for DNS responses that are over a certain threshold packet size (700 bytes) and start monitoring them to see the rate at which someone is sending them:

iptables -N LARGE_DNS_PACKET_TRACKING # Create the destination chain
iptables -A OUTPUT -p udp --sport 53 \
        -m length --length 700:0xffff \
        -j LARGE_DNS_PACKET_TRACKING

This points to a new iptables chain called “LARGE_DNS_PACKET_TRACKING” which we’ll set up as follows:

iptables -A LARGE_DNS_PACKET_TRACKING -m hashlimit --hashlimit-mode dstip --hashlimit-dstmask 32 \
   --hashlimit-upto 50kb/min --hashlimit-burst 10 --hashlimit-name large-dns-packets --hashlimit-htable-max 100000 \
   -j ACCEPT

This first rule allows up to 50kb of large DNS responses per minute to a single IP (the 32 means a /32, i.e. a single IP address), and always allows the first 10 large response packets through. Again, it tracks, at most, 100,000 IPs in order to avoid an attack vector against our server.

After a host goes over this threshold, we’ll pass the traffic through to the next stage of the chain:

iptables -A LARGE_DNS_PACKET_TRACKING -j SET --add-set throttled-ips dst --timeout 600 --exist

This is where the magic happens. If a client breaches the threshold set above, its IP will be added to the ipset we created earlier, meaning that it will be blocked for 10 minutes. Finally, let’s note this in the system log and then drop the packet:

iptables -A LARGE_DNS_PACKET_TRACKING -j LOG --log-prefix "DNS-amplification protection: "
iptables -A LARGE_DNS_PACKET_TRACKING -j DROP

Conclusions

With the right protection in place, it’s not such a bad thing to run an open DNS resolver on the internet. If you look in your server’s configuration manual, you should find a few options that can also help in preventing attacks. Additionally, we recommend setting up a firewall-based system like I detailed above so that you can limit the amount of traffic you send out. Otherwise, you may easily find your server being disconnected by your ISP for being part of an attack.

Easily extending Cordova’s WebView in your Android app

I’ve recently been working on producing an AngularJS-based financial web app for a client, which will also be packaged and distributed via cordova/phonegap. As we are only targeting relatively new browsers, and as we’re aiming to be mobile-first, I decided to use HTML5 inputs such as number, as these cause the virtual keyboards on iOS and Android to reflect the fact that only numbers can be entered.

This was working fine in Chrome and on various Android phones via the phonegap build, but then we got feedback that on a certain Android 4.x Samsung phone you could only enter numbers and not a decimal point! This was the first time I’d heard of this bug, as the number inputs I’d used before had only ever been integral, but it seems to be a relatively well-known bug on most Samsung Android phones. D’oh.

I searched for quite a while for a plugin or workaround for phonegap, and discovered some code that could be applied to a WebView component to work around the problem, but no instructions for how to override this function in cordova’s WebView subclass. Fortunately it turned out to be relatively simple, and this is also a generic way of customizing the Android WebView in a cordova build such that you can keep rebuilding the app without your changes getting overwritten.

Firstly, create a new Java class under your main package called HackedWebViewEngine as at the bottom of this post. The key line is

        this(new HackedWebView(context), preferences);

which changes phonegap’s engine to use your own subclassed WebView rather than using the default one. You need to tell phonegap to use this customised Engine by placing the following in your config.xml file:

    <platform name="android">
       <preference name="webView" value="com.myapp.HackedWebViewEngine" />
    </platform>

Here’s the full code of the Java class to handle the overriding (as an aside, I hate how many imports Java programs need!)

package ...;

import android.content.Context;
import android.text.InputType;
import android.util.AttributeSet;
import android.view.inputmethod.EditorInfo;
import android.view.inputmethod.InputConnection;

import org.apache.cordova.CordovaPreferences;
import org.apache.cordova.engine.SystemWebView;
import org.apache.cordova.engine.SystemWebViewEngine;

public class HackedWebViewEngine extends SystemWebViewEngine {
    public static class HackedWebView extends SystemWebView {
        public HackedWebView(Context context) {
            super(context);
        }
        public HackedWebView(Context context, AttributeSet attrs) {
            super(context, attrs);
        }

        @Override
        public InputConnection onCreateInputConnection(EditorInfo outAttrs) {
            InputConnection connection = super.onCreateInputConnection(outAttrs);

            // Many Samsung phones don't show decimal points on html number inputs by default.
            if ((outAttrs.inputType & InputType.TYPE_CLASS_NUMBER) == InputType.TYPE_CLASS_NUMBER)
                outAttrs.inputType |= InputType.TYPE_NUMBER_FLAG_DECIMAL;

            return connection;
        }
    }

    /** Used when created via reflection. */
    public HackedWebViewEngine(Context context, CordovaPreferences preferences) {
        this(new HackedWebView(context), preferences);
    }

    public HackedWebViewEngine(SystemWebView webView) {
        super(webView);
    }
    public HackedWebViewEngine(SystemWebView webView, CordovaPreferences preferences) {
        super(webView, preferences);
    }
}

Prompt before opening an external link in AngularJS

On a recent project creating an Angular app that would be both a website and a cordova-packaged app, we had a number of links which opened external websites (terms and conditions, links to some process flows which couldn’t be contained within the app, etc). However, because the branding on some of those sites was very similar to the app itself, some test users were getting confused about whether they were still in the app or had been redirected into a browser.

Because of these issues the client wanted us to add a small popup for some external links, prompting the user to confirm whether they wanted to leave the site/app. Below is a small Angular directive that does this. Usage is like:

<a href="https://..." target=_blank prompt-before-open="Do you want to go to an external web page?">...</a>

app.directive('promptBeforeOpen', function($window) {
    return {
        link: function(scope, elem, attr) {
            // elem is already a jqLite/jQuery wrapper, so no need to re-wrap it
            elem.on('click', function(event) {
                if( !$window.confirm( attr.promptBeforeOpen ) )
                    event.preventDefault();
            });
        }
    };
});

Using ImageMagick to manipulate PNGs stably

This seems to be an issue that has been talked about in a number of places; however, I found it very hard to find the correct solution, which is why I have documented it here.

Often as part of the build process for a webapp you’ll want to take original images and shrink them down to the correct dimensions (either because they require certain dimensions to be accepted, such as icons, or because you want to save space by stripping out unnecessary data). For JPGs you can do this pretty easily, like so:

convert orig.jpg -strip -resize 500x500 build/out.jpg

The -strip removes any EXIF header information, both anonymizing the image and potentially saving a few KB of asset size.

This process is ‘stable’ because if you repeat it (within the same version of ImageMagick), the resulting file’s data will be identical. This means that you won’t get a new version of the built image in your (git) repository each time you run this command.

However, recently when trying to do the same for PNGs (because I required transparency) I noticed that each time they were built, git was committing a new version into the repository. This is bad news because it both grows the size of the repository by storing pointless identical versions of the file, and makes it a lot harder to track through history to see what changed, because you have loads of PNG images being committed every time you do a build.

Looking at the output of identify -verbose I could see that the part that was changing each time was below:

  Properties:
    date:create: 2016-12-01T19:24:13+03:00
    date:modify: 2016-12-01T19:24:13+03:00
    png:tIME: 2016-12-01T16:24:13Z

So it appears that the PNG format stores the update/create time in the image’s header itself; that was what was changing each time.

Searching on the internet I found a number of suggestions about how to strip these out with the convert command, and while the header changed a bit, I couldn’t find any suggestion that also removed the ‘png:tIME’ element. Finally I came up with the following flags, which convert the image stably:

convert orig.png -strip -define png:exclude-chunks=date,time build/out.png

The identify command still outputs the date: property sections but these are now being taken from the create time (ctime) and modify time (mtime) of the file itself rather than from the header and so are not stored in version control.
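
A quick way to convince yourself (or a CI job) that a conversion really is stable is to run it twice and compare checksums. Here is a small sketch, assuming ImageMagick’s convert is on the PATH and orig.png and the build/ directory exist:

import hashlib
import subprocess

CMD = ["convert", "orig.png", "-strip",
       "-define", "png:exclude-chunks=date,time", "build/out.png"]

def build_and_hash():
    subprocess.run(CMD, check=True)
    with open("build/out.png", "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()

# A stable conversion produces byte-identical output on every run.
print("stable" if build_and_hash() == build_and_hash() else "unstable")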

You might be wondering why I don’t just create a lazy build system that only updates the asset if the modification time of the source asset is newer than that of the built asset – if I were doing this on a bigger project that would be the best approach, but as this was just a small project I wanted to finish quickly, this seemed the easiest way!