How to generate good-looking geographical heatmaps

For a project recently I needed to produce a geographical heatmap with millions of data points. I first tried using R with OpenStreetMap rendering, but I couldn’t make the heatmap display as flexibly as I wanted. Then, a friend suggested I try using python with the geopandas library. Even then, inspite of some examples in the manual and even a heatmap-based example, I couldn’t make the map look how I wanted with the interfaces that were exposed.

There were a number of issues:

  • My data was weighted, but there is no support for a weighted heatmap – all points are considered equal
  • I want to have a nicer map which has town names on etc. Ideally the cartodb light style
  • My data was heavily weighted to certain locations, so I wanted to run some logarithmic function on the weights generated by the heatmap algorithm to make the data more easily visible
  • The standard heatmap setup assumes there is nothing other than a map outline so by default is not transparent at all
  • I want to include this data in a report so all I want is a single png image output, rather than having the axis lines, various paddings etc on the graph

So, looking through the geopandas code (which mostly relies on seaborn kdeplot) to generate heatmaps I pulled out various bits into my program so that I could have full control over these processes. Here is a short runthrough of the resulting code:

First, we include libraries, read command-line values and set up the area of our map as a geopandas object

import contextily as ctx
import geopandas
import matplotlib.pyplot as plt
import numpy as np
import pandas
import scipy.stats
import seaborn.palettes
import seaborn.utils
import sys

input_file = sys.argv[1]
output_file = sys.argv[2]
langs = sys.argv[3].split(',')
country_codes = sys.argv[4].split(',')

max_tiles = 10

world = geopandas.read_file(geopandas.datasets.get_path('naturalearth_lowres'))

if country_codes[0] == 'list':
    pandas.set_option('display.max_rows', 1000)

# Restrict to specified countries
if country_codes[0] == 'all':
    area = world[world.continent != 'Antarctica']
    area = world[world.iso_a3.isin(country_codes)]

Then, we pull out the bounds both in latlng format and convert the area into Spherical Mercator for web display. We need to plot everything in this so that we can pull in open street map tiles for the background later.

latlng_bounds = area.total_bounds
area = area.to_crs(epsg=3857)
axis = area.total_bounds

# Create the map stretching over the requested area
ax = area.plot(alpha=0)

We now read in the CSV and drop any data that we wouldn’t display:

df = pandas.read_csv(input_file,
    names=['weight', 'lat', 'lng'],
    dtype = {
        'weight': np.int32,
        'lat': np.float64,
        'lng': np.float64,

            ( < axis[1]) | ( > axis[3]) | (df.lng < axis[0]) | (df.lng > axis[2])     # outside bounds of country
    ].index, inplace=True)

NOTE: I tried doing .to_crs() on the CSV data using normal lat/lng representation to convert into Spherical Mercator but it was very slow. As I’m generating the data itself from PostGIS I just run the conversion in that instead. The data I have is basically using the freely available MaxMind geoip database to generate a list of lat/lng/radius. I then pick a random point within that radius to output to the CSV file using a query like:

SELECT ST_AsText(ST_GeneratePoints(ST_Transform(ST_Buffer(location, radius)::geometry, 3857), 1)), ...

Then comes the hard bit – converting these values into the heatmap using an algorithm called KDE. These bits were lifted and simplified from the kdeplot routines mentioned above, but fortunately although not exposed by seaborn the underlying scipy.stats module contains a weighted KDE which is easy to use.

# Calculate the KDE
data = np.c_[df.lng,]
kde = scipy.stats.gaussian_kde(data.T, bw_method="scott", weights=df.weight)
data_std = data.std(axis=0, ddof=1)
bw_x = getattr(kde, "scotts_factor")() * data_std[0]
bw_y = getattr(kde, "scotts_factor")() * data_std[1]
grid_x = grid_y = 100
x_support = seaborn.utils._kde_support(data[:, 0], bw_x, grid_x, 3, (axis[0], axis[2]))
y_support = seaborn.utils._kde_support(data[:, 1], bw_y, grid_y, 3, (axis[1], axis[3]))
xx, yy = np.meshgrid(x_support, y_support)
levels = kde([xx.ravel(), yy.ravel()]).reshape(xx.shape)

The levels output variable is then the weights of all points on the heatmap we are going to output. We can apply a mathematical function to this (in this case take the quintic root) in order to scale the data appropriately, and output:

cset = ax.contourf(xx, yy, levels,
    20, # n_levels

    cmap=seaborn.palettes.blend_palette(('#ffffff10', '#ff0000af'), 6, as_cmap=True),
    antialiased=True,       # avoids lines on the contours to some extent

By generating a palette ourselves we can add an alpha channel to each aspect so that the underlying map will be visible. However there is quite a lot of noise at the lower levels so we want to hide all of them completely:

# Hide lowest N levels
for i in range(0,5):

We then add in the OSM base tiles, which is a bit of a mess as the contextily library seems a bit immature:

def add_basemap(ax, latlng_bounds, axis, url=''):
    prev_ax = ax.axis()
    # TODO: Zoom should surely take output pixel request size into account...
    zoom = ctx.tile._calculate_zoom(*latlng_bounds)
    while ctx.tile.howmany(*latlng_bounds, zoom, ll=True) > max_tiles:      # dont ever try to download loads of tiles
        zoom = zoom - 1
    print("downloading %d tiles with zoom level %d" % (ctx.tile.howmany(*latlng_bounds, zoom, ll=True), zoom))
    basemap, extent = ctx.bounds2img(*axis, zoom=zoom, url=url)
    ax.imshow(basemap, extent=extent, interpolation='bilinear')
    ax.axis(prev_ax)        # restore axis after changing the background

add_basemap(ax, latlng_bounds, axis)

Finally we can tweak some of the output settings and save it as a png file:


plt.savefig(output_file, dpi=200, format='png', bbox_inches='tight', pad_inches=0)