Spatial Hexagons


Binning is a technique that groups or aggregates data within given intervals. It can be used as a discretisation technique that creates a discrete variable from a continuous one. Age is a typical example: it is often binned into categories such as 20 to 30 or 40 to 50. Binning makes histograms practical, especially when there is a limited amount of data, and it also smooths out variation in the data.
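As a minimal sketch of the age example, base R's cut() can turn a continuous variable into a discrete one. The ages and the ten-year bin width below are invented for illustration.

```r
# Hypothetical ages, invented for illustration
ages <- c(23, 35, 41, 58, 29, 62, 47, 33)

# Ten-year bins; right = FALSE closes each interval on the left, e.g. [20,30)
age_group <- cut(ages, breaks = seq(20, 70, by = 10), right = FALSE)

# Counts per bin: the numbers a histogram would draw
bin_counts <- table(age_group)
print(bin_counts)
```

Wider bins smooth more but hide more detail, so the bin width is a judgment call rather than a fixed rule.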

Hexagonal Map of Canberra's Population Density, 2011.

The map above bins population density data for Canberra and surrounds into 930 equally sized hexagons. The diagram uses 40 different colours (all shades of blue), interpolated at even intervals with R’s ‘colorRampPalette’ function.
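A 40-step blue palette of this kind can be sketched as follows. The two endpoint colours are assumptions, since the text does not give the palette actually used. Note that colorRampPalette() interpolates in RGB space by default; passing space = "Lab" interpolates in CIE Lab, which is usually closer to perceptually even steps.

```r
# Endpoint colours are assumed; the original palette is not stated
cr <- colorRampPalette(c("#F7FBFF", "#08306B"))

# colorRampPalette() returns a function; call it with the number of colours
blues <- cr(40)

# Same endpoints, interpolated in Lab space instead of the default RGB
blues_lab <- colorRampPalette(c("#F7FBFF", "#08306B"), space = "Lab")(40)

length(blues)
```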

Binning Spatial Data

Census data is typically presented already aggregated into various statistical regions. These regions are irregular polygons and their size varies according to the underlying population density.

Using a Regular Tessellation

Rather than use maps with polygons of varying sizes, Fishnet maps tile the map into uniform square cells and then assign a colour to each cell. The advantage of this technique is that perception is driven solely by the colour value rather than by the size or shape of the underlying region.
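The square-cell idea can be sketched in base R by flooring coordinates to a cell index and averaging within each cell. The point data, the column names and the cell size of 0.25 are all hypothetical.

```r
set.seed(1)

# Hypothetical point data: coordinates in the unit square plus a value z
pts <- data.frame(x = runif(200), y = runif(200), z = rexp(200))

# Assumed cell size; 0.25 gives a 4 x 4 grid over the unit square
cell_size <- 0.25

# Integer cell index of each point along each axis
pts$cx <- floor(pts$x / cell_size)
pts$cy <- floor(pts$y / cell_size)

# One row per occupied cell, holding the mean of z in that cell
cell_mean <- aggregate(z ~ cx + cy, data = pts, FUN = mean)
head(cell_mean)
```

Each row of cell_mean would then be drawn as one equally sized square, coloured by its z value.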

The map on the left shows the Canberra region with the underlying hexagon grid.  The image on the right shows the blank hexagon grid.


Benefits of Hexagon Tessellation

Of the three regular tessellations (triangles, squares and hexagons), hexagons have the most sides, which makes them useful for covering an irregular shape such as the map above. The hexagon's more rounded shape (as opposed to triangles and squares) means that the eye is not drawn to sharp corners or to horizontal and vertical lines.

Implementation with the R Hexbin Package

The hexagon diagram above contains 930 equally sized hexagons.  These were constructed from a raster data set of about 1.5 million observations of X, Y and Z vectors.  These contained latitude and longitude information as well as an estimate for the population density at the relevant coordinates.

Default Behaviour for Hexbin

By default, hexbin simply uses the X and Y coordinates and counts how many points fall within each hexagon. This was overridden using the hexTapply() function to calculate the mean population density for each bin. When the hexagons are drawn, the use.count parameter needs to be set to FALSE and the cell.at parameter needs to refer to the vector created by hexTapply(). A code snippet is shown below:

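# How to include a z variable
hbin <- hexbin(df$X, df$Y, xbins = 40, IDs = TRUE)
# Assumed reconstruction of the lines missing from this excerpt:
# mean of Z per hexagon, then the opening of the grid.hexagons() call
zmean <- hexTapply(hbin, df$Z, mean)
grid.hexagons(hbin, use.count = FALSE, cell.at = zmean,
              mincnt = 1, maxcnt = 3000, colramp = cr, colorcut = ci$brks)
# Full code listing on GitHub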
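To make the difference between the default count and a per-bin mean concrete, here is a small self-contained sketch on synthetic data. The data frame df and its X, Y, Z values are invented stand-ins for the raster described above; only hexbin() and hexTapply() come from the hexbin package, and no plotting is attempted.

```r
library(hexbin)

set.seed(42)

# Synthetic stand-in for the raster: coordinates plus a Z value (all invented)
df <- data.frame(X = runif(5000), Y = runif(5000))
df$Z <- 1000 * exp(-8 * ((df$X - 0.5)^2 + (df$Y - 0.5)^2))

# IDs = TRUE records the hexagon of each point, which hexTapply() requires
hbin <- hexbin(df$X, df$Y, xbins = 40, IDs = TRUE)

# Default summary: number of points in each occupied hexagon
head(hbin@count)

# Override: mean of Z in each occupied hexagon, named by cell ID
zmean <- hexTapply(hbin, df$Z, mean)
head(zmean)
```

The resulting vector has one entry per occupied hexagon, which is the shape a cell-attribute argument needs when counts are switched off.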

The function grid.hexagons() also scales the data to between 0 and 1. This scaling was replicated outside the function so that colour values could be calculated appropriately.
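The rescaling and colour lookup can be sketched outside the plotting function as below. This assumes a linear min-max rescale using the mincnt and maxcnt values from the snippet, and equal-width colour breaks; the original breaks (ci$brks) are not shown in the text, and the helper name rescale01 and the sample values are hypothetical.

```r
# Hypothetical helper: linear rescale of z onto [0, 1]
rescale01 <- function(z, mincnt, maxcnt) {
  (z - mincnt) / (maxcnt - mincnt)
}

# Invented per-bin mean densities
zmean <- c(1, 150, 750.25, 3000)
scaled <- rescale01(zmean, mincnt = 1, maxcnt = 3000)

# 41 equal-width breaks give 40 colour classes (an assumption)
brks <- seq(0, 1, length.out = 41)
cols <- colorRampPalette(c("#F7FBFF", "#08306B"))(40)

# Class index of each bin, then its colour
idx <- cut(scaled, breaks = brks, labels = FALSE, include.lowest = TRUE)
bin_cols <- cols[idx]
```

Values outside the range from mincnt to maxcnt would fall outside [0, 1] and get no class here, so they would need clamping or filtering first.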

GitHub Code and Data

The full code is available on GitHub here.