Binning is a technique that groups data values into intervals (bins) and aggregates them within each interval. It can be used for discretisation, creating a discrete variable from a continuous one. Age, for example, is often binned into categories such as 20–30, 30–40 and 40–50. Binning is also the basis of a histogram, where observations are counted per bin, and this is especially useful when only a limited amount of data is available. Binning also smooths out small-scale variation in the data.
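As a minimal illustration in R, the language used later in this post, the ages below are hypothetical values binned into ten-year intervals with base R's cut(). table() then gives the per-bin counts that a histogram would draw.

```r
# Hypothetical ages, used only for illustration
ages <- c(21, 25, 29, 34, 38, 41, 47, 52, 58, 63)

# Discretise the continuous ages into 10-year bins: [20,30), [30,40), ...
age_bins <- cut(ages, breaks = seq(20, 70, by = 10), right = FALSE)

# Counting observations per bin is exactly what a histogram displays
counts <- table(age_bins)
print(counts)
```

Setting right = FALSE makes each bin closed on the left, so an age of exactly 30 falls into the 30–40 bin rather than the 20–30 one.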
Binning Spatial Data
Census data is typically published already aggregated into statistical regions. These regions are irregular polygons whose size varies with the underlying population density: densely populated areas are split into small regions, while sparsely populated areas are covered by large ones.
Using a Regular Tessellation
Rather than using polygons of varying size, fishnet maps tile the map area with a grid of uniform square cells and assign a colour to each cell. The advantage of this technique is that the reader's perception is driven by the colour value alone rather than by the size and shape of the underlying region.
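A square fishnet grid can be sketched in base R by integer-dividing each coordinate by the cell width to get a column and row index, then aggregating one value per cell. The points, the z values and the 2.5-unit cell size below are illustrative assumptions; the post itself does not describe how its fishnet maps were built.

```r
# Hypothetical point data: x/y coordinates with a value z at each point
set.seed(1)
pts <- data.frame(x = runif(1000, 0, 10), y = runif(1000, 0, 10))
pts$z <- pts$x + pts$y  # illustrative value, not real data

cell_size <- 2.5  # assumed cell width, in the same units as x and y

# Assign each point to a square cell by integer-dividing its coordinates
pts$col <- floor(pts$x / cell_size)
pts$row <- floor(pts$y / cell_size)

# One value per cell (here the mean of z), which is what gets coloured
fishnet <- aggregate(z ~ col + row, data = pts, FUN = mean)
head(fishnet)
```

With a 10 by 10 extent and 2.5-unit cells this gives a 4 by 4 grid; each row of fishnet is one cell and its mean value.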
Benefits of Hexagon Tessellation
Only three regular polygons can tile the plane on their own: equilateral triangles, squares and hexagons. Of these, hexagons have the most sides, which makes them the closest to a circle and well suited to covering an irregular shape such as the map above. Their more rounded shape, compared with triangles and squares, also means that the eye is not drawn to sharp corners or to long horizontal and vertical lines.
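The "more rounded" point can be quantified: for the same area, a shape closer to a circle has a shorter perimeter. The sketch below, an illustration added here rather than part of the original analysis, uses the area formula for a regular n-sided polygon to compare the three tiling shapes at unit area.

```r
# Perimeter of a regular n-sided polygon with a given area.
# From A = n * s^2 / (4 * tan(pi / n)), the side length is
# s = sqrt(4 * A * tan(pi / n) / n) and the perimeter is n * s.
regular_perimeter <- function(n, area = 1) {
  side <- sqrt(4 * area * tan(pi / n) / n)
  n * side
}

# The three regular polygons that tile the plane, each with unit area
p <- sapply(c(triangle = 3, square = 4, hexagon = 6), regular_perimeter)
round(p, 3)

# A circle of unit area is the lower bound: 2 * sqrt(pi)
round(2 * sqrt(pi), 3)
```

The hexagon has the shortest perimeter of the three and comes closest to the circle's lower bound.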
Implementation with the R Hexbin Package
The hexagon map above contains 930 equally sized hexagons. These were built from a raster data set of about 1.5 million observations, each with X and Y coordinates (longitude and latitude) and a Z value giving the estimated population density at that location.
Default Behaviour for Hexbin
By default, hexbin uses only the X and Y coordinates and counts how many points fall within each hexagon. This was overridden with the hexTapply() function, which calculates the mean population density (Z) for each bin; it requires hexbin() to be called with IDs = TRUE so that the cell of every point is recorded. When drawing, the use.count parameter of grid.hexagons() must be set to FALSE and the cell.at parameter set to the vector created by hexTapply(). A code snippet is shown below:
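# How to include a Z variable
library(hexbin)
hbin <- hexbin(df$X, df$Y, xbins = 40, IDs = TRUE)  # IDs = TRUE is required by hexTapply()
mtrans <- hexTapply(hbin, df$Z, mean, na.rm = TRUE)  # mean Z per non-empty hexagon
pushHexport(hexViewport(hbin))  # viewport matched to the binned data
grid.hexagons(hbin, style = 'colorscale', pen = 0, border = 'white',
              use.count = FALSE, mincnt = 1, maxcnt = 3000,
              cell.at = mtrans, colramp = cr, colorcut = ci$brks)  # cr and ci are defined in the full listing
# Full code listing on GitHub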
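To make the difference between the default count and the hexTapply() override concrete, here is a small sketch on simulated data. The data and the xbins value are illustrative assumptions, and the hexbin package must be installed. It compares the counts hexbin stores by default with counts and means computed per cell from the recorded cell IDs.

```r
library(hexbin)

# Simulated points standing in for X, Y and a density-like Z (illustrative only)
set.seed(42)
x <- rnorm(500)
y <- rnorm(500)
z <- x^2 + y^2

# IDs = TRUE stores the hexagon ID of every point in hb@cID
hb <- hexbin(x, y, xbins = 10, IDs = TRUE)

# Default behaviour: hb@count holds the number of points in each non-empty cell
default_counts <- hb@count

# The same counts recomputed from the per-point cell IDs
counts_from_ids <- as.vector(table(hb@cID))

# Override: one mean Z per non-empty cell, ordered by cell ID
mean_z <- hexTapply(hb, z, mean)
```

Either vector has one entry per non-empty hexagon, so passing mean_z as cell.at (with use.count = FALSE) simply swaps what the colour represents.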
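The function grid.hexagons() also rescales the cell values to the range 0 to 1, using mincnt and maxcnt, before assigning colours, and the colorcut breakpoints are interpreted on that same 0 to 1 scale. This scaling was therefore replicated outside the function so that the colour breaks could be calculated on a matching scale.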
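Here is a sketch of replicating that scaling outside the function, assuming the same mincnt = 1 and maxcnt = 3000 as the snippet above. The cell values and class breaks are hypothetical, and findInterval() is used only to show which colour class each scaled value lands in; it may not match the internal classing of grid.hexagons() exactly at class boundaries.

```r
# Hypothetical mean densities per hexagon (stand-in for the hexTapply() output)
cell_values <- c(1, 15, 120, 800, 3000)

# Assumed limits, matching mincnt and maxcnt in the snippet above
mincnt <- 1
maxcnt <- 3000

# Min-max rescaling of the cell values to [0, 1]
scaled <- (cell_values - mincnt) / (maxcnt - mincnt)

# Hypothetical class breaks in original units, moved onto the same 0-1 scale
brks <- c(1, 10, 100, 1000, 3000)
colorcut <- (brks - mincnt) / (maxcnt - mincnt)

# Colour class of each cell (1 = lowest); the top break is treated as closed
classes <- findInterval(scaled, colorcut, rightmost.closed = TRUE)
classes
```

Because both the values and the breaks go through the same rescaling, a density of 120 still lands between the 100 and 1000 breaks after conversion.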
GitHub Code and Data
The full code is available on GitHub.