Ridge regression is a form of regularisation: it shrinks regression coefficients to produce simpler models that often predict better on new data.
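As a rough illustration of the shrinkage idea, the sketch below fits a ridge line to a single predictor in plain Python. The function name `ridge_fit_1d` and the penalty argument `lam` are hypothetical names chosen for this example, and the intercept is assumed to be left unpenalised.

```python
def ridge_fit_1d(x, y, lam):
    """Fit y = intercept + slope * x with an L2 penalty `lam` on the slope.

    Illustrative sketch only: one predictor, intercept not penalised.
    """
    if len(x) != len(y) or not x:
        raise ValueError("x and y must be non-empty and of equal length")
    x_mean = sum(x) / len(x)
    y_mean = sum(y) / len(y)
    # Centre the data so the intercept drops out of the penalised problem.
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    # Adding lam to the denominator pulls the slope towards zero.
    slope = sxy / (sxx + lam)
    intercept = y_mean - slope * x_mean
    return intercept, slope


x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 6.0, 8.0]
ols = ridge_fit_1d(x, y, lam=0.0)    # no penalty: ordinary least squares
ridge = ridge_fit_1d(x, y, lam=5.0)  # penalised: smaller slope
```

With `lam=0` the result is the ordinary least-squares fit; increasing `lam` moves the slope progressively closer to zero.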
Sophisticated ensemble methods such as Random Forests provide cutting-edge prediction models, but they can be difficult to interpret.
CART algorithms generally do not match the predictive accuracy of ensemble methods. However, their results are easily interpretable.
Thin plate splines are a non-parametric method used to interpolate data in two dimensions. This may help detect trends.
K-Means clustering is an unsupervised technique that partitions data into a pre-defined number of groups of similar observations.
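The partitioning step can be sketched with Lloyd's algorithm, a common way of fitting K-Means. The version below is a minimal one-dimensional illustration in plain Python; the function names and the requirement that the caller supplies the starting centres are assumptions made for this example.

```python
def assign(points, centres):
    """Return, for each point, the index of its nearest centre."""
    return [
        min(range(len(centres)), key=lambda j: (p - centres[j]) ** 2)
        for p in points
    ]


def k_means_1d(points, initial_centres, max_iter=100):
    """Lloyd's algorithm on 1-D data; the number of groups is len(initial_centres)."""
    centres = list(initial_centres)
    for _ in range(max_iter):
        labels = assign(points, centres)
        new_centres = []
        for j in range(len(centres)):
            members = [p for p, l in zip(points, labels) if l == j]
            # Keep the old centre if a group ends up empty.
            new_centres.append(sum(members) / len(members) if members else centres[j])
        if new_centres == centres:  # converged: centres stopped moving
            break
        centres = new_centres
    return centres, assign(points, centres)


points = [1.0, 1.2, 0.8, 9.0, 9.5, 10.0]
centres, labels = k_means_1d(points, initial_centres=[0.0, 5.0])
```

Here the number of groups is fixed in advance by the two starting centres, and each observation ends up labelled with the group whose centre is closest.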
Cosine similarity allows high-dimensional vectors to be compared efficiently with a few lines of code.
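A minimal sketch of the calculation in plain Python follows: the similarity is the dot product of two vectors divided by the product of their lengths. The function name `cosine_similarity` is chosen for this illustration only.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length numeric vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


same_direction = cosine_similarity([1, 2, 3], [2, 4, 6])
orthogonal = cosine_similarity([1, 0], [0, 1])
opposite = cosine_similarity([1, 0], [-1, 0])
```

Values close to 1 indicate vectors pointing in the same direction, values close to 0 indicate unrelated (orthogonal) vectors, and values close to -1 indicate opposite directions.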
Using R graphics, two different techniques are used to visualise daily data over a three-year period.
PCA is a linear algebra technique that summarises data sets. The results can be used as inputs to predictive models.
55,000 temperature observations are aggregated into 13,000 bins. These are displayed in three dimensions using Autodesk Maya.
Hexagons allow bivariate data to be binned as a regular tessellation. This has advantages compared to tessellating with squares.
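One way to assign points to hexagonal bins is sketched below in plain Python. It assumes the hexagon centres form two offset rectangular lattices, so that the nearest centre over both lattices identifies the hexagon containing a point. The function names and the `width` parameter (horizontal spacing between centres) are hypothetical choices for this illustration.

```python
import math
from collections import Counter


def hex_centre(x, y, width=1.0):
    """Return the centre of the regular hexagon containing (x, y)."""
    height = width * math.sqrt(3)  # vertical spacing within one lattice
    # Nearest centre on the first lattice: (i * width, j * height).
    ax = math.floor(x / width + 0.5) * width
    ay = math.floor(y / height + 0.5) * height
    # Nearest centre on the second lattice, offset by half a cell each way.
    bx = (math.floor(x / width) + 0.5) * width
    by = (math.floor(y / height) + 0.5) * height
    dist_a = (x - ax) ** 2 + (y - ay) ** 2
    dist_b = (x - bx) ** 2 + (y - by) ** 2
    return (ax, ay) if dist_a <= dist_b else (bx, by)


def hex_bin(points, width=1.0):
    """Count how many (x, y) points fall in each hexagon, keyed by centre."""
    return Counter(hex_centre(x, y, width) for x, y in points)


counts = hex_bin([(0.1, 0.1), (-0.1, 0.05), (0.5, 0.9)], width=1.0)
```

Every hexagon centre in this layout is equidistant from its six neighbours, whereas a square grid has neighbours at two different distances (edge and corner).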