
Ridge Regression
Ridge regression is a regularisation method that shrinks regression coefficients to produce simpler and often more accurate models.
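
As a minimal sketch of the shrinkage idea, assuming a single centred predictor and made-up data, the ridge slope has the closed form Sxy / (Sxx + lambda). The function name `ridge_slope` and the penalty values are illustrative assumptions, not taken from the post.

```python
def ridge_slope(xs, ys, lam):
    """Slope of a one-predictor ridge fit on centred data.

    Minimises sum((y - b*x)^2) + lam * b^2, which gives b = Sxy / (Sxx + lam).
    """
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    return sxy / (sxx + lam)


# Made-up illustrative data (y is exactly 2 * x).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 6.0, 8.0, 10.0]

ols = ridge_slope(xs, ys, 0.0)      # no penalty: ordinary least squares slope
shrunk = ridge_slope(xs, ys, 10.0)  # penalty pulls the slope towards zero
print(ols, shrunk)
```

A penalty of zero recovers the ordinary least squares slope; larger penalties pull the slope towards zero.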

Logistic Regression
Logistic regression is a popular method to predict binary variables such as the presence or absence of disease.
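
A minimal sketch of how a fitted logistic model turns predictors into a probability, assuming the coefficients are already known. The coefficient and feature values below are hypothetical and chosen only for illustration.

```python
import math


def predict_probability(features, coefficients, intercept):
    """Probability of the positive class under a logistic model."""
    # Linear score: intercept plus the weighted sum of the features.
    score = intercept + sum(c * x for c, x in zip(coefficients, features))
    # The logistic (sigmoid) function maps any score into (0, 1).
    return 1.0 / (1.0 + math.exp(-score))


# Hypothetical coefficients, not estimated from real data.
coefficients = [0.8, -0.5]
intercept = 0.0

p_neutral = predict_probability([0.0, 0.0], coefficients, intercept)
p_higher = predict_probability([2.0, 0.0], coefficients, intercept)
print(p_neutral, p_higher)
```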

The Case for Data
There is no shortage of data, storage or computing power — the only limiting item is the ability of the analyst.

Random Forests
Sophisticated ensemble methods such as Random Forests provide cutting-edge prediction models, but they can be difficult to interpret.

CART Models
CART algorithms generally do not match the predictive accuracy of ensemble methods such as Random Forests. However, their results are easily interpretable.

Thin Plate Splines
Thin plate splines are a non-parametric method used to interpolate data in two dimensions. This may help detect trends.

K-Means Clustering
K-means clustering is an unsupervised technique that partitions data into a predefined number of groups of similar observations.
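
The partitioning step can be sketched with Lloyd's algorithm, the usual iterative procedure for this kind of clustering. This is a simplified version under stated assumptions: the starting centroids are simply the first `k` points (real implementations usually pick them more carefully), and the points are made up.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iterations=100):
    """Lloyd's algorithm: alternate assignment and centroid update."""
    centroids = [points[i] for i in range(k)]  # naive start: first k points
    assignments = None
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        new_assignments = [
            min(range(k), key=lambda j: sq_dist(p, centroids[j]))
            for p in points
        ]
        if new_assignments == assignments:
            break  # converged: no point changed group
        assignments = new_assignments
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, a in zip(points, assignments) if a == j]
            if members:
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return assignments, centroids


# Made-up points forming two visibly separate groups.
points = [(1.0, 1.0), (9.0, 9.0), (1.5, 2.0), (8.0, 8.0), (1.0, 0.5), (9.0, 8.5)]
labels, centroids = kmeans(points, k=2)
print(labels)
```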

Cosine Similarity
Cosine similarity allows high-dimensional vectors to be compared efficiently with a few lines of code.
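
For illustration, a plain Python sketch of the calculation: the dot product of two vectors divided by the product of their lengths. The function name and the example vectors are assumptions made here, not the code from the post.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length numeric vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


parallel = cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])  # same direction
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])          # right angle
print(parallel, orthogonal)
```

Vectors pointing in the same direction score close to 1 regardless of their length, and perpendicular vectors score 0.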

Time Series Visualisation
Two different R graphics techniques are used to visualise daily data over a three-year period.

Principal Components
PCA is a linear algebra technique that summarises data sets. The results can be used as inputs to predictive models.

3D Histogram
55,000 temperature observations are aggregated into 13,000 bins. These are displayed in 3 dimensions using Autodesk Maya.

Spatial Hexagons
Hexagons allow bivariate data to be binned as a regular tessellation. This has advantages compared to tessellating with squares.

RDFS and Inferencing
RDFS is a simple extension of RDF which allows simple but powerful inferences to be automatically generated from the data.
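
A toy sketch of this kind of inference, assuming triples are held as plain Python tuples and applying only two of the RDFS entailment rules: subclass transitivity and type inheritance. The `ex:` names are hypothetical, and a real triple store or reasoner would do this for you.

```python
RDF_TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"


def infer(triples):
    """Apply two RDFS rules repeatedly until no new triples appear."""
    inferred = set(triples)
    changed = True
    while changed:
        changed = False
        new = set()
        for s, p, o in inferred:
            if p != SUBCLASS:
                continue
            for s2, p2, o2 in inferred:
                # A subClassOf B, B subClassOf C  =>  A subClassOf C
                if p2 == SUBCLASS and s2 == o:
                    new.add((s, SUBCLASS, o2))
                # x type A, A subClassOf B  =>  x type B
                if p2 == RDF_TYPE and o2 == s:
                    new.add((s2, RDF_TYPE, o))
        if not new <= inferred:
            inferred |= new
            changed = True
    return inferred


# Hypothetical example data.
facts = {
    ("ex:Rex", RDF_TYPE, "ex:Dog"),
    ("ex:Dog", SUBCLASS, "ex:Mammal"),
    ("ex:Mammal", SUBCLASS, "ex:Animal"),
}
closure = infer(facts)
print(sorted(closure - facts))
```

The triples printed are those that were never stated explicitly but follow from the class hierarchy.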

RDF Basics
RDF enables graph data structures to be encoded using XML. These can be queried using a technology called SPARQL.