
Probabilistic Linkage Model

This project is developing a probabilistic linkage model using Splink to improve linkage outcomes and, by extension, patient outcomes.

Crafting a model that suits NHS England data linkage needs

This project aims to develop an alternative to MPS (Master Person Service) by building a probabilistic linkage model with Splink, a package developed by the Ministry of Justice (MoJ).
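Splink is based on the Fellegi-Sunter model of record linkage. The simplest form of that model treats each field as a plain agree/disagree comparison. Splink itself supports several comparison levels per field, for example exact, close and different. That simplest form can be summarised as follows. The notation is ours, and $\lambda$ is the prior probability that a randomly chosen pair of records is a match:

```latex
m_k = P(\text{field } k \text{ agrees} \mid \text{match}), \qquad
u_k = P(\text{field } k \text{ agrees} \mid \text{non-match})

w_k =
\begin{cases}
\log_2 \dfrac{m_k}{u_k} & \text{if field } k \text{ agrees} \\[4pt]
\log_2 \dfrac{1 - m_k}{1 - u_k} & \text{if field } k \text{ disagrees}
\end{cases}

W = \log_2 \frac{\lambda}{1 - \lambda} + \sum_k w_k, \qquad
P(\text{match}) = \frac{2^W}{1 + 2^W}
```

A positive partial weight $w_k$ moves a pair towards being a match and a negative one moves it away. This additive structure is what makes individual predictions explainable.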

The linkage pipeline consists of six steps:

  • Pre-processing
  • Distance Metrics
  • Blocking
  • Training
  • Prediction
  • Evaluation
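The first three steps can be illustrated with a minimal, standard-library sketch. Everything in it is hypothetical and is not the project's code or Splink's API:

* the field names (`surname`, `forename`, `dob`, `postcode`);
* the helpers `preprocess`, `levenshtein` and `block`;
* the toy records.

Splink provides its own blocking rules and string comparison functions. This sketch only shows what those steps do.

```python
import unicodedata
from collections import defaultdict
from itertools import combinations


def preprocess(record: dict) -> dict:
    """Hypothetical cleaning step: strip accents, upper-case, collapse whitespace."""
    def clean(value: str) -> str:
        value = unicodedata.normalize("NFKD", value)
        value = "".join(ch for ch in value if not unicodedata.combining(ch))
        return " ".join(value.upper().split())
    return {k: clean(v) if isinstance(v, str) else v for k, v in record.items()}


def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum number of insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if characters are equal)
            ))
        prev = curr
    return prev[-1]


def block(records: list, key_fields: list):
    """Yield candidate pairs only for records that share the blocking key."""
    groups = defaultdict(list)
    for rec in records:
        groups[tuple(rec[f] for f in key_fields)].append(rec)
    for members in groups.values():
        yield from combinations(members, 2)


# Toy records, invented for illustration only.
records = [preprocess(r) for r in [
    {"id": 1, "surname": "Smith", "forename": "José", "dob": "1980-01-02", "postcode": "LS1 4AP"},
    {"id": 2, "surname": "smyth", "forename": "Jose ", "dob": "1980-01-02", "postcode": "LS1 4AP"},
    {"id": 3, "surname": "Jones", "forename": "Ann", "dob": "1975-06-30", "postcode": "M1 1AA"},
]]

# Compare only pairs that share a date of birth.
for a, b in block(records, ["dob"]):
    print(a["id"], b["id"], "surname distance:", levenshtein(a["surname"], b["surname"]))
```

Blocking on date of birth means record 3 is never compared with the others. That is what keeps the number of comparisons manageable on large datasets. The trade-off is that, under this single rule, true matches with a mistyped date of birth would be missed.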

Each of these steps requires research into linkage best practice, testing on samples of our data, a feasibility study of the computational power required, and thorough evaluation. We are following an incremental improvement plan, delivered as a series of iterative MVPs, so that the pipeline reaches the highest quality we can achieve within our computational limits.
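One common way to evaluate a linkage run is to compare predicted links against a labelled set of true links, such as a clerically reviewed sample, using precision, recall and F1. The sketch below assumes such a labelled set exists. The helper names `evaluate` and `as_pairs` and the record IDs are hypothetical, and the block does not describe the project's actual evaluation method.

```python
def as_pairs(pairs) -> set:
    """Store each link as a frozenset so (a, b) and (b, a) count as the same link."""
    return {frozenset(p) for p in pairs}


def evaluate(predicted: set, truth: set) -> dict:
    """Precision, recall and F1 of predicted links against labelled true links."""
    tp = len(predicted & truth)  # correctly predicted links
    fp = len(predicted - truth)  # predicted links that are wrong
    fn = len(truth - predicted)  # true links the model missed
    precision = tp / (tp + fp) if predicted else 0.0
    recall = tp / (tp + fn) if truth else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Hypothetical predicted and labelled links between record IDs.
predicted = as_pairs([(1, 2), (3, 4), (5, 6)])
truth = as_pairs([(2, 1), (3, 4), (7, 8)])
print(evaluate(predicted, truth))
```

Precision tells users how often a link they receive is wrong. Recall tells them how many true links are missing. Both matter for downstream analysis, which is why a single accuracy figure is usually not enough.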

Here is an overview of how our pipeline currently looks:

[Figure: Splink linkage pipeline scheme]

Building a model with transparency in mind

Users of linked data often have no control over how the data were linked, so they must rely on the accuracy of a process created by others. That is why transparency of the methods and explainability of the results are among the main priorities of the model we are building.
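In a Fellegi-Sunter model such as Splink's, a pair's score is a sum of per-field contributions. Each prediction can therefore be explained by listing those contributions, and Splink can visualise this breakdown as a waterfall chart.

The sketch below reproduces the idea with made-up values, all of them assumptions rather than trained values from the project:

* `m` and `u` probabilities;
* binary agree/disagree comparisons on three hypothetical fields;
* a made-up prior.

```python
import math

# Hypothetical parameters. In a real model m and u are estimated during training.
# m = P(field agrees | match), u = P(field agrees | non-match).
PARAMS = {
    "surname": {"m": 0.95, "u": 0.01},
    "dob": {"m": 0.98, "u": 0.003},
    "postcode": {"m": 0.80, "u": 0.001},
}
PRIOR = 1e-4  # assumed probability that a random pair of records is a match


def explain(agreements: dict) -> tuple:
    """Return per-field match weights (log2 scale) and the resulting match probability."""
    rows = [("prior", math.log2(PRIOR / (1 - PRIOR)))]
    for field, agrees in agreements.items():
        m, u = PARAMS[field]["m"], PARAMS[field]["u"]
        # Agreement uses m/u, disagreement uses (1 - m)/(1 - u).
        weight = math.log2(m / u) if agrees else math.log2((1 - m) / (1 - u))
        rows.append((field, weight))
    total = sum(w for _, w in rows)
    probability = 2 ** total / (1 + 2 ** total)
    return rows, probability


rows, p = explain({"surname": False, "dob": True, "postcode": True})
for field, weight in rows:
    print(f"{field:>9}: {weight:+.2f}")
print(f"match probability: {p:.3f}")
```

Each printed line shows how much one piece of evidence moved the decision. A reviewer can therefore see, for example, that a surname disagreement was outweighed by agreeing date of birth and postcode.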

Output: Splink Linkage Pipeline
Link: GitHub*

* This is currently private and available for internal access only.