Generating Primary Care Data Using UK Version of Synthea

Primary care records are crucial for understanding healthcare interactions at both the population and individual levels. However, limited data availability and privacy concerns make these records difficult to obtain and to integrate with other services, which hinders innovation.

Our project aims to address this by developing a codebase to generate primary care electronic health records. We start by creating a synthetic population that mirrors a region in England, and then adapt the US-based tool Synthea to the English NHS context, focusing particularly on the South West of England. We chose Synthea because it is already a highly developed, accuracy-tested synthetic data generator, and adapting it is relatively simple with clinical input. It is also efficient and can generate large amounts of data quickly.

See the data science website for more details on the aims of the project.

Usage and Limitations

The aim of Synthea is to produce data that is "Realistic but not real." This means that data produced by Synthea can be used to trial systems and to test software and procedures, but it should not be used to draw statistical insights or to carry out analysis.

The UK adaptation of Synthea has removed several features that were US-specific, and it integrates UK statistics. Details of the available features can be found in the documentation.

Evaluation

The functionality and stability of the release have been "smoke-tested" by confirming that valid outputs can be obtained when using the various option flags described in the documentation.
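
As an illustration only, a smoke check of this kind could be sketched as below. This is not the project's actual test procedure. It assumes the generator has been configured to write FHIR-style JSON files (each with a top-level "resourceType" field) into a single directory; the function name smoke_check_outputs and the directory layout are hypothetical.

```python
import json
from pathlib import Path


def smoke_check_outputs(output_dir):
    """Return (name, problem) pairs for JSON files that fail basic checks.

    Hypothetical helper: assumes FHIR-style JSON output, which may not
    match the export format enabled in a given run.
    """
    problems = []
    files = sorted(Path(output_dir).glob("*.json"))
    if not files:
        # An empty output directory is itself a smoke-test failure.
        problems.append((str(output_dir), "no JSON files found"))
    for path in files:
        try:
            with path.open(encoding="utf-8") as handle:
                record = json.load(handle)
        except (OSError, json.JSONDecodeError) as exc:
            problems.append((path.name, f"unreadable: {exc}"))
            continue
        # Only checks that the file looks like a FHIR resource,
        # not that its clinical content is plausible.
        if not isinstance(record, dict) or "resourceType" not in record:
            problems.append((path.name, "missing resourceType"))
    return problems
```

An empty list from this function means only that every file parsed and looked structurally like a FHIR resource. It says nothing about whether the generated records are statistically realistic.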

In the current release of the UK adaptation of Synthea, the outputs have not been statistically validated.