Resources in swpc_synthea

The structure of the resources in swpc_synthea is as below:

├── src/main/resources
|   ├── export
|   ├── geography
|   |   ├── demographics.csv
|   |   ├── foreign_birthplace.json
|   |   ├── postcodes.csv
|   |   ├── sdoh.csv
|   |   ├── timezones.csv
|   ├── keep_modules
|   ├── modules
|   ├── physiology
|   ├── providers
|   |   ├── hospitals.csv
|   |   ├── primary_care_facilities.csv
|   |   ├── urgent_care_facilities.csv
|   ├── templates
|   ├── biometrics.yml
|   ├── birthweights.csv
|   ├── bmi_correlations.json
|   ├── cdc_growth_charts.json
|   ├── cdc_wtleninf.csv
|   ├── gdb_disability_weights.csv
|   ├── growth_data_error_rates.json
|   ├── htn_drugs.csv
|   ├── immunization_schedule.json
|   ├── language_lookup.json
|   ├── names.yml
|   ├── nhanes_two_year_olds_bmi.csv
|   ├── race_ethnicity_codes.json
|   ├── shr_mapping.csv
|   ├── telemedicine_config.json
|   ├── us_core_mapping.csv

We'll now highlight a few key files for the simulation inputs.

geography/demographics.csv

This file sets the general demographics for the population created during this simulation, such as age distributions and town populations. The meanings of the different columns in this file can be found in the table below.

The demographic breakdowns for the UK version are calculated for the full region rather than per town, because town-level information was not available.

| Column Name | Contains | Data Sources for UK Version |
| --- | --- | --- |
| ID | Town ID number | monotonically increasing from 1 |
| COUNTY | County Code where town is located | TOWN_211CODE column from ONS data table |
| NAME | Town Name | TOWN_211NAME column from above ONS table |
| STNAME | Region Name | REGION/COUNTRY from above ONS table |
| POPESTIMATE2015 | Population of that town | TOTAL population rows for 2019 from above ONS data |
| CTYNAME | County Name where town is located | range of Wikipedia sites for the towns |
| TOT_POP | total county population | ONS Region data |
| Ethnicity (includes ASIAN, BLACK, MIXED, WHITE, OTHER columns) | Percentage of the population that is of a certain ethnicity [1] | NHS Health Survey England |
| Ages (includes all age breakdown columns) | percentage of population in different age groups | Another NHS Health Survey England [2] |
| Income (includes all income breakdown columns) | percentage of population in different income brackets | ONS Employment data [3] |
| LESS_THAN_HS | fraction of people with no qualifications, or level 1 or 2 of education (as classified by ONS) | ONS Education data |
| HS_DEGREE | fraction of people with level 3 education | Same as above |
| SOME_COLLEGE | fraction of people with apprenticeships | Same as above |
| BS_DEGREE | fraction of people with level 4 education | Same as above |

Many of the above sources are complemented by data from the 2021 census.
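
As a quick sanity check on an edited demographics.csv, the column names described above can be verified with a short script. This is a minimal sketch and not part of swpc_synthea: the helper name `missing_columns` and the sample row are hypothetical, and only the column names spelled out in the table are checked (the age and income breakdown columns are not named in this document, so they are left out).

```python
import csv
import io

# Column names taken from the table above. The age and income breakdown
# columns are omitted because their exact names are not listed here.
EXPECTED_COLUMNS = [
    "ID", "COUNTY", "NAME", "STNAME", "POPESTIMATE2015", "CTYNAME", "TOT_POP",
    "ASIAN", "BLACK", "MIXED", "WHITE", "OTHER",
    "LESS_THAN_HS", "HS_DEGREE", "SOME_COLLEGE", "BS_DEGREE",
]


def missing_columns(csv_text, expected=EXPECTED_COLUMNS):
    """Return the expected column names that are absent from the CSV header."""
    reader = csv.DictReader(io.StringIO(csv_text))
    header = reader.fieldnames or []
    return [name for name in expected if name not in header]


# Hypothetical sample standing in for geography/demographics.csv.
sample = "ID,COUNTY,NAME,STNAME\n1,X1,Exampletown,South West\n"
print(missing_columns(sample))
```

To check a real file, the same function could be called with the file's contents, for example `missing_columns(open(path, newline="").read())`, where `path` points at the local copy of geography/demographics.csv.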

geography/sdoh.csv

This file contains information on social determinants of health for the different regions.

| Column Name | Contains | Data Sources for UK Version |
| --- | --- | --- |
| FOOD_INSECURITY | percentage of people with food insecurity | Sheffield University study |
| SEVERE_HOUSING_COST_BURDEN | percentage of people with severe housing cost burden [4] | Government English housing survey |
| UNEMPLOYMENT | percentage of unemployment | ONS local labour market data |
| NO_VEHICLE_ACCESS | percentage of the population with no access to a vehicle | ONS census data |

geography/postcodes.csv

This file was originally called zipcodes.csv and was renamed to use the British English term. Postcode data was found here.

modules/

This directory stores the clinical modules as JSON files. Most of these are currently based on the original US Synthea™ version. See index.md for a list of changed modules.

providers/

These files set different medical facilities for patients to attend in the simulation.

GP practices in the South West were taken from the NHS Digital GP Practice Data, and postcodes were converted to latitude and longitude using the Grid Reference Finder.


  1. Ethnicity categories were changed from the American version to align better with UK ethnicity breakdowns.

  2. Age groups under 18 are set to 0 because we are currently only interested in the adult population.

  3. The income brackets in the source data do not match those used in Synthea exactly, so the breakdowns within the Synthea brackets had to be estimated.

  4. We used data on mortgagors who found affording their mortgage very or fairly difficult (table AT2_8), plus renters who found affording their rent very or fairly difficult, divided by the total number of people surveyed in the study. This data was only available for the whole country.
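
The calculation described in note 4 can be written out as a small function. This is a sketch under stated assumptions: the function name and the survey counts in the example are hypothetical placeholders, not values from the English Housing Survey, and the result is expressed as a fraction (multiply by 100 for a percentage).

```python
def severe_housing_cost_burden(mortgagors_struggling, renters_struggling, total_surveyed):
    """Share of surveyed people who find their housing costs very or fairly difficult.

    Mirrors note 4: (mortgagors + renters reporting difficulty) / total surveyed.
    """
    if total_surveyed <= 0:
        raise ValueError("total_surveyed must be positive")
    return (mortgagors_struggling + renters_struggling) / total_surveyed


# Hypothetical counts, for illustration only.
burden = severe_housing_cost_burden(120, 380, 4000)
print(f"{burden:.3f}")  # prints 0.125
```

Because the survey data was only available for the whole country, a single value computed this way would apply to every region.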