Embedded Data Scientists in the National Disease Registration Service (NDRS)
We have data scientists who work with the National Disease Registration Service, supporting a range of its functions.
Aim
The aim of this collaboration is to support NDRS in implementing data science projects and in upskilling its staff.
Projects
We have supported NDRS with a range of projects.
Clinical Measurement Extractor
This is an LLM pipeline that extracts clinical measurements from free-text data. It uses AWS services such as Bedrock and SageMaker. For more information, please follow the link to this project page.
Output | Link |
---|---|
Private repo | GitHub repo |
ONS Population RAP Pipeline
This is a gold-level RAP (Reproducible Analytical Pipeline) with an R-based codebase. It scrapes population data from the ONS website and produces three tables reporting population counts broken down by year, LSOA, gender, age, and age band. NDRS analysts use this data to support the calculation of population-based metrics.
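As an illustration of the age-band breakdown described above, the following is a minimal sketch only. It is written in Python for brevity and is not taken from the project's R codebase. The row layout, the five-year band width, the "90+" top band, and the function names `age_band` and `counts_by_age_band` are all assumptions made for the example, and the counts are made up.

```python
from collections import defaultdict

# Hypothetical input rows: (year, lsoa_code, gender, age, count).
# The counts are invented for illustration and are not ONS figures.
rows = [
    (2021, "E01000001", "F", 3, 10),
    (2021, "E01000001", "F", 4, 5),
    (2021, "E01000001", "F", 7, 12),
    (2021, "E01000001", "M", 4, 9),
    (2021, "E01000002", "F", 91, 2),
]


def age_band(age, width=5, top=90):
    """Map a single year of age to a band label such as '0-4' or '90+'.

    The band width and the open-ended top band are assumptions.
    """
    if age >= top:
        return f"{top}+"
    lower = (age // width) * width
    return f"{lower}-{lower + width - 1}"


def counts_by_age_band(rows):
    """Sum counts per (year, LSOA, gender, age band)."""
    totals = defaultdict(int)
    for year, lsoa, gender, age, count in rows:
        totals[(year, lsoa, gender, age_band(age))] += count
    return dict(totals)


banded = counts_by_age_band(rows)
for key in sorted(banded):
    print(key, banded[key])
```

Keeping the banding rule in one small function means the single-year-of-age table and the age-band table are derived from the same rows, so the two cannot drift apart.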
Output | Link |
---|---|
Private repo | GitHub repo |
Germline Count RAP Pipeline
An R-based codebase was developed to create visualisations showing the monthly and yearly counts of specific germline tests, as well as the percentage change in counts over time. The goal is to help evaluate the data quality of germline test counts and assess appropriate timelines for data sharing.
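The counts and percentage changes described above can be sketched as follows. This is a hedged illustration in Python, not the project's R code. The test dates are invented, and the function names `monthly_counts`, `yearly_counts`, and `percentage_change` are hypothetical.

```python
from collections import Counter
from datetime import date

# Hypothetical test dates, invented for illustration.
test_dates = [
    date(2023, 1, 5),
    date(2023, 1, 20),
    date(2023, 2, 3),
    date(2023, 2, 10),
    date(2023, 2, 28),
    date(2024, 1, 9),
]


def monthly_counts(dates):
    """Count tests per (year, month)."""
    return Counter((d.year, d.month) for d in dates)


def yearly_counts(dates):
    """Count tests per year."""
    return Counter(d.year for d in dates)


def percentage_change(counts):
    """Percentage change between consecutive observed periods.

    Periods are sorted by key. A period whose predecessor has a
    count of zero maps to None, because the change is undefined.
    """
    periods = sorted(counts)
    changes = {}
    for prev, curr in zip(periods, periods[1:]):
        if counts[prev] == 0:
            changes[curr] = None
        else:
            changes[curr] = 100.0 * (counts[curr] - counts[prev]) / counts[prev]
    return changes


monthly = monthly_counts(test_dates)
yearly = yearly_counts(test_dates)
print(percentage_change(monthly))
print(percentage_change(yearly))
```

Note that this sketch compares each period with the previous period that appears in the data. A month with no tests is skipped rather than counted as zero, so a real pipeline would probably need to fill in missing months before computing changes.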
Output | Link |
---|---|
Private repo | GitHub repo |
Supporting the Cancer Genomics Improvement Programme
The aim of this project is to link multiple datasets to capture the cancer diagnosis-to-treatment pathway, specifically for cases where genetic testing is required. Support for this project involved writing a hybrid of SQL and R scripts to define the cohort of interest and to extract relevant dates and events from pathology testing.
Output | Link |
---|---|
Private repo | GitHub repo |