Reusable New Data Product Validation Functions

More Efficient, More Consistent Data Through Shared Validation Functions

An image showing a stack of boxes on the left and a single box with robotic legs on the right. The stack of boxes has a label "old validation process" along with titles on boxes such as "code not shared", "inconsistent approach", "unreliable" and "manual process". Above the boxes it says "3 days". Next to the boxes an unhappy man is struggling to move them. To the right is a single box with robotic legs, with a happy looking man stood next to it. The box with robotic legs is labeled "new validation process" and has words nearby such as "reusable code", "consistent process" and "easy to re-run". Above the box is a label stating it takes about 30 minutes.

All data provisioned into the NHS England Secure Data Environment (SDE) must be validated first. The old data product validation process was manual, time-consuming and slow to re-run.

Our objectives were to:

  • Boost the efficiency and consistency of the data validation process for the Data Access Request Service (DARS)
  • Make it reusable to save time and uphold best practice
  • Share the code so others can benefit
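
To illustrate the idea of shared validation functions, here is a minimal sketch in plain Python. It is not the project's code, which has not yet been published: the function names (`check_not_null`, `check_unique`, `validate`), the `CheckResult` type and the column names are all hypothetical. Each check is written once, takes the column name as a parameter, and can be re-run on any dataset with a single call.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Row = Dict[str, object]  # one record, keyed by column name (assumed shape)


@dataclass
class CheckResult:
    name: str          # which check ran
    passed: bool       # True when no rows failed
    failing_rows: int  # how many rows broke the rule


Check = Callable[[List[Row]], CheckResult]


def check_not_null(column: str) -> Check:
    """Build a check that fails when `column` is missing or empty in any row."""
    def run(rows: List[Row]) -> CheckResult:
        failures = sum(1 for row in rows if row.get(column) in (None, ""))
        return CheckResult(f"not_null({column})", failures == 0, failures)
    return run


def check_unique(column: str) -> Check:
    """Build a check that fails when `column` contains duplicate values."""
    def run(rows: List[Row]) -> CheckResult:
        values = [row.get(column) for row in rows]
        failures = len(values) - len(set(values))
        return CheckResult(f"unique({column})", failures == 0, failures)
    return run


def validate(rows: List[Row], checks: List[Check]) -> List[CheckResult]:
    """Run every check against the same rows and collect the results."""
    return [check(rows) for check in checks]


if __name__ == "__main__":
    # Hypothetical sample data; real data products will have other columns.
    sample = [
        {"record_id": "A1", "event_date": "2023-01-05"},
        {"record_id": "A2", "event_date": ""},
        {"record_id": "A2", "event_date": "2023-02-11"},
    ]
    for result in validate(sample, [check_not_null("event_date"),
                                    check_unique("record_id")]):
        print(result)
```

Because the checks are ordinary functions, the same list can be applied to another dataset unchanged, which is what makes the approach consistent and quick to re-run.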

Results

  • Validation time down from days to approximately 30 minutes
  • Validation code is reusable on other datasets and has already been reused
  • More consistent methodology than the manual approach
  • Multiple potential issues that could have hampered research efforts were identified and addressed earlier

Output                             Link
Open Source Code & Documentation   Coming soon!
Case Study                         N/A
Technical report                   N/A
Algorithmic Impact Assessment      N/A