Connected Publications
Our work has produced a number of publications
List of pre-releases and publications connected to our work
[8] https://ieeexplore.ieee.org/document/10635870
Medisure: Towards Assuring Machine Learning-Based Medical Image Classifiers Using Mixup Boundary Analysis
Adam Byfield, William Poulett, Ben Wallace, Anusha Jose, Shatakshi Tyagi, Smita Shembekar, Adnan Qayyum, Junaid Qadir, Muhammad Bilal
Machine learning (ML) models are becoming integral in healthcare technologies, necessitating formal assurance methods to ensure their safety, fairness, robustness, and trustworthiness. However, these models are inherently error-prone, posing risks to patient health and potentially causing irreparable harm when deployed in clinics. Traditional software assurance techniques, designed for fixed code, are not directly applicable to ML models, which adapt and learn from curated datasets during training. Thus, there is an urgent need to adapt established software assurance principles such as boundary testing with synthetic data. To bridge this gap and enable objective assessment of ML models in real-world clinical settings, we propose Mix-Up Boundary Analysis (MUBA), a novel technique facilitating the evaluation of image classifiers in terms of prediction fairness. We evaluated MUBA using brain tumour and breast cancer classification tasks and achieved promising results. This research underscores the importance of adapting traditional assurance principles to assess ML models, ultimately enhancing the safety and reliability of healthcare technologies. Our code is available at https://github.com/willpoulett/MUBA_pipeline.
[7] https://publichealth.jmir.org/2024/1/e46485
The Use of Online Consultation Systems or Remote Consulting in England Characterized Through the Primary Care Health Records of 53 Million People in the OpenSAFELY Platform: Retrospective Cohort Study
Martina Fonseca, Brian MacKenna, Amir Mehrkar, The OpenSAFELY Collaborative, Caroline E Walters, George Hickman, Jonathan Pearson, Louis Fisher, Peter Inglesby, Seb Bacon, Simon Davy, William Hulme, Ben Goldacre, Ofra Koffman, Minal Bakhai
We aimed to explore general practice coding activity associated with the use of Online Consultations (OC) systems in terms of trends, COVID-19 effect, variation, and quality. The OpenSAFELY platform was used to query and analyze the in situ electronic health records of suppliers The Phoenix Partnership (TPP) and Egton Medical Information Systems, covering >53 million patients in >6400 practices, mainly in 2019-2020. We successfully queried general practice coding activity relevant to the use of OC systems, showing increased adoption and key areas of variation during the pandemic at both sociodemographic and clinical levels. The work can be expanded to support monitoring of coding quality and underlying activity. This study suggests that large-scale impact evaluation studies can be implemented within the OpenSAFELY platform, namely looking at patient outcomes.
[6] https://arxiv.org/abs/2403.19802
Developing Healthcare Language Model Embedding Spaces
Niall Taylor, Dan Schofield, Andrey Kormilitzin, Dan W Joyce, Alejo Nevado-Holgado
Pre-trained Large Language Models (LLMs) often struggle on out-of-domain datasets like healthcare-focused text. We explore specialized pre-training to adapt smaller LLMs to different healthcare datasets. Three methods are assessed: traditional masked language modeling, Deep Contrastive Learning for Unsupervised Textual Representations (DeCLUTR), and a novel pre-training objective utilizing metadata categories from the healthcare settings. These schemes are evaluated on downstream document classification tasks for each dataset, with additional analysis of the resultant embedding spaces. Contrastively trained models outperform other approaches on the classification tasks, delivering strong performance from limited labeled data and with fewer model parameter updates required. While metadata-based pre-training does not further improve classifications across the datasets, it yields interesting embedding cluster separability. All domain adapted LLMs outperform their publicly available general base LLM, validating the importance of domain-specialization. This research illustrates efficient approaches to instill healthcare competency in compact LLMs even under tight computational budgets, an essential capability for responsible and sustainable deployment in local healthcare settings. We provide pre-training guidelines for specialized healthcare LLMs, motivate continued inquiry into contrastive objectives, and demonstrate adaptation techniques to align small LLMs with privacy-sensitive medical tasks.
[5] https://link.springer.com/chapter/10.1007/978-3-031-56107-8_21 - Conference Paper
Hypergraphs for Frailty Analysis Research Paper
Zoe Hancox, Samuel D. Relton, Andrew Clegg, Philip G. Conaghan, and Daniel Schofield
This paper includes mortality in hypergraphs alongside the most prevalent combinations of frailty conditions. It demonstrates that this technique enables us to determine the probability of acquiring another condition, as well as to understand the connection and sequencing of acquiring comorbidities.
[4] https://doi.org/10.1101/2023.08.31.23294903 - (Pre-Print)
Representing Multimorbid Disease Progressions using directed hypergraphs
Jamie Burke, Ashley Akbari, Rowena Bailey, Kevin Fasusi, Ronan A. Lyons, Jonathan Pearson, James Rafferty, and Daniel Schofield
To introduce directed hypergraphs as a novel tool for assessing the temporal relationships between coincident diseases, addressing the need for a more accurate representation of multimorbidity and leveraging the growing availability of electronic healthcare databases and improved computational resources.
[3] https://doi.org/10.1016/j.epidem.2022.100662
Large-scale calibration and simulation of COVID-19 epidemiologic scenarios to support healthcare planning
Nick Groves-Kirkby, Ewan Wakeman, Seema Patel, Robert Hinch, Tineke Poot, Jonathan Pearson, Lily Tang, Edward Kendall, Ming Tang, Kim Moore, Scott Stevenson, Bryn Mathias, Ilya Feige, Simon Nakach, Laura Stevenson, Paul O'Dwyer, William Probert, Jasmina Panovska-Griffiths, Christophe Fraser
... We adapted an agent-based model of COVID-19 to inform planning and decision-making within a healthcare setting, and created a software framework that automates processes for calibrating the model parameters to health data and allows the model to be run at national population scale on National Health Service (NHS) infrastructure. ... These simulations were used to support operational planning in the NHS in England, and we present the example of the use of these simulations in projecting future clinical demand during the rollout of the national COVID-19 vaccination programme. ...
[2] https://doi.org/10.1101/2023.01.25.23284428
Primary care coding activity related to the use of online consultation systems or remote consulting: an analysis of 53 million peoples’ health records using OpenSAFELY
Martina Fonseca, Brian MacKenna, Amir Mehrkar, The OpenSAFELY Collaborative, Caroline E Walters, George Hickman, Jonathan Pearson, Louis Fisher, Peter Inglesby, Seb Bacon, Simon Davy, William Hulme, Ben Goldacre, Ofra Koffman, Minal Bakhai
We aimed to explore general practice coding activity associated with the use of online consultation systems in terms of trends, COVID-19 effect, variation and quality.
[1] https://doi.org/10.21203/rs.3.rs-2226531/v1
Assessing the value of integrating national longitudinal shopping data into respiratory disease forecasting models
Elizabeth Dolan, James Goulding, Harry Marshall, Gavin Smith, Gavin Ling, Laila Tata
... We investigated the value of integrating sales of non-prescription medications commonly bought for managing respiratory symptoms, to improve forecasting of weekly registered deaths from respiratory disease at local levels across England, by using over 2 billion transactions logged by a UK high street retailer from March 2016 to March 2020. We report the results from the novel AI explainability variable importance tool Model Class Reliance implemented on the PADRUS model. ...