⚠️ This is a prototype - this content is draft, does not constitute NHS policy, and is likely to change ⚠️

Draft LLM Evaluation and Monitoring Framework v0.2.2

This framework presents a structured approach to evaluating and monitoring Large Language Models (LLMs) so that they are used responsibly in healthcare settings.
It is organised around three key groups:

Suitability in Context: addresses whether the model continues to do what it was designed for.
Wider Impact: looks at the responsible use of the model.
Quantifiable Changes: attempts to group considerations that can be measured through metrics.

Each group contains a set of dimensions that cover different aspects of LLM performance. The emphasis across the whole framework is on the practical implications, i.e. how often to review each dimension, what decisions it informs, and what actions it drives.
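
For teams that want to track the framework in a tool or spreadsheet export, the structure described above (groups, each holding dimensions with a review frequency, the decisions informed, and the actions driven) could be captured roughly as follows. This is only a minimal sketch: the class names, field names, and the single placeholder dimension are assumptions made for illustration and are not part of the framework itself.

```python
from dataclasses import dataclass, field


@dataclass
class Dimension:
    """One aspect of LLM performance within a group (field names are illustrative)."""
    name: str
    review_frequency: str          # how often the dimension is reviewed
    decisions_informed: list[str]  # decisions the review feeds into
    actions_driven: list[str]      # actions the review can trigger


@dataclass
class Group:
    """One of the framework's three groups."""
    name: str
    focus: str
    dimensions: list[Dimension] = field(default_factory=list)


# The three groups named in the framework. Dimensions start empty because
# this sketch does not assume what the real dimensions are.
framework = [
    Group("Suitability in Context",
          "whether the model continues to do what it was designed for"),
    Group("Wider Impact", "the responsible use of the model"),
    Group("Quantifiable Changes",
          "considerations that can be measured through metrics"),
]

# Hypothetical placeholder entry, included only to show the shape of a dimension.
framework[0].dimensions.append(
    Dimension(
        name="example dimension (hypothetical)",
        review_frequency="quarterly",
        decisions_informed=["whether to continue use"],
        actions_driven=["re-evaluate the model"],
    )
)

for group in framework:
    print(f"{group.name}: {len(group.dimensions)} dimension(s)")
```

Keeping review frequency, decisions, and actions as explicit fields on every dimension mirrors the framework's emphasis on practical implications: a dimension cannot be recorded without stating how often it is reviewed and what it is for.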