Investigating Privacy Concerns and Mitigations for Language Models in Healthcare
In recent years, larger and more data-intensive Language Models (LMs) with greatly enhanced performance have been developed. This enhanced capability has driven widespread interest in adopting LMs in healthcare, owing to the large amounts of unstructured text data generated within healthcare pathways.
However, with this heightened interest, it becomes critical to comprehend the inherent privacy risks associated with these LMs, given the sensitive nature of Healthcare data. This PhD Internship project sought to understand more about the Privacy-Risk Landscape for healthcare LMs through a literature review and exploration of some technical applications.
Overview
LMs can memorize their Training Data
Studies have shown that LMs can inadvertently memorize and disclose information verbatim from their training data when prompted in certain ways, a phenomenon referred to as training data leakage. This leakage can violate the privacy assumptions under which datasets were collected and can make diverse information more easily searchable.
As LMs have grown, their ability to memorize training data has increased, leading to substantial privacy concerns. The amount of duplicated text in the training data also correlates with memorization in LMs. This is especially relevant in healthcare due to the highly duplicated text in Electronic Healthcare Records (EHRs).
If LMs have been trained on private data and are subsequently accessible to users who lack direct access to the original training data, the model could leak this sensitive information. This is a concern even if the user has no malicious intent.
Privacy Attacks
A malicious user can stage a privacy attack on an LM to purposely extract information about its training data. Researchers can also use these attacks to measure memorization in LMs. There are several different attack types, each with distinct attacker objectives.
One of the most well-known attacks is the Membership Inference Attack (MIA). MIAs determine whether a data point was included in the training data of the targeted model. Such attacks can result in various privacy breaches; for instance, discerning that a text sequence generated by a clinical LM (trained on EHRs) originated from the training data can disclose sensitive patient information.
At the simplest level, MIAs use the confidence of the target model on a target data instance to predict membership: a threshold is set against the model's confidence, and if the confidence exceeds the threshold, the attacker concludes the instance is a training member, since the model is "unsurprised" by an example it has most likely seen during training. The most successful MIAs currently use a reference model: a second model trained on a dataset similar to the target model's training data. The reference model filters out uninteresting common examples, which any similar model would also find "unsurprising".
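To make this concrete, the sketch below implements a loss-threshold MIA with optional reference-model calibration. It is a minimal illustration rather than the attack used in this project: it uses causal LMs (whose per-sequence loss is simple to compute) instead of masked LMs, and the small public checkpoints, the candidate sentence, and the threshold values are all placeholder assumptions.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def sequence_loss(model, tokenizer, text):
    """Average token-level cross-entropy: lower means the model is less 'surprised'."""
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        out = model(**inputs, labels=inputs["input_ids"])
    return out.loss.item()

# Small public checkpoints standing in for the target and reference models.
tok = AutoTokenizer.from_pretrained("gpt2")
target = AutoModelForCausalLM.from_pretrained("gpt2")
reference = AutoModelForCausalLM.from_pretrained("distilgpt2")

candidate = "Patient was admitted with chest pain and shortness of breath."

# Simple threshold attack: a low loss suggests the sequence was a training member.
target_loss = sequence_loss(target, tok, candidate)
is_member_simple = target_loss < 2.0           # attacker-chosen threshold

# Reference-calibrated attack: compare against a model trained on similar data,
# so text that is generically easy to predict is not flagged as memorized.
calibrated_score = target_loss / sequence_loss(reference, tok, candidate)
is_member_calibrated = calibrated_score < 0.8  # attacker-chosen threshold
```

The calibrated score downweights text that any similar model finds predictable, which is why reference-based attacks tend to be stronger than simple thresholding.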
Privacy Mitigations
There are three primary approaches to mitigate privacy risks in LMs:
- Methods for data preprocessing - Data Sanitization aims to remove all sensitive information from the data before model training. Sanitization approaches are very effective when sensitive information follows a context-independent, consistent format (e.g., NHS numbers, email addresses), but they cannot guarantee the privacy of contextual text. Data Deduplication removes duplicated sequences of text of a certain length from the dataset, leaving only one unique instance; this has been demonstrated to reduce overall memorization in large LMs (a toy deduplication sketch follows this list).
- Training strategies, such as privacy-preserving learning algorithms, e.g., differentially private (DP) training - In DP training, during backpropagation the gradient of each individual example is clipped to a fixed norm and noise is added before the model parameters are updated. This limits the effect any single example can have on the parameters, reducing the model's ability to memorize it (a minimal DP-SGD step is sketched after this list).
- Post-training techniques, such as Machine Unlearning or Editing - Machine Editing and Unlearning comprise a set of techniques for modifying or erasing information after training, making them highly applicable in real-world scenarios. They could be used, for example, when somebody exercises their Right to be Forgotten and the influence of their private data must be removed from a model that has already been trained.
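As a rough illustration of the deduplication idea in the first bullet, the sketch below drops any document that repeats a token n-gram already seen earlier in the corpus. This is a simplification and not the method used in this project: it removes whole documents rather than excising the duplicated span, the n-gram length is an arbitrary choice, and production pipelines typically use suffix arrays or hashing to find exact duplicates at scale.

```python
def deduplicate(documents, ngram_size=50):
    """Keep only documents that do not repeat a token n-gram seen earlier.

    A crude stand-in for exact-substring deduplication: it drops the whole
    document rather than excising just the duplicated span.
    """
    seen = set()
    kept = []
    for doc in documents:
        tokens = doc.split()
        ngrams = {tuple(tokens[i:i + ngram_size])
                  for i in range(max(1, len(tokens) - ngram_size + 1))}
        if ngrams & seen:        # overlaps with text already kept
            continue
        seen |= ngrams
        kept.append(doc)
    return kept
```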
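The DP training bullet can likewise be made concrete. The toy step below clips each example's gradient to a fixed norm and adds Gaussian noise before updating the parameters. The hyperparameters and the manual per-example loop are purely illustrative; real DP training would use a dedicated library such as Opacus, which computes per-example gradients efficiently and tracks the cumulative privacy budget.

```python
import torch

def dp_sgd_step(model, loss_fn, batch_x, batch_y,
                lr=0.1, clip_norm=1.0, noise_multiplier=1.0):
    """One toy DP-SGD step: clip each example's gradient, then add noise."""
    params = [p for p in model.parameters() if p.requires_grad]
    summed = [torch.zeros_like(p) for p in params]

    # Compute each example's gradient separately and clip it to a fixed L2 norm.
    for x, y in zip(batch_x, batch_y):
        loss = loss_fn(model(x.unsqueeze(0)), y.unsqueeze(0))
        grads = torch.autograd.grad(loss, params)
        total_norm = torch.sqrt(sum(g.norm() ** 2 for g in grads)).item()
        scale = min(1.0, clip_norm / (total_norm + 1e-6))
        for s, g in zip(summed, grads):
            s.add_(g, alpha=scale)

    # Add Gaussian noise to the summed clipped gradients, then update parameters.
    with torch.no_grad():
        for p, s in zip(params, summed):
            noise = torch.randn_like(s) * noise_multiplier * clip_norm
            p.add_(-(lr / len(batch_x)) * (s + noise))
```

The noise scale is tied to the clipping norm, so no single example can shift the parameters by more than a bounded, noise-masked amount; the noise multiplier maps to a formal privacy guarantee via a privacy accountant.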
What we did
In this project, we sought to understand more about the Privacy-Risk Landscape for Healthcare LMs and conduct a practical investigation of some existing privacy attacks and defensive methods.
Initially, we conducted a thorough literature search to understand the privacy risk landscape. Our first applied work package explored data deduplication before model training as a mitigation to reduce memorization and evaluated the approach with Membership Inference Attacks. We showed that RoBERTa models trained on patient notes are highly vulnerable to MIAs, even when only trained for a single epoch. We investigated data deduplication as a mitigation strategy but found that these models were just as vulnerable to MIAs. Further investigation of models trained for multiple epochs is needed to confirm these results. In the future, semantic deduplication could be a promising avenue for medical notes.
Our second applied work package explored editing/unlearning approaches for healthcare LMs. Unlearning in LMs is poised to become increasingly relevant, especially in light of the growing awareness surrounding training data leakage and the 'Right to be Forgotten'. We found that many repositories implementing such approaches were not adapted for all LM types, and some are not yet mature enough to use easily as packages. Exploring a Locate-then-Edit approach based on Knowledge Neurons, we found it was not well suited to erasing the kind of information we needed to remove from medical notes. Our findings suggest that, from a privacy perspective, the focus should be on methods that allow the erasure of specific training data instances rather than relational facts.
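For intuition, one simple baseline in the instance-erasure family (distinct from the Locate-then-Edit method we trialled) is gradient ascent on the sequences to be forgotten, sketched below. The checkpoint, forget set, learning rate, and step count are placeholders, and real masked-LM unlearning would also apply the usual random token masking.

```python
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

# Gradient-ascent unlearning baseline: take a few optimization steps that
# *increase* the loss on the sequences to be forgotten. All values here are
# placeholders for illustration only.
tok = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModelForMaskedLM.from_pretrained("roberta-base")
optimizer = torch.optim.SGD(model.parameters(), lr=1e-5)

forget_set = ["Example clinical sentence that should be forgotten."]

model.train()
for _ in range(3):                     # too many ascent steps will degrade the model
    for text in forget_set:
        inputs = tok(text, return_tensors="pt")
        # For simplicity the loss is computed over all tokens; real MLM
        # unlearning would apply random token masking as during training.
        loss = model(**inputs, labels=inputs["input_ids"]).loss
        (-loss).backward()             # ascend rather than descend the loss
        optimizer.step()
        optimizer.zero_grad()
```

Methods of this kind target specific training instances directly, but they trade off against overall model utility, which is why careful evaluation of both forgetting and retained performance is needed.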
Where Next?
This work primarily explored privacy in pre-trained Masked Language Models. The growing adoption of generative LMs underscores the importance of expanding this work to Decoder-only and Encoder-Decoder models such as the GPT family and T5. Also, because it is common practice to freeze most parameters and tune only the last layer of an LM on a private dataset, it is critical to extend investigations of privacy risks to LMs fine-tuned on healthcare data.
Within the scope of this exploration, the field of Machine Unlearning/Editing applied to LMs was in its infancy, but it is gaining momentum. As this field matures, comparing the efficacy of different methods becomes crucial. Furthermore, it is important to explore the effect of removing the influence of a set of data points. A holistic examination of the effectiveness, privacy implications, and broader impacts of Machine Unlearning/Editing methods on healthcare LMs is essential to inform the development of robust and privacy-conscious LMs in the NHS.
Explainability of models often involves generating explanations or counterfactuals alongside the decisions made by the LM. However, integrating explanations into the output of LMs can introduce vulnerabilities related to training data leakage and privacy attacks. Additionally, efforts to enhance privacy, such as employing privacy-preserving training techniques, can inadvertently impact fairness, particularly in datasets lacking diversity. In healthcare, all three elements are paramount, so investigating the privacy-explainability-fairness trade-off is crucial for developing private, robust and ethically sound LMs.
Finally, privacy concerns in several emerging trends for LMs need to be understood in healthcare scenarios. Incorporating external Knowledge Bases to enhance LMs, known as retrieval augmentation, could make LMs more likely to leak private information. Further, Multimodal Large Language Models (MLLMs), LM-based models that can take in and reason over the multimodal information common in healthcare, could be susceptible to information from one input modality leaking through another output modality.