8. Glossary

| Term | Definition |
| --- | --- |
| Bias analysis | Bias analysis identifies and mitigates biases that may be present in algorithms, and examines how those biases can affect the outcomes of a model. |
| Blocking rules | A blocking rule is a criterion or set of criteria used to group records into subsets or 'blocks' before the actual matching or linkage process takes place. The purpose of blocking is to limit the number of record pairs that need to be compared during the matching process, making the linkage more efficient and reducing the computational resources required. |
| Cardinality | The cardinality of a set is the number of unique elements in the set. |
| Clerical reviews | Clerical reviews, in the context of data linkage, are manual inspections conducted by human reviewers to verify and validate the accuracy of matched records. |
| Confidence score | A record-level number (for example, between 0 and 1, or 0 and 100%) that represents the confidence that the output of the matching algorithm is correct for a given record. This is different from accuracy, which is the percentage of all predictions that are correct. Confidence scores can be formulated in different ways for different algorithms, and it may not be possible to interpret a confidence score as a direct measure of the probability that a record is correct. Generally speaking, however, a higher confidence score should correlate with a higher likelihood that the record has been correctly matched. |
| Data linkage | Two definitions are in use: (1) the process of bringing together records that pertain to the same entity, such as a person, place, or organisation; or (2) a process which brings together two or more sets of administrative or survey data to produce information (linked data) that can be used for research or statistical purposes. |
| Data matching | Data matching is often used interchangeably with data linkage. However, while data linkage refers to the entire process of bringing together records from different data sets, data matching refers to the specific approach used to recognise that such records refer to the same entity. |
| Deterministic matching | Deterministic matching uses fixed rules to decide whether a pair of records match or not. The match does not need to be perfect: if the rules accept partial matches, the algorithm can return a non-exact match. |
| Identifiable | An identifiable natural person is one who can be identified, directly or indirectly, in particular by reference to an identifier such as a name, an identification number, location data, an online identifier or to one or more factors specific to the physical, physiological, genetic, mental, economic, cultural or social identity of that natural person. |
| Joining | Joining is the process of combining data sets using unique keys that identify the same records in each data set. |
| Linked asset / Linked data | The product of bringing together data from multiple sources. |
| Non-identifiable | The opposite of identifiable (see Identifiable). |
| Precision | Precision is the ratio of true positive matches to the total number of records identified as matches (true positives plus false positives). |
| Probabilistic matching | Probabilistic matching involves the calculation of conditional probabilities to determine the likelihood that a record pair is a match. |
| Pseudonymisation | The processing of personal data in such a manner that the personal data can no longer be attributed to a specific data subject without the use of additional information, provided that such additional information is kept separately and is subject to technical and organisational measures to ensure that the personal data are not attributed to an identified or identifiable natural person. |
| Recall | Recall is the ratio of true positive matches to the total number of true matches (true positives plus false negatives). |
| Sensitivity analysis | Sensitivity analysis in data linkage involves systematically varying the parameters or methods used in the linkage process to evaluate how sensitive the study results are to these changes. It helps researchers assess the robustness of their findings and identify potential sources of bias or error. |
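
The Blocking rules, Deterministic matching, Precision and Recall entries above can be tied together in a small sketch. It is illustrative only: the records, the field names (`surname`, `postcode`, `entity`) and the helper functions are hypothetical, and the "same postcode" blocking rule and "same surname" matching rule are assumptions chosen to keep the example short, not recommended rules.

```python
from collections import defaultdict

# Made-up toy records. "entity" is the (normally unknown) true identity,
# included here only so that precision and recall can be computed.
left = [
    {"id": "a1", "surname": "smith", "postcode": "AB1", "entity": 1},
    {"id": "a2", "surname": "jones", "postcode": "AB1", "entity": 2},
    {"id": "a3", "surname": "patel", "postcode": "CD2", "entity": 3},
]
right = [
    {"id": "b1", "surname": "smith", "postcode": "AB1", "entity": 1},
    {"id": "b2", "surname": "jones", "postcode": "EF3", "entity": 2},
    {"id": "b3", "surname": "patel", "postcode": "CD2", "entity": 4},
]


def candidate_pairs(left, right, block_key):
    """Blocking: only pair records that share the same value of block_key."""
    blocks = defaultdict(list)
    for rec in right:
        blocks[rec[block_key]].append(rec)
    return [(l, r) for l in left for r in blocks.get(l[block_key], [])]


def deterministic_match(l, r):
    """Deterministic rule (assumed for illustration): surnames must be equal."""
    return l["surname"] == r["surname"]


def precision_recall(predicted, actual):
    """Precision = TP / predicted matches; recall = TP / true matches."""
    tp = len(predicted & actual)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(actual) if actual else 0.0
    return precision, recall


pairs = candidate_pairs(left, right, "postcode")
predicted = {(l["id"], r["id"]) for l, r in pairs if deterministic_match(l, r)}
actual = {(l["id"], r["id"]) for l in left for r in right
          if l["entity"] == r["entity"]}
precision, recall = precision_recall(predicted, actual)

print(len(pairs), precision, recall)  # 3 0.5 0.5
```

In this toy data, blocking cuts the comparisons from 9 possible pairs to 3. It also excludes the true match (a2, b2), whose postcodes differ, which lowers recall, while the false positive (a3, b3) lowers precision. This is the trade-off a blocking rule has to balance.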

Last update: August 30, 2024
Created: August 30, 2024