How record linkage works
Record linkage brings together information that relates to the
same individual, family, place or event from different data
sources. In this way it is possible to construct
chronological sequences of health events for individuals.
Combined, these individual 'stories' create a larger story about
the health of people in NSW and the ACT.
In bringing together these records, the CHeReL uses strict
privacy-preserving protocols that ensure the security of the data
and the confidentiality of the individuals to whom the records relate.
This process is described below.
1. Custodians of the data collections to be linked provide the
CHeReL with an encrypted source record number and demographic
details for each record in their dataset. Note that
clinical data is not provided to the CHeReL.
2. The CHeReL links these records using probabilistic matching
of the demographic details, and assigns a CHeReL person number to
records that belong to the same person. In the example shown,
CHeReL person 35 (George Brown) has one cancer registry record, two
hospital records and a death record. Person 78 (James Grey)
has one death record and one hospital record. The CHeReL
person number never leaves the CHeReL. The CHeReL person number
and the associated source record numbers form the CHeReL Master
Linkage Key (MLK). The MLK provides a 'pointer' to records
for a person in different datasets.
3. When the data custodians and ethics committee approve a
project, the CHeReL assigns a project-specific person number (PPN)
for each person in the linked dataset. In this example,
CHeReL person number 35 becomes PPN 896. CHeReL person number
109 becomes PPN 531. The PPNs assigned are different for each
project.
4. The data custodian decrypts the source record number and
merges the PPN with the clinical variables that
have been approved for use in the project. The source record
number is then removed, and the researcher is provided with the PPN and
the clinical information.
5. The researcher is then able to combine the records for the
same person from the different datasets using the PPN.
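
The steps above can be sketched in a few lines of Python. This is a minimal illustration, not the CHeReL's actual software: all record numbers, dates of birth and clinical values are invented toy data, the function names are hypothetical, exact matching on name and date of birth stands in for probabilistic matching, and a simple lookup table stands in for decryption of the source record number.

```python
import secrets

# Step 1 (toy data, all values hypothetical): custodians send an encrypted
# source record number plus demographic details, and no clinical data.
demographic_records = [
    {"dataset": "cancer", "enc_id": "C-a1", "name": "George Brown", "dob": "1950-02-01"},
    {"dataset": "hospital", "enc_id": "H-b2", "name": "George Brown", "dob": "1950-02-01"},
    {"dataset": "hospital", "enc_id": "H-c3", "name": "George Brown", "dob": "1950-02-01"},
    {"dataset": "deaths", "enc_id": "D-d4", "name": "George Brown", "dob": "1950-02-01"},
    {"dataset": "deaths", "enc_id": "D-e5", "name": "James Grey", "dob": "1948-07-15"},
    {"dataset": "hospital", "enc_id": "H-f6", "name": "James Grey", "dob": "1948-07-15"},
]

# What each custodian keeps in-house (hypothetical): a way to recover the
# source record number, and the clinical variables approved for the project.
custodians = {
    "cancer": {"decrypt": {"C-a1": 501},
               "clinical": {501: {"event": "cancer notification"}}},
    "hospital": {"decrypt": {"H-b2": 11, "H-c3": 12, "H-f6": 13},
                 "clinical": {11: {"event": "admission A"},
                              12: {"event": "admission B"},
                              13: {"event": "admission C"}}},
    "deaths": {"decrypt": {"D-d4": 71, "D-e5": 72},
               "clinical": {71: {"event": "death"}, 72: {"event": "death"}}},
}


def build_master_linkage_key(records):
    """Step 2: give records with the same demographics one person number.

    Exact matching is a simplification; the text describes probabilistic matching.
    """
    person_numbers = {}
    mlk = []
    for rec in records:
        key = (rec["name"], rec["dob"])
        person = person_numbers.setdefault(key, len(person_numbers) + 1)
        mlk.append({"person": person, "dataset": rec["dataset"], "enc_id": rec["enc_id"]})
    return mlk


def assign_ppns(mlk):
    """Step 3: draw fresh random PPNs for one project, grouped by dataset."""
    persons = sorted({row["person"] for row in mlk})
    ppns = secrets.SystemRandom().sample(range(100_000, 1_000_000), len(persons))
    ppn_of = dict(zip(persons, ppns))
    per_dataset = {}
    for row in mlk:
        per_dataset.setdefault(row["dataset"], {})[row["enc_id"]] = ppn_of[row["person"]]
    return per_dataset


def custodian_release(enc_to_ppn, decrypt, clinical):
    """Step 4: attach the PPN to the clinical variables; drop the record number."""
    return [{"ppn": ppn, **clinical[decrypt[enc_id]]} for enc_id, ppn in enc_to_ppn.items()]


def combine_by_ppn(releases):
    """Step 5: the researcher groups records from all datasets by PPN."""
    combined = {}
    for dataset, rows in releases.items():
        for row in rows:
            combined.setdefault(row["ppn"], []).append((dataset, row))
    return combined


mlk = build_master_linkage_key(demographic_records)
ppn_lists = assign_ppns(mlk)
releases = {name: custodian_release(ppn_lists[name], c["decrypt"], c["clinical"])
            for name, c in custodians.items()}
combined = combine_by_ppn(releases)

for ppn, events in combined.items():
    print(ppn, [dataset for dataset, _ in events])
```

In this sketch the person number exists only inside the master linkage key, each custodian sees only its own list of encrypted record numbers and PPNs, and the rows reaching the researcher carry a PPN and clinical values but no names, dates of birth or record numbers. Calling assign_ppns again for another project draws a new set of PPNs.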
This process ensures that:
- CHeReL staff performing the linkage use demographic variables
but do not have access to the clinical information about the
individuals.
- Data custodians only have access to data within their data
collection.
- Researchers receive data which contains no identifying
variables, or variables which provide a link back to the CHeReL