How record linkage works
Record linkage brings together information that relates to the same individual, family, place or event from different data sources. In this way it is possible to construct chronological sequences of health events for individuals. Combined, these individual 'stories' create a larger story about the health of people in NSW and the ACT.
In bringing together these records, the CHeReL uses strict privacy-preserving protocols that protect the security of the data and the confidentiality of the individuals to whom the records relate. The process is described below.
1. Custodians of the data collections to be linked provide the CHeReL with an encrypted source record number and demographic details for each record in their dataset. Note that clinical data is not provided to the CHeReL.
2. The CHeReL links these records using probabilistic matching of the demographic details, and assigns the same CHeReL person number to all records that belong to the same person. In the example shown, CHeReL person 35 (George Brown) has one cancer registry record, two hospital records and a death record. Person 78 (James Grey) has one death record and one hospital record. The CHeReL person number never leaves the CHeReL. The CHeReL person number and the associated source record numbers form the CHeReL Master Linkage Key (MLK). The MLK provides a 'pointer' to records for a person in different datasets.
3. When the data custodians and ethics committee approve a project, the CHeReL assigns a project-specific person number (PPN) for each person in the linked dataset. In this example, CHeReL person number 35 becomes PPN 896. CHeReL person number 109 becomes PPN 531. The PPNs assigned are different for each project.
4. Each data custodian decrypts the source record number and merges the PPN with the clinical variables that have been approved for use in the project. The source record number is then removed, and the researcher is provided with the PPN and the clinical information.
5. The researcher is then able to combine the records for the same person from the different datasets using the PPN.
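The five steps above can be sketched in a few lines of Python. This is an illustrative sketch only, not the CHeReL's actual implementation: the function names, the `enc-` prefix standing in for encryption, the three-digit PPN range and the placeholder clinical values are all assumptions made for the example. Only the person numbers 35 and 78 and their record counts come from the example in the text.

```python
import random

# Hypothetical Master Linkage Key (step 2), held only by the linkage unit:
# person number -> [(dataset, encrypted source record number), ...]
mlk = {
    35: [("cancer", "enc-c1"), ("hospital", "enc-h1"),
         ("hospital", "enc-h2"), ("deaths", "enc-d1")],
    78: [("deaths", "enc-d2"), ("hospital", "enc-h3")],
}

# Placeholder clinical data, held only by each custodian and keyed by the
# decrypted source record number. The values are invented for illustration.
clinical = {
    "cancer": {"c1": {"item": "cancer record 1"}},
    "hospital": {"h1": {"item": "admission 1"},
                 "h2": {"item": "admission 2"},
                 "h3": {"item": "admission 3"}},
    "deaths": {"d1": {"item": "death record 1"},
               "d2": {"item": "death record 2"}},
}

def toy_decrypt(enc):
    """Stand-in for the custodian's real decryption (assumption)."""
    return enc[len("enc-"):]

def assign_ppns(person_numbers):
    """Step 3: draw a distinct random PPN per person for one project."""
    ppns = random.SystemRandom().sample(range(100, 1000), len(person_numbers))
    return dict(zip(person_numbers, ppns))

def linkage_file_for(dataset, ppn_of):
    """Step 3 output for one custodian: encrypted record number -> PPN."""
    return {enc: ppn_of[person]
            for person, records in mlk.items()
            for ds, enc in records if ds == dataset}

def custodian_release(linkage_file, records):
    """Step 4: decrypt, attach approved variables, drop the record number."""
    return [{"ppn": ppn, **records[toy_decrypt(enc)]}
            for enc, ppn in linkage_file.items()]

def combine(releases):
    """Step 5: the researcher groups rows from every dataset by PPN."""
    by_ppn = {}
    for dataset, rows in releases.items():
        for row in rows:
            by_ppn.setdefault(row["ppn"], []).append((dataset, row))
    return by_ppn

ppn_of = assign_ppns(list(mlk))
releases = {ds: custodian_release(linkage_file_for(ds, ppn_of), clinical[ds])
            for ds in clinical}
linked = combine(releases)
for ppn, rows in linked.items():
    print(ppn, [dataset for dataset, _ in rows])
```

Because `assign_ppns` draws fresh random numbers on every call, running it once per project gives each project its own PPNs, which mirrors the statement in step 3. The rows that reach `combine` carry a PPN and clinical variables but neither the source record number nor the person number.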
This process ensures that:
- CHeReL staff performing the linkage use demographic variables but do not have access to the clinical information about the individuals;
- Data custodians have access only to data within their own data collections; and
- Researchers receive data that contain no identifying variables and no variables that provide a link back to the CHeReL MLK.
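Step 2 above relies on probabilistic matching of demographic details. The text does not say which model the CHeReL uses, so the following is only a generic sketch of one common approach, a Fellegi-Sunter style match weight. The field names, the m- and u-probabilities, and the dates of birth and postcodes are assumptions chosen for illustration; only the two names come from the example in the text.

```python
import math

# Assumed probabilities per field (illustrative values only):
# m = P(field agrees | records belong to the same person)
# u = P(field agrees | records belong to different people)
FIELDS = {
    "surname": (0.95, 0.01),
    "given_name": (0.90, 0.05),
    "date_of_birth": (0.97, 0.003),
    "postcode": (0.80, 0.02),
}

def match_weight(rec_a, rec_b, fields=FIELDS):
    """Sum log2 agreement or disagreement weights over the compared fields."""
    total = 0.0
    for name, (m, u) in fields.items():
        if rec_a[name] == rec_b[name]:
            total += math.log2(m / u)              # agreement adds evidence
        else:
            total += math.log2((1 - m) / (1 - u))  # disagreement subtracts it
    return total

# Hypothetical demographic records; dates and postcodes are invented.
hospital_rec = {"surname": "Brown", "given_name": "George",
                "date_of_birth": "1950-01-01", "postcode": "2000"}
cancer_rec = {"surname": "Brown", "given_name": "George",
              "date_of_birth": "1950-01-01", "postcode": "2600"}
death_rec = {"surname": "Grey", "given_name": "James",
             "date_of_birth": "1942-06-15", "postcode": "2010"}

print(match_weight(hospital_rec, cancer_rec))  # positive: likely the same person
print(match_weight(hospital_rec, death_rec))   # negative: likely different people
```

In a scheme of this kind, pairs scoring above an upper threshold are treated as links, pairs below a lower threshold as non-links, and pairs in between are typically sent for clerical review. The thresholds themselves are a design choice and are not specified in the text.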