How record linkage works

Record linkage brings together information that relates to the same individual, family, place or event from different data sources.  In this way it is possible to construct chronological sequences of health events for individuals.  Combined, these individual 'stories' create a larger story about the health of people in NSW and the ACT. 

In bringing together these records, the CHeReL uses strict privacy-preserving protocols which ensure the security of the data and the confidentiality of the individuals to whom the records relate.  This process is described below.

1. Custodians of the data collections to be linked provide the CHeReL with an encrypted source record number and demographic details for each record in their dataset.   Note that clinical data is not provided to the CHeReL.

The demographic information, such as names and birth dates, from the different data collections is encrypted and sent to the CHeReL.


2. The CHeReL links these records using probabilistic matching of the demographic details, and assigns the same CHeReL person number to records that belong to the same person.  In the example shown, CHeReL person 35 (George Brown) has one cancer registry record, two hospital records and a death record.  Person 78 (James Grey) has one death record and one hospital record.  The CHeReL person number never leaves the CHeReL.  The CHeReL person number and the associated source record numbers form the CHeReL Master Linkage Key (MLK).  The MLK provides a 'pointer' to records for a person in different datasets.

The CHeReL uses the demographic information to create a person ID number. This number can be attached to records from different sources to indicate that they belong to the same person.
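
The matching in step 2 can be illustrated with a minimal sketch. The page says only that the CHeReL uses probabilistic matching; the Fellegi-Sunter style scoring below is one common form of it and is an assumption here, not a description of the CHeReL's own software. The field names, the m/u probabilities, the threshold and the record numbers are all hypothetical.

```python
import math
from itertools import combinations

# Hypothetical per-field probabilities (assumed, not CHeReL's values):
# m = P(field agrees | same person), u = P(field agrees | different people).
FIELD_PROBS = {
    "surname": (0.95, 0.01),
    "given_name": (0.90, 0.02),
    "dob": (0.97, 0.003),
}
THRESHOLD = 8.0  # assumed cut-off on the total log2 weight


def match_weight(a, b):
    """Sum log2 agreement or disagreement weights over the compared fields."""
    total = 0.0
    for field, (m, u) in FIELD_PROBS.items():
        if a[field] == b[field]:
            total += math.log2(m / u)
        else:
            total += math.log2((1 - m) / (1 - u))
    return total


def build_master_linkage_key(records):
    """Group records scoring at or above THRESHOLD and number each group."""
    parent = {r["src"]: r["src"] for r in records}

    def find(x):
        # Follow parent pointers to the group representative.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(records, 2):
        if match_weight(a, b) >= THRESHOLD:
            parent[find(a["src"])] = find(b["src"])

    person_numbers = {}
    mlk = {}  # source record number -> person number
    for r in records:
        root = find(r["src"])
        person_numbers.setdefault(root, len(person_numbers) + 1)
        mlk[r["src"]] = person_numbers[root]
    return mlk


# Illustrative records; "src" stands in for the encrypted source record number.
records = [
    {"src": "CR-001", "surname": "brown", "given_name": "george", "dob": "1950-02-01"},
    {"src": "H-104", "surname": "brown", "given_name": "george", "dob": "1950-02-01"},
    {"src": "H-221", "surname": "brown", "given_name": "geo", "dob": "1950-02-01"},
    {"src": "D-009", "surname": "grey", "given_name": "james", "dob": "1941-11-23"},
]
mlk = build_master_linkage_key(records)
```

In this sketch the third record still joins the first two despite the abbreviated given name, because agreement on surname and date of birth outweighs the single disagreement. Tolerance of small discrepancies like this is the usual reason for preferring probabilistic matching over exact matching.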


3. When the data custodians and ethics committee approve a project, the CHeReL assigns a project-specific person number (PPN) for each person in the linked dataset.  In this example, CHeReL person number 35 becomes PPN 896.  CHeReL person number 109 becomes PPN 531.  The PPNs assigned are different for each project. 

The CHeReL sends the person ID number, along with the source record number, back to the data custodians.
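
Step 3 can be sketched as follows. The page does not say how PPNs are generated, so drawing unique random numbers with Python's `secrets` module is an assumption made for illustration; the function name `assign_ppns`, the number of digits and the `enc-...` record tokens are hypothetical.

```python
import secrets


def assign_ppns(person_numbers, digits=6):
    """Map each person number to a unique random PPN for one project."""
    used = set()
    mapping = {}
    for person in sorted(set(person_numbers)):
        ppn = secrets.randbelow(10 ** digits)
        while ppn in used:  # redraw on the rare collision
            ppn = secrets.randbelow(10 ** digits)
        used.add(ppn)
        mapping[person] = ppn
    return mapping


# Fragment of a Master Linkage Key: encrypted source record number -> person number.
mlk = {"enc-A1": 35, "enc-B7": 35, "enc-C3": 78}

# A fresh mapping is drawn for every approved project.
project_map = assign_ppns(mlk.values())

# What goes back to a custodian: encrypted source record number and PPN only.
to_custodian = {src: project_map[person] for src, person in mlk.items()}
```

Because the mapping is drawn again for each project, the same person receives unrelated PPNs in different projects, so datasets released for separate projects cannot be joined on the PPN.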


4. The data custodian decrypts the source record number, and merges the project person number with the clinical variables that have been approved for use in the project.  The source record number is removed and the researcher is provided with the PPN and the clinical information. 

The data custodians can then attach the CHeReL person ID number to their data, for example hospital admissions or death registrations.


5. The researcher is then able to combine the records for the same person from the different datasets using the PPN. 

The data custodians send the clinical information and the CHeReL person ID to the researcher. The researcher can combine multiple datasets using the CHeReL person ID without knowing any demographic information.
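
Steps 4 and 5 can be sketched together. This is a minimal illustration, assuming the custodian has already decrypted the source record numbers; the function names, variable names, PPN values and clinical rows are all hypothetical and do not come from any real dataset.

```python
from collections import defaultdict


def custodian_release(ppn_by_source, clinical_rows, approved_vars):
    """Attach the PPN, keep only approved variables, drop the source record number."""
    released = []
    for row in clinical_rows:
        ppn = ppn_by_source.get(row["src"])
        if ppn is None:
            continue  # record is not part of this project
        released.append({"ppn": ppn, **{v: row[v] for v in approved_vars}})
    return released


def combine_by_ppn(datasets):
    """Researcher side: group the released rows from every dataset by PPN."""
    combined = defaultdict(dict)
    for name, rows in datasets.items():
        for row in rows:
            clinical = {k: v for k, v in row.items() if k != "ppn"}
            combined[row["ppn"]].setdefault(name, []).append(clinical)
    return dict(combined)


# Each custodian holds only its own PPN list and its own clinical rows.
hospital_ppns = {"H-104": 896, "H-221": 896, "H-350": 531}
death_ppns = {"D-009": 896}
hospital_rows = [
    {"src": "H-104", "admitted": "2009-03-02", "diagnosis": "C18.9", "ward": "4B"},
    {"src": "H-221", "admitted": "2010-07-15", "diagnosis": "C18.9", "ward": "2A"},
    {"src": "H-350", "admitted": "2011-01-20", "diagnosis": "I21.9", "ward": "1C"},
]
death_rows = [{"src": "D-009", "died": "2011-05-30", "cause": "C18.9"}]

# Step 4: each custodian releases PPN plus approved clinical variables.
hospital = custodian_release(hospital_ppns, hospital_rows, ["admitted", "diagnosis"])
deaths = custodian_release(death_ppns, death_rows, ["died", "cause"])

# Step 5: the researcher combines the datasets on the PPN alone.
combined = combine_by_ppn({"hospital": hospital, "deaths": deaths})
```

Here `combined[896]` holds two hospital rows and one death row for the same person, while the released rows carry neither a source record number nor any variable outside the approved list (`ward` is dropped).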

 This process ensures that:

  • CHeReL staff performing the linkage use demographic variables but do not have access to the clinical information about the individuals;
  • Data custodians only have access to data within their data collections; and
  • Researchers receive data which contains no identifying variables and no variables which provide a link back to the CHeReL MLK.