How record linkage works

Record linkage brings together information from different data sources that relates to the same individual, family, place or event. In this way, it is possible to construct chronological sequences of health events for individuals. Combined, these individual 'stories' create a larger story about the health of people in NSW and the ACT.

For example, consider a research team that wants to investigate whether bowel cancer screening improves survival. To answer this question, the researchers need information about people’s bowel cancer screening from a survey, as well as their cancer incidence, relevant hospital treatment and deaths, which are recorded in administrative data collections managed by government agencies.

Once the relevant government agencies and a human research ethics committee have approved the research project, the CHeReL can securely link the data and make it available to the researchers.

The process by which data is securely linked and made available is as follows:

1. Splitting

The first step in data linkage is to split the records from each dataset into two separate files. To protect privacy, identifier information such as names and addresses is separated from content information such as cancer type or screening history. The two files from each dataset are stored and handled separately: the identifier data is used in the Data Linkage step, and the content information, which contains no names or addresses, is used in Data Integration.
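The splitting step can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not CHeReL's implementation: the field names in `IDENTIFIER_FIELDS`, the `record_key` column and its format are all hypothetical. The shared key is what would later let linkage results be reattached to content rows, while neither file on its own holds both identity and health details.

```python
# Hypothetical identifier fields; real datasets define their own layouts.
IDENTIFIER_FIELDS = {"name", "address", "date_of_birth", "sex", "postcode"}


def split_records(records, source):
    """Split each record into an identifier row and a content row.

    Both rows share an arbitrary record key (a hypothetical scheme here),
    so they can be re-associated later, but neither file on its own holds
    both a person's identity and their health content.
    """
    identifier_file, content_file = [], []
    for i, record in enumerate(records):
        key = f"{source}-{i}"
        identifiers = {k: v for k, v in record.items() if k in IDENTIFIER_FIELDS}
        content = {k: v for k, v in record.items() if k not in IDENTIFIER_FIELDS}
        identifier_file.append({"record_key": key, **identifiers})
        content_file.append({"record_key": key, **content})
    return identifier_file, content_file


ids, content = split_records(
    [{"name": "John Smith", "address": "1 Example St", "cancer_type": "colorectal"}],
    source="cancer_registry",
)
print(ids)      # [{'record_key': 'cancer_registry-0', 'name': 'John Smith', 'address': '1 Example St'}]
print(content)  # [{'record_key': 'cancer_registry-0', 'cancer_type': 'colorectal'}]
```

In practice the two output lists would be written to separate, separately controlled stores.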

2. Data Linkage

CHeReL’s Data Linkage Unit applies a combination of matching methods to the identifier data to identify and distinguish between individuals, for example between this John Smith and that John Smith. Each individual identified in this way is then assigned an arbitrary Person Number, which replaces their name, address and other identifying details.
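The text does not spell out CHeReL's matching methods, so the sketch below substitutes a deliberately simple one, purely for illustration. It scores each pair of identifier rows by how many (hypothetical) fields agree after normalisation, treats pairs at or above an assumed threshold as the same person, groups matches transitively with a union-find structure, and gives each group an arbitrary Person Number. Production linkage typically uses probabilistic weights and blocking rather than exact agreement counts over every pair.

```python
from itertools import combinations

# Hypothetical comparison fields and threshold, chosen for illustration only.
COMPARE_FIELDS = ("name", "date_of_birth", "sex", "postcode")


def normalise(value):
    """Lower-case and collapse whitespace so trivial differences don't block a match."""
    return " ".join(str(value).lower().split())


def match_score(a, b):
    """Count identifier fields present in both rows whose normalised values agree."""
    return sum(
        1
        for f in COMPARE_FIELDS
        if f in a and f in b and normalise(a[f]) == normalise(b[f])
    )


def assign_person_numbers(identifier_rows, threshold=3):
    """Cluster matching rows and map each record_key to an arbitrary Person Number."""
    parent = list(range(len(identifier_rows)))

    def find(i):
        # Follow parent links to the cluster root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Compare every pair (fine for a toy; real systems use blocking to avoid O(n^2)).
    for i, j in combinations(range(len(identifier_rows)), 2):
        if match_score(identifier_rows[i], identifier_rows[j]) >= threshold:
            parent[find(i)] = find(j)

    cluster_ids, person_numbers = {}, {}
    for i, row in enumerate(identifier_rows):
        root = find(i)
        cluster_ids.setdefault(root, len(cluster_ids) + 1)
        person_numbers[row["record_key"]] = cluster_ids[root]
    return person_numbers


rows = [
    {"record_key": "hosp-0", "name": "John Smith", "date_of_birth": "1960-01-01", "sex": "M", "postcode": "2000"},
    {"record_key": "cancer-0", "name": "john  smith", "date_of_birth": "1960-01-01", "sex": "M", "postcode": "2000"},
    {"record_key": "hosp-1", "name": "John Smith", "date_of_birth": "1975-05-05", "sex": "M", "postcode": "2600"},
]
print(assign_person_numbers(rows))  # {'hosp-0': 1, 'cancer-0': 1, 'hosp-1': 2}
```

Here the two records for the 1960-born John Smith share Person Number 1, while the other John Smith, who only agrees on name and sex, gets a separate number.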

3. Data integration and disclosure

Using an encrypted version of the arbitrary Person Number, CHeReL’s Data Integration Unit then creates a Project-specific Person Number (PPN) for each individual. A different PPN is created for each research project. The PPN is then joined to the survey, cancer and hospital data that the relevant government agencies and human research ethics committee have approved for the project. All records for an individual in any approved project dataset carry the same PPN. The approved data is released to the researchers, together with conditions on its use.
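The text says the PPN is derived from an encrypted Person Number but does not describe the scheme. As an assumed stand-in, the sketch below uses a keyed hash (HMAC-SHA256 from Python's standard library) with a secret key per project. This gives the properties described: the same person always gets the same PPN within one project, different projects get unrelated values, and the Person Number cannot be recovered from a PPN without the key. The function names, the `record_key` column and the key values are hypothetical.

```python
import hashlib
import hmac


def project_person_number(person_number, project_key):
    """Derive a project-specific identifier from a Person Number (assumed HMAC scheme)."""
    mac = hmac.new(project_key, str(person_number).encode("utf-8"), hashlib.sha256)
    return mac.hexdigest()


def attach_ppn(content_rows, person_numbers, project_key):
    """Swap each content row's internal record key for its PPN before release."""
    released = []
    for row in content_rows:
        ppn = project_person_number(person_numbers[row["record_key"]], project_key)
        fields = {k: v for k, v in row.items() if k != "record_key"}
        released.append({"ppn": ppn, **fields})
    return released


key_a = b"secret-key-for-project-A"  # hypothetical; real keys would be managed securely
key_b = b"secret-key-for-project-B"
same_person = project_person_number(1, key_a) == project_person_number(1, key_a)
across_projects = project_person_number(1, key_a) == project_person_number(1, key_b)
print(same_person, across_projects)  # True False
```

Because only the holder of a project's key can regenerate its PPNs, datasets released to different projects cannot be joined to each other by PPN.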

4. Creating a research data set

Using the PPN attached to the survey, cancer and hospital data, the researchers can combine the records for each individual without accessing information about their identity. This helps protect privacy while allowing research in the public interest to keep improving the health system and health outcomes.
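On the researchers' side, combining records by PPN amounts to grouping the rows from each released dataset under their shared PPN. Sorting each person's events by date then recovers the chronological sequence of health events described at the start. The sketch below assumes every row has `ppn` and ISO-format `date` fields; those column names and the sample values are hypothetical.

```python
from collections import defaultdict


def build_histories(datasets):
    """Group released rows by PPN and sort each person's events by date.

    `datasets` maps a source name (e.g. "survey") to its list of rows.
    ISO-format date strings (YYYY-MM-DD) sort correctly as plain strings.
    """
    histories = defaultdict(list)
    for source, rows in datasets.items():
        for row in rows:
            event = {"source": source, **{k: v for k, v in row.items() if k != "ppn"}}
            histories[row["ppn"]].append(event)
    for events in histories.values():
        events.sort(key=lambda event: event["date"])
    return dict(histories)


released = {
    "survey": [{"ppn": "p1", "date": "2015-03-01", "screened": True}],
    "cancer": [{"ppn": "p1", "date": "2016-07-12", "cancer_type": "colorectal"}],
    "hospital": [
        {"ppn": "p1", "date": "2016-08-02", "procedure": "colectomy"},
        {"ppn": "p2", "date": "2014-11-20", "procedure": "colonoscopy"},
    ],
}
histories = build_histories(released)
print([e["source"] for e in histories["p1"]])  # ['survey', 'cancer', 'hospital']
```

Nothing in this step needs names or addresses: the PPN alone is enough to line up one person's screening, diagnosis and treatment.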