A NSW Government website

Our services

The CHeReL provides advice on the design, cost, feasibility and process of linkage studies. We offer four types of record linkage services:

  • Linkage between and within records held in the Master Linkage Key (MLK). The MLK consists of records from a number of administrative datasets, including records of hospitalisations, emergency department presentations, births, cancer registrations and deaths. An example of a study using MLK linkage services is linking death records to hospitalisations to assess mortality within 30 days of hospital discharge.
  • Linkage of other datasets to the MLK. Researchers are often interested in a particular cohort of people and want to examine their outcomes or history using routinely collected administrative records. For example, a cohort of people may be receiving a particular intervention, and the researcher wants to determine the effectiveness of this intervention as measured by later hospital admissions, emergency department presentations and mortality.
  • Linkage of two or more datasets that are not included in the MLK, for example, linkage of educational outcomes data to child protection data.
  • Deduplication of datasets. The CHeReL regularly conducts this kind of linkage for data custodians to identify duplicate records within a dataset, that is, records that belong to the same person.
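As a rough illustration of the first example above (mortality within 30 days of hospital discharge), the Python sketch below assumes hospital and death records have already been linked and carry a shared person key. The record layout, the `person` field and the helper names are hypothetical, not the CHeReL's extract format.

```python
from datetime import date

# Hypothetical linked extracts: each record carries a person key assigned by
# the linkage process; no names or addresses are present.
hospital_separations = [
    {"person": "P1", "discharge": date(2023, 3, 1)},
    {"person": "P2", "discharge": date(2023, 3, 5)},
    {"person": "P3", "discharge": date(2023, 4, 10)},
]
deaths = {
    "P1": date(2023, 3, 20),  # 19 days after discharge
    "P3": date(2023, 6, 1),   # 52 days after discharge
}

def died_within(days, discharge, death):
    """True if a death date exists and falls within `days` of discharge."""
    return death is not None and 0 <= (death - discharge).days <= days

def thirty_day_mortality(separations, deaths):
    """Proportion of hospital separations followed by death within 30 days."""
    flagged = [
        died_within(30, s["discharge"], deaths.get(s["person"]))
        for s in separations
    ]
    return sum(flagged) / len(flagged)

print(thirty_day_mortality(hospital_separations, deaths))  # prints 0.3333333333333333
```

Only P1 dies within the window, so one of three separations is counted.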

How record linkage works

Record linkage brings together information from different data sources that relates to the same individual, family, place or event. In this way it is possible to construct chronological sequences of health events for individuals. Combined, these individual 'stories' tell a larger story about the health of people in NSW and the ACT.
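A minimal sketch of how linked records can be turned into per-person chronological sequences, assuming each event already carries a shared person key from the linkage process. The event tuples and `build_timelines` are hypothetical illustrations.

```python
from collections import defaultdict
from datetime import date

# Hypothetical events from different data collections, already tagged with a
# shared person key by the linkage process.
events = [
    ("P1", date(2022, 11, 2), "survey", "bowel screening reported"),
    ("P1", date(2023, 1, 15), "cancer registry", "bowel cancer diagnosis"),
    ("P2", date(2023, 2, 1), "emergency department", "presentation"),
    ("P1", date(2023, 2, 3), "hospital", "admission for surgery"),
]

def build_timelines(events):
    """Group events by person and sort each person's events by date."""
    timelines = defaultdict(list)
    for person, when, source, description in events:
        timelines[person].append((when, source, description))
    for person_events in timelines.values():
        person_events.sort()  # tuples sort by date first
    return dict(timelines)

for when, source, description in build_timelines(events)["P1"]:
    print(when.isoformat(), source, description, sep=" | ")
```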

An example is a study by a research team that wants to investigate whether bowel cancer screening can improve survival. To address this question, the researchers need information about people’s bowel cancer screening from a survey, as well as their cancer incidence, relevant hospital treatment and deaths, which are recorded in administrative data collections managed by government agencies.

Once the relevant government agencies and a human research ethics committee approve the research project, the CHeReL can securely link the data and make it available to the researcher.

The process by which data is securely linked and made available is as follows:

Step 1 of 4

The first step in data linkage is to split the records from each dataset into two separate files. To protect privacy, identifying information, such as names and addresses, is separated from content information, such as cancer type or screening history. The two files from each dataset are stored and handled separately: the identifier data is used in the Data Linkage process, and the content data, which contains no names or addresses, is used in Data Integration.
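The split in Step 1 can be sketched as follows. This is a minimal illustration, not the CHeReL's procedure: the field names in `IDENTIFIER_FIELDS` are hypothetical, and the shared `record_id` is an assumption about how the two files stay in correspondence, which the page does not describe.

```python
# Hypothetical identifier fields; a real dataset's identifier fields would be
# defined by the data custodian.
IDENTIFIER_FIELDS = {"name", "address", "date_of_birth", "sex"}

def split_record(record_id, record):
    """Split one record into an identifier part and a content part.

    Both parts keep the same (assumed) record_id so that a linkage result can
    later be attached to the content without the content holding identifiers.
    """
    identifiers = {"record_id": record_id}
    content = {"record_id": record_id}
    for field, value in record.items():
        target = identifiers if field in IDENTIFIER_FIELDS else content
        target[field] = value
    return identifiers, content

def split_dataset(records):
    """Return (identifier_file, content_file) as two separate lists."""
    identifier_file, content_file = [], []
    for record_id, record in enumerate(records):
        ids, content = split_record(record_id, record)
        identifier_file.append(ids)
        content_file.append(content)
    return identifier_file, content_file

records = [
    {"name": "John Smith", "address": "1 Example St", "date_of_birth": "1950-01-01",
     "sex": "M", "cancer_type": "colorectal", "screened": True},
]
id_file, content_file = split_dataset(records)
print(id_file)
print(content_file)
```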

Step 2 of 4

CHeReL’s Data Linkage Unit applies a combination of matching methods to the identifier data to identify and distinguish between individuals, e.g. between this John Smith and that John Smith. The identifier data belonging to each individual is then assigned an arbitrary Person Number, which replaces the name, address and other identifying details.
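The page does not specify the matching methods used. As a stand-in, the sketch below uses a toy exact-match rule (normalised name plus date of birth, an assumption) and then groups matched records with union-find, so that records joined by a chain of matches receive one Person Number. Sequential numbers stand in for arbitrary Person Numbers, and the record layout is hypothetical.

```python
from itertools import combinations

# Hypothetical identifier file: two records for one John Smith, one for another.
identifier_file = [
    {"record_id": "H1", "first": "John", "last": "Smith", "dob": "1950-01-01"},
    {"record_id": "D7", "first": "JOHN", "last": "SMITH", "dob": "1950-01-01"},
    {"record_id": "H2", "first": "John", "last": "Smith", "dob": "1972-08-30"},
]

def is_match(a, b):
    """Toy rule (assumption): same normalised name and same date of birth."""
    return (a["first"].lower(), a["last"].lower(), a["dob"]) == (
        b["first"].lower(), b["last"].lower(), b["dob"])

def assign_person_numbers(records):
    """Cluster matched records with union-find, then number each cluster."""
    parent = {r["record_id"]: r["record_id"] for r in records}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in combinations(records, 2):
        if is_match(a, b):
            parent[find(a["record_id"])] = find(b["record_id"])

    cluster_numbers, result = {}, {}
    for r in records:
        root = find(r["record_id"])
        # Sequential numbers stand in for arbitrary Person Numbers.
        cluster_numbers.setdefault(root, len(cluster_numbers) + 1)
        result[r["record_id"]] = cluster_numbers[root]
    return result

print(assign_person_numbers(identifier_file))  # prints {'H1': 1, 'D7': 1, 'H2': 2}
```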

Step 3 of 4

Using an encrypted version of the arbitrary Person Number, CHeReL’s Data Integration Unit then creates a Project-specific Person Number (PPN) for each individual. A different PPN is created for each research project. The PPN is then joined to the survey, cancer and hospital data that the relevant government agencies and human research ethics committee have approved for the project. All records for an individual in any approved project dataset carry the same PPN. The approved data is released to the researcher, along with conditions on its use.
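The page says the PPN is derived from an encrypted Person Number but does not name the method. One way to obtain the stated properties (the same PPN for a person within a project, unrelated PPNs across projects) is a keyed hash with a separate secret key per project. The sketch below uses HMAC-SHA256 from Python's standard library as that assumed stand-in; `make_project_key`, `project_person_number` and `release_extract` are hypothetical names.

```python
import hashlib
import hmac
import secrets

def make_project_key():
    """Generate a fresh 32-byte secret key for one research project."""
    return secrets.token_bytes(32)

def project_person_number(person_number, project_key):
    """Derive a PPN from a Person Number with HMAC-SHA256 (assumed method).

    The same Person Number and key always give the same PPN; without the
    key, the PPN cannot practically be linked back to the Person Number.
    """
    mac = hmac.new(project_key, str(person_number).encode("utf-8"), hashlib.sha256)
    return mac.hexdigest()

def release_extract(rows, project_key):
    """Replace the Person Number on each approved row with the project's PPN."""
    released = []
    for row in rows:
        row = dict(row)  # copy so the input rows are not modified
        row["ppn"] = project_person_number(row.pop("person_number"), project_key)
        released.append(row)
    return released

project_a = make_project_key()
project_b = make_project_key()
approved = [
    {"person_number": 1, "source": "cancer", "cancer_type": "colorectal"},
    {"person_number": 1, "source": "hospital", "procedure": "colectomy"},
]
extract_a = release_extract(approved, project_a)
# Both rows for Person Number 1 share one PPN within project A ...
assert extract_a[0]["ppn"] == extract_a[1]["ppn"]
# ... but project B gets an unrelated PPN for the same person.
assert project_person_number(1, project_b) != extract_a[0]["ppn"]
```

Because each project has its own key, extracts from two different projects cannot be joined on PPN alone, which is consistent with a different PPN being created for each research project.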

Step 4 of 4

Using the PPN attached to the survey, cancer and hospital data, the researcher can combine the records for each individual without accessing information about their identity. This helps to protect privacy while allowing research in the public interest to continue improving the health system and health outcomes.
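On the researcher's side, combining records amounts to an ordinary join on the PPN. A minimal sketch, with hypothetical dataset and field names:

```python
from collections import defaultdict

# Hypothetical released extracts, keyed only by PPN (no names or addresses).
survey = [{"ppn": "a1", "screened": True}, {"ppn": "b2", "screened": False}]
cancer = [{"ppn": "a1", "diagnosis_year": 2019}, {"ppn": "b2", "diagnosis_year": 2020}]
deaths = [{"ppn": "b2", "death_year": 2022}]

def combine_by_ppn(**datasets):
    """Collect every dataset's rows for each PPN into one per-person dict."""
    people = defaultdict(lambda: {name: [] for name in datasets})
    for name, rows in datasets.items():
        for row in rows:
            people[row["ppn"]][name].append(row)
    return dict(people)

combined = combine_by_ppn(survey=survey, cancer=cancer, deaths=deaths)
for ppn, person in sorted(combined.items()):
    screened = person["survey"][0]["screened"]
    death = "death recorded" if person["deaths"] else "no death recorded"
    print(ppn, "screened" if screened else "not screened", death)
```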

Technical details

The CHeReL currently uses Choicemaker software for linkage. Choicemaker provides standardisation and parsing, and differs from classical probabilistic approaches and software primarily in its use of an automated blocking algorithm and a machine learning technique for 'scoring', or assigning weights to, record pairs. The system also allows users to make use of stacked data (where multiple values are present in a single field) and includes a "transitivity engine" that allows a user-specified action in the case of transitive linkage problems, for example, if record A is a high-probability match to both B and C, but B and C are low-probability matches to each other. These features are described in more detail in Goldberg and Borthwick (2004).

Additional advantages of Choicemaker for the CHeReL are that the system is multi-user, matches in one step rather than several, can match very large files and can be run on a variety of platforms.
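The Python sketch below illustrates two of these ideas in general terms: blocking (only comparing records that share a key) and detecting the transitive problem described above. It is not Choicemaker's API or algorithm. Choicemaker uses an automated blocking algorithm and machine-learned scores, whereas this sketch uses a fixed, hypothetical blocking key and hard-coded, hypothetical scores and thresholds.

```python
from collections import defaultdict
from itertools import combinations

records = {
    "A": {"last": "Smith", "dob": "1950-01-01"},
    "B": {"last": "Smyth", "dob": "1950-01-01"},
    "C": {"last": "Smith", "dob": "1950-01-02"},
    "D": {"last": "Jones", "dob": "1981-05-05"},
}

def blocking_key(rec):
    """Fixed key (assumption): first letter of surname + year of birth."""
    return (rec["last"][0].upper(), rec["dob"][:4])

def candidate_pairs(records):
    """Only records sharing a blocking key are compared."""
    blocks = defaultdict(list)
    for rid, rec in records.items():
        blocks[blocking_key(rec)].append(rid)
    return {tuple(sorted(p)) for ids in blocks.values() for p in combinations(ids, 2)}

def transitivity_conflicts(scores, high=0.9, low=0.5):
    """Find (a, b, c) where a strongly matches b and c, but b-c scores low."""
    strong = defaultdict(set)
    for (x, y), s in scores.items():
        if s >= high:
            strong[x].add(y)
            strong[y].add(x)
    conflicts = []
    for a, partners in strong.items():
        for b, c in combinations(sorted(partners), 2):
            # A pair with no score is treated as a low-probability match.
            if scores.get(tuple(sorted((b, c))), 0.0) < low:
                conflicts.append((a, b, c))
    return conflicts

print(sorted(candidate_pairs(records)))  # D is never compared with A, B or C
# Hypothetical match scores, standing in for a trained model's output.
scores = {("A", "B"): 0.95, ("A", "C"): 0.93, ("B", "C"): 0.20}
print(transitivity_conflicts(scores))  # prints [('A', 'B', 'C')]
```

A real system would then apply a user-specified action to each conflict, for example sending the group for clerical review.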


Fees

The Centre for Health Record Linkage charges a fee for the linkage of data. This fee is based on a number of factors, primarily:

  • The number of individuals (record groups) in the study;
  • The number of datasets from which information is requested; and
  • Whether the study involves linking a dataset that is not part of the Master Linkage Key (MLK). These studies incur additional cost because of the clerical review required to check the linkage of the external dataset(s).

The cost of linkage varies for each project. As an example, however, the linkage and extraction of records from two MLK datasets for 50,000 individuals would cost approximately $5,000 ex GST. The fee for linking an external (i.e. non-MLK) dataset of 10,000 records and then extracting linked data from two MLK datasets would be approximately $7,000 ex GST. In each example, the cost would increase with the addition of more datasets and individuals. Quotes can be obtained by contacting the CHeReL at or by forwarding an Application for Data to the CHeReL.

Please note that the 45 and Up Study may charge a fee for data extraction, which is not included in the linkage fee. For more information, please contact the 45 and Up Study at

We pay respect to the Traditional Custodians and First Peoples of NSW, and acknowledge their continued connection to their country and culture.