Our services

The CHeReL provides advice on the design, cost, feasibility and process of linkage studies. We offer four types of record linkage services:

  • Linkage between and within records held in the Master Linkage Key (MLK).  The MLK consists of records from a number of administrative datasets, including records of hospitalisations, emergency department presentations, births, cancer registrations and deaths.  An example of a study using MLK linkage services could be linking death records to hospitalisations to assess mortality within 30 days of hospital discharge.
  • Linkage of other datasets to the MLK.  Often researchers are interested in a particular cohort of people, and want to examine their outcomes or history using routinely collected administrative records.  For example, a cohort of people may be receiving a particular intervention, and the researcher wants to determine the effectiveness of this intervention as measured by later hospital admissions, emergency department presentations and mortality.
  • Linkage of two or more datasets that are not included in the MLK, for example, linkage of educational outcome data to child protection data.
  • Deduplication of datasets.  The CHeReL regularly conducts this kind of linkage for data custodians to identify duplicate records within a dataset and records that belong to the same person.
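
The 30-day mortality example in the first item above can be sketched in a few lines of Python. This is a minimal illustration only: it assumes linked records are already available as pairs of discharge date and date of death, and the function name, record layout and the choice to count day 0 through day 30 inclusive are assumptions rather than CHeReL definitions.

```python
from datetime import date

def died_within_window(discharge_date, death_date, window_days=30):
    """Return True if death occurred 0..window_days days after discharge."""
    if death_date is None:  # no death record was linked to this discharge
        return False
    days_after = (death_date - discharge_date).days
    return 0 <= days_after <= window_days

# Hypothetical linked records: (discharge date, linked date of death or None).
linked = [
    (date(2020, 3, 1), date(2020, 3, 20)),  # death 19 days after discharge
    (date(2020, 3, 1), date(2020, 6, 1)),   # death 92 days after discharge
    (date(2020, 3, 1), None),               # no linked death record
]

deaths_30d = sum(died_within_window(d, m) for d, m in linked)
print(f"{deaths_30d} of {len(linked)} discharges were followed by death within 30 days")
```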

How record linkage works

Record linkage brings together information that relates to the same individual, family, place or event from different data sources.  In this way it is possible to construct chronological sequences of health events for individuals.  Combined, these individual 'stories' create a larger story about the health of people in NSW and the ACT. 

An example is a study by a research team who want to investigate whether bowel cancer screening can improve survival.  To address this question the researchers require information about people’s bowel cancer screening from a survey, as well as their cancer incidence, relevant hospital treatment and deaths that are recorded in administrative data collections managed by government agencies.

Once the relevant government agencies and a human research ethics committee approve the research project, the data can then be securely linked by the CHeReL and made available to the researcher. 

The process by which data is securely linked and made available is as follows:

Step 01 of 4

The first step in data linkage is to split the records from each dataset into two separate files. To protect privacy, the identifier information such as names and addresses is separated from the content information, like cancer type or screening history.  The two separate files from each dataset are stored and handled separately. The identifier data is used in the Data Linkage process and the content information without names and addresses is used in Data Integration.
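
The separation described in this step might look like the following Python sketch. The field names are hypothetical, and the shared record_key is an assumption of this sketch (the text above does not say how the two files are kept in correspondence); it is included only so that linkage results could later be attached to the content rows.

```python
# Hypothetical field names; the real dataset layouts are not described here.
IDENTIFIER_FIELDS = {"name", "address", "date_of_birth"}

def split_dataset(records):
    """Split each record into an identifier row and a content row.

    Both rows share an arbitrary record_key (an assumption of this sketch),
    so the content file never needs to hold names or addresses.
    """
    identifier_file, content_file = [], []
    for record_key, record in enumerate(records, start=1):
        identifier_row = {"record_key": record_key}
        content_row = {"record_key": record_key}
        for field, value in record.items():
            if field in IDENTIFIER_FIELDS:
                identifier_row[field] = value
            else:
                content_row[field] = value
        identifier_file.append(identifier_row)
        content_file.append(content_row)
    return identifier_file, content_file

survey = [
    {"name": "John Smith", "address": "1 Example St",
     "date_of_birth": "1970-01-01", "screening_history": "screened 2015"},
]
identifier_file, content_file = split_dataset(survey)
```

After the split, identifier_file would go to the linkage process and content_file to data integration, and the two would be stored and handled separately.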

Step 02 of 4

CHeReL’s Data Linkage Unit applies a combination of matching methods to the identifier data to identify and distinguish between individuals, e.g. between this John Smith and that John Smith. The identifier data belonging to each individual is then assigned an arbitrary Person Number which replaces the name, address and other identifying details.
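
As a rough illustration of distinguishing "this John Smith" from "that John Smith", the sketch below groups identifier rows by an exact date-of-birth match plus a fuzzy name comparison, then assigns each group an arbitrary person number. This is deliberately simplistic and is not the matching method used by the CHeReL; the field names, the similarity threshold and the greedy grouping are all assumptions.

```python
from difflib import SequenceMatcher

def same_person(a, b, threshold=0.85):
    """Crude comparison: identical date of birth and similar names."""
    if a["date_of_birth"] != b["date_of_birth"]:
        return False
    similarity = SequenceMatcher(None, a["name"].lower(), b["name"].lower()).ratio()
    return similarity >= threshold

def assign_person_numbers(identifier_rows):
    """Map each record_key to an arbitrary person number (1, 2, 3, ...)."""
    representatives = []  # first record seen for each person
    person_numbers = {}
    for row in identifier_rows:
        for number, representative in enumerate(representatives, start=1):
            if same_person(row, representative):
                person_numbers[row["record_key"]] = number
                break
        else:  # no existing person matched, so start a new one
            representatives.append(row)
            person_numbers[row["record_key"]] = len(representatives)
    return person_numbers

rows = [
    {"record_key": "h1", "name": "John Smith", "date_of_birth": "1970-01-01"},
    {"record_key": "s7", "name": "Jon Smith", "date_of_birth": "1970-01-01"},
    {"record_key": "c3", "name": "John Smith", "date_of_birth": "1985-05-05"},
]
person_numbers = assign_person_numbers(rows)
```

Here the first two rows would be treated as the same person despite the spelling difference, while the third John Smith, with a different date of birth, would receive his own person number.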

Step 03 of 4

Using an encrypted version of the arbitrary Person Number, CHeReL’s Data Integration Unit then creates a Project-specific Person Number (PPN) for each individual.  A different PPN is created for each research project, so an individual's PPN in one project differs from their PPN in another.  This PPN is then joined to the relevant survey, cancer and hospital data that has been approved for the research project by the relevant government agencies and human research ethics committee.  All records for an individual in any approved project dataset will have the same PPN.  The approved data is released to the researcher, along with conditions on the use of data.
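
One way to picture a project-specific identifier is a keyed hash of the person number, with a different secret key per project. The sketch below uses HMAC-SHA256 from the Python standard library as a stand-in; the actual encryption method is not described here, and the function name, key values and 16-character truncation are assumptions.

```python
import hashlib
import hmac

def make_ppn(person_number, project_key):
    """Derive a project-specific identifier from a person number.

    HMAC-SHA256 is a stand-in for the unspecified encryption step.
    """
    message = str(person_number).encode("utf-8")
    return hmac.new(project_key, message, hashlib.sha256).hexdigest()[:16]

# Hypothetical per-project secrets, held only by the integration unit.
project_a_key = b"hypothetical-secret-for-project-A"
project_b_key = b"hypothetical-secret-for-project-B"

ppn_a = make_ppn(1001, project_a_key)  # person 1001 as seen in project A
ppn_b = make_ppn(1001, project_b_key)  # the same person as seen in project B
```

Within one project the same person always maps to the same value, so their records can be joined, while the values for two projects differ, so datasets released to different projects cannot be matched to each other on this identifier.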

Step 04 of 4

Using the PPN that is attached to the survey, cancer and hospital data, the researcher can combine records for an individual without accessing information about their identity. This helps to ensure that privacy is protected and research in the public interest continues to improve the health system and health outcomes.
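
From the researcher's side, combining records might look like the sketch below, which groups rows from each released dataset by PPN and sorts them by date to form a chronological history. The column names (ppn, date, detail), the sample values and the use of ISO-format date strings are assumptions for illustration.

```python
from collections import defaultdict

def build_histories(*datasets):
    """Group rows from several (name, rows) datasets by PPN, in date order."""
    histories = defaultdict(list)
    for dataset_name, rows in datasets:
        for row in rows:
            histories[row["ppn"]].append((row["date"], dataset_name, row["detail"]))
    # ISO-format date strings sort chronologically as plain text.
    return {ppn: sorted(events) for ppn, events in histories.items()}

# Hypothetical released data: no names or addresses, only PPNs.
survey = ("survey", [{"ppn": "a1", "date": "2012-05-01", "detail": "screened"}])
cancer = ("cancer", [{"ppn": "a1", "date": "2014-02-10", "detail": "bowel cancer registered"}])
hospital = ("hospital", [
    {"ppn": "a1", "date": "2014-03-03", "detail": "admission"},
    {"ppn": "b2", "date": "2013-07-19", "detail": "admission"},
])

histories = build_histories(survey, cancer, hospital)
for event in histories["a1"]:
    print(event)
```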

Technical details

The CHeReL currently uses Choicemaker software for linkage. Choicemaker provides for standardisation and parsing, and differs from classical probabilistic approaches and software primarily in its use of an automated blocking algorithm and a machine learning technique for 'scoring' or assigning weights. The system also allows users to make use of stacked data (where multiple values are present in a single field) and includes a "transitivity engine" that allows for a user-specified action in the case of transitive linkage problems, for example, if record A is a high probability match to both B and C, but B and C are low probability matches to each other. These features are described in more detail in Goldberg and Borthwick (2004). Additional advantages of Choicemaker for the CHeReL are that the system is multi-user, matches in one rather than several steps, is capable of matching very large files and can be run on a variety of platforms.
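
The transitive linkage problem described above can be made concrete with a small Python sketch that scans pairwise match scores for the A, B, C pattern. This is not Choicemaker's transitivity engine or its API; the function name, the score scale (0 to 1) and the thresholds are assumptions chosen for illustration.

```python
from itertools import combinations

def find_transitivity_conflicts(scores, high=0.9, low=0.2):
    """Find (hub, b, c) where hub matches b and c strongly but b and c match weakly.

    scores maps frozenset({x, y}) to a match probability. A pair with no
    score is treated as 0.0, i.e. as a low probability match.
    """
    def score(x, y):
        return scores.get(frozenset((x, y)), 0.0)

    records = sorted({record for pair in scores for record in pair})
    conflicts = []
    for hub in records:
        others = [record for record in records if record != hub]
        for b, c in combinations(others, 2):
            if score(hub, b) >= high and score(hub, c) >= high and score(b, c) <= low:
                conflicts.append((hub, b, c))
    return conflicts

pair_scores = {
    frozenset(("A", "B")): 0.95,
    frozenset(("A", "C")): 0.93,
    frozenset(("B", "C")): 0.10,
}
conflicts = find_transitivity_conflicts(pair_scores)
```

Each conflict found this way is a case where a user-specified action would be needed, such as sending the group for clerical review or declining to merge B and C.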


The Centre for Health Record Linkage (CHeReL) charges a fee for the linkage of records. The fee matrix below shows the estimated fees for CHeReL linkage services, depending on the cohort size and the number and types of datasets to be linked.

For example, a research project with a cohort of up to 5,000 persons, using up to 10 datasets comprising 5 MLK and 5 external (non-MLK) datasets, will cost approximately $12,000, excluding GST, for the linkage service.

Linkage fee matrix showing the estimated fee (exc GST) for a CHeReL linkage service based on the cohort size and number and types of datasets to be linked.


*MLK – CHeReL master linkage key datasets

The following details what is included and excluded in calculating the estimated fee for the linkage service:

  • The price estimate is exclusive of GST, which is payable on top of the estimated amount above.
  • Estimates assume clean data is provided, i.e. identifiers are available and content is linkable. Should a data cleaning service be required, additional fees will apply.
  • Estimates assume that each non-MLK dataset to be linked comes in one file. Additional data sourcing fees apply for more than one file per dataset.
  • Additional linkage fees (starting from $5,000) apply for the following types of linkage services: case-control (e.g. requiring control selection), family (e.g. linkage of mother and child records), cross-agency linkages (e.g. ABS and AIHW) and cross-jurisdictional linkages (e.g. NSW and Victoria).
  • Additional fees may apply for projects requiring more intensive project management, or where data extraction is required.
  • A more detailed quote will be provided after your full application has been submitted to the CHeReL.
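
To show how these items combine, the sketch below adds one additional linkage fee to the $12,000 example estimate and then applies GST. The figures are illustrative only: $5,000 is the stated minimum for an additional linkage type, the 10% GST rate is an assumption not stated above, and an actual quote from the CHeReL may differ.

```python
from decimal import Decimal

# Assumption: the standard Australian GST rate; it is not stated in the text above.
GST_RATE = Decimal("0.10")

def estimate_total(base_fee, additional_fees=()):
    """Return (subtotal excluding GST, GST amount, total including GST)."""
    subtotal = Decimal(base_fee) + sum((Decimal(fee) for fee in additional_fees), Decimal(0))
    gst = subtotal * GST_RATE
    return subtotal, gst, subtotal + gst

# Example estimate of $12,000 plus one additional linkage type at the $5,000 minimum.
subtotal, gst, total = estimate_total(12000, [5000])
print(f"Subtotal ${subtotal:,.2f}, GST ${gst:,.2f}, total ${total:,.2f}")
```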

For more information, contact the CHeReL at

Please note, the 45 and Up Study may charge a fee for data extraction which is not included in the linkage fee. For more information, please contact the 45 and Up Study at

We pay respect to the Traditional Custodians and First Peoples of NSW, and acknowledge their continued connection to their country and culture.