Information for data custodians
This page is designed to assist custodians who are providing data for a record linkage study. It outlines the three stages of a linkage study in which data custodian involvement is required:
- the application process;
- provision of identifying data to the Centre for Health Record Linkage (CHeReL); and
- provision of clinical data to the Principal Investigator.
1. The application process
All linkage studies require three separate approvals: from the CHeReL, from the custodians of all data collections used in the study, and from a human research ethics committee. This means that before a project can begin, the research team will approach the data custodian to seek approval for inclusion of the custodian's data in the project. Specifically, the researcher will request approval for (1) release of identifying data (e.g. name, address, date of birth) to the CHeReL so that the data can be linked to other datasets, and (2) release of clinical data (e.g. diagnosis, treatment) to the researchers for analysis. Strict privacy-preserving protocols ensure that the CHeReL receives only the identifying data and no clinical information, and that the researchers receive only the clinical information and no identifying variables.
2. Provision of identifying data to the CHeReL
Once a project has received the three approvals specified above, the CHeReL will contact the relevant data custodians to request the identifying variables from their data collections. These allow us to link the datasets to each other. The identifying variables for a particular project will have been specified during the application process, and listed in the CHeReL Application for Data.
i) Record Identifiers
Along with the identifying data, the CHeReL also requires two additional fields from data custodians:
- record ID (recid): this identifies each individual record to be sent to the CHeReL. This field should not exceed 13 characters in length; and
- patient or person ID (patientid): this identifies each individual or person within the custodian's database. This should also not exceed 13 characters in length. If there is no unique person number in the database then the Patient ID field should be set to the value of the Record ID.
For privacy purposes, we recommend that custodians do not send the original Record ID or Patient ID from their datasets to the CHeReL. Instead, project-specific numbers should be generated, either with encryption software or by generating autonumbers. For example, if there are multiple records per person and a unique person number is available, the Patient ID field can be populated as follows:
- create a list of records with unique person numbers;
- apply an autonumber to each person number;
- set the Patient ID field to the autonumber (note that this autonumber will be different to the autonumber in the Record ID field); and
- RETAIN the mapping of autonumbers to Record ID and/or Patient ID until completion of the project. If this is not retained it will not be possible to join the information generated by the CHeReL linkage process to the original data.
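The autonumber approach above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a CHeReL-supplied tool: the input keys `internal_record_id` and `internal_person_id`, the `R`/`P` prefixes and the function name are all hypothetical. The sketch falls back to the Record ID when no person number exists, and checks the 13-character limit described above.

```python
MAX_ID_LENGTH = 13  # limit stated above for both recid and patientid


def assign_project_ids(records):
    """Replace internal IDs with project-specific autonumbers.

    `records` is a list of dicts; the key names 'internal_record_id' and
    'internal_person_id' are assumptions made for this sketch.
    Returns (outgoing_rows, recid_map, patientid_map). The two maps must be
    retained by the custodian until the project is complete.
    """
    recid_map = {}      # internal record ID -> project recid
    patientid_map = {}  # internal person ID -> project patientid
    outgoing = []

    for rec in records:
        internal_rec = rec["internal_record_id"]
        internal_person = rec.get("internal_person_id")

        if internal_rec not in recid_map:
            recid_map[internal_rec] = f"R{len(recid_map) + 1:09d}"
        recid = recid_map[internal_rec]

        if internal_person is None:
            # No unique person number: Patient ID takes the Record ID value.
            patientid = recid
        else:
            if internal_person not in patientid_map:
                patientid_map[internal_person] = f"P{len(patientid_map) + 1:09d}"
            patientid = patientid_map[internal_person]

        if len(recid) > MAX_ID_LENGTH or len(patientid) > MAX_ID_LENGTH:
            raise ValueError("generated ID exceeds 13 characters")

        # Copy every other field, but never the internal identifiers.
        row = {k: v for k, v in rec.items()
               if k not in ("internal_record_id", "internal_person_id")}
        row["recid"] = recid
        row["patientid"] = patientid
        outgoing.append(row)

    return outgoing, recid_map, patientid_map
```

The different prefixes keep the Patient ID autonumber distinct from the Record ID autonumber. The two returned maps are the lookup tables that need to be stored securely for later use.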
ii) Data Format
The CHeReL will accept data in comma delimited (CSV), Excel or plain text 'flat file' formats. Please discuss details with the CHeReL Operations or Data Managers prior to preparing data for transfer.
iii) Data Transfer
Since data sent to the CHeReL by a data custodian contains personal information, all files should be encrypted with WinZip or GPG software, using a minimum of 128-bit AES encryption. Data should not be sent by e-mail. The encrypted data file should be transferred to the CHeReL only through our secure file upload facility. Documents on the use of this facility, and passwords for access, will be made available to data custodians once the data is ready for transfer. The CHeReL Data Manager will acknowledge receipt of the file and obtain the encryption password from the custodian by phone.
3. Provision of clinical data to the Principal Investigator
Once the linkage is complete, the CHeReL creates a 'project key' for each data collection, which is returned to the respective data custodian. A project key is a file that is usually provided in comma-separated values format with three fields: PPN (a unique Project Person Number for each individual in the data collection), Record ID (or Patient ID, depending on what has been agreed with the custodian), and a source code field (which identifies the data collection).
The data custodian then:
- translates the Record ID in the project key file into the custodian's internal Record ID using either the previously generated autonumber lookup table, or decryption, depending on which method was used to generate the Record ID;
- joins the PPN to the database records, using the Record ID;
- extracts the PPN and clinical variables that were previously approved for the project from the database; and
- forwards the de-identified data to the Principal Investigator. No identifying information is sent to the researchers, only the PPNs and their associated clinical variables.
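The four steps above might look like the following Python sketch for a custodian who used an autonumber lookup table. It is illustrative only: the column headers `ppn`, `recid` and `source` in the project key file are assumed (the actual headers should be confirmed with the CHeReL), and the function name, the `database` dictionary and the field names are hypothetical.

```python
import csv
import io


def attach_ppn(project_key_csv, recid_map, database, clinical_fields):
    """Build the de-identified release file from a project key.

    project_key_csv : text of the project key file (assumed headers:
                      ppn, recid, source)
    recid_map       : retained lookup, internal record ID -> project recid
    database        : dict of internal record ID -> record dict (hypothetical)
    clinical_fields : names of the clinical variables approved for release
    """
    # Step 1: invert the retained lookup so project recids translate back
    # to the custodian's internal Record IDs.
    internal_by_recid = {recid: internal for internal, recid in recid_map.items()}

    released = []
    for key_row in csv.DictReader(io.StringIO(project_key_csv)):
        # Step 2: join the PPN to the database record via the Record ID.
        internal_id = internal_by_recid[key_row["recid"]]
        record = database[internal_id]

        # Step 3: keep only the PPN and the approved clinical variables.
        out = {"PPN": key_row["ppn"]}
        for field in clinical_fields:
            out[field] = record[field]
        released.append(out)

    # Step 4: this list holds no identifying fields and no record IDs.
    return released
```

Because each output row is built only from the PPN and the named clinical fields, identifying variables cannot leak into the release file by accident.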
Data extraction tip
Many custodians find that it is more efficient to extract the identifiers and clinical data required for a project at the same time. This data file can then be used for both Stages 2 and 3 above: the identifiers can be sent to the CHeReL at the start of the project for linkage, and once the linkage is complete and the PPNs have been attached, the clinical data items can be sent to the researchers.
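As a rough sketch of this tip, a single extract can be split once into the part sent for linkage and the part held back for later release. The function and field names below are hypothetical, and the sketch assumes each row already carries project-specific `recid` and `patientid` values.

```python
def split_extract(rows, identifying_fields, clinical_fields):
    """Split one combined extract into a linkage file and a held-back store.

    Returns (for_linkage, held_back):
      for_linkage : rows with recid, patientid and identifying fields only
      held_back   : dict of recid -> clinical fields, kept by the custodian
                    until the project key is returned
    """
    for_linkage = []
    held_back = {}
    for row in rows:
        ids = {"recid": row["recid"], "patientid": row["patientid"]}
        # Stage 2: identifiers only, no clinical information.
        for_linkage.append({**ids, **{f: row[f] for f in identifying_fields}})
        # Stage 3: clinical data only, keyed by the project recid.
        held_back[row["recid"]] = {f: row[f] for f in clinical_fields}
    return for_linkage, held_back
```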
Before releasing any data, the NSW Ministry of Health requests a signed confidentiality agreement from the researchers. Custodians of other collections should also obtain a signed confidentiality agreement and follow the data disclosure policies of their organisation. Confidentiality agreement templates are available on the Ministry's website at: