
GP data of UK Biobank participants

Primary care data give researchers unprecedented insight into conditions that are largely managed by GPs, such as diabetes, dementia and mental ill health. All UK Biobank participants consented to sharing their medical records with us.

We are collecting only coded GP data about diagnoses, prescriptions and referrals, never any confidential notes or letters. We remove all information that could identify individual participants before sharing the data with our approved researchers.

Why we collect GP data

All our participants gave explicit consent for UK Biobank to access all of their medical and health-related records when they first joined the study (see participant consent form).

Linking UK Biobank data to participants’ coded GP data will transform the scientific value and clinical relevance of the questions that researchers can address. De-identified primary care data will give approved researchers an unprecedented toolbox for advancing diagnostics and treatments – particularly for conditions such as diabetes, dementia and mental ill health, which are largely managed by GPs.

For example, adding primary care data to UK Biobank will roughly double the number of cases of depression and dementia that can be identified, and will allow less severe cases to be detected at an earlier stage. Researchers could then study the full spectrum of disease severity, bringing closer the new diagnostic tools and treatments we need.

Figure description: On the left, there are three overlapping circles. The largest is labelled ‘48% of cases are recorded only by GPs’. The next largest, which overlaps slightly with the first, is labelled ‘Cases recorded via hospital admissions data’. The third, which mostly overlaps with both the first and second circles, is labelled ‘Cases recorded via death certificates’. On the right, there are two overlapping circles. The largest is labelled ‘50% of cases are recorded only by GPs’. The second, which overlaps slightly with the first, is labelled ‘Cases recorded via hospital admissions data’.

Source data: UK Biobank participants in England

During the pandemic, emergency legislation allowed UK Biobank to access our participants’ GP data solely for the purpose of COVID-19 research. More than 300 scientific studies used these data, including studies that identified factors increasing the risk of severe COVID-19 and showed that infection is associated with changes in brain structure.

This research changed the trajectory of COVID-19 treatment and showed how powerful GP data can be when combined with other participant data, and I look forward to seeing this impact across diseases.

Professor Sir Rory Collins, Principal Investigator and CEO of UK Biobank

How we access GP data

For many years, UK Biobank has been able to collect participant health data that are held centrally, including information about hospital stays, cancer diagnoses, causes of death and use of other central NHS services. We also have access to a small amount of coded GP data – meaning codes related to diagnoses, prescriptions and referrals, not any confidential notes or letters – for the roughly 11% of our participants who live in Scotland and Wales.

Primary care data in England are managed by thousands of individual GP practices across the country. Until 2017, we had access to coded primary care data for around half of our England-based participants; after that, GP practices, as the legal controllers of the data, could no longer provide it. As of 4 October 2024, NHS England (NHSE) has taken responsibility for the primary care data of all NHS patients in England, and UK Biobank will be able to apply to make the de-identified data of our participants available to researchers. This takes the burden off busy and overworked GPs, places it centrally with NHSE, and fulfils the original hopes of our founders and participants.

Our global community of scientists will be thrilled with this boon of incoming data, and I cannot wait to see how they use it to focus research efforts on improving early detection, prediction and prevention of disease.

Professor Naomi Allen, Chief Scientist, UK Biobank

How we protect our participants’ privacy

UK Biobank will only collect coded data about health conditions. For example, if a participant visits their GP with bronchitis, the GP records the code for bronchitis, and only this code is shared with UK Biobank. Letters and notes of conversations between a participant and their GP are not shared with UK Biobank.

All data about participants are held separately from their personal identifying information (such as name and address), in encrypted form, for optimum protection. All of our systems meet international standards for information security, and we conduct regular tests to ensure that they are robust against cyberattacks.

Before UK Biobank provides data to researchers, participants’ identifying information is removed so that individuals cannot easily be identified. All applications to access UK Biobank data are carefully scrutinised to ensure that the researchers are appropriately qualified and that their proposed studies are health-related and in the public interest. Researchers’ institutions sign a legal contract requiring them to keep the data secure, not to share the data with anyone else, and to publish their findings to benefit patient care and public health.

Contact details

For further information please email: UKBiobank@ukbiobank.ac.uk
