Launch of world’s most significant protein study set to usher in new understanding for medicine
UK Biobank has today announced the launch of the world’s most comprehensive study of the proteins circulating in our bodies, which is set to transform the study of diseases and their treatments. This unparalleled project aims to measure up to 5,400 proteins in each of 600,000 samples: initial samples from half a million UK Biobank participants and 100,000 second samples taken from these volunteers up to 15 years later. This will allow researchers to explore a first-of-its-kind database detailing how changes in an individual’s protein levels over mid-to-late life influence disease. The study will begin by analysing the first 300,000 samples, comprising initial samples from 250,000 UK Biobank volunteers and 50,000 second samples taken at follow-up assessments.
Measuring the abundance of thousands of proteins circulating in the blood enables researchers to investigate their potential role in many types of diseases that occur during mid-to-late life. This emerging research field – known as population proteomics – has demonstrated huge potential for diagnostics and therapeutics.
In October 2023, a pilot project released data on nearly 3,000 circulating proteins from 54,000 UK Biobank participants. The pilot was already the world’s largest study of its kind and led to research identifying over 14,000 links between common genetic variants and altered protein levels, over 80% of which were previously unknown.
The research, published in Nature (1), has already been cited over 400 times, laying the foundations for scientists to better understand how and why diseases develop. So far, studies using the data have led to advances in disease prediction (2),(3) and in the development of future targeted treatments for breast cancer (4), cardiovascular disease (5), Parkinson’s disease (6), and other brain illnesses (7).
This new study, which aims to increase this unique dataset tenfold, is being funded by a consortium of 14 leading biopharmaceutical companies, known as the UK Biobank Pharma Proteomics Project.
"For the first time at this scale, researchers will be able to detect the exact causes of diseases by comparing how protein levels change over mid-to-late life in a large group of people. Proteomic data has already paved the way for better cancer, autoimmune and dementia diagnostics, and this truly exciting study of proteins will significantly speed up drug discovery, leading to major improvements in public health and care everywhere."
Professor Sir Rory Collins, Principal Investigator and Chief Executive, UK Biobank
"UK Biobank is an extraordinary resource for medical research and has already had a big impact on diagnosis and treatments. The plan to study proteins in participants across the study has the potential to unlock a new era of possibilities. That this is being funded by a wide consortium of companies highlights the importance of pre-competitive research to increase knowledge for everyone who is trying to be innovative to improve health."
Lord Patrick Vallance, Minister of State for Science, Research and Innovation of the United Kingdom
UK Biobank’s proteomics dataset will allow researchers to:
- Examine proteomic and genetic data from half a million people simultaneously. UK Biobank released whole genome sequencing data for its half a million participants in November 2023. Adding proteomic data will allow researchers to combine these massive datasets, providing a more detailed picture of the biological processes involved in disease progression. This may in turn drive the development of personalised treatments.
- Examine how and why protein levels change over time. Half a million participants provided UK Biobank with a blood sample when they joined and 100,000 of them provided a second sample up to 15 years later. Researchers will be able to see how protein levels have changed over mid-to-late life, enhancing understanding of age-related changes in healthy individuals and shedding light on how diseases develop. This will further accelerate research into diagnostic and prognostic markers.
- Uniquely use proteomic data in combination with imaging data. Nearly 100,000 UK Biobank participants have undergone magnetic resonance imaging (MRI) of their brain, heart and body, providing researchers with detailed scans. Layering these different data types to investigate human health creates an extraordinarily detailed understanding of disease mechanisms.
- Open avenues for developing AI models. Already, machine learning tools can predict future disease many years before diagnosis, with the potential to shape early interventions (8). The depth and breadth of the proteomic data held within UK Biobank may enable machine learning to accurately subtype diseases, which has the potential to inform what treatments should be given at the point of diagnosis.
"Proteomics provides an incredibly detailed snapshot of health. This new frontier of science can unveil how genetics and external factors – like diet, exercise and climate – interact, and will help to pinpoint the key causes of diseases and identify drug targets. It has already led to important scientific discoveries, such as identifying proteins that can help to diagnose disease – including multiple sclerosis (9) – and helping to identify those at higher risk of developing dementia (10) and cancer (11) many years before clinical diagnosis. Over 19,000 researchers around the world are using UK Biobank data; adding proteomic data to everything else we hold will enable scientists to make rapid discoveries to help diagnose and treat life-altering diseases."
Professor Naomi Allen, Chief Scientist, UK Biobank
It will take about a year to measure the protein levels in the first 300,000 samples. The proteomic data will be made available to UK Biobank-approved researchers (12) in staggered releases from 2026, with the full dataset expected to be added to the UK Biobank Research Analysis Platform by 2027. During this time, additional funding will be sought to analyse samples from all remaining UK Biobank volunteers: initial samples from a further 250,000 participants, plus a further 50,000 second samples.
"UK Biobank’s proteomic dataset has the potential to enable more powerful biomarker discovery, more accurate disease prediction, and more successful drug development. Analysing samples from two time points in the same volunteer will allow us to examine how protein levels change across hundreds of health and disease states over time, at an unprecedentedly large scale. This will represent one of the world’s largest ever biopharmaceutical research collaborations, underlining the growing importance of proteomics as a drug discovery tool. I can’t wait to see how the scientific community will explore these data to pinpoint molecular drivers of disease progression, disease subtypes, and aging."
Dr Chris Whelan, Director, Neuroscience, Data Science & Digital Health, Johnson & Johnson Innovative Medicine, Pharma Proteomics Project Lead
"Adding proteomic data for the full UK Biobank cohort will be an absolute game changer for prediction of disease onset and prognosis, particularly for the many neglected diseases for which good prospective data are lacking. These include debilitating and life-threatening diseases, such as polycystic ovary syndrome and motor neurone disease. Just imagine if we could detect these and many other conditions much earlier than is currently possible."
Professor Claudia Langenberg, Director of the Precision Healthcare University Research Institute at Queen Mary University of London
Before the data are made available to UK Biobank-approved researchers, and in keeping with its Access policy, members of this industry consortium will have a short, nine-month period of exclusive access. Any results gleaned will be returned to UK Biobank, further enhancing a ground-breaking health dataset accessible to approved researchers globally.
The protein detection and sequencing will be completed by Regeneron Genetics Center®, using the Olink™ Explore HT proteomics platform from Thermo Fisher Scientific and Ultima UG 100™ sequencers from Ultima Genomics (13), both high throughput technologies enabling large-scale applications.
"Regeneron Genetics Center is honoured to be selected by this distinguished consortium of industry peers to complete proteomic assay data generation for the Pharma Proteomics Project, allowing us to deploy our capabilities in large-scale proteomics. By investigating how protein levels change with disease and over time, we will unlock a tremendous new knowledge base of powerful biomarkers and predictors of disease. The insights gained from this comprehensive proteomics study – combined with our deep genomic database – will pave the way for more precise diagnostics and targeted treatments, ultimately transforming the landscape of modern medicine."
Dr. Aris Baras, Senior Vice President and Head of Regeneron Genetics Center® (RGC™), Pharma Proteomics Project sequencer
The UK Biobank Pharma Proteomics Project will fund the analysis of the first 300,000 samples. The biopharmaceutical companies in the Pharma Proteomics Project are: Alden Scientific, Amgen, AstraZeneca, Bristol Myers Squibb, Calico Life Sciences, Roche, GSK, Isomorphic Labs, Johnson & Johnson, MSD, Novo Nordisk, Pfizer, Regeneron and Takeda. UK Biobank is seeking additional funding to analyse the remaining 300,000 samples, which would complete the full cohort of half a million participants together with all 100,000 second samples taken up to 15 years later.
References:
1. Plasma proteomic associations with genetics and health in the UK Biobank, Sun & Whelan et al, Nature, October 2023. https://www.nature.com/articles/s41586-023-06592-6
2. Proteomic signatures improve risk prediction for common and rare diseases, Carrasco-Zanini et al, Nature Medicine, July 2024. https://www.nature.com/articles/s41591-024-03142-z
3. Blood protein assessment of leading incident diseases and mortality in the UK Biobank, Foley, Marioni & Sun et al, Nature Aging, July 2024. https://www.nature.com/articles/s43587-024-00655-7
4. Evaluation of circulating plasma proteins in breast cancer using Mendelian randomisation, Mälarstig et al, Nature Communications, November 2023. https://www.nature.com/articles/s41467-023-43485-8
5. Proteome-wide Mendelian randomization identifies candidate causal proteins for cardiovascular diseases, Chen et al, medRxiv, October 2023. https://www.medrxiv.org/content/10.1101/2023.10.16.23297103v1
6. Proteogenomic network analysis reveals dysregulated mechanisms and potential mediators in Parkinson’s disease, Doostparast et al, Nature Communications, July 2024. https://www.nature.com/articles/s41467-024-50718-x
7. Immunological Drivers and Potential Novel Drug Targets for Major Psychiatric, Neurodevelopmental, and Neurodegenerative Conditions, Dardani et al, medRxiv, February 2024. https://www.medrxiv.org/content/10.1101/2024.02.16.24302885v1
8. Disease prediction with multi-omics and biomarkers empowers case–control genetic discoveries in the UK Biobank, Garg, Karpinski & Matelska et al, Nature Genetics, September 2024. https://www.nature.com/articles/s41588-024-01898-1
9. Plasma proteomic profiles of UK Biobank participants with multiple sclerosis, Jacobs et al, Annals of Clinical and Translational Neurology, January 2024. https://onlinelibrary.wiley.com/doi/10.1002/acn3.51990
10. Plasma proteomic profiles predict future dementia in healthy adults, Guo, Yu & Zhang et al, Nature Aging, February 2024. https://www.nature.com/articles/s43587-023-00565-0
11. Identifying proteomic risk factors for cancer using prospective and exome analyses of 1463 circulating proteins and risk of 19 cancers in the UK Biobank, Atkins & Tong et al, Nature Communications, May 2024. https://www.nature.com/articles/s41467-024-48017-6
12. Data will be made available to approved researchers through UK Biobank, via the UK Biobank Research Analysis Platform. Researchers from around the world can register to apply. For more information visit: https://www.ukbiobank.ac.uk/enable-your-research
13. The Olink™ Explore HT platform and Ultima UG 100™ sequencers are currently labelled, "For research use only. Not for use in diagnostic procedures."