Approved research

Evaluating the utility of disease polygenetic risk scores in identifying high-risk individuals using UK Biobank data

Principal Investigator: Dr Wei Zheng
Approved Research ID: 40685
Approval date: November 15th 2018

Lay summary

With the rapid surge of genomic information for diseases and traits, there is a growing interest in its application in personalized medicine, including improving individual disease risk prediction. We propose to evaluate whether the aggregate information from multiple common variants associated with respective diseases and traits is useful to identify people who are at high risk of developing certain diseases and traits. By reviewing the literature and analyzing UK Biobank data, we will obtain a list of genetic variants associated with each of the common diseases and traits we would like to investigate, such as cancers and cardiometabolic diseases. We will create a polygenic risk score for each of these diseases and traits using these genetic variants. We will evaluate the association of each of these polygenic scores with their respective diseases and then estimate the proportion of the individuals that could be identified to be at a significantly elevated risk of developing a particular disease or a group of diseases. The results of our proposed study will help to understand the role of aggregate common variants inherited by each individual in determining disease and trait variation in the general population. Our research is in agreement with the aims of UK Biobank: research intended to improve the prevention, diagnosis and treatment of illness and the promotion of health throughout society.

Our current aims:

Cardiometabolic diseases, cancers and neurodegenerative diseases have been responsible for the largest share of mortality in developed countries over the past decade. Hundreds of common variants associated with these diseases have been identified in genome-wide association studies (GWAS). It is unclear whether these GWAS-identified variants can be used to identify high-risk individuals for disease prevention. It has been suggested that the aggregate effect of these variants may help classify individuals into clinically applicable disease-risk strata. It is likely that these variants combined could identify a small proportion of people at a significantly elevated risk of developing a particular disease. We hypothesize that a large fraction of the population could be classified, using a combination of these variants, as having a significantly elevated risk of developing any of these diseases. We aim to (1) construct a polygenic score (PGS) for each of these diseases using GWAS-identified variants, (2) evaluate the associations of the PGSs with these diseases using UK Biobank data, (3) determine the percentile of each PGS for classifying individuals who are at a certain defined risk (e.g., 2-fold elevated risk compared with the medium-risk group) of developing a particular disease, and (4) calculate the percentage of UK Biobank participants who are at a certain level of elevated risk (e.g., 2-fold) of developing a particular disease or a group of diseases.
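Aims (1), (3) and (4) can be illustrated with a minimal sketch. It assumes that a PGS is the weighted sum of effect-allele dosages, with weights taken from GWAS effect sizes (log odds ratios), and that elevated risk is expressed as an unadjusted odds ratio against a medium-risk reference band. The variant names, weights, scores, cutoffs and function names are hypothetical toy values chosen for illustration, not UK Biobank data or the study's actual method. A real analysis would typically estimate associations with regression models adjusted for covariates.

```python
from typing import Dict, List, Sequence

# Hypothetical per-variant weights (log odds ratios); toy values only.
WEIGHTS: Dict[str, float] = {"rs_a": 0.10, "rs_b": 0.25, "rs_c": -0.05}


def polygenic_score(dosages: Dict[str, int], weights: Dict[str, float]) -> float:
    """Aim (1): weighted sum of effect-allele dosages (0, 1 or 2 copies)."""
    return sum(w * dosages.get(variant, 0) for variant, w in weights.items())


def _odds(case_flags: List[int]) -> float:
    """Odds of disease in a group; assumes the group has at least one control."""
    cases = sum(case_flags)
    controls = len(case_flags) - cases
    return cases / controls


def odds_ratio_vs_reference(
    scores: Sequence[float],
    is_case: Sequence[int],
    ref_low: float,
    ref_high: float,
    cutoff: float,
) -> float:
    """Aim (3): odds of disease at or above `cutoff`, relative to the
    medium-risk reference group with ref_low <= score < ref_high."""
    high = [c for s, c in zip(scores, is_case) if s >= cutoff]
    ref = [c for s, c in zip(scores, is_case) if ref_low <= s < ref_high]
    return _odds(high) / _odds(ref)


def percent_at_or_above(scores: Sequence[float], cutoff: float) -> float:
    """Aim (4): percentage of participants whose score reaches the cutoff."""
    return 100.0 * sum(s >= cutoff for s in scores) / len(scores)


# Toy cohort: eight participants with made-up scores and disease status.
toy_scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
toy_cases = [0, 0, 1, 0, 0, 1, 1, 0]

example_pgs = polygenic_score({"rs_a": 2, "rs_b": 1, "rs_c": 0}, WEIGHTS)
example_or = odds_ratio_vs_reference(toy_scores, toy_cases, 0.3, 0.5, 0.6)
example_pct = percent_at_or_above(toy_scores, 0.6)
```

In this toy cohort, the cutoff of 0.6 is simply picked by hand. Aim (3) would instead search over percentiles of the PGS distribution for the one at which the chosen risk level (e.g., 2-fold) is reached.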

We want to expand aim (2) to: evaluate the associations of each PGS with the risk of the corresponding disease and of other diseases using UK Biobank data, and build models to predict the risk of common diseases.

We want to further expand aim (2) to: evaluate the associations of each PGS with the risk of the corresponding disease and of other diseases using UK Biobank data, build models to predict the risk of common diseases, and evaluate how PGSs interact with lifestyle factors to increase disease risk.

We would like to further expand aim (2) to: evaluate the associations of PGSs, and of their interactions with non-genetic factors, with the risk of accelerated aging.

We would like to further expand aim (2) to: evaluate the associations of PGSs for diseases and risk factors, obesity and related obesity measurements, lifestyle and other environmental factors, and protein and metabolite biomarkers with the incidence of and mortality from common diseases, as well as their interactions in the development and progression of common diseases.