Approved Research

Improvement of Polygenic Scores

Charles University

Lay summary

Applied research often deals with the high-dimensional nature of genetic data by constructing univariate polygenic scores (PGS), which are commonly interpreted as an individual's genetic predisposition for a certain trait. A PGS is an index constructed from information about an individual's single-nucleotide polymorphisms (SNPs) together with the results of genome-wide association studies (GWAS). Specifically, a PGS is a weighted average of the individual's risk alleles across all SNPs, with each SNP weighted by its coefficient from a GWAS. In this way, researchers combine the statistical power of large GWAS samples with the detail of survey data. This approach has allowed researchers to examine the role of genes in many outcomes that are often measured only in small survey samples. Yet the PGS approach faces methodological problems arising from an incomplete statistical foundation, especially for outcomes that are determined jointly by genes and the environment.
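
The construction described above can be illustrated with a minimal sketch. It assumes that each SNP is coded as the number of risk alleles an individual carries (0, 1, or 2) and that the GWAS supplies one coefficient per SNP. The function name, the toy weights, and the toy genotype are hypothetical and serve only as illustration; they are not taken from the project.

```python
def polygenic_score(allele_counts, gwas_weights, average=True):
    """Combine risk-allele counts with GWAS coefficients into one score.

    allele_counts: risk-allele count per SNP (0, 1, or 2) for one person.
    gwas_weights:  GWAS coefficient per SNP, in the same SNP order.
    average:       if True, divide the weighted sum by the number of SNPs.
    """
    if len(allele_counts) != len(gwas_weights):
        raise ValueError("allele_counts and gwas_weights must align by SNP")
    if not gwas_weights:
        raise ValueError("at least one SNP is required")
    # Weighted sum over SNPs: count of risk alleles times GWAS coefficient.
    total = sum(c * w for c, w in zip(allele_counts, gwas_weights))
    return total / len(gwas_weights) if average else total


# Hypothetical toy data: three SNPs, one individual.
toy_weights = [0.5, -0.25, 1.0]
toy_genotype = [2, 1, 0]

score_sum = polygenic_score(toy_genotype, toy_weights, average=False)
score_avg = polygenic_score(toy_genotype, toy_weights, average=True)
print(score_sum, score_avg)
```

In this toy case the weighted sum is 0.75 and the average over the three SNPs is 0.25. Real scores are built from many thousands of SNPs and typically involve further steps, such as accounting for correlation between nearby SNPs, which this sketch omits.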

In this project, we will examine the consequences of these problems and develop methods to address them. Specifically, we will develop a sample-splitting approach to address current issues with polygenic scores, such as limited external validity, and to correct for linkage disequilibrium. Correcting these issues will increase the predictive power of polygenic scores constructed from the estimates that GWAS currently provide, which will make them more useful to both the public and scientists. In addition, we will investigate what additional information GWAS could provide to construct improved polygenic scores. Such additional information is particularly relevant for studies of outcomes that are affected by both genes and the environment. In these situations, the PGS model can be biased because the environments of the GWAS sample and the survey sample differ. We aim to examine the bias arising from the current approach and to develop methods that alleviate the problem, such as machine learning methods designed to capture heterogeneous impacts. We will then examine how our new methods of polygenic score construction perform when no information from GWAS is available, by studying their performance in small samples of the size of genotyped surveys. Methods that work reliably in small samples are crucial for expanding research on the role of genes beyond outcomes for which GWAS exist to outcomes that are only available in surveys. Finally, we will use the new methods to study outcomes that are jointly determined by genes and the environment. We will investigate how genes and the environment jointly affect important behaviors and outcomes, including attitudes to risk, education, obesity, and smoking.