Approved research

Development of statistical methods to discover novel genetic associations, explain underlying biological mechanisms, and develop risk prediction models across varied complex diseases.

Harvard School of Public Health

Lay summary

We aim to develop and apply a suite of scalable, powerful, and robust statistical tools to identify further genomic determinants of health and disease, explain the biological mechanisms underlying various outcomes, and predict disease risk using genetic and environmental factors.

Many common human diseases, such as lung cancer and cardiovascular disease, have a complex genetic etiology characterized by large numbers of genetic risk factors, but scientists have not yet identified all the genetic variants contributing to most such outcomes. Thus, the first major goal of our project is to uncover the unknown genetic variants associated with various human traits, which will help us better understand and treat many diseases. We have been developing a variety of powerful statistical association tests that exploit the richness of datasets such as the UK Biobank. For example, we have developed tools that interrogate sets of related genetic risk features and sets of related outcomes simultaneously, instead of investigating them individually, thereby integrating a much larger amount of information than alternative approaches. We will apply these tools to discover new risk variants in many common diseases, including many cancers.

The second goal concerns mechanisms. Once risk variants have been identified, it is important to understand the biological mechanisms and pathways that explain how each variant is linked to individual outcomes. We have been developing a collection of powerful causal inference methods to analyze multi-step biological pathways. One application we will pursue is testing whether genetic variants of PCSK9 influence the risk of cardiovascular disease through regulation of low-density lipoproteins. Related work includes determining precisely which variant, out of a group of correlated candidates, is causal and should be tested in these pathways.

The third goal of our work is to combine whole-genome data with lifestyle and environmental information in a clinically useful way by developing novel risk prediction models for a variety of outcomes. Such models may help identify high-risk individuals so that they can begin treatment or prevention programs earlier in life. We will initially focus on lung cancer.

Together, these three aims will help us better understand the genetic etiology of complex diseases. This information can assist health professionals in advancing treatment and prevention strategies for various illnesses. We expect the work outlined here to take around three years.

Scope extension: We aim to develop novel Mendelian randomisation methods for inferring the causal effects of risk factors on health outcomes.