Defining disease risk with genotype and phenotype integration using machine learning methods
Principal Investigator:
Dr Irene Blat
Approved Research ID:
31984
Approval date:
February 22nd 2018
Lay summary
The primary aim of this proposal is to use UK Biobank data to determine whether artificial intelligence (AI) methods that combine genetic risk variants with phenotypic data improve disease risk classification compared with the conventional method, which combines common genetic risk variants in a multiplicative model to produce a polygenic disease risk. This approach may demonstrate that large genomic and medical databases such as the UK Biobank contain more information than can be extracted using traditional statistical methods. A successful outcome of the proposed research will help to improve the calculation of common disease risk, which will facilitate the prevention of disease and morbidity. In addition to providing risk insights, our results have the potential to open new avenues of research into disease intervention and overall health. AI offers another angle for the analysis of large datasets. While this newer analysis method has great potential, it requires testing on large, well-curated disease data collections such as the UK Biobank. The breadth of the phenotypic and genotypic data in the UK Biobank will allow us to test the hypothesis that AI methods perform better at defining disease risk. Our study will first combine the different data types using machine learning to identify important risk factors in a large subset of the population. We will then confirm these results in a different subset. We would like to include the full cohort in this study.
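As an illustration only, the conventional multiplicative model mentioned in the summary can be sketched as follows: each variant contributes its per-allele odds ratio once for every risk allele carried, and the contributions are multiplied together. The function name, the odds ratios, and the genotype below are hypothetical values chosen for the example, not data or code from this project.

```python
import math


def multiplicative_risk(allele_counts, odds_ratios):
    """Combine per-variant odds ratios under a multiplicative model.

    allele_counts: risk alleles carried at each variant (0, 1, or 2).
    odds_ratios:   per-allele odds ratio for each variant (hypothetical).
    Returns the combined odds relative to carrying no risk alleles.
    """
    if len(allele_counts) != len(odds_ratios):
        raise ValueError("allele_counts and odds_ratios must have equal length")
    # Summing count * log(OR) is equivalent to multiplying OR ** count,
    # and is the usual way a polygenic score is accumulated.
    log_score = sum(
        count * math.log(odds_ratio)
        for count, odds_ratio in zip(allele_counts, odds_ratios)
    )
    return math.exp(log_score)


# Hypothetical example: three variants, one individual.
example_odds_ratios = [1.2, 1.1, 1.5]
example_genotype = [2, 0, 1]
relative_odds = multiplicative_risk(example_genotype, example_odds_ratios)
print(f"Relative odds: {relative_odds:.2f}")
```

A score of this kind uses genotype data only and treats each variant independently. The machine learning approach proposed here would instead be fitted to genotype and phenotype data together on one subset of the cohort and then checked on a separate subset.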