Integrating Clinical Model, Polygenic Risk Score, and Machine Learning Models in the Prediction of Cardio-Metabolic Disease Using an Ensemble Approach
Principal Investigator:
Professor Chien-Chang Lee
Approved Research ID:
62093
Approval date:
May 18th 2020
Lay summary
Cardiometabolic diseases, such as metabolic syndrome, gout, heart attack, and stroke, are the leading cause of death worldwide. The primary purpose of this study is therefore to compare the accuracy of two prediction methods for cardiometabolic diseases: a polygenic risk score-enhanced prediction model and machine learning models. The predictive models we are building can be used for the early detection of cardiometabolic diseases, which fulfills UK Biobank's stated purpose of improving the prevention, diagnosis, and treatment of illnesses. The development of cardiometabolic diseases involves a complex interplay between genetics, lifestyle factors, and even infection history. Recently, infection was found to increase the risk of cardiovascular complications by up to fivefold. The enhanced polygenic risk score will therefore be built using information from common and rare genetic variants, lifestyle factors, comorbidities, infection history, and medications (a proxy for undocumented comorbidity). This research consequently requires access to both the GWAS data and electronic health records. The machine learning models, such as ensemble methods, will be given access to the same set of data as the polygenic risk score. The investigator has extensive experience in data science research and has published more than 20 international publications on cardiometabolic disease prediction and treatment. The investigator is expected to lead his team to complete this research in 30 months.