Exposure to biomedical, lifestyle, socioeconomic and mental health risk factors and association with long-term health outcomes using traditional statistical and machine learning approaches
Principal Investigator:
Dr David Lai
Approved Research ID:
57557
Approval date:
January 15th 2020
Lay summary
Promoting health and well-being helps people prevent ill-health in their everyday lives. Exposure to biomedical, lifestyle, socioeconomic and mental health risk factors may impair long-term health and increase the burden of chronic disease. In the UK Biobank, traditional statistical and machine learning approaches may improve the performance of predictive modelling, and these models may provide insights into reducing the likelihood of chronic disease.

The aims of the project are as follows:
1. Use the UK Biobank datasets to study exposure to biomedical, lifestyle, socioeconomic and mental health risk factors and determine the effects of this exposure on long-term health outcomes by using traditional statistical and machine learning approaches.
2. Apply machine learning approaches to identify the core components contributing to model performance from a set of biomedical, lifestyle, socioeconomic and mental health risk factors. Using both supervised and unsupervised learning approaches, the dimensionality of the specified variables will be reduced to select prominent attributes for feature extraction and improve the predictive power of the models.

This study would focus on UK Biobank volunteer metadata, which fall into the following four categories:
1. Lifestyle risk would be measured by physical activity, sleep, sedentary behaviour, smoking, alcohol and diet. A single measure of physical activity behaviour would be generated from accelerometer-derived time spent in low, moderate and vigorous intensity physical activity, as well as in sedentary behaviour and sleep. These variables would be compared with questionnaire responses. Diet would be assessed from food frequency questionnaires. Other lifestyle risk factors would be measured using continuous or categorical variables of risk exposure.
2. Socioeconomic traits would be measured by household income, current employment status, marital status, living status, educational level and other parameters including residential air pollution and land use density.
3. Mental health survey data would be aggregated according to the DSM (Diagnostic and Statistical Manual of Mental Disorders) and clinical measures of outcome.
4. Biomedical and health risk factors, defined as bodily states that carry direct and specific risk, would be studied to improve prediction modelling performance.

The project would be completed over a period of 36 months. Machine learning approaches would be used to improve predictive performance over traditional statistical approaches when studying causal links between exposure to risk factors and long-term health outcomes. These machine learning approaches may yield further insights to guide health policy.
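
As an illustration of the unsupervised dimensionality reduction mentioned in the second aim, the sketch below extracts a first principal component from a small synthetic table. It is not the project's method: the summary does not name a specific technique, so principal component analysis is an assumption here, and the data, column meanings and function names are all hypothetical. The power iteration uses only the Python standard library to stay self-contained; in practice an established library would normally be used instead.

```python
import math
import random

def standardize(rows):
    """Scale each column to zero mean and unit sample variance."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    sds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / (n - 1))
           for j in range(p)]
    return [[(r[j] - means[j]) / sds[j] for j in range(p)] for r in rows]

def covariance(rows):
    """Sample covariance matrix of data that is already centred."""
    n, p = len(rows), len(rows[0])
    return [[sum(r[i] * r[j] for r in rows) / (n - 1) for j in range(p)]
            for i in range(p)]

def first_component(cov, iterations=200):
    """Leading eigenvector and eigenvalue of cov, found by power iteration."""
    p = len(cov)
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    cov_v = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
    eigenvalue = sum(v[i] * cov_v[i] for i in range(p))
    return v, eigenvalue

# Hypothetical synthetic data (NOT UK Biobank data): three columns driven by
# one shared latent factor, plus a fourth column of unrelated noise.
rng = random.Random(0)
rows = []
for _ in range(500):
    latent = rng.gauss(0, 1)
    rows.append([
        latent + 0.3 * rng.gauss(0, 1),   # e.g. moderate activity
        latent + 0.3 * rng.gauss(0, 1),   # e.g. vigorous activity
        -latent + 0.3 * rng.gauss(0, 1),  # e.g. sedentary time
        rng.gauss(0, 1),                  # unrelated variable
    ])

scaled = standardize(rows)
cov = covariance(scaled)
loadings, eigenvalue = first_component(cov)

# For standardized data the total variance equals the number of columns.
explained = eigenvalue / len(cov)

# One summary score per row: the projection onto the first component.
scores = [sum(x * w for x, w in zip(row, loadings)) for row in scaled]

print("loadings:", [round(w, 3) for w in loadings])
print("share of variance explained:", round(explained, 3))
```

In this toy setting the three related columns collapse into one score per row, while the unrelated column contributes little to the component. A supervised counterpart would instead rank variables by their contribution to predicting a chosen health outcome.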