
Approved research

Aging Study by Genotype and DNA Methylation in Neural Networks

Principal Investigator: Dr Yu Zhang
Approved Research ID: 56317
Approval date: April 1st 2020

Lay summary

Research on predicting human biological aging has been booming over the last few decades. While traditional aging-prediction formulas are based on linear models, relatively few studies have explored the effectiveness of neural network models, which tend to have the advantage of learning more complex relationships from data. In this project, we will study three age-related diseases: cardiovascular disease, cancer, and diabetes. The project will run for three years.
Neural networks are computing systems inspired by, but not identical to, the biological neural networks that constitute human brains. A neural network learns to perform tasks by considering labeled examples. For example, in image recognition, a network might learn to identify images that contain cats by analyzing example images that have been manually labeled as "cat" or "no cat", and then use what it has learned to identify cats in other images. It does this without any prior knowledge of cats, because it can learn their characteristics (fur, tails, whiskers, etc.) from a large number of examples.
However, biological genome data usually contain hundreds of thousands of features (i.e. genetic markers), while the number of samples (i.e. patients) is limited because gathering the data is a complicated and expensive process. This introduces the problem of overfitting, which leads to poor generalization when a model is applied to different datasets. Overfitting means that a model fits a particular set of data too closely or exactly, and may therefore fail to fit additional data or predict future observations reliably. In this project, we will explore several neural network models for handling biological genome data while guarding against overfitting. We propose several models: Basic Neural Network, Dropout Neural Network, Least Absolute Shrinkage and Selection Operator (LASSO) Neural Network, Elastic Net Neural Network, and Correlation Pre-Filtered Neural Network (CPFNN).
Our goal is to choose the model with the best age prediction by comparing these models, and to use that model to achieve good results in predicting the probability of age-related disease. The project will make several significant contributions: 1) we will develop a new approach to high-dimension, low-sample-size data in the computer science field; 2) our model can be extended to other fields that also have high-dimension, low-sample-size data, such as finance; 3) we will provide a sophisticated model for age prediction and age-related disease prediction, which we expect to have a major impact in clinical settings.
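The lay summary names correlation pre-filtering and LASSO/elastic-net shrinkage as ways to limit overfitting when features far outnumber samples. The sketch below, in plain Python, illustrates two such building blocks under stated assumptions. The first is a pre-filter that ranks each marker by the absolute Pearson correlation between that marker and age and keeps the top k. The second is the standard glmnet-style elastic-net penalty that would be added to a network's training loss. The ranking criterion, the cut-off k, the penalty form, and all function names (`pearson`, `prefilter_features`, `elastic_net_penalty`) are hypothetical illustrations. The summary does not say how the project's CPFNN or Elastic Net Neural Network actually implement them, and the data below are synthetic.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences (0.0 if either has zero variance)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sd_x == 0 or sd_y == 0:
        return 0.0
    return cov / (sd_x * sd_y)


def prefilter_features(X, y, k):
    """Hypothetical pre-filter: keep the k feature columns with the largest |r| against y.

    X is a list of samples (rows), each a list of feature values; y holds the target ages.
    Returns the selected column indices in ascending order.
    """
    n_features = len(X[0])
    scores = []
    for j in range(n_features):
        column = [row[j] for row in X]
        scores.append((abs(pearson(column, y)), j))
    scores.sort(reverse=True)  # highest |r| first
    return sorted(j for _, j in scores[:k])


def elastic_net_penalty(weights, lam, alpha):
    """Elastic-net penalty lam * (alpha * ||w||_1 + (1 - alpha) / 2 * ||w||_2^2).

    alpha = 1 gives the pure LASSO (L1) penalty; alpha = 0 gives pure ridge (L2).
    """
    l1 = sum(abs(w) for w in weights)
    l2 = sum(w * w for w in weights)
    return lam * (alpha * l1 + (1 - alpha) / 2 * l2)


if __name__ == "__main__":
    # Synthetic high-dimension, low-sample-size data: 20 "patients", 1000 "markers".
    random.seed(0)
    n_samples, n_features = 20, 1000
    X = [[random.gauss(0, 1) for _ in range(n_features)] for _ in range(n_samples)]
    # By construction, only feature 3 determines the (made-up) age.
    ages = [30 + 2 * row[3] for row in X]
    print(prefilter_features(X, ages, k=1))  # → [3]
    print(elastic_net_penalty([1.0, -2.0], lam=0.5, alpha=0.5))  # → 1.375
```

Setting alpha = 1 reduces the penalty to the LASSO (L1) term, which pushes uninformative weights toward exactly zero; lowering alpha mixes in the L2 term, which tends to spread weight across groups of correlated markers rather than picking just one of them.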