Statistical methods for large-scale genomic analysis
Principal Investigator:
Dr Pier Francesco Palamara
Approved Research ID:
43206
Approval date:
April 18th 2019
Lay summary
We will develop new statistical and computational methods to enable the analysis of very large data sets containing genomic, environmental, and health-related information. We will study several properties of the data, such as the extent to which individuals in a group are genetically related, and use this information to develop new computational strategies and to improve a number of analyses. These analyses aim to study past evolutionary events (e.g. detecting evidence of natural selection); to improve the detection of trait- and disease-associated regions of the genome (e.g. via genotype imputation, haplotype phasing, and GWAS); to study genetic, phenotypic, and environmental variation (e.g. understanding the interplay between genes and environment, quantifying heritability); and to produce new evolutionary and functional genomic annotations (e.g. predicting whether a genomic region is involved in certain biological processes). The data in the UK Biobank will enable us to develop and test new computational methods and to apply them in these analyses. We will analyze the full cohort and a wide range of phenotypes, including diseases (e.g. type 2 diabetes) and quantitative traits (e.g. height, BMI). These new methods and analyses will improve our ability to process genomic, environmental, and health-related data, and aim to provide a better understanding of human evolution, biological processes, and the causes of disease.