Approved Research
Comparison and replication of GWAS and gene-environment interaction signals and polygenic risk scores for complex traits in the African AWI-Gen cohort study
Approved Research ID: 63215
Approval date: November 26th 2020
Lay summary
African populations are experiencing an increase in the health burden of non-communicable diseases, as developed regions of the world have over the past 50 years. The Sustainable Development Goals have therefore emphasized the need to reduce the prevalence of obesity, high blood pressure, diabetes, cancer and lung disease, among many others. The aim of our study is to compare and replicate the outcomes from genome-wide association studies (GWASs) and gene-environment interactions, as well as polygenic risk scores, detected or developed in the African AWI-Gen cohort study, using data from the UK Biobank. AWI-Gen is an African population-based cohort of ~12,000 male and female participants (40 to 60 years old at baseline) from Ghana, Burkina Faso, Kenya and South Africa, with data similar to, though not as extensive as, those in the UK Biobank. There are few cross-sectional population studies of older adults in Africa, making it difficult to do replication studies to test the robustness and transferability of results.
We will use the UK Biobank to strengthen our studies and to better understand the similarities and differences in the contributions of genetic susceptibility and gene-environment interaction to disease between African and non-African populations. Since Africa is the cradle of humankind and has genetic variants not found elsewhere in the world, there is an opportunity to make novel discoveries that could inform improved treatments and health outcomes worldwide.
Scope extension:
Non-communicable diseases, such as those we are investigating, are on the rise globally but the key risk factors and combinations of triggers and outcomes are poorly understood in African countries that are challenged by poor access to healthcare and limited resources.
We have built a cohort of ~12,000 individuals (referred to as AWI-Gen), aged 40 to 60 years at baseline, including men and women from four sub-Saharan African countries (Ghana, Burkina Faso, Kenya and South Africa). Our aim is to study genetic and environmental contributions to cardiovascular, metabolic, kidney, and lung diseases and traits, as well as cognition, in this African cohort. There are very few comparative data on African populations, and it is unclear how well findings from non-African populations transfer to African populations. We seek access to data from the UK Biobank, including data from ~8,000 people of African ancestry, to perform replication studies and to examine the transferability of genetic associations and gene-environment interactions across different populations. It is important to understand genetic susceptibility or risk for common non-communicable diseases and traits, given the range of different social, demographic and environmental conditions in African communities.
In the extended scope we will use the UK Biobank (UKBB) genome-wide genotyping data from African-ancestry and European-ancestry participants to generate random selections of sub-groups to use in simulations. We will select several thousand participants from each group and then randomly assign them to pseudo phenotype groups (cases and controls). No phenotype data (other than basic data including age, sex and genetic PCs) will be used to inform the randomisation into the different groups. We will then perform GWAS on these pseudo datasets, iterating over the methods and cut-offs used (e.g. for minor allele frequency, MAF). These simulated datasets will be used to estimate appropriate p-value cut-offs for GWAS in African data (also compared to European data) and to assess the potential for, and nature of, false genetic associations. This will give us a better understanding of which cut-offs are appropriate for African datasets, so that false associations are not reported. We plan to publish this work with a set of recommendations that we hope will be useful to the scientific community. No additional data will be required.
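The simulation outlined above can be illustrated with a minimal sketch, given here only to make the idea concrete. It is not the study's pipeline: the genotypes are synthetic rather than UKBB data, the association test (a 1-df allelic chi-square test) and the minimum-p-value rule for choosing the cut-off are assumptions, covariates such as age, sex and genetic PCs are ignored, and all function names are hypothetical.

```python
import math
import random


def simulate_genotypes(n_people, n_variants, rng):
    """Synthetic 0/1/2 genotypes (assumption: stands in for real genotype data)."""
    genos = []
    for _ in range(n_variants):
        freq = rng.uniform(0.01, 0.5)
        # Each person carries two alleles, each alternate with probability freq.
        genos.append([int(rng.random() < freq) + int(rng.random() < freq)
                      for _ in range(n_people)])
    return genos


def maf(variant):
    """Minor allele frequency of one variant."""
    p = sum(variant) / (2 * len(variant))
    return min(p, 1 - p)


def allelic_p_value(variant, is_case):
    """1-df chi-square test on the 2x2 table of allele counts (cases vs controls)."""
    a = sum(g for g, c in zip(variant, is_case) if c)      # alt alleles, cases
    c_alt = sum(g for g, c in zip(variant, is_case) if not c)  # alt alleles, controls
    b = 2 * sum(is_case) - a                               # ref alleles, cases
    d = 2 * (len(is_case) - sum(is_case)) - c_alt          # ref alleles, controls
    n = a + b + c_alt + d
    denom = (a + b) * (c_alt + d) * (a + c_alt) * (b + d)
    if denom == 0:
        return 1.0  # monomorphic variant or empty group: no evidence
    chi2 = n * (a * d - b * c_alt) ** 2 / denom
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2))


def empirical_threshold(genos, n_iter, maf_cutoff, alpha, rng):
    """Estimate a p-value cut-off from random pseudo case/control assignments.

    In each iteration the labels are shuffled, so every association is false
    by construction. The cut-off is the alpha-quantile of the per-iteration
    minimum p-value.
    """
    kept = [v for v in genos if maf(v) >= maf_cutoff]
    if not kept:
        raise ValueError("no variants pass the MAF cut-off")
    n_people = len(genos[0])
    labels = [True] * (n_people // 2) + [False] * (n_people - n_people // 2)
    min_ps = []
    for _ in range(n_iter):
        rng.shuffle(labels)  # random pseudo phenotype assignment
        min_ps.append(min(allelic_p_value(v, labels) for v in kept))
    min_ps.sort()
    idx = max(0, math.floor(alpha * n_iter) - 1)
    return min_ps[idx], len(kept)


rng = random.Random(0)
genos = simulate_genotypes(200, 60, rng)
threshold, n_kept = empirical_threshold(
    genos, n_iter=100, maf_cutoff=0.05, alpha=0.05, rng=rng)
print(f"{n_kept} variants kept; empirical p-value cut-off: {threshold:.2e}")
```

Repeating the run with different values of `maf_cutoff`, or with genotype sets of differing ancestry, would show how the empirical cut-off shifts, which is the kind of comparison the extended scope describes.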