Uncovering associations between pre-existing conditions and COVID-19 Severity: A polygenic risk score approach across three large biobanks

Precision Health members Dr. Lars Fritsche (Associate Research Scientist, Biostatistics, School of Public Health), Dr. Bhramar Mukherjee (Siobán D Harlow Collegiate Professor of Public Health, John D Kalbfleisch Distinguished University Professor of Biostatistics, Chair, Department of Biostatistics), Seunggeun Lee, and others were part of a team that set out to explore and understand the genetic underpinnings of severe forms of COVID-19. Using Michigan Genomics Initiative data and resources, they made meaningful observations and share their insights in a recently published article in PLOS Genetics. Below, we talk further with Dr. Fritsche and Dr. Mukherjee to learn more:

Citation: Fritsche LG, Nam K, Du J, Kundu R, Salvatore M, Shi X, et al. (2023) Uncovering associations between pre-existing conditions and COVID-19 Severity: A polygenic risk score approach across three large biobanks. PLoS Genet 19(12): e1010907.

Briefly, what was the aim of this project at the outset?
Dr. Fritsche: Our study aimed to identify individuals with a genetic predisposition for severe COVID-19 and to study the enrichment of pre-existing, pre-pandemic diagnoses that may characterize especially vulnerable individuals, by leveraging polygenic risk scores (PRS). The PRS served as a proxy for COVID-19 severity and allowed us to study genetic and health data from over 500,000 individuals across three biobanks (Michigan Genomics Initiative, UK Biobank, and NIH All of Us) instead of relying on sparse COVID-19 outcome data.

What Precision Health data and/or resources did you use, and how did they contribute to the ultimate success/outcome of the project?
Dr. Fritsche: For this project, we leveraged Precision Health’s MGI cohort, focusing on time-stamped clinical records and extensive imputed genetic data for constructing genome-wide PRSs. The comprehensive clinical histories in the EHR data allowed for a systematic creation of the MGI cohort’s pre-pandemic medical phenome. Additionally, the free access to the ARMIS2 computing environment provided by UM Precision Health was crucial for our data analysis.

Anything to share on the ease/usability/usefulness of these tools, data & resources that could encourage/help future users?
Dr. Fritsche: The provided computing environment, particularly the reliability and flexibility of ARC’s ARMIS2, enabled efficient and secure data processing and analysis. The genetic data and the accompanying auxiliary data (e.g., pre-computed principal components, inferred relatedness, and ancestry) were readily available, well-documented, and highly compatible with the data of the other two large biobanks.

What were some of your most salient findings?
Dr. Mukherjee: Our study confirmed the enrichment of certain pre-existing conditions, like obesity, metabolic disorders, smoking behavior, and cardiovascular diseases, in individuals predisposed to severe COVID-19, showcasing the value of using PRS as proxies for disease outcomes.

Hospitalization and severe outcome data for COVID-19 are porous, sparse, and heterogeneous across these biobanks. If we used actual health outcomes, our statistical power to discover associations with pre-existing conditions would be low. The COVID-19 severity polygenic risk score serves as a proxy for the severe health outcomes and can be calculated for any individual who has genetic data available. This is advantageous when you are looking at a rare outcome that is subject to heterogeneous coding and reporting practices.
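
For readers unfamiliar with the method: a polygenic risk score is commonly computed as a weighted sum of an individual's risk-allele dosages, which is why it can be calculated for anyone with genetic data regardless of whether a COVID-19 outcome was recorded. The sketch below illustrates only that general idea and is not the study's pipeline. The variant weights, dosages, and person identifiers are made-up values, and the function names are hypothetical.

```python
def polygenic_risk_score(dosages, weights):
    """Weighted sum of risk-allele dosages (each between 0 and 2) for one person."""
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must cover the same variants")
    return sum(d * w for d, w in zip(dosages, weights))


def standardize(scores):
    """Center scores at 0 and scale to unit standard deviation within a cohort."""
    mean = sum(scores) / len(scores)
    variance = sum((s - mean) ** 2 for s in scores) / len(scores)
    sd = variance ** 0.5
    return [(s - mean) / sd for s in scores]


# Hypothetical per-variant weights (e.g. effect sizes from a severity GWAS).
weights = [0.5, -0.25, 1.0]

# Hypothetical genotype dosages for three people at the same three variants.
cohort = {
    "person_a": [0, 1, 2],
    "person_b": [2, 2, 0],
    "person_c": [1, 0, 1],
}

# Every genotyped person gets a score, with or without a recorded outcome.
raw_scores = {pid: polygenic_risk_score(d, weights) for pid, d in cohort.items()}
z_scores = dict(zip(raw_scores, standardize(list(raw_scores.values()))))
```

In this toy cohort, person_a receives the highest raw score because they carry two copies of the most heavily weighted variant. Standardizing within a cohort is one common way to make scores comparable before testing them against pre-existing diagnoses.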

Based on your experience with Precision Health resources and this publication/project, will you look to Precision Health in the future for support on other studies?
Dr. Fritsche: Although this was my final COVID-19 project, the insights gained from extracting and processing EHR data, harmonizing it across three biobanks, and integrating genetics will remain invaluable for my future projects.

Moving forward, I plan to transition to larger, still-growing biobanks like NIH’s All of Us, which offer more diverse cohorts, deep whole genome sequencing (WGS) data, and comprehensive survey and mobile health data. However, MGI and Precision Health will remain an important independent dataset that will come in handy as a testing or validation cohort.