Data Access & Tools

Genetic Data

Participants in Precision Health research studies (MGI and others) agree to donate a biospecimen. Once collected, each participant's biospecimen is sent to the Central Biorepository for processing. After the DNA is isolated, a portion is set aside for array genotyping by the Advanced Genomics Core. The MGI study team then processes and imputes the genetic data (more on this below).

Cohort Profile (July 2021)
There are currently ~84K consented participants through the MGI and partner studies, and we anticipate adding ~10K new participants per year under normal operating conditions. All MGI participants with available genetic data have received care at the University of Michigan Health System.

The MGI study team processes genetic data for all genotyped participants at a given time and makes these data available with the release of a “Data Freeze”. To date, the MGI has released 4 Data Freezes (Table 1).  

Data Freeze # | # Participants Included | Release Date
1 | 35,065 | February 2017
2 | 47,513 | September 2018
3 | 56,984 | March 2020
4 | 60,215 | July 2021

Table 1. Chronology of MGI Data Freezes.

Data Freeze 4, the most current Freeze, was released in July 2021 and contains data from 60,215 genotyped MGI participants. Of these, 28,251 (≈ 47%) are male and 31,964 (≈ 53%) are female. Participant ages range from 18 to above 89. The median age is 59 years overall: 62 for males and 57 for females (Figure 1).


Figure 1. Distribution of age and genotype-inferred sex of MGI participants included in Data Freeze 4. For MGI participants without a deceased date in our records, we report age as the number of years between date of birth and January 1, 2021. For MGI participants with a deceased date in our records, we report age as the number of years between date of birth and date of death.
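The age rule in the Figure 1 caption can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and counting whole completed years is an assumption, since the caption does not say whether ages are truncated or fractional.

```python
from datetime import date
from typing import Optional

# Reference date for participants without a deceased date (from the Figure 1 caption).
REFERENCE_DATE = date(2021, 1, 1)


def years_between(start: date, end: date) -> int:
    """Whole years elapsed from start to end (assumption: completed years only)."""
    years = end.year - start.year
    # Subtract one year if the anniversary has not yet been reached in the end year.
    if (end.month, end.day) < (start.month, start.day):
        years -= 1
    return years


def reported_age(date_of_birth: date, deceased_date: Optional[date] = None) -> int:
    """Age at death if a deceased date is recorded, otherwise age at the reference date."""
    end = deceased_date if deceased_date is not None else REFERENCE_DATE
    return years_between(date_of_birth, end)
```

For example, `reported_age(date(1962, 1, 1))` evaluates against January 1, 2021, while `reported_age(date(1950, 6, 15), date(2019, 6, 14))` evaluates against the recorded date of death.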


The self-reported race of genotyped MGI participants as recorded during a medical office visit is Caucasian (51,967), African American (3,859), Unknown (2,229), Asian (1,829), American Indian or Alaska Native (273), and Native Hawaiian and Other Pacific Islander (58). The inferred majority genetic ancestry of the genotyped participants is primarily European (53,054) with smaller numbers of African (3,761), East Asian (1,281), Central/South Asian (891), West Asian (780), and Native American (448) (Figure 2). 

Figure 2. Genotype-inferred majority ancestry and self-reported race of MGI participants included in Data Freeze 4. (A) Majority ancestry as inferred for MGI participants using the ADMIXTURE software, with Human Genome Diversity Panel genotypes and continental population labels used as reference. (B) Race as self-reported by MGI participants during a medical office visit. The left plot in each inset summarizes the full genotyped MGI cohort. The right plot in each inset is a zoomed-in view focusing on the non-European/non-Caucasian component of the cohort.
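As a quick cross-check, the sex, race, and ancestry counts reported for Data Freeze 4 should each add up to the 60,215 genotyped participants. The Python sketch below uses the published counts; the variable and function names are illustrative only.

```python
COHORT_TOTAL = 60_215  # genotyped participants in Data Freeze 4

sex_counts = {"male": 28_251, "female": 31_964}

race_counts = {
    "Caucasian": 51_967,
    "African American": 3_859,
    "Unknown": 2_229,
    "Asian": 1_829,
    "American Indian or Alaska Native": 273,
    "Native Hawaiian and Other Pacific Islander": 58,
}

ancestry_counts = {
    "European": 53_054,
    "African": 3_761,
    "East Asian": 1_281,
    "Central/South Asian": 891,
    "West Asian": 780,
    "Native American": 448,
}


def percent_shares(counts: dict, total: int = COHORT_TOTAL) -> dict:
    """Each group's share of the cohort as a percentage, rounded to one decimal."""
    return {group: round(100 * n / total, 1) for group, n in counts.items()}


# Every breakdown partitions the same cohort, so each must sum to the total.
for breakdown in (sex_counts, race_counts, ancestry_counts):
    assert sum(breakdown.values()) == COHORT_TOTAL
```

Calling `percent_shares(ancestry_counts)` gives the proportions behind panel A, and dropping the "European" entry before plotting corresponds to the zoomed-in view.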


The EHR-derived data available for genotyped MGI participants span many phenotypic categories, making it possible to construct large study cohorts for a variety of phenotypes of potential interest to U-M researchers (Figure 3).


Figure 3. Examples of abundant phenotypes among MGI participants included in Data Freeze 4. We classified ICD-9 billing codes from MGI participants into PheWAS phenotype codes using the PheWAS R package and plotted the phenotypes with the largest case counts from each of 17 distinct phenotype categories.
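The classification described in the Figure 3 caption was done with the PheWAS R package. The Python sketch below is not that package's API. It only illustrates the idea of mapping billing codes to phenotype codes, counting distinct cases, and keeping the largest phenotypes per category. The three mapping entries are illustrative and should be checked against the actual phenotype code map. Treating a single billing record as sufficient to define a case is a simplifying assumption, and all names are hypothetical.

```python
from collections import defaultdict

# Illustrative ICD-9 -> (phenotype code, description, category) entries.
# A real analysis would load the full mapping instead of this toy table.
ICD9_TO_PHENOTYPE = {
    "250.00": ("250.2", "Type 2 diabetes", "endocrine/metabolic"),
    "272.4": ("272.1", "Hyperlipidemia", "endocrine/metabolic"),
    "401.9": ("401.1", "Essential hypertension", "circulatory system"),
}


def case_counts(billing_records):
    """Count distinct participants per phenotype.

    billing_records: iterable of (participant_id, icd9_code) pairs.
    Assumption: one matching record is enough to call a participant a case.
    """
    cases = defaultdict(set)
    for participant_id, icd9_code in billing_records:
        phenotype = ICD9_TO_PHENOTYPE.get(icd9_code)
        if phenotype is None:
            continue  # unmapped codes are ignored in this sketch
        cases[phenotype].add(participant_id)
    return {phenotype: len(ids) for phenotype, ids in cases.items()}


def top_phenotypes_per_category(counts, top_n=1):
    """Keep the top_n phenotypes with the largest case counts in each category."""
    by_category = defaultdict(list)
    for (_code, description, category), n_cases in counts.items():
        by_category[category].append((description, n_cases))
    return {
        category: sorted(rows, key=lambda row: row[1], reverse=True)[:top_n]
        for category, rows in by_category.items()
    }
```

Repeated records for the same participant and code count once, because cases are tracked as sets of participant identifiers.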


Available Genetic Data (July 2021)
Several genotype array- and sequence-based datasets are available for request by approved U-M researchers who would like to perform their own analysis of MGI genetic data (Table 2).

Data Type | Description | # Participants w/ Data Type
Genome-wide genotypes | ~600K variants directly assayed by genotyping array, with genotypes imputed to > 51M variants with the Trans-Omics for Precision Medicine reference panel or > 32M variants with the Haplotype Reference Consortium panel. All currently available genotypes were assayed on the Infinium CoreExome array. In the future, most recently recruited MGI participants will be genotyped on the Infinium Global Screening Array to improve genome coverage for non-European participants. | 60,215
Whole exome sequences | Sequence data covering protein-coding gene regions (~2% of the genome) as captured by the Roche/Nimblegen SeqCap EZ v2.0 or Agilent SureSelect V5-post systems | 561
Targeted sequences | Sequence data covering 151 targeted gene regions | 963
HLA gene allele and amino acid inferences | Inferences for human leukocyte antigen genes HLA-A, -B, -C, -DQA1, -DQB1, -DRB1, -DPA1, and -DPB1 | 60,215
Pharmacogenomic star allele inferences | Inferences for 51 distinct pharmacogenes with polymorphic alleles, including CYP2C9, CYP2B6, CYP2C19, CYP3A5, NUDT15, TPMT, SLCO1B1, UGT1A1, DPYD, and CYP2D6* | 60,215

Table 2. Genetic data available with the release of Data Freeze 4. * Pharmacogene alleles based on structural variation are not inferred.


Genetic Analysis Resources
Several resources provide researchers with the opportunity to use results from analyses of MGI genetic data (Table 3).

Resource | Description
MGI PheWeb (Data Freeze 2) | Online database of genome-wide associations for EHR-derived ICD billing codes from MGI participants. MGI PheWeb is current to Data Freeze 2.
MGI + BioVU LabWAS | Summary statistics from a meta-analysis of 70 EHR-derived quantitative laboratory measurements from the BioVU cohort from the Vanderbilt University Health System and the MGI cohort (Goldstein et al. PLoS Genetics 2020)
Custom genetic analysis | An expert team of MGI analysts is available to support custom genetic analyses of MGI data, such as genome-wide association or gene-based analyses. This service is available at no charge. Contact for further information
MGI Encore (requires VPN connection) | An online tool that assists investigators with running genome-wide association studies using MGI genotype data and their own uploaded/selected phenotype data. Please contact for information on obtaining phenotype data

Table 3. Available genetic analysis resources.


Completion of the prerequisites listed here is required for access. U-M VPN login is required to access all of the components of the Precision Health secure enclave. We encourage you to connect with us at so we can facilitate your access.