Michigan Genomics Initiative

About the Michigan Genomics Initiative

The Michigan Genomics Initiative (MGI) is a collaborative research effort among physicians, researchers, and patients at the University of Michigan (U-M) with the goal of combining patient electronic health record (EHR) data with corresponding genetic data to gain novel biomedical insights. 

Through an opt-in consent document, MGI participants agree to provide the study team with access to EHR data for clinical information and a biospecimen (usually a tube of blood or a vial of saliva). MGI participants may also be asked to answer survey questions, depending on the clinic from which they are recruited. Each MGI participant agrees that they may be re-contacted in the future for follow-up studies if they have a genotype or clinical condition of interest to investigators across the U-M research enterprise.

Biospecimens collected from participants are sent to the Central Biorepository for processing, where DNA is isolated from them. A portion of that DNA is set aside for array genotyping by the Advanced Genomics Core.

Data collected through MGI are available by request to U-M researchers with corresponding IRB approval**. See the section “How To Request MGI Data” for data request details.

Cohort Profile (July 2021)

There are currently ~84K consented participants through the MGI and partner studies, and the addition of ~10K new participants per year is anticipated. Currently, all MGI participants with available genetic data have received care at the University of Michigan Health System.

The MGI study team processes genetic data for all genotyped participants at a given time and makes these data available with the release of a “Data Freeze”. To date, the MGI has released 4 Data Freezes (Table 1).  

| Data Freeze # | # Participants Included | Release Date |
|---|---|---|
| 1 | 35,065 | February 2017 |
| 2 | 47,513 | September 2018 |
| 3 | 56,984 | March 2020 |
| 4 | 60,215 | July 2021 |

Table 1. Chronology of MGI Data Freezes. 

Data Freeze 4, the most current Freeze, was released in July 2021 and contains data from 60,215 genotyped MGI participants. Of these participants, 28,251 (≈ 47%) are male and 31,964 (≈ 53%) are female. MGI participant ages range from 18 to above 89. The median age is 59 years overall (62 for males and 57 for females) (Figure 1).

Figure 1. Distribution of age and genotype-inferred sex of MGI participants included in Data Freeze 4. For MGI participants without a deceased date in our records, we report age as the number of years between date of birth and January 1, 2021. For MGI participants with a deceased date in our records, we report age as the number of years between date of birth and date of death.
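The age rule in the Figure 1 caption can be sketched as follows. This is a minimal illustration, not the MGI team's code: the function name `age_in_years` is hypothetical, and it assumes that "number of years" means completed whole years, which the caption does not state explicitly.

```python
from datetime import date
from typing import Optional

# Cutoff date stated in the Figure 1 caption.
REFERENCE_DATE = date(2021, 1, 1)


def age_in_years(birth: date, death: Optional[date] = None,
                 reference: date = REFERENCE_DATE) -> int:
    """Completed years from birth to death, or to the reference date if no death date.

    Assumption: ages are truncated to whole years.
    """
    end = death if death is not None else reference
    if end < birth:
        raise ValueError("end date precedes date of birth")
    # Subtract one year if the birthday has not yet occurred in the end year.
    had_birthday = (end.month, end.day) >= (birth.month, birth.day)
    return end.year - birth.year - (0 if had_birthday else 1)


# Participant without a deceased date: age is measured to January 1, 2021.
print(age_in_years(date(1962, 3, 15)))
# Participant with a deceased date: age is measured to the date of death.
print(age_in_years(date(1940, 6, 1), date(2019, 5, 31)))
```

The dates in the two calls are made-up examples, not MGI records.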

The self-reported race of genotyped MGI participants as recorded during a medical office visit is Caucasian (51,967), African American (3,859), Unknown (2,229), Asian (1,829), American Indian or Alaska Native (273), and Native Hawaiian and Other Pacific Islander (58). The inferred majority genetic ancestry of the genotyped participants is primarily European (53,054) with smaller numbers of African (3,761), East Asian (1,281), Central/South Asian (891), West Asian (780), and Native American (448) (Figure 2). 
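As a quick consistency check, the counts quoted above can be tallied and converted to percentages. The sketch below uses only the numbers given in this section; the variable and function names are illustrative.

```python
# Number of genotyped participants in Data Freeze 4 (from the text above).
FREEZE4_TOTAL = 60_215

self_reported_race = {
    "Caucasian": 51_967,
    "African American": 3_859,
    "Unknown": 2_229,
    "Asian": 1_829,
    "American Indian or Alaska Native": 273,
    "Native Hawaiian and Other Pacific Islander": 58,
}

inferred_ancestry = {
    "European": 53_054,
    "African": 3_761,
    "East Asian": 1_281,
    "Central/South Asian": 891,
    "West Asian": 780,
    "Native American": 448,
}


def shares(counts):
    """Return each category's share of the total, in percent, rounded to one decimal."""
    total = sum(counts.values())
    return {label: round(100 * n / total, 1) for label, n in counts.items()}


for name, counts in (("race", self_reported_race), ("ancestry", inferred_ancestry)):
    # Both breakdowns should account for every genotyped participant.
    assert sum(counts.values()) == FREEZE4_TOTAL
    print(name, shares(counts))
```

Both breakdowns sum to the 60,215 participants in Data Freeze 4.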

Figure 2. Genotype-inferred majority ancestry and self-reported race of MGI participants included in Data Freeze 4. (A) Majority ancestry as inferred for MGI participants using the ADMIXTURE software, with Human Genome Diversity Panel genotypes and continental population labels used as reference. (B) Race as self-reported by MGI participants during a medical office visit. The left plot in each inset summarizes the full genotyped MGI cohort. The right plot in each inset is a zoomed-in view focusing on the non-European/non-Caucasian component of the cohort.

The EHR-derived data available for genotyped MGI participants span many different phenotypic categories, so large study cohorts can be constructed for a variety of phenotypes of potential interest to U-M researchers (Figure 3).

Figure 3. Examples of abundant phenotypes among MGI participants included in Data Freeze 4. We classified ICD-9 billing codes from MGI participants into PheWAS phenotype codes using the PheWAS R package and plotted the phenotypes with the largest case counts from each of 17 distinct phenotype categories.
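The classification step described in the Figure 3 caption was done with the PheWAS R package. The Python sketch below only illustrates the general idea under stated assumptions: the `ICD9_TO_PHECODE` entries are illustrative and not taken from the package's mapping table, a participant counts as a case with a single mapped code (the actual case definition may differ), and only the single largest phenotype per category is selected.

```python
from collections import defaultdict

# Illustrative ICD-9 -> (phecode, category) entries; NOT the PheWAS package's map.
ICD9_TO_PHECODE = {
    "250.00": ("250.2", "endocrine/metabolic"),
    "250.02": ("250.2", "endocrine/metabolic"),
    "272.4": ("272.1", "endocrine/metabolic"),
    "401.9": ("401.1", "circulatory system"),
}


def case_counts(records):
    """records: iterable of (participant_id, icd9_code). Returns {phecode: n_cases}."""
    cases = defaultdict(set)
    for participant_id, icd9 in records:
        mapped = ICD9_TO_PHECODE.get(icd9)
        if mapped is None:
            continue  # codes without a mapping are ignored
        phecode, _category = mapped
        # A set ensures each participant is counted once per phecode.
        cases[phecode].add(participant_id)
    return {phecode: len(ids) for phecode, ids in cases.items()}


def top_per_category(counts):
    """Pick the phecode with the largest case count within each category."""
    category_of = {phecode: cat for phecode, cat in ICD9_TO_PHECODE.values()}
    best = {}
    for phecode, n in counts.items():
        cat = category_of[phecode]
        if cat not in best or n > best[cat][1]:
            best[cat] = (phecode, n)
    return best


# Made-up billing records for three hypothetical participants.
records = [
    ("P1", "250.00"), ("P1", "250.02"),
    ("P2", "250.00"), ("P2", "401.9"),
    ("P3", "272.4"), ("P3", "V70.0"),
]
counts = case_counts(records)
print(counts)
print(top_per_category(counts))
```

Collecting participant IDs in sets means repeated billing codes for the same person do not inflate a phenotype's case count.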

Available Genetic Data (July 2021)
Several genotype array- and sequence-based datasets are available for request by approved U-M researchers who would like to perform their own analysis of MGI genetic data (Table 2).

| Data Type | Description | # Participants w/ Data Type |
|---|---|---|
| Genome-wide genotypes | ~600K variants directly assayed by genotyping array, and genotypes imputed to > 51M variants with the Trans-Omics for Precision Medicine reference panel or > 32M variants with the Haplotype Reference Consortium panel. All currently available genotypes were assayed on the Infinium CoreExome array. In the future, most recently recruited MGI participants will be genotyped on the Infinium Global Screening Array to improve genome coverage for non-European participants. | 60,215 |
| Whole exome sequences | Sequence data covering protein-coding gene regions (~2% of genome) as captured by the Roche/Nimblegen SeqCap EZ v2.0 or Agilent SureSelect V5-post systems | 561 |
| Targeted sequences | Sequence data covering 151 targeted gene regions | 963 |
| HLA gene allele and amino acid inferences | Inferences for human leukocyte antigen genes HLA-A, -B, -C, -DQA1, -DQB1, -DRB1, -DPA1, and -DPB1 | 60,215 |
| Pharmacogenomic star allele inferences | Inferences for 51 distinct pharmacogenes with polymorphic alleles, including CYP2C9, CYP2B6, CYP2C19, CYP3A5, NUDT15, TPMT, SLCO1B1, UGT1A1, DPYD, and CYP2D6* | 60,215 |

Table 2. Genetic data available with the release of Data Freeze 4. * Pharmacogene alleles based on structural variation are not inferred.

Genetic Analysis Resources
Several resources provide researchers with the opportunity to use results from analyses of MGI genetic data (Table 3).

| Resource | Description |
|---|---|
| MGI PheWeb (Data Freeze 2) | Online database of genome-wide associations for EHR-derived ICD billing codes from participants of the MGI. MGI PheWeb is current to Data Freeze 2. |
| MGI + BioVU LabWAS | Summary statistics from a meta-analysis of 70 EHR-derived quantitative laboratory measurements from the BioVU cohort from the Vanderbilt University Health System and the MGI cohort (Goldstein et al. PLoS Genetics 2020) |
| Custom genetic analysis | An expert team of MGI analysts is available to support custom genetic analyses on MGI data, such as genome-wide association or gene-based analyses. This service is available at no charge. Contact PHDataHelp@umich.edu for further information. |
| MGI Encore [coming soon] | An online tool that assists investigators with running genome-wide association studies using MGI genotype data and their own uploaded/selected phenotype data. Please contact PHDataHelp@umich.edu for information on obtaining phenotype data. |

Table 3. Available genetic analysis resources.

How To Request MGI Data 

To access these data, please apply through our ticketing system (submit a “Custom Data Request” in JIRA). You will need to submit an IRB application through IRBMED to access these data**; applications are submitted in eResearch Regulatory Management. For further assistance, please contact the Research Scientific Facilitators at phdatahelp@umich.edu, who can guide you through the data request process. The following “how-to” video offers a visual guide to the request process:


**An IRB application will be required for individual-level data access. All IRB applications should go through IRBMED and not HSBS.

Type of IRB application needed by investigators for clinical and/or genetic data:

| Dataset Type | IRB application |
|---|---|
| Aggregate datasets | No IRB application required |
| “De-Identified” or “Limited” datasets (per HIPAA definition) | IRB application required; at a minimum, a “not-regulated” determination must be received |
| Datasets with Protected Health Information (PHI) beyond the limited dataset level | IRB review and approval, or an exempt determination, required |

For IRB applications, please reference MGI HUM00071298.
Requests for de-identified data and genomic data on their own are pre-approved by the MGI committee and do not need a specific letter or commitment for submission to the IRB. Biospecimen requests and re-contact of MGI patients will need Precision Health MGI Access Committee approval.
Contact the Data Office for Clinical & Translational Research (DOCTR) with any IRB-related questions: DataOffice@umich.edu.