About the Michigan Genomics Initiative
The Michigan Genomics Initiative (MGI) is a collaborative research effort among physicians, researchers, and patients at the University of Michigan (U-M) with the goal of combining patient electronic health record (EHR) data with corresponding genetic data to gain novel biomedical insights.
Through an opt-in consent document, MGI participants agree to provide the study team with access to EHR data for clinical information and a biospecimen (usually a tube of blood or a vial of saliva). During the consent process, prospective participants receive a graphical pamphlet that describes the potential risks and benefits of the MGI study, how participant data will be used, and how to withdraw from the study. MGI participants may also be asked to answer survey questions, depending on the clinic from which they are recruited. Each MGI participant agrees that they may be re-contacted in the future for follow-up studies if they have a genotype or clinical condition of interest to investigators across the U-M research enterprise.
Biospecimens that are collected from participants are sent to the Central Biorepository for processing, and DNA is isolated from the biospecimens. A portion of that DNA is set aside for array genotyping by the Advanced Genomics Core.
Data collected through MGI are available by request to U-M researchers with corresponding IRB approval**. See the section “How To Request MGI Data” for data request details.
Cohort Profile (February 2023)
As of February 2023, ~95K participants have consented through the MGI and partner studies, and ~10K new participants are anticipated per year. Currently, all MGI participants with available genetic data have received care at the University of Michigan Health System.
The MGI study team processes genetic data for all genotyped participants at a given time and makes these data available with the release of a “Data Freeze”. To date, the MGI has released 5 Data Freezes (Table 1).
Data Freeze # | # Participants Included | Release Date |
1 | 35,065 | February 2017 |
2 | 47,513 | September 2018 |
3 | 56,984 | March 2020 |
4 | 60,215 | July 2021 |
5 | 70,439 | November 2022 |
Table 1. Chronology of MGI Data Freezes.
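The growth between releases can be read directly off Table 1. The following is a minimal Python sketch of that arithmetic; the counts are copied from the table, and the names `freezes` and `freeze_increments` are chosen here for illustration only.

```python
# Participant counts per Data Freeze, copied from Table 1.
freezes = [
    (1, 35_065, "February 2017"),
    (2, 47_513, "September 2018"),
    (3, 56_984, "March 2020"),
    (4, 60_215, "July 2021"),
    (5, 70_439, "November 2022"),
]


def freeze_increments(rows):
    """Return (freeze number, participants added since the previous freeze)."""
    return [
        (curr[0], curr[1] - prev[1])
        for prev, curr in zip(rows, rows[1:])
    ]


increments = freeze_increments(freezes)
for number, added in increments:
    print(f"Freeze {number}: +{added:,} participants")
```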
Data Freeze 5, the most recent Freeze, was released in November 2022 and contains data from 70,439 genotyped MGI participants. Of these participants, 33,081 (≈ 47%) are male and 37,358 (≈ 53%) are female. The median age, calculated from the date of birth in the electronic health record to January 1st 2022 or to the date of death, was 60 years (62 years for males and 57 years for females). 173 participants were under 18 years of age (Figure 1).

Figure 1. Distribution of age and genotype-inferred sex of MGI participants included in Data Freeze 5. For MGI participants without a deceased date in our records, we report age as the number of years between date of birth and Jan 1st 2022. For MGI participants with a deceased date in our records, we report age as the number of years between date of birth and death.
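The age rule described in the Figure 1 caption can be expressed as a short function. This is a minimal Python sketch, not the MGI team's code: the function names are hypothetical, and truncating to whole years is an assumption, since the text does not say how fractional years are handled.

```python
from datetime import date
from typing import Optional

# Reference date used for participants without a deceased date (Data Freeze 5).
REFERENCE_DATE = date(2022, 1, 1)


def age_in_years(birth: date, end: date) -> int:
    """Whole years elapsed between birth and end (assumed truncation)."""
    years = end.year - birth.year
    # Subtract one year if the birthday has not yet occurred in the end year.
    if (end.month, end.day) < (birth.month, birth.day):
        years -= 1
    return years


def freeze5_age(birth: date, deceased: Optional[date] = None) -> int:
    """Age at death if a deceased date is recorded, else age on REFERENCE_DATE."""
    end = deceased if deceased is not None else REFERENCE_DATE
    return age_in_years(birth, end)
```

For example, `freeze5_age(date(1960, 6, 15))` uses the January 1st 2022 reference date, while `freeze5_age(date(1950, 3, 1), date(2020, 2, 28))` uses the recorded date of death.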
The self-reported race of genotyped MGI participants, as recorded during a medical office visit, is Caucasian (n=60,598), African American (n=4,561), Unknown (n=2,946), Asian (n=1,910), American Indian or Alaska Native (n=355), and Native Hawaiian and Other Pacific Islander (n=69). The inferred majority genetic ancestry of the genotyped participants is primarily European (n=61,113), with smaller numbers of African (n=4,450), Western Asian (n=1,885), Eastern Asian (n=1,426), Central/South Asian (n=963), and Native American (n=602) (Figure 2).

Figure 2. Genotype-inferred majority ancestry and self-reported race of MGI participants included in Data Freeze 5. (A.) Majority ancestry as inferred for MGI participants using the ADMIXTURE software, with Human Genome Diversity Panel genotypes and continental population labels used as reference. (B.) Race as self-reported by MGI participants during a medical office visit. The left plot in each inset summarizes the full genotyped MGI cohort. The right plot in each inset is a zoomed-in view focusing on the non-European/non-Caucasian component of the cohort.
The EHR-derived data available for genotyped MGI participants span many phenotypic categories, making it possible to construct large study cohorts for a variety of phenotypes of potential interest to U-M researchers (Figure 3).

Figure 3. Examples of abundant phenotypes among MGI participants included in Data Freeze 5. We classified ICD-9 and ICD-10 billing codes from MGI participants into PheWAS phenotype codes using the PheWAS R package and plotted the phenotypes with the largest case counts from each of 17 distinct phenotype categories.
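The counting step behind Figure 3 can be sketched as follows. This is a Python stand-in under stated assumptions, not the PheWAS R package: the lookup table `TOY_MAP`, its codes, labels, and categories are placeholders invented for illustration, and treating a participant as a case after a single mapped code is also an assumption.

```python
from collections import defaultdict

# Toy lookup: (vocabulary, ICD code) -> (phenotype label, category).
# Placeholder entries only; this is NOT the PheWAS package's code map.
TOY_MAP = {
    ("ICD9", "9-A"): ("phenotype A", "category 1"),
    ("ICD10", "10-A"): ("phenotype A", "category 1"),
    ("ICD10", "10-B"): ("phenotype B", "category 1"),
    ("ICD10", "10-C"): ("phenotype C", "category 2"),
}

# Hypothetical billing records: (participant id, vocabulary, ICD code).
TOY_RECORDS = [
    ("p1", "ICD9", "9-A"),
    ("p1", "ICD10", "10-A"),
    ("p2", "ICD10", "10-A"),
    ("p2", "ICD10", "10-B"),
    ("p3", "ICD10", "10-C"),
    ("p3", "ICD10", "ZZ"),  # unmapped code, ignored below
]


def top_phenotype_per_category(records, code_map):
    """Return {category: (phenotype, case count)} for the most common phenotype."""
    cases = defaultdict(set)  # (phenotype, category) -> distinct participant ids
    for participant_id, vocabulary, icd_code in records:
        mapped = code_map.get((vocabulary, icd_code))
        if mapped is not None:
            cases[mapped].add(participant_id)

    best = {}
    for (phenotype, category), ids in cases.items():
        if category not in best or len(ids) > best[category][1]:
            best[category] = (phenotype, len(ids))
    return best


top = top_phenotype_per_category(TOY_RECORDS, TOY_MAP)
```

Counting distinct participants rather than billing records keeps a participant with repeated codes from being counted as more than one case.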
Available Genetic Data (February 2023)
Several genotype array- and sequence-based datasets are available for request by approved U-M researchers who would like to perform their own analysis of MGI genetic data (Table 2).
Data Type | Description | # Participants w/ Data Type |
Genome-wide genotypes | Genotypes at ~570K sites for 60,176 participants assayed on one of three versions of a customized Illumina Infinium CoreExome genotyping array, and at 682,590 sites for 10,263 participants assayed on a customized Illumina Infinium Global Screening Array. Genotypes were imputed to > 50 million sites using the Trans-Omics for Precision Medicine (TOPMed) reference panel | 70,439 |
Polygenic scores | Polygenic scores for 70,266 participants included in Freeze 5 for six traits: thyroid cancer, primary open angle glaucoma, abdominal aortic aneurysm, chronic obstructive pulmonary disease, asthma, and gout | 70,266 |
Whole exome sequences | Sequence data covering protein coding gene regions (~2% of genome) as captured by the Roche/Nimblegen SeqCap EZ v2.0 or Agilent SureSelect V5-post systems | 606 |
Targeted sequences | Sequence data covering 151 targeted gene regions | 964 |
Table 2. Genetic data available with the release of Data Freeze 5.
Genetic Analysis Resources
Several resources provide researchers with the opportunity to use results from analyses of MGI genetic data (Table 3).
Resource | Description |
Consultations* | We can discuss available data, recommended software, compute environment, etc. We can also provide feedback on your study design and analysis plan* |
Data Curation and Delivery | We will upload requested genetic data directly to your HIPAA-compliant environment or grant you access to a central copy. We are interested in expanding our services to better support your research. If there are specific file formats or types of genetic data that we currently do not offer but you would like to request, please email PHDataHelp@umich.edu |
Custom genetic analysis† | An expert team of MGI analysts is available to support custom genetic analyses on MGI data free of charge |
Presentation | We can offer a one-time presentation at your lab or workgroup meeting to go over results from your custom analysis request or introduce topics in genetics (GWAS, polygenic risk scores, etc.) |
MGI Encore‡ (requires VPN connection) | An online tool that assists investigators with running genome-wide association studies using MGI genotype data and their own uploaded/selected phenotype data. Please contact PHDataHelp@umich.edu for information on obtaining phenotype data and instructions for accessing Encore. |
MGI + BioVU LabWAS‡ | Summary statistics from a meta-analysis of 70 EHR-derived quantitative laboratory measurements from the BioVU cohort from the Vanderbilt University Health System and the MGI cohort (Goldstein et al. PLoS Genetics 2020) |
MGI PheWeb (Data Freeze 3)‡ | Online database of genome-wide associations for EHR-derived ICD billing codes from participants of the MGI. MGI PheWeb is current to Data Freeze 3 |
Table 3. Available genetic analysis resources. *Time limit typically 1 hour. Additional requests will be honored at our discretion and as time allows. †Contact PHDataHelp@umich.edu for further information. ‡For our self-serve tools, we are more than happy to provide one-on-one assistance if the provided documentation is not sufficient. We can offer a 30-minute tutorial per service.
How To Request MGI Data
To access these data, please apply through our ticketing system (submit a “Custom Data Request” in JIRA). Access to these data** also requires an IRB application through IRBMED, which you can submit in eResearch Regulatory Management. For further assistance, please contact the Research Scientific Facilitators at phdatahelp@umich.edu, who can guide you through the data request process. A “how-to” video offering a visual guide to the request process is also available.
**An IRB application will be required for individual-level data access. All IRB applications should go through IRBMED and not HSBS.
Type of IRB application needed by investigators for clinical and/or genetic data:
Dataset Type | IRB application |
Aggregate datasets | No IRB application required |
“De-Identified” or “Limited” datasets (per HIPAA definition) | Require an IRB application and, at a minimum, a “not-regulated” determination |
Datasets with Protected Health Information (PHI) beyond the limited dataset level | Require IRB review and approval or exempt determination |
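For scripted request workflows, the table above can be encoded as a simple lookup. This is only an illustrative Python sketch: the enum and function names are hypothetical, and the requirement strings paraphrase the table rather than quote an official policy source.

```python
from enum import Enum


class DatasetType(Enum):
    AGGREGATE = "aggregate"
    DEIDENTIFIED_OR_LIMITED = "de-identified or limited (per HIPAA definition)"
    PHI_BEYOND_LIMITED = "PHI beyond the limited dataset level"


# Requirement text paraphrased from the table above.
IRB_REQUIREMENT = {
    DatasetType.AGGREGATE: "No IRB application required",
    DatasetType.DEIDENTIFIED_OR_LIMITED: (
        "IRB application required; at minimum a not-regulated determination"
    ),
    DatasetType.PHI_BEYOND_LIMITED: (
        "IRB review and approval or exempt determination required"
    ),
}


def needs_irb_application(dataset_type: DatasetType) -> bool:
    """True for every dataset type except aggregate datasets."""
    return dataset_type is not DatasetType.AGGREGATE


def irb_requirement(dataset_type: DatasetType) -> str:
    """Look up the requirement text for a dataset type."""
    return IRB_REQUIREMENT[dataset_type]
```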
For IRB applications, please reference MGI HUM00071298.
Requests for de-identified data or genomic data alone are pre-approved by the MGI committee and do not need a specific letter or commitment for submission to the IRB. Biospecimen requests and re-contact of MGI patients require Precision Health MGI Access Committee approval.
Contact the Data Office for Clinical & Translational Research (DOCTR) with any IRB-related questions: DataOffice@umich.edu.