About the Michigan Genomics Initiative
The Michigan Genomics Initiative (MGI) is a collaborative research effort among physicians, researchers, and patients at the University of Michigan (U-M) with the goal of combining patient electronic health record (EHR) data with corresponding genetic data to gain novel biomedical insights.
Through an opt-in consent document, MGI participants agree to provide the study team with access to EHR data for clinical information and a biospecimen (usually a tube of blood or a vial of saliva). MGI participants may also be asked to answer survey questions depending on the clinic from which they are recruited. Each MGI participant agrees that they may be re-contacted in the future for follow-up studies if they have a genotype or clinical condition of interest to investigators across the U-M research enterprise.
Biospecimens that are collected from participants are sent to the Central Biorepository for processing, and DNA is isolated from the biospecimens. A portion of that DNA is set aside for array genotyping by the Advanced Genomics Core.
Data collected through MGI are available by request to U-M researchers with corresponding IRB approval**. See the section “How To Request MGI Data” for data request details.
Cohort Profile (July 2021)
There are currently ~84K consented participants through the MGI and partner studies, and the addition of ~10K new participants per year is anticipated. All MGI participants with available genetic data have received care at the University of Michigan Health System.
The MGI study team processes genetic data for all genotyped participants at a given time and makes these data available with the release of a “Data Freeze”. To date, the MGI has released 4 Data Freezes (Table 1).
|Data Freeze #||# Participants Included||Release Date|
Table 1. Chronology of MGI Data Freezes.
Data Freeze 4, the most current Freeze, was released in July 2021 and contains data from 60,215 genotyped MGI participants. Of these participants, 28,251 (≈ 47%) are male and 31,964 (≈ 53%) are female. MGI participant ages range from 18 to above 89. The median age is 59 years overall, 62 for males and 57 for females (Figure 1).
Figure 1. Distribution of age and genotype-inferred sex of MGI participants included in Data Freeze 4. For MGI participants without a deceased date in our records, we report age as the number of years between date of birth and Jan 1st 2021. For MGI participants with a deceased date in our records, we report age as the number of years between date of birth and date of death.
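The age rule in the Figure 1 caption can be sketched in a few lines of Python. This is only an illustration: the function name `age_in_years` and the choice to count completed whole years are assumptions, since the caption does not say how partial years are handled.

```python
from datetime import date
from typing import Optional

# Cutoff date stated in the Figure 1 caption for participants without a deceased date.
REFERENCE_DATE = date(2021, 1, 1)


def age_in_years(birth: date, death: Optional[date] = None) -> int:
    """Completed years from birth to death, or to REFERENCE_DATE if no death is recorded.

    Assumption: ages are truncated to whole years (hypothetical choice).
    """
    end = death if death is not None else REFERENCE_DATE
    years = end.year - birth.year
    # Subtract one year if the birthday has not yet occurred in the end year.
    if (end.month, end.day) < (birth.month, birth.day):
        years -= 1
    return years


# Participant without a deceased date: age is measured to Jan 1st 2021.
print(age_in_years(date(1960, 6, 15)))
# Participant with a deceased date: age is measured to the date of death.
print(age_in_years(date(1940, 3, 2), death=date(2019, 3, 1)))
```

The dates in the usage lines are made-up examples, not MGI records.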
The self-reported race of genotyped MGI participants as recorded during a medical office visit is Caucasian (51,967), African American (3,859), Unknown (2,229), Asian (1,829), American Indian or Alaska Native (273), and Native Hawaiian and Other Pacific Islander (58). The inferred majority genetic ancestry of the genotyped participants is primarily European (53,054) with smaller numbers of African (3,761), East Asian (1,281), Central/South Asian (891), West Asian (780), and Native American (448) (Figure 2).
Figure 2. Genotype-inferred majority ancestry and self-reported race of MGI participants included in Data Freeze 4. (A.) Majority ancestry as inferred for MGI participants using the ADMIXTURE software with Human Genome Diversity Panel genotypes and continental population labels used as reference. (B.) Race as self-reported by MGI participants during a medical office visit. The left plot in each inset summarizes the full genotyped MGI cohort. The right plot in each inset is a zoomed-in view focusing on the non-European/non-Caucasian component of the cohort.
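As a quick arithmetic check, the counts quoted above can be converted into cohort shares. The sketch below uses only the numbers given in the text; the helper name `shares` is hypothetical.

```python
# Counts copied from the text for Data Freeze 4.
TOTAL_GENOTYPED = 60215

ancestry = {
    "European": 53054,
    "African": 3761,
    "East Asian": 1281,
    "Central/South Asian": 891,
    "West Asian": 780,
    "Native American": 448,
}

self_reported_race = {
    "Caucasian": 51967,
    "African American": 3859,
    "Unknown": 2229,
    "Asian": 1829,
    "American Indian or Alaska Native": 273,
    "Native Hawaiian and Other Pacific Islander": 58,
}


def shares(counts):
    """Return each group's percentage of the summed counts, rounded to one decimal."""
    total = sum(counts.values())
    return {label: round(100 * n / total, 1) for label, n in counts.items()}


# Both breakdowns should account for every genotyped participant.
assert sum(ancestry.values()) == TOTAL_GENOTYPED
assert sum(self_reported_race.values()) == TOTAL_GENOTYPED

for label, pct in shares(ancestry).items():
    print(f"{label}: {pct}%")
```

Both breakdowns sum to the 60,215 genotyped participants in Data Freeze 4, so the percentages are shares of the full genotyped cohort.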
The EHR-derived data that are available for genotyped MGI participants span many different phenotypic categories, and large study cohorts can be constructed for a variety of phenotypes of potential interest to U-M researchers (Figure 3).
Figure 3. Examples of abundant phenotypes among MGI participants included in Data Freeze 4. We classified ICD-9 billing codes from MGI participants into PheWAS phenotype codes using the PheWAS R package and plotted the phenotypes with the largest case counts from each of 17 distinct phenotype categories.
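The summary behind Figure 3 was produced with the PheWAS R package. The Python sketch below only illustrates the general idea of mapping billing codes to phenotype codes and picking the largest phenotype per category. Everything in it is an assumption for illustration: the lookup table, the placeholder code names, and the rule that a single mapped code makes a participant a case. None of it reflects the package's actual mapping or defaults.

```python
from collections import defaultdict

# Hypothetical ICD-9 -> (phenotype code, category) lookup with placeholder names.
# A real analysis would use the mapping shipped with the PheWAS R package.
ICD9_TO_PHECODE = {
    "ICD9_A": ("PHE_1", "category_x"),
    "ICD9_B": ("PHE_1", "category_x"),
    "ICD9_C": ("PHE_2", "category_x"),
    "ICD9_D": ("PHE_3", "category_y"),
}


def top_phecode_per_category(records, lookup):
    """Return {category: (phecode, case_count)} for the largest phecode in each category.

    records: iterable of (participant_id, icd9_code) pairs.
    Assumption: one mapped code is enough to count a participant as a case.
    """
    cases = defaultdict(set)  # phecode -> set of participant ids
    category_of = {}
    for participant_id, icd9 in records:
        if icd9 not in lookup:
            continue  # unmapped billing codes are ignored
        phecode, category = lookup[icd9]
        cases[phecode].add(participant_id)
        category_of[phecode] = category

    best = {}
    for phecode, ids in cases.items():
        category = category_of[phecode]
        if category not in best or len(ids) > best[category][1]:
            best[category] = (phecode, len(ids))
    return best


# Made-up billing records for three participants.
records = [
    ("p1", "ICD9_A"), ("p1", "ICD9_B"),
    ("p2", "ICD9_A"), ("p2", "ICD9_C"),
    ("p3", "ICD9_D"), ("p3", "UNMAPPED"),
]
print(top_phecode_per_category(records, ICD9_TO_PHECODE))
```

Counting distinct participants rather than billing records keeps a participant with repeated codes from inflating a phenotype's case count.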
Available Genetic Data (July 2021)
Several genotype array- and sequence-based datasets are available for request by approved U-M researchers who would like to perform their own analysis of MGI genetic data (Table 2).
|Data Type||Description||# Participants with Data Type|
|Genome-wide genotypes||~600K variants directly assayed by genotyping array, with genotypes imputed to > 51M variants with the Trans-Omics for Precision Medicine reference panel or > 32M variants with the Haplotype Reference Consortium panel. All currently available genotypes were assayed on the Infinium CoreExome array. In the future, most recently recruited MGI participants will be genotyped on the Infinium Global Screening Array to improve genome coverage for non-European participants.||60,215|
|Whole exome sequences||Sequence data covering protein coding gene regions (~2% of genome) as captured by the Roche/Nimblegen SeqCap EZ v2.0 or Agilent SureSelect V5-post systems||561|
|Targeted sequences||Sequence data covering 151 targeted gene regions||963|
|HLA gene allele and amino acid inferences||Inferences for human leukocyte antigen genes HLA-A, -B, -C, -DQA1, -DQB1, -DRB1, -DPA1, and -DPB1||60,215|
|Pharmacogenomic star allele inferences||Inferences for 51 distinct pharmacogenes with polymorphic alleles, including CYP2C9, CYP2B6, CYP2C19, CYP3A5, NUDT15, TPMT, SLCO1B1, UGT1A1, DPYD, and CYP2D6*||60,215|
|Local ancestry inferences||Estimation of the genetic ancestry of tracts of DNA in MGI participants.||60,215|
Table 2. Genetic data available with the release of Data Freeze 4. * Pharmacogene alleles based on structural variation are not inferred.
Genetic Analysis Resources
Several resources provide researchers with the opportunity to use results from analyses of MGI genetic data (Table 3).
|Consultations*||We can discuss available data, recommended software, compute environment, etc. We can also provide feedback on your study design and analysis plan*|
|Data Curation and Delivery||We will upload requested genetic data directly to your HIPAA-compliant environment or grant you access to a central copy. We are interested in expanding our services to better support your research. If there are specific file formats or types of genetic data that we currently do not offer but you would like to request, please email PHDataHelp@umich.edu|
|Custom genetic analysis†||An expert team of MGI analysts is available to support custom genetic analyses on MGI data free of charge including:
|Presentation||We can offer a one-time presentation at your lab or workgroup meeting to go over results from your custom analysis request or introduce topics in genetics (GWAS, polygenic risk scores, etc.)|
|Encore (requires VPN connection)||An online tool that assists investigators with running genome-wide association studies using MGI genotype data and their own uploaded/selected phenotype data. Please contact PHDataHelp@umich.edu for information on obtaining phenotype data and instructions for accessing Encore.|
|MGI + BioVU LabWAS‡||Summary statistics from a meta-analysis of 70 EHR-derived quantitative laboratory measurements from the BioVU cohort from the Vanderbilt University Health System and the MGI cohort (Goldstein et al. PLoS Genetics 2020)|
|MGI PheWeb (Data Freeze 3)‡||Online database of genome-wide associations for EHR-derived ICD billing codes from participants of the MGI. MGI PheWeb is current to Data Freeze 3|
Table 3. Available genetic analysis resources. *Time limit typically 1 hour. Additional requests will be honored at our discretion and as time allows. †Contact PHDataHelp@umich.edu for further information. ‡For our self-serve tools, we are more than happy to provide one-on-one assistance if the provided documentation is not sufficient. We can offer a 30-minute tutorial per service.
How To Request MGI Data
To access these data, please apply through our ticketing system (submit a “Custom Data Request” in JIRA). Access requires an IRB application through IRBMED**, which you can submit in eResearch Regulatory Management. For further assistance, please contact the Research Scientific Facilitators at email@example.com, who can guide you through the data request process. The following “how-to” video offers a visual guide to the request process:
**An IRB application will be required for individual-level data access. All IRB applications should go through IRBMED and not HSBS.
Type of IRB application needed by investigators for clinical and/or genetic data:
|Dataset Type||IRB application|
|Aggregate datasets||No IRB application required|
|“De-Identified” or “Limited” datasets (per HIPAA definition)||Require an IRB application and, at a minimum, a “not-regulated” determination|
|Datasets with Protected Health Information (PHI) beyond the limited dataset level||Require IRB review and approval or exempt determination|
For IRB applications, please reference MGI HUM00071298.
Requests for de-identified data and genomic data on their own are pre-approved by the MGI committee and do not need a specific letter or commitment to submit to the IRB. Biospecimen requests and re-contact of MGI patients require Precision Health MGI Access Committee approval.
Contact the Data Office for Clinical & Translational Research (DOCTR) with any IRB-related questions: DataOffice@umich.edu.