Michigan Genomics Initiative

About the Michigan Genomics Initiative

The Michigan Genomics Initiative (MGI) is a collaborative research effort among physicians, researchers, and patients at the University of Michigan (U-M) with the goal of combining patient electronic health record (EHR) data with corresponding genetic data to gain novel biomedical insights.

Through an opt-in consent document, MGI participants agree to provide the study team with access to EHR data for clinical information and a biospecimen (usually a tube of blood or a vial of saliva). During the consent process, prospective participants receive a graphical pamphlet that describes the potential risks and benefits of the MGI study, how participant data will be used, and how to withdraw from the study. MGI participants may also be asked to answer survey questions depending on the clinic from which they are recruited. Each MGI participant agrees that they may be re-contacted in the future for follow-up studies if they have a genotype or clinical condition of interest to investigators across the U-M research enterprise.

Biospecimens that are collected from participants are sent to the Central Biorepository for processing, and DNA is isolated from the biospecimens. A portion of that DNA is set aside for array genotyping by the Advanced Genomics Core.

Data collected through MGI are available by request to U-M researchers with corresponding IRB approval**. See the “How To Request MGI Data” section for data request details.

Cohort Profile (September 2023)

There are currently ~100K consented participants through the MGI and partner studies, and the addition of ~10K new participants per year is anticipated. Currently, all MGI participants with available genetic data have received care at the University of Michigan Health System.

The MGI study team processes genetic data for all genotyped participants at a given time and makes these data available with the release of a “Data Freeze”. To date, the MGI has released 6 Data Freezes (Figure 1).

Figure 1. Chronology of MGI Data Freezes.

Data Freeze 6, the most recent Freeze, was released in September 2023 and contains data from 80,529 genotyped MGI participants. Of these, 43,350 (≈ 54%) are male and 37,179 (≈ 46%) are female. The median age, calculated from the date of birth in the electronic health record as of January 1st 2023 or the time of death, was 60 years (63 years for males and 57 for females) (Figure 2).

Figure 2. Distribution of age and genotype-inferred sex of MGI participants included in Data Freeze 6. For MGI participants without a deceased date in our records, we report age as the number of years between date of birth and Jan 1st 2023. For MGI participants with a deceased date in our records, we report age as the number of years between date of birth and death.
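
The age rule in the Figure 2 caption can be sketched in Python as follows. This is a minimal illustration, not the MGI pipeline: the function name `age_in_years` is hypothetical, and counting only completed (whole) years is an assumption, since the caption does not say how fractional years are handled.

```python
from datetime import date
from typing import Optional

# Reference date taken from the Figure 2 caption.
REFERENCE_DATE = date(2023, 1, 1)


def age_in_years(birth: date, death: Optional[date] = None) -> int:
    """Completed years from birth to death if a deceased date exists,
    otherwise from birth to the reference date (assumed whole years)."""
    end = death if death is not None else REFERENCE_DATE
    years = end.year - birth.year
    # Subtract one year if the birthday has not yet occurred in the end year.
    if (end.month, end.day) < (birth.month, birth.day):
        years -= 1
    return years


# Participant with no deceased date: age is measured at Jan 1st 2023.
living_age = age_in_years(date(1960, 6, 15))
# Participant with a deceased date: age is measured at death.
deceased_age = age_in_years(date(1950, 3, 10), date(2020, 3, 9))
```

The median reported in the text would then be the median of these per-participant ages.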

The self-reported races of participants as recorded during a medical office visit consisted of Caucasian (n=69,120), African American (n=5,128), Asian (n=2,474), Other (n=1,862), Unknown (n=754), American Indian or Alaska Native (n=438), or Native Hawaiian and Other Pacific Islander (n=78). 334 participants refused to report a race and 341 had a missing value for race. The inferred majority genetic ancestry of the participants was primarily European (n=69,589) with smaller numbers of African (n=4,993), Western Asian (n=2,260), Eastern Asian (n=1,784), Central/South Asian (n=1,181), and Native American (n=722) descent (Figure 3).

Figure 3. Genetic admixture of MGI participants. We inferred the genetic ancestry of MGI participants using the ADMIXTURE software with Human Genome Diversity Panel genotypes and super-population labels as reference. We defined the majority ancestry for each participant as the continental population label with the largest reported Q value (ancestry fraction) from ADMIXTURE. Each inset is a stacked barplot of Q values for each participant belonging to the respective majority ancestry population.
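
The majority-ancestry definition in the Figure 3 caption amounts to taking the label with the largest Q value. A minimal sketch in Python, assuming each participant's Q values are already available as a mapping from population label to ancestry fraction; the function name `majority_ancestry` and the example fractions are hypothetical, and the caption does not say how ties are resolved (here the first label encountered wins).

```python
from typing import Dict


def majority_ancestry(q_values: Dict[str, float]) -> str:
    """Return the population label with the largest Q value.

    Ties go to the first label in the mapping's order; this tie rule
    is an assumption made for the sketch.
    """
    if not q_values:
        raise ValueError("q_values must contain at least one population")
    return max(q_values, key=q_values.get)


# Hypothetical Q values for one participant (fractions sum to 1).
example_q = {
    "European": 0.72,
    "African": 0.20,
    "Native American": 0.08,
}
label = majority_ancestry(example_q)
```

Applying this to every participant and counting the labels would yield per-ancestry totals of the kind reported in the text.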

The EHR-derived data that are available for genotyped MGI participants span many different phenotypic categories, and the construction of large study cohorts is possible for a variety of phenotypes of potential interest to U-M researchers (Figure 4).

Figure 4. Examples of abundant phenotypes among MGI participants included in Data Freeze 6. We classified ICD-9 and ICD-10 billing codes from MGI participants into PheWAS phenotype codes using the PheWAS R package and plotted the phenotypes with the largest case counts from each of 17 distinct phenotype categories.

MGI PheWeb

Our PheWeb contains results from genome-wide association studies (GWAS) of 1,728 EHR-derived broad PheWAS codes for approximately 52 million imputed variants in a multi-ancestry cohort of 80,381 individuals from the Michigan Genomics Initiative (MGI). Phenotypes are derived from ICD diagnosis codes extracted from patient electronic health records using the PheWAS R package. The PheWeb interface allows interactive exploration of individual GWAS as well as phenome-wide association analysis of individual genetic variants.
More information about the Freeze 6 PheWeb can be found here. Please see the video below for a demonstration of the PheWeb features.

Available Genetic Data (September 2023)
Several genotype array- and sequence-based datasets are available for request by approved U-M researchers who would like to perform their own analysis of MGI genetic data (Table 1).

Data Type	Description	Number of Participants with Data Type
Genome-wide genotypes	Genotypes at ~570K sites for 60,715 participants assayed on one of three versions of a customized Illumina Infinium CoreExome genotyping array, and at 619,235 sites for 19,814 participants assayed on a customized Illumina Infinium Global Screening Array. Genotypes were imputed to > 52 million sites using the Trans-Omics for Precision Medicine (TOPMed) reference panel.	80,529
HLA alleles	HLA gene allele and amino acid inferences for three HLA class I genes (HLA-A, -B, and -C), five class II genes (HLA-DQA1, -DQB1, -DRB1, -DPA1, and -DPB1), and MHC region single nucleotide variations (SNVs) for 80,529 MGI participants included in Data Freeze 6.	80,529
Polygenic scores	Polygenic scores for 70,266 participants included in Freeze 5 for six traits: thyroid cancer, primary open angle glaucoma, abdominal aortic aneurysm, chronic obstructive pulmonary disease, asthma, and gout.	70,266
Star alleles	In silico inferences of star alleles and activity phenotypes for 13 important pharmacogenes.	70,266
Whole exome sequences	Sequence data covering protein coding gene regions (~2% of genome) as captured by the Roche/Nimblegen SeqCap EZ v2.0 or Agilent SureSelect V5-post systems.	606
Targeted sequences	Sequence data covering 151 targeted gene regions.	964

Table 1. Genetic data available.

Genetic Analysis Resources
Several resources provide researchers with the opportunity to use results from analyses of MGI genetic data (Table 2).

Resource	Description
Consultations*	We can discuss available data, recommended software, compute environment, etc. We can also provide feedback on your study design and analysis plan*.
Genetic Data: Central copy access	We will provide access to a central copy directory containing MGI genetic data. To access the genetic data, you will need to use one of our Precision Health Computing Environments. If there are specific file formats or types of genetic data that we currently do not offer but you would like to request, please email PHDataHelp@umich.edu.
Genetic Data: Subset	We will upload a requested subset of genetic data (i.e. specific variants or cohorts) directly to your HIPAA-compliant environment.
Custom genetic analysis†	An expert team of MGI analysts is available to support custom genetic analyses on MGI data free of charge, including: genome-wide association studies (GWAS); gene-based burden tests; ancestry analysis; polygenic risk score calculation; interpretation of results*; and text/feedback for the Methods or Results sections of manuscripts.
Presentation	We can offer a one-time presentation at your lab or workgroup meeting to go over results from your custom analysis request or introduce topics in genetics (GWAS, polygenic risk scores, etc.).
MGI Encore‡ (requires VPN connection)	An online tool that assists investigators with running genome-wide association studies using MGI genotype data and their own uploaded/selected phenotype data. Please contact PHDataHelp@umich.edu for information on obtaining phenotype data and instructions for accessing Encore.
MGI + BioVU LabWAS‡	Summary statistics from a meta-analysis of 70 EHR-derived quantitative laboratory measurements from the BioVU cohort from the Vanderbilt University Health System and the MGI cohort (Goldstein et al. PLoS Genetics 2020).
MGI PheWeb (Data Freeze 3)‡	Online database of genome-wide associations for EHR-derived ICD billing codes from European participants of the MGI Data Freeze 3. To request summary statistics from the MGI PheWeb, please email PHDataHelp@umich.edu.
MGI PheWeb (Data Freeze 6)‡	Online database of genome-wide associations for EHR-derived ICD billing codes from multi-ancestry participants of the MGI Data Freeze 6. To request summary statistics from the MGI PheWeb, please email PHDataHelp@umich.edu.

Table 2. Available genetic analysis resources. *Time limit typically 1 hour. Additional requests will be honored at our discretion and as time allows. †Contact PHDataHelp@umich.edu for further information. ‡For our self-serve tools, we are more than happy to provide one-on-one assistance if the provided documentation is not sufficient. We can offer a 30-minute tutorial per service.

How To Request MGI Data

To access these data, please send an inquiry to phdatahelp@umich.edu. You will also need to submit an IRB application through the IRBMED eResearch Regulatory Management system**. Requirements for data access, including links for required certifications and data attestations, are detailed on the Precision Health Analytics Platform documentation website.

For further assistance or to get access to the MGI data, please contact the Research Scientific Facilitators at phdatahelp@umich.edu.

**An IRB application will be required for individual-level data access. All IRB applications should go through IRBMED and not HSBS.

Type of IRB application needed by investigators for clinical and/or genetic data:

Dataset Type	IRB application
Aggregate datasets	No IRB application required
“De-Identified” or “Limited” datasets (per HIPAA definition)	Require an IRB application and, at a minimum, a “not-regulated” determination
Datasets with Protected Health Information (PHI) beyond the limited dataset level	Require IRB review and approval or exempt determination

For IRB applications, please reference MGI HUM00071298.
Requests for de-identified data and genomic data on their own are pre-approved by the MGI committee and do not need a specific letter or commitment for submission to the IRB. Biospecimen requests and re-contact of MGI patients will need Precision Health MGI Access Committee approval.
Contact the Data Office for Clinical & Translational Research (DOCTR) with any IRB-related questions: DataOffice@umich.edu.

Our Research