Tools & Resources

Analytics Platform

The Precision Health Analytics Platform is a suite of tools, services, and datasets available to researchers across campus. View the complete Analytics Platform User Guide [pdf] and other resources on the PH Analytics Platform Documentation site (umich password required).

Resources include:

Each entry lists the resource, its type, a description, and where to find more information (umich password required).

Resource: DataDirect
Resource type: tool
Description:
  • Self-serve software tool to access and explore clinical data and EHR
More information: Analytics Platform User Guide

Resource: Armis2
Resource type: infrastructure
Description:
  • High-performance computing environment administered by Advanced Research Computing Technology Services (ARC-TS)
  • Linux-based
  • Used for analysis of sensitive data
More information: Accessing Analytics Environments

Resource: High-performance compute platform
Resource type: infrastructure
Description:
  • Six compute nodes on Armis2 dedicated to Precision Health
  • Each node contains eight RTX 2080 Ti GPUs for researcher usage
More information: Accessing Analytics Environments

Resource: Yottabyte Research Cloud
Resource type: infrastructure
Description:
  • Private cloud platform for virtual machines used for research
  • Enables analysis of sensitive datasets
  • Composable, software-defined infrastructure
More information: Accessing Analytics Environments

Resource: De-identified electronic health record data through Precision Health DataDirect
Resource type: data
Description:
  • De-identified clinical data for ~4M Michigan Medicine patients
  • Complete through April 30, 2020, and updated quarterly
  • Laboratory results, ordered and administered medications, vital sign measurements, diagnoses and other structured elements collected during inpatient and outpatient visits
More information: Analytics Platform User Guide

Resource: COVID starting population (predefined and validated cohort)
Description:
  • Patients who have tested positive for SARS-CoV-2 at Michigan Medicine or who at any point carried a diagnosis of COVID-19
  • Complete through April 30, 2020, and updated quarterly
More information: COVID-19 Data via DataDirect

Resource: Chest X-ray data
Resource type: data
Description:
  • More than 5,000 de-identified chest images performed on patients who were tested for COVID-19 during hospitalizations
More information: Accessing the COVID-19 chest X-ray Dataset on Turbo

Resource: Michigan Genomics Initiative (MGI)
Resource type: data
Description:
  • Repository of DNA and genetic data linked to medical phenotype and electronic health record (EHR) information
More information: MGI webpage

Resource: Star allele calls
Resource type: service
Description:
  • Pharmacogenomics information for Michigan Genomics Initiative participants
  • Provides in-silico calls for star alleles and activity phenotypes
  • Translation of genetic data into star alleles allows research into genetic predictors of medication treatment outcomes
More information: Michigan Genomics Initiative: Accessing Star Allele Calls



DataDirect (U-M VPN login required) is a self-serve software tool that lets researchers access and explore clinical data from the Michigan Genomics Initiative cohort and the electronic health records (EHR) of more than 4 million unique patients. Researchers may use DataDirect to generate aggregate counts for cohort studies (“Cohort Discovery Mode”) or to analyze de-identified patient health data (“De-Identified Mode”). See page 4 of the Analytics Platform User Guide for information on DataDirect modes.

DataDirect is managed by Michigan Medicine’s Data Office for Clinical and Translational Research (DOCTR), which oversees access to several institutionally supported tools and also provides customized datasets in consultation with researchers. The Data Office administers a secure and compliant process for researchers requiring Michigan Medicine data. All users of Precision Health DataDirect are required to complete robust human subjects research training and appropriate data use agreements.

Linked Data

The Precision Health Analytics Platform, using Michigan Medicine Data Office tools and resources, provides access to genetic and clinical data on approximately 80,000 patients. This includes the ability to link clinical phenotype data to genotype data, as well as support for genome-wide association study (GWAS) analysis.

Researchers can access their data in a secure, virtual, high-compute Linux- or Windows-based environment.


The Armis2 high-performance computing (HPC) environment is composed of task-managing administrative nodes and standard Linux-based two- and four-socket server-class hardware in a secure data center. The nodes are connected by both high-speed Ethernet (1 Gbps) and InfiniBand (40/100 Gbps) networks and share a secure parallel file system for temporary data, provided by HIPAA-aligned Turbo Research Storage. The two-socket nodes have up to 24 cores and 156 GB of memory. There are also 12 V100 GPUs currently on the cluster, and additional GPUs can be moved onto it on request.

If you are a new user of Armis2, you will need to create an account by submitting an application form [umich password required]; this form is also accessible via the Armis2 User Guide homepage. On the form, please specify a) the PH-based need for an Armis2 account, and b) the HUM#(s) associated with your data request(s) on DataDirect (without this information, ARC-TS won’t be able to create an Armis2 account). Please allow one business day for your application to be processed. If you already have an Armis2 account, you will need to send an email to specifying a) the PH-based need to use your Armis2 account, and b) the HUM#(s) associated with your data request(s) on DataDirect.

Precision Health also has a private set of six nodes on Armis2. Each node has eight RTX 2080 Ti GPUs (48 in total) and large volumes of fast local storage, and can see all data and software provided on Armis2. These nodes are optimized for machine learning/AI, computer vision, molecular dynamics, and any other GPU-accelerated workload. Precision Health–affiliated researchers interested in using the condo nodes should contact


The Yottabyte Research Cloud (YBRC) is a private cloud environment that provides high-performance, secure, and flexible computing environments enabling the analysis of sensitive datasets restricted by federal privacy laws, proprietary access agreements, or confidentiality requirements. The system is built on Yottabyte’s composable, software-defined infrastructure platform and represents U-M’s first use of software-defined infrastructure for research, allowing on-the-fly personalized configuration of any-scale computing resources. This platform allows the creation of any combination of network, CPU, RAM, and storage components into resource groups that can be used to build multi-tenant, multi-site infrastructure as a service.

Please use these guides (umich password required) for accessing your data. For questions about Armis2 or YBRC, please email

Research Scientific Facilitators

Precision Health Research Scientific Facilitators are on hand to guide investigators across campus through processes that allow them to assemble datasets in a virtual, HIPAA-compliant server environment. Facilitators help researchers navigate self-serve tools such as DataDirect and EMERSE, find other ways of pulling clinical data (through DOCTR), submit biospecimen inquiries, assemble subject survey data, and more. Facilitators also strive to identify and integrate additional data lakes for centralized use.

Contact the Facilitators at


Types of IRB approval needed by investigators for clinical and/or genetic data:

  • Aggregate datasets: No IRB application required
  • De-identified datasets: IRB application required; at a minimum, the study must receive a “not-regulated” status
  • Datasets with protected health information (PHI): Full IRB review and approval required
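
The three cases listed above can be summarized as a simple lookup. The following Python sketch is purely illustrative: the category keys (`aggregate`, `de-identified`, `phi`) and the function name `irb_requirement` are hypothetical names chosen for this example, not part of DataDirect or any IRB system, and the IRB and DOCTR remain the authority on what a given study needs.

```python
# Illustrative lookup only. Keys and function name are hypothetical;
# the requirement text paraphrases the list above.
IRB_REQUIREMENTS = {
    "aggregate": "No IRB application required",
    "de-identified": "IRB application required (at minimum a not-regulated status)",
    "phi": "Full IRB review and approval required",
}


def irb_requirement(dataset_type: str) -> str:
    """Return the IRB requirement for a dataset category (case-insensitive)."""
    key = dataset_type.strip().lower()
    if key not in IRB_REQUIREMENTS:
        known = ", ".join(sorted(IRB_REQUIREMENTS))
        raise ValueError(f"Unknown dataset type {dataset_type!r}; expected one of: {known}")
    return IRB_REQUIREMENTS[key]


if __name__ == "__main__":
    for name in ("aggregate", "de-identified", "phi"):
        print(f"{name}: {irb_requirement(name)}")
```

Raising an error for unrecognized categories, rather than returning a default, reflects the fact that a dataset outside these three cases should be discussed with DOCTR.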

For IRB applications, please reference MGI HUM00071298.

Requests for de-identified data and genomic data on their own are pre-approved by the Michigan Genomics Initiative (MGI) committee and do not need a specific letter or commitment for submission to the IRB. Biospecimen requests and re-contact of MGI patients do require MGI committee approval.

Contact DOCTR with any IRB-related questions: