Data Analytics & IT

Analytics Platform

The Precision Health Analytics Platform is a suite of tools, services, and datasets available to researchers across campus (view the complete Analytics Platform User Guide [pdf]). Resources include:


DataDirect is a self-serve software tool enabling researchers to access and explore clinical data from the Michigan Genomics Initiative cohort and the electronic health records (EHR) of more than 4 million unique patients (see “DataDirect Modes,” below, for details).

DataDirect is managed by Michigan Medicine’s Data Office for Clinical and Translational Research (DOCTR), which oversees access to several institutionally supported tools and also provides customized datasets in consultation with researchers. The Data Office administers a secure and compliant process for researchers requiring Michigan Medicine data.

Linked Data

The Precision Health Analytics Platform, using Michigan Medicine Data Office tools and resources, provides access to genetic and clinical data on approximately 60,000 patients. Researchers can link clinical phenotype data to genotype data, and the platform facilitates genome-wide association study (GWAS) analysis.
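As a rough illustration of what linking phenotype data to genotype data means in practice, the sketch below joins two small record sets on a shared subject identifier. Everything in it is hypothetical: the field names (`subject_id`, `has_condition`, `variant_dosage`), the sample values, and the `link_records` helper are assumptions made for this example and do not reflect the actual structure of platform datasets.

```python
# Hypothetical phenotype and genotype records; field names and values
# are illustrative only and are not taken from any real dataset.
phenotypes = [
    {"subject_id": "S001", "has_condition": True},
    {"subject_id": "S002", "has_condition": False},
    {"subject_id": "S003", "has_condition": True},
]
genotypes = [
    {"subject_id": "S001", "variant_dosage": 2},
    {"subject_id": "S003", "variant_dosage": 0},
    {"subject_id": "S004", "variant_dosage": 1},
]


def link_records(phenotypes, genotypes, key="subject_id"):
    """Inner-join two lists of dicts on a shared identifier.

    Only subjects present in both inputs are returned, in the order
    they appear in the phenotype list.
    """
    # Index genotype rows by identifier for constant-time lookup.
    geno_by_id = {row[key]: row for row in genotypes}
    linked = []
    for pheno in phenotypes:
        geno = geno_by_id.get(pheno[key])
        if geno is not None:
            # Merge the two rows into a single linked record.
            linked.append({**pheno, **geno})
    return linked


linked = link_records(phenotypes, genotypes)
for row in linked:
    print(row)
```

Subjects that appear in only one of the two inputs (here `S002` and `S004`) are dropped, which is the usual starting point for an association analysis that needs both a phenotype and a genotype for each subject.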

Researchers can access their data in a secure, virtual, high-compute Linux- or Windows-based environment.


The Armis2 high-performance computing (HPC) environment consists of task-managing administrative nodes and standard Linux-based two- and four-socket server-class hardware in a secure data center. The nodes are connected by both a high-speed Ethernet network (1 Gbps) and an InfiniBand network (40 Gbps). The environment also includes a secure parallel file system for temporary data, provided by HIPAA-aligned Turbo Research Storage. Armis2 is currently provided as a pilot service. The two-socket nodes have up to 24 cores and 156 GB of memory. There are also eight K20x GPUs currently on the cluster, and others can be moved onto it on request.

If you are a new user of Armis2, you will need to create an account by submitting an application form (accessible via the Armis2 User Guide homepage). On the form, please specify (a) the PH-based need for an Armis2 account and (b) the HUM#(s) associated with your data request(s) on DataDirect. Without this information, ARC-TS won’t be able to create an Armis2 account. Please allow one business day for your application to be processed. If you already have an Armis2 account, you will need to send an email specifying (a) the PH-based need to use your Armis2 account and (b) the HUM#(s) associated with your data request(s) on DataDirect.


The Yottabyte Research Cloud (YBRC) is a private cloud environment that provides high-performance, secure, and flexible computing environments enabling the analysis of sensitive datasets restricted by federal privacy laws, proprietary access agreements, or confidentiality requirements. The system is built on Yottabyte’s composable, software-defined infrastructure platform and represents U-M’s first use of software-defined infrastructure for research, allowing on-the-fly personalized configuration of any-scale computing resources. This platform allows us to create any combination of network, CPU, RAM, and storage components into resource groups that can be used to build multi-tenant, multi-site infrastructure as a service.

Please use these guides for accessing your data in Turbo. For questions about Armis or YBRC, please email

Research Scientific Facilitators

Precision Health Research Scientific Facilitators are on hand to guide investigators across campus through processes that allow them to assemble datasets in a virtual, HIPAA-compliant server environment. Facilitators help researchers navigate self-serve tools such as DataDirect and EMERSE, find other ways of pulling clinical data (through DOCTR), submit biospecimen inquiries, assemble subject survey data, and more. Facilitators also strive to identify and integrate additional data lakes for centralized use.

Contact the Facilitators at

DataDirect Modes

Researchers may use DataDirect to generate aggregate counts for cohort discovery (“Cohort Discovery Mode”) or to analyze de-identified patient health data (“De-Identified Mode”).

Cohort Discovery Mode

The simplest DataDirect mode provides aggregate counts for cohort discovery (i.e., assembling a group of individuals with parameters of interest). This mode of the tool allows researchers to explore whether the data contained in DataDirect are sufficient to support their research.

Prerequisites for accessing DataDirect Cohort Discovery Mode are:


De-Identified Mode

De-ID Download Mode provides researchers the ability to analyze de-identified patient health data. Resulting datasets will be loaded onto a HIPAA-compliant, secure virtual machine managed by Advanced Research Computing (ARC).

Prerequisites for accessing DataDirect De-ID Download Mode are:

Type of IRB approvals needed by investigators for clinical and/or genetic data:

  • Aggregate datasets: No IRB application is required.
  • De-identified datasets: An IRB application is required and must receive, at a minimum, a “not-regulated” status.
  • Datasets with protected health information (PHI): Full IRB review and approval are required.
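The three tiers above can be summarized as a simple lookup, which may be handy when scripting a pre-submission checklist. This is only a sketch: the dictionary keys (`aggregate`, `de-identified`, `phi`) and the `irb_requirement` function are hypothetical names chosen for this example, and the returned text merely restates the list above.

```python
# Hypothetical lookup restating the IRB tiers listed above.
# The keys and the function name are illustrative assumptions.
IRB_REQUIREMENTS = {
    "aggregate": "No IRB application required.",
    "de-identified": 'IRB application required; at a minimum, a "not-regulated" status.',
    "phi": "Full IRB review and approval required.",
}


def irb_requirement(dataset_type: str) -> str:
    """Return the IRB requirement for a dataset type.

    Matching ignores case and surrounding whitespace. Unknown types
    raise ValueError rather than guessing a requirement.
    """
    key = dataset_type.strip().lower()
    if key not in IRB_REQUIREMENTS:
        raise ValueError(f"Unknown dataset type: {dataset_type!r}")
    return IRB_REQUIREMENTS[key]


for kind in ("aggregate", "de-identified", "phi"):
    print(f"{kind}: {irb_requirement(kind)}")
```

Raising an error for an unrecognized dataset type is a deliberate choice here: a silent default could suggest that no review is needed when one is.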

For IRB applications, please reference MGI HUM00071298.

De-identified data and genomic data requests on their own are pre-approved by the Michigan Genomics Initiative (MGI) committee and do not need a specific letter or commitment for submission to the IRB. Biospecimen requests and re-contact of MGI patients require MGI committee approval.

Contact DOCTR with any IRB-related questions: