Expanded Chest X-ray Repository marks importance of imaging data in precision health
Precision Health’s Chest X-ray Repository now contains 750K images spanning three years, with more to come. Project co-leads Jessica Fried, MD, and Michael Sjoding, MD, explain the utility and applications of these data in both machine learning/AI and clinical decision making.
In January 2021, Precision Health announced substantial additions to its Analytics Platform, including a collection of chest X-rays from patients tested for COVID; by May, these images numbered more than 81,000.
Precision Health can now give researchers access to data beyond COVID-19 testing: 750,000 chest X-ray images obtained between January 2019 and September 2021 are available to researchers across campus, with plans to add images from as far back as 2015.
Researchers interested in accessing these data need to complete certain requirements and can direct any questions to PHDataHelp@umich.edu.
The imaging project is led by Mike Sjoding, MD, an Assistant Professor of Internal Medicine-Pulmonary/Critical Care, and Jessica Fried, MD, a Clinical Assistant Professor of Radiology. Below, Sjoding and Fried answer some questions about the Chest X-ray Repository and what’s on the horizon for Precision Health imaging repositories.
What’s significant/unique about how these imaging data can enhance health research?
MS: The size of this dataset. It will really allow researchers to use some of the most cutting-edge tools in their research, which are often quite data-hungry.
JF: This initiative provides investigators with access to robust de-identified imaging data on a massive and deep scale. The magnitude of the collection will appeal to investigators working in the machine learning and AI space. The ability to marry imaging data with nuanced clinical information from the EHR will provide novel opportunities for data scientists and health services researchers to explore new territory in the pursuit of improving health care delivery and outcomes.
What are the limitations/concerns when making these data available, and how have you addressed these concerns?
JF: Curation of large, rich, de-identified imaging datasets is time consuming and requires specialized expertise. These often represent insurmountable barriers to many investigators who would otherwise jump at the opportunity to include imaging data in their work. We have removed those barriers and hope to see investigators across the U-M campus actively incorporate imaging data into their analytics.
Why were chest X-rays, in particular, chosen as the first imaging dataset for Precision Health?
MS: Chest X-rays are the most common imaging study performed at hospitals and clinics, and there is so much useful information contained in these images, so it’s really a great imaging modality to focus on for this project.
How does combining digital image data with clinical data improve treatment and diagnoses?
MS: Clinicians use both digital imaging data and clinical data every day for diagnosis and treatment decisions, as they contain complementary information. It’s exciting we can offer both types of data to researchers, rather than just one or the other, because this aligns more closely with how medicine is practiced.
JF: Imaging has become central and critical to the practice of medicine. Similar to laboratory studies, imaging directly impacts medical decision making. Unlocking the ability to incorporate imaging information with clinical data mined from the EHR is like adding a key piece of the jigsaw puzzle that has been missing. Doing so will allow investigators in this space to appreciate a clearer picture and tell a more complete story.
We are really only seeing the tip of the iceberg of how much sophisticated data analytics can impact health care delivery and patient outcomes. Beyond retrospective analyses, predictive and anticipatory medicine is the next frontier in which tools such as these will allow data scientists to make groundbreaking discoveries.
What other kinds of imaging data will Precision Health make available?
MS: We are working on brain MRI next. We are always looking for recommendations from researchers about what we should target after that.
JF: Our next repository will focus on brain MRI. There is already significant interest in applications of machine learning and artificial intelligence in the field that will benefit from large, de-identified datasets, and this is seen as a natural next step in growing the portfolio.
As these resources become available to U-M investigators, we hope to be guided by interest and need in the research community to broaden the imaging data library.