Artificial intelligence tools trained to detect pneumonia on chest X-rays performed significantly worse when tested on data from outside health systems, a study published in PLOS Medicine found.
The study findings suggest AI in medicine must be carefully tested for performance among various populations, or the deep learning models might not perform as accurately as anticipated.
The study, conducted at New York City-based Icahn School of Medicine at Mount Sinai, evaluated how AI models identified pneumonia in 158,000 chest X-rays from three medical institutions. The researchers examined convolutional neural networks, a type of deep learning model that analyzes medical images and provides a computer-aided diagnosis.
In three out of five comparisons between medical institutions, the convolutional neural networks' performance in diagnosing disease on X-rays from hospitals outside their own network was significantly lower than on X-rays from the original health system.
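The kind of comparison described above can be illustrated with a minimal sketch. It is not the study's code: the site names, labels, and scores below are made-up toy values, and the area under the ROC curve is assumed here as the performance measure purely for illustration. The idea is simply to score the same model separately on X-rays from its home system and from an outside system, then compare the two numbers.

```python
from typing import Dict, List, Tuple


def auc(labels: List[int], scores: List[float]) -> float:
    """Chance that a random positive case outscores a random negative one (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def auc_by_site(records: List[Tuple[str, int, float]]) -> Dict[str, float]:
    """Group (site, label, score) records by site and compute one AUC per site."""
    by_site: Dict[str, Tuple[List[int], List[float]]] = {}
    for site, label, score in records:
        labels, scores = by_site.setdefault(site, ([], []))
        labels.append(label)
        scores.append(score)
    return {site: auc(labels, scores) for site, (labels, scores) in by_site.items()}


# Hypothetical toy data: (site, true pneumonia label, model score).
records = [
    ("internal", 1, 0.9), ("internal", 1, 0.8), ("internal", 0, 0.3), ("internal", 0, 0.1),
    ("external", 1, 0.6), ("external", 1, 0.2), ("external", 0, 0.5), ("external", 0, 0.1),
]

results = auc_by_site(records)
for site, value in results.items():
    print(f"{site}: AUC = {value:.2f}")
```

In this toy data the model ranks every internal pneumonia case above every internal normal case but misorders one pair at the outside site, so the external score comes out lower. A real evaluation would use held-out images from each institution and a statistical test of the difference.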
The study also found convolutional neural networks were highly accurate at detecting the hospital system where an X-ray was acquired, which suggests the models can pick up on site-specific features of an image. The challenge of using deep learning models in medicine is that they rely on a huge number of parameters, making it difficult to pinpoint the variables that drive their predictions, such as the types of CT scanners used at a hospital and the resolution quality of imaging, the researchers said.
"Our findings should give pause to those considering rapid deployment of artificial intelligence platforms without rigorously assessing their performance in real-world clinical settings reflective of where they are being deployed," said senior author Eric Oermann, MD.