ChatGPT-4's clinical reasoning shines in certain scenarios: Study

Large language models do not always perform poorly at clinical reasoning and, in certain narrow scenarios, can surpass clinicians, according to a Dec. 11 study published in JAMA Network Open.

Researchers from Boston-based Beth Israel Deaconess Medical Center pitted ChatGPT-4 against clinicians. They fed the chatbot a set of five example medical cases that had been given to clinicians in a previously published survey on probabilistic reasoning. The researchers then gave ChatGPT-4 an identical prompt 100 times, soliciting the likelihood of a specific diagnosis based on each patient's presentation.

They also tasked the chatbot with updating its estimates in response to certain test results, such as mammography for breast cancer. The team then compared the probabilistic estimates with responses obtained from the survey, which encompassed more than 550 human practitioners. 

The researchers found that in all five cases, ChatGPT-4 was more accurate than the human clinicians when estimating pretest probability and post-test probability following a negative test result. The model did not perform as well after positive test results, however.
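The probabilistic reasoning task described above is, in essence, a Bayesian update: start from a pretest probability, then revise it after a test result using the test's likelihood ratio. The sketch below illustrates the arithmetic with hypothetical sensitivity and specificity values (the study's actual case parameters are not given in this article).

```python
# Bayesian post-test probability update: the kind of reasoning the
# study asked ChatGPT-4 and clinicians to perform.
# Sensitivity/specificity values below are hypothetical, for illustration only.

def post_test_probability(pretest: float, sensitivity: float,
                          specificity: float, positive: bool) -> float:
    """Revise a pretest probability after a test result via Bayes' theorem."""
    if positive:
        # Positive likelihood ratio: true-positive rate / false-positive rate
        likelihood_ratio = sensitivity / (1 - specificity)
    else:
        # Negative likelihood ratio: false-negative rate / true-negative rate
        likelihood_ratio = (1 - sensitivity) / specificity
    pretest_odds = pretest / (1 - pretest)
    post_odds = pretest_odds * likelihood_ratio
    return post_odds / (1 + post_odds)

# Example: 10% pretest probability, a test with hypothetical
# 85% sensitivity and 90% specificity, negative result.
print(round(post_test_probability(0.10, 0.85, 0.90, positive=False), 3))
# → 0.018
```

A negative result drives the probability well below the 10% pretest estimate, while the same test returning positive would raise it to roughly 49%; the study compared how closely ChatGPT-4's and clinicians' estimates tracked such reference calculations.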

Copyright © 2024 Becker's Healthcare. All Rights Reserved.
