Are large language models useful for health systems? Stanford experts weigh in

Hospitals and health systems should hold off on investing in large language models, as there is currently no way to evaluate whether these tools will be useful, fair and reliable, according to a Feb. 27 Stanford University article. 

Stanford scholars reviewed 80 different clinical foundation models, built from a variety of healthcare datasets, and found the following limitations:

  1. General-purpose language models such as ChatGPT can be useful for clinical tasks, but they underperform compared with language models trained specifically on clinical data.

  2. The best-performing clinical language models are often trained on large-scale private EHR datasets.
     
  3. Foundation models for EHRs lack a common mechanism for distributing models to the broader research community.

  4. Many foundation models for EHRs do not have their model weights published, meaning researchers must re-train these models from scratch to use them or to validate their performance.

  5. The researchers said that better evaluation paradigms, datasets and shared tasks are needed to determine the value of clinical foundation models to health systems.
