Stanford wields a 'FURM' test on AI

At Stanford Health Care, artificial intelligence is a key tool supporting the organization's mission to advance precision health and improve patient outcomes. However, as President and CEO David Entwistle and Chief Data Scientist Nigam Shah, MD, PhD, told Becker's, successfully implementing AI requires a rigorous, thoughtful approach — one that goes beyond chasing the next shiny object.

"There’s a lot of 'wow,'" Mr. Entwistle said. "Don't be excited by all the glitz and glamour.” 

Stanford Health Care's approach, he said, is to "really take a step back as a CEO and as an organization and say, 'What do we need to improve or work on?'"

The Palo Alto, Calif.-based health system's process for evaluating, implementing and monitoring new technologies, particularly AI, employs a fair, useful and reliable AI model, or FURM, assessment. This assessment consists of three stages that address key aspects of the use case: purpose and goals ("what and why"), implementation feasibility ("how"), and outcomes evaluation ("effect"). Each stage includes specific categories of questions designed to guide decision making.

"Some of them are detailed and about interviewing multiple stakeholders for their values in order to find ethical mismatches," Dr. Shah said. "Some of them are about who's going to do the downstream workflow given the AI tool's output. Others are about who pays for the tool, who's going to maintain it, and if the license renewal comes up, whose budget is going to pay for it. And so we've kind of figured out all of this upfront. They're essentially answering: 'Is it fair?,' 'Is it reliable?,' 'Is it useful?' and 'Can we monetarily sustain it long term, as well?'"

As part of the FURM assessment, Stanford conducts an ethical or stakeholder value mismatch analysis. Dr. Shah said this involves asking affected parties, including clinicians, the following questions: What do you hope this tool does? What are your biggest concerns? What's the worst that could happen?

"The answers from the different stakeholders are often quite enlightening in terms of what we need to check — what are people worried about," he said.

Still, not every situation requires a FURM assessment. To address this, Stanford created a process called the Responsible AI Life Cycle, which determines when an assessment is necessary. It is based on principles laid out in the Responsible AI for Safe and Equitable Health (RAISE Health) initiative.

"If we're talking about AI for transcription in our Zoom meeting, for example, I wouldn't need a FURM assessment. And so, it's to make sure we're doing these assessments on the right things," Dr. Shah explained.

After the full assessment process, a governance group, including members of the executive team, makes the final decision on whether the model is deployed.

One notable example of Stanford's successful AI integration is its advance care planning model. Recognized as part of the Healthcare Information and Management Systems Society's Davies Award of Excellence, "the model helps inform clinicians of what is likely to happen next in the health of patients with serious illness, so they have a better sense of when to initiate advance care planning conversations," said Mr. Entwistle. "These conversations help patients prepare for life transitions associated with a terminal illness and help health systems ensure that patients receive the right care for their individual circumstances."

Over the past two to three years, the model has helped improve care for about 6,700 patients, according to Dr. Shah.

In another example, Stanford is deploying Nuance Dragon Ambient eXperience, or DAX Copilot, which automates clinical note-taking during patient visits. Deployment began in primary care clinics, and Stanford plans to expand it gradually, service line by service line, while also exploring the technology's potential use in nursing.

Also this month, Stanford shared positive feedback from physicians on a new in-house AI tool designed to help them share lab results. According to a news release, the tool uses Anthropic's Claude 3.5 Sonnet LLM via Amazon Bedrock to draft plain-language interpretations of clinical test and lab results, which physicians review and approve before sending to patients.
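As a rough sketch of how that drafting step can work, the snippet below calls Claude 3.5 Sonnet through Amazon Bedrock's Converse API and prints a draft for a physician to review. The prompt, AWS region, model version and example result are illustrative assumptions, not details of Stanford's actual implementation.

import boto3

# Bedrock Runtime client; the region is an assumption for illustration.
client = boto3.client("bedrock-runtime", region_name="us-west-2")

response = client.converse(
    modelId="anthropic.claude-3-5-sonnet-20240620-v1:0",
    # Hypothetical system prompt; a production prompt would be far more constrained.
    system=[{"text": "Rewrite the lab result below in plain, patient-friendly language. "
                     "Do not add a diagnosis or treatment advice."}],
    messages=[{
        "role": "user",
        "content": [{"text": "Test: Hemoglobin A1c. Result: 6.1%. Reference range: 4.0-5.6%."}],
    }],
    inferenceConfig={"maxTokens": 400, "temperature": 0.2},
)

# The model's draft; in the workflow described above, a physician reviews and
# approves (or edits) this text before it is sent to the patient.
draft = response["output"]["message"]["content"][0]["text"]
print(draft)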

Mr. Entwistle said these efforts align with Stanford's goal of enhancing workflow efficiency and have yielded measurable time savings. For example, about 50% of the drafts generated by the lab result interpretation tool align with what physicians would have written.

"Think about the time savings that we are starting to achieve, and by looking at the different modalities in healthcare, what else can be done to create that time saving, so the patient and the physician really do have time to talk with each other," Mr. Entwistle said. "There are a lot of exciting areas in this, and I think that we just dipped our toe in the water, because there's so much more ahead."

He recommended that health system executives evaluate technologies "with a lens toward your organization. So, as you bring these pieces in, make sure it's your team in front with a model that you use to evaluate technology on a consistent basis."

Additionally, Mr. Entwistle recommended that executives keep in mind the types of models that students moving into healthcare careers are being trained on.

"If they're being trained on these tools, and then they come to your organization and the tools are not there, how much of this becomes table stakes, so to speak?" he said. "Like cell phones, some technology that we use that's pretty pervasive now, at some point that was new."
