Google's speech recognition models for medical transcription achieve word error rates of 18 to 20 percent

A team of Google researchers is developing a system that automatically transcribes conversations between physicians and patients during medical visits, according to a research paper submitted to arXiv.org Nov. 20.

The researchers developed speech recognition models for medical transcription using roughly 14,000 hours of anonymized conversations. The models employ two neural network approaches, "Connectionist Temporal Classification" (CTC) and "Listen Attend and Spell" (LAS), building on capabilities used in Google Assistant and Google Translate, according to a Nov. 21 Google Research Blog post.
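For readers unfamiliar with CTC, its defining step is how it turns frame-by-frame predictions into text: consecutive repeated symbols are merged and a special blank token is dropped. The following is a minimal sketch of that standard collapsing rule, not Google's implementation; the blank symbol and the example frame outputs are illustrative.

```python
# Minimal sketch of CTC's standard "collapse" decoding rule:
# merge consecutive repeated labels, then drop the blank token.
BLANK = "_"  # illustrative blank symbol

def ctc_collapse(frame_labels):
    """Collapse a per-frame label sequence into an output string."""
    collapsed = []
    prev = None
    for label in frame_labels:
        if label != prev:          # merge consecutive repeats
            collapsed.append(label)
        prev = label
    return "".join(l for l in collapsed if l != BLANK)  # drop blanks

# Example: per-frame predictions over seven audio frames
print(ctc_collapse(["f", "f", "_", "l", "l", "_", "u"]))  # -> "flu"
```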

The grapheme-based LAS model achieved an 18.3 percent word error rate. The phoneme-based CTC model achieved a 20.1 percent word error rate, but only after significant data cleanup and the development of a matched language model to account for noisy transcripts.
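For context, word error rate is the standard accuracy metric in speech recognition: the number of word substitutions, deletions, and insertions needed to turn the system's transcript into the reference transcript, divided by the number of words in the reference. A minimal sketch of the computation follows; the example sentences are invented for illustration.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / reference length,
    computed via word-level edit distance."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1,         # insertion
                           dp[i - 1][j - 1] + cost)  # substitution or match
    return dp[len(ref)][len(hyp)] / len(ref)

# One substitution in a five-word reference yields a 20 percent WER,
# comparable to the rates reported in the paper.
print(word_error_rate("patient reports mild chest pain",
                      "patient reports wild chest pain"))  # -> 0.2
```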

"Our research shows that it is possible to build an [automatic speech recognition] model which can handle multiple speaker conversations covering everything from weather to complex medical diagnosis," Katherine Chou, product manager at the Google Brain Team, and Chung-Cheng Chiu, software engineer at the Google Brain Team, wrote in the Google Research Blog post.

The Google researchers plan to apply this technology during a pilot study with Palo Alto, Calif.-based Stanford Family Medicine. The goal of the study is to develop a "digital scribe" that helps streamline clinical documentation for physicians.

