Researchers say an AI-powered transcription tool used in hospitals is making up things no one has ever said

Associated Press

SAN FRANCISCO (AP) — Tech giant OpenAI has touted its artificial intelligence-powered transcription tool Whisper as having near “human-level robustness and accuracy.”

But Whisper has a major flaw: It’s prone to making up chunks of text or even entire sentences, according to interviews with more than a dozen software engineers, developers and academic researchers. Those experts said some of the made-up texts — known in the industry as hallucinations — can include racial commentary, violent rhetoric and even imaginary medical treatments.

Experts said such inventions are problematic because Whisper is used in a host of industries around the world to translate and transcribe interviews, generate text in popular consumer technologies and create captions for videos.

More concerning, they said, is a rush of medical centers to use Whisper-based tools to transcribe patients’ consultations with doctors, despite OpenAI’s warnings that the tool should not be used in “high-risk domains.”

It’s difficult to discern the full extent of the problem, but researchers and engineers said they frequently encountered Whisper’s hallucinations in their work. A University of Michigan researcher conducting a study of public meetings, for example, said he found hallucinations in eight of every 10 audio transcriptions he inspected, before he began trying to improve the model.

One machine learning engineer said he initially found hallucinations in about half of the more than 100 hours of Whisper transcripts he analyzed. A third developer said he found hallucinations in nearly every one of the 26,000 transcripts he created with Whisper.

Problems persist even in short, well-recorded audio samples. A recent study by computer scientists found 187 hallucinations in more than 13,000 clear audio fragments they examined.

That trend would lead to tens of thousands of faulty transcriptions over millions of recordings, the researchers said.

Such mistakes could have “really serious consequences,” particularly in hospital settings, said Alondra Nelson, who until last year led the White House Office of Science and Technology Policy for the Biden administration.

“Nobody wants a misdiagnosis,” said Nelson, a professor at the Institute for Advanced Study in Princeton, New Jersey. “There should be a higher bar.”

Whisper is also used to create closed captioning for the deaf and hard of hearing, a population at particular risk of faulty transcriptions. That’s because the deaf and hard of hearing have no way of identifying fabrications “hidden amongst all this other text,” said Christian Vogler, who is deaf and directs Gallaudet University’s Technology Access Program.

OpenAI asked to fix the problem

The prevalence of such hallucinations has prompted experts, advocates and former OpenAI employees to call on the federal government to consider AI regulations. At the very least, they said, OpenAI needs to address the flaw.

“This seems solvable if the company is willing to prioritize it,” said William Saunders, a research engineer in San Francisco who left OpenAI in February over concerns about the company’s direction. “It’s problematic if you put this out there and people are overconfident in what it can do and integrate it into all these other systems.”

An OpenAI spokesperson said the company is continually studying how to reduce hallucinations and praised the researchers’ findings, adding that OpenAI incorporates feedback into model updates.

While most developers assume transcription tools misspell words or make other errors, engineers and researchers said they’ve never seen another AI-powered transcription tool hallucinate as much as Whisper.

Whisper hallucinations

The tool is integrated into some versions of OpenAI’s flagship chatbot, ChatGPT, and is an embedded offering in Oracle and Microsoft cloud computing platforms that serve thousands of companies worldwide. It is also used to transcribe and translate text into multiple languages.

In the past month alone, a recent version of Whisper has been downloaded more than 4.2 million times from the open-source AI platform HuggingFace. Sanchit Gandhi, a machine learning engineer there, said Whisper is the most popular open-source speech recognition model and is integrated into everything from call centers to voice assistants.

Professors Allison Koenecke of Cornell University and Mona Sloane of the University of Virginia examined thousands of short snippets they obtained from TalkBank, a research repository hosted at Carnegie Mellon University. They determined that nearly 40 percent of the hallucinations were harmful or concerning because the speaker could be misinterpreted or misrepresented.

In one example they found, a speaker said, “He, the boy, was going to, I’m not sure exactly, take the umbrella.”

But the transcription software added: “He took a big piece of the cross, a small, small piece … I’m sure he didn’t have a terror knife, so he killed a lot of people.”

A speaker on another recording described “two other girls and a lady.” Whisper invented additional commentary on race, adding “two other girls and a lady, um, who were Black.”

In a third transcription, Whisper invented a non-existent medication called “hyperactivated antibiotics.”

Researchers aren’t sure why Whisper and similar tools hallucinate, but software developers said the inventions tend to occur in the middle of pauses, background sounds or music playback.

OpenAI advised in its online disclosures against using Whisper in “decision-making contexts where flaws in accuracy can lead to pronounced flaws in outcomes.”

Transcription of doctor’s appointments

That warning hasn’t stopped hospitals or medical centers from using speech-to-text models, including Whisper, to transcribe what’s said during doctor’s visits, freeing up health care providers to spend less time on note-taking or report writing.

More than 30,000 clinicians and 40 health systems, including the Mankato Clinic in Minnesota and Children’s Hospital Los Angeles, have begun using a Whisper-based tool built by Nabla, which has offices in France and the U.S.

The tool was fine-tuned on medical language to transcribe and summarize patient interactions, said Nabla’s chief technology officer, Martin Raison.

Company officials said they are aware that Whisper can hallucinate and are mitigating the problem.

It’s impossible to compare Nabla’s AI-generated transcript to the original recording because Nabla’s tool deletes the original audio for “data security reasons,” Raison said.

Nabla said the tool has been used to transcribe about 7 million medical visits.

Saunders, the former OpenAI engineer, said the deletion of the original audio could be a concern if transcriptions aren’t double-checked or clinicians can’t access the recording to verify they’re correct.

“You can’t catch errors if you take away the ground truth,” he said.

Nabla said no model is perfect, and theirs currently requires healthcare providers to quickly edit and approve transcribed notes, but that could change.

Privacy concerns

Because patients’ meetings with their doctors are confidential, it’s hard to know how AI-generated transcripts affect them.

California state lawmaker Rebecca Bauer-Kahan said she took one of her children to the doctor earlier this year and refused to sign a form provided by the health network asking for her permission to share the consultation audio with vendors that included Microsoft Azure, the cloud computing system run by OpenAI’s largest investor. Bauer-Kahan didn’t want such intimate medical conversations being shared with tech companies, she said.

“The release was very specific that for-profit companies would have the right to have this,” said Bauer-Kahan, a Democrat who represents part of the San Francisco suburbs in the state Assembly. “I was like, ‘absolutely not.’”

John Muir Health spokesman Ben Drew said the health system complies with state and federal privacy laws.

___

Schellmann reported from New York.

___

This story was produced in partnership with the Pulitzer Center’s AI Accountability Network, which also partially supported the Whisper academic study.

___

The Associated Press receives financial assistance from the Omidyar Network to support coverage of artificial intelligence and its impact on society. AP is solely responsible for all content. Find AP’s standards for working with philanthropies, a list of supporters and funded coverage areas at AP.org.

___

The Associated Press and OpenAI have a licensing and technology agreement that allows OpenAI access to part of AP’s text archives.