Health systems join forces to publicly test and rank leading AI models

Since the launch of ChatGPT in 2022, tech companies have been scrambling to bring AI-powered generative tools to the healthcare market. However, the health systems buying those tools face a dilemma about what, and whether, to buy.

As Google, Amazon, Microsoft and OpenAI rapidly expand their suites of AI offerings, health system leaders say they don’t know how to compare the effectiveness of products or determine which tool might best meet their specific needs.

A group of health systems, led by Boston-based Mass General Brigham, hopes to solve this problem.

On Wednesday, the academic medical center launched the Healthcare AI Challenge Collaborative, which will allow participating clinicians to test the latest AI offerings in simulated clinical settings. Clinicians will pit models against each other in head-to-head competitions and produce public rankings of commercial tools by the end of the year.

Participating health systems say the chance to directly compare AI products is long overdue.

Despite the rapid proliferation of AI in healthcare, the industry has been slow to agree on how to assess quality. Industry groups have moved to release assessment frameworks, but much of that guidance remains in draft form.

Without standardized evaluation metrics, it’s difficult to compare even the most similar tools, said Richard Bruce, associate professor of radiology and vice chair of informatics at the University of Wisconsin School of Medicine and Public Health.

“Are there (common) values that directly compare them? Currently, to my knowledge, other than user surveys and anecdotes, the tools are not directly compared against each other,” he said. “There’s no easy way to get an apples-to-apples comparison.”

So far, Emory Healthcare, the radiology departments at the University of Wisconsin School of Medicine and Public Health and the University of Washington School of Medicine, and the industry group the American College of Radiology are participating in the collaborative. MGB said it plans to expand the program.

Health systems will initially test nine models, according to an MGB spokesperson, including products from Microsoft, Google, Amazon Web Services, OpenAI and Harrison.AI.

Clinicians will evaluate the models on generating draft reports, key findings, differential diagnoses and other factors, according to MGB.

Metrics for evaluating models are “evolving,” Bruce said, and may depend on the tool’s clinical use case. For example, while model accuracy will always be heavily weighted, there are some situations, such as when the model is used to produce a text report, where readability might be more important.

“Some of this is going to have a very subjective quality,” Bruce said. “For example, do I think the style in which this text is presented is easier to read or more accessible to patients?”

Ultimately, the health systems will create a public “ranking” of the tools, said Dushyant Sahani, professor and chairman of the department of radiology at the University of Washington.

The rankings will be used both to provide feedback to technology companies and to help health systems purchase technology, according to MGB.

Health systems that don’t directly participate in the challenge may be able to use the rankings to decide which tools to buy, according to Sahani — which he sees as a win for health equity.

In the race to deploy AI, experts have expressed concern that health systems with fewer resources, which may not have time to research new tools, could be left behind.

“Health systems can use transparent rankings to inform decision-making and set benchmarks,” said Sahani. “The consortium’s insights and best practices can be adopted by non-participating health systems.”

Google and Microsoft declined to comment for this article.