New trials compare Claude, ChatGPT, and DeepSeek on mammography reports
While AI excels at administrative tasks, it still struggles to beat the human eye in complex cancer diagnosis. A comparative study pitted three major language models (ChatGPT-4o, Claude 3 Opus, and DeepSeek-R1) against human radiologists in analyzing mammography reports for breast cancer risk (BI-RADS 4). The results were clear: human radiologists significantly outperformed all three AI models. The AI tools demonstrated high sensitivity, meaning they were good at flagging potential issues, but they suffered from low specificity, which led to a high volume of false alarms. The study concludes that, for now, these tools are best used as "safety net" assistants that help ensure nothing is missed, rather than as independent diagnostic agents.
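For readers unfamiliar with the two metrics, here is a minimal Python sketch of how sensitivity and specificity are calculated from a confusion matrix. The counts are made up purely for illustration and are not taken from the study; the function names are our own.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Share of actual cancer cases that were correctly flagged."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg: int, false_pos: int) -> float:
    """Share of cancer-free cases that were correctly cleared."""
    return true_neg / (true_neg + false_pos)


# Hypothetical counts for illustration only (NOT results from the study):
# 100 reports with cancer and 100 without.
true_pos, false_neg = 90, 10   # 90 of 100 cancers flagged
true_neg, false_pos = 40, 60   # 60 of 100 healthy cases flagged in error

print(f"Sensitivity: {sensitivity(true_pos, false_neg):.2f}")
print(f"Specificity: {specificity(true_neg, false_pos):.2f}")
```

In this invented example the tool catches most cancers (high sensitivity) but also raises an alarm on more than half of the healthy cases (low specificity), which is the pattern the study describes for the AI models.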
Read the original article at: https://medinform.jmir.org/2025/1/e80182