HIT Consultant December 9, 2025
What You Should Know:
– Sword Health has unveiled MindEval, the industry’s first benchmark designed to evaluate Large Language Models (LLMs) in realistic, multi-turn conversations against American Psychological Association (APA) guidelines.
– The initial study of 12 leading models revealed significant deficiencies in clinical safety and effectiveness, particularly as conversations lengthened or symptoms became severe. By open-sourcing this tool, Sword Health aims to establish a universal standard for safety and clinical competence in the rapidly growing field of AI-assisted mental health support.
Sword Health’s Open-Source Benchmark Reveals Critical Flaws in Leading Models
We are living through a quiet crisis in digital health. While regulators and ethicists debate the future of AI, millions of users are already turning to...







