Entrepreneur, CEO, and Co-Founder of Hippocratic AI

HEART benchmark assesses ability of LLMs and humans to offer emotional support

Researchers from Hippocratic AI, Stanford University, the University of California San Diego, and the University of Texas at Austin have introduced HEART, a new benchmark that evaluates how well large language models (LLMs) and humans provide emotional support in multi-turn conversations. HEART stands for Human alignment, Empathetic responsiveness, Attunement, Resonance, and Task-following, and it is designed to assess whether responses feel supportive, natural, context-aware, and safe. Unlike earlier benchmarks that focused mainly on task completion, HEART evaluates emotional intelligence across extended back-and-forth dialogue. In blinded comparisons, the researchers found that leading LLMs often matched, and sometimes exceeded, average human responses in perceived empathy. Human and model judges agreed in about 80% of cases.

As part of the study, the team evaluated Polaris, an LLM developed by Hippocratic AI. Polaris performed strongly: its responses conveyed deep empathy and were comparable to those of other state-of-the-art systems, and it operated at sub-second latency suitable for real-time applications. The researchers noted that humans still perform better in areas such as adaptive reframing and nuanced tone shifts during challenging exchanges. The HEART framework is intended to guide the development of more emotionally supportive AI systems, including healthcare-focused applications. It may later expand to assess multimodal and voice-based models and to examine how users actually experience support over time.

