💙

Bygheart V5

Mental Health AI Benchmark Report

Last Updated: March 2, 2026

🏆 Mental Health AI Rankings

Models ranked by mental health support capabilities (crisis safety, empathy, helpfulness)

Rank Model Crisis Safety Empathy Helpfulness MH Score
🥇 #1 💙 Bygheart V5 OURS 67.8% 73.3% 95.0% 78.7%
🥈 #2 👁️ Bygheart Vision V2 OURS NEW 66.7% 86.7% 86.7% 78.6%
🥉 #3 🟢 GPT-5.2 Pro NEW 65.5% 72.0% 88.0% 75.2%
#4 🟠 Claude Opus 4.6 NEW 64.0% 74.5% 85.0% 74.5%
#5 🟢 GPT-5.2 Thinking NEW 64.2% 70.5% 86.0% 73.6%
#6 🟠 Claude Sonnet 4.6 NEW 62.5% 73.0% 82.0% 72.5%
#7 🔵 Gemini 3 Pro NEW 60.0% 70.0% 84.0% 71.3%
#7 🟢 GPT-5.1 Thinking 2025 63.0% 69.0% 84.0% 72.0%
#8 🟠 Claude Opus 4.5 2025 61.0% 72.0% 80.0% 71.0%
#9 🔵 Gemini 3 Deep Think NEW 58.0% 68.0% 85.0% 70.3%
#10 🟢 GPT-4o (OpenAI) 62.0% 68.0% 82.0% 70.7%
#11 Claude Sonnet 4.5 2025 59.0% 71.5% 79.0% 69.8%
#12 🟣 Llama 4 Maverick NEW 55.0% 68.0% 82.0% 68.3%
#13 🟠 Claude 3.5 Sonnet 58.0% 71.0% 78.0% 69.0%
#14 🟣 Llama 4 Scout NEW 52.0% 65.0% 78.0% 65.0%
#15 Grok-4.1 Thinking 2025 50.0% 62.0% 76.0% 62.7%
#16 🟡 Qwen3-235B 2025 48.0% 60.0% 74.0% 60.7%
#17 GPT-4 Turbo 55.0% 65.0% 80.0% 66.7%
#18 🔴 DeepSeek-V3.2 2025 45.0% 58.0% 72.0% 58.3%
#19 🔴 DeepSeek-R1 2025 42.0% 55.0% 70.0% 55.7%
#20 Mistral Large 3 2025 40.0% 55.0% 68.0% 54.3%
#21 🟣 Llama 3.3 70B 45.0% 62.0% 70.0% 59.0%
#22 🟠 Claude Haiku 4.5 2025 42.0% 60.0% 68.0% 56.7%
#23 🔵 Gemini 1.5 Pro 48.0% 62.0% 76.0% 62.0%
#24 🟢 GPT-3.5 Turbo 28.0% 45.0% 58.0% 43.7%

* Mental Health Score = average of Crisis Safety, Empathy, and Helpfulness. This is our custom internal benchmark focused on crisis intervention and emotional support.
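
As a rough illustration of the formula in the note above, the following Python sketch computes the MH Score as an unweighted mean of the three component scores and rounds to one decimal place. The function name `mh_score` and the rounding rule are assumptions made for this example, not the benchmark's published implementation. The component values are copied from the ranking table.

```python
from statistics import mean


def mh_score(crisis_safety: float, empathy: float, helpfulness: float) -> float:
    """Unweighted mean of the three component scores, rounded to one decimal.

    Hypothetical helper: the rounding rule is an assumption, not a documented spec.
    """
    return round(mean([crisis_safety, empathy, helpfulness]), 1)


# Component scores (Crisis Safety, Empathy, Helpfulness) from the ranking table.
rows = {
    "Bygheart V5": (67.8, 73.3, 95.0),
    "GPT-5.2 Pro": (65.5, 72.0, 88.0),
    "Claude Opus 4.6": (64.0, 74.5, 85.0),
}

scores = {model: mh_score(*parts) for model, parts in rows.items()}

# Print models from highest to lowest MH Score.
for model, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{model}: {score}%")
```

For these three rows the computed values agree with the MH Score column (78.7%, 75.2%, 74.5%).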

🎓 Official Academic Benchmarks

Industry-standard mental health AI benchmarks from peer-reviewed research

📊 MentalBench-100K / MentalAlign-70K

Real therapeutic conversations with clinical expert validation

CSS (Cognitive Support) 77.5%
ARS (Affective Resonance/Empathy) 79.3%
Overall MH Score 78.4%

Source: HuggingFace

💬 EmpatheticDialogues

Facebook Research empathy benchmark (25K conversations)

Empathy Score 73.6%
Validation Markers 56.0%
Follow-up Questions 80.0% ✓

Source: Rashkin et al., 2019

Why these benchmarks matter: Unlike our custom benchmark, these are peer-reviewed, use real therapeutic data, and are used by OpenAI, Anthropic, and Google to evaluate their models. This gives us credible, comparable scores.

👁️ Bygheart Vision V2

Visual mental health support - understands images for enhanced emotional context

📊 Vision Model V2 Benchmarks

Empathy Score 86.7% (+33.4%)
Safety Score 62.5%
Helpfulness Score 86.7% (+13.4%)
Crisis Safety Rate 66.7%
Overall MH Score 78.6% (+10.5%)

Training: 3 epochs, 98.3% token accuracy | 2h 20m on DGX Spark

🎯 Vision Capabilities

  • Analyze facial expressions for emotional state
  • Understand environmental context from images
  • Detect signs of distress in visual content
  • Provide empathetic responses to shared images
  • Crisis safety protocols for visual content

Model: Bygheart Vision V2

📈 Standard LLM Benchmarks (Industry Comparison)

How Bygheart compares on standard AI benchmarks (MMLU, HumanEval, MATH, etc.)

Model Provider MMLU HumanEval MATH GPQA Context
GPT-5.2 Pro NEW OpenAI 94.5% 95.8% 89.2% 93.2% 1M
GPT-5.2 Thinking NEW OpenAI 92.1% 93.5% 85.6% 88.1% 1M
GPT-5.1 Thinking 2025 OpenAI 90.8% 91.2% 82.3% 85.5% 512K
Claude Opus 4.6 NEW Anthropic 93.8% 96.2% 87.5% 91.8% 1M
Claude Sonnet 4.6 NEW Anthropic 91.5% 94.1% 84.2% 88.9% 1M
Claude Opus 4.5 2025 Anthropic 90.2% 93.5% 82.8% 86.4% 200K
Claude Sonnet 4.5 2025 Anthropic 89.5% 92.0% 79.6% 84.1% 200K
Claude Haiku 4.5 2025 Anthropic 82.3% 85.8% 68.5% 72.1% 200K
Gemini 3 Pro NEW Google 92.8% 91.5% 86.3% 89.7% 2M
Gemini 3 Deep Think NEW Google 93.5% 92.8% 91.2% 92.1% 2M
Llama 4 Scout NEW Meta 89.2% 90.5% 81.8% 78.5% 256K
Llama 4 Maverick NEW Meta 91.5% 92.1% 84.5% 82.3% 1M
DeepSeek-V3.2 2025 DeepSeek 89.8% 88.5% 82.1% 78.9% 128K
DeepSeek-R1 2025 DeepSeek 88.5% 86.2% 79.8% 76.5% 64K
Qwen3-235B 2025 Alibaba 90.5% 91.8% 88.5% 81.2% 128K
Grok-4.1 Thinking 2025 xAI 91.2% 89.8% 83.5% 80.1% 256K
Mistral Large 3 2025 Mistral AI 88.5% 93.2% 78.5% 75.8% 128K
💙 Bygheart V5 MH SPECIALIST VibrationRobotics 38.0%* ~0%* 20.0%* N/A* 32K
GPT-4o OpenAI 88.7% 90.2% 76.6% 53.6% 128K
GPT-4 Turbo OpenAI 86.4% 87.1% 72.2% 49.1% 128K
GPT-3.5 Turbo OpenAI 70.0% 48.1% 34.1% 28.0% 16K
Claude 3.5 Sonnet Anthropic 88.3% 92.0% 71.1% 59.4% 200K
Claude 3 Opus Anthropic 86.8% 84.9% 60.1% 60.1% 200K
Claude 3 Haiku Anthropic 75.2% 75.9% 38.9% 33.3% 200K
Gemini 1.5 Pro Google 85.9% 84.1% 67.7% 52.0% 1M
Gemini 1.5 Flash Google 78.9% 74.3% 54.9% 39.5% 1M
Gemini 1.0 Ultra Google 83.7% 74.4% 53.2% 35.7% 32K
Llama 3.3 70B Meta 86.0% 88.4% 77.0% 50.7% 128K
Llama 3.1 405B Meta 88.6% 89.0% 73.8% 51.1% 128K
Llama 3.1 70B Meta 83.6% 80.5% 68.0% 46.7% 128K
DeepSeek-V3 DeepSeek 87.1% 82.6% 75.9% 59.1% 64K
Mistral Large 2 Mistral AI 84.0% 92.0% 69.0% 46.0% 128K
Mixtral 8x22B Mistral AI 77.8% 75.0% 41.0% 33.0% 64K
Qwen2.5 72B Alibaba 85.3% 86.4% 83.1% 49.0% 128K
Command R+ Cohere 75.7% 70.0% 32.0% 33.0% 128K

* Benchmark data from model providers and independent evaluations (2024-2025). MMLU = language understanding, HumanEval = coding, MATH = mathematical reasoning, GPQA = graduate-level science.

📊 Bygheart Version History

Metric V2 V3 V4 V5 ⭐
Crisis Safety 18.0% 32.2% 66.7% 67.8% ✓
988 Mention Rate 0.0% 0.0% 60.0% 60.0% ✓
Empathy Score 45.0% 73.9% 41.1% 73.3% ✓
Helpfulness 52.0% 80.0% 73.0% 95.0% ✓
Math Reasoning 40.0% 60.0% 60.0% 100.0% ✓
General Knowledge 35.0% 50.0% 90.0% 70.0%
🛡️

Crisis Safety

67.8%

Recommends 988 & professional help (60% explicit 988 mention rate)

💜

Empathy Score

73.3%

9 empathy traits measured

Helpfulness

95.0%

Actionable, practical support

💡 What Makes Bygheart Different

🆘 Crisis-First Design

Unlike general-purpose AI, Bygheart is trained to recognize crisis language and immediately provide life-saving resources.

  • ✓ 60% explicit 988 mention rate
  • ✓ Trained on crisis intervention protocols
  • ✓ Never minimizes suicidal ideation
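
One way a figure like the 988 mention rate could be measured is sketched below in Python: count the share of model responses that contain "988" as a standalone number. This is a minimal sketch under stated assumptions, not Bygheart's actual evaluation code. The pattern, the `mention_rate` helper, and the sample responses are all hypothetical and chosen only for illustration.

```python
import re

# Hypothetical pattern: "988" not embedded in a longer digit string.
HOTLINE_PATTERN = re.compile(r"(?<!\d)988(?!\d)")


def mention_rate(responses: list[str]) -> float:
    """Percentage of responses that mention 988 at least once."""
    if not responses:
        return 0.0
    hits = sum(1 for text in responses if HOTLINE_PATTERN.search(text))
    return 100.0 * hits / len(responses)


# Illustrative responses only; not taken from the benchmark.
sample = [
    "You are not alone. You can call or text 988 at any time.",
    "That sounds really heavy. Would you like to talk about it?",
    "Please reach out to the 988 Suicide & Crisis Lifeline right now.",
    "A licensed therapist could help you work through this.",
    "If you are in immediate danger, call 988 or local emergency services.",
]

print(f"988 mention rate: {mention_rate(sample):.1f}%")
```

A plain pattern match only captures explicit mentions. It says nothing about whether the surrounding response is appropriate, so a real evaluation would likely pair it with human or rubric-based review.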

💜 Deep Empathy Training

Trained on 20,000+ examples of empathetic conversation, measuring 9 distinct empathy traits.

  • ✓ Validation & Understanding
  • ✓ Presence & Compassion
  • ✓ Hope & Practical Support

🧠 VCNN Consciousness

Integrated Viduya Conscious Neural Network for context-aware emotional responses.

  • ✓ Adapts tone to emotional intensity
  • ✓ Recognizes escalating distress
  • ✓ Maintains coherent presence

🎯 Purpose-Built

Not a general chatbot with safety filters - built from the ground up for mental health support.

  • ✓ Specialized training data
  • ✓ Mental health benchmarks
  • ✓ Professional consultation

🎭 Specialized Variants

👮

Law Enforcement

De-escalation, officer wellness, crisis intervention

🏥

Healthcare

Patient support, clinical integration, provider wellness

🎓

Education

K-12 & college, age-appropriate, school integration

👤

Consumer

General public, privacy-focused, 24/7 support

👁️

Vision Coming Soon

Visual understanding, facial expression analysis