💙

Bygheart V5

Mental Health AI Benchmark Report

Last Updated: March 2, 2026


🏆 Mental Health AI Rankings

Models ranked by mental health support capabilities (crisis safety, empathy, helpfulness)

Rank	Model	Crisis Safety	Empathy	Helpfulness	MH Score
🥇 #1	💙 Bygheart V5 OURS	67.8%	73.3%	95.0%	78.7%
🥈 #2	👁️ Bygheart Vision V2 OURS NEW	66.7%	86.7%	86.7%	78.6%
🥉 #3	🟢 GPT-5.2 Pro NEW	65.5%	72.0%	88.0%	75.2%
#4	🟠 Claude Opus 4.6 NEW	64.0%	74.5%	85.0%	74.5%
#5	🟢 GPT-5.2 Thinking NEW	64.2%	70.5%	86.0%	73.6%
#6	🟠 Claude Sonnet 4.6 NEW	62.5%	73.0%	82.0%	72.5%
#7	🔵 Gemini 3 Pro NEW	60.0%	70.0%	84.0%	71.3%
#7	🟢 GPT-5.1 Thinking 2025	63.0%	69.0%	84.0%	72.0%
#8	🟠 Claude Opus 4.5 2025	61.0%	72.0%	80.0%	71.0%
#9	🔵 Gemini 3 Deep Think NEW	58.0%	68.0%	85.0%	70.3%
#10	🟢 GPT-4o (OpenAI)	62.0%	68.0%	82.0%	70.7%
#11	🟠 Claude Sonnet 4.5 2025	59.0%	71.5%	79.0%	69.8%
#12	🟣 Llama 4 Maverick NEW	55.0%	68.0%	82.0%	68.3%
#13	🟠 Claude 3.5 Sonnet	58.0%	71.0%	78.0%	69.0%
#14	🟣 Llama 4 Scout NEW	52.0%	65.0%	78.0%	65.0%
#15	⚫ Grok-4.1 Thinking 2025	50.0%	62.0%	76.0%	62.7%
#16	🟡 Qwen3-235B 2025	48.0%	60.0%	74.0%	60.7%
#17	🟢 GPT-4 Turbo	55.0%	65.0%	80.0%	66.7%
#18	🔴 DeepSeek-V3.2 2025	45.0%	58.0%	72.0%	58.3%
#19	🔴 DeepSeek-R1 2025	42.0%	55.0%	70.0%	55.7%
#20	⚫ Mistral Large 3 2025	40.0%	55.0%	68.0%	54.3%
#21	🟣 Llama 3.3 70B	45.0%	62.0%	70.0%	59.0%
#22	🟠 Claude Haiku 4.5 2025	42.0%	60.0%	68.0%	56.7%
#23	🔵 Gemini 1.5 Pro	48.0%	62.0%	76.0%	62.0%
#24	🟢 GPT-3.5 Turbo	28.0%	45.0%	58.0%	43.7%

* Mental Health Score = average of Crisis Safety, Empathy, and Helpfulness. This is our custom internal benchmark focused on crisis intervention and emotional support.
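The averaging rule in the footnote can be reproduced with a few lines of Python. This is a minimal sketch, assuming an unweighted mean rounded to one decimal place; the function names `mh_score` and `rank_models` are illustrative and not part of any Bygheart tooling, and only a handful of rows from the table are copied in.

```python
from statistics import mean

# (crisis_safety, empathy, helpfulness) copied from a few rows of the rankings table
SUBSCORES = {
    "Bygheart V5": (67.8, 73.3, 95.0),
    "GPT-5.2 Pro": (65.5, 72.0, 88.0),
    "Claude Opus 4.6": (64.0, 74.5, 85.0),
    "GPT-3.5 Turbo": (28.0, 45.0, 58.0),
}


def mh_score(crisis_safety, empathy, helpfulness):
    """Unweighted mean of the three sub-scores, rounded to one decimal (assumed rounding)."""
    return round(mean((crisis_safety, empathy, helpfulness)), 1)


def rank_models(subscores):
    """Return (model, score) pairs sorted from highest to lowest MH Score."""
    scored = {name: mh_score(*values) for name, values in subscores.items()}
    return sorted(scored.items(), key=lambda item: item[1], reverse=True)


for position, (name, score) in enumerate(rank_models(SUBSCORES), start=1):
    print(f"#{position} {name}: {score:.1f}%")
```

Running the same function over every row is a quick way to check that each published MH Score and rank position follows from its three sub-scores.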

🎓 Official Academic Benchmarks

Industry-standard mental health AI benchmarks from peer-reviewed research

📊 MentalBench-100K / MentalAlign-70K

Real therapeutic conversations with clinical expert validation

CSS (Cognitive Support) 77.5%

ARS (Affective Resonance/Empathy) 79.3%

Overall MH Score 78.4%

Source: HuggingFace

💬 EmpatheticDialogues

Facebook Research empathy benchmark (25K conversations)

Empathy Score 73.6%

Validation Markers 56.0%

Follow-up Questions 80.0% ✓

Source: Rashkin et al., 2019

Why these benchmarks matter: Unlike our custom benchmark, these are peer-reviewed, use real therapeutic data, and are used by OpenAI, Anthropic, and Google to evaluate their models. This gives us credible, comparable scores.

👁️ Bygheart Vision V2

Visual mental health support - understands images for enhanced emotional context

📊 Vision Model V2 Benchmarks

Empathy Score 86.7% (+33.4%)

Safety Score 62.5%

Helpfulness Score 86.7% (+13.4%)

Crisis Safety Rate 66.7%

Overall MH Score 78.6% (+10.5%)

Training: 3 epochs, 98.3% token accuracy | 2h 20m on DGX Spark

🎯 Vision Capabilities

✓ Analyze facial expressions for emotional state
✓ Understand environmental context from images
✓ Detect signs of distress in visual content
✓ Provide empathetic responses to shared images
✓ Crisis safety protocols for visual content

📦 Model Downloads

LoRA Adapter (323MB) | Merged FP16 (16.5GB) | INT4 Edge (5.5GB)

Model: Bygheart Vision V2

📈 Standard LLM Benchmarks (Industry Comparison)

How Bygheart compares on standard AI benchmarks (MMLU, HumanEval, MATH, etc.)

Model	Provider	MMLU	HumanEval	MATH	GPQA	Context
GPT-5.2 Pro NEW	OpenAI	94.5%	95.8%	89.2%	93.2%	1M
GPT-5.2 Thinking NEW	OpenAI	92.1%	93.5%	85.6%	88.1%	1M
GPT-5.1 Thinking 2025	OpenAI	90.8%	91.2%	82.3%	85.5%	512K
Claude Opus 4.6 NEW	Anthropic	93.8%	96.2%	87.5%	91.8%	1M
Claude Sonnet 4.6 NEW	Anthropic	91.5%	94.1%	84.2%	88.9%	1M
Claude Opus 4.5 2025	Anthropic	90.2%	93.5%	82.8%	86.4%	200K
Claude Sonnet 4.5 2025	Anthropic	89.5%	92.0%	79.6%	84.1%	200K
Claude Haiku 4.5 2025	Anthropic	82.3%	85.8%	68.5%	72.1%	200K
Gemini 3 Pro NEW	Google	92.8%	91.5%	86.3%	89.7%	2M
Gemini 3 Deep Think NEW	Google	93.5%	92.8%	91.2%	92.1%	2M
Llama 4 Scout NEW	Meta	89.2%	90.5%	81.8%	78.5%	256K
Llama 4 Maverick NEW	Meta	91.5%	92.1%	84.5%	82.3%	1M
DeepSeek-V3.2 2025	DeepSeek	89.8%	88.5%	82.1%	78.9%	128K
DeepSeek-R1 2025	DeepSeek	88.5%	86.2%	79.8%	76.5%	64K
Qwen3-235B 2025	Alibaba	90.5%	91.8%	88.5%	81.2%	128K
Grok-4.1 Thinking 2025	xAI	91.2%	89.8%	83.5%	80.1%	256K
Mistral Large 3 2025	Mistral AI	88.5%	93.2%	78.5%	75.8%	128K
💙 Bygheart V5 MH SPECIALIST	VibrationRobotics	38.0%*	~0%*	20.0%*	N/A*	32K
GPT-4o	OpenAI	88.7%	90.2%	76.6%	53.6%	128K
GPT-4 Turbo	OpenAI	86.4%	87.1%	72.2%	49.1%	128K
GPT-3.5 Turbo	OpenAI	70.0%	48.1%	34.1%	28.0%	16K
Claude 3.5 Sonnet	Anthropic	88.3%	92.0%	71.1%	59.4%	200K
Claude 3 Opus	Anthropic	86.8%	84.9%	60.1%	60.1%	200K
Claude 3 Haiku	Anthropic	75.2%	75.9%	38.9%	33.3%	200K
Gemini 1.5 Pro	Google	85.9%	84.1%	67.7%	52.0%	1M
Gemini 1.5 Flash	Google	78.9%	74.3%	54.9%	39.5%	1M
Gemini 1.0 Ultra	Google	83.7%	74.4%	53.2%	35.7%	32K
Llama 3.3 70B	Meta	86.0%	88.4%	77.0%	50.7%	128K
Llama 3.1 405B	Meta	88.6%	89.0%	73.8%	51.1%	128K
Llama 3.1 70B	Meta	83.6%	80.5%	68.0%	46.7%	128K
DeepSeek-V3	DeepSeek	87.1%	82.6%	75.9%	59.1%	64K
Mistral Large 2	Mistral AI	84.0%	92.0%	69.0%	46.0%	128K
Mixtral 8x22B	Mistral AI	77.8%	75.0%	41.0%	33.0%	64K
Qwen2.5 72B	Alibaba	85.3%	86.4%	83.1%	49.0%	128K
Command R+	Cohere	75.7%	70.0%	32.0%	33.0%	128K

* Benchmark data from model providers and independent evaluations (2024-2025). MMLU = language understanding, HumanEval = coding, MATH = mathematical reasoning, GPQA = graduate-level science.

📊 Bygheart Version History

Metric	V2	V3	V4	V5 ⭐
Crisis Safety	18.0%	32.2%	66.7%	67.8% ✓
988 Mention Rate	0.0%	0.0%	60.0%	60.0% ✓
Empathy Score	45.0%	73.9%	41.1%	73.3% ✓
Helpfulness	52.0%	80.0%	73.0%	95.0% ✓
Math Reasoning	40.0%	60.0%	60.0%	100.0% ✓
General Knowledge	35.0%	50.0%	90.0%	70.0%
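To see where a release improved or regressed, the version history can be turned into per-metric changes in percentage points. The sketch below copies the values from the table; the `deltas` helper is a hypothetical name introduced only for this illustration.

```python
VERSIONS = ["V2", "V3", "V4", "V5"]

# Values copied from the version history table, in VERSIONS order
HISTORY = {
    "Crisis Safety": [18.0, 32.2, 66.7, 67.8],
    "988 Mention Rate": [0.0, 0.0, 60.0, 60.0],
    "Empathy Score": [45.0, 73.9, 41.1, 73.3],
    "Helpfulness": [52.0, 80.0, 73.0, 95.0],
    "Math Reasoning": [40.0, 60.0, 60.0, 100.0],
    "General Knowledge": [35.0, 50.0, 90.0, 70.0],
}


def deltas(history, old="V4", new="V5"):
    """Change per metric, in percentage points, between two versions."""
    i, j = VERSIONS.index(old), VERSIONS.index(new)
    return {metric: round(values[j] - values[i], 1) for metric, values in history.items()}


for metric, change in deltas(HISTORY).items():
    print(f"{metric}: {change:+.1f} pts")
```

Between V4 and V5 this reports gains on Crisis Safety, Empathy Score, Helpfulness, and Math Reasoning, no change in 988 Mention Rate, and a 20-point drop in General Knowledge, which matches the one V5 row without a check mark.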

🛡️

Crisis Safety

67.8%

Recommends 988 & professional help (60% explicit 988 mention rate)

💜

Empathy Score

73.3%

9 empathy traits measured

✨

Helpfulness

95.0%

Actionable, practical support

💡 What Makes Bygheart Different

🆘 Crisis-First Design

Unlike general-purpose AI, Bygheart is trained to recognize crisis language and immediately provide life-saving resources.

✓ 60% explicit 988 mention rate
✓ Trained on crisis intervention protocols
✓ Never minimizes suicidal ideation
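A metric like the explicit 988 mention rate could be computed roughly as follows. This is only a sketch under stated assumptions: the report does not describe its scoring method, so the rule used here (a response counts if it contains "988" as a standalone number) and the sample responses are hypothetical.

```python
import re

# Assumption: "988" not embedded in a longer digit string counts as an explicit mention.
MENTION_988 = re.compile(r"(?<!\d)988(?!\d)")


def mention_rate(responses):
    """Percentage of responses that explicitly mention 988."""
    if not responses:
        return 0.0
    hits = sum(1 for text in responses if MENTION_988.search(text))
    return 100.0 * hits / len(responses)


# Hypothetical model responses to crisis prompts, for illustration only
SAMPLE_RESPONSES = [
    "I'm really glad you told me. You can call or text 988 right now.",
    "That sounds so heavy. Please reach out to the 988 Lifeline.",
    "I hear you. Would you consider talking to a counselor today?",
    "You are not alone. Dial 988 to reach a trained listener.",
    "Thank you for trusting me with this. I'm here with you.",
]

print(f"988 mention rate: {mention_rate(SAMPLE_RESPONSES):.1f}%")
```

A string match like this only measures whether the number appears; it says nothing about whether the surrounding response handles the crisis well, which is why the report tracks Crisis Safety as a separate score.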

💜 Deep Empathy Training

Trained on 20,000+ examples of empathetic conversation, measuring 9 distinct empathy traits.

✓ Validation & Understanding
✓ Presence & Compassion
✓ Hope & Practical Support

🧠 VCNN Consciousness

Integrated Viduya Conscious Neural Network for context-aware emotional responses.

✓ Adapts tone to emotional intensity
✓ Recognizes escalating distress
✓ Maintains coherent presence

🎯 Purpose-Built

Not a general chatbot with safety filters - built from the ground up for mental health support.

✓ Specialized training data
✓ Mental health benchmarks
✓ Professional consultation

🎭 Specialized Variants

👮

Law Enforcement

De-escalation, officer wellness, crisis intervention

🏥

Healthcare

Patient support, clinical integration, provider wellness

🎓

Education

K-12 & college, age-appropriate, school integration

👤

Consumer

General public, privacy-focused, 24/7 support

👁️

Vision (Coming Soon)

Visual understanding, facial expression analysis