Bygheart Coder V3 Code Assistant

Model Info
| Field | Value |
|---|---|
| Model | Bygheart Coder V3 |
| Parameters | 7B |
| Training Data | 172K curated examples |
| Context Length | 32K tokens |
Benchmark Results

| Benchmark | Problems | V1 Passed | V1 Pass Rate | V3 Passed | V3 Pass Rate |
|---|---|---|---|---|---|
| HumanEval | 164 | 136 | 82.9% | 47 | 28.7% |
| MBPP | 257 | 193 | 75.1% | 188 | 73.2% |

HumanEval is OpenAI's benchmark for Python function generation; current results use an updated in-house execution harness. MBPP (Mostly Basic Python Problems) tests the ability to solve entry-level programming problems.
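The pass rates above are simply passed problems divided by total problems, rounded to one decimal place. A quick sketch that recomputes them from the raw counts reported in this section:

```python
# Recompute the reported pass rates from the raw pass counts above.
results = {
    "HumanEval": {"problems": 164, "V1": 136, "V3": 47},
    "MBPP":      {"problems": 257, "V1": 193, "V3": 188},
}

for bench, r in results.items():
    for version in ("V1", "V3"):
        rate = 100 * r[version] / r["problems"]
        print(f"{bench} {version}: {rate:.1f}%")
# HumanEval V1: 82.9%, V3: 28.7%; MBPP V1: 75.1%, V3: 73.2%
```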
Comparison with Other Models
| Model | Size | HumanEval | MBPP |
|---|---|---|---|
| Bygheart Coder V1 | 7B | 82.9% | 75.1% |
| Bygheart Coder V2 | 7B | 31.1% | 73.2% |
| Bygheart Coder V3 | 7B | 28.7% | 73.2% |
| GPT-4 | ~1.8T | 87.1% | 83.0% |
| Claude 3.5 Sonnet | ~175B | 92.0% | 87.6% |
| CodeLlama 34B | 34B | 48.8% | 55.0% |
| DeepSeek Coder 6.7B | 6.7B | 78.6% | 74.2% |
| StarCoder2 7B | 7B | 35.4% | 54.4% |
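HumanEval and MBPP scores are execution-based: a problem counts as passed only if the generated code runs and its unit tests succeed (the HumanEval note above mentions an in-house execution harness). A minimal sketch of that scoring loop, with a hypothetical stand-in problem rather than actual benchmark contents:

```python
# Minimal sketch of execution-based scoring in the spirit of HumanEval/MBPP.
# The completion and tests below are hypothetical stand-ins, not benchmark data.

def run_candidate(completion: str, test_code: str) -> bool:
    """Execute a candidate solution, then its assert-based unit tests.
    Returns True only if everything runs without error.
    NOTE: real harnesses sandbox this step; exec() on untrusted model
    output is unsafe outside an isolated environment."""
    namespace: dict = {}
    try:
        exec(completion, namespace)   # define the candidate function
        exec(test_code, namespace)    # run the unit tests
        return True
    except Exception:
        return False

# Hypothetical example problem: the model must implement `add`.
completion = "def add(a, b):\n    return a + b\n"
tests = "assert add(2, 3) == 5\nassert add(-1, 1) == 0\n"

passed = run_candidate(completion, tests)
print(passed)  # True for this correct completion
```

The overall score is then the fraction of problems for which this check returns True, as recomputed in the Benchmark Results section.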