| Rank | Rank Range | Model | Confirmed Success | Praise vs. Complaint | Steerability | Bash Recovery | Tool Hallucination | | Sessions |
|---|---|---|---|---|---|---|---|---|---|
| 🥇 | 1-5 | GPT 5.5 (High)<br>OpenAI · Proprietary | 9.22%±1.29% | 6.13%±2.30% | 14.52%±4.76% | 9.59%±2.39% | 14.13%±1.34% | 1.75%±0.21% | 26.8K |
| 🥈 | 1-6 | Claude Opus 4.7 (Thinking)<br>Anthropic · Proprietary | 8.26%±1.21% | 7.86%±2.25% | 9.12%±4.42% | 9.21%±2.28% | 13.44%±1.04% | 1.69%±0.22% | 26.7K |
| 🥉 | 1-6 | Claude Opus 4.6<br>Anthropic · Proprietary | 7.91%±1.22% | 7.29%±2.27% | 11.87%±4.32% | 6.87%±2.16% | 11.77%±1.36% | 1.75%±0.21% | 26.8K |
| 4 | 1-6 | GPT 5.4 (High)<br>OpenAI · Proprietary | 7.79%±1.34% | 7.20%±2.36% | 9.11%±4.97% | 8.01%±2.71% | 12.90%±1.34% | 1.75%±0.21% | 26.6K |
| 5 | 1-6 | GPT 5.5<br>OpenAI · Proprietary | 7.68%±1.29% | 1.85%±2.30% | 13.60%±4.71% | 8.11%±2.31% | 13.12%±1.36% | 1.75%±0.21% | 27.0K |
| 6 | 2-6 | Claude Opus 4.7<br>Anthropic · Proprietary | 6.48%±1.25% | 5.17%±2.33% | 7.75%±4.53% | 6.04%±2.30% | 11.73%±1.23% | 1.69%±0.21% | 26.9K |
| 7 | 7-8 | Claude Sonnet 4.6<br>Anthropic · Proprietary | 3.37%±1.13% | 1.34%±2.38% | 1.96%±3.74% | 3.54%±2.10% | 12.20%±2.02% | 1.73%±0.21% | 26.8K |
| 8 | 7-10 | GLM 5.1<br>Zhipu (ZAI) · MIT | 1.87%±1.39% | 4.57%±2.78% | 0.14%±4.88% | 3.01%±2.91% | 6.18%±1.50% | 1.75%±0.21% | 22.0K |
| 9 | 8-13 | DeepSeek V4 Pro<br>DeepSeek · MIT | 0.36%±1.39% | 2.78%±2.69% | 1.66%±4.94% | 3.91%±2.88% | 1.16%±1.26% | 0.18%±0.42% | 22.3K |
| 10 | 8-13 | Gemini 3.5 Flash<br>Google · Proprietary | 0.39%±1.24% | 2.02%±2.65% | 2.64%±4.12% | 1.77%±2.37% | 2.79%±1.62% | 1.71%±0.21% | 19.9K |
| 11 | 9-13 | Gemini 3.1 Pro Preview<br>Google · Proprietary | 0.81%±1.13% | 0.53%±2.39% | 1.97%±3.67% | 1.35%±2.08% | 4.56%±2.01% | 1.65%±0.24% | 26.8K |
| 12 | 9-13 | Kimi K2.6<br>Moonshot AI · Modified MIT | 1.15%±1.26% | 0.10%±2.57% | 5.48%±4.26% | 3.91%±2.68% | 1.77%±1.75% | 1.75%±0.21% | 23.4K |
| 13 | 9-14 | DeepSeek V4 Flash<br>DeepSeek · MIT | 1.43%±1.61% | 2.19%±3.00% | 2.25%±5.62% | 7.86%±3.29% | 0.99%±1.77% | 0.20%±0.49% | 22.1K |
| 14 | 13-15 | Qwen 3.6 Plus<br>Alibaba · Proprietary | 4.01%±1.43% | 1.39%±2.81% | 5.90%±5.00% | 10.96%±3.07% | 0.08%±1.75% | 1.88%±0.64% | 21.6K |
| 15 | 14-15 | Grok Build 0.1<br>xAI · Proprietary | 5.31%±1.26% | 6.33%±2.92% | 15.85%±3.92% | 7.00%±2.87% | 6.15%±1.57% | 3.53%±0.64% | 15.7K |
| 16 | 16-18 | MiniMax M2.7<br>MiniMax · Modified MIT | 8.39%±1.24% | 10.80%±2.96% | 20.06%±3.74% | 10.05%±2.61% | 2.77%±2.05% | 1.75%±0.21% | 22.2K |
| 17 | 16-18 | Grok 4.3 (High)<br>xAI · Proprietary | 9.45%±2.22% | 15.85%±5.07% | 16.61%±6.10% | 9.30%±4.23% | 3.87%±5.11% | 1.62%±1.52% | 4.8K |
| 18 | 16-18 | Gemini 3 Flash<br>Google · Proprietary | 9.47%±1.23% | 13.68%±2.55% | 14.49%±3.12% | 6.41%±2.16% | 13.69%±3.58% | 0.91%±0.61% | 26.7K |
| 19 | 19-19 | Gemma 4 31B<br>Google · Apache 2.0 | 14.89%±2.40% | 9.30%±3.03% | 11.50%±4.34% | 7.34%±2.85% | 30.32%±8.61% | 15.99%±5.55% | 15.9K |
| 20 | 20-20 | Grok 4.3<br>xAI · Proprietary | 23.31%±2.03% | 13.52%±2.63% | 14.30%±3.55% | 6.17%±2.18% | 83.23%±8.53% | 0.65%±0.58% | 25.9K |