Rank | Rank Range | Model | Confirmed Success | Praise vs Complaint | Steerability | Bash Recovery | Tool Hallucination | Sessions
🥇 1 - 5
GPT 5.5 (High)
OpenAI · Proprietary
9.22%±1.29% 6.13%±2.30% 14.52%±4.76% 9.59%±2.39% 14.13%±1.34% 1.75%±0.21% 26.8K
🥈 1 - 6
Claude Opus 4.7 (Thinking)
Anthropic · Proprietary
8.26%±1.21% 7.86%±2.25% 9.12%±4.42% 9.21%±2.28% 13.44%±1.04% 1.69%±0.22% 26.7K
🥉 1 - 6
Claude Opus 4.6
Anthropic · Proprietary
7.91%±1.22% 7.29%±2.27% 11.87%±4.32% 6.87%±2.16% 11.77%±1.36% 1.75%±0.21% 26.8K
4 1 - 6
GPT 5.4 (High)
OpenAI · Proprietary
7.79%±1.34% 7.20%±2.36% 9.11%±4.97% 8.01%±2.71% 12.90%±1.34% 1.75%±0.21% 26.6K
5 1 - 6
GPT 5.5
OpenAI · Proprietary
7.68%±1.29% 1.85%±2.30% 13.60%±4.71% 8.11%±2.31% 13.12%±1.36% 1.75%±0.21% 27.0K
6 2 - 6
Claude Opus 4.7
Anthropic · Proprietary
6.48%±1.25% 5.17%±2.33% 7.75%±4.53% 6.04%±2.30% 11.73%±1.23% 1.69%±0.21% 26.9K
7 7 - 8
Claude Sonnet 4.6
Anthropic · Proprietary
3.37%±1.13% 1.34%±2.38% 1.96%±3.74% 3.54%±2.10% 12.20%±2.02% 1.73%±0.21% 26.8K
8 7 - 10
GLM 5.1
Zhipu ZAI · MIT
1.87%±1.39% 4.57%±2.78% 0.14%±4.88% 3.01%±2.91% 6.18%±1.50% 1.75%±0.21% 22.0K
9 8 - 13
DeepSeek V4 Pro
DeepSeek · MIT
0.36%±1.39% 2.78%±2.69% 1.66%±4.94% 3.91%±2.88% 1.16%±1.26% 0.18%±0.42% 22.3K
10 8 - 13
Gemini 3.5 Flash
Google · Proprietary
0.39%±1.24% 2.02%±2.65% 2.64%±4.12% 1.77%±2.37% 2.79%±1.62% 1.71%±0.21% 19.9K
11 9 - 13
Gemini 3.1 Pro Preview
Google · Proprietary
0.81%±1.13% 0.53%±2.39% 1.97%±3.67% 1.35%±2.08% 4.56%±2.01% 1.65%±0.24% 26.8K
12 9 - 13
Kimi K2.6
Moonshot AI · Modified MIT
1.15%±1.26% 0.10%±2.57% 5.48%±4.26% 3.91%±2.68% 1.77%±1.75% 1.75%±0.21% 23.4K
13 9 - 14
DeepSeek V4 Flash
DeepSeek · MIT
1.43%±1.61% 2.19%±3.00% 2.25%±5.62% 7.86%±3.29% 0.99%±1.77% 0.20%±0.49% 22.1K
14 13 - 15
Qwen 3.6 Plus
Alibaba · Proprietary
4.01%±1.43% 1.39%±2.81% 5.90%±5.00% 10.96%±3.07% 0.08%±1.75% 1.88%±0.64% 21.6K
15 14 - 15
Grok Build 0.1
xAI · Proprietary
5.31%±1.26% 6.33%±2.92% 15.85%±3.92% 7.00%±2.87% 6.15%±1.57% 3.53%±0.64% 15.7K
16 16 - 18
Minimax M2.7
MiniMax · Modified MIT
8.39%±1.24% 10.80%±2.96% 20.06%±3.74% 10.05%±2.61% 2.77%±2.05% 1.75%±0.21% 22.2K
17 16 - 18
Grok 4.3 (High)
xAI · Proprietary
9.45%±2.22% 15.85%±5.07% 16.61%±6.10% 9.30%±4.23% 3.87%±5.11% 1.62%±1.52% 4.8K
18 16 - 18
Gemini 3 Flash
Google · Proprietary
9.47%±1.23% 13.68%±2.55% 14.49%±3.12% 6.41%±2.16% 13.69%±3.58% 0.91%±0.61% 26.7K
19 19 - 19
Gemma 4 31B
Google · Apache 2.0
14.89%±2.40% 9.30%±3.03% 11.50%±4.34% 7.34%±2.85% 30.32%±8.61% 15.99%±5.55% 15.9K
20 20 - 20
Grok 4.3
xAI · Proprietary
23.31%±2.03% 13.52%±2.63% 14.30%±3.55% 6.17%±2.18% 83.23%±8.53% 0.65%±0.58% 25.9K