| 🥇 | 45 - 53 | glm-image (智谱 ZAI · MIT) | 1010±15.6 | 1.2K votes | 1010.107 [994.5, 1025.7] | 63.45 · Volatile |
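View the overall Elo rating rankings of AI large language models across math reasoning, code generation, creative writing, and other open-ended text tasks. The data comes from anonymous blind votes cast by real users.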
Each leaderboard is sorted by the primary ranking metric used on its official site. General Arena boards typically use a competitive (Arena) score; the Agent board shows net improvement along with several task-specific metrics.
The full floating-point value of the primary metric. On general Arena boards it helps distinguish models whose rounded integer scores are close; Agent board metrics are shown as percentages with confidence intervals.
Measures how statistically stable the result is. Green·Stable (<5) is more trustworthy; Orange·Unstable (5–12); Red·Volatile (>12) means the ranking may still move significantly.
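The stability bands above can be sketched as a small lookup. This is only an illustration of the stated thresholds: the function name `stability_label` is hypothetical, and treating both 5 and 12 as part of the middle band is an assumption based on the "5–12" wording, not something the leaderboard confirms.

```python
def stability_label(value: float) -> str:
    """Map a stability value to the band named in the help text.

    Assumption: the boundaries 5 and 12 both belong to the middle band,
    since the text gives the bands as <5, 5-12, and >12.
    """
    if value < 5:
        return "Stable"      # green
    if value <= 12:
        return "Unstable"    # orange
    return "Volatile"        # red


# The glm-image row lists a stability value of 63.45.
print(stability_label(63.45))  # prints "Volatile"
```

Under these assumptions, the 63.45 shown for glm-image falls in the red band, which matches the "Volatile" label in its row.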
The model's true strength is estimated to lie within [lower, upper] at the 95% confidence level, shown as ±N. A smaller N means a more reliable ranking, and N shrinks as more votes come in.
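As a worked check of how ±N relates to the interval bounds, the sketch below assumes the interval is symmetric around the score, so N is half the interval width. The helper names `half_width` and `interval_from_score` are hypothetical, and how the leaderboard itself derives the interval is not stated in the text.

```python
def half_width(lower: float, upper: float) -> float:
    """Return N, assuming a symmetric interval [score - N, score + N]."""
    return (upper - lower) / 2


def interval_from_score(score: float, n: float) -> tuple[float, float]:
    """Rebuild [lower, upper] from a score and its ±N half-width."""
    return (score - n, score + n)


# Values taken from the glm-image row: 1010.107 [994.5, 1025.7], shown as 1010±15.6.
n = half_width(994.5, 1025.7)
print(round(n, 1))  # prints 15.6

lower, upper = interval_from_score(1010.107, 15.6)
print(round(lower, 1), round(upper, 1))  # prints 994.5 1025.7
```

The row's figures are consistent with this reading: half of the interval width is 15.6, and 1010.107 ± 15.6 rounds back to the displayed bounds.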
The total number of evaluations (votes) the model or lab has taken part in. Higher counts mean higher statistical confidence and less sensitivity to any single sample.
Apache/Llama: fully open-source, commercial use allowed; MIT: open-source with minimal restrictions; Proprietary: closed-source commercial model. The license determines whether you can integrate the model into your own product.