BabyVision
🏆 Seed-2.0: 60.6% · Visual Reasoning
BabyVision Performance Timeline (figure: Performance (%) versus Release Date, Apr 2025 to Jun 2026; open models in blue, proprietary models in orange; dashed line marks the human baseline of 94.1%. Per-model scores are listed in the leaderboard below.)
BabyVision Leaderboard
| Rank | Model | Type | Company | Release Date | Score |
|---|---|---|---|---|---|
| - | Human (baseline) | Human | - | - | 94.1% |
| 1 | Seed-2.0-Pro (high) | Proprietary | ByteDance | Feb 14, 2026 | 60.6% |
| 2 | Gemini-3.1-Pro | Proprietary | Google | Feb 23, 2026 | 51.6% |
| 3 | Gemini-3.0-Pro | Proprietary | Google | Nov 18, 2025 | 49.7% |
| 4 | Qwen3.5-397B-A17B | Open | Alibaba | Feb 16, 2026 | 43.3% |
| 5 | Kimi-K2.5 | Open | Moonshot AI | Jan 27, 2026 | 36.5% |
| 6 | GPT-5.2 | Proprietary | OpenAI | Dec 11, 2025 | 34.4% |
| 7 | Seed-1.8 | Proprietary | ByteDance | Dec 18, 2025 | 30.2% |
| 8 | Qwen3VL-235B-Thinking | Open | Alibaba | Sep 23, 2025 | 22.2% |
| 9 | InternVL3.5-241B | Open | OpenGVLab | Aug 26, 2025 | 19.2% |
| 10 | Qwen3-VL-Plus | Proprietary | Alibaba | Sep 23, 2025 | 19.2% |
| 11 | GLM4.6V | Open | Zhipu AI | Dec 8, 2025 | 17.6% |
| 12 | Grok-4 | Proprietary | xAI | Jul 9, 2025 | 16.2% |
| 13 | MimoVL-7B-RL | Open | Xiaomi | Apr 23, 2025 | 15.1% |
| 14 | Claude-Opus-4.6 | Proprietary | Anthropic | Feb 5, 2026 | 14.8% |
| 15 | Step3 | Open | StepFun | Jul 31, 2025 | 14.7% |
| 16 | Claude-4.5-Opus | Proprietary | Anthropic | Nov 24, 2025 | 14.2% |
| 17 | KimiVL-A3B | Open | Moonshot AI | Apr 10, 2025 | 12.4% |
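The ranking is a descending sort by score. The distance to the human baseline is a simple subtraction in percentage points. The Python sketch below reproduces both from the scores in the table above and also picks the best open and best proprietary model. The `Entry` record, its field names and the helper functions are illustrative assumptions. They are not part of any official BabyVision tooling.

```python
from dataclasses import dataclass

HUMAN_BASELINE = 94.1  # human score reported on the leaderboard (%)


@dataclass(frozen=True)
class Entry:
    """Hypothetical record for one leaderboard row."""
    model: str
    open_weights: bool  # True for "Open", False for "Proprietary"
    score: float        # BabyVision score in percent


# Rows transcribed from the leaderboard table, in table order.
ENTRIES = [
    Entry("Seed-2.0-Pro (high)", False, 60.6),
    Entry("Gemini-3.1-Pro", False, 51.6),
    Entry("Gemini-3.0-Pro", False, 49.7),
    Entry("Qwen3.5-397B-A17B", True, 43.3),
    Entry("Kimi-K2.5", True, 36.5),
    Entry("GPT-5.2", False, 34.4),
    Entry("Seed-1.8", False, 30.2),
    Entry("Qwen3VL-235B-Thinking", True, 22.2),
    Entry("InternVL3.5-241B", True, 19.2),
    Entry("Qwen3-VL-Plus", False, 19.2),
    Entry("GLM4.6V", True, 17.6),
    Entry("Grok-4", False, 16.2),
    Entry("MimoVL-7B-RL", True, 15.1),
    Entry("Claude-Opus-4.6", False, 14.8),
    Entry("Step3", True, 14.7),
    Entry("Claude-4.5-Opus", False, 14.2),
    Entry("KimiVL-A3B", True, 12.4),
]


def rank(entries):
    """Sort by score, highest first; sorted() is stable, so ties keep input order."""
    return sorted(entries, key=lambda e: e.score, reverse=True)


def gap_to_human(entry):
    """Percentage-point gap between the human baseline and a model's score."""
    return round(HUMAN_BASELINE - entry.score, 1)


def best(entries, open_weights):
    """Highest-scoring entry among open (True) or proprietary (False) models."""
    return max((e for e in entries if e.open_weights == open_weights),
               key=lambda e: e.score)


if __name__ == "__main__":
    for i, e in enumerate(rank(ENTRIES), start=1):
        print(f"{i:>2}. {e.model:<22} {e.score:5.1f}%  gap {gap_to_human(e):5.1f} pts")
    print("Best open:", best(ENTRIES, True).model)
    print("Best proprietary:", best(ENTRIES, False).model)
```

With these numbers, the top entry sits 33.5 percentage points below the human baseline (94.1 - 60.6). The best open model in the list is Qwen3.5-397B-A17B at 43.3%.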
Citation
For details of BabyVision, please read our paper. If you find it useful in your research, please cite:
@misc{chen2026babyvisionvisualreasoninglanguage,
title={BabyVision: Visual Reasoning Beyond Language},
author={Liang Chen and Weichu Xie and Yiyan Liang and Hongfeng He and Hans Zhao and Zhibo Yang and Zhiqi Huang and Haoning Wu and Haoyu Lu and Y. charles and Yiping Bao and Yuantao Fan and Guopeng Li and Haiyang Shen and Xuanzhong Chen and Wendong Xu and Shuzheng Si and Zefan Cai and Wenhao Chai and Ziqi Huang and Fangfu Liu and Tianyu Liu and Baobao Chang and Xiaobo Hu and Kaiyuan Chen and Yixin Ren and Yang Liu and Yuan Gong and Kuan Li},
year={2026},
eprint={2601.06521},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2601.06521},
}