Benchmarks

BabyVision

🏆 Seed-2.0: 60.6% · Visual Reasoning

BabyVision Performance Timeline

[Figure: BabyVision Performance Timeline. Performance (%) plotted against release date, Apr 2025 to Jun 2026, for open and proprietary models. A dashed line marks the human baseline (94.1%), and an annotation marks the leaderboard release. Per-model scores, companies, and release dates are listed in the leaderboard.]

Can MLLMs See Like a 3-Year-Old?

BabyVision Leaderboard

| Rank | Model | Type | Company | Release Date | Score |
|------|-------|------|---------|--------------|-------|
| - | Human (baseline) | Human | - | - | 94.1% |
| 1 | Seed-2.0-Pro (high) | Proprietary | ByteDance | Feb 14, 2026 | 60.6% |
| 2 | Gemini-3.1-Pro | Proprietary | Google | Feb 23, 2026 | 51.6% |
| 3 | Gemini-3.0-Pro | Proprietary | Google | Nov 18, 2025 | 49.7% |
| 4 | Qwen3.5-397B-A17B | Open | Alibaba | Feb 16, 2026 | 43.3% |
| 5 | Kimi-K2.5 | Open | Moonshot AI | Jan 27, 2026 | 36.5% |
| 6 | GPT-5.2 | Proprietary | OpenAI | Dec 11, 2025 | 34.4% |
| 7 | Seed-1.8 | Proprietary | ByteDance | Dec 18, 2025 | 30.2% |
| 8 | Qwen3VL-235B-Thinking | Open | Alibaba | Sep 23, 2025 | 22.2% |
| 9 | InternVL3.5-241B | Open | OpenGVLab | Aug 26, 2025 | 19.2% |
| 10 | Qwen3-VL-Plus | Proprietary | Alibaba | Sep 23, 2025 | 19.2% |
| 11 | GLM4.6V | Open | Zhipu AI | Dec 8, 2025 | 17.6% |
| 12 | Grok-4 | Proprietary | xAI | Jul 9, 2025 | 16.2% |
| 13 | MimoVL-7B-RL | Open | Xiaomi | Apr 23, 2025 | 15.1% |
| 14 | Claude-Opus-4.6 | Proprietary | Anthropic | Feb 5, 2026 | 14.8% |
| 15 | Step3 | Open | StepFun | Jul 31, 2025 | 14.7% |
| 16 | Claude-4.5-Opus | Proprietary | Anthropic | Nov 24, 2025 | 14.2% |
| 17 | KimiVL-A3B | Open | Moonshot AI | Apr 10, 2025 | 12.4% |
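
To put the scores in context, the gap to the human baseline can be worked out directly from the leaderboard. The following is a minimal sketch, not part of the BabyVision tooling: the names `HUMAN_BASELINE`, `LEADERBOARD`, `gap_to_human`, and `best_by_type` are hypothetical, and the data is copied by hand from the leaderboard entries.

```python
# Illustrative only: scores are transcribed from the leaderboard,
# and every name here is a hypothetical helper, not a BabyVision API.
HUMAN_BASELINE = 94.1  # human score in percent

# (model, type, score in percent)
LEADERBOARD = [
    ("Seed-2.0-Pro (high)", "Proprietary", 60.6),
    ("Gemini-3.1-Pro", "Proprietary", 51.6),
    ("Gemini-3.0-Pro", "Proprietary", 49.7),
    ("Qwen3.5-397B-A17B", "Open", 43.3),
    ("Kimi-K2.5", "Open", 36.5),
    ("GPT-5.2", "Proprietary", 34.4),
    ("Seed-1.8", "Proprietary", 30.2),
    ("Qwen3VL-235B-Thinking", "Open", 22.2),
    ("InternVL3.5-241B", "Open", 19.2),
    ("Qwen3-VL-Plus", "Proprietary", 19.2),
    ("GLM4.6V", "Open", 17.6),
    ("Grok-4", "Proprietary", 16.2),
    ("MimoVL-7B-RL", "Open", 15.1),
    ("Claude-Opus-4.6", "Proprietary", 14.8),
    ("Step3", "Open", 14.7),
    ("Claude-4.5-Opus", "Proprietary", 14.2),
    ("KimiVL-A3B", "Open", 12.4),
]


def gap_to_human(score, baseline=HUMAN_BASELINE):
    """Return the gap to the human baseline in percentage points."""
    return round(baseline - score, 1)


def best_by_type(rows):
    """Return the highest-scoring (model, score) for each model type."""
    best = {}
    for model, kind, score in rows:
        if kind not in best or score > best[kind][1]:
            best[kind] = (model, score)
    return best


if __name__ == "__main__":
    for kind, (model, score) in best_by_type(LEADERBOARD).items():
        print(f"{kind}: {model} at {score}%, "
              f"{gap_to_human(score)} points below the human baseline")
```

Under these assumptions, the best proprietary model trails the human baseline by 33.5 points and the best open model by 50.8 points.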

Citation

For details on BabyVision, please read our paper. If you find it useful in your research, please cite:

@misc{chen2026babyvisionvisualreasoninglanguage,
      title={BabyVision: Visual Reasoning Beyond Language}, 
      author={Liang Chen and Weichu Xie and Yiyan Liang and Hongfeng He and Hans Zhao and Zhibo Yang and Zhiqi Huang and Haoning Wu and Haoyu Lu and Y. charles and Yiping Bao and Yuantao Fan and Guopeng Li and Haiyang Shen and Xuanzhong Chen and Wendong Xu and Shuzheng Si and Zefan Cai and Wenhao Chai and Ziqi Huang and Fangfu Liu and Tianyu Liu and Baobao Chang and Xiaobo Hu and Kaiyuan Chen and Yixin Ren and Yang Liu and Yuan Gong and Kuan Li},
      year={2026},
      eprint={2601.06521},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2601.06521}, 
}