VibeThinker 3B Model Outperforms Claude Opus 4.5 on Reasoning Using SFT+GRPO Training
Researchers have posted a paper on arXiv presenting VibeThinker, a 3-billion-parameter language model that reportedly outperforms Anthropic's Claude Opus 4.5 on reasoning benchmarks. The authors attribute the result to a training pipeline that combines supervised fine-tuning (SFT) with Group Relative Policy Optimization (GRPO). The paper is available at arxiv.org/abs/2606.16140. It was shared only recently and has drawn little community discussion so far.