Text-state baseline
Run local text models on shallow 3x3 states to separate cube reasoning from visual perception.
lab / Rubik's Arena
A visual Rubik's cube benchmark: deterministic scrambles, strict move validation, JSON traces, replay artifacts, and public receipts. First real model race is next.
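As an illustration of what "deterministic scrambles" can mean in practice, here is a minimal sketch that derives a scramble from an integer seed, so the same seed always yields the same move list. The function name `scramble`, the default length of 20, and the rule against turning the same face twice in a row are assumptions made for this example, not details of the actual harness.

```python
import random

FACES = "UDLRFB"            # standard face letters
SUFFIXES = ("", "'", "2")   # clockwise, counter-clockwise, half turn


def scramble(seed: int, length: int = 20) -> list[str]:
    """Return a reproducible scramble for the given seed (hypothetical helper)."""
    rng = random.Random(seed)  # private RNG, so global random state is untouched
    moves: list[str] = []
    while len(moves) < length:
        face = rng.choice(FACES)
        # Assumed rule: skip a move that turns the same face as the previous one.
        if moves and moves[-1][0] == face:
            continue
        moves.append(face + rng.choice(SUFFIXES))
    return moves


if __name__ == "__main__":
    print(" ".join(scramble(seed=7, length=10)))
```

Because the generator is seeded per call, two runs with the same seed and length can be compared move for move, which is what makes a recorded trace replayable.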
2D net prompts show all six cube faces; a single 3D screenshot shows at most three, so it would hide part of the state.
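To make the net idea concrete, the sketch below lays six 3x3 faces out in the common cross-shaped unfolding (U on top, then L F R B in a row, D at the bottom). It is a text rendering for illustration only: the `render_net` name, the one-letter sticker colours, and the layout order are assumptions, and the real harness may unfold or draw the cube differently.

```python
def render_net(faces: dict[str, str]) -> str:
    """Render six faces as a text net (hypothetical helper).

    Each face is a 9-character string of sticker colours, read row by row.
    """
    for name in "ULFRBD":
        if len(faces[name]) != 9:
            raise ValueError(f"face {name} must have exactly 9 stickers")

    def row(name: str, r: int) -> str:
        return faces[name][3 * r : 3 * r + 3]

    pad = " " * 3  # indent U and D so they sit above and below F
    lines = [pad + row("U", r) for r in range(3)]
    lines += ["".join(row(n, r) for n in "LFRB") for r in range(3)]
    lines += [pad + row("D", r) for r in range(3)]
    return "\n".join(lines)


if __name__ == "__main__":
    # Solved cube in the common colour scheme (white up, green front).
    solved = {"U": "W" * 9, "L": "O" * 9, "F": "G" * 9,
              "R": "R" * 9, "B": "B" * 9, "D": "Y" * 9}
    print(render_net(solved))
```

All 54 stickers appear exactly once in the output, which is the property a single perspective view cannot offer.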
Fixture oracle runs are harness checks and are intentionally not public model results.
Real VLM lanes are pending until an image-capable endpoint is registered.
current status
Image-capable models get a PNG net showing all six faces, then return strict JSON moves.
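A strict reading of "return strict JSON moves" might look like the sketch below: the reply must parse as JSON, contain exactly one key, and list only legal face turns. The `{"moves": [...]}` schema, the `parse_moves` name, and the 50-move cap are assumptions for illustration; the actual reply format is not specified here.

```python
import json

# The 18 face turns in standard notation: R, R', R2, and so on.
LEGAL_MOVES = {f + s for f in "UDLRFB" for s in ("", "'", "2")}


def parse_moves(reply: str, max_moves: int = 50) -> list[str]:
    """Validate a model reply and return its move list (hypothetical helper).

    Raises ValueError on anything other than {"moves": [<legal move>, ...]}.
    """
    try:
        data = json.loads(reply)
    except json.JSONDecodeError as exc:
        raise ValueError(f"reply is not valid JSON: {exc}") from exc

    # Assumed schema: an object with exactly one key, "moves".
    if not isinstance(data, dict) or set(data) != {"moves"}:
        raise ValueError('expected a JSON object with exactly one key, "moves"')

    moves = data["moves"]
    if not isinstance(moves, list):
        raise ValueError('"moves" must be a list')
    if len(moves) > max_moves:
        raise ValueError(f"too many moves: {len(moves)} > {max_moves}")

    for move in moves:
        # Type check first, so non-string entries never reach the set lookup.
        if not isinstance(move, str) or move not in LEGAL_MOVES:
            raise ValueError(f"illegal move: {move!r}")
    return moves


if __name__ == "__main__":
    print(parse_moves('{"moves": ["R", "U\'", "F2"]}'))
```

Rejecting the whole reply, rather than repairing it or skipping bad moves, keeps each trace unambiguous: a run either produced a valid move list or it did not.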
Once real lanes exist, replace this placeholder with a clearer animated replay/video.
The fixture oracle stays harness-only; public leaderboard entries should come from actual text-model or VLM runs.