World Model Benchmark

Why track world-model benchmarks

Ask three labs whether their world model is "good" and you'll get three incompatible tables. Evaluation of world models is scattered across paper appendices, leaderboard sites, and GitHub READMEs, with new suites appearing every few months. This site keeps a single, maintained index of those benchmarks — what each one measures, who runs it, and where its numbers live — with every fact linked to the paper, project page, or repository it came from.

The maintainers are the team at QUVISS. We build AI production tools that sit downstream of the model families measured here. That gives us a practical reason to keep this index current, and a conflict of interest you should know about. Our rule is simple: nothing appears on these pages without a primary source attached, and we editorialize about coverage, not about winners.

Inclusion rules