A side-by-side reading of Kimi K2.6, MiniMax M2.7, Claude Opus 4.7, and GPT-5.5 across the benchmarks the labs themselves report. Open weights versus closed flagships, spring 2026.
GPT-5.5 leads on terminal use and knowledge-work tasks; Opus 4.7 holds the line on long-form coding; the open models close the gap on the composite index for a tenth of the price. — Editor's note
| Model | Composite index |
| --- | --- |
| GPT-5.5 | 57 |
| Opus 4.7 | 57 |
| Kimi K2.6 | 54 |
| MiniMax M2.7 | 50 |

| Model | Score |
| --- | --- |
| MiniMax M2.7 | 34% |
| Opus 4.7 | 36% |
| Kimi K2.6 | 39% |
| GPT-5.5 | — |
Sources. Artificial Analysis Intelligence Index v4.0; OpenAI GPT-5.5 announcement (April 23, 2026); Anthropic Opus 4.7 system card; MiniMax M2.7 release notes; Moonshot Kimi K2.6 technical report; Vellum and BuildFast comparison reports.
Method. Vendor-reported scores where available. Empty cells indicate the lab did not publish a comparable number. Terminal-Bench 2.0 and tau-2 Bench Telecom use original prompts without prompt tuning. SWE-bench Verified for MiniMax is from its release notes; Opus 4.7's number is the published headline figure.
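The empty-cell convention above can be handled explicitly when collating vendor-reported figures. A minimal sketch, using the scores from the tables in this piece; the dictionary layout, the `composite`/`score` keys, and the `render_cell` helper are illustrative, not taken from any lab's tooling:

```python
# Vendor-reported figures from the tables above. None marks a cell the
# lab did not publish a comparable number for (rendered as an em dash).
scores = {
    "GPT-5.5":      {"composite": 57, "score": None},
    "Opus 4.7":     {"composite": 57, "score": 36},
    "Kimi K2.6":    {"composite": 54, "score": 39},
    "MiniMax M2.7": {"composite": 50, "score": 34},
}

def render_cell(value, suffix=""):
    """Render a published score, or an em dash for an unpublished one."""
    return f"{value}{suffix}" if value is not None else "—"

for model, row in scores.items():
    print(f"{model:<13} {render_cell(row['composite']):>4} "
          f"{render_cell(row['score'], '%'):>5}")
```

Keeping unpublished numbers as `None` rather than zero avoids silently penalizing a model whose lab simply did not report that benchmark.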
Caveats. Self-reported benchmarks favor the publishing lab. Independent verification lags release dates. Token efficiency, latency, and real-world coding behavior are not captured in any single composite score.