When did GPT-5.5 launch?

OpenAI launched GPT-5.5 (internal codename 'Spud') on April 23, 2026, to ChatGPT Plus, Pro, Business, and Enterprise users. GPT-5.5 and GPT-5.5 Pro became available in the API on April 24, 2026.

Is GPT-5.5 actually better than GPT-5 for coding?

On CodeRabbit's curated review benchmark, GPT-5.5 raised the expected-issue-found rate from 58.3% to 79.2% (+20.9 points), improved precision from 27.9% to 40.6% (+12.7 points), and produced 75 comments per pull request versus 67 at baseline. That's roughly +36% more real bugs caught with only +12% more comments. It also leads every publicly available model on Terminal-Bench 2.0 at 82.7%.

What is the GPT-5.5 token economy advantage?

Inside agent harnesses doing plan → act → inspect → retry loops, GPT-5.5 uses roughly half the tokens per step compared to verbose 5.x baselines. That lets the same fixed token budget run about 2.4× more retries before exhausting it — which is the practical reason 5.5 'feels different' on real agent workloads.

What are the known quirks of GPT-5.5?

Four worth knowing: (1) An indigo/violet UI color bias — it defaults to #4F46E5, #6366F1, #7C3AED-style palettes unless told otherwise. (2) It follows weak prompts too literally and no longer asks clarifying questions. (3) Less verbose output can feel like less effort, but on debug and regression tasks it actually isolates the failure faster. (4) Polished, but not original — execution quality jumped, originality didn't.

Should small businesses switch to GPT-5.5?

Yes — for scoped, real work. Give it a bounded task, explicit constraints, and a way to verify its output against the real system. It will out-execute the previous baseline on review signal, code-fix scope, and token economy. For architecture-level decisions and original creative work, the human is still the load-bearing element.

Back to insights

BreakingApril 25, 2026/ 7 min read/ Daily Ep. 017

GPT-5.5 is here. The internet has thoughts.

By Tijo Gaucher & Annaki Nguyen · Human+AI

GPT-5.5 launch — humanai.news Daily Episode 017, April 25 2026

TL;DR

Shipped:April 23, 2026 in ChatGPT & Codex; API live April 24. Codename “Spud.”
Real bugs caughtin code review: 58.3% → 79.2% (+20.9 pt).
Precision: 27.9% → 40.6% (+12.7 pt). Less noise on PRs.
Token economy: ~half the tokens per agent step → ~2.4× more retries inside the same budget.
Watch out for: indigo/violet UI bias, prompts followed too literally, polished but unoriginal output.
Verdict for SMEs: switch — for scoped, real work. Architecture taste is still on you.

OpenAI shipped GPT-5.5 — internal codename “Spud”— on April 23, 2026. The framing on the OpenAI blog calls it a “new class of intelligence.” The framing in our timeline calls it incremental. Both are right, depending on which job you're asking it to do.

Underneath the marketing, this is the biggest jump in developer-workflow signal we've seen in a release this year. Below: the receipts, the quiet feature nobody put on a slide, the four quirks worth knowing, and the verdict for any small or mid-sized team trying to decide whether to flip the switch this week.

What changed with GPT-5.5

GPT-5.5 is the first fully retrained base model since GPT-4.5, natively omnimodal, and it leads every publicly available model on Terminal-Bench 2.0 at 82.7%— the benchmark that measures whether a model can actually finish a multi-step coding task at the command line, not just write a snippet. It rolled out to ChatGPT Plus, Pro, Business, and Enterprise on April 23, with API access for GPT-5.5 and GPT-5.5 Pro arriving April 24. It also shipped inside Codex.

The headline is not “new class of intelligence.”The headline is: it's noticeably better at finding real bugs, while saying less to do it.

GPT-5.5 vs baseline on CodeRabbit's curated review set — expected issues found, precision, comments per PR

The CodeRabbit receipts

CodeRabbit — the AI code review platform — ran the new model on a curated review benchmark and published the numbers the day GPT-5.5 dropped. The shape of the result matters more than any single percentage:

Expected issues found: 58.3% → 79.2%— a +20.9-point jump on real bugs surfaced.
Precision: 27.9% → 40.6%— +12.7 points. When it speaks up, it's right more often.
Comments per PR: 67 → 75— +12% more output, but with a much higher hit-rate.

Translation: +36% more real bugs caught with only +12% more comments. The model isn't just throwing more text at the problem — signal-per-comment is up. That's the harder thing to move, and that's where 5.5's advantage actually lives.

The quiet feature nobody's talking about

Token economy — GPT-5.5 uses roughly half the tokens of the verbose baseline across plan, act, and inspect steps

The press release leans on the headline benchmarks. The thing that actually shifts your day-to-day is the token economy.

On agent harnesses doing plan → act → inspect → retry loops, GPT-5.5 uses roughly half the tokens per step compared to the verbose 5.x baseline. Plan tokens drop from ~1,310 to ~640. Act tokens drop from ~1,540 to ~880. Inspect tokens drop from ~1,180 to ~510.

Practically: inside the same fixed token budget, you can now run about 2.4× more retriesbefore the budget runs out. For anyone running coding agents, browser-automation agents, or scheduled jobs against the OpenAI API, that's a real money number, not a benchmark footnote. It's the reason 5.5 “feels different” long before you read a release blog.

Voices from the timeline

What press, researchers, devs, and skeptics are saying about GPT-5.5

The early consensus is unusually convergent for an OpenAI release week. The honest people from each camp are saying basically the same thing in different vocabulary:

CodeRabbit (early testing): “Quicker, leaner, more direct. Strong bias toward small workable changes instead of broad rewrites.”
Top Hacker News thread: “Codex feels like it actually shipped this time. Smaller diffs, fewer ‘helpful’ rewrites of unrelated files.”
Researcher take on X: “It rejects weak diagnoses now. That's the part nobody benchmarks but you feel it on PR #1.”
Skeptic camp:“Modest delta on a vendor's own benchmark. Wake me when it's on a held-out one.” A fair line. Worth noting that Terminal-Bench 2.0 isn't OpenAI's benchmark — that one's independent.

Four quirks worth knowing

1. The indigo-violet default

On UI tasks, GPT-5.5 produces unusually polished interaction work and detailed animations — but it defaults to a noticeable indigo/violet palette (#4F46E5, #6366F1, #7C3AED, #8B5CF6). OpenAI's own design playbook flags “avoid purple-on-white defaults” for a reason. Override it explicitly in your prompt or your product is going to look like every other AI-generated SaaS landing page from the last six months.

2. It follows weak prompts too literally

When the prompt is vague, contradictory, or under-specified, 5.5 doesn't pause to clarify the way 5.4 sometimes did. It just executes — fast — and the output mirrors your prompt's weaknesses. Prompt quality matters more now, not less.If you've been getting away with sloppy briefs, this is the release where you'll feel it.

3. Less verbose ≠ less smart

The “thinking out loud” padding is gone. On a quick demo it can feel like 5.5 is doing less work. On real debug, access-control, and API-regression tasks it's the opposite — it isolates the actual failure and proposes a scoped fix faster than 5.4 ever did. Don't confuse fewer words for less judgment.

4. Polished, not original

Execution quality jumped. Originality didn't. Ask 5.5 to ship something cleanly and it will. Ask it to surprise you and it falls back on familiar styling, familiar architecture, familiar metaphors. Architecture-level taste is still a human load-bearing element. Don't outsource it to the model and then complain it all looks the same.

Should you switch? The verdict.

Yes — for scoped, real work.

Give GPT-5.5 a bounded task, explicit constraints, and a way to verify its output against the real system. It will out-execute the previous baseline on three things that compound across a quarter: review signal on every pull request, code-fix scope on every bug, and token economy on every agent loop. Those three are exactly the loops most SMEs are spending real money on.

For the harder things — original product strategy, architecture decisions, creative judgment — the human is still the load-bearing element. The model just got a lot better at executing the decisions you make. That's good news for anyone trying to do more with a small team. It's also a reminder that a sharper tool punishes a vague brief.

What to do this week (if you run an SME)

Move your code review— if you're using an automated reviewer like CodeRabbit, Greptile, or CodiumAI, switch the underlying model to GPT-5.5 and re-run last week's merged PRs. Diff the comments. The +20.9 points is yours to claim, not just to read about.
Audit your agent prompts for vagueness. The literal-execution behavior means any sloppy phrasing in your existing prompts is now actively hurting output. Tighten them.
Override the color palette.If you're using 5.5 for UI generation, hard-code your brand colors in the system prompt. Otherwise enjoy your Indigo Phase.
Re-cost your agent budgets.The token economy shift means your existing per-task budgets are now over-provisioned. That's either margin you can capture or retries you can buy — either way, recalculate.

That's the bet Human+AI is built on: every model release is a hand-off opportunity. Pick one workflow this week, hand it to 5.5, and ship. The teams that compound those weekly hand-offs through 2026 are the ones who'll spend 2027 extending leads, not catching up.

Work with us

Want a GPT-5.5-powered agent running inside your business this month?

Human+AI is powered by Rapidclaw— we design and ship managed agentic workflows for SMEs. No API keys to wire up, no model selection to second-guess. We pick the right model (often 5.5, sometimes not), run the loop on reliable infrastructure, and hand you a thing that just works.

See what Rapidclaw builds