470 points (openai.com)
Philip-J-Fry | 1 hour ago | 7 Comment
That's hilarious. Does OpenAI even know this doesn't work?
__jl__ | 1 hour ago | 12 Comment
OpenAI now has three price points: GPT-5.1, GPT-5.2, and now GPT-5.4. Their version numbers jump across different model lines, with Codex at 5.3 and what they now call Instant also at 5.3.
Anthropic is really the only one that has managed to get this under control: three models, priced at three different levels. New models are immediately available everywhere.
Google essentially only has Preview models! The last GA release is 2.5. As a developer, I can either use an outdated model or have zero assurance that the model won't be discontinued within weeks.
minimaxir | 4 hours ago | 11 Comment
Also per pricing, GPT-5.4 ($2.50/M input, $15/M output) is much cheaper than Opus 4.6 ($5/M input, $25/M output) and Opus has a penalty for its beta >200k context window.
I am skeptical whether the 1M context window will provide material gains, as current Codex/Opus show weaknesses once their context windows are mostly full, but we'll see.
Per updated docs (https://developers.openai.com/api/docs/guides/latest-model), it supersedes GPT-5.3-Codex, which is an interesting move.
creamyhorror | 2 hours ago | 3 Comment
It might be my AGENTS.md requiring clearer, simpler language, but at least 5.4's doing a good job of following the guidelines. 5.3-Codex wasn't so great at simple, clear writing.
Alifatisk | 1 hour ago | 3 Comment
We got:
- GPT-5.1
- GPT-5.2 Thinking
- GPT-5.3 (codex)
- GPT-5.3 Instant
- GPT-5.4 Thinking
- GPT-5.4 Pro
Who’s to blame for this ridiculous path they are taking? I’m so glad I am not a Chat user, because this adds so much unnecessary cognitive load.
The good news here is the support for a 1M context window; it has finally caught up to Gemini.
kgeist | 1 hour ago | 4 Comment
>Today, we’re releasing GPT‑5.4 in ChatGPT (as GPT‑5.4 Thinking),
>Note that there is not a model named GPT‑5.3 Thinking
They held out for eight months without a confusing numbering scheme :)
gavinray | 3 hours ago | 4 Comment
It's very similar to "Battle Brothers", and the fact that RPG games require art assets, AI for enemy moves, and a host of other logical systems makes it all the more impressive.
mattas | 4 hours ago | 16 Comment
They show an example of 5.4 clicking around in Gmail to send an email.
I still think this is the wrong interface for interacting with the internet. Why not use the Gmail API? There would be no need for any screenshot interpretation or coordinate-based clicking.
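For comparison, an API-based send involves no screenshots at all. A rough Python sketch, assuming credentials and addresses that are purely hypothetical: the Gmail API's `users.messages.send` endpoint expects a base64url-encoded RFC 2822 message in a `raw` field, which can be built with the standard library alone.

```python
import base64
from email.message import EmailMessage

def build_raw_message(sender: str, to: str, subject: str, body: str) -> dict:
    """Build the base64url-encoded payload that the Gmail API's
    users.messages.send endpoint expects in its 'raw' field."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = to
    msg["Subject"] = subject
    msg.set_content(body)
    encoded = base64.urlsafe_b64encode(msg.as_bytes()).decode("ascii")
    return {"raw": encoded}

# With an authorized google-api-python-client service object, sending is:
#   service.users().messages().send(userId="me", body=payload).execute()
# No screenshot interpretation, no coordinate clicking -- a structured request.
payload = build_raw_message("me@example.com", "you@example.com",
                            "Hello", "Sent via API, not clicks.")
```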
consumer451 | 1 hour ago | 1 Comment
> Theme park simulation game made with GPT‑5.4 from a single lightly specified prompt, using Playwright Interactive for browser playtesting and image generation for the isometric asset set.
Is "Playwright Interactive" a skill that takes screenshots in a tight loop with code changes, or is there more to it?
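If that guess is right, the loop would be structurally simple. A minimal sketch with the browser, model, and editor all stubbed out as injected callables; every name here is hypothetical, since OpenAI hasn't documented what "Playwright Interactive" actually does (in a real setup, `capture` would be something like Playwright's `page.screenshot()`).

```python
from typing import Callable, Optional

def playtest_loop(capture: Callable[[], bytes],
                  propose_fix: Callable[[bytes], Optional[str]],
                  apply_patch: Callable[[str], None],
                  max_iters: int = 5) -> int:
    """Tight screenshot -> critique -> patch loop, as the parent comment
    guesses. Returns the number of patch rounds before the model was
    satisfied (propose_fix returning None) or the iteration cap was hit."""
    for i in range(max_iters):
        shot = capture()
        patch = propose_fix(shot)  # None means no further changes wanted
        if patch is None:
            return i
        apply_patch(patch)
    return max_iters

# Stubbed run: the "model" requests two fixes, then approves.
fixes = iter(["fix-1", "fix-2", None])
applied = []
rounds = playtest_loop(lambda: b"png-bytes",
                       lambda shot: next(fixes),
                       applied.append)
```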
zone411 | 1 hour ago | 1 Comment
GPT-5.4 extra high scores 94.0 (GPT-5.2 extra high scored 88.6).
GPT-5.4 medium scores 92.0 (GPT-5.2 medium scored 71.4).
GPT-5.4 no reasoning scores 32.8 (GPT-5.2 no reasoning scored 28.1).
nickysielicki | 3 hours ago | 9 Comment
In practice, if I buy $200/mo codex, can I basically run 3 codex instances simultaneously in tmux, like I can with claude code pro max, all day every day, without hitting limits?
senko | 54 minutes ago | 1 Comment
This is on the edge of what the frontier models can do. For 5.4, the result is better than 5.3-Codex and Opus 4.6. (Edit: it's nowhere near the RPG game from their blog post, which was presumably much more thoroughly specced out and used a better engineering setup.)
I also tested it with a non-trivial task I had to do on an existing legacy codebase, and it breezed through a task that Claude Code with Opus 4.6 was struggling with.
I don't know when Anthropic will fire back with their own update, but until then I'll spend a bit more time with Codex CLI and GPT 5.4.
denysvitali | 4 hours ago | 3 Comment
gpt-5.4
Input: $2.50 /M tokens
Cached: $0.25 /M tokens
Output: $15 /M tokens
---
gpt-5.4-pro
Input: $30 /M tokens
Output: $180 /M tokens
Wtf
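The listed rates put Pro at 12x the standard model on both input and output. A quick back-of-the-envelope in Python, using a hypothetical 50k-tokens-in / 10k-tokens-out request:

```python
# Per-million-token rates quoted above (USD).
RATES = {
    "gpt-5.4":     {"input": 2.50, "output": 15.0},
    "gpt-5.4-pro": {"input": 30.0, "output": 180.0},
}

def cost(model: str, input_toks: int, output_toks: int) -> float:
    """Uncached cost in dollars for one request at the quoted rates."""
    r = RATES[model]
    return (input_toks * r["input"] + output_toks * r["output"]) / 1_000_000

# Hypothetical request: 50k tokens in, 10k tokens out.
base = cost("gpt-5.4", 50_000, 10_000)      # 0.125 + 0.15 = $0.275
pro = cost("gpt-5.4-pro", 50_000, 10_000)   # 1.50  + 1.80 = $3.30
```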
timpera | 3 hours ago | 1 Comment
This was definitely missing before, and a frustrating difference when switching between ChatGPT and Codex. Great addition.
butILoveLife | 1 hour ago | 1 Comment
I imagine they added a feature or two, and the router will continue to give people 70B-parameter-like responses when they don't ask math or coding questions.
ZeroCool2u | 4 hours ago | 5 Comment
Not sure if this is more concerning for the test-time compute paradigm or the underlying model itself.
Maybe I'm misunderstanding something though? I'm assuming 5.4 and 5.4 Thinking are the same underlying model and that's not just marketing.
nickandbro | 3 hours ago | 2 Comment
https://www.svgviewer.dev/s/gAa69yQd
Not the best pelican compared to Gemini 3.1 Pro, but I am sure it does remarkably better with coding or Excel, given those are part of its measured benchmarks.
paxys | 3 hours ago | 2 Comment
A couple months later:
"We are deprecating the older model."
jstummbillig | 2 hours ago | 1 Comment
This becomes increasingly less clear to me, because the more interesting work will be the agent going off for 30+ minutes on high or extra high (it's mostly one of the two), and that's a long time to wait and an infeasible amount of code to A/B test.
smusamashah | 1 hour ago | 1 Comment
GPT is not even close to Claude in terms of responding to BS.
7777777phil | 3 hours ago | 3 Comment
I'd believe it on those specific tasks. Near-universal adoption in software still hasn't moved DORA metrics. The model gets better every release; the output doesn't follow. I just had a closer look at those productivity metrics this week: https://philippdubach.com/posts/93-of-developers-use-ai-codi...
cj | 3 hours ago | 3 Comment
Interesting, the "Health" category seems to report worse performance compared to 5.2.
koakuma-chan | 2 hours ago | 3 Comment
numerusformassistant to=functions.ReadFile մեկնաբանություն 天天爱彩票网站json {"path":
HardCodedBias | 3 hours ago | 1 Comment
In terms of writing and research, even Gemini, with a good prompt, is close to usable. That's likely not a differentiator.
iamleppert | 3 hours ago | 1 Comment
Not including the Chinese models is also obviously done to make it appear like they aren't as cooked as they really are.