MODEL

GPT-5.4

model · topic-note · openai

Overview

GPT-5.4 is OpenAI’s latest flagship model line, the fourth incremental release within the fifth major generation. The family includes both thinking and production-optimized variants, along with lightweight alternatives (Mini and Nano). GPT-5.4 improves factual accuracy, context handling, and token efficiency while expanding pricing options for cost-sensitive applications.

Timeline

  • 2026-03-09-AI-Digest - GPT-5.4 model line introduced and initial benchmarks released
  • 2026-03-16-AI-Digest - Thinking variant capabilities and performance analysis
  • 2026-03-18-AI-Digest - Factual accuracy improvements and evaluation results
  • 2026-03-23-AI-Digest - Mini variant benchmarking and lightweight model performance
  • 2026-03-24-AI-Digest - SWE-bench and software engineering evaluation updates
  • 2026-03-27-AI-Digest - Nano variant pricing and efficiency announcements
  • 2026-04-04-AI-Digest - GPT-5.4 Thinking variant scores 75.0% on OSWorld-Verified, surpassing human-level desktop task automation; Responses API extended with a shell tool, an agent execution loop, and hosted container workspaces.
  • 2026-04-05-AI-Digest - GPT-5.4 Thinking result confirmed at 75.0% on OSWorld-Verified, above the 72.4% human baseline.
  • 2026-04-06-AI-Digest - Referenced in vibe coding and agentic workflow discussions alongside Codex.
  • 2026-04-09-AI-Digest - GPT-5.4 ties Gemini 3.1 Pro Preview at the top of the Artificial Analysis Intelligence Index v4.0 with a score of 57, ahead of Claude Opus 4.6 (53) and Meta’s newly launched Muse Spark (52), confirming OpenAI’s continued frontier benchmark leadership even as commercial run-rate trails Anthropic.
  • 2026-04-13-AI-Digest - OpenAI replaces o1-mini with o3-mini as the default reasoning model (3x faster) and launches Flex Compute pricing for o3 with a 30% off-peak discount, signaling inference cost pressure even for flagship reasoning models.
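The agent execution loop mentioned in the Responses API updates above can be illustrated with a minimal sketch. This is not the actual OpenAI API: the `run_model` stub, the message format, and the tool schema are all assumptions standing in for real Responses API calls.

```python
import subprocess

def run_model(messages):
    """Stub standing in for a GPT-5.4 Responses API call (hypothetical).
    Returns a shell tool call on the first turn, then a final answer."""
    if not any(m["role"] == "tool" for m in messages):
        return {"type": "tool_call", "tool": "shell", "command": "echo hello"}
    return {"type": "final", "text": "Command ran successfully."}

def agent_loop(task, max_steps=5):
    """Minimal agent execution loop: call the model, run any requested
    shell command in a subprocess, and feed the output back."""
    messages = [{"role": "user", "content": task}]
    for _ in range(max_steps):
        reply = run_model(messages)
        if reply["type"] == "final":
            return reply["text"]
        # Execute the model-requested command (a hosted container
        # workspace would sandbox this in practice).
        result = subprocess.run(
            reply["command"], shell=True, capture_output=True, text=True
        )
        messages.append({"role": "tool", "content": result.stdout})
    return "Step limit reached."

print(agent_loop("Say hello via the shell"))
```

The loop terminates either when the model emits a final answer or when the step budget is exhausted, which is the usual safeguard against runaway tool-calling.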

Model Variants

GPT-5.4 Base/Thinking

  • Context window - 1M tokens
  • Reasoning capability - Advanced planning and step-by-step inference
  • Factual accuracy - 33% fewer factual errors versus predecessors

GPT-5.4 Mini

  • SWE-Bench Pro - 54.4% performance score
  • Target use case - Balanced performance and cost for general applications
  • Cost profile - Mainstream pricing tier

GPT-5.4 Nano

  • Input pricing - $0.20 per million tokens
  • Output pricing - $1.25 per million tokens
  • Use case - Ultra-low-cost inference for high-volume applications
  • Trade-off - Reduced model capability for minimal latency and maximum throughput
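Given the listed Nano pricing ($0.20 input / $1.25 output per million tokens), per-request cost is a simple linear formula. The helper below is an illustrative sketch based on those two published numbers, not an official billing tool.

```python
NANO_INPUT_PER_M = 0.20   # USD per 1M input tokens (from the specs above)
NANO_OUTPUT_PER_M = 1.25  # USD per 1M output tokens

def nano_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimated GPT-5.4 Nano cost in USD for a single request."""
    return (input_tokens * NANO_INPUT_PER_M
            + output_tokens * NANO_OUTPUT_PER_M) / 1_000_000

# A request with 10k prompt tokens and 2k completion tokens:
print(round(nano_cost(10_000, 2_000), 6))  # → 0.0045
```

At these rates a million such requests would cost about $4,500, which is the kind of arithmetic behind the "high-volume applications" positioning.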

Key Specs & Benchmarks

Token Efficiency

  • Token reduction - 47% fewer tokens required than predecessor models
  • Implication - Improved inference speed and reduced API costs for equivalent tasks
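At fixed per-token pricing, a 47% token reduction translates directly into proportional cost and latency savings. A quick back-of-the-envelope check, using illustrative numbers rather than measured figures:

```python
def tokens_after_reduction(predecessor_tokens: int,
                           reduction: float = 0.47) -> int:
    """Tokens GPT-5.4 would use for a task that consumed
    `predecessor_tokens` on a predecessor model, per the 47% figure."""
    return round(predecessor_tokens * (1 - reduction))

# A task that previously consumed 20,000 tokens:
print(tokens_after_reduction(20_000))  # → 10600
```

Since API billing is per token, the same 47% factor applies to the cost of the reduced portion of the traffic.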

Context Handling

  • Maximum context - 1M tokens across all variants
  • Practical advantage - Support for full-document analysis and extended reasoning
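Even with a 1M-token window, it is worth checking that a document plus the expected output fits before sending it. The sketch below uses the common ~4-characters-per-token heuristic, which is an assumption; an exact count requires the model's actual tokenizer.

```python
CONTEXT_WINDOW = 1_000_000  # tokens, per the specs above

def fits_in_context(text: str, reserved_output: int = 8_192) -> bool:
    """Rough feasibility check: estimate input tokens with the
    ~4 chars/token heuristic and reserve room for the reply."""
    estimated_tokens = len(text) // 4
    return estimated_tokens + reserved_output <= CONTEXT_WINDOW

# ~500k characters ≈ 125k estimated tokens, well under 1M:
print(fits_in_context("word " * 100_000))  # → True
```

For full-document analysis workloads, a check like this decides between a single long-context call and a chunked pipeline.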

Accuracy Improvements

  • Factual errors - 33% reduction across benchmark suites
  • Reasoning quality - Improvement through thinking variant

Pricing Tiers

GPT-5.4 introduces stratified pricing to serve diverse market segments:

  • Pro/Thinking variant - Premium tier for complex reasoning
  • Base variant - Standard production tier
  • Mini - Mid-tier efficiency option
  • Nano - Cost-optimized tier with $0.20/$1.25 token pricing