MODEL
GPT-Realtime-2
Overview
GPT-Realtime-2 is OpenAI’s GPT-5-class real-time voice model released to the API in May 2026. It brings GPT-5-level reasoning into a live conversational voice interface, priced by the token rather than by the minute.
Timeline
- 2026-05-11-AI-Digest: OpenAI releases GPT-Realtime-2 to the API alongside GPT-Realtime-Translate and GPT-Realtime-Whisper. GPT-Realtime-2 is priced at $32/1M audio input tokens and $64/1M audio output tokens, making it the highest-priced of the three and the only one billed by the token rather than by the minute. Use cases flagged by OpenAI: voice-to-action, systems-to-voice, and live voice-to-voice conversation. Token billing (versus per-minute billing for Translate and Whisper) reflects the model’s higher reasoning density and sets it apart from the utility-bandwidth use cases the other two models serve.
Key Developments
- GPT-5-Class Reasoning in Voice: First production-ready API model from OpenAI to bring GPT-5-level reasoning into a real-time voice interface, closing the historical tradeoff between voice latency and reasoning depth.
- Token-Based Voice Billing: The $32/1M input and $64/1M output token pricing signals that OpenAI is treating GPT-Realtime-2 as a reasoning model with a voice interface, not a transcription service, on the logic that high reasoning density warrants token-level granularity.
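To make the token rates quoted above concrete, here is a minimal sketch of the arithmetic for estimating what one session costs. It uses only the two audio rates from this note; the function name `session_cost_usd` and the example token counts are hypothetical, and the sketch ignores any other charges (such as text tokens or cached input) that may apply.

```python
from decimal import Decimal

# Rates quoted in this note (USD per 1M audio tokens).
INPUT_USD_PER_MILLION = Decimal("32")
OUTPUT_USD_PER_MILLION = Decimal("64")
MILLION = Decimal(1_000_000)


def session_cost_usd(input_tokens: int, output_tokens: int) -> Decimal:
    """Estimate audio cost in USD for one session at the quoted rates."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    return (
        Decimal(input_tokens) * INPUT_USD_PER_MILLION
        + Decimal(output_tokens) * OUTPUT_USD_PER_MILLION
    ) / MILLION


# Hypothetical session: 50,000 audio input tokens, 20,000 audio output tokens.
cost = session_cost_usd(50_000, 20_000)
print(cost)  # 2.88
```

Because output tokens cost twice as much as input tokens, a session dominated by model speech costs more than one dominated by listening, even at the same total token count.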
Related
See also: OpenAI, GPT-Realtime-Translate, GPT-Realtime-Whisper, MOC - Developer Tools.