OpenAI unveiled GPT-5.6 Sol, Terra, and Luna, its most capable model family to date, on June 26, 2026, but launched the family as a limited preview available only to a small group of government-vetted trusted partners via its API and Codex platform.

Three Models, Three Price Points

The release introduces a tiered lineup: Sol, the flagship model built for complex agentic tasks and deep reasoning; Terra, a balanced everyday option comparable in quality to GPT-5.5 at roughly half the cost; and Luna, a fast, low-latency model for high-volume workloads.

Pricing per million tokens: Sol at $5 input / $30 output, Terra at $2.50 / $15, and Luna at $1 / $6.
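To make the price gap concrete, the listed rates can be turned into a per-request estimate. The following is a minimal sketch, not an official calculator: the function name `estimate_cost` and the lowercase model keys are illustrative assumptions, and only the per-million-token prices come from the figures above.

```python
# USD per million tokens as (input, output), taken from the prices listed above.
# The lowercase keys are illustrative labels, not confirmed API model identifiers.
PRICES_PER_MILLION = {
    "sol": (5.00, 30.00),
    "terra": (2.50, 15.00),
    "luna": (1.00, 6.00),
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one request from its token counts."""
    input_price, output_price = PRICES_PER_MILLION[model]
    total = input_tokens * input_price + output_tokens * output_price
    return total / 1_000_000


if __name__ == "__main__":
    # Hypothetical workload: 200,000 input tokens and 20,000 output tokens.
    for name in PRICES_PER_MILLION:
        print(f"{name}: ${estimate_cost(name, 200_000, 20_000):.2f}")
```

For that hypothetical workload the estimate comes to $1.60 on Sol, $0.80 on Terra, and $0.32 on Luna, so each step down the lineup cuts the bill by at least half.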

On Terminal-Bench 2.1, a benchmark for coding and command-line reasoning, Sol Ultra scored 91.9%, compared with 88.8% for Sol, 88.0% for GPT-5.5, and 84.3% for Luna. OpenAI also reports improvements on biology (GeneBench v1) and cybersecurity (ExploitBench, ExploitGym) evaluations.

New Reasoning Modes

Sol introduces a max reasoning-effort setting for deliberate, step-by-step analysis, and an ultra mode that delegates subtasks to parallel subagents to tackle complex multi-step work. Prompt caching has also been improved, with cache writes priced at 1.25× uncached input rates.
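The cache-write surcharge is simple arithmetic on the input price. The sketch below assumes the 1.25× multiplier applies to each model's listed uncached input rate, which the announcement does not spell out per model. The function name `cache_write_cost` and the model keys are illustrative, and cache-read pricing is left out because no figure is given for it.

```python
# Multiplier for cache writes relative to the uncached input rate, as reported.
CACHE_WRITE_MULTIPLIER = 1.25

# Uncached input prices in USD per million tokens, from the published price list.
# Assumption: the same multiplier applies to all three models.
INPUT_PRICE_PER_MILLION = {"sol": 5.00, "terra": 2.50, "luna": 1.00}


def cache_write_cost(model: str, cached_tokens: int) -> float:
    """USD cost of writing `cached_tokens` tokens into the prompt cache."""
    rate = INPUT_PRICE_PER_MILLION[model] * CACHE_WRITE_MULTIPLIER
    return cached_tokens * rate / 1_000_000


if __name__ == "__main__":
    # Cost of caching a one-million-token prompt prefix on each model.
    for name in INPUT_PRICE_PER_MILLION:
        print(f"{name}: ${cache_write_cost(name, 1_000_000):.3f}")
```

Under that assumption, writing one million tokens to the cache costs $6.25 on Sol, $3.125 on Terra, and $1.25 on Luna.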

Government Oversight Sets a New Precedent

The restricted rollout follows a June 2 executive order from the Trump administration directing AI companies to give the US government early access to frontier models before broader release. OpenAI complied but stated: “We don’t believe this kind of government access process should become the long-term default.”

The precedent now applies to both OpenAI and Anthropic, whose Claude Mythos 5 was suspended for two weeks under a separate emergency directive before being partially restored for critical infrastructure organizations.

Broader access across ChatGPT, Codex, and the API is expected “soon.” A Cerebras-hosted version of Sol capable of up to 750 tokens per second is set to launch in July for select customers.

Safety Investment

OpenAI conducted 700,000 A100-equivalent GPU hours of automated red-teaming before launch. The safety stack covers refusals trained into the model, real-time classifiers for cyber and biology misuse, account-level review, and differentiated access monitoring for higher-risk users.