OpenAI cut GPT-4o mini input pricing from $0.15 to $0.06 per million tokens, a 60% reduction that lands the model below the previous GPT-3.5 floor. Output pricing dropped proportionally. The move follows a pattern: whenever a model tier matures, OpenAI reprices it aggressively to defend volume against open-source alternatives and push developers up-market to frontier models.

The inference cost economics here are worth working through. At $0.06 per million input tokens, a production application sending 50 input tokens per request and handling 10 million requests per day consumes 500 million input tokens daily and pays roughly $30 for them, down from $75 at the old price. At that price point, the cost barrier to building latency-tolerant, high-volume AI features essentially disappears for any funded startup.
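The arithmetic above can be reproduced in a few lines. This is a minimal sketch, not a billing tool: the function name `daily_input_cost` is hypothetical, the prices and traffic figures are the ones quoted in this post, and output tokens are ignored because no output price is given here.

```python
def daily_input_cost(tokens_per_request: int,
                     requests_per_day: int,
                     price_per_million_usd: float) -> float:
    """Return the daily input-token spend in USD (input tokens only)."""
    total_tokens = tokens_per_request * requests_per_day
    return total_tokens / 1_000_000 * price_per_million_usd


# Figures taken from the text: 50 input tokens per request, 10M requests per day.
new_cost = daily_input_cost(50, 10_000_000, 0.06)  # price after the cut
old_cost = daily_input_cost(50, 10_000_000, 0.15)  # price before the cut

print(f"new: ${new_cost:.2f}/day")  # new: $30.00/day
print(f"old: ${old_cost:.2f}/day")  # old: $75.00/day
```

Swapping in a real workload's token counts is the useful part: the daily figure scales linearly with both prompt length and request volume.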

What changes: applications that were cost-constrained on GPT-4o mini can now run richer prompting strategies (longer system prompts, more examples, more context per request) without blowing their unit economics. Teams that squeezed classification, extraction, routing, and summarization pipelines onto cheaper open models for cost reasons now have a reason to revisit the tradeoff.
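One way to size that headroom is to ask how long a prompt can get before input spend returns to its old level. The sketch below assumes spend is held constant and only input tokens count; `budget_neutral_tokens` is a hypothetical helper, and prices are passed as strings so that `Decimal` avoids floating-point rounding.

```python
from decimal import Decimal


def budget_neutral_tokens(current_tokens: int,
                          old_price: str,
                          new_price: str) -> int:
    """Largest prompt size (in input tokens) that costs no more at the
    new price than current_tokens cost at the old price."""
    scaled = Decimal(current_tokens) * Decimal(old_price) / Decimal(new_price)
    return int(scaled)  # truncate so the result never exceeds the old spend


# Prices from the text: $0.15 before the cut, $0.06 after (per million input tokens).
print(budget_neutral_tokens(50, "0.15", "0.06"))    # 125
print(budget_neutral_tokens(1000, "0.15", "0.06"))  # 2500
```

At these prices the ratio is 2.5x, so a prompt can grow by a factor of two and a half before the input bill matches what it was before the cut.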

What doesn’t change: the latency floor of API-based inference. For real-time consumer applications where sub-100ms response times matter, the cost reduction does not solve the architectural problem of a network round trip to a hosted model. The on-device inference players (Qualcomm, Apple, NVIDIA’s Jetson line) are building for that use case, not the one this pricing shift affects.

The broader pattern: every 12-18 months, models that were frontier-tier get repriced into commodity territory. The developers who build on top of frontier models today are building on top of what will be commodity infrastructure in two years. That’s been the right bet consistently.
