The AI investment landscape is undergoing a fundamental shift as corporate “token panic” replaces the early excitement of the AI infrastructure gold rush. Hyperscalers and AI labs are still posting explosive revenue growth (according to The Economist, Anthropic’s ARR reached $45 billion in May), but a new era of cost-conscious monetization is forcing companies to weigh AI spending against strict budget constraints.
Why is the AI “Token Panic” happening now?
The shift from a “growth at all costs” mindset to a focus on efficiency stems from the realization that AI usage at scale is far more expensive than expected. As organizations deployed agents and advanced reasoning models, token consumption surged. In one widely discussed example, Uber burned through its entire annual AI budget in just four months.
This “AI opex” (operating expense) burden has become a significant boardroom issue. Sam Altman has acknowledged the shift, noting that companies were initially comfortable with their spending but that cost has suddenly become the second-biggest theme. By the first quarter of 2026, the question had changed from “how can we use this?” to “my company spent my entire 2026 budget in Q1; can you make this more efficient?”
How are model providers changing their pricing?
To wind down open-ended subsidies, major providers including OpenAI, Microsoft, Google, and Anthropic have pivoted toward usage-based billing, which ties what customers pay directly to compute and token consumption rather than to a flat-rate subscription. Recent milestones include:
- April 2: OpenAI transitioned Codex pricing to align with API token usage.
- May 19: Google moved Gemini subscriptions from prompt limits to a “compute-used” model.
- June 1: Microsoft’s GitHub Copilot officially transitioned to usage-based billing.
This added transparency often exposes hidden costs. Some newer models keep the same per-token list price as their predecessors but use a different tokenizer that can need up to 35% more tokens to process the same text. At an unchanged list price, that raises the effective cost of the same workload by up to 35%.
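The effect of a tokenizer change on a usage-based bill can be sketched in a few lines of Python. This is a minimal illustration, assuming the bill is simply tokens consumed times a per-million-token list price; the function `effective_cost`, the 500-million-token workload, and the $10-per-million price are hypothetical placeholders, and only the 1.35 factor (the “up to 35%” upper bound) comes from the text.

```python
def effective_cost(tokens_old_tokenizer: float, price_per_million: float,
                   tokenizer_inflation: float = 1.0) -> float:
    """Return the dollar cost of a workload under simple usage-based billing.

    Assumed model (hypothetical): cost = tokens * price_per_million / 1e6.
    `tokenizer_inflation` scales the token count for a newer tokenizer that
    needs more tokens to encode the same text (1.35 means 35% more tokens).
    """
    tokens = tokens_old_tokenizer * tokenizer_inflation
    return tokens * price_per_million / 1_000_000


# Hypothetical workload and list price, identical for both model generations.
WORKLOAD_TOKENS = 500_000_000   # tokens needed under the older tokenizer
LIST_PRICE = 10.0               # dollars per million tokens (placeholder)

old_bill = effective_cost(WORKLOAD_TOKENS, LIST_PRICE)
new_bill = effective_cost(WORKLOAD_TOKENS, LIST_PRICE, tokenizer_inflation=1.35)

print(f"old tokenizer: ${old_bill:,.2f}")
print(f"new tokenizer: ${new_bill:,.2f}")
print(f"effective price increase: {new_bill / old_bill - 1:.0%}")
```

Because the list price never changes in this sketch, the entire increase comes from the token count, which is why comparing per-token prices alone can understate the real cost of switching models.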
Will open-source models challenge the incumbents?
The gap between proprietary frontier models and open-source alternatives is narrowing, particularly on cost-efficiency. GPT 5.5 and Opus 4.8 still lead on benchmark performance, but Chinese alternatives such as Qwen 3.7 and DeepSeek V4 are priced 10x to 25x lower.
DeepSeek’s V4 Pro and V4 Flash have already seen significant adoption, climbing to the top of OpenRouter’s rankings by tokens processed since their April release. For many application-layer companies, the strategy is shifting toward post-training open-source base models to build specialized tools for coding or legal workflows. When a “good enough” model delivers a better return on investment, this approach bypasses the need for the most expensive frontier models.
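The scale of a 10x to 25x price gap is easier to see on a concrete monthly workload. The sketch below is a minimal Python illustration, assuming simple per-token billing; `monthly_bill`, the 2-billion-token monthly volume, and the $25-per-million frontier price are hypothetical placeholders, and only the 10x and 25x multiples come from the text.

```python
def monthly_bill(tokens_per_month: float, price_per_million: float) -> float:
    """Usage-based bill in dollars: tokens consumed times price per million tokens."""
    return tokens_per_month * price_per_million / 1_000_000


# Hypothetical figures for illustration only, not real provider prices.
TOKENS_PER_MONTH = 2_000_000_000   # 2 billion tokens per month
FRONTIER_PRICE = 25.0              # dollars per million tokens (placeholder)

frontier = monthly_bill(TOKENS_PER_MONTH, FRONTIER_PRICE)

savings = {}
for gap in (10, 25):  # the low and high ends of the cited price gap
    cheaper = monthly_bill(TOKENS_PER_MONTH, FRONTIER_PRICE / gap)
    savings[gap] = frontier - cheaper
    print(f"{gap}x cheaper: ${cheaper:,.0f}/month, "
          f"saving ${savings[gap]:,.0f} ({savings[gap] / frontier:.0%})")
```

For a k-times price gap the saving is 1 - 1/k of the frontier bill, so widening the gap from 10x to 25x only lifts savings from 90% to 96%. Most of the benefit arrives with the first order of magnitude.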
FAQ: Understanding the Current AI Market
Is the AI trade over?
No. Revenues for labs and hyperscalers continue to grow, and frontier models still provide meaningful value in high-stakes fields. However, the market is moving from a phase of speculative infrastructure build-out to a phase of rigorous ROI analysis.
What is the “DRAM Tax”?
The term refers to the high cost of the memory and compute required to run intensive AI models. As companies become more cost-sensitive, they are prioritizing observability and efficient architectures to minimize this tax.
Why are companies looking at Chinese AI models?
The primary driver is the massive price disparity. With Chinese models offering comparable performance for specialized tasks at a fraction of the cost, enterprises are weighing those savings against integration and security considerations.
Are you adjusting your portfolio to account for the shift from AI infrastructure to AI efficiency? Share your thoughts in the comments below or track our model portfolio updates to see how we are navigating these thematic shifts.
