The Shifting Landscape: Why Inference Is the New Frontier of AI
For years, the artificial intelligence narrative was dominated by the massive compute requirements of the training phase. Since late last year, however, industry analysts have identified a pivot: inference workloads are rapidly becoming the primary driver of demand for AI computing power. As organizations move from experimental pilots to full-scale production, demand for processors capable of running these applications efficiently is surging.
Deloitte expects inference to account for two-thirds of AI computing power this year, up from 50% in 2025. This shift has created a massive market opportunity, with estimates suggesting the market for inference-focused AI chips could reach $50 billion this year alone. Meanwhile, projections from other industry observers indicate that AI inference workloads in data centers could grow at a compound annual rate of 35% through 2030.
The Race for Hardware Efficiency
The race to capitalize on this growth is intense. Established semiconductor giants, including Nvidia, Advanced Micro Devices, Broadcom, and Intel, are all vying to engineer the most cost-effective processors for data centers and edge computing. Despite this heavy competition, market observers are increasingly looking toward Arm Holdings as a pivotal player in the inference era.
Unlike training, which is extremely compute-intensive, inference can often be handled by a central processing unit (CPU). Arm’s focus on energy-efficient architecture has made it a preferred partner for makers of both consumer-electronics and enterprise chips. Nvidia, for instance, uses Arm’s architecture for its Grace server CPU and its newer Vera CPU, the latter of which is designed to support agentic AI applications. Nvidia has begun delivering these CPUs to major organizations including Anthropic, SpaceX, Oracle, and OpenAI.
Why Arm’s Business Model Stands Out
Arm’s position in the ecosystem is unique because it acts as a “picks-and-shovels” provider. By licensing its intellectual property (IP) to a diverse range of companies, from hyperscalers like Google and Amazon to custom chip designers like Broadcom, Arm shares in the success of the broader AI market rather than relying on a single product line.
The Revenue Scaling Strategy
Arm’s financial model is built on two primary pillars: licensing fees and royalties. Crucially, the royalty rate for its latest Armv9 architecture is nearly double that of its predecessor. This, combined with a move into developing its own silicon, has created a robust growth trajectory. The company anticipates its royalty revenue will increase at a compound annual growth rate of 20% between fiscal 2026 and 2031.
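To put the royalty projection in perspective, the compounding arithmetic can be sketched in a few lines of Python. This is a minimal illustration, not Arm's own model: it assumes growth compounds once per year across the five fiscal years from 2026 to 2031, and the function name growth_multiple is hypothetical.

```python
def growth_multiple(cagr: float, years: int) -> float:
    """Return the factor by which a quantity grows at a constant annual rate."""
    return (1.0 + cagr) ** years


# Figures from the article: 20% CAGR between fiscal 2026 and fiscal 2031.
CAGR = 0.20
# Assumption: five annual compounding periods between the two fiscal years.
YEARS = 2031 - 2026

multiple = growth_multiple(CAGR, YEARS)
print(f"Royalty revenue multiple after {YEARS} years: {multiple:.2f}x")
```

Under those assumptions, a 20% compound annual growth rate would leave royalty revenue at roughly 2.5 times its fiscal 2026 level by fiscal 2031.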
Frequently Asked Questions
- What is the difference between AI training and inference?
Training is the process of teaching an AI model using large datasets, while inference is the process of the model using that “knowledge” to make predictions or decisions in real time.
- Why is Arm Holdings considered a key player in inference?
Arm’s architecture is highly energy-efficient, making it well suited to the massive scale required for inference in both data centers and edge devices like smartphones.
- How does Arm make money?
Arm generates revenue through upfront licensing fees for its chip designs and ongoing royalties on every chip sold that uses its architecture.
The semiconductor industry is evolving rapidly as AI integration moves into the mainstream.
