OpenAI and Broadcom have announced the development of “Jalapeño,” a custom AI inference processor designed to optimize performance per watt for large language models. The accelerator, built in a nine-month development cycle, is intended to power gigawatt-scale data centers alongside partners including Microsoft, with deployment expected to begin in 2026. According to OpenAI leadership, the chip is architected specifically for LLM inference rather than general-purpose computing.
Why Is OpenAI Building Its Own Silicon?
OpenAI is moving toward a “full-stack” infrastructure strategy to gain control over the hardware running its models. By designing chips, kernels, and networking systems, the company aims to reduce the cost and latency associated with serving AI to users. According to OpenAI President Greg Brockman, this approach is intended to make intelligence “more abundant” and affordable. Unlike general-purpose GPUs, Jalapeño is a “blank-slate” design, built specifically to handle the memory movement and networking patterns required by frontier LLMs.

How Does Jalapeño Compare to Existing Accelerators?
Early laboratory testing indicates that Jalapeño will offer performance per watt that is “substantially better” than current state-of-the-art accelerators, according to OpenAI. While final benchmarks have not been released, the architecture is designed to minimize data movement, which is often a primary bottleneck in high-performance computing. By balancing compute, memory, and networking resources, the chip aims to achieve realized utilization closer to its theoretical peak than general-purpose alternatives.

| Feature | Standard GPU | Jalapeño (Inference Processor) |
|---|---|---|
| Design Focus | General-purpose AI/Graphics | Specialized LLM Inference |
| Target Outcome | Versatility | Performance per Watt |
| Timeline | Multi-year cycles | 9-month development cycle |
What Role Did AI Play in the Chip’s Development?
The nine-month development cycle for Jalapeño is the fastest ASIC development cycle in high-performance semiconductors to date, according to OpenAI. The company utilized its own AI models to assist in the design and optimization process. This represents a shift in hardware engineering: using software to design the very infrastructure that will eventually host more advanced software. Broadcom provided the silicon implementation and networking technology, specifically utilizing its Tomahawk networking silicon to support the large-scale production requirements.
What Happens to Data Centers in 2026?
The collaboration between OpenAI, Broadcom, and system integrator Celestica is focused on the transition to “gigawatt-scale” data centers. These facilities require specialized hardware to manage the massive power and thermal loads associated with modern AI. According to Broadcom CEO Hock Tan, the partnership is part of a “multi-generation roadmap” that will see the deployment of these custom accelerators starting in 2026. This infrastructure is intended to support not just OpenAI’s internal needs, but the requirements of its broader partner ecosystem, including Microsoft.

Frequently Asked Questions
- Will Jalapeño replace NVIDIA GPUs?
Jalapeño is designed as a specialized inference accelerator. OpenAI’s strategy focuses on building its own custom silicon for specific LLM workloads, though it will continue to operate within a broader compute ecosystem.
- When will these chips be available?
Engineering samples are currently running in labs. Broadcom and OpenAI have stated that large-scale deployment is expected to begin by the end of 2026.
- Why is “performance per watt” so important?
As AI models grow in size, the electricity required to run them becomes a significant operational expense and a physical constraint on data center capacity. Higher performance per watt allows companies to deliver more intelligence without a linear increase in power consumption.
Are you interested in how hardware shifts will change the economics of AI? Subscribe to our newsletter for deep dives into the infrastructure powering the next generation of LLMs.
