Google in talks with Marvell to build new AI chips

by Chief Editor

The Great Silicon Shift: Why Google’s Move Toward Custom AI Chips Changes Everything

For years, the AI gold rush has had one primary arms dealer: Nvidia. Their GPUs have been the undisputed engine behind the explosion of Large Language Models (LLMs). But the wind is shifting. Google’s recent moves to collaborate with Marvell Technology on specialized AI chips signal a broader, more strategic trend: the era of “General Purpose” AI hardware is ending, and the era of bespoke silicon is beginning.

When a giant like Alphabet decides to build its own memory processing units (MPUs) and next-generation Tensor Processing Units (TPUs), it isn’t just about saving a few dollars on hardware. It’s about solving the fundamental physics problem of AI: the “Memory Wall.”

Did you know? Google was one of the first companies to anticipate the AI boom, developing its first TPU as early as 2015 to accelerate the workloads of TensorFlow, its open-source machine learning framework.

Breaking the Memory Wall: The Rise of the MPU

To understand why Google is developing a memory processing unit (MPU), you have to understand the bottleneck. In traditional computing, the processor (the brain) and the memory (the storage) are separate. Data must travel back and forth between them constantly.

As AI models grow to trillions of parameters, this “commute” becomes a massive energy drain and a speed killer. This is known as the von Neumann bottleneck. By integrating processing capabilities directly into the memory architecture, Google aims to dramatically reduce latency and power consumption.
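
A quick back-of-envelope sketch makes the scale of that commute concrete. The hardware figures below are rough public numbers for an Nvidia H100, and the model size is an assumption; the point is the ratio, not the precision:

```python
# Back-of-envelope: why generating one token from a large model is
# memory-bound. All figures are rough public numbers or assumptions.

PARAMS = 70e9            # assumed 70B-parameter model
BYTES_PER_PARAM = 2      # fp16/bf16 weights
HBM_BANDWIDTH = 3.35e12  # ~3.35 TB/s, roughly an H100's HBM3 bandwidth
PEAK_FLOPS = 990e12      # ~990 TFLOP/s, roughly an H100's dense fp16 peak

# Decoding one token at batch size 1 streams every weight from memory
# once and does ~2 FLOPs per weight (one multiply, one add).
bytes_moved = PARAMS * BYTES_PER_PARAM
flops_needed = 2 * PARAMS

time_memory = bytes_moved / HBM_BANDWIDTH   # time spent moving data
time_compute = flops_needed / PEAK_FLOPS    # time spent computing

print(f"memory-limited:  {time_memory * 1e3:6.2f} ms/token")   # ~41.8 ms
print(f"compute-limited: {time_compute * 1e3:6.3f} ms/token")  # ~0.14 ms
# The data "commute" costs roughly 300x more time than the math itself.
```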

This trend toward near-data processing is where the industry is headed. We are seeing a shift from “compute-centric” to “data-centric” architecture. If Google can move the computation to where the data lives, their AI responses will be faster, cheaper, and more sustainable.

The Strategic Play Against Nvidia

Nvidia’s H100s are incredible, but they are designed to be versatile. They can handle a wide range of tasks, which makes them slightly less efficient than a chip designed for one specific purpose. This is where Google’s TPUs gain an edge.

By designing chips specifically for inference—the process of actually running a trained AI model to provide an answer—Google can optimize for cost-per-query. For a company serving billions of Search and Gemini users, a 10% increase in efficiency translates to billions of dollars in saved electricity and infrastructure costs.
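
To make that claim concrete, here is a toy version of the arithmetic. Every figure is a hypothetical assumption chosen to show the sensitivity, not a real Google number:

```python
# Toy cost-per-query model; every number is a hypothetical assumption.

queries_per_day = 5e9     # assumed daily AI-assisted Search + Gemini queries
cost_per_query = 0.003    # assumed all-in serving cost in USD
                          # (power + amortized hardware + networking)

annual_cost = queries_per_day * 365 * cost_per_query
savings = 0.10 * annual_cost  # the 10% efficiency gain from the text

print(f"annual serving cost: ${annual_cost / 1e9:.2f}B")   # ~$5.48B
print(f"10% efficiency win:  ${savings / 1e9:.2f}B/year")  # ~$0.55B
# At this scale, single-digit efficiency gains pay for entire chip programs.
```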

Pro Tip: For tech leaders and investors, keep a close eye on “Inference Costs.” While the world focused on the cost of training AI in 2023, the real profit margins in 2025 and beyond will be decided by who can run those models most efficiently.

The Future of AI Inference: Beyond the Data Center

The collaboration with Marvell suggests that the future of AI isn’t just about bigger chips, but smarter ones. We are entering the “Inference Era.” While training a model requires massive clusters of GPUs, running that model (inference) can happen anywhere—from a massive cloud server to a smartphone.

We can expect three major trends to emerge from this shift:

  • Vertical Integration: Like Apple did with its M-series chips, Google is integrating the software (Gemini), the platform (Android/Chrome), and the hardware (TPUs). This creates a “closed loop” of optimization.
  • Energy-Efficient AI: As data centers face scrutiny over power consumption, specialized chips that do more with less wattage will become the only viable way to scale.
  • Domain-Specific Accelerators: We will likely see chips optimized for specific AI tasks—some for image generation, some for logical reasoning, and others for real-time translation.

Real-world examples are already appearing. Amazon is developing its Trainium and Inferentia chips to reduce its reliance on external vendors, mirroring Google’s strategy to protect its margins and maintain control over its supply chain.

How This Affects the Cloud Landscape

For the average business, this hardware war is actually a win. When Google, AWS, and Microsoft compete on silicon, the cost of cloud AI services drops.

Custom silicon allows Google Cloud Platform (GCP) to offer more competitive pricing for AI workloads. If they can provide the same performance as an Nvidia-based cluster but at 60% of the cost, they can aggressively capture market share from other cloud providers.
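
Viewed from the buyer’s side, the math is simple. The hourly rates and throughput below are invented for illustration, not actual GCP or Nvidia pricing:

```python
# Hypothetical buyer's comparison of the "60% of the cost" scenario.
# Hourly rates and throughput are invented, not real cloud pricing.

gpu_node_per_hour = 98.0                       # assumed 8-GPU on-demand node, USD
tpu_slice_per_hour = 0.60 * gpu_node_per_hour  # the 60%-of-cost claim
tokens_per_hour = 1.0e9                        # assume identical throughput

def usd_per_million_tokens(hourly_rate: float) -> float:
    return hourly_rate / (tokens_per_hour / 1e6)

print(f"GPU cluster: ${usd_per_million_tokens(gpu_node_per_hour):.4f}/1M tokens")   # $0.0980
print(f"TPU slice:   ${usd_per_million_tokens(tpu_slice_per_hour):.4f}/1M tokens")  # $0.0588
# At equal performance, price per token tracks the hourly rate one-to-one,
# which is the lever a cloud provider can pull to win AI workloads.
```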

You can read more about how cloud infrastructure is evolving to support these massive workloads in our deep-dive on data center architecture.

Frequently Asked Questions

What is a TPU?
A Tensor Processing Unit (TPU) is an AI-accelerator application-specific integrated circuit (ASIC) developed by Google specifically to accelerate machine learning workloads.
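
As a minimal sketch of what that looks like in practice, here is how a TPU is typically driven from Python via Google’s JAX library (on a Cloud TPU VM the device list shows TPU cores; on other machines JAX falls back to CPU, so the snippet still runs):

```python
import jax
import jax.numpy as jnp

print(jax.devices())  # lists TPU cores on a Cloud TPU VM, else CPU

@jax.jit  # XLA compiles this for whatever accelerator is available
def dense_layer(x, w):
    # Matrix multiplies are exactly the "tensor" workloads the TPU's
    # systolic arrays are designed to accelerate.
    return jnp.maximum(x @ w, 0.0)  # linear layer + ReLU

x = jnp.ones((128, 512))
w = jnp.ones((512, 256))
print(dense_layer(x, w).shape)  # (128, 256)
```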

Why can’t Google just use Nvidia GPUs?
While Nvidia GPUs are powerful, they are general-purpose. Custom silicon allows Google to optimize for their specific software architecture, reducing energy use and increasing speed.

What is the difference between AI Training and Inference?
Training is the process of “teaching” a model using massive datasets. Inference is the process of the model using that knowledge to answer a user’s prompt in real-time.
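
In schematic Python (the model object and its method names here are hypothetical, just to show the shape of each phase):

```python
# Schematic only: the model, dataset, and method names are hypothetical.

def train(model, dataset, epochs=3):
    """Training: show examples, measure error, adjust weights.
    Runs once (or rarely), on huge accelerator clusters."""
    for _ in range(epochs):
        for inputs, targets in dataset:
            predictions = model.forward(inputs)
            loss = model.loss(predictions, targets)
            model.update_weights(loss)  # backpropagation lives here
    return model

def infer(model, prompt):
    """Inference: weights are frozen; only the forward pass runs.
    Happens billions of times a day, so per-query cost dominates."""
    return model.forward(prompt)
```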

What does Marvell Technology bring to the table?
Marvell specializes in data infrastructure and semiconductor design, providing the expertise needed to bridge the gap between Google’s architectural vision and actual physical chip production.

Join the Conversation

Do you think custom silicon will eventually render general-purpose GPUs obsolete, or will Nvidia maintain its crown? Let us know your thoughts in the comments below!
