Intel’s Arc Pro B70 GPU Delivers Significant AI Inference Gains, Software Updates Boost Existing Hardware
Intel has announced substantial performance improvements in AI inference with its latest Arc Pro B70 GPU, as demonstrated by new results from the MLPerf Inference v6.0 benchmark suite. Paired with Intel’s Xeon 6 processors, the Arc Pro B70 delivers up to an 80% performance increase over its predecessor, the Arc Pro B60 (a quick recalculation from the published figures appears after the benchmark numbers below). These gains position Intel as a more competitive player in the rapidly evolving professional AI computing market.
The benchmarks, reported by Wccftech and based on Intel’s official release, were run on four-GPU systems built around the Arc Pro B70 and Arc Pro B60, with the flagship B70 configuration carrying a combined 128GB of VRAM. That configuration successfully handled large language models (LLMs) of up to 120 billion parameters, a key threshold for increasingly complex AI applications. The system also offered a 1.6x advantage in KV cache capacity over competing multi-GPU solutions, extending its ability to process lengthy text sequences.
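For readers wondering why VRAM translates into KV cache capacity: during generation, a transformer stores a key and a value vector per token, per layer, and that store competes with the model weights for GPU memory. The sketch below estimates KV cache headroom for a hypothetical 70B-class model with grouped-query attention; the layer count, head dimensions, and memory split are illustrative assumptions, not Intel’s published figures.

```python
# Back-of-the-envelope KV cache sizing for a hypothetical 70B-class model.
# All model dimensions below are illustrative assumptions, not Intel specs.

BYTES_FP16 = 2

def kv_bytes_per_token(num_layers: int, num_kv_heads: int, head_dim: int,
                       bytes_per_elem: int = BYTES_FP16) -> int:
    """Memory for one token's keys and values across all layers."""
    return 2 * num_layers * num_kv_heads * head_dim * bytes_per_elem  # 2 = K + V

# Hypothetical 70B-class config with grouped-query attention (GQA).
per_token = kv_bytes_per_token(num_layers=80, num_kv_heads=8, head_dim=128)

# Suppose ~40 GB of a 128 GB four-GPU pool remains after weights (assumption).
budget_gb = 40
capacity_tokens = budget_gb * 1024**3 // per_token

print(f"KV cache per token: {per_token / 1024:.1f} KiB")
print(f"~{capacity_tokens:,} tokens of context fit in {budget_gb} GB")
```

Under these assumed dimensions, roughly 131,000 tokens of context fit in the 40GB budget, which illustrates why a 1.6x capacity edge matters for long-sequence workloads.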
Here’s a summary of the MLPerf v6.0 inference benchmark results for the GPT-OSS-120B model:
- 4 x Arc Pro B70 (128 GB): Offline: 1536.90 Tokens/s | Server: 951.67 Tokens/s
- 4 x Arc Pro B60 Dual (192 GB): Offline: 1601.91 Tokens/s | Server: 884.24 Tokens/s
- 4 x Arc Pro B60 (96 GB): Offline: 841.04 Tokens/s | Server: 452.19 Tokens/s
Performance on the Llama2-70B model was as follows:
- 4 x Arc Pro B70 (128 GB): Offline: 2459.18 Tokens/s | Server: 1698.57 Tokens/s
- 4 x Arc Pro B60 Dual (192 GB): Offline: 3270.66 Tokens/s | Server: 2199.50 Tokens/s
- 4 x Arc Pro B60 (96 GB): Offline: 1697.66 Tokens/s | Server: 1106.26 Tokens/s
And for the smaller Llama3.1 8B model:
- 4 x Arc Pro B60 Dual (192 GB): Offline: 52.83 Tokens/s | Server: 49.17 Tokens/s
- 4 x Arc Pro B70 (128 GB): Offline: 36.07 Tokens/s | Server: 32.58 Tokens/s
- 4 x Arc Pro B60 (96 GB): Offline: 26.15 Tokens/s | Server: 24.57 Tokens/s
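As a quick sanity check on the headline claim, the short script below recomputes the B70-versus-B60 uplift directly from the figures listed above. It shows the roughly 80% gain comes from the GPT-OSS-120B Offline result; the other workloads show smaller deltas, while the GPT-OSS-120B Server result is actually larger.

```python
# Recompute the B70-vs-B60 generational uplift from the figures above.
results = {
    # model: (B70 offline, B70 server, B60 offline, B60 server), tokens/s
    "GPT-OSS-120B": (1536.90, 951.67, 841.04, 452.19),
    "Llama2-70B":   (2459.18, 1698.57, 1697.66, 1106.26),
    "Llama3.1 8B":  (36.07, 32.58, 26.15, 24.57),
}

for model, (b70_off, b70_srv, b60_off, b60_srv) in results.items():
    off_gain = (b70_off / b60_off - 1) * 100
    srv_gain = (b70_srv / b60_srv - 1) * 100
    print(f"{model:>13}: Offline +{off_gain:.0f}% | Server +{srv_gain:.0f}%")
```

Running this yields +83% Offline and +110% Server on GPT-OSS-120B, +45%/+54% on Llama2-70B, and +38%/+33% on Llama3.1 8B, consistent with Intel’s "up to 80%" framing.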
Intel emphasizes that these performance gains aren’t attributable to the new GPU alone. The company’s Xeon 6 processors, with integrated Advanced Matrix Extensions (AMX) and AVX-512 acceleration engines, contribute up to a 90% generational performance leap on the host side, underscoring the importance of a holistic hardware approach to AI inference.
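For anyone curious whether their own host CPU exposes these engines, the instruction-set flags are visible in /proc/cpuinfo on Linux. The snippet below is a minimal check, assuming a Linux host; it only inspects the advertised flags and says nothing about how well a given framework actually exploits them.

```python
# Check whether the host CPU advertises AMX and AVX-512 support (Linux only).
from pathlib import Path

def cpu_flags() -> set[str]:
    for line in Path("/proc/cpuinfo").read_text().splitlines():
        if line.startswith("flags"):
            return set(line.split(":", 1)[1].split())
    return set()

flags = cpu_flags()
for feature in ("avx512f", "amx_tile", "amx_bf16", "amx_int8"):
    status = "yes" if feature in flags else "no"
    print(f"{feature:>9}: {status}")
```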
Notably, Intel is also delivering performance improvements to existing hardware through software optimization. Users of the Arc Pro B60 GPU can achieve up to an 18% performance increase simply by updating their software, demonstrating the value of a continuously refined software stack.
The release of the Arc Pro B70 comes amid intense competition in the AI and data center computing market, with NVIDIA recently launching its own high-performance platforms. This competition is also influencing global market dynamics and raising regulatory considerations, as highlighted by NVIDIA’s recent concerns regarding export policies.
Context: Understanding MLPerf Inference
MLPerf Inference is a widely recognized benchmark suite for measuring the performance of AI inference systems, providing a standardized way to compare hardware and software configurations across a range of AI models and workloads. The Offline scenario measures peak throughput when every query is available up front, while the Server scenario delivers queries at random intervals under latency constraints, which is why Server figures consistently trail Offline ones. Results are submitted by participating companies and peer-reviewed before publication, offering a transparent view of performance capabilities.
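To make the Offline/Server distinction concrete, here is a toy illustration (not MLPerf’s actual load generator, which is the open-source LoadGen tool): the Server scenario models queries arriving at Poisson-distributed intervals, each of which must be answered within a latency bound, whereas Offline hands the system everything at once.

```python
# Toy contrast between MLPerf's two inference scenarios (illustrative only).
import random

random.seed(0)
QUERIES = 1000
TARGET_QPS = 50  # server-scenario arrival rate (assumed for illustration)

# Offline: every query is available at t=0; only total throughput matters.
offline_arrivals = [0.0] * QUERIES

# Server: queries arrive at Poisson-distributed intervals, and each must
# meet a per-query latency bound, which limits how large batches can grow.
t = 0.0
server_arrivals = []
for _ in range(QUERIES):
    t += random.expovariate(TARGET_QPS)
    server_arrivals.append(t)

print(f"Offline: {QUERIES} queries at t=0")
print(f"Server:  {QUERIES} queries spread over {server_arrivals[-1]:.1f}s")
```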

The Arc Pro B70’s ability to handle LLMs with 120 billion parameters opens up possibilities for more complex AI applications, offering a viable option for developers and businesses requiring powerful inference solutions.
Will Intel’s combined hardware and software approach be enough to significantly challenge NVIDIA’s dominance in the professional AI computing space?
