The Battle of AI Tokens: How NVIDIA, AMD, and Intel Performed in MLPerf 6.0
MLCommons has officially released the results for MLPerf 6.0, the industry-standard benchmark for AI performance. This round focused heavily on Large Language Model (LLM) inference capabilities, featuring popular models such as DeepSeek-R1, GPT-OSS-120B, Llama2-70B, and Qwen3-VL-235B-A22B. The benchmarks saw record participation from leading chipmakers and server manufacturers worldwide.
NVIDIA: Software-Driven Dominance
As the market leader, NVIDIA showcased the sheer scale of its ecosystem and its industry-leading "price-per-token" metrics. The highlight for NVIDIA was a software-only gain: optimizations in TensorRT-LLM boosted DeepSeek-R1 performance by 2.7x on existing NVIDIA GB300 NVL72 hardware.
These gains were achieved through the NVIDIA Dynamo optimization suite, utilizing advanced techniques such as:
Disaggregated Prefill and Decode: Separating the initial prompt processing from token generation.
Parallelized Mixture-of-Experts (MoE): Increasing throughput for sparse models.
Multi-token Prediction & Intelligent Worker Scheduling: Enhancing efficiency in high-concurrency environments.
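To make the Mixture-of-Experts item above more concrete, here is a minimal sketch of top-k expert routing in plain Python. It is not NVIDIA's implementation; the gate scores, the expert count, and the helper names (`route_top_k`, `group_by_expert`) are hypothetical and chosen only for illustration. The idea it shows is that each token activates only a few experts, and tokens grouped by expert can then be processed independently.

```python
import math
from collections import defaultdict

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def route_top_k(gate_scores, k):
    """Return (expert_index, weight) pairs for the k highest-scoring experts.

    Weights are renormalized so they sum to 1 over the chosen experts.
    """
    probs = softmax(gate_scores)
    chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    mass = sum(probs[i] for i in chosen)
    return [(i, probs[i] / mass) for i in chosen]

def group_by_expert(batch_scores, k):
    """Group token indices by the experts they were routed to.

    Each group can be processed independently of the others, which is
    what allows experts to run in parallel on separate devices.
    """
    groups = defaultdict(list)
    for token_id, scores in enumerate(batch_scores):
        for expert, _weight in route_top_k(scores, k):
            groups[expert].append(token_id)
    return dict(groups)

# Hypothetical gate scores: 4 tokens, 4 experts, 2 experts active per token.
batch = [
    [2.0, 0.1, 1.5, -1.0],
    [0.0, 3.0, 0.2, 2.5],
    [1.0, 0.9, -0.5, 0.8],
    [-2.0, 0.5, 4.0, 0.4],
]
groups = group_by_expert(batch, k=2)
for expert in sorted(groups):
    print(f"expert {expert}: tokens {groups[expert]}")
```

Because only k of the experts run for any given token, the compute per token is a fraction of the full model, which is why sparse models reward schedulers that keep every expert busy.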
AMD: Closing the Gap with MI355X
AMD made a significant statement with its Instinct MI355X accelerator. The chip demonstrated a 3.1x performance increase in running Llama2-70B compared to the previous MI325X. Notably, AMD has now achieved performance parity with the NVIDIA B200, though it still trails the flagship B300.
AMD's ROCm software stack played a crucial role, enabling high-performance FP4 precision, improved inter-node communication, and optimized cluster load balancing. Looking ahead, AMD confirmed that the next-generation MI400 is slated for release later this year.
Intel: An Enterprise Focus with Arc Pro
In a strategic shift, Intel pivoted away from showcasing Gaudi chips to highlight its newly released Arc Pro B70 GPUs. By using a quad-card configuration (4x B70), Intel achieved a combined 128GB of VRAM, sufficient to run the massive GPT-OSS-120B model. The B70 setup also outperformed the previous B60 by 18%.
Intel is positioning the Arc Pro B70/B65 as the ideal choice for corporate environments, emphasizing enterprise-grade features such as:
ECC Memory Support for data reliability.
Enterprise Management Systems and robust Container Support.
The results show a clear transition from FP8 to FP4 (4-bit floating point). Both NVIDIA and AMD are competing to optimize their software to run large models at 4-bit precision without sacrificing accuracy. Halving the bits per weight roughly halves the memory needed for model weights and speeds up execution, which is key to reducing the cost per token.
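The memory arithmetic behind that saving is easy to check. The sketch below estimates weight storage only, assuming a round 120 billion parameters for a "120B" model and decimal gigabytes; the function name `weight_memory_gb` is made up for this example. It ignores the KV cache, activations, and the scaling metadata that real 4-bit formats store alongside the weights, so actual footprints will be somewhat higher.

```python
def weight_memory_gb(num_params, bits_per_param):
    """Approximate memory for model weights alone, in decimal gigabytes."""
    return num_params * bits_per_param / 8 / 1e9

PARAMS = 120e9          # assumed round figure for a "120B" model
POOLED_VRAM_GB = 128    # the 4x B70 configuration described above

for name, bits in [("FP16", 16), ("FP8", 8), ("FP4", 4)]:
    size = weight_memory_gb(PARAMS, bits)
    headroom = POOLED_VRAM_GB - size
    print(f"{name}: {size:.0f} GB of weights, {headroom:.0f} GB headroom in 128 GB")
```

Under these assumptions, FP8 weights would consume nearly all of a 128GB pool, while FP4 weights leave more than half of it free for the KV cache and activations.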
The technique of separating prefill (processing the input prompt) from decode (generating the answer token by token), which NVIDIA highlighted, is a major trend this year. The two phases stress different resources: prefill is typically compute-bound, while decode is typically limited by memory bandwidth. Running them on separate workers prevents them from stalling each other, bringing response latency for users down to the order of seconds.
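A toy model can illustrate why the separation helps. The sketch below follows a single decode stream and compares the gaps between its tokens when new prompts are prefilled on the same worker versus on a separate pool. All timings are hypothetical, the function names are invented for this example, and the model deliberately ignores the cost of moving the KV cache between workers, so it should be read as an illustration rather than a performance estimate.

```python
def token_gaps_colocated(num_tokens, decode_step_ms, prefill_arrivals, prefill_ms):
    """Inter-token gaps for one decode stream sharing a worker with prefills.

    prefill_arrivals holds the token indices before which a new prompt
    arrives; the worker runs that prefill first, stalling the stream.
    """
    arrivals = set(prefill_arrivals)
    return [
        decode_step_ms + (prefill_ms if i in arrivals else 0)
        for i in range(num_tokens)
    ]

def token_gaps_disaggregated(num_tokens, decode_step_ms):
    """Inter-token gaps when prefills run on a separate worker pool."""
    return [decode_step_ms] * num_tokens

# Hypothetical numbers: 8 tokens at 20 ms each, two new prompts of 300 ms.
colocated = token_gaps_colocated(8, 20, prefill_arrivals=[2, 5], prefill_ms=300)
disaggregated = token_gaps_disaggregated(8, 20)

print("colocated:     worst gap", max(colocated), "ms, total", sum(colocated), "ms")
print("disaggregated: worst gap", max(disaggregated), "ms, total", sum(disaggregated), "ms")
```

In this simplified picture the decode stream never waits behind someone else's prompt, which is the "waiting" the paragraph above refers to.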
Intel's shift from showcasing Gaudi to Arc Pro reflects its pursuit of the on-premises AI market (edge and workstation): companies that want to run their own AI in the office instead of leasing expensive cloud services. Having 128GB of VRAM across four workstation-grade graphics cards is a key selling point that lets mid-sized organizations run 100B+ parameter models themselves.
The inclusion of DeepSeek-R1 (a highly regarded Chinese model known for its reasoning) and Qwen3-VL (a model excelling in image and video processing) in MLPerf 6.0 demonstrates that the world is increasingly prioritizing "smart and cost-effective" models over simply "large" ones.
