NVIDIA Unveils Nemotron-3 Super: A 120B Open Model Optimized for Blackwell with Record-Breaking Efficiency
NVIDIA has introduced the latest addition to its open-model family: Nemotron-3 Super. This 120B-A12B architecture (120B total parameters, with roughly 12B active per token) is engineered for high-throughput inference, particularly when paired with the NVIDIA Blackwell GPU architecture, making it a formidable tool for enterprise deployments.
Redefining High-Performance Inference
While its size is comparable to industry giants like GPT-OSS-120B and Qwen3.5-122B, Nemotron-3 Super sets itself apart through several cost-saving and performance-enhancing techniques:
Native NVFP4 Training: Trained directly in the low-precision NVFP4 floating-point format supported by Blackwell, so the model maintains high accuracy even when served in FP4 mode.
Latent MoE: Implements token compression prior to Expert selection within the Mixture-of-Experts (MoE) architecture.
Multi-Token Prediction (MTP): Generates multiple tokens per forward pass, rather than feeding each output token back as an input one at a time.
Hybrid Mamba-Transformer Architecture: Combines the resource efficiency of Mamba with the proven capabilities of Transformers, similar to the approach seen in IBM’s Granite 4.0.
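The Latent MoE idea above can be sketched in a few lines of NumPy. This is a toy illustration, not Nemotron's published layer: all dimensions, weight names, and the routing scheme here are illustrative assumptions, chosen only to show tokens being compressed to a smaller latent width before the router and experts ever see them.

```python
import numpy as np

rng = np.random.default_rng(0)

d_model, d_latent, n_experts, top_k = 64, 16, 8, 2  # illustrative sizes

# Hypothetical weights; a real Latent-MoE layer would learn these.
W_down = rng.normal(0, 0.1, (d_model, d_latent))    # token compression
W_up = rng.normal(0, 0.1, (d_latent, d_model))      # back-projection
W_router = rng.normal(0, 0.1, (d_latent, n_experts))
experts = rng.normal(0, 0.1, (n_experts, d_latent, d_latent))

def latent_moe(x):
    """x: (tokens, d_model) -> (tokens, d_model).

    Tokens are compressed to d_latent *before* routing, so the router
    and the experts operate on 16-dim vectors instead of 64-dim ones.
    """
    z = x @ W_down                                   # (tokens, d_latent)
    logits = z @ W_router                            # (tokens, n_experts)
    top = np.argsort(logits, axis=-1)[:, -top_k:]    # top-k expert ids
    out = np.zeros_like(z)
    for t in range(x.shape[0]):
        w = np.exp(logits[t, top[t]])
        w /= w.sum()                                 # softmax over chosen experts
        for weight, e in zip(w, top[t]):
            out[t] += weight * (z[t] @ experts[e])
    return out @ W_up                                # back to model width

tokens = rng.normal(size=(4, d_model))
y = latent_moe(tokens)
print(y.shape)  # (4, 64)
```

The payoff is that both the routing decision and the expert matmuls scale with the latent width rather than the full model width.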
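Multi-token prediction can likewise be sketched with a stub model. Everything here is hypothetical (the `forward` stub just fabricates logits), but it shows the mechanical benefit: one forward pass yields K token predictions, so generating N tokens needs roughly N/K passes instead of N.

```python
import numpy as np

VOCAB, K = 100, 4  # K = tokens predicted per forward pass (assumed)

def forward(prompt_ids, k=K):
    """Stub forward pass: deterministic pseudo-logits for k future positions."""
    rng = np.random.default_rng(sum(prompt_ids))
    return rng.normal(size=(k, VOCAB))

def generate(prompt_ids, n_new):
    ids = list(prompt_ids)
    passes = 0
    while len(ids) - len(prompt_ids) < n_new:
        logits = forward(ids)              # ONE pass yields K predictions
        passes += 1
        remaining = n_new - (len(ids) - len(prompt_ids))
        for row in logits[:remaining]:
            ids.append(int(row.argmax()))  # greedy pick, no per-token re-feed
    return ids, passes

ids, passes = generate([1, 2, 3], n_new=8)
print(passes)  # 2 forward passes instead of 8
```

Production MTP decoders typically pair this with a verification step to keep outputs identical to one-token-at-a-time decoding; that step is omitted here for brevity.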
Open Ecosystem and Deployment
NVIDIA has taken a transparent approach by releasing the complete training datasets (including both curated and synthetic data) and the full training pipeline. The model is compatible with NVIDIA’s software stack, such as NeMo RL and NeMo Evaluator, and supports immediate fine-tuning via Unsloth.
While several API providers like Cloudflare, DeepInfra, and Lightning.AI have begun offering Nemotron-3 Super, Inference.net is currently the only provider supporting the full 1-million-token context window, albeit at a premium price point.
NVIDIA's decision to train its models in NVFP4 from the start is a key advantage. Quantizing a model trained in FP16 down to FP4 after the fact typically degrades its intelligence, but Nemotron-3 Super is designed to "think in FP4" from the beginning, delivering dramatically faster inference on Blackwell GPUs while preserving accuracy.
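To make the quantization concrete, the sketch below simulates rounding weights onto a 4-bit floating-point (E2M1) value grid with a per-block scale. This roughly mimics micro-scaled FP4 formats; it is a simplification for intuition, not the exact NVFP4 encoding (the real format's block size and scale datatype differ in detail).

```python
import numpy as np

# The representable values of an E2M1 (FP4) number:
# {0, 0.5, 1, 1.5, 2, 3, 4, 6} and their negatives.
FP4_GRID = np.array([0, 0.5, 1, 1.5, 2, 3, 4, 6])
FP4_GRID = np.concatenate([-FP4_GRID[::-1], FP4_GRID])

def quantize_fp4(x, block=16):
    """Round each length-`block` group to the FP4 grid with a per-block
    scale (a sketch of micro-scaled FP4, not the exact hardware format)."""
    x = x.reshape(-1, block)
    scale = np.abs(x).max(axis=1, keepdims=True) / FP4_GRID.max()
    scale[scale == 0] = 1.0                      # avoid dividing by zero
    idx = np.abs((x / scale)[..., None] - FP4_GRID).argmin(axis=-1)
    return (FP4_GRID[idx] * scale).ravel()

rng = np.random.default_rng(1)
w = rng.normal(size=64).astype(np.float32)
wq = quantize_fp4(w)
err = np.abs(w - wq).max()
print(wq.shape, err)  # same shape; error bounded by the per-block scale
```

The grid has only 16 values, so the per-value error is coarse; training natively in this representation lets the model adapt its weights to the grid instead of absorbing the rounding error after the fact.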
This hybrid architecture addresses the bottleneck of traditional Transformer architectures, which often consume massive amounts of VRAM with long contexts. The integration of Mamba allows models to remember data from much further back using fewer resources, making support for 1M Context Windows commercially viable.
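The memory argument can be shown with back-of-the-envelope arithmetic plus a minimal state-space recurrence. All sizes below are illustrative assumptions, not Nemotron's actual dimensions: the point is that an attention KV cache grows linearly with context length, while an SSM state stays fixed.

```python
import numpy as np

def ssm_scan(inputs, A, B):
    """Minimal linear state-space recurrence: h_t = A @ h_{t-1} + B * x_t.
    The carried state h stays the same size no matter how long `inputs` is."""
    h = np.zeros(A.shape[0])
    for x in inputs:
        h = A @ h + B * x
    return h

# Illustrative sizes (not Nemotron's): d channels, n state dims per channel.
d, n, ctx = 128, 16, 1_000_000
kv_cache_floats = 2 * ctx * d       # attention: keys + values, one per token
ssm_state_floats = d * n            # SSM: fixed state, independent of ctx
ratio = kv_cache_floats // ssm_state_floats
print(ratio)  # 125000

# Tiny recurrence demo: a decaying 4-dim state summarizing a sequence.
A, B = 0.9 * np.eye(4), np.ones(4)
h = ssm_scan([1.0, 1.0, 1.0], A, B)
print(h[0])  # 0.9 * (0.9 * 1 + 1) + 1 = 2.71
```

At these toy sizes the recurrent state is five orders of magnitude smaller than a 1M-token KV cache, which is the intuition behind hybrid stacks keeping only a few attention layers.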
NVIDIA's disclosure of its use of AI-generated (synthetic) datasets, along with an outline of the entire training process, sets a new standard for transparency. This makes it easier for enterprise developers to audit the training data for bias or security issues, which is crucial for legal and financial applications.
Unsloth support means independent developers and mid-sized companies can fine-tune this massive 120B model with only a few graphics cards, significantly lowering the hardware cost barrier.
Source: NVIDIA
