Wednesday, February 4, 2026

Alibaba Launches Qwen3-Coder-Next: High-Speed AI Coding with Unbeatable Pricing.

Alibaba Cloud Unveils Qwen3-Coder-Next: The Lean, High-Performance Open Model for Developers

Alibaba Cloud has officially launched Qwen3-Coder-Next, a specialized Large Language Model (LLM) fine-tuned for programming. Built upon the powerful Qwen3-Next-80B-A3B-Base, this new iteration has been meticulously trained to excel in code generation and complex software development tasks.

The "Self-Correction" Training Advantage

The strength of Qwen3-Coder-Next lies in its training methodology. By leveraging vast repositories of source code alongside verifiable programming challenges, the model utilizes an iterative learning process. This allows the AI to "write, test, and verify" its own code, refining its output until the logic is flawless.
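The "write, test, and verify" loop described above can be sketched as a simple control flow. The snippet below is only an illustration, not Alibaba's actual training pipeline: `generate_candidate` is a hypothetical stand-in for the model that cycles through canned drafts, showing how failed tests trigger another attempt.

```python
# Minimal sketch of a "write, test, verify" refinement loop.
# `generate_candidate` is a hypothetical stand-in for the model;
# it cycles through canned drafts to illustrate the control flow.

def generate_candidate(attempt):
    candidates = [
        "def add(a, b): return a + b + 1",  # buggy first draft
        "def add(a, b): return a + b",      # corrected draft
    ]
    return candidates[min(attempt, len(candidates) - 1)]

def passes_tests(source):
    # Run the candidate against verifiable checks, as the article
    # describes for "verifiable programming challenges".
    namespace = {}
    exec(source, namespace)
    add = namespace["add"]
    return add(2, 3) == 5 and add(-1, 1) == 0

def refine(max_attempts=5):
    for attempt in range(max_attempts):
        code = generate_candidate(attempt)
        if passes_tests(code):
            return code, attempt
    return None, max_attempts

code, attempts = refine()
print(attempts)  # → 1 (the buggy first draft was rejected)
```

In a real system the test harness and retry budget would be far more elaborate, but the shape — generate, execute against checks, regenerate on failure — is the same.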

This rigorous training places Qwen3-Coder-Next in direct competition with top-tier open models such as DeepSeek V3.2, GLM-4.7, and Kimi-K2.5.

Unmatched Efficiency: 80B Power, 3B Execution

The true differentiator for Qwen3-Coder-Next is its MoE (Mixture of Experts) architecture. While it is an 80B parameter model, it only activates 3B parameters during inference. This allows it to:

  • Run on Smaller Servers: Lowering hardware requirements significantly compared to monolithic models.

  • Instant Response (Non-Thinking): Unlike "Reasoning" or "Thinking" models that require a deliberation phase, Qwen3-Coder-Next provides immediate output, drastically reducing latency for developers.
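The sparse-activation idea behind this efficiency can be illustrated with a toy top-k router in NumPy. The expert count, dimensions, and top-2 routing below are illustrative choices for the sketch, not Qwen3-Coder-Next's actual configuration (a real MoE model routes per token inside every MoE layer).

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sparse-activation layer: 8 experts, only top-2 active per input.
# Illustrative sizes only -- not Qwen3-Coder-Next's real configuration.
n_experts, d = 8, 16
experts = [rng.standard_normal((d, d)) for _ in range(n_experts)]
router = rng.standard_normal((d, n_experts))

def moe_forward(x, k=2):
    logits = x @ router
    top = np.argsort(logits)[-k:]        # indices of the k best experts
    weights = np.exp(logits[top])
    weights /= weights.sum()             # softmax over selected experts
    # Only k of the n_experts weight matrices are ever multiplied.
    out = sum(w * (x @ experts[i]) for w, i in zip(weights, top))
    return out, top

x = rng.standard_normal(d)
y, active = moe_forward(x)
print(len(active), "of", n_experts, "experts used")
```

Compute cost scales with the `k` active experts rather than the total parameter count, which is why an 80B-parameter model can run with roughly 3B parameters' worth of work per token.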

Disrupting the Market with Lower Costs

The model’s efficiency translates directly into lower operational costs. While providers such as Together.AI list a comparable input price of $0.5 per million tokens across these models, the output pricing tells a different story:

Model               Output Price (per 1M tokens)
Qwen3-Coder-Next    $1.2
GLM-4.7             $2.0
Kimi-K2.5           $2.8
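These per-token prices make the gap easy to quantify. The quick calculation below uses an assumed workload of 50M output tokens per month; the workload figure is illustrative, the prices are those quoted above.

```python
# Output cost per month at the quoted per-1M-token prices.
prices = {  # USD per 1M output tokens
    "Qwen3-Coder-Next": 1.2,
    "GLM-4.7": 2.0,
    "Kimi-K2.5": 2.8,
}

def output_cost(model, tokens):
    return prices[model] * tokens / 1_000_000

# Assumed workload: 50M output tokens per month (illustrative).
for model in prices:
    print(f"{model}: ${output_cost(model, 50_000_000):.2f}")
# Qwen3-Coder-Next: $60.00
# GLM-4.7: $100.00
# Kimi-K2.5: $140.00
```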

Alibaba Cloud is making the model accessible to everyone, releasing weights ranging from 4-bit quantized versions for local deployment to the full 16-bit precision model.

  • The fact that the model is 80B in size but runs only 3B parameters per token means it uses a "sparse activation" technique, selecting only the specialized parts of the network needed to answer each query. This saves a huge amount of computing power while maintaining the intelligence of a large-scale model.
  • By 2026, the AI market is splitting clearly into two branches: "Thinking" models (such as o1 or DeepSeek-R1), which solve complex problems through a slow deliberation phase, and "Non-Thinking" models like Qwen3-Coder-Next, which emphasize real-time coding speed and suit programmers who need immediate results while typing (auto-completion).
  • Beyond its proficiency in programming languages (Python, C++, Java), Qwen is also known for understanding Thai and other Asian languages better than its Western competitors, making it a top choice for developers in the region.
  • The release of 4-bit quantized weights means that developers can run this 80B-class model on consumer-grade graphics cards (such as the RTX 4090 or 5090) directly on their own machines, without constantly relying on the cloud, which enhances the security of company source code.
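A rough back-of-envelope estimate shows why quantization matters for local deployment. The figures below count raw weight storage only, ignoring quantization scales, activations, and the KV cache.

```python
def weight_footprint_gb(params, bits):
    # params * bits per weight, converted to gigabytes (1 GB = 1e9 bytes).
    return params * bits / 8 / 1e9

print(weight_footprint_gb(80e9, 16))  # 160.0 GB at full 16-bit precision
print(weight_footprint_gb(80e9, 4))   # 40.0 GB at 4-bit quantization
```

Note that even at 4-bit, roughly 40 GB of weights exceeds a single RTX 4090's 24 GB of VRAM, so practical local setups typically offload part of the model to system RAM or split it across cards.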


Source - Qwen
