
Qwen3.6-35B-A3B Outperforms Gemma 4 in Latest Benchmarks.
Alibaba Cloud Launches Qwen3.6-35B-A3B: The "Lightweight" Coding Powerhouse

Alibaba Cloud has officially introduced Qwen3.6-35B-A3B, a new compact model designed for local deployment. Despite its smaller operational footprint, the model boasts coding capabilities that rival the much larger Qwen3.5-27B, all while requiring significantly less computational power.

Efficiency Meets Performance

The "A3B" in its name signifies that while the model has a 35-billion parameter architecture, it only utilizes 3 billion active parameters during inference. This efficiency is achieved through a Mixture-of-Experts (MoE) architecture, allowing it to deliver high-level performance without the massive hardware overhead.
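The routing idea behind a Mixture-of-Experts layer can be illustrated with a minimal sketch. This is not Qwen's actual implementation; the gating function, expert shapes, and all names here are illustrative. The point is that only the top-k experts run per input, so compute scales with the number of active experts rather than the total parameter count.

```python
import numpy as np

def moe_forward(x, gate_w, experts, k=2):
    """Route input x to the top-k experts by gate score.

    Only the selected experts are evaluated, so the cost per
    token depends on k, not on the total number of experts.
    """
    scores = x @ gate_w                      # one gate score per expert
    top = np.argsort(scores)[-k:]            # indices of the k highest-scoring experts
    weights = np.exp(scores[top])
    weights /= weights.sum()                 # softmax over the selected experts only
    return sum(w * experts[i](x) for w, i in zip(weights, top))

rng = np.random.default_rng(0)
d, n_experts = 8, 4
gate_w = rng.normal(size=(d, n_experts))
# Each "expert" here is just a small linear map for illustration.
expert_ws = [rng.normal(size=(d, d)) for _ in range(n_experts)]
experts = [lambda x, w=w: x @ w for w in expert_ws]

y = moe_forward(rng.normal(size=d), gate_w, experts, k=2)
print(y.shape)  # (8,)
```

In a real MoE transformer the gate is learned jointly with the experts and applied per token, but the proportionality is the same: with 2 of 4 experts active here, roughly half the expert parameters are touched per input, mirroring how Qwen3.6-35B-A3B activates about 3 billion of its 35 billion parameters.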

Benchmark results show that while Qwen3.6-35B-A3B trails slightly behind the Qwen3.5-27B in general tasks, it vastly outperforms the previous Qwen3.5-35B-A3B and significantly outpaces Google’s Gemma 4-31B.

The Pricing Paradox

Alibaba Cloud will offer this model under two different branding and pricing structures:

  • Qwen3.6-Flash: Marketed as a high-speed cloud service, priced at $0.165 per million input tokens and $0.99 per million output tokens.

  • Qwen3.6-35B-A3B: Available for dedicated deployment, priced higher at $0.248 per million input tokens and $1.485 per million output tokens.
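The practical impact of the two price tiers is easy to work out. The sketch below uses the published per-million-token rates from the list above; the workload figures (10M input, 2M output tokens per month) are an illustrative assumption, not from the announcement.

```python
# Cost comparison using the two published price tiers (USD per million tokens).
TIERS = {
    "Qwen3.6-Flash":   {"input": 0.165, "output": 0.99},
    "Qwen3.6-35B-A3B": {"input": 0.248, "output": 1.485},
}

def cost_usd(tier, input_tokens, output_tokens):
    """Total cost in USD for a given token volume on one tier."""
    p = TIERS[tier]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

# Hypothetical monthly workload: 10M input tokens, 2M output tokens.
for name in TIERS:
    print(f"{name}: ${cost_usd(name, 10_000_000, 2_000_000):.2f}")
# Flash comes to $3.63 versus $5.45 for the dedicated tier, a ~50% premium.
```

At these rates the dedicated deployment costs about 1.5x the Flash service for the same token volume, which quantifies the "pricing paradox" discussed below.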

The discrepancy in pricing for essentially the same model architecture has sparked interest within the developer community, suggesting different optimization or support tiers between the two versions.

The model's speed comes from its Mixture-of-Experts (MoE) architecture. Instead of processing all 35 billion parameters, it activates only the relevant "experts", about 3 billion parameters, resulting in substantial energy savings and reduced latency. This makes it ideal for developers who want to run AI on their own machines (local hosting).

This version of Qwen3.6 is specifically tuned for coding. Its performance, close to that of larger models like the 27B, demonstrates Alibaba's success in concentrating programming knowledge in small-scale models.

Alibaba's decision to price Qwen3.6-Flash below the dedicated-deployment version is likely intended to draw more users to Alibaba Cloud's serverless API (which offers easier resource management), while the dedicated version may be aimed at organizations that need greater stability or more complex fine-tuning.

Outperforming Google's Gemma 4-31B at a smaller scale clearly signals that Qwen is becoming a leader in open-weight models for technical applications, a category that developers worldwide value highly today.

 


 

Source: Qwen 
