Qwen 3.5 Debuts with Massive Reinforcement Learning Upgrades and Native Multimodality
The Qwen team at Alibaba has officially launched Qwen 3.5, the latest generation of its Large Language Model (LLM). This new iteration introduces native multimodal capabilities, allowing the model to process image inputs directly. Furthermore, the proprietary version of the model now supports a massive 1-million token context window, all while significantly slashing operational costs.
The Efficiency of MoE Architecture
The flagship open-weight release, Qwen 3.5-397B-A17B, showcases a remarkable feat of engineering. Although the model houses a total of 397 billion parameters, its Mixture-of-Experts (MoE) architecture ensures that only 17 billion parameters are activated during inference.
Advanced Training: The model was refined through intensive Reinforcement Learning (RL), with expanded task categories specifically designed for RL-based optimization.
Open Model Capacity: This open-weight version supports input contexts of up to 263,100 tokens.
Disruptive Pricing and the "Plus" Variant
Alibaba is positioning Qwen 3.5 as a high-performance yet affordable solution. The Qwen 3.5-397B-A17B is priced at:
Input: $0.6 per million tokens
Output: $3.6 per million tokens
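For a rough sense of what these rates mean in practice, the sketch below estimates a single request's cost from the published per-million-token prices; the token counts in the example are illustrative placeholders, not measurements.

```python
# Rough cost estimate for Qwen 3.5-397B-A17B at the published rates.
# The token counts in the example call are illustrative placeholders.
INPUT_PRICE_PER_M = 0.6   # USD per 1M input tokens
OUTPUT_PRICE_PER_M = 3.6  # USD per 1M output tokens

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of a single request."""
    return (input_tokens / 1_000_000) * INPUT_PRICE_PER_M \
         + (output_tokens / 1_000_000) * OUTPUT_PRICE_PER_M

# Example: summarizing a 200k-token report into a 2k-token answer.
print(f"${request_cost(200_000, 2_000):.4f}")  # -> $0.1272
```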
Additionally, the team introduced the proprietary Qwen 3.5 Plus (2026-02-15). The standout feature of this closed-source model is its ability to handle a 1-million token context, placing it in direct competition with top-tier global models for long-form data processing.
The 397B-A17B naming signifies a model with a massive "brain" that is smart enough to route each query to only a few relevant "experts" (17 billion parameters' worth). This saves enormous computing power while preserving the knowledge of a roughly 400B-parameter model.
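To make that sparse-activation idea concrete, here is a minimal, generic top-k MoE routing sketch. It is not Qwen's actual implementation; the expert count, dimensions, and random "experts" are toy assumptions used only to show how a router activates a handful of experts per token.

```python
import numpy as np

# Generic top-k Mixture-of-Experts routing sketch (illustrative only,
# not Qwen's implementation). Each token is routed to k of n experts.
rng = np.random.default_rng(0)
n_experts, k, d_model = 8, 2, 16

# Toy "experts": each is just a random linear layer here.
experts = [rng.normal(size=(d_model, d_model)) for _ in range(n_experts)]
router_w = rng.normal(size=(d_model, n_experts))

def moe_forward(x: np.ndarray) -> np.ndarray:
    """Route one token vector x to its top-k experts and mix their outputs."""
    logits = x @ router_w                 # router score for every expert
    top = np.argsort(logits)[-k:]         # indices of the k best experts
    weights = np.exp(logits[top])
    weights /= weights.sum()              # softmax over the selected experts
    # Only k expert matmuls run; the remaining n - k experts stay idle.
    return sum(w * (x @ experts[i]) for w, i in zip(weights, top))

token = rng.normal(size=d_model)
print(moe_forward(token).shape)  # (16,)
```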
Alibaba's strong emphasis on RL in this version reflects the 2026 trend in which models improve not just by "reading a lot" (pre-training), but by learning to "reason" through trial and error. This makes Qwen 3.5 exceptionally strong at coding and mathematics tasks.
Unlike earlier versions, which often simply "pasted" a vision model onto a language model, Qwen 3.5 uses native visual processing, meaning it understands the connections between images and text more deeply, for example when analyzing complex stock charts or interpreting engineering diagrams.
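A minimal sketch of sending an image alongside text is shown below, assuming the model is exposed through an OpenAI-compatible chat endpoint. The base URL, model identifier, and image URL are placeholders, not confirmed details of the Qwen 3.5 API.

```python
# Hypothetical image + text request, assuming an OpenAI-compatible endpoint.
# The base_url and model name are placeholders, not confirmed Qwen 3.5 values.
from openai import OpenAI

client = OpenAI(base_url="https://example.com/v1", api_key="YOUR_KEY")

response = client.chat.completions.create(
    model="qwen3.5-plus",  # placeholder model identifier
    messages=[{
        "role": "user",
        "content": [
            {"type": "image_url",
             "image_url": {"url": "https://example.com/stock_chart.png"}},
            {"type": "text",
             "text": "Summarize the trend shown in this stock chart."},
        ],
    }],
)
print(response.choices[0].message.content)
```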
The jump to a 1-million-token context in the Plus version is a direct challenge to Gemini 1.5 Pro and Claude 3.5 for long-form data processing: users can "throw" entire documents at the model, or feed in a whole project's codebase so the AI can summarize it or continue the work immediately.
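As a rough illustration of that long-context workflow, the snippet below packs a project's source files into a single prompt. The 1M-token budget matches the Plus context window, but the four-characters-per-token ratio is only a crude heuristic; a real pipeline would count tokens with the model's own tokenizer.

```python
# Illustrative sketch: pack a project's source files into one long prompt
# for a 1M-token context window. The chars-per-token ratio is a rough
# heuristic, not an exact tokenizer count.
from pathlib import Path

TOKEN_BUDGET = 1_000_000
CHARS_PER_TOKEN = 4  # crude estimate; use the model's tokenizer for accuracy

def pack_codebase(root: str, budget: int = TOKEN_BUDGET) -> str:
    parts, used = [], 0
    for path in sorted(Path(root).rglob("*.py")):
        text = path.read_text(errors="ignore")
        est_tokens = len(text) // CHARS_PER_TOKEN
        if used + est_tokens > budget:
            break  # stop before exceeding the context window
        parts.append(f"# --- {path} ---\n{text}")
        used += est_tokens
    return "\n\n".join(parts)

prompt = pack_codebase("./my_project") + "\n\nSummarize this codebase."
```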
Source: Qwen
