qwen3-vl by Alibaba Cloud

Version timeline

Qwen3 VL 235B A22B Instruct (Flagship, Open-weight)
unknown
Version: —
Context: 262K
Parameters: —
License: —
Qwen3-VL-235B-A22B Instruct is an open-weight multimodal model that unifies strong text generation with visual understanding across images and video. The Instruct model targets general vision-language use (VQA, document parsing, chart/table...
Qwen3 VL 235B A22B Thinking (Open-weight)
unknown
Version: —
Context: 131K
Parameters: —
License: —
Qwen3-VL-235B-A22B Thinking is a multimodal model that unifies strong text generation with visual understanding across images and video. The Thinking model is optimized for multimodal reasoning in STEM and math....
Qwen3 VL 30B A3B Instruct (Open-weight)
unknown
Version: —
Context: 131K
Parameters: —
License: —
Qwen3-VL-30B-A3B-Instruct is a multimodal model that unifies strong text generation with visual understanding for images and videos. Its Instruct variant optimizes instruction-following for general multimodal tasks. It excels in perception...
Qwen3 VL 30B A3B Thinking (Open-weight)
unknown
Version: —
Context: 131K
Parameters: —
License: —
Qwen3-VL-30B-A3B-Thinking is a multimodal model that unifies strong text generation with visual understanding for images and videos. Its Thinking variant enhances reasoning in STEM, math, and complex tasks. It excels...
Qwen3 VL 32B Instruct (Open-weight)
unknown
Version: —
Context: 131K
Parameters: —
License: —
Qwen3-VL-32B-Instruct is a large-scale multimodal vision-language model designed for high-precision understanding and reasoning across text, images, and video. With 32 billion parameters, it combines deep visual perception with advanced text...
Qwen3 VL 8B Instruct (Open-weight)
unknown
Version: —
Context: 131K
Parameters: —
License: —
Qwen3-VL-8B-Instruct is a multimodal vision-language model from the Qwen3-VL series, built for high-fidelity understanding and reasoning across text, images, and video. It features improved multimodal fusion with Interleaved-MRoPE for long-horizon...
Qwen3 VL 8B Thinking (Open-weight)
unknown
Version: —
Context: 131K
Parameters: —
License: —
Qwen3-VL-8B-Thinking is the reasoning-optimized variant of the Qwen3-VL-8B multimodal model, designed for advanced visual and textual reasoning across complex scenes, documents, and temporal sequences. It integrates enhanced multimodal alignment and...

About Alibaba Cloud