Chinese multimodal (vision) models

These models accept image and text inputs and reason over them. Leading Chinese multimodal models include Qwen-VL, GLM-4V, and InternVL. Typical uses: OCR, screenshot understanding, and chart/diagram reasoning.

No multimodal (vision) models indexed yet.

We're actively cataloguing Chinese multimodal (vision) models. In the meantime, try our full model index or submit a model you think we're missing.
