Multimodal (vision) models accept image and text inputs and reason over them jointly. Leading Chinese multimodal models include Qwen-VL, GLM-4V, and InternVL. Typical uses: OCR, screenshot understanding, and chart/diagram reasoning.
No multimodal (vision) models indexed yet.
We're actively cataloguing Chinese multimodal (vision) models. In the meantime, browse our full model index or submit a model you think we're missing.