InternVLA-M1

Shanghai AI Lab (InternRobotics) · 上海人工智能实验室

Embodied/robotics model: it runs on robot hardware rather than behind a token-metered API, so there is no per-1M-token pricing. For API-priced LLMs, see the model catalog.

At a glance

Creator: Shanghai AI Lab (InternRobotics) (上海人工智能实验室)
Architecture: Vision-Language-Action (VLA)
Embodiment: Manipulation

Overview

InternVLA-M1 is a vision-language-action model from Shanghai AI Lab's InternRobotics group. Shanghai AI Lab is also behind the widely used InternVL multimodal models. InternVLA-M1 is one of several embodied checkpoints the lab publishes, alongside InternVLA-A1-3B and the InternVLA-N1 system, and it extends the InternVL multimodal foundation toward downstream robot manipulation. Open weights are available on Hugging Face.

What it's used for

Manipulation policy research building on the InternVL multimodal stack; academic embodied-AI benchmarking.

Other Chinese embodied models