Vision-Language-Action (VLA) and embodied foundation models from Chinese labs and robotics companies. These models run on robots rather than behind a token-metered API, so this is a research-and-capability map, not a pricing comparison.
Not an API catalog. Unlike the LLM API models, these are open-weight or hardware-bound research models: you download the weights and run them on robot hardware, and there is no per-token pricing. Every spec below links to its primary source.
Open VLA from the humanoid-hardware leader, tuned for embodied control.
VLA from the InternVL team's embodied research line.
Narrow-remit VLAs (grasping, tracking, retail) behind the G1 robot.
Hardware-model-data framework for high-DoF dexterous manipulation.
Tencent Hunyuan's entry into embodied foundation models.