
Physical AI is a technology in which AI analyzes information from a robot’s sensors and cameras, as well as from external systems, and makes decisions based on it. This enables a robot to flexibly carry out complex movements guided by that analysis.
Core to the Physical AI concept are two foundational AI models: the Vision-Language Model (VLM) and the Vision-Language-Action (VLA) model.
A VLM is an AI model that understands visual inputs, such as camera images and sensor data, and interprets what actions should be taken. It breaks high-level, abstract instructions into multiple sub-tasks.
A VLA model receives these sub-tasks and converts them into concrete robotic actions. By utilizing visual feedback, a VLA model continuously optimizes motion trajectories and object manipulations in real time, executing them as seamless, continuous actions.
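The division of labor between the two models can be pictured with a minimal Python sketch. Everything in it is an assumption made for illustration: `plan_subtasks` stands in for the VLM with a hard-coded lookup table, `run_subtask` stands in for the VLA with a simple proportional feedback loop over a one-dimensional position, and the `Action` type and all parameter names are hypothetical. It is not SoftBank's implementation, and real models would operate on camera images rather than numbers.

```python
from dataclasses import dataclass


@dataclass
class Action:
    """One low-level command. Fields are illustrative, not a real robot API."""
    subtask: str
    step: float  # signed displacement toward the target, arbitrary units


def plan_subtasks(instruction: str) -> list[str]:
    """Stand-in for the VLM: split a high-level instruction into sub-tasks.

    A real VLM would reason over camera images and sensor data; a
    hard-coded table plays that role here (assumption).
    """
    plans = {
        "tidy the desk": ["locate cup", "grasp cup", "place cup on shelf"],
    }
    # Unknown instructions are passed through as a single sub-task.
    return plans.get(instruction, [instruction])


def run_subtask(subtask: str, position: float, target: float,
                gain: float = 0.5, tolerance: float = 0.01,
                max_steps: int = 50) -> tuple[float, list[Action]]:
    """Stand-in for the VLA: closed-loop motion toward a target.

    Each iteration "observes" the remaining error (a proxy for visual
    feedback) and emits a proportional correction.
    """
    actions: list[Action] = []
    for _ in range(max_steps):
        error = target - position
        if abs(error) <= tolerance:
            break  # close enough; hand over to the next sub-task
        step = gain * error
        position += step
        actions.append(Action(subtask, step))
    return position, actions


def execute(instruction: str,
            targets: dict[str, float]) -> tuple[float, list[Action]]:
    """Chain both stages: plan sub-tasks, then run each one in order."""
    position = 0.0
    log: list[Action] = []
    for subtask in plan_subtasks(instruction):
        # A sub-task without a listed target keeps the current position.
        goal = targets.get(subtask, position)
        position, actions = run_subtask(subtask, position, goal)
        log.extend(actions)
    return position, log


if __name__ == "__main__":
    final, log = execute(
        "tidy the desk",
        {"locate cup": 1.0, "grasp cup": 1.0, "place cup on shelf": 2.0},
    )
    print(f"final position: {final:.4f}, actions issued: {len(log)}")
```

Because each sub-task starts from wherever the previous one ended, the corrections chain into one continuous motion, which is the behavior the paragraph above attributes to a VLA model.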
Physical AI requires advanced decision-making, which inevitably leads to increasingly large AI models. As a result, a single robot cannot handle all of the computation on its own. And while cloud-based processing provides high computational performance, it introduces communication latency and variability.
How is SoftBank Corp. (TOKYO: 9434) surmounting these obstacles to bring Physical AI to the real world? Learn more in this post from the SoftBank Research Institute of Advanced Technology.
(Posted on March 31, 2026)
by SoftBank News Editors


