NVIDIA Unveils New Physical AI Tools for Autonomous Vehicles, Robotics, and Vision AI

New physical AI agent skills accelerate the development of autonomous vehicles, robots and vision AI systems

At the Conference on Computer Vision and Pattern Recognition, NVIDIA unveils new AI agent skills targeting autonomous vehicles, robotics and vision AI. Image courtesy of NVIDIA.

By Kenneth Wong

June 3, 2026

This week, at the Conference on Computer Vision and Pattern Recognition (CVPR, June 3-4, Denver, CO), NVIDIA released NVIDIA Cosmos 3, an open frontier model for physical AI vision reasoning, world generation, and action generation. According to the announcement, the world foundation model provides core capabilities, or agent skills, to advance research and development in autonomous vehicles (AV), robotics, and vision AI.

For AV Research

For AV development, NVIDIA points out that data covering rare interactions, unusual road geometry, lighting changes, and edge-case scenarios is difficult to collect but critical for training and validation. "With NVIDIA autonomous vehicle skills, researchers and developers can task AI agents to automate workflows for scene reconstruction from fleet data and generate synthetic scenarios," says NVIDIA. "Neural Reconstruction skills help AI agents turn fleet-captured data into editable 3D scenes for simulation and synthetic data generation ..."

NVIDIA is also bolstering AV research with NVIDIA AlpaGym, an open-source, closed-loop reinforcement learning framework, which "extends the approach by connecting policy rollouts and high-fidelity simulation with agent skills, scaling across thousands of GPUs, to help researchers move through setup, rollout and evaluation. NVIDIA OmniDreams, an action-conditioned generative world model, adds photorealistic rendering to the simulation loop, generating camera frames that respond directly to policy actions in real time," the announcement says.
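To make the idea of a closed loop concrete, here is a minimal Python sketch of a policy rollout in which each simulated state depends on the action the policy just took. It does not use AlpaGym, OmniDreams, or any other NVIDIA API; the `ToyLaneSim` simulator, the `proportional_policy` function, and every parameter are hypothetical names invented purely for illustration.

```python
import random
from dataclasses import dataclass


@dataclass
class ToyLaneSim:
    """Hypothetical 1-D lane-keeping simulator (not an NVIDIA API)."""
    offset: float = 1.0   # lateral distance from lane center, in meters
    drift: float = 0.05   # maximum random disturbance per step

    def observe(self) -> float:
        return self.offset

    def step(self, steer: float, rng: random.Random) -> float:
        # The next state depends on the action the policy just took,
        # which is what makes the loop "closed".
        self.offset += steer + rng.uniform(-self.drift, self.drift)
        return self.offset


def proportional_policy(observation: float, gain: float = 0.5) -> float:
    """Steer back toward the lane center in proportion to the offset."""
    return -gain * observation


def rollout(policy, steps: int = 50, seed: int = 0) -> list[float]:
    """Run one closed-loop episode and return the offset after each step."""
    rng = random.Random(seed)
    sim = ToyLaneSim()
    trajectory = []
    for _ in range(steps):
        action = policy(sim.observe())
        trajectory.append(sim.step(action, rng))
    return trajectory


def evaluate(trajectory: list[float]) -> float:
    """Score an episode by its mean absolute offset over the final 10 steps."""
    tail = trajectory[-10:]
    return sum(abs(x) for x in tail) / len(tail)


if __name__ == "__main__":
    score = evaluate(rollout(proportional_policy))
    print(f"mean |offset| over last 10 steps: {score:.3f} m")
```

In an open-loop replay, by contrast, recorded sensor frames would play back unchanged regardless of what the policy does. The setup, rollout, and evaluation stages named in the announcement map loosely onto constructing the simulator, calling `rollout`, and calling `evaluate` in this toy version.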

This is augmented by the release of NVIDIA Alpamayo 2 Super, an open 32-billion-parameter reasoning vision-language-action (VLA) model. In other words, the AI model can reason, plan, and take action across the full driving scenario.

This week, NVIDIA Research released LCDrive. The company says the model "replaces expensive text-based reasoning with compact latent representations, letting autonomous vehicles think faster on embedded hardware."

For Vision AI

With vision AI, NVIDIA points out the challenge is to generate "enough controlled examples to study how models behave when visual conditions, object states or temporal events change."

To that end, the company is releasing new NVIDIA Metropolis skills, designed to help researchers and developers generate synthetic visual scenarios, including anomalies. The agents can also support pseudo-labeling. "These skills benefit from Cosmos 3’s mixture-of-transformers architecture, which uses a reasoning transformer to analyze observations and feed instructions to a generation tower, helping scale physically grounded virtual worlds," NVIDIA says.

Also this week, NVIDIA Research is releasing GraspGen-X, a foundation model for robotic grasping. Unlike a vision-language-action policy trained for a two-finger gripper, "GraspGen-X applies its understanding of geometry and contact to any robotic gripper it encounters. Given the geometry of a new gripper and an unknown object it’s never seen before, the model generates reliable grasp pose proposals to enable the robot to grasp the object," says NVIDIA.

For Robotics

With AI-based robot training, the key to developing navigation or manipulation skills is iteration, says NVIDIA. However, it is time-consuming and difficult to build enough controlled environments and policy rollouts to study how robot behavior changes across tasks. Such work typically requires stitching together simulation environments, task variations, policy training, and evaluation by hand, according to NVIDIA.

NVIDIA proposes that researchers use AI agents to automate the common steps in scene preparation, simulation, and robot learning with NVIDIA Omniverse libraries and the Isaac Sim and Isaac Lab frameworks. "Agents can help launch simulation sessions, author scenes, control simulation, capture data and validate environments in Isaac Sim, while Isaac Lab skills support reinforcement learning setup, training, evaluation and custom environment development," the company says.

This week, NVIDIA Research released NitroGen, described as "a generalized gameplay AI foundation model that harnesses the NVIDIA Isaac GR00T robot foundation model architecture to help train embodied agents in virtual environments across tens of thousands of hours of interaction."

About Kenneth Wong

Kenneth Wong is Digital Engineering's resident blogger and senior editor. Share your thoughts or suggestions at digitaleng.news/facebook.
