Meta’s new architecture helps robots interact in environments they’ve never seen before

Thanks largely to AI, robotics has come a long way in a short period of time, but robots continue to struggle in scenarios they haven't been trained for and must adapt to on the fly.
This week, Meta said it has overcome some of these major hurdles with its new open-source Video Joint Embedding Predictive Architecture 2 (V-JEPA 2), the first world model trained primarily on video. V-JEPA 2 can predict next actions and respond to environments it hasn’t interacted with before.
“Meta’s recent unveiling of V-JEPA 2 marks a quiet but significant shift in the evolution of AI vision systems, and it’s one enterprise leaders can’t afford to overlook,” said Ankit Chopra, a director at Neo4j. “Built on self-supervised learning and optimized for agentic, low-supervision use, V-JEPA 2 moves beyond the confines of traditional computer vision, introducing a model that is both leaner and more predictive.”
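To make the joint-embedding predictive idea concrete, here is a minimal, hypothetical PyTorch sketch: an encoder embeds the visible video patches, a predictor guesses the embeddings of masked (future) patches, and the loss is computed in embedding space rather than pixel space. The class names, dimensions, and masking scheme are illustrative assumptions, not Meta's implementation.

```python
import torch
import torch.nn as nn

# Conceptual sketch of a joint-embedding predictive objective.
# Names and sizes are hypothetical; this is not Meta's V-JEPA 2 code.

class TinyVideoEncoder(nn.Module):
    """Maps flattened video patches to embeddings."""
    def __init__(self, dim=256):
        super().__init__()
        self.proj = nn.Linear(3 * 16 * 16, dim)  # flattened 16x16 RGB patches

    def forward(self, patches):              # patches: (batch, num_patches, 768)
        return self.proj(patches)             # -> (batch, num_patches, dim)

class TinyPredictor(nn.Module):
    """Predicts embeddings of masked (future) patches from visible ones."""
    def __init__(self, dim=256):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim))

    def forward(self, context_embeddings):
        return self.net(context_embeddings)

encoder, predictor = TinyVideoEncoder(), TinyPredictor()
target_encoder = TinyVideoEncoder()           # in practice an EMA copy of the encoder
target_encoder.load_state_dict(encoder.state_dict())

visible = torch.randn(2, 64, 768)             # patches the model is allowed to see
masked = torch.randn(2, 64, 768)              # held-out "future" patches to predict

pred = predictor(encoder(visible))            # predict representations of the masked patches
with torch.no_grad():
    target = target_encoder(masked)           # targets live in embedding space, not pixel space

loss = nn.functional.mse_loss(pred, target)   # self-supervised: no labels required
loss.backward()
```

The key design choice this illustrates is that the model never reconstructs pixels; it only has to predict abstract representations of what it cannot see, which is what keeps the approach comparatively lean and well suited to low-supervision, agentic use. The real architecture also conditions the predictor on the positions of the masked targets, which this sketch omits.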
*Trained for predictive tasks on more than 1 million hours of video*
Meta says V-JEPA 2 was trained for predictive tasks on more than 1 million hours of video.