VentureBeat July 31, 2023
Google’s DeepMind has announced Robotics Transformer 2 (RT-2), a first-of-its-kind vision-language-action (VLA) model that enables robots to perform novel tasks without task-specific training.
Just as language models learn general ideas and concepts from web-scale data, RT-2 uses text and images from the web to understand real-world concepts and translate that knowledge into generalized instructions for robotic actions.
With further improvement, this technology could lead to context-aware, adaptable robots that perform different tasks in different situations and environments, with far less training than is currently required.
What makes DeepMind’s RT-2 unique?
Back in 2022, DeepMind debuted RT-1, a multi-task model trained on 130,000 demonstrations that enabled Everyday Robots to perform 700-plus tasks with a 97% success...