This week on the GeekWire Podcast: Newly unsealed court documents reveal the behind-the-scenes history of Microsoft and OpenAI, including a surprise: Amazon Web Services was OpenAI's original partner. We tell the story behind the story, explaining how it all came to light.
AIM Intelligent Machines (AIM), a Seattle-area startup developing software that lets bulldozers and excavators operate on their own, announced $4.9 million in new contracts with the U.S. Air Force to build and repair military bases and airfields.
Founded in 2021, AIM got its start in mining and construction, and is now expanding to defense applications. AIM's technology works with existing equipment and is designed for dangerous or hard-to-reach places, including areas where equipment might be dropped in by parachute. One person can remotely manage an entire site of working vehicles.
For airfield repairs, the company's tech scans the area with sensors to create a 3D map of the damage. Autonomous machines then clear debris and repair the runway, all remotely and without people on the ground. Military advisors say the approach could speed up construction, reduce risk to personnel, and make it easier to deploy equipment in tough conditions.
Led by longtime engineers, AIM raised $50 million last year from investors including Khosla Ventures, General Catalyst, and Human Capital. CEO Adam Sadilek previously spent nine years at Google working on confidential projects.
In a LinkedIn post this week, Sadilek wrote that "we're asking the wrong questions about AI and work," arguing that automation will enable construction companies to build more with their existing teams.
"The top line grows, but the bottom line doesn't get 'optimized' into oblivion," he wrote. "For example, each autonomous dozer we deploy uncovers, depending on the mineral type and current market price, between $3 million and $17 million in additional ore each season. Rather than replacing people, that gives them leverage. And yes, cost savings show up (fuel, maintenance, wear) but they're not the main event."
He added: "Instead of focusing on whether AI removes jobs, we should be focusing on whether we'll use it to finally do more of the things we've always wanted but never had enough capacity to build."
Google DeepMind has built a new video-game-playing agent called SIMA 2 that can navigate and solve problems in a wide range of 3D virtual worlds. The company claims it's a big step toward more general-purpose agents and better real-world robots.
Google DeepMind first demoed SIMA (which stands for "scalable instructable multiworld agent") last year. But SIMA 2 has been built on top of Gemini, the firm's flagship large language model, which gives the agent a huge boost in capability.
The researchers claim that SIMA 2 can carry out a range of more complex tasks inside virtual worlds, figure out how to solve certain challenges by itself, and chat with its users. It can also improve itself by tackling harder tasks multiple times and learning through trial and error.
"Games have been a driving force behind agent research for quite a while," Joe Marino, a research scientist at Google DeepMind, said in a press conference this week. He noted that even a simple action in a game, such as lighting a lantern, can involve multiple steps: "It's a really complex set of tasks you need to solve to progress."
The ultimate aim is to develop next-generation agents that are able to follow instructions and carry out open-ended tasks inside more complex environments than a web browser. In the long run, Google DeepMind wants to use such agents to drive real-world robots. Marino claimed that the skills SIMA 2 has learned, such as navigating an environment, using tools, and collaborating with humans to solve problems, are essential building blocks for future robot companions.
Unlike previous work on game-playing agents such as AlphaGo, which beat Go grandmaster Lee Sedol in 2016, or AlphaStar, which beat 99.8% of ranked human players at the video game StarCraft 2 in 2019, the idea behind SIMA is to train an agent to play open-ended games without preset goals. Instead, the agent learns to carry out instructions given to it by people.
Humans control SIMA 2 via text chat, by talking to it out loud, or by drawing on the game's screen. The agent takes in a video game's pixels frame by frame and figures out what actions it needs to take to carry out its tasks.
Like its predecessor, SIMA 2 was trained on footage of humans playing eight commercial video games, including No Man's Sky and Goat Simulator 3, as well as three virtual worlds created by the company. The agent learned to match keyboard and mouse inputs to actions.
The researchers claim that, hooked up to Gemini, SIMA 2 is far better at following instructions (asking questions and providing updates as it goes) and at figuring out for itself how to perform more complex tasks.
Google DeepMind tested the agent inside environments it had never seen before. In one set of experiments, researchers asked Genie 3, the latest version of the firmβs world model, to produce environments from scratch and dropped SIMA 2 into them. They found that the agent was able to navigate and carry out instructions there.
The researchers also used Gemini to generate new tasks for SIMA 2. If the agent failed at first, Gemini generated tips that SIMA 2 took on board when it tried again. Repeating a task multiple times in this way often allowed SIMA 2 to improve by trial and error until it succeeded, Marino said.
Git gud
SIMA 2 is still an experiment. The agent struggles with complex tasks that require multiple steps and more time to complete. It also remembers only its most recent interactions (to make SIMA 2 more responsive, the team cut its long-term memory). It's also still nowhere near as good as people at using a mouse and keyboard to interact with a virtual world.
Julian Togelius, an AI researcher at New York University who works on creativity and video games, thinks it's an interesting result. Previous attempts at training a single system to play multiple games haven't gone too well, he says. That's because training models to control multiple games just by watching the screen isn't easy: "Playing in real time from visual input only is 'hard mode,'" he says.
In particular, Togelius calls out Gato, a previous system from Google DeepMind, which, despite being hyped at the time, could not transfer skills across a significant number of virtual environments.
Still, he is open-minded about whether or not SIMA 2 could lead to better robots. "The real world is both harder and easier than video games," he says. It's harder because you can't just press A to open a door. At the same time, a robot in the real world will know exactly what its body can and can't do at any time. That's not the case in video games, where the rules inside each virtual world can differ.
Others are more skeptical. Matthew Guzdial, an AI researcher at the University of Alberta, isn't too surprised that SIMA 2 can play many different video games. He notes that most games have very similar keyboard and mouse controls: Learn one and you learn them all. "If you put a game with weird input in front of it, I don't think it'd be able to perform well," he says.
Guzdial also questions how much of what SIMA 2 has learned would really carry over to robots. "It's much harder to understand visuals from cameras in the real world compared to games, which are designed with easily parsable visuals for human players," he says.
Still, Marino and his colleagues hope to continue their work with Genie 3 to allow the agent to improve inside a kind of endless virtual training dojo, where Genie generates worlds for SIMA to learn in via trial and error guided by Gemini's feedback. "We've kind of just scratched the surface of what's possible," he said at the press conference.