Researchers teach robots to solve manipulation problems in seconds

Thursday, 12 June, 2025

When going on a trip, we normally have to pack everything we are taking into a suitcase, making sure it all fits securely without crushing anything fragile. Because we possess strong visual and geometric reasoning skills, this is usually a straightforward problem, even if it takes a bit of fiddling.

To a robot, however, this is an extremely complex planning challenge that requires reasoning simultaneously about many actions, constraints and mechanical capabilities. Finding an effective solution could take the robot a very long time, if it can come up with one at all.

Researchers from MIT and NVIDIA Research have developed a novel algorithm that dramatically speeds up the robot’s planning process. Their approach enables a robot to ‘think ahead’ by evaluating thousands of possible solutions in parallel and then refining the best ones to meet the constraints of the robot and its environment.

Instead of testing each potential action one at a time, like many existing approaches, this new method considers thousands of actions simultaneously, solving multistep manipulation problems in a matter of seconds. The researchers harness the massive computational power of NVIDIA GPUs to enable this speed-up.

The researchers say that in a factory or warehouse, their technique could enable robots to rapidly determine how to manipulate and tightly pack items that have different shapes and sizes without damaging them, knocking anything over, or colliding with obstacles, even in a narrow space.

“This would be very helpful in industrial settings where time really does matter and you need to find an effective solution as fast as possible,” said MIT graduate student William Shen, lead author of the paper on this technique. “If your algorithm takes minutes to find a plan, as opposed to seconds, that costs the business money.”

Planning in parallel

The researchers’ algorithm is designed for what is called task and motion planning (TAMP). The goal of a TAMP algorithm is to come up with a task plan for a robot, which is a high-level sequence of actions, along with a motion plan, which includes low-level action parameters, like joint positions and gripper orientation, that complete that high-level plan.
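As a rough illustration of that split, a plan can be pictured as a list of high-level steps, each paired with the low-level parameters that carry it out. The sketch below is hypothetical: the class names, fields and values are invented for this example and are not taken from the researchers' code.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class MotionParams:
    """Low-level parameters for one step (hypothetical fields)."""
    joint_positions: Tuple[float, ...]  # one angle per arm joint, in radians
    gripper_yaw: float                  # gripper orientation about the vertical axis


@dataclass
class PlanStep:
    """One high-level action plus the low-level parameters that realise it."""
    action: str           # e.g. "pick" or "place"
    target: str           # the object the action applies to
    motion: MotionParams


@dataclass
class Plan:
    steps: List[PlanStep] = field(default_factory=list)

    def task_plan(self) -> List[str]:
        """Return the high-level sequence of actions, without motion details."""
        return [f"{s.action}({s.target})" for s in self.steps]


# Made-up values: pick up one block, then place it with the gripper rotated.
plan = Plan([
    PlanStep("pick", "block_a", MotionParams((0.0, 0.5, -0.3), 0.0)),
    PlanStep("place", "block_a", MotionParams((0.4, 0.2, -0.1), 1.57)),
])
print(plan.task_plan())
```

The task plan is the list of action names, while the motion plan lives in the `MotionParams` attached to each step. A TAMP algorithm has to choose both so that they are consistent with each other.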

To create a plan for packing items in a box, a robot needs to reason about many variables, such as the final orientation of packed objects so they fit together, as well as how it is going to pick them up and manipulate them using its arm and gripper. It must do this while determining how to avoid collisions and achieve any user-specified constraints, such as a certain order in which to pack items.

With so many potential sequences of actions, sampling possible solutions at random and trying one at a time could take an extremely long time.

“It is a very large search space, and a lot of actions the robot does in that space don’t actually achieve anything productive,” added fellow author Caelan Garrett, a senior research scientist at NVIDIA Research.

Instead, the researchers’ algorithm, called cuTAMP, simulates and refines thousands of solutions in parallel. It is accelerated using CUDA, a parallel computing platform, and combines two techniques: sampling and optimisation.

Sampling involves choosing a solution to try. But rather than sampling solutions randomly, cuTAMP limits the range of potential solutions to those most likely to satisfy the problem’s constraints. This modified sampling procedure allows cuTAMP to broadly explore potential solutions while narrowing down the sampling space.
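The benefit of narrowing the sampling space can be seen in a toy, one-dimensional example. The sketch below is an assumption-laden illustration, not cuTAMP's sampler: the function names, the table and box dimensions, and the single "item must sit inside the box" constraint are all invented here.

```python
import random


def sample_anywhere(rng, table_width):
    """Naive sampling: any position for the item's centre along the table."""
    return rng.uniform(0.0, table_width)


def sample_inside_box(rng, box_min, box_max, item_width):
    """Restricted sampling: only centres that keep the whole item inside the box."""
    half = item_width / 2.0
    return rng.uniform(box_min + half, box_max - half)


def fits(x, box_min, box_max, item_width):
    """Check the constraint: both edges of the item lie within the box."""
    half = item_width / 2.0
    return box_min <= x - half and x + half <= box_max


rng = random.Random(0)
# Made-up dimensions, in metres.
box_min, box_max, item_width, table_width = 0.4, 0.6, 0.05, 1.0

naive = [sample_anywhere(rng, table_width) for _ in range(1000)]
restricted = [sample_inside_box(rng, box_min, box_max, item_width) for _ in range(1000)]

naive_hits = sum(fits(x, box_min, box_max, item_width) for x in naive)
restricted_hits = sum(fits(x, box_min, box_max, item_width) for x in restricted)
print(naive_hits, restricted_hits)
```

In this setup only a small fraction of the naive samples land inside the box, whereas the restricted sampler still varies its placements but wastes almost none of them.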

“Once we combine the outputs of these samples, we get a much better starting point than if we sampled randomly. This ensures we can find solutions more quickly during optimisation,” Shen said.

Once cuTAMP has generated that set of samples, it performs a parallelised optimisation procedure that computes a cost, which corresponds to how well each sample avoids collisions and satisfies the motion constraints of the robot, as well as any user-defined objectives. It updates the samples in parallel, chooses the best candidates, and repeats the process until it narrows them down to a successful solution.
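The loop described above (score every candidate, update them all, keep the best, repeat) can be sketched on a toy problem. Everything here is an assumption made for illustration: two blocks are packed side by side along one axis, the cost is a sum of squared constraint violations, the update is plain gradient descent with a finite-difference gradient, and half the batch is discarded each round. None of these choices is taken from cuTAMP itself, and the Python loop over candidates only stands in for what a GPU would do as a single batched operation.

```python
import random

BOX_WIDTH = 1.0
BLOCK_WIDTH = 0.5  # two equal blocks that exactly fill the box (made-up sizes)


def cost(candidate):
    """Zero when both blocks are inside the box and do not overlap."""
    x1, x2 = candidate  # centres of the two blocks
    half = BLOCK_WIDTH / 2.0
    total = 0.0
    for x in (x1, x2):
        total += max(0.0, half - x) ** 2              # sticks out on the left
        total += max(0.0, x + half - BOX_WIDTH) ** 2  # sticks out on the right
    total += max(0.0, BLOCK_WIDTH - abs(x1 - x2)) ** 2  # blocks overlap
    return total


def gradient(candidate, eps=1e-6):
    """Central finite-difference gradient of the cost."""
    grad = []
    for i in range(len(candidate)):
        up = list(candidate)
        up[i] += eps
        down = list(candidate)
        down[i] -= eps
        grad.append((cost(up) - cost(down)) / (2 * eps))
    return grad


def optimise(num_samples=256, keep=0.5, step=0.25, tol=1e-8, max_rounds=200, seed=0):
    rng = random.Random(seed)
    batch = [[rng.uniform(0.0, BOX_WIDTH), rng.uniform(0.0, BOX_WIDTH)]
             for _ in range(num_samples)]
    for _ in range(max_rounds):
        # Update every candidate with one gradient step.
        batch = [[x - step * g for x, g in zip(c, gradient(c))] for c in batch]
        # Keep only the lowest-cost candidates.
        batch.sort(key=cost)
        batch = batch[:max(1, int(len(batch) * keep))]
        if cost(batch[0]) < tol:
            return batch[0]
    return None  # no candidate met the tolerance


result = optimise()
print(result)
```

Because the two blocks exactly fill the box, a random guess is never a valid placement; the candidates only become feasible as the optimisation pulls the block centres towards 0.25 and 0.75.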

Harnessing accelerated computing

“Using GPUs, the computational cost of optimising one solution is the same as optimising hundreds or thousands of solutions,” Shen explained.

When they tested their approach on Tetris-like packing challenges in simulation, cuTAMP took only a few seconds to find successful, collision-free plans that sequential planning approaches might take much longer to find. And when deployed on a real robotic arm, the algorithm always found a solution in under 30 seconds.

The system works across robots and has been tested on a robotic arm at MIT and a humanoid robot at NVIDIA. Since cuTAMP is not a machine-learning algorithm, it requires no training data, which could enable it to be readily deployed in many situations.

The algorithm is generalisable to situations beyond packing, like a robot using tools. A user could incorporate different skill types into the system to expand a robot’s capabilities automatically.

Image: The robot planning approach considers thousands of possible actions simultaneously, enabling it to rapidly determine how to manipulate and tightly pack items without damaging them, like these blocks. Credit: Courtesy of the researchers.
