Ever wonder why robots sometimes struggle to manipulate objects that humans pick up with ease? Manipulation tasks must be abstracted into feature representations before machines can use them to learn policies (i.e., skills), and these representations usually have to be manually predefined, a challenging undertaking in complex tasks involving, for instance, deformable objects or varying material properties.
A viable alternative is deep learning methods, which provide a means for robots to acquire representations autonomously from experience. Toward that end, researchers at Carnegie Mellon University describe in a preprint paper (“Learning Semantic Embedding Spaces for Slicing Vegetables”) a method for combining prior task knowledge and experience-based learning to acquire representations, focusing on the task of cutting cucumbers and tomatoes into slices.
“Learning to slice vegetables is a complex task, as it involves manipulating deformable objects into different shapes as well as the creation of new objects in the form of slices,” the researchers wrote. “Introducing meaningful auxiliary tasks while training allows our model to learn a semantically rich embedding space that encodes useful priors and properties, such as the thickness of the vegetable being cut, in our state representation.”
The team’s experimental setup consists of two 7-DOF Franka Emika Panda Research Arms and a side-mounted Intel RealSense camera that collects raw pixel information from the scene. The right arm — the “holding arm” — is used to pick, place, and hold vegetables being cut on a cutting board using tongs attached to its fingers. Meanwhile, the left arm — the “cutting arm” — grasps a 3D-printed tool holder with a knife it uses to slice the vegetables held by the other arm.
Cutting vegetables into slices of different thicknesses requires the robot arms to perform a sequence of distinct actions for each cut. They first need to detect the end of the vegetable, then move the knife up and a set distance along the vegetable to position it for a slice of the desired thickness, and finally perform the cut.
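The sequence above can be sketched as a plan built from simple motion primitives. The function and primitive names below are illustrative assumptions, not taken from the CMU paper:

```python
# Hypothetical sketch: the slicing routine expressed as an ordered list of
# motion primitives. Primitive names and parameters are invented for
# illustration and do not come from the paper.

def plan_slice_sequence(num_slices, thicknesses, approach_height=0.05):
    """Return the ordered primitives for cutting `num_slices` slices.

    thicknesses: desired thickness (meters) of each slice, in cutting order.
    """
    assert len(thicknesses) == num_slices
    steps = [("detect_end", {})]  # locate the free end of the vegetable
    for t in thicknesses:
        steps.append(("lift", {"height": approach_height}))  # raise the knife
        steps.append(("advance", {"distance": t}))           # move along the vegetable by one slice thickness
        steps.append(("cut", {}))                            # downward cutting motion
    return steps

plan = plan_slice_sequence(2, [0.010, 0.015])
```

Each call to `plan_slice_sequence` yields one detection step followed by a lift/advance/cut triple per slice, mirroring the cutting sequence the researchers describe.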
The researchers collected 10 trajectories of humans guiding the robot arm through cutting actions to establish the parameters of the cutting sequence described above. To create a vegetable-slicing data set, they randomly sampled the number of slices to cut at the beginning of each demonstration and recorded the thickness of each slice.
Next, the team trained a novel embedding network, which they say enabled their proposed model to capture helpful task-specific attributes. “By introducing the auxiliary task of predicting the thickness of the cut vegetable slice,” they wrote, “we force our embedding network to model object-centric properties important for the task of slicing vegetables.”
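The core idea, an auxiliary regression head that forces the embedding to encode slice thickness, can be sketched with a toy linear model. The architecture, dimensions, and loss here are simplified assumptions standing in for the paper's actual network:

```python
# Minimal NumPy sketch of auxiliary-task training: gradients from a
# slice-thickness regression head flow back into the embedding weights,
# so the embedding is pushed to encode thickness-relevant structure.
# All sizes and data here are toy assumptions, not the paper's setup.
import numpy as np

rng = np.random.default_rng(0)

features = rng.normal(size=(32, 64))           # 32 frames of 64-dim visual features (toy data)
thickness = rng.uniform(0.005, 0.02, size=32)  # auxiliary target: slice thickness in meters

W_embed = rng.normal(scale=0.1, size=(64, 16))  # features -> 16-dim embedding
w_aux = rng.normal(scale=0.1, size=16)          # embedding -> predicted thickness

lr = 1e-2
losses = []
for _ in range(200):
    z = features @ W_embed      # embedding space
    pred = z @ w_aux            # auxiliary head prediction
    err = pred - thickness
    losses.append(np.mean(err ** 2))  # MSE on the auxiliary task
    # Backprop: the auxiliary loss updates both the head AND the embedding.
    grad_w_aux = 2 * z.T @ err / len(err)
    grad_W = 2 * features.T @ np.outer(err, w_aux) / len(err)
    w_aux -= lr * grad_w_aux
    W_embed -= lr * grad_W
```

In a full model this auxiliary loss would be added to the main training objective; the point of the sketch is that minimizing thickness-prediction error reshapes `W_embed`, which is the mechanism the researchers describe.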
So how efficacious was the approach, in the end? In experiments, the researchers say they saw evidence that the learned representations could be generalized across different shapes and sizes, and that they “afford[ed] a rich representation” for learning models for manipulation. “Our [tests] show that the learned model learns a continuous understanding on important attributes such as thickness of the cut slice,” the paper’s authors wrote.