CHOICE: Coordinated Human-Object Interaction in Cluttered Environments for Pick-and-Place Actions

Jintao Lu1, He Zhang2, Yuting Ye3, Takaaki Shiratori3, Sebastian Starke3, Taku Komura1
1University of Hong Kong 2Tencent Robotics X 3Meta Reality Labs

Abstract

Animating human-scene interactions such as pick-and-place tasks in cluttered, complex layouts is challenging: objects span a wide variety of geometries and articulations, and scenes contain diverse obstacles. The main difficulty lies in the sparsity of motion data relative to the wide variation of objects and environments, as well as the poor availability of transition motions between different tasks, which complicates generalization to arbitrary conditions. To cope with this issue, we develop a system that tackles the interaction synthesis problem as a hierarchical goal-driven task. First, we develop a bimanual scheduler that plans a set of keyframes for simultaneously controlling the two hands to efficiently achieve the pick-and-place task from an abstract goal signal, such as the target object selected by the user. Next, we develop a neural implicit planner that generates guidance hand trajectories under diverse object shapes/types and obstacle layouts. Finally, we propose a linear dynamic model for our DeepPhase controller that incorporates a Kalman filter to enable smooth transitions in the frequency domain, resulting in more realistic and effective multi-objective control of the character. Our system can produce a wide range of natural pick-and-place movements with respect to the geometry of objects, the articulation of containers, and the layout of the objects in the scene.
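To make the last step concrete, below is a minimal sketch of how a Kalman filter can smooth one phase channel under a linear phase-rotation model, i.e., the general idea of filtering phase updates in the frequency domain. The state layout (a 2D phase vector), all function names, and the noise covariances are illustrative assumptions for exposition, not the paper's implementation.

    import numpy as np

    # Hedged sketch: Kalman filtering of a single phase channel.
    # Assumed state x = (A*sin(2*pi*phi), A*cos(2*pi*phi)); the linear
    # dynamics rotate this phase vector by the channel frequency F each
    # frame. F, dt, Q, R and all names here are illustrative assumptions.

    def phase_rotation(F, dt):
        """Rotation matrix advancing a phase vector by frequency F over dt."""
        a = 2.0 * np.pi * F * dt
        return np.array([[np.cos(a), -np.sin(a)],
                         [np.sin(a),  np.cos(a)]])

    def kalman_phase_step(x, P, z, F, dt, Q, R):
        """One predict/update cycle for the phase vector.

        x, P : filtered state (2,) and its covariance (2, 2)
        z    : noisy phase vector predicted for this frame
        Q, R : process and measurement noise covariances (2, 2)
        """
        A = phase_rotation(F, dt)
        x_pred = A @ x                           # predict: advance the phase
        P_pred = A @ P @ A.T + Q
        K = P_pred @ np.linalg.inv(P_pred + R)   # gain; H = I (state observed)
        x_new = x_pred + K @ (z - x_pred)        # update: blend in measurement
        P_new = (np.eye(2) - K) @ P_pred
        return x_new, P_new

    # Example: smooth a noisy 1.5 Hz channel sampled at 60 fps.
    rng = np.random.default_rng(0)
    F, dt = 1.5, 1.0 / 60.0
    Q, R = 1e-4 * np.eye(2), 1e-2 * np.eye(2)
    x, P = np.array([0.0, 1.0]), np.eye(2)
    for t in range(1, 121):
        true = phase_rotation(F, t * dt) @ np.array([0.0, 1.0])
        z = true + rng.normal(scale=0.1, size=2)  # stand-in for a network output
        x, P = kalman_phase_step(x, P, z, F, dt, Q, R)

Because the dynamics are a pure rotation, the filter's prediction step keeps the phase advancing at the channel frequency while the update step pulls it towards the (noisy) per-frame prediction, which is what yields smooth transitions.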

Overview

We split the interaction synthesis task into three major sub-tasks:

  • Hand-object trajectory planning, managed by the trajectory planning sub-system (brown) shown in Fig. 2 and Sec. 4.
  • Bimanual interaction scheduling, managed by the goal matching sub-system (grey) and the goal coordination sub-system (cyan) shown in Fig. 2 and Sec. 5.
  • Full-body motion control, managed by the goal-driven control sub-system (blue) shown in Fig. 2 and Sec. 6.

When the user clicks an object to pick or place, the system first plans a path to walk towards the object, then schedules a series of contact-based keyframes for the key joints (left/right wrists and the hip), and finally computes collision-free arm trajectories that conduct the motion. The DeepPhase-based controller then synthesizes motion in the phase manifold in real time, following the scheduled sub-goals and planned trajectories.
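As a reading aid, here is a minimal, hypothetical sketch of that flow as data passing between the three sub-systems. Every name below (Keyframe, Plan, plan_pick_and_place) is an illustrative stand-in, not our API; the actual sub-systems are the learned components of Secs. 4-6.

    from dataclasses import dataclass, field

    Vec3 = tuple  # (x, y, z) world-space position

    @dataclass
    class Keyframe:
        joint: str    # "l_wrist", "r_wrist", or "hip"
        time: float   # scheduled contact time in seconds
        target: Vec3  # contact position in world space

    @dataclass
    class Plan:
        walk_path: list[Vec3] = field(default_factory=list)
        keyframes: list[Keyframe] = field(default_factory=list)
        arm_trajectories: dict[str, list[Vec3]] = field(default_factory=dict)

    def plan_pick_and_place(start: Vec3, approach: Vec3, grasp: Vec3) -> Plan:
        plan = Plan()
        # 1. Plan the path to walk towards the clicked object.
        plan.walk_path = [start, approach]
        # 2. Schedule contact-based keyframes for the key joints.
        plan.keyframes = [
            Keyframe("hip", 2.0, approach),
            Keyframe("r_wrist", 3.0, grasp),
        ]
        # 3. The neural implicit planner would refine each keyframe into a
        #    collision-free arm trajectory; a straight line stands in here.
        plan.arm_trajectories = {kf.joint: [approach, kf.target]
                                 for kf in plan.keyframes}
        # 4. The DeepPhase controller then tracks these sub-goals in real time.
        return plan

    plan = plan_pick_and_place(start=(0, 0, 0), approach=(1.0, 0, 0),
                               grasp=(1.2, 0.9, 0.4))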



Performing Long-term Interactions with Diverse Interaction Targets




Bimanual Operations Adaptive to Diverse Layouts






Dataset

We captured and post-processed 150 motion sequences using the Vicon Shōgun motion capture system. The dataset comprises long-term, delicate interactions in lifelike scenes, such as a corridor with a drinking bar and a well-equipped kitchen. Our dataset will be released together with our system implementation.