Achieving animal-like agility is a longstanding goal in quadrupedal robotics. While recent studies have successfully demonstrated imitation of specific behaviors, enabling robots to replicate a broader range of natural behaviors in real-world environments remains an open challenge. Here we propose an integrated controller comprising a Basic Behavior Controller (BBC) and a Task-Specific Controller (TSC), which can effectively learn diverse natural quadrupedal behaviors in an enhanced simulator and efficiently transfer them to the real world. Specifically, the BBC is trained using a novel semi-supervised generative adversarial imitation learning algorithm to extract diverse behavioral styles from raw motion capture data of real dogs, enabling smooth behavior transitions by adjusting discrete and continuous latent variable inputs. The TSC, trained via privileged learning with depth images as input, coordinates the BBC to efficiently perform various tasks. Additionally, we employ evolutionary adversarial simulator identification to optimize the simulator, aligning it closely with reality. After training, the robot exhibits diverse natural behaviors, successfully completing the quadrupedal agility challenge at an average speed of 1.1 m/s and achieving a peak speed of 3.2 m/s during hurdling. This work represents a substantial step toward animal-like agility in quadrupedal robots, opening avenues for their deployment in increasingly complex real-world environments.
Over hundreds of millions of years, quadrupedal animals have evolved diverse behavioral patterns to cope with survival challenges, allowing them to navigate a wide variety of obstacle-filled environments.
To showcase these animals' remarkable agility, numerous competitions are held around the world each year; one of the most renowned is the dog agility competition at the Crufts dog show, where handlers guide their dogs through varied obstacle courses, competing on both time and accuracy.
Inspired by dog agility, we constructed the quadrupedal agility challenge in a 7 × 10 m space, including six types of obstacles: A-frame, bar-jump, poles, seesaw, tire-jump, and tunnel. The order of the obstacles, as well as their positions and yaw angles, is randomly configured. The robot's task is to start at the starting line, traverse all obstacles in the shortest time, and reach the finish without knocking over any obstacles. Our controller relies solely on simple depth perception, without requiring precise global localization, and remains robust to random environmental changes and disturbances in long-horizon tasks. The robot completed the entire agility challenge in 26 seconds, achieving an average speed of 1.1 m/s.
Our method consists of three key parts: Basic Behavior Controller (BBC) training, Evolutionary Adversarial Simulator Identification (EASI), and Task-Specific Controller (TSC) training.
The BBC acts like the low-level neural regions of an animal, taking commands and proprioception as inputs to control the robot's behavior so that it replicates that of a real dog. We leverage a semi-supervised generative adversarial imitation learning architecture, using a small amount of labeled data to guide the disentanglement of the dog's multimodal behaviors: the policy's goal is to generate states that fool the discriminator while remaining distinguishable by the predictor. The latent skill variable represents five common dog behavior modes (walk, pace, trot, canter, and jump), and the latent shifting variable models continuous style variations. By maximizing a variational lower bound on the mutual information between the latent variables and state transitions, a single policy learns diverse, controllable behaviors.
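As a rough illustration, the sketch below shows one way such a semi-supervised objective could be wired up in PyTorch, with a discriminator and a predictor both operating on consecutive state pairs. The network sizes, loss weights, and names (e.g. `style_reward`, `w_info`) are illustrative assumptions, not the authors' exact implementation.

```python
# Hedged sketch of a semi-supervised GAIL objective with a discriminator (real vs.
# policy transitions) and a predictor recovering the latent skill/shift variables.
import torch
import torch.nn as nn
import torch.nn.functional as F

NUM_SKILLS, SHIFT_DIM, STATE_DIM = 5, 2, 48   # walk / pace / trot / canter / jump

class Discriminator(nn.Module):
    """Scores state transitions (s_t, s_{t+1}): real-dog mocap vs. policy rollouts."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(2 * STATE_DIM, 256), nn.ReLU(),
                                 nn.Linear(256, 1))

    def forward(self, s, s_next):
        return self.net(torch.cat([s, s_next], dim=-1))

class Predictor(nn.Module):
    """Recovers the discrete skill and continuous shift variables from a transition,
    giving a variational lower bound on the mutual information."""
    def __init__(self):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(2 * STATE_DIM, 256), nn.ReLU())
        self.skill_head = nn.Linear(256, NUM_SKILLS)   # categorical logits
        self.shift_head = nn.Linear(256, SHIFT_DIM)    # Gaussian mean

    def forward(self, s, s_next):
        h = self.trunk(torch.cat([s, s_next], dim=-1))
        return self.skill_head(h), self.shift_head(h)

def discriminator_loss(D, real, fake):
    """Standard GAIL objective: mocap transitions -> 1, policy transitions -> 0."""
    real_logit, fake_logit = D(*real), D(*fake)
    return (F.binary_cross_entropy_with_logits(real_logit, torch.ones_like(real_logit))
            + F.binary_cross_entropy_with_logits(fake_logit, torch.zeros_like(fake_logit)))

def predictor_loss(Q, roll, roll_skill, roll_shift, labeled, labeled_skill):
    """Semi-supervised predictor update: (1) mutual-information term on policy
    rollouts, where the latents fed to the policy are the targets; (2) supervised
    term on the small labeled mocap subset, anchoring the skill code to real gaits."""
    logits, mean = Q(*roll)
    mi = F.cross_entropy(logits, roll_skill) + F.mse_loss(mean, roll_shift)
    lab_logits, _ = Q(*labeled)
    return mi + F.cross_entropy(lab_logits, labeled_skill)

def style_reward(D, Q, s, s_next, skill, shift, w_info=0.5):
    """Reward for the BBC policy: fool the discriminator while keeping transitions
    predictable from (i.e. distinguishable by) the latent variables."""
    with torch.no_grad():
        d = torch.sigmoid(D(s, s_next)).clamp(1e-4, 1 - 1e-4)
        logits, mean = Q(s, s_next)
        log_q = (-F.cross_entropy(logits, skill, reduction="none")
                 - ((mean - shift) ** 2).sum(-1))
    return -torch.log(1.0 - d).squeeze(-1) + w_info * log_q
```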
After training, our controller can generate diverse and realistic behaviors, smoothly switching between different behaviors while accurately responding to command inputs such as speed and body height.
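At deployment time, switching behaviors amounts to changing the latent and command inputs. The snippet below is a hypothetical usage sketch: the observation layout, dimensions, and the `bbc_policy` stand-in are assumptions made only to illustrate how interpolating the latent skill code can yield a smooth trot-to-canter transition.

```python
# Hypothetical BBC command interface (all names and dimensions are assumptions).
import numpy as np

SKILLS = ["walk", "pace", "trot", "canter", "jump"]   # discrete latent skill codes
SHIFT_DIM, PROPRIO_DIM, N_JOINTS = 2, 48, 12

bbc_policy = lambda obs: np.zeros(N_JOINTS)   # stand-in for the trained BBC network

def bbc_observation(proprio, vel_cmd, body_height, skill_code, shift):
    """Assemble the BBC input: proprioception, commands, one-hot skill, shift."""
    return np.concatenate([proprio, vel_cmd, [body_height], skill_code, shift])

proprio = np.zeros(PROPRIO_DIM)
shift = np.zeros(SHIFT_DIM)
trot, canter = np.eye(len(SKILLS))[2], np.eye(len(SKILLS))[3]

# Blend the skill code and ramp the forward-speed command over a short window
# to obtain a smooth trot-to-canter transition.
for alpha in np.linspace(0.0, 1.0, 50):
    skill_code = (1 - alpha) * trot + alpha * canter
    vel_cmd = np.array([1.0 + 1.5 * alpha, 0.0, 0.0])   # vx ramps from 1.0 to 2.5 m/s
    obs = bbc_observation(proprio, vel_cmd, 0.30, skill_code, shift)
    joint_targets = bbc_policy(obs)   # sent to the robot's joint PD controllers
```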
EASI helps bridge the gap between simulation and reality using a limited amount of real-world data. A discriminator scores how closely simulated trajectories resemble the real-world data, and an evolution strategy searches for the simulator parameter distribution that maximizes these scores, aligning the simulator closely with the real world. The entire parameter search takes less than 10 minutes and uses only 80 seconds of real-world data. Once fine-tuned in the enhanced simulator, the BBC can be deployed directly. Compared with simple domain randomization, EASI produces joint frequency spectra that match the real robot more closely.
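A minimal sketch of this idea follows, assuming a CEM-style Gaussian search distribution over a handful of physics parameters. The parameter names, the placeholder `rollout_features` and `discriminator_score` helpers, and the update rule are illustrative assumptions, not the paper's exact algorithm.

```python
# Hedged sketch of evolutionary simulator identification: an evolution strategy
# searches physics-parameter distributions so that simulated rollouts are scored
# as "real-like" by a discriminator trained on the real-robot data.
import numpy as np

PARAM_NAMES = ["friction", "motor_kp", "motor_kd", "link_mass_scale", "latency"]

def rollout_features(params, rng):
    """Stand-in for simulating the BBC under the given parameters and extracting
    trajectory features (e.g., joint-frequency spectra)."""
    return rng.normal(size=16) + 0.01 * params.sum()      # placeholder only

def discriminator_score(features):
    """Stand-in for a discriminator trained on real vs. simulated segments;
    higher means 'more like the real robot'."""
    return -np.abs(features).mean()                        # placeholder only

def easi_search(iters=50, pop=64, elite_frac=0.2, seed=0):
    rng = np.random.default_rng(seed)
    mu, sigma = np.zeros(len(PARAM_NAMES)), np.ones(len(PARAM_NAMES))
    n_elite = int(pop * elite_frac)
    for _ in range(iters):
        # Sample candidate simulator parameters from the current distribution.
        candidates = mu + sigma * rng.normal(size=(pop, len(PARAM_NAMES)))
        scores = np.array([discriminator_score(rollout_features(c, rng))
                           for c in candidates])
        # Keep the candidates judged most realistic and refit the distribution.
        elite = candidates[np.argsort(scores)[-n_elite:]]
        mu, sigma = elite.mean(axis=0), elite.std(axis=0) + 1e-3
    return mu, sigma   # identified parameter distribution for the enhanced simulator

if __name__ == "__main__":
    print(dict(zip(PARAM_NAMES, easi_search()[0])))
```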
The TSC functions like high-level neural regions and is trained within a privileged learning framework to control the BBC for different tasks. The TSC-teacher takes privileged information as input and is then distilled into the TSC-student, which relies solely on depth perception.
The robot is trained in parallelized simulations with randomly placed obstacles.
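A sketch of how one such privileged-learning distillation step might look is given below, assuming a DAgger-style regression of the student's outputs onto a frozen teacher and assuming the student also receives proprioception alongside depth. The observation contents, dimensions, and network shapes are illustrative assumptions.

```python
# Hedged sketch of teacher-student distillation for the TSC.
import torch
import torch.nn as nn
import torch.nn.functional as F

PRIV_DIM, PROPRIO_DIM, CMD_DIM = 64, 48, 8   # CMD: commands/latents sent to the BBC

class Teacher(nn.Module):
    """TSC-teacher: privileged simulator state -> commands for the BBC."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(PRIV_DIM + PROPRIO_DIM, 256), nn.ELU(),
                                 nn.Linear(256, CMD_DIM))

    def forward(self, priv, proprio):
        return self.net(torch.cat([priv, proprio], dim=-1))

class Student(nn.Module):
    """TSC-student: depth image (+ proprioception) -> commands for the BBC."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 16, 5, stride=2), nn.ELU(),
            nn.Conv2d(16, 32, 3, stride=2), nn.ELU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.head = nn.Sequential(nn.Linear(32 + PROPRIO_DIM, 256), nn.ELU(),
                                  nn.Linear(256, CMD_DIM))

    def forward(self, depth, proprio):
        z = self.encoder(depth)
        return self.head(torch.cat([z, proprio], dim=-1))

def distill_step(teacher, student, opt, priv, proprio, depth):
    """One distillation step: the student imitates the frozen teacher's commands
    on states visited by the student (DAgger-style)."""
    with torch.no_grad():
        target = teacher(priv, proprio)
    loss = F.mse_loss(student(depth, proprio), target)
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```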
We compared our controller with the Go2 default controller and a state-of-the-art baseline, achieving higher success rates and greater agility.
With our controller, the robot can quickly jump onto a box roughly as tall as its own body. In the hurdling task, the robot reached a peak speed of 3.2 m/s while maintaining natural behaviors. We further tested our controller in outdoor environments, demonstrating natural and robust multimodal behaviors.
@article{fu2025learning,
title={Learning Diverse Natural Behaviors for Enhancing the Agility of Quadrupedal Robots},
author={Huiqiao Fu and Haoyu Dong and Wentao Xu and Zhehao Zhou and Guizhou Deng and Kaiqiang Tang and Daoyi Dong and Chunlin Chen},
year={2025},
eprint={2505.09979},
archivePrefix={arXiv},
primaryClass={cs.RO},
url={https://arxiv.org/abs/2505.09979}
}