SoccerDiffusion

Toward Learning End-to-End Humanoid Robot Soccer from Gameplay Recordings

Florian Vahl, Jörn Griepenburg, Jan Gutsche, Jasper Güldenstein, and Jianwei Zhang

University of Hamburg, Germany

📄 Paper 💻 Code 🗃️ Dataset

Abstract

This paper introduces SoccerDiffusion, a transformer-based diffusion model designed to learn end-to-end control policies for humanoid robot soccer directly from real-world gameplay recordings. Using data collected from RoboCup competitions, the model predicts joint command trajectories from multi-modal sensor inputs, including vision, proprioception, and game state. We employ a distillation technique that reduces the multi-step diffusion process to a single step, enabling real-time inference on embedded platforms. Our results demonstrate the model’s ability to replicate complex motion behaviors such as walking, kicking, and fall recovery, both in simulation and on physical robots. Although high-level tactical behavior remains limited, this work provides a robust foundation for subsequent reinforcement learning or preference optimization methods.

Walking

Fall and Standup

Architecture

[Figure: overview of the SoccerDiffusion model architecture]
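
To make the pipeline concrete, the sketch below shows how a transformer-based denoiser of this kind can be conditioned on fused sensor tokens and sampled to produce a joint-command trajectory. This is a minimal illustration, not the authors' implementation: the module layout, dimensions, and the DDPM-style 1000-step schedule are all assumptions for illustration.

    # Minimal sketch (not the authors' code) of a conditional trajectory
    # diffusion policy. Dimensions and schedule are illustrative assumptions.
    import torch
    import torch.nn as nn

    class TrajectoryDenoiser(nn.Module):
        def __init__(self, n_joints=20, horizon=32, d_model=256):
            super().__init__()
            self.in_proj = nn.Linear(n_joints, d_model)   # joint commands -> tokens
            self.t_embed = nn.Embedding(1000, d_model)    # diffusion-step embedding
            layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
            self.encoder = nn.TransformerEncoder(layer, num_layers=4)
            self.out_proj = nn.Linear(d_model, n_joints)  # predict the added noise

        def forward(self, noisy_traj, t, context):
            # noisy_traj: (B, horizon, n_joints); context: (B, n_ctx, d_model)
            x = self.in_proj(noisy_traj) + self.t_embed(t)[:, None, :]
            x = self.encoder(torch.cat([context, x], dim=1))  # condition via prefix tokens
            return self.out_proj(x[:, context.shape[1]:])     # noise estimate for traj tokens

    @torch.no_grad()
    def sample(model, context, steps=1000, horizon=32, n_joints=20):
        """Plain DDPM ancestral sampling over a joint-command trajectory."""
        beta = torch.linspace(1e-4, 0.02, steps)
        alpha = 1.0 - beta
        alpha_bar = torch.cumprod(alpha, dim=0)
        x = torch.randn(context.shape[0], horizon, n_joints)
        for t in reversed(range(steps)):
            t_batch = torch.full((context.shape[0],), t, dtype=torch.long)
            eps = model(x, t_batch, context)
            mean = (x - beta[t] / (1 - alpha_bar[t]).sqrt() * eps) / alpha[t].sqrt()
            x = mean + beta[t].sqrt() * torch.randn_like(x) if t > 0 else mean
        return x  # denoised joint-command trajectory

    model = TrajectoryDenoiser()
    ctx = torch.randn(1, 8, 256)         # stand-in for fused vision/proprioception/game-state tokens
    traj = sample(model, ctx, steps=50)  # (1, 32, 20) joint-command trajectory

The distilled single-step variant described in the abstract would replace the sampling loop with one call to the distilled model at a fixed step.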

Dataset

The dataset file is compressed using gzip. You can get the dataset in two ways:

  1. Download and immediately uncompress the dataset with:

     wget https://data.bit-bots.de/SoccerDiffusion/dataset/robocup_2024_german_open_2025.sqlite3.gz -O - | gzip -d > robocup_2024_german_open_2025.sqlite3
    
  2. Download the compressed dataset robocup_2024_german_open_2025.sqlite3.gz, then uncompress it manually with:

     gzip -d robocup_2024_german_open_2025.sqlite3.gz
    

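Once downloaded and uncompressed (and ideally verified against the checksums below), the file is a plain SQLite database that can be inspected with Python's built-in sqlite3 module. The schema is not documented on this page, so the sketch below simply enumerates the tables and their row counts:

    # Minimal sketch for inspecting the uncompressed dataset. The table
    # schema is not documented here, so we enumerate it from sqlite_master.
    import sqlite3

    con = sqlite3.connect("robocup_2024_german_open_2025.sqlite3")
    tables = [row[0] for row in con.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'")]
    for name in tables:
        # Note: COUNT(*) can be slow on a database of this size.
        (count,) = con.execute(f'SELECT COUNT(*) FROM "{name}"').fetchone()
        print(f"{name}: {count} rows")
    con.close()
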
File: robocup_2024_german_open_2025.sqlite3.gz
Size: 266 GB
SHA256SUM:

21ba0fe6ff39298f678bb59b2f85e6cfa5351d77d0695f73d9f4bb69a2427d7c

MD5SUM:

ecd6b5a5adeef7a688e281afe7fa91c8

File: robocup_2024_german_open_2025.sqlite3
Size: 340 GB
SHA256SUM:

c39d10b9c5533f8d04a2c58e3d522b2134cda7fe64e9eabca9363c9ebfd2b1e4

MD5SUM:

de6997b4f18e701e3d7730e3e1151ae2
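
Because the download is several hundred gigabytes, it is worth verifying its integrity before uncompressing. A minimal sketch using Python's hashlib and the published SHA256 of the compressed file:

    # Verify the published SHA256 checksum of the compressed download.
    # Read in chunks, since the file is far too large to load at once.
    import hashlib

    EXPECTED = "21ba0fe6ff39298f678bb59b2f85e6cfa5351d77d0695f73d9f4bb69a2427d7c"

    h = hashlib.sha256()
    with open("robocup_2024_german_open_2025.sqlite3.gz", "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    assert h.hexdigest() == EXPECTED, "checksum mismatch -- re-download the file"
    print("SHA256 OK")

Equivalently, run sha256sum robocup_2024_german_open_2025.sqlite3.gz on the command line and compare the output against the value above.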

Acknowledgements

We gratefully acknowledge funding and support from the project Digital and Data Literacy in Teaching Lab (DDLitLab) at the University of Hamburg and the Stiftung Innovation in der Hochschullehre foundation. We extend our special thanks to the members of the Hamburg Bit-Bots RoboCup team for their continuous support and for providing data and computational resources. We also thank the RoboCup teams B-Human and HULKs for generously sharing their data for this research. Additionally, we are grateful to the Technical Aspects of Multimodal Systems (TAMS) research group at the University of Hamburg for providing computational resources. This research was partially funded by the Ministry of Science, Research and Equalities of Hamburg, as well as the German Research Foundation (DFG) and the National Science Foundation of China (NSFC) through the project Crossmodal Learning (TRR-169).