TUMTraf V2X Cooperative Perception Dataset

Technical University of Munich, Fraunhofer IVI

Overview

TUMTraf-V2X is the first high-quality real-world V2X dataset for the cooperative 3D object detection and tracking task in autonomous driving.

It contains:
  • data recorded simultaneously by nine sensors: five roadside and four onboard.
  • 2,000 labeled point clouds and 5,000 labeled images.
  • 30k 3D bounding boxes with track IDs.
  • Challenging scenarios: near-miss and traffic violation events, overtaking and U-turn maneuvers.
  • HD maps of the driving domains.
  • Labels in OpenLABEL standard.
  • A dev kit to load, preprocess, visualize, and convert labels, and to evaluate perception methods (see the label-parsing sketch below).
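
Because the labels follow the OpenLABEL JSON structure, a label file can be inspected without the dev kit. The Python sketch below assumes the common ASAM OpenLABEL layout (openlabel → frames → objects → object_data → cuboid) and a ten-value cuboid encoding of center, orientation quaternion, and dimensions; the function name and key layout are illustrative assumptions, not the dev kit's actual API.

    import json

    def load_cuboids(label_path):
        # Minimal OpenLABEL parsing sketch; the key layout below is an assumption
        # based on the ASAM OpenLABEL structure, not the official dev kit API.
        with open(label_path) as f:
            data = json.load(f)["openlabel"]

        boxes = []
        for frame_id, frame in data.get("frames", {}).items():
            for obj_id, obj in frame.get("objects", {}).items():
                for cuboid in obj.get("object_data", {}).get("cuboid", []):
                    # Assumed value order: center, orientation quaternion, dimensions.
                    x, y, z, qx, qy, qz, qw, length, width, height = cuboid["val"]
                    boxes.append({
                        "frame": frame_id,
                        "track_id": obj_id,
                        "center": (x, y, z),
                        "quaternion": (qx, qy, qz, qw),
                        "size": (length, width, height),
                    })
        return boxes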

Abstract

Cooperative perception offers several benefits for enhancing the capabilities of autonomous vehicles and improving road safety. Using roadside sensors in addition to onboard sensors increases reliability and extends the sensor range. They provide higher situational awareness for automated vehicles and help resolve occlusions. We propose CoopDet3D, a cooperative multi-modal fusion model, and TUMTraf-V2X, a perception dataset, for the cooperative 3D object detection and tracking task. Our dataset contains 2,000 labeled point clouds and 5,000 labeled images from five roadside and four onboard sensors. It includes 30k 3D boxes with track IDs and precise GPS and IMU data. We labeled eight categories and covered occlusion scenarios with challenging driving maneuvers, like traffic violations, near-miss events, overtaking, and U-turns. Through multiple experiments, we show that our CoopDet3D camera-LiDAR fusion model achieves an increase of +14.36 3D mAP compared to a vehicle camera-LiDAR fusion model. Finally, we make our model, dataset, labeling tool, and development kit publicly available to advance research in connected and automated driving.

Sensor Setup

On the infrastructure, the following roadside sensors were used:
  • 1x Ouster LiDAR OS1-64 (generation 2), 64 vertical layers, 360° FOV,
    below-horizon configuration, 10 cm accuracy @ 120 m range
  • 4x Basler ace acA1920-50gc, 1920×1200, Sony IMX174 with 8 mm lenses

On the vehicle, the following onboard sensors were used:
  • 1x Robosense RS-LiDAR-32, 32 vertical layers, 360° FOV, 3 cm accuracy @ 200 m range
  • 1x Basler ace acA1920-50gc, 1920×1200, Sony IMX174 with 16 mm lens
  • 1x Emlid Reach RS2+ multi-band RTK GNSS receiver
  • 1x XSENS MTi-30-2A8G4 IMU

Visualization of the roadside sensors used to record the TUMTraf-V2X Cooperative Perception Dataset from the infrastructure perspective.
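
Cooperative fusion requires registering the onboard and roadside point clouds in a common frame, which this dataset supports through the vehicle pose from the RTK-GNSS receiver and IMU. The NumPy sketch below shows only the generic rigid-transform step; the 4x4 matrix and function name are illustrative assumptions and not part of the released dev kit.

    import numpy as np

    def transform_points(points_vehicle, T_infra_from_vehicle):
        """Map Nx3 onboard LiDAR points into the roadside (infrastructure) frame.

        T_infra_from_vehicle is a 4x4 homogeneous transform, e.g. derived from the
        vehicle's RTK-GNSS/IMU pose (illustrative, not the dev kit API).
        """
        homo = np.hstack([points_vehicle, np.ones((points_vehicle.shape[0], 1))])
        return (T_infra_from_vehicle @ homo.T).T[:, :3]

    # Illustrative usage with a dummy pose: identity rotation, 5 m forward offset.
    T = np.eye(4)
    T[:3, 3] = [5.0, 0.0, 0.0]
    points_infra = transform_points(np.random.rand(100, 3), T)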

Dataset Labeling

Dataset Statistics

Benchmark

Domain          Modality          BEV mAP   3D mAP (Easy)   3D mAP (Moderate)   3D mAP (Hard)   3D mAP (Average)
Vehicle         Camera            46.83     31.47           37.82               30.77           30.36
Vehicle         LiDAR             85.33     85.22           76.86               69.04           80.11
Vehicle         Camera + LiDAR    84.90     77.60           72.08               73.12           76.40
Infrastructure  Camera            61.98     31.19           46.73               40.42           35.04
Infrastructure  LiDAR             92.86     86.17           88.07               75.73           84.88
Infrastructure  Camera + LiDAR    92.92     87.99           89.09               81.69           87.01
Cooperative     Camera            68.94     45.41           42.76               57.83           45.74
Cooperative     LiDAR             93.93     92.63           78.06               73.95           85.86
Cooperative     Camera + LiDAR    94.22     93.42           88.17               79.94           90.76

Evaluation results (BEV mAP and 3D mAP) of CoopDet3D on the TUMTraf-V2X Cooperative Perception test set in the south2 FOV.
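
The mAP values above belong to the standard average-precision family: per class, detections are matched to ground truth (e.g. by BEV or 3D overlap), a precision-recall curve is accumulated over score-sorted detections, and its area is averaged across classes. The sketch below shows a generic VOC-style all-point-interpolated AP; it illustrates the metric family only and is not the exact TUMTraf-V2X evaluation protocol.

    import numpy as np

    def average_precision(scores, is_true_positive, num_gt):
        # Generic all-point-interpolated AP over score-sorted detections.
        # is_true_positive marks detections matched to ground truth (e.g. by IoU);
        # num_gt is the number of ground-truth boxes of this class.
        order = np.argsort(-np.asarray(scores, dtype=float))
        tp = np.asarray(is_true_positive, dtype=float)[order]
        fp = 1.0 - tp
        cum_tp, cum_fp = np.cumsum(tp), np.cumsum(fp)
        recall = cum_tp / max(num_gt, 1)
        precision = cum_tp / np.maximum(cum_tp + cum_fp, 1e-9)
        # Envelope: make precision monotonically non-increasing from the right.
        precision = np.maximum.accumulate(precision[::-1])[::-1]
        # Sum rectangular area under the precision-recall curve (all-point AP).
        recall = np.concatenate(([0.0], recall))
        return float(np.sum((recall[1:] - recall[:-1]) * precision))

    # Example: three detections, two correct, three ground-truth boxes.
    print(average_precision([0.9, 0.8, 0.3], [True, False, True], num_gt=3))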

Acknowledgements

Our CoopDet3D model is built on top of BEVFusion and PillarGrid. This research was supported by the Federal Ministry of Education and Research in Germany within the AUTOtech.agil project, Grant Number 01IS22088U.

BibTeX

@article{zimmer2024tumtraf,
    title={TUMTraf V2X Cooperative Perception Dataset},
    author={Zimmer, Walter and Wardana, Gerhard Arya and Sritharan, Suren and Zhou, Xingcheng and Song, Rui and Knoll, Alois},
    journal={arXiv preprint arXiv:2403.01316},
    year={2024}
}