In order to track dynamic objects in a robot's environment, one must first segment the scene into a collection of separate objects. Most real-time robotic vision systems today rely on simple spatial relations to segment the scene into separate objects. However, such methods fail under a variety of real-world situations, such as occlusions or crowds of closely packed objects. We propose a probabilistic 3D segmentation method that combines spatial, temporal, and semantic information to make better-informed decisions about how to segment a scene. We begin with a coarse initial segmentation. We then compute the probability that a given segment should be split into multiple segments or that multiple segments should be merged into a single segment, using spatial, semantic, and temporal cues. Our probabilistic segmentation framework enables us to significantly reduce both undersegmentations and oversegmentations on the KITTI dataset while still running in real time. By combining spatial, temporal, and semantic information, we are able to create a more robust 3D segmentation system that leads to better overall perception in crowded dynamic environments.
Publications
An extended version of the paper with a more detailed derivation of the method can be found here: [Extended version]
Unfortunately, we are unable to make the code for this work available at this time.
If you have any further questions about the segmentation method, please email me at davheld -at- cs -dot- stanford -dot- edu.
Evaluation
To help others evaluate their own segmentation methods and compare against ours, we recommend reading this document, which further clarifies our evaluation method: [Evaluation]. We also recommend downloading these files, which contain the output of our method and some Matlab files to process them.
Videos
Here you can see a visualization of our segmentation results:
Bibtex
@INPROCEEDINGS{Held-RSS-16,
AUTHOR = {David Held AND Devin Guillory AND Brice Rebsamen AND Sebastian Thrun AND Silvio Savarese},
TITLE = {A Probabilistic Framework for Real-time 3D Segmentation using Spatial, Temporal, and Semantic Cues},
BOOKTITLE = {Proceedings of Robotics: Science and Systems},
YEAR = {2016},
}
FAQ
Q: When computing the position probability (equation 10 in the original paper or equation 21 in the extended version), what do you use for the mean and covariance of the velocity if the data association indicates that we are matching to two previous segments?
A: In this case, we currently have a heuristic of just using the velocity distribution from the larger of the two previous segments. Of course, other approaches can be used here.
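As a rough illustration of this heuristic (not our actual code, which is not available), the sketch below picks the velocity distribution of the larger matched segment. The names `PreviousSegment` and `select_velocity_distribution` are hypothetical, and measuring segment size by point count is an assumption made only for this example.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple

Vector = Tuple[float, ...]
Matrix = Tuple[Vector, ...]


@dataclass(frozen=True)
class PreviousSegment:
    """Hypothetical container for a segment from the previous frame."""
    num_points: int        # assumed measure of segment size
    velocity_mean: Vector  # mean of the segment's velocity estimate
    velocity_cov: Matrix   # covariance of the segment's velocity estimate


def select_velocity_distribution(
    matched: Sequence[PreviousSegment],
) -> Tuple[Vector, Matrix]:
    """Return the velocity mean and covariance of the largest matched segment.

    When data association matches the current segment to several previous
    segments, the distribution of the largest one is used and the others
    are ignored.
    """
    if not matched:
        raise ValueError("at least one matched previous segment is required")
    largest = max(matched, key=lambda seg: seg.num_points)
    return largest.velocity_mean, largest.velocity_cov


# Example: the current segment is associated with two previous segments.
car = PreviousSegment(120, (1.0, 0.0), ((0.1, 0.0), (0.0, 0.1)))
pedestrian = PreviousSegment(45, (0.0, 2.0), ((0.3, 0.0), (0.0, 0.3)))
mean, cov = select_velocity_distribution([car, pedestrian])
```

Here `mean` and `cov` come from `car`, the segment with more points. Other choices, such as weighting the two distributions by segment size, would fit the same interface.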
Questions?
If you have any further questions about our method, please email me at davheld -at- cs -dot- stanford -dot- edu.