We could not release the code ourselves, but Rui Zhou from RWTH Aachen reimplemented it as part of his thesis and kindly made it available at this GitHub repo. Thanks a lot to him! Please note that this is not the original code used for the ICCV paper, and we are not responsible for it. Nevertheless, Rui's implementation should be close to ours, and it achieves even better scores on the Lost&Found dataset.
Segmentation methods assign a known class to each input pixel. Even for state-of-the-art approaches, this inevitably forces decisions that systematically lead to wrong predictions for objects outside the training categories. However, robustness against out-of-distribution samples and corner cases is crucial to avoid dangerous consequences in safety-critical settings such as autonomous driving. Since real-world datasets cannot contain enough data points to adequately sample the long tail of the underlying distribution, models must be able to deal with completely unseen and unknown scenarios. This motivates the need for a method capable of handling known objects from the training categories and, especially, unseen, unknown objects outside the training data distribution.
Previous methods either ignored this problem (panoptic segmentation) or targeted it by re-identifying already-seen unlabeled objects (open-set panoptic segmentation) or by introducing external knowledge, e.g., via vision-language models (zero-shot learning and open-vocabulary approaches). In this work, we propose to extend segmentation with a new setting, which we term holistic segmentation. Holistic segmentation aims to identify and separate objects of unseen, unknown categories into instances without prior knowledge while performing panoptic segmentation of known classes.
We tackle this new holistic segmentation problem with U3HS. U3HS finds unknowns as highly uncertain regions and clusters their corresponding instance-aware embeddings into individual objects. For the first time in panoptic segmentation with unknown objects, our U3HS is trained without unknown categories, reducing assumptions, simplifying the training data collection, and leaving the settings as unconstrained as in real-life scenarios.
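The two steps described above (selecting highly uncertain pixels, then grouping their embeddings into instances) can be illustrated with a toy sketch. This is not the U3HS implementation: the function name `segment_unknowns`, the uncertainty threshold `tau`, the distance threshold `eps`, and the simple transitive distance-threshold grouping are all assumptions chosen for illustration, standing in for whatever uncertainty estimate and clustering algorithm the paper actually uses.

```python
from math import dist


def segment_unknowns(uncertainty, embeddings, tau=0.5, eps=1.0):
    """Toy sketch: return a grid of ids (0 = not unknown, 1..K = unknown instances).

    `tau` and `eps` are hypothetical thresholds, not values from the paper.
    """
    h, w = len(uncertainty), len(uncertainty[0])
    # Step 1: treat pixels whose uncertainty exceeds tau as "unknown".
    unknown = [(r, c) for r in range(h) for c in range(w) if uncertainty[r][c] > tau]
    ids = [[0] * w for _ in range(h)]
    next_id = 0
    # Step 2: group unknown pixels whose embeddings lie within eps of each
    # other, transitively (a simple stand-in for a real clustering method).
    for seed_r, seed_c in unknown:
        if ids[seed_r][seed_c]:
            continue  # already assigned to an earlier instance
        next_id += 1
        ids[seed_r][seed_c] = next_id
        stack = [(seed_r, seed_c)]
        while stack:
            r, c = stack.pop()
            for r2, c2 in unknown:
                if ids[r2][c2] == 0 and dist(embeddings[r][c], embeddings[r2][c2]) <= eps:
                    ids[r2][c2] = next_id
                    stack.append((r2, c2))
    return ids


# Made-up 2x3 example: three uncertain pixels, two of which share similar embeddings.
uncertainty = [[0.9, 0.8, 0.1],
               [0.1, 0.1, 0.7]]
embeddings = [[(0.0, 0.0), (0.2, 0.0), (5.0, 5.0)],
              [(5.0, 5.0), (5.0, 5.0), (9.0, 9.0)]]
instance_ids = segment_unknowns(uncertainty, embeddings)
print(instance_ids)  # [[1, 1, 0], [0, 0, 2]]
```

In this toy input, the two uncertain pixels in the top row have nearby embeddings and end up in one instance, while the uncertain pixel in the bottom-right corner forms its own instance. Pixels with low uncertainty keep id 0 and would be handled by the regular panoptic prediction for known classes.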
The figure shows example predictions of U3HS on out-of-distribution data from MS COCO. Remarkably, the model had never seen images containing bears or frisbees (part of the held-out classes), nor did it have any information about them. Still, the proposed U3HS could robustly segment the unseen, unknown objects despite their high inter-class similarity. Simultaneously, U3HS performed a reasonable panoptic segmentation of known classes. In our ICCV paper, we demonstrate the effectiveness of U3HS for this new, challenging, and assumption-free setting called holistic segmentation with extensive experiments on public data from MS COCO, Cityscapes, and Lost&Found.
@inproceedings{gasperini2023holistic_seg,
title={Segmenting Known Objects and Unseen Unknowns without Prior Knowledge},
author={Gasperini, Stefano and Marcos-Ramiro, Alvaro and Schmidt, Michael and Navab, Nassir and Busam, Benjamin and Tombari, Federico},
booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
year={2023},
pages={19321--19332}
}