A Panoptic Segmentation for Indoor Environments using MaskDINO: An Experiment on the Impact of Contrast
Abstract
Robot perception involves recognizing the surrounding environment, particularly in indoor spaces such as kitchens, classrooms, and dining areas. This recognition is crucial for tasks such as object identification. Elements of a scene can be categorized as "things," which have a fixed shape and can be counted (e.g., tables, chairs), and "stuff," which lacks a fixed shape and cannot be counted (e.g., sky, walls). Object detection and instance segmentation methods excel at identifying "things," and instance segmentation provides a more detailed, pixel-level representation than the bounding boxes produced by object detection. Semantic segmentation, by contrast, can identify both "things" and "stuff," but it does not distinguish individual object instances. Panoptic segmentation unifies the two approaches: it identifies both things and stuff while also segmenting each object instance. When panoptic segmentation is deployed indoors, the variability of room conditions, particularly contrast, must be taken into account. Contrast that is too high or too low can obscure the shape of an object and thus degrade its segmentation. We investigated how variations in contrast affect panoptic segmentation performance using MaskDINO, the model ranked first on the panoptic quality (PQ) leaderboard. We then improved the model's generalization across contrast levels by re-optimizing it on a contrast-augmented dataset.
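The abstract does not describe how the contrast-augmented dataset was built. The following is a minimal sketch, assuming 8-bit RGB images stored as NumPy arrays, of three standard ways to vary image contrast: linear scaling around the mean intensity, min-max contrast stretching, and global histogram equalization. The function names and the contrast factors are hypothetical illustrations, not the authors' pipeline.

```python
import numpy as np


def adjust_contrast(image: np.ndarray, factor: float) -> np.ndarray:
    """Scale each pixel's deviation from the image mean by `factor`.

    factor < 1 lowers contrast, factor > 1 raises it, factor == 1 is identity.
    Assumption: uint8 input of shape (H, W) or (H, W, 3); the mean is taken
    over all channels (some libraries use the grayscale mean instead).
    """
    img = image.astype(np.float32)
    mean = img.mean()
    out = mean + factor * (img - mean)
    return np.clip(np.rint(out), 0, 255).astype(np.uint8)


def stretch_contrast(image: np.ndarray) -> np.ndarray:
    """Min-max contrast stretching: map [min, max] linearly onto [0, 255]."""
    img = image.astype(np.float32)
    lo, hi = img.min(), img.max()
    if hi == lo:  # flat image: there is no range to stretch
        return image.copy()
    out = (img - lo) * 255.0 / (hi - lo)
    return np.clip(np.rint(out), 0, 255).astype(np.uint8)


def equalize_histogram(gray: np.ndarray) -> np.ndarray:
    """Global histogram equalization for a single-channel uint8 image."""
    hist = np.bincount(gray.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf_min = cdf[cdf > 0][0]  # CDF value at the darkest occupied intensity
    total = gray.size
    if total == cdf_min:  # only one intensity present
        return gray.copy()
    # Lookup table mapping each intensity to its normalized CDF position.
    lut = np.rint((cdf - cdf_min) * 255.0 / (total - cdf_min))
    lut = np.clip(lut, 0, 255).astype(np.uint8)
    return lut[gray]


def contrast_variants(image: np.ndarray, factors=(0.5, 0.75, 1.25, 1.5)):
    """Return contrast-shifted copies of one image (factor values are hypothetical)."""
    return [adjust_contrast(image, f) for f in factors]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    img = rng.integers(60, 200, size=(8, 8, 3), dtype=np.uint8)
    for f, v in zip((0.5, 0.75, 1.25, 1.5), contrast_variants(img)):
        print(f"factor={f}: std={v.std():.1f} (original {img.std():.1f})")
```

Because these transforms change only pixel intensities and never move pixels, the panoptic annotations of the source image (segment masks and category labels) can be reused unchanged for every augmented copy.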
Keywords
MaskDINO, Indoor Environment, Panoptic Segmentation
Copyright (c) 2025 Khalisha Putri, Ika Candradewi
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.