Panoptic Segmentation for Indoor Environments using MaskDINO: An Experiment on the Impact of Contrast

Authors

  • Khalisha Putri
  • Ika Candradewi - Universitas Gadjah Mada

Abstract

Robot perception involves recognizing the surrounding environment, particularly in indoor spaces such as kitchens, classrooms, and dining areas. This recognition is crucial for tasks such as object identification. Objects in indoor environments can be categorized into "things," which have fixed, countable shapes (e.g., tables, chairs), and "stuff," which lacks a fixed shape and cannot be counted (e.g., sky, walls). Object detection and instance segmentation excel at identifying "things," with instance segmentation providing more detailed representations than object detection. Semantic segmentation, by contrast, can identify both "things" and "stuff" but does not separate individual object instances. Panoptic segmentation, a fusion of both approaches, offers comprehensive identification of "things" and "stuff" together with instance-level segmentation. When implementing panoptic segmentation indoors, the variability of room contrast must be taken into account: excessively high or low contrast can reduce the clarity of object shapes and thus degrade segmentation quality. We experimented with how contrast variations affect panoptic segmentation performance using the MaskDINO model, which ranked first on the panoptic quality (PQ) leaderboard. We then improved the model's generalization across contrast conditions by re-optimizing it on a contrast-augmented dataset.
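The paper's exact augmentation pipeline is not reproduced on this page; as a rough illustration of the kind of contrast manipulation the abstract describes, the sketch below (in Python/NumPy, with function names and factor ranges chosen for illustration, not taken from the paper) scales pixel intensities about the image mean to simulate high- or low-contrast rooms, and applies min-max contrast stretching as a simple normalization:

```python
import numpy as np

def adjust_contrast(img: np.ndarray, factor: float) -> np.ndarray:
    """Scale intensities about the image mean.
    factor < 1 lowers contrast, factor > 1 raises it.
    img is expected to be a uint8 array."""
    mean = img.mean()
    out = (img.astype(np.float32) - mean) * factor + mean
    return np.clip(out, 0, 255).astype(np.uint8)

def contrast_stretch(img: np.ndarray) -> np.ndarray:
    """Min-max contrast stretching to the full [0, 255] range."""
    lo, hi = float(img.min()), float(img.max())
    if hi == lo:  # flat image: nothing to stretch
        return img.copy()
    out = (img.astype(np.float32) - lo) * 255.0 / (hi - lo)
    return np.clip(out, 0, 255).astype(np.uint8)
```

A contrast-augmented training set in this spirit would apply `adjust_contrast` with several factors to each image while keeping its panoptic labels unchanged, since contrast changes do not move object boundaries.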

Keywords

MaskDINO, Indoor Environment, Panoptic Segmentation

References

Lin, T., Maire, M., Belongie, S., Bourdev, L., Girshick, R., Hays, J., Perona, P., Ramanan, D., Zitnick, C.L., & Dollar, P., "Microsoft COCO: Common Objects in Context" arXiv:1405.0312v3. 2015.

Li, F., Zhang, H., Xu, H., Liu, S., Zhang, L., Ni, L.M., & Shum, H., "Mask DINO: Towards A Unified Transformer-based Framework for Object Detection and Segmentation" arXiv:2206.02777v1. 2022.

Papers with Code, Panoptic Segmentation. [online] Available at: https://paperswithcode.com/task/panoptic-segmentation

Yu, Q., Wang, H., Qiao, S., Collins, M., Zhu, Y., Adam, H., Yuille, A., & Chen, L., "k-means Mask Transformer" arXiv:2207.04044v1. 2022.

Cheng, B., Misra, I., Schwing, A.G., Kirillov, A., & Girdhar, R., "Masked-attention Mask Transformer for Universal Image Segmentation" arXiv:2112.01527v3. 2022.

Li, Z., Wang, W., Xie, E., Yu, Z., Anandkumar, A., Alvarez, J.M., Luo, P., & Lu, T., "Panoptic SegFormer: Delving Deeper into Panoptic Segmentation with Transformers".

Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, L., Polosukhin, I., "Attention is all you need" NeurIPS. 2017.

Kirillov, A., Girshick, R., He, K., & Dollar, P., "Panoptic Feature Pyramid Networks" arXiv:1901.02446v2. 2019.

Cheng, B., Collins, M.D., Zhu, Y., Liu, T., Huang, T.S., Adam, H., & Chen, L., "Panoptic-DeepLab: A Simple, Strong, and Fast Baseline for Bottom-Up Panoptic Segmentation" arXiv:1911.10194v3. 2020.

Mohan, R. & Valada, A., "EfficientPS: Efficient Panoptic Segmentation" arXiv:2004.02307v3. 2021.

Ren, J., Yu, C., Cai, Z., Zhang, M., Chen, C., Zhao, H., Yi, S., & Li, H., "REFINE: Prediction Fusion Network for Panoptic Segmentation" Association for the Advancement of Artificial Intelligence (www.aaai.org). 2021

Li, Y., Zhao, H., Qi, X., Wang, L., Li, Z., Sun, J., & Jia, J., "Fully Convolutional Networks for Panoptic Segmentation" arXiv:2012.00720v2. 2021.

Zhang, W., Pang, J., Chen, K., & Loy, C.C., "K-Net: Towards Unified Image Segmentation" arXiv:2106.14855v2. 2021.

Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., & Zagoruyko, S., "End-to-End Object Detection with Transformers" arXiv:2005.12872v3. 2020.

Cheng, B., Schwing, A.G., Kirillov, A., "Per-Pixel Classification is Not All You Need for Semantic Segmentation" arXiv:2107.06278v2. 2021.

Jain, J., Li, J., Chiu, M., Hassani, A., Orlov, N., & Shi, H., "OneFormer: One Transformer to Rule Universal Image Segmentation" arXiv:2211.06220v1. 2022.

Contrast Stretching. [online] Available at: https://samirkhanal35.medium.com/contrast-stretching-f25e7c4e8e33

GeeksforGeeks, Adaptive Histogram Equalization in Image Processing Using MATLAB. [online] Available at: https://www.geeksforgeeks.org/adaptive-histogram-equalization-in-image-processing-using-matlab/

ADE20K dataset.

Zhang, H., Li, F., Liu, S., Zhang, L., Su, H., Zhu, J., Ni, L.M., & Shum, H.Y., "DINO: DETR with Improved DeNoising Anchor Boxes for End-to-End Object Detection" arXiv:2203.03605v4. 2022.

Samsudin, S., 2021. Introduction to Histogram Equalization for Digital Image Enhancement. [online] Available at: https://levelup.gitconnected.com/introduction-to-histogram-equalization-for-digital-image-enhancement-420696db9e43 [Accessed 23 May 2023]

Sodano, M., Magistri, F., Guadagnino, T., Behley, J., & Stachniss, C., "Robust Double-Encoder Network for RGB-D Panoptic Segmentation" arXiv:2210.02834v2.

Dai, A., Chang, A.X., Savva, M., Halber, M., Funkhouser, T., & Nießner, M., ScanNet: Richly-annotated 3D Reconstructions of Indoor Scenes. [online] Available at: http://www.scan-net.org/ [Accessed 7 March 2024]

Roberts, M., Volodin, A., Germer, T., Niklaus, S., 2020. Hypersim: A Photorealistic Synthetic Dataset for Holistic Indoor Scene Understanding. [online] Available at: https://github.com/apple/ml-hypersim [Accessed 7 March 2024]

Published

2025-01-28
