Improving Slow-Moving Object Detection in Complex Environments Using a Feature Pooling Enhanced Encoder-Decoder Model
EDM-SMOD
Abstract
The ability to detect moving objects is of great importance in a wide range of visual surveillance systems, where it plays a vital role in maintaining security and ensuring effective monitoring. The primary aim of such systems is to detect objects in motion while coping with real-world challenges. Despite the existence of numerous methods, there remains room for improvement, particularly for slowly moving video sequences and unfamiliar video environments. When slow-moving objects are confined to a small area of the frame, many traditional methods fail to detect the entire object; a spatio-temporal framework offers an effective solution, and the choice of temporal, spatial, and fusion algorithms is crucial for detecting such objects effectively. This article addresses the detection of slowly moving objects in challenging videos by leveraging an encoder-decoder architecture that incorporates a modified VGG-16 model with a feature pooling framework. Several novel aspects characterize the proposed algorithm. It utilizes a pre-trained, modified VGG-16 network as the encoder, employing transfer learning to enhance model efficacy. The encoder is designed with a reduced number of layers and incorporates skip connections to extract the fine- and coarse-scale features crucial for local change detection. The feature pooling framework (FPF) combines different layers, including max pooling, convolutional, and several atrous convolutional layers with varying sampling rates. This integration preserves features of various dimensions at different scales, ensuring their representation across a wide range of scales. The decoder network comprises stacked convolutional layers that effectively map features back to image space. The performance of the developed technique is assessed against various existing methods, including CMRM, the hybrid algorithm, fast valley, EPMCB, and MODCVS, and its effectiveness is demonstrated through both subjective and objective analyses. It achieves superior performance, with an average F-measure (AF) of 98.86% and a lower average misclassification error (AMCE) of 0.85. Furthermore, the algorithm's effectiveness is validated on imperceptible video configuration setups, where it also exhibits superior performance.
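To make the FPF idea above concrete, the following is a minimal PyTorch sketch of a feature pooling block that combines a max-pooling branch, a 1x1 convolutional branch, and parallel atrous convolutions with different dilation rates. The module name, channel sizes, and dilation rates here are illustrative assumptions, not the paper's exact configuration.

```python
# Illustrative sketch only: a feature pooling block in the spirit of the FPF
# described above. Channel counts and dilation rates are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeaturePoolingBlock(nn.Module):
    def __init__(self, in_ch=512, out_ch=256, rates=(2, 4, 8)):
        super().__init__()
        # 1x1 convolution branch preserves local detail.
        self.conv1x1 = nn.Conv2d(in_ch, out_ch, kernel_size=1)
        # Atrous branches enlarge the receptive field without downsampling.
        self.atrous = nn.ModuleList(
            nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=r, dilation=r)
            for r in rates
        )
        # Max-pooling branch summarizes context, then is projected back.
        self.pool_proj = nn.Conv2d(in_ch, out_ch, kernel_size=1)
        # Fuse all branches back to a single multi-scale feature map.
        n_branches = 2 + len(rates)
        self.fuse = nn.Conv2d(n_branches * out_ch, out_ch, kernel_size=1)

    def forward(self, x):
        h, w = x.shape[2:]
        branches = [self.conv1x1(x)]
        branches += [conv(x) for conv in self.atrous]
        pooled = F.max_pool2d(x, kernel_size=2)
        pooled = F.interpolate(self.pool_proj(pooled), size=(h, w),
                               mode="bilinear", align_corners=False)
        branches.append(pooled)
        return F.relu(self.fuse(torch.cat(branches, dim=1)))

# Usage: features from a truncated VGG-16 encoder (assumed 512 channels)
# would pass through this block before the convolutional decoder.
if __name__ == "__main__":
    feats = torch.randn(1, 512, 32, 32)
    out = FeaturePoolingBlock()(feats)
    print(out.shape)  # torch.Size([1, 256, 32, 32])
```

Running the parallel branches at the same spatial resolution and concatenating them is what lets features of different receptive-field sizes coexist before the decoder maps them back to image space.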
Keywords
Background subtraction, Deep neural network, Transfer learning, Slow moving object, Feature pooling framework, Encoder-Decoder type network

References
[1] M. K. Panda, B. N. Subudhi, T. Veerakumar, V. Jakhetiya, "Modified ResNet-152 network with hybrid pyramidal pooling for local change detection", IEEE Transactions on Artificial Intelligence 5 (4) (2023) 1599–1612, https://doi.org/10.1109/TAI.2023.3299903.
[2] P. K. Sahoo, M. K. Panda, U. Panigrahi, G. Panda, P. Jain, M. S. Islam, M. T. Islam, "An improved VGG-19 network induced enhanced feature pooling for precise moving object detection in complex video scenes", IEEE Access 12 (2024) 45847–45864, https://doi.org/10.1109/ACCESS.2024.3381612.
[3] U. Panigrahi, P. K. Sahoo, M. K. Panda, G. Panda, "A ResNet-101 deep learning framework induced transfer learning strategy for moving object detection", Image and Vision Computing 146 (2024) 105021, https://doi.org/10.1016/j.imavis.2024.105021.
[4] R. Poppe, "A survey on vision-based human action recognition", Image and Vision Computing 28 (6) (2010) 976–990, https://doi.org/10.1016/j.imavis.2009.11.014.
[5] J. Hsieh, S. Yu, Y. Chen, W. Hu, "Automatic traffic surveillance system for vehicle tracking and classification", IEEE Transactions on Intelligent Transportation Systems 7 (2) (2006) 175–187, https://doi.org/10.1109/TITS.2006.874722.
[6] W. Hu, T. Tan, L. Wang, S. Maybank, "A survey on visual surveillance of object motion and behaviors", IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews) 34 (3) (2004) 334–352, https://doi.org/10.1109/TSMCC.2004.829274.
[7] K. Rout, B. N. Subudhi, T. Veerakumar, S. Chaudhury, "Spatio-contextual Gaussian mixture model for local change detection in underwater video", Expert Systems with Applications 97 (2018) 117–136, https://doi.org/10.1016/j.eswa.2017.12.009.
[8] B. N. Subudhi, M. K. Panda, T. Veerakumar, V. Jakhetiya, S. Esakkirajan, "Kernel-induced possibilistic fuzzy associate background subtraction for video scene", IEEE Transactions on Computational Social Systems 10 (3) (2022) 1314–1325, https://doi.org/10.1109/TCSS.2021.3137306.
[9] S. Sreelakshmi, G. Malu, E. Sherly, R. Mathew, "M-Net: An encoder-decoder architecture for medical image analysis using ensemble learning", Results in Engineering 17 (2023) 100927, https://doi.org/10.1016/j.rineng.2023.100927.
[10] T.-T. Nguyen, C. V. Duy, "Grasping moving objects with incomplete information in a low-cost robot production line using contour matching based on the Hu moments", Results in Engineering 23 (2024) 102414, https://doi.org/10.1016/j.rineng.2024.102414.
[11] H. Ranjbar, P. Forsythe, A. A. F. Fini, M. Maghrebi, T. S. Waller, "Addressing practical challenge of using autopilot drone for asphalt surface monitoring: Road detection, segmentation, and following", Results in Engineering 18 (2023) 101130, https://doi.org/10.1016/j.rineng.2023.101130.
[12] Y. Lai, "Optimization of urban and rural ecological spatial planning based on deep learning under the concept of sustainable development", Results in Engineering 19 (2023) 101343, https://doi.org/10.1016/j.rineng.2023.101343.
[13] T. Bouwmans, S. Javed, M. Sultana, S. K. Jung, "Deep neural network concepts for background subtraction: A systematic review and comparative evaluation", Neural Networks 117 (2019) 8–66, https://doi.org/10.1016/j.neunet.2019.04.024.
[14] M. K. Panda, A. Sharma, V. Bajpai, B. N. Subudhi, V. Thangaraj, V. Jakhetiya, "Encoder and decoder network with ResNet-50 and global average feature pooling for local change detection", Computer Vision and Image Understanding 222 (2022) 103501, https://doi.org/10.1016/j.cviu.2022.103501.
[15] M. K. Panda, B. N. Subudhi, T. Bouwmans, V. Jakhetiya, T. Veerakumar, "An end to end encoder-decoder network with multi-scale feature pooling for detecting local changes from video scene", in: Proceedings of the 18th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS), 2022, pp. 1–8, https://doi.org/10.1109/AVSS56176.2022.9959141.
[16] S. Pavithra, et al., "An efficient approach to detect and segment underwater images using Swin transformer", Results in Engineering 23 (2024) 102460, https://doi.org/10.1016/j.rineng.2024.102460.
[17] K. Simonyan, A. Zisserman, "Very deep convolutional networks for large-scale image recognition", arXiv preprint arXiv:1409.1556 (2014), https://doi.org/10.48550/arXiv.1409.1556.
[18] M. Reisslein, L. Karam, P. Seeling, F. Fitzek, "YUV video sequences" (2000), http://trace.eas.asu.edu/yuv/ [Accessed: 2016].
[19] C. Montgomery, "Xiph.org video test media [Derf's collection]" (2004), https://media.xiph.org/video/derf/ [Accessed: 2016].
[20] P. K. Sahoo, P. Kanungo, K. Parvathi, "Three frame based adaptive background subtraction", in: Proceedings of the International Conference on High Performance Computing and Applications, 2014, pp. 1–5, https://doi.org/10.1109/ICHPCA.2014.7045375.
[21] J. H. Duncan, T.-C. Chou, "On the detection of motion and the computation of optical flow", IEEE Transactions on Pattern Analysis & Machine Intelligence 14 (3) (1992) 346–352, https://doi.ieeecomputersociety.org/10.1109/34.120329.
[22] S. K. Choudhury, P. K. Sa, S. Bakshi, B. Majhi, "An evaluation of background subtraction for object detection vis-a-vis mitigating challenging scenarios", IEEE Access 4 (2016) 6133–6150, https://doi.org/10.1109/ACCESS.2016.2608847.
[23] P. Viola, M. Jones, "Rapid object detection using a boosted cascade of simple features", in: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Vol. 1, 2001, pp. I-511–I-518, https://doi.org/10.1109/CVPR.2001.990517.
[24] P. Dollar, C. Wojek, B. Schiele, P. Perona, "Pedestrian detection: An evaluation of the state of the art", IEEE Transactions on Pattern Analysis and Machine Intelligence 34 (4) (2012) 743–761, https://doi.org/10.1109/TPAMI.2011.155.
[25] J. Redmon, S. Divvala, R. Girshick, A. Farhadi, "You only look once: Unified, real-time object detection", in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 779–788, https://doi.org/10.1109/CVPR.2016.91.
[26] P. K. Sahoo, P. Kanungo, S. Mishra, "A fast valley-based segmentation for detection of slowly moving objects", Signal, Image and Video Processing 12 (2018) 1265–1272, https://doi.org/10.1007/s11760-018-1278-9.
[27] R. Cucchiara, C. Grana, M. Piccardi, A. Prati, "Detecting moving objects, ghosts, and shadows in video streams", IEEE Transactions on Pattern Analysis and Machine Intelligence 25 (10) (2003) 1337–1342, https://doi.org/10.1109/TPAMI.2003.1233909.
[28] P. Kanungo, A. Narayan, P. Sahoo, S. Mishra, "Neighborhood based codebook model for moving object segmentation", in: Proceedings of the 2nd International Conference on Man and Machine Interfacing, 2017, pp. 1–6, https://doi.org/10.1109/MAMI.2017.8308009.
[29] P. KaewTraKulPong, R. Bowden, "An improved adaptive background mixture model for real-time tracking with shadow detection", Springer US, Boston, MA, 2002, pp. 135–144, https://doi.org/10.1007/978-1-4615-0913-4_11.
[30] J. Huang, W. Zou, Z. Zhu, J. Zhu, "An efficient optical flow based motion detection method for non-stationary scenes", in: Proceedings of the Chinese Control and Decision Conference, 2019, pp. 5272–5277, https://doi.org/10.1109/CCDC.2019.8833206.
[31] L. Fan, T. Zhang, W. Du, "Optical-flow-based framework to boost video object detection performance with object enhancement", Expert Systems with Applications 170 (2021) 1–8, https://doi.org/10.1016/j.eswa.2020.114544.
[32] J. Guo, J. Wang, R. Bai, Y. Zhang, Y. Li, "A new moving object detection method based on frame-difference and background subtraction", IOP Conference Series: Materials Science and Engineering 242 (1) (2017) 012115, https://dx.doi.org/10.1088/1757-899X/242/1/012115.
[33] S. S. Sengar, S. Mukhopadhyay, "Moving object detection based on frame difference and W4", Signal, Image and Video Processing 11 (2017) 1357–1364, https://doi.org/10.1007/s11760-017-1093-8.
[34] J.-D. Shi, J.-Z. Wang, "Moving objects detection and tracking in dynamic scene", Transactions of Beijing Institute of Technology 29 (10) (2009) 858–861.
[35] X. Huang, F. Wu, P. Huang, "Moving-object detection based on sparse representation and dictionary learning", AASRI Procedia 1 (2012) 492–497, AASRI Conference on Computational Intelligence and Bioinformatics, https://doi.org/10.1016/j.aasri.2012.06.077.
[36] M. Savaş, H. Demirel, B. Erkal, "Moving object detection using an adaptive background subtraction method based on block-based structure in dynamic scene", Optik 168 (2018) 605–618, https://doi.org/10.1016/j.ijleo.2018.04.047.
[37] Q. Zhang, T. Xiao, N. Huang, D. Zhang, J. Han, "Revisiting feature fusion for RGB-T salient object detection", IEEE Transactions on Circuits and Systems for Video Technology 31 (5) (2021) 1804–1818, https://doi.org/10.1109/TCSVT.2020.3014663.
[38] W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. Reed, C.-Y. Fu, A. C. Berg, "SSD: Single shot multibox detector", in: B. Leibe, J. Matas, N. Sebe, M. Welling (Eds.), Computer Vision – ECCV 2016, Springer International Publishing, Cham, 2016, pp. 21–37, https://doi.org/10.1007/978-3-319-46448-0_2.
[39] S. Ren, K. He, R. Girshick, J. Sun, "Faster R-CNN: Towards real-time object detection with region proposal networks" (2016), arXiv:1506.01497, https://doi.org/10.1109/TPAMI.2016.2577031.
[40] T.-Y. Lin, P. Goyal, R. Girshick, K. He, P. Dollar, "Focal loss for dense object detection", in: Proceedings of the IEEE International Conference on Computer Vision (ICCV), 2017, pp. 2980–2988, https://doi.org/10.1109/ICCV.2017.324.
[41] C. W. Corsel, M. van Lier, L. Kampmeijer, N. Boehrer, E. M. Bakker, "Exploiting temporal context for tiny object detection", in: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2023, pp. 79–89, https://doi.org/10.1109/WACVW58289.2023.00013.
[42] H. Law, J. Deng, "CornerNet: Detecting objects as paired keypoints", in: Proceedings of the European Conference on Computer Vision (ECCV), 2018, pp. 734–750, https://doi.org/10.1007/978-3-030-01264-9_45.
[43] S.-H. Lee, S.-H. Bae, "AFI-GAN: Improving feature interpolation of feature pyramid networks via adversarial training for object detection", Pattern Recognition 138 (2023) 109365, https://doi.org/10.1016/j.patcog.2023.109365.
[44] B. N. Subudhi, P. K. Nanda, "Detection of slow moving video objects using compound Markov random field model", in: TENCON 2008 - 2008 IEEE Region 10 Conference, 2008, pp. 1–6, https://doi.org/10.1109/TENCON.2008.4766385.
[45] Z. Zhu, Y. Wang, "A hybrid algorithm for automatic segmentation of slowly moving objects", AEU - International Journal of Electronics and Communications 66 (3) (2012) 249–254, https://doi.org/10.1016/j.aeue.2011.07.009.
[46] P. K. Sahoo, P. Kanungo, S. Mishra, B. P. Mohanty, "Entropy feature and peak-means clustering based slowly moving object detection in head and shoulder video sequences", Journal of King Saud University - Computer and Information Sciences 34 (8) (2022) 5296–5304, https://doi.org/10.1016/j.jksuci.2020.12.019.
[47] Y. Wang, P.-M. Jodoin, F. Porikli, J. Konrad, Y. Benezeth, P. Ishwar, "CDnet 2014: An expanded change detection benchmark dataset", in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2014, pp. 387–394, https://doi.org/10.1109/CVPRW.2014.126.
Copyright (c) 2025 Prabodh Kumar Sahoo, Upasana Panigrahi, et al.

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.