ELCVIA Electronic Letters on Computer Vision and Image Analysis

A Panoptic Segmentation for Indoor Environments using MaskDINO: An Experiment on the Impact of Contrast

Khalisha Putri — 2025-01-28

Robot perception involves recognizing the surrounding environment, particularly in indoor spaces like kitchens, classrooms, and dining areas. This recognition is crucial for tasks such as object identification. Objects in indoor environments can be categorized into "things," with fixed and countable shapes (e.g., tables, chairs), and "stuff," which lack a fixed shape and cannot be counted (e.g., sky, walls). Object detection and instance segmentation methods excel in identifying "things," with instance segmentation providing more detailed representations than object detection. However, semantic segmentation can identify both "things" and "stuff" but lacks segmentation at the object level. Panoptic segmentation, a fusion of both methods, offers comprehensive object and stuff identification and object-level segmentation. Considerations need to be made regarding the variabilities of room conditions in contrast to implementing panoptic segmentation indoors. High or low contrast in the room potentially reduces the clarity of the shape of an object, thus affecting the segmentation results of that object. We experimented with how contrast varieties impact the panoptic segmentation performance using the MaskDINO model, the first on the panoptic quality (PQ) leaderboard. We then improved the model generalization on the various contrasts by re-optimizing it using a contrast-augmented dataset.

Supervised Deep Learning Approaches For Anomaly Detection And Recognition In Crowd Scenes

Kinjal Joshi — 2025-02-25

These days consciousness about public safety increases and CCTV cameras are installed at almost all public places. But generally automatic smart surveillance systems are not available. In this manuscript, emphasis is given to detect and classify abnormal events in surveillance video especially in crowd environments. Abnormal event detection is a challenging task because the definition of abnormality is subjective. A normal event in one situation can be considered an abnormal event in another case. In the surveillance video with a dense crowd, automatic anomaly detection becomes very difficult because of clutter and severe occlusion.

This manuscript represents CNN (Convolutional Neural Network) and CNN-LSTM (Convolutional Neural Network-Long Short-Term Memory) based approaches for detection and classification of abnormal events. The CNN architecture is developed from scratch and can be used for spatial domains. LSTM architecture is developed for the temporal domain. Feature sequences are generated using CNN model and given as input to LSTM model. Experiments are carried out using five different publicly available benchmark datasets. The performance is measured by accuracy and area under the ROC (receiver operating characteristic) curve (AUC). CNN-LSTM approach works better than only CNN.

An Inclusive review on deep learning techniques and their scope in handwriting recognition

Sukhdeep Singh — 2025-05-03

Deep learning expresses a category of machine learning algorithms that have the capability to combine raw inputs into intermediate features layers. These deep learning algorithms have demonstrated great results in different fields. Deep learning has particularly witnessed for a great achievement of human level performance across a number of domains in computer vision and pattern recognition. For the achievement of state-of-the-art performances in diverse domains, the deep learning used different architectures and these architectures used activation functions to perform various computations between hidden and output layers of any architecture. This paper presents a survey on the existing studies of deep learning in handwriting recognition field. Even though the recent progress indicates that the deep learning methods has provided valuable means for speeding up or proving accurate results in handwriting recognition, but following from the extensive literature survey, the present study finds that the deep learning has yet to revolutionize more and has to resolve many of the most pressing challenges in this field, but promising advances have been made on the prior state of the art. Additionally, an inadequate availability of labelled data to train presents problems in this domain. Nevertheless, the present handwriting recognition survey foresees deep learning enabling changes at both bench and bedside with the potential to transform several domains as image processing, speech recognition, computer vision, machine translation, robotics and control, medical imaging, medical information processing, bio-informatics, natural language processing, cyber security, and many others.

Implementation of Explainable Ai in Deep Learning Methods for Multiclass Classification of Plant Diseases in Mango Leaves

Menaka Radhakrishnan — 2025-05-21

Maintaining optimal yield plays a crucial role in the prosperity of agriculture and in turn the economy of the country. One way to optimize this yield is by early and accurate detection and diagnosis of crop diseases. Traditional methods that involve manual inspection or the like tend to be tedious and often inaccurate. Hence the use of machine learning and convolutional neural networks have proven to be of great advantage in terms of accuracy, reliability, ease of implementation etc. This paper explores various deep learning models such as AlexNet, ResNet, Swin Transformer, Vgg-16, vit model for plant leaf disease detection and classification on a dataset of mango leaves and compares aspects such as accuracy and loss. Further the models have been combined using feature fusion, and their accuracies compared. Finally, a combination of ResNet and AlexNet has been proposed with an impressive accuracy of 99.97%. Further, Grad-CAM (Gradient-weighted Class Activation Mapping) has been implemented to highlight important regions in the leaf images which improves visualization. This can potentially provide an accurate identification and classification of plant diseases based on leaf images.

Enhanced Bird Species Image Recognition and Classification using MobileNet and InceptionV3 Transfer learning Architectures

Sakthi Priya G — 2025-05-21

The proposed study explores the application of transfer learning techniques in bird species image classification, specifically focusing on the MobileNet and InceptionV3 models. Utilizing the CUB-200-2011 dataset, the transfer learning is employed to enhance classification accuracy. The MobileNet model achieved an impressive accuracy of 74.60%, outperforming InceptionV3, which recorded an accuracy of 64.00%. The corresponding loss values were 0.8685 for MobileNet and 1.128 for InceptionV3, highlighting MobileNet's superior alignment with actual class labels. Additionally, MobileNet demonstrated a precision range of 0.45 to 0.93, while InceptionV3's precision ranged from 0.65 to 0.81. The F1-scores revealed MobileNet's performance ranged from 0.40 to 0.91, in contrast to InceptionV3’s lower F1-scores, indicating a more stable but less effective classification ability. These findings underscore the potential of MobileNet as a lightweight, efficient alternative for wildlife image classification tasks, making it particularly suitable for deployment in resource-constrained environments. The developed user interface allows for seamless interaction, enabling users to upload images and receive immediate classification results, further demonstrating the practical application of these models in conservation and biodiversity preservation efforts.

Deep Learning-Based Video Anomaly Detection Using Optimised Attention-Enhanced Autoencoders

Anjali S — 2025-05-21

Anomaly detection in video is essential for applications like surveillance, healthcare, and industrial monitoring. Through the reconstruction of normal patterns and the computation of reconstruction error in relation to ground truth, convolutional autoencoders detect anomalies. Frames with errors above a threshold are flagged as abnormal. Existing approaches rely on fixed thresholds, which may not adapt well to varying lighting conditions, leading to false positives or missed anomalies. A novel autoencoder (SESAA) is proposed in this work that combines self-attention with squeeze-and-excitation (SE) blocks and improves video anomaly detection by using a thresholding technique for optimal threshold identification. Our adaptive thresholding technique leverages reconstruction cost, peak signal-to-noise ratio (PSNR) and frame brightness for optimal threshold identification, enhancing adaptability to different scenarios. Comparing with dynamic threshold methods, we assess our model using ROC and AUC metrics. Experiments on three benchmark datasets validate the efficacy of our method in precise anomaly detection through optimal thresholding.