ELCVIA Electronic Letters on Computer Vision and Image Analysis
https://elcvia.cvc.uab.cat/
Electronic Journal on Computer Vision and Image Analysis
Publisher: CVC Press | Language: en-US | ISSN: 1577-5097

Authors who publish with this journal agree to the following terms:
a) Authors retain copyright.
b) The texts published in this journal are – unless indicated otherwise – covered by the Creative Commons Spain Attribution-NonCommercial-NoDerivatives 4.0 licence. You may copy, distribute, transmit and adapt the work, provided you attribute it (authorship, journal name, publisher) in the manner specified by the author(s) or licensor(s). The full text of the licence can be consulted here: http://creativecommons.org/licenses/by-nc-nd/4.0.
c) Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
d) Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work (see The Effect of Open Access: http://opcit.eprints.org/oacitation-biblio.html).

Pre-trained CNNs as Feature-Extraction Modules for Image Captioning
https://elcvia.cvc.uab.cat/article/view/1436

In this work, we present a thorough experimental study of feature extraction using Convolutional Neural Networks (CNNs) for the task of image captioning in the context of deep learning. We perform a set of 72 experiments on 12 image classification CNNs pre-trained on the ImageNet [29] dataset. The features are extracted from the last layer after removing the fully connected layer and fed into the captioning model. We use a unified captioning model with a fixed vocabulary size across all the experiments to study the effect of changing the CNN feature extractor on image captioning quality. The scores are calculated using the standard metrics in image captioning. We find a strong relationship between the model structure and the image captioning dataset, and show that VGG models give the lowest quality for image captioning feature extraction among the tested CNNs. Finally, we recommend a set of pre-trained CNNs for each of the image captioning evaluation metrics we want to optimise, and show the connection between our results and previous works. To our knowledge, this work is the most comprehensive comparison between feature extractors for image captioning.

Authors: Muhammad Abdelhadie Al-Malla, Assef Jafar, Nada Ghneim
Copyright (c) 2022 Muhammad Abdelhadie Al-Malla, Assef Jafar, Nada Ghneim | https://creativecommons.org/licenses/by-nc-nd/4.0
Published: 2022-05-10 | Vol. 21, No. 1, pp. 1-16 | DOI: 10.5565/rev/elcvia.1436
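The abstract above describes dropping the fully connected head of an ImageNet-pre-trained CNN and feeding its last-layer activations to the captioning model. Below is a minimal sketch of that extraction step only, not the authors' code: ResNet-50, the torchvision weights and the file name stand in for the 12 CNNs actually compared.

```python
# Minimal sketch (not the authors' implementation): last-layer feature extraction
# from an ImageNet-pre-trained CNN with the fully connected head removed.
import torch
import torch.nn as nn
from torchvision import models, transforms
from PIL import Image

cnn = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)
cnn.fc = nn.Identity()          # drop the fully connected classification layer
cnn.eval()

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

image = preprocess(Image.open("example.jpg").convert("RGB")).unsqueeze(0)  # placeholder file
with torch.no_grad():
    features = cnn(image)        # shape (1, 2048); these features feed the captioning decoder
```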
Retinal Blood Vessels Segmentation using Fréchet PDF and MSMO Method
https://elcvia.cvc.uab.cat/article/view/1453

Retinal blood vessels contain information about many severe diseases such as glaucoma, hypertension, obesity and diabetes. Health professionals use this information to detect and diagnose these diseases; it is therefore necessary to segment the retinal blood vessels. The quality of the retinal image directly affects the accuracy of segmentation, so it must be as good as possible. Many researchers have proposed various methods to segment retinal blood vessels, but most have focused only on the segmentation process and paid less attention to image pre-processing, even though pre-processing plays a vital role in segmentation. The proposed approach introduces a novel multi-scale switching morphological (MSMO) method for pre-processing and a Fréchet matched filter for retinal vessel segmentation. We have experimentally tested and verified the proposed method on the DRIVE, STARE and HRF datasets. The obtained results demonstrate that the performance of the proposed method improves substantially, owing to the better pre-processing and segmentation methods.

Authors: Sushil Kumar Saroj, Rakesh Kumar, Nagendra Pratap Singh
Copyright (c) 2022 Sushil Kumar Saroj, Rakesh Kumar, Nagendra Pratap Singh | https://creativecommons.org/licenses/by-nc-nd/4.0
Published: 2022-04-28 | Vol. 21, No. 1, pp. 27-46 | DOI: 10.5565/rev/elcvia.1453
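The retinal-vessel paper above builds its matched filter from a Fréchet probability density function, f(x; alpha, s) = (alpha/s) * (x/s)^(-1-alpha) * exp(-(x/s)^(-alpha)) for x > 0. The following is only an illustrative sketch of that idea: the kernel length, shape and scale parameters, the single orientation and the threshold are placeholders, not the published configuration, and the real method also relies on MSMO pre-processing and multiple kernel orientations.

```python
# Illustrative sketch (not the paper's exact filter): a zero-mean matched-filter
# kernel whose profile follows the Fréchet probability density function.
import numpy as np
from scipy.ndimage import convolve

def frechet_pdf(x, alpha=2.0, s=2.0):
    """Fréchet PDF f(x) = (alpha/s) * (x/s)^(-1-alpha) * exp(-(x/s)^(-alpha)), x > 0."""
    x = np.asarray(x, dtype=float)
    out = np.zeros_like(x)
    pos = x > 0
    z = x[pos] / s
    out[pos] = (alpha / s) * z ** (-1.0 - alpha) * np.exp(-z ** (-alpha))
    return out

def frechet_kernel(length=15, alpha=2.0, s=2.0):
    """1-D matched-filter kernel sampled from the Fréchet PDF, made zero-mean."""
    x = np.arange(1, length + 1, dtype=float)
    k = frechet_pdf(x, alpha, s)
    return k - k.mean()                      # zero mean suppresses uniform background

# Convolve a (pre-processed) fundus image with the kernel along one orientation.
image = np.random.rand(64, 64)               # placeholder for an MSMO-enhanced image
response = convolve(image, frechet_kernel()[np.newaxis, :], mode="nearest")
vessels = response > response.mean() + 2 * response.std()   # simple illustrative threshold
```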
Object Detection and Statistical Analysis of Microscopy Image Sequences
https://elcvia.cvc.uab.cat/article/view/1482

Confocal microscope images are widely useful in medical diagnosis and research. The automatic interpretation of this type of image is very important but challenging in image processing, since these images are heavily contaminated with noise and have low contrast and low resolution. This work deals with the problem of analyzing the penetration velocity of a chemotherapy drug in an ocular tumor called retinoblastoma. Primary retinoblastoma cell cultures are exposed to the drug topotecan, and the evolution of its penetration is documented by producing sequences of microscopy images. It is possible to quantify the penetration rate of topotecan because it produces fluorescence emission under laser excitation, which is captured by the camera. In order to estimate the topotecan penetration time in the whole retinoblastoma cell culture, a procedure based on an active contour detection algorithm, a neural network classifier and a validated statistical model is proposed. This inference model allows the penetration time to be estimated. Results show that the mean penetration time strongly depends on the tumorsphere size and on the chemotherapeutic treatment that the patient has previously received.

Authors: Juliana Gambini, Sasha Hurovitz, Debora Chan, Rodrigo Ramele
Copyright (c) 2022 Juliana Gambini, Sasha Hurovitz, Debora Chan, Rodrigo Ramele | https://creativecommons.org/licenses/by-nc-nd/4.0
Published: 2022-04-28 | Vol. 21, No. 1, pp. 47-58 | DOI: 10.5565/rev/elcvia.1482

Material Classification with a Transfer Learning based Deep Model on an imbalanced Dataset using an epochal Deming-Cycle-Methodology
https://elcvia.cvc.uab.cat/article/view/1517

This work demonstrates that a transfer learning-based deep learning model can perform unambiguous classification of microscopic images of material surfaces with a high degree of accuracy. A transfer learning-enhanced deep learning model was successfully combined with an innovative approach for eliminating noisy data based on automatic selection using pixel sum values, refined over several epochs, to develop and evaluate an effective model for classifying microscopy images. The evaluated deep learning model achieved 91.54% accuracy on the dataset used and set new standards with the method applied. In addition, care was taken to achieve a balance between accuracy and robustness of the model. Based on this work, a means of identifying microscopy images could evolve to support material identification, suggesting a potential application in materials science and engineering.

Author: Marco Klaiber
Copyright (c) 2022 Marco Klaiber | https://creativecommons.org/licenses/by-nc-nd/4.0
Published: 2022-06-14 | Vol. 21, No. 1, pp. 59-77 | DOI: 10.5565/rev/elcvia.1517

A neural network with competitive layers for character recognition
https://elcvia.cvc.uab.cat/article/view/1392

The structure and functioning mechanisms of a neural network with competitive layers are described. The network is intended to solve the character recognition task. It consists of several competitive layers of neurons, and the number of layers equals the number of recognized classes. All neural layers have a one-to-one correspondence with one another and with the input raster. The neurons of every layer have mutual lateral learning connections, whose weights are modified during the learning process. There is a competitive (inhibitory) relationship between all neural layers. This competitive interaction is realized by means of a "winner-take-all" (WTA) procedure whose aim is to select the layer with the highest level of neural activity.

Validation of the network has been done in experiments on recognition of handwritten digits from the MNIST database. The experiments demonstrate an error rate slightly below 2%, which is not a strong result, but it is compensated by rather fast data processing and a very simple structure and functioning mechanism.

Authors: Alexander Goltsev, Vladimir Gritsenko
Copyright (c) 2022 Alexander Goltsev, Vladimir Gritsenko | https://creativecommons.org/licenses/by-nc-nd/4.0
Published: 2022-06-28 | Vol. 21, No. 1, pp. 102-110 | DOI: 10.5565/rev/elcvia.1392
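The competitive-layer network above picks a class with a winner-take-all step over per-class layer activities. Below is a loose, hypothetical sketch of that selection only; the layer weights, activity measure and raster size are assumptions, and the paper's lateral learning connections are not modelled.

```python
# Hypothetical sketch (not the authors' implementation): winner-take-all selection
# across per-class competitive layers. Each layer scores the input raster, and the
# layer with the highest total activity determines the predicted class.
import numpy as np

rng = np.random.default_rng(0)
n_classes, raster_size = 10, 28 * 28          # e.g. MNIST digits on a 28x28 raster

# One weight vector per class layer; in the paper these connections are learned,
# here they are random placeholders.
layer_weights = [rng.normal(size=raster_size) for _ in range(n_classes)]

def winner_take_all(input_raster):
    """Return the index of the layer with the highest neural activity."""
    activities = np.array([np.maximum(w * input_raster, 0).sum()   # rectified activity
                           for w in layer_weights])
    return int(np.argmax(activities))          # the inhibitory WTA step

digit = rng.random(raster_size)                # placeholder binarised digit image
predicted_class = winner_take_all(digit)
```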
A multiple secret image embedding in dynamic ROI keypoints based on hybrid Speeded Up Scale Invariant Robust Features (h-SUSIRF) algorithm
https://elcvia.cvc.uab.cat/article/view/1470

This paper presents a robust and high-capacity video steganography framework using a hybrid Speeded Up Scale Invariant Robust Features (h-SUSIRF) keypoint detection algorithm. The method has two main objectives: (1) determining the dynamic Region of Interest (ROI) keypoints in video scenes and (2) embedding the appropriate secret data into the identified regions. In this work, the h-SUSIRF keypoint detection scheme is proposed to find keypoints within the scenes. The identified keypoints are dilated to form the dynamic ROI keypoints. Finally, the secret images are embedded into the dynamic ROI keypoint locations of the scenes using the substitution method. The performance of the proposed method (PM) is evaluated using the standard metrics Structural Similarity Index Measure (SSIM), Capacity (Cp) and Bit Error Rate (BER). The quality of the video is ensured by the Video Quality Measure (VQM). To examine the efficacy of the PM, some recent steganalysis schemes are applied to calculate the detection ratio, and the Receiver Operating Characteristic (ROC) curve is analyzed. From the experimental analysis, it is deduced that the PM surpasses contemporary methods by achieving significant results in terms of imperceptibility, capacity and robustness, with lower computational complexity.

Authors: Suganthi Kumar, Rajkumar Soundrapandiyan
Copyright (c) 2022 Suganthi Kumar, Rajkumar Soundrapandiyan | https://creativecommons.org/licenses/by-nc-nd/4.0
Published: 2022-07-19 | Vol. 21, No. 1, pp. 78-100 | DOI: 10.5565/rev/elcvia.1470

Feature selection based on discriminative power under uncertainty for computer vision applications
https://elcvia.cvc.uab.cat/article/view/1361

Feature selection is a prolific research field, which has been widely studied in the last decades and has been successfully applied to numerous computer vision systems. It mainly aims to reduce dimensionality and thus system complexity. Features do not have the same importance within the different classes: some serve class representation while others serve class separation. In this paper, a new feature selection method based on discriminative power is proposed to select the relevant features under an uncertain framework, where the uncertainty is expressed through a possibility distribution. In an uncertain context, our method shows its ability to select features that can both represent and discriminate between classes.

Authors: Marwa Chakroun, Sonda Ammar Bouhamed, Imene Khanfir Kallel, Basel Solaiman, Houda Derbel
Copyright (c) 2022 Marwa Chakroun, Sonda Ammar Bouhamed, Imene Khanfir Kallel, Basel Solaiman, Houda Derbel | https://creativecommons.org/licenses/by-nc-nd/4.0
Published: 2022-06-28 | Vol. 21, No. 1, pp. 111-120 | DOI: 10.5565/rev/elcvia.1361

Attention-based CNN-ConvLSTM for Handwritten Arabic Word Extraction
https://elcvia.cvc.uab.cat/article/view/1433

Word extraction is one of the most critical steps in handwriting recognition systems. It is challenging for many reasons, such as the variability of handwriting styles, touching and overlapping characters, skewness problems, and the presence of diacritics, ascenders and descenders. In this work, we propose a deep-learning-based approach for handwritten Arabic word extraction. We use an attention-based CNN-ConvLSTM (Convolutional Long Short-Term Memory) followed by a CTC (Connectionist Temporal Classification) function. First, the essential features of the text-line input image are extracted using an attention-based Convolutional Neural Network (CNN). The extracted features and the text line's transcription are then passed to a ConvLSTM to learn a mapping between them. Finally, we use a CTC to learn the alignment between text-line images and their transcriptions automatically. We tested the proposed model on a complex dataset known as KFUPM Handwritten Arabic Text (KHATT), which consists of complex patterns of handwritten Arabic text lines. The experimental results show the efficiency of the proposed combination, with an extraction success rate of 91.7%.

Authors: Takwa Ben Aicha, Afef Kacem Echi
Copyright (c) 2022 Takwa Ben Aicha, Afef Kacem Echi | https://creativecommons.org/licenses/by-nc-nd/4.0
Published: 2022-06-28 | Vol. 21, No. 1, pp. 121-129 | DOI: 10.5565/rev/elcvia.1433
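The word-extraction paper above relies on a CTC function to align unsegmented text-line images with their transcriptions. Below is a minimal sketch of that alignment step using PyTorch's nn.CTCLoss, with the attention-based CNN-ConvLSTM encoder stubbed out by a random tensor; all dimensions and the alphabet size are placeholders, not the paper's configuration.

```python
# Minimal sketch (encoder stubbed out): CTC alignment between per-frame character
# probabilities and an unsegmented transcription.
import torch
import torch.nn as nn

T, N, C = 100, 4, 40                               # time steps, batch size, alphabet size (index 0 = blank)
encoder_out = torch.randn(T, N, C, requires_grad=True)   # would come from the CNN-ConvLSTM
log_probs = encoder_out.log_softmax(dim=2)         # per-frame log probabilities

targets = torch.randint(1, C, (N, 12), dtype=torch.long)  # placeholder character indices (0 is reserved for blank)
input_lengths = torch.full((N,), T, dtype=torch.long)
target_lengths = torch.full((N,), 12, dtype=torch.long)

ctc = nn.CTCLoss(blank=0, zero_infinity=True)
loss = ctc(log_probs, targets, input_lengths, target_lengths)
loss.backward()                                    # gradients flow back into the encoder output
```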