ELCVIA Electronic Letters on Computer Vision and Image Analysis https://elcvia.cvc.uab.cat/ Electronic Journal on Computer Vision and Image Analysis en-US

Authors who publish with this journal agree to the following terms:<br /><ol type="a"><li>Authors retain copyright.</li><li>The texts published in this journal are – unless indicated otherwise – covered by the Creative Commons Spain <a href="http://creativecommons.org/licenses/by-nc-nd/4.0">Attribution-NonCommercial-NoDerivatives 4.0</a> licence. You may copy, distribute, transmit and adapt the work, provided you attribute it (authorship, journal name, publisher) in the manner specified by the author(s) or licensor(s). The full text of the licence can be consulted here: <a href="http://creativecommons.org/licenses/by-nc-nd/4.0">http://creativecommons.org/licenses/by-nc-nd/4.0</a>.</li><li>Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.</li><li>Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work (see <a href="http://opcit.eprints.org/oacitation-biblio.html" target="_new">The Effect of Open Access</a>).</li></ol>

elcvia@cvc.uab.cat (Electronic Letters on Computer Vision and Image Analysis) elcvia@cvc.uab.cat (ELCVIA) Tue, 23 Apr 2024 10:14:40 +0200 OJS 3.2.1.4 http://blogs.law.harvard.edu/tech/rss 60

Off-line identifying Script Writers by Swin Transformers and ResNeSt-50 https://elcvia.cvc.uab.cat/article/view/1787

<p>In this work, we present two advanced deep learning models for identifying script writers: the recent Swin vision Transformer and ResNeSt-50. The Swin Transformer is known for its robustness to variations and its ability to model long-range dependencies, which helps it capture context and make robust predictions. Trained on large datasets of handwritten text samples, it operates on sequences of image patches and learns a robust representation of each writer's unique style. ResNeSt-50, a residual neural network with Squeeze-and-Excitation (SE) and split-attention modules, uses its multiple layers to learn complex representations of a writer's style and to distinguish between different writing styles with high precision; its SE module helps the model focus on distinctive handwriting characteristics and suppress noise. The experimental results demonstrate exceptional performance: the Swin Transformer achieves an accuracy of 98.50% (at patch level) on the CVL database, which consists of images of cursively handwritten German and English texts, and ResNeSt-50 achieves 96.61% (at page level) on the same database. This research advances writer identification by showcasing the effectiveness of the Swin Transformer and ResNeSt-50. The achieved accuracy underscores the potential of these models to process and understand complex handwriting effectively.</p> Afef Kacem Echi, Takwa Ben Aïcha Gader Copyright (c) 2024 Afef Kacem Echi, Takwa Ben Aïcha Gader https://creativecommons.org/licenses/by-nc-nd/4.0 https://elcvia.cvc.uab.cat/article/view/1787 Mon, 03 Jun 2024 00:00:00 +0200
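To make the patch-level pipeline described above concrete, here is a minimal sketch of how a pretrained Swin Transformer could be fine-tuned for writer classification. It is an illustration under stated assumptions, not the authors' code: it assumes the `timm` library, 224x224 handwriting patches, and an illustrative writer count; the paper's actual training setup may differ.

```python
# Minimal sketch: fine-tuning a Swin Transformer for patch-level writer
# identification. Hypothetical setup, not the paper's actual pipeline.
import timm
import torch
import torch.nn as nn

NUM_WRITERS = 310  # illustrative; set to the number of writers in the dataset

# timm ships pretrained Swin models; the classification head is replaced
# so that it outputs one logit per writer.
model = timm.create_model(
    "swin_base_patch4_window7_224", pretrained=True, num_classes=NUM_WRITERS
)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

def train_step(patches: torch.Tensor, writer_ids: torch.Tensor) -> float:
    """One optimisation step on a batch of 224x224 handwriting patches."""
    model.train()
    optimizer.zero_grad()
    logits = model(patches)              # shape: (batch, NUM_WRITERS)
    loss = criterion(logits, writer_ids)
    loss.backward()
    optimizer.step()
    return loss.item()
```

A page-level decision can then be obtained by aggregating patch predictions, for instance by averaging the softmax scores of all patches cropped from one page.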
Multi-Biometric System Based On The Fusion Of Fingerprint And Finger-Vein https://elcvia.cvc.uab.cat/article/view/1822

<p>Biometrics is the process of measuring an individual's unique biological traits for identification and verification purposes. Multiple traits can be combined to enhance the security and robustness of a system. This study concentrates exclusively on the finger and employs two modalities, fingerprint and finger vein. The proposed system extracts features from the finger-vein images and applies two matching algorithms, ridge-based matching and minutiae-based matching, to derive matching scores for both biometrics. The scores from the two modalities are combined using four fusion approaches: holistic fusion, non-linear fusion, sum-rule-based fusion, and Dempster-Shafer theory. The final decision is based on the performance metrics and the Receiver Operating Characteristic (ROC) curve of the best-performing fusion technique. The proposed technique is tested on images from the Nanjing University of Posts and Telecommunications Fingerprint and Finger-Vein (NUPT-FPV) dataset. According to the results, obtained on 840 input images, the proposed system achieves an Equal Error Rate (EER) of 0% with Dempster-Shafer-based fusion and 14% with the other three fusion techniques. The False Acceptance Rate (FAR) is also very low, at 0% for all the fusion techniques, which is crucial for security and for preventing unauthorized access.</p> Jeyalakshmi Vijayarajan Copyright (c) 2024 Jeyalakshmi Vijayarajan https://creativecommons.org/licenses/by-nc-nd/4.0 https://elcvia.cvc.uab.cat/article/view/1822 Thu, 04 Jul 2024 00:00:00 +0200
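Since the abstract names its four score-fusion rules without detail, the following self-contained sketch shows two of them, the sum rule and a two-hypothesis Dempster-Shafer combination, on illustrative normalised scores. The function names and the example scores are invented; the paper's exact formulation may differ.

```python
# Score-level fusion of two matchers (fingerprint and finger vein).
# Scores are assumed min-max normalised to [0, 1], higher = better match.

def sum_rule(s_fp: float, s_fv: float, w_fp: float = 0.5) -> float:
    """Weighted sum rule: a convex combination of the two match scores."""
    return w_fp * s_fp + (1.0 - w_fp) * s_fv

def dempster_shafer(s_fp: float, s_fv: float) -> float:
    """Two-hypothesis Dempster-Shafer combination.

    Each matcher's score is read as its belief mass for 'genuine'
    (m(G) = s) versus 'impostor' (m(I) = 1 - s); Dempster's rule
    renormalises the product masses after removing the conflict.
    """
    conflict = s_fp * (1.0 - s_fv) + (1.0 - s_fp) * s_fv
    if conflict >= 1.0:  # total conflict: the combination is undefined
        raise ValueError("matchers are in total conflict")
    return (s_fp * s_fv) / (1.0 - conflict)  # combined belief in 'genuine'

if __name__ == "__main__":
    s_fingerprint, s_vein = 0.82, 0.67  # made-up normalised scores
    print("sum rule:       ", sum_rule(s_fingerprint, s_vein))         # 0.745
    print("dempster-shafer:", dempster_shafer(s_fingerprint, s_vein))  # ~0.90
```

Thresholding the fused score then yields the accept/reject decision whose FAR/EER trade-off the ROC curve summarises.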
Classification of radiological patterns of tuberculosis with a convolutional neural network in X-ray images https://elcvia.cvc.uab.cat/article/view/1561

<p>In this paper we propose the classification of radiological patterns associated with tuberculosis in X-ray images. Two to six patterns (consolidation, fibrosis, opacity, pleural effusion, nodules and cavitation) were observed in the patients' radiographs. Specialists take the type of TB pattern into account in order to provide appropriate treatment, yet not all medical centres have specialists who can immediately interpret radiological patterns. The aim is therefore to classify these patterns with a convolutional neural network to support a more accurate diagnosis on X-rays, so that doctors can recommend immediate treatment and thus avoid infecting more people. For the classification of tuberculosis patterns, a custom convolutional neural network (CNN) was proposed and compared against the VGG16, InceptionV3 and ResNet-50 architectures, which were selected based on the results of other radiograph-classification research [1]–[3]. The results for the macro-average AUC-SVM metric were 0.80 for both the proposed architecture and InceptionV3, 0.75 for VGG16, and 0.79 for ResNet-50. The proposed architecture and InceptionV3 thus achieve the best classification results.</p> Adrian Trueba Espinosa, Jessica Sanchez-Arrazola, Jair Cervantes, Farid Garcia-Lamont, José Sergio Ruiz Castilla Copyright (c) 2024 Adrian Trueba Espinosa, Jessica Sanchez-Arrazola, Jair Cervantes, Farid Garcia-Lamont, José Sergio Ruiz Castilla https://creativecommons.org/licenses/by-nc-nd/4.0 https://elcvia.cvc.uab.cat/article/view/1561 Tue, 09 Jul 2024 00:00:00 +0200

A Multimodal Biometric Authentication System Using Autoencoders and Siamese Networks for Enhanced Security https://elcvia.cvc.uab.cat/article/view/1811

<p>Ensuring secure and reliable identity verification is crucial, and biometric authentication plays a significant role in achieving it. However, relying on a single biometric trait (unimodal authentication) may limit accuracy and leave the system vulnerable to attacks. Multimodal authentication, which combines multiple biometric traits, can enhance both accuracy and security by leveraging their complementary strengths. In the literature, different biometric modalities, such as face, voice, fingerprint, and iris, have been studied and used extensively for user authentication. Our research introduces a highly effective multimodal biometric authentication system based on deep learning. We focus on two of the most user-friendly modalities: face and voice recognition. We employ a convolutional autoencoder to extract features from face images and an LSTM autoencoder for voice data; these features are concatenated into a joint representation, and a Siamese network carries out the final user-identification step. We evaluated the model on the OMG-Emotion and RAVDESS datasets, achieving accuracies of 89.79% on RAVDESS and 95% on OMG-Emotion using the combined face and voice modalities.</p> Théo Gueuret, Leila Kerkeni Copyright (c) 2024 Leila Kerkeni, Théo Gueuret https://creativecommons.org/licenses/by-nc-nd/4.0 https://elcvia.cvc.uab.cat/article/view/1811 Tue, 23 Apr 2024 00:00:00 +0200
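As a rough illustration of the fusion-and-verification stage described in the last abstract, the sketch below concatenates face and voice feature vectors and compares two samples with a small Siamese embedding head. All dimensions, layer sizes and the threshold are invented, and the random tensors stand in for the bottleneck outputs of the paper's convolutional and LSTM autoencoders.

```python
# Sketch: concatenation fusion + Siamese comparison of face/voice features.
import torch
import torch.nn as nn
import torch.nn.functional as F

FACE_DIM, VOICE_DIM, EMB_DIM = 128, 64, 64  # illustrative sizes

class SiameseHead(nn.Module):
    """Maps a concatenated face+voice feature vector to a joint embedding."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(FACE_DIM + VOICE_DIM, 128),
            nn.ReLU(),
            nn.Linear(128, EMB_DIM),
        )

    def forward(self, face_feat, voice_feat):
        joint = torch.cat([face_feat, voice_feat], dim=-1)  # fusion by concatenation
        return F.normalize(self.net(joint), dim=-1)         # unit-norm embedding

def verify(head, enrolled, probe, threshold=0.7):
    """Accept when the cosine similarity of the two joint embeddings is high."""
    emb_a, emb_b = head(*enrolled), head(*probe)
    similarity = (emb_a * emb_b).sum(dim=-1)  # cosine, since embeddings are unit-norm
    return similarity > threshold

# Random stand-ins for the autoencoder bottleneck features of two samples:
head = SiameseHead()
enrolled = (torch.randn(1, FACE_DIM), torch.randn(1, VOICE_DIM))
probe = (torch.randn(1, FACE_DIM), torch.randn(1, VOICE_DIM))
print(verify(head, enrolled, probe))  # tensor([True]) or tensor([False])
```

In practice such a head would be trained with a contrastive or triplet loss on genuine/impostor pairs, and the decision threshold calibrated on a validation set.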