ELCVIA Electronic Letters on Computer Vision and Image Analysis
https://elcvia.cvc.uab.cat/
Electronic Journal on Computer Vision and Image Analysis (en-US)

Authors who publish with this journal agree to the following terms:
a. Authors retain copyright.
b. The texts published in this journal are, unless indicated otherwise, covered by the Creative Commons Spain Attribution-NonCommercial-NoDerivatives 4.0 licence. You may copy, distribute, transmit and adapt the work, provided you attribute it (authorship, journal name, publisher) in the manner specified by the author(s) or licensor(s). The full text of the licence can be consulted at http://creativecommons.org/licenses/by-nc-nd/4.0.
c. Authors may enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., posting it to an institutional repository or publishing it in a book), with an acknowledgement of its initial publication in this journal.
d. Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) before and during the submission process, as this can lead to productive exchanges as well as earlier and greater citation of the published work (see "The Effect of Open Access": http://opcit.eprints.org/oacitation-biblio.html).

Contact: elcvia@cvc.uab.cat (Electronic Letters on Computer Vision and Image Analysis)
Feed generated: Tue, 23 Apr 2024 10:14:40 +0200 (OJS 3.2.1.4)

Off-line Identification of Script Writers by Swin Transformers and ResNeSt-50
https://elcvia.cvc.uab.cat/article/view/1787

In this work, we present two deep learning models for identifying script writers. The proposed systems build on the Swin vision Transformer and ResNeSt-50. The Swin Transformer is known for its robustness to variations and its ability to model long-range dependencies, which helps it capture context and make robust predictions. Trained on large datasets of handwritten text samples, it operates on sequences of image patches and learns a robust representation of each writer's unique style. ResNeSt-50 (a residual network with Squeeze-and-Excitation (SE)-style split-attention modules), with its multiple layers, learns complex representations of a writer's style and distinguishes between different writing styles with high precision. The SE mechanism within ResNeSt helps the model focus on distinctive handwriting characteristics and suppress noise. The experimental results demonstrate strong performance: the Swin Transformer achieves an accuracy of 98.50% at patch level on the CVL database, which consists of images of cursively handwritten German and English texts, and ResNeSt-50 achieves an accuracy of 96.61% at page level on the same database. This research advances writer identification by showcasing the effectiveness of the Swin Transformer and ResNeSt-50. The achieved accuracy underscores the potential of these models to process and understand complex handwriting effectively.

Afef Kacem Echi, Takwa Ben Aïcha Gader
Copyright (c) 2024 Afef Kacem Echi, Takwa Ben Aïcha Gader (CC BY-NC-ND 4.0: https://creativecommons.org/licenses/by-nc-nd/4.0)
Published: Mon, 03 Jun 2024 00:00:00 +0200
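The following is a minimal sketch of such a patch-based writer-identification pipeline, assuming PyTorch and the timm model zoo; the model variants, input size, writer count, and the page-level aggregation rule are illustrative assumptions rather than the authors' published code.

```python
# Sketch only: fine-tuning pretrained Swin Transformer and ResNeSt-50
# backbones to classify handwriting patches by writer (assumed setup).
import timm
import torch
import torch.nn as nn

NUM_WRITERS = 310  # assumption: size of the writer label set (CVL has 310 writers)

# Swin Transformer: operates on sequences of image patches with shifted windows.
swin = timm.create_model("swin_tiny_patch4_window7_224",
                         pretrained=True, num_classes=NUM_WRITERS)

# ResNeSt-50: residual network whose split-attention blocks generalise SE.
resnest = timm.create_model("resnest50d", pretrained=True, num_classes=NUM_WRITERS)

def train_step(model, patches, writer_ids, optimizer):
    """One supervised step: classify each 224x224 handwriting patch by writer."""
    model.train()
    loss = nn.functional.cross_entropy(model(patches), writer_ids)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

@torch.no_grad()
def predict_page(model, page_patches):
    """Page-level decision: average the patch-level probabilities."""
    model.eval()
    probs = model(page_patches).softmax(dim=-1)  # (num_patches, NUM_WRITERS)
    return probs.mean(dim=0).argmax().item()
```

The two reported accuracies correspond to the two granularities above: scoring individual patches versus aggregating patch predictions over a whole page.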
A Multimodal Biometric Authentication System Using Autoencoders and Siamese Networks for Enhanced Security
https://elcvia.cvc.uab.cat/article/view/1811

Ensuring secure and reliable identity verification is crucial, and biometric authentication plays a significant role in achieving it. However, unimodal authentication, which relies on a single biometric trait, can suffer from limited accuracy and vulnerability to attacks. Multimodal authentication, which combines multiple biometric traits, can enhance both accuracy and security by leveraging their complementary strengths. In the literature, different biometric modalities, such as face, voice, fingerprint, and iris, have been studied and used extensively for user authentication. Our research introduces an effective multimodal biometric authentication system based on deep learning. The study focuses on two of the most user-friendly modalities: face and voice recognition. We employ a convolutional autoencoder to extract features from face images and an LSTM autoencoder for voice data. These features are concatenated to form a joint representation, and a Siamese network carries out the final user-identification step. We evaluated the model on the OMG-Emotion and RAVDESS datasets, achieving accuracies of 89.79% on RAVDESS and 95% on OMG-Emotion when combining the face and voice modalities.

Théo Gueuret, Leila Kerkeni
Copyright (c) 2024 Leila Kerkeni, Théo Gueuret (CC BY-NC-ND 4.0: https://creativecommons.org/licenses/by-nc-nd/4.0)
Published: Tue, 23 Apr 2024 00:00:00 +0200
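As a rough illustration of the described architecture, here is a sketch in PyTorch, assuming 64x64 face crops and per-frame audio features such as MFCCs; all layer sizes, latent dimensions, and the contrastive loss are assumptions chosen for concreteness, not the paper's exact configuration.

```python
# Sketch only: fuse a convolutional face encoder and an LSTM voice encoder,
# then compare identities with a Siamese head (assumed dimensions throughout).
import torch
import torch.nn as nn
import torch.nn.functional as F

class FaceEncoder(nn.Module):
    """Encoder half of a convolutional autoencoder for 64x64 RGB face crops."""
    def __init__(self, latent_dim=128):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),    # 64 -> 32
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),   # 32 -> 16
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.ReLU(),  # 16 -> 8
        )
        self.fc = nn.Linear(128 * 8 * 8, latent_dim)

    def forward(self, x):                      # x: (B, 3, 64, 64)
        return self.fc(self.conv(x).flatten(1))

class VoiceEncoder(nn.Module):
    """Encoder half of an LSTM autoencoder over per-frame audio features."""
    def __init__(self, n_feats=40, latent_dim=128):
        super().__init__()
        self.lstm = nn.LSTM(n_feats, latent_dim, batch_first=True)

    def forward(self, x):                      # x: (B, T, n_feats)
        _, (h, _) = self.lstm(x)
        return h[-1]                           # final hidden state as voice code

class SiameseHead(nn.Module):
    """Projects the concatenated face+voice code into an embedding space
    where distance reflects identity."""
    def __init__(self, in_dim=256, emb_dim=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, 128), nn.ReLU(),
                                 nn.Linear(128, emb_dim))

    def forward(self, fused):
        return self.net(fused)

def embed(face_enc, voice_enc, head, face, voice):
    """Joint representation by concatenating the two modality codes."""
    fused = torch.cat([face_enc(face), voice_enc(voice)], dim=1)  # (B, 256)
    return head(fused)

def contrastive_loss(e1, e2, same, margin=1.0):
    """same: float tensor, 1.0 for matching identities, 0.0 otherwise."""
    d = F.pairwise_distance(e1, e2)
    return (same * d.pow(2) + (1 - same) * F.relu(margin - d).pow(2)).mean()
```

At verification time, a claimed identity would be accepted when the distance between the stored and live embeddings falls below a threshold tuned on a validation set.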