Please use this identifier to cite or link to this item:
https://hdl.handle.net/20.500.11851/6659
Title: | Estimating Uniqueness of I-Vector Representation of Human Voice |
Authors: | Tandoğan, Sinan Erkam; Sencar, Hüsrev Taha |
Keywords: | Biometrics (access control); Biological system modeling; Authentication; Entropy; Speaker recognition; Iris recognition; Human voice; Biometrics; speaker recognition; i-vector; uniqueness estimation; distinctiveness of a modality |
Publisher: | IEEE-Inst Electrical Electronics Engineers Inc |
Abstract: | We study the individuality of the human voice with respect to a widely used feature representation of speech utterances, namely, the i-vector model. As a first step toward this goal, we compare and contrast uniqueness measures proposed for different biometric modalities. Then, we introduce a new uniqueness measure that evaluates the entropy of i-vectors while taking into account speaker-level variations. Our measure operates in the discrete feature space and relies on accurate estimation of the distribution of i-vectors. Therefore, i-vectors are quantized while ensuring that both the quantized and original representations yield similar speaker verification performance. Uniqueness estimates are obtained from two newly generated datasets and the public VoxCeleb dataset. The first custom dataset contains more than one and a half million speech samples of 20,741 speakers obtained from TEDx Talks videos. The second one includes over twenty-one thousand speech samples from 1,595 actors that are extracted from movie dialogues. Using this data, we analyze how several factors, such as the number of speakers, number of samples per speaker, sample durations, and diversity of utterances, affect uniqueness estimates. Most notably, we determine that the discretization of i-vectors does not cause a reduction in speaker recognition performance. Our results show that the degree of distinctiveness offered by the i-vector-based representation may reach 43-70 bits for 5-second-long speech samples; however, under less constrained variations in speech, uniqueness estimates decrease by around 30 bits. We also find that doubling the sample duration increases the distinctiveness of the i-vector representation by around 20 bits. |
URI: | https://doi.org/10.1109/TIFS.2021.3071574 https://hdl.handle.net/20.500.11851/6659 |
ISSN: | 1556-6013; 1556-6021 |
Appears in Collections: | Bilgisayar Mühendisliği Bölümü / Department of Computer Engineering; Scopus İndeksli Yayınlar Koleksiyonu / Scopus Indexed Publications Collection; WoS İndeksli Yayınlar Koleksiyonu / WoS Indexed Publications Collection |
Items in GCRIS Repository are protected by copyright, with all rights reserved, unless otherwise indicated.