Please use this identifier to cite or link to this item: https://hdl.handle.net/20.500.11851/6659
Title: Estimating Uniqueness of I-Vector-Based Representation of Human Voice
Authors: Tandoğan, Sinan Erkam
Sencar, Hüsrev Taha
Keywords: Biometrics (access control)
Biological system modeling
Authentication
Entropy
Speaker recognition
Iris recognition
Human voice
Biometrics
speaker recognition
i-vector
uniqueness estimation
distinctiveness of a modality
Publisher: IEEE-Inst Electrical Electronics Engineers Inc
Abstract: We study the individuality of the human voice with respect to a widely used feature representation of speech utterances, namely, the i-vector model. As a first step toward this goal, we compare and contrast uniqueness measures proposed for different biometric modalities. Then, we introduce a new uniqueness measure that evaluates the entropy of i-vectors while taking into account speaker-level variations. Our measure operates in the discrete feature space and relies on accurate estimation of the distribution of i-vectors. Therefore, i-vectors are quantized while ensuring that both the quantized and original representations yield similar speaker verification performance. Uniqueness estimates are obtained from two newly generated datasets and the public VoxCeleb dataset. The first custom dataset contains more than one and a half million speech samples of 20,741 speakers obtained from TEDx Talks videos. The second includes over twenty-one thousand speech samples from 1,595 actors, extracted from movie dialogues. Using this data, we analyze how several factors, such as the number of speakers, the number of samples per speaker, sample duration, and the diversity of utterances, affect uniqueness estimates. Most notably, we determine that the discretization of i-vectors does not cause a reduction in speaker recognition performance. Our results show that the degree of distinctiveness offered by the i-vector-based representation may reach 43-70 bits for 5-second-long speech samples; however, under less constrained variations in speech, uniqueness estimates drop by around 30 bits. We also find that doubling the sample duration increases the distinctiveness of the i-vector representation by around 20 bits.
URI: https://doi.org/10.1109/TIFS.2021.3071574
https://hdl.handle.net/20.500.11851/6659
ISSN: 1556-6013
1556-6021
Appears in Collections:Bilgisayar Mühendisliği Bölümü / Department of Computer Engineering
Scopus İndeksli Yayınlar Koleksiyonu / Scopus Indexed Publications Collection
WoS İndeksli Yayınlar Koleksiyonu / WoS Indexed Publications Collection
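
Illustrative note: the abstract describes measuring uniqueness as the entropy of quantized i-vectors, expressed in bits. The sketch below is not the authors' measure. It is a minimal illustration of the general idea, under loudly stated assumptions: each coordinate is quantized uniformly on a fixed range, the coordinates are treated as independent (so the per-dimension sum is only an upper bound on the joint entropy), and speaker-level variation, which the paper's measure accounts for, is ignored. The function names, the range [-3, 3], and the synthetic Gaussian vectors are all hypothetical.

```python
import math
import random
from collections import Counter


def quantize(vectors, num_levels, lo=-3.0, hi=3.0):
    """Uniformly quantize every coordinate into num_levels bins on [lo, hi].

    Values outside [lo, hi] are clipped to the first or last bin.
    The range and the uniform scheme are assumptions for illustration.
    """
    step = (hi - lo) / num_levels
    quantized = []
    for vec in vectors:
        bins = tuple(
            min(num_levels - 1, max(0, math.floor((x - lo) / step)))
            for x in vec
        )
        quantized.append(bins)
    return quantized


def entropy_bits(symbols):
    """Plug-in (empirical) Shannon entropy of a symbol sequence, in bits."""
    counts = Counter(symbols)
    total = len(symbols)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def per_dimension_entropy_sum(quantized):
    """Sum of per-coordinate entropies.

    Equals the joint entropy only if coordinates are independent;
    otherwise it is an upper bound on it.
    """
    dims = len(quantized[0])
    return sum(entropy_bits([q[d] for q in quantized]) for d in range(dims))


if __name__ == "__main__":
    # Synthetic stand-in for i-vectors: 1000 random 8-dimensional vectors.
    random.seed(0)
    fake_vectors = [[random.gauss(0.0, 1.0) for _ in range(8)] for _ in range(1000)]
    estimate = per_dimension_entropy_sum(quantize(fake_vectors, num_levels=4))
    print(f"upper-bound entropy estimate: {estimate:.2f} bits")
```

With 4 levels per coordinate, each dimension contributes at most 2 bits, so the estimate for 8 dimensions cannot exceed 16 bits. The plug-in estimator is also biased downward when there are few samples per bin, which is one reason an accurate estimate of the distribution matters.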

Items in GCRIS Repository are protected by copyright, with all rights reserved, unless otherwise indicated.