Estimating Uniqueness of I-Vector Representation of Human Voice

Tandoğan, Sinan Erkam; Sencar, Hüsrev Taha

Please use this identifier to cite or link to this item: https://hdl.handle.net/20.500.11851/6659

Title:	Estimating Uniqueness of I-Vector Representation of Human Voice
Authors:	Tandoğan, Sinan Erkam; Sencar, Hüsrev Taha
Keywords:	Biometrics (access control); Biological system modeling; Authentication; Entropy; Speaker recognition; Iris recognition; Human voice; Biometrics; speaker recognition; i-vector; uniqueness estimation; distinctiveness of a modality
Publisher:	IEEE-Inst Electrical Electronics Engineers Inc
Abstract:	We study the individuality of the human voice with respect to a widely used feature representation of speech utterances, namely, the i-vector model. As a first step toward this goal, we compare and contrast uniqueness measures proposed for different biometric modalities. Then, we introduce a new uniqueness measure that evaluates the entropy of i-vectors while taking into account speaker-level variations. Our measure operates in the discrete feature space and relies on accurate estimation of the distribution of i-vectors. Therefore, i-vectors are quantized while ensuring that both the quantized and original representations yield similar speaker verification performance. Uniqueness estimates are obtained from two newly generated datasets and the public VoxCeleb dataset. The first custom dataset contains more than one and a half million speech samples of 20,741 speakers obtained from TEDx Talks videos. The second one includes over twenty-one thousand speech samples from 1,595 actors, extracted from movie dialogues. Using this data, we analyze how several factors, such as the number of speakers, the number of samples per speaker, sample durations, and the diversity of utterances, affect uniqueness estimates. Most notably, we determine that the discretization of i-vectors does not cause a reduction in speaker recognition performance. Our results show that the degree of distinctiveness offered by the i-vector-based representation may reach 43-70 bits for 5-second-long speech samples; however, under less constrained variations in speech, uniqueness estimates are found to decrease by around 30 bits. We also find that doubling the sample duration increases the distinctiveness of the i-vector representation by around 20 bits.
URI:	https://doi.org/10.1109/TIFS.2021.3071574; https://hdl.handle.net/20.500.11851/6659
ISSN:	1556-6013; 1556-6021
Appears in Collections:	Bilgisayar Mühendisliği Bölümü / Department of Computer Engineering; Scopus İndeksli Yayınlar Koleksiyonu / Scopus Indexed Publications Collection; WoS İndeksli Yayınlar Koleksiyonu / WoS Indexed Publications Collection
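
The abstract describes quantizing i-vectors and then estimating entropy in the discrete feature space. The sketch below illustrates that general idea only and is not the authors' measure: it assumes uniform quantization over a fixed range, treats dimensions as independent (a strong simplification), and ignores the speaker-level variation that the paper's measure accounts for. All function names, parameters, and the synthetic data are hypothetical.

```python
import math
import random
from collections import Counter


def quantize(values, levels, lo, hi):
    """Map each float to one of `levels` uniform bins over [lo, hi], clipping outliers."""
    width = (hi - lo) / levels
    return [min(levels - 1, max(0, int((v - lo) / width))) for v in values]


def entropy_bits(symbols):
    """Plug-in (empirical) Shannon entropy of a sequence of discrete symbols, in bits."""
    counts = Counter(symbols)
    n = len(symbols)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def summed_dimension_entropy(vectors, levels=4, lo=-3.0, hi=3.0):
    """Sum of per-dimension entropies after quantization.

    Assumption: dimensions are independent, so the sum is only an upper
    bound on the joint entropy of the quantized vectors.
    """
    dims = len(vectors[0])
    total = 0.0
    for d in range(dims):
        column = quantize([v[d] for v in vectors], levels, lo, hi)
        total += entropy_bits(column)
    return total


if __name__ == "__main__":
    # Synthetic stand-in for i-vectors: 2000 vectors of 8 Gaussian components.
    rng = random.Random(0)
    toy = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(2000)]
    print(round(summed_dimension_entropy(toy), 2), "bits (toy data)")
```

Because the plug-in estimator is biased downward when samples are scarce relative to the number of bins, a sketch like this would need far more data, or a bias-corrected estimator, before its output could be compared with the figures reported in the abstract.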

Scopus citations:	3 (checked on Sep 6, 2025)
Web of Science citations:	1 (checked on Sep 6, 2025)
Page view(s):	182 (checked on Sep 8, 2025)