A Preliminary Examination Technique for Audio Evidence To Distinguish Speech From Non-Speech Using Objective Speech Quality Measures

Uzun, Erkam; Sencar, Hüsrev Taha

Please use this identifier to cite or link to this item: https://hdl.handle.net/20.500.11851/1137

Title:	A Preliminary Examination Technique for Audio Evidence To Distinguish Speech From Non-Speech Using Objective Speech Quality Measures
Authors:	Uzun, Erkam; Sencar, Hüsrev Taha
Keywords:	Preliminary Analysis Of Audio Evidence; Speech And Non-Speech Discrimination; Objective Speech Quality Assessment; Audio Encoding; Audio Effects; Surveillance
Publisher:	Elsevier
Source:	Uzun, E., & Sencar, H. T. (2014). A preliminary examination technique for audio evidence to distinguish speech from non-speech using objective speech quality measures. Speech Communication, 61, 1-16.
Abstract:	Forensic practitioners increasingly face large volumes of data, so there is a growing need for computational techniques that aid evidence collection and analysis. In this study, we introduce a technique for the preliminary analysis of audio evidence that discriminates between speech and non-speech. The novelty of our approach lies in the use of well-established speech quality measures to characterize speech signals. These measures rely on models of human speech perception to provide objective and reliable measurements of changes in the characteristics that influence speech quality. We use this capability to compute quality scores between an audio signal and its noise-suppressed version, and to model how these scores vary in speech compared with non-speech audio. Tests performed on 11 datasets with widely varying characteristics show that the technique has a high discrimination capability, achieving an identification accuracy of 96 to 99% in most test cases, and that it generalizes well across datasets. Results also show that the technique is robust against encoding at low bit rates, the application of audio effects, and degradation due to varying degrees of background noise. Performance comparisons with existing studies show that the proposed method improves on the state of the art in audio content identification. © 2014 Elsevier B.V. All rights reserved.
URI:	https://www.sciencedirect.com/science/article/pii/S016763931400017X?via%3Dihub; https://hdl.handle.net/20.500.11851/1137
ISSN:	0167-6393
Appears in Collections:	Bilgisayar Mühendisliği Bölümü / Department of Computer Engineering; Scopus İndeksli Yayınlar Koleksiyonu / Scopus Indexed Publications Collection; WoS İndeksli Yayınlar Koleksiyonu / WoS Indexed Publications Collection
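
The core idea in the abstract (score an audio signal against its own noise-suppressed version, then model how the scores vary) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the per-frame Wiener-style gain stands in for a real noise suppressor, segmental SNR stands in for whichever objective speech quality measures the authors used, and the function names, the 256-sample frame length, and the "quietest 10% of frames" noise estimate are hypothetical.

```python
import math
import statistics


def frame_energies(x, frame_len=256):
    """Split x into non-overlapping frames; return (frames, per-frame energies)."""
    frames = [x[i:i + frame_len] for i in range(0, len(x) - frame_len + 1, frame_len)]
    energies = [sum(s * s for s in f) for f in frames]
    return frames, energies


def suppress_noise(x, frame_len=256, floor=0.1):
    """Toy suppressor (assumption): one Wiener-style gain per frame."""
    frames, energies = frame_energies(x, frame_len)
    # Assumption: the quietest 10% of frames contain only background noise.
    k = max(1, len(energies) // 10)
    noise = statistics.mean(sorted(energies)[:k])
    out = []
    for f, e in zip(frames, energies):
        gain = max(1.0 - noise / e, floor) if e > 0 else floor
        out.extend(gain * s for s in f)
    return out


def segmental_snr(ref, proc, frame_len=256, lo=-10.0, hi=35.0):
    """Per-frame SNR in dB between a reference and a processed signal, clipped to [lo, hi]."""
    ref_frames, ref_e = frame_energies(ref, frame_len)
    proc_frames, _ = frame_energies(proc, frame_len)
    scores = []
    for rf, pf, e in zip(ref_frames, proc_frames, ref_e):
        err = sum((a - b) ** 2 for a, b in zip(rf, pf))
        snr = 10.0 * math.log10((e + 1e-12) / (err + 1e-12))
        scores.append(min(max(snr, lo), hi))
    return scores


def quality_features(x, frame_len=256):
    """Mean and spread of the quality scores between x and its suppressed version."""
    y = suppress_noise(x, frame_len)
    scores = segmental_snr(x, y, frame_len)
    return statistics.mean(scores), statistics.pstdev(scores)
```

In a pipeline of the kind the abstract describes, features like these would presumably be passed to a trained classifier that separates speech from non-speech. The sketch stops at feature extraction because the source does not state which classifier or feature set was used.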


