Phoneme class based feature adaptation for mismatch acoustic modeling and recognition of distant noisy speech

ULUSKAN, SEÇKİN; Sangwan, Abhijeet; Hansen, John

doi:10.1007/s10772-017-9449-6

ULUSKAN S., Sangwan A., Hansen J. H. L.

International Journal of Speech Technology, vol. 20, no. 4, pp. 799-811, 2017 (ESCI)

Publication Type: Article / Full Article
Volume: 20, Issue: 4
Publication Date: 2017
DOI: 10.1007/s10772-017-9449-6
Journal Name: International Journal of Speech Technology
Journal Indexes: Emerging Sources Citation Index (ESCI), Scopus
Pages: pp. 799-811
Keywords: Phoneme class, Distant noisy speech, Mismatch acoustic modeling, Feature adaptation
Affiliated with Anadolu University: Yes

Abstract

© 2017, Springer Science+Business Media, LLC.

Distant speech capture in lecture halls and auditoriums poses unique challenges for the development of automatic speech recognition (ASR) algorithms. This study proposes a new adaptation strategy for distant noisy speech based on phoneme classes. Unlike previous approaches, which adapt the acoustic model to the features, the proposed phoneme-class based feature adaptation (PCBFA) strategy adapts the features of the distant data to an existing acoustic model trained on close-talk microphone speech. The essence of PCBFA is a transformation that makes the phoneme-class distributions of distant noisy speech similar to those of the close-talk microphone acoustic model in a multidimensional MFCC space. To this end, the phoneme classes of distant noisy speech are recognized with artificial neural networks. The main idea behind PCBFA is illustrated with a conventional Gaussian mixture model–hidden Markov model (GMM–HMM) system, although it can be extended to newer ASR architectures. The adapted features, together with the improved acoustic models produced by PCBFA, are shown to outperform those obtained by acoustic model adaptation alone for both ASR and keyword spotting. PCBFA thus offers a new perspective on acoustic modeling of distant speech.
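
The abstract does not state the form of the transformation, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a simple per-class mean and variance matching: each distant feature vector is standardized with the statistics of its phoneme class in the distant data and then rescaled to the statistics of the same class in the close-talk data. The function names `fit_class_stats` and `adapt_features` are hypothetical, the class labels are taken as given (the paper obtains them from a neural network classifier), and random vectors stand in for MFCC features.

```python
import numpy as np


def fit_class_stats(features, labels, n_classes):
    """Return per-class (means, stds) of the feature vectors.

    features: (n_frames, dim) array; labels: (n_frames,) integer class ids.
    Classes with no frames keep mean 0 and std 1.
    """
    dim = features.shape[1]
    means = np.zeros((n_classes, dim))
    stds = np.ones((n_classes, dim))
    for c in range(n_classes):
        rows = features[labels == c]
        if len(rows) > 0:
            means[c] = rows.mean(axis=0)
            stds[c] = rows.std(axis=0) + 1e-8  # small floor avoids division by zero
    return means, stds


def adapt_features(distant, labels, distant_stats, close_stats):
    """Map distant features toward close-talk class statistics.

    Assumption for illustration only: a diagonal affine map per class.
    """
    d_mean, d_std = distant_stats
    c_mean, c_std = close_stats
    # Standardize each frame with its own class statistics ...
    z = (distant - d_mean[labels]) / d_std[labels]
    # ... then rescale to the close-talk statistics of the same class.
    return z * c_std[labels] + c_mean[labels]


# Synthetic stand-in data: 3 phoneme classes, 13-dimensional features.
rng = np.random.default_rng(0)
n_classes, dim, n_frames = 3, 13, 600
labels = np.arange(n_frames) % n_classes
close = rng.normal(size=(n_frames, dim)) + 2.0 * labels[:, None]
# "Distant" data: compressed variance and a constant offset.
distant = 0.5 * rng.normal(size=(n_frames, dim)) + 2.0 * labels[:, None] + 3.0

close_stats = fit_class_stats(close, labels, n_classes)
distant_stats = fit_class_stats(distant, labels, n_classes)
adapted = adapt_features(distant, labels, distant_stats, close_stats)
```

After this mapping, the per-class means and standard deviations of `adapted` match those of the close-talk data, which is the sense in which the features, rather than the acoustic model, are adapted. A real system would use predicted rather than true class labels, so classification errors would affect the result.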