ID 110586
Author
Tamura, Satoshi Gifu University
Ninomiya, Hiroshi Nagoya University
Osuga, Shin Aisin Seiki Co., Ltd.
Iribe, Yurie Aichi Prefectural University
Takeda, Kazuya Nagoya University
Hayamizu, Satoru Gifu University
Keywords
audio-visual speech recognition
deep neural network
Deep Bottleneck Feature
multi-stream HMM
Content Type
Journal Article
Description
Audio-Visual Speech Recognition (AVSR) is one of the techniques for making speech recognizers more robust in noisy or real-world environments. Meanwhile, Deep Neural Networks (DNNs) have recently attracted considerable attention from researchers in the speech recognition field, because they can drastically improve recognition performance. There are two ways to employ DNN techniques for speech recognition, a hybrid approach and a tandem approach: in the hybrid approach, the emission probability of each Hidden Markov Model (HMM) state is computed using a DNN, while in the tandem approach a DNN is incorporated into the feature extraction scheme. In this paper, we investigate and compare several DNN-based AVSR methods, mainly to clarify how audio and visual modalities should be incorporated using DNNs. We carried out recognition experiments on the CENSREC-1-AV corpus and discuss the results to identify the best DNN-based AVSR modeling. The results show that a tandem-based method using audio Deep Bottleneck Features (DBNFs) and visual DBNFs with multi-stream HMMs is the most suitable, followed by a hybrid approach and another tandem scheme using audio-visual DBNFs.
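The two scoring ideas named in the description can be illustrated with a minimal sketch. It is not taken from the paper: it assumes the commonly used formulations in which a hybrid system divides the DNN state posterior by the state prior to obtain a scaled likelihood, and a multi-stream HMM combines per-stream log likelihoods with stream weights that sum to one. The function names and the example values are hypothetical.

```python
import math


def hybrid_scaled_log_likelihood(log_posterior, log_prior):
    """Hybrid approach (assumed standard form): convert a DNN state
    posterior log P(state | observation) into a scaled log likelihood
    by subtracting the log state prior."""
    return log_posterior - log_prior


def multistream_log_likelihood(log_b_audio, log_b_visual, audio_weight):
    """Multi-stream HMM (assumed standard form): weighted sum of the
    audio and visual log emission likelihoods for one HMM state.
    The visual weight is taken as 1 - audio_weight (an assumption)."""
    if not 0.0 <= audio_weight <= 1.0:
        raise ValueError("audio_weight must be in [0, 1]")
    visual_weight = 1.0 - audio_weight
    return audio_weight * log_b_audio + visual_weight * log_b_visual


# Hypothetical per-state scores for a single frame.
audio_score = hybrid_scaled_log_likelihood(math.log(0.5), math.log(0.25))
combined = multistream_log_likelihood(-2.0, -4.0, audio_weight=0.5)
```

In a sketch like this, lowering `audio_weight` would shift the decision toward the visual stream, which is the usual motivation for stream weighting when the audio is noisy.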
Journal Title
IEICE Transactions on Information and Systems
ISSN
1745-1361
NCID
AA11510321
Publisher
The Institute of Electronics, Information and Communication Engineers
Volume
E99-D
Issue
10
Start Page
2444
End Page
2451
Sort Key
2444
Published Date
2016-10-01
Remark
(c)2016 The Institute of Electronics, Information and Communication Engineers
IEICE Transactions Online TOP: http://search.ieice.org/
language
eng
TextVersion
Publisher
departments
Science and Technology