Investigation of DNN-Based Audio-Visual Speech Recognition (Special Section on Recent Advances in Machine Learning for Spoken Language Processing)

Tamura, Satoshi; Ninomiya, Hiroshi; Kitaoka, Norihide; Osuga, Shin; Iribe, Yurie; Takeda, Kazuya; Hayamizu, Satoru

Please use the following URL to cite this item: https://repo.lib.tokushima-u.ac.jp/110586

ID	110586
Authors	Tamura, Satoshi (Gifu University); Ninomiya, Hiroshi (Nagoya University); Kitaoka, Norihide (Tokushima University); Osuga, Shin (Aisin Seiki Co., Ltd.); Iribe, Yurie (Aichi Prefectural University); Takeda, Kazuya (Nagoya University); Hayamizu, Satoru (Gifu University)
Keywords	audio-visual speech recognition; deep neural network; Deep Bottleneck Feature; multi-stream HMM
Resource type	Journal article
Abstract	Audio-Visual Speech Recognition (AVSR) is one of the techniques for making speech recognizers more robust in noisy or real-world environments. Meanwhile, Deep Neural Networks (DNNs) have recently attracted a great deal of attention from researchers in the speech recognition field, because DNNs can drastically improve recognition performance. There are two ways to employ DNN techniques for speech recognition: a hybrid approach and a tandem approach. In the hybrid approach, the emission probability of each Hidden Markov Model (HMM) state is computed using a DNN, while in the tandem approach a DNN is incorporated into the feature extraction scheme. In this paper, we investigate and compare several DNN-based AVSR methods, mainly to clarify how audio and visual modalities should be incorporated using DNNs. We carried out recognition experiments using the CENSREC-1-AV corpus, and we discuss the results to identify the best DNN-based AVSR modeling. The results show that a tandem-based method using audio Deep Bottle-Neck Features (DBNFs) and visual DBNFs with multi-stream HMMs is the most suitable, followed by a hybrid approach and another tandem scheme using audio-visual DBNFs.
Journal	IEICE Transactions on Information and Systems
ISSN	17451361
CAT bibliographic ID	AA11510321
Publisher	The Institute of Electronics, Information and Communication Engineers
Volume	E99-D
Issue	10
Start page	2444
End page	2451
Sort order	2444
Publication date	2016-10-01
Notes	(c)2016 The Institute of Electronics, Information and Communication Engineers. IEICE Transactions Online TOP: http://search.ieice.org/
EDB ID	315662
Publisher version URL	http://search.ieice.org/bin/summary.php?id=e99-d_10_2444&category=D&year=2016&lang=E&abst=
Full-text file	ieice_trans_e99-d_10_2444.pdf 808 KB
Language	eng
Author version flag	Publisher version
Department	Science and Technology