End-to-end recognition of streaming Japanese speech using CTC and local attention

Chen, Jiahao; Nishimura, Ryota; Kitaoka, Norihide

doi:10.1017/ATSIP.2020.23

Please use the following URL when referring to this item: https://repo.lib.tokushima-u.ac.jp/115877

ID	115877
Alternative title	E2E SPEECH RECOGNITION WITH CTC AND LOCAL ATTENTION
Authors	Chen, Jiahao (Tokushima University); Nishimura, Ryota [西村, 良太] (Tokushima University); Kitaoka, Norihide [北岡, 教英] (Toyohashi University of Technology)
Keywords	CTC; Local attention; Speech recognition; Streaming recognition
Resource type	Journal article
Abstract	Many end-to-end, large vocabulary, continuous speech recognition systems are now able to achieve better speech recognition performance than conventional systems. Most of these approaches are based on bidirectional networks and sequence-to-sequence modeling, however, so automatic speech recognition (ASR) systems using such techniques need to wait for an entire segment of voice input to be entered before they can begin processing the data, resulting in a lengthy time-lag, which can be a serious drawback in some applications. An obvious solution to this problem is to develop a speech recognition algorithm capable of processing streaming data. Therefore, in this paper we explore the possibility of a streaming, online ASR system for Japanese using a model based on unidirectional LSTMs trained using connectionist temporal classification (CTC) criteria, with local attention. Such an approach has not been well investigated for use with Japanese, as most Japanese-language ASR systems employ bidirectional networks. The best result for our proposed system during experimental evaluation was a character error rate of 9.87%.
Journal title	APSIPA Transactions on Signal and Information Processing
ISSN	2048-7703
Publisher	Cambridge University Press
Volume	9
Start page	e25
Publication date	2020-11-23
Rights	This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution, and reproduction in any medium, provided the original work is properly cited.
EDB ID	372885
Publisher version DOI	10.1017/ATSIP.2020.23
Publisher version URL	https://doi.org/10.1017/ATSIP.2020.23
Full-text file	atsip_9_e25.pdf 421 KB
Language	eng
Version flag	Publisher version
Department	Science and Technology