ID 118515
Authors
Liu, Zheng Tokushima University
Ren, Fuji (任, 福継) University of Electronic Science and Technology of China
Keywords
Speech emotion recognition
affective computing
speech representation learning
feature fusion transformer
Resource type
Journal article
Abstract
Speech emotion recognition has long attracted considerable attention from researchers. In traditional feature fusion methods, the speech features come only from the dataset itself, and their weak robustness can easily lead to overfitting of the model. In addition, these methods often fuse features by simple concatenation, which causes a loss of speech information. In this article, to solve these problems and improve recognition accuracy, we use self-supervised learning to enhance the robustness of speech features and propose a feature fusion model (Dual-TBNet) that consists of two 1D convolutional layers, two Transformer modules, and two bidirectional long short-term memory (BiLSTM) modules. Our model uses 1D convolution to take features of different segment lengths and dimension sizes as input, uses the attention mechanism to capture the correspondence between the two features, and uses the bidirectional time series module to enhance the contextual information of the fused features. We designed a total of four fusion models to fuse five pre-trained features and acoustic features. In the comparison experiments, the Dual-TBNet model achieved a recognition accuracy and F1 score of 95.7% and 95.8% on the CASIA dataset, 66.7% and 65.6% on the eNTERFACE05 dataset, 64.8% and 64.9% on the IEMOCAP dataset, 84.1% and 84.3% on the EMO-DB dataset, and 83.3% and 82.1% on the SAVEE dataset. The Dual-TBNet model effectively fuses acoustic features of different lengths and dimensions with pre-trained features, enhancing the robustness of the features, and achieved the best performance.
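The fusion idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it only shows how two feature sequences of different lengths and widths might be projected to a shared width and then fused by scaled dot-product attention. The per-frame projection stands in for a 1D convolution with kernel size 1, the BiLSTM stage is omitted, and all names and sizes (`SHARED_DIM`, `project`, `cross_attention`, 50 x 40, 20 x 64) are assumptions made for illustration.

```python
import math
import random


def project(frames, weights):
    """Per-frame linear map, equivalent to a 1D convolution with kernel size 1."""
    return [[sum(x * w for x, w in zip(frame, row)) for row in weights]
            for frame in frames]


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Scaled dot-product attention: each query frame attends over all key frames."""
    scale = math.sqrt(len(queries[0]))
    fused = []
    for q in queries:
        weights = softmax([sum(a * b for a, b in zip(q, k)) / scale for k in keys])
        fused.append([sum(w * v[i] for w, v in zip(weights, values))
                      for i in range(len(values[0]))])
    return fused


def random_matrix(rows, cols, rng):
    """Random values in [-1, 1]; stands in for real features and learned weights."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
SHARED_DIM = 8  # hypothetical shared width, not taken from the paper

acoustic = random_matrix(50, 40, rng)    # 50 frames x 40 dims (hypothetical)
pretrained = random_matrix(20, 64, rng)  # 20 frames x 64 dims (hypothetical)

# One projection per branch brings both inputs to the same width.
a = project(acoustic, random_matrix(SHARED_DIM, 40, rng))
p = project(pretrained, random_matrix(SHARED_DIM, 64, rng))

# Acoustic frames query the pre-trained frames; output keeps the query length.
fused = cross_attention(a, p, p)
print(len(fused), len(fused[0]))  # 50 8
```

Because attention produces one output frame per query frame, the two inputs do not need to share a length, which is the property that simple concatenation lacks.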
Journal title
IEEE/ACM Transactions on Audio, Speech, and Language Processing
ISSN
2329-9290
2329-9304
CAT bibliographic ID
AA12669539
Publisher
IEEE
Volume
31
Start page
2193
End page
2203
Publication date
2023-06-01
Notes
The full text of the article is scheduled to be made available on or after 2025-06-01.
Rights
© 2023 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
EDB ID
Publisher version DOI
Publisher version URL
Language
eng
Author version flag
Other
Department
Science and Technology