ID 119348
Authors
Wang, Linhuang Tokushima University
Ding, Fei Tokushima University
Nakagawa, Satoshi The University of Tokyo
Ren, Fuji University of Electronic Science and Technology of China / Tokushima University
Keywords
Dynamic facial expression recognition
Affective computing
Transformer
Convolutional neural network
Resource Type
Journal Article
Abstract
Unlike conventional video action recognition, Dynamic Facial Expression Recognition (DFER) tasks exhibit minimal spatial movement of objects. To address this distinctive attribute, we propose an innovative CNN-Transformer model, named LSGTNet, specifically tailored for DFER tasks. Our LSGTNet comprises three stages, each composed of a spatial CNN (Spa-CNN) and a temporal transformer (T-Former) in sequential order. The Spa-CNN extracts spatial features from images, yielding smaller feature maps that reduce the computational complexity of the subsequent T-Former. The T-Former integrates global temporal information from the same spatial positions across different time frames while retaining the feature map dimensions. The alternating interplay between Spa-CNN and T-Former ensures a continuous fusion of spatial and temporal information, enabling our model to excel across various real-world datasets. To the best of our knowledge, this is the first method to address the DFER challenge by focusing on capturing the temporal changes in muscles within local spatial regions. Our method achieves state-of-the-art results on multiple in-the-wild datasets and on datasets collected under laboratory conditions.
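The abstract's stage design can be illustrated with a minimal NumPy sketch. This is an assumption-based toy, not the authors' implementation: the Spa-CNN step is approximated by 2x2 average pooling (the real model uses learned convolutions), and the T-Former step applies self-attention along the time axis independently at each spatial position, so only temporal relations are modeled and the feature map size is kept, matching the behavior the abstract describes. All function names and shapes here are illustrative.

```python
import numpy as np

def spa_pool(x):
    """Spatial 2x2 average pooling: (T, H, W, C) -> (T, H//2, W//2, C).
    Stands in for the Spa-CNN, which shrinks the spatial feature maps."""
    T, H, W, C = x.shape
    return x.reshape(T, H // 2, 2, W // 2, 2, C).mean(axis=(2, 4))

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def temporal_attention(x, Wq, Wk, Wv):
    """T-Former sketch: self-attention over time, per spatial position.
    x: (T, H, W, C); output has the same shape (feature map size kept)."""
    T, H, W, C = x.shape
    # Treat each spatial position as an independent temporal sequence.
    seq = x.transpose(1, 2, 0, 3).reshape(H * W, T, C)        # (H*W, T, C)
    q, k, v = seq @ Wq, seq @ Wk, seq @ Wv
    attn = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(C))     # (H*W, T, T)
    out = attn @ v                                            # (H*W, T, C)
    return out.reshape(H, W, T, C).transpose(2, 0, 1, 3)

def stage(x, rng):
    """One Spa-CNN -> T-Former stage with random projection weights."""
    C = x.shape[-1]
    Wq, Wk, Wv = (rng.standard_normal((C, C)) / np.sqrt(C) for _ in range(3))
    return temporal_attention(spa_pool(x), Wq, Wk, Wv)

rng = np.random.default_rng(0)
video = rng.standard_normal((8, 16, 16, 4))   # (T, H, W, C) feature maps
out = stage(video, rng)
print(out.shape)  # (8, 8, 8, 4): time length kept, spatial size halved
```

Stacking three such stages reproduces the alternation the abstract describes: each stage shrinks the spatial maps (cheapening attention) and then mixes information across frames at fixed spatial positions.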
Journal Title
Applied Soft Computing
ISSN
1568-4946 (print)
1872-9681 (online)
NCID (NACSIS-CAT bibliographic ID)
AA11644645
AA11926126
Publisher
Elsevier
Volume
161
Start Page
111680
Publication Date
2024-05-09
Note
The full text of the article will be publicly available on or after 2026-05-09.
Rights
© 2024. This manuscript version is made available under the CC-BY-NC-ND 4.0 license
https://creativecommons.org/licenses/by-nc-nd/4.0/
EDB ID
Publisher DOI
Publisher URL
Language
eng
Version Flag
Other
Division
Science and Technology