A study on Human Emotion Recognition in Video Images using Deep Learning

Jargalsaikhan, Orgil

To reference this document, please use the following URL: https://repo.lib.tokushima-u.ac.jp/116967

ID	116967
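Alternative title	深層学習を用いたビデオ画像における人間の感情認識に関する研究
Author	Jargalsaikhan, Orgil (ジャルガルサイハン, オリギル), Graduate School of Advanced Technology and Science, Tokushima University (Systems Innovation Engineering)
Keywords	Emotion recognition; EVM-Transformer network; emotion classification; video to sequence; facial emotion recognition
Resource type	Thesis or dissertation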
Abstract	Since the beginning of this century, Artificial Intelligence (AI) has evolved to handle problems such as image recognition, classification, and segmentation. AI learning is categorized as supervised, semi-supervised, unsupervised, or reinforcement learning. Some researchers have said that the future of AI is self-awareness, built on reinforcement learning in which rewards are based on task success. Moreover, it is said that the reward would be harvested from human reactions, especially through emotion recognition. Emotion recognition is a new and inspiring field, but the lack of sufficient data for training an AI system is the major problem. Fortunately, the availability of image and video datasets is rapidly increasing, and correctly recognizing human emotions will become necessary in the near future. Emotions are mental reactions (such as anger or fear) marked by relatively strong feelings and usually accompanied by physical reactions; they arise in response to previous actions, last a short time, and are focused on specific objects. In this work, we focus on emotion recognition using the face, body parts, and intonation. Automatic understanding of human emotion in the wild using audiovisual signals is extremely challenging. Latent continuous dimensions can be used to analyze human emotional states, behaviors, and reactions displayed in real-world settings, and combinations of Valence and Arousal constitute a well-known and effective representation of emotions. In this thesis, a new Non-inertial loss function is proposed for training deep learning models for emotion recognition. It is evaluated in in-the-wild settings using four types of candidate networks with different pipelines and sequence lengths, and it is compared with the Concordance Correlation Coefficient (CCC) and Mean Squared Error (MSE) losses commonly used for training.
To demonstrate its efficiency and stability on continuous or non-continuous input data, experiments were performed using the Aff-Wild dataset, and encouraging results were obtained. The contributions of the proposed Non-inertial loss function are as follows: 1. The new loss function allows Valence and Arousal to be viewed together. 2. It allows training on less data. 3. It yields better results. 4. It shortens training times. The rest of this thesis explains our motivation, describes the proposed methods, and presents our results.
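For readers unfamiliar with the two baseline losses named in the abstract, the following is a minimal sketch of one common way to compute MSE and a CCC-based loss (1 - CCC) over a sequence of valence or arousal predictions. It is not the thesis code, and it does not attempt the proposed Non-inertial loss, whose definition is not given in this record. The function names, the pure-Python implementation, and the zero-denominator convention are assumptions made for illustration.

```python
from statistics import fmean


def mse_loss(pred, target):
    """Mean squared error between two equal-length sequences."""
    return fmean((p - t) ** 2 for p, t in zip(pred, target))


def ccc(pred, target):
    """Concordance Correlation Coefficient using population statistics.

    CCC = 2 * cov(pred, target) / (var(pred) + var(target) + (mean_p - mean_t)^2)
    """
    mean_p, mean_t = fmean(pred), fmean(target)
    var_p = fmean((p - mean_p) ** 2 for p in pred)
    var_t = fmean((t - mean_t) ** 2 for t in target)
    cov = fmean((p - mean_p) * (t - mean_t) for p, t in zip(pred, target))
    denom = var_p + var_t + (mean_p - mean_t) ** 2
    # Assumed convention: two identical constant sequences agree perfectly.
    return 1.0 if denom == 0 else 2 * cov / denom


def ccc_loss(pred, target):
    """Loss that falls toward 0 as agreement with the target improves."""
    return 1.0 - ccc(pred, target)


if __name__ == "__main__":
    # Hypothetical per-frame valence values, for illustration only.
    predicted = [0.10, 0.35, 0.60, 0.20]
    annotated = [0.15, 0.30, 0.70, 0.10]
    print("MSE loss:", mse_loss(predicted, annotated))
    print("CCC loss:", ccc_loss(predicted, annotated))
```

Unlike MSE, CCC penalizes both poor correlation and a systematic offset or scale mismatch between prediction and annotation. In this kind of setup the loss is usually computed separately for valence and arousal.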
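Date of issue	2022-03-01
Notes	Abstract, review summary, and full text of the thesis are publicly available
Full-text files	k3579_abstract.pdf (47 KB), k3579_review.pdf (55.5 KB), k3579_fulltext.pdf (22.7 MB)
Language	eng
Author version flag	Includes full text of the doctoral thesis
MEXT report number	甲第3579号
Degree certificate number	甲先第421号
Date of degree conferral	2022-03-01
Degree	Doctor of Engineering
Degree-granting institution	Tokushima University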