Self Supervised Adversarial Domain Adaptation for Cross-Corpus and Cross-Language Speech Emotion Recognition

Latif, Siddique and Rana, Rajib ORCID: https://orcid.org/0000-0002-0506-2409 and Khalifa, Sara and Jurdak, Raja and Schuller, Bjorn (2022) Self Supervised Adversarial Domain Adaptation for Cross-Corpus and Cross-Language Speech Emotion Recognition. IEEE Transactions on Affective Computing. pp. 1-15.


Abstract

Despite recent advances in speech emotion recognition (SER) within a single-corpus setting, the performance of SER systems degrades significantly in cross-corpus and cross-language scenarios. The key reason is that SER systems generalise poorly to unseen conditions. To address this issue, recent studies use adversarial methods to learn domain-generalised representations for cross-corpus and cross-language SER. However, many of these methods focus only on cross-corpus SER and do not address the performance degradation in cross-language SER, which arises from a larger domain gap between source- and target-language data. This contribution proposes an adversarial dual discriminator (ADDi) network that uses a three-player adversarial game to learn generalised representations without requiring any target data labels. We also introduce a self-supervised ADDi (sADDi) network that utilises self-supervised pre-training with unlabelled data. We propose synthetic data generation as a pretext task in sADDi, which enables the network to produce emotionally discriminative and domain-invariant representations and provides complementary synthetic data to augment the system. The proposed model is rigorously evaluated using five publicly available datasets in three languages and compared with multiple studies on cross-corpus and cross-language SER. Experimental results demonstrate that the proposed model achieves improved performance.


Item Type: Article (Commonwealth Reporting Category C)
Refereed: Yes
Item Status: Live Archive
Additional Information: Files associated with this item cannot be displayed due to copyright restrictions.
Faculty/School / Institute/Centre: Current – Faculty of Health, Engineering and Sciences - School of Mathematics, Physics and Computing (1 Jan 2022 -)
Date Deposited: 04 May 2022 23:00
Last Modified: 04 May 2022 23:00
Uncontrolled Keywords: Adaptation models; adversarial learning; Australia; domain adaptation; Emotion recognition; Generators; self-supervised learning; Speech emotion recognition; Speech recognition; Task analysis; Training
Fields of Research (2020): 46 INFORMATION AND COMPUTING SCIENCES > 4611 Machine learning > 461101 Adversarial machine learning
46 INFORMATION AND COMPUTING SCIENCES > 4608 Human-centred computing > 460802 Affective computing
46 INFORMATION AND COMPUTING SCIENCES > 4611 Machine learning > 461104 Neural networks
46 INFORMATION AND COMPUTING SCIENCES > 4611 Machine learning > 461103 Deep learning
Identification Number or DOI: https://doi.org/10.1109/TAFFC.2022.3167013
URI: http://eprints.usq.edu.au/id/eprint/48117
