Augmenting generative adversarial networks for speech emotion recognition

Latif, Siddique and Asim, Muhammad and Rana, Rajib (ORCID: https://orcid.org/0000-0002-0506-2409) and Khalifa, Sara and Jurdak, Raja and Schuller, Bjorn W. (2020) Augmenting generative adversarial networks for speech emotion recognition. In: 21st Annual Conference of the International Speech Communication Association: Cognitive Intelligence for Speech Processing (INTERSPEECH 2020), 25–29 Oct 2020, Shanghai, China.



Abstract

Generative adversarial networks (GANs) have shown potential in learning emotional attributes and generating new data samples. However, their performance is usually hindered by the lack of large speech emotion recognition (SER) datasets. In this work, we propose a framework that uses the mixup data augmentation scheme to augment the GAN in feature learning and generation. To show the effectiveness of the proposed framework, we present SER results on (i) synthetic feature vectors, (ii) training data augmented with synthetic features, and (iii) encoded features in a compressed representation. Our results show that the proposed framework can effectively learn compressed emotional representations and can generate synthetic samples that help improve performance in both within-corpus and cross-corpus evaluation.
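The abstract does not spell out how mixup is combined with the GAN. As a point of reference only, the following is a minimal NumPy sketch of the standard mixup scheme (Zhang et al., 2018): each sample is linearly interpolated with a randomly chosen partner, using a coefficient drawn from Beta(alpha, alpha), and the labels are interpolated the same way. The second function shows one common way mixup is applied to GAN discriminator training, by mixing real and generated feature vectors with a soft real/fake target. The function names `mixup_batch` and `mixup_real_fake`, the value `alpha=0.2`, and the real/fake mixing variant are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def mixup_batch(x, y, alpha=0.2, rng=None):
    """Standard mixup on a batch of feature vectors (hypothetical helper).

    x: array of shape (batch, feat_dim); y: one-hot labels (batch, n_classes).
    alpha: Beta-distribution parameter (assumed value, not taken from the paper).
    """
    rng = np.random.default_rng() if rng is None else rng
    # Draw one mixing coefficient per sample from Beta(alpha, alpha).
    lam = rng.beta(alpha, alpha, size=(x.shape[0], 1))
    # Pair each sample with a randomly chosen partner from the same batch.
    perm = rng.permutation(x.shape[0])
    x_mix = lam * x + (1.0 - lam) * x[perm]
    y_mix = lam * y + (1.0 - lam) * y[perm]
    return x_mix, y_mix


def mixup_real_fake(real, fake, alpha=0.2, rng=None):
    """Mix real and generated samples for a discriminator (assumed variant).

    With real labelled 1 and fake labelled 0, the interpolated target
    lam * 1 + (1 - lam) * 0 is simply lam.
    """
    rng = np.random.default_rng() if rng is None else rng
    lam = rng.beta(alpha, alpha, size=(real.shape[0], 1))
    mixed = lam * real + (1.0 - lam) * fake
    # Soft discriminator target, one value per sample.
    return mixed, lam.ravel()


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=(4, 3))          # toy feature vectors
    y = np.eye(2)[[0, 1, 0, 1]]          # toy one-hot emotion labels
    x_mix, y_mix = mixup_batch(x, y, rng=rng)
    print(x_mix.shape, y_mix.sum(axis=1))
```

Because each mixed label is a convex combination of two one-hot vectors, every row of `y_mix` still sums to one, which is what lets the mixed samples be trained with an ordinary cross-entropy loss.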


Item Type: Conference or Workshop Item (Commonwealth Reporting Category E) (Paper)
Refereed: Yes
Item Status: Live Archive
Additional Information: Copyright © 2020 ISCA.
Faculty/School / Institute/Centre: Current - Faculty of Health, Engineering and Sciences - School of Sciences (6 Sep 2019 -)
Faculty/School / Institute/Centre: Current - Institute for Resilient Regions
Date Deposited: 17 Feb 2021 04:44
Last Modified: 08 Jun 2021 00:29
Uncontrolled Keywords: speech emotion recognition, mixup, data augmentation, generative adversarial networks, feature learning, synthetic feature generation.
Fields of Research (2008): 08 Information and Computing Sciences > 0801 Artificial Intelligence and Image Processing > 080105 Expert Systems
08 Information and Computing Sciences > 0801 Artificial Intelligence and Image Processing > 080109 Pattern Recognition and Data Mining
08 Information and Computing Sciences > 0801 Artificial Intelligence and Image Processing > 080107 Natural Language Processing
Fields of Research (2020): 46 INFORMATION AND COMPUTING SCIENCES > 4602 Artificial intelligence > 460212 Speech recognition
46 INFORMATION AND COMPUTING SCIENCES > 4602 Artificial intelligence > 460208 Natural language processing
Socio-Economic Objectives (2008): C Society > 92 Health > 9202 Health and Support Services > 920209 Mental Health Services
C Society > 92 Health > 9202 Health and Support Services > 920203 Diagnostic Methods
C Society > 92 Health > 9202 Health and Support Services > 920299 Health and Support Services not elsewhere classified
Socio-Economic Objectives (2020): 20 HEALTH > 2003 Provision of health and support services > 200310 Primary care
20 HEALTH > 2003 Provision of health and support services > 200303 Health surveillance
Identification Number or DOI: doi:10.21437/Interspeech.2020-3194
URI: http://eprints.usq.edu.au/id/eprint/41410
