Survey of deep representation learning for speech emotion recognition

Latif, Siddique and Rana, Rajib ORCID: https://orcid.org/0000-0002-0506-2409 and Khalifa, Sara and Jurdak, Raja and Qadir, Junaid and Schuller, Bjorn (2021) Survey of deep representation learning for speech emotion recognition. IEEE Transactions on Affective Computing.


Abstract

Traditionally, speech emotion recognition (SER) research has relied on manually handcrafted acoustic features using feature engineering. However, the design of handcrafted features for complex SER tasks requires significant manual effort, which impedes generalisability and slows the pace of innovation. This has motivated the adoption of representation learning techniques that can automatically learn an intermediate representation of the input signal without any manual feature engineering. Representation learning has led to improved SER performance and enabled rapid innovation. Its effectiveness has further increased with advances in deep learning (DL), which has facilitated deep representation learning, where hierarchical representations are automatically learned in a data-driven manner. This paper presents the first comprehensive survey on the important topic of deep representation learning for SER. We highlight various techniques and related challenges, and identify important future areas of research. Our survey bridges the gap in the literature, since existing surveys focus either on SER with hand-engineered features or on representation learning in the general setting without focusing on SER.


Item Type: Article (Commonwealth Reporting Category C)
Refereed: Yes
Item Status: Live Archive
Additional Information: Published online. Permanent restricted access to ArticleFirst version, in accordance with the copyright policy of the publisher.
Faculty/School / Institute/Centre: Current - Faculty of Health, Engineering and Sciences - School of Sciences (6 Sep 2019 -)
Date Deposited: 27 Sep 2021 00:15
Last Modified: 04 Nov 2021 02:29
Uncontrolled Keywords: speech emotion recognition, multi task learning, representation learning, domain adaptation, unsupervised learning
Fields of Research (2008): 08 Information and Computing Sciences > 0801 Artificial Intelligence and Image Processing > 080105 Expert Systems
08 Information and Computing Sciences > 0801 Artificial Intelligence and Image Processing > 080109 Pattern Recognition and Data Mining
08 Information and Computing Sciences > 0801 Artificial Intelligence and Image Processing > 080107 Natural Language Processing
Fields of Research (2020): 46 INFORMATION AND COMPUTING SCIENCES > 4611 Machine learning > 461101 Adversarial machine learning
46 INFORMATION AND COMPUTING SCIENCES > 4602 Artificial intelligence > 460212 Speech recognition
46 INFORMATION AND COMPUTING SCIENCES > 4611 Machine learning > 461102 Context learning
46 INFORMATION AND COMPUTING SCIENCES > 4611 Machine learning > 461105 Reinforcement learning
46 INFORMATION AND COMPUTING SCIENCES > 4611 Machine learning > 461106 Semi- and unsupervised learning
46 INFORMATION AND COMPUTING SCIENCES > 4611 Machine learning > 461104 Neural networks
46 INFORMATION AND COMPUTING SCIENCES > 4611 Machine learning > 461103 Deep learning
Identification Number or DOI: https://doi.org/10.1109/TAFFC.2021.3114365
URI: http://eprints.usq.edu.au/id/eprint/43695
