Variational autoencoders for learning latent representations of speech emotion: a preliminary study

Latif, Siddique and Rana, Rajib and Qadir, Junaid and Epps, Julien (2018) Variational autoencoders for learning latent representations of speech emotion: a preliminary study. In: 19th Annual Conference of the International Speech Communication Association: Speech Research for Emerging Markets in Multilingual Societies (INTERSPEECH 2018), 2-6 Sept 2018, Hyderabad, India.


Abstract

Learning the latent representation of data in an unsupervised fashion is a very interesting process that provides relevant features for enhancing the performance of a classifier. For speech emotion recognition tasks, generating effective features is crucial. Currently, handcrafted features are mostly used for speech emotion recognition; however, features learned automatically using deep learning have shown strong success in many problems, especially in image processing. In particular, deep generative models such as Variational Autoencoders (VAEs) have gained enormous success in generating features for natural images. Inspired by this, we propose VAEs for deriving the latent representation of speech signals and use this representation to classify emotions. To the best of our knowledge, we are the first to propose VAEs for speech emotion classification. Evaluations on the IEMOCAP dataset demonstrate that features learned by VAEs can produce state-of-the-art results for speech emotion classification.


Item Type: Conference or Workshop Item (Commonwealth Reporting Category E) (Paper)
Refereed: Yes
Item Status: Live Archive
Additional Information: Copyright 2018 International Speech Communication Association (ISCA).
Faculty/School / Institute/Centre: Current - Institute for Resilient Regions
Date Deposited: 18 Feb 2019 01:20
Last Modified: 08 Jun 2021 00:28
Uncontrolled Keywords: speech emotion classification, variational auto-encoders, deep learning, feature learning
Fields of Research (2008): 08 Information and Computing Sciences > 0801 Artificial Intelligence and Image Processing > 080109 Pattern Recognition and Data Mining
Fields of Research (2020): 46 INFORMATION AND COMPUTING SCIENCES > 4602 Artificial intelligence > 460212 Speech recognition
46 INFORMATION AND COMPUTING SCIENCES > 4611 Machine learning > 461106 Semi- and unsupervised learning
46 INFORMATION AND COMPUTING SCIENCES > 4611 Machine learning > 461104 Neural networks
46 INFORMATION AND COMPUTING SCIENCES > 4611 Machine learning > 461103 Deep learning
Socio-Economic Objectives (2008): E Expanding Knowledge > 97 Expanding Knowledge > 970108 Expanding Knowledge in the Information and Computing Sciences
Identification Number or DOI: doi:10.21437/Interspeech.2018-1568
URI: http://eprints.usq.edu.au/id/eprint/35635
