Emotion Intensity and its Control for Emotional Voice Conversion

Zhou, Kun and Sisman, Berrak and Rana, Rajib ORCID: https://orcid.org/0000-0002-0506-2409 and Schuller, Bjorn W. and Li, Haizhou (2022) Emotion Intensity and its Control for Emotional Voice Conversion. IEEE Transactions on Affective Computing. pp. 1-18.

Text (Published - ArticleFirst Version)
IEEE_trans_on_Affective_Computing__Kun.pdf
Available under License Creative Commons Attribution 4.0.


Abstract

Emotional voice conversion (EVC) seeks to convert the emotional state of an utterance while preserving the linguistic content and speaker identity. In EVC, emotions are usually treated as discrete categories, overlooking the fact that speech also conveys emotions at various intensity levels that the listener can perceive. In this work, we aim to explicitly characterize and control the intensity of emotion. We propose to disentangle the speaker style from the linguistic content and encode the speaker style into a style embedding in a continuous space that forms the prototype of the emotion embedding. We further learn the actual emotion encoder from an emotion-labelled database and study the use of relative attributes to represent fine-grained emotion intensity. To ensure emotional intelligibility, we incorporate an emotion classification loss and an emotion embedding similarity loss into the training of the EVC network. As desired, the proposed network controls the fine-grained emotion intensity in the output speech. Through both objective and subjective evaluations, we validate the effectiveness of the proposed network for emotional expressiveness and emotion intensity control.
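
The abstract mentions relative attributes as a way to represent fine-grained emotion intensity. As a rough illustration only, and not the authors' implementation, the sketch below learns a linear ranking function with a pairwise hinge loss so that "emotional" feature vectors score higher than "neutral" ones, then min-max normalises the scores to [0, 1] as an intensity value. The synthetic 4-dimensional features, the function names `train_ranker` and `intensity`, and all hyperparameters are assumptions made for this example.

```python
import numpy as np


def train_ranker(emotional, neutral, epochs=300, lr=0.05, reg=1e-3):
    """Learn w so that w @ x_emotional exceeds w @ x_neutral by a margin.

    Hypothetical helper: plain subgradient descent on a pairwise hinge loss.
    """
    dim = emotional.shape[1]
    # Difference vectors for every (emotional_i, neutral_j) pair.
    diffs = (emotional[:, None, :] - neutral[None, :, :]).reshape(-1, dim)
    w = np.zeros(dim)
    for _ in range(epochs):
        margins = diffs @ w
        violated = diffs[margins < 1.0]
        # Subgradient of the mean hinge loss plus L2 regularisation.
        grad = reg * w - violated.sum(axis=0) / len(diffs)
        w -= lr * grad
    return w


def intensity(w, features):
    """Min-max normalise ranking scores to [0, 1] (assumed normalisation)."""
    scores = features @ w
    lo, hi = scores.min(), scores.max()
    return (scores - lo) / (hi - lo)


# Synthetic stand-ins for acoustic features (an assumption, not real data).
rng = np.random.default_rng(0)
neutral = rng.normal(0.0, 1.0, size=(20, 4))
emotional = rng.normal(0.0, 1.0, size=(20, 4)) + np.array([3.0, 0.0, 0.0, 0.0])

w = train_ranker(emotional, neutral)
all_int = intensity(w, np.vstack([emotional, neutral]))
emo_int, neu_int = all_int[:20], all_int[20:]
print("mean emotional intensity:", emo_int.mean())
print("mean neutral intensity:", neu_int.mean())
```

In a sketch like this, the resulting scalar could serve as a conditioning value at conversion time; how the published network actually consumes the intensity is described in the paper itself.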


Item Type: Article (Commonwealth Reporting Category C)
Refereed: Yes
Item Status: Live Archive
Faculty/School / Institute/Centre: Current – Faculty of Health, Engineering and Sciences - School of Mathematics, Physics and Computing (1 Jan 2022 -)
Date Deposited: 23 May 2022 05:11
Last Modified: 23 May 2022 05:11
Uncontrolled Keywords: Emotional voice conversion, emotion intensity, sequence-to-sequence, perceptual loss, limited data, relative attribute
Fields of Research (2020): 46 INFORMATION AND COMPUTING SCIENCES > 4602 Artificial intelligence > 460211 Speech production
46 INFORMATION AND COMPUTING SCIENCES > 4611 Machine learning > 461103 Deep learning
Identification Number or DOI: https://doi.org/10.1109/TAFFC.2022.3175578
URI: http://eprints.usq.edu.au/id/eprint/48525
