High-fidelity audio generation and representation learning with guided adversarial autoencoder

Haque, Kazi Nazmul and Rana, Rajib ORCID: https://orcid.org/0000-0002-0506-2409 and Schuller, Bjorn W. (2020) High-fidelity audio generation and representation learning with guided adversarial autoencoder. IEEE Access, 8:9272282. pp. 223509-223528.

Text (Published Version)
09272282.pdf
Available under License Creative Commons Attribution 4.0.

Abstract

Generating high-fidelity conditional audio samples and learning representations from unlabelled audio data are two challenging problems in machine learning research. Recent advances in Generative Adversarial Neural Network (GAN) architectures show great promise in addressing these challenges. Learning a powerful representation with a GAN architecture requires superior sample generation quality, which in turn requires an enormous amount of labelled data. In this paper, we address this issue by proposing the Guided Adversarial Autoencoder (GAAE), which can generate superior conditional audio samples from unlabelled audio data using a small percentage of labelled data as guidance. A representation learned from unlabelled data without any supervision is not guaranteed to be usable for any downstream task. On the other hand, if the model is highly biased towards the downstream task during representation learning, it loses its generalisation capability, which makes the learned representation hardly useful for tasks unrelated to that downstream task. The proposed GAAE model also addresses these issues. Using its superior conditional generation, GAAE can learn a representation specific to the downstream task. Furthermore, GAAE learns another type of representation that captures the general attributes of the data and is independent of the downstream task at hand. Experimental results on the S09 and NSynth datasets attest to the superior performance of GAAE compared to state-of-the-art alternatives.


Item Type: Article (Commonwealth Reporting Category C)
Refereed: Yes
Item Status: Live Archive
Additional Information: This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
Faculty/School / Institute/Centre: Current - Faculty of Health, Engineering and Sciences - School of Sciences (6 Sep 2019 -)
Date Deposited: 17 Feb 2021 00:04
Last Modified: 11 Nov 2021 06:44
Uncontrolled Keywords: audio generation, representation learning, generative adversarial neural network, guided generative adversarial autoencoder
Fields of Research (2008): 08 Information and Computing Sciences > 0801 Artificial Intelligence and Image Processing > 080199 Artificial Intelligence and Image Processing not elsewhere classified
08 Information and Computing Sciences > 0801 Artificial Intelligence and Image Processing > 080110 Simulation and Modelling
Fields of Research (2020): 46 INFORMATION AND COMPUTING SCIENCES > 4602 Artificial intelligence > 460212 Speech recognition
46 INFORMATION AND COMPUTING SCIENCES > 4611 Machine learning > 461104 Neural networks
46 INFORMATION AND COMPUTING SCIENCES > 4611 Machine learning > 461103 Deep learning
Socio-Economic Objectives (2020): 22 INFORMATION AND COMMUNICATION SERVICES > 2204 Information systems, technologies and services > 220403 Artificial intelligence
Identification Number or DOI: https://doi.org/10.1109/ACCESS.2020.3040797
URI: http://eprints.usq.edu.au/id/eprint/41405
