Haque, Kazi Nazmul and Rana, Rajib ORCID: https://orcid.org/0000-0002-0506-2409 and Liu, Jiajun and Hansen, John H. L. and Cummins, Nicholas and Busso, Carlos and Schuller, Bjorn W.
(2021)
Guided Generative Adversarial Neural Network for Representation Learning and Audio Generation using Fewer Labelled Audio Data.
IEEE/ACM Transactions on Audio, Speech, and Language Processing, 29.
pp. 2575-2590.
ISSN 2329-9290
Text (Submitted Version): GGAN_final_submission__PROOF.pdf (10MB)
Abstract
The generation power of Generative Adversarial Neural Networks (GANs) has shown great promise for learning representations from unlabelled data while guided by a small amount of labelled data. We aim to utilise the generation power of GANs to learn audio representations. Most existing studies, however, focus on images. Some studies use GANs for speech generation, but they are conditioned on text or acoustic features, which limits their use for other audio, such as instruments, and even for speech where transcripts are limited. This paper proposes a novel GAN-based model, named the Guided Generative Adversarial Neural Network (GGAN), which can learn powerful representations and generate good-quality samples using a small amount of labelled data as guidance. Experimental results on a speech dataset [Speech Command Dataset (S09)] and a non-speech dataset [Musical Instrument Sound dataset (NSynth)] demonstrate that, using only 5% of labelled data as guidance, GGAN learns significantly better representations than the state-of-the-art models.
Item Type: | Article (Commonwealth Reporting Category C) |
---|---|
Refereed: | Yes |
Item Status: | Live Archive |
Faculty/School / Institute/Centre: | Historic - Faculty of Health, Engineering and Sciences - School of Sciences (6 Sep 2019 - 31 Dec 2021) |
Date Deposited: | 11 Aug 2021 00:53 |
Last Modified: | 02 Nov 2021 01:33 |
Uncontrolled Keywords: | Generators, Generative adversarial networks, Spectrogram, Data models, Training, Task analysis, Speech processing |
Fields of Research (2008): | 08 Information and Computing Sciences > 0801 Artificial Intelligence and Image Processing > 080199 Artificial Intelligence and Image Processing not elsewhere classified |
Fields of Research (2020): | 46 INFORMATION AND COMPUTING SCIENCES > 4602 Artificial intelligence > 460212 Speech recognition; 46 INFORMATION AND COMPUTING SCIENCES > 4603 Computer vision and multimedia computation > 460302 Audio processing |
Identification Number or DOI: | https://doi.org/10.1109/TASLP.2021.3098764 |
URI: | http://eprints.usq.edu.au/id/eprint/43207 |