Using back-and-forth translation to create artificial augmented textual data for sentiment analysis models

Body, Thomas and Tao, Xiaohui ORCID: https://orcid.org/0000-0002-0020-077X and Li, Yuefeng and Li, Lin and Zhong, Ning (2021) Using back-and-forth translation to create artificial augmented textual data for sentiment analysis models. Expert Systems with Applications, 178:115033. pp. 1-12. ISSN 0957-4174


Abstract

Sentiment analysis classification models trained using neural networks require large amounts of data, but collecting such datasets takes significant time and resources. Although artificial data has been used successfully in computer vision, there are few effective and generalizable methods for creating artificial augmented text data. This paper proposes a text-based data augmentation method, called back-and-forth translation, that can be used to artificially increase the size of any natural language dataset. Empirical experiments demonstrate that creating augmented text data and adding it to the original dataset can reduce the error rate of binary sentiment classification models by up to 3.4%. These results are shown to be statistically significant.
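As a rough illustration of the idea described in the abstract, the following Python sketch translates each labelled text into a pivot language and back, then adds the resulting paraphrase to the dataset under the original label. It is a minimal sketch under stated assumptions, not the authors' implementation: this record does not specify the translation system, the pivot languages, or any filtering step, so `back_and_forth`, `augment`, the `translate` callable, and the toy `toy_translate` stand-in (with its made-up pivot language `"xx"`) are all hypothetical names introduced here.

```python
from typing import Callable, Iterable, List, Tuple

Example = Tuple[str, int]                    # (text, sentiment label)
Translator = Callable[[str, str, str], str]  # (text, source_lang, target_lang) -> text


def back_and_forth(text: str, translate: Translator, pivot: str, source: str = "en") -> str:
    """Translate text into a pivot language, then back into the source language."""
    intermediate = translate(text, source, pivot)
    return translate(intermediate, pivot, source)


def augment(dataset: Iterable[Example], translate: Translator,
            pivots: List[str], source: str = "en") -> List[Example]:
    """Return the original examples plus back-and-forth paraphrases.

    Each paraphrase keeps the label of the text it was derived from.
    Paraphrases identical to the original (or to an earlier paraphrase
    of the same text) are skipped; this deduplication is an assumption
    of the sketch, not something stated in the abstract.
    """
    augmented: List[Example] = []
    for text, label in dataset:
        augmented.append((text, label))
        seen = {text}
        for pivot in pivots:
            paraphrase = back_and_forth(text, translate, pivot, source)
            if paraphrase not in seen:
                seen.add(paraphrase)
                augmented.append((paraphrase, label))
    return augmented


def toy_translate(text: str, src: str, tgt: str) -> str:
    """Hypothetical stand-in for a real machine translation system.

    It maps a few words to and from a made-up language "xx" so that a
    round trip changes the wording, as a real translator often does.
    """
    table = {
        ("en", "xx"): {"great": "BON", "good": "BON"},
        ("xx", "en"): {"BON": "good"},
    }
    mapping = table.get((src, tgt), {})
    return " ".join(mapping.get(word, word) for word in text.split())


if __name__ == "__main__":
    reviews = [("a great film", 1), ("a dull film", 0)]
    for text, label in augment(reviews, toy_translate, pivots=["xx"]):
        print(label, text)
```

In practice `toy_translate` would be replaced by a call to a real translation model or service with the same `(text, source_lang, target_lang)` signature; passing several pivot languages yields several paraphrases per original example.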


Item Type: Article (Commonwealth Reporting Category C)
Refereed: Yes
Item Status: Live Archive
Additional Information: Permanent restricted access to Published version in accordance with the copyright policy of the publisher.
Faculty/School / Institute/Centre: Current - Faculty of Health, Engineering and Sciences - School of Sciences (6 Sep 2019 -)
Date Deposited: 02 Jun 2021 02:49
Last Modified: 02 Jun 2021 02:49
Uncontrolled Keywords: Natural language processing; Translation; Sentiment analysis; Data augmentation
Fields of Research (2008): 08 Information and Computing Sciences > 0801 Artificial Intelligence and Image Processing > 080109 Pattern Recognition and Data Mining
08 Information and Computing Sciences > 0801 Artificial Intelligence and Image Processing > 080107 Natural Language Processing
Fields of Research (2020): 46 INFORMATION AND COMPUTING SCIENCES > 4605 Data management and data science > 460507 Information extraction and fusion
46 INFORMATION AND COMPUTING SCIENCES > 4602 Artificial intelligence > 460208 Natural language processing
46 INFORMATION AND COMPUTING SCIENCES > 4605 Data management and data science > 460502 Data mining and knowledge discovery
Socio-Economic Objectives (2008): E Expanding Knowledge > 97 Expanding Knowledge > 970108 Expanding Knowledge in the Information and Computing Sciences
Socio-Economic Objectives (2020): 28 EXPANDING KNOWLEDGE > 2801 Expanding knowledge > 280115 Expanding knowledge in the information and computing sciences
22 INFORMATION AND COMMUNICATION SERVICES > 2204 Information systems, technologies and services > 220403 Artificial intelligence
Identification Number or DOI: https://doi.org/10.1016/j.eswa.2021.115033
URI: http://eprints.usq.edu.au/id/eprint/42111
