Identifying informative tweets during a pandemic via a topic‑aware neural language model

Gao, Wang and Li, Lin and Tao, Xiaohui ORCID: https://orcid.org/0000-0002-0020-077X and Zhou, Jing and Tao, Jun (2022) Identifying informative tweets during a pandemic via a topic‑aware neural language model. World Wide Web. pp. 1-16. ISSN 1386-145X


Abstract

Every epidemic affects the real lives of many people around the world and leads to terrible consequences. Recently, many tweets about the COVID-19 pandemic have been shared publicly on social media platforms. Analyzing these tweets helps emergency response organizations prioritize their tasks and make better decisions. However, most of these tweets are non-informative, which makes it challenging to build an automated system that detects useful information in social media. Furthermore, existing methods ignore unlabeled data and topic background knowledge, both of which can provide additional semantic information. In this paper, we propose a novel Topic-Aware BERT (TABERT) model to address these challenges. First, TABERT leverages a topic model to extract the latent topics of tweets. Second, a flexible framework is used to combine topic information with the output of BERT. Finally, we adopt adversarial training to achieve semi-supervised learning, so that a large amount of unlabeled data can be used to improve the model's internal representations. Experimental results on the dataset of COVID-19 English tweets show that our model outperforms classic and state-of-the-art baselines.
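
As a rough illustration of the second step described in the abstract, the sketch below joins a tweet's text representation with its topic distribution before classification. It is not the authors' implementation: plain concatenation, the logistic classifier, the function names (`softmax`, `combine`, `predict`) and every numeric value are assumptions made for illustration, and the short vectors merely stand in for a BERT output and a topic model's output.

```python
import math


def softmax(scores):
    """Turn raw topic scores into a probability distribution."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def combine(text_vec, topic_dist):
    """Assumed fusion step: concatenate the text vector and the topic distribution."""
    return list(text_vec) + list(topic_dist)


def predict(features, weights, bias):
    """Logistic score in (0, 1); higher means 'more likely informative'."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-z))


# Toy stand-ins (hypothetical values): a 3-dimensional "sentence embedding"
# and scores over 2 latent topics.
text_vec = [0.2, -0.1, 0.4]
topic_dist = softmax([1.0, 0.0])

features = combine(text_vec, topic_dist)
weights = [0.5] * len(features)  # untrained placeholder weights
prob = predict(features, weights, bias=0.0)
print(len(features), round(prob, 3))
```

In a real system the placeholder weights would be learned, and the fusion step could just as well be a gating or attention mechanism; the abstract only states that the framework is flexible.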


Item Type: Article (Commonwealth Reporting Category C)
Refereed: Yes
Item Status: Live Archive
Additional Information: Files associated with this item cannot be displayed due to copyright restrictions.
Faculty/School / Institute/Centre: Historic - Faculty of Health, Engineering and Sciences - School of Sciences (6 Sep 2019 - 31 Dec 2021)
Date Deposited: 12 May 2022 01:41
Last Modified: 22 Jun 2022 02:52
Uncontrolled Keywords: Adversarial training; Informative tweet identification; Social media; Topic model
Fields of Research (2020): 46 INFORMATION AND COMPUTING SCIENCES > 4611 Machine learning > 461101 Adversarial machine learning; 46 INFORMATION AND COMPUTING SCIENCES > 4602 Artificial intelligence > 460208 Natural language processing; 46 INFORMATION AND COMPUTING SCIENCES > 4611 Machine learning > 461104 Neural networks
Socio-Economic Objectives (2020): 28 EXPANDING KNOWLEDGE > 2801 Expanding knowledge > 280115 Expanding knowledge in the information and computing sciences; 22 INFORMATION AND COMMUNICATION SERVICES > 2204 Information systems, technologies and services > 220403 Artificial intelligence
Identification Number or DOI: https://doi.org/10.1007/s11280-022-01034-1
URI: http://eprints.usq.edu.au/id/eprint/48453
