High average-utility sequential pattern mining based on uncertain databases

Lin, Jerry Chun-Wei and Li, Ting and Pirouz, Matin and Zhang, Ji and Fournier-Viger, Philippe (2019) High average-utility sequential pattern mining based on uncertain databases. Knowledge and Information Systems. ISSN 0219-1377


Abstract

The emergence and proliferation of Internet of Things (IoT) devices have resulted in the generation of big and uncertain data, owing to the varied accuracy and decay of sensors and their different sensitivity ranges. Since uncertainty plays an important role in IoT data, mining useful information from uncertain datasets has become an important issue in recent decades. Past works focus on mining high-utility sequential patterns from uncertain databases. However, the utility of a derived sequence increases along with the size of the sequence, which makes utility an unfair measure for evaluating a sequence: any combination that contains a high-utility sequence will also be a high-utility sequence, even when its utility relative to its size is low. In this paper, we address this limitation of previous potential high-utility sequential pattern mining and present a potentially high average-utility sequential pattern mining framework for discovering the set of potentially high average-utility sequential patterns (PHAUSPs) from an uncertain dataset. By taking the size of a sequence into account, the framework provides a fairer measure of the patterns than previous works. First, a baseline potentially high average-utility sequential pattern algorithm and three pruning strategies are introduced to completely mine the set of desired PHAUSPs. To reduce the computational cost and accelerate the mining process, a projection algorithm called PHAUP is then designed, which reduces the number of candidates for the desired patterns. Experiments on both real-life and artificial datasets evaluate runtime, number of candidates, memory overhead, number of discovered patterns, and scalability, and the results show that the proposed algorithms achieve promising performance, especially the PHAUP approach.
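The motivation for dividing utility by sequence size can be illustrated with a small sketch. It is not the algorithm from the paper: the data model (single-item elements, one existence probability per sequence), the summing of probabilities over matching sequences, and the names `max_utility`, `evaluate`, `is_candidate`, `min_avg_utility`, and `min_prob` are all assumptions made for illustration, and the toy values are invented.

```python
from typing import List, Optional, Tuple

# Assumed data model (not taken from the paper): a sequence is an ordered list
# of (item, utility) pairs, and each sequence has one existence probability.
QSequence = List[Tuple[str, float]]
Database = List[Tuple[QSequence, float]]


def max_utility(pattern: List[str], seq: QSequence) -> Optional[float]:
    """Highest total utility over all occurrences of `pattern` as a
    subsequence of `seq`, or None if the pattern does not occur."""
    neg = float("-inf")
    # best[j] = best utility found so far for matching the first j pattern items
    best = [0.0] + [neg] * len(pattern)
    for item, utility in seq:
        # Go backwards so one element is not matched to two pattern positions.
        for j in range(len(pattern), 0, -1):
            if pattern[j - 1] == item and best[j - 1] != neg:
                best[j] = max(best[j], best[j - 1] + utility)
    return None if best[-1] == neg else best[-1]


def evaluate(pattern: List[str], db: Database) -> Tuple[float, float, float]:
    """Return (utility, average utility, summed probability) of a non-empty
    pattern, accumulated over the sequences that contain it."""
    total_utility = 0.0
    total_prob = 0.0
    for seq, prob in db:
        u = max_utility(pattern, seq)
        if u is not None:
            total_utility += u
            total_prob += prob
    return total_utility, total_utility / len(pattern), total_prob


def is_candidate(pattern: List[str], db: Database,
                 min_avg_utility: float, min_prob: float) -> bool:
    """Hypothetical check: both assumed thresholds must be met."""
    _, avg_utility, prob = evaluate(pattern, db)
    return avg_utility >= min_avg_utility and prob >= min_prob


# Invented toy database: item "a" carries most of the utility.
DB: Database = [
    ([("a", 10.0), ("b", 1.0), ("c", 1.0)], 0.9),
    ([("a", 8.0), ("c", 2.0)], 0.8),
    ([("b", 2.0), ("c", 3.0)], 0.5),
]

for p in (["a"], ["a", "c"], ["a", "b", "c"]):
    utility, avg_utility, _ = evaluate(p, DB)
    print(p, "utility:", utility, "average utility:", avg_utility)
```

In this toy data the pattern a, b, c reaches a total utility of 12 almost entirely because it contains the high-utility item a, while its average utility is only 4. A total-utility threshold of 12 would accept it, whereas an average-utility threshold of 5 would not, which is the kind of size bias the average-utility measure is meant to remove.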


Item Type: Article (Commonwealth Reporting Category C)
Refereed: Yes
Item Status: Live Archive
Additional Information: Published online: 22 July 2019. Permanent restricted access to ArticleFirst version, in accordance with the copyright policy of the publisher.
Faculty/School / Institute/Centre: Current - Faculty of Health, Engineering and Sciences - School of Sciences (6 Sept 2019 -)
Faculty/School / Institute/Centre: Historic - Institute for Resilient Regions - Centre for Health, Informatics and Economic Research (1 Aug 2018 - 31 Mar 2020)
Date Deposited: 19 Feb 2020 01:47
Last Modified: 24 Feb 2020 04:25
Uncontrolled Keywords: data mining, high average-utility sequential pattern mining, sequential patterns, uncertain database
Fields of Research (2008): 08 Information and Computing Sciences > 0801 Artificial Intelligence and Image Processing > 080109 Pattern Recognition and Data Mining
Identification Number or DOI: 10.1007/s10115-019-01385-8
URI: http://eprints.usq.edu.au/id/eprint/38123
