Exploiting highly qualified pattern with frequency and weight occupancy

Gan, Wensheng and Lin, Jerry Chun-Wei and Fournier-Viger, Philippe and Chao, Han-Chieh and Zhan, Justin and Zhang, Ji (2018) Exploiting highly qualified pattern with frequency and weight occupancy. Knowledge and Information Systems, 56 (1). pp. 165-196. ISSN 0219-1377

Abstract

By identifying useful knowledge embedded in the behavior of search engines, users can provide valuable information for web searching and data mining. Numerous algorithms have been proposed to find the desired interesting patterns, i.e., frequent patterns, in real-world applications. Most of those studies use frequency to measure the interestingness of patterns. However, each object may have a different importance in these real-world applications, and the frequent patterns do not usually contain a large portion of the desired patterns. In this paper, we present a novel method, called exploiting highly qualified patterns with frequency and weight occupancy (QFWO), to suggest the possible highly qualified patterns by utilizing the ideas of co-occurrence and weight occupancy. By considering item weight, weight occupancy, and the frequency of patterns, we define a new type of pattern, the highly qualified pattern. A novel set-enumeration tree called the frequency-weight (FW)-tree and two compact data structures named weight-list and FW-table are designed to maintain the global downward closure property and partial downward closure property of quality and weight occupancy, which are used to further prune the search space. The proposed method can discover highly qualified patterns in a recursive manner without candidate generation. Extensive experiments were conducted on both real-world and synthetic datasets to evaluate the effectiveness and efficiency of the proposed algorithm. Results demonstrate that the obtained patterns are reasonable and acceptable. Moreover, the designed QFWO with several pruning strategies is quite efficient in terms of runtime and search space.
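
The abstract does not spell out how frequency and weight occupancy are computed, so the following is only a minimal sketch under stated assumptions. It assumes that the weight occupancy of a pattern in one transaction is the share of that transaction's total item weight covered by the pattern, and that the weight occupancy of the pattern is the average of these shares over the transactions containing it. The function names, the thresholds min_support and min_occupancy, and the brute-force enumeration are all hypothetical; this is not the QFWO algorithm and uses neither the FW-tree, the weight-list, nor the FW-table.

```python
from itertools import combinations


def support_and_weight_occupancy(pattern, transactions, weights):
    """Return (support, weight occupancy) of a pattern.

    Assumed definition (not taken from the abstract): in each supporting
    transaction, the pattern's share is its summed item weight divided by
    the transaction's summed item weight; the weight occupancy is the mean
    of these shares over all supporting transactions.
    """
    pattern = frozenset(pattern)
    shares = []
    for transaction in transactions:
        if pattern <= transaction:  # the transaction contains the pattern
            pattern_weight = sum(weights[item] for item in pattern)
            transaction_weight = sum(weights[item] for item in transaction)
            shares.append(pattern_weight / transaction_weight)
    support = len(shares)
    occupancy = sum(shares) / support if support else 0.0
    return support, occupancy


def qualified_patterns(transactions, weights, min_support, min_occupancy):
    """Brute-force enumeration of patterns meeting both thresholds.

    Exponential in the number of items; for illustration only.
    """
    transactions = [frozenset(t) for t in transactions]
    items = sorted(set().union(*transactions))
    result = {}
    for size in range(1, len(items) + 1):
        for candidate in combinations(items, size):
            support, occupancy = support_and_weight_occupancy(
                candidate, transactions, weights
            )
            if support >= min_support and occupancy >= min_occupancy:
                result[candidate] = (support, occupancy)
    return result
```

In this sketch a frequent single item can still fail the occupancy threshold when it covers only a small share of the weight in its transactions, which is the kind of distinction between frequent and highly qualified patterns that the abstract draws.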


Item Type: Article (Commonwealth Reporting Category C)
Refereed: Yes
Item Status: Live Archive
Additional Information: Permanent restricted access to Published version, in accordance with the copyright policy of the publisher.
Faculty/School / Institute/Centre: Current - Faculty of Health, Engineering and Sciences - School of Agricultural, Computational and Environmental Sciences
Date Deposited: 08 Mar 2019 05:02
Last Modified: 12 Mar 2019 04:26
Uncontrolled Keywords: data mining; association rules; interestingness measures
Fields of Research: 08 Information and Computing Sciences > 0801 Artificial Intelligence and Image Processing > 080109 Pattern Recognition and Data Mining
Identification Number or DOI: 10.1007/s10115-017-1103-8
URI: http://eprints.usq.edu.au/id/eprint/36142
