Zhang, Ji and Liu, Han and Ling, Tok Wang and Bruckner, Robert and Tjoa, A. Min (2006) A framework for efficient association rule mining in XML data. Journal of Database Management (JDM), 17 (3). pp. 19-40. ISSN 1063-8016
Full text available as:
| PDF (Accepted Version), 437Kb |
Abstract
In this paper, we propose a framework, called XAR-Miner, for efficiently mining association rules (ARs) from XML documents. In XAR-Miner, raw data in the XML document are first preprocessed and transformed into either an Indexed XML Tree (IX-tree) or Multi-relational Databases (Multi-DB), depending on the size of the XML document and the memory constraints of the system, for efficient data selection and AR mining. Concepts that are relevant to the AR mining task are generalized to produce generalized meta-patterns. A suitable metric is devised for measuring the degree of concept generalization in order to prevent under-generalization or over-generalization. The resulting generalized meta-patterns are used to generate large ARs that meet the support and confidence levels. A greedy algorithm is also presented that integrates data selection and large itemset generation to enhance the efficiency of the AR mining process. The experiments conducted show that XAR-Miner is more efficient in performing a large number of AR mining tasks from XML documents than the state-of-the-art method of repetitively scanning through XML documents in order to perform each of the mining tasks.
| Item Type: | Article (Commonwealth Reporting Category C) |
|---|---|
| Additional Information: | Deposited with blanket permission of publisher. |
| Uncontrolled Keywords: | association rule mining, XML data, data transformation and indexing, concept generalization, meta patterns |
| Fields of Research (FOR2008): | 08 Information and Computing Sciences > 0801 Artificial Intelligence and Image Processing > 080109 Pattern Recognition and Data Mining |
| Subjects: | 280000 Information, Computing and Communication Sciences |
| Socio-Economic Objective (SEO2008): | UNSPECIFIED |
| ID Code: | 5629 |
| Deposited On: | 24 Sep 2009 15:26 |
| Last Modified: | 01 Feb 2012 10:32 |
