Detecting outlying subspaces for high-dimensional data: the new task, algorithms and performance

Zhang, Ji and Wang, Hai (2006) Detecting outlying subspaces for high-dimensional data: the new task, algorithms and performance. Knowledge and Information Systems, 10 (3). pp. 333-355. ISSN 0219-1377

Full text available as: PDF (Accepted Version), 213Kb

Official URL: http://www.springerlink.com/content/x5153138n2r3/?p=35b6418ceb544734afde005b9d9ed794&pi=31

Abstract

In this paper, we identify a new task for studying the outlying degree (OD) of high-dimensional data: finding the subspaces (subsets of features) in which given points are outliers, called their outlying subspaces. Since state-of-the-art outlier detection techniques fail to handle this new problem, we propose a novel detection algorithm, called High-Dimension Outlying subspace Detection (HighDOD), to detect the outlying subspaces of high-dimensional data efficiently. The intuitive idea of HighDOD is to measure the OD of a point using the sum of distances between the point and its k nearest neighbors. Two heuristic pruning strategies are proposed to realize fast pruning in the subspace search, and an efficient dynamic subspace search method with a sample-based learning process has been implemented. Experimental results show that HighDOD is efficient and outperforms other search alternatives such as the naive top-down, bottom-up and random search methods. They also show that existing outlier detection methods cannot fulfill this new task effectively.
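As an illustration of the OD measure described in the abstract, the following minimal sketch computes the sum of distances from a point to its k nearest neighbors within a chosen subspace, then enumerates every subspace exhaustively. It is not HighDOD itself: the pruning strategies and the sample-based dynamic search are not reproduced here. Euclidean distance, the function names, the toy data and the `threshold` cut-off are all assumptions made for the example.

```python
from itertools import combinations
from math import sqrt


def outlying_degree(point, data, subspace, k):
    """Sum of distances from `point` to its k nearest neighbours,
    measured only on the feature indices in `subspace`.
    Euclidean distance is an assumption of this sketch."""
    dists = sorted(
        sqrt(sum((point[d] - other[d]) ** 2 for d in subspace))
        for other in data
        if other is not point  # skip the query point itself
    )
    return sum(dists[:k])


def outlying_subspaces(point, data, k, threshold):
    """Naive exhaustive search over all non-empty feature subsets.
    `threshold` is a hypothetical cut-off supplied by the caller."""
    n_dims = len(point)
    found = []
    for size in range(1, n_dims + 1):
        for subspace in combinations(range(n_dims), size):
            if outlying_degree(point, data, subspace, k) >= threshold:
                found.append(subspace)
    return found


# Toy data: the last point is ordinary on feature 0 but extreme on feature 1.
data = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (0.5, 10.0)]
query = data[4]
result = outlying_subspaces(query, data, k=2, threshold=5.0)
print(result)
```

The exhaustive loop visits 2^d - 1 subspaces for d features, which is the cost that a pruned or dynamic search would aim to avoid.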

Item Type: Article (Commonwealth Reporting Category C)
Additional Information: Author's version deposited in accordance with the copyright policy of the publisher. The original publication is available at www.springerlink.com. Publisher permission for book chapters in email dated 21/8/07
Uncontrolled Keywords: outlying subspace; high-dimensional data; outlier detection; dynamic subspace search
Fields of Research (FOR2008): 08 Information and Computing Sciences > 0801 Artificial Intelligence and Image Processing > 080109 Pattern Recognition and Data Mining
Subjects: 280000 Information, Computing and Communication Sciences
Socio-Economic Objective (SEO2008): UNSPECIFIED
ID Code: 5642
Deposited By:
Deposited On: 28 Sep 2009 15:11
Last Modified: 23 Nov 2011 09:53
