Detecting outlying subspaces for high-dimensional data: a heuristic search approach

Zhang, Ji (2005) Detecting outlying subspaces for high-dimensional data: a heuristic search approach. In: 2005 SIAM International Workshop on Feature Selection for Data Mining: Interfacing Machine Learning and Statistics, 23 April 2005, Newport Beach, California, United States.


Abstract

In this paper, we identify a new task for studying the outlying degree of high-dimensional data, i.e. finding the subspaces (subsets of features) in which given points are outliers, and propose a novel detection algorithm, called High-D Outlying subspace Detection (HighDOD). We measure the outlying degree of a point using the sum of distances between this point and its k nearest neighbors. Heuristic pruning strategies are proposed to realize fast pruning in the subspace search, and an efficient dynamic subspace search method with a sample-based learning process has been implemented. Experimental results show that HighDOD is efficient and outperforms other search alternatives such as the naive top-down, bottom-up and random search methods. Points in these sparse subspaces are assumed to be the outliers. While knowing which data points are the outliers can be useful, in many applications it is more important to identify the subspaces in which a given point is an outlier, which motivates the proposal of a new technique in this paper to handle this new task.
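
As an illustration of the outlying-degree measure described in the abstract, the following is a minimal Python sketch, not the paper's implementation. It assumes Euclidean distance restricted to the features of a subspace. The function names `outlying_degree` and `outlying_subspaces`, the `threshold` parameter and the toy data are hypothetical choices made for this example. The exhaustive enumeration of subspaces stands in for a naive baseline only; it does not reproduce HighDOD's heuristic pruning or its sample-based dynamic search.

```python
import math
from itertools import combinations


def outlying_degree(data, index, subspace, k):
    """Sum of distances from data[index] to its k nearest neighbours,
    measured only on the feature indices listed in `subspace`.
    Euclidean distance is an assumption of this sketch."""
    target = data[index]
    dists = []
    for j, row in enumerate(data):
        if j == index:
            continue  # a point is not its own neighbour
        dists.append(math.sqrt(sum((target[f] - row[f]) ** 2 for f in subspace)))
    dists.sort()
    return sum(dists[:k])


def outlying_subspaces(data, index, k, threshold):
    """Naive exhaustive search: return every non-empty subspace in which the
    outlying degree of data[index] reaches `threshold` (a hypothetical
    user-chosen cut-off). The cost grows exponentially with the number of
    features, which is why a pruned search is needed in practice."""
    n_features = len(data[0])
    found = []
    for size in range(1, n_features + 1):
        for subspace in combinations(range(n_features), size):
            if outlying_degree(data, index, subspace, k) >= threshold:
                found.append(subspace)
    return found


# Toy data: the last point is ordinary on feature 0 but far away on feature 1.
data = [
    [0.0, 0.0],
    [1.0, 0.0],
    [0.0, 1.0],
    [1.0, 1.0],
    [0.0, 10.0],
]
print(outlying_subspaces(data, index=4, k=2, threshold=5.0))
```

On this toy data the last point is flagged in the subspace containing feature 1 alone and in the full two-feature space, but not in the subspace containing feature 0 alone, which is the kind of per-point answer the task in the abstract asks for.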


Item Type: Conference or Workshop Item (Commonwealth Reporting Category E) (Paper)
Refereed: Yes
Item Status: Live Archive
Additional Information: No evidence of copyright restrictions.
Depositing User: Dr Ji Zhang
Faculty / Department / School: Historic - Faculty of Sciences - Department of Maths and Computing
Date Deposited: 08 Sep 2009 23:49
Last Modified: 02 Jul 2013 23:23
Uncontrolled Keywords: outlying subspaces, high-dimensional data, heuristic search, sample-based learning
Fields of Research (FOR2008): 08 Information and Computing Sciences > 0801 Artificial Intelligence and Image Processing > 080109 Pattern Recognition and Data Mining
URI: http://eprints.usq.edu.au/id/eprint/5631
