Zhang, Ji (2005) Detecting outlying subspaces for high-dimensional data: a heuristic search approach. In: 2005 SIAM International Workshop on Feature Selection for Data Mining: Interfacing Machine Learning and Statistics, 23 April 2005, Newport Beach, California, United States.
[Abstract]: In this paper, we identify a new task for studying the outlying degree of high-dimensional data, namely finding the subspaces (subsets of features) in which given points are outliers, and we propose a novel detection algorithm called High-D Outlying subspace Detection (HighDOD). We measure the outlying degree of a point as the sum of the distances between the point and its k nearest neighbors. Heuristic pruning strategies are proposed to speed up the subspace search, and an efficient dynamic subspace search method with a sample-based learning process has been implemented. Experimental results show that HighDOD is efficient and outperforms alternative search methods such as naive top-down, bottom-up and random search. Points in sparse subspaces are assumed to be the outliers. While knowing which data points are outliers can be useful, in many applications it is more important to identify the subspaces in which a given point is an outlier; this motivates the new technique proposed in this paper.
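The outlying-degree measure described in the abstract, OD_s(p) = the sum of the distances from p to its k nearest neighbors within subspace s, can be sketched as below. This is a minimal illustration under stated assumptions, not HighDOD itself: it assumes Euclidean distance restricted to the chosen features, a user-supplied threshold for calling a point an outlier, and a plain bottom-up enumeration of the subspace lattice instead of the paper's dynamic, sample-based search. The function names and the `threshold` parameter are hypothetical. The pruning uses a property that holds for Euclidean distance: adding a feature never shrinks any distance, so OD never decreases as a subspace grows, and every superset of an outlying subspace is outlying without being evaluated. (Conversely, every subset of a non-outlying subspace is non-outlying, which a top-down search could exploit.)

```python
import itertools
import math
from typing import List, Sequence, Tuple

Point = Sequence[float]


def subspace_distance(p: Point, q: Point, dims: Tuple[int, ...]) -> float:
    """Euclidean distance between p and q using only the features in dims."""
    return math.sqrt(sum((p[d] - q[d]) ** 2 for d in dims))


def outlying_degree(idx: int, data: List[Point], dims: Tuple[int, ...], k: int) -> float:
    """Sum of distances from data[idx] to its k nearest neighbours in subspace dims."""
    p = data[idx]
    # Exclude the point itself, then keep the k smallest distances.
    dists = sorted(subspace_distance(p, q, dims) for j, q in enumerate(data) if j != idx)
    return sum(dists[:k])


def outlying_subspaces(idx: int, data: List[Point], k: int,
                       threshold: float) -> List[Tuple[int, ...]]:
    """Hypothetical bottom-up search: all subspaces where OD >= threshold.

    Relies on OD being monotone under subspace inclusion (Euclidean distance),
    so supersets of an outlying subspace are accepted without computing OD.
    """
    n_dims = len(data[0])
    minimal: List[set] = []  # evaluated outlying subspaces with no outlying subset
    result: List[Tuple[int, ...]] = []
    for size in range(1, n_dims + 1):
        for dims in itertools.combinations(range(n_dims), size):
            if any(m <= set(dims) for m in minimal):
                result.append(dims)  # pruned: inherits outlier status
            elif outlying_degree(idx, data, dims, k) >= threshold:
                minimal.append(set(dims))
                result.append(dims)
    return result


if __name__ == "__main__":
    # Toy data: the last point is far away only along feature 2.
    data = [
        (0.0, 0.0, 0.0),
        (1.0, 0.0, 0.0),
        (0.0, 1.0, 0.0),
        (1.0, 1.0, 0.0),
        (0.5, 0.5, 10.0),
    ]
    print(outlying_subspaces(4, data, k=2, threshold=5.0))
    # → [(2,), (0, 2), (1, 2), (0, 1, 2)]
```

Only the one-feature subspaces and {0, 1} are evaluated in the example; the three supersets of {2} are accepted by pruning, which is the kind of saving the abstract's heuristic pruning aims for at much larger dimensionality.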
|Item Type:||Conference or Workshop Item (Commonwealth Reporting Category E) (Paper)|
|Item Status:||Live Archive|
|Additional Information:||No evidence of copyright restrictions.|
|Depositing User:||Dr Ji Zhang|
|Faculty / Department / School:||Historic - Faculty of Sciences - Department of Maths and Computing|
|Date Deposited:||08 Sep 2009 23:49|
|Last Modified:||02 Jul 2013 23:23|
|Uncontrolled Keywords:||outlying subspaces, high-dimensional data, heuristic search, sample-based learning|
|Fields of Research (FOR2008):||08 Information and Computing Sciences > 0801 Artificial Intelligence and Image Processing > 080109 Pattern Recognition and Data Mining|