Li, Jiuyong and Huang, Xiaodi and Selke, Clinton and Yong, Jianming (2007) A fast algorithm for finding correlation clusters in noise data. In: 11th Pacific-Asia Conference on Knowledge Discovery and Data Mining: Advances in Knowledge Discovery and Data Mining (PAKDD2007), 22-25 May 2007, Nanjing, China.
This is the latest version of this eprint.
Full text available as:
- PDF (Accepted Version)
- PDF (Published Version)
Official URL: http://www.informatik.uni-trier.de/~ley/db/conf/pakdd/
[Abstract]: Noise significantly degrades cluster quality. Conventional clustering methods can hardly detect clusters in a data set that contains a large amount of noise. Projected clustering offers a way to identify correlation clusters in such a data set. To exclude noise points, which are usually scattered in a subspace, data points are projected to form dense areas in that subspace, and these dense areas are regarded as correlation clusters. However, we found that existing projected clustering methods do not work well with noisy data, since they employ randomly generated seeds (micro clusters) and thereby trade off clustering quality. In this paper, we propose a divisive projected clustering method that does not rely on random seeds. The proposed algorithm produces higher-quality correlation clusters from noisy data more efficiently than an agglomerative projected clustering algorithm. We show experimentally that our algorithm captures correlation clusters in noisy data better than a well-known projected clustering method.
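The projection idea in the abstract can be illustrated with a minimal sketch. This is not the divisive algorithm proposed in the paper, which the abstract does not specify; it only shows, on hypothetical synthetic data, how points from a correlation cluster collapse into a dense area when projected onto the low-variance directions found by an SVD, while noise points stay scattered. All names, sizes, and noise levels below are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 200 points near a line in 3-D (a correlation cluster)
# plus 200 uniformly scattered noise points.
t = rng.uniform(-1.0, 1.0, size=(200, 1))
direction = np.array([[1.0, 2.0, -1.0]]) / np.sqrt(6.0)
cluster = t * direction + rng.normal(scale=0.01, size=(200, 3))
noise = rng.uniform(-1.0, 1.0, size=(200, 3))

# SVD of the centred cluster: rows of vt are principal directions,
# ordered from strongest to weakest variance.
centre = cluster.mean(axis=0)
_, s, vt = np.linalg.svd(cluster - centre, full_matrices=False)
weak = vt[1:]  # the two low-variance directions span the projection subspace


def projected_distance(points):
    """Distance from the cluster centre after projecting onto the weak directions."""
    return np.linalg.norm((points - centre) @ weak.T, axis=1)


cluster_spread = projected_distance(cluster).mean()
noise_spread = projected_distance(noise).mean()
print(f"cluster: {cluster_spread:.3f}, noise: {noise_spread:.3f}")
```

In this sketch the cluster membership is known in advance, which a real projected clustering method cannot assume; the point is only that the projected spread of the cluster is far smaller than that of the noise.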
Item Type: Conference or Workshop Item (Commonwealth Reporting Category E) (Paper)
Additional Information: Copyright 2007 Springer. This is the author's version of a paper published in the series Lecture Notes in Artificial Intelligence, v. 4425, 2007. Author's version deposited in accordance with the copyright policy of the publisher, Springer.
Uncontrolled Keywords: generalised projected clustering; SVD decomposition
Fields of Research (FOR2008): 08 Information and Computing Sciences > 0806 Information Systems > 080604 Database Management
Subjects: 280000 Information, Computing and Communication Sciences > 280100 Information Systems > 280108 Database Management
Socio-Economic Objective (SEO2008): UNSPECIFIED
Deposited On: 11 Oct 2007 11:15
Last Modified: 29 Feb 2012 14:08
Available Versions of this Item
- A fast algorithm for finding correlation clusters in noise data. (deposited 18 Jun 2007)
- A fast algorithm for finding correlation clusters in noise data. (deposited 11 Oct 2007 11:15) [Currently Displayed]