On the effectiveness of gene selection for microarray classification methods

Zhang, Zhongwei and Li, Jiuyong and Hu, Hong and Zhou, Hong (2010) On the effectiveness of gene selection for microarray classification methods. In: 2nd Asian Conference on Intelligent Information and Database Systems (ACIIDS 2010), 24-26 Mar 2010, Hue City, Vietnam.

Microarray data usually contain a high level of noisy gene data, including incorrect, noisy, and irrelevant genes. Before microarray data classification takes place, it is desirable to eliminate as much noisy data as possible. One approach to improving the accuracy and efficiency of microarray data classification is to select a small subset of genes from the large, high-dimensional gene expression dataset. Effective gene selection helps to clean up the existing microarray data and thereby improves its quality. In this paper, we study the effectiveness of gene selection for microarray classification methods. We conducted experiments on the effectiveness of gene selection using two benchmark classification algorithms: SVMs and C4.5. We observed that, although the performance of SVMs and C4.5 generally improves in terms of accuracy and efficiency when the preprocessed datasets are used rather than the original datasets, an inappropriate choice of genes can be detrimental to predictive power. Our results also imply that, with preprocessing, the number of genes selected affects the classification accuracy.
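The workflow the abstract describes (score the genes, keep a small subset, then classify) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the data are synthetic, the filter is a simple signal-to-noise score, and a nearest-centroid classifier stands in for SVMs and C4.5 so that the example needs only the Python standard library. All function names are hypothetical.

```python
import random
import statistics


def make_synthetic_data(n_samples=40, n_genes=200, n_informative=5, seed=0):
    """Toy expression matrix: only the first n_informative genes carry signal."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n_samples):
        label = i % 2  # alternate the two classes
        row = [rng.gauss(0.0, 1.0) for _ in range(n_genes)]
        for g in range(n_informative):
            row[g] += 2.0 * label  # shift informative genes for class 1
        X.append(row)
        y.append(label)
    return X, y


def snr_scores(X, y):
    """Per-gene score: |mean0 - mean1| / (std0 + std1). Assumed filter, not the paper's."""
    scores = []
    for g in range(len(X[0])):
        a = [row[g] for row, lab in zip(X, y) if lab == 0]
        b = [row[g] for row, lab in zip(X, y) if lab == 1]
        spread = statistics.stdev(a) + statistics.stdev(b)
        # Small constant avoids division by zero for constant genes.
        scores.append(abs(statistics.mean(a) - statistics.mean(b)) / (spread + 1e-12))
    return scores


def select_top_k(X, y, k):
    """Return the indices of the k highest-scoring genes."""
    scores = snr_scores(X, y)
    ranked = sorted(range(len(scores)), key=lambda g: scores[g], reverse=True)
    return ranked[:k]


def nearest_centroid_predict(X_train, y_train, X_test, genes):
    """Classify each test row by the closest class centroid over the selected genes."""
    centroids = {}
    for lab in set(y_train):
        rows = [r for r, l in zip(X_train, y_train) if l == lab]
        centroids[lab] = [statistics.mean(r[g] for r in rows) for g in genes]
    preds = []
    for row in X_test:
        def dist(lab):
            return sum((row[g] - c) ** 2 for g, c in zip(genes, centroids[lab]))
        preds.append(min(centroids, key=dist))
    return preds


def holdout_accuracy(X, y, k, n_train=30):
    """Select genes on the training split only, then score on the held-out rows."""
    X_tr, y_tr = X[:n_train], y[:n_train]
    X_te, y_te = X[n_train:], y[n_train:]
    genes = select_top_k(X_tr, y_tr, k)
    preds = nearest_centroid_predict(X_tr, y_tr, X_te, genes)
    return sum(p == t for p, t in zip(preds, y_te)) / len(y_te)


if __name__ == "__main__":
    X, y = make_synthetic_data()
    for k in (1, 5, 50, 200):  # vary the number of selected genes
        print(k, holdout_accuracy(X, y, k))
```

Running the script prints the held-out accuracy for several values of k, which is one simple way to explore how the number of selected genes interacts with accuracy. Genes are selected on the training split only, so the held-out rows do not influence the selection.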

Item Type: Conference or Workshop Item (Commonwealth Reporting Category E) (Paper)
Refereed: Yes
Item Status: Live Archive
Additional Information: Copyright 2010 Springer. This is the author's version of a paper published in the series Lecture Notes in Artificial Intelligence, v. 5991, 2010. Author's version deposited in accordance with the copyright policy of the publisher, Springer.
Faculty / Department / School: Historic - Faculty of Sciences - Department of Maths and Computing
Date Deposited: 13 Jul 2010 00:35
Last Modified: 05 Sep 2014 05:15
Uncontrolled Keywords: classification accuracy; clean up; data sets; gene selection; high-dimensional; microarray classification; microarray data; noisy data
Fields of Research: 08 Information and Computing Sciences > 0801 Artificial Intelligence and Image Processing > 080109 Pattern Recognition and Data Mining
11 Medical and Health Sciences > 1117 Public Health and Health Services > 111711 Health Information Systems (incl. Surveillance)
06 Biological Sciences > 0604 Genetics > 060405 Gene Expression (incl. Microarray and other genome-wide approaches)
Socio-Economic Objective: C Society > 92 Health > 9204 Public Health (excl. Specific Population Health) > 920413 Social Structure and Health
Identification Number or DOI: 10.1007/978-3-642-12101-2_31
URI: http://eprints.usq.edu.au/id/eprint/7232
