Hu, Hong and Li, Jiuyong and Wang, Hua and Daggard, Grant and Wang, Li-Zhen (2008) Robustness analysis of diversified ensemble decision tree algorithms for microarray data classification. In: ICMLC 2008: 7th International Conference on Machine Learning and Cybernetics, 12-15 Jul 2008, Kunming, China.
Ensemble classification methods have shown promise for achieving higher accuracy in microarray data classification. Because noise remains in all microarray data even after preprocessing, robustness is another important criterion, in addition to accuracy, for evaluating the reliability of microarray classification algorithms. In this paper, we experimentally compare our newly developed MDMT with C4.5, BaggingC4.5, AdaBoostingC4.5, Random Forest and CS4 on four microarray cancer data sets. We test and evaluate how well a given single or ensemble classifier tolerates noise in unseen test data sets, particularly as the level of noise increases. The experimental results show that MDMT tolerates noise in unseen test data sets better than the other compared methods do, particularly as the noise level increases. We observe that Random Forest is comparable to MDMT in terms of resistance to noise. The experimental results also show that ensemble decision tree methods tolerate noise better than the single-tree C4.5 does. We conclude that avoiding overlapping genes among the ensemble trees is an intuitive, simple and effective way to achieve a higher degree of diversity for ensemble decision tree methods. An algorithm based on this principle deals more reliably with microarray data sets that contain a certain level of noise.
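The two ideas in the abstract, keeping the genes used by different ensemble members disjoint and measuring accuracy on test data with increasing noise, can be illustrated with a minimal sketch. This is not the MDMT algorithm: one-split threshold rules ("stumps") stand in for full decision trees, the Gaussian noise model and its parameters are assumptions, the data are synthetic, and every function name below is hypothetical.

```python
import random

def make_toy_data(rng, n, n_features, n_informative=3):
    """Synthetic two-class data; only the first n_informative 'genes' carry signal."""
    y = [i % 2 for i in range(n)]
    X = [[(2.0 * label if f < n_informative else 0.0) + rng.gauss(0, 0.5)
          for f in range(n_features)] for label in y]
    return X, y

def fit_stump(X, y, feature):
    """Best single-threshold rule on one feature (a stand-in for a full tree)."""
    values = sorted(set(row[feature] for row in X))
    best = None
    for lo, hi in zip(values, values[1:]):
        thr = (lo + hi) / 2
        for above in (0, 1):  # class predicted when the value exceeds thr
            preds = [above if row[feature] > thr else 1 - above for row in X]
            acc = sum(p == t for p, t in zip(preds, y)) / len(y)
            if best is None or acc > best[0]:
                best = (acc, feature, thr, above)
    return best  # None if the feature is constant

def build_disjoint_ensemble(X, y, n_trees):
    """Greedily add the best stump, then bar its gene from all later members."""
    unused = set(range(len(X[0])))
    ensemble = []
    for _ in range(n_trees):
        candidates = [s for s in (fit_stump(X, y, f) for f in sorted(unused)) if s]
        if not candidates:
            break
        best = max(candidates, key=lambda s: s[0])
        ensemble.append(best)
        unused.discard(best[1])  # no gene is shared between members
    return ensemble

def predict(ensemble, row):
    """Majority vote over the members; ties go to class 0."""
    votes = sum(above if row[f] > thr else 1 - above
                for _, f, thr, above in ensemble)
    return 1 if 2 * votes > len(ensemble) else 0

def accuracy(ensemble, X, y):
    return sum(predict(ensemble, r) == t for r, t in zip(X, y)) / len(y)

def add_noise(X, level, sigma, rng):
    """Copy X, perturbing each value with probability `level` (assumed noise model)."""
    return [[v + rng.gauss(0, sigma) if rng.random() < level else v for v in row]
            for row in X]

if __name__ == "__main__":
    rng = random.Random(0)
    X_train, y_train = make_toy_data(rng, 40, 6)
    X_test, y_test = make_toy_data(rng, 40, 6)
    ensemble = build_disjoint_ensemble(X_train, y_train, 3)
    for level in (0.0, 0.1, 0.2, 0.4):
        noisy = add_noise(X_test, level, 2.0, rng)
        print(level, accuracy(ensemble, noisy, y_test))
```

Because each member depends on a different gene, a corrupted value in one gene can change at most one vote, which is the intuition behind the abstract's claim that non-overlapping genes improve tolerance to noise.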
|Item Type:||Conference or Workshop Item (Commonwealth Reporting Category E) (Paper)|
|Additional Information:||Author's version deposited in accordance with the copyright policy of the publisher. © 2008 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purpose or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE.|
|Uncontrolled Keywords:||microarray; cancer; classification; medical computing; decision trees|
|Subjects:||270000 Biological Sciences > 270100 Biochemistry and Cell Biology|
|Depositing User:||Dr Hua Wang|
|Date Deposited:||13 Jul 2009 13:03|
|Last Modified:||02 Jul 2013 23:07|