Robustness analysis of diversified ensemble decision tree algorithms for microarray data classification

Hu, Hong and Li, Jiuyong and Wang, Hua and Daggard, Grant and Wang, Li-Zhen (2008) Robustness analysis of diversified ensemble decision tree algorithms for microarray data classification. In: ICMLC 2008: 7th International Conference on Machine Learning and Cybernetics, 12-15 Jul 2008, Kunming, China.

Abstract

Ensemble classification methods have shown promise for achieving higher classification accuracy in microarray data analysis. Because noise is present in all microarray data even after the preprocessing stage, robustness is another very important criterion, in addition to accuracy, for evaluating the reliability of microarray classification algorithms. In this paper, we conduct an experimental comparison of our newly developed MDMT with C4.5, BaggingC4.5, AdaBoostingC4.5, Random Forest and CS4 on four microarray cancer data sets. We test and evaluate how well a given single or ensemble classifier tolerates noise in unseen test data sets, particularly as the level of noise increases. The experimental results show that MDMT tolerates noise in unseen test data sets better than the other compared methods, particularly at higher noise levels. We observe that Random Forest is comparable to MDMT in terms of resistance to noise. The experimental results also show that ensemble decision tree methods tolerate noise better than the single-tree C4.5 does. We conclude that avoiding overlapping genes among the ensemble trees is an intuitive, simple and effective way to achieve a higher degree of diversity in ensemble decision tree methods. An algorithm based on this principle is more reliable for microarray data sets that contain a certain level of noise.
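The two ideas in the abstract, an ensemble whose trees share no genes and an evaluation that adds increasing noise to unseen test data, can be illustrated with a minimal Python sketch. This is not the authors' MDMT or their experimental setup. It assumes one simple way to realise the principle: grow each small tree only on genes that no earlier tree has split on, then combine the trees by majority vote. A depth-limited Gini tree stands in for C4.5, and the noise model (Gaussian noise whose standard deviation is a hypothetical `level` times each gene's standard deviation) and the synthetic data are assumptions for illustration. All function names are hypothetical.

```python
import random
import statistics
from collections import Counter


def gini(labels):
    """Gini impurity of a non-empty list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(X, y, genes):
    """Return the (gene, threshold) pair that most lowers weighted Gini, or None."""
    best, best_score = None, gini(y)
    for g in genes:
        values = sorted({row[g] for row in X})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [lab for row, lab in zip(X, y) if row[g] <= t]
            right = [lab for row, lab in zip(X, y) if row[g] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if score < best_score:
                best, best_score = (g, t), score
    return best


def build_tree(X, y, genes, depth):
    """Grow a depth-limited tree: (gene, threshold, left, right) or a leaf label."""
    majority = Counter(y).most_common(1)[0][0]
    split = best_split(X, y, genes) if depth > 0 else None
    if split is None:
        return majority
    g, t = split
    left = [i for i, row in enumerate(X) if row[g] <= t]
    right = [i for i, row in enumerate(X) if row[g] > t]
    return (g, t,
            build_tree([X[i] for i in left], [y[i] for i in left], genes, depth - 1),
            build_tree([X[i] for i in right], [y[i] for i in right], genes, depth - 1))


def genes_used(tree):
    """Set of gene indices the tree splits on."""
    if not isinstance(tree, tuple):
        return set()
    g, _, left, right = tree
    return {g} | genes_used(left) | genes_used(right)


def predict(tree, row):
    while isinstance(tree, tuple):
        g, t, left, right = tree
        tree = left if row[g] <= t else right
    return tree


def disjoint_gene_ensemble(X, y, n_trees=3, depth=2):
    """Grow trees sequentially; each may only split on genes no earlier tree used."""
    available = set(range(len(X[0])))
    trees = []
    for _ in range(n_trees):
        tree = build_tree(X, y, sorted(available), depth)
        used = genes_used(tree)
        if trees and not used:
            break  # the remaining genes give no useful split
        trees.append(tree)
        available -= used
    return trees


def accuracy(trees, X, y):
    """Fraction of rows whose majority-vote prediction matches the label."""
    votes = [Counter(predict(t, row) for t in trees).most_common(1)[0][0] for row in X]
    return sum(v == lab for v, lab in zip(votes, y)) / len(y)


def add_noise(X, level, rng):
    """Noisy copy of X (assumed model): Gaussian noise, sd = level * gene sd."""
    sds = [statistics.pstdev(col) for col in zip(*X)]
    return [[v + rng.gauss(0.0, level * sd) for v, sd in zip(row, sds)] for row in X]


# Synthetic "microarray": 20 genes, of which genes 0-5 are shifted for class 1.
rng = random.Random(0)
y = [i % 2 for i in range(60)]
X = [[rng.gauss(2.0 * lab if g < 6 else 0.0, 1.0) for g in range(20)] for lab in y]
X_train, y_train, X_test, y_test = X[:40], y[:40], X[40:], y[40:]

trees = disjoint_gene_ensemble(X_train, y_train, n_trees=3, depth=2)
for level in (0.0, 0.5, 1.0, 2.0):
    noisy = add_noise(X_test, level, rng)
    print(f"noise level {level}: accuracy {accuracy(trees, noisy, y_test):.2f}")
```

Because every gene is used by at most one tree, a corrupted value in a single gene can change at most one tree's vote, which is one intuition for why gene-disjoint ensembles may resist noise. The printed accuracies come from synthetic data and say nothing about the paper's results.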

Item Type: Conference or Workshop Item (Commonwealth Reporting Category E) (Paper)
Refereed: Yes
Item Status: Live Archive
Additional Information: Author's version deposited in accordance with the copyright policy of the publisher. © 2008 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purpose or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE.
Depositing User: Dr Hua Wang
Faculty / Department / School: Historic - Faculty of Sciences - Department of Maths and Computing
Date Deposited: 13 Jul 2009 13:03
Last Modified: 02 Jul 2013 23:07
Uncontrolled Keywords: microarray; cancer; classification; medical computing; decision trees
Fields of Research (FOR2008): 08 Information and Computing Sciences > 0801 Artificial Intelligence and Image Processing > 080109 Pattern Recognition and Data Mining
06 Biological Sciences > 0601 Biochemistry and Cell Biology > 060199 Biochemistry and Cell Biology not elsewhere classified
09 Engineering > 0906 Electrical and Electronic Engineering > 090602 Control Systems, Robotics and Automation
Socio-Economic Objective (SEO2008): E Expanding Knowledge > 97 Expanding Knowledge > 970106 Expanding Knowledge in the Biological Sciences
Identification Number or DOI: doi: 10.1109/ICMLC.2008.4620389
URI: http://eprints.usq.edu.au/id/eprint/4467
