Accurate and robust algorithms for microarray data classification

Hu, Hong (2008) Accurate and robust algorithms for microarray data classification. [Thesis (PhD/Research)] (Unpublished)

PDF (Introductory Pages)
Hu_2008_front.pdf

Download (106Kb)
PDF (Whole Thesis)
Hu_2008_whole.pdf

Download (1433Kb)

Abstract

Microarray data classification is used primarily to predict the class of unseen samples using a model built on existing, categorized microarray data. One major challenge is that microarray data contain a large number of genes but only a small number of samples. This high dimensionality has prevented many existing classification methods from dealing with this type of data directly. Moreover, the small number of samples aggravates overfitting, which in turn lowers classification accuracy. Another major challenge is the uncertain quality of microarray data. Microarray data contain various levels of noise, quite often high levels, and noisy data, together with the high dimensionality, lead to unreliable and inaccurate analysis. Most current classification methods are not robust enough to handle this type of data properly. Our research focuses on accuracy and on noise resistance, or robustness. Our approach is to design a robust classification method for microarray data. We propose an algorithm called diversified multiple decision trees (DMDT), which uses a set of unique trees in the decision committee. DMDT increases the diversity of ensemble committees by avoiding overlapping genes among alternative trees, and accuracy improves as a result. We also examine some strategies for eliminating noisy data. Because our method ensures that no genes overlap among the alternative trees in an ensemble committee, a noisy gene included in the committee can affect only one tree; the other trees are not affected at all. This design increases the robustness of microarray classification to noisy data, and therefore reduces the instability caused by overlapping genes in current ensemble methods.
The effectiveness of gene selection methods for improving the performance of microarray classification methods is also discussed. We conclude that the proposed method, DMDT, substantially outperforms other well-known ensemble methods, such as Bagging, Boosting and Random Forests, in terms of accuracy and robustness. DMDT is more tolerant to noise than Cascading-and-Sharing trees (CS4), particularly as the level of noise in the data increases. The results also indicate that some classification methods are insensitive to gene selection, while others depend on particular gene selection methods to improve their classification performance.
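
The central idea in the abstract, a committee whose trees share no genes, can be illustrated with a minimal sketch. This is not the thesis's implementation: it assumes one-level trees (decision stumps) in place of full decision trees, Gini impurity as the split criterion, and an unweighted majority vote. All function names and the toy expression values are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_stump(X, y, allowed_genes):
    """Find the single-gene threshold split with the lowest weighted Gini impurity."""
    best = None
    for g in allowed_genes:
        values = sorted(set(row[g] for row in X))
        # Candidate thresholds are midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2.0
            left = [label for row, label in zip(X, y) if row[g] <= t]
            right = [label for row, label in zip(X, y) if row[g] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                left_label = Counter(left).most_common(1)[0][0]
                right_label = Counter(right).most_common(1)[0][0]
                best = (score, g, t, left_label, right_label)
    return best


def build_disjoint_committee(X, y, n_trees):
    """Grow stumps one at a time, removing each used gene from the candidate pool."""
    available = set(range(len(X[0])))
    committee = []
    for _ in range(n_trees):
        found = best_stump(X, y, sorted(available))
        if found is None:  # no gene left that can split the samples
            break
        _, gene, threshold, left_label, right_label = found
        committee.append((gene, threshold, left_label, right_label))
        available.discard(gene)  # assumption: a used gene is never reused
    return committee


def predict(committee, row):
    """Unweighted majority vote over the committee (an assumption of this sketch)."""
    votes = Counter(
        left if row[gene] <= t else right for gene, t, left, right in committee
    )
    return votes.most_common(1)[0][0]


# Toy data: 6 samples x 4 genes (made-up expression values).
X = [
    [1.0, 5.0, 0.2, 9.0],
    [1.2, 5.5, 0.1, 8.5],
    [0.9, 4.8, 0.3, 9.2],
    [3.0, 1.0, 0.9, 2.0],
    [3.2, 1.2, 0.8, 2.5],
    [2.9, 0.9, 0.7, 1.8],
]
y = ["tumour", "tumour", "tumour", "normal", "normal", "normal"]

committee = build_disjoint_committee(X, y, n_trees=3)
used_genes = [gene for gene, _, _, _ in committee]

# A tumour-like sample whose gene 0 reading is corrupted by noise.
noisy_sample = [3.1, 5.0, 0.2, 9.0]
prediction = predict(committee, noisy_sample)
```

In this toy run the corrupted gene appears in only one stump, so it can change at most one vote, and the other two stumps still carry the majority. That is the robustness argument of the abstract in miniature. With full trees, the same bookkeeping would remove every gene used anywhere in a tree before the next tree is grown.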


Item Type: Thesis (PhD/Research)
Item Status: Live Archive
Additional Information: Doctor of Philosophy (PhD) thesis.
Depositing User: epEditor USQ
Faculty / Department / School: Historic - Faculty of Sciences - Department of Maths and Computing
Date Deposited: 25 Nov 2009 01:04
Last Modified: 02 Jul 2013 23:30
Uncontrolled Keywords: microarray data classification; accuracy; robustness; algorithms
Fields of Research (FOR2008): 08 Information and Computing Sciences > 0801 Artificial Intelligence and Image Processing > 080109 Pattern Recognition and Data Mining
URI: http://eprints.usq.edu.au/id/eprint/6221
