Hu, Hong (2008) Accurate and robust algorithms for microarray data classification. [Thesis (PhD/Research)] (Unpublished)
[Abstract] Microarray data classification is used primarily to predict unseen data using a model built on existing, categorized Microarray data. One major challenge is that Microarray data contains a large number of genes but only a small number of samples. This high dimensionality has prevented many existing classification methods from dealing with this type of data directly. Moreover, the small number of samples increases the risk of overfitting, which in turn lowers classification accuracy. Another major challenge is the uncertain quality of Microarray data. Microarray data contains various levels of noise, quite often high levels, which, together with the high dimensionality, leads to unreliable and low-accuracy analysis. Most current classification methods are not robust enough to handle this type of data properly.
Our research focuses on accuracy and on noise resistance, or robustness. Our approach is to design a robust classification method for Microarray data.
We propose an algorithm called diversified multiple decision trees (DMDT), which uses a set of unique trees in the decision committee. By avoiding overlapping genes among alternative trees, DMDT increases the diversity of the ensemble committee and thereby improves accuracy.
We also examine strategies for eliminating noisy data. Because our method ensures that no genes overlap among the alternative trees in an ensemble committee, a noisy gene included in the committee can affect only one tree; the other trees are not affected at all. This design makes Microarray classification more resistant to noisy data and reduces the instability caused by overlapping genes in current ensemble methods.
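The disjoint-gene idea can be illustrated with a minimal Python sketch. It is not the thesis implementation: the abstract does not specify how DMDT grows its trees, so the sketch assumes small Gini-based trees built one after another, with every gene used by a tree removed from the pool before the next tree is grown, and a simple majority vote. All function names (`build_committee`, `genes_used`, `predict`) and the toy data are hypothetical, and class labels are assumed to be strings or integers.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a non-empty list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(X, y, genes):
    """Return the (gene, threshold) that most reduces weighted Gini, or None."""
    best, best_score = None, gini(y)
    for g in genes:
        values = sorted({row[g] for row in X})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2.0
            left = [lab for row, lab in zip(X, y) if row[g] <= t]
            right = [lab for row, lab in zip(X, y) if row[g] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if score < best_score:
                best, best_score = (g, t), score
    return best

def build_tree(X, y, genes, depth):
    """Grow a small tree: internal nodes are tuples, leaves are class labels."""
    split = best_split(X, y, genes) if depth > 0 else None
    if split is None:
        return Counter(y).most_common(1)[0][0]
    g, t = split
    left = [(r, lab) for r, lab in zip(X, y) if r[g] <= t]
    right = [(r, lab) for r, lab in zip(X, y) if r[g] > t]
    return (g, t,
            build_tree([r for r, _ in left], [lab for _, lab in left], genes, depth - 1),
            build_tree([r for r, _ in right], [lab for _, lab in right], genes, depth - 1))

def genes_used(tree):
    """Set of gene indices that appear in the tree's split nodes."""
    if not isinstance(tree, tuple):
        return set()
    g, _, left, right = tree
    return {g} | genes_used(left) | genes_used(right)

def predict_tree(tree, row):
    """Follow splits until a leaf label is reached."""
    while isinstance(tree, tuple):
        g, t, left, right = tree
        tree = left if row[g] <= t else right
    return tree

def build_committee(X, y, n_trees=3, depth=2):
    """Assumed scheme: each new tree may only use genes no earlier tree used."""
    available = set(range(len(X[0])))
    committee = []
    for _ in range(n_trees):
        tree = build_tree(X, y, sorted(available), depth)
        used = genes_used(tree)
        if not used:          # no useful split left among the remaining genes
            break
        committee.append(tree)
        available -= used     # enforce non-overlapping genes across trees
    return committee

def predict(committee, row):
    """Majority vote over a non-empty committee."""
    votes = Counter(predict_tree(t, row) for t in committee)
    return votes.most_common(1)[0][0]

# Made-up toy data: 6 samples, 4 genes (columns); not taken from the thesis.
X = [[1.0, 5.0, 0.2, 3.0], [1.2, 5.5, 0.1, 1.0], [0.9, 5.2, 0.3, 2.0],
     [3.0, 8.0, 0.9, 2.5], [3.2, 8.3, 0.8, 1.5], [2.9, 7.9, 1.0, 3.5]]
y = ["normal", "normal", "normal", "tumour", "tumour", "tumour"]

committee = build_committee(X, y, n_trees=3, depth=2)
noisy_sample = [3.1, 5.1, 0.2, 2.0]   # gene 0 corrupted, genes 1 and 2 intact
print(predict(committee, noisy_sample))
```

In this toy setting each tree ends up splitting on a different gene, so corrupting a single gene in a sample can change the vote of at most one tree, which is the robustness argument made above.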
The effectiveness of gene selection methods for improving the performance of Microarray classification methods is also discussed.
We conclude that the proposed DMDT method substantially outperforms other well-known ensemble methods, such as Bagging, Boosting and Random Forests, in terms of accuracy and robustness. DMDT is more tolerant to noise than Cascading-and-Sharing trees (CS4), particularly as the level of noise in the data increases. The results also indicate that some classification methods are insensitive to gene selection, while others depend on particular gene selection methods to improve their classification performance.
|Item Type:||Thesis (PhD/Research)|
|Item Status:||Live Archive|
|Additional Information (displayed to public):||Doctor of Philosophy (PhD) thesis.|
|Depositing User:||epEditor USQ|
|Faculty / Department / School:||Historic - Faculty of Sciences - Department of Maths and Computing|
|Date Deposited:||25 Nov 2009 01:04|
|Last Modified:||02 Jul 2013 23:30|
|Uncontrolled Keywords:||microarray data classification; accuracy; robustness; algorithms|
|Fields of Research (FoR):||08 Information and Computing Sciences > 0801 Artificial Intelligence and Image Processing > 080109 Pattern Recognition and Data Mining|