Hu, Hong (2008) Accurate and robust algorithms for microarray data classification. [Thesis (_PhD/Research)] (Unpublished)
Full text available as:
| PDF (Introductory Pages) | 106Kb |
| PDF (Whole Thesis) | 1433Kb |
Abstract
Microarray data classification is used primarily to predict unseen data using a model built on existing, categorized microarray data. One of the major challenges is that microarray data contains a large number of genes but only a small number of samples. This high dimensionality has prevented many existing classification methods from dealing with this type of data directly. Moreover, the small number of samples aggravates overfitting, which in turn lowers classification accuracy. Another major challenge is the uncertain quality of microarray data. Microarray data contains various levels of noise, quite often high levels, and this noise, together with the high dimensionality, leads to unreliable and inaccurate analysis. Most current classification methods are not robust enough to handle this type of data properly.
Our research focuses on accuracy and on noise resistance, or robustness. Our approach is to design a robust classification method for microarray data. We propose an algorithm called diversified multiple decision trees (DMDT), which uses a set of unique trees in the decision committee. By avoiding overlapping genes among alternative trees, DMDT increases the diversity of the ensemble committee and thereby improves accuracy. We also examine some strategies for eliminating noisy data. Because our method ensures that no genes overlap among alternative trees in an ensemble committee, a noisy gene included in the committee can affect only one tree; the other trees are not affected at all. This design increases the robustness of microarray classification to noisy data and reduces the instability caused by overlapping genes in current ensemble methods.
The effectiveness of gene selection methods for improving the performance of microarray classification methods is also discussed. We conclude that the proposed method, DMDT, substantially outperforms other well-known ensemble methods, such as Bagging, Boosting and Random Forests, in terms of accuracy and robustness. DMDT is more tolerant of noise than Cascading-and-Sharing trees (CS4), particularly as the level of noise in the data increases. The results also indicate that some classification methods are insensitive to gene selection, while others depend on particular gene selection methods to improve their classification performance.
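The non-overlapping-gene idea described in the abstract can be illustrated with a minimal sketch. This is not the thesis's DMDT implementation: as a simplifying assumption, each committee member here is a single-gene decision stump rather than a full decision tree, and the names `best_stump`, `build_committee` and `predict`, as well as the toy data, are hypothetical. What the sketch does preserve is the property the abstract emphasises: once a gene is used by one member it is removed from the pool, so a noisy gene can influence at most one vote.

```python
from collections import Counter

def best_stump(X, y, genes):
    """Find the single-gene threshold split with the fewest training errors.

    Returns (gene, threshold, left_label, right_label, errors), or None if
    no candidate gene has at least two distinct values.
    """
    best = None
    for g in genes:
        values = sorted({row[g] for row in X})
        # Candidate thresholds are midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [label for row, label in zip(X, y) if row[g] <= t]
            right = [label for row, label in zip(X, y) if row[g] > t]
            left_label = Counter(left).most_common(1)[0][0]
            right_label = Counter(right).most_common(1)[0][0]
            errors = (sum(l != left_label for l in left)
                      + sum(r != right_label for r in right))
            if best is None or errors < best[4]:
                best = (g, t, left_label, right_label, errors)
    return best

def build_committee(X, y, n_members):
    """Build a committee whose members never share a gene (assumed simplification)."""
    available = set(range(len(X[0])))
    committee = []
    for _ in range(n_members):
        stump = best_stump(X, y, sorted(available))
        if stump is None:
            break
        committee.append(stump)
        available.discard(stump[0])  # this gene is now off-limits to later members
    return committee

def predict(committee, row):
    """Classify one sample by majority vote over the committee."""
    votes = Counter(
        left if row[g] <= t else right
        for g, t, left, right, _ in committee
    )
    return votes.most_common(1)[0][0]

# Toy data (hypothetical): 4 samples, 4 genes; gene 3 carries no class signal.
X = [
    [1.0, 5.0, 0.2, 9.0],
    [1.2, 5.5, 0.1, 1.0],
    [3.0, 8.0, 0.9, 8.5],
    [3.1, 8.2, 0.8, 1.5],
]
y = ["normal", "normal", "tumour", "tumour"]

committee = build_committee(X, y, n_members=3)
used_genes = [member[0] for member in committee]

# A "normal"-like sample whose gene 0 reading has been corrupted by noise.
noisy_sample = [3.0, 5.2, 0.15, 4.0]
prediction = predict(committee, noisy_sample)
print(used_genes, prediction)
```

In this toy run the corrupted gene misleads only the one member built on it, and the remaining members outvote it. With overlapping genes, the same corrupted reading could affect several members at once.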
| Item Type: | Thesis (_PhD/Research) |
|---|---|
| Additional Information: | Doctor of Philosophy (PhD) thesis. |
| Uncontrolled Keywords: | microarray data classification; accuracy; robustness; algorithms |
| Fields of Research (FOR2008): | 08 Information and Computing Sciences > 0801 Artificial Intelligence and Image Processing > 080109 Pattern Recognition and Data Mining |
| Subjects: | 280000 Information, Computing and Communication Sciences > 280200 Artificial Intelligence and Signal and Image Processing > 280207 Pattern Recognition |
| Socio-Economic Objective (SEO2008): | UNSPECIFIED |
| ID Code: | 6221 |
| Deposited By: | |
| Deposited On: | 25 Nov 2009 11:04 |
| Last Modified: | 02 Dec 2011 14:44 |
