Data mining using Matlab

Woolf, Rodney J. (2005) Data mining using Matlab. [USQ Project] (Unpublished)


Full text available as:

PDF - Requires a PDF viewer such as GSview, Xpdf or Adobe Acrobat Reader
805Kb

Abstract

Data mining is a relatively new field that is emerging in many disciplines. It is becoming more popular as technology advances and the need for efficient data analysis grows. The aim of data mining is not to derive strict rules by analysing the full data set; rather, it is to predict with some certainty while analysing only a small portion of the data.

This project seeks to compare the efficiency of a decision tree induction method with that of the neural network method. MATLAB has built-in data mining toolboxes; however, the decision tree induction method is not yet implemented in them. Decision tree induction has been implemented in several forms in the past. The greatest contribution to this method has been made by Dr John Ross Quinlan, who brought it forward in the form of the ID3, C4.5 and C5 algorithms. The methodologies used within ID3 and C4.5 are well documented and therefore provide a strong platform for implementing the method in a higher-level language.

The objectives of this study are to fully comprehend two methods of data mining, namely decision tree induction and neural networks, and to implement the decision tree induction method in the mathematical computing language MATLAB. The results obtained from analysing suitable data are compared with the results from the neural network toolbox already available in MATLAB. The data used to compare and contrast the two methods included voting records from the US House of Representatives, which consist of yes, no and undecided votes on sixteen separate issues. The voters are grouped according to their political party, either Republican or Democratic. The objective of using this data set is to predict which party a congressman is affiliated with by analysing their voting trends. The findings of this study reveal that the decision tree method can accurately predict outcomes if an ideal data set is used for building the tree. The neural network method is less accurate in some situations; however, it is more robust to unexpected data.
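
As a rough illustration of the decision tree induction idea referred to in the abstract, the sketch below shows the information-gain criterion that ID3 uses to choose which attribute to split on. It is written in Python rather than MATLAB and is not the project's implementation. The function names, the attribute names (`issue_a`, `issue_b`) and the six voting records are hypothetical and were invented for this example.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, labels, attribute):
    """Reduction in entropy obtained by splitting the rows on one attribute."""
    total = len(labels)
    remainder = 0.0
    for value in set(row[attribute] for row in rows):
        # Class labels of the records that take this value for the attribute.
        subset = [lab for row, lab in zip(rows, labels) if row[attribute] == value]
        remainder += (len(subset) / total) * entropy(subset)
    return entropy(labels) - remainder


def best_attribute(rows, labels):
    """Attribute with the highest information gain (the ID3 split choice)."""
    return max(rows[0].keys(), key=lambda a: information_gain(rows, labels, a))


# Hypothetical toy records: "y" = yes, "n" = no, "?" = undecided.
rows = [
    {"issue_a": "y", "issue_b": "n"},
    {"issue_a": "y", "issue_b": "y"},
    {"issue_a": "n", "issue_b": "n"},
    {"issue_a": "n", "issue_b": "y"},
    {"issue_a": "?", "issue_b": "y"},
    {"issue_a": "y", "issue_b": "n"},
]
labels = ["democrat", "democrat", "republican", "republican", "republican", "democrat"]

print(best_attribute(rows, labels))
```

In this toy data the votes on `issue_a` separate the two parties completely, so it has the larger gain and would become the root of the tree. A full ID3 implementation would then recurse on each branch with the remaining attributes. C4.5 refines the criterion by using the gain ratio instead of the raw gain.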

Item Type: USQ Project
Additional Information: Additional files (C4.5 data, Golf verification, Matlab files, Case studies, LaTeX source) available on CD-ROM held in USQ library.
Uncontrolled Keywords: data mining, MATLAB, decision tree induction method, artificial neural networks
Fields of Research (FOR2008): 08 Information and Computing Sciences > 0801 Artificial Intelligence and Image Processing > 080108 Neural, Evolutionary and Fuzzy Computation
08 Information and Computing Sciences > 0803 Computer Software > 080309 Software Engineering
08 Information and Computing Sciences > 0807 Library and Information Studies > 080704 Information Retrieval and Web Search
Subjects: 280000 Information, Computing and Communication Sciences > 280100 Information Systems > 280103 Information Storage, Retrieval and Management
280000 Information, Computing and Communication Sciences > 280300 Computer Software > 280302 Software Engineering
280000 Information, Computing and Communication Sciences > 280200 Artificial Intelligence and Signal and Image Processing > 280212 Neural Networks, Genetic Algorithms and Fuzzy Logic
Socio-Economic Objective (SEO2008): UNSPECIFIED
ID Code: 58
Deposited By:
Deposited On: 11 Oct 2007 10:13
Last Modified: 26 Feb 2009 09:15
