Woolf, Rodney J. (2005) Data mining using Matlab. [USQ Project]
Data mining is a relatively new field emerging in many disciplines. It is becoming more
popular as technology advances and the need for efficient data analysis grows.
The aim of data mining is not to provide strict rules by analysing the full data
set; rather, data mining is used to predict with some certainty while analysing only a small
portion of the data. This project seeks to compare the efficiency of a decision tree
induction method with that of the neural network method.
MATLAB has inbuilt data mining toolboxes. However, the decision tree induction
method is not as yet implemented. Decision tree induction has been implemented in
several forms in the past. The greatest contribution to this method has been made by
Dr John Ross Quinlan, who has brought forward this method in the form of the ID3, C4.5
and C5 algorithms. The methodologies used within ID3 and C4.5 are well documented
and therefore provide a strong platform for the implementation of this method within
a higher-level language.
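As a rough illustration of the attribute-selection step in ID3 as it is commonly described (choosing the attribute with the highest information gain), the following minimal sketch is written in Python rather than MATLAB and is not the project's implementation. The function names, the attribute names `issue_a` and `issue_b`, and the four toy records are all hypothetical and invented for this example; they are not taken from the voting data set.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, labels, attribute):
    """Reduction in entropy obtained by splitting the rows on one attribute."""
    total = len(rows)
    # Group the class labels by the value each row takes for the attribute.
    partitions = {}
    for row, label in zip(rows, labels):
        partitions.setdefault(row[attribute], []).append(label)
    # Weighted average entropy of the partitions after the split.
    remainder = sum(len(part) / total * entropy(part) for part in partitions.values())
    return entropy(labels) - remainder


def best_attribute(rows, labels, attributes):
    """Attribute with the highest information gain (the ID3 split criterion)."""
    return max(attributes, key=lambda a: information_gain(rows, labels, a))


# Hypothetical toy records, invented for illustration only.
rows = [
    {"issue_a": "yes", "issue_b": "no"},
    {"issue_a": "yes", "issue_b": "yes"},
    {"issue_a": "no", "issue_b": "no"},
    {"issue_a": "no", "issue_b": "yes"},
]
labels = ["republican", "republican", "democratic", "democratic"]

root = best_attribute(rows, labels, ["issue_a", "issue_b"])
```

In this toy case `issue_a` separates the two classes perfectly, so it would be chosen as the root of the tree. A full ID3 implementation would then recurse on each partition, and C4.5 is usually described as replacing plain information gain with a gain ratio; neither step is shown here.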
The objectives of this study are to fully comprehend two methods of data mining,
namely decision tree induction and neural networks. The decision tree induction
method is to be implemented within the mathematical computer language MATLAB.
The results found when analysing some suitable data will be compared with the results
from the neural network toolbox already implemented in MATLAB.
The data used to compare and contrast the two methods included voting records from
the US House of Representatives, which consist of yes, no and undecided votes on sixteen
separate issues. The voters are grouped into categories according to their political
party, which is either Republican or Democratic. The objective of using this data
set is to predict what party a congressman is affiliated with by analysing their voting
The findings of this study reveal that the decision tree method can accurately predict
outcomes if an ideal data set is used for building the tree. The neural network method
is less accurate in some situations; however, it is more robust towards unexpected data.
|Item Type:||USQ Project|
|Item Status:||Live Archive|
|Additional Information:||Additional files (C4.5 data, Golf verification, Matlab files, Case studies, LaTex source) available on CD-ROM held in USQ library.|
|Faculty / Department / School:||Historic - Faculty of Engineering and Surveying - Department of Mechanical and Mechatronic Engineering|
|Date Deposited:||11 Oct 2007 00:13|
|Last Modified:||02 Jul 2013 22:30|
|Uncontrolled Keywords:||data mining, MATLAB, decision tree induction method, artificial neural networks|
|Fields of Research :||08 Information and Computing Sciences > 0801 Artificial Intelligence and Image Processing > 080108 Neural, Evolutionary and Fuzzy Computation
08 Information and Computing Sciences > 0803 Computer Software > 080309 Software Engineering
08 Information and Computing Sciences > 0807 Library and Information Studies > 080704 Information Retrieval and Web Search