Association between work-related features and coronary artery disease: a heterogeneous hybrid feature selection integrated with balancing approach

Nasarian, Elham and Abdar, Moloud and Fahami, Mohammad Amin and Alizadehsani, Roohallah and Hussain, Sadiq and Basiri, Mohammad Ehsan and Zomorodi-Moghadam, Mariam and Zhou, Xujuan and Plawiak, Pawel and Acharya, U. Rajendra and Tan, Ru-San and Sarrafzadegan, Nizal (2020) Association between work-related features and coronary artery disease: a heterogeneous hybrid feature selection integrated with balancing approach. Pattern Recognition Letters, 133. pp. 33-40. ISSN 0167-8655


Abstract

Coronary artery disease (CAD) is a leading cause of death worldwide and is associated with high healthcare expenditure. Researchers are motivated to apply machine learning (ML) for quick and accurate detection of CAD. The performance of such automated systems depends on the quality of the features used. Clinical CAD datasets contain different features with varying degrees of association with CAD. To select the relevant features, we developed a novel hybrid feature selection algorithm called heterogeneous hybrid feature selection (2HFS). In this work, we used the Nasarian CAD dataset, in which workplace and environmental features are considered in addition to other clinical features. The synthetic minority over-sampling technique (SMOTE) and adaptive synthetic sampling (ADASYN) are used to handle the imbalance in the dataset. The 2HFS-selected features are then input into four classifiers: decision tree (DT), Gaussian naive Bayes (GNB), random forest (RF), and XGBoost. Our results show that the proposed feature selection method yielded a classification accuracy of 81.23% with SMOTE and the XGBoost classifier. We also tested our approach on other well-known CAD datasets, obtaining accuracies of 83.94%, 81.58%, and 92.58% for the Hungarian, Long-beach-va, and Z-Alizadeh Sani datasets, respectively. Hence, our experimental results confirm the effectiveness of the proposed feature selection algorithm compared with existing state-of-the-art techniques for the development of automated CAD systems.
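
The balancing step named in the abstract can be illustrated with a minimal sketch of the interpolation idea behind SMOTE: a synthetic minority sample is placed at a random position on the line segment between a minority sample and one of its nearest minority-class neighbours. This is not the authors' implementation and does not reproduce 2HFS or any reported result. The function name `smote_sketch`, its parameters, and the toy data are hypothetical and chosen only for illustration.

```python
import math
import random


def smote_sketch(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between minority samples.

    Hypothetical helper for illustration only; assumes at least two
    minority samples, each given as a tuple of numeric features.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # The k nearest minority neighbours of the base point (excluding itself).
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Toy, made-up minority-class feature vectors (not taken from any CAD dataset).
minority = [(1.0, 2.0), (1.5, 1.8), (2.0, 2.2), (1.2, 2.5)]
new_points = smote_sketch(minority, n_new=4)
balanced_minority = minority + new_points
print(len(balanced_minority))
```

In practice, a maintained library implementation would normally be preferred over a hand-written sketch like this, and oversampling would be applied only to the training split so that synthetic samples do not leak into the evaluation data.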


Item Type: Article (Commonwealth Reporting Category C)
Refereed: Yes
Item Status: Live Archive
Additional Information: Permanent restricted access to Published version in accordance with the copyright policy of the publisher.
Faculty/School / Institute/Centre: Current - Faculty of Business, Education, Law and Arts - School of Management and Enterprise (1 July 2013 -)
Date Deposited: 11 Mar 2020 02:39
Last Modified: 08 May 2020 02:48
Uncontrolled Keywords: machine learning; data mining; heart disease; coronary artery disease; feature selection
Fields of Research (2008): 08 Information and Computing Sciences > 0801 Artificial Intelligence and Image Processing > 080109 Pattern Recognition and Data Mining
Socio-Economic Objectives (2008): E Expanding Knowledge > 97 Expanding Knowledge > 970108 Expanding Knowledge in the Information and Computing Sciences
Identification Number or DOI: 10.1016/j.patrec.2020.02.010
URI: http://eprints.usq.edu.au/id/eprint/38438
