Testing the Probability of Heart Disease Using Classification and Regression Tree Model
Mohammad Subhi Al-batah *
Department of Computer Science and Software Engineering, Faculty of Science and Information Technology, Jadara University, P.O. Box: 733, Irbid 21110, Jordan.
*Author to whom correspondence should be addressed.
Abstract
The objective of this study is to predict the presence of heart disease from a reduced number of attributes using data mining techniques. The term heart disease covers the diverse diseases that affect the heart. Detecting heart disease from multiple factors is a problem that is prone to false presumptions, often accompanied by unpredictable effects. Researchers have applied several data mining techniques to help health care professionals diagnose heart disease. In this work, a Classification and Regression Tree (CRT) model is proposed to determine which attributes contribute most to the diagnosis of heart ailments, which may in turn reduce the number of tests a patient needs to undergo. The dataset consists of 270 cases. Originally, thirteen attributes were involved (age, sex, chest pain type, resting blood pressure, serum cholesterol, fasting blood sugar, resting electrocardiographic results, maximum heart rate, exercise-induced angina, ST depression induced by exercise relative to rest, slope, number of vessels colored by fluoroscopy, and exercise thallium scintigraphic defects). The performance of the model is evaluated by calculating its sensitivity, specificity, and accuracy, and a comparison with other data mining techniques is presented. The simulation results obtained from the model make it possible to establish significant patterns and relationships between the medical factors and heart disease.
Keywords: Modeling, Testing, Decision tree, Heart disease, Feature reduction, Classification and regression tree.
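The three evaluation measures named in the abstract can be illustrated with a minimal sketch, assuming the standard confusion-matrix definitions (sensitivity = TP / (TP + FN), specificity = TN / (TN + FP), accuracy = (TP + TN) / total). The function name diagnostic_metrics and the counts passed to it are hypothetical and chosen only for illustration; they are not results from this study.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Return (sensitivity, specificity, accuracy) from confusion-matrix counts.

    tp: diseased cases correctly predicted as diseased
    fn: diseased cases predicted as healthy
    tn: healthy cases correctly predicted as healthy
    fp: healthy cases predicted as diseased
    """
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes need at least one case")
    sensitivity = tp / (tp + fn)                 # true positive rate
    specificity = tn / (tn + fp)                 # true negative rate
    accuracy = (tp + tn) / (tp + fn + tn + fp)   # overall fraction correct
    return sensitivity, specificity, accuracy


# Hypothetical counts for illustration only, not taken from the paper.
sens, spec, acc = diagnostic_metrics(tp=40, fn=10, tn=45, fp=5)
print(f"sensitivity={sens:.2f} specificity={spec:.2f} accuracy={acc:.2f}")
```

Reporting sensitivity and specificity alongside accuracy matters in a diagnostic setting because a missed case of disease (false negative) and a false alarm (false positive) carry different costs, which accuracy alone does not separate.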