Informatica


A Comparison of Decision Tree Induction with Binary Logistic Regression for the Prediction of the Risk of Cardiovascular Diseases in Adult Men
Volume 29, Issue 4 (2018), pp. 675–692
Ingrida Grabauskytė   Abdonas Tamošiūnas   Mindaugas Kavaliauskas   Ričardas Radišauskas   Gailutė Bernotienė   Vytautas Janilionis  

https://doi.org/10.15388/Informatica.2018.187
Pub. online: 1 January 2018      Type: Research Article      Open Access

Received: 1 June 2017
Accepted: 1 July 2018
Published: 1 January 2018

Abstract

The main purpose of this article was to compare traditional binary logistic regression analysis with decision tree analysis for evaluating the risk of cardiovascular diseases in adult men living in a city. Patients and methods: we used data from the Multifactorial Ischemic Heart Disease Prevention Study (MIHDPS), in which a random sample of male inhabitants of Kaunas city (Lithuania) aged 40–59 years was examined between 1977 and 1980. We analysed a sample of 5626 men. Taking blood pressure lowering medicine, disability, intermittent claudication, regular smoking, higher values of body mass index, systolic blood pressure, age, and total serum cholesterol, and walking in winter were associated with a higher probability of ischemic heart disease or cardiovascular diseases. Having more siblings and drinking alcohol were associated with a lower probability of these diseases. Binary logistic regression showed a slightly lower error rate than the decision tree (the difference between the two methods was 2.04% for ischemic heart disease (IHD) and 2.86% for cardiovascular disease (CVD)), but for end users the results of a decision tree are easier to understand and interpret. Both methods are appropriate for analysing cardiovascular disease data.
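As a rough illustration of the kind of comparison the abstract describes, the sketch below estimates the cross-validated misclassification rate of a logistic regression and of a decision tree on the same data, then reports the gap between them. It is not the authors' analysis. The data are synthetic, the single predictor `x` is a hypothetical standardized risk score, the tree is reduced to a one-split stump, the logistic model is fitted by plain gradient descent, and all function names are invented for this example.

```python
import math
import random


def misclassification_rate(y_true, y_pred):
    """Share of observations whose predicted class differs from the observed one."""
    return sum(t != p for t, p in zip(y_true, y_pred)) / len(y_true)


def fit_logistic(xs, ys, lr=0.1, epochs=300):
    """One-predictor logistic regression fitted by batch gradient descent."""
    b0 = b1 = 0.0
    n = len(xs)
    for _ in range(epochs):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            g0 += p - y
            g1 += (p - y) * x
        b0 -= lr * g0 / n
        b1 -= lr * g1 / n
    # Classify as 1 when the fitted probability exceeds 0.5.
    return lambda x: int(b0 + b1 * x > 0)


def fit_stump(xs, ys):
    """Depth-one decision tree: choose the split with the fewest training errors."""
    best = None
    for t in sorted(set(xs)):
        for hi in (0, 1):  # class assigned to the branch x > t
            errs = sum((hi if x > t else 1 - hi) != y for x, y in zip(xs, ys))
            if best is None or errs < best[0]:
                best = (errs, t, hi)
    _, t, hi = best
    return lambda x: hi if x > t else 1 - hi


def cv_error(fit, xs, ys, k=5):
    """k-fold cross-validated misclassification rate, pooled over all folds."""
    n = len(xs)
    truth, preds = [], []
    for fold in range(k):
        test_idx = set(range(fold, n, k))
        train_x = [xs[i] for i in range(n) if i not in test_idx]
        train_y = [ys[i] for i in range(n) if i not in test_idx]
        model = fit(train_x, train_y)
        for i in test_idx:
            truth.append(ys[i])
            preds.append(model(xs[i]))
    return misclassification_rate(truth, preds)


# Synthetic data (an assumption for illustration, not the MIHDPS sample).
random.seed(0)
xs = [random.gauss(0, 1) for _ in range(200)]
ys = [int(random.random() < 1 / (1 + math.exp(-2 * x))) for x in xs]

err_lr = cv_error(fit_logistic, xs, ys)
err_tree = cv_error(fit_stump, xs, ys)
print(f"logistic regression CV error: {err_lr:.3f}")
print(f"decision stump CV error:      {err_tree:.3f}")
print(f"difference: {abs(err_lr - err_tree) * 100:.2f} percentage points")
```

Evaluating both models on identical folds keeps the comparison fair, since any difference in error then reflects the models and not the split of the data.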


Biographies

Grabauskytė Ingrida
ingrida.grabauskyte@lsmuni.lt

I. Grabauskytė is a PhD student at the Department of Population Studies, Institute of Cardiology, Lithuanian University of Health Sciences. She is a lecturer in biostatistics at the university. Her current research focuses on statistics and medical data analysis.

Tamošiūnas Abdonas
abdonas.tamosiunas@lsmuni.lt

A. Tamošiūnas, Prof. Dr. Habil., is head of laboratory and head researcher in the Department of Population Studies, Institute of Cardiology, Lithuanian University of Health Sciences. His field of research is the epidemiology and primary prevention of cardiovascular disease.

Kavaliauskas Mindaugas
m.kavaliauskas@ktu.lt

Dr. M. Kavaliauskas is a lecturer at Kaunas University of Technology. He lectures on mathematical statistics, time series analysis and data mining. His field of research is methods of multivariate data analysis.

Radišauskas Ričardas
ricardas.radisauskas@lsmuni.lt

R. Radišauskas, Prof. Dr., is a senior researcher in the Department of Population Studies, Institute of Cardiology, Lithuanian University of Health Sciences. His field of research is the epidemiology and primary prevention of cardiovascular disease.

Bernotienė Gailutė
gailute.bernotiene@lsmuni.lt

G. Bernotienė, Assoc. Prof. Dr., is a senior researcher in the Department of Population Studies, Institute of Cardiology, Lithuanian University of Health Sciences. Her field of research is the epidemiology and primary prevention of cardiovascular disease.

Janilionis Vytautas
vytautas.janilionis@ktu.lt

V. Janilionis is an associate professor at the Department of Applied Mathematics, Kaunas University of Technology. He received a PhD degree (Technical cybernetics and information theory) in 1989 from the Kaunas Polytechnic Institute, Lithuania. His major research interests include statistical data analysis, system modelling, identification and control, data mining methods and applications.



Copyright
© 2018 Vilnius University
Open access article under the CC BY license.

Keywords
logistic regression, decision tree, ischemic heart disease, cardiovascular disease

INFORMATICA

  • Online ISSN: 1822-8844
  • Print ISSN: 0868-4952