Pub. online:1 Jan 2018Type:Research ArticleOpen Access
Journal:Informatica
Volume 29, Issue 4 (2018), pp. 675–692
Abstract
The main purpose of this article was to compare traditional binary logistic regression analysis with decision tree analysis for the evaluation of the risk of cardiovascular diseases in adult men living in the city. Patients and methods. In our study, we used data from the Multifactorial Ischemic Heart Disease Prevention Study (MIHDPS). In the MIHDPS study, a random sample of male inhabitants of Kaunas city (Lithuania) aged 40–59 years was examined between 1977 and 1980. We analysed a sample of 5626 men. Taking blood pressure lowering medicine, disability, intermittent claudication, regular smoking, a higher value of the body mass index, systolic blood pressure, age, total serum cholesterol, and walking in winter were associated with a higher probability of ischemic heart disease or cardiovascular diseases. Having more siblings and drinking alcohol were associated with a lower probability of these diseases. The binary logistic regression method showed a very slightly lower level of errors than the decision tree did (the difference between the two methods was 2.04% for ischemic heart disease (IHD) and 2.86% for cardiovascular disease (CVD), but for consumers, the decision tree is easier to understand and interpret the results. Both of these methods are appropriate to analyse cardiovascular disease data.
Journal:Informatica
Volume 20, Issue 1 (2009), pp. 35–50
Abstract
We tested the ability of humans and machines (data mining techniques) to assign stress to Slovene words. This is a challenging comparison for machines since humans accomplish the task outstandingly even on unknown words without any context. The goal of finding good machine-made models for stress assignment was set by applying new methods and by making use of a known theory about rules for stress assignment in Slovene. The upgraded data mining methods outperformed expert-defined rules on practically all subtasks, thus showing that data mining can more than compete with humans when constructing formal knowledge about stress assignment is concerned. Unfortunately, compared to humans directly, the data mining methods still failed to achieve as good results as humans on assigning stress to unknown words.
Journal:Informatica
Volume 19, Issue 1 (2008), pp. 101–112
Abstract
This paper studies an adaptive clustering problem. We focus on re-clustering an object set, previously clustered, when the feature set characterizing the objects increases. We propose an adaptive clustering method based on a hierarchical agglomerative approach, Hierarchical Adaptive Clustering (HAC), that adjusts the partitioning into clusters that was established by applying the hierarchical agglomerative clustering algorithm (HACA) (Han and Kamber, 2001) before the feature set changed. We aim to reach the result more efficiently than running HACA again from scratch on the feature-extended object set. Experiments testing the method's efficiency and a practical distributed systems problem in which the HAC method can be efficiently used (the problem of adaptive horizontal fragmentation in object oriented databases) are also reported.
Journal:Informatica
Volume 14, Issue 3 (2003), pp. 277–288
Abstract
In the paper, we present an algorithm that can be applied to protect data before a data mining process takes place. The data mining, a part of the knowledge discovery process, is mainly about building models from data. We address the following question: can we protect the data and still allow the data modelling process to take place? We consider the case where the distributions of original data values are preserved while the values themselves change, so that the resulting model is equivalent to the one built with original data. The presented formal approach is especially useful when the knowledge discovery process is outsourced. The application of the algorithm is demonstrated through an example.
Journal:Informatica
Volume 13, Issue 4 (2002), pp. 455–464
Abstract
Application of knowledge discovery in databases (data mining) for medical decision support is discussed in this work. The aim of the study was to use decision support algorithm for the differential diagnosis of intraocular tumors using parameters from eye images obtained by the ultrasound examination. Application of predictive modeling algorithm for decision tree formation using See5.0/C5.0 data mining system is presented. The decision tree was build using tumor geometry and microstructure parameters. The use of decision tree allows to differentiate tumors from other tumor-like formations. Low percentage of diagnostic errors shows that decision tree is reliable enough to offer it as “second opinion” for physician's decision support.