Journal:Informatica
Volume 20, Issue 1 (2009), pp. 35–50
Abstract
We tested the ability of humans and machines (data mining techniques) to assign stress to Slovene words. This is a challenging comparison for machines, since humans perform the task outstandingly well even on unknown words presented without any context. Our goal was to find good machine-made models for stress assignment by applying new methods and by making use of a known theory of the rules for stress assignment in Slovene. The upgraded data mining methods outperformed expert-defined rules on practically all subtasks, showing that data mining can more than compete with humans as far as constructing formal knowledge about stress assignment is concerned. However, in a direct comparison with humans, the data mining methods still did not achieve results as good as those of humans when assigning stress to unknown words.
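The abstract does not describe the features or learners used, so the following is only a minimal sketch of how stress assignment can be framed as a classification task: the label is which vowel, counted from the end of the word, carries stress. The `SuffixStressModel` class, the "longest known ending" heuristic, and the treatment of only `a e i o u` as syllable nuclei (syllabic r is ignored) are all hypothetical simplifications, and the training pairs are made-up strings, not Slovene data.

```python
from collections import Counter, defaultdict

# Simplification (assumption): only a, e, i, o, u count as syllable nuclei.
VOWELS = set("aeiou")


def vowel_positions(word):
    """Indices of vowel letters, used as a crude stand-in for syllable nuclei."""
    return [i for i, ch in enumerate(word.lower()) if ch in VOWELS]


def stress_label(word, stressed_index):
    """Class label: which vowel from the end is stressed (1 = last, 2 = penultimate, ...).

    Raises ValueError if stressed_index is not the index of a vowel letter.
    """
    positions = vowel_positions(word)
    return len(positions) - positions.index(stressed_index)


class SuffixStressModel:
    """Hypothetical learner: predict the label seen most often with the longest known ending."""

    def __init__(self, max_suffix=4):
        self.max_suffix = max_suffix
        self.by_suffix = defaultdict(Counter)  # word ending -> counts of stress labels
        self.overall = Counter()  # fallback: label counts over all training words

    def fit(self, samples):
        """samples: iterable of (word, index of the stressed vowel letter) pairs."""
        for word, stressed_index in samples:
            word = word.lower()
            label = stress_label(word, stressed_index)
            self.overall[label] += 1
            for k in range(1, min(self.max_suffix, len(word)) + 1):
                self.by_suffix[word[-k:]][label] += 1
        return self

    def predict(self, word):
        """Return the index of the predicted stressed vowel, or None if the word has no vowel."""
        word = word.lower()
        positions = vowel_positions(word)
        if not positions:
            return None
        for k in range(min(self.max_suffix, len(word)), 0, -1):
            counts = self.by_suffix.get(word[-k:])
            if counts:
                label = counts.most_common(1)[0][0]
                break
        else:  # no known ending: fall back to the most frequent label overall
            label = self.overall.most_common(1)[0][0]
        label = min(label, len(positions))  # cannot stress a vowel the word does not have
        return positions[len(positions) - label]


# Synthetic, made-up training pairs (not real Slovene words or stress data).
model = SuffixStressModel().fit([("abaka", 2), ("tabaka", 3), ("mama", 1), ("piro", 3)])
print(model.predict("zabaka"), model.predict("miro"), model.predict("ulu"))  # 3 3 0
```

Unknown words are handled by backing off to shorter endings and finally to the majority label, which is one simple way a learned model can generalise to words it has never seen.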
Journal:Informatica
Volume 19, Issue 1 (2008), pp. 101–112
Abstract
This paper studies an adaptive clustering problem. We focus on re-clustering a previously clustered object set when the feature set characterizing the objects is extended. We propose an adaptive clustering method based on a hierarchical agglomerative approach, Hierarchical Adaptive Clustering (HAC), which adjusts the partition into clusters that was established, before the feature set changed, by the hierarchical agglomerative clustering algorithm (HACA) (Han and Kamber, 2001). We aim to reach the result more efficiently than by running HACA again from scratch on the feature-extended object set. We also report experiments testing the method's efficiency, together with a practical distributed systems problem in which HAC can be used efficiently: adaptive horizontal fragmentation in object-oriented databases.
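HAC is measured against re-running HACA from scratch. The sketch below shows only that from-scratch baseline, as a generic naive agglomerative algorithm with Euclidean distance; the linkage options and the example objects are assumptions, and the HAC adjustment step itself is not reproduced because the abstract does not detail it.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def agglomerative(points, n_clusters, linkage="single"):
    """Naive bottom-up clustering: start from singletons and repeatedly merge the closest pair."""
    clusters = [[i] for i in range(len(points))]

    def cluster_distance(c1, c2):
        dists = [euclidean(points[i], points[j]) for i in c1 for j in c2]
        if linkage == "single":
            return min(dists)
        if linkage == "complete":
            return max(dists)
        return sum(dists) / len(dists)  # any other value: average linkage

    while len(clusters) > n_clusters:
        best = None  # (distance, index_a, index_b) of the closest pair found so far
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = cluster_distance(clusters[a], clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is still valid
    return sorted(sorted(c) for c in clusters)


# Hypothetical objects described by one feature, then re-clustered after a second feature is added.
before = [(0.0,), (0.1,), (5.0,), (5.1,)]
after = [(0.0, 0.0), (0.1, 9.0), (5.0, 0.2), (5.1, 9.1)]
print(agglomerative(before, 2))  # [[0, 1], [2, 3]]
print(agglomerative(after, 2))   # [[0, 2], [1, 3]]
```

Adding a feature can change the partition completely, and the from-scratch run repeats every pairwise distance computation and merge; that repeated cost is what an adaptive method seeks to avoid.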
Journal:Informatica
Volume 14, Issue 3 (2003), pp. 277–288
Abstract
In this paper, we present an algorithm that can be applied to protect data before a data mining process takes place. Data mining, a part of the knowledge discovery process, is mainly about building models from data. We address the following question: can we protect the data and still allow the data modelling process to take place? We consider the case where the distributions of the original data values are preserved while the values themselves change, so that the resulting model is equivalent to the one built from the original data. The presented formal approach is especially useful when the knowledge discovery process is outsourced. The application of the algorithm is demonstrated through an example.
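The abstract does not spell out the algorithm, so the following is only an illustration of the stated property using a swapped-in technique: a secret one-to-one relabelling of a categorical attribute. The function names and the example data are hypothetical. Because (value, class) counts survive the relabelling, a count-based learner (for example an ID3-style decision tree or naive Bayes on categorical attributes) builds the same model up to renaming, and the data owner can translate it back with the inverse mapping.

```python
import random
from collections import Counter


def make_secret_mapping(values, seed):
    """Hypothetical protection key: a random one-to-one relabelling of the distinct values."""
    distinct = sorted(set(values))
    random.Random(seed).shuffle(distinct)
    return {value: f"v{k}" for k, value in enumerate(distinct)}


def protect_column(column, mapping):
    """Replace every value by its code; the data owner keeps `mapping` secret."""
    return [mapping[v] for v in column]


def class_counts(column, labels):
    """(value, class) co-occurrence counts, all a count-based learner uses from this attribute."""
    return Counter(zip(column, labels))


colour = ["red", "blue", "red", "green", "red", "blue"]
labels = ["yes", "no", "yes", "no", "no", "yes"]
mapping = make_secret_mapping(colour, seed=42)
protected = protect_column(colour, mapping)
inverse = {code: value for value, code in mapping.items()}  # translates the model back

# The frequency profile is unchanged; only the value names differ.
print(sorted(Counter(colour).values()) == sorted(Counter(protected).values()))  # True
```

Since the frequencies are preserved exactly, relabelling alone does not hide information from someone who already knows the value distribution; it only illustrates why a model built on the transformed data can be equivalent.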
Journal:Informatica
Volume 12, Issue 2 (2001), pp. 239–262
Abstract
The paper deals with the analysis of Research and Technology Development (RTD) in the Central European countries and the relation of RTD to the economic and social parameters of countries in this region. A methodology has been developed for the quantitative and qualitative ranking of multidimensional objects and for estimating the relationships among them on the basis of such analysis. Knowledge was discovered in four databases: two European Commission (EC) databases containing data on RTD activities, and the databases of the US CIA and the World Bank containing economic and social data. Data mining was performed by means of visual cluster analysis (using non-linear Sammon's mapping and Kohonen's artificial neural network, the self-organising map), regression analysis, and non-linear ranking (using graphs of domination). Results are obtained on the clustering of the Central European countries and on the relations between RTD parameters and economic and social parameters. In addition, the data served to test various features of the realisation of the self-organising map. Integrating non-classical methods (the self-organising map and graphs of domination) with classical ones (regression analysis and Sammon's mapping) increases the capacity of visual analysis and allows more complete conclusions to be drawn.
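The abstract names graphs of domination for non-linear ranking without defining them. A minimal sketch, assuming the usual Pareto dominance (an object dominates another if it is at least as good on every criterion and strictly better on at least one, larger values being better), builds the graph and ranks objects by peeling off successive layers of non-dominated objects. The object names and criterion values are hypothetical, not RTD data.

```python
def dominates(a, b):
    """True if a is >= b on every criterion and > b on at least one (larger = better)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def domination_graph(objects):
    """Directed edges (a, b) for every pair of named objects where a dominates b."""
    return {
        (na, nb)
        for na, va in objects.items()
        for nb, vb in objects.items()
        if na != nb and dominates(va, vb)
    }


def domination_layers(objects):
    """Rank by repeatedly removing the non-dominated objects (layer 1 = top rank)."""
    remaining = dict(objects)
    layers = []
    while remaining:
        # Dominance is a strict partial order, so a non-empty set always has a top element.
        top = sorted(
            n for n, v in remaining.items()
            if not any(dominates(w, v) for m, w in remaining.items() if m != n)
        )
        layers.append(top)
        for n in top:
            del remaining[n]
    return layers


# Hypothetical objects scored on two criteria.
objects = {"A": (3, 3), "B": (2, 1), "C": (1, 4), "D": (1, 1)}
print(sorted(domination_graph(objects)))  # [('A', 'B'), ('A', 'D'), ('B', 'D'), ('C', 'D')]
print(domination_layers(objects))         # [['A', 'C'], ['B'], ['D']]
```

The ranking is non-linear in the sense that no weighted sum of criteria is formed: A and C share the top layer because neither dominates the other.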
Journal:Informatica
Volume 8, Issue 1 (1997), pp. 83–118
Abstract
The problem is to discover knowledge about groups of parameters (variables) from their correlation matrix. Results dealing with deterministic approaches to clustering parameters on the basis of their correlation matrix are reviewed and extended. Conclusions from both theoretical and experimental investigations of various deterministic strategies for solving the problem of extremal parameter grouping are presented. The possibility of finding the optimal number of clusters is considered. The transformation of a general clustering problem into clustering on a sphere, and the relation between clustering parameters on the basis of their correlation matrix and clustering vectors (objects, cases) on an n-dimensional unit sphere, are analysed.
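The link between correlation-based parameter clustering and clustering on a unit sphere rests on a standard identity: if each parameter's observations are centred and scaled to unit length, every parameter becomes a point on the unit sphere (one coordinate per observation), and the Pearson correlation of two parameters equals the inner product of their points, i.e. the cosine of the angle between them. The sketch below checks this numerically; the helper names and sample values are illustrative only.

```python
import math


def to_unit_sphere(values):
    """Centre a parameter's observations and scale them to unit length.

    Assumes the parameter is not constant (a zero norm would divide by zero).
    """
    mean = sum(values) / len(values)
    centred = [v - mean for v in values]
    norm = math.sqrt(sum(c * c for c in centred))
    return [c / norm for c in centred]


def pearson(x, y):
    """Pearson correlation coefficient computed directly from its definition."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def dot(u, v):
    """Inner product of two vectors of equal length."""
    return sum(a * b for a, b in zip(u, v))


x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]
print(round(pearson(x, y), 10), round(dot(to_unit_sphere(x), to_unit_sphere(y)), 10))  # 0.8 0.8
```

Under this view, strongly correlated parameters are nearby points on the sphere, and strongly anti-correlated ones are nearly antipodal, so a correlation-based grouping criterion can be restated as a criterion on points of the sphere.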