Journal:Informatica
Volume 25, Issue 4 (2014), pp. 563–580
Abstract
Clustering is one of the better-known unsupervised learning methods, and its aim is to discover structures in the data. This paper presents a distance-based Sweep-Hyperplane Clustering Algorithm (SHCA), which uses sweep-hyperplanes to quickly locate each point’s approximate nearest neighbourhood. Furthermore, a new distance-based dynamic model that is based on -tree hierarchical space partitioning extends SHCA’s capability for finding clusters of arbitrary shape and density that are not well separated. Experimental results on large, noisy synthetic and real multidimensional datasets demonstrate the effectiveness of the proposed algorithm.
Journal:Informatica
Volume 20, Issue 2 (2009), pp. 187–202
Abstract
In this paper, a method for the study of cluster stability is proposed. We draw pairs of samples from the data according to two sampling distributions. The first distribution corresponds to the high-density zones of the data-element distribution and is thus associated with the cluster cores. The second, associated with the cluster margins, is related to the low-density zones. The samples are clustered and the two resulting partitions are compared. The partitions are considered consistent if the obtained clusters are similar. The resemblance is measured by the total number of edges in the clusters' minimal spanning trees that connect points from different samples. We use the Friedman and Rafsky two-sample test statistic. Under the homogeneity hypothesis, this statistic is normally distributed. Thus, it can be expected that the true number of clusters corresponds to the empirical distribution of the statistic that is closest to normal. Numerical experiments demonstrate the ability of the approach to detect the true number of clusters.
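The edge count described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `mst_edges` and `cross_sample_edges` are hypothetical, Euclidean distance is assumed, and the code handles only the points of a single cluster pooled from the two samples. Normalizing the count into the Friedman and Rafsky statistic is left out.

```python
import math

def mst_edges(points):
    """Prim's algorithm on the complete Euclidean graph.

    Returns the minimal spanning tree as a list of (i, j) index pairs.
    """
    n = len(points)
    if n < 2:
        return []
    in_tree = [False] * n
    best_dist = [math.inf] * n   # cheapest known link from the tree to each vertex
    parent = [-1] * n
    best_dist[0] = 0.0
    edges = []
    for _ in range(n):
        # Pick the vertex outside the tree that is cheapest to attach.
        u = min((i for i in range(n) if not in_tree[i]),
                key=lambda i: best_dist[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((parent[u], u))
        # Relax the links from the newly added vertex.
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best_dist[v]:
                    best_dist[v] = d
                    parent[v] = u
    return edges

def cross_sample_edges(sample_a, sample_b):
    """Count MST edges of the pooled points that join the two samples."""
    pooled = list(sample_a) + list(sample_b)
    labels = [0] * len(sample_a) + [1] * len(sample_b)
    return sum(1 for i, j in mst_edges(pooled) if labels[i] != labels[j])
```

Under these assumptions, two well-mixed samples produce many cross-sample edges, while two samples occupying separate regions produce very few.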
Journal:Informatica
Volume 19, Issue 3 (2008), pp. 377–390
Abstract
We investigate the applicability of quantitative methods for discovering the most fundamental structural properties of the most reliable political data in Lithuania. Namely, we analyze voting data of the Lithuanian Parliament. The two most widely used techniques of structural data analysis (clustering and multidimensional scaling) are compared. We draw some technical conclusions which can serve as recommendations for a more purposeful application of these methods.
Journal:Informatica
Volume 2, Issue 2 (1991), pp. 171–194
Abstract
The paper deals with minimization algorithms that save computing time during the coordinated calculation of the values of an objective function at the nodes of a rectangular lattice, by storing and reusing quantities that are common to several nodes. The algorithm of uniform search with clustering, the variable metric algorithm and the polytope algorithm are modified.
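The idea of reusing quantities shared by several lattice nodes can be shown with a small sketch. Everything in it is an assumption made for illustration: the abstract does not give the form of the objective function, so the sketch supposes a hypothetical objective f(x, y) = a(x) + b(y) + a(x)·b(y), in which a(x) is common to every node in a lattice row and b(y) to every node in a column. The name `evaluate_on_lattice` is likewise hypothetical.

```python
def evaluate_on_lattice(xs, ys, a, b):
    """Evaluate f(x, y) = a(x) + b(y) + a(x) * b(y) on the lattice xs x ys.

    a and b are user-supplied one-argument functions (assumed expensive).
    Each is called once per coordinate value rather than once per node.
    """
    a_vals = [a(x) for x in xs]   # shared by all nodes with the same x
    b_vals = [b(y) for y in ys]   # shared by all nodes with the same y
    # grid[i][j] holds the objective value at node (xs[i], ys[j]).
    return [[av + bv + av * bv for bv in b_vals] for av in a_vals]
```

For an m × n lattice this sketch makes m + n calls to the component functions instead of the 2·m·n calls a node-by-node evaluation would make.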
Journal:Informatica
Volume 2, Issue 1 (1991), pp. 77–99
Abstract
The problem of multialternative recognition of non-stationary processes on the basis of dynamic models is investigated in the paper. The algorithms of pointwise and group classification are compared. Clustering algorithms based on nonlinear mapping of the segments of random processes onto the plane are used to construct the classifiers.