Journal:Informatica
Volume 22, Issue 4 (2011), pp. 507–520
Abstract
Most classical visualization methods, including multidimensional scaling and its particular case, Sammon's mapping, encounter difficulties when analyzing large data sets. One possible way to solve the problem is the application of artificial neural networks. This paper presents the visualization of large data sets using the feed-forward neural network SAMANN. Its backpropagation-like learning rule was developed to allow a feed-forward artificial neural network to learn Sammon's mapping in an unsupervised way. In its initial form, SAMANN training is computationally expensive. In this paper, we identify conditions that optimize the computational expenditure of visualization, even for large data sets. It is shown that the original dimensionality of the data can be reduced to a lower one using a small number of iterations. Visualization results for real-world data sets are presented.
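As an illustration of what Sammon's mapping tries to minimise, the following minimal sketch computes the standard Sammon stress between a set of points and their low-dimensional projections. It is not the SAMANN training procedure itself; the function name `sammon_stress` and the handling of coincident points are assumptions made for this example.

```python
import math
from itertools import combinations


def sammon_stress(high, low):
    """Sammon's stress between original points and their projections.

    `high` and `low` are equally long sequences of coordinate tuples.
    Pairs that coincide in the original space are skipped (an assumption
    of this sketch, to avoid division by zero).
    """
    total = 0.0  # sum of original distances (normalising constant)
    error = 0.0  # weighted sum of squared distance differences
    for i, j in combinations(range(len(high)), 2):
        d_star = math.dist(high[i], high[j])  # distance in the original space
        d = math.dist(low[i], low[j])         # distance in the projection
        if d_star == 0.0:
            continue
        total += d_star
        error += (d_star - d) ** 2 / d_star
    if total == 0.0:
        raise ValueError("all points coincide in the original space")
    return error / total
```

A stress of zero means that all pairwise distances are preserved exactly; larger values indicate a more distorted projection.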
Journal:Informatica
Volume 22, Issue 1 (2011), pp. 1–10
Abstract
Estimation and modelling problems, as they arise in many areas of data analysis, often turn out to be unstable and/or intractable by standard numerical methods. Such problems frequently occur when fitting large data sets to a certain model and in predictive learning. Heuristics are general recommendations based on practical statistical evidence, in contrast to a fixed set of rules that cannot vary but is guaranteed to give the correct answer. Although the use of these methods has become more standard in several fields of science, their use for estimation and modelling in statistics still appears to be limited. This paper surveys a set of problem-solving strategies, guided by heuristic information, that are expected to be used more frequently. The use of recent advances in different fields of large-scale data analysis is promoted, focusing on applications in medicine, biology and technology.
Journal:Informatica
Volume 21, Issue 3 (2010), pp. 455–470
Abstract
In this article, a method is proposed for analysing thermovision-based video data that characterize the dynamics of temperature anisotropy of the heart tissue in the spatial domain. Many cardiac rhythm disturbances are at present treated by applying destructive energy sources. One of the most common methods is the radio-frequency ablation procedure. However, the risk of complications, including arrhythmia recurrence, remains rather high. The drawback of this method is that the destruction procedure cannot be monitored in the visible spectrum, which makes it impossible to control the ablation efficiency. Thermovision could be used to understand the nature of possible complications and to control the treatment process. The aim of the study was to analyse possible mechanisms of these complications, and to measure and determine optimal radio-frequency ablation parameters based on the analysis of video data acquired using thermovision.
Journal:Informatica
Volume 20, Issue 2 (2009), pp. 235–254
Abstract
Most real-life data are often not truly high-dimensional. The data points simply lie on a low-dimensional manifold embedded in a high-dimensional space. Nonlinear manifold learning methods automatically discover the low-dimensional nonlinear manifold in a high-dimensional data space and then embed the data points into a low-dimensional embedding space, preserving the underlying structure in the data. In this paper, we have used the locally linear embedding method to unravel a manifold. In order to estimate the topology preservation of a manifold after unfolding it in a low-dimensional space, some quantitative numerical measure must be used. There are many different measures of topology preservation. We have investigated three measures: Spearman's rho, Konig's measure (KM), and mean relative rank errors (MRRE). After investigating different manifolds, it turned out that only KM and MRRE gave proper results of manifold topology preservation in all the cases. The main reason is that Spearman's rho considers distances between all the pairs of points from the analysed data set, while KM and MRRE evaluate a limited number of neighbours of each point from the analysed data set.
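To make the global nature of Spearman's rho concrete, the sketch below computes a rank correlation over all pairwise distances in the original space and in the embedding. This is one common way to define the measure and may differ in detail from the formulation used in the paper; the helper names `average_ranks` and `spearman_rho` are assumptions of this example.

```python
import math
from itertools import combinations


def pairwise_distances(points):
    """Euclidean distances over all unordered pairs of points."""
    return [math.dist(p, q) for p, q in combinations(points, 2)]


def average_ranks(values):
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            ranks[order[k]] = mean_rank
        pos = end + 1
    return ranks


def spearman_rho(high, low):
    """Rank correlation between original and embedded pairwise distances.

    Raises ZeroDivisionError if all distances in either space are equal.
    """
    rx = average_ranks(pairwise_distances(high))
    ry = average_ranks(pairwise_distances(low))
    mx = sum(rx) / len(rx)
    my = sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)
```

Because every pair of points contributes, distances between points that are far apart along the manifold influence the score as much as distances between close neighbours. This is consistent with the explanation above for why neighbourhood-based measures such as KM and MRRE can behave differently.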
Journal:Informatica
Volume 20, Issue 2 (2009), pp. 165–172
Abstract
Recent changes in the intersection of the fields of intelligent systems optimization and statistical learning are surveyed. These changes bring new theoretical and computational challenges to existing research areas ranging from web page mining to computer vision, pattern recognition, financial mathematics, bioinformatics and many others.
Journal:Informatica
Volume 19, Issue 3 (2008), pp. 403–420
Abstract
New information technologies make it possible to collect large numbers of fundus images into databases. This allows automated processing and classification of images to be used for clinical decisions. Automated localization and parameterization of the optic nerve disc is particularly important in diagnosing glaucoma, because the main symptoms in these cases are relations between the optic nerve and cupping parameters. This article describes an automated algorithm for optic nerve disc localization and parameterization by an ellipse in colour retinal images. The test results are discussed as well.
Journal:Informatica
Volume 18, Issue 2 (2007), pp. 187–202
Abstract
In this paper, the relative multidimensional scaling method is investigated. This method is designed to visualize large multidimensional data. The method applies multidimensional scaling (MDS) to the so-called basis vector set and then maps the remaining vectors from the analyzed data set. In the original algorithm of relative MDS, the visualization process is divided into three steps: the set of basis vectors is constructed using the k-means clustering method; this set is projected onto the plane using the MDS algorithm; the set of remaining data is visualized using the relative mapping algorithm. We propose a modification, which differs from the original algorithm in the strategy of selecting the basis vectors. The experimental investigation has shown that the modification outperforms the original algorithm in both visualization quality and computational expense. The conditions under which the efficiency of relative MDS exceeds that of standard MDS are estimated.
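The third step, mapping a remaining vector relative to already projected basis vectors, can be sketched as follows. The abstract does not specify the relative mapping algorithm, so this example assumes a simple squared-error criterion minimised by plain gradient descent; the function name `place_point`, the step size and the iteration count are all hypothetical choices.

```python
import math


def place_point(target_dists, basis_2d, steps=500, lr=0.05):
    """Place one new point on the plane relative to fixed basis projections.

    `target_dists[k]` is the distance from the new vector to basis vector k
    in the original space; `basis_2d[k]` is that basis vector's fixed 2-D
    projection. Minimises sum_k (target_dists[k] - ||y - basis_2d[k]||)^2
    by gradient descent (an assumption of this sketch).
    """
    n = len(basis_2d)
    # Start from the centroid of the basis projections.
    y = [sum(b[0] for b in basis_2d) / n, sum(b[1] for b in basis_2d) / n]
    for _ in range(steps):
        grad = [0.0, 0.0]
        for d_target, b in zip(target_dists, basis_2d):
            dx, dy = y[0] - b[0], y[1] - b[1]
            d = math.hypot(dx, dy)
            if d == 0.0:
                continue  # gradient undefined when y sits on a basis point
            coeff = -2.0 * (d_target - d) / d
            grad[0] += coeff * dx
            grad[1] += coeff * dy
        y[0] -= lr * grad[0]
        y[1] -= lr * grad[1]
    return (y[0], y[1])
```

Only the new point moves while the basis projections stay fixed, so the cost of placing each remaining vector depends on the number of basis vectors rather than on the size of the whole data set.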
Journal:Informatica
Volume 13, Issue 4 (2002), pp. 485–500
Abstract
This paper presents model-based forecasting of the Lithuanian education system in the period of 2001–2010. In order to obtain satisfactory forecasting results, the development of the models used for this purpose should be grounded in interactive data mining. Model development is usually accompanied by the formulation of assumptions about the underlying methods or models. The accessibility and reliability of data sources should be verified. Special data mining of data sources may verify the assumptions. Interactive data mining was applied to the data stored in the Lithuanian teachers' database and to other sources representing the state of the education system and demographic changes in Lithuania. The models cover the estimation of data quality in the databases, analysis of the flow of teachers and pupils, clustering of schools, the model of dynamics of the pedagogical staff and pupils, and the quality analysis of teachers. The main results of forecasting and integrated analysis of the Lithuanian teachers' database with other data reflecting the state of the education system and demographic changes in Lithuania are presented.
Journal:Informatica
Volume 13, Issue 3 (2002), pp. 275–286
Abstract
In the paper, we analyze software that implements self-organizing maps: SOM-PAK, SOM-TOOLBOX, Viscovery SOMine, Nenet, and two academic systems. Most of the software can be found on the Internet as freeware, shareware or demo versions. Self-organizing maps assist in data clustering and in analyzing data similarities. The software packages differ from one another in their implementation and visualization capabilities. Data on coastal dunes and their vegetation in Finland are used for the experimental comparison of how the software presents results graphically. Similarities of the systems and their differences, advantages and shortcomings are described.
Journal:Informatica
Volume 12, Issue 2 (2001), pp. 239–262
Abstract
The paper deals with the analysis of Research and Technology Development (RTD) in the Central European countries and the relation of RTD to the economic and social parameters of countries in this region. A methodology has been developed for quantitative and qualitative ranking and for estimating relationships among multidimensional objects on the basis of such analysis. The knowledge has been discovered in four databases: two databases of the European Commission (EC) containing data on the RTD activities, and databases of the USA CIA and the World Bank containing economic and social data. Data mining has been performed by means of visual cluster analysis (using the non-linear Sammon's mapping and Kohonen's artificial neural network, the self-organising map), regression analysis and non-linear ranking (using graphs of domination). Results are obtained on the clustering of the Central European countries and on the relations between RTD parameters and economic and social parameters. In addition, the data served for testing various features of the implementation of the self-organising map. The integration of non-classical methods (the self-organising map and graphs of domination) with classical ones (regression analysis and Sammon's mapping) increases the capacity of visual analysis and allows more complete conclusions to be drawn.