Pub. online: 10 Jan 2022; Type: Research Article; Open Access
Journal:Informatica
Volume 33, Issue 1 (2022), pp. 109–130
Abstract
In this paper, a new approach is proposed for verifying and adjusting the classes of multi-label text data. The approach supports semi-automated revision of class assignments to improve data quality. Data quality significantly influences the accuracy of the created models, for example, in classification tasks, and the approach can also be useful for other data analysis tasks. The proposed approach combines a text similarity measure with two methods: latent semantic analysis and the self-organizing map. First, the text data are pre-processed with various filters that remove unnecessary and irrelevant information. Latent semantic analysis is then used to reduce the dimensionality of the vectors that represent each text in the analysed data. The cosine similarity measure is used to determine which classes of the multi-label text data should be changed or adjusted. The self-organizing map is the key method for detecting similarity between texts and deciding on new class assignments. The experimental investigation was performed on newly collected multi-label text data: financial news in the Lithuanian language, collected from four public websites and manually classified by experts into ten classes. Various parameters of the methods were analysed, and their influence on the final results was estimated. The final results were validated by experts. The research showed that the proposed approach can help to verify and adjust multi-label text data classes: 82% of the assignments were correct when the data dimensionality was reduced to 40 using latent semantic analysis and the self-organizing map size was reduced from 40 to 5 in steps of 5.
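A minimal sketch of the three building blocks named in this abstract, assuming NumPy and a raw count term-document matrix (rows are terms, columns are texts): latent semantic analysis as a truncated SVD, cosine similarity between the reduced vectors, and online Kohonen training of a small self-organizing map. The function names `lsa_embed`, `cosine_similarity`, `best_matching_unit` and `train_som`, the toy matrix and all training parameters are hypothetical illustrations. This is not the authors' implementation: it omits their pre-processing filters, any term weighting, and their procedure of shrinking the map from 40 to 5.

```python
# Hypothetical sketch (not the authors' code): LSA + cosine similarity + SOM.
import numpy as np


def lsa_embed(term_doc: np.ndarray, k: int) -> np.ndarray:
    """Map each document (column of term_doc) to k latent LSA coordinates."""
    # Thin SVD: term_doc = U @ diag(s) @ vt
    _, s, vt = np.linalg.svd(term_doc, full_matrices=False)
    # Row j of the result is document j scaled by the top-k singular values
    return vt[:k].T * s[:k]


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two document vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def best_matching_unit(weights: np.ndarray, x: np.ndarray) -> tuple:
    """Grid position (row, col) of the node whose weight vector is closest to x."""
    dist = np.linalg.norm(weights - x, axis=-1)
    row, col = np.unravel_index(np.argmin(dist), dist.shape)
    return int(row), int(col)


def train_som(data: np.ndarray, rows: int, cols: int,
              iterations: int = 500, lr0: float = 0.5, seed: int = 0) -> np.ndarray:
    """Online Kohonen training with a Gaussian neighborhood that shrinks over time."""
    rng = np.random.default_rng(seed)
    weights = rng.normal(size=(rows, cols, data.shape[1]))
    # (rows, cols, 2) array holding each node's grid coordinates
    grid = np.stack(np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij"), axis=-1)
    sigma0 = max(rows, cols) / 2.0
    for t in range(iterations):
        frac = t / iterations
        lr = lr0 * (1.0 - frac)              # linearly decaying learning rate
        sigma = sigma0 * (1.0 - frac) + 0.5  # neighborhood radius, never below 0.5
        x = data[rng.integers(len(data))]    # pick a random training vector
        bmu = np.array(best_matching_unit(weights, x))
        grid_dist2 = np.sum((grid - bmu) ** 2, axis=-1)
        h = np.exp(-grid_dist2 / (2.0 * sigma ** 2))
        # Pull every node towards x, weighted by its grid proximity to the BMU
        weights += lr * h[..., None] * (x - weights)
    return weights


# Toy term-document counts (hypothetical): rows are terms, columns are four texts
term_doc = np.array([
    [2, 2, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 3, 1],
    [0, 0, 1, 2],
    [1, 1, 1, 1],
], dtype=float)

docs = lsa_embed(term_doc, k=2)
sim_01 = cosine_similarity(docs[0], docs[1])  # texts 0 and 1 are identical
sim_02 = cosine_similarity(docs[0], docs[2])  # texts 0 and 2 share little vocabulary
som = train_som(docs, rows=3, cols=3)
nodes = [best_matching_unit(som, d) for d in docs]
```

Texts that land on the same map node are candidates for sharing class labels; how such matches are turned into class adjustments is specific to the paper and is not reproduced here.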
Journal:Informatica
Volume 22, Issue 4 (2011), pp. 507–520
Abstract
Most classical visualization methods, including multidimensional scaling and its particular case, Sammon's mapping, encounter difficulties when analyzing large data sets. One possible way to solve this problem is to apply artificial neural networks. This paper presents the visualization of large data sets using SAMANN, a feed-forward neural network trained with a backpropagation-like learning rule that allows it to learn Sammon's mapping in an unsupervised way. In its initial form, SAMANN training is computationally expensive. In this paper, we identify conditions that optimize the computational expenditure of visualization, even for large data sets. It is shown that the original dimensionality of the data can be reduced to a lower one using a small number of iterations. Visualization results for real-world data sets are presented.
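SAMANN trains a network so that its outputs minimize Sammon's stress, E = (1 / Σ_{i<j} d*_ij) · Σ_{i<j} (d*_ij − d_ij)² / d*_ij, where d*_ij is the distance between points i and j in the original space and d_ij the distance between their projections. A minimal plain-Python sketch of that error follows; the helper name `sammon_stress` and the toy points are hypothetical, and the code only evaluates the stress of a given projection. It does not implement the SAMANN network or its learning rule.

```python
# Hypothetical helper: evaluates Sammon's stress only, no SAMANN training.
import math
from itertools import combinations


def sammon_stress(high, low):
    """Sammon's stress of projection `low` with respect to original points `high`."""
    pairs = list(combinations(range(len(high)), 2))
    d_star = [math.dist(high[i], high[j]) for i, j in pairs]  # original-space distances
    d = [math.dist(low[i], low[j]) for i, j in pairs]         # projected distances
    scale = sum(d_star)
    if scale == 0:
        return 0.0  # all original points coincide; nothing to preserve
    # Pairs with zero original distance are skipped to avoid division by zero
    error = sum((ds - dl) ** 2 / ds for ds, dl in zip(d_star, d) if ds > 0)
    return error / scale


# Three points lying in the z = 0 plane of a 3-D space
high = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
exact = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]      # distances preserved exactly
collapsed = [(0.0, 0.0), (1.0, 0.0), (0.0, 0.0)]  # points 0 and 2 merged
```

The exact projection has zero stress, while merging two points raises it to about 0.328. Dividing each squared error by d*_ij is what makes Sammon's mapping emphasize small original distances, i.e. local structure.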
Journal:Informatica
Volume 20, Issue 4 (2009), pp. 477–486
Abstract
In the present paper, a neural network theory based on the assumptions of the Ising model is considered. Indirect couplings, Dirac distributions and the corrected Hebb rule are introduced and analyzed. The embedded patterns memorized in the neural network and the indirect couplings are treated as random. Apart from the complex theory based on Dirac distributions, simplified stationary mean-field equations and their solutions are presented, taking into account the ergodicity of the average overlap and of the indirect order parameter. Modeling results are presented that corroborate the theoretical statements and applied aspects.
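For readers unfamiliar with the underlying Ising-type (Hopfield) network, the sketch below shows the textbook ingredients this abstract builds on: the standard, uncorrected Hebb rule for storing a pattern, zero-temperature recall, the overlap order parameter, and the single-pattern mean-field equation m = tanh(βm) solved by fixed-point iteration. It is a hypothetical illustration in plain Python; none of the paper's indirect couplings, Dirac distributions or corrected Hebb rule are modelled, and all function names are assumptions.

```python
# Hypothetical textbook sketch: standard Hebb rule, overlap, and m = tanh(beta * m).
# It does NOT include the paper's corrected Hebb rule or indirect couplings.
import math


def hebb_couplings(patterns):
    """Standard Hebb rule: J_ij = (1/N) * sum over patterns of xi_i * xi_j, with J_ii = 0."""
    n = len(patterns[0])
    return [[0.0 if i == j else sum(p[i] * p[j] for p in patterns) / n
             for j in range(n)] for i in range(n)]


def update(couplings, state):
    """One synchronous zero-temperature step: s_i <- sign(sum_j J_ij * s_j), sign(0) = +1."""
    return [1 if sum(j_ij * s_j for j_ij, s_j in zip(row, state)) >= 0 else -1
            for row in couplings]


def overlap(pattern, state):
    """Overlap m = (1/N) * sum_i xi_i * s_i between a stored pattern and a network state."""
    return sum(x * s for x, s in zip(pattern, state)) / len(state)


def mean_field_overlap(beta, m0=1.0, tol=1e-12, max_iter=10_000):
    """Solve the single-pattern mean-field equation m = tanh(beta * m) by fixed-point iteration."""
    m = m0
    for _ in range(max_iter):
        m_next = math.tanh(beta * m)
        if abs(m_next - m) < tol:
            return m_next
        m = m_next
    return m


pattern = [1, -1] * 10          # one stored pattern, N = 20 spins
J = hebb_couplings([pattern])
noisy = list(pattern)
for i in (0, 5, 10):            # flip three spins
    noisy[i] = -noisy[i]
recalled = update(J, noisy)     # one step restores the stored pattern
```

Below β = 1 the only mean-field solution is m = 0 (no retrieval), while for larger β a nonzero overlap appears; the paper's treatment refines this picture with random indirect couplings.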
Journal:Informatica
Volume 12, Issue 1 (2001), pp. 101–108
Abstract
This paper considers some aspects of using a cascade-correlation network for the investment task, in which the most suitable project for investing money must be determined. This is one of the most frequently encountered economic tasks. The economics literature describes various methods for choosing investment projects. However, they all use only one or a few criteria, i.e., only the most valuable criteria are selected from the full set. As a result, much of the information contained in the other choice criteria is omitted. A neural network makes it possible to avoid such information losses: it accumulates information and helps to achieve better results in choosing an investment project than classical methods do. The cascade-correlation network architecture used in this paper was developed by Scott E. Fahlman and Christian Lebiere at Carnegie Mellon University.
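The cascade-correlation architecture grows its hidden layer one unit at a time. Each candidate unit is trained to maximize S = Σ_o |Σ_p (V_p − V̄)(E_p,o − Ē_o)|, where V_p is the candidate's output on training pattern p and E_p,o the residual error at output unit o; Fahlman and Lebiere call S a correlation, although it is an unnormalized covariance. A minimal plain-Python sketch of that scoring step follows; the helper name `candidate_score` and the toy values are hypothetical, and the sketch says nothing about how the paper encodes investment criteria.

```python
# Hypothetical helper: scores one candidate hidden unit as in cascade-correlation.
def candidate_score(v, errors):
    """S = sum over outputs o of |sum over patterns p of (V_p - mean V) * (E_po - mean E_o)|.

    v: candidate unit output for each training pattern p.
    errors: residual error vector E_p (one entry per output unit o) for each pattern.
    """
    n_patterns = len(v)
    n_outputs = len(errors[0])
    v_mean = sum(v) / n_patterns
    score = 0.0
    for o in range(n_outputs):
        e_mean = sum(e[o] for e in errors) / n_patterns
        cov = sum((vp - v_mean) * (ep[o] - e_mean) for vp, ep in zip(v, errors))
        score += abs(cov)  # sign is irrelevant: the output weight can flip it later
    return score
```

In the full algorithm, the candidate's input weights are adjusted by gradient ascent on S; the best candidate is then frozen, installed as a new hidden unit, and the output weights are retrained.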
Journal:Informatica
Volume 2, Issue 2 (1991), pp. 221–232
Abstract
The principles of a neural network environmental model are proposed. The principles are universal and can be applied with different neural network architectures. Such a model is self-organizing and can operate both with and without a teacher. It encodes information about objects, their features and the actions operating in an environment, and it analyzes concrete situations. It also provides functions for making action plans and for controlling actions. The goal of the model is set externally. The model has more than sixteen active regimes. The neural network environmental model is implemented in software and hardware.