Journal: Informatica
Volume 19, Issue 1 (2008), pp. 101–112
Abstract
This paper studies an adaptive clustering problem. We focus on re-clustering an object set, previously clustered, when the feature set characterizing the objects grows. We propose an adaptive clustering method based on a hierarchical agglomerative approach, Hierarchical Adaptive Clustering (HAC), which adjusts the partitioning into clusters that was established by applying the hierarchical agglomerative clustering algorithm (HACA) (Han and Kamber, 2001) before the feature set changed. We aim to reach the result more efficiently than by running HACA again from scratch on the feature-extended object set. We also report experiments testing the method's efficiency, as well as a practical distributed-systems problem in which the HAC method can be used efficiently: adaptive horizontal fragmentation in object-oriented databases.
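The baseline the abstract refers to, hierarchical agglomerative clustering, can be sketched in a few lines: start with each object as its own cluster and repeatedly merge the two closest clusters. The sketch below uses single linkage on toy 2-D points; it is illustrative only, not the authors' HAC/HACA implementation.

```python
# Minimal sketch of hierarchical agglomerative clustering (single linkage).
# Illustrative of the HACA baseline only; toy data, not the paper's code.
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def agglomerate(points, k):
    """Merge the two closest clusters (single linkage) until k remain."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single-linkage distance: closest pair across the clusters.
                d = min(euclidean(points[a], points[b])
                        for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

# Two well-separated groups in a 2-feature space:
pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
print(agglomerate(pts, 2))
```

The adaptive variant described in the abstract avoids repeating this whole loop when features are added, by adjusting the previously built cluster structure instead.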
Journal: Informatica
Volume 15, Issue 1 (2004), pp. 23–38
Abstract
The extensive amounts of knowledge and data stored in medical databases require the development of specialized tools for storing, accessing, and analysing the data, and for using the stored knowledge effectively. Intelligent methods such as neural networks, fuzzy sets, decision trees, and expert systems are, slowly but steadily, being applied in the medical field. Recently, rough set theory, a new intelligent technique, has been used for the discovery of data dependencies, data reduction, approximate set classification, and rule induction from databases.
In this paper, we present a rough set method for generating classification rules from a set of 360 observed samples of breast cancer data. The attributes are selected and normalized, and the rough set dependency rules are then generated directly from the real-valued attribute vector. The rough set reduction technique is then applied to find all reducts of the data, i.e., the minimal subsets of attributes that are associated with a class label for classification. Experimental results from applying the rough set analysis to the set of data samples are given and evaluated. In addition, the generated rules are compared to the well-known ID3 classifier algorithm. The study showed that the theory of rough sets appears to be a useful tool for inductive learning and a valuable aid for building expert systems.
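The reduct computation mentioned above can be illustrated with a toy example: a reduct is a minimal attribute subset whose indiscernibility partition still determines the class label. The brute-force sketch below is a generic illustration of that idea, with hypothetical attribute names, not the paper's data or algorithm.

```python
# Hedged sketch of the rough-set reduct idea: find minimal attribute
# subsets on which identical objects always share a class label.
# Toy data; attribute names are hypothetical, not from the paper.
from itertools import combinations

def consistent(rows, labels, attrs):
    """True if objects identical on attrs always share a label."""
    seen = {}
    for row, lab in zip(rows, labels):
        key = tuple(row[a] for a in attrs)
        if seen.setdefault(key, lab) != lab:
            return False
    return True

def reducts(rows, labels, all_attrs):
    """All minimal consistent attribute subsets (brute force)."""
    found = []
    for r in range(1, len(all_attrs) + 1):
        for subset in combinations(all_attrs, r):
            # Skip supersets of an already-found reduct (not minimal).
            if consistent(rows, labels, subset) and \
               not any(set(f) <= set(subset) for f in found):
                found.append(subset)
    return found

rows = [
    {"size": "big",   "texture": "smooth", "shade": "dark"},
    {"size": "big",   "texture": "rough",  "shade": "dark"},
    {"size": "small", "texture": "smooth", "shade": "light"},
    {"size": "small", "texture": "rough",  "shade": "dark"},
]
labels = ["malignant", "malignant", "benign", "benign"]
print(reducts(rows, labels, ["size", "texture", "shade"]))
```

Here "size" alone determines the label, so it is the single reduct; practical rough-set tools use discernibility matrices rather than this exponential enumeration.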
Journal: Informatica
Volume 14, Issue 3 (2003), pp. 277–288
Abstract
In this paper, we present an algorithm that can be applied to protect data before a data mining process takes place. Data mining, a part of the knowledge discovery process, is mainly about building models from data. We address the following question: can we protect the data and still allow the data modelling process to take place? We consider the case where the distributions of the original data values are preserved while the values themselves change, so that the resulting model is equivalent to the one built from the original data. The formal approach presented is especially useful when the knowledge discovery process is outsourced. The application of the algorithm is demonstrated through an example.
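One simple family of transformations with the property described, changed values but an equivalent model, is a strictly monotone per-attribute map: it hides the raw values while preserving their ordering, so split-based models such as decision trees are unaffected. This is a generic sketch of the idea under that assumption, not the paper's exact algorithm.

```python
# Hedged illustration of distribution-preserving data protection:
# a strictly increasing transform changes values but keeps ordering,
# so e.g. decision-tree split structure is unchanged.
# Generic sketch, not the paper's algorithm; a, b are arbitrary.

def protect(column, a=3.0, b=7.0):
    """Strictly increasing affine map: hides values, keeps order."""
    assert a > 0  # monotonicity is what keeps the model equivalent
    return [a * x + b for x in column]

ages = [23, 45, 31, 60]
masked = protect(ages)
# The ranking of objects is identical before and after masking:
assert sorted(range(4), key=lambda i: ages[i]) == \
       sorted(range(4), key=lambda i: masked[i])
print(masked)
```

An outsourced analyst working on the masked column would recover the same tree structure while never seeing the true ages.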
Journal: Informatica
Volume 12, Issue 1 (2001), pp. 101–108
Abstract
This paper considers some aspects of using a cascade-correlation network for the investment task of determining the most suitable project in which to invest money. This is one of the most frequently encountered tasks in economics. The economics literature describes various methods of choosing investment projects, but they all use either one or only a few criteria, i.e., only the most valuable criteria are selected from the full set. As a result, much of the information contained in the remaining criteria is lost. A neural network makes it possible to avoid these information losses: it accumulates the information and yields better results in choosing an investment project than the classical methods. The cascade-correlation network architecture used in this paper was developed by Scott E. Fahlman and Christian Lebiere at Carnegie Mellon University.
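The distinctive step in Fahlman and Lebiere's cascade-correlation training is scoring candidate hidden units by the covariance between a candidate's activation V and the network's residual output error E, and installing the candidate that maximizes it. A minimal sketch of that score for a single output, with made-up numbers:

```python
# Core of cascade-correlation candidate training: the covariance score
# S = |sum_p (V_p - mean(V)) * (E_p - mean(E))| over training patterns p.
# Toy values; a real network would also train the candidate's weights.

def candidate_score(V, E):
    """Covariance magnitude between candidate activation and residual error."""
    vbar = sum(V) / len(V)
    ebar = sum(E) / len(E)
    return abs(sum((v - vbar) * (e - ebar) for v, e in zip(V, E)))

V = [0.1, 0.9, 0.2, 0.8]    # candidate activations over 4 patterns
E = [-0.5, 0.4, -0.4, 0.5]  # residual output errors on those patterns
print(candidate_score(V, E))
```

A candidate whose activation tracks the remaining error gets a high score; freezing it into the network then gives the output layer a feature that directly explains that error.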
Journal: Informatica
Volume 8, Issue 1 (1997), pp. 83–118
Abstract
The problem is to discover knowledge about groups of parameters (variables) from their correlation matrix. Results concerning deterministic approaches to parameter clustering on the basis of the correlation matrix are reviewed and extended. Conclusions from both theoretical and experimental investigations of various deterministic strategies for the extremal parameter grouping problem are presented. The possibility of finding the optimal number of clusters is considered. The transformation of the general clustering problem into clustering on a sphere is analysed, together with the relation between clustering parameters on the basis of their correlation matrix and clustering vectors (objects, cases) on an n-dimensional unit sphere.
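The sphere connection mentioned in the abstract rests on a standard fact: standardized parameters are unit vectors, and their correlation is the cosine of the angle between them, so grouping highly correlated parameters is clustering points on a unit sphere. The sketch below illustrates the idea with a naive greedy grouping by a correlation threshold; it is an illustration of the connection, not one of the reviewed strategies.

```python
# Illustrative only: group parameters whose pairwise |correlation|
# (the cosine between their standardized vectors) exceeds a threshold.
# The greedy rule and threshold are assumptions, not the paper's methods.
import math

def corr(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sx = math.sqrt(sum((v - mx) ** 2 for v in x))
    sy = math.sqrt(sum((v - my) ** 2 for v in y))
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (sx * sy)

def group_parameters(columns, threshold=0.9):
    """Greedy grouping: join a parameter to the first group whose
    members all correlate with it above the threshold (in |r|)."""
    groups = []
    for j, col in enumerate(columns):
        for g in groups:
            if all(abs(corr(col, columns[i])) >= threshold for i in g):
                g.append(j)
                break
        else:
            groups.append([j])
    return groups

# Parameters 0 and 1 are exactly proportional; parameter 2 is unrelated.
cols = [[1, 2, 3, 4], [2, 4, 6, 8], [5, 1, 4, 2]]
print(group_parameters(cols))
```

Proportional parameters (cosine 1) fall into one group while the uncorrelated one stays apart, which is exactly the sphere-clustering picture the abstract describes.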