Journal: Informatica
Volume 23, Issue 4 (2012), pp. 521–536
Abstract
In supervised learning, the relationship between the available data and the resulting performance (what is learnt) is not well understood. How much data to use, and when to stop the learning process, are the key questions.
In this paper, we present an approach for early assessment of the extracted knowledge (classification models) in terms of performance (accuracy). The key questions are answered by detecting the point of convergence, i.e., the point at which the classification model's performance no longer improves even when more data items are added to the learning set. As a termination criterion for the learning process, we developed a set of equations for detecting convergence that follow the basic principles of the learning curve. The solution was evaluated on real datasets. The experimental results show that the solution is well designed: the stopping criteria are not subject to local variance, and convergence is detected where it actually occurs.
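The abstract does not reproduce the authors' termination equations, so the following is only a minimal Python sketch of the general idea. The function name detect_convergence, the window size, and the tolerance are illustrative assumptions; averaging gains over a trailing window is one simple way to keep a single noisy increment (local variance) from triggering an early stop.

```python
# Illustrative sketch only: the authors' actual convergence equations are not
# given in the abstract. Convergence is declared when the mean accuracy gain
# over a trailing window drops below a tolerance, so one noisy increment
# does not end the learning process prematurely.

def detect_convergence(accuracies, window=3, tol=1e-3):
    """Return the index at which the learning curve is considered converged,
    or None if no convergence is detected.

    accuracies : accuracies measured at increasing learning-set sizes
                 (monotone in expectation, noisy in practice).
    window     : number of consecutive increments averaged to smooth out
                 local variance.
    tol        : minimum mean improvement per increment still counted as
                 meaningful learning.
    """
    for i in range(window, len(accuracies)):
        gains = [accuracies[j] - accuracies[j - 1]
                 for j in range(i - window + 1, i + 1)]
        if sum(gains) / window < tol:
            return i  # adding more data no longer improves the model
    return None


if __name__ == "__main__":
    # Accuracies observed at, e.g., 10%, 20%, ... of the available data.
    curve = [0.71, 0.78, 0.82, 0.845, 0.852, 0.853, 0.8535, 0.8536]
    print(detect_convergence(curve))  # -> index of the detected convergence point
```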
Journal: Informatica
Volume 19, Issue 2 (2008), pp. 161–190
Abstract
In this paper, a new multi-criteria decision-making procedure is presented, which captures preferential information in the form of a threshold model. It is based on an ELECTRE-like sorting analysis restricted by the localization principle, which enables high adaptability of the decision model and reduces the cognitive load imposed on the decision-makers. It lays the foundation for three concepts that have previously been insufficiently supported by outranking methods: semiautomatic derivation of criteria weights according to the selective effects of discordance and veto thresholds, convergent group consensus seeking, and autonomous multi-agent negotiation. The interdependent principles are justified, and the methodological solutions underlying their implementation are provided.
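For orientation only, the sketch below shows a generic ELECTRE-I style outranking test in Python, illustrating how criteria weights, a concordance threshold, and veto thresholds interact. It does not implement the paper's threshold model, localization principle, or semiautomatic weight derivation; all names and values are assumptions made for the example.

```python
# Generic ELECTRE-I style outranking test (not the paper's procedure):
# alternative a outranks alternative b when the weighted share of criteria on
# which a is at least as good as b reaches the concordance threshold, and no
# criterion where b beats a exceeds its veto threshold.

def outranks(a, b, weights, vetoes, concordance_threshold=0.7):
    """Return True if alternative a outranks alternative b.

    a, b    : sequences of criterion scores (higher is better).
    weights : criterion weights, assumed to sum to 1.
    vetoes  : per-criterion veto thresholds; if b beats a on some criterion
              by more than its veto, the outranking is refused outright.
    """
    concordance = sum(w for ai, bi, w in zip(a, b, weights) if ai >= bi)
    vetoed = any(bi - ai > v for ai, bi, v in zip(a, b, vetoes))
    return concordance >= concordance_threshold and not vetoed


if __name__ == "__main__":
    weights = [0.4, 0.35, 0.25]
    vetoes = [2.0, 2.0, 3.0]
    print(outranks([8, 7, 5], [6, 7, 7], weights, vetoes))  # True
    print(outranks([8, 7, 2], [6, 7, 7], weights, vetoes))  # False: veto on third criterion
```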
Journal: Informatica
Volume 18, Issue 3 (2007), pp. 343–362
Abstract
One of the tasks of data mining is classification, which provides a mapping from attributes (observations) to pre-specified classes. Classification models are built from underlying data, and in principle, models built with more data yield better results. However, the relationship between the available data and the performance is not well understood, except that the accuracy of a classification model improves with diminishing returns as data size grows. In this paper, we present an approach for early assessment of the extracted knowledge (classification models) in terms of performance (accuracy), based on the amount of data used. The assessment relies on observing the performance at smaller sample sizes. The solution is formally defined and used in experiments that show the correctness and utility of the approach.
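The abstract does not state the functional form used for the assessment. A common (assumed) choice is a saturating learning curve; the Python sketch below fits accuracy linearly against n^(-1/2) on the small-sample observations and extrapolates to the full data size. The curve form, function names, and example numbers are illustrative, not the paper's formal solution.

```python
# Illustrative sketch only: estimate the accuracy expected from the full
# dataset by extrapolating accuracies observed on smaller samples. The
# saturating form acc(n) ~ a - b / sqrt(n) is an assumed choice, not the
# paper's formal definition; it is linear in 1/sqrt(n), so an ordinary
# least-squares fit suffices.
import numpy as np


def extrapolate_accuracy(sample_sizes, accuracies, target_size):
    """Fit acc = a - b / sqrt(n) to the observed points and predict the
    accuracy expected at target_size data items."""
    x = 1.0 / np.sqrt(np.asarray(sample_sizes, dtype=float))
    slope, intercept = np.polyfit(x, np.asarray(accuracies, dtype=float), 1)
    return intercept + slope / np.sqrt(target_size)


if __name__ == "__main__":
    sizes = [100, 200, 400, 800, 1600]     # sample sizes already evaluated
    accs = [0.71, 0.76, 0.80, 0.83, 0.85]  # accuracies observed on them
    print(extrapolate_accuracy(sizes, accs, 50000))  # estimated full-data accuracy
```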