Journal:Informatica
Volume 15, Issue 1 (2004), pp. 23–38
Abstract
Extensive amounts of knowledge and data stored in medical databases require the development of specialized tools for storing, accessing, analyzing, and effectively using them. Intelligent methods such as neural networks, fuzzy sets, decision trees, and expert systems are, slowly but steadily, being applied in medical fields. More recently, rough set theory has emerged as a new intelligent technique used for the discovery of data dependencies, data reduction, approximate set classification, and rule induction from databases.
In this paper, we present a rough set method for generating classification rules from a set of 360 observed samples of breast cancer data. The attributes are selected and normalized, and then the rough set dependency rules are generated directly from the real-valued attribute vector. Then the rough set reduction technique is applied to find all reducts of the data, each of which contains a minimal subset of attributes associated with a class label for classification. Experimental results from applying the rough set analysis to the set of data samples are given and evaluated. In addition, the generated rules are also compared to the well‐known ID3 classifier algorithm. The study showed that the theory of rough sets seems to be a useful tool for inductive learning and a valuable aid for building expert systems.
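The notion of a reduct can be illustrated with a small sketch. It is not the paper's implementation: it assumes attributes that have already been discretized, uses a brute-force search that is only practical for a handful of attributes, and runs on a made-up six-row table whose attribute names (`size`, `shape`, `margin`) are purely hypothetical.

```python
from itertools import combinations


def partition(rows, attrs):
    """Group row indices into indiscernibility classes over the given attributes."""
    blocks = {}
    for i, row in enumerate(rows):
        blocks.setdefault(tuple(row[a] for a in attrs), []).append(i)
    return list(blocks.values())


def dependency(rows, labels, attrs):
    """Share of rows in the positive region: classes whose rows all carry one label."""
    consistent = sum(
        len(block)
        for block in partition(rows, attrs)
        if len({labels[i] for i in block}) == 1
    )
    return consistent / len(rows)


def reducts(rows, labels, all_attrs):
    """Minimal attribute subsets with the same dependency degree as the full set."""
    full = dependency(rows, labels, all_attrs)
    found = []
    # Search by increasing size so any superset of a found reduct can be skipped.
    for k in range(len(all_attrs) + 1):
        for subset in combinations(all_attrs, k):
            if any(set(r) <= set(subset) for r in found):
                continue
            if dependency(rows, labels, subset) == full:
                found.append(subset)
    return found


# Hypothetical, already-discretized toy data (not taken from the paper).
rows = [
    {"size": "small", "shape": "round", "margin": "smooth"},
    {"size": "small", "shape": "irregular", "margin": "smooth"},
    {"size": "large", "shape": "round", "margin": "smooth"},
    {"size": "large", "shape": "irregular", "margin": "spiculated"},
    {"size": "small", "shape": "irregular", "margin": "spiculated"},
    {"size": "large", "shape": "round", "margin": "spiculated"},
]
labels = ["benign", "benign", "benign", "malignant", "malignant", "malignant"]

print(reducts(rows, labels, ["size", "shape", "margin"]))
```

In this toy table the class label is fully determined by `margin`, so that single attribute forms the only reduct; real data typically yields several reducts of different sizes.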
Journal:Informatica
Volume 14, Issue 4 (2003), pp. 471–486
Abstract
This research is aimed at developing a data analysis strategy for a complex, multidimensional, and dynamic domain. Our universe of discourse concerns data mining techniques for data warehouses that reveal the importance of multivariate structures of socio-economic data influencing criminality. Distinct tasks require different data structures and various data mining exercises in data warehouses. The proposed problem-solving strategy allows an appropriate method to be chosen in recognition processes. Ensembles of diverse and accurate classifiers are constructed on the basis of multidimensional classification and clustering methods. Factor analysis is introduced into the data mining process to reveal the impact of influencing factors. The temporal nature and multidimensionality of the target object are revealed in a dynamic model using multidimensional regression estimates. The paper describes a strategy for integrating the methods of multiple statistical analysis in cases where a large set of variables is observed over a short time period. The data analysis strategy is demonstrated using real data warehouses on the social and economic development of different regions of Lithuania.
Journal:Informatica
Volume 14, Issue 3 (2003), pp. 277–288
Abstract
In this paper, we present an algorithm that can be applied to protect data before a data mining process takes place. Data mining, a part of the knowledge discovery process, is mainly about building models from data. We address the following question: can we protect the data and still allow the data modelling process to take place? We consider the case where the distributions of original data values are preserved while the values themselves change, so that the resulting model is equivalent to the one built with original data. The presented formal approach is especially useful when the knowledge discovery process is outsourced. The application of the algorithm is demonstrated through an example.
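The abstract does not spell out the algorithm, so the following is only a hedged illustration of the general idea that values can change while the model built from them stays equivalent. It assumes a strictly increasing masking function (here the arbitrary `2 * v**3 + 5`) and a one-level threshold classifier; both are stand-ins chosen for the sketch, and the ages and labels are invented.

```python
from collections import Counter


def errors(group_labels):
    """Rows misclassified when a group predicts its majority label."""
    if not group_labels:
        return 0
    return len(group_labels) - max(Counter(group_labels).values())


def best_split(values, labels):
    """Return (cost, left_indices) for the best 'value <= threshold' split."""
    best = None
    for threshold in sorted(set(values))[:-1]:
        left = frozenset(i for i, v in enumerate(values) if v <= threshold)
        left_labels = [labels[i] for i in left]
        right_labels = [labels[i] for i in range(len(values)) if i not in left]
        cost = errors(left_labels) + errors(right_labels)
        if best is None or cost < best[0]:
            best = (cost, left)
    return best


# Invented example data (not from the paper).
ages = [23, 35, 41, 52, 60, 67]
labels = ["no", "no", "no", "yes", "yes", "yes"]

# Assumed masking step: any strictly increasing function keeps the row order.
masked = [2 * v**3 + 5 for v in ages]

original_model = best_split(ages, labels)
masked_model = best_split(masked, labels)
print(original_model == masked_model)
```

Because the masking function preserves the ordering of rows, every candidate threshold separates the same rows before and after masking, so the selected split groups the records identically even though no original value is disclosed.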
Journal:Informatica
Volume 13, Issue 4 (2002), pp. 485–500
Abstract
This paper presents model-based forecasting of the Lithuanian education system for the period 2001–2010. To obtain satisfactory forecasting results, the development of the models used for this purpose should be grounded in interactive data mining. The development process is usually accompanied by the formulation of assumptions underlying the methods or models. The accessibility and reliability of data sources should be verified. Dedicated data mining of the data sources may verify the assumptions. Interactive data mining was applied to the data stored in the Lithuanian teachers' database and to other sources representing the state of the education system and demographic changes in Lithuania. The models cover the estimation of data quality in the databases, analysis of the flow of teachers and pupils, clustering of schools, the dynamics of the pedagogical staff and pupils, and the quality analysis of teachers. The main results of forecasting and of the integrated analysis of the Lithuanian teachers' database with other data reflecting the state of the education system and demographic changes in Lithuania are presented.
Journal:Informatica
Volume 13, Issue 4 (2002), pp. 455–464
Abstract
The application of knowledge discovery in databases (data mining) to medical decision support is discussed in this work. The aim of the study was to use a decision support algorithm for the differential diagnosis of intraocular tumors using parameters from eye images obtained by ultrasound examination. The application of a predictive modeling algorithm for decision tree formation using the See5.0/C5.0 data mining system is presented. The decision tree was built using tumor geometry and microstructure parameters. The decision tree makes it possible to differentiate tumors from other tumor-like formations. The low percentage of diagnostic errors shows that the decision tree is reliable enough to be offered as a “second opinion” for physicians' decision support.
Journal:Informatica
Volume 12, Issue 2 (2001), pp. 239–262
Abstract
The paper deals with the analysis of Research and Technology Development (RTD) in the Central European countries and the relation of RTD to the economic and social parameters of countries in this region. A methodology has been developed for quantitative and qualitative ranking and for estimating relationships among multidimensional objects on the basis of such analysis. Knowledge has been discovered in four databases: two databases of the European Commission (EC) containing data on RTD activities, and databases of the USA CIA and the World Bank containing economic and social data. Data mining has been performed by means of visual cluster analysis (using the non-linear Sammon's mapping and Kohonen's artificial neural network, the self-organising map), regression analysis, and non-linear ranking (using graphs of domination). Results on the clustering of the Central European countries and on the relations between RTD parameters and economic and social parameters are obtained. In addition, the data served for testing various features of the implementation of the self-organising map. The integration of non-classical methods (the self-organising map and graphs of domination) with classical ones (regression analysis and Sammon's mapping) increases the capacity of visual analysis and allows more complete conclusions to be drawn.
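Sammon's mapping, mentioned above, seeks a low-dimensional layout whose pairwise distances stay close to the original ones. The sketch below only evaluates the standard Sammon stress of a given layout and does not perform the optimization; the points are invented, Euclidean distance is assumed, and all input points must be distinct so that no original distance is zero.

```python
from itertools import combinations
from math import dist


def sammon_stress(high, low):
    """Sammon stress between original points and their low-dimensional images.

    Each squared distance error is weighted by 1 / original distance, so
    distortions of small distances are penalized more heavily.
    """
    pairs = list(combinations(range(len(high)), 2))
    d_high = [dist(high[i], high[j]) for i, j in pairs]
    d_low = [dist(low[i], low[j]) for i, j in pairs]
    weighted = sum((dh - dl) ** 2 / dh for dh, dl in zip(d_high, d_low))
    return weighted / sum(d_high)


# Invented 3-D points that happen to lie in the plane z = 0.
points_3d = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (0.0, 4.0, 0.0)]

faithful_2d = [(0.0, 0.0), (3.0, 0.0), (0.0, 4.0)]   # keeps every distance
squashed_2d = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]   # shrinks every distance

print(sammon_stress(points_3d, faithful_2d))
print(sammon_stress(points_3d, squashed_2d))
```

A layout that preserves all pairwise distances has stress close to zero, while a distorted layout scores higher; an optimizer such as gradient descent would adjust the low-dimensional coordinates to reduce this value.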