Journal: Informatica
Volume 21, Issue 1 (2010), pp. 13–30
Abstract
The genetic information in cells is stored in DNA sequences, represented by strings of four letters, each corresponding to a particular type of nucleotide. Genomic DNA sequences abound in periodic patterns, which play important biological roles. The complexity of genetic sequences can be estimated using information-theoretic methods. Low-complexity regions are of particular interest to genome researchers because they indicate sequence repeats and patterns. In this paper, the complexity of genetic sequences is estimated using Shannon entropy, Rényi entropy, and relative Kolmogorov complexity. The structural complexity based on periodicities is analyzed using the autocorrelation function and time-delayed mutual information. As a case study, we analyze the human 22nd chromosome and identify 3 bp and 49 bp periodicities.
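As an illustration of these estimators, the sketch below (a minimal illustration, not the authors' implementation; the codon-biased toy sequence, the k-mer order k, and the lag range are assumptions made here) computes the Shannon and Rényi entropies of k-mer frequencies and the time-delayed mutual information of a DNA string. For a sequence with a weak 3 bp periodicity, the mutual information is largest at lags that are multiples of 3.

```python
import math
import random
from collections import Counter

def shannon_entropy(seq, k=1):
    """Shannon entropy (bits) of the k-mer frequency distribution."""
    n = len(seq) - k + 1
    counts = Counter(seq[i:i + k] for i in range(n))
    return -sum(c / n * math.log2(c / n) for c in counts.values())

def renyi_entropy(seq, alpha=2.0, k=1):
    """Renyi entropy of order alpha (alpha != 1); tends to Shannon as alpha -> 1."""
    n = len(seq) - k + 1
    counts = Counter(seq[i:i + k] for i in range(n))
    return math.log2(sum((c / n) ** alpha for c in counts.values())) / (1.0 - alpha)

def delayed_mutual_information(seq, tau):
    """I(s_t; s_{t+tau}) in bits, estimated from symbol-pair frequencies."""
    pairs = Counter(zip(seq, seq[tau:]))
    n = sum(pairs.values())
    px, py = Counter(), Counter()
    for (a, b), c in pairs.items():
        px[a] += c
        py[b] += c
    return sum(c / n * math.log2((c * n) / (px[a] * py[b]))
               for (a, b), c in pairs.items())

# Toy "coding-like" sequence: random codons with a fixed third position,
# giving a weak 3 bp periodicity (an illustrative assumption, not real data).
random.seed(0)
seq = "".join(random.choice("ACGT") + random.choice("ACGT") + "G"
              for _ in range(2000))

print(shannon_entropy(seq, k=3))   # below the 6-bit maximum for uniform 3-mers
print(renyi_entropy(seq, alpha=2.0, k=3))
print([round(delayed_mutual_information(seq, t), 2)
       for t in range(1, 10)])     # largest at lags 3, 6, 9
```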
Journal: Informatica
Volume 10, Issue 2 (1999), pp. 245–269
Abstract
Structurization of the sample covariance matrix reduces the number of parameters to be estimated and, when the structurization assumptions are correct, improves the small-sample properties of a statistical linear classifier. Structured estimates of the sample covariance matrix are used to decorrelate and scale the data, and a single-layer perceptron classifier is then trained on the transformed data. In most of the ten real-world pattern classification problems tested, the structurization methodology, applied together with the data transformations and the subsequent use of an optimally stopped single-layer perceptron, yielded a significant gain over the best statistical linear classifier, regularized discriminant analysis.
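A minimal sketch of this pipeline follows, under assumptions not taken from the paper: the structured estimate is reduced to a diagonal covariance model, the data are synthetic, and a plain perceptron update (without early stopping) stands in for the optimally stopped single-layer perceptron.

```python
import numpy as np

def structured_covariance(X):
    """A toy structured estimate: keep only the diagonal of the sample
    covariance (the paper considers richer structures; this is illustrative)."""
    S = np.cov(X, rowvar=False)
    return np.diag(np.diag(S))

def whitening_transform(S):
    """Inverse square root of S, used to decorrelate and scale the data."""
    vals, vecs = np.linalg.eigh(S)
    return vecs @ np.diag(1.0 / np.sqrt(vals)) @ vecs.T

rng = np.random.default_rng(0)
n, p = 200, 10
X = rng.normal(size=(n, p)) * np.linspace(0.5, 3.0, p)   # unequal scales
w_true = rng.normal(size=p)
y = np.sign(X @ w_true)

W = whitening_transform(structured_covariance(X))
Xw = (X - X.mean(axis=0)) @ W        # decorrelated, unit-scaled features

# Perceptron training on the transformed data.
w = np.zeros(p)
for epoch in range(50):
    for xi, yi in zip(Xw, y):
        if yi * (xi @ w) <= 0:
            w += yi * xi
print("training accuracy:", np.mean(np.sign(Xw @ w) == y))
```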
Journal: Informatica
Volume 7, Issue 2 (1996), pp. 137–154
Abstract
There exist two principally different approaches to designing a classification rule. In the classical (parametric) approach, one parametrizes the conditional density functions of the pattern classes. In the second (nonparametric) approach, one parametrizes the type of the discriminant function and minimizes the empirical classification error to find the unknown coefficients of the discriminant function. A number of asymptotic expansions exist for the expected probability of misclassification of parametric classifiers; for nonparametric classifiers, only error bounds have been available so far. In this paper, an exact analytical expression for the expected error EP_N of the nonparametric linear zero empirical error classifier is derived for the case when the distributions of the pattern classes are spherically Gaussian. The asymptotic expansion of EP_N is obtained for the case when both the number of learning patterns N and their dimensionality p increase infinitely. Tables of exact and approximate expected errors as functions of N, the dimensionality p, and the distance δ between the pattern classes are presented and compared with the expected error of Fisher's linear classifier; they indicate that the minimum empirical error classifier can be used even when the dimensionality exceeds the number of learning examples.
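The closing claim can be probed with a small Monte Carlo sketch (an illustration under settings assumed here, not the paper's analytical derivation): two spherically Gaussian classes at distance δ, with dimensionality p exceeding the number of learning vectors, comparing a linear rule driven to zero empirical error against the sample mean-difference direction, which plays the role of Fisher's classifier when the covariance is spherical.

```python
import numpy as np
from math import erf, sqrt

def generalization_error(w, mu):
    """Exact error of sign(w @ x) for classes N(+mu, I) and N(-mu, I)."""
    return 0.5 * (1.0 - erf((w @ mu) / (np.linalg.norm(w) * sqrt(2.0))))

rng = np.random.default_rng(1)
p, N, delta = 40, 10, 3.0               # p = 40 exceeds the 2N = 20 vectors
mu = np.zeros(p); mu[0] = delta / 2.0   # class means at +mu and -mu

X = np.vstack([rng.normal(size=(N, p)) + mu,
               rng.normal(size=(N, p)) - mu])
y = np.hstack([np.ones(N), -np.ones(N)])

# Sample mean-difference direction: Fisher's rule for spherical covariance.
w_fisher = X[:N].mean(axis=0) - X[N:].mean(axis=0)

# Drive the empirical error to zero with perceptron corrections, starting
# from the Fisher direction (one way to reach zero training error).
w = w_fisher.copy()
for _ in range(1000):
    updated = False
    for xi, yi in zip(X, y):
        if yi * (xi @ w) <= 0:
            w += yi * xi
            updated = True
    if not updated:
        break

print("Bayes error          :", round(generalization_error(mu, mu), 3))
print("mean-difference rule :", round(generalization_error(w_fisher, mu), 3))
print("zero-error rule      :", round(generalization_error(w, mu), 3))
```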
Journal: Informatica
Volume 7, Issue 1 (1996), pp. 15–26
Abstract
In this paper, we propose to represent the direct-form recursive digital filter as a state-space filter. We then apply a look-ahead technique and derive a pipelined equation for block output computation. In addition, we study the stability and multiplication complexity of the proposed pipelined-block implementation and compare them with the complexities of other methods. An algorithm is derived for the iterative computation of the pipelined-block matrices.
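The two steps can be sketched as follows, with conventions assumed here rather than taken from the paper (controllable canonical form; an arbitrary block length L): the direct-form transfer function is converted to a state-space model, and look-ahead block matrices are built so that one iteration advances the state by L samples and emits L outputs at once.

```python
import numpy as np

def tf_to_state_space(b, a):
    """Controllable canonical state-space form of
    H(z) = (b0 + b1 z^-1 + ...) / (1 + a1 z^-1 + ...), with a[0] = 1."""
    n = len(a) - 1
    b = np.concatenate([b, np.zeros(n + 1 - len(b))])
    A = np.zeros((n, n))
    A[0, :] = -np.asarray(a[1:], dtype=float)
    A[1:, :-1] = np.eye(n - 1)
    B = np.zeros((n, 1)); B[0, 0] = 1.0
    C = (b[1:] - b[0] * np.asarray(a[1:], dtype=float)).reshape(1, n)
    D = np.array([[b[0]]])
    return A, B, C, D

def lookahead_block_matrices(A, B, C, D, L):
    """Matrices for processing L samples per iteration:
    x[k+L] = AL @ x[k] + BL @ u_blk,   y_blk = CL @ x[k] + DL @ u_blk."""
    Apow = [np.linalg.matrix_power(A, j) for j in range(L)]
    AL = A @ Apow[L - 1]
    BL = np.hstack([Apow[L - 1 - i] @ B for i in range(L)])
    CL = np.vstack([C @ Apow[j] for j in range(L)])
    DL = np.zeros((L, L))
    for j in range(L):
        DL[j, j] = D[0, 0]
        for i in range(j):
            DL[j, i] = (C @ Apow[j - 1 - i] @ B)[0, 0]
    return AL, BL, CL, DL

# Demo: block processing reproduces sample-by-sample filtering.
b, a, L = [1.0, 0.5], [1.0, -0.9], 4
A, B, C, D = tf_to_state_space(b, a)
AL, BL, CL, DL = lookahead_block_matrices(A, B, C, D, L)
u = np.arange(8.0)
x = np.zeros((A.shape[0], 1)); y = []
for k in range(0, len(u), L):
    ub = u[k:k + L].reshape(-1, 1)
    y.extend((CL @ x + DL @ ub).ravel())
    x = AL @ x + BL @ ub
print(y)   # matches the recursion y[n] = 0.9 y[n-1] + u[n] + 0.5 u[n-1]
```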
Journal: Informatica
Volume 4, Issues 3-4 (1993), pp. 360–383
Abstract
An analytical equation for the generalization error of the minimum empirical error classifier is derived for the case when the true classes are spherically Gaussian. It is compared with the generalization error of a mean squared error classifier, the standard Fisher linear discriminant function. In the case of spherically distributed classes, the generalization error depends on the distance between the classes and the number of training samples. It depends on the intrinsic dimensionality of the data only via the initialization of the weight vector; if the initialization is successful, the dimensionality does not affect the generalization error. It is concluded that artificial neural nets are most advantageous for classifying patterns in a changing environment, when the intrinsic dimensionality of the data is low, or when the number of training sample vectors is very large.
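The dimensionality claim lends itself to a quick simulation (a sketch under settings assumed here, not the paper's analytical equation): with the class distance and training size fixed, a zero empirical error perceptron initialized from the sample mean difference gives a generalization error that varies little with p for spherically Gaussian classes.

```python
import numpy as np
from math import erf, sqrt

def generalization_error(w, mu):
    """Exact error of sign(w @ x) for classes N(+mu, I) and N(-mu, I)."""
    return 0.5 * (1.0 - erf((w @ mu) / (np.linalg.norm(w) * sqrt(2.0))))

rng = np.random.default_rng(2)
delta, N = 3.0, 10                      # class distance, vectors per class
for p in (25, 50, 200):                 # intrinsic dimensionality varies
    mu = np.zeros(p); mu[0] = delta / 2.0
    X = np.vstack([rng.normal(size=(N, p)) + mu,
                   rng.normal(size=(N, p)) - mu])
    y = np.hstack([np.ones(N), -np.ones(N)])
    w = X[:N].mean(axis=0) - X[N:].mean(axis=0)   # "successful" initialization
    for _ in range(1000):               # perceptron epochs to zero training error
        updated = False
        for xi, yi in zip(X, y):
            if yi * (xi @ w) <= 0:
                w += yi * xi
                updated = True
        if not updated:
            break
    print(p, round(generalization_error(w, mu), 3))
```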