Pub. online:5 Aug 2022Type:Research ArticleOpen Access
Journal:Informatica
Volume 16, Issue 1 (2005), pp. 61–74
Abstract
This paper discusses a soft sample clustering problem for multivariate independent random data satisfying the mixture model of the Gaussian distribution. The theory recommends to estimate the parameters of model by the maximum likelihood method and to use “plug-in” approach for data clustering. Unfortunately, the calculation problem of the maximum likelihood estimate is not completely solved in multivariate case. This work proposes a new constructive a few stage procedure to solve this task. This procedure includes statistical distribution analysis of a large number of the univariate projections of observations, geometric clustering of a multivariate sample and application of EM algorithm. The results of the accuracy analysis of the proposed methods is made by means of Monte-Carlo simulation.
Journal:Informatica
Volume 19, Issue 4 (2008), pp. 535–554
Abstract
This paper examins approaches for translation between English and morphology-rich languages. Experiment with English–Russian and English–Lithuanian revels that “pure” statistical approaches on 10 million word corpus gives unsatisfactory translation. Then, several Web-available linguistic resources are suggested for translation. Syntax parsers, bilingual and semantic dictionaries, bilingual parallel corpus and monolingualWeb-based corpus are integrated in one comprehensive statistical model. Multi-abstraction language representation is used for statistical induction of syntactic and semantic transformation rules called multi-alignment templates. The decodingmodel is described using the feature functions, a log-linear modeling approach and A* search algorithm. An evaluation of this approach is performed on the English–Lithuanian language pair. Presented experimental results demonstrates that the multi-abstraction approach and hybridization of learning methods can improve quality of translation.