Pub. online: 1 Jan 2019 | Type: Research Article | Open Access
Journal: Informatica
Volume 30, Issue 4 (2019), pp. 629–645
Abstract
Machine translation has become an important tool for overcoming language barriers. Translation quality depends on the language pair and on the methods used. The research presented in this paper builds on well-known standard methods for Statistical Machine Translation and extends them with a newly proposed approach for optimizing the weights of translation system components. Better component weights improve translation quality. Most machine translation systems translate to or from English; in our research, English is paired with a Slavic language, Slovenian. In our experiment, we built two Statistical Machine Translation systems for the Slovenian-English language pair using the Acquis Communautaire corpus. Both systems were optimized using self-adaptive Differential Evolution and compared with other related optimization methods. The results show an improvement in translation quality and are comparable to those of the other related methods.
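As a rough illustration of the optimization step described in this abstract, the following is a minimal sketch of one well-known self-adaptive Differential Evolution scheme, in which every individual carries its own scale factor F and crossover rate CR. The abstract does not say which self-adaptive variant the authors used, so the scheme, the parameter values, and all names here are assumptions. The function `toy_quality` is a hypothetical stand-in for a translation quality score measured on a development set; it is not the authors' objective.

```python
import random


def self_adaptive_de(objective, dim, pop_size=20, generations=100,
                     bounds=(-1.0, 1.0), tau_f=0.1, tau_cr=0.1, seed=0):
    """Maximize `objective` with a self-adaptive DE/rand/1/bin sketch.

    Each individual keeps its own F and CR, which are occasionally
    resampled and survive only when the trial vector they produced wins.
    """
    rng = random.Random(seed)
    lo, hi = bounds
    pop = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(pop_size)]
    f_vals = [0.5] * pop_size   # per-individual scale factors (assumed start)
    cr_vals = [0.9] * pop_size  # per-individual crossover rates (assumed start)
    scores = [objective(x) for x in pop]

    for _ in range(generations):
        for i in range(pop_size):
            # Self-adaptation: resample F in [0.1, 1.0) and CR in [0, 1)
            # with small probabilities, otherwise inherit the old values.
            f = 0.1 + 0.9 * rng.random() if rng.random() < tau_f else f_vals[i]
            cr = rng.random() if rng.random() < tau_cr else cr_vals[i]

            # DE/rand/1 mutation from three distinct individuals other than i.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            j_rand = rng.randrange(dim)  # guarantees one mutated component
            trial = []
            for j in range(dim):
                if rng.random() < cr or j == j_rand:
                    v = pop[a][j] + f * (pop[b][j] - pop[c][j])
                    v = min(hi, max(lo, v))  # clip to the search bounds
                else:
                    v = pop[i][j]
                trial.append(v)

            # Greedy selection: keep the trial and its F, CR if not worse.
            s = objective(trial)
            if s >= scores[i]:
                pop[i], scores[i] = trial, s
                f_vals[i], cr_vals[i] = f, cr

    best = max(range(pop_size), key=scores.__getitem__)
    return pop[best], scores[best]


# Hypothetical stand-in objective: the score peaks at a hidden weight vector.
HIDDEN_WEIGHTS = [0.3, -0.2, 0.5, 0.1]


def toy_quality(weights):
    return -sum((w - h) ** 2 for w, h in zip(weights, HIDDEN_WEIGHTS))


if __name__ == "__main__":
    weights, score = self_adaptive_de(toy_quality, dim=len(HIDDEN_WEIGHTS))
    print(weights, score)
```

In a real system the objective would decode a development set with the candidate weights and score the output, which makes each evaluation expensive; that cost is one reason a small population and few generations are plausible settings.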
Journal: Informatica
Volume 21, Issue 1 (2010), pp. 95–116
Abstract
We address the problem of statistical machine translation from a highly inflective language to a less inflective one. Statistical machine translation systems generally do not take the characteristics of inflective languages into account. Existing translation systems often treat different inflected word forms of the same lemma as if they were independent of each other, although interdependencies exist. On the other hand, reducing inflected word forms to common lemmas loses some information. It would be reasonable to eliminate only those variations in inflected word forms that are not relevant for translation. The inflectional features of words are defined by morpho-syntactic description (MSD) tags, and we want to reduce them. Doing this requires explicit knowledge about both languages (source and target). The idea of the paper is to find the information-bearing MSDs in the source language using a data-driven approach. The task is performed by a global optimization algorithm, Differential Evolution. The experiments were performed using the freely available parallel English–Slovenian corpus SVEZ-IJS, which is lemmatized and annotated with MSD tags. The results show a promising direction toward an optimal subset of morpho-syntactic features.
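To make the idea of reducing MSD tags concrete, here is a minimal sketch of how a real-valued vector from an optimizer such as Differential Evolution could be decoded into a keep/drop mask over tag positions. It assumes MSD tags are positional strings (one character per attribute, with the first character giving the word category), and it uses a single mask for all categories. Both are simplifying assumptions, and the function names are hypothetical rather than taken from the paper.

```python
def decode_mask(vector, threshold=0.5):
    """Turn a real-valued candidate vector into a keep/drop mask."""
    return [value >= threshold for value in vector]


def reduce_msd(tag, keep):
    """Blank out the attribute positions that the mask marks as dropped.

    The category character at position 0 is always kept. Dropped
    attributes become "-", and trailing "-" characters are trimmed.
    """
    reduced = tag[0]
    for pos, ch in enumerate(tag[1:]):
        reduced += ch if pos < len(keep) and keep[pos] else "-"
    return reduced.rstrip("-")


if __name__ == "__main__":
    # Hypothetical candidate: keep attributes 1 and 3, drop 2 and 4.
    mask = decode_mask([0.9, 0.2, 0.7, 0.1])
    for tag in ("Ncmsn", "Ncfsa", "Ncmpn"):
        print(tag, "->", reduce_msd(tag, mask))
```

Under this mask, tags that differ only in the dropped attributes collapse to the same reduced tag, so the corresponding word forms are no longer treated as unrelated. The optimizer's job would then be to find the mask that gives the best translation quality.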