Pub. online: 1 Jan 2018; Type: Research Article; Open Access
Journal:Informatica
Volume 29, Issue 4 (2018), pp. 693–710
Abstract
In this paper, we propose a framework for extracting translation memory from a corpus of fiction and non-fiction books. In recent years, there have been several proposals to align bilingual corpora and extract translation memory from legal and technical documents. Yet, when it comes to aligning a corpus of translated fiction and non-fiction books, the existing alignment algorithms give low-precision results. To solve this low-precision problem, we propose a new method that combines existing alignment algorithms with a proactive learning approach. We define several feature functions that are used to build two classifiers for text filtering and alignment. We report results on the English-Lithuanian language pair and on a bilingual corpus from 200 books. We demonstrate a significant improvement in alignment accuracy over currently available alignment systems.
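As a rough illustration of what feature functions over candidate sentence pairs might look like, the sketch below scores an English-Lithuanian pair and filters it with a fixed threshold. The abstract does not name its features, so `length_ratio`, `shared_numbers`, the weights, and the threshold are all hypothetical. The fixed-weight linear rule is a simple stand-in for a trained classifier, not the method of the paper.

```python
import re


def length_ratio(source, target):
    """Ratio of the shorter character length to the longer one (1.0 = equal)."""
    a, b = len(source), len(target)
    if max(a, b) == 0:
        return 1.0
    return min(a, b) / max(a, b)


def shared_numbers(source, target):
    """Fraction of numeric tokens in the source that also occur in the target."""
    src = set(re.findall(r"\d+", source))
    tgt = set(re.findall(r"\d+", target))
    if not src:
        return 1.0 if not tgt else 0.0
    return len(src & tgt) / len(src)


def features(source, target):
    """Feature vector for one candidate sentence pair (hypothetical features)."""
    return [length_ratio(source, target), shared_numbers(source, target)]


def keep_pair(source, target, weights=(0.5, 0.5), threshold=0.6):
    """Keep the pair if the weighted feature sum reaches the threshold.

    The weights and threshold are assumed values; a real system would
    learn them from labeled examples.
    """
    score = sum(w * f for w, f in zip(weights, features(source, target)))
    return score >= threshold


good = keep_pair("He was born in 1998.", "Jis gimė 1998 metais.")
bad = keep_pair("Chapter 12", "Tai buvo labai ilga ir sudėtinga diena visiems.")
```

Here `good` keeps a pair with similar lengths and a matching year, while `bad` rejects a short heading paired with an unrelated long sentence.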
Journal:Informatica
Volume 19, Issue 4 (2008), pp. 535–554
Abstract
This paper examines approaches for translation between English and morphology-rich languages. Experiments with English–Russian and English–Lithuanian reveal that “pure” statistical approaches on a 10 million word corpus give unsatisfactory translations. Then, several Web-available linguistic resources are suggested for translation. Syntax parsers, bilingual and semantic dictionaries, a bilingual parallel corpus and a monolingual Web-based corpus are integrated in one comprehensive statistical model. Multi-abstraction language representation is used for statistical induction of syntactic and semantic transformation rules called multi-alignment templates. The decoding model is described using feature functions, a log-linear modeling approach and the A* search algorithm. An evaluation of this approach is performed on the English–Lithuanian language pair. The presented experimental results demonstrate that the multi-abstraction approach and hybridization of learning methods can improve the quality of translation.
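The log-linear modeling approach mentioned above can be sketched in a few lines: each candidate translation gets a score equal to the weighted sum of its feature function values, and the decoder prefers the highest-scoring candidate. The feature names, values, and weights below are hypothetical and chosen only for illustration. The sketch ranks a fixed candidate list and does not attempt the A* search or the multi-alignment templates described in the abstract.

```python
import math


def log_linear_score(features, weights):
    """Weighted sum of feature values: sum over k of weight_k * h_k."""
    return sum(weights[name] * value for name, value in features.items())


def best_candidate(candidates, weights):
    """Return the candidate translation with the highest log-linear score."""
    return max(candidates, key=lambda c: log_linear_score(c["features"], weights))


def posterior(candidates, weights):
    """Normalize exponentiated scores into a probability distribution."""
    scores = [log_linear_score(c["features"], weights) for c in candidates]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical log-probability features for two candidate translations.
candidates = [
    {"text": "candidate A",
     "features": {"translation_model": -2.0, "language_model": -1.0}},
    {"text": "candidate B",
     "features": {"translation_model": -1.5, "language_model": -3.0}},
]
# Hypothetical weights; in practice they would be tuned on held-out data.
weights = {"translation_model": 1.0, "language_model": 0.5}

winner = best_candidate(candidates, weights)
probs = posterior(candidates, weights)
```

With these assumed values, candidate A scores -2.5 and candidate B scores -3.0, so `winner` is candidate A.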
Journal:Informatica
Volume 16, Issue 3 (2005), pp. 407–418
Abstract
The paper offers a new way of presenting the structure of a sentence. Neither of the two widely known methods of representing the syntactic structure of a sentence is of any avail when applied to the Lithuanian language. Neither the tree based on the phrase structure principle nor the tree suggested by dependency grammar reflects all the syntactic information that a Lithuanian sentence contains.
The paper points out the differences between the Lithuanian language and other languages and presents the reasons why a Lithuanian sentence should be represented by a graph.
The paper presents a generalized structure of a simple sentence in the Lithuanian language, namely, a structure that would embrace all possible instances of a Lithuanian simple sentence. Every sentence of the text would have to activate only one path in the generalized structure.
Journal:Informatica
Volume 13, Issue 4 (2002), pp. 417–440
Abstract
High-quality machine translation between human languages has long been an unattainable dream for many computer scientists involved in this fascinating and interdisciplinary field of computer applications. The recently developed example-based machine translation technique seems to be a serious alternative to the existing automatic translation techniques. The paper proposes using the example-based machine translation technique to develop a system able to translate unrestricted German text into Polish. A new approach to example-based machine translation that takes into account the peculiarities of Polish grammar is developed. The preliminary results obtained in developing the proposed system seem very promising and appear to be a step in the right direction towards a fully automatic, high-quality German-into-Polish machine translation system for unrestricted text.
Journal:Informatica
Volume 9, Issue 4 (1998), pp. 449–456
Abstract
Language engineering, encompassing natural language processing and speech processing, has become very important for the development of every nation in multilingual Europe. After the Council of the European Union approved conclusions on linguistic and cultural diversity, tools and systems created for every European language are necessary to overcome language barriers and to use all languages in various spheres of human cooperation. The paper gives an overview and a consideration of language engineering in Lithuania.