Multi-Alignment Templates Induction
Volume 19, Issue 4 (2008), pp. 535–554
Pub. online: 1 January 2008
Type: Research Article
Received
1 January 2008
Accepted
1 June 2008
Published
1 January 2008
Abstract
This paper examines approaches to translation between English and morphologically rich languages. Experiments with English–Russian and English–Lithuanian reveal that "pure" statistical approaches trained on a 10-million-word corpus give unsatisfactory translations. Several Web-available linguistic resources are therefore proposed for translation. Syntax parsers, bilingual and semantic dictionaries, a bilingual parallel corpus and a monolingual Web-based corpus are integrated into one comprehensive statistical model. A multi-abstraction language representation is used for the statistical induction of syntactic and semantic transformation rules, called multi-alignment templates. The decoding model is described in terms of feature functions, a log-linear modeling approach and the A* search algorithm. The approach is evaluated on the English–Lithuanian language pair. The experimental results demonstrate that the multi-abstraction approach and the hybridization of learning methods can improve translation quality.
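As a rough illustration of how a log-linear model and A* search fit together in a decoder, the sketch that follows scores a target hypothesis e for a source sentence f as score(e | f) = Σ_m λ_m · h_m(e, f) and searches for the highest-scoring hypothesis. It is not the paper's decoder. The feature names (`tm`, `lex`, `lm`), their weights, the Lithuanian–English translation options and the bigram probabilities are all hypothetical toy values. Decoding is assumed to be monotone (no reordering), and multi-alignment templates are not modeled.

```python
import heapq
import itertools
import math

# Illustrative sketch only: all weights, options and probabilities below are
# hypothetical toy values, not the paper's model or data.

# Assumed feature weights (lambda_m) of the log-linear model.
WEIGHTS = {"tm": 1.0, "lex": 0.5, "lm": 0.8}

# Hypothetical translation options per Lithuanian source word:
# (English target word, feature values h_m given as log-probabilities <= 0).
OPTIONS = {
    "didelis": [("big", {"tm": math.log(0.6), "lex": math.log(0.5)}),
                ("large", {"tm": math.log(0.4), "lex": math.log(0.5)})],
    "namas": [("house", {"tm": math.log(0.7), "lex": math.log(0.6)}),
              ("home", {"tm": math.log(0.3), "lex": math.log(0.4)})],
}

# Hypothetical bigram language model over target words.
BIGRAM = {
    ("<s>", "big"): 0.5, ("<s>", "large"): 0.2,
    ("big", "house"): 0.4, ("big", "home"): 0.1,
    ("large", "house"): 0.3, ("large", "home"): 0.05,
}


def lm_logprob(prev, word):
    """Log-probability of `word` after `prev`, with a small floor for unseen pairs."""
    return math.log(BIGRAM.get((prev, word), 1e-4))


def option_score(feats):
    """Weighted sum of lambda_m * h_m over one translation option's features."""
    return sum(WEIGHTS[name] * value for name, value in feats.items())


def decode(source):
    """Monotone A* decoding that maximizes the log-linear score of the output."""
    n = len(source)
    # future[i]: optimistic score for translating source[i:]. It ignores the LM
    # feature, which can only lower a score, so it is never below the true best.
    future = [0.0] * (n + 1)
    for i in range(n - 1, -1, -1):
        best = max(option_score(f) for _, f in OPTIONS[source[i]])
        future[i] = future[i + 1] + best
    tie = itertools.count()  # unique tie-breaker so the heap never compares words
    # Heap entries: (-(g + h), tie, position, last target word, g, output so far).
    heap = [(-future[0], next(tie), 0, "<s>", 0.0, ())]
    closed = set()
    while heap:
        _, _, i, prev, g, out = heapq.heappop(heap)
        if i == n:
            return list(out), g  # first complete hypothesis popped is optimal
        if (i, prev) in closed:
            continue
        closed.add((i, prev))
        for tgt, feats in OPTIONS[source[i]]:
            g2 = g + option_score(feats) + WEIGHTS["lm"] * lm_logprob(prev, tgt)
            heapq.heappush(
                heap, (-(g2 + future[i + 1]), next(tie), i + 1, tgt, g2, out + (tgt,)))
    return None, -math.inf


if __name__ == "__main__":
    words, score = decode(["didelis", "namas"])
    print(words, round(score, 3))
```

The heuristic deliberately leaves out the language-model feature: with non-negative weights and log-probabilities that never exceed zero, that feature can only lower a hypothesis score. The estimate is therefore optimistic (admissible and consistent), so the first complete hypothesis taken from the queue is the best one under this toy model.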