Informatica

Statistical Language Models of Lithuanian Based on Word Clustering and Morphological Decomposition
Volume 15, Issue 4 (2004), pp. 565–580
Airenas Vaičiūnas   Vytautas Kaminskas   Gailius Raškinis  

https://doi.org/10.15388/Informatica.2004.079
Pub. online: 1 January 2004
Type: Research Article

Received: 1 March 2004
Published: 1 January 2004

Abstract

This paper describes our research on statistical language modeling of Lithuanian. We investigated the idea of improving sparse n‐gram models of the highly inflected Lithuanian language by interpolating them with complex n‐gram models based on word clustering and morphological word decomposition. Words, word base forms and part‐of‐speech tags were clustered into 50 to 5000 automatically generated classes. Multiple 3‐gram and 4‐gram class‐based language models were built and evaluated on a Lithuanian text corpus containing 85 million words. Class‐based models linearly interpolated with the 3‐gram model led to a reduction in perplexity of up to 13% compared with the baseline 3‐gram model. Morphological models decreased the out‐of‐vocabulary word rate from 1.5% to 1.02%.
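
To make the two central notions of the abstract concrete, the following is a minimal sketch of linear interpolation between two language models and of perplexity computed from per-word probabilities. It is not the authors' implementation. The function names, the interpolation weight of 0.6, and the toy probabilities are all assumptions made up for illustration. The class-based probability is assumed to be already available for each word, for example as the product of a class n‐gram probability and a word-given-class probability.

```python
import math


def interpolate(p_word, p_class, lam):
    """Linearly interpolate two probability estimates for the same word.

    lam is the weight of the word-based model, (1 - lam) the weight of
    the class-based model. The value of lam is an assumption here; in
    practice it would be tuned on held-out data.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return lam * p_word + (1.0 - lam) * p_class


def perplexity(probs):
    """Perplexity of a word sequence, given the probability that the
    model assigned to each word in its context."""
    log_sum = sum(math.log(p) for p in probs)
    return math.exp(-log_sum / len(probs))


# Toy per-word probabilities, invented for illustration only.
# The word-based model is confident on frequent words but assigns
# very low probability to rare (sparse) events.
p_word_model = [0.10, 0.001, 0.05, 0.002]
# The class-based model is smoother across the same four words.
p_class_model = [0.05, 0.01, 0.04, 0.01]

mixed = [
    interpolate(pw, pc, lam=0.6)
    for pw, pc in zip(p_word_model, p_class_model)
]

ppl_base = perplexity(p_word_model)
ppl_mixed = perplexity(mixed)
relative_reduction = (ppl_base - ppl_mixed) / ppl_base
```

In this toy setting the interpolated model has lower perplexity than the word-based model alone, because the class-based estimates raise the probability of the sparse events. The size of the reduction here depends entirely on the invented numbers and says nothing about the figures reported in the paper.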


Keywords
language models, n‐grams, class‐based models, morphology, inflections, interpolation, perplexity reduction, out‐of‐vocabulary words


INFORMATICA

  • Online ISSN: 1822-8844
  • Print ISSN: 0868-4952
  • Copyright © 2023 Vilnius University
