Informatica

Cache-based Statistical Language Models of English and Highly Inflected Lithuanian
Volume 17, Issue 1 (2006), pp. 111–124
Airenas Vaičiūnas, Gailius Raškinis

https://doi.org/10.15388/Informatica.2006.127
Type: Research Article
Received: 1 August 2005
Pub. online: 1 January 2006
Published: 1 January 2006

Abstract

This paper investigates a variety of statistical cache-based language models built on three corpora: English, Lithuanian, and Lithuanian base forms. It studies how the cache size, the type of decay function (including custom corpus-derived functions), and the interpolation technique (static vs. dynamic) affect the perplexity of a language model. The best results are achieved by models consisting of three components, a standard 3-gram, a decaying cache 1-gram, and a decaying cache 2-gram, joined by linear interpolation with dynamic weight updates. Such a model improved perplexity by up to 36% for Lithuanian words and up to 43% for Lithuanian word base forms relative to the 3-gram baseline. The best English language model improved perplexity by up to 16%. This suggests that cache-based modeling is of greater utility for highly inflected languages with free word order.
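
As an illustration of the general idea only, the sketch below interpolates a base model with a decaying cache 1-gram and measures perplexity. It is not the authors' implementation: the exponential decay, the fixed (static) interpolation weight `lam`, the uniform base model standing in for the 3-gram, and all function and parameter names are assumptions made for this example. The abstract does not specify the decay functions or the dynamic weight update rule, so neither is reproduced here.

```python
import math


def decaying_cache_prob(word, history, decay=0.01, cache_size=500):
    """Cache 1-gram probability of `word` given the preceding words.

    Assumption: each cached word at distance d (1 = most recent) gets
    weight exp(-decay * d); the weights are normalized over the cache.
    """
    recent = history[-cache_size:]
    total = 0.0
    match = 0.0
    for distance, past in enumerate(reversed(recent), start=1):
        weight = math.exp(-decay * distance)
        total += weight
        if past == word:
            match += weight
    # An empty cache carries no information, so it contributes zero.
    return match / total if total > 0 else 0.0


def interpolate(p_base, p_cache, lam):
    """Linear interpolation with a static weight `lam` on the cache."""
    return (1.0 - lam) * p_base + lam * p_cache


def perplexity(words, base_prob, lam=0.1, decay=0.01, cache_size=500):
    """Perplexity of `words` under the interpolated model.

    `base_prob(word, history)` is a hypothetical callable standing in
    for a smoothed n-gram model; it must return a probability > 0.
    """
    log_sum = 0.0
    for i, word in enumerate(words):
        history = words[:i]
        p = interpolate(
            base_prob(word, history),
            decaying_cache_prob(word, history, decay, cache_size),
            lam,
        )
        log_sum += math.log(p)
    return math.exp(-log_sum / len(words))


def uniform_base(word, history, vocab_size=10):
    """Toy base model: uniform over an assumed 10-word vocabulary."""
    return 1.0 / vocab_size


if __name__ == "__main__":
    text = ["a", "b"] * 10  # a deliberately repetitive toy text
    print("base only :", perplexity(text, uniform_base, lam=0.0))
    print("with cache:", perplexity(text, uniform_base, lam=0.5))
```

On a repetitive toy text like this one, the cache component lowers perplexity relative to the base model alone, because recently seen words receive extra probability mass. A dynamic scheme would adjust `lam` as the text is processed instead of keeping it fixed.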


Keywords
language models, n-grams, cache models, dynamic interpolation, perplexity reduction, inflected language, free word order language, Lithuanian


INFORMATICA

  • Online ISSN: 1822-8844
  • Print ISSN: 0868-4952
  • Copyright © 2023 Vilnius University

Contact us

  • Institute of Data Science and Digital Technologies
  • Vilnius University

    Akademijos St. 4

    08412 Vilnius, Lithuania

    Phone: (+370 5) 2109 338

    E-mail: informatica@mii.vu.lt

    https://informatica.vu.lt/journal/INFORMATICA