Pub. online: 6 Mar 2026
Type: Research Article
Open Access
Journal: Informatica
Volume 37, Issue 1 (2026), pp. 229–249
Abstract
Forward-looking coding has recently been introduced as a source modelling paradigm that exploits predictions of forthcoming symbols. In this paper, we extend this methodology to word-level alphabets, enabling improved compression performance for large, variable-length symbol sets. We present a space-efficient scheme for encoding header information, with particular emphasis on the compact and accurate representation of symbol frequency distributions. In addition, we propose an alternative ordering strategy for word-based dictionaries that leverages the adaptive nature of forward-looking compression. We further show how these techniques can be integrated with a word-based Prediction by Partial Matching (PPM) model of order one while avoiding the zero-frequency problem. Experimental results on multiple datasets confirm the effectiveness of the proposed approach.