Journal: Informatica
Volume 15, Issue 3 (2004), pp. 303–314
Abstract
The article presents a limited‐vocabulary, speaker‐independent continuous speech recognition system for Estonian based on hidden Markov models. The system is trained on an annotated Estonian speech database of 60 speakers, approximately 4 hours in duration. Words are modelled using clustered triphones with multiple Gaussian mixture components. The system is evaluated on a number recognition task and a simple medium‐vocabulary recognition task, and its performance is explored using acoustic models of increasing complexity. The number recognizer achieves an accuracy of 97%. The medium‐vocabulary system recognizes 82.9% of words correctly when operating in real time; this figure rises to 90.6% if the real‐time requirement is dropped.