For more than two decades, Lithuanian speech recognition has been researched solely in Lithuania, because the work required deep knowledge of the Lithuanian language. Advances in AI now allow high-quality speech-to-text systems to be built without native knowledge, provided that sufficient annotated data is available. This study evaluated 18 Lithuanian speech transcribers on a short recording; the 7 best were then selected and evaluated on extensive data. The top system achieved a word error rate (WER) of 5.1% for Lithuanian words, and three others achieved 8.7–9.2%. For other word-sized tokens, such as numbers, speech disfluencies, abbreviations, and foreign words, a classification adapted to the Lithuanian language was proposed. Different processing strategies for tokens of these classes were examined, and the study assessed which transcribers tend to follow which strategies.
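For readers unfamiliar with the metric, WER is conventionally defined as (substitutions + deletions + insertions) divided by the number of reference words. The sketch below illustrates that standard definition with a word-level edit distance. It is not the scoring procedure used in the study: the function name `word_error_rate`, the whitespace tokenization, and the absence of any text normalization are assumptions made for illustration only.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Assumption: tokens are obtained by splitting on whitespace, with no
    case folding, punctuation removal, or number normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substituted word out of three reference words.
    print(word_error_rate("labas rytas visiems", "labas vakaras visiems"))
```

In practice the reported figure depends heavily on how numbers, abbreviations, and foreign words are normalized before scoring, which is why the treatment of such tokens is examined separately.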
Lithuanian Broadcast Speech Transcription Using Semi-supervised Acoustic Model Training
Rasa Lileikytė, Arseniy Gorin, Lori Lamel, Jean-Luc Gauvain, Thiago Fraga-Silva