Journal:Informatica
Volume 19, Issue 4 (2008), pp. 505–516
Abstract
The present work is concerned with speech recognition using a small or medium size vocabulary. The possibility to use the English speech recognizer for the recognition of Lithuanian was investigated. Two methods were used to deal with such problems: the expert-driven (knowledge-based) method and the data-driven one. Phonological systems of English and Lithuanian were compared on the basis of the knowledge of phonology, and relations between certain Lithuanian and English phonemes were established. Situations in which correspondences between the phonemes were to be established experimentally (i.e., using the data-driven method) and the English phonemes that best matched the Lithuanian sounds or their combinations (e.g., diphthongs) in such situations were identified. The results obtained were used for creating transcriptions of the Lithuanian names and surnames that were used in recognition experiments. The experiments without transcriptions, with a single transcription and with many transcriptions were carried on. The method that allowed finding a small number of best transcriptions was proposed. The recognition rate achieved was as follows: 84.2% with the vocabulary containing 500 word pairs.
Journal:Informatica
Volume 18, Issue 3 (2007), pp. 395–406
Abstract
This paper describes a framework for making up a set of syllables and phonemes that subsequently is used in the creation of acoustic models for continuous speech recognition of Lithuanian. The target is to discover a set of syllables and phonemes that is of utmost importance in speech recognition. This framework includes operations with lexicon, and transcriptions of records. To facilitate this work, additional programs have been developed that perform word syllabification, lexicon adjustment, etc. Series of experiments were done in order to establish the framework and model syllable- and phoneme-based speech recognition. Dominance of a syllable in lexicon has improved speech recognition results and encouraged us to move away from a strict definition of syllable, i.e., a syllable becomes a simple sub-word unit derived from a syllable. Two sets of syllables and phonemes and two types of lexicons have been developed and tested. The best recognition accuracy achieved 56.67% ±0.33. The speech recognition system is based on Hidden Markov Models (HMM). The continuous speech corpus LRN0 was used for the speech recognition experiments.
Journal:Informatica
Volume 17, Issue 4 (2006), pp. 587–600
Abstract
There is presented a technique of transcribing Lithuanian text into phonemes for speech recognition. Text-phoneme transformation has been made by formal rules and the dictionary. Formal rules were designed to set the relationship between segments of the text and units of formalized speech sounds – phonemes, dictionary – to correct transcription and specify stress mark and position. Proposed the automatic transcription technique was tested by comparing its results with manually obtained ones. The experiment has shown that less than 6% of transcribed words have not matched.
Journal:Informatica
Volume 15, Issue 4 (2004), pp. 465–474
Abstract
The development of Lithuanian HMM/ANN speech recognition system, which combines artificial neural networks (ANNs) and hidden Markov models (HMMs), is described in this paper. A hybrid HMM/ANN architecture was applied in the system. In this architecture, a fully connected three‐layer neural network (a multi‐layer perceptron) is trained by conventional stochastic back‐propagation algorithm to estimate the probability of 115 context‐independent phonetic categories and during recognition it is used as a state output probability estimator. The hybrid HMM/ANN speech recognition system based on Mel Frequency Cepstral Coefficients (MFCC) was developed using CSLU Toolkit. The system was tested on the VDU isolated‐word Lithuanian speech corpus and evaluated on a speaker‐independent ∼750 distinct isolated‐word recognition task. The word recognition accuracy obtained was about 86.7%.
Journal:Informatica
Volume 14, Issue 1 (2003), pp. 75–84
Abstract
In this paper, the opening work on the development of a Lithuanian HMM speech recognition system is described. The triphone single‐Gaussian HMM speech recognition system based on Mel Frequency Cepstral Coefficients (MFCC) was developed using HTK toolkit. Hidden Markov model's parameters were estimated from phone‐level hand‐annotated Lithuanian speech corpus. The system was evaluated on a speaker‐independent ∼750 distinct isolated‐word recognition task. Though the speaker adaptation and language modeling techniques were not used, the system was performing at 20% word error rate.