Journal:Informatica
Volume 24, Issue 3 (2013), pp. 435–446
Abstract
The performance of an automatic speech recognition system heavily depends on the used feature set. Quality of speech recognition features is estimated by classification error, but then the recognition experiments must be performed, including both front-end and back-end implementations. We propose a method for features quality estimation that does not require recognition experiments and accelerate automatic speech recognition system development. The key component of our method is usage of metrics right after front-end features computation. The experimental results show that our method is suitable for recognition systems with back-end Euclidean space classifiers.
Journal:Informatica
Volume 19, Issue 4 (2008), pp. 505–516
Abstract
The present work is concerned with speech recognition using a small or medium size vocabulary. The possibility to use the English speech recognizer for the recognition of Lithuanian was investigated. Two methods were used to deal with such problems: the expert-driven (knowledge-based) method and the data-driven one. Phonological systems of English and Lithuanian were compared on the basis of the knowledge of phonology, and relations between certain Lithuanian and English phonemes were established. Situations in which correspondences between the phonemes were to be established experimentally (i.e., using the data-driven method) and the English phonemes that best matched the Lithuanian sounds or their combinations (e.g., diphthongs) in such situations were identified. The results obtained were used for creating transcriptions of the Lithuanian names and surnames that were used in recognition experiments. The experiments without transcriptions, with a single transcription and with many transcriptions were carried on. The method that allowed finding a small number of best transcriptions was proposed. The recognition rate achieved was as follows: 84.2% with the vocabulary containing 500 word pairs.
Journal:Informatica
Volume 17, Issue 4 (2006), pp. 587–600
Abstract
There is presented a technique of transcribing Lithuanian text into phonemes for speech recognition. Text-phoneme transformation has been made by formal rules and the dictionary. Formal rules were designed to set the relationship between segments of the text and units of formalized speech sounds – phonemes, dictionary – to correct transcription and specify stress mark and position. Proposed the automatic transcription technique was tested by comparing its results with manually obtained ones. The experiment has shown that less than 6% of transcribed words have not matched.
Journal:Informatica
Volume 15, Issue 4 (2004), pp. 465–474
Abstract
The development of Lithuanian HMM/ANN speech recognition system, which combines artificial neural networks (ANNs) and hidden Markov models (HMMs), is described in this paper. A hybrid HMM/ANN architecture was applied in the system. In this architecture, a fully connected three‐layer neural network (a multi‐layer perceptron) is trained by conventional stochastic back‐propagation algorithm to estimate the probability of 115 context‐independent phonetic categories and during recognition it is used as a state output probability estimator. The hybrid HMM/ANN speech recognition system based on Mel Frequency Cepstral Coefficients (MFCC) was developed using CSLU Toolkit. The system was tested on the VDU isolated‐word Lithuanian speech corpus and evaluated on a speaker‐independent ∼750 distinct isolated‐word recognition task. The word recognition accuracy obtained was about 86.7%.
Journal:Informatica
Volume 14, Issue 1 (2003), pp. 75–84
Abstract
In this paper, the opening work on the development of a Lithuanian HMM speech recognition system is described. The triphone single‐Gaussian HMM speech recognition system based on Mel Frequency Cepstral Coefficients (MFCC) was developed using HTK toolkit. Hidden Markov model's parameters were estimated from phone‐level hand‐annotated Lithuanian speech corpus. The system was evaluated on a speaker‐independent ∼750 distinct isolated‐word recognition task. Though the speaker adaptation and language modeling techniques were not used, the system was performing at 20% word error rate.