Journal:Informatica
Volume 21, Issue 3 (2010), pp. 361–374
Abstract
The paper deals with the use of formant features in speech recognition based on dynamic time warping. These features are easy to visualize and give new insight into the causes of speech recognition errors. The formant feature extraction method, based on singular prediction polynomials, has been applied to the recognition of isolated words. However, recognition performance depends on the order of the singular prediction polynomials, on whether symmetric or antisymmetric polynomials are used, and on whether the chosen order is even or odd. It is also important to know how informative the individual formants are and how the recognition results depend on other parameters of the recognition system, such as the analysis frame length, the number of formants used in recognition, the frequency scale used to represent the formant features, and the preemphasis filter parameters. By choosing the processing parameters properly, it is possible to optimize speech recognition performance.
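Two of the processing parameters named above, the preemphasis filter and the frequency scale, can be illustrated with a minimal sketch. The abstract does not state which filter coefficient or which frequency scale was used, so the first-order filter coefficient `alpha = 0.97` and the mel scale below are assumptions chosen only for illustration.

```python
import math

def preemphasis(signal, alpha=0.97):
    """First-order preemphasis: y[n] = x[n] - alpha * x[n-1].

    alpha is a hypothetical default; the abstract treats it as a tunable parameter.
    The first sample is passed through unchanged.
    """
    if not signal:
        return []
    out = [signal[0]]
    for n in range(1, len(signal)):
        out.append(signal[n] - alpha * signal[n - 1])
    return out

def hz_to_mel(freq_hz):
    """Convert a frequency in Hz to mels (one common perceptual scale).

    The mel scale is an assumption here; the source only says that
    the choice of frequency scale affects recognition.
    """
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)
```

A formant track in Hz could then be mapped with `[hz_to_mel(f) for f in track]` before distances are computed.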
The aim of the current investigation is to optimize formant feature based isolated word recognition by varying the processing parameters of the recognition system, and to find improvements that make the system more robust to white noise. The optimization experiments were carried out using speech recordings of 111 Lithuanian words. The speech signals were recorded in a conventional room environment (SNR = 30 dB). White noise was then generated at predefined levels (65 dB, 60 dB and 55 dB) and added to the test utterances. Recognition performance was evaluated at the various noise levels.
The optimization experiments allowed us to improve the performance of the formant feature based speech recognition system considerably and made the system more robust to white noise.
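Adding white noise to test utterances can be sketched as follows. The abstract specifies the noise by level (65 dB, 60 dB and 55 dB) and does not say how those levels map to a signal-to-noise ratio, so this sketch is parameterized by a target SNR instead; that parameterization, the Gaussian noise model, and all function names are assumptions.

```python
import math
import random

def mean_power(samples):
    """Average power of a sequence of samples."""
    return sum(s * s for s in samples) / len(samples)

def add_white_noise(signal, snr_db, seed=0):
    """Return signal plus white Gaussian noise at the requested SNR in dB.

    The SNR parameterization is an assumption of this sketch,
    not the procedure described in the paper.
    """
    rng = random.Random(seed)
    noise_power = mean_power(signal) / (10.0 ** (snr_db / 10.0))
    sigma = math.sqrt(noise_power)
    return [s + rng.gauss(0.0, sigma) for s in signal]

def measured_snr_db(clean, noisy):
    """SNR in dB estimated from a clean signal and its noisy version."""
    noise = [n - c for c, n in zip(clean, noisy)]
    return 10.0 * math.log10(mean_power(clean) / mean_power(noise))
```

Calling `measured_snr_db(clean, add_white_noise(clean, 10.0))` on a long enough signal should return a value close to 10 dB.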
Journal:Informatica
Volume 19, Issue 2 (2008), pp. 213–226
Abstract
The possibility of using formant features (FF) in user-dependent isolated word recognition was investigated. Word recognition was performed using a dynamic time warping technique. Several formant feature extraction methods were compared, and a method based on singular prediction polynomials is proposed for the recognition of isolated words. The recognition performance of the proposed method was compared with that of linear prediction coding (LPC) features and LPC-derived cepstral (LPCC) features. In total, 111 Lithuanian words were used in the recognition experiment. Recognition performance was evaluated at various noise levels. The experiments showed that formant features calculated from singular prediction polynomials are more reliable than the LPC and LPCC features at all noise levels.
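Template matching with dynamic time warping can be sketched in a few lines. This is a textbook formulation with unconstrained symmetric steps and Euclidean frame distance, not necessarily the variant used in the paper; the feature vectors stand in for per-frame formant frequencies and are hypothetical.

```python
import math

def dtw_distance(a, b):
    """Accumulated DTW cost between two sequences of feature vectors."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])  # Euclidean frame distance
            cost[i][j] = d + min(cost[i - 1][j],      # stretch b
                                 cost[i][j - 1],      # stretch a
                                 cost[i - 1][j - 1])  # advance both
    return cost[n][m]

def recognize(test, templates):
    """Return the label of the reference template closest to the test utterance."""
    return min(templates, key=lambda label: dtw_distance(test, templates[label]))
```

Here `templates` is a dictionary that maps each word label to one reference sequence; a user-dependent system would store templates recorded by the same speaker.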
Journal:Informatica
Volume 18, Issue 3 (2007), pp. 395–406
Abstract
This paper describes a framework for compiling a set of syllables and phonemes that is subsequently used to create acoustic models for continuous speech recognition of Lithuanian. The goal is to find the set of syllables and phonemes that matters most for speech recognition. The framework includes operations on the lexicon and on the transcriptions of recordings. To facilitate this work, additional programs were developed that perform word syllabification, lexicon adjustment, etc. A series of experiments was carried out to establish the framework and to model syllable- and phoneme-based speech recognition. The dominance of syllables in the lexicon improved the recognition results and encouraged us to move away from a strict definition of the syllable, i.e., a syllable becomes a simple sub-word unit derived from a syllable. Two sets of syllables and phonemes and two types of lexicons were developed and tested. The best recognition accuracy achieved was 56.67% ± 0.33. The speech recognition system is based on hidden Markov models (HMM). The continuous speech corpus LRN0 was used for the speech recognition experiments.
Journal:Informatica
Volume 15, Issue 4 (2004), pp. 465–474
Abstract
This paper describes the development of a Lithuanian HMM/ANN speech recognition system, which combines artificial neural networks (ANNs) and hidden Markov models (HMMs). A hybrid HMM/ANN architecture was applied in the system. In this architecture, a fully connected three-layer neural network (a multi-layer perceptron) is trained by the conventional stochastic back-propagation algorithm to estimate the probabilities of 115 context-independent phonetic categories, and during recognition it is used as a state output probability estimator. The hybrid HMM/ANN speech recognition system, based on Mel Frequency Cepstral Coefficients (MFCC), was developed using the CSLU Toolkit. The system was tested on the VDU isolated-word Lithuanian speech corpus and evaluated on a speaker-independent recognition task with approximately 750 distinct isolated words. The word recognition accuracy obtained was about 86.7%.
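In hybrid HMM/ANN systems the network outputs are commonly turned into HMM state scores by dividing the class posteriors by the class priors, which gives scaled likelihoods. The abstract does not say whether this system applies that step, so the sketch below shows the general technique under that assumption; the three-class example stands in for the 115 phonetic categories.

```python
import math

def softmax(logits):
    """Convert network output activations to posterior probabilities."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def scaled_log_likelihoods(logits, priors):
    """Log of posterior / prior for each phonetic category.

    By Bayes' rule, p(x | q) is proportional to P(q | x) / P(q), so these
    values can serve as state output scores in Viterbi decoding.
    """
    posteriors = softmax(logits)
    return [math.log(p) - math.log(q) for p, q in zip(posteriors, priors)]
```

The priors would typically be estimated as relative category frequencies in the training alignment.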
Journal:Informatica
Volume 14, Issue 4 (2003), pp. 487–496
Abstract
The paper deals with the use of dynamic programming for word endpoint detection in isolated word recognition. Endpoint detection is based on likelihood maximization. An expectation maximization approach is used to deal with the problem of unknown parameters. The energies of the speech signal and the background noise are used as features for decision making. The performance of the proposed approach was evaluated using a speech corpus of isolated Lithuanian words.
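A simplified version of likelihood-based endpoint detection can be sketched as a three-state dynamic program (noise, speech, noise) over frame energies. The Gaussian energy models and the assumption that their parameters are already known are simplifications made here; the paper estimates unknown parameters with expectation maximization, which this sketch omits.

```python
import math

def gauss_logpdf(x, mean, var):
    """Log density of a univariate Gaussian."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)

def detect_endpoints(energies, noise, speech):
    """Most likely (begin, end) frame indices of the speech segment, end inclusive.

    noise and speech are (mean, variance) pairs of the frame energy,
    assumed known here. Requires at least three frames.
    """
    params = [noise, speech, noise]  # left-to-right states: noise, speech, noise
    n_frames = len(energies)
    neg_inf = float("-inf")
    score = [[neg_inf] * 3 for _ in range(n_frames)]
    back = [[0] * 3 for _ in range(n_frames)]
    score[0][0] = gauss_logpdf(energies[0], *params[0])
    for t in range(1, n_frames):
        for s in range(3):
            # Either stay in state s or advance from state s - 1
            prev = s if s == 0 or score[t - 1][s] >= score[t - 1][s - 1] else s - 1
            if score[t - 1][prev] == neg_inf:
                continue
            score[t][s] = score[t - 1][prev] + gauss_logpdf(energies[t], *params[s])
            back[t][s] = prev
    # Backtrack from the final noise state to recover the state sequence
    states = [2]
    for t in range(n_frames - 1, 0, -1):
        states.append(back[t][states[-1]])
    states.reverse()
    begin = states.index(1)
    end = n_frames - 1 - states[::-1].index(1)
    return begin, end
```

The returned indices refer to analysis frames, so they would be multiplied by the frame shift to obtain sample positions.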
Journal:Informatica
Volume 13, Issue 1 (2002), pp. 37–46
Abstract
An isolated word speech recognition system based on dynamic time warping (DTW) has been developed. Speaker adaptation is performed using speaker recognition techniques. Vector quantization is used to create reference templates for speaker recognition. Linear predictive coding (LPC) parameters are used as features for recognition. Performance is evaluated using 12 Lithuanian words, each pronounced ten times by ten speakers.
Journal:Informatica
Volume 11, Issue 3 (2000), pp. 243–256
Abstract
This paper deals with maximum likelihood and least squares segmentation of autoregressive random sequences with abruptly changing parameters. The conditional distribution of the observations is derived. The objective function is modified into a form suitable for optimization by dynamic programming. Expressions for the Bellman functions are obtained for this case. The performance of the presented approach is illustrated with simulation examples and with examples of speech signal segmentation.
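The dynamic programming idea can be illustrated with a minimal least squares sketch. It assumes a first-order autoregressive model without intercept and a known number of segments, both simplifications introduced here; the paper's objective function and Bellman functions are more general and are not reproduced.

```python
def segment_ar1(x, n_segments):
    """Least squares segmentation of x into n_segments AR(1) pieces.

    Returns the sorted change points: each is the index of the first sample
    predicted with a new AR coefficient.
    """
    n_pairs = len(x) - 1  # pair k is (x[k], x[k + 1])
    # Prefix sums let the residual of any segment be computed in O(1)
    syy, sxy, sxx = [0.0], [0.0], [0.0]
    for k in range(n_pairs):
        syy.append(syy[-1] + x[k + 1] ** 2)
        sxy.append(sxy[-1] + x[k + 1] * x[k])
        sxx.append(sxx[-1] + x[k] ** 2)

    def cost(i, j):
        """Residual sum of squares of the best AR(1) fit on pairs i..j-1."""
        yy, xy, xx = syy[j] - syy[i], sxy[j] - sxy[i], sxx[j] - sxx[i]
        return yy - xy * xy / xx if xx > 0.0 else yy

    inf = float("inf")
    # bellman[k][j] = minimal cost of splitting the first j pairs into k segments
    bellman = [[inf] * (n_pairs + 1) for _ in range(n_segments + 1)]
    arg = [[0] * (n_pairs + 1) for _ in range(n_segments + 1)]
    bellman[0][0] = 0.0
    for k in range(1, n_segments + 1):
        for j in range(k, n_pairs + 1):
            for i in range(k - 1, j):
                c = bellman[k - 1][i] + cost(i, j)
                if c < bellman[k][j]:
                    bellman[k][j] = c
                    arg[k][j] = i
    # Backtrack the optimal segment boundaries
    change_points = []
    j = n_pairs
    for k in range(n_segments, 1, -1):
        j = arg[k][j]
        change_points.append(j + 1)  # pair j predicts sample x[j + 1]
    return sorted(change_points)
```

The triple loop costs O(K n²) for K segments and n samples, which is the usual price of exact segmentation by dynamic programming.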
Journal:Informatica
Volume 10, Issue 4 (1999), pp. 377–388
Abstract
The problem of text-independent speaker recognition based on LPC parameters of the vocal tract and the residue signal is investigated. Pseudostationary segments of voiced sounds are used for feature selection. Parameters of the linear prediction (LPC) model of the vocal tract and the residue signal, or LPC-derived cepstral parameters, are used as features for speaker recognition. Speaker identification is performed by applying the nearest neighbour rule to the average distance between speakers. Speaker verification is based on comparing the distributions of intraindividual and interindividual distortions. Speaker recognition performance is investigated and demonstrated with experimental results.
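The nearest neighbour rule on average distances can be sketched as follows. How the paper computes the average distance between speakers is not stated in the abstract; here it is assumed to be the mean, over test vectors, of the Euclidean distance to the closest reference vector. The fixed verification threshold is likewise a hypothetical stand-in for the comparison of distortion distributions.

```python
import math

def average_distance(test_vectors, reference_vectors):
    """Mean distance from each test vector to its closest reference vector."""
    total = sum(min(math.dist(v, r) for r in reference_vectors)
                for v in test_vectors)
    return total / len(test_vectors)

def identify(test_vectors, speakers):
    """Nearest neighbour rule: the speaker with the smallest average distance."""
    return min(speakers,
               key=lambda name: average_distance(test_vectors, speakers[name]))

def verify(test_vectors, claimed_reference, threshold):
    """Accept the identity claim if the average distance is below a threshold.

    The threshold is hypothetical; in practice it would be chosen from the
    intraindividual and interindividual distortion distributions.
    """
    return average_distance(test_vectors, claimed_reference) < threshold
```

In this sketch `speakers` maps each speaker name to a list of reference feature vectors, for example LPC or cepstral coefficients extracted from voiced segments.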
Journal:Informatica
Volume 9, Issue 4 (1998), pp. 449–456
Abstract
Language engineering, which encompasses natural language processing and speech processing, has become very important for the development of every nation in multilingual Europe. Since the Council of the European Union approved its conclusions on linguistic and cultural diversity, tools and systems created for every European language have become necessary to overcome language barriers and to use all languages in various spheres of human cooperation. The paper gives an overview and discussion of language engineering in Lithuania.
Journal:Informatica
Volume 7, Issue 4 (1996), pp. 469–484
Abstract
The problem of speaker identification is investigated. Basic segments, i.e. pseudostationary intervals of voiced sounds, are used for identification. Identification is carried out by comparing average distances between the investigated speaker and the comparative speakers. The coefficients of the linear prediction (LPC) model of the vocal tract, cepstral coefficients, and LPC coefficients of the excitation signal are used as features for identification. Three speaker identification methods are presented, and an experimental investigation of their performance is discussed.