Pub. online: 1 Jan 2019
Type: Research Article
Open Access
Journal: Informatica
Volume 30, Issue 3 (2019), pp. 573–593
Abstract
Conventional large-vocabulary automatic speech recognition (ASR) systems require a mapping from words to sub-word units, both to generalize to words that were absent from the training data and to enable robust estimation of acoustic model parameters. This paper surveys the research of the last 15 years on word-to-sub-word mappings for Lithuanian ASR systems. It also compares various phoneme-based and grapheme-based mappings across a broad range of acoustic modelling techniques, including monophone- and triphone-based hidden Markov models (HMM), speaker-adaptively trained HMMs, subspace Gaussian mixture models (SGMM), a feed-forward time-delay neural network (TDNN), and a state-of-the-art low frame rate bidirectional long short-term memory (LFR BLSTM) recurrent deep neural network. Experimental comparisons are based on a 50-hour speech corpus. This paper shows that the best phone-based mapping significantly outperforms a grapheme-based mapping. It also shows that the lowest phone error rate of an ASR system is achieved by the phoneme-based lexicon that explicitly models syllable stress and represents diphthongs as single phonetic units.
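To make the idea of a word-to-sub-word mapping concrete, the following is a minimal sketch of two toy lexicon builders: a purely grapheme-based one (one unit per letter) and a variant that merges certain letter pairs into single units, loosely mirroring the idea of treating diphthongs as single phonetic units. The function names, the `DIPHTHONGS` inventory, and the example words are illustrative assumptions, not the mappings evaluated in the paper. A real phoneme-based lexicon would also need stress marking and pronunciation rules, which this sketch omits.

```python
# Illustrative digraph inventory (an assumption, not the paper's phone set).
# In real Lithuanian, not every such letter pair is pronounced as a diphthong.
DIPHTHONGS = ("ai", "au", "ei", "ui", "ie", "uo")


def grapheme_units(word):
    """Grapheme-based mapping: every letter becomes one sub-word unit."""
    return list(word.lower())


def merged_diphthong_units(word, diphthongs=DIPHTHONGS):
    """Like grapheme_units, but listed letter pairs become single units."""
    word = word.lower()
    units = []
    i = 0
    while i < len(word):
        pair = word[i:i + 2]
        if pair in diphthongs:
            units.append(pair)  # treat the pair as one unit
            i += 2
        else:
            units.append(word[i])
            i += 1
    return units


def build_lexicon(words, mapper):
    """Build a word -> list-of-units pronunciation lexicon."""
    return {w: mapper(w) for w in words}


if __name__ == "__main__":
    words = ["laukas", "duona", "vienas"]
    print(build_lexicon(words, grapheme_units))
    print(build_lexicon(words, merged_diphthong_units))
```

The two lexicons differ only in the unit inventory, which is the design axis the abstract compares: fewer, longer units change how acoustic model parameters are shared across words.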
Journal: Informatica
Volume 10, Issue 2 (1999), pp. 245–269
Abstract
Structurization of the sample covariance matrix reduces the number of parameters to be estimated and, if the structurization assumptions are correct, improves the small-sample properties of a statistical linear classifier. Structured estimates of the sample covariance matrix are used to decorrelate and scale the data, and then to train a single-layer perceptron classifier. In most of the ten real-world pattern classification problems tested, the structurization methodology, applied together with the data transformations and the subsequent use of an optimally stopped single-layer perceptron, resulted in a significant gain over the best statistical linear classifier, regularized discriminant analysis.
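As a sketch of what "decorrelate and scale" can mean in practice, one standard form of such a transformation is whitening with the eigendecomposition of the covariance estimate. The notation below is an assumption introduced for illustration and is not taken from the paper: $\hat{\Sigma}_s$ denotes the structured covariance estimate, $\hat{\mu}$ the sample mean, and $\Phi$, $\Lambda$ the eigenvectors and eigenvalues of $\hat{\Sigma}_s$.

```latex
\hat{\Sigma}_s = \Phi \Lambda \Phi^{\top}, \qquad
y = \Lambda^{-1/2} \Phi^{\top} (x - \hat{\mu})
```

If $\hat{\Sigma}_s$ matches the true covariance of $x$, the transformed vector $y$ has identity covariance, since $\Lambda^{-1/2} \Phi^{\top} \hat{\Sigma}_s \Phi \Lambda^{-1/2} = I$. The perceptron would then be trained on $y$ rather than on $x$.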