Informatica

Keywords: speech recognition

Search results 12

An Overview of Lithuanian Intonation: A Linguistic and Modelling Perspective

Gerda Ana Melnik-Leroy Jolita Bernatavičienė Gražina Korvel Gediminas Navickas Gintautas Tamulevičius Povilas Treigys

https://doi.org/10.15388/22-INFOR502

Pub. online: 6 Dec 2022 Type: Research Article

Open Access

Journal: Informatica Volume 33, Issue 4 (2022), pp. 795–832

Abstract

Intonation is a complex suprasegmental phenomenon essential for speech processing. However, it is still largely understudied, especially in the case of under-resourced languages, such as Lithuanian. The current paper focuses on intonation in Lithuanian, a Baltic pitch-accent language with free stress and tonal variations on accented heavy syllables. Due to historical circumstances, the description and analysis of Lithuanian intonation were carried out within different theoretical frameworks and in several languages, which makes them hardly accessible to the international research community. This paper is the first attempt to gather research on Lithuanian intonation from both the Lithuanian and the Western traditions, the structuralist and generativist points of view, and the linguistic and modelling perspectives. The paper identifies issues in existing research that require special attention and proposes directions for future investigations both in linguistics and modelling.

Comparison of Phonemic and Graphemic Word to Sub-Word Unit Mappings for Lithuanian Phone-Level Speech Transcription

Gailius Raškinis Gintarė Paškauskaitė Aušra Saudargienė Asta Kazlauskienė Airenas Vaičiūnas

https://doi.org/10.15388/Informatica.2019.219

Pub. online: 1 Jan 2019 Type: Research Article

Open Access

Journal: Informatica Volume 30, Issue 3 (2019), pp. 573–593

Abstract

Conventional large-vocabulary automatic speech recognition (ASR) systems require a mapping from words into sub-word units to generalize over words that were absent from the training data and to enable robust estimation of acoustic model parameters. This paper surveys the research done during the last 15 years on word-to-sub-word mappings for Lithuanian ASR systems. It also compares various phoneme- and grapheme-based mappings across a broad range of acoustic modelling techniques, including monophone- and triphone-based hidden Markov models (HMM), speaker-adaptively trained HMMs, subspace Gaussian mixture models (SGMM), a feed-forward time delay neural network (TDNN), and a state-of-the-art low frame rate bidirectional long short-term memory (LFR BLSTM) recurrent deep neural network. Experimental comparisons are based on a 50-hour speech corpus. The paper shows that the best phone-based mapping significantly outperforms a grapheme-based mapping. It also shows that the lowest phone error rate of an ASR system is achieved by the phoneme-based lexicon that explicitly models syllable stress and represents diphthongs as single phonetic units.
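
The phone error rate mentioned in the abstract is conventionally computed as the edit distance between reference and recognized phone sequences, divided by the number of reference phones. The sketch below assumes that standard definition; the abstract does not spell out the paper's exact scoring procedure, and the function names and the sample phone symbols are illustrative only.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences of phone symbols."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion of a reference phone
                            curr[j - 1] + 1,      # insertion of a hypothesis phone
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def phone_error_rate(references, hypotheses):
    """Total edit errors divided by the total number of reference phones."""
    total_errors = sum(edit_distance(r, h) for r, h in zip(references, hypotheses))
    total_phones = sum(len(r) for r in references)
    return total_errors / total_phones


# Illustrative only: a diphthong kept as one unit in the reference,
# but recognized as two separate phones.
reference = [["l", "ie", "p", "a"]]
hypothesis = [["l", "i", "e", "p", "a"]]
print(phone_error_rate(reference, hypothesis))
```

Because the score is normalized by the reference length, the choice of unit inventory (for example, diphthongs as single units) changes both the numerator and the denominator, so comparisons across lexicons need a common reference transcription.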

Lithuanian Speech Corpus Liepa for Development of Human-Computer Interfaces Working in Voice Recognition and Synthesis Mode

Sigita Laurinčiukaitė Laimutis Telksnys Pijus Kasparaitis Regina Kliukienė Vilma Paukštytė

https://doi.org/10.15388/Informatica.2018.177

Pub. online: 1 Jan 2018 Type: Research Article

Open Access

Journal: Informatica Volume 29, Issue 3 (2018), pp. 487–498

Abstract

This paper investigates the problem of building a speech corpus for the design of human-computer interfaces working in voice recognition and synthesis mode. The specific requirements that speech recognizers and synthesizers place on a speech corpus are emphasized. It is argued that such a corpus has to consist of two parts: one for the needs of Lithuanian text-to-speech synthesizers, and another for the needs of Lithuanian speech recognition engines. It has been determined that the part designed for speech recognition engines has to make it possible to represent language specificity through the use of different sets of phonemes. Based on the research results, the two-part speech corpus Liepa was developed. This speech corpus opens possibilities for cost-effective and flexible development of human-computer interfaces working in voice recognition and synthesis mode.

Metrics Based Quality Estimation of Speech Recognition Features

Rasa Lileikytė Laimutis Telksnys

https://doi.org/10.15388/Informatica.2013.404

Pub. online: 1 Jan 2013 Type: Research Article

Journal: Informatica Volume 24, Issue 3 (2013), pp. 435–446

Abstract

The performance of an automatic speech recognition system depends heavily on the feature set used. The quality of speech recognition features is estimated by classification error, but this requires recognition experiments to be performed, including both front-end and back-end implementations. We propose a method for feature quality estimation that does not require recognition experiments and accelerates automatic speech recognition system development. The key component of our method is the use of metrics right after the front-end feature computation. The experimental results show that our method is suitable for recognition systems with back-end Euclidean space classifiers.

Lithuanian Speech Recognition Using the English Recognizer

Pijus Kasparaitis

https://doi.org/10.15388/Informatica.2008.227

Pub. online: 1 Jan 2008 Type: Research Article

Journal: Informatica Volume 19, Issue 4 (2008), pp. 505–516

Abstract

The present work is concerned with speech recognition using a small or medium size vocabulary. The possibility of using an English speech recognizer for the recognition of Lithuanian was investigated. Two methods were used to deal with such problems: the expert-driven (knowledge-based) method and the data-driven one. Phonological systems of English and Lithuanian were compared on the basis of the knowledge of phonology, and relations between certain Lithuanian and English phonemes were established. Situations in which correspondences between the phonemes were to be established experimentally (i.e., using the data-driven method) and the English phonemes that best matched the Lithuanian sounds or their combinations (e.g., diphthongs) in such situations were identified. The results obtained were used for creating transcriptions of the Lithuanian names and surnames that were used in recognition experiments. Experiments without transcriptions, with a single transcription, and with many transcriptions were carried out. A method that allows finding a small number of best transcriptions was proposed. The recognition rate achieved was 84.2% with a vocabulary containing 500 word pairs.

Framework for Choosing a Set of Syllables and Phonemes for Lithuanian Speech Recognition

Sigita Laurinčiukaitė Antanas Lipeika

https://doi.org/10.15388/Informatica.2007.184

Pub. online: 1 Jan 2007 Type: Research Article

Journal: Informatica Volume 18, Issue 3 (2007), pp. 395–406

Abstract

This paper describes a framework for making up a set of syllables and phonemes that is subsequently used in the creation of acoustic models for continuous speech recognition of Lithuanian. The target is to discover a set of syllables and phonemes that is of utmost importance in speech recognition. This framework includes operations with lexicon, and transcriptions of records. To facilitate this work, additional programs have been developed that perform word syllabification, lexicon adjustment, etc. A series of experiments was done in order to establish the framework and model syllable- and phoneme-based speech recognition. Dominance of a syllable in lexicon has improved speech recognition results and encouraged us to move away from a strict definition of syllable, i.e., a syllable becomes a simple sub-word unit derived from a syllable. Two sets of syllables and phonemes and two types of lexicons have been developed and tested. The best recognition accuracy achieved was 56.67% ± 0.33. The speech recognition system is based on Hidden Markov Models (HMM). The continuous speech corpus LRN0 was used for the speech recognition experiments.

Automatic Transcription of Lithuanian Text Using Dictionary

Mantas Skripkauskas Laimutis Telksnys

https://doi.org/10.15388/Informatica.2006.157

Pub. online: 1 Jan 2006 Type: Research Article

Journal: Informatica Volume 17, Issue 4 (2006), pp. 587–600

Abstract

A technique for transcribing Lithuanian text into phonemes for speech recognition is presented. The text-to-phoneme transformation is made by formal rules and a dictionary. The formal rules set the relationship between segments of the text and units of formalized speech sounds (phonemes), while the dictionary is used to correct the transcription and to specify the stress mark and position. The proposed automatic transcription technique was tested by comparing its results with manually obtained ones. The experiment has shown that less than 6% of transcribed words did not match.
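
The rules-plus-dictionary arrangement described above can be sketched roughly as follows. Everything concrete here is an assumption made for illustration: the rule table, the phoneme symbols, the stress notation, and the exception entry are hypothetical toy values and not the paper's actual rule set or dictionary.

```python
# Hypothetical toy rules: grapheme sequence -> phoneme symbols.
# These are NOT the rules from the paper; they only illustrate the mechanism.
RULES = {
    "ch": ["x"],
    "š": ["S"],
    "ž": ["Z"],
    "č": ["tS"],
}

# Hypothetical exception dictionary: whole-word entries override the rules
# and can carry a stress mark (the "'" notation is illustrative).
EXCEPTIONS = {
    "namas": ["n", "'a", "m", "a", "s"],
}


def transcribe(word, rules=RULES, exceptions=EXCEPTIONS):
    """Transcribe a word: dictionary entry first, otherwise longest-match rules."""
    word = word.lower()
    if word in exceptions:
        return list(exceptions[word])
    max_len = max(len(key) for key in rules)
    phones = []
    i = 0
    while i < len(word):
        for n in range(max_len, 0, -1):
            chunk = word[i:i + n]
            if chunk in rules:
                phones.extend(rules[chunk])
                i += len(chunk)
                break
        else:
            phones.append(word[i])  # no rule applies: keep the letter as its own symbol
            i += 1
    return phones


def mismatch_rate(words, manual):
    """Share of words whose automatic transcription differs from the manual one."""
    mismatches = sum(1 for w in words if transcribe(w) != manual[w])
    return mismatches / len(words)


print(transcribe("šachmatai"))
print(transcribe("namas"))
```

The `mismatch_rate` helper mirrors the kind of evaluation the abstract reports, a word-level comparison against manual transcriptions, without claiming to reproduce the paper's test set or its figure.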

Discrimination of Homographs Distorted by a Lengthy Impulsive Noise

Šarūnas Paulikas Dalius Navakauskas

https://doi.org/10.15388/Informatica.2006.139

Pub. online: 1 Jan 2006 Type: Research Article

Journal: Informatica Volume 17, Issue 2 (2006), pp. 297–304

Abstract

The paper addresses the problem of discrimination of homographs when a lengthy segment of an uttered word is missing. The discrimination procedure considered is performed by a recognizer that operates on cepstrum coefficients extracted from the speech signal. To restore the missing speech segment, it is proposed to calculate speech signal characteristics, namely the period of the fundamental frequency and the intensity, rather than to use the known speech signal. Experiments have shown that the polynomial approximation of speech signal characteristics improves homograph discrimination results. The extra computational burden associated with the proposed method is not high because it involves recalculation of the already extracted Fourier coefficients.

Development of HMM/Neural Network‐Based Medium‐Vocabulary Isolated‐Word Lithuanian Speech Recognition System

Mark Filipovič Antanas Lipeika

https://doi.org/10.15388/Informatica.2004.073

Pub. online: 1 Jan 2004 Type: Research Article

Journal: Informatica Volume 15, Issue 4 (2004), pp. 465–474

Abstract

The development of a Lithuanian HMM/ANN speech recognition system, which combines artificial neural networks (ANNs) and hidden Markov models (HMMs), is described in this paper. A hybrid HMM/ANN architecture was applied in the system. In this architecture, a fully connected three‐layer neural network (a multi‐layer perceptron) is trained by the conventional stochastic back‐propagation algorithm to estimate the probability of 115 context‐independent phonetic categories, and during recognition it is used as a state output probability estimator. The hybrid HMM/ANN speech recognition system based on Mel Frequency Cepstral Coefficients (MFCC) was developed using the CSLU Toolkit. The system was tested on the VDU isolated‐word Lithuanian speech corpus and evaluated on a speaker‐independent ∼750 distinct isolated‐word recognition task. The word recognition accuracy obtained was about 86.7%.
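
In hybrid HMM/ANN systems, the network's class posteriors are commonly turned into HMM state output scores by dividing them by the class priors, which gives likelihoods up to a constant factor. The sketch below shows that common recipe under the assumption that it applies here; the abstract does not say whether the described system divides by priors, and the function name, the floor value, and the sample numbers are illustrative.

```python
import math


def scaled_log_likelihoods(posteriors, priors, floor=1e-10):
    """Convert posteriors P(q|x) into scaled log-likelihoods log(P(q|x) / P(q)).

    By Bayes' rule, P(x|q) = P(q|x) * P(x) / P(q); P(x) is the same for every
    state within a frame, so it can be dropped during decoding.
    """
    if len(posteriors) != len(priors):
        raise ValueError("posteriors and priors must have the same length")
    return [
        math.log(max(post, floor)) - math.log(max(prior, floor))
        for post, prior in zip(posteriors, priors)
    ]


# Illustrative three-class frame; the system in the abstract uses 115 categories.
frame_posteriors = [0.5, 0.25, 0.25]
class_priors = [0.25, 0.25, 0.5]
print(scaled_log_likelihoods(frame_posteriors, class_priors))
```

The priors are usually estimated from class frequencies in the training alignment; flooring avoids taking the logarithm of zero for classes the network assigns no probability.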

Building Medium‐Vocabulary Isolated‐Word Lithuanian HMM Speech Recognition System

Gailius Raškinis Danutė Raškinienė

https://doi.org/10.15388/Informatica.2003.005

Pub. online: 1 Jan 2003 Type: Research Article

Journal: Informatica Volume 14, Issue 1 (2003), pp. 75–84

Abstract

In this paper, the initial work on the development of a Lithuanian HMM speech recognition system is described. The triphone single‐Gaussian HMM speech recognition system based on Mel Frequency Cepstral Coefficients (MFCC) was developed using the HTK toolkit. The hidden Markov model parameters were estimated from a phone‐level hand‐annotated Lithuanian speech corpus. The system was evaluated on a speaker‐independent ∼750 distinct isolated‐word recognition task. Although speaker adaptation and language modeling techniques were not used, the system performed at a 20% word error rate.

INFORMATICA

Online ISSN: 1822-8844
Print ISSN: 0868-4952

Institute of Data Science and Digital Technologies
Vilnius University

Akademijos St. 4

08412 Vilnius, Lithuania

Phone: (+370 5) 2109 338

E-mail: informatica@mii.vu.lt
https://informatica.vu.lt/journal/INFORMATICA