Pub. online: 5 Aug 2022 | Type: Research Article | Open Access
Journal:Informatica
Volume 16, Issue 2 (2005), pp. 193–202
Abstract
One of the components of the text-to-speech synthesis system is the database of sounds. Two Lithuanian diphone databases in the MBROLA format are presented in this paper. The list of phonemes and the list of diphones necessary for Lithuanian text-to-speech synthesis are described. The problem of phoneme combinations that are not used in the Lithuanian language is dealt with in the work. Also, the article is concerned with transcribing a Lithuanian text.
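As a rough illustration of how a diphone list relates to a phoneme list, the sketch below splits phoneme sequences into adjacent pairs and collects the set of diphones they need. The function names, the `-` separator and the use of `_` as the silence symbol are assumptions made for this example; they are not taken from the paper or its databases.

```python
def diphones(phonemes, silence="_"):
    """Return the diphones of one phoneme sequence, padded with silence."""
    padded = [silence, *phonemes, silence]
    # A diphone spans two neighbouring phonemes.
    return [f"{left}-{right}" for left, right in zip(padded, padded[1:])]


def diphone_inventory(transcriptions):
    """Collect every distinct diphone needed by a set of transcriptions."""
    inventory = set()
    for phonemes in transcriptions:
        inventory.update(diphones(phonemes))
    return inventory


# Hypothetical usage with a toy three-phoneme word.
print(diphones(["l", "a", "s"]))
```

Phoneme pairs that never show up in such an inventory are candidates for the "combinations not used in the language" mentioned in the abstract, although a real check would need a large, representative set of transcriptions.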
Pub. online: 1 Jan 2018 | Type: Research Article | Open Access
Journal:Informatica
Volume 29, Issue 3 (2018), pp. 487–498
Abstract
This paper investigates the design of a speech corpus for human-computer interfaces that work in voice recognition and synthesis modes. The specific requirements that speech recognizers and synthesizers place on a speech corpus are highlighted. It is argued that such a corpus has to consist of two parts: one for the needs of Lithuanian text-to-speech synthesizers and the other for the needs of Lithuanian speech recognition engines. It was determined that the part designed for speech recognition engines has to allow language specificity to be represented through different sets of phonemes. Based on the research results, the two-part speech corpus Liepa was developed. This corpus opens possibilities for cost-effective and flexible development of human-computer interfaces working in voice recognition and synthesis modes.
Journal:Informatica
Volume 27, Issue 3 (2016), pp. 573–586
Abstract
Phoneme duration modelling is one of the stages in prosody modelling for text-to-speech systems. The rule-based phoneme duration model proposed by Klatt (1979) is still quite a popular method. One of the main shortcomings of this method is that the values of the parameters are selected experimentally. This work proposes a new iterative algorithm for the automatic estimation of the factors of the Klatt model using a corpus of annotated audio recordings of the speaker. Phoneme duration models were built for three different Lithuanian speakers. The quality of the phoneme duration estimates was evaluated by the root mean square error, the mean absolute error and the correlation coefficient.
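For orientation, the Klatt rule is commonly stated as DUR = MINDUR + (INHDUR − MINDUR) × PRCNT / 100, where PRCNT is the product of the context factors that apply. The sketch below implements that formula together with the three evaluation measures named in the abstract. It does not reproduce the paper's iterative estimation algorithm; the function names are chosen for this example, factors are passed as fractions rather than percentages, and any numbers used with it are illustrative assumptions.

```python
import math


def klatt_duration(inherent_ms, minimum_ms, factors):
    """Klatt-style duration: shrink or stretch the part above the minimum."""
    scale = 1.0
    for factor in factors:  # each applicable rule contributes one factor
        scale *= factor
    return minimum_ms + (inherent_ms - minimum_ms) * scale


def rmse(pred, ref):
    """Root mean square error between predicted and reference durations."""
    return math.sqrt(sum((p - r) ** 2 for p, r in zip(pred, ref)) / len(pred))


def mae(pred, ref):
    """Mean absolute error between predicted and reference durations."""
    return sum(abs(p - r) for p, r in zip(pred, ref)) / len(pred)


def pearson(pred, ref):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(pred)
    mean_p, mean_r = sum(pred) / n, sum(ref) / n
    cov = sum((p - mean_p) * (r - mean_r) for p, r in zip(pred, ref))
    var_p = sum((p - mean_p) ** 2 for p in pred)
    var_r = sum((r - mean_r) ** 2 for r in ref)
    return cov / math.sqrt(var_p * var_r)
```

An estimation procedure of the kind the abstract describes would adjust the factors so that these error measures improve on annotated recordings; how the paper does this is not specified here.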
Journal:Informatica
Volume 25, Issue 4 (2014), pp. 551–562
Abstract
The present paper deals with building the text corpus for unit selection text-to-speech synthesis. During synthesis the target and concatenation costs are calculated and these costs are usually based on the prosodic and acoustic features of sounds. If the cost calculation is moved to the phonological level, it is possible to simulate unit selection synthesis without any real recordings; in this case text transcriptions are sufficient. We propose to use the cost calculated during the test data synthesis simulation to evaluate the text corpus quality. The greedy algorithm that maximizes coverage of certain phonetic units will be used to build the corpus. In this work the corpora optimized to cover phonetic units of different size and weight are evaluated.
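The greedy selection idea can be sketched as follows: at each step, pick the sentence that adds the largest total weight of not yet covered units. This is a minimal sketch under stated assumptions, not the paper's procedure: `greedy_corpus`, `units_of` and `weight` are names invented for the example, and character bigrams stand in for real phonetic units.

```python
def greedy_corpus(sentences, units_of, max_sentences, weight=None):
    """Greedily pick sentences that add the most uncovered unit weight."""
    weight = weight or {}
    covered, chosen = set(), []
    remaining = list(sentences)

    def gain(sentence):
        # Total weight of units this sentence would newly cover.
        return sum(weight.get(u, 1.0) for u in units_of(sentence) - covered)

    while remaining and len(chosen) < max_sentences:
        best = max(remaining, key=gain)
        if gain(best) == 0:
            break  # nothing new can be covered
        chosen.append(best)
        covered |= units_of(best)
        remaining.remove(best)
    return chosen, covered


def bigrams(text):
    """Toy stand-in for phonetic units: adjacent character pairs."""
    return {text[i:i + 2] for i in range(len(text) - 1)}


# Hypothetical usage on toy strings.
chosen, covered = greedy_corpus(["abc", "bcd", "ab"], bigrams, max_sentences=5)
```

Swapping `bigrams` for a function that returns diphones, triphones or syllables, and supplying frequency-based weights, gives corpora optimized for units of different size and weight, which is the kind of variation the abstract says was evaluated.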
Journal:Informatica
Volume 19, Issue 4 (2008), pp. 505–516
Abstract
The present work is concerned with speech recognition using a small or medium-sized vocabulary. The possibility of using an English speech recognizer for the recognition of Lithuanian was investigated. Two methods were used to address this problem: the expert-driven (knowledge-based) method and the data-driven one. The phonological systems of English and Lithuanian were compared on the basis of phonological knowledge, and relations between certain Lithuanian and English phonemes were established. Situations in which the correspondences between phonemes had to be established experimentally (i.e., using the data-driven method) were identified, together with the English phonemes that best matched the Lithuanian sounds or their combinations (e.g., diphthongs) in such situations. The results obtained were used to create transcriptions of the Lithuanian names and surnames used in the recognition experiments. Experiments without transcriptions, with a single transcription and with many transcriptions were carried out. A method for finding a small number of best transcriptions was proposed. The recognition rate achieved was 84.2% with a vocabulary containing 500 word pairs.
Journal:Informatica
Volume 12, Issue 2 (2001), pp. 315–336
Abstract
The paper deals with automatic stressing of Lithuanian text. In previous work the author presented an algorithm for automatic stressing of Lithuanian text on the basis of a dictionary. The aim of the present work is to improve that algorithm by including formal stressing rules for nouns and adjectives. By means of these rules, words that are not present in the dictionary, such as diminutives, names and degrees of adjectives, can be stressed. The work analyses when it is more convenient to formulate rules manually and when to generate them automatically. A method for formulating rules manually is described and a set of such rules is presented. In addition, an algorithm for generating stressing rules with the help of a dictionary of noun and adjective stems is given.
Journal:Informatica
Volume 11, Issue 1 (2000), pp. 19–40
Abstract
The paper deals with one of the components of text-to-speech synthesis of the Lithuanian language, namely automatic text stressing. The present work substantiates the necessity to divide words into fixed and variable parts used to build different grammatical forms, and to store only those parts rather than whole words in the dictionary. According to the inflexion method, all words of the Lithuanian language are divided into three groups (noun-adjectives, verbs and non-inflectional words) and each group is analysed separately. The type of information, as well as the form in which it is to be stored, has been established for each group, and an algorithm by means of which the grammatical form of a word can be recognised and stressed has been presented.
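The idea of storing fixed and variable parts separately can be illustrated by trying every split of a word into a stem and an ending and keeping the splits where both parts are known. The tiny `STEMS` and `ENDINGS` tables below are hypothetical toy data built around the noun "namas", and stress information is left out entirely; the paper's actual dictionary format and algorithm are not reproduced.

```python
# Hypothetical toy dictionaries; a real system would also store stress data.
STEMS = {"nam": "noun"}
ENDINGS = {
    "as": "nominative singular",
    "o": "genitive singular",
    "ui": "dative singular",
}


def analyse(word):
    """Return every (stem, ending, form) split found in the toy dictionaries."""
    results = []
    for cut in range(1, len(word)):
        stem, ending = word[:cut], word[cut:]
        if stem in STEMS and ending in ENDINGS:
            results.append((stem, ending, ENDINGS[ending]))
    return results


# Hypothetical usage.
print(analyse("namui"))
```

Storing one stem and a shared ending table covers all forms of the word, which is the space saving the abstract argues for.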
Journal:Informatica
Volume 10, Issue 4 (1999), pp. 367–376
Abstract
This paper deals with one of the components of text-to-speech synthesis of the Lithuanian language, namely text transcription. A method based on formal rules is used for text transcription. The suitability of this method is justified, the appropriate form of the rules is analysed, and the rule set and the interpreting algorithm are presented. The rules use contextual information, stress features, syllable boundaries and softness.
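A context-dependent rule interpreter of this general kind can be sketched in a few lines. The example below only looks at the next letter and uses three illustrative devoicing rules (a voiced stop before a voiceless consonant, as in "dirbti"); the rule format, the `RULES` table and the `transcribe` function are assumptions made for this sketch and do not represent the paper's rule set, which also uses stress, syllable boundaries and softness.

```python
VOICELESS = set("ptksš")

# Each rule: (target letter, set of right-context letters, replacement).
# Illustrative only; not the rule set from the paper.
RULES = [
    ("b", VOICELESS, "p"),
    ("d", VOICELESS, "t"),
    ("g", VOICELESS, "k"),
]


def transcribe(word, rules=RULES):
    """Apply the first matching rule to each letter, left to right."""
    out = []
    for i, ch in enumerate(word):
        nxt = word[i + 1] if i + 1 < len(word) else ""
        for target, right_context, replacement in rules:
            if ch == target and nxt in right_context:
                out.append(replacement)
                break
        else:
            out.append(ch)  # no rule matched: copy the letter unchanged
    return "".join(out)


# Hypothetical usage.
print(transcribe("dirbti"))
```

Keeping the rules in a data table separate from the interpreter means the rule set can grow or be corrected without touching the algorithm.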