Journal:Informatica
Volume 25, Issue 1 (2014), pp. 55–72
Abstract
A modelling framework for Lithuanian vowel and semivowel phonemes is proposed. Within this framework, the phoneme signal is described as the output of a linear multiple-input and single-output (MISO) system. The MISO system is a parallel connection of single-input and single-output (SISO) systems whose input impulse amplitudes vary in time. Two synthesis methods are proposed within the framework: harmonic and formant. The sounds synthesized by the harmonic method are compared with those obtained by the formant method. Applying the framework to the synthesis of all Lithuanian vowels and semivowels gives natural-sounding results.
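The MISO structure described in this abstract can be illustrated with a minimal sketch. It assumes that each SISO channel has a damped-sinusoid impulse response and is driven by a periodic impulse train whose amplitudes change from period to period. The function names, the sampling rate, the pitch period, and all numeric parameters are hypothetical choices for illustration and are not taken from the paper.

```python
import math

def damped_sinusoid(n_samples, freq_hz, damping, fs):
    """Assumed SISO impulse response: exp(-damping*t) * sin(2*pi*f*t)."""
    return [math.exp(-damping * n / fs) * math.sin(2 * math.pi * freq_hz * n / fs)
            for n in range(n_samples)]

def impulse_train(n_samples, period, amplitudes):
    """Impulses every `period` samples; the k-th impulse has amplitude amplitudes[k]."""
    u = [0.0] * n_samples
    for k, a in enumerate(amplitudes):
        idx = k * period
        if idx >= n_samples:
            break
        u[idx] = a
    return u

def convolve(u, h):
    """Direct convolution of input u with response h, truncated to len(u)."""
    y = [0.0] * len(u)
    for i, ui in enumerate(u):
        if ui == 0.0:
            continue  # only the impulse positions contribute
        for j, hj in enumerate(h):
            if i + j >= len(u):
                break
            y[i + j] += ui * hj
    return y

def miso_output(inputs, responses):
    """Parallel connection: the output is the sum of all SISO channel outputs."""
    y = [0.0] * len(inputs[0])
    for u, h in zip(inputs, responses):
        for i, v in enumerate(convolve(u, h)):
            y[i] += v
    return y

# Illustrative values only (not from the paper).
fs = 8000        # sampling rate, Hz
n = 800          # signal length in samples
period = 80      # pitch period in samples (100 Hz fundamental)
amps_a = [1.0 - 0.05 * k for k in range(10)]   # channel 1 amplitudes fall over time
amps_b = [0.5 + 0.05 * k for k in range(10)]   # channel 2 amplitudes rise over time
inputs = [impulse_train(n, period, amps_a), impulse_train(n, period, amps_b)]
responses = [damped_sinusoid(200, 700.0, 300.0, fs),
             damped_sinusoid(200, 1200.0, 400.0, fs)]
y = miso_output(inputs, responses)
```

Because the system is linear, each channel can be computed separately and summed, which is what makes the parallel SISO decomposition convenient for synthesis.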
Journal:Informatica
Volume 22, Issue 3 (2011), pp. 411–434
Abstract
The goal of the paper is to develop a method for modelling Lithuanian speech diphthongs. A formant-based synthesizer is used for this modelling. The second-order quasipolynomial has been chosen as the formant model in the time domain. A general diphthong model is a multi-input and single-output (MISO) system that consists of two parts: the first part corresponds to the first vowel of the diphthong and the second part to the other vowel. The system is excited by semi-periodic impulses with a smooth transition from one vowel to the other. We derived the parametric input-output equations for the case of quasipolynomial formants, defined a new notion of the convoluted basic signal matrix, and derived parametric minimization functional formulas for the convoluted output data. A new formant parameter estimation algorithm for convoluted data, based on the Levenberg–Marquardt approach, has been derived and its stepwise form presented. The Lithuanian diphthong /ai/ was selected as an example. It was recorded with the following parameters: PCM 48 kHz, 16 bit, stereo. Two characteristic pitches of the vowels /a/ and /i/ were chosen, and equidistant samples of these pitches were used to estimate the parameters of the MISO formant models of the vowels. The transition from the vowel /a/ to the vowel /i/ was achieved by changing the excitation impulse amplitudes according to the arctangent law. The method was audio tested, and the Fourier transforms of the real data and of the MISO model output were compared. It was impossible to distinguish between the real and simulated diphthongs. The magnitude and phase responses showed only small differences.
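Two ingredients of this abstract lend themselves to a short sketch: the quasipolynomial formant and the arctangent transition of impulse amplitudes. The abstract does not give the formulas, so both forms below are assumptions: the formant is taken to be a damped sinusoid multiplied by a second-order polynomial in time, and the transition weights are taken to be a shifted and scaled arctangent. All function names and parameter values are hypothetical.

```python
import math

def quasipolynomial_formant(t, coeffs, damping, freq_hz, phase):
    """Assumed form: exp(-damping*t) * (a0 + a1*t + a2*t^2) * sin(2*pi*f*t + phase)."""
    a0, a1, a2 = coeffs
    envelope = math.exp(-damping * t) * (a0 + a1 * t + a2 * t * t)
    return envelope * math.sin(2 * math.pi * freq_hz * t + phase)

def arctan_weights(n_impulses, centre, steepness):
    """Assumed arctangent law for the excitation impulse amplitudes.

    Returns (w_first, w_second): weights for the first and second vowel.
    w_second rises smoothly from near 0 to near 1 around impulse `centre`,
    and w_first is its complement, so the two always sum to 1.
    """
    w_second = [0.5 + math.atan(steepness * (k - centre)) / math.pi
                for k in range(n_impulses)]
    w_first = [1.0 - w for w in w_second]
    return w_first, w_second

# Illustrative values only (not from the paper).
w_a, w_i = arctan_weights(21, 10, 0.8)   # 21 pitch periods, crossover at the 11th
h0 = quasipolynomial_formant(0.0, (1.0, 50.0, -2000.0), 300.0, 700.0, 0.0)
```

In a diphthong model of this kind, `w_a` would scale the impulses feeding the /a/ part of the MISO system and `w_i` those feeding the /i/ part, so the output moves gradually from one vowel to the other.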