Modelling of Lithuanian Speech Diphthongs
Volume 22, Issue 3 (2011), pp. 411–434
Pub. online: 1 January 2011
Type: Research Article
Received: 1 January 2011
Accepted: 1 May 2011
Published: 1 January 2011
Abstract
This paper develops a method for modelling Lithuanian speech diphthongs using a formant-based synthesizer. A second-order quasipolynomial is chosen as the formant model in the time domain. The general diphthong model is a multi-input, single-output (MISO) system consisting of two parts: the first corresponds to the first vowel of the diphthong and the second to the other vowel. The system is excited by semi-periodic impulses with a smooth transition from one vowel to the other. We derive the parametric input-output equations for quasipolynomial formants, define a new notion of the convoluted basic signal matrix, and derive parametric formulas for the minimization functional on convoluted output data. A new formant parameter estimation algorithm for convoluted data, based on the Levenberg–Marquardt approach, is derived and presented in stepwise form. The Lithuanian diphthong /ai/ is used as an example. It was recorded as 48 kHz, 16-bit stereo PCM. Two characteristic pitches of the vowels /a/ and /i/ were chosen, and equidistant samples of these pitches were used to estimate the parameters of the MISO formant models of the vowels. The transition from the vowel /a/ to the vowel /i/ was achieved by varying the excitation impulse amplitudes according to an arctangent law. The method was evaluated by an audio test and by comparing the Fourier transforms of the real data and the MISO model output. In the audio test, the real and simulated diphthongs could not be distinguished, and the magnitude and phase responses showed only small differences.
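The synthesis scheme described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that a second-order quasipolynomial formant has the form (a0 + a1 t + a2 t^2) exp(-λt) sin(2πft + φ), and that the arctangent law maps the pulse index to a weight between 0 and 1. All formant frequencies, damping values, coefficients, the pitch, and the function names below are hypothetical placeholders, not values estimated in the paper.

```python
import numpy as np

FS = 48_000  # sampling rate in Hz, matching the recording format in the abstract


def quasipolynomial_formant(freq_hz, damping, coeffs, phase=0.0, duration=0.02, fs=FS):
    """Impulse response of one formant channel (assumed functional form):
    second-order polynomial envelope times a damped sinusoid."""
    t = np.arange(int(round(duration * fs))) / fs
    a0, a1, a2 = coeffs
    envelope = (a0 + a1 * t + a2 * t**2) * np.exp(-damping * t)
    return envelope * np.sin(2.0 * np.pi * freq_hz * t + phase)


def arctan_weights(n_pulses, steepness=1.0):
    """Weights that rise smoothly from near 0 to near 1 along an arctangent curve."""
    k = np.arange(n_pulses) - (n_pulses - 1) / 2.0  # pulse index centred on zero
    return 0.5 + np.arctan(steepness * k) / np.pi


def synthesize_diphthong(first_formants, second_formants, n_pulses=40, pitch_hz=120.0, fs=FS):
    """Sum of formant channels, each excited by an amplitude-weighted impulse train.

    The first vowel fades out with weight (1 - w) while the second fades in with w.
    """
    period = int(round(fs / pitch_hz))
    w = arctan_weights(n_pulses)

    # One impulse train per vowel; pulse amplitudes follow the arctangent weights.
    train_first = np.zeros(n_pulses * period)
    train_second = np.zeros(n_pulses * period)
    train_first[::period] = 1.0 - w
    train_second[::period] = w

    output = None
    for train, formants in ((train_first, first_formants), (train_second, second_formants)):
        for params in formants:
            channel = np.convolve(train, quasipolynomial_formant(fs=fs, **params))
            output = channel if output is None else output + channel
    return output


# Hypothetical formant parameters for /a/ and /i/, chosen only for illustration.
A_FORMANTS = [
    {"freq_hz": 700.0, "damping": 300.0, "coeffs": (1.0, 200.0, -1.0e4)},
    {"freq_hz": 1200.0, "damping": 400.0, "coeffs": (0.5, 100.0, -5.0e3)},
]
I_FORMANTS = [
    {"freq_hz": 300.0, "damping": 250.0, "coeffs": (1.0, 150.0, -8.0e3)},
    {"freq_hz": 2300.0, "damping": 500.0, "coeffs": (0.4, 80.0, -4.0e3)},
]

signal = synthesize_diphthong(A_FORMANTS, I_FORMANTS)
signal = signal / np.max(np.abs(signal))  # normalise to [-1, 1] before playback or export
```

In this sketch the impulse trains are strictly periodic, whereas the paper uses semi-periodic excitation. The parameter estimation step (the Levenberg–Marquardt fit to convoluted data) is not shown; the formant parameters are simply given.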