Informatica


Comparison of Phonemic and Graphemic Word to Sub-Word Unit Mappings for Lithuanian Phone-Level Speech Transcription
Volume 30, Issue 3 (2019), pp. 573–593
Gailius Raškinis   Gintarė Paškauskaitė   Aušra Saudargienė   Asta Kazlauskienė   Airenas Vaičiūnas  

https://doi.org/10.15388/Informatica.2019.219
Pub. online: 1 January 2019      Type: Research Article      Open Access

Received: 1 June 2018
Accepted: 1 May 2019
Published: 1 January 2019

Abstract

Conventional large-vocabulary automatic speech recognition (ASR) systems require a mapping from words to sub-word units, both to generalize to words that were absent from the training data and to enable robust estimation of acoustic model parameters. This paper surveys the research carried out over the last 15 years on word to sub-word mappings for Lithuanian ASR systems. It also compares various phoneme-based and grapheme-based mappings across a broad range of acoustic modelling techniques, including monophone- and triphone-based hidden Markov models (HMM), speaker-adaptively trained HMMs, subspace Gaussian mixture models (SGMM), a feed-forward time delay neural network (TDNN), and a state-of-the-art low frame rate bidirectional long short-term memory (LFR BLSTM) recurrent deep neural network. Experimental comparisons are based on a 50-hour speech corpus. The paper shows that the best phoneme-based mapping significantly outperforms a grapheme-based mapping. It also shows that the lowest phone error rate of an ASR system is achieved by the phoneme-based lexicon that explicitly models syllable stress and represents diphthongs as single phonetic units.
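To make the two kinds of mapping and the evaluation metric concrete, the sketch below contrasts a graphemic mapping (one unit per letter) with a lexicon-based phonemic mapping, and computes a phone error rate as edit distance divided by reference length. It is a minimal illustration, not the authors' implementation: the function names, the toy lexicon entry for "laukas", and the unit symbols (a stress mark `"` and the diphthong `au` treated as one unit) are assumptions invented for this example and do not reproduce the phone sets evaluated in the paper.

```python
from typing import Dict, List, Sequence

# Hypothetical toy lexicon: one Lithuanian word with the diphthong "au"
# kept as a single stressed unit. The symbols are illustrative only.
TOY_LEXICON: Dict[str, List[str]] = {
    "laukas": ["l", '"au', "k", "a", "s"],
}


def graphemic_units(word: str) -> List[str]:
    """Graphemic mapping: each letter of the word becomes one sub-word unit."""
    return list(word.lower())


def phonemic_units(word: str, lexicon: Dict[str, List[str]]) -> List[str]:
    """Phonemic mapping: look the word up in a pronunciation lexicon."""
    return lexicon[word.lower()]


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two unit sequences (unit cost per edit)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # deletion of a reference unit
                curr[j - 1] + 1,         # insertion of a hypothesis unit
                prev[j - 1] + (r != h),  # substitution or match
            ))
        prev = curr
    return prev[-1]


def phone_error_rate(ref: Sequence[str], hyp: Sequence[str]) -> float:
    """Edits needed to turn hyp into ref, divided by the reference length."""
    if not ref:
        raise ValueError("reference transcription must not be empty")
    return edit_distance(ref, hyp) / len(ref)


if __name__ == "__main__":
    print(graphemic_units("laukas"))
    print(phonemic_units("laukas", TOY_LEXICON))
    reference = phonemic_units("laukas", TOY_LEXICON)
    hypothesis = ["l", "a", "k", "a", "s"]  # recognizer missed the diphthong
    print(phone_error_rate(reference, hypothesis))
```

In this toy case the graphemic mapping yields six units while the phonemic mapping yields five, because the diphthong is one unit. Error rates measured over different unit inventories are only comparable when hypotheses and references are scored against a common phone-level reference.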

Biographies

Raškinis Gailius
gailius.raskinis@vdu.lt

G. Raškinis (born in 1972) received a PhD in the field of informatics in 2000. Presently, he works at the Center of Computational Linguistics and teaches at the Faculty of Informatics of Vytautas Magnus University. His research interests include the application of machine learning techniques to music recognition, speech recognition and natural language processing.

Paškauskaitė Gintarė
gintare.paskauskaite@ktu.lt

G. Paškauskaitė (born in 1990) received BS and MS degrees from the Department of Automatics, Kaunas University of Technology. She has been a PhD student at Kaunas University of Technology since 2016. Her main research interest is automatic Lithuanian speech recognition.

Saudargienė Aušra
ausra.saudargiene@lsmuni.lt

A. Saudargienė (born in 1970) received a PhD degree in the field of informatics from the Institute of Mathematics and Informatics, Vilnius. Currently she works at the Department of Applied Informatics, Vytautas Magnus University, and Neuroscience Institute, Lithuanian University of Health Sciences. Her research field is learning and memory in artificial and biological neural systems.

Kazlauskienė Asta
asta.kazlauskiene@vdu.lt

A. Kazlauskienė (born in 1964) received a doctor’s degree in the field of humanities (philology) in 1998. She teaches at the Department of Lithuanian Studies of Vytautas Magnus University. Her research interests are phonology, phonotactics, accentuation, rhythm, and applied linguistics.

Vaičiūnas Airenas
airenass@gmail.com

A. Vaičiūnas (born in 1976) received a PhD in the field of informatics in 2006. Since then he has worked as a software engineer and researcher in various computational linguistics projects. His research interests are human language technologies.



Copyright
© 2019 Vilnius University
Open access article under the CC BY license.

Keywords
speech recognition, grapheme, phoneme, G2P conversion, HMM, SGMM, TDNN, BLSTM, Lithuanian

Funding
Part of this research has been supported by a grant from the Research Council of Lithuania under the National Lithuanian studies development programme for 2009–2015 through the project “A unified approach to Lithuanian prosody: the intonation, rhythm, and stress” (reg. no. LIT-5-4).


INFORMATICA

  • Online ISSN: 1822-8844
  • Print ISSN: 0868-4952