
Informatica


Lithuanian Speech Corpus Liepa for Development of Human-Computer Interfaces Working in Voice Recognition and Synthesis Mode
Volume 29, Issue 3 (2018), pp. 487–498
Sigita Laurinčiukaitė, Laimutis Telksnys, Pijus Kasparaitis, Regina Kliukienė, Vilma Paukštytė

https://doi.org/10.15388/Informatica.2018.177
Pub. online: 1 January 2018      Type: Research Article      Open Access

Received: 1 May 2017
Accepted: 1 March 2018
Published: 1 January 2018

Abstract

The problem of designing a speech corpus for human-computer interfaces working in voice recognition and synthesis mode is investigated. The specific requirements that speech recognizers and synthesizers place on a speech corpus are highlighted. It is argued that such a corpus has to consist of two parts: one serving the needs of Lithuanian text-to-speech synthesizers, the other serving the needs of Lithuanian speech recognition engines. It has been determined that the part designed for speech recognition engines must make it possible to represent language-specific features through the use of different sets of phonemes. Based on the research results, the two-part speech corpus Liepa was developed. This speech corpus opens possibilities for cost-effective and flexible development of human-computer interfaces working in voice recognition and synthesis mode.


Biographies

Laurinčiukaitė Sigita
sigita.lau@gmail.com

S. Laurinčiukaitė received her PhD degree from Vilnius Gediminas Technical University. From 2000 to 2008 she worked at the Institute of Mathematics and Informatics. Currently she works on the development of speech corpora. Her research interests are HMM-based methods for Lithuanian speech recognition and the development of speech corpora.

Telksnys Laimutis
laimutis.telksnys@mii.vu.lt

L. Telksnys is a professor, doctor habilitatis in informatics, doctor honoris causa of the Kaunas University of Technology, member of the Lithuanian Academy of Sciences, and senior research fellow of the Recognition Processes Department at the Institute of Mathematics and Informatics, Vilnius University, Lithuania. He is the author of an original theory of detecting changes in random processes, and the investigator and developer of a computerized system for statistical analysis and recognition of random signals. His current research interests are in analysis and recognition of random processes, cardiovascular signals and speech processing.

Kasparaitis Pijus
pkasparaitis@yahoo.com

P. Kasparaitis graduated from Vilnius University (Faculty of Mathematics) in 1991. He became a PhD student at Vilnius University in 1991 and defended his PhD thesis in 2001. Presently he is an associate professor at Vilnius University. His current research includes text-to-speech synthesis and other areas of computational linguistics.

Kliukienė Regina

R. Kliukienė received her PhD degree from Vilnius University. She worked at the Faculty of Philology of Vilnius University. She supervised the construction of the Lithuanian speech corpus Liepa.

Paukštytė Vilma
vilma.paukstyte@gmail.com

V. Paukštytė received her master’s degree in Lithuanian linguistics from Vilnius University in 2012. She worked on the Lithuanian speech corpus Liepa.



Copyright
© 2018 Vilnius University
Open access article under the CC BY license.

Keywords
speech corpus, speech annotation, speech synthesis, speech recognition, human-computer interfaces


INFORMATICA

  • Online ISSN: 1822-8844
  • Print ISSN: 0868-4952