Informatica


Transformer-Based Detection of Propaganda Techniques in a Low-Resource Language: A Case Study in Lithuanian
Ieva Rizgelienė, Paulius Zaranka, Gražina Korvel, Virginijus Marcinkevičius

https://doi.org/10.15388/26-INFOR633
Pub. online: 12 June 2026      Type: Research Article      Open Access

Received: 1 April 2026
Accepted: 1 June 2026
Published: 12 June 2026

Abstract

Propaganda techniques are a key tool for creating misleading content, often disseminated in native languages to increase their impact. Therefore, it is increasingly important to develop detection models not only for high-resource languages but also for low-resource languages, which still face significant limitations in propaganda detection. This study presents the first approach to automated propaganda technique detection in Lithuanian using the HALT-PROP corpus. We adapt the standard framework to account for frequent overlap between techniques. Experiments with the Lithuanian transformer LT-MLKM-modernBERT show that BILOU tagging improves span identification, while sentence classification based on span-level information enhances technique detection for most techniques. The results also indicate that training separate binary classifiers is more effective than multi-label classification in this setting. Overall, the proposed approach outperforms GPT-5.3 on most techniques and provides a strong baseline for propaganda technique detection in Lithuanian.
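
As an illustration of the BILOU scheme and the per-technique setup mentioned in the abstract, the following is a minimal sketch, not the authors' implementation. It assumes token-level spans with exclusive end indices and keeps one tag sequence per technique so that overlapping techniques do not collide. The function names, the example sentence, and the technique labels (`name_calling`, `loaded_language`) are hypothetical; the abstract does not specify the study's tokenization, tag inventory, or label names.

```python
from typing import Dict, List, Tuple

Span = Tuple[int, int]  # token-level span [start, end), end exclusive (assumption)


def bilou_tags(num_tokens: int, spans: List[Span]) -> List[str]:
    """Encode non-overlapping token spans of ONE technique as BILOU tags."""
    tags = ["O"] * num_tokens              # Outside by default
    for start, end in spans:
        if not 0 <= start < end <= num_tokens:
            raise ValueError(f"span {(start, end)} is out of range")
        if end - start == 1:
            tags[start] = "U"              # Unit: single-token span
        else:
            tags[start] = "B"              # Begin
            for i in range(start + 1, end - 1):
                tags[i] = "I"              # Inside
            tags[end - 1] = "L"            # Last
    return tags


def per_technique_tags(
    num_tokens: int, spans_by_technique: Dict[str, List[Span]]
) -> Dict[str, List[str]]:
    """One independent tag sequence per technique, so overlaps are representable."""
    return {
        technique: bilou_tags(num_tokens, spans)
        for technique, spans in spans_by_technique.items()
    }


def sentence_label(tags: List[str]) -> int:
    """Binary sentence-level label derived from span tags: 1 if any token is tagged."""
    return int(any(tag != "O" for tag in tags))


# Hypothetical example: two techniques overlap on the token "traitors".
tokens = ["They", "are", "traitors", "and", "enemies", "of", "the", "people"]
spans = {"name_calling": [(2, 3)], "loaded_language": [(2, 5)]}
tagged = per_technique_tags(len(tokens), spans)
print(tagged["name_calling"])     # ['O', 'O', 'U', 'O', 'O', 'O', 'O', 'O']
print(tagged["loaded_language"])  # ['O', 'O', 'B', 'I', 'L', 'O', 'O', 'O']
```

Keeping a separate tag sequence per technique mirrors the idea of training separate binary classifiers: a single shared sequence could assign only one tag to "traitors", whereas here each technique sees its own target.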


Biographies

Rizgelienė Ieva
ieva.rizgeliene@mif.vu.lt

I. Rizgelienė is a PhD student at the Institute of Data Science and Digital Technologies, Vilnius University. Her primary research interests include propaganda detection and analysis, with an emphasis on low-resource languages.

Zaranka Paulius
paulius.zaranka@mif.vu.lt

P. Zaranka received his master’s degree in computer modelling from Vilnius University in 2025 and is currently a lecturer in NLP at Vilnius University. His primary research interests include large language models, natural language processing, and agent-based modelling.

Korvel Gražina
grazina.korvel@mif.vu.lt


Marcinkevičius Virginijus
virginijus.marcinkevicius@mif.vu.lt




Copyright
© 2026 Vilnius University
Open access article under the CC BY license.

Keywords
propaganda technique detection, low-resource language, transformers

Funding
This research was supported by the Lithuanian Government Priority Research Program “Building Societal Resilience and Crisis Management in the Context of Contemporary Geopolitical Developments” (implemented through the Lithuania Research Council) under grant number S-VIS-23-8. Project title: “Propaganda and Disinformation Research: Machine Learning-Based Automatic Detection, Impact and Societal Resilience.”


INFORMATICA

  • Online ISSN: 1822-8844
  • Print ISSN: 0868-4952