Informatica


The Impact of Churn Labelling Rules on Churn Prediction in Telecommunications
Volume 33, Issue 2 (2022), pp. 247–277
Andrej Bugajev, Rima Kriauzienė, Olegas Vasilecas, Viktoras Chadyšas

https://doi.org/10.15388/22-INFOR484
Pub. online: 17 May 2022      Type: Research Article      Open Access

Received: 1 February 2022
Accepted: 1 May 2022
Published: 17 May 2022

Abstract

One of the biggest difficulties in the telecommunication industry is retaining customers and preventing churn. In this article, we review the most recent research on churn detection for telecommunication companies. Selected machine learning methods are applied to publicly available datasets, partially reproducing the results of other authors, and then to the private dataset of the Moremins company. Next, we extend the analysis to cover existing research gaps: the differences between churn definitions are analysed, and it is shown that the higher accuracy reported in other studies is due to some false assumptions. That is, labelling rules derived from the definition lead to very good classification accuracy, but this does not imply that such churn detection is useful for subsequent customer retention. The main outcome of the research is a detailed analysis of the impact of differences in churn definitions on the final result; it is shown that the impact of labelling rules derived from definitions can be large. The data in this study consist of call detail records (CDRs) and other daily aggregated user data; 11000 user entries over 275 days of data were analysed. Six different classification methods were applied, all giving similar results; one of the best results was achieved using the Gradient Boosting Classifier, with an accuracy of 0.832, an F-measure of 0.646 and a recall of 0.769.
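The dependence of the reported metrics on the labelling rule can be illustrated with a small sketch. It is not the authors' method or data: the inactivity-window rule, the function names `label_churn` and `binary_metrics`, the toy activity series and the fixed predictions are all assumptions made for illustration. The same predictions are scored against labels produced by two different inactivity thresholds.

```python
from typing import Dict, List, Sequence


def label_churn(activity: Sequence[int], cutoff_day: int, inactivity_days: int) -> bool:
    """Hypothetical labelling rule: a user is a churner if there is no
    activity in the `inactivity_days` days starting at `cutoff_day`."""
    window = activity[cutoff_day:cutoff_day + inactivity_days]
    return sum(window) == 0


def binary_metrics(y_true: Sequence[bool], y_pred: Sequence[bool]) -> Dict[str, float]:
    """Accuracy, precision, recall and F-measure for the positive (churn) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)
    fp = sum(1 for t, p in pairs if not t and p)
    fn = sum(1 for t, p in pairs if t and not p)
    tn = sum(1 for t, p in pairs if not t and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f_measure": f_measure}


# Toy daily activity counts for three made-up users (10 days each).
activity_by_user: List[List[int]] = [
    [1, 1, 0, 1, 1, 0, 0, 0, 1, 1],  # pauses for three days, then returns
    [1, 1, 1, 0, 0, 0, 0, 0, 0, 0],  # stops using the service
    [1, 0, 1, 1, 1, 1, 0, 1, 1, 1],  # stays active
]
CUTOFF_DAY = 5  # assumed day on which labels are assigned

# Two labelling rules that differ only in the inactivity threshold.
labels_short = [label_churn(a, CUTOFF_DAY, 3) for a in activity_by_user]
labels_long = [label_churn(a, CUTOFF_DAY, 5) for a in activity_by_user]

# Fixed predictions from some hypothetical classifier.
predictions = [True, True, False]

metrics_short = binary_metrics(labels_short, predictions)
metrics_long = binary_metrics(labels_long, predictions)

print("3-day rule:", labels_short, metrics_short)
print("5-day rule:", labels_long, metrics_long)
```

In this toy case the first user is a churner under the shorter threshold but not under the longer one, so identical predictions receive different accuracy and F-measure values depending only on the rule used to produce the labels.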


Biographies

Bugajev Andrej
zvex77777@gmail.com

A. Bugajev defended his dissertation, “The investigation of efficiency of physical phenomena modelling using differential equations on distributed systems”, in 2015. The dissertation addressed computational efficiency problems: efficient parallel algorithms were created and examined, and their stability was investigated. His research interests cover the theory of algorithms, parallel algorithms and machine learning. He has published 12 papers in journals with Impact Factors indexed in the “Web of Science” database, 7 of them during the last 5 years.

Kriauzienė Rima
rima.kriauziene@gmail.com

R. Kriauzienė defended her dissertation, “Parallel algorithms for non-classical problems with big computational costs”, in 2020. The dissertation is devoted to parallel algorithms that help to solve problems of memory resources and computation time. Effective parallel algorithms were developed and analysed. At the Young Scientists’ Conference of the Lithuanian Academy of Sciences, “Interdisciplinary Research in Physical and Technological Sciences: 7th Conference of Young Scientists”, her work was highly rated: she was included in the list of laureates and received the INFOBALT second degree award. She has published 5 articles with Impact Factors indexed in the “Web of Science” database, 4 of them during the last five years.

Vasilecas Olegas
ovasilecas@gmail.com

O. Vasilecas is a senior researcher at the Institute of Applied Informatics of Vilnius Gediminas Technical University (Vilnius Tech). He is the author of more than 329 research papers and 5 books in the field of information systems development. His research interests include knowledge-based (including business rule and ontology-based) information systems development and data science. He has delivered lectures at 7 European universities, including in London, Barcelona, Athens and Ljubljana. O. Vasilecas completed apprenticeships in Germany, Holland and China, and most recently at universities in Latvia and Slovenia. He has supervised 13 successfully defended doctoral theses and is now supervising 2 doctoral students. He has led many international and local research projects. Most recently, he led the “Business Rules Solutions for Information Systems Development (VeTIS)” project, carried out under the High Technology Development Program.

Chadyšas Viktoras
viktoras.chadysas@vilniustech.lt

V. Chadyšas's research areas involve comprehensive data analysis and the application of different statistical methods in various areas of life. In 2010, he defended his doctoral thesis, “Statistical estimators of the finite population parameters in the case of sample rotation”. During his scientific career, he has prepared and published over 20 scientific articles in mathematics journals. The results of his research have been presented at more than 20 scientific conferences held in different Lithuanian and foreign cities. In 2005, he received the Lithuanian Academy of Sciences Prize in the mathematics, physics and chemistry section for the work “Viktoro Chadyšo 2005 publications”. Since 2006, he has been a member of the Society of Lithuanian Mathematicians.



Copyright
© 2022 Vilnius University
Open access article under the CC BY license.

Keywords
churn prediction, churn definition, telecom, machine learning, binary classification, customer classification, imbalanced learning, RFM


INFORMATICA

  • Online ISSN: 1822-8844
  • Print ISSN: 0868-4952
  • Copyright © 2023 Vilnius University
