1 Introduction
-
• RQ1: Which labelling rules are sufficient to achieve reasonably small differences in prediction errors compared to the full churn definition?
-
• RQ2: Which classification methods perform best with different churn definitions?
-
• RQ3: Which classification methods suffer the most from the simplification of churn definition?
1.1 Related Work
Table 1
Authors | Methods | Attributes |
Bose and Chen (2009) | Two-stage hybrid models: the first stage – unsupervised clustering technique (KM, KMD, FCM, and SOM), the second stage – C5.0 tree with boosting | Revenue contribution: mean monthly revenue (charge amount); percentage change in monthly revenue versus previous three months average; total revenue, billing adjusted total revenue over the life of the customer, etc. Service usage: percentage change in monthly minutes of use versus previous three months average, mean number of attempted voice calls placed, mean number of received voice calls, etc. |
Keramati and Ardabili (2011) | Binomial logistic regression | Number of failed calls, subscription length, customer complaints, amount of charge, length of all calls, seconds of use, number of calls, frequency of SMS, frequency of use, number of distinct calls, type of service, age group, status, churn. |
Keramati et al. (2014) | Decision Tree (DT), Artificial Neural Networks (ANN), K-Nearest Neighbours (KNN), Support Vector Machines (SVM) | Call failure (CF), number of complaints (Co), subscription length (SL), charge amount (CA), seconds of use (SU), frequency of use (FU), frequency of SMS (FS), distinct calls number (DCN), age group (AG), type of service (TS), status (St), churn (Ch). |
Vafeiadis et al. (2015) | Back-Propagation algorithm (BPN) (a case of the ANN classifier), Support Vector Machines (SVM) and Decision Tree C5.0 (DT) with and without boosting, Naïve Bayes (NB), Logistic regression (LR) | Number of months as an active user, total charge of evening calls, area code, total minutes of night calls, international plan, total number of night calls, voice mail plan, total charge of night calls, number of voice-mail messages, total minutes of international calls, etc. |
Amin et al. (2016) | Oversampling techniques (SMOTE, ADASYN, MTDF, ICOTE, MWMOTE and TRkNN) combined with a comparison of four rule-generation algorithms (Exhaustive Algorithm, Genetic Algorithm, Covering Algorithm and RSES LEM2 Algorithm) | 4 data sets |
Azeem et al. (2017) | Fuzzy classifiers: FuzzyNN, VQNN, OWANN and FuzzyRoughNN | Days since last recharge, voice bucket revenue, active days since last call, sms bucket revenue, total revenue voice, revenue on net, active days since recharge, total revenue, sms charged outgoing count, GPRS bucket revenue, revenue sms, crbt revenue, charged off net, minute of use off-net, balance average daily, balance last recharge, off net outgoing minute of use, charged off net minute of use, free minute of use, free on net minute of use, free sms, revenue fix, active days recharge, recharge count, recharge value, last recharge value, act days minute of call, promo opt in, loan count, active days since loan, inactive days calls, inactive days sms, inactive days data. |
Coussement et al. (2017) | LOGIT-DPT, Bagged CART, Bayesian network, Decision Tree, Neural network, Naïve Bayes, Random Forest, Support Vector Machines, Stochastic gradient boosting | 156 categorical and 800 continuous variables |
De Caigny et al. (2018) | Decision tree (DT), Logistic model tree (LMT), Logistic regression (LR), Random forests (RF) | Mean number of call waiting calls, change in minutes of use, dummy if change in minutes of use is imputed, low credit rating, mean number of customer care calls, mean number of director assisted calls, number of days of the current equipment, mean number of inbound voice calls, models issued, mean monthly minutes of use, mean number of in and out off-peak voice call, mean number of outbound voice calls, handsets issued, mean total recurring charge, number of calls previously made to the retention team, mean monthly revenue, missing data on handset price, handset is web capable |
Ullah et al. (2019) | Random forest vs other machine learning techniques | 2 data sets |
Ahmad et al. (2019) | Decision Tree (DT), Random Forest (RF), Gradient Boosted Machine Tree (GBM) and Extreme Gradient Boosting (XGBOOST) | 10 000 variables |
Adhikary and Gupta (2020) | 100 classifiers | 57 attributes |
1.2 Some Requirements for Proper Churn Prediction
-
1. temporal data is used when the features are derived from actual CDR and/or payment data, keeping the information about behaviour dynamics. Data aggregation over the whole period removes the information about temporal changes in customers’ behaviour, leading to a loss of the discriminative ability of the classification methods. Note that when temporal data is not utilized, most of the features lack information about behaviour dynamics.
-
2. the dataset must not be synthetic, and the labelling rules and data filtering should be defined; otherwise the usefulness for practical purposes is questionable. A large share of publicly available datasets are synthetic and the sources of the datasets are not properly described, thus the practical applicability of the results is questionable.
-
3. hyperparameter tuning is performed, since methods with default parameter values might perform far from optimally in many cases. Research that ignores hyperparameter tuning misses the opportunity to obtain better results for every method, leading to an incomplete comparison between the different methods.
-
4. data balancing is performed, as it is an important part of detecting the minority churn class. Churn labelling leads to imbalanced data, thus proper balancing techniques are required to utilize the full power of machine learning methods.
Table 2
Article | Temporal data was used | Data is non-synthetic | Hyperparameters were tuned | Data balancing was performed |
Adhikary and Gupta (2020) | − | + | − | + |
Ahmad et al. (2019) | + | + | − | + |
Coussement et al. (2017) | n/a | + | + | − |
De Caigny et al. (2018) | +/− | + | + | − |
Ullah et al. (2019) | +/− | + | − | − |
Amin et al. (2016) | +/− | +/− | − | − |
1.3 Churn and Partial Churn Definitions
Fig. 1
2 Dataset Overview
-
1. state – customer state;
-
2. total day minutes – total minutes of talk during the day; can be generalized into a value aggregated over another time interval, e.g. a whole month;
-
3. total day calls – number of calls during the day; can be generalized into a value aggregated over another time interval, e.g. a month;
-
4. total day charge – call charges during the day;
-
5. total eve minutes – total minutes of talk during the evening;
-
6. total eve calls – number of calls during the evening;
-
7. total eve charge – charges for calls during the evening;
-
8. total night minutes – total minutes of calls at night;
-
9. total night calls – total number of calls at night;
-
10. total night charge – total charge for calls at night.
Table 3
Attribute | Values or their range | Data type | Description |
X1 | 0-26751 | Numerical | The sum of minutes from all calls through whole period |
X2 | 1-7032 | Numerical | The amount of calls through whole period |
X3 | 0-1475 | Numerical | The sum of costs of customers payments through whole period |
X4 | 0-255 | Numerical | The amount of payments through whole period |
X5 | 0-90 | Numerical | The average of minutes from all calls during the day |
X6 | 1-104 | Numerical | Activity provided by company |
X7 | 0-73 | Numerical | Usefulness provided by company |
X8 | 0-47 | Numerical | Involvement provided by company |
X9 | 0-266 | Numerical | The maximum pause in days of customer activity |
X10 | $0,1,2,3,4$ | Categorical | Customers classes provided by company |
X11 | 0-275 | Numerical | Duration of activities in days |
X12, X13, …, X18 | – | Numerical | The amounts of calls of different types (7 different) |
X19, X20, …, X66 | – | Numerical | RFM features |
X67, X68, …, X426 | – | Numerical | Daily parameters for the last 90 days |
Table 4
Dataset | Instances | Number of attributes | Attributes |
Dataset 1 Amin et al. (2016); Ullah et al. (2019) (BigML) | 3333 | 21 | State; account length; area code; phone number; international plan; voice mail plan; number vmail messages; total day minutes; total day calls; total day charge; total eve minutes; total eve calls; total eve charge; total night minutes; total night calls; total night charge; total intl minutes; total intl calls; total intl charge; customer service calls; churn. |
Dataset 2 IBM (2020) | 7043 | 33 | LoyaltyID; Customer ID; Senior Citizen; Partner; Dependents; Tenure; Phone Service; Multiple Lines; Internet Service; Online Security; Online Backup; Device Protection; Tech Support; Streaming TV; Streaming Movies; Contract; Paperless Billing; Payment Method; Monthly Charges; Total Charges; Churn |
Moremins dataset | 11100 | 426 | X1, X2, …, X426 (described in Table 3) |
3 Methods
3.1 Feature Extraction
(1)
\[ \begin{aligned} & R(t_{1},t_{2})=\begin{cases}\sum _{i=t_{s}+1}^{t_{2}}1, & \text{if}\ t_{s}<t_{2},\\ 0, & \text{otherwise},\end{cases}\\ & t_{s}=\max \big\{i:f_{i}>0,\ f_{i}\in \{f_{t_{1}},f_{t_{1}+1},\dots ,f_{t_{2}}\}\big\},\end{aligned}\]
-
1. RFM features ${R_{1}},{F_{1}},{M_{1}}$ are calculated for $t\in [1,90]$;
-
2. labels ${C_{1}}$ are derived from data for $t\in [91,180]$;
-
3. RFM features are added to the data, and the set is split into train and test parts;
-
4. using ${R_{1}},{F_{1}},{M_{1}}$, the standard 5-fold cross-validation technique is applied with a chosen classification method to the training part, and hyperparameter tuning is performed;
-
5. the final performance estimation is performed by applying the model to the test part of the data with labels ${C_{1}}$ (a code sketch of this procedure is given below).
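A minimal sketch of steps 1–2 and of the recency formula in Eq. (1), assuming the daily call counts f and daily payment amounts m are available as arrays; the function names and the handling of customers with no activity in the window are illustrative assumptions, not the original implementation:

import numpy as np

def recency(f, t1, t2):
    """R(t1, t2) from Eq. (1): number of days since the last active day within [t1, t2]."""
    window = np.asarray(f[t1 - 1:t2])     # days t1..t2 of the daily activity vector f (1-indexed in the text)
    active = np.nonzero(window > 0)[0]
    if active.size == 0:                  # no activity in the window; t_s is undefined, treated as R = 0 here
        return 0
    t_s = t1 + active[-1]                 # the last day i in [t1, t2] with f_i > 0
    return t2 - t_s if t_s < t2 else 0    # sum_{i = t_s + 1}^{t_2} 1

def rfm_features(f, m, t1, t2):
    """Recency, Frequency (activity count) and Monetary (total spend) over [t1, t2]."""
    return np.array([recency(f, t1, t2),
                     np.sum(f[t1 - 1:t2]),
                     np.sum(m[t1 - 1:t2])])

# Steps 1-2: features R1, F1, M1 are computed for days 1..90, labels C1 are derived from days 91..180;
# steps 3-5: train/test split, 5-fold cross-validated tuning on the training part,
# final evaluation on the test part (see Section 3.2 for the concrete pipeline).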
-
1. monthly revenue;
-
2. monthly minutes;
-
3. state – customer state;
-
4. payment method;
-
5. monthly change;
-
6. percentage change in monthly minutes of use vs. previous three months average;
-
7. mean monthly revenue over the data collection period;
-
8. mean number of monthly minutes of use;
-
9. mean monthly revenue.
3.2 Classification Algorithm
-
1. RFM and other feature extraction,
-
2. data labelling according to different churn labelling rules,
-
3. unsupervised method application: normalization of the feature vectors by a standard scaler (division by the standard deviation) and application of PCA (principal component analysis),
-
4. construction of the classification pipeline from these steps: a random oversampler (duplicating churner entries) followed by the selected method (see the sketch after this list),
-
5. execution of Algorithm 1, passing the data and the method to it,
-
6. saving the metrics.
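A sketch of such a pipeline, assuming the scikit-learn and imbalanced-learn libraries; the PCA variance threshold and the use of LGBM as the placeholder classifier are illustrative choices, not the original configuration:

from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from imblearn.over_sampling import RandomOverSampler
from imblearn.pipeline import Pipeline
from lightgbm import LGBMClassifier

def build_pipeline(classifier=None):
    """Steps 3-4 above: scaling and PCA, then random oversampling of churners and the selected method."""
    return Pipeline([
        ("scaler", StandardScaler()),                        # normalization (division by the standard deviation)
        ("pca", PCA(n_components=0.95)),                     # keep 95% of the variance (illustrative threshold)
        ("oversample", RandomOverSampler(random_state=0)),   # duplicate churner entries in the training folds
        ("clf", classifier if classifier is not None else LGBMClassifier()),
    ])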
-
• method drop() refers to removing the rows (entries);
-
• train_test_split randomly splits the data into test and training/validation samples with the given proportion (0.1 to the test part);
-
• GridSearchCV iterates through all possible combinations of the given ranges of parameters;
-
• resample refers to random sampling from a given set of entries at a given proportion; here it keeps 0.1 of the initial rows so that the proportion of removed to total entries stays constant between the initial and the test data (a sketch combining these helpers is given after this list).
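A hedged sketch of how these helpers fit together; this is not the original Algorithm 1, and the stratification, the scoring criterion and the way the dropped entries are added back for evaluation are assumptions:

import numpy as np
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.utils import resample

def run_evaluation(X, y, X_dropped, y_dropped, pipeline, param_grid):
    # train_test_split: 0.1 of the entries go to the test part
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.1, stratify=y, random_state=0)

    # GridSearchCV: iterate through all combinations of the given parameter ranges (5-fold CV)
    search = GridSearchCV(pipeline, param_grid, cv=5, scoring="f1", n_jobs=-1)
    search.fit(X_train, y_train)

    # resample: keep 0.1 of the rows removed by drop(), so that the proportion of removed
    # to total entries in the evaluation data matches the initial data
    X_add, y_add = resample(X_dropped, y_dropped, replace=False,
                            n_samples=int(0.1 * len(X_dropped)), random_state=0)
    X_eval = np.vstack([X_test, X_add])
    y_eval = np.concatenate([y_test, y_add])
    return search.best_estimator_, X_eval, y_eval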
3.3 Classification Methods
-
• XGBClassifier (XGBoost) was proposed by Chen and Guestrin (2016). This model grows trees level-wise. It was developed with a focus on computation speed and performance and is reported as state-of-the-art in many articles.
-
• LGBMClassifier (LGBM) – a gradient boosting model (Ke et al., 2017). This model grows trees leaf-wise. It chooses the leaf which is predicted to give the largest improvement for the loss function.
-
• RandomForestClassifier (RF) was proposed by Breiman (2001). This method creates a forest of random trees. Breiman showed that the generalization error of such forests converges as the number of trees in the forest increases.
-
• KNeighborsClassifier (KNN) was first developed by Fix and Hodges (1951). The idea of this method is to assign classes to new data (test data) on the basis of data already classified (learning data).
-
• SVM – a supervised method for classification into two groups, proposed by Cortes and Vapnik (1995). The training data are mapped into a high-dimensional feature space and separated so that the distance between the classes is the greatest. New data (test data) are mapped into the same space and assigned a class according to which side of the gap they fall on (the constructors of the compared classifiers are sketched after this list).
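For reference, the compared methods map to the following constructors (GBC, which also appears in the results, is taken here to be scikit-learn’s GradientBoostingClassifier); only the parameters fixed to a single value in Table 6 are shown, the rest are tuned:

from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn import svm
from xgboost import XGBClassifier
from lightgbm import LGBMClassifier

classifiers = {
    "GBC": GradientBoostingClassifier(n_iter_no_change=5, tol=1e-4),
    "XGBoost": XGBClassifier(booster="gbtree", tree_method="hist", nthread=1,
                             use_label_encoder=False, eval_metric="logloss"),
    "LGBM": LGBMClassifier(boosting_type="gbdt", n_jobs=1),
    "RF": RandomForestClassifier(),
    "KNN": KNeighborsClassifier(),
    "SVM": svm.SVC(tol=1e-4),
}
# the remaining hyperparameters are selected by the grid search described in Section 4.2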
3.4 Performance Metrics
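The tables below report Accuracy, F-measure, Recall, Precision, Specificity and Balanced accuracy. A minimal sketch of how these can be computed for the binary churn labels; specificity is derived from the confusion matrix by hand, the rest come directly from scikit-learn:

from sklearn.metrics import (accuracy_score, balanced_accuracy_score, confusion_matrix,
                             f1_score, precision_score, recall_score)

def churn_metrics(y_true, y_pred):
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    return {
        "Accuracy": accuracy_score(y_true, y_pred),
        "F-measure": f1_score(y_true, y_pred),
        "Recall": recall_score(y_true, y_pred),                        # TP / (TP + FN)
        "Precision": precision_score(y_true, y_pred),                  # TP / (TP + FP)
        "Specificity": tn / (tn + fp),                                 # TN / (TN + FP)
        "Balanced accuracy": balanced_accuracy_score(y_true, y_pred),  # (Recall + Specificity) / 2
    }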
4 Experimental Results
4.1 Data Preparation
Table 5
Label | ${T_{c}}$ (days) |
Churner | 90 |
Churner1 | 60 |
Churner2 | 50 |
Churner3 | 40 |
Churner4 | 30 |
Churner5 | 15 |
-
• Churner – the churner according to the standard 90 day absence definition,
-
• Churner4 – the label according to a partial churner definition, which is often used in practice as an alternative for various reasons and under various assumptions,
-
• Churner1, Churner2, Churner3, Churner4 – the labels used to describe the compromise between the standard (full) and partial churner definitions,
-
• Churner5 – the label used to investigate the extreme case of a very short churner detection window of 15 days (a labelling sketch is given below).
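A sketch of the labelling rule implied by Table 5, under the assumption that a customer is labelled a churner when the longest run of consecutive inactive days in the labelling window reaches ${T_{c}}$; the exact original rule may differ in edge cases:

import numpy as np

def label_churner(daily_activity, t_c):
    """Return 1 (churner) if the longest streak of zero-activity days reaches t_c days."""
    inactive = np.asarray(daily_activity) == 0
    longest = current = 0
    for day_is_inactive in inactive:
        current = current + 1 if day_is_inactive else 0
        longest = max(longest, current)
    return int(longest >= t_c)

# e.g. the full definition uses t_c = 90 over the 90-day labelling window,
# while the partial definition (Churner4) uses t_c = 30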
4.2 Hyperparameter Values for Different Methods
Table 6
Method | Hyperparameters | Ranges |
GBC | min_samples_leaf: | $[3,5,7]$ |
n_estimators: | $[256,512]$ | |
max_depth: | $[2,3,5,7]$ | |
n_iter_no_change: | $[5]$ | |
tol: | $[0.0001]$ | |
XGBoost | booster: | [gbtree] |
nthread: | [1] | |
use_label_encoder: | [False] | |
max_bin: | $[128,256,512]$ | |
max_depth: | $[2,3,5,7]$ | |
subsample: | $[0.5,1]$ | |
eval_metric: | [‘logloss’] | |
tree_method: | [hist] | |
LGBM | boosting_type: | [‘gbdt’] |
n_jobs: | [1] | |
n_estimators: | $[128,256,512]$ | |
max_depth: | $[5,10,15,20]$ | |
learning_rate: | $[0.05,0.1,0.15]$ |
subsample: | $[0.5,1]$ |
RF | n_estimators: | $[128,256,512]$ |
min_samples_leaf: | $[3,5,7]$ | |
max_depth: | $[15,30]$ | |
KNeighborsClassifier | n_neighbors: | $[10,15,20,25]$ |
algorithm: | [‘auto’, ‘ball_tree’] | |
leaf_size: | $[3,5,10,15]$ | |
svm.SVC() | tol: | $[1e-04]$ |
kernel: | [‘poly’, ‘rbf’, ‘sigmoid’] |
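The ranges in Table 6 translate directly into GridSearchCV parameter grids; a sketch for the Random Forest grid, where the scoring choice is an illustrative assumption:

from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

rf_grid = {
    "n_estimators": [128, 256, 512],
    "min_samples_leaf": [3, 5, 7],
    "max_depth": [15, 30],
}

search = GridSearchCV(RandomForestClassifier(), rf_grid, cv=5, scoring="f1", n_jobs=-1)
# search.fit(X_train, y_train); search.best_params_ then holds the selected combination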
4.3 Method Comparison for Different Datasets
Table 7
Method | Datasets | Accuracy | F-measure | Recall | Precision | Specificity | Balanced accuracy |
GBC | BigML | 0.925 | 0.762 | 0.741 | 0.784 | 0.961 | 0.851 |
IBM | 0.746 | 0.637 | 0.818 | 0.522 | 0.719 | 0.769 | |
XGBoost | BigML | 0.922 | 0.745 | 0.704 | 0.792 | 0.964 | 0.834 |
IBM | 0.757 | 0.642 | 0.797 | 0.537 | 0.743 | 0.77 | |
LGBM | BigML | 0.931 | 0.768 | 0.704 | 0.844 | 0.975 | 0.839 |
IBM | 0.769 | 0.637 | 0.745 | 0.556 | 0.778 | 0.761 | |
RF | BigML | 0.91 | 0.737 | 0.778 | 0.7 | 0.936 | 0.857 |
IBM | 0.787 | 0.639 | 0.693 | 0.594 | 0.823 | 0.758 | |
SVMds | BigML | 0.91 | 0.732 | 0.759 | 0.707 | 0.939 | 0.849 |
IBM | 0.74 | 0.627 | 0.802 | 0.515 | 0.717 | 0.76 | |
KNeighborsClassifier | BigML | 0.835 | 0.604 | 0.778 | 0.494 | 0.846 | 0.812 |
IBM | 0.718 | 0.611 | 0.812 | 0.489 | 0.682 | 0.747 |
Table 8
Datasets | ||
Model | BigML | IBM |
Parameters | ||
SVMds | ‘kernel’:‘rbf’ | ‘kernel’:‘rbf’ |
‘tol’:0.0001 | ‘tol’:0.0001 | |
KNeighborsClassifier | ‘algorithm’:‘auto’ | ‘algorithm’:‘auto’ |
‘leaf_size’:15 | ‘leaf_size’:15 | |
‘n_neighbors’:20 | n_neighbors:20 | |
GBC | ‘max_depth’:7 | ‘max_depth’:2 |
‘min_samples_leaf’:5 | ‘min_samples_leaf’:3 | |
‘n_estimators’:512 | ‘n_estimators’:256 | |
‘n_iter_no_change’:5 | ‘n_iter_no_change’:5 | |
‘tol’:0.0001 | ‘tol’:0.0001 | |
XGBoost | ‘booster’:‘gbtree’ | ‘booster’:‘gbtree’ |
‘eval_metric’:‘logloss’ | ‘eval_metric’:‘logloss’ | |
‘max_bin’:128 | ‘max_bin’:128 | |
‘max_depth’:7 | ‘max_depth’:2 | |
‘nthread’:1 | ‘nthread’:1 | |
‘subsample’:1 | ‘subsample’:1 | |
‘use_label_encoder’:False | ‘use_label_encoder’:False | |
LGBM | ‘boosting_type’:‘gbdt’ | ‘boosting_type’:‘gbdt’ |
‘learning_rate’:0.1 | ‘learning_rate’:0.1 | |
‘max_depth’:15 | ‘max_depth’:5 | |
‘n_estimators’:128 | ‘n_estimators’:128 | |
‘n_jobs’:1 | ‘n_jobs’:1 | |
‘subsample’:1 | ‘subsample’:1 | |
RF | ‘max_depth’:15 | ‘max_depth’:30 |
‘min_samples_leaf’:7 | ‘min_samples_leaf’:7 | |
‘n_estimators’:256 | ‘n_estimators’:128 |
Table 9
Label | Type of metrics | Accuracy | F-measure | Recall | Precision | Specificity | Balanced accuracy |
Churner | Standard | 0.832 | 0.646 | 0.769 | 0.557 | 0.848 | 0.809 |
True | 0.832 | 0.646 | 0.769 | 0.557 | 0.848 | 0.809 | |
Churner1 | Standard | 0.819 | 0.597 | 0.745 | 0.498 | 0.836 | 0.79 |
True | 0.803 | 0.622 | 0.829 | 0.497 | 0.796 | 0.813 | |
Churner2 | Standard | 0.808 | 0.571 | 0.71 | 0.477 | 0.83 | 0.77 |
True | 0.786 | 0.612 | 0.847 | 0.48 | 0.77 | 0.809 | |
Churner3 | Standard | 0.83 | 0.578 | 0.665 | 0.512 | 0.865 | 0.765 |
True | 0.813 | 0.67 | 0.879 | 0.541 | 0.794 | 0.837 | |
Churner4 | Standard | 0.821 | 0.543 | 0.734 | 0.431 | 0.835 | 0.785 |
True | 0.743 | 0.558 | 0.909 | 0.403 | 0.707 | 0.808 | |
Churner5 | Standard | 0.763 | 0.574 | 0.698 | 0.487 | 0.782 | 0.74 |
True | 0.614 | 0.499 | 0.955 | 0.338 | 0.527 | 0.741 |
Table 10
Label | Type of metrics | Accuracy | F-measure | Recall | Precision | Specificity | Balanced accuracy |
Churner | Standard | 0.821 | 0.625 | 0.751 | 0.535 | 0.838 | 0.795 |
True | 0.821 | 0.625 | 0.751 | 0.535 | 0.838 | 0.795 | |
Churner1 | Standard | 0.819 | 0.59 | 0.723 | 0.498 | 0.84 | 0.782 |
True | 0.808 | 0.63 | 0.819 | 0.511 | 0.805 | 0.812 | |
Churner2 | Standard | 0.818 | 0.576 | 0.688 | 0.496 | 0.847 | 0.767 |
True | 0.798 | 0.624 | 0.838 | 0.497 | 0.788 | 0.813 | |
Churner3 | Standard | 0.829 | 0.596 | 0.72 | 0.509 | 0.852 | 0.786 |
True | 0.798 | 0.649 | 0.904 | 0.506 | 0.771 | 0.837 | |
Churner4 | Standard | 0.791 | 0.48 | 0.664 | 0.376 | 0.813 | 0.738 |
True | 0.749 | 0.588 | 0.896 | 0.437 | 0.712 | 0.804 | |
Churner5 | Standard | 0.763 | 0.576 | 0.704 | 0.487 | 0.781 | 0.742 |
True | 0.617 | 0.509 | 0.957 | 0.346 | 0.528 | 0.742 |
Table 11
Label | Type of metrics | Accuracy | F-measure | Recall | Precision | Specificity | Balanced accuracy |
Churner | Standard | 0.842 | 0.635 | 0.688 | 0.589 | 0.881 | 0.784 |
True | 0.842 | 0.635 | 0.688 | 0.589 | 0.881 | 0.784 | |
Churner1 | Standard | 0.848 | 0.589 | 0.609 | 0.571 | 0.9 | 0.754 |
True | 0.84 | 0.645 | 0.74 | 0.572 | 0.864 | 0.802 | |
Churner2 | Standard | 0.844 | 0.529 | 0.489 | 0.577 | 0.922 | 0.705 |
True | 0.826 | 0.607 | 0.703 | 0.534 | 0.855 | 0.779 | |
Churner3 | Standard | 0.852 | 0.546 | 0.506 | 0.593 | 0.926 | 0.716 |
True | 0.838 | 0.675 | 0.789 | 0.59 | 0.851 | 0.82 | |
Churner4 | Standard | 0.846 | 0.477 | 0.484 | 0.47 | 0.9 | 0.696 |
True | 0.803 | 0.618 | 0.835 | 0.49 | 0.795 | 0.815 | |
Churner5 | Standard | 0.795 | 0.537 | 0.519 | 0.556 | 0.878 | 0.698 |
True | 0.676 | 0.538 | 0.921 | 0.38 | 0.612 | 0.767 |
Table 12
Label | Type of metrics | Accuracy | F-measure | Recall | Precision | Specificity | Balanced accuracy |
Churner | Standard | 0.85 | 0.614 | 0.602 | 0.627 | 0.911 | 0.756 |
True | 0.85 | 0.614 | 0.602 | 0.627 | 0.911 | 0.756 | |
Churner1 | Standard | 0.842 | 0.506 | 0.451 | 0.576 | 0.927 | 0.689 |
True | 0.841 | 0.611 | 0.621 | 0.602 | 0.896 | 0.758 | |
Churner2 | Standard | 0.849 | 0.513 | 0.443 | 0.609 | 0.938 | 0.69 |
True | 0.844 | 0.639 | 0.692 | 0.593 | 0.882 | 0.787 | |
Churner3 | Standard | 0.857 | 0.546 | 0.488 | 0.62 | 0.936 | 0.712 |
True | 0.837 | 0.663 | 0.771 | 0.582 | 0.854 | 0.812 | |
Churner4 | Standard | 0.854 | 0.446 | 0.406 | 0.495 | 0.93 | 0.668 |
True | 0.801 | 0.587 | 0.781 | 0.47 | 0.805 | 0.793 | |
Churner5 | Standard | 0.783 | 0.513 | 0.5 | 0.526 | 0.867 | 0.683 |
True | 0.658 | 0.51 | 0.896 | 0.357 | 0.598 | 0.747 |
Table 13
Label | Type of metrics | Accuracy | F-measure | Recall | Precision | Specificity | Balanced accuracy |
Churner | Standard | 0.688 | 0.517 | 0.837 | 0.374 | 0.651 | 0.744 |
True | 0.688 | 0.517 | 0.837 | 0.374 | 0.651 | 0.744 | |
Churner1 | Standard | 0.697 | 0.509 | 0.875 | 0.359 | 0.658 | 0.766 |
True | 0.691 | 0.549 | 0.925 | 0.391 | 0.631 | 0.778 | |
Churner2 | Standard | 0.667 | 0.471 | 0.824 | 0.33 | 0.633 | 0.728 |
True | 0.655 | 0.519 | 0.912 | 0.363 | 0.589 | 0.75 | |
Churner3 | Standard | 0.689 | 0.477 | 0.805 | 0.338 | 0.664 | 0.735 |
True | 0.664 | 0.534 | 0.915 | 0.377 | 0.597 | 0.756 | |
Churner4 | Standard | 0.647 | 0.401 | 0.812 | 0.266 | 0.619 | 0.716 |
True | 0.608 | 0.482 | 0.922 | 0.326 | 0.531 | 0.727 | |
Churner5 | Standard | 0.609 | 0.486 | 0.809 | 0.347 | 0.55 | 0.679 |
True | 0.488 | 0.435 | 0.961 | 0.281 | 0.366 | 0.663 |
Table 14
Label | Type of metrics | Accuracy | F-measure | Recall | Precision | Specificity | Balanced accuracy |
Churner | Standard | 0.811 | 0.62 | 0.774 | 0.517 | 0.82 | 0.797 |
True | 0.811 | 0.62 | 0.774 | 0.517 | 0.82 | 0.797 | |
Churner1 | Standard | 0.814 | 0.596 | 0.761 | 0.49 | 0.826 | 0.793 |
True | 0.802 | 0.63 | 0.846 | 0.501 | 0.791 | 0.818 | |
Churner2 | Standard | 0.804 | 0.571 | 0.727 | 0.471 | 0.821 | 0.774 |
True | 0.78 | 0.611 | 0.85 | 0.478 | 0.762 | 0.806 | |
Churner3 | Standard | 0.819 | 0.585 | 0.726 | 0.49 | 0.839 | 0.782 |
True | 0.785 | 0.633 | 0.892 | 0.49 | 0.757 | 0.824 | |
Churner4 | Standard | 0.798 | 0.497 | 0.688 | 0.389 | 0.817 | 0.752 |
True | 0.735 | 0.556 | 0.889 | 0.404 | 0.7 | 0.794 | |
Churner5 | Standard | 0.756 | 0.571 | 0.71 | 0.477 | 0.77 | 0.74 |
True | 0.591 | 0.466 | 0.952 | 0.308 | 0.508 | 0.73 |
Table 15
Labelling rules for churners | ||||||
Method | Churner | Churner1 | Churner2 | Churner3 | Churner4 | Churner5 |
Parameters | ||||||
GBC | max_depth:3 | max_depth:2 | max_depth:2 | max_depth:2 | max_depth:2 | max_depth:2 |
min_samples_leaf:7 | min_samples_leaf:7 | min_samples_leaf:7 | min_samples_leaf:5 | min_samples_leaf:3 | min_samples_leaf:3 | |
n_estimators:256 | n_estimators:256 | n_estimators:256 | n_estimators:512 | n_estimators:256 | n_estimators:256 | |
n_iter_no_change:5 | n_iter_no_change:5 | n_iter_no_change:5 | n_iter_no_change:5 | n_iter_no_change:5 | n_iter_no_change:5 | |
tol:0.0001 | tol:0.0001 | tol:0.0001 | tol:0.0001 | tol:0.0001 | tol:0.0001 | |
XGBoost | booster:‘gbtree’ | booster:‘gbtree’ | booster:‘gbtree’ | booster:‘gbtree’ | booster:‘gbtree’ | booster:‘gbtree’ |
eval_metric:‘logloss’ | eval_metric:‘logloss’ | eval_metric:‘logloss’ | eval_metric:‘logloss’ | eval_metric:‘logloss’ | eval_metric:‘logloss’ | |
max_bin:256 | max_bin:512 | max_bin:128 | max_bin:256 | max_bin:128 | max_bin:256 | |
max_depth:3 | max_depth:2 | max_depth:2 | max_depth:2 | max_depth:2 | max_depth:2 | |
nthread:1 | nthread:1 | nthread:1 | nthread:1 | nthread:1 | nthread:1 | |
subsample:1 | subsample:1 | subsample:1 | subsample:1 | subsample:0.5 | subsample:1 | |
use_label_encoder:False | use_label_encoder:False | use_label_encoder:False | use_label_encoder:False | use_label_encoder:False | use_label_encoder:False | |
LGBM | boosting_type:‘gbdt’ | boosting_type:‘gbdt’ | boosting_type:‘gbdt’ | boosting_type:‘gbdt’ | boosting_type:‘gbdt’ | boosting_type:‘gbdt’ |
learning_rate:0.1 | learning_rate:0.1 | learning_rate:0.1 | learning_rate:0.1 | learning_rate:0.1 | learning_rate:0.1 | |
max_depth:10 | max_depth:15 | max_depth:10 | max_depth:10 | max_depth:5 | max_depth:20 | |
n_estimators:128 | n_estimators:128 | n_estimators:128 | n_estimators:128 | n_estimators:128 | n_estimators:128 | |
n_jobs:1 | n_jobs:1 | n_jobs:1 | n_jobs:1 | n_jobs:1 | n_jobs:1 | |
subsample:1 | subsample:1 | subsample:1 | subsample:1 | subsample:1 | subsample:1 | |
RF | max_depth:15 | max_depth:15 | max_depth:15 | max_depth:15 | max_depth:15 | max_depth:5 |
min_samples_leaf:4 | min_samples_leaf:4 | min_samples_leaf:4 | min_samples_leaf:5 | min_samples_leaf:4 | min_samples_leaf:2 | |
n_estimators:128 | n_estimators:128 | n_estimators:128 | n_estimators:256 | n_estimators:512 | n_estimators:128 | |
KNN | algorithm:‘auto’ | algorithm:‘auto’ | algorithm:‘auto’ | algorithm:‘ball_tree’ | algorithm:‘ball_tree’ | algorithm:‘auto’ |
leaf_size:10 | leaf_size:3 | leaf_size:15 | leaf_size:5 | leaf_size:5 | leaf_size:10 | |
n_neighbors:20 | n_neighbors:20 | n_neighbors:20 | n_neighbors:20 | n_neighbors:25 | n_neighbors:20 | |
SVM | kernel:‘rbf’ | kernel:‘rbf’ | kernel:‘rbf’ | kernel:‘rbf’ | kernel:‘rbf’ | kernel:‘rbf’ |
tol:0.0001 | tol:0.0001 | tol:0.0001 | tol:0.0001 | tol:0.0001 | tol:0.0001 |
Fig. 2
-
• As can be expected, true Recall decreases and true Specificity increases as ${T_{c}}$ grows towards the full 90-day definition; equivalently, shorter churner detection windows yield higher true Recall and lower true Specificity. There are some exceptions. The Specificity obtained with the Gradient Boosting Classifier under labelling rule Churner3 was surprisingly high, for reasons hidden from the researcher, such as a better suitability of the resulting set for convergence. As can be seen from Table 15, its tuned model takes the highest n_estimators value and parameter values different from those under the other churn labelling rules, leading to a better result. The same holds for Recall in these cases: XGBoost with Churner4 and LGBM with Churner2.
-
• True Precision values depend both on the precision of the classification and on the rate of churners among the additional entries. More specifically, low Precision values under the full churner definition leave more room for improvement by a high rate of true positives among the additional entries. The overall picture from the tables is clear: Precision decreases together with ${T_{c}}$, with the aforementioned exceptions, which means that the Precision was higher than the rate of churners among the additional entries.
-
• The true Accuracy decreases; more specifically, the partial churner definition (label Churner4), compared to the full churner definition, results in the following accuracy drops: 0.089 for GBC, 0.072 for XGBoost, 0.039 for LGBM, 0.049 for RF, 0.08 for KNN and 0.076 for SVM (a sketch of how the standard and true metrics relate is given after this list).
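One reading of the standard versus true metrics above, taken here as an assumption: the same predictions are scored once against the simplified labels (standard metrics) and once against the labels of the full 90-day definition (true metrics); the treatment of the additional entries in the original evaluation may differ:

from sklearn.metrics import f1_score, recall_score

def standard_and_true_metrics(model, X_test, y_simplified, y_full):
    """Score the same predictions against two labellings of the test entries."""
    y_pred = model.predict(X_test)
    return {
        "standard": {"Recall": recall_score(y_simplified, y_pred),
                     "F-measure": f1_score(y_simplified, y_pred)},
        "true": {"Recall": recall_score(y_full, y_pred),
                 "F-measure": f1_score(y_full, y_pred)},
    }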
Fig. 3
Fig. 4
Fig. 5
5 Discussion
-
• Do companies actually need a binary classification of churners if the result is this sensitive to assumptions that look natural? Some alternative generalization of the classification could be considered. This is especially true since changing operators is easy nowadays and new eSIM technology possibilities have appeared, so loyalty to a company’s services may be much fuzzier than it was a couple of decades ago.
-
• Changes in behavioural patterns might greatly affect a proper classification procedure, so ageing data can significantly affect the results; however, in most studies even the time period of the data is not stated. Even the season of the year might greatly affect the behavioural patterns of clients: for example, in winter, due to Christmas and other socially important events, behaviour might differ considerably from the summer period.
6 Conclusions
-
1. If the full churner definition must be avoided for various reasons, such as changes in user behavioural patterns, then a definition based on a 40-day inactivity interval can be a reasonable compromise that still achieves reasonably good prediction accuracy; in that case the main source of errors will likely be the classification problem itself.
-
2. Under the full churn definition, the best F-measure of 0.646 was achieved by the GBC method with an accuracy of 0.832, while the best accuracy of 0.85 was achieved by the Random Forest classifier with an F-measure of 0.614.
-
3. In terms of the true F-measure metric, the best result was achieved by the LGBM method with the Churner3 label, i.e. the definition based on a 40-day absence interval. It is important to note that labelling according to the full churner definition gave a worse F-measure, although the accuracy is better.
-
4. The most significant differences between the true and standard metrics caused by the differences in churn definitions are seen for the LGBM and RF methods. As a reference we use the Churner4 label derived from the 30-day churn definition, as is done in other research. For LGBM, the standard Recall is 0.484 while the true Recall is 0.835, and the standard and true F-measures are 0.477 and 0.618, respectively. The differences are similarly large for the RF method: the standard and true Recall are 0.406 and 0.781, respectively, and the standard and true F-measures are 0.446 and 0.587, respectively.