1 Introduction
1.1 Contribution
2 Datasets Used
2.1 Datasets Considered for Analysis
2.2 Requirements for Cybersecurity Datasets
Table 1
No. | Criteria |
1. | Complete network configuration |
2. | Complete traffic |
3. | Labelled dataset |
4. | Complete interaction |
5. | Complete record |
6. | Available protocols |
7. | Attack diversity |
8. | Anonymity |
9. | Heterogeneity |
10. | Feature set |
11. | Metadata |
2.3 LITNET-2020 Compliance
1. Complete network configuration: To investigate the real course of attacks, a real network configuration must be tested. All network flows in this dataset were received or generated in LITNET, the network of Lithuanian academic institutions.
2. Complete traffic: The dataset accumulates full packet flows from the source to the destination, which can be a workstation, a router, or another specialized service device.
3. Labelled dataset: The dataset is labelled into a single benign class and 12 malignant classes. The benign class is not divided into sub-classes, although this could be done, because it contains more than 36 million records, close to $92\%$ of the whole dataset.
4. Complete interaction: Correct interpretation of the data requires data from the entire network interoperability process. LITNET-2020, however, is a pure network traffic dataset with no correlated host memory or host log information.
5. Record completeness: The LITNET-2020 dataset is compliant with this requirement.
6. Various protocols: The LITNET-2020 dataset contains records of 13 protocol types for normal traffic and 3 protocol types for malignant traffic.
7. Diversity and novelty of attacks: The dataset includes attack flows detected between 2019-03-06 (first flow) and 2020-01-31 (last flow).
8. Anonymity: A simulated set should contain only data for which privacy is not a concern. The LITNET-2020 dataset contains no personally identifiable data.
9. Heterogeneity: Data from different sources, such as network streams, operating system logs, network equipment logs, or memory images, must be available. LITNET-2020 is not compliant with this requirement.
10. Feature Set/Attribute Linkage: For research purposes, data from different types of sources for the same event should be linked, for example device memory view, network traffic, and device logs. LITNET-2020 is not compliant with this requirement, as it contains no linked host sources.
11. Metadata and documentation: Research requires information about attributes, how the traffic was generated or collected, the network configuration, attackers and victims, machine operating system versions, and attack scenarios. LITNET-2020 is documented in Damasevicius et al. (2020).
2.4 Cybersecurity Dataset Imbalance Problem
Table 2
Record Type | CIC-IDS2017 | CSE-CIC-IDS2018 | LITNET-2020 |
Benign | 80.3% | 83.1% | 92.0% |
Malignant | 19.7% | 16.9% | 8.0% |
Table 3
Imbalance category¹ | CIC-IDS2017 | CSE-CIC-IDS2018 | LITNET-2020 |
Modest <(10 : 1) | 8.16% | 0.00% | 0.00% |
High <(1000 : 1) | 11.39% | 16.85% | 7.83% |
Extreme >(1000 : 1) | 0.15% | 0.08% | 0.20% |
Total Malignant | 19.7% | 16.9% | 8.0% |
Table 4
CIC-IDS2017 | | CIC-IDS2018 | | LITNET-2020 | |
Class | Share | Class | Share | Class | Share |
Bot | 0.0695% | DoS-Slowloris | 0.0677% | W32.Blaster | 0.0660% |
Brute Force-Web | 0.0532% | LOIC-UDP¹ | 0.0107% | ICMP Flood | 0.0638% |
Brute Force-XSS | 0.0230% | Brute Force-Web | 0.0038% | HTTP Flood | 0.0630% |
Infiltration | 0.0013% | Brute Force-XSS | 0.0014% | Scan | 0.0170% |
SQL Injection | 0.0007% | SQL Injection | 0.0005% | Reaper Worm | 0.0032% |
Heartbleed | 0.0004% | | | Spam | 0.0021% |
 | | | | Fragmentation | 0.0013% |
Total Extreme >(1000 : 1) | 0.15% | | 0.08% | | 0.20% |
2.5 CIC-IDS2017
Table 5
Traffic class | Record count | Share (%) |
BENIGN | 2 273 097 | 80.3004% |
DoS Hulk | 231 073 | 8.1630% |
PortScan | 158 930 | 5.6144% |
DDoS | 128 027 | 4.5227% |
DoS GoldenEye | 10 293 | 0.3636% |
FTP-Patator | 7 938 | 0.2804% |
SSH-Patator | 5 897 | 0.2083% |
DoS slowloris | 5 796 | 0.2048% |
DoS Slowhttptest | 5 499 | 0.1943% |
Bot | 1 966 | 0.0695% |
Web Attack-Brute Force | 1 507 | 0.0532% |
Web Attack-XSS | 652 | 0.0230% |
Infiltration | 36 | 0.0013% |
Web Attack-SQL Injection | 21 | 0.0007% |
Heartbleed | 11 | 0.0004% |
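The imbalance categories of Table 3 can be checked against the record counts in Table 5. The sketch below assumes that the imbalance ratio of a class is the benign record count divided by that class's record count, and that the category boundaries are 10 : 1 and 1000 : 1; this reading is an inference from the tables, not a definition stated in the text. The function names are illustrative.

```python
# Record counts copied from Table 5 (CIC-IDS2017).
COUNTS = {
    "BENIGN": 2_273_097, "DoS Hulk": 231_073, "PortScan": 158_930,
    "DDoS": 128_027, "DoS GoldenEye": 10_293, "FTP-Patator": 7_938,
    "SSH-Patator": 5_897, "DoS slowloris": 5_796, "DoS Slowhttptest": 5_499,
    "Bot": 1_966, "Web Attack-Brute Force": 1_507, "Web Attack-XSS": 652,
    "Infiltration": 36, "Web Attack-SQL Injection": 21, "Heartbleed": 11,
}


def imbalance_category(majority_count, class_count):
    """Assumed rule: ratio = majority class count / class count."""
    ratio = majority_count / class_count
    if ratio < 10:
        return "modest"
    if ratio < 1000:
        return "high"
    return "extreme"


def category_shares(counts, majority="BENIGN"):
    """Sum the dataset share (%) of the minority classes in each category."""
    total = sum(counts.values())
    shares = {"modest": 0.0, "high": 0.0, "extreme": 0.0}
    for name, n in counts.items():
        if name == majority:
            continue
        category = imbalance_category(counts[majority], n)
        shares[category] += 100.0 * n / total
    return shares


if __name__ == "__main__":
    for category, share in category_shares(COUNTS).items():
        print(f"{category}: {share:.2f}%")
```

Under this assumption the per-category sums agree with the CIC-IDS2017 column of Table 3 to two decimal places.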
• Fiat (Forward Inter Arrival Time: mean, min, max, std): aggregates of the time between two flows sent in the forward direction;
• Biat (Backward Inter Arrival Time: mean, min, max, std): aggregates of the time between two flows sent in the backward direction;
• Flowiat (Flow Inter Arrival Time: mean, min, max, std): aggregates of the time between two flows sent in either direction;
• Active (mean, min, max, std): aggregates of the amount of time a flow was active before going idle;
• Idle (mean, min, max, std): aggregates of the amount of time a flow was idle before becoming active;
• Flow Bytes/s: flow bytes sent per second;
• Flow Packets/s: flow packets sent per second;
• Duration: the duration of a flow.
2.6 CSE-CIC-IDS2018
Table 6
Traffic class | Record count | Share (%) |
Benign | 13 484 708 | 83.070% |
HOIC¹ | 686 012 | 4.226% |
LOIC-HTTP¹ | 576 191 | 3.550% |
Hulk¹ | 461 912 | 2.846% |
Bot | 286 191 | 1.763% |
FTP-BruteForce | 193 360 | 1.191% |
SSH-Bruteforce | 187 589 | 1.156% |
Infilteration | 161 934 | 0.998% |
SlowHTTPTest¹ | 139 890 | 0.862% |
GoldenEye¹ | 41 508 | 0.256% |
Slowloris¹ | 10 990 | 0.068% |
LOIC-UDP¹ | 1 730 | 0.011% |
Brute Force-Web | 611 | 0.004% |
Brute Force-XSS | 230 | 0.001% |
SQL Injection | 87 | 0.0005% |
2.7 LITNET-2020
Table 7
Traffic class | Record label | Record count¹ | Share (%) |
Benign | none | 36 423 860 | 91.9709% |
SYN Flood | tcp_syn_f | 1 580 016 | 3.9896% |
Code Red | tcp_red_w | 1 255 702 | 3.1707% |
Smurf | icmp_smf | 118 958 | 0.3004% |
UDP Flood | udp_f | 93 583 | 0.2363% |
LAND DoS | tcp_land | 52 417 | 0.1324% |
W32.Blaster | tcp_w32_w | 24 291 | 0.0613% |
ICMP Flood | icmp_f | 23 256 | 0.0587% |
HTTP Flood | http_f | 22 959 | 0.0580% |
Port Scan | tcp_udp_win_p | 6 232 | 0.0157% |
Reaper Worm | udp_reaper_w | 1 176 | 0.0030% |
Spam botnet | smtp_b | 747 | 0.0019% |
Fragmentation | udp_0 | 477 | 0.0012% |
3 Methods
3.1 Under-Sampling Methods
1. Major class records were first randomly under-sampled to a target number of records sufficient for all models to learn. Target numbers were obtained from an analysis of learning curves. Sufficient learning is defined here as the training and testing curves converging to within a margin of less than $1\%$, which for all models in this experiment occurs after approximately 0.6 million records.
2. The numbers of benign and other highly imbalanced class records were then further reduced with the random under-sampling function of the Imbalanced-learn library (Lemaitre et al., 2016). The target number of records per class was calculated with the empirically chosen skewed ratio function $N\ast (1-\sqrt{s}/2)$ introduced in this research, where N is the initial number of records in a given class and s is the share of records in that class. This proposed method is referred to hereafter as Skewed fixed ratio under-sampling. The function decreases the numbers of over-represented classes in a non-linear manner: it penalizes the best represented classes while leaving the rare classes almost intact, which simplifies and speeds up learning and reduces the imbalance faced when learning the rare classes.
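The effect of the skewed ratio function in step 2 can be illustrated with a minimal sketch. It assumes that s is expressed as a fraction between 0 and 1 and is computed on the label list passed in; the rounding rule, the helper names and the toy class sizes are illustrative choices and are not taken from the paper.

```python
import math
import random
from collections import Counter


def skewed_targets(labels):
    """Per-class target counts N * (1 - sqrt(s) / 2), with s = N / total."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {
        cls: max(1, round(n * (1 - math.sqrt(n / total) / 2)))
        for cls, n in counts.items()
    }


def skewed_under_sample(labels, seed=0):
    """Return sorted indices of the records kept after random under-sampling."""
    rng = random.Random(seed)
    targets = skewed_targets(labels)
    indices_by_class = {}
    for idx, cls in enumerate(labels):
        indices_by_class.setdefault(cls, []).append(idx)
    kept = []
    for cls, indices in indices_by_class.items():
        # The target never exceeds the class size because the factor is <= 1.
        kept.extend(rng.sample(indices, targets[cls]))
    return sorted(kept)


if __name__ == "__main__":
    # Toy label distribution: 90% / 9% / 1%.
    labels = ["benign"] * 9000 + ["flood"] * 900 + ["worm"] * 100
    print(skewed_targets(labels))
```

In the toy example the dominant class keeps about 53% of its records, while the rarest class keeps 95%. A dictionary of this form can also be passed as the `sampling_strategy` argument of Imbalanced-learn's `RandomUnderSampler`, which accepts per-class target counts.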
3.2 Over-Sampling Methods
3.3 Feature Selection Methods
3.4 Cost-Sensitive Learning Methods
3.5 Choice of Machine Learning Methods
3.5.1 Adaptive Boosting (Adaboost)
3.5.2 Classification and Regression Tree (CART)
3.5.3 k-Nearest Neighbours (KNN)
3.5.4 Quadratic Discriminant Analysis (QDA)
3.5.5 Random Forest Trees (RFT)
3.5.6 Gradient Boosting Classifier (GBC)
3.5.7 Multilayer Perceptron (MLP)
3.6 Performance Measures
3.6.1 Confusion Matrix Based Metrics
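As a concrete illustration of the confusion-matrix metrics in this subsection, the sketch below computes per-class recall as in Eq. (3), $c_{ii}/\sum_{j}c_{ij}$, with the rows of the matrix taken as true classes, and averages it into a balanced accuracy score. That BAS in Table 16 denotes this mean per-class recall is an assumption made here; the matrix values and function names are made up for illustration.

```python
def per_class_recall(cm):
    """Recall_i = c_ii / sum_j c_ij, with rows of cm as the true classes."""
    recalls = []
    for i, row in enumerate(cm):
        support = sum(row)
        # A class with no true records gets recall 0.0 by convention here.
        recalls.append(row[i] / support if support else 0.0)
    return recalls


def balanced_accuracy(cm):
    """Unweighted mean of the per-class recalls."""
    recalls = per_class_recall(cm)
    return sum(recalls) / len(recalls)


def accuracy(cm):
    """Share of all records that lie on the diagonal."""
    total = sum(sum(row) for row in cm)
    return sum(cm[i][i] for i in range(len(cm))) / total


if __name__ == "__main__":
    # Hypothetical 3-class matrix: one large class and two small ones.
    cm = [
        [90, 10, 0],
        [5, 15, 0],
        [0, 1, 1],
    ]
    print(per_class_recall(cm))
    print(balanced_accuracy(cm), accuracy(cm))
```

In the hypothetical matrix the plain accuracy is dominated by the large class, while the mean recall is pulled down by the two small classes, which is why a recall-based average is informative on highly imbalanced data.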
(3)
\[ {\textit{Recall}_{i}}=\frac{{\textit{TP}_{i}}}{{\textit{TP}_{i}}+{\textit{FN}_{i}}}=\frac{{c_{ii}}}{{\textstyle\sum _{j=1}^{k}}{c_{ij}}},\]
3.7 Bias and Variance Decomposition
3.8 Tree Pruning
3.9 Variance Inflation Factor
4 Experiment Design
4.1 CIC-IDS2017 Pre-Processing Steps
Table 8
Class | Share of removed records (%) | Resulting counts¹ | Resulting share (%) |
Benign | 7.770% | 2 096 484 | 83.1159% |
DoS Hulk | 25.197% | 172 849 | 6.8527% |
PortScan | 42.856% | 90 819 | 3.6006% |
DDoS | 0.009% | 128 016 | 5.0752% |
DoS GoldenEye | 0.068% | 10 286 | 0.4078% |
FTP-Patator | 25.258% | 5 933 | 0.2352% |
SSH-Patator | 45.413% | 3 219 | 0.1276% |
DoS slowloris | 7.091% | 5 385 | 0.2135% |
DoS Slowhttptest | 4.928% | 5 228 | 0.2073% |
Bot | 0.661% | 1 953 | 0.0774% |
Web Attack – Brute Force | 2.455% | 1 470 | 0.0583% |
Web Attack-XSS | 0.000% | 652 | 0.0258% |
Infiltration | 0.000% | 36 | 0.0014% |
Web Attack-Sql Injection | 0.000% | 21 | 0.0008% |
Heartbleed | 0.000% | 11 | 0.0004% |
Total | 2 522 362 |
Table 9
Class | Record count | Flow Bytes/s | Flow Packets/s |
Benign | 1 077 | 2.071e+09 | 4.0e+06 |
PortScan | 125 | 8.00e+06 | 2.0e+06 |
Bot | 5 | 1.20e+07 | 2.0e+06 |
FTP-Patator | 2 | 1.40e+07 | 3.0e+06 |
DDoS | 2 | 3.47e+08 | 2.0e+06 |
Total: | 1 211 |
Table 10
Record label | Training records | Resulting share (%) | Testing records | Resulting share (%) |
Benign | 442 421 | 64.739% | 442 421 | 67.508% |
DoS Hulk | 86 425 | 12.646% | 86 424 | 13.187% |
DDoS | 64 008 | 9.366% | 64 008 | 9.767% |
PortScan | 45 410 | 6.645% | 45 409 | 6.929% |
DoS GoldenEye | 5 143 | 0.753% | 5 143 | 0.785% |
FTP-Patator | 4 999 | 0.731% | 2 967 | 0.453% |
DoS slowloris | 4 999 | 0.731% | 2 692 | 0.411% |
DoS Slowhttptest | 4 999 | 0.731% | 2 614 | 0.399% |
SSH-Patator | 4 999 | 0.731% | 1 610 | 0.246% |
Bot | 4 999 | 0.731% | 976 | 0.149% |
Web Attack-Brute Force | 2 999 | 0.439% | 735 | 0.112% |
Web Attack-XSS | 2 999 | 0.439% | 326 | 0.050% |
Infiltration | 2 999 | 0.439% | 18 | 0.003% |
Web Attack-Sql Injection | 2 999 | 0.439% | 11 | 0.002% |
Heartbleed | 2 999 | 0.439% | 6 | 0.001% |
Total: | 683 397 | 655 360 |
4.2 CIC-IDS2018 Pre-Processing Steps
1. The top two classes (‘Benign’ and ‘DDoS attacks-LOIC-HTTP’) were under-sampled to no more than the number of records that provides sufficient learning for the worst-performing model, as determined from an analysis of learning curves.
2. The remaining data were split into training and testing sub-samples.
Table 11
Record label | Training records | Resulting share (%) | Testing records | Resulting share (%) |
Benign | 134 850 | 20.067% | 134 849 | 20.576% |
DDoS attacks-LOIC-HTTP | 129 558 | 19.280% | 129 558 | 19.769% |
DDOS attack-HOIC | 99 430 | 14.796% | 99 431 | 15.172% |
Infilteration | 72 612 | 10.805% | 72 613 | 11.080% |
DoS attacks-Hulk | 72 599 | 10.804% | 72 600 | 11.078% |
Bot | 72 268 | 10.754% | 72 267 | 11.027% |
SSH-Bruteforce | 47 024 | 6.998% | 47 024 | 7.175% |
DoS attacks-GoldenEye | 20 703 | 3.081% | 20 703 | 3.159% |
DoS attacks-Slowloris | 4 954 | 0.737% | 4 954 | 0.756% |
DDOS attack-LOIC-UDP | 2 999 | 0.446% | 865 | 0.132% |
Brute Force-Web | 2 999 | 0.446% | 285 | 0.043% |
Brute Force-XSS | 2 999 | 0.446% | 114 | 0.017% |
SQL Injection | 2 999 | 0.446% | 43 | 0.007% |
FTP-BruteForce | 2 999 | 0.446% | 27 | 0.004% |
DoS attacks-SlowHTTPTest | 2 999 | 0.446% | 27 | 0.004% |
Total: | 671 992 | 655 360 |
Table 12
Class | Record count | Flow Bytes/s | Flow Packets/s |
Benign | 6 243 | 1.47e+09 | 4.0e+06 |
Infilteration | 1 129 | 2.74e+08 | 3.0e+06 |
FTP-BruteForce | 1 | 0.0e+00 | 2.0e+06 |
Total: | 7 373 |
4.3 LITNET-2020 Dataset Pre-Processing
Table 13
Traffic type | Share of removed records (%) | Resulting counts of records¹ | Resulting share (%) |
Benign | 33.1% | 24 349 750 | 95.052% |
SYN Flood | 98.2% | 28 873 | 0.113% |
Code Red | 13.5% | 1 085 656 | 4.238% |
Smurf | 87.7% | 14 642 | 0.057% |
UDP Flood | 1.3% | 92 412 | 0.361% |
LAND DoS | 75.3% | 12 926 | 0.050% |
W32.Blaster | 99.2% | 200 | 0.001% |
ICMP Flood | 92.6% | 1 723 | 0.007% |
HTTP Flood | 1.7% | 22 578 | 0.088% |
Scan | 0.0% | 6 232 | 0.024% |
Reaper Worm | 0.3% | 1 173 | 0.005% |
Spam | 0.1% | 746 | 0.003% |
Fragmentation | 15.9% | 401 | 0.002% |
Table 14
Record label | Training records | Resulting share (%) | Testing records | Resulting share (%) |
Benign | 349 470 | 51.277% | 349 470 | 53.325% |
Code Red | 215 484 | 31.618% | 215 485 | 32.880% |
UDP Flood | 45 858 | 6.729% | 45 859 | 6.997% |
SYN Flood | 14 436 | 2.118% | 14 437 | 2.203% |
HTTP Flood | 11 289 | 1.656% | 11 289 | 1.723% |
Smurf | 9 999 | 1.467% | 7 321 | 1.117% |
Scan | 9 999 | 1.467% | 6 463 | 0.986% |
LAND DoS | 9 999 | 1.467% | 3 116 | 0.475% |
Spam | 2 999 | 0.440% | 710 | 0.108% |
Reaper Worm | 2 999 | 0.440% | 587 | 0.090% |
ICMP Flood | 2 999 | 0.440% | 373 | 0.057% |
Fragmentation | 2 999 | 0.440% | 153 | 0.023% |
W32.Blaster | 2 999 | 0.440% | 100 | 0.015% |
Total: | 681 529 | 655 363 |
4.4 Experiment Software Environment
4.5 Parameter Values Selection
1. ADA: n_estimators: range(10, 256, 5), learning_rate: [0.001, 0.005, 0.01, 0.5, 1], base estimator: CART.
2. CART: criterion: (‘entropy’, ‘gini’), max_depth: range(4, 32), min_samples_leaf: range(6, 10, 1), max_features: [0.5, 0.6, 0.8, 1.0, ‘auto’].
3. GBC: max_depth: range(4, 32, 1), n_estimators: range(100, 256, 5), other parameters as for CART.
4. KNN: n_neighbors: range(3, 16, 1), algorithm: [‘ball_tree’, ‘auto’], leaf_size: range(15, 35, 5).
5. MLP: hidden_layer_sizes: tuple (32 ... 256, 32 ... 256) ($\textit{step}=1$), alpha: np.geomspace(1e-2, 2, 50, endpoint = True), activation: [‘identity’, ‘logistic’, ‘tanh’, ‘relu’], solver: [‘lbfgs’, ‘sgd’, ‘adam’], learning_rate: [‘constant’, ‘adaptive’], beta_1: np.linspace(0.85, 0.95, 11, endpoint = True), learning_rate_init: np.geomspace(2e-4, 6e-4, 5, endpoint = True), max_iter: [200, 300], early_stopping: [True, False].
6. QDA: reg_param: np.geomspace(1e-19, 1e-1, 50, endpoint = True). The tol parameter only affects the threshold at which variable collinearity warnings are raised.
7. RFC: n_estimators: range(100, 350, 5), other parameters in the same ranges as CART.
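To give a sense of the size of these search spaces, the sketch below writes the CART ranges from item 2 as a Python dictionary and enumerates the candidate settings. The text does not state which search strategy was used, so the exhaustive enumeration is an assumption for illustration only; `min_samples_leaf` is used as the parameter name, as in Table 15, and the helper names are illustrative.

```python
from itertools import product

# CART search ranges as listed in item 2.
CART_GRID = {
    "criterion": ["entropy", "gini"],
    "max_depth": list(range(4, 32)),
    "min_samples_leaf": list(range(6, 10, 1)),
    "max_features": [0.5, 0.6, 0.8, 1.0, "auto"],
}


def grid_size(grid):
    """Number of parameter combinations in an exhaustive grid."""
    size = 1
    for values in grid.values():
        size *= len(values)
    return size


def iter_candidates(grid):
    """Yield every parameter combination as a dict of keyword arguments."""
    keys = list(grid)
    for values in product(*(grid[key] for key in keys)):
        yield dict(zip(keys, values))


if __name__ == "__main__":
    print(grid_size(CART_GRID))
    print(next(iter_candidates(CART_GRID)))
```

A dictionary in this form is also the shape expected by scikit-learn's `GridSearchCV` and `RandomizedSearchCV` for their parameter grid arguments, so the same structure could drive either an exhaustive or a sampled search.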
Table 15
Model | CIC-IDS2017 parameters | CIC-IDS2018 parameters | LITNET-2020 parameters |
ADA | base_estimator = DecisionTreeClassifier, learning_rate = 1¹, n_estimators = 120, tree parameters as indicated for CART in the next row (same for all three datasets) | | |
CART | criterion = ‘entropy’, min_samples_leaf = 7, max_features = 0.5, max_depth = 32, ccp_alpha = 0.00001, class_weight = ‘balanced’ | criterion = ‘entropy’, min_samples_leaf = 7, max_features = 0.5, max_depth = 32, ccp_alpha = 0.00001, class_weight = ‘balanced’ | criterion = ‘entropy’, min_samples_leaf = 7, max_features = 0.5, max_depth = 15, ccp_alpha = 0.00001, class_weight = ‘balanced’ |
GBC | n_estimators = 120, min_samples_leaf = 7, max_features = 0.5, max_depth = 15, ccp_alpha = 0.00001, tree_method = ‘gpu_hist’ | n_estimators = 120, min_samples_leaf = 7, max_features = 0.5, max_depth = 15, ccp_alpha = 0.00001, tree_method = ‘gpu_hist’ | n_estimators = 120, min_samples_leaf = 7, max_features = 0.5, max_depth = 15, ccp_alpha = 0.00001, tree_method = ‘gpu_hist’ |
KNN | algorithm = ‘ball_tree’, leaf_size = 30¹, metric = ‘manhattan’, n_neighbors = 4, weights = ‘distance’ | algorithm = ‘ball_tree’, leaf_size = 30¹, metric = ‘manhattan’, n_neighbors = 4, weights = ‘uniform’¹ | algorithm = ‘ball_tree’, leaf_size = 30¹, metric = ‘minkowski’¹, n_neighbors = 4, p = 2¹, weights = ‘uniform’¹ |
MLP | activation = ‘relu’¹, solver = ‘adam’¹, alpha = 0.01, beta_1 = 0.91, hidden_layer_sizes = (120, 60), learning_rate = ‘constant’¹, learning_rate_init = 0.001¹, early_stopping = True¹, max_iter = 200¹, warm_start = False¹ | activation = ‘relu’¹, solver = ‘adam’¹, alpha = 0.067, beta_1 = 0.86, hidden_layer_sizes = (32, 46), learning_rate = ‘adaptive’, learning_rate_init = 0.00045, early_stopping = False, max_iter = 300, warm_start = True | activation = ‘relu’¹, solver = ‘adam’¹, alpha = 0.01, beta_1 = 0.91, hidden_layer_sizes = (120, 60), learning_rate = ‘adaptive’, learning_rate_init = 0.001¹, early_stopping = True¹, max_iter = 200¹, warm_start = True |
QDA | priors = priors², reg_param = 2.1e-8, tol = 0.1 | priors = priors², reg_param = 2.3e-5, tol = 0.1 | priors = priors², reg_param = 0.002, tol = 0.1 |
RFC | criterion = ‘entropy’, min_samples_leaf = 7, max_features = 0.5, max_depth = 15, n_estimators = 120, ccp_alpha = 0.01, class_weight = ‘balanced’ | criterion = ‘entropy’, min_samples_leaf = 7, max_features = 1.0, max_depth = 15, n_estimators = 120, ccp_alpha = 0.01, class_weight = ‘balanced’ | criterion = ‘entropy’, min_samples_leaf = 8, max_features = 0.5, max_depth = 15, n_estimators = 156, ccp_alpha = 0.00001, class_weight = ‘balanced’ |
5 Results and Discussion
5.1 Results of the Conducted Experiments
Table 16
 | CIC-IDS2017 | | | CIC-IDS2018 | | | LITNET-2020 | | | Rank by BAS | |
Model | ErR | BAS | Rank | ErR | BAS | Rank | ErR | BAS | Rank | Total | Best |
ADA¹ | 0.001 | 0.995 | 1 | 0.060 | 0.887 | 4 | 0.003 | 0.996 | 1 | 5 | 1 |
CART | 0.004 | 0.984 | 5 | 0.064 | 0.897 | 3 | 0.005 | 0.985 | 4 | 12 | 4 |
GBC | 0.003 | 0.986 | 4 | 0.063 | 0.811 | 4 | 0.011 | 0.756 | 6 | 14 | 5 |
KNN | 0.006 | 0.989 | 3 | 0.060 | 0.917 | 1 | 0.044 | 0.864 | 5 | 9 | 3 |
MLP | 0.020 | 0.937 | 7 | 0.072 | 0.860 | 6 | 0.070 | 0.698 | 7 | 20 | 7 |
QDA | 0.068 | 0.951 | 6 | 0.090 | 0.843 | 7 | 0.022 | 0.992 | 2 | 15 | 6 |
RFC | 0.002 | 0.991 | 2 | 0.059 | 0.898 | 2 | 0.005 | 0.987 | 3 | 7 | 2 |
Table 17
 | CIC-IDS2017 | | | CIC-IDS2018 | | | LITNET-2020 | | | Rank | |
Model | Pr | $\bar{G}$ | Rank | Pr | $\bar{G}$ | Rank | Pr | $\bar{G}$ | Rank | Total | Best |
ADA | 0.928 | 0.919 | 1 | 0.991 | 0.990 | 1 | 0.970 | 0.994 | 1 | 3 | 1 |
CART | 0.868 | 0.886 | 5 | 0.971 | 0.977 | 6 | 0.828 | 0.989 | 4 | 15 | 5 |
GBC | 0.892 | 0.884 | 4 | 0.988 | 0.987 | 2 | 0.963 | 0.987 | 3 | 9 | 3 |
KNN | 0.906 | 0.912 | 2 | 0.988 | 0.987 | 2 | 0.674 | 0.519 | 7 | 11 | 4 |
MLP | 0.879 | 0.834 | 6 | 0.979 | 0.977 | 5 | 0.685 | 0.876 | 6 | 17 | 6 |
QDA | 0.713 | 0.839 | 7 | 0.936 | 0.881 | 7 | 0.915 | 0.978 | 5 | 19 | 7 |
RFC | 0.913 | 0.907 | 2 | 0.985 | 0.984 | 4 | 0.937 | 0.998 | 2 | 8 | 2 |
Table 18
 | CIC-IDS2017 | | | CIC-IDS2018 | | | LITNET-2020 | | | Rank¹ | |
Model | Bias² | Var | Rank | Bias² | Var | Rank | Bias² | Var | Rank | Total | Best |
ADA | 0.09 | 0.024 | 1 | 1.36 | 0.324 | 1 | 0.22 | 0.006 | 1 | 3 | 1 |
CART | 0.15 | 0.109 | 6 | 1.80 | 0.966 | 5 | 0.26 | 0.049 | 4 | 15 | 4 |
GBC | 0.08 | 0.025 | 1 | 1.96 | 0.201 | 2 | 0.22 | 0.041 | 3 | 6 | 2 |
KNN | 0.14 | 0.050 | 4 | 2.26 | 0.984 | 6 | 1.08 | 0.335 | 7 | 17 | 5 |
MLP | 0.16 | 0.051 | 5 | 2.77 | 0.477 | 7 | 0.54 | 0.231 | 5 | 17 | 5 |
QDA | 0.56 | 0.018 | 7 | 19.23 | 0.985 | 8 | 1.12 | 0.003 | 6 | 21 | 8 |
RFC | 0.11 | 0.034 | 3 | 1.90 | 0.279 | 3 | 0.25 | 0.006 | 2 | 8 | 3 |
5.2 Discussion and Comparison of the Results
Table 19
Algorithm | Dataset | Precision | Recall | F1 | Source¹ |
ADA | CIC-IDS-2017 | 0.77 | 0.84 | 0.77 | (Sharafaldin et al., 2018) |
ADA | CSE-CIC-IDS2018 | 0.999 | 0.999 | 0.999 | (Kanimozhi and Jacob, 2019a) |
ADA | CIC-IDS-2017 | 0.818 | 1.0 | 0.900 | (Yulianto et al., 2019) |
ADA | CSE-CIC-IDS2018 | 0.997 | 0.997 | 0.997 | (Karatas et al., 2020) |
ADA | CIC-IDS2017 | 0.999 | 0.999 | 0.999 | This research |
ADA | CSE-CIC-IDS2018 | 0.999 | 0.999 | 0.999 | This research |
ADA | LITNET-2020 | 0.997 | 0.996 | 0.997 | This research |
ID3 | CIC-IDS-2017 | 0.98 | 0.98 | 0.98 | (Sharafaldin et al., 2018) |
DT | CSE-CIC-IDS2018 | 0.997 | 0.997 | 0.997 | (Karatas et al., 2020) |
DT | CSE-CIC-IDS2018 | 0.999 | 0.999 | 0.999 | (Kilincer et al., 2021) |
CART | CIC-IDS2017 | 0.997 | 0.997 | 0.997 | This research |
CART | CSE-CIC-IDS2018 | 0.997 | 0.998 | 0.998 | This research |
CART | LITNET-2020 | 0.995 | 0.985 | 0.995 | This research |
GBC | CSE-CIC-IDS2018 | 0.995 | 0.991 | 0.993 | (Karatas et al., 2020) |
GBC | CIC-IDS2017 | 0.997 | 0.997 | 0.997 | This research |
GBC | CSE-CIC-IDS2018 | 0.970 | 0.961 | 0.965 | This research |
GBC | LITNET-2020 | 0.987 | 0.756 | 0.987 | This research |
KNN | CIC-IDS-2017 | 0.96 | 0.96 | 0.96 | (Sharafaldin et al., 2018) |
KNN | CSE-CIC-IDS2018 | 0.998 | 0.999 | 0.998 | (Kanimozhi and Jacob, 2019a) |
KNN | CSE-CIC-IDS2018 | 0.993 | 0.985 | 0.979 | (Karatas et al., 2020) |
KNN | CSE-CIC-IDS2018 | 0.958 | 0.958 | 0.955 | (Kilincer et al., 2021) |
KNN | CIC-IDS2017 | 0.994 | 0.994 | 0.994 | This research |
KNN | CSE-CIC-IDS2018 | 0.989 | 0.989 | 0.985 | This research |
KNN | LITNET-2020 | 0.957 | 0.864 | 0.955 | This research |
MLP | CIC-IDS-2017 | 0.77 | 0.83 | 0.76 | (Sharafaldin et al., 2018) |
MLP | CSE-CIC-IDS2018 | 1.0 | 1.0 | 1.0 | (Kanimozhi and Jacob, 2019a) |
MLP | CIC-IDS2017 | 0.981 | 0.980 | 0.980 | This research |
MLP | CSE-CIC-IDS2018 | 0.960 | 0.959 | 0.958 | This research |
MLP | LITNET-2020 | 0.933 | 0.698 | 0.929 | This research |
LSTM | CSE-CIC-IDS2018 | 1.0 | 1.0 | 1.0 | (Dutta et al., 2020) |
DNN | CSE-CIC-IDS2018 | 1.0 | 1.0 | 1.0 | (Dutta et al., 2020) |
QDA | CIC-IDS-2017 | 0.97 | 0.88 | 0.92 | (Sharafaldin et al., 2018) |
LDA | CSE-CIC-IDS2018 | 0.989 | 0.991 | 0.990 | (Karatas et al., 2020) |
QDA | CIC-IDS2017 | 0.966 | 0.932 | 0.944 | This research |
QDA | CSE-CIC-IDS2018 | 0.712 | 0.648 | 0.597 | This research |
QDA | LITNET-2020 | 0.980 | 0.992 | 0.979 | This research |
RFC | CIC-IDS-2017 | 0.98 | 0.97 | 0.97 | (Sharafaldin et al., 2018) |
RFC | CIC-IDS-2017 | 0.999 | 0.999 | 0.999 | (Sharafaldin et al., 2019) |
RFC | CSE-CIC-IDS2018 | 0.999 | 0.999 | 0.999 | (Kanimozhi and Jacob, 2019a) |
RFC | CSE-CIC-IDS2018 | 0.993 | 0.992 | 0.993 | (Karatas et al., 2020) |
RFC | CIC-IDS2017 | 0.998 | 0.998 | 0.998 | This research |
RFC | CSE-CIC-IDS2018 | 0.991 | 0.993 | 0.992 | This research |
RFC | LITNET-2020 | 0.996 | 0.997 | 0.996 | This research |