Study of Multi-Class Classification Algorithms’ Performance on Highly Imbalanced Network Intrusion Datasets

Bulavas, Viktoras; Marcinkevičius, Virginijus; Rumiński, Jacek

doi:10.15388/21-INFOR457

Informatica

Volume 32, Issue 3 (2021), pp. 441–475

Pub. online: 7 September 2021. Type: Research Article

Open Access

Received: 1 March 2021

Accepted: 1 July 2021

Published: 7 September 2021

Abstract

This paper addresses the problem of class imbalance in machine learning, focusing on the detection of rare intrusion classes in computer networks. Class imbalance occurs when examples of one class heavily outnumber examples of the other classes. We are particularly interested in classifiers, since pattern recognition and anomaly detection can both be posed as classification problems. Because most of the traffic in any organization's network is benign and malignant traffic is rare, researchers have to deal with a class imbalance problem. Substantial research has been undertaken to identify the methods and data features that allow these attacks to be identified accurately. The usual tactic for dealing with class imbalance, however, is to label all malignant traffic as one class and then solve a binary classification problem. In this paper, we instead choose not to group or drop rare classes and investigate what can be done to achieve good multi-class classification performance. Rare class records were up-sampled using the SMOTE method (Chawla et al., 2002) to preset target ratios. Experiments were performed with three network traffic datasets, namely CIC-IDS2017, CSE-CIC-IDS2018 (Sharafaldin et al., 2018) and LITNET-2020 (Damasevicius et al., 2020), aiming to achieve reliable recognition of the rare malignant classes available in these datasets.
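As an illustration of the up-sampling idea, the following is a minimal pure-Python sketch of SMOTE-style interpolation (Chawla et al., 2002): a synthetic record is placed at a random position on the segment between a rare-class record and one of its nearest rare-class neighbours. The function name `smote_oversample`, the parameters `target_count`, `k` and `seed`, and the toy data are assumptions made for this example; it is not the implementation used in the experiments.

```python
import math
import random


def smote_oversample(minority, target_count, k=5, seed=0):
    """Return the minority rows plus synthetic rows, `target_count` in total.

    Hypothetical helper for illustration only; `minority` is a list of
    numeric feature vectors that all belong to one rare class.
    """
    rng = random.Random(seed)
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    k = min(k, len(minority) - 1)
    synthetic = []
    while len(minority) + len(synthetic) < target_count:
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point (excluding itself)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return list(minority) + synthetic


# Toy rare class with four records, up-sampled to a preset target of ten.
rare = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
balanced = smote_oversample(rare, target_count=10, k=2)
```

Because every synthetic record is a convex combination of two existing rare-class records, the new points stay inside the region already occupied by that class, which is the property that distinguishes this approach from simple duplication.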

Popular machine learning algorithms were chosen to compare their readiness to support rare class detection. The hyperparameters of each algorithm were tuned within a wide range of values, different feature selection methods were used, and tests were executed with and without over-sampling to assess multi-class classification performance on rare classes.

A ranking of the machine learning algorithms based on Precision, Balanced Accuracy Score, $\bar{G}$, and the bias and variance decomposition of the prediction error shows that decision tree ensembles (AdaBoost, Random Forest and Gradient Boosting Classifier) performed best on the network intrusion datasets used in this research.
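To make two of the ranking metrics concrete, the sketch below computes balanced accuracy as the arithmetic mean of per-class recall, and a geometric mean of per-class recall. Reading $\bar{G}$ as this geometric mean is an assumption of the example, as are the function names and the toy labels; the exact definitions used for the ranking are given in the full article.

```python
import math
from collections import Counter


def per_class_recall(y_true, y_pred):
    """Recall of each class that occurs in y_true."""
    support = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {c: hits[c] / support[c] for c in support}


def balanced_accuracy(y_true, y_pred):
    """Arithmetic mean of per-class recall."""
    recalls = per_class_recall(y_true, y_pred)
    return sum(recalls.values()) / len(recalls)


def geometric_mean_recall(y_true, y_pred):
    """Geometric mean of per-class recall (assumed reading of G-bar)."""
    recalls = per_class_recall(y_true, y_pred)
    return math.prod(recalls.values()) ** (1 / len(recalls))


# Toy labels: one frequent class and two rare ones (hypothetical data).
y_true = ["benign"] * 8 + ["dos"] * 2 + ["bot"] * 2
y_pred = ["benign"] * 8 + ["dos", "benign"] + ["bot"] * 2

plain_accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
bal_acc = balanced_accuracy(y_true, y_pred)
g_mean = geometric_mean_recall(y_true, y_pred)
```

In this toy case a single missed rare-class record barely moves plain accuracy (11 of 12 correct) but lowers both class-averaged scores noticeably, which is why such metrics are preferred when rare classes matter.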

References

Adomavicius, G., Kwon, Y. (2011). Improving aggregate recommendation diversity using ranking-based techniques. IEEE Transactions on Knowledge and Data Engineering, 24(5), 896–911.

Batista, G.E.A.P.A., Prati, R.C., Monard, M.C. (2004). A study of the behavior of several methods for balancing machine learning training data. ACM SIGKDD Explorations Newsletter. https://doi.org/10.1145/1007730.1007735.

Breiman, L. (2001). Random forests. Machine Learning, 45, 5–32. https://doi.org/10.1023/A:1010933404324.

Breiman, L., Friedman, J., Stone, C., Olshen, R. (1984). Classification and Regression Trees (Wadsworth Statistics/Probability), 0412048418. CRC Press, New York.

Brownlee, J. (2020). Imbalanced Classification with Python – Choose Better Metrics, Balance Skewed Classes, and Apply Cost-Sensitive Learning. Machine Learning Mastery, San Juan, pp. 463.

Buczak, A., Guven, E. (2016). A survey of data mining and machine learning methods for cyber security intrusion detection. IEEE Communications Surveys & Tutorials, 18, 1153–1176. https://doi.org/10.1109/COMST.2015.2494502.

Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P. (2002). SMOTE: synthetic minority over-sampling technique. Journal of Artificial Intelligence Research, 16, 321–357. https://doi.org/10.1613/jair.953.

Chen, T., Guestrin, C. (2016). XGBoost: a scalable tree boosting system. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '16. ACM, New York, NY, USA, pp. 785–794. 978-1-4503-4232-2. https://doi.org/10.1145/2939672.2939785.

Chicco, D., Jurman, G. (2020). The advantages of the Matthews correlation coefficient (MCC) over F1 score and accuracy in binary classification evaluation. BMC Genomics, 21(1). https://doi.org/10.1186/s12864-019-6413-7.

Claise, B. (2004). RFC 3954, Cisco Systems NetFlow Services Export Version 9. Technical report, IETF. https://doi.org/10.17487/rfc3954.

Damasevicius, R., Venckauskas, A., Grigaliunas, S., Toldinas, J., Morkevicius, N., Aleliunas, T., Smuikys, P. (2020). Litnet-2020: An annotated real-world network flow dataset for network intrusion detection. Electronics (Switzerland), 9(5). https://doi.org/10.3390/electronics9050800.

Domingos, P. (2000). A unified bias-variance decomposition and its applications. In: ICML, pp. 231–238. 2065432969.

Draper-Gil, G., Lashkari, A.H., Mamun, M.S.I., Ghorbani, A.A. (2016). Characterization of encrypted and VPN traffic using time-related features. In: Proceedings of the 2nd International Conference on Information Systems Security and Privacy, pp. 407–414. https://doi.org/10.5220/0005740704070414.

Dudani, S.A. (1976). The distance-weighted k-nearest-neighbor rule. IEEE Transactions on Systems, Man and Cybernetics, pp. 325–327. https://doi.org/10.1109/TSMC.1976.5408784.

Dutta, V., Choraś, M., Pawlicki, M., Kozik, R. (2020). A deep learning ensemble for network anomaly and cyber-attack detection. Sensors (Switzerland), 20(16), 1–20. https://doi.org/10.3390/s20164583.

Ferri, C., Hernández-Orallo, J., Modroiu, R. (2009). An experimental comparison of performance measures for classification. Pattern Recognition Letters, 30(1), 27–38. https://doi.org/10.1016/j.patrec.2008.08.010.

Fisher, R. (1954). The analysis of variance with various binomial transformations. Biometrics, 10(1), 130–139.

Freund, Y., Schapire, R.E. (1997). A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences, 55, 119–139. https://doi.org/10.1006/jcss.1997.1504.

Friedman, J.H. (2001). Greedy function approximation: a gradient boosting machine. Annals of Statistics, 29(5), 1189–1232. https://doi.org/10.1214/aos/1013203451.

Friedman, J.H. (2002). Stochastic gradient boosting. Computational Statistics and Data Analysis, 38(4), 367–378. https://doi.org/10.1016/S0167-9473(01)00065-2.

Garcia, V., Mollineda, R.A., Sanchez, J.S. (2010). Theoretical analysis of a performance measure for imbalanced data. In: 2010 20th International Conference on Pattern Recognition. IEEE, Istanbul, pp. 617–620. 978-1-4244-7542-1. https://doi.org/10.1109/ICPR.2010.156.

Geisser, S. (1964). Posterior odds for multivariate normal classifications. Journal of the Royal Statistical Society: Series B (Methodological), 26(1), 69–76.

Gharib, A., Sharafaldin, I., Lashkari, A.H., Ghorbani, A.A. (2016). An evaluation framework for intrusion detection dataset. In: 2016 International Conference on Information Science and Security (ICISS). IEEE, Pattaya, Thailand, pp. 1–6. 978-1-5090-5493-0. https://doi.org/10.1109/ICISSEC.2016.7885840.

Hart, P.E. (1968). The condensed nearest neighbor rule (Corresp.). IEEE Transactions on Information Theory, 14(3), 515–516. https://doi.org/10.1109/TIT.1968.1054155.

He, H., Ma, Y. (2013). Imbalanced Learning: Foundations, Algorithms, and Applications. Wiley, Piscataway, NJ, pp. 216. 9781118074626. https://doi.org/10.1002/9781118646106.

Hettich, S., Bay, S.D. (1999). The UCI KDD Archive http://kdd.ics.uci.edu. University of California, Department of Information and Computer Science.

Jurman, G., Riccadonna, S., Furlanello, C. (2012). A comparison of MCC and CEN error measures in multi-class prediction. PLoS ONE, 7(8), 41882. https://doi.org/10.1371/journal.pone.0041882.

Kanimozhi, V., Jacob, D.T.P. (2019a). Calibration of various optimized machine learning classifiers in network intrusion detection system on the realistic cyber dataset CSE-CIC-IDS2018 using cloud computing. International Journal of Engineering Applied Sciences and Technology, 04(06), 209–213. https://doi.org/10.33564/IJEAST.2019.v04i06.036.

Kanimozhi, V., Jacob, T.P. (2019b). Artificial intelligence based network intrusion detection with hyper-parameter optimization tuning on the realistic cyber dataset CSE-CIC-IDS2018 using cloud computing. ICT Express, 5(3), 211–214. 9781538675953. https://doi.org/10.1016/j.icte.2019.03.003.

Karatas, G., Demir, O., Sahingoz, O.K. (2020). Increasing the performance of machine learning-based IDSs on an imbalanced and up-to-date dataset. IEEE Access, 8, 32150–32162. https://doi.org/10.1109/ACCESS.2020.2973219.

Kilincer, I.F., Ertam, F., Sengur, A. (2021). Machine learning methods for cyber security intrusion detection: datasets and comparative study. Computer Networks, 188(January), 107840. https://doi.org/10.1016/j.comnet.2021.107840.

Koch, R. (2011). Towards next-generation intrusion detection. In: 2011 3rd International Conference on Cyber Conflict, pp. 151–168.

Kubat, M., Matwin, S. (1997). Addressing the curse of imbalanced data sets: one-sided sampling. In: Proceedings of the Fourteenth International Conference on Machine Learning, pp. 179–186. https://doi.org/10.1007/3-540-62858-4_79.

Kurniabudi, Stiawan, D., Darmawijoyo, Bin Idris, M.Y.B., Bamhdi, A.M., Budiarto, R. (2020). CICIDS-2017 dataset feature analysis with information gain for anomaly detection. In: IEEE Access, pp. 132911–132921. https://doi.org/10.1109/ACCESS.2020.3009843.

Lashkari, A.H., Gil, G.D., Mamun, M.S.I., Ghorbani, A.A. (2017). Characterization of tor traffic using time based features. In: Proceedings of the 3rd International Conference on Information Systems Security and Privacy, pp. 253–262. 978-989-758-209-7. https://doi.org/10.5220/0006105602530262.

Laurikkala, J. (2001). Improving Identification of Difficult Small Classes by Balancing Class Distribution. Springer. 3540422943. https://doi.org/10.1007/3-540-48229-6_9.

LaValle, S.M., Branicky, M.S., Lindemann, S.R. (2004). On the relationship between classical grid search and probabilistic roadmaps. The International Journal of Robotics Research, 23(7–8), 673–692.

Lawrence Berkeley National Laboratory (2010). The Internet Traffic Archive. http://ita.ee.lbl.gov/index.html.

Lemaitre, G., Nogueira, F., Aridas, C.K. (2016). Imbalanced-learn: a python toolbox to tackle the curse of imbalanced datasets in machine learning. Journal of Machine Learning Research, 18, 1–5.

Lemaître, G., Nogueira, F., Aridas, C.K. (2017). Imbalanced-learn: a python toolbox to tackle the curse of imbalanced datasets in machine learning. Journal of Machine Learning Research, 18(17), 1–5.

Lin, D., Foster, D.P., Ungar, L.H. (2011). VIF regression: a fast regression algorithm for large data. Journal of the American Statistical Association, 106(493), 232–247. https://doi.org/10.1198/jasa.2011.tm10113.

Lippmann, R.P., Fried, D.J., Graf, I., Haines, J.W., Kendall, K.R., McClung, D., Weber, D., Webster, S.E., Wyschogrod, D., Cunningham, R.K., Zissman, M.A. (1999). Evaluating intrusion detection systems without attacking your friends: the 1998 DARPA intrusion detection evaluation. In: Proceedings DARPA Information Survivability Conference and Exposition, 2000. DISCEX'00, pp. 12–26. https://doi.org/10.1109/DISCEX.2000.821506.

Maciá-Fernández, G., Camacho, J., Magán-Carrión, R., García-Teodoro, P., Therón, R. (2018). UGR'16: a new dataset for the evaluation of cyclostationarity-based network IDSs. Computers and Security, 73, 411–424. https://doi.org/10.1016/j.cose.2017.11.004.

Małowidzki, M., Berezinski, P., Mazur, M. (2015). Network intrusion detection: Half a kingdom for a good dataset. In: Proceedings of NATO STO SAS-139 Workshop, Portugal.

Matthews, B.W. (1975). Comparison of the predicted and observed secondary structure of T4 phage lysozyme. Biochimica et Biophysica Acta (BBA) – Protein Structure, 405(2), 442–451. https://doi.org/10.1016/0005-2795(75)90109-9.

Mosley, L. (2013). A balanced approach to the multi-class imbalance problem. Iowa State University, Ames, Iowa. https://doi.org/10.31274/etd-180810-3375.

Ortigosa-Hernández, J., Inza, I., Lozano, J.A. (2017). Measuring the class-imbalance extent of multi-class problems. Pattern Recognition Letters, 98, 32–38. https://doi.org/10.1016/j.patrec.2017.08.002.

Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., VanderPlas, J., Passos, A., Cournapeau, D., Brucher, M., Perrot, M., Duchesnay, E. (2011). Scikit-learn: machine learning in Python. Journal of Machine Learning Research, 12, 2825–2830.

Quinlan, J.R. (1986). Induction of decision trees. Machine Learning, 1, 81–106.

Raschka, S. (2018). Model Evaluation, Model Selection, and Algorithm Selection in Machine Learning. http://arxiv.org/abs/1811.12808.

Ring, M., Wunderlich, S., Grudl, D. (2017). Technical Report CIDDS-001 data set, 001, pp. 1–13.

Ring, M., Wunderlich, S., Scheuring, D., Landes, D., Hotho, A. (2019). A survey of network-based intrusion detection data sets. Computers & Security, 86, 147–167. https://doi.org/10.1016/j.cose.2019.06.005.

Rosenblatt, F. (1957). The perceptron, a perceiving and recognizing automaton. Cornell Aeronautical Laboratory.

Rosenblatt, F. (1962). Principles of Neurodynamics; Perceptrons and the Theory of Brain Mechanisms. Spartan Books, Washington.

Ross, B.C. (2014). Mutual information between discrete and continuous data sets. PLoS ONE, 9(2), 87357. https://doi.org/10.1371/journal.pone.0087357.

Seabold, S., Perktold, J. (2010). Statsmodels: econometric and statistical modeling with python. In: 9th Python in Science Conference.

Sharafaldin, I., Habibi Lashkari, A., Ghorbani, A.A. (2019). A detailed analysis of the CICIDS2017 data set. In: Mori, P., Furnell, S., Camp, O. (Eds.), Information Systems Security and Privacy. Springer International Publishing, Cham, pp. 172–188. 978-3-030-25109-3.

Sharafaldin, I., Lashkari, A.H., Ghorbani, A.A. (2018). Toward generating a new intrusion detection dataset and intrusion traffic characterization. In: Proceedings of the 4th International Conference on Information Systems Security and Privacy, Vol. 1. ICISSP, Funchal, Madeira, Portugal, pp. 108–116. 978-989-758-282-0. https://doi.org/10.5220/0006639801080116.

Shetye, A. (2019). Feature Selection with Sklearn and Pandas. https://towardsdatascience.com/feature-selection-with-pandas-e3690ad8504b.

Shiravi, A., Shiravi, H., Tavallaee, M., Ghorbani, A.A. (2012). Toward developing a systematic approach to generate benchmark datasets for intrusion detection. Computers & Security, 31(3), 357–374. https://doi.org/10.1016/J.COSE.2011.12.012.

Smith, M.R., Martinez, T., Giraud-Carrier, C. (2014). An instance level analysis of data complexity. Machine Learning, 95(2), 225–256. https://doi.org/10.1007/s10994-013-5422-z.

Sokolova, M., Lapalme, G. (2009). A systematic analysis of performance measures for classification tasks. Information Processing and Management, 45, 427–437. https://doi.org/10.1016/j.ipm.2009.03.002.

Thakkar, A., Lohiya, R. (2020). A review of the advancement in intrusion detection datasets. Procedia Computer Science, 167(2019), 636–645. https://doi.org/10.1016/j.procs.2020.03.330.

Tharwat, A. (2018). Classification assessment methods. Applied Computing and Informatics. https://doi.org/10.1016/j.aci.2018.08.003.

The Cooperative Association for Internet Data Analysis (2010). CAIDA – The Cooperative Association for Internet Data Analysis. http://www.caida.org/home/.

The Shmoo Group (2011). Defcon.

Tomek, I. (1976). Two modifications of CNN. IEEE Transactions on Systems, Man and Cybernetics. https://doi.org/10.1109/TSMC.1976.4309452.

Wei, J.M., Yuan, X.J., Hu, Q.H., Wang, S.Q. (2010). A novel measure for evaluating classifiers. Expert Systems with Applications, 37(5), 3799–3809. https://doi.org/10.1016/j.eswa.2009.11.040.

Wilson, D.L. (1972). Asymptotic properties of nearest neighbor rules using edited data. IEEE Transactions on Systems, Man, and Cybernetics, SMC-2(3), 408–421. https://doi.org/10.1109/TSMC.1972.4309137.

Witten, I.H., Frank, E. (2002). Data mining: practical machine learning tools and techniques with Java implementations. ACM SIGMOD Record, 31(1), 76–77. https://doi.org/10.1145/507338.507355.

Witten, I.H., Frank, E., Hall, M.A., Pal, C.J. (2005). Data Mining: Practical Machine Learning Tools and Techniques, 2nd ed. Morgan Kaufmann Publishers, San Francisco, pp. 558. 0-12-088407-0.

Yulianto, A., Sukarno, P., Suwastika, N.A. (2019). Improving AdaBoost-based intrusion detection system (IDS) performance on CIC IDS 2017 dataset. Journal of Physics: Conference Series, 1192(1). https://doi.org/10.1088/1742-6596/1192/1/012018.

Zhang, C., Cheng, X., Liu, J., He, J., Liu, G. (2018). Deep sparse autoencoder for feature extraction and diagnosis of locomotive adhesion status. Journal of Control Science and Engineering, 1–9. https://doi.org/10.1155/2018/8676387.

Biographies

Bulavas Viktoras

https://orcid.org/0000-0001-8331-4352

viktoras.bulavas@itpc.vu.lt

V. Bulavas is a data privacy and information security officer at Vilnius University. His research interests include machine learning, information security and privacy. His academic background includes an MSc in physics from Vilnius University and an MSc in public management from the Norwegian School of Management. He holds the CISA, CGEIT, CRISC and CSM certifications.

Marcinkevičius Virginijus

https://orcid.org/0000-0002-2281-4035

virginijus.marcinkevicius@mif.vu.lt

V. Marcinkevičius is a senior researcher, head of the Intelligent Technologies Research Group, and head of the Artificial Intelligence Laboratory at the Institute of Data Science and Digital Technologies, Vilnius University. His research interests include machine learning, information security and natural language processing. His academic background includes an MSc in mathematics from Vilnius Educational University and a PhD in informatics from Vytautas Magnus University.

Rumiński Jacek

https://orcid.org/0000-0003-2266-0088

jacek.ruminski@pg.edu.pl

J. Rumiński is a professor at Gdańsk University of Technology and head of the Department of Biomedical Engineering, as well as head of the Gdańsk AI Bay club. His research interests include biomedical engineering and information security. His academic background includes an MSc in medical devices, a PhD in healthcare informatics and a habilitation in biocybernetics and biomedical engineering from Gdańsk University of Technology.

Open access article under the CC BY license.

Keywords

network intrusion detection, multi-class classification, imbalanced learning, bias and variance decomposition, SMOTE
