An Effective Approach to Outlier Detection Based on Centrality and Centre-Proximity

Bae, Duck-Ho; Jeong, Seo; Hong, Jiwon; Lee, Minsoo; Ivanović, Mirjana; Savić, Miloš; Kim, Sang-Wook

doi:10.15388/20-INFOR413

Informatica

Volume 31, Issue 3 (2020), pp. 435–458

Pub. online: 6 May 2020

Type: Research Article

Open Access

Received: 1 January 2019

Accepted: 1 March 2020

Published: 6 May 2020

Abstract

In data mining research, outliers usually represent extreme values that deviate from other observations in the data. A significant issue with existing outlier detection methods is that they consider only the object itself, without taking its neighbouring objects into account when extracting location features. In this paper, we propose an innovative approach to this issue. First, we propose the notions of centrality and centre-proximity for determining the degree of outlierness while considering the distribution of all objects. We also propose a novel graph-based algorithm for outlier detection based on these notions. The algorithm solves the problems of existing methods, i.e. the problems of local density, micro-clusters, and fringe objects. We performed extensive experiments to confirm the effectiveness and efficiency of the proposed method. The experimental results show that the proposed method uncovers outliers successfully and outperforms previous outlier detection methods.

Biographies

Bae Duck-Ho

D.-H. Bae received his BS, MS, and PhD degrees in electronics and computer engineering from the Hanyang University, Seoul, Korea, in 2006, 2008, and 2013, respectively. Currently, he is a principal engineer at Samsung Electronics. His research interests include data mining, databases, and storage systems.

Jeong Seo

S. Jeong received his BS in computer science and engineering from the Chung-ang University, Seoul, Korea, in 2009. He received his MS in computer engineering from the Hanyang University, Seoul, Korea, in 2011. His research covered clustering and outlier detection methodologies. After graduation, he worked as an SDE for LG Electronics and EA Games, and he is currently working for Amazon.

Hong Jiwon

J. Hong received his BS in computer science from the Hanyang University, Seoul, Korea, in 2009. He is currently pursuing a PhD degree in computer and software at the Hanyang University. His research interests include data mining, databases, social network analysis, and recommender systems.

Lee Minsoo

M. Lee received his PhD degree from the University of Florida, and his MS and BS from the Department of Computer Science and Engineering, Seoul National University, in 2000, 1995, and 1992, respectively. He has been a professor at the Department of Computer Science and Engineering, Ewha Womans University, Seoul, Korea, since 2002. He worked for LG Electronics from 1995 to 1996. He also worked for Oracle Corporation in the US as a Senior Member of Technical Staff from 2000 to 2002. His research interests include data mining, data warehouses, web information infrastructures, stream data processing, and deep learning.

Ivanović Mirjana

M. Ivanović, PhD since 2002, holds the position of full professor at the Faculty of Sciences, University of Novi Sad, Serbia. She has been a member of the University Council for Informatics for more than 10 years. She is the author or co-author of 13 textbooks, 13 edited proceedings, 3 monographs, and more than 440 research papers on multi-agent systems, e-learning and web-based learning, applications of intelligent techniques (CBR, data and web mining), and software engineering education, most of which are published in international journals and proceedings of high-quality international conferences. She is or has been a member of the Program Committees of more than 200 international conferences, and General Chair and Program Committee Chair of numerous international conferences. She has also been an invited speaker at several international conferences and a visiting lecturer in Australia, Thailand and China. As a leader and researcher, she has participated in numerous international projects. Currently she is the editor-in-chief of Computer Science and Information Systems Journal.

Savić Miloš

M. Savić is an assistant professor at the Department of Mathematics and Informatics, Faculty of Sciences, University of Novi Sad, where he received his BSc, MSc and PhD degrees in the field of computer science in 2010, 2011 and 2015, respectively. His research interests are in the field of complex network analysis, graph-based machine learning techniques and scientometrics.

Kim Sang-Wook

wook@hanyang.ac.kr

S.-W. Kim received the BS degree in computer engineering from Seoul National University, in 1989, and the MS and PhD degrees in computer science from the Korea Advanced Institute of Science and Technology (KAIST), in 1991 and 1994, respectively. From 1995 to 2003, he served as an associate professor at the Kangwon National University. In 2003, he joined the Hanyang University, Seoul, Korea, where he is currently a professor at the Department of Computer and Software and the director of the Brain-Korea-21-Plus research program. He has also been leading a National Research Lab (NRL) Project funded by the National Research Foundation since 2015. His research interests include databases, data mining, multimedia information retrieval, social network analysis, recommendation, and web data analysis.

Open access article under the CC BY license.

Keywords

graph-based outlier detection, centrality, centre-proximity

Funding

This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korean Ministry of Science and ICT (MSIT) (No. NRF-2020R1A2B5B03001960) and also by the Next-Generation Information Computing Development Program through the NRF funded by the MSIT (No. NRF-2017M3C4A7069440 and No. NRF-2017M3C4A7083678).
