Fuzzifier Selection in Fuzzy C-Means from Cluster Size Distribution Perspective

Zhou, Kaile; Yang, Shanlin

doi:10.15388/Informatica.2019.221

Informatica

Volume 30, Issue 3 (2019), pp. 613–628

Pub. online: 1 January 2019

Type: Research Article

Open Access

Received: 1 August 2018

Accepted: 1 March 2019

Published: 1 January 2019

Abstract

Fuzzy c-means (FCM) is a well-known and widely applied fuzzy clustering method. Although many studies have addressed the selection of better fuzzifier values in FCM, no criterion is yet widely accepted. Moreover, in practical applications many data sets are not uniformly distributed, so it is necessary to understand how the cluster size distribution affects the choice of fuzzifier value. In this paper, the coefficient of variation (CV) is used to measure the variation of cluster sizes in a data set, and the difference of coefficient of variation (DCV) measures the change in cluster size variation after FCM clustering. On the premise that a better fuzzifier value is one with which FCM clustering produces only a minor change in cluster size variation, a criterion for fuzzifier selection in FCM is presented from the cluster size distribution perspective, followed by a fuzzifier selection algorithm called CSD-m (cluster size distribution for fuzzifier selection). We also develop an indicator called the Influence Coefficient of Fuzzifier ($\mathit{ICF}$) to measure the influence of fuzzifier values on FCM clustering results. Finally, experimental results on 8 synthetic data sets and 4 real-world data sets illustrate the effectiveness of the proposed criterion and the CSD-m algorithm. The results also demonstrate that the widely used fuzzifier value $m=2$ is not optimal for many data sets with large variation in cluster sizes. Based on the relationship between ${\mathit{CV}_{0}}$ and $\mathit{ICF}$, we further find a linear correlation between the extent of fuzzifier value influence and the original cluster size distribution.
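The selection criterion described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' CSD-m algorithm, whose details the abstract does not give. It assumes that CV is the sample standard deviation of the cluster sizes divided by their mean, that DCV is the CV after clustering minus the CV before (the sign convention is an assumption), and that the preferred fuzzifier is the candidate with the smallest absolute DCV. The function names and the cluster sizes are hypothetical and serve only as an illustration.

```python
from statistics import mean, stdev


def coefficient_of_variation(sizes):
    """CV of a list of cluster sizes: standard deviation divided by mean."""
    # Assumption: the sample standard deviation (n - 1 denominator) is used.
    return stdev(sizes) / mean(sizes)


def dcv(original_sizes, clustered_sizes):
    """Change in CV after clustering (assumed sign: after minus before)."""
    return (coefficient_of_variation(clustered_sizes)
            - coefficient_of_variation(original_sizes))


def select_fuzzifier(original_sizes, sizes_by_m):
    """Return the candidate m whose clustering changes the CV the least."""
    return min(sizes_by_m,
               key=lambda m: abs(dcv(original_sizes, sizes_by_m[m])))


# Made-up cluster sizes for illustration only; not results from the paper.
original = [300, 100, 50]
sizes_by_m = {
    1.5: [280, 110, 60],   # sizes produced by FCM with m = 1.5 (hypothetical)
    2.0: [200, 140, 110],  # sizes produced by FCM with m = 2.0 (hypothetical)
    3.0: [160, 150, 140],  # sizes produced by FCM with m = 3.0 (hypothetical)
}
best_m = select_fuzzifier(original, sizes_by_m)
print(best_m)
```

In this made-up example, larger fuzzifier values pull the cluster sizes toward uniformity, so the smallest candidate preserves the original size variation best. In practice the sizes for each candidate would come from running FCM with that fuzzifier value and assigning each point to its highest-membership cluster.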

Biographies

Zhou Kaile

zhoukaile@hfut.edu.cn

K. Zhou received the BS and PhD degrees from the School of Management, Hefei University of Technology, Hefei, China, in 2010 and 2014, respectively. From 2013 to 2014, he was a visiting scholar in the Eller College of Management, The University of Arizona, Tucson, AZ, USA. He is currently an associate professor with the School of Management, Hefei University of Technology. His research interests include clustering algorithms, data analysis, and smart energy management.

Yang Shanlin

yangsl@hfut.edu.cn

S. Yang is currently a distinguished professor with the School of Management, Hefei University of Technology, Hefei, China. He has authored over 300 refereed journal papers and over 200 conference papers. His research interests include engineering management, information management, and decision support systems. He is a member of the Chinese Academy of Engineering and a fellow of the Asian Pacific Industrial Engineering and Management Society. He is also the vice chairman of the China Branch of the Association of Information Systems.

Open access article under the CC BY license.

Keywords

fuzzy c-means; fuzzifier; CSD-m algorithm; cluster size distribution

Funding

This work is supported by the National Natural Science Foundation of China (Nos. 71822104, 71521001), Anhui Science and Technology Major Project (No. 17030901024), Hong Kong Scholars Program (No. 2017-167), and China Postdoctoral Science Foundation (No. 2017M612072).
