Pub. online:1 Jan 2019Type:Research ArticleOpen Access
Journal:Informatica
Volume 30, Issue 3 (2019), pp. 613–628
Abstract
Fuzzy c-means (FCM) is a well-known and widely applied fuzzy clustering method. Although there have been considerable studies which focused on the selection of better fuzzifier values in FCM, there is still not one widely accepted criterion. Also, in practical applications, the distributions of many data sets are not uniform. Hence, it is necessary to understand the impact of cluster size distribution on the selection of fuzzifier value. In this paper, the coefficient of variation (CV) is used to measure the variation of cluster sizes in a data set, and the difference of coefficient of variation (DCV) is the change of variation in cluster sizes after FCM clustering. Then, considering that the fuzzifier value with which FCM clustering produces minor change in cluster variation is better, a criterion for fuzzifier selection in FCM is presented from cluster size distribution perspective, followed by a fuzzifier selection algorithm called CSD-m (cluster size distribution for fuzzifier selection) algorithm. Also, we developed an indicator called Influence Coefficient of Fuzzifier ($\mathit{ICF}$) to measure the influence of fuzzifier values on FCM clustering results. Finally, experimental results on 8 synthetic data sets and 4 real-world data sets illustrate the effectiveness of the proposed criterion and CSD-m algorithm. The results also demonstrate that the widely used fuzzifier value $m=2$ is not optimal for many data sets with large variation in cluster sizes. Based on the relationship between ${\mathit{CV}_{0}}$ and $\mathit{ICF}$, we further found that there is a linear correlation between the extent of fuzzifier value influence and the original cluster size distributions.