In this paper we propose modifications of the well-known particle swarm optimization (PSO) algorithm. These modifications concern the mapping of particle motion from continuous space to the binary space in which the search is performed, an approach widely used to solve the feature selection problem. The modified binary PSO variants were tested on the SVC2004 dataset, dedicated to the problem of user authentication based on dynamic features of a handwritten signature. Using the k-nearest neighbours (kNN) classifier, experiments were carried out to find the optimal subset of features. The search for the subset was treated as a multicriteria optimization problem that takes into account both the accuracy of the model and the number of features.

A handwritten signature is one of the most commonly used instruments of person authentication. There are two types of handwritten signature recognition: static (offline) and dynamic (online) (Mailah and Han,

To obtain initial dynamic signature signals, various technical devices are used, for example, a graphic tablet (Nelson and Kishon,

Features can be extracted from the time domain, frequency domain or wavelet transform domain (Wu

The selection of the most informative features when constructing a machine learning model not only reduces the computational complexity of the model but can also improve prediction accuracy. This procedure is particularly important for data describing digitized signals, which generally contain a large number of variables of varying informativeness. Some of these variables play a crucial role in recognizing the corresponding signals, while others may be useless or even harmful to the recognition process.

There are three groups of feature selection methods: filters, wrappers and embedded methods. While the last group is tied to specific decision algorithms, the other two are more general. Filters are applied at the data preprocessing stage. They rely on an analysis of the relationship between the feature space and the output variable; the criteria for assessing this relationship include mutual information, the chi-square test, Fisher’s test, the correlation coefficient and other characteristics. The disadvantages of this group of methods include the difficulty of assessing synergy between subsets of features and the detachment of filters from the decision algorithm.
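As an illustration of the filter idea, the sketch below ranks features by the absolute Pearson correlation of each feature with the output variable. The toy dataset and function names are hypothetical and unrelated to the paper's data; any of the criteria listed above could be substituted for the correlation score.

```python
import math

def pearson(xs, ys):
    """Pearson correlation between one feature column and the target."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def filter_rank(X, y):
    """Rank feature indices by |correlation| with the output, best first."""
    scores = [abs(pearson([row[j] for row in X], y)) for j in range(len(X[0]))]
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)

# Toy data: feature 0 tracks y perfectly, feature 1 does not.
X = [[1, 5], [2, 3], [3, 6], [4, 2]]
y = [1, 2, 3, 4]
print(filter_rank(X, y))  # feature 0 ranks first
```

A wrapper would instead retrain the classifier on each candidate subset, which is more expensive but captures feature interactions.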

In contrast, methods from the wrapper group operate in conjunction with the decision algorithm. They select the subset of features that achieves the best value of the model’s objective function. As a rule, optimization algorithms, above all metaheuristics, are used as wrappers. However, not all metaheuristics are capable of working in a binary search space in their original form, unlike, for example, the genetic algorithm, which operates on binary strings natively. The development of effective binary variations of such algorithms therefore remains a pressing task.

One of the most famous metaheuristics is the particle swarm algorithm (PSO). It is a stochastic search method based on the iteratively repeated interaction of a group of particles (Shi and Eberhart,

This article is devoted to the task of PSO binarization. Binarization is a modification of the algorithm to perform a search in binary space. The contributions of this article are as follows:

We propose the following new methods for the binarization of continuous metaheuristics: local search, a new fitness function, the merge procedure, and a hybrid method combining modified arithmetic operations with the merge procedure.

We modify the continuous PSO with the new binarization methods to select features for user classification based on a handwritten signature.

We propose and demonstrate the effectiveness of our binary PSO variants for classifying a user by handwritten signature, showing that these enhancements achieve greater authentication efficiency than standard binary PSO with transformation functions.

The rest of the article is organized as follows. Section

Each solution vector in the algorithm is a particle moving due to the combined effect of the force of inertia, memory and acceleration reported by the best particle. The coordinates of the

Feature selection can be represented as an NP-hard binary optimization problem (Kohavi and John,

Most metaheuristic algorithms are designed to operate in a continuous search space. However, the binary search space is, by its structure, a more natural fit for the feature selection task. Accordingly, using metaheuristics for feature selection in a binary search space requires their modification. The modification should allow the algorithm to function in a binary space while preserving the original idea of the metaheuristic as much as possible. There are many ways to binarize metaheuristic algorithms; the main ones are considered below: the method of modified algebraic operations, the method of transformation functions, and the quantum method.

1) The method of modified algebraic operations.

However, there are two disadvantages that make this method difficult to apply to some algorithms. The first is that the method is practically inapplicable if the algorithm uses any operations other than addition, subtraction, multiplication and division. The second disadvantage is shared by all methods that require working with binary vectors: many metaheuristic algorithms, in addition to operations directly on vectors, also use various coefficients represented by real numbers. When switching to binary vectors, these coefficients must also be changed or abandoned, which alters the original idea of the algorithm and, accordingly, may affect the quality of its work. Thus, when modifying the PSO algorithm, the authors of Yuan

Despite the external similarity of an algorithm binarized by the method of modified algebraic operations to the original continuous algorithm, its results are not always satisfactory. In such cases, the authors of some works use only some of the logical operations, or replace certain “classic” logical operations with their own modified versions. In Kıran and Gündüz (

2) The method of transformation functions.

This method avoids the limitations of the previous one, since it does not require any changes to the original algorithm. However, it introduces a difficulty of its own: the problem of choosing a specific function as the transformation function. The research (Mirjalili and Lewis,

3) The quantum method.

The set of instances is a list of objects of the type

In this paper, k-nearest neighbours (kNN) is used as the decision algorithm, being one of the simplest and most frequently used classifiers. The algorithm has only one parameter: the number of nearest neighbours, k.

The algorithm operation process can be described as follows. To determine the class of a new data instance x:

1. Save the positions and class labels of all training data.

2. Calculate the distances from x to all stored training instances.

3. Choose the k training instances nearest to x.

4. Define the class that occurs most frequently among these k neighbours.

5. Assign this class label to x.
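The steps above can be sketched in Python. The toy two-class data, the Euclidean distance metric and the value of k are illustrative assumptions, not the paper's experimental settings.

```python
from collections import Counter
import math

def knn_predict(train_X, train_y, x, k=3):
    """Classify x by majority vote among its k nearest training instances."""
    # Step 2: distances from x to every stored training instance.
    dists = [(math.dist(x, xi), yi) for xi, yi in zip(train_X, train_y)]
    # Step 3: the k closest neighbours.
    neighbours = sorted(dists, key=lambda d: d[0])[:k]
    # Steps 4-5: the most frequent class among them becomes the label of x.
    return Counter(yi for _, yi in neighbours).most_common(1)[0][0]

train_X = [(0, 0), (0, 1), (5, 5), (6, 5)]
train_y = ["genuine", "genuine", "forgery", "forgery"]
print(knn_predict(train_X, train_y, (1, 0), k=3))  # "genuine"
```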

The feature selection problem is formulated as follows: on the given set of features, find a subset that, while reducing the number of features, does not cause a significant decrease in classification accuracy, or even increases it. The solution is represented as a vector

Since an algorithm that works as a wrapper has been chosen to select features, a fitness function that evaluates the quality of a subset of selected variables must be used. We have applied the following function:

The task of the binary optimization algorithm is to search for the minimum of the given function.
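The exact fitness function is the one given by the paper's equation; as a hedged illustration only, a commonly used form of such a wrapper fitness weighs the classification error against the fraction of selected features:

```latex
F(S) = \alpha \, E(S) + (1 - \alpha)\,\frac{|S|}{N}, \qquad \alpha \in [0, 1],
```

where E(S) is the classification error of the kNN model on the feature subset S, |S| is the number of selected features, and N is the total number of features. Minimizing F favours accurate models that use few features, which matches the multicriteria formulation above.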

Since it was first introduced by Shannon (

The first fuzzy extension of Shannon’s entropy was considered by De Luca and Termini (

Based on the membership function defined by Eq. (

Assuming

In the case of fuzzy entropy measures, the fuzzy version of symmetric uncertainty (SU) (Witten and Frank,

This subsection describes the investigated modifications of the PSO algorithm for finding suboptimal subsets of features. A population of randomly generated binary vectors

This method uses an S-shaped transformation function to update elements of binary vectors. The transformation function takes as input the velocity of the

V-shaped transformation functions are symmetric about the ordinate axis. In this paper, two such functions were tested. The first one is calculated using hyperbolic tangent:
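Both families of transformation functions can be sketched as follows. This is a generic illustration of the standard S-shaped (sigmoid) and V-shaped (|tanh|) update schemes, not necessarily the paper's exact formulas.

```python
import math
import random

def s_shaped(v):
    """S-shaped transfer: sigmoid of the velocity, in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-v))

def v_shaped(v):
    """V-shaped transfer: |tanh(v)|, symmetric about the ordinate axis."""
    return abs(math.tanh(v))

def update_bit_s(v, rng=random):
    """S-shaped rule: the probability of the bit being 1 grows with v."""
    return 1 if rng.random() < s_shaped(v) else 0

def update_bit_v(x, v, rng=random):
    """V-shaped rule: a large |v| makes the current bit likely to flip."""
    return 1 - x if rng.random() < v_shaped(v) else x

random.seed(0)
print([update_bit_s(2.0) for _ in range(5)])
print(update_bit_v(1, 0.0))  # tanh(0) = 0, so the bit never flips: stays 1
```

Note the qualitative difference: the S-shaped rule sets the bit directly from the velocity, while the V-shaped rule only decides whether to flip the current value.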

Due to its promising performance with differential evolution for feature selection, the local search module (Hancer,

1. The set of unselected features, i.e. those not present in the selected subset, is ranked in descending order in terms of the fuzzy version of the SU criterion, defined by Eq. (

2. A predefined percentage of the top features from the unselected subset is determined as candidate features for the selected feature subset.

3. The mean value of the fuzzy SU criterion (called mean(FSU)) within the selected feature subset is computed.

4. Each candidate feature whose fuzzy SU score is greater than mean(FSU) is added to the selected feature subset.

5. The features in the selected subset are ranked in terms of the fuzzy version of the MIFS criterion, defined by Eq. (

6. A predefined percentage of the selected features with the lowest fuzzy MIFS scores is determined as candidates for the elimination process.

7. A random number between 0 and 1 is generated for each candidate.

8. Each candidate whose random number is smaller than a user-specified threshold is eliminated from the selected feature subset.
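A rough sketch of the two stages follows, assuming the fuzzy SU and fuzzy MIFS scores have been precomputed per feature; the dictionaries fsu and mifs, the percentages and the threshold are all hypothetical stand-ins for the quantities the paper defines by its equations.

```python
import random

def local_search(selected, fsu, mifs, add_pct=0.2, drop_pct=0.2,
                 threshold=0.5, rng=random):
    """One pass of the local search module (a sketch, not the paper's code)."""
    all_feats = set(fsu)
    selected = set(selected)
    # Addition stage: rank unselected features by fuzzy SU, descending.
    unselected = sorted(all_feats - selected, key=lambda f: fsu[f], reverse=True)
    candidates = unselected[:max(1, int(add_pct * len(unselected)))] if unselected else []
    mean_fsu = sum(fsu[f] for f in selected) / len(selected)
    for f in candidates:
        if fsu[f] > mean_fsu:          # better than the subset average: add it
            selected.add(f)
    # Elimination stage: probabilistically drop low fuzzy-MIFS features.
    ranked = sorted(selected, key=lambda f: mifs[f])
    for f in ranked[:max(1, int(drop_pct * len(ranked)))]:
        if rng.random() < threshold:
            selected.discard(f)
    return selected

fsu = {0: 0.9, 1: 0.2, 2: 0.8, 3: 0.1}
mifs = {0: 0.7, 1: 0.1, 2: 0.6, 3: 0.05}
print(local_search({1, 2}, fsu, mifs, rng=random.Random(1)))
```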

The overall aim of a wrapper method is to minimize both the classification error rate and the number of features using a classifier. From an optimization perspective, a wrapper method tries to minimize the objective function, defined by Eq. (

1) Two-stage feature selection: In this case, a filter approach is initially carried out to reduce the feature space, and then a wrapper approach is performed on the reduced space. However, no interaction is built between the stages, so high-performing feature subsets are not guaranteed.

2) Locally hybridized feature selection: In this case, a local elimination module based on a filter (wrapper) evaluation is adopted in a wrapper (filter) method to increase the effectiveness of the method in the feature space. Compared to the previous case, this case can help a feature selection method to obtain high-performing feature subsets. The aforementioned method, called local search, is an example of this case.

To combine the positive effects of both wrappers and filters, and in contrast to these two cases, we develop a new fitness function that does not require any additional module or computations during the selection process. The newly developed function is adopted in the standard binary PSO with an S-shaped transformation function and is defined as follows.

The essence of this operation is to keep the matching elements of two binary vectors and to select randomly among the non-matching elements. Getting the value of the

An example of how this operation works is shown as follows. Assume that
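A minimal sketch of the merge procedure on two hypothetical binary vectors a and b:

```python
import random

def merge(a, b, rng=random):
    """Merge two binary vectors: keep positions where they agree,
    pick randomly between the two values where they differ."""
    return [ai if ai == bi else rng.choice((ai, bi)) for ai, bi in zip(a, b)]

a = [1, 0, 1, 1, 0]
b = [1, 1, 1, 0, 0]
m = merge(a, b)
print(m)  # positions 0, 2 and 4 are fixed; positions 1 and 3 are random
```

Unlike a plain logical OR or AND, the merge keeps the result unbiased on the disagreeing positions, which preserves diversity in the swarm.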

This approach to binarization is a combination of the modified algebraic operations and the merge procedure. As in the method of modified algebraic operations, multiplication is replaced by conjunction (AND) and subtraction by exclusive or (XOR); addition, however, is replaced by the merge procedure.

After the binarization of the PSO algorithm by this method, change of the particle position can be described by the following expressions:

As already mentioned in the description of the method of modified algebraic operations, its disadvantage is the impossibility of using real coefficients. That is why in this case the coefficients
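A hedged sketch of one hybrid update step is given below. The random binary masks r1 and r2 standing in for the real-valued coefficients are an assumption made for illustration; the paper defines its own replacement for the coefficients in its expressions.

```python
import random

def merge(a, b, rng=random):
    """Keep agreeing bits; choose randomly where the vectors differ."""
    return [ai if ai == bi else rng.choice((ai, bi)) for ai, bi in zip(a, b)]

def AND(a, b):
    return [ai & bi for ai, bi in zip(a, b)]

def XOR(a, b):
    return [ai ^ bi for ai, bi in zip(a, b)]

def hybrid_update(x, v, pbest, gbest, rng=random):
    """Hybrid binarized PSO step: multiplication -> AND, subtraction -> XOR,
    addition -> merge (coefficient handling is an assumption here)."""
    n = len(x)
    r1 = [rng.randint(0, 1) for _ in range(n)]  # plays the role of c1*r1
    r2 = [rng.randint(0, 1) for _ in range(n)]  # plays the role of c2*r2
    cognitive = AND(r1, XOR(pbest, x))          # analogue of c1*r1*(pbest - x)
    social = AND(r2, XOR(gbest, x))             # analogue of c2*r2*(gbest - x)
    v_new = merge(merge(v, cognitive, rng), social, rng)
    x_new = merge(x, v_new, rng)                # analogue of x + v
    return x_new, v_new

random.seed(0)
x, v = [0, 1, 0, 1], [1, 0, 0, 1]
p, g = [1, 1, 0, 0], [1, 0, 1, 0]
print(hybrid_update(x, v, p, g))
```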

The graphics tablets produce signals of the

1.

2.

3.

4.

A set of dynamic signature features described by authors of Fierrez-Aguilar

The signature signals were taken from the SVC2004 dataset (Task 1) (Yeung

Table

The brief notation of the tested modifications of the PSO algorithm.

Method | Designation
The S-shaped transformation function |
Previous method with new fitness function |
New version of local search |
The merge operation |
The hybrid method of the modified arithmetic and merging operation |
The FV1 transformation function |
The FV2 transformation function |

Averaged classification results after feature selection by the PSO algorithm.

Metric | Accuracy
Coefficient | | | |
88.34 ± 5.22 | 89.18 ± 5.57 | 88.75 ± 5.76 | 88.65 ± 5.69
87.69 ± 3.65 | 89.18 ± 3.69 | 89.93 ± 4.07 | 89.58 ± 4.77
88.29 ± 5.09 | | |
87.20 ± 6.83 | 87.62 ± 7.14 | 87.54 ± 7.17 | 87.68 ± 7.07
88.35 ± 7.94 | 88.49 ± 7.82 | 88.07 ± 8.07 |
88.38 ± 6.51 | 87.88 ± 7.73 | 87.99 ± 7.63 | 87.64 ± 8.10
88.80 ± 6.70 | 87.65 ± 7.91 | 87.49 ± 8.25 | 87.65 ± 8.05

Metric | FRR
Coefficient | | | |
2.84 ± 3.07 | 3.90 ± 4.64 | 4.02 ± 4.36 | 4.10 ± 4.71
2.92 ± 3.16 | 5.08 ± 5.23 | 5.43 ± 5.44 | 5.35 ± 5.63
2.73 ± 3.62 | 3.82 ± 5.55 | 3.97 ± 5.52 | 3.73 ± 5.47
4.35 ± 5.67 | 9.08 ± 9.16 | 9.05 ± 9.13 | 9.04 ± 8.93
3.08 ± 4.07 | 5.48 ± 7.36 | 5.37 ± 7.46 | 6.05 ± 8.21
3.47 ± 4.72 | 6.59 ± 8.49 | 6.67 ± 8.56 | 6.77 ± 8.54

Metric | FAR
Coefficient | | | |
20.48 ± 9.62 | 17.74 ± 9.88 | 18.49 ± 10.40 | 18.60 ± 10.21
22.35 ± 6.74 | 19.39 ± 6.73 | 17.83 ± 7.51 | 18.15 ± 8.64
20.49 ± 9.30 | 15.22 ± 9.96 | |
22.88 ± 12.74 | 20.94 ± 13.10 | 20.95 ± 13.21 | 20.91 ± 12.96
14.22 ± 12.40 | 13.97 ± 12.76 | |
20.16 ± 12.11 | 18.77 ± 13.37 | 18.65 ± 13.41 | 18.68 ± 13.79
18.93 ± 12.17 | 18.12 ± 13.45 | 18.35 ± 14.17 | 17.93 ± 13.43

Metric | Features
48.17 ± 4.99 | 28.97 ± 2.76 | 28.65 ± 2.72 | 26.27 ± 2.46
66.54 ± 3.75 | 49.25 ± 3.26 | 36.16 ± 2.81 | 29.12 ± 2.55
46.65 ± 5.28 | 9.19 ± 4.64 | 8.11 ± 4.45 | 6.95 ± 3.83
55.64 ± 8.84 | 30.27 ± 4.35 | 30.29 ± 4.33 | 29.53 ± 4.24
47.33 ± 9.98 | 16.07 ± 5.58 | 16.07 ± 5.58 | 12.95 ± 5.59
44.21 ± 10.35 | 10.10 ± 4.13 | 10.25 ± 4.06 | 9.57 ± 3.71

The experiment was conducted according to a 10-fold cross-validation scheme: the data was split into ten training and ten test samples, with no instance repeated across test samples. Due to the stochasticity of the algorithm, each version was run 30 times and the results were averaged.
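The splitting scheme can be sketched as follows; this is a generic 10-fold index split, not the authors' exact code.

```python
import random

def kfold_indices(n, k=10, seed=42):
    """Split n instance indices into k disjoint test folds; each index
    appears in exactly one test fold, so test samples never repeat."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    return [(sorted(set(idx) - set(f)), sorted(f)) for f in folds]

splits = kfold_indices(40, k=10)
print(len(splits))                           # 10 train/test pairs
print(sum(len(test) for _, test in splits))  # every instance tested once: 40
```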

Table

The presented results show that with a value of

Ranks of Friedman’s two-way analysis for the investigated metrics.

Metric | 1 – Accuracy
Coefficient | | | |
4.01 | 3.05 | 2.67 | 2.91
4.54 | 3.34 | 2.64 | 2.67
3.97 | 2.55 | 3.61 | 3.61
5.15 | 4.85 | 4.85 | 4.64
2.97 | 4.22 | 4.32 | 4.11
3.9 | 4.9 | 4.74 | 4.91
3.45 | 5.09 | 5.16 | 5.14

Metric | FRR
Coefficient | | | |
3.79 | 4.11 | 4.21 | 4.38
2.35 | 1.48 | 1.46 | 1.78
3.64 | 3.3 | 2.96 | 2.94
3.59 | 3.03 | 3.23 | 2.83
5.74 | 5.69 | 5.85 | 5.53
4.13 | 4.73 | 4.63 | 4.98
4.78 | 5.68 | 5.66 | 5.59

Metric | FAR
Coefficient | | | |
4.18 | 2.43 | 2.36 | 2.75
4.71 | 4.24 | 3.8 | 3.78
4.04 | 3.48 | 4.4 | 4.23
5.28 | 5.61 | 5.2 | 5.36
2.63 | 3.04 | 3.11 | 3.06
3.91 | 4.58 | 4.6 | 4.5
3.26 | 4.64 | 4.53 | 4.33

Metric | Features
Coefficient | | | |
3.4 | 2.38 | 2.18 | 2
6.4 | 7 | 6.9 | 6.3
4.35 | 5.25 | 5.18 | 5.13
5.85 | 5.75 | 5.93 | 6.58
2.15 | 1 | 1 | 1
3.33 | 3.98 | 4 | 4
2.53 | 2.65 | 2.83 | 3

Table

Pareto front based on the ranks of the classification error and the number of features.

Graphs were created on the basis of the obtained average rank values in order to assess the Pareto front in terms of the trade-off between each metric and the number of selected features. In the first group of graphs shown in Fig.

From the graphs obtained, it is clear that in all cases the use of the opmerge method provides the smallest set of features. With

Fig.

Fig.

Pareto front based on the ranks of the type I error and the number of features.

Pareto front based on the ranks of the type II error and the number of features.

This work considered seven approaches to PSO modification for solving the feature selection task in handwritten signature authentication. On the considered SVC2004 set, containing 100 input variables, the investigated methods made it possible to select subsets ranging in size from 66.54 (using the new fitness function with coefficient

The determination of the