2.1 The PSO Metaheuristic Algorithm
Each solution vector in the algorithm is a particle that moves under the combined effect of inertia, its own memory, and attraction to the best particle in the swarm. The coordinates of the
ith particle at the time of iteration
t are determined by the vector
${\boldsymbol{x}_{i}}(t)=({x_{i1}}(t),{x_{i2}}(t),\dots ,{x_{iD}}(t))$, where
$i=1,2,\dots ,N$,
N is the number of particles in the swarm. Each particle stores the best position
${\boldsymbol{p}_{i}}(t)=({p_{i1}}(t),{p_{i2}}(t),\dots ,{p_{iD}}(t))$ it has found over the iterations passed so far. The velocity of a particle is given by the vector
${\boldsymbol{v}_{i}}(t)=({v_{i1}}(t),{v_{i2}}(t),\dots ,{v_{iD}}(t))$. The velocity and position of the
ith particle at iteration
$t+1$ are determined by the following expressions:
\[ {\boldsymbol{v}_{i}}(t+1)=w{\boldsymbol{v}_{i}}(t)+{c_{1}}{\textit{rand}_{1}}\big({\boldsymbol{p}_{i}}(t)-{\boldsymbol{x}_{i}}(t)\big)+{c_{2}}{\textit{rand}_{2}}\big({\boldsymbol{p}_{g}}(t)-{\boldsymbol{x}_{i}}(t)\big),\]
\[ {\boldsymbol{x}_{i}}(t+1)={\boldsymbol{x}_{i}}(t)+{\boldsymbol{v}_{i}}(t+1),\]
where
${c_{1}}$,
${c_{2}}$ are positive acceleration coefficients;
${\boldsymbol{p}_{i}}(t)$ is the vector of coordinates at which the particle
${\boldsymbol{x}_{i}}(t)$ achieved its best value of the fitness function so far;
${\boldsymbol{p}_{g}}(t)$ is the vector of coordinates of the particle that reached the best value of the fitness function in the entire swarm over
t iterations;
w is an empirically chosen inertia coefficient;
${\textit{rand}_{1}}$ and
${\textit{rand}_{2}}$ are random numbers from the interval
$[0,1]$. The swarm stops moving when at least one of the following conditions is met: the swarm has reached a state of equilibrium, an optimal solution has been found, or a specified number of iterations has been performed.
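The update rules above can be sketched as a minimal continuous PSO loop. The sphere fitness function, the parameter values $w=0.7$, ${c_{1}}={c_{2}}=1.5$, and the swarm size are illustrative assumptions, not taken from the source:

```python
import random

def pso_step(x, v, p_best, g_best, w=0.7, c1=1.5, c2=1.5):
    """One PSO iteration for a single particle (all vectors are lists)."""
    r1, r2 = random.random(), random.random()  # rand1, rand2 ~ U[0, 1]
    # v_i(t+1) = w*v_i(t) + c1*rand1*(p_i(t)-x_i(t)) + c2*rand2*(p_g(t)-x_i(t))
    v_new = [w * vj + c1 * r1 * (pj - xj) + c2 * r2 * (gj - xj)
             for vj, xj, pj, gj in zip(v, x, p_best, g_best)]
    # x_i(t+1) = x_i(t) + v_i(t+1)
    x_new = [xj + vj for xj, vj in zip(x, v_new)]
    return x_new, v_new

def sphere(x):  # illustrative fitness function (an assumption)
    return sum(xj * xj for xj in x)

# Tiny swarm minimizing the sphere function
random.seed(0)
D, N = 2, 10
xs = [[random.uniform(-5, 5) for _ in range(D)] for _ in range(N)]
vs = [[0.0] * D for _ in range(N)]
p_bests = [x[:] for x in xs]
g_best = min(p_bests, key=sphere)
for t in range(100):
    for i in range(N):
        xs[i], vs[i] = pso_step(xs[i], vs[i], p_bests[i], g_best)
        if sphere(xs[i]) < sphere(p_bests[i]):
            p_bests[i] = xs[i][:]
    g_best = min(p_bests, key=sphere)
print(sphere(g_best))  # close to 0 after convergence
```

Here $\textit{rand}_{1}$ and $\textit{rand}_{2}$ are drawn once per particle per iteration (a scalar variant); drawing them per coordinate is an equally common choice.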
2.2 Feature Selection
Feature selection can be represented as an NP-hard binary optimization problem (Kohavi and John,
1997). A globally optimal solution to this problem can be guaranteed only by a brute-force search. However, the brute-force approach is impractical, since the number of features in modern datasets can reach several tens or even hundreds, and enumerating all possible combinations of that many elements is extremely time- and resource-consuming. A remedy is the use of metaheuristic algorithms, which find a suboptimal solution in acceptable time.
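To illustrate why exhaustive search does not scale: for D features there are 2**D candidate subsets, so brute force is feasible only for very small D. The toy fitness function below is a hypothetical stand-in for a real subset-evaluation criterion:

```python
from itertools import product

def brute_force(D, fitness):
    """Exhaustive search over every binary feature mask -- 2**D candidates."""
    return max(product((0, 1), repeat=D), key=fitness)

# Toy fitness (an assumption): reward selecting even-indexed features
best = brute_force(4, lambda mask: sum(b for i, b in enumerate(mask) if i % 2 == 0))
print(best)          # enumerates all 2**4 = 16 masks
print(2 ** 100)      # ~1.27e30 masks for 100 features -- clearly infeasible
```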
Most metaheuristic algorithms are designed to operate in a continuous search space. However, owing to the structure of the feature selection task, a binary search space is a more natural fit for it. Accordingly, to apply metaheuristics to feature selection in a binary search space, they must be modified. The modification should allow the algorithm to function in a binary space while preserving the original idea of the metaheuristic as much as possible. There are many ways to binarize metaheuristic algorithms; the main ones are considered below: the method of modified algebraic operations, the method of transfer functions, and the quantum method.
1) Modified algebraic operations: The idea of this method is that, to convert an algorithm to a binary version, it is enough to replace the original arithmetic operations in its formulas with their logical analogues: disjunction (OR) instead of addition, conjunction (AND) instead of multiplication, exclusive OR (XOR) instead of subtraction, and logical negation (NOT). Thus, the search space of the algorithm changes from continuous to binary, while the original idea of updating particle positions is formally preserved.
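For example, the substitutions above can be sketched on bit vectors encoded as Python integers (the operand values are arbitrary assumptions for illustration):

```python
# Binary vectors as Python ints; logical analogues of arithmetic:
#   addition -> OR, multiplication -> AND, subtraction -> XOR
x = 0b10110  # current position
v = 0b01100  # "velocity" as a binary vector

diff = x ^ v   # subtraction  -> XOR
prod = x & v   # multiplication -> AND
new_x = x | v  # addition -> OR: binary analogue of x + v
print(format(new_x, '05b'))  # -> 11110
```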
However, two disadvantages make it difficult to apply this method to some algorithms. The first is that the method is practically inapplicable if the algorithm uses any operations other than addition, subtraction, multiplication, and division. The second is common to all methods that work with binary vectors: many metaheuristic algorithms, in addition to operations directly on vectors, also use a variety of coefficients represented by real numbers. When switching to binary vectors, these coefficients must also be changed or abandoned, which alters the original idea of the algorithm and, accordingly, may affect the quality of its work. For example, when modifying the PSO algorithm, the authors of Yuan et al. (2009) abandoned the inertia coefficient and used randomly generated binary vectors in place of the velocity coefficients.
Despite the outward similarity of an algorithm binarized by the method of modified algebraic operations to the original continuous algorithm, its results are not always satisfactory. In such cases, the authors of some works use only a subset of the logical operations, or replace some of the “classic” logical operations with their own modified versions. In Kıran and Gündüz (2012), when binarizing the artificial bee colony algorithm, the authors analysed the use of logical operations and concluded that only two of them are needed: XOR and NOT. In this case, the NOT operation was modified as follows: a random real number from 0 to 1 was generated, and if it was greater than 0.5, logical negation was performed; otherwise the vector remained unchanged.
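The modified NOT operation described above can be sketched as follows (representing a solution as a list of bits is an assumption; the 0.5 threshold follows the description):

```python
import random

def stochastic_not(bits, p=0.5):
    """Modified NOT, as described above: draw a random real from [0, 1];
    if it exceeds p, negate the whole vector, otherwise leave it unchanged."""
    if random.random() > p:
        return [1 - b for b in bits]  # negate every bit
    return bits[:]                    # vector remains unchanged

random.seed(1)
print(stochastic_not([1, 0, 1, 1]))
```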
2)
Transfer functions: Transfer functions are often used to switch from a continuous search space to a binary one. They map a real number into the range between zero and one, and the result is treated as a probability: if the computed function value is greater than a random number, the corresponding element of the binary vector either takes the value of one or is replaced with its opposite (Mirjalili and Lewis,
2013).
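As a sketch, two commonly used representatives of these families are assumed here: the sigmoid as an S-shaped function and |tanh| as a V-shaped one. The pairing of the binarization rules with the families (set the bit for S-shaped, flip it for V-shaped) follows common practice:

```python
import math, random

def s_shaped(v):
    """S-shaped transfer: sigmoid maps a real velocity to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-v))

def v_shaped(v):
    """V-shaped transfer: |tanh(v)| maps a real velocity to [0, 1)."""
    return abs(math.tanh(v))

def binarize_s(bit, v):
    # S-shaped rule: set the bit to 1 with probability S(v), else to 0
    return 1 if random.random() < s_shaped(v) else 0

def binarize_v(bit, v):
    # V-shaped rule: flip the current bit with probability V(v)
    return 1 - bit if random.random() < v_shaped(v) else bit

random.seed(0)
velocities = [-2.0, -0.5, 0.5, 2.0]
print([binarize_s(0, v) for v in velocities])
print([binarize_v(0, v) for v in velocities])
```

Note the qualitative difference: an S-shaped rule forgets the current bit, while a V-shaped rule only changes it when the velocity magnitude is large, which helps explain the performance gap reported below.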
This method avoids the limitations of the previous one, since it requires no changes to the original algorithm. However, it introduces a different difficulty: the choice of a specific transfer function. The study (Mirjalili and Lewis,
2013), which examined six transfer functions from two families (S-shaped and V-shaped), showed that the use of V-shaped functions significantly improves the performance of the PSO algorithm. In Souza
et al. (
2020), the PSO algorithm, binarized with a V-shaped transfer function, was used for feature selection when solving the problem of static verification of a handwritten signature. S-shaped and V-shaped transfer functions have also been successfully applied to binarize the gray wolf algorithm (Chantar
et al.,
2020), bat swarm (Qasim and Algamal,
2020), dragonfly algorithm (Mafarja
et al.,
2017), and butterfly swarm (Arora and Anand,
2019). In Ghosh
et al. (
2020), a new X-shaped transfer function, consisting of two S-shaped ones, was proposed and applied to binarize the Social Mimic algorithm. With this approach, one vector of real numbers is converted into two binary vectors, from which the better one is selected.
3)
Quantum method: The idea of this method is to use vectors of Q-bits instead of binary vectors, together with principles borrowed from quantum mechanics, such as quantum superposition. In Hamed
et al. (
2009), it was shown that using the quantum method for binarization of the PSO algorithm allows increasing the speed of searching for the optimal set of features. In Barani
et al. (
2017), the authors also explored a quantum approach for binarization of the gravitational search algorithm when solving the feature selection task with the kNN classifier. The principle of superposition was used in that work, and a quantum rotation gate was applied to update the solutions. The researchers concluded that the proposed algorithm can significantly reduce the total number of features while maintaining classification accuracy comparable to other algorithms. The authors of Ashry et al. (
2020) compared binary gray wolf algorithms using the quantum method (BQI-GWO) and the S-shaped transfer function (BGWO). The comparison showed that BQI-GWO excludes a larger number of features while increasing classification accuracy. Researchers in Zouache and Ben Abdelaziz (
2018) proposed a hybrid swarm intelligence algorithm based on quantum computations and a combination of the firefly algorithm and PSO for feature selection. Quantum computations provided a good tradeoff between the intensification and diversification of search, while the combination of the firefly algorithm and PSO made it possible to efficiently investigate the generated subsets of features. To evaluate the relevance of these subsets, rough set theory was employed.
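The Q-bit representation underlying these approaches can be sketched generically. The collapse rule and the rotation update below are illustrative of quantum-inspired encodings in general, not of any one cited algorithm:

```python
import math, random

# A Q-bit is a pair (alpha, beta) with alpha**2 + beta**2 = 1:
# a superposition of 0 (amplitude alpha) and 1 (amplitude beta).
def make_qbit(theta):
    return (math.cos(theta), math.sin(theta))

def observe(qvector):
    """Collapse each Q-bit to a classical bit: P(bit = 1) = beta**2."""
    return [1 if random.random() < beta ** 2 else 0 for _, beta in qvector]

def rotate(qbit, delta):
    """Quantum rotation gate: shift the Q-bit angle by delta
    (toward 0 or toward 1, depending on the sign of delta)."""
    alpha, beta = qbit
    theta = math.atan2(beta, alpha) + delta
    return (math.cos(theta), math.sin(theta))

random.seed(0)
q = [make_qbit(math.pi / 4)] * 5   # equal superposition: P(1) = 0.5 per bit
print(observe(q))                  # a random 5-bit feature mask
q = [rotate(b, 0.3) for b in q]    # bias future observations toward 1
```

Because each Q-bit encodes a whole probability distribution over {0, 1}, a single Q-bit vector implicitly represents many candidate feature subsets at once, which is the source of the intensification/diversification tradeoff mentioned above.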