Convolutional neural networks (CNNs) gained popularity in the ImageNet Large Scale Visual Recognition Challenge (ILSVRC 2012) because of their recognition ability and computational efficiency. This paper proposes a palm vein recognition method based on a CNN, a type of deep learning network. The four main steps of palm vein recognition are image acquisition, image preprocessing, feature extraction, and matching. Palm vein images are acquired under near-infrared light, under which the veins in the palm are relatively prominent. To obtain a good vein image, many previous methods applied preprocessing to further enhance the image before extracting features and matching them for comparison. In recent years, CNNs have shown great advantages and have performed well in image classification. To reduce this early-stage image processing, a CNN is used to classify and recognize palm vein images. The deep CNNs AlexNet and VGG were trained to extract image features. The palm vein recognition rates of VGG-19, VGG-16, and AlexNet were 98.5%, 97.5%, and 96%, respectively.
User authentication methods include passwords, cards, and biometrics (O’Gorman,
The palm vein recognition process involves complicated image preprocessing, feature extraction, and matching steps. The process is shown in Fig.
The major contributions of this paper are as follows:
We propose a palm vein recognition method based on a CNN that uses palm vein features to identify users. CNNs reduce the laborious manual feature selection process and simplify authentication.
Palm vein recognition process.
Finally, we test the performance metrics of the CNNs, such as the false acceptance rate (FAR) and the false rejection rate (FRR).
Vein recognition includes palm veins (Kang and Wu,
Structure-based methods use line features and point features. This information is highly dependent on the selected coordinates and is very sensitive to spatial occlusion; thus, the region of interest (ROI) must be carefully selected. When poor-quality images and feature points are used in feature extraction, matching becomes difficult. This approach is also relatively sensitive to scaling, rotation, and displacement (Akinsowon and Alese,
Appearance-based (subspace-based) methods reduce features from a high-dimensional space to a low-dimensional space while retaining the required information. Common methods include PCA, LDA, ICA, and subspace clustering (SSC) (Liu and Zhang,
Statistical-based methods use local statistics, such as the mean and variance of each small area, which are calculated and treated as features. Gabor, wavelet, and Fourier transforms have been applied, as have local binary patterns (LBP) and local derivative patterns (LDP) (Aglio-Caballero
Local invariant-based methods construct an image pyramid to form a three-dimensional image space. The local maxima of each layer are obtained using a Hessian matrix, and a neighbourhood corresponding to the scale is selected at each feature point to find its main direction. With the main direction as the axis, coordinates can be established at each feature point; such methods include the scale invariant feature transform (SIFT) and speeded up robust features (SURF) (Gurunathan
Fusion, also a very popular approach, can improve the accuracy and security of the system, but it requires a relatively large amount of data. There are many fusion methods, combining the palm, fingers, veins, and face, and many fusion rules, including algorithms such as SVMs and neural networks (Garg
Deep learning.
In recent years, the prevalence of neural networks has made CNNs the most popular image classification method; the recognition rate is not inferior to previous methods (Huang
Given the research discussed above, and because of the convenience and practicality of CNNs, we propose a palm vein recognition method based on a CNN. The main contributions of our work are as follows. The vein pattern of the palm was photographed with a near-infrared camera and recognized with a pretrained CNN model called AlexNet (Krizhevsky
Using the CNN framework for palm vein recognition.
In this section, we introduce the CNN model and the network architecture. In the proposed scheme, the model is trained and used for identification. A CNN has advantages in image recognition. According to the pioneering work on LeNet (Lécun
In this paper, we analyse the important components of the CNN and its overall architecture. A CNN is a type of deep neural network with important advantages in image recognition. Each layer of a CNN extracts different features, and as the number of layers increases, the extracted features become more discriminative, improving classification.
Earlier neural networks were still very limited when dealing with problems such as computer vision, natural language processing, and voice recognition. Thanks to the convolution kernel, CNNs have advantages in processing multidimensional data. The most important components of a CNN are the convolutional layer, the activation function, the pooling layer, and the fully connected layer.
Convolution operation of
The processing performed by the convolutional layer is called the “convolution operation”. It uses filters to extract features and reduce noise. Because connections are local and weights are shared, the number of training parameters is significantly reduced. In high-dimensional images, fully connecting all neurons would require training an enormous number of parameters, making computation time consuming and causing severe overfitting. Local connections and weight sharing simplify parameter training, as shown in Fig.
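The convolution operation described above can be sketched as follows; this is a minimal illustrative example (CNN “convolution” is strictly cross-correlation), and the image and kernel values are arbitrary:

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2-D cross-correlation, as performed by a CNN convolutional layer:
    slide the filter over the image and sum the element-wise products."""
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

image = np.arange(16, dtype=float).reshape(4, 4)
kernel = np.array([[1.0, 0.0], [0.0, -1.0]])  # simple diagonal-difference filter
print(conv2d(image, kernel).shape)  # (3, 3): a 2x2 filter on a 4x4 input
```

Note how the output is smaller than the input and how each output value depends only on a local window, which is the local-connection property discussed above.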
The activation function introduces nonlinearity. Without an activation function, the output of a neural network is merely a linear function of its input, so building a deep network would be meaningless. The Sigmoid and Tanh functions were previously used as activation functions, but they are prone to vanishing gradients; when the network is deepened, training often stalls. The ReLU function effectively overcomes vanishing gradients and requires less computation, so it has gradually replaced the Sigmoid and Tanh functions.
The activation functions.
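The vanishing-gradient contrast between Sigmoid and ReLU can be seen numerically; this small sketch is illustrative, with arbitrary input values:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def relu(x):
    return np.maximum(0.0, x)

x = np.array([-10.0, 0.0, 10.0])
# The Sigmoid gradient s(x)(1 - s(x)) saturates for large |x|,
# which is the vanishing-gradient problem in deep networks.
sig_grad = sigmoid(x) * (1.0 - sigmoid(x))
# The ReLU gradient is exactly 1 for positive inputs, so it never saturates there.
relu_grad = (x > 0).astype(float)
print(sig_grad)   # tiny at x = ±10, at most 0.25 at x = 0
print(relu_grad)  # [0. 0. 1.]
```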
The features extracted by the convolutional layer have a high dimensionality. The primary role of the pooling layer is to reduce this dimensionality and thus avoid overfitting. The commonly used methods are max pooling and average pooling. Besides reducing dimensionality, pooling is robust to image translation and slight deformation. Figure
The max pooling processing.
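Max pooling can be sketched as follows; the 2x2 window with stride 2 is the common configuration, and the input values are arbitrary:

```python
import numpy as np

def max_pool(x, size=2, stride=2):
    """Max pooling: keep only the largest value in each window,
    reducing the spatial dimensions of the feature map."""
    oh = (x.shape[0] - size) // stride + 1
    ow = (x.shape[1] - size) // stride + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = x[i * stride:i * stride + size,
                          j * stride:j * stride + size].max()
    return out

x = np.array([[1.0, 3.0, 2.0, 4.0],
              [5.0, 6.0, 7.0, 8.0],
              [3.0, 2.0, 1.0, 0.0],
              [1.0, 2.0, 3.0, 4.0]])
print(max_pool(x))  # [[6. 8.] [3. 4.]]: each 2x2 block reduced to its maximum
```

Because only the maximum of each window survives, small shifts of the input often leave the output unchanged, which is the translation robustness mentioned above.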
The entire CNN acts as a classifier. The convolutional layers, pooling layers, and activation functions map the original data into a hidden feature space; the fully connected layer then maps this learned “distributed feature representation” to the sample label space. The result is classified by Softmax and normalized to obtain probability values, as shown in (
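The Softmax normalization at the output of the fully connected layer can be sketched as follows; the score values are arbitrary examples:

```python
import numpy as np

def softmax(z):
    """Normalize class scores into probabilities.
    Subtracting the max first is the standard numerical-stability trick."""
    e = np.exp(z - z.max())
    return e / e.sum()

scores = np.array([2.0, 1.0, 0.1])  # e.g. fully connected layer outputs
probs = softmax(scores)
print(probs.sum())     # 1.0: a valid probability distribution
print(probs.argmax())  # 0: the predicted class is the highest-scoring one
```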
Two palm vein databases are currently published, including a multispectral palm vein database. The acquisition equipment used by the Hong Kong Polytechnic University is a fixed, contact-type device (Zhang
Original palm vein images.
Krizhevsky
Structure of AlexNet.
Structure of VGG16 and VGG19.
In 2014, an image recognition method based on the VGG network was developed at the University of Oxford. VGG was the first to use small
In most machine learning, deep learning, and data mining tasks, we assume that the training and inference data follow the same distribution and come from the same feature space. In practical applications, however, this assumption is difficult to satisfy, and some problems often arise:
The number of labelled training samples is limited. For example, when dealing with classification in the target domain A, there may be a lack of sufficient training samples, while a large number of training samples exists in a related source domain B; however, domains A and B lie in different feature spaces, or their samples obey different distributions. The data distribution may also change with time, location, or other dynamic factors; previously collected data then become outdated, so the data must be recollected and the model rebuilt.
In this case, knowledge transfer, i.e. transferring the knowledge in domain B to domain A to improve classification in domain A without requiring much labelling effort, is a good choice. Transfer learning was proposed to solve this problem (Pan and Yang,
Transfer learning is a research topic in machine learning. A model is trained by transferring the parameters of an already trained model to a new problem. Since most data and tasks are related, transfer learning lets us reuse learned model parameters (which can be understood as the knowledge the model has learned), improving learning speed and efficiency without requiring training from scratch, as most networks do.
We adapt the pretrained AlexNet and VGG models from the 1 000 classes of the original models to our palm vein categories.
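The adaptation amounts to keeping the pretrained feature-extraction layers and replacing the 1 000-way output layer with a new layer sized to the palm vein classes. The following numpy sketch illustrates only the shape bookkeeping; the 4096-dimensional feature size matches AlexNet's last hidden layer, and all weight values here are random stand-ins, not the actual pretrained parameters:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for pretrained weights: a feature extractor producing
# 4096-dimensional features and a 1000-way ImageNet classifier head.
feature_dim, imagenet_classes = 4096, 1000
pretrained_head = rng.standard_normal((imagenet_classes, feature_dim))

# Transfer learning: keep the feature extractor, discard the old head,
# and attach a new randomly initialized head for the palm vein classes.
num_palm_classes = 50  # e.g. the 50 individuals in the first dataset
new_head = rng.standard_normal((num_palm_classes, feature_dim)) * 0.01

features = rng.standard_normal(feature_dim)  # stand-in for extracted features
logits = new_head @ features                 # scores for the 50 palm classes
print(logits.shape)  # (50,)
```

Only the new head (and optionally the later convolutional layers) needs substantial training, which is why transfer learning converges much faster than training from scratch.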
Table
Architecture network of AlexNet.
Table
Fully connected layer and output layer adjustment.
In Table
In this section, we discuss the results and analysis of the CNN palm vein recognition experiments. We constructed AlexNet and VGG in MATLAB using its deep learning CNN framework. The main process includes image acquisition, model training, parameter adjustment, and result identification.
We established two palm vein datasets. The first dataset contains twenty images from each of 50 individuals, for a total of 1 000 experimental images. The second dataset contains twenty images from each of 63 individuals, for a total of 1 260 experimental images. The size of a captured image is
Equipment of the first dataset.
Equipment | Device configuration |
System | Windows 10 64-bit |
CPU | Intel(R)Core(TM)i7-6500 CPU @ 2.50 GHz |
RAM | 8 GB |
GPU | GTX940MX |
Framework | MATLAB |
Equipment of the second dataset.
Equipment | Device configuration |
System | Windows 10 64-bit |
CPU | Intel(R)Core(TM)i7-8700 CPU @ 3.20 GHz |
RAM | 32 GB |
GPU | GTX1080 |
Framework | MATLAB |
The experimental environment for our equipment of the first dataset is shown in Table
Training process of AlexNet.
Training a CNN optimizes its many parameters. The parameter settings, as well as the network structure and its usage, affect the recognition effectiveness and training speed of the CNN. The AlexNet model is used as an example to illustrate the training process of palm vein recognition; its network structure is shown in Fig.
The first convolution layer (
Training iterations results of the first dataset.
Models | Iterations | Result |
AlexNet | 800 | 96% |
VGG-16 | 800 | 97.5% |
VGG-19 | 1000 | 98.5% |
In this paper, the experimental process includes randomly selecting the training and validation image sets, adjusting the batch size according to the performance of the hardware, and adjusting the learning rate. Hyperparameter adjustments affect the experimental results. In this process, the final numbers of iterations were 800, 800, and 1 000 for AlexNet, VGG-16, and VGG-19, respectively, as shown in Table
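The random selection of training and validation images can be sketched as follows. The paper does not state the split ratio, so the 15/5 per-person split below is purely an assumption for illustration:

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical per-person split of the first dataset:
# 20 images per person, e.g. 15 for training and 5 for validation.
num_people, images_per_person = 50, 20
train_idx, val_idx = [], []
for person in range(num_people):
    # Shuffle this person's image indices, then split them
    idx = rng.permutation(images_per_person) + person * images_per_person
    train_idx.extend(idx[:15])
    val_idx.extend(idx[15:])

print(len(train_idx), len(val_idx))  # 750 250
```

Splitting per person rather than over the whole pool guarantees that every individual appears in both sets, which matters for a closed-set identification task.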
The number of iterations in Table
Accuracy, iterations and loss for iterations of AlexNet of the first dataset.
Accuracy, iterations and loss for iterations of VGG-16 of the first dataset.
Accuracy, iterations and loss for iterations of VGG-19 of the first dataset.
The comparison with other authors’ research results is shown in Table
Test results for various algorithms of the first dataset.
Algorithm | Result |
LBP+PNN (Fronitasari and Gunawan, | 98%
SURF (Badrinath | 96.7%
Gabor (Lee, | 96.37%
LBPROT (Pratiwi | 96%
AlexNet (ours) | 96% |
VGG-16 (ours) | 97.5% |
VGG-19 (ours) | 98.5% |
Based on the performance of biometrics-based verification systems (Seshikala
Performance metrics of the first dataset.
Performance metrics | |||
No. | Models | FAR (%) | FRR (%) |
1 | AlexNet | 0 | 0.77 |
2 | VGG-16 | 0 | 0.65 |
3 | VGG-19 | 0 | 0.6 |
From Table
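The FAR and FRR reported above can be computed from matching scores as sketched below; the scores and threshold are arbitrary illustrative values, not data from our experiments:

```python
import numpy as np

def far_frr(genuine_scores, impostor_scores, threshold):
    """FAR: fraction of impostor attempts wrongly accepted (score >= threshold).
    FRR: fraction of genuine attempts wrongly rejected (score < threshold).
    Both returned as percentages."""
    far = np.mean(np.asarray(impostor_scores) >= threshold) * 100
    frr = np.mean(np.asarray(genuine_scores) < threshold) * 100
    return far, frr

genuine = np.array([0.95, 0.90, 0.88, 0.40])   # same-person match scores
impostor = np.array([0.10, 0.20, 0.30, 0.25])  # different-person match scores
far, frr = far_frr(genuine, impostor, threshold=0.5)
print(far, frr)  # 0.0 25.0
```

Raising the threshold lowers the FAR at the cost of a higher FRR, which is the usual security/convenience trade-off in verification systems.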
In this study, we divided the image dataset into two sets: a training set and a validation set. For the training and validation of the three models, AlexNet, VGG-16, and VGG-19, we set the number of iterations to 1 000.
The palm vein images in the database were collected with a near-infrared camera. Since these were contactless shots, each image had a different resolution and needed to be preprocessed. To obtain the best contrast enhancement while avoiding noise amplification, we used a new technique for contactless palm vein recognition: the input image is preprocessed with contrast limited adaptive histogram equalization (CLAHE) to enhance image quality and thus feature distinguishability. Figure
(a) Original image; (b) Enhanced image.
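The contrast-enhancement idea behind CLAHE can be illustrated with plain global histogram equalization, sketched below with numpy. This is a simplification: true CLAHE (e.g. OpenCV's `cv2.createCLAHE`) additionally works on local tiles and clips the histogram to limit noise amplification, which this global version does not do. The sample pixel values are arbitrary:

```python
import numpy as np

def hist_equalize(img):
    """Global histogram equalization for an 8-bit greyscale image:
    map each grey level through the normalized cumulative histogram,
    stretching a narrow intensity range across the full 0-255 range."""
    hist = np.bincount(img.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf_min = cdf[cdf > 0][0]
    lut = np.round((cdf - cdf_min) / (img.size - cdf_min) * 255).astype(np.uint8)
    return lut[img]

# A low-contrast patch: all values crowded into the 50-55 range
img = np.array([[50, 50, 51],
                [52, 52, 53],
                [54, 55, 55]], dtype=np.uint8)
out = hist_equalize(img)
print(out.min(), out.max())  # 0 255: contrast stretched to the full range
```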
Training processes and accuracy of the models of the second dataset.
The training processes of the three models are shown in Fig.
Training iteration results of the second dataset.
Models | Iterations | Result |
AlexNet | 1000 | 99.35% |
VGG-16 | 1000 | 99.45% |
VGG-19 | 1000 | 99.5% |
The smaller the loss, the better the model fits the training data. The loss function is the core of the empirical risk function and an important part of the structural risk function. Common loss functions include the following:
Hinge loss: mainly used in support vector machines (SVMs).
Cross-entropy loss (Softmax loss): used in logistic regression and Softmax classification.
Square loss: mainly used in ordinary least squares (OLS).
Exponential loss: mainly used in the AdaBoost ensemble learning algorithm.
Other losses, such as 0–1 loss and absolute value loss.
Here, we mainly use the cross-entropy (Softmax) loss. Figure
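For a single sample, the Softmax cross-entropy loss is the negative log-probability assigned to the true class, as sketched below; the logit values are arbitrary examples:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())  # max-subtraction for numerical stability
    return e / e.sum()

def cross_entropy(logits, true_class):
    """Softmax cross-entropy loss for one sample:
    -log of the probability the model assigns to the correct class."""
    probs = softmax(logits)
    return -np.log(probs[true_class])

# A confident correct prediction is penalized much less than an uncertain one
confident = cross_entropy(np.array([5.0, 0.0, 0.0]), true_class=0)
uncertain = cross_entropy(np.array([0.1, 0.0, 0.0]), true_class=0)
print(confident < uncertain)  # True
```

Minimizing this loss therefore pushes the network to assign high probability to the correct identity, which is exactly what the recognition-rate curves track during training.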
Training process and loss of the models of the second dataset.
Training results of four random users.
In addition, we also tested the models with two different graphics cards, and the training time and accuracy differed accordingly. The results are shown in Table
Performance comparison among models when using different graphics cards.
Models | GTX940 | GTX1080 | ||
Training time | Accuracy | Training time | Accuracy | |
AlexNet | 695 s | 96% | 83 s | 97% |
VGG-16 | 757 s | 97.5% | 515 s | 98% |
VGG-19 | 832 s | 98.5% | 560 s | 98.5% |
The statistics of our performance metrics for the three models of the second dataset are shown in Table
Performance metrics of the second dataset.
Performance metrics | |||
No. | Models | FAR (%) | FRR (%) |
1 | AlexNet | 0 | 0.5 |
2 | VGG-16 | 0 | 0.35 |
3 | VGG-19 | 0 | 0.3 |
The performance comparison between our approaches and others is shown in Table
Performance comparison among similar methods.
Methods | FAR (%) | FRR (%) | Accuracy (%) |
Gabor (Cancian | 0.32 | 1.75 | 99.6
LeNet (Zheng | 0 | 0.86 | 99.1
AlexNet (Zheng and Han, | 0 | 0.7 | 99.2
AlexNet (ours) | 0 | 0.5 | 99.35 |
VGG-16 (ours) | 0 | 0.35 | 99.45 |
VGG-19 (ours) | 0 | 0.3 | 99.5 |
This paper proposed a palm vein recognition method based on a convolutional neural network. With AlexNet and VGG, the entire palm vein recognition process is simplified, and the image processing, feature extraction, and matching steps are no longer as complicated. Compared with other methods, CNNs offer greater convenience and excellent accuracy, and the performance metrics are also good. In our first palm vein dataset, 1 000 images from 50 individuals were adopted for testing, and an FRR of 0.6% was achieved. In the second dataset, 1 260 images from 63 individuals were adopted for testing, and an FRR of 0.3% was achieved. These three models provide a new research approach for palm vein recognition and demonstrate the advantages of deep learning in the image field. We preprocessed the palm vein images with the CLAHE method to increase image contrast, highlight the vein features, and improve the accuracy of the three CNNs to above 99%. When different graphics cards are used, the training time changes enormously, while the accuracy is only slightly affected.
In the future, this study can be improved in two directions. First, the palm vein images used in our method need to be clear and uncontaminated, and the method does not handle shadows well. The CLAHE-based preprocessing proposed in this paper can be further improved to remove noise and shadows, which should improve the recognition rate. Second, the method can be extended to finger vein recognition so that it can be applied to security control on handheld mobile devices.