<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.0 20120330//EN" "JATS-journalpublishing1.dtd">
<article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="research-article">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">INFORMATICA</journal-id>
<journal-title-group><journal-title>Informatica</journal-title></journal-title-group>
<issn pub-type="epub">1822-8844</issn>
<issn pub-type="ppub">0868-4952</issn>
<issn-l>0868-4952</issn-l>
<publisher>
<publisher-name>Vilnius University</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="publisher-id">INFOR443</article-id>
<article-id pub-id-type="doi">10.15388/21-INFOR443</article-id>
<article-categories><subj-group subj-group-type="heading">
<subject>Research Article</subject></subj-group></article-categories>
<title-group>
<article-title>PFA-GAN: Pose Face Augmentation Based on Generative Adversarial Network</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name><surname>Zeno</surname><given-names>Bassel</given-names></name><email xlink:href="basilzeno@gmail.com">basilzeno@gmail.com</email><xref ref-type="aff" rid="j_infor443_aff_001">1</xref><xref ref-type="corresp" rid="cor1">∗</xref>
</contrib>
<contrib contrib-type="author">
<name><surname>Kalinovskiy</surname><given-names>Ilya</given-names></name><xref ref-type="aff" rid="j_infor443_aff_002">2</xref>
</contrib>
<contrib contrib-type="author">
<name><surname>Matveev</surname><given-names>Yuri</given-names></name><xref ref-type="aff" rid="j_infor443_aff_001">1</xref><xref ref-type="aff" rid="j_infor443_aff_002">2</xref>
</contrib>
<aff id="j_infor443_aff_001"><label>1</label><institution>ITMO University</institution>, Kronverkskiy Prospekt 49, St. Petersburg 197101, <country>Russia</country></aff>
<aff id="j_infor443_aff_002"><label>2</label><institution>STC-innovations Ltd.</institution>, Gelsingforsskaya Street 3, Building 11D, St. Petersburg 194044, <country>Russia</country></aff>
</contrib-group>
<author-notes>
<corresp id="cor1"><label>∗</label>Corresponding author.</corresp>
</author-notes>
<pub-date pub-type="ppub"><year>2021</year></pub-date><pub-date pub-type="epub"><day>29</day><month>1</month><year>2021</year></pub-date>
<volume>32</volume><issue>2</issue><fpage>425</fpage><lpage>440</lpage>
<history>
<date date-type="received"><month>6</month><year>2020</year></date>
<date date-type="accepted"><month>1</month><year>2021</year></date>
</history>
<permissions><copyright-statement>© 2021 Vilnius University</copyright-statement><copyright-year>2021</copyright-year>
<license license-type="open-access" xlink:href="http://creativecommons.org/licenses/by/4.0/">
<license-p>Open access article under the <ext-link ext-link-type="uri" xlink:href="http://creativecommons.org/licenses/by/4.0/">CC BY</ext-link> license.</license-p></license></permissions>
<abstract>
<p>In this work, we propose a novel framework based on Generative Adversarial Networks for pose face augmentation (PFA-GAN). It enables a controlled pose synthesis of a new face image from a source face given a driving one while preserving the identity of the source face. We introduce a method for training the framework in a fully self-supervised mode using a large-scale dataset of unconstrained face images. In addition, several augmentation strategies are presented to expand the training set. The face verification experimental results demonstrate the effectiveness of the presented augmentation strategies, as all augmented datasets outperform the baseline.</p>
</abstract>
<kwd-group>
<label>Key words</label>
<kwd>generative adversarial networks</kwd>
<kwd>face verification</kwd>
<kwd>visual data augmentation</kwd>
</kwd-group>
<funding-group>
<funding-statement>This work was financially supported by the Government of the Russian Federation (Grant 08-08).</funding-statement>
</funding-group>
</article-meta>
</front>
<body>
<sec id="j_infor443_s_001">
<label>1</label>
<title>Introduction</title>
<p>A person’s face plays a key role in the identification of individual members of our highly social species due to the delicate differences that make every human face unique. These variations of a face pattern also inform us about characteristics such as age, gender, and race. Over the last decade, many remarkable works based on Deep Neural Networks have demonstrated unprecedented performance on several computer vision tasks, such as facial landmark detection, face identification, face verification, face alignment, and emotion classification. In addition, they showed that achieving good generalization in unconstrained conditions strongly relies on training them on large and complex datasets. A well-annotated large-scale dataset can be both expensive and time-consuming to acquire. Hiring people to manually collect and annotate images is inefficient, since this manual process is widely recognized as error-prone. Furthermore, the existing face image datasets suffer from an insufficient amount of data for each person and an unbalanced distribution of pose data between the classes. In addition, there is a lack of variation compared to real-world samples. To cope with insufficient facial training data, visual data augmentation provides an effective alternative. It is a technique that enables practitioners to significantly increase the diversity of data available for training models by transforming collected real face samples. Traditional visual data augmentation methods alter the entire face image by transferring image pixel values to new positions or by shifting pixel colours to new values: for instance, zooming in and out, rotating or reflecting the original image, translating, applying distortion, and cropping. These generic methods have two main limitations. (1) They do not scale well with the number of variations of facial appearance, such as make-up, lighting, and skin colour. 
(2) Creating high-level content, such as a rotated head that preserves identity, is a challenging problem (Zeno <italic>et al.</italic>, <xref ref-type="bibr" rid="j_infor443_ref_025">2019b</xref>) that is still under study. The large discrepancy of head poses in the real world is a big challenge in face detection, identification (Farahani and Mohseni, <xref ref-type="bibr" rid="j_infor443_ref_007">2019</xref>), and verification (Ribarić <italic>et al.</italic>, <xref ref-type="bibr" rid="j_infor443_ref_018">2008</xref>), due to lighting variations and self-occlusion. Therefore, many methods have been proposed to generate face images with new poses. Pose synthesis methods can be classified into 2D geometry-based, 3D geometry-based, and learning-based approaches. The 2D- and 3D-based methods, which appeared earlier than the learning-based ones, have the obvious advantage of requiring only a small amount of training data. The 2D-based methods rely on building a PCA model of the face shape to control only yaw rotations (Feng <italic>et al.</italic>, <xref ref-type="bibr" rid="j_infor443_ref_008">2017</xref>), while the 3D-based methods synthesize face images with new pose variations using a 3D morphable face model (Crispell <italic>et al.</italic>, <xref ref-type="bibr" rid="j_infor443_ref_005">2017</xref>; Blanz and Vetter, <xref ref-type="bibr" rid="j_infor443_ref_002">1999</xref>; Zhu <italic>et al.</italic>, <xref ref-type="bibr" rid="j_infor443_ref_029">2016</xref>; Guo <italic>et al.</italic>, <xref ref-type="bibr" rid="j_infor443_ref_010">2017</xref>). 
In recent years, many learning-based methods have been proposed for face rotation, where most of them rely on a generative adversarial network (Tran <italic>et al.</italic>, <xref ref-type="bibr" rid="j_infor443_ref_022">2017</xref>; Tian <italic>et al.</italic>, <xref ref-type="bibr" rid="j_infor443_ref_021">2018</xref>; Cao <italic>et al.</italic>, <xref ref-type="bibr" rid="j_infor443_ref_003">2018a</xref>; Antoniou <italic>et al.</italic>, <xref ref-type="bibr" rid="j_infor443_ref_001">2018</xref>; Yin <italic>et al.</italic>, <xref ref-type="bibr" rid="j_infor443_ref_023">2017</xref>; Huang <italic>et al.</italic>, <xref ref-type="bibr" rid="j_infor443_ref_013">2017</xref>; Zeno <italic>et al.</italic>, <xref ref-type="bibr" rid="j_infor443_ref_024">2019a</xref>). For example, the methods DRGAN (Tran <italic>et al.</italic>, <xref ref-type="bibr" rid="j_infor443_ref_022">2017</xref>), CRGAN (Tian <italic>et al.</italic>, <xref ref-type="bibr" rid="j_infor443_ref_021">2018</xref>) and LB-GAN (Cao <italic>et al.</italic>, <xref ref-type="bibr" rid="j_infor443_ref_003">2018a</xref>) were proposed to rotate a face image around the yaw axis only. While DRGAN synthesizes a new pose even for extreme profiles (<inline-formula id="j_infor443_ineq_001"><alternatives>
<mml:math><mml:mo>±</mml:mo><mml:msup><mml:mrow><mml:mn>90</mml:mn></mml:mrow><mml:mrow><mml:mo>∘</mml:mo></mml:mrow></mml:msup></mml:math>
<tex-math><![CDATA[$\pm {90^{\circ }}$]]></tex-math></alternatives></inline-formula>), CRGAN learns “complete” representations to rotate unseen faces, and LB-GAN frontalizes a face image before generating the target pose. Frontalization is a particular case of pose transformation, often used to increase the accuracy of face recognition systems by rotating faces to the frontal view, as in the FF-GAN (Yin <italic>et al.</italic>, <xref ref-type="bibr" rid="j_infor443_ref_023">2017</xref>) and TP-GAN (Huang <italic>et al.</italic>, <xref ref-type="bibr" rid="j_infor443_ref_013">2017</xref>) works. Recently, Zeno <italic>et al.</italic> (<xref ref-type="bibr" rid="j_infor443_ref_024">2019a</xref>) proposed the IP-GAN framework to generate a face image of any specific identity with an arbitrary target pose by explicitly disentangling the identity and pose representations of a face image.</p>
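<p>The generic pixel-level augmentations listed above (reflection, cropping, colour shifting) can be illustrated with a minimal NumPy sketch; the function names and the toy image are ours, for illustration only, and are not part of the proposed framework.</p>

```python
import numpy as np

def augment_flip(img):
    """Horizontal reflection: one of the generic pixel-rearranging augmentations."""
    return img[:, ::-1]

def augment_crop(img, top, left, h, w):
    """Crop-style augmentation: keep a sub-window of the face image."""
    return img[top:top + h, left:left + w]

def augment_brightness(img, delta):
    """Colour-shift augmentation: move pixel values, clipped to the valid range."""
    return np.clip(img.astype(np.int16) + delta, 0, 255).astype(np.uint8)

# A toy 4x4 grayscale "image" with values 0..15.
img = np.arange(16, dtype=np.uint8).reshape(4, 4)
flipped = augment_flip(img)          # row [0,1,2,3] becomes [3,2,1,0]
cropped = augment_crop(img, 1, 1, 2, 2)
brighter = augment_brightness(img, 10)
```

As the text notes, such transforms only rearrange or rescale existing pixels; they cannot create high-level content such as a rotated head.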
<p>However, we argue that there are several drawbacks to the listed methods. The reposing method proposed in Crispell <italic>et al.</italic> (<xref ref-type="bibr" rid="j_infor443_ref_005">2017</xref>) produces many distortions in the face structure and does not keep the background fixed. The 3D-based approach (Blanz and Vetter, <xref ref-type="bibr" rid="j_infor443_ref_002">1999</xref>) fails with large poses and requires additional steps to generate the hidden regions (e.g. the teeth). The augmentation methods in Zhu <italic>et al.</italic> (<xref ref-type="bibr" rid="j_infor443_ref_029">2016</xref>) and Guo <italic>et al.</italic> (<xref ref-type="bibr" rid="j_infor443_ref_010">2017</xref>) reduce the realism of the generated images. On the other hand, the GAN learning-based methods (Tran <italic>et al.</italic>, <xref ref-type="bibr" rid="j_infor443_ref_022">2017</xref>; Tian <italic>et al.</italic>, <xref ref-type="bibr" rid="j_infor443_ref_021">2018</xref>; Cao <italic>et al.</italic>, <xref ref-type="bibr" rid="j_infor443_ref_003">2018a</xref>; Antoniou <italic>et al.</italic>, <xref ref-type="bibr" rid="j_infor443_ref_001">2018</xref>; Yin <italic>et al.</italic>, <xref ref-type="bibr" rid="j_infor443_ref_023">2017</xref>; Huang <italic>et al.</italic>, <xref ref-type="bibr" rid="j_infor443_ref_013">2017</xref>) obtain impressive results, but they need additional information such as conditioning labels (e.g. indicating a head pose or 3DMM parameters). 
More specifically, Yin <italic>et al.</italic> (<xref ref-type="bibr" rid="j_infor443_ref_023">2017</xref>) and Huang <italic>et al.</italic> (<xref ref-type="bibr" rid="j_infor443_ref_013">2017</xref>) need frontal face annotations, and the methods in (Tran <italic>et al.</italic>, <xref ref-type="bibr" rid="j_infor443_ref_022">2017</xref>; Tian <italic>et al.</italic>, <xref ref-type="bibr" rid="j_infor443_ref_021">2018</xref>; Cao <italic>et al.</italic>, <xref ref-type="bibr" rid="j_infor443_ref_003">2018a</xref>; Antoniou <italic>et al.</italic>, <xref ref-type="bibr" rid="j_infor443_ref_001">2018</xref>) need profile labels, whereas the IP-GAN (Zeno <italic>et al.</italic>, <xref ref-type="bibr" rid="j_infor443_ref_024">2019a</xref>) framework does not require any pose annotations. Despite this, IP-GAN fails to learn a disentangled representation of pose and identity on an unconstrained dataset of face images. Moreover, its learning scheme is very complex, which makes it difficult to converge.</p>
<p>To address the issues above, in this work we focus on pose face transformation for visual data augmentation using Generative Adversarial Networks. We propose a novel GAN framework that enables controlled synthesis of new face images from a single source face image, given a driving face image, while preserving the subject identity. The framework is trained in a self-supervised setting using pairs of source and driving face images. To demonstrate the performance of our model, face verification experiments are conducted using our proposed pose augmentation strategies. The framework architecture is described in Section <xref rid="j_infor443_s_006">3</xref>, and the self-supervised training method in Section <xref rid="j_infor443_s_010">4</xref>.</p>
<p>To conclude, our contributions are: 
<list>
<list-item id="j_infor443_li_001">
<label>•</label>
<p>We present the Pose Face Augmentation GAN (PFA-GAN) that can transform a pose of a source face image using another face image while preserving the identity of the source image, as well as the pose and the expression of the driving face image. The proposed framework consists of an identity encoder network, a pose encoder network, a generator, and a discriminator.</p>
</list-item>
<list-item id="j_infor443_li_002">
<label>•</label>
<p>We introduce a novel method for training the network in fully self-supervised settings using a large-scale dataset of unconstrained face images.</p>
</list-item>
<list-item id="j_infor443_li_003">
<label>•</label>
<p>We introduce some augmentation strategies that demonstrate how a baseline training set can be augmented to increase the pose variations.</p>
</list-item>
<list-item id="j_infor443_li_004">
<label>•</label>
<p>We conduct comparative experiments on face verification. Our results clearly show that the datasets augmented by our method outperform the baseline.</p>
</list-item>
</list>
</p>
</sec>
<sec id="j_infor443_s_002">
<label>2</label>
<title>Related Work</title>
<sec id="j_infor443_s_003">
<label>2.1</label>
<title>2D/3D Model-Based</title>
<p>Feng <italic>et al.</italic> (<xref ref-type="bibr" rid="j_infor443_ref_008">2017</xref>) proposed a 2D-based method to generate profile virtual faces with out-of-plane pose variations. They built a PCA-based shape model to control only the yaw rotations, since the pose varies in the same rotation direction as the original shape, i.e. left or right. Meanwhile, many approaches employed 3D face models for face pose translation (Crispell <italic>et al.</italic>, <xref ref-type="bibr" rid="j_infor443_ref_005">2017</xref>; Blanz and Vetter, <xref ref-type="bibr" rid="j_infor443_ref_002">1999</xref>; Zhu <italic>et al.</italic>, <xref ref-type="bibr" rid="j_infor443_ref_029">2016</xref>; Guo <italic>et al.</italic>, <xref ref-type="bibr" rid="j_infor443_ref_010">2017</xref>). Crispell <italic>et al.</italic> (<xref ref-type="bibr" rid="j_infor443_ref_005">2017</xref>) use a 3D face shape estimation method, followed by a rendering pipeline, for arbitrary reposing of faces and altering the lighting conditions. Although the results of the face re-lighting method are good, reposing the face produces many distortions in its structure. In addition, the background is not fixed, since it is rotated along with the direction of the face rotation. Blanz and Vetter (<xref ref-type="bibr" rid="j_infor443_ref_002">1999</xref>) proposed a method to estimate a 3D morphable face model by transforming the shape and the texture of a face image into a vector space representation. Then, faces with new poses and expressions can be modelled by modifying the estimated parameters to match the target 3D face model. This method is good at generating faces with small poses, but it fails with large poses due to the serious loss of facial texture. Furthermore, some additional steps are required when synthesizing facial expressions, such as smiling, to generate the hidden regions (e.g. the teeth). 
Zhu <italic>et al.</italic> (<xref ref-type="bibr" rid="j_infor443_ref_029">2016</xref>) introduced the 3D Dense Face Alignment (3DDFA) algorithm to solve face alignment in large poses. 3DDFA has also been used to profile faces, i.e. to synthesize the face appearance in profile view from medium-pose samples by predicting the depth of the face image. However, this augmentation method reduces the realism of the generated images. Guo <italic>et al.</italic> (<xref ref-type="bibr" rid="j_infor443_ref_010">2017</xref>) proposed a face inverse rendering method (3DFaceNet) to recover geometry and lighting from a single image. With it, they can generate new face images with different attributes. Nevertheless, their inverse rendering procedure has limitations and may lead to inaccurate fitting for face images (e.g. when estimating the coarse face geometry and pose parameters from a face image).</p>
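<p>The PCA shape model underlying the 2D-based method above can be sketched schematically as follows; the toy random vectors stand in for flattened 2-D landmark coordinates, and the sketch is our illustration of the general technique, not the authors' implementation.</p>

```python
import numpy as np

# Toy "training set": N face shapes, each a flattened vector of K 2-D landmarks.
rng = np.random.default_rng(0)
N, K = 50, 5
shapes = rng.normal(size=(N, 2 * K))

# Fit a PCA shape model: mean shape plus principal modes of variation.
mean_shape = shapes.mean(axis=0)
_, s, vt = np.linalg.svd(shapes - mean_shape, full_matrices=False)
modes = vt                        # each row is one mode of shape variation
stddev = s / np.sqrt(N - 1)       # spread of the data along each mode

# Synthesize a new shape by perturbing the first mode (e.g. a yaw-like change):
# moving 2 standard deviations along mode 0 while keeping the other modes at zero.
b = np.zeros(len(s))
b[0] = 2.0 * stddev[0]
new_shape = mean_shape + b @ modes
```

Walking a single coefficient of <italic>b</italic> through a range of values produces a family of plausibly deformed shapes, which is the mechanism such 2D methods use to control yaw.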
</sec>
<sec id="j_infor443_s_004">
<label>2.2</label>
<title>GANs-Based</title>
<p>Recently, generative adversarial network-based models (Tran <italic>et al.</italic>, <xref ref-type="bibr" rid="j_infor443_ref_022">2017</xref>; Tian <italic>et al.</italic>, <xref ref-type="bibr" rid="j_infor443_ref_021">2018</xref>; Cao <italic>et al.</italic>, <xref ref-type="bibr" rid="j_infor443_ref_003">2018a</xref>; Antoniou <italic>et al.</italic>, <xref ref-type="bibr" rid="j_infor443_ref_001">2018</xref>; Yin <italic>et al.</italic>, <xref ref-type="bibr" rid="j_infor443_ref_023">2017</xref>; Huang <italic>et al.</italic>, <xref ref-type="bibr" rid="j_infor443_ref_013">2017</xref>; Zeno <italic>et al.</italic>, <xref ref-type="bibr" rid="j_infor443_ref_024">2019a</xref>) have demonstrated an outstanding ability to synthesize face images with new poses. Tran <italic>et al.</italic> (<xref ref-type="bibr" rid="j_infor443_ref_022">2017</xref>) introduced the Disentangled Representation Learning Generative Adversarial Network (DR-GAN), where the model takes a face image of any pose as input and outputs a synthetic face, frontal or rotated to the target pose, even for extreme profiles (<inline-formula id="j_infor443_ineq_002"><alternatives>
<mml:math><mml:mo>±</mml:mo><mml:msup><mml:mrow><mml:mn>90</mml:mn></mml:mrow><mml:mrow><mml:mo>∘</mml:mo></mml:mrow></mml:msup></mml:math>
<tex-math><![CDATA[$\pm {90^{\circ }}$]]></tex-math></alternatives></inline-formula>). The discriminator in DR-GAN is also trained to predict the identity and the pose of the generated face. Tian <italic>et al.</italic> (<xref ref-type="bibr" rid="j_infor443_ref_021">2018</xref>) proposed the Complete Representation GAN-based method (CR-GAN), which follows a single-pathway design with a two-pathway learning scheme to learn the “complete” representations. Cao <italic>et al.</italic> (<xref ref-type="bibr" rid="j_infor443_ref_003">2018a</xref>) introduced Load Balanced Generative Adversarial Networks (LB-GAN) to rotate the yaw angle of an input face image to a target angle from a specified set of learned poses. LB-GAN consists of two modules: a normalizer, which first frontalizes the face image, and an editor, which then rotates the frontal face. Antoniou <italic>et al.</italic> (<xref ref-type="bibr" rid="j_infor443_ref_001">2018</xref>) introduced the Data Augmentation Generative Adversarial Network (DAGAN) based on the conditional GAN (cGAN). DAGAN captures cross-class transformations, since it takes any data item and generates other items of the same class. A particular case of pose transformation is face frontalization. It is often used to increase the accuracy of face recognition systems by rotating faces to the frontal view, which is more convenient for a recognition model. Many methods have been introduced to frontalize profile faces, such as the GAN-based methods (Yin <italic>et al.</italic>, <xref ref-type="bibr" rid="j_infor443_ref_023">2017</xref>; Huang <italic>et al.</italic>, <xref ref-type="bibr" rid="j_infor443_ref_013">2017</xref>). 
The FF-GAN method (Yin <italic>et al.</italic>, <xref ref-type="bibr" rid="j_infor443_ref_023">2017</xref>) relies on 3D knowledge for geometric shape estimation, while the TP-GAN method (Huang <italic>et al.</italic>, <xref ref-type="bibr" rid="j_infor443_ref_013">2017</xref>) infers it through data-driven learning. TP-GAN is a Two-Pathway Generative Adversarial Network for synthesizing photorealistic frontal views from profile images by simultaneously perceiving global structures and local details. FF-GAN is a Face Frontalization Generative Adversarial Network framework, which incorporates elements from both a deep 3DMM and face recognition CNNs to achieve high-quality, identity-preserving frontalization with less training data. Both TP-GAN and FF-GAN obtained impressive results on face frontalization, but they need explicit frontal-view annotations.</p>
</sec>
<sec id="j_infor443_s_005">
<label>2.3</label>
<title>IP-GAN</title>
<p>Zeno <italic>et al.</italic> (<xref ref-type="bibr" rid="j_infor443_ref_024">2019a</xref>) proposed a framework for Learning Identity and Pose Disentanglement in Generative Adversarial Networks (IP-GAN). To generate a face image of any specific identity with an arbitrary target pose, IP-GAN incorporates the pose information in the synthesis process. Different from the recent work (Yin <italic>et al.</italic>, <xref ref-type="bibr" rid="j_infor443_ref_023">2017</xref>) that uses a 3D morphable face simulator to generate pose information and the works (Tran <italic>et al.</italic>, <xref ref-type="bibr" rid="j_infor443_ref_022">2017</xref>; Tian <italic>et al.</italic>, <xref ref-type="bibr" rid="j_infor443_ref_021">2018</xref>; Huang <italic>et al.</italic>, <xref ref-type="bibr" rid="j_infor443_ref_013">2017</xref>) that encode pose annotation in a one-hot vector, IP-GAN can learn such information by explicitly disentangling identity and pose representation from a face image in fully self-supervised settings. The overall architecture of the IP-GAN framework is depicted in Zeno <italic>et al.</italic> (<xref ref-type="bibr" rid="j_infor443_ref_024">2019a</xref>), and consists of five parts: 1) the identity encoder network <inline-formula id="j_infor443_ineq_003"><alternatives>
<mml:math><mml:msub><mml:mrow><mml:mi mathvariant="italic">E</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">I</mml:mi></mml:mrow></mml:msub></mml:math>
<tex-math><![CDATA[${E_{I}}$]]></tex-math></alternatives></inline-formula> to extract the identity latent code; 2) the head pose encoder network <inline-formula id="j_infor443_ineq_004"><alternatives>
<mml:math><mml:msub><mml:mrow><mml:mi mathvariant="italic">E</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">P</mml:mi></mml:mrow></mml:msub></mml:math>
<tex-math><![CDATA[${E_{P}}$]]></tex-math></alternatives></inline-formula> to extract the pose latent code; 3) the generative network <italic>G</italic> to produce the final output image using the combined identity latent code and the extracted pose latent code; 4) the identity classification network <italic>C</italic> to preserve the identity by measuring the posterior probability of the subject identities; 5) the discriminative network <italic>D</italic> to distinguish between real and generated images. To train these networks, Zeno <italic>et al.</italic> (<xref ref-type="bibr" rid="j_infor443_ref_024">2019a</xref>) proposed a learning method to learn complete representations in fully self-supervised settings. It consists of two learning pathways: generation and transformation. While the generation pathway focuses on mapping the entire latent spaces of the encoders to high-quality images, the transformation pathway focuses on the synthesis of new face images with the target poses. This framework has several drawbacks: trained on an unconstrained dataset of face images, it fails to learn a disentangled representation of pose and identity, and its learning scheme is so complex that the GAN has difficulty converging.</p>
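<p>The data flow between the five IP-GAN components can be summarized with the following sketch, in which plain Python functions stand in for the networks; it is purely illustrative (the names and stub return values are ours) and mirrors only the wiring of the transformation pathway.</p>

```python
# Stand-ins for the five IP-GAN components; each "network" is a plain function
# so the data flow of the transformation pathway is explicit.
def E_I(x):            # identity encoder: face image -> identity latent code
    return ("id", x)

def E_P(x):            # pose encoder: face image -> pose latent code
    return ("pose", x)

def G(a, p):           # generator: (identity code, pose code) -> face image
    return {"identity": a, "pose": p}

def D(x):              # discriminator: image -> real/fake score (stubbed)
    return 0.5

def C(x):              # identity classifier: image -> subject posterior (stubbed)
    return {"subject_0": 1.0}

# Transformation pathway: render the source identity with the driving pose.
x_s, x_d = "source_face", "driving_face"
x_new = G(E_I(x_s), E_P(x_d))
```

The generated image is then scored by <italic>D</italic> for realism and by <italic>C</italic> for identity preservation, which is the coupling that makes the full IP-GAN learning scheme hard to converge.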
</sec>
</sec>
<sec id="j_infor443_s_006">
<label>3</label>
<title>The Proposed Framework</title>
<p>Inspired by the IP-GAN model, we present in this section a novel framework (PFA-GAN) for pose face augmentation based on a generative adversarial network.</p>
<sec id="j_infor443_s_007">
<label>3.1</label>
<title>PFA-GAN</title>
<p>To simplify the architecture proposed in Zeno <italic>et al.</italic> (<xref ref-type="bibr" rid="j_infor443_ref_024">2019a</xref>), and in line with the specific goal of PFA-GAN, generating face images with new poses, we remove the classification network <italic>C</italic>: there is no need to add a face recognition task to PFA-GAN, since preserving the subject identity in the generated face image is guaranteed by the content loss function. To reduce the complexity of the learning method, we propose removing the generation pathway and focusing on the transformation pathway, which consists of two sub-paths: reconstruction and transformation. The task of the head pose encoder network <inline-formula id="j_infor443_ineq_005"><alternatives>
<mml:math><mml:msub><mml:mrow><mml:mi mathvariant="italic">E</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">P</mml:mi></mml:mrow></mml:msub></mml:math>
<tex-math><![CDATA[${E_{P}}$]]></tex-math></alternatives></inline-formula> is to learn a pose representation: it has to isolate the pose information from other information in a face image, such as age, gender, skin colour, and identity. Isolating pose information in unconstrained images is a challenging task. To facilitate it, we reduce the amount of input information for the network <inline-formula id="j_infor443_ineq_006"><alternatives>
<mml:math><mml:msub><mml:mrow><mml:mi mathvariant="italic">E</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">P</mml:mi></mml:mrow></mml:msub></mml:math>
<tex-math><![CDATA[${E_{P}}$]]></tex-math></alternatives></inline-formula> by replacing a face image with an image of its landmarks.</p>
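<p>Rendering a face as a landmark image, the reduced input fed to the pose encoder, can be sketched as follows; the point layout and single-pixel rasterization are illustrative assumptions (in practice a landmark detector supplies the points, which may be drawn as connected contours over several channels).</p>

```python
import numpy as np

def landmarks_to_image(landmarks, size=64):
    """Rasterize normalized 2-D landmark points into a blank image so the pose
    encoder sees only geometry, not texture, colour, or identity cues."""
    img = np.zeros((size, size), dtype=np.uint8)
    for x, y in landmarks:
        xi = int(np.clip(x * (size - 1), 0, size - 1))
        yi = int(np.clip(y * (size - 1), 0, size - 1))
        img[yi, xi] = 255          # mark each landmark as a single bright pixel
    return img

# A few normalized (x, y) points standing in for detected facial landmarks.
pts = [(0.3, 0.4), (0.7, 0.4), (0.5, 0.6), (0.5, 0.8)]
lm_img = landmarks_to_image(pts)
```

Because the landmark image discards appearance entirely, the pose encoder cannot leak identity information into the pose code, which is the motivation for this substitution.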
</sec>
<sec id="j_infor443_s_008">
<label>3.2</label>
<title>Model Description</title>
<p>Let <inline-formula id="j_infor443_ineq_007"><alternatives>
<mml:math><mml:msub><mml:mrow><mml:mi mathvariant="italic">x</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">S</mml:mi></mml:mrow></mml:msub><mml:mo stretchy="false">∈</mml:mo><mml:mi mathvariant="double-struck">X</mml:mi></mml:math>
<tex-math><![CDATA[${x_{S}}\in \mathbb{X}$]]></tex-math></alternatives></inline-formula> be a source face image of a certain subject identity, and <inline-formula id="j_infor443_ineq_008"><alternatives>
<mml:math><mml:msub><mml:mrow><mml:mi mathvariant="italic">x</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">D</mml:mi></mml:mrow></mml:msub><mml:mo stretchy="false">∈</mml:mo><mml:mi mathvariant="double-struck">X</mml:mi></mml:math>
<tex-math><![CDATA[${x_{D}}\in \mathbb{X}$]]></tex-math></alternatives></inline-formula> be a driver face image to extract the target pose features. Our goal is to generate a new face image <inline-formula id="j_infor443_ineq_009"><alternatives>
<mml:math><mml:msub><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mi mathvariant="italic">x</mml:mi></mml:mrow><mml:mo>´</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi mathvariant="italic">S</mml:mi></mml:mrow></mml:msub></mml:math>
<tex-math><![CDATA[${\acute{x}_{S}}$]]></tex-math></alternatives></inline-formula> of the subject of <inline-formula id="j_infor443_ineq_010"><alternatives>
<mml:math><mml:msub><mml:mrow><mml:mi mathvariant="italic">x</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">S</mml:mi></mml:mrow></mml:msub></mml:math>
<tex-math><![CDATA[${x_{S}}$]]></tex-math></alternatives></inline-formula> with the extracted face pose of <inline-formula id="j_infor443_ineq_011"><alternatives>
<mml:math><mml:msub><mml:mrow><mml:mi mathvariant="italic">x</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">D</mml:mi></mml:mrow></mml:msub></mml:math>
<tex-math><![CDATA[${x_{D}}$]]></tex-math></alternatives></inline-formula>. To achieve this goal, we assume that each image <inline-formula id="j_infor443_ineq_012"><alternatives>
<mml:math><mml:mi mathvariant="italic">x</mml:mi><mml:mo stretchy="false">∈</mml:mo><mml:mi mathvariant="double-struck">X</mml:mi></mml:math>
<tex-math><![CDATA[$x\in \mathbb{X}$]]></tex-math></alternatives></inline-formula> is generated from an identity embedding vector <inline-formula id="j_infor443_ineq_013"><alternatives>
<mml:math><mml:mi mathvariant="italic">a</mml:mi><mml:mo stretchy="false">∈</mml:mo><mml:mi mathvariant="double-struck">A</mml:mi></mml:math>
<tex-math><![CDATA[$a\in \mathbb{A}$]]></tex-math></alternatives></inline-formula> and a pose embedding vector <inline-formula id="j_infor443_ineq_014"><alternatives>
<mml:math><mml:mi mathvariant="italic">p</mml:mi><mml:mo stretchy="false">∈</mml:mo><mml:mi mathvariant="double-struck">P</mml:mi></mml:math>
<tex-math><![CDATA[$p\in \mathbb{P}$]]></tex-math></alternatives></inline-formula>. In other words, <inline-formula id="j_infor443_ineq_015"><alternatives>
<mml:math><mml:msub><mml:mrow><mml:mi mathvariant="italic">x</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">S</mml:mi></mml:mrow></mml:msub></mml:math>
<tex-math><![CDATA[${x_{S}}$]]></tex-math></alternatives></inline-formula>, <inline-formula id="j_infor443_ineq_016"><alternatives>
<mml:math><mml:msub><mml:mrow><mml:mi mathvariant="italic">x</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">D</mml:mi></mml:mrow></mml:msub></mml:math>
<tex-math><![CDATA[${x_{D}}$]]></tex-math></alternatives></inline-formula> are synthesized by the pair <inline-formula id="j_infor443_ineq_017"><alternatives>
<mml:math><mml:mo mathvariant="normal" fence="true" stretchy="false">(</mml:mo><mml:msub><mml:mrow><mml:mi mathvariant="italic">a</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">S</mml:mi></mml:mrow></mml:msub><mml:mo mathvariant="normal">,</mml:mo><mml:msub><mml:mrow><mml:mi mathvariant="italic">p</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">S</mml:mi></mml:mrow></mml:msub><mml:mo mathvariant="normal" fence="true" stretchy="false">)</mml:mo></mml:math>
<tex-math><![CDATA[$({a_{S}},{p_{S}})$]]></tex-math></alternatives></inline-formula> and the pair (<inline-formula id="j_infor443_ineq_018"><alternatives>
<mml:math><mml:msub><mml:mrow><mml:mi mathvariant="italic">a</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">D</mml:mi></mml:mrow></mml:msub></mml:math>
<tex-math><![CDATA[${a_{D}}$]]></tex-math></alternatives></inline-formula>, <inline-formula id="j_infor443_ineq_019"><alternatives>
<mml:math><mml:msub><mml:mrow><mml:mi mathvariant="italic">p</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">D</mml:mi></mml:mrow></mml:msub></mml:math>
<tex-math><![CDATA[${p_{D}}$]]></tex-math></alternatives></inline-formula>), respectively. As a result, the new face image <inline-formula id="j_infor443_ineq_020"><alternatives>
<mml:math><mml:msub><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mi mathvariant="italic">x</mml:mi></mml:mrow><mml:mo>´</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi mathvariant="italic">S</mml:mi></mml:mrow></mml:msub></mml:math>
<tex-math><![CDATA[${\acute{x}_{S}}$]]></tex-math></alternatives></inline-formula> is generated by the pair <inline-formula id="j_infor443_ineq_021"><alternatives>
<mml:math><mml:mo mathvariant="normal" fence="true" stretchy="false">(</mml:mo><mml:msub><mml:mrow><mml:mi mathvariant="italic">a</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">S</mml:mi></mml:mrow></mml:msub><mml:mo mathvariant="normal">,</mml:mo><mml:msub><mml:mrow><mml:mi mathvariant="italic">p</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">D</mml:mi></mml:mrow></mml:msub><mml:mo mathvariant="normal" fence="true" stretchy="false">)</mml:mo></mml:math>
<tex-math><![CDATA[$({a_{S}},{p_{D}})$]]></tex-math></alternatives></inline-formula>.</p>
</sec>
<sec id="j_infor443_s_009">
<label>3.3</label>
<title>Framework Architecture</title>
<fig id="j_infor443_fig_001">
<label>Fig. 1</label>
<caption>
<p>The proposed framework architecture: the pose encoder network, the identity encoder network, the generator, and the discriminator. The learning scheme, from left to right: the reconstruction sub-path and the transformation sub-path.</p>
</caption>
<graphic xlink:href="infor443_g001.jpg"/>
</fig>
<p>The proposed framework consists of the following four components, see Fig. <xref rid="j_infor443_fig_001">1</xref>:</p>
<list>
<list-item id="j_infor443_li_005">
<label>•</label>
<p>The pose encoder network <inline-formula id="j_infor443_ineq_022"><alternatives>
<mml:math><mml:msub><mml:mrow><mml:mi mathvariant="italic">E</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">P</mml:mi></mml:mrow></mml:msub><mml:mo mathvariant="normal" fence="true" stretchy="false">(</mml:mo><mml:msub><mml:mrow><mml:mi mathvariant="italic">l</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">D</mml:mi></mml:mrow></mml:msub><mml:mo>;</mml:mo><mml:msub><mml:mrow><mml:mi mathvariant="normal">Θ</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">P</mml:mi></mml:mrow></mml:msub><mml:mo mathvariant="normal" fence="true" stretchy="false">)</mml:mo></mml:math>
<tex-math><![CDATA[${E_{P}}({l_{D}};{\Theta _{P}})$]]></tex-math></alternatives></inline-formula> receives a three-channel image of driving landmarks <inline-formula id="j_infor443_ineq_023"><alternatives>
<mml:math><mml:msub><mml:mrow><mml:mi mathvariant="italic">l</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">D</mml:mi></mml:mrow></mml:msub><mml:mo stretchy="false">∈</mml:mo><mml:mi mathvariant="double-struck">L</mml:mi></mml:math>
<tex-math><![CDATA[${l_{D}}\in \mathbb{L}$]]></tex-math></alternatives></inline-formula> and maps it into a pose embedding vector <inline-formula id="j_infor443_ineq_024"><alternatives>
<mml:math><mml:msub><mml:mrow><mml:mi mathvariant="italic">p</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">D</mml:mi></mml:mrow></mml:msub></mml:math>
<tex-math><![CDATA[${p_{D}}$]]></tex-math></alternatives></inline-formula>. Here <inline-formula id="j_infor443_ineq_025"><alternatives>
<mml:math><mml:msub><mml:mrow><mml:mi mathvariant="normal">Θ</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">P</mml:mi></mml:mrow></mml:msub></mml:math>
<tex-math><![CDATA[${\Theta _{P}}$]]></tex-math></alternatives></inline-formula> denotes the network parameters that are learned in a way that allows the vector <inline-formula id="j_infor443_ineq_026"><alternatives>
<mml:math><mml:msub><mml:mrow><mml:mi mathvariant="italic">p</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">D</mml:mi></mml:mrow></mml:msub></mml:math>
<tex-math><![CDATA[${p_{D}}$]]></tex-math></alternatives></inline-formula> to represent only the pose information of the driving image. We denote by <inline-formula id="j_infor443_ineq_027"><alternatives>
<mml:math><mml:msub><mml:mrow><mml:mi mathvariant="italic">p</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">S</mml:mi></mml:mrow></mml:msub></mml:math>
<tex-math><![CDATA[${p_{S}}$]]></tex-math></alternatives></inline-formula>, <inline-formula id="j_infor443_ineq_028"><alternatives>
<mml:math><mml:msub><mml:mrow><mml:mi mathvariant="italic">p</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">D</mml:mi></mml:mrow></mml:msub></mml:math>
<tex-math><![CDATA[${p_{D}}$]]></tex-math></alternatives></inline-formula> the pose embedding vectors for the landmark images <inline-formula id="j_infor443_ineq_029"><alternatives>
<mml:math><mml:msub><mml:mrow><mml:mi mathvariant="italic">l</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">S</mml:mi></mml:mrow></mml:msub></mml:math>
<tex-math><![CDATA[${l_{S}}$]]></tex-math></alternatives></inline-formula>, <inline-formula id="j_infor443_ineq_030"><alternatives>
<mml:math><mml:msub><mml:mrow><mml:mi mathvariant="italic">l</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">D</mml:mi></mml:mrow></mml:msub></mml:math>
<tex-math><![CDATA[${l_{D}}$]]></tex-math></alternatives></inline-formula>, respectively.</p>
</list-item>
<list-item id="j_infor443_li_006">
<label>•</label>
<p>The identity encoder network <inline-formula id="j_infor443_ineq_031"><alternatives>
<mml:math><mml:msub><mml:mrow><mml:mi mathvariant="italic">E</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">A</mml:mi></mml:mrow></mml:msub><mml:mo mathvariant="normal" fence="true" stretchy="false">(</mml:mo><mml:msub><mml:mrow><mml:mi mathvariant="italic">x</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">S</mml:mi></mml:mrow></mml:msub><mml:mo>;</mml:mo><mml:msub><mml:mrow><mml:mi mathvariant="normal">Θ</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">A</mml:mi></mml:mrow></mml:msub><mml:mo mathvariant="normal" fence="true" stretchy="false">)</mml:mo></mml:math>
<tex-math><![CDATA[${E_{A}}({x_{S}};{\Theta _{A}})$]]></tex-math></alternatives></inline-formula> takes a source face image <inline-formula id="j_infor443_ineq_032"><alternatives>
<mml:math><mml:msub><mml:mrow><mml:mi mathvariant="italic">x</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">S</mml:mi></mml:mrow></mml:msub></mml:math>
<tex-math><![CDATA[${x_{S}}$]]></tex-math></alternatives></inline-formula> to extract an <italic>N</italic>-dimensional vector <inline-formula id="j_infor443_ineq_033"><alternatives>
<mml:math><mml:msub><mml:mrow><mml:mi mathvariant="italic">a</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">S</mml:mi></mml:mrow></mml:msub></mml:math>
<tex-math><![CDATA[${a_{S}}$]]></tex-math></alternatives></inline-formula> that contains source-specific information, such as the person’s identity and skin tone. Here <inline-formula id="j_infor443_ineq_034"><alternatives>
<mml:math><mml:msub><mml:mrow><mml:mi mathvariant="normal">Θ</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">A</mml:mi></mml:mrow></mml:msub></mml:math>
<tex-math><![CDATA[${\Theta _{A}}$]]></tex-math></alternatives></inline-formula> denotes the network parameters that are learned by our two-sub-path learning method.</p>
</list-item>
<list-item id="j_infor443_li_007">
<label>•</label>
<p>The generator <inline-formula id="j_infor443_ineq_035"><alternatives>
<mml:math><mml:mi mathvariant="italic">G</mml:mi><mml:mo mathvariant="normal" fence="true" stretchy="false">(</mml:mo><mml:msub><mml:mrow><mml:mi mathvariant="italic">a</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">S</mml:mi></mml:mrow></mml:msub><mml:mo mathvariant="normal">,</mml:mo><mml:msub><mml:mrow><mml:mi mathvariant="italic">p</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">D</mml:mi></mml:mrow></mml:msub><mml:mo>;</mml:mo><mml:msub><mml:mrow><mml:mi mathvariant="normal">Θ</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">G</mml:mi></mml:mrow></mml:msub><mml:mo mathvariant="normal" fence="true" stretchy="false">)</mml:mo></mml:math>
<tex-math><![CDATA[$G({a_{S}},{p_{D}};{\Theta _{G}})$]]></tex-math></alternatives></inline-formula> takes the pose embedding vector <inline-formula id="j_infor443_ineq_036"><alternatives>
<mml:math><mml:msub><mml:mrow><mml:mi mathvariant="italic">p</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">D</mml:mi></mml:mrow></mml:msub></mml:math>
<tex-math><![CDATA[${p_{D}}$]]></tex-math></alternatives></inline-formula> and the identity embedding vector <inline-formula id="j_infor443_ineq_037"><alternatives>
<mml:math><mml:msub><mml:mrow><mml:mi mathvariant="italic">a</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">S</mml:mi></mml:mrow></mml:msub></mml:math>
<tex-math><![CDATA[${a_{S}}$]]></tex-math></alternatives></inline-formula> which is extracted from the source face image and outputs a synthesized target face image <inline-formula id="j_infor443_ineq_038"><alternatives>
<mml:math><mml:msub><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mi mathvariant="italic">x</mml:mi></mml:mrow><mml:mo>´</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi mathvariant="italic">S</mml:mi></mml:mrow></mml:msub></mml:math>
<tex-math><![CDATA[${\acute{x}_{S}}$]]></tex-math></alternatives></inline-formula>. During the two-sub-path learning method, the network parameters <inline-formula id="j_infor443_ineq_039"><alternatives>
<mml:math><mml:msub><mml:mrow><mml:mi mathvariant="normal">Θ</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">G</mml:mi></mml:mrow></mml:msub></mml:math>
<tex-math><![CDATA[${\Theta _{G}}$]]></tex-math></alternatives></inline-formula> are trained directly.</p>
</list-item>
<list-item id="j_infor443_li_008">
<label>•</label>
<p>The discriminator <inline-formula id="j_infor443_ineq_040"><alternatives>
<mml:math><mml:mi mathvariant="italic">D</mml:mi><mml:mo mathvariant="normal" fence="true" stretchy="false">(</mml:mo><mml:msub><mml:mrow><mml:mi mathvariant="italic">x</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">D</mml:mi></mml:mrow></mml:msub><mml:mo mathvariant="normal">,</mml:mo><mml:msub><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mi mathvariant="italic">x</mml:mi></mml:mrow><mml:mo>´</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi mathvariant="italic">S</mml:mi></mml:mrow></mml:msub><mml:mo>;</mml:mo><mml:msub><mml:mrow><mml:mi mathvariant="normal">Θ</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">D</mml:mi><mml:mi mathvariant="italic">i</mml:mi><mml:mi mathvariant="italic">s</mml:mi></mml:mrow></mml:msub><mml:mo mathvariant="normal" fence="true" stretchy="false">)</mml:mo></mml:math>
<tex-math><![CDATA[$D({x_{D}},{\acute{x}_{S}};{\Theta _{Dis}})$]]></tex-math></alternatives></inline-formula> takes the driving face image and the generated one <inline-formula id="j_infor443_ineq_041"><alternatives>
<mml:math><mml:msub><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mi mathvariant="italic">x</mml:mi></mml:mrow><mml:mo>´</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi mathvariant="italic">S</mml:mi></mml:mrow></mml:msub></mml:math>
<tex-math><![CDATA[${\acute{x}_{S}}$]]></tex-math></alternatives></inline-formula>, and predicts whether an input image is real or synthesized. Here <inline-formula id="j_infor443_ineq_042"><alternatives>
<mml:math><mml:msub><mml:mrow><mml:mi mathvariant="normal">Θ</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">D</mml:mi><mml:mi mathvariant="italic">i</mml:mi><mml:mi mathvariant="italic">s</mml:mi></mml:mrow></mml:msub></mml:math>
<tex-math><![CDATA[${\Theta _{Dis}}$]]></tex-math></alternatives></inline-formula> denotes the network parameters of the discriminator.</p>
</list-item>
</list>
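<p>As an illustration, the four components above can be sketched in PyTorch as follows. This is a minimal sketch with assumed toy layer sizes and embedding dimensions, not the exact network design of the paper:</p>
<preformat>
import torch
import torch.nn as nn

class PoseEncoder(nn.Module):
    """E_P: maps a 3-channel landmark image l_D to a pose embedding p_D."""
    def __init__(self, pose_dim=64):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 32, 4, 2, 1), nn.ReLU(),
            nn.Conv2d(32, 64, 4, 2, 1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.fc = nn.Linear(64, pose_dim)
    def forward(self, l):
        return self.fc(self.conv(l))

class IdentityEncoder(nn.Module):
    """E_A: maps a source face image x_S to an N-dimensional identity vector a_S."""
    def __init__(self, id_dim=128):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 32, 4, 2, 1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.fc = nn.Linear(32, id_dim)
    def forward(self, x):
        return self.fc(self.conv(x))

class Generator(nn.Module):
    """G: decodes the concatenated pair [a_S, p_D] into a face image."""
    def __init__(self, id_dim=128, pose_dim=64, out_size=32):
        super().__init__()
        self.out_size = out_size
        self.fc = nn.Linear(id_dim + pose_dim, 3 * out_size * out_size)
    def forward(self, a, p):
        z = torch.cat([a, p], dim=1)
        return torch.tanh(self.fc(z)).view(-1, 3, self.out_size, self.out_size)

class Discriminator(nn.Module):
    """D: returns one realism score per input image."""
    def __init__(self):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 32, 4, 2, 1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.fc = nn.Linear(32, 1)
    def forward(self, x):
        return self.fc(self.conv(x))
</preformat>
<p>In this sketch each encoder reduces its input to a single vector, the generator decodes the concatenated identity and pose vectors, and the discriminator outputs one score per image.</p>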
</sec>
</sec>
<sec id="j_infor443_s_010">
<label>4</label>
<title>The Proposed Learning Algorithm</title>
<p>In this section, we present our method for learning a pose face augmentation model (PFA-GAN). To this end, the learning scheme is divided into two sub-paths, reconstruction and transformation, see Fig. <xref rid="j_infor443_fig_001">1</xref>. While the reconstruction sub-path learns to generate a face image with the target pose, the transformation sub-path learns to synthesize the target face image while preserving the identity of the subject. At each iteration, one of the two sub-paths is selected at random, each with probability 0.5.</p>
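<p>The random sub-path selection at each iteration can be sketched as follows (an illustrative snippet; the names are hypothetical, not from the paper):</p>
<preformat>
import random

SUB_PATHS = ("reconstruction", "transformation")

def pick_sub_path(rng=random):
    # At each training iteration one sub-path is chosen uniformly, p = 0.5 each.
    return rng.choice(SUB_PATHS)

# Quick check that both sub-paths are exercised about equally often.
counts = {name: 0 for name in SUB_PATHS}
for _ in range(10000):
    counts[pick_sub_path()] += 1
</preformat>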
<sec id="j_infor443_s_011">
<label>4.1</label>
<title>Reconstruction Sub-Path</title>
<p>The reconstruction pathway trains the generator <italic>G</italic>, the pose encoder network <inline-formula id="j_infor443_ineq_043"><alternatives>
<mml:math><mml:msub><mml:mrow><mml:mi mathvariant="italic">E</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">P</mml:mi></mml:mrow></mml:msub></mml:math>
<tex-math><![CDATA[${E_{P}}$]]></tex-math></alternatives></inline-formula> and the discriminator <italic>D</italic>. Here the identity encoder network <inline-formula id="j_infor443_ineq_044"><alternatives>
<mml:math><mml:msub><mml:mrow><mml:mi mathvariant="italic">E</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">A</mml:mi></mml:mrow></mml:msub></mml:math>
<tex-math><![CDATA[${E_{A}}$]]></tex-math></alternatives></inline-formula> is not involved in the learning process since the network <inline-formula id="j_infor443_ineq_045"><alternatives>
<mml:math><mml:msub><mml:mrow><mml:mi mathvariant="italic">E</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">P</mml:mi></mml:mrow></mml:msub></mml:math>
<tex-math><![CDATA[${E_{P}}$]]></tex-math></alternatives></inline-formula> learns the pose representations of the driving face images, and the generator <italic>G</italic> synthesizes a face image from the driving pose embedding vector while the identity embedding vector is filled with random values. Hence, given a random noise vector drawn from a uniform distribution <inline-formula id="j_infor443_ineq_046"><alternatives>
<mml:math><mml:msup><mml:mrow><mml:mi mathvariant="italic">a</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">z</mml:mi></mml:mrow></mml:msup><mml:mo stretchy="false">∈</mml:mo><mml:mi mathvariant="italic">Z</mml:mi></mml:math>
<tex-math><![CDATA[${a^{z}}\in Z$]]></tex-math></alternatives></inline-formula> and the pose embedding vector of the driving landmark image <inline-formula id="j_infor443_ineq_047"><alternatives>
<mml:math><mml:msub><mml:mrow><mml:mi mathvariant="italic">p</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">D</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:msub><mml:mrow><mml:mi mathvariant="italic">E</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">P</mml:mi></mml:mrow></mml:msub><mml:mo mathvariant="normal" fence="true" stretchy="false">(</mml:mo><mml:msub><mml:mrow><mml:mi mathvariant="italic">l</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">D</mml:mi></mml:mrow></mml:msub><mml:mo mathvariant="normal" fence="true" stretchy="false">)</mml:mo></mml:math>
<tex-math><![CDATA[${p_{D}}={E_{P}}({l_{D}})$]]></tex-math></alternatives></inline-formula>, we concatenate them in the latent space <inline-formula id="j_infor443_ineq_048"><alternatives>
<mml:math><mml:mi mathvariant="italic">z</mml:mi><mml:mo>=</mml:mo><mml:mo fence="true" stretchy="false">[</mml:mo><mml:msup><mml:mrow><mml:mi mathvariant="italic">a</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">z</mml:mi></mml:mrow></mml:msup><mml:mo mathvariant="normal">,</mml:mo><mml:msub><mml:mrow><mml:mi mathvariant="italic">p</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">D</mml:mi></mml:mrow></mml:msub><mml:mo fence="true" stretchy="false">]</mml:mo></mml:math>
<tex-math><![CDATA[$z=[{a^{z}},{p_{D}}]$]]></tex-math></alternatives></inline-formula> and feed them to the generator which aims to generate a realistic face image <inline-formula id="j_infor443_ineq_049"><alternatives>
<mml:math><mml:msup><mml:mrow><mml:mi mathvariant="italic">x</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">z</mml:mi></mml:mrow></mml:msup><mml:mo>=</mml:mo><mml:mi mathvariant="italic">G</mml:mi><mml:mo mathvariant="normal" fence="true" stretchy="false">(</mml:mo><mml:msup><mml:mrow><mml:mi mathvariant="italic">a</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">z</mml:mi></mml:mrow></mml:msup><mml:mo mathvariant="normal">,</mml:mo><mml:msub><mml:mrow><mml:mi mathvariant="italic">p</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">D</mml:mi></mml:mrow></mml:msub><mml:mo mathvariant="normal" fence="true" stretchy="false">)</mml:mo></mml:math>
<tex-math><![CDATA[${x^{z}}=G({a^{z}},{p_{D}})$]]></tex-math></alternatives></inline-formula> under the driving pose latent vector <inline-formula id="j_infor443_ineq_050"><alternatives>
<mml:math><mml:msub><mml:mrow><mml:mi mathvariant="italic">p</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">D</mml:mi></mml:mrow></mml:msub></mml:math>
<tex-math><![CDATA[${p_{D}}$]]></tex-math></alternatives></inline-formula>. Similar to the original GAN work (Huang <italic>et al.</italic>, <xref ref-type="bibr" rid="j_infor443_ref_012">2007</xref>), the generative network <italic>G</italic> and the discriminative network <italic>D</italic> compete with each other in a two-player min-max game. While the discriminator <italic>D</italic> tries to distinguish real images from the output of <italic>G</italic>, the generator <italic>G</italic> tries to fool the network <italic>D</italic>. Specifically, <italic>D</italic> is trained to differentiate the fake image <inline-formula id="j_infor443_ineq_051"><alternatives>
<mml:math><mml:msup><mml:mrow><mml:mi mathvariant="italic">x</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">z</mml:mi></mml:mrow></mml:msup></mml:math>
<tex-math><![CDATA[${x^{z}}$]]></tex-math></alternatives></inline-formula> from the real one <inline-formula id="j_infor443_ineq_052"><alternatives>
<mml:math><mml:msub><mml:mrow><mml:mi mathvariant="italic">x</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">D</mml:mi></mml:mrow></mml:msub></mml:math>
<tex-math><![CDATA[${x_{D}}$]]></tex-math></alternatives></inline-formula>. Thus, <italic>D</italic> minimizes: 
<disp-formula id="j_infor443_eq_001">
<label>(1)</label><alternatives>
<mml:math display="block"><mml:mtable displaystyle="true" columnalign="right"><mml:mtr><mml:mtd class="align-odd"><mml:msub><mml:mrow><mml:mi mathvariant="italic">L</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">D</mml:mi><mml:mo>−</mml:mo><mml:msub><mml:mrow><mml:mi mathvariant="italic">adv</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">recon</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:msub><mml:mrow><mml:mi mathvariant="italic">E</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">l</mml:mi><mml:mo stretchy="false">∼</mml:mo><mml:msub><mml:mrow><mml:mi mathvariant="script">P</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">l</mml:mi></mml:mrow></mml:msub><mml:mo mathvariant="normal">,</mml:mo><mml:msup><mml:mrow><mml:mi mathvariant="italic">a</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">z</mml:mi></mml:mrow></mml:msup><mml:mo stretchy="false">∼</mml:mo><mml:msub><mml:mrow><mml:mi mathvariant="script">P</mml:mi></mml:mrow><mml:mrow><mml:msup><mml:mrow><mml:mi mathvariant="italic">a</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">z</mml:mi></mml:mrow></mml:msup></mml:mrow></mml:msub></mml:mrow></mml:msub><mml:mo fence="true" maxsize="1.19em" minsize="1.19em">[</mml:mo><mml:mi mathvariant="italic">D</mml:mi><mml:mo mathvariant="normal" fence="true" maxsize="1.19em" minsize="1.19em">(</mml:mo><mml:mi mathvariant="italic">G</mml:mi><mml:mo mathvariant="normal" fence="true" maxsize="1.19em" minsize="1.19em">(</mml:mo><mml:msup><mml:mrow><mml:mi mathvariant="italic">a</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">z</mml:mi></mml:mrow></mml:msup><mml:mo mathvariant="normal">,</mml:mo><mml:msub><mml:mrow><mml:mi mathvariant="italic">E</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">P</mml:mi></mml:mrow></mml:msub><mml:mo mathvariant="normal" fence="true" stretchy="false">(</mml:mo><mml:msub><mml:mrow><mml:mi 
mathvariant="italic">l</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">D</mml:mi></mml:mrow></mml:msub><mml:mo mathvariant="normal" fence="true" stretchy="false">)</mml:mo><mml:mo mathvariant="normal" fence="true" maxsize="1.19em" minsize="1.19em">)</mml:mo><mml:mo mathvariant="normal" fence="true" maxsize="1.19em" minsize="1.19em">)</mml:mo><mml:mo fence="true" maxsize="1.19em" minsize="1.19em">]</mml:mo><mml:mo>−</mml:mo><mml:msub><mml:mrow><mml:mi mathvariant="italic">E</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">x</mml:mi><mml:mo stretchy="false">∼</mml:mo><mml:msub><mml:mrow><mml:mi mathvariant="script">P</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">x</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:msub><mml:mo fence="true" maxsize="1.19em" minsize="1.19em">[</mml:mo><mml:mi mathvariant="italic">D</mml:mi><mml:mo mathvariant="normal" fence="true" stretchy="false">(</mml:mo><mml:msub><mml:mrow><mml:mi mathvariant="italic">x</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">D</mml:mi></mml:mrow></mml:msub><mml:mo mathvariant="normal" fence="true" stretchy="false">)</mml:mo><mml:mo fence="true" maxsize="1.19em" minsize="1.19em">]</mml:mo><mml:mo mathvariant="normal">,</mml:mo></mml:mtd></mml:mtr></mml:mtable></mml:math>
<tex-math><![CDATA[\[ {L_{D-{\mathit{adv}_{\mathit{recon}}}}}={E_{l\sim {\mathcal{P}_{l}},{a^{z}}\sim {\mathcal{P}_{{a^{z}}}}}}\big[D\big(G\big({a^{z}},{E_{P}}({l_{D}})\big)\big)\big]-{E_{x\sim {\mathcal{P}_{x}}}}\big[D({x_{D}})\big],\]]]></tex-math></alternatives>
</disp-formula> 
where <inline-formula id="j_infor443_ineq_053"><alternatives>
<mml:math><mml:msub><mml:mrow><mml:mi mathvariant="script">P</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">l</mml:mi></mml:mrow></mml:msub></mml:math>
<tex-math><![CDATA[${\mathcal{P}_{l}}$]]></tex-math></alternatives></inline-formula>, <inline-formula id="j_infor443_ineq_054"><alternatives>
<mml:math><mml:msub><mml:mrow><mml:mi mathvariant="script">P</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">x</mml:mi></mml:mrow></mml:msub></mml:math>
<tex-math><![CDATA[${\mathcal{P}_{x}}$]]></tex-math></alternatives></inline-formula> are the real distributions of landmark and face images, and <inline-formula id="j_infor443_ineq_055"><alternatives>
<mml:math><mml:msub><mml:mrow><mml:mi mathvariant="script">P</mml:mi></mml:mrow><mml:mrow><mml:msup><mml:mrow><mml:mi mathvariant="italic">a</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">z</mml:mi></mml:mrow></mml:msup></mml:mrow></mml:msub></mml:math>
<tex-math><![CDATA[${\mathcal{P}_{{a^{z}}}}$]]></tex-math></alternatives></inline-formula> is the uniform noise distribution. <italic>G</italic> tries to fool <italic>D</italic> by maximizing the following adversarial loss function: 
<disp-formula id="j_infor443_eq_002">
<label>(2)</label><alternatives>
<mml:math display="block"><mml:mtable displaystyle="true" columnalign="right"><mml:mtr><mml:mtd class="align-odd"><mml:msub><mml:mrow><mml:mi mathvariant="italic">L</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">G</mml:mi><mml:mo>−</mml:mo><mml:msub><mml:mrow><mml:mi mathvariant="italic">adv</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">recon</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:msub><mml:mrow><mml:mi mathvariant="italic">E</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">l</mml:mi><mml:mo stretchy="false">∼</mml:mo><mml:msub><mml:mrow><mml:mi mathvariant="script">P</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">l</mml:mi></mml:mrow></mml:msub><mml:mo mathvariant="normal">,</mml:mo><mml:msup><mml:mrow><mml:mi mathvariant="italic">a</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">z</mml:mi></mml:mrow></mml:msup><mml:mo stretchy="false">∼</mml:mo><mml:msub><mml:mrow><mml:mi mathvariant="script">P</mml:mi></mml:mrow><mml:mrow><mml:msup><mml:mrow><mml:mi mathvariant="italic">a</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">z</mml:mi></mml:mrow></mml:msup></mml:mrow></mml:msub></mml:mrow></mml:msub><mml:mo fence="true" maxsize="1.19em" minsize="1.19em">[</mml:mo><mml:mi mathvariant="italic">D</mml:mi><mml:mo mathvariant="normal" fence="true" maxsize="1.19em" minsize="1.19em">(</mml:mo><mml:mi mathvariant="italic">G</mml:mi><mml:mo mathvariant="normal" fence="true" maxsize="1.19em" minsize="1.19em">(</mml:mo><mml:msup><mml:mrow><mml:mi mathvariant="italic">a</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">z</mml:mi></mml:mrow></mml:msup><mml:mo mathvariant="normal">,</mml:mo><mml:msub><mml:mrow><mml:mi mathvariant="italic">E</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">P</mml:mi></mml:mrow></mml:msub><mml:mo mathvariant="normal" fence="true" stretchy="false">(</mml:mo><mml:msub><mml:mrow><mml:mi 
mathvariant="italic">l</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">D</mml:mi></mml:mrow></mml:msub><mml:mo mathvariant="normal" fence="true" stretchy="false">)</mml:mo><mml:mo mathvariant="normal" fence="true" maxsize="1.19em" minsize="1.19em">)</mml:mo><mml:mo mathvariant="normal" fence="true" maxsize="1.19em" minsize="1.19em">)</mml:mo><mml:mo fence="true" maxsize="1.19em" minsize="1.19em">]</mml:mo><mml:mo>.</mml:mo></mml:mtd></mml:mtr></mml:mtable></mml:math>
<tex-math><![CDATA[\[ {L_{G-{\mathit{adv}_{\mathit{recon}}}}}={E_{l\sim {\mathcal{P}_{l}},{a^{z}}\sim {\mathcal{P}_{{a^{z}}}}}}\big[D\big(G\big({a^{z}},{E_{P}}({l_{D}})\big)\big)\big].\]]]></tex-math></alternatives>
</disp-formula> 
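In code, these two adversarial objectives can be sketched as follows (a minimal Wasserstein-style formulation of the expectations above; the tensors are assumed to be discriminator scores on fake and real batches): <preformat>
import torch

def d_adv_recon_loss(d_fake, d_real):
    # Eq. (1): D minimizes E[D(G(a_z, E_P(l_D)))] - E[D(x_D)].
    return d_fake.mean() - d_real.mean()

def g_adv_recon_loss(d_fake):
    # Eq. (2): G maximizes E[D(G(a_z, E_P(l_D)))];
    # in practice one minimizes the negated value.
    return d_fake.mean()
</preformat>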
The pose encoder network helps the generator <italic>G</italic> to generate a high-quality image with the pose of <inline-formula id="j_infor443_ineq_056"><alternatives>
<mml:math><mml:msub><mml:mrow><mml:mi mathvariant="italic">x</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">D</mml:mi></mml:mrow></mml:msub></mml:math>
<tex-math><![CDATA[${x_{D}}$]]></tex-math></alternatives></inline-formula>. To achieve this, we reconstruct both the source and the driving face images and employ a content-consistency loss function <inline-formula id="j_infor443_ineq_057"><alternatives>
<mml:math><mml:msub><mml:mrow><mml:mi mathvariant="italic">L</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">cnt</mml:mi></mml:mrow></mml:msub></mml:math>
<tex-math><![CDATA[${L_{\mathit{cnt}}}$]]></tex-math></alternatives></inline-formula>, which measures differences in high-level content between the ground truth images <inline-formula id="j_infor443_ineq_058"><alternatives>
<mml:math><mml:msub><mml:mrow><mml:mi mathvariant="italic">x</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">S</mml:mi></mml:mrow></mml:msub><mml:mo mathvariant="normal">,</mml:mo><mml:msub><mml:mrow><mml:mi mathvariant="italic">x</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">D</mml:mi></mml:mrow></mml:msub></mml:math>
<tex-math><![CDATA[${x_{S}},{x_{D}}$]]></tex-math></alternatives></inline-formula> and the reconstructions <inline-formula id="j_infor443_ineq_059"><alternatives>
<mml:math><mml:mi mathvariant="italic">G</mml:mi><mml:mo mathvariant="normal" fence="true" stretchy="false">(</mml:mo><mml:msub><mml:mrow><mml:mi mathvariant="italic">E</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">A</mml:mi></mml:mrow></mml:msub><mml:mo mathvariant="normal" fence="true" stretchy="false">(</mml:mo><mml:msub><mml:mrow><mml:mi mathvariant="italic">x</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">S</mml:mi></mml:mrow></mml:msub><mml:mo mathvariant="normal" fence="true" stretchy="false">)</mml:mo><mml:mo mathvariant="normal">,</mml:mo><mml:msub><mml:mrow><mml:mi mathvariant="italic">E</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">P</mml:mi></mml:mrow></mml:msub><mml:mo mathvariant="normal" fence="true" stretchy="false">(</mml:mo><mml:msub><mml:mrow><mml:mi mathvariant="italic">l</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">S</mml:mi></mml:mrow></mml:msub><mml:mo mathvariant="normal" fence="true" stretchy="false">)</mml:mo><mml:mo mathvariant="normal" fence="true" stretchy="false">)</mml:mo><mml:mo mathvariant="normal">,</mml:mo><mml:mi mathvariant="italic">G</mml:mi><mml:mo mathvariant="normal" fence="true" stretchy="false">(</mml:mo><mml:msub><mml:mrow><mml:mi mathvariant="italic">E</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">A</mml:mi></mml:mrow></mml:msub><mml:mo mathvariant="normal" fence="true" stretchy="false">(</mml:mo><mml:msub><mml:mrow><mml:mi mathvariant="italic">x</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">D</mml:mi></mml:mrow></mml:msub><mml:mo mathvariant="normal" fence="true" stretchy="false">)</mml:mo><mml:mo mathvariant="normal">,</mml:mo><mml:msub><mml:mrow><mml:mi mathvariant="italic">E</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">P</mml:mi></mml:mrow></mml:msub><mml:mo mathvariant="normal" fence="true" stretchy="false">(</mml:mo><mml:msub><mml:mrow><mml:mi mathvariant="italic">l</mml:mi></mml:mrow><mml:mrow><mml:mi 
mathvariant="italic">D</mml:mi></mml:mrow></mml:msub><mml:mo mathvariant="normal" fence="true" stretchy="false">)</mml:mo><mml:mo mathvariant="normal" fence="true" stretchy="false">)</mml:mo></mml:math>
<tex-math><![CDATA[$G({E_{A}}({x_{S}}),{E_{P}}({l_{S}})),G({E_{A}}({x_{D}}),{E_{P}}({l_{D}}))$]]></tex-math></alternatives></inline-formula> using the perceptual similarity measure (Johnson <italic>et al.</italic>, <xref ref-type="bibr" rid="j_infor443_ref_014">2016</xref>). Our content loss function uses the pre-trained VGG19 (Simonyan and Zisserman, <xref ref-type="bibr" rid="j_infor443_ref_020">2015</xref>) and VGGFace (Parkhi <italic>et al.</italic>, <xref ref-type="bibr" rid="j_infor443_ref_017">2015</xref>) networks, from which we extract the feature maps <inline-formula id="j_infor443_ineq_060"><alternatives>
<mml:math><mml:msup><mml:mrow><mml:mi mathvariant="normal">Φ</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">k</mml:mi></mml:mrow></mml:msup><mml:mo mathvariant="normal" fence="true" stretchy="false">(</mml:mo><mml:mi mathvariant="italic">x</mml:mi><mml:mo mathvariant="normal" fence="true" stretchy="false">)</mml:mo></mml:math>
<tex-math><![CDATA[${\Phi ^{k}}(x)$]]></tex-math></alternatives></inline-formula> from several layers of these networks. The loss is then computed as a weighted sum of <inline-formula id="j_infor443_ineq_061"><alternatives>
<mml:math><mml:msub><mml:mrow><mml:mi>ℓ</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub></mml:math>
<tex-math><![CDATA[${\ell _{1}}$]]></tex-math></alternatives></inline-formula>-norm distances between these feature maps: <disp-formula-group id="j_infor443_dg_001">
<disp-formula id="j_infor443_eq_003">
<label>(3)</label><alternatives>
<mml:math display="block"><mml:mtable displaystyle="true" columnalign="right left" columnspacing="0pt"><mml:mtr><mml:mtd class="align-odd"/><mml:mtd class="align-even"><mml:msub><mml:mrow><mml:mi mathvariant="italic">L</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">c</mml:mi><mml:mi mathvariant="italic">n</mml:mi><mml:mi mathvariant="italic">t</mml:mi><mml:msub><mml:mrow><mml:mi mathvariant="italic">S</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">recon</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:msub><mml:mo>=</mml:mo>
<mml:munderover accentunder="false" accent="false"><mml:mrow><mml:mstyle displaystyle="true"><mml:mo largeop="true" movablelimits="false">∑</mml:mo></mml:mstyle></mml:mrow><mml:mrow><mml:mi mathvariant="italic">k</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi mathvariant="italic">layers</mml:mi></mml:mrow></mml:munderover><mml:msub><mml:mrow><mml:mo maxsize="1.19em" minsize="1.19em" stretchy="true">‖</mml:mo><mml:msup><mml:mrow><mml:mi mathvariant="normal">Φ</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">k</mml:mi></mml:mrow></mml:msup><mml:mo mathvariant="normal" fence="true" maxsize="1.19em" minsize="1.19em">(</mml:mo><mml:mi mathvariant="italic">G</mml:mi><mml:mo mathvariant="normal" fence="true" maxsize="1.19em" minsize="1.19em">(</mml:mo><mml:msub><mml:mrow><mml:mi mathvariant="italic">E</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">A</mml:mi></mml:mrow></mml:msub><mml:mo mathvariant="normal" fence="true" stretchy="false">(</mml:mo><mml:msub><mml:mrow><mml:mi mathvariant="italic">x</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">S</mml:mi></mml:mrow></mml:msub><mml:mo mathvariant="normal" fence="true" stretchy="false">)</mml:mo><mml:mo mathvariant="normal">,</mml:mo><mml:msub><mml:mrow><mml:mi mathvariant="italic">E</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">P</mml:mi></mml:mrow></mml:msub><mml:mo mathvariant="normal" fence="true" stretchy="false">(</mml:mo><mml:msub><mml:mrow><mml:mi mathvariant="italic">l</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">S</mml:mi></mml:mrow></mml:msub><mml:mo mathvariant="normal" fence="true" stretchy="false">)</mml:mo><mml:mo mathvariant="normal" fence="true" maxsize="1.19em" minsize="1.19em">)</mml:mo><mml:mo mathvariant="normal" fence="true" maxsize="1.19em" minsize="1.19em">)</mml:mo><mml:mo>−</mml:mo><mml:msup><mml:mrow><mml:mi mathvariant="normal">Φ</mml:mi></mml:mrow><mml:mrow><mml:mi 
mathvariant="italic">k</mml:mi></mml:mrow></mml:msup><mml:mo mathvariant="normal" fence="true" stretchy="false">(</mml:mo><mml:msub><mml:mrow><mml:mi mathvariant="italic">x</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">S</mml:mi></mml:mrow></mml:msub><mml:mo mathvariant="normal" fence="true" stretchy="false">)</mml:mo><mml:mo maxsize="1.19em" minsize="1.19em" stretchy="true">‖</mml:mo></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo mathvariant="normal">,</mml:mo></mml:mtd></mml:mtr></mml:mtable></mml:math>
<tex-math><![CDATA[\[\begin{aligned}{}& {L_{cnt{S_{\mathit{recon}}}}}={\sum \limits_{k=1}^{\mathit{layers}}}{\big\| {\Phi ^{k}}\big(G\big({E_{A}}({x_{S}}),{E_{P}}({l_{S}})\big)\big)-{\Phi ^{k}}({x_{S}})\big\| _{1}},\end{aligned}\]]]></tex-math></alternatives>
</disp-formula>
<disp-formula id="j_infor443_eq_004">
<label>(4)</label><alternatives>
<mml:math display="block"><mml:mtable displaystyle="true" columnalign="right left" columnspacing="0pt"><mml:mtr><mml:mtd class="align-odd"/><mml:mtd class="align-even"><mml:msub><mml:mrow><mml:mi mathvariant="italic">L</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">c</mml:mi><mml:mi mathvariant="italic">n</mml:mi><mml:mi mathvariant="italic">t</mml:mi><mml:msub><mml:mrow><mml:mi mathvariant="italic">D</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">recon</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:msub><mml:mo>=</mml:mo>
<mml:munderover accentunder="false" accent="false"><mml:mrow><mml:mstyle displaystyle="true"><mml:mo largeop="true" movablelimits="false">∑</mml:mo></mml:mstyle></mml:mrow><mml:mrow><mml:mi mathvariant="italic">k</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi mathvariant="italic">layers</mml:mi></mml:mrow></mml:munderover><mml:msub><mml:mrow><mml:mo maxsize="1.19em" minsize="1.19em" stretchy="true">‖</mml:mo><mml:msup><mml:mrow><mml:mi mathvariant="normal">Φ</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">k</mml:mi></mml:mrow></mml:msup><mml:mo mathvariant="normal" fence="true" maxsize="1.19em" minsize="1.19em">(</mml:mo><mml:mi mathvariant="italic">G</mml:mi><mml:mo mathvariant="normal" fence="true" maxsize="1.19em" minsize="1.19em">(</mml:mo><mml:msub><mml:mrow><mml:mi mathvariant="italic">E</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">A</mml:mi></mml:mrow></mml:msub><mml:mo mathvariant="normal" fence="true" stretchy="false">(</mml:mo><mml:msub><mml:mrow><mml:mi mathvariant="italic">x</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">D</mml:mi></mml:mrow></mml:msub><mml:mo mathvariant="normal" fence="true" stretchy="false">)</mml:mo><mml:mo mathvariant="normal">,</mml:mo><mml:msub><mml:mrow><mml:mi mathvariant="italic">E</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">P</mml:mi></mml:mrow></mml:msub><mml:mo mathvariant="normal" fence="true" stretchy="false">(</mml:mo><mml:msub><mml:mrow><mml:mi mathvariant="italic">l</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">D</mml:mi></mml:mrow></mml:msub><mml:mo mathvariant="normal" fence="true" stretchy="false">)</mml:mo><mml:mo mathvariant="normal" fence="true" maxsize="1.19em" minsize="1.19em">)</mml:mo><mml:mo mathvariant="normal" fence="true" maxsize="1.19em" minsize="1.19em">)</mml:mo><mml:mo>−</mml:mo><mml:msup><mml:mrow><mml:mi mathvariant="normal">Φ</mml:mi></mml:mrow><mml:mrow><mml:mi 
mathvariant="italic">k</mml:mi></mml:mrow></mml:msup><mml:mo mathvariant="normal" fence="true" stretchy="false">(</mml:mo><mml:msub><mml:mrow><mml:mi mathvariant="italic">x</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">D</mml:mi></mml:mrow></mml:msub><mml:mo mathvariant="normal" fence="true" stretchy="false">)</mml:mo><mml:mo maxsize="1.19em" minsize="1.19em" stretchy="true">‖</mml:mo></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>.</mml:mo></mml:mtd></mml:mtr></mml:mtable></mml:math>
<tex-math><![CDATA[\[\begin{aligned}{}& {L_{cnt{D_{\mathit{recon}}}}}={\sum \limits_{k=1}^{\mathit{layers}}}{\big\| {\Phi ^{k}}\big(G\big({E_{A}}({x_{D}}),{E_{P}}({l_{D}})\big)\big)-{\Phi ^{k}}({x_{D}})\big\| _{1}}.\end{aligned}\]]]></tex-math></alternatives>
</disp-formula>
</disp-formula-group> We add a regularization term that keeps the weights small, which simplifies the model and mitigates overfitting: 
<disp-formula id="j_infor443_eq_005">
<label>(5)</label><alternatives>
<mml:math display="block"><mml:mtable displaystyle="true" columnalign="right"><mml:mtr><mml:mtd class="align-odd"><mml:msub><mml:mrow><mml:mi mathvariant="italic">L</mml:mi></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi mathvariant="italic">regular</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">recon</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mstyle displaystyle="true"><mml:mfrac><mml:mrow><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi mathvariant="italic">n</mml:mi></mml:mrow></mml:mfrac></mml:mstyle>
<mml:munderover accentunder="false" accent="false"><mml:mrow><mml:mstyle displaystyle="true"><mml:mo largeop="true" movablelimits="false">∑</mml:mo></mml:mstyle></mml:mrow><mml:mrow><mml:mi mathvariant="italic">i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi mathvariant="italic">n</mml:mi></mml:mrow></mml:munderover><mml:msub><mml:mrow><mml:mo maxsize="1.19em" minsize="1.19em" stretchy="true">‖</mml:mo><mml:msubsup><mml:mrow><mml:mi mathvariant="normal">Θ</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">P</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">i</mml:mi></mml:mrow></mml:msubsup><mml:mo maxsize="1.19em" minsize="1.19em" stretchy="true">‖</mml:mo></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub><mml:mo mathvariant="normal">,</mml:mo></mml:mtd></mml:mtr></mml:mtable></mml:math>
<tex-math><![CDATA[\[ {L_{{\mathit{regular}_{\mathit{recon}}}}}=\frac{1}{n}{\sum \limits_{i=1}^{n}}{\big\| {\Theta _{P}^{i}}\big\| _{2}},\]]]></tex-math></alternatives>
</disp-formula> 
where <inline-formula id="j_infor443_ineq_062"><alternatives>
<mml:math><mml:msub><mml:mrow><mml:mi mathvariant="normal">Θ</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">P</mml:mi></mml:mrow></mml:msub></mml:math>
<tex-math><![CDATA[${\Theta _{P}}$]]></tex-math></alternatives></inline-formula> denotes the parameters of the pose encoder network.</p>
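<p>As a minimal illustration, the regularization term of Eq. (5) is just the mean of the per-tensor L2 norms. The NumPy sketch below uses small hypothetical tensors as stand-ins for the pose-encoder parameters, not the actual network weights:</p>

```python
import numpy as np

def l2_regularization(param_tensors):
    # Eq. (5): mean of the per-tensor L2 norms of the encoder parameters
    return sum(np.linalg.norm(p.ravel()) for p in param_tensors) / len(param_tensors)

# toy stand-ins for the pose-encoder parameter tensors
params = [np.ones((2, 2)), np.zeros(3)]
reg_loss = l2_regularization(params)  # (2.0 + 0.0) / 2 = 1.0
```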
</sec>
<sec id="j_infor443_s_012">
<label>4.2</label>
<title>Transformation Sub-Path</title>
<p>The transformation sub-path trains the networks <inline-formula id="j_infor443_ineq_063"><alternatives>
<mml:math><mml:msub><mml:mrow><mml:mi mathvariant="italic">E</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">A</mml:mi></mml:mrow></mml:msub></mml:math>
<tex-math><![CDATA[${E_{A}}$]]></tex-math></alternatives></inline-formula>, <italic>G</italic>, and <italic>D</italic>, but keeps the pose encoder network <inline-formula id="j_infor443_ineq_064"><alternatives>
<mml:math><mml:msub><mml:mrow><mml:mi mathvariant="italic">E</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">P</mml:mi></mml:mrow></mml:msub></mml:math>
<tex-math><![CDATA[${E_{P}}$]]></tex-math></alternatives></inline-formula> fixed. The output of the <inline-formula id="j_infor443_ineq_065"><alternatives>
<mml:math><mml:msub><mml:mrow><mml:mi mathvariant="italic">E</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">A</mml:mi></mml:mrow></mml:msub></mml:math>
<tex-math><![CDATA[${E_{A}}$]]></tex-math></alternatives></inline-formula> network should preserve the identity of the source face image. We introduce a cross-reconstruction task to make <inline-formula id="j_infor443_ineq_066"><alternatives>
<mml:math><mml:msub><mml:mrow><mml:mi mathvariant="italic">E</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">P</mml:mi></mml:mrow></mml:msub></mml:math>
<tex-math><![CDATA[${E_{P}}$]]></tex-math></alternatives></inline-formula> and <inline-formula id="j_infor443_ineq_067"><alternatives>
<mml:math><mml:msub><mml:mrow><mml:mi mathvariant="italic">E</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">A</mml:mi></mml:mrow></mml:msub></mml:math>
<tex-math><![CDATA[${E_{A}}$]]></tex-math></alternatives></inline-formula> disentangle the pose from the identity information. More specifically, we sample a real image pair <inline-formula id="j_infor443_ineq_068"><alternatives>
<mml:math><mml:mo mathvariant="normal" fence="true" stretchy="false">(</mml:mo><mml:msubsup><mml:mrow><mml:mi mathvariant="italic">x</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">S</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">i</mml:mi></mml:mrow></mml:msubsup><mml:mo mathvariant="normal">,</mml:mo><mml:msubsup><mml:mrow><mml:mi mathvariant="italic">x</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">D</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">i</mml:mi></mml:mrow></mml:msubsup><mml:mo mathvariant="normal" fence="true" stretchy="false">)</mml:mo></mml:math>
<tex-math><![CDATA[$({x_{S}^{i}},{x_{D}^{i}})$]]></tex-math></alternatives></inline-formula> that shares the same identity but differs in appearance, pose, and facial expression. The goal is to transform <inline-formula id="j_infor443_ineq_069"><alternatives>
<mml:math><mml:msubsup><mml:mrow><mml:mi mathvariant="italic">x</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">S</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">i</mml:mi></mml:mrow></mml:msubsup></mml:math>
<tex-math><![CDATA[${x_{S}^{i}}$]]></tex-math></alternatives></inline-formula> to a new face image <inline-formula id="j_infor443_ineq_070"><alternatives>
<mml:math><mml:msubsup><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mi mathvariant="italic">x</mml:mi></mml:mrow><mml:mo>´</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi mathvariant="italic">S</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">i</mml:mi></mml:mrow></mml:msubsup></mml:math>
<tex-math><![CDATA[${\acute{x}_{S}^{i}}$]]></tex-math></alternatives></inline-formula> whose pose matches that of the driving face image <inline-formula id="j_infor443_ineq_071"><alternatives>
<mml:math><mml:msubsup><mml:mrow><mml:mi mathvariant="italic">x</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">D</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">i</mml:mi></mml:mrow></mml:msubsup></mml:math>
<tex-math><![CDATA[${x_{D}^{i}}$]]></tex-math></alternatives></inline-formula>. To achieve this, <inline-formula id="j_infor443_ineq_072"><alternatives>
<mml:math><mml:msub><mml:mrow><mml:mi mathvariant="italic">E</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">A</mml:mi></mml:mrow></mml:msub></mml:math>
<tex-math><![CDATA[${E_{A}}$]]></tex-math></alternatives></inline-formula> receives <inline-formula id="j_infor443_ineq_073"><alternatives>
<mml:math><mml:msubsup><mml:mrow><mml:mi mathvariant="italic">x</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">S</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">i</mml:mi></mml:mrow></mml:msubsup></mml:math>
<tex-math><![CDATA[${x_{S}^{i}}$]]></tex-math></alternatives></inline-formula> as an input and outputs a pose-invariant face representation <inline-formula id="j_infor443_ineq_074"><alternatives>
<mml:math><mml:msubsup><mml:mrow><mml:mi mathvariant="italic">a</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">S</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">i</mml:mi></mml:mrow></mml:msubsup></mml:math>
<tex-math><![CDATA[${a_{S}^{i}}$]]></tex-math></alternatives></inline-formula>, while <inline-formula id="j_infor443_ineq_075"><alternatives>
<mml:math><mml:msub><mml:mrow><mml:mi mathvariant="italic">E</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">P</mml:mi></mml:mrow></mml:msub></mml:math>
<tex-math><![CDATA[${E_{P}}$]]></tex-math></alternatives></inline-formula> takes <inline-formula id="j_infor443_ineq_076"><alternatives>
<mml:math><mml:msubsup><mml:mrow><mml:mi mathvariant="italic">l</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">D</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">i</mml:mi></mml:mrow></mml:msubsup></mml:math>
<tex-math><![CDATA[${l_{D}^{i}}$]]></tex-math></alternatives></inline-formula>, the landmarks image of <inline-formula id="j_infor443_ineq_077"><alternatives>
<mml:math><mml:msubsup><mml:mrow><mml:mi mathvariant="italic">x</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">D</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">i</mml:mi></mml:mrow></mml:msubsup></mml:math>
<tex-math><![CDATA[${x_{D}^{i}}$]]></tex-math></alternatives></inline-formula>, as an input and outputs a pose representation vector <inline-formula id="j_infor443_ineq_078"><alternatives>
<mml:math><mml:msubsup><mml:mrow><mml:mi mathvariant="italic">p</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">D</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">i</mml:mi></mml:mrow></mml:msubsup></mml:math>
<tex-math><![CDATA[${p_{D}^{i}}$]]></tex-math></alternatives></inline-formula>. We concatenate the embedding vectors and feed the combined vector, <inline-formula id="j_infor443_ineq_079"><alternatives>
<mml:math><mml:msup><mml:mrow><mml:mi mathvariant="italic">z</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">i</mml:mi></mml:mrow></mml:msup><mml:mo>=</mml:mo><mml:mo mathvariant="normal" fence="true" stretchy="false">(</mml:mo><mml:msubsup><mml:mrow><mml:mi mathvariant="italic">a</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">S</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">i</mml:mi></mml:mrow></mml:msubsup><mml:mo mathvariant="normal">,</mml:mo><mml:msubsup><mml:mrow><mml:mi mathvariant="italic">p</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">D</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">i</mml:mi></mml:mrow></mml:msubsup><mml:mo mathvariant="normal" fence="true" stretchy="false">)</mml:mo><mml:mo>=</mml:mo><mml:mo fence="true" stretchy="false">[</mml:mo><mml:msub><mml:mrow><mml:mi mathvariant="italic">E</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">A</mml:mi></mml:mrow></mml:msub><mml:mo mathvariant="normal" fence="true" stretchy="false">(</mml:mo><mml:msubsup><mml:mrow><mml:mi mathvariant="italic">x</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">S</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">i</mml:mi></mml:mrow></mml:msubsup><mml:mo mathvariant="normal" fence="true" stretchy="false">)</mml:mo><mml:mo mathvariant="normal">,</mml:mo><mml:msub><mml:mrow><mml:mi mathvariant="italic">E</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">P</mml:mi></mml:mrow></mml:msub><mml:mo mathvariant="normal" fence="true" stretchy="false">(</mml:mo><mml:msubsup><mml:mrow><mml:mi mathvariant="italic">l</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">D</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">i</mml:mi></mml:mrow></mml:msubsup><mml:mo mathvariant="normal" fence="true" stretchy="false">)</mml:mo><mml:mo fence="true" stretchy="false">]</mml:mo></mml:math>
<tex-math><![CDATA[${z^{i}}=({a_{S}^{i}},{p_{D}^{i}})=[{E_{A}}({x_{S}^{i}}),{E_{P}}({l_{D}^{i}})]$]]></tex-math></alternatives></inline-formula> into the network <italic>G</italic>. The generator <italic>G</italic> should produce <inline-formula id="j_infor443_ineq_080"><alternatives>
<mml:math><mml:msubsup><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mi mathvariant="italic">x</mml:mi></mml:mrow><mml:mo>´</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi mathvariant="italic">S</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">i</mml:mi></mml:mrow></mml:msubsup></mml:math>
<tex-math><![CDATA[${\acute{x}_{S}^{i}}$]]></tex-math></alternatives></inline-formula>, the transformation of <inline-formula id="j_infor443_ineq_081"><alternatives>
<mml:math><mml:msubsup><mml:mrow><mml:mi mathvariant="italic">x</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">S</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">i</mml:mi></mml:mrow></mml:msubsup></mml:math>
<tex-math><![CDATA[${x_{S}^{i}}$]]></tex-math></alternatives></inline-formula>. <italic>D</italic> is trained to distinguish the fake image <inline-formula id="j_infor443_ineq_082"><alternatives>
<mml:math><mml:msubsup><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mi mathvariant="italic">x</mml:mi></mml:mrow><mml:mo>´</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi mathvariant="italic">S</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">i</mml:mi></mml:mrow></mml:msubsup></mml:math>
<tex-math><![CDATA[${\acute{x}_{S}^{i}}$]]></tex-math></alternatives></inline-formula> from the real one <inline-formula id="j_infor443_ineq_083"><alternatives>
<mml:math><mml:msubsup><mml:mrow><mml:mi mathvariant="italic">x</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">D</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">i</mml:mi></mml:mrow></mml:msubsup></mml:math>
<tex-math><![CDATA[${x_{D}^{i}}$]]></tex-math></alternatives></inline-formula>. Thus, <italic>D</italic> minimizes: 
<disp-formula id="j_infor443_eq_006">
<label>(6)</label><alternatives>
<mml:math display="block"><mml:mtable displaystyle="true" columnalign="right"><mml:mtr><mml:mtd class="align-odd"><mml:msub><mml:mrow><mml:mi mathvariant="italic">L</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">D</mml:mi><mml:mo>−</mml:mo><mml:msub><mml:mrow><mml:mi mathvariant="italic">adv</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">trans</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:msub><mml:mrow><mml:mi mathvariant="italic">E</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">l</mml:mi><mml:mo stretchy="false">∼</mml:mo><mml:msub><mml:mrow><mml:mi mathvariant="script">P</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">l</mml:mi></mml:mrow></mml:msub><mml:mo mathvariant="normal">,</mml:mo><mml:mi mathvariant="italic">x</mml:mi><mml:mo stretchy="false">∼</mml:mo><mml:msub><mml:mrow><mml:mi mathvariant="script">P</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">x</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:msub><mml:mo fence="true" maxsize="1.19em" minsize="1.19em">[</mml:mo><mml:mi mathvariant="italic">D</mml:mi><mml:mo mathvariant="normal" fence="true" maxsize="1.19em" minsize="1.19em">(</mml:mo><mml:mi mathvariant="italic">G</mml:mi><mml:mo mathvariant="normal" fence="true" maxsize="1.19em" minsize="1.19em">(</mml:mo><mml:msub><mml:mrow><mml:mi mathvariant="italic">E</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">A</mml:mi></mml:mrow></mml:msub><mml:mo mathvariant="normal" fence="true" stretchy="false">(</mml:mo><mml:msub><mml:mrow><mml:mi mathvariant="italic">x</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">S</mml:mi></mml:mrow></mml:msub><mml:mo mathvariant="normal" fence="true" stretchy="false">)</mml:mo><mml:mo mathvariant="normal">,</mml:mo><mml:msub><mml:mrow><mml:mi mathvariant="italic">E</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">P</mml:mi></mml:mrow></mml:msub><mml:mo mathvariant="normal" fence="true" 
stretchy="false">(</mml:mo><mml:msub><mml:mrow><mml:mi mathvariant="italic">l</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">D</mml:mi></mml:mrow></mml:msub><mml:mo mathvariant="normal" fence="true" stretchy="false">)</mml:mo><mml:mo mathvariant="normal" fence="true" maxsize="1.19em" minsize="1.19em">)</mml:mo><mml:mo mathvariant="normal" fence="true" maxsize="1.19em" minsize="1.19em">)</mml:mo><mml:mo fence="true" maxsize="1.19em" minsize="1.19em">]</mml:mo><mml:mo>−</mml:mo><mml:msub><mml:mrow><mml:mi mathvariant="italic">E</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">x</mml:mi><mml:mo stretchy="false">∼</mml:mo><mml:msub><mml:mrow><mml:mi mathvariant="script">P</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">x</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:msub><mml:mo fence="true" maxsize="1.19em" minsize="1.19em">[</mml:mo><mml:mi mathvariant="italic">D</mml:mi><mml:mo mathvariant="normal" fence="true" stretchy="false">(</mml:mo><mml:msub><mml:mrow><mml:mi mathvariant="italic">x</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">D</mml:mi></mml:mrow></mml:msub><mml:mo mathvariant="normal" fence="true" stretchy="false">)</mml:mo><mml:mo fence="true" maxsize="1.19em" minsize="1.19em">]</mml:mo><mml:mo>.</mml:mo></mml:mtd></mml:mtr></mml:mtable></mml:math>
<tex-math><![CDATA[\[ {L_{D-{\mathit{adv}_{\mathit{trans}}}}}={E_{l\sim {\mathcal{P}_{l}},x\sim {\mathcal{P}_{x}}}}\big[D\big(G\big({E_{A}}({x_{S}}),{E_{P}}({l_{D}})\big)\big)\big]-{E_{x\sim {\mathcal{P}_{x}}}}\big[D({x_{D}})\big].\]]]></tex-math></alternatives>
</disp-formula> 
The generator tries to fool the <italic>D</italic> network by maximizing: 
<disp-formula id="j_infor443_eq_007">
<label>(7)</label><alternatives>
<mml:math display="block"><mml:mtable displaystyle="true" columnalign="right"><mml:mtr><mml:mtd class="align-odd"><mml:msub><mml:mrow><mml:mi mathvariant="italic">L</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">G</mml:mi><mml:mo>−</mml:mo><mml:msub><mml:mrow><mml:mi mathvariant="italic">adv</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">trans</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:msub><mml:mrow><mml:mi mathvariant="italic">E</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">l</mml:mi><mml:mo stretchy="false">∼</mml:mo><mml:msub><mml:mrow><mml:mi mathvariant="script">P</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">l</mml:mi></mml:mrow></mml:msub><mml:mo mathvariant="normal">,</mml:mo><mml:mi mathvariant="italic">x</mml:mi><mml:mo stretchy="false">∼</mml:mo><mml:msub><mml:mrow><mml:mi mathvariant="script">P</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">x</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:msub><mml:mo fence="true" maxsize="1.19em" minsize="1.19em">[</mml:mo><mml:mi mathvariant="italic">D</mml:mi><mml:mo mathvariant="normal" fence="true" maxsize="1.19em" minsize="1.19em">(</mml:mo><mml:mi mathvariant="italic">G</mml:mi><mml:mo mathvariant="normal" fence="true" maxsize="1.19em" minsize="1.19em">(</mml:mo><mml:msub><mml:mrow><mml:mi mathvariant="italic">E</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">A</mml:mi></mml:mrow></mml:msub><mml:mo mathvariant="normal" fence="true" stretchy="false">(</mml:mo><mml:msub><mml:mrow><mml:mi mathvariant="italic">x</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">S</mml:mi></mml:mrow></mml:msub><mml:mo mathvariant="normal" fence="true" stretchy="false">)</mml:mo><mml:mo mathvariant="normal">,</mml:mo><mml:msub><mml:mrow><mml:mi mathvariant="italic">E</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">P</mml:mi></mml:mrow></mml:msub><mml:mo mathvariant="normal" fence="true" 
stretchy="false">(</mml:mo><mml:msub><mml:mrow><mml:mi mathvariant="italic">l</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">D</mml:mi></mml:mrow></mml:msub><mml:mo mathvariant="normal" fence="true" stretchy="false">)</mml:mo><mml:mo mathvariant="normal" fence="true" maxsize="1.19em" minsize="1.19em">)</mml:mo><mml:mo mathvariant="normal" fence="true" maxsize="1.19em" minsize="1.19em">)</mml:mo><mml:mo fence="true" maxsize="1.19em" minsize="1.19em">]</mml:mo><mml:mo>.</mml:mo></mml:mtd></mml:mtr></mml:mtable></mml:math>
<tex-math><![CDATA[\[ {L_{G-{\mathit{adv}_{\mathit{trans}}}}}={E_{l\sim {\mathcal{P}_{l}},x\sim {\mathcal{P}_{x}}}}\big[D\big(G\big({E_{A}}({x_{S}}),{E_{P}}({l_{D}})\big)\big)\big].\]]]></tex-math></alternatives>
</disp-formula> 
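As a minimal sketch, the Wasserstein-style objectives of Eqs. (6) and (7) amount to simple means over critic scores; the arrays below are hypothetical stand-ins for the critic outputs on fake and real images:

```python
import numpy as np

def critic_loss(d_fake, d_real):
    # Eq. (6): the critic D minimizes E[D(fake)] - E[D(real)]
    return np.mean(d_fake) - np.mean(d_real)

def generator_adv_loss(d_fake):
    # Eq. (7): the generator maximizes E[D(fake)]
    return np.mean(d_fake)

# toy critic scores standing in for D(G(E_A(x_S), E_P(l_D))) and D(x_D)
d_fake = np.array([-1.0, -3.0])
d_real = np.array([2.0, 4.0])
d_loss = critic_loss(d_fake, d_real)  # -2.0 - 3.0 = -5.0
g_obj = generator_adv_loss(d_fake)    # -2.0
```

(Maximizing Eq. (7) is typically implemented in practice by minimizing its negative.) 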
To preserve the subject identity in the generated face image, we follow feature-level warping methods instead of image-level warping. Hence, as in the reconstruction sub-path, a content-consistency loss is used in the transformation sub-path, where several feature maps <inline-formula id="j_infor443_ineq_084"><alternatives>
<mml:math><mml:msup><mml:mrow><mml:mi mathvariant="normal">Φ</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">k</mml:mi></mml:mrow></mml:msup><mml:mo mathvariant="normal" fence="true" stretchy="false">(</mml:mo><mml:mi mathvariant="italic">x</mml:mi><mml:mo mathvariant="normal" fence="true" stretchy="false">)</mml:mo></mml:math>
<tex-math><![CDATA[${\Phi ^{k}}(x)$]]></tex-math></alternatives></inline-formula> are extracted from the pre-trained VGG19 and VGGFace networks: <disp-formula-group id="j_infor443_dg_002">
<disp-formula id="j_infor443_eq_008">
<label>(8)</label><alternatives>
<mml:math display="block"><mml:mtable displaystyle="true" columnalign="right left" columnspacing="0pt"><mml:mtr><mml:mtd class="align-odd"/><mml:mtd class="align-even"><mml:msub><mml:mrow><mml:mi mathvariant="italic">L</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">c</mml:mi><mml:mi mathvariant="italic">n</mml:mi><mml:msub><mml:mrow><mml:mi mathvariant="italic">t</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">trans</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:msub><mml:mo>=</mml:mo>
<mml:munderover accentunder="false" accent="false"><mml:mrow><mml:mstyle displaystyle="true"><mml:mo largeop="true" movablelimits="false">∑</mml:mo></mml:mstyle></mml:mrow><mml:mrow><mml:mi mathvariant="italic">k</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi mathvariant="italic">layers</mml:mi></mml:mrow></mml:munderover><mml:msub><mml:mrow><mml:mo maxsize="1.19em" minsize="1.19em" stretchy="true">‖</mml:mo><mml:msup><mml:mrow><mml:mi mathvariant="normal">Φ</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">k</mml:mi></mml:mrow></mml:msup><mml:mo mathvariant="normal" fence="true" maxsize="1.19em" minsize="1.19em">(</mml:mo><mml:mi mathvariant="italic">G</mml:mi><mml:mo mathvariant="normal" fence="true" maxsize="1.19em" minsize="1.19em">(</mml:mo><mml:msub><mml:mrow><mml:mi mathvariant="italic">E</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">A</mml:mi></mml:mrow></mml:msub><mml:mo mathvariant="normal" fence="true" stretchy="false">(</mml:mo><mml:msub><mml:mrow><mml:mi mathvariant="italic">x</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">S</mml:mi></mml:mrow></mml:msub><mml:mo mathvariant="normal" fence="true" stretchy="false">)</mml:mo><mml:mo mathvariant="normal">,</mml:mo><mml:msub><mml:mrow><mml:mi mathvariant="italic">E</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">P</mml:mi></mml:mrow></mml:msub><mml:mo mathvariant="normal" fence="true" stretchy="false">(</mml:mo><mml:msub><mml:mrow><mml:mi mathvariant="italic">l</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">D</mml:mi></mml:mrow></mml:msub><mml:mo mathvariant="normal" fence="true" stretchy="false">)</mml:mo><mml:mo mathvariant="normal" fence="true" maxsize="1.19em" minsize="1.19em">)</mml:mo><mml:mo mathvariant="normal" fence="true" maxsize="1.19em" minsize="1.19em">)</mml:mo><mml:mo>−</mml:mo><mml:msup><mml:mrow><mml:mi mathvariant="normal">Φ</mml:mi></mml:mrow><mml:mrow><mml:mi 
mathvariant="italic">k</mml:mi></mml:mrow></mml:msup><mml:mo mathvariant="normal" fence="true" stretchy="false">(</mml:mo><mml:msub><mml:mrow><mml:mi mathvariant="italic">x</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">D</mml:mi></mml:mrow></mml:msub><mml:mo mathvariant="normal" fence="true" stretchy="false">)</mml:mo><mml:mo maxsize="1.19em" minsize="1.19em" stretchy="true">‖</mml:mo></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo mathvariant="normal">,</mml:mo></mml:mtd></mml:mtr></mml:mtable></mml:math>
<tex-math><![CDATA[\[\begin{aligned}{}& {L_{cn{t_{\mathit{trans}}}}}={\sum \limits_{k=1}^{\mathit{layers}}}{\big\| {\Phi ^{k}}\big(G\big({E_{A}}({x_{S}}),{E_{P}}({l_{D}})\big)\big)-{\Phi ^{k}}({x_{D}})\big\| _{1}},\end{aligned}\]]]></tex-math></alternatives>
</disp-formula>
<disp-formula id="j_infor443_eq_009">
<label>(9)</label><alternatives>
<mml:math display="block"><mml:mtable displaystyle="true" columnalign="right left" columnspacing="0pt"><mml:mtr><mml:mtd class="align-odd"/><mml:mtd class="align-even"><mml:msub><mml:mrow><mml:mi mathvariant="italic">L</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">c</mml:mi><mml:mi mathvariant="italic">n</mml:mi><mml:mi mathvariant="italic">t</mml:mi><mml:msub><mml:mrow><mml:mi mathvariant="italic">S</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">trans</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:msub><mml:mo>=</mml:mo>
<mml:munderover accentunder="false" accent="false"><mml:mrow><mml:mstyle displaystyle="true"><mml:mo largeop="true" movablelimits="false">∑</mml:mo></mml:mstyle></mml:mrow><mml:mrow><mml:mi mathvariant="italic">k</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi mathvariant="italic">layers</mml:mi></mml:mrow></mml:munderover><mml:msub><mml:mrow><mml:mo maxsize="1.19em" minsize="1.19em" stretchy="true">‖</mml:mo><mml:msup><mml:mrow><mml:mi mathvariant="normal">Φ</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">k</mml:mi></mml:mrow></mml:msup><mml:mo mathvariant="normal" fence="true" maxsize="1.19em" minsize="1.19em">(</mml:mo><mml:mi mathvariant="italic">G</mml:mi><mml:mo mathvariant="normal" fence="true" maxsize="1.19em" minsize="1.19em">(</mml:mo><mml:msub><mml:mrow><mml:mi mathvariant="italic">E</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">A</mml:mi></mml:mrow></mml:msub><mml:mo mathvariant="normal" fence="true" stretchy="false">(</mml:mo><mml:msub><mml:mrow><mml:mi mathvariant="italic">x</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">S</mml:mi></mml:mrow></mml:msub><mml:mo mathvariant="normal" fence="true" stretchy="false">)</mml:mo><mml:mo mathvariant="normal">,</mml:mo><mml:msub><mml:mrow><mml:mi mathvariant="italic">E</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">P</mml:mi></mml:mrow></mml:msub><mml:mo mathvariant="normal" fence="true" stretchy="false">(</mml:mo><mml:msub><mml:mrow><mml:mi mathvariant="italic">l</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">S</mml:mi></mml:mrow></mml:msub><mml:mo mathvariant="normal" fence="true" stretchy="false">)</mml:mo><mml:mo mathvariant="normal" fence="true" maxsize="1.19em" minsize="1.19em">)</mml:mo><mml:mo mathvariant="normal" fence="true" maxsize="1.19em" minsize="1.19em">)</mml:mo><mml:mo>−</mml:mo><mml:msup><mml:mrow><mml:mi mathvariant="normal">Φ</mml:mi></mml:mrow><mml:mrow><mml:mi 
mathvariant="italic">k</mml:mi></mml:mrow></mml:msup><mml:mo mathvariant="normal" fence="true" stretchy="false">(</mml:mo><mml:msub><mml:mrow><mml:mi mathvariant="italic">x</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">S</mml:mi></mml:mrow></mml:msub><mml:mo mathvariant="normal" fence="true" stretchy="false">)</mml:mo><mml:mo maxsize="1.19em" minsize="1.19em" stretchy="true">‖</mml:mo></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>.</mml:mo></mml:mtd></mml:mtr></mml:mtable></mml:math>
<tex-math><![CDATA[\[\begin{aligned}{}& {L_{cnt{S_{\mathit{trans}}}}}={\sum \limits_{k=1}^{\mathit{layers}}}{\big\| {\Phi ^{k}}\big(G\big({E_{A}}({x_{S}}),{E_{P}}({l_{S}})\big)\big)-{\Phi ^{k}}({x_{S}})\big\| _{1}}.\end{aligned}\]]]></tex-math></alternatives>
</disp-formula>
</disp-formula-group> To avoid overfitting, we add the following regularization loss, which keeps the weights of the identity encoder network small: 
<disp-formula id="j_infor443_eq_010">
<label>(10)</label><alternatives>
<mml:math display="block"><mml:mtable displaystyle="true" columnalign="right"><mml:mtr><mml:mtd class="align-odd"><mml:msub><mml:mrow><mml:mi mathvariant="italic">L</mml:mi></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi mathvariant="italic">regular</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">trans</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mstyle displaystyle="true"><mml:mfrac><mml:mrow><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi mathvariant="italic">n</mml:mi></mml:mrow></mml:mfrac></mml:mstyle>
<mml:munderover accentunder="false" accent="false"><mml:mrow><mml:mstyle displaystyle="true"><mml:mo largeop="true" movablelimits="false">∑</mml:mo></mml:mstyle></mml:mrow><mml:mrow><mml:mi mathvariant="italic">i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi mathvariant="italic">n</mml:mi></mml:mrow></mml:munderover><mml:msub><mml:mrow><mml:mo maxsize="1.19em" minsize="1.19em" stretchy="true">‖</mml:mo><mml:msubsup><mml:mrow><mml:mi mathvariant="normal">Θ</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">A</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">i</mml:mi></mml:mrow></mml:msubsup><mml:mo maxsize="1.19em" minsize="1.19em" stretchy="true">‖</mml:mo></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub><mml:mo>.</mml:mo></mml:mtd></mml:mtr></mml:mtable></mml:math>
<tex-math><![CDATA[\[ {L_{{\mathit{regular}_{\mathit{trans}}}}}=\frac{1}{n}{\sum \limits_{i=1}^{n}}{\big\| {\Theta _{A}^{i}}\big\| _{2}}.\]]]></tex-math></alternatives>
</disp-formula> 
where <inline-formula id="j_infor443_ineq_085"><alternatives>
<mml:math><mml:msub><mml:mrow><mml:mi mathvariant="normal">Θ</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">A</mml:mi></mml:mrow></mml:msub></mml:math>
<tex-math><![CDATA[${\Theta _{A}}$]]></tex-math></alternatives></inline-formula> denotes the parameters of the identity encoder network.</p>
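The regularization term of Eq. (10) averages the L2 norms of the identity encoder's parameter tensors. A minimal PyTorch sketch follows; the function name and the encoder module passed in are illustrative, not part of the paper's released implementation:

```python
import torch

def regularization_loss(identity_encoder: torch.nn.Module) -> torch.Tensor:
    """Eq. (10): mean of the L2 norms of the encoder's n parameter tensors."""
    norms = [p.norm(2) for p in identity_encoder.parameters()]
    return torch.stack(norms).mean()
```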
</sec>
<sec id="j_infor443_s_013">
<label>4.3</label>
<title>The Overall Loss Function</title>
<p>The final loss function is a weighted sum of all losses defined in Eqs. (<xref rid="j_infor443_eq_001">1</xref>)–(<xref rid="j_infor443_eq_010">10</xref>): <disp-formula-group id="j_infor443_dg_003">
<disp-formula id="j_infor443_eq_011">
<label>(11)</label><alternatives>
<mml:math display="block"><mml:mtable displaystyle="true" columnalign="right left" columnspacing="0pt"><mml:mtr><mml:mtd class="align-odd"/><mml:mtd class="align-even"><mml:msub><mml:mrow><mml:mi mathvariant="italic">Loss</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">recon</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:msub><mml:mrow><mml:mi mathvariant="italic">λ</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">adv</mml:mi></mml:mrow></mml:msub><mml:mo mathvariant="normal" fence="true" stretchy="false">(</mml:mo><mml:msub><mml:mrow><mml:mi mathvariant="italic">L</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">D</mml:mi><mml:mo>−</mml:mo><mml:msub><mml:mrow><mml:mi mathvariant="italic">adv</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">recon</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:msub><mml:mo>+</mml:mo><mml:msub><mml:mrow><mml:mi mathvariant="italic">L</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">G</mml:mi><mml:mo>−</mml:mo><mml:msub><mml:mrow><mml:mi mathvariant="italic">adv</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">recon</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:msub><mml:mo mathvariant="normal" fence="true" stretchy="false">)</mml:mo><mml:mo>+</mml:mo><mml:msub><mml:mrow><mml:mi mathvariant="italic">λ</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">cnt</mml:mi></mml:mrow></mml:msub><mml:mo mathvariant="normal" fence="true" stretchy="false">(</mml:mo><mml:msub><mml:mrow><mml:mi mathvariant="italic">L</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">c</mml:mi><mml:mi mathvariant="italic">n</mml:mi><mml:mi mathvariant="italic">t</mml:mi><mml:msub><mml:mrow><mml:mi mathvariant="italic">S</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">recon</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:msub><mml:mo>+</mml:mo><mml:msub><mml:mrow><mml:mi mathvariant="italic">L</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">c</mml:mi><mml:mi 
mathvariant="italic">n</mml:mi><mml:mi mathvariant="italic">t</mml:mi><mml:msub><mml:mrow><mml:mi mathvariant="italic">D</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">recon</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:msub><mml:mo mathvariant="normal" fence="true" stretchy="false">)</mml:mo></mml:mtd></mml:mtr><mml:mtr><mml:mtd class="align-odd"/><mml:mtd class="align-even"><mml:mphantom><mml:msub><mml:mrow><mml:mi mathvariant="italic">Loss</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">recon</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo></mml:mphantom><mml:mo>+</mml:mo><mml:msub><mml:mrow><mml:mi mathvariant="italic">λ</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">reg</mml:mi></mml:mrow></mml:msub><mml:msub><mml:mrow><mml:mi mathvariant="italic">L</mml:mi></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi mathvariant="italic">regular</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">recon</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:msub><mml:mo mathvariant="normal">,</mml:mo></mml:mtd></mml:mtr></mml:mtable></mml:math>
<tex-math><![CDATA[\[\begin{aligned}{}& {\mathit{Loss}_{\mathit{recon}}}={\lambda _{\mathit{adv}}}({L_{D-{\mathit{adv}_{\mathit{recon}}}}}+{L_{G-{\mathit{adv}_{\mathit{recon}}}}})+{\lambda _{\mathit{cnt}}}({L_{cnt{S_{\mathit{recon}}}}}+{L_{cnt{D_{\mathit{recon}}}}})\\ {} & \phantom{{\mathit{Loss}_{\mathit{recon}}}=}+{\lambda _{\mathit{reg}}}{L_{{\mathit{regular}_{\mathit{recon}}}}},\end{aligned}\]]]></tex-math></alternatives>
</disp-formula>
<disp-formula id="j_infor443_eq_012">
<label>(12)</label><alternatives>
<mml:math display="block"><mml:mtable displaystyle="true" columnalign="right left" columnspacing="0pt"><mml:mtr><mml:mtd class="align-odd"/><mml:mtd class="align-even"><mml:msub><mml:mrow><mml:mi mathvariant="italic">Loss</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">trans</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:msub><mml:mrow><mml:mi mathvariant="italic">λ</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">adv</mml:mi></mml:mrow></mml:msub><mml:mo mathvariant="normal" fence="true" stretchy="false">(</mml:mo><mml:msub><mml:mrow><mml:mi mathvariant="italic">L</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">D</mml:mi><mml:mo>−</mml:mo><mml:msub><mml:mrow><mml:mi mathvariant="italic">adv</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">trans</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:msub><mml:mo>+</mml:mo><mml:msub><mml:mrow><mml:mi mathvariant="italic">L</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">G</mml:mi><mml:mo>−</mml:mo><mml:msub><mml:mrow><mml:mi mathvariant="italic">adv</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">trans</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:msub><mml:mo mathvariant="normal" fence="true" stretchy="false">)</mml:mo><mml:mo>+</mml:mo><mml:msub><mml:mrow><mml:mi mathvariant="italic">λ</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">cnt</mml:mi></mml:mrow></mml:msub><mml:mo mathvariant="normal" fence="true" stretchy="false">(</mml:mo><mml:msub><mml:mrow><mml:mi mathvariant="italic">L</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">c</mml:mi><mml:mi mathvariant="italic">n</mml:mi><mml:msub><mml:mrow><mml:mi mathvariant="italic">t</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">trans</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:msub><mml:mo>+</mml:mo><mml:msub><mml:mrow><mml:mi mathvariant="italic">L</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">c</mml:mi><mml:mi mathvariant="italic">n</mml:mi><mml:mi 
mathvariant="italic">t</mml:mi><mml:msub><mml:mrow><mml:mi mathvariant="italic">S</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">trans</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:msub><mml:mo mathvariant="normal" fence="true" stretchy="false">)</mml:mo></mml:mtd></mml:mtr><mml:mtr><mml:mtd class="align-odd"/><mml:mtd class="align-even"><mml:mphantom><mml:msub><mml:mrow><mml:mi mathvariant="italic">Loss</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">trans</mml:mi></mml:mrow></mml:msub></mml:mphantom><mml:mo>+</mml:mo><mml:msub><mml:mrow><mml:mi mathvariant="italic">λ</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">reg</mml:mi></mml:mrow></mml:msub><mml:msub><mml:mrow><mml:mi mathvariant="italic">L</mml:mi></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi mathvariant="italic">regular</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">trans</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:msub><mml:mo mathvariant="normal">,</mml:mo></mml:mtd></mml:mtr></mml:mtable></mml:math>
<tex-math><![CDATA[\[\begin{aligned}{}& {\mathit{Loss}_{\mathit{trans}}}={\lambda _{\mathit{adv}}}({L_{D-{\mathit{adv}_{\mathit{trans}}}}}+{L_{G-{\mathit{adv}_{\mathit{trans}}}}})+{\lambda _{\mathit{cnt}}}({L_{cn{t_{\mathit{trans}}}}}+{L_{cnt{S_{\mathit{trans}}}}})\\ {} & \phantom{{\mathit{Loss}_{\mathit{trans}}}}+{\lambda _{\mathit{reg}}}{L_{{\mathit{regular}_{\mathit{trans}}}}},\end{aligned}\]]]></tex-math></alternatives>
</disp-formula>
</disp-formula-group> where <inline-formula id="j_infor443_ineq_086"><alternatives>
<mml:math><mml:msub><mml:mrow><mml:mi mathvariant="italic">λ</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">adv</mml:mi></mml:mrow></mml:msub></mml:math>
<tex-math><![CDATA[${\lambda _{\mathit{adv}}}$]]></tex-math></alternatives></inline-formula>, <inline-formula id="j_infor443_ineq_087"><alternatives>
<mml:math><mml:msub><mml:mrow><mml:mi mathvariant="italic">λ</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">cnt</mml:mi></mml:mrow></mml:msub></mml:math>
<tex-math><![CDATA[${\lambda _{\mathit{cnt}}}$]]></tex-math></alternatives></inline-formula>, and <inline-formula id="j_infor443_ineq_088"><alternatives>
<mml:math><mml:msub><mml:mrow><mml:mi mathvariant="italic">λ</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">reg</mml:mi></mml:mrow></mml:msub></mml:math>
<tex-math><![CDATA[${\lambda _{\mathit{reg}}}$]]></tex-math></alternatives></inline-formula> are weights that control the relative importance of the loss terms. The overall loss is: 
<disp-formula id="j_infor443_eq_013">
<label>(13)</label><alternatives>
<mml:math display="block"><mml:mtable displaystyle="true" columnalign="right"><mml:mtr><mml:mtd class="align-odd"><mml:msub><mml:mrow><mml:mi mathvariant="italic">Loss</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">overall</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mi mathvariant="italic">r</mml:mi><mml:msub><mml:mrow><mml:mi mathvariant="italic">Loss</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">recon</mml:mi></mml:mrow></mml:msub><mml:mo>+</mml:mo><mml:mo mathvariant="normal" fence="true" stretchy="false">(</mml:mo><mml:mn>1</mml:mn><mml:mo>−</mml:mo><mml:mi mathvariant="italic">r</mml:mi><mml:mo mathvariant="normal" fence="true" stretchy="false">)</mml:mo><mml:msub><mml:mrow><mml:mi mathvariant="italic">Loss</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">trans</mml:mi></mml:mrow></mml:msub></mml:mtd></mml:mtr></mml:mtable></mml:math>
<tex-math><![CDATA[\[ {\mathit{Loss}_{\mathit{overall}}}=r{\mathit{Loss}_{\mathit{recon}}}+(1-r){\mathit{Loss}_{\mathit{trans}}}\]]]></tex-math></alternatives>
</disp-formula> 
where <inline-formula id="j_infor443_ineq_089"><alternatives>
<mml:math><mml:mi mathvariant="italic">r</mml:mi><mml:mo stretchy="false">∈</mml:mo><mml:mo fence="true" stretchy="false">{</mml:mo><mml:mn>0</mml:mn><mml:mo mathvariant="normal">,</mml:mo><mml:mn>1</mml:mn><mml:mo fence="true" stretchy="false">}</mml:mo></mml:math>
<tex-math><![CDATA[$r\in \{0,1\}$]]></tex-math></alternatives></inline-formula> is a random binary value that is updated before each training iteration.</p>
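A minimal sketch of Eq. (13), assuming the two branch losses are available as per-iteration scalars (the function name is illustrative):

```python
import random

def overall_loss(loss_recon: float, loss_trans: float) -> float:
    """Eq. (13): r in {0, 1} is redrawn before each training iteration,
    so each iteration optimizes either the reconstruction objective
    or the transformation objective."""
    r = random.randint(0, 1)
    return r * loss_recon + (1 - r) * loss_trans
```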
</sec>
</sec>
<sec id="j_infor443_s_014">
<label>5</label>
<title>Experiments</title>
<sec id="j_infor443_s_015">
<label>5.1</label>
<title>Dataset</title>
<p>The PFA-GAN is trained on a subset of the MS-Celeb-1M (Guo <italic>et al.</italic>, <xref ref-type="bibr" rid="j_infor443_ref_009">2016</xref>) dataset, which contains about 5M images of 80K celebrities with unbalanced viewpoint distributions and very large appearance variation (e.g. due to gender, race, age, or even makeup). We use 36K face images belonging to 528 different identities; no pose or identity annotations are employed in the training process. For each face image, we first detect the facial region using the multi-task cascaded CNN detector (MTCNN) (Zhang <italic>et al.</italic>, <xref ref-type="bibr" rid="j_infor443_ref_026">2016</xref>) and then align and resize the detected face to <inline-formula id="j_infor443_ineq_090"><alternatives>
<mml:math><mml:mn>128</mml:mn><mml:mo>×</mml:mo><mml:mn>128</mml:mn></mml:math>
<tex-math><![CDATA[$128\times 128$]]></tex-math></alternatives></inline-formula> pixels.</p>
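The crop-and-resize step can be sketched in PyTorch as below. This is a crude stand-in: the paper uses MTCNN's landmark-based alignment, whereas here the detector's output is assumed to be a plain bounding box, and no rotation is applied.

```python
import torch
import torch.nn.functional as F

def crop_and_resize(image: torch.Tensor, box: tuple, size: int = 128) -> torch.Tensor:
    """Crop a detected facial region (box assumed as (top, left, bottom,
    right), e.g. from a face detector) and resize it to size x size pixels."""
    top, left, bottom, right = box
    face = image[:, top:bottom, left:right].unsqueeze(0)  # 1 x C x H x W
    face = F.interpolate(face, size=(size, size), mode="bilinear",
                         align_corners=False)
    return face.squeeze(0)  # C x size x size
```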
</sec>
<sec id="j_infor443_s_016">
<label>5.2</label>
<title>Implementation Details</title>
<p>We use the same implementations of the generator <italic>G</italic> and the discriminator <italic>D</italic> as in IP-GAN, introduced by Tian <italic>et al.</italic> (<xref ref-type="bibr" rid="j_infor443_ref_021">2018</xref>). For the pose encoder network <inline-formula id="j_infor443_ineq_091"><alternatives>
<mml:math><mml:msub><mml:mrow><mml:mi mathvariant="italic">E</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">P</mml:mi></mml:mrow></mml:msub></mml:math>
<tex-math><![CDATA[${E_{P}}$]]></tex-math></alternatives></inline-formula> and the identity encoder network <inline-formula id="j_infor443_ineq_092"><alternatives>
<mml:math><mml:msub><mml:mrow><mml:mi mathvariant="italic">E</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">A</mml:mi></mml:mrow></mml:msub></mml:math>
<tex-math><![CDATA[${E_{A}}$]]></tex-math></alternatives></inline-formula>, we use the ResNet50 (He <italic>et al.</italic>, <xref ref-type="bibr" rid="j_infor443_ref_011">2016</xref>) network architecture, whose skip connections allow the network to learn the desired representation (e.g. the identity or the pose), since the performance of the upper layers will be at least as good as that of the lower layers. The following parameters were used: <inline-formula id="j_infor443_ineq_093"><alternatives>
<mml:math><mml:msub><mml:mrow><mml:mi mathvariant="italic">λ</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">cnt</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:math>
<tex-math><![CDATA[${\lambda _{\mathit{cnt}}}=1$]]></tex-math></alternatives></inline-formula>, <inline-formula id="j_infor443_ineq_094"><alternatives>
<mml:math><mml:msub><mml:mrow><mml:mi mathvariant="italic">λ</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">adv</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mn>0.005</mml:mn></mml:math>
<tex-math><![CDATA[${\lambda _{\mathit{adv}}}=0.005$]]></tex-math></alternatives></inline-formula>, and <inline-formula id="j_infor443_ineq_095"><alternatives>
<mml:math><mml:msub><mml:mrow><mml:mi mathvariant="italic">λ</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">reg</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mn>0.001</mml:mn></mml:math>
<tex-math><![CDATA[${\lambda _{\mathit{reg}}}=0.001$]]></tex-math></alternatives></inline-formula>. The values of the random noise <inline-formula id="j_infor443_ineq_096"><alternatives>
<mml:math><mml:msup><mml:mrow><mml:mi mathvariant="italic">a</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">z</mml:mi></mml:mrow></mml:msup></mml:math>
<tex-math><![CDATA[${a^{z}}$]]></tex-math></alternatives></inline-formula> are in the range <inline-formula id="j_infor443_ineq_097"><alternatives>
<mml:math><mml:mo fence="true" stretchy="false">[</mml:mo><mml:mo>−</mml:mo><mml:mn>1</mml:mn><mml:mo mathvariant="normal">,</mml:mo><mml:mo>+</mml:mo><mml:mn>1</mml:mn><mml:mo fence="true" stretchy="false">]</mml:mo></mml:math>
<tex-math><![CDATA[$[-1,+1]$]]></tex-math></alternatives></inline-formula>. The model was implemented using the PyTorch deep learning framework. The batch size was set to 16, and a single Nvidia GTX 1080 Ti graphics card was used. The Adam optimizer (Kingma and Ba, <xref ref-type="bibr" rid="j_infor443_ref_015">2015</xref>) was configured with a learning rate of 0.0005 and momentum parameters of <inline-formula id="j_infor443_ineq_098"><alternatives>
<mml:math><mml:mo fence="true" stretchy="false">[</mml:mo><mml:mn>0</mml:mn><mml:mo mathvariant="normal">,</mml:mo><mml:mn>0.9</mml:mn><mml:mo fence="true" stretchy="false">]</mml:mo></mml:math>
<tex-math><![CDATA[$[0,0.9]$]]></tex-math></alternatives></inline-formula>.</p>
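The optimizer settings above translate directly to PyTorch; the module below merely stands in for the PFA-GAN networks:

```python
import torch

# Illustrative module standing in for the PFA-GAN networks.
model = torch.nn.Linear(8, 8)

# Adam configured as in the paper: learning rate 0.0005, momentum [0, 0.9].
optimizer = torch.optim.Adam(model.parameters(), lr=0.0005, betas=(0.0, 0.9))
```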
</sec>
<sec id="j_infor443_s_017">
<label>5.3</label>
<title>Interpolation of Pose Latent Space</title>
<p>In this section, we demonstrate that the pose of the generated face images can be gradually changed via the latent vector. We call this phenomenon face pose morphing. We tested our model on the selected subset of the MS-Celeb-1M (Guo <italic>et al.</italic>, <xref ref-type="bibr" rid="j_infor443_ref_009">2016</xref>) dataset. We first choose a pair of images <inline-formula id="j_infor443_ineq_099"><alternatives>
<mml:math><mml:msub><mml:mrow><mml:mi mathvariant="italic">x</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">S</mml:mi></mml:mrow></mml:msub></mml:math>
<tex-math><![CDATA[${x_{S}}$]]></tex-math></alternatives></inline-formula> and <inline-formula id="j_infor443_ineq_100"><alternatives>
<mml:math><mml:msub><mml:mrow><mml:mi mathvariant="italic">x</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">D</mml:mi></mml:mrow></mml:msub></mml:math>
<tex-math><![CDATA[${x_{D}}$]]></tex-math></alternatives></inline-formula>, and then extract the pose latent vectors <inline-formula id="j_infor443_ineq_101"><alternatives>
<mml:math><mml:msub><mml:mrow><mml:mi mathvariant="italic">p</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">S</mml:mi></mml:mrow></mml:msub></mml:math>
<tex-math><![CDATA[${p_{S}}$]]></tex-math></alternatives></inline-formula> and <inline-formula id="j_infor443_ineq_102"><alternatives>
<mml:math><mml:msub><mml:mrow><mml:mi mathvariant="italic">p</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">D</mml:mi></mml:mrow></mml:msub></mml:math>
<tex-math><![CDATA[${p_{D}}$]]></tex-math></alternatives></inline-formula> using the pose encoder network <inline-formula id="j_infor443_ineq_103"><alternatives>
<mml:math><mml:msub><mml:mrow><mml:mi mathvariant="italic">E</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">P</mml:mi></mml:mrow></mml:msub></mml:math>
<tex-math><![CDATA[${E_{P}}$]]></tex-math></alternatives></inline-formula>. Then, we obtain a series of pose embedding vectors <inline-formula id="j_infor443_ineq_104"><alternatives>
<mml:math><mml:msub><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mi mathvariant="italic">p</mml:mi></mml:mrow><mml:mo stretchy="false">˜</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi mathvariant="italic">i</mml:mi></mml:mrow></mml:msub></mml:math>
<tex-math><![CDATA[${\tilde{p}_{i}}$]]></tex-math></alternatives></inline-formula> by linear interpolation, i.e.: 
<disp-formula id="j_infor443_eq_014">
<label>(14)</label><alternatives>
<mml:math display="block"><mml:mtable displaystyle="true" columnalign="right"><mml:mtr><mml:mtd class="align-odd"><mml:msub><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mi mathvariant="italic">p</mml:mi></mml:mrow><mml:mo stretchy="false">˜</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi mathvariant="italic">i</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:msub><mml:mrow><mml:mi mathvariant="italic">α</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">i</mml:mi></mml:mrow></mml:msub><mml:msub><mml:mrow><mml:mi mathvariant="italic">p</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">S</mml:mi></mml:mrow></mml:msub><mml:mo>+</mml:mo><mml:mo mathvariant="normal" fence="true" stretchy="false">(</mml:mo><mml:mn>1</mml:mn><mml:mo>−</mml:mo><mml:msub><mml:mrow><mml:mi mathvariant="italic">α</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">i</mml:mi></mml:mrow></mml:msub><mml:mo mathvariant="normal" fence="true" stretchy="false">)</mml:mo><mml:msub><mml:mrow><mml:mi mathvariant="italic">p</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">D</mml:mi></mml:mrow></mml:msub><mml:mo mathvariant="normal">,</mml:mo></mml:mtd></mml:mtr></mml:mtable></mml:math>
<tex-math><![CDATA[\[ {\tilde{p}_{i}}={\alpha _{i}}{p_{S}}+(1-{\alpha _{i}}){p_{D}},\]]]></tex-math></alternatives>
</disp-formula> 
where <inline-formula id="j_infor443_ineq_105"><alternatives>
<mml:math><mml:msub><mml:mrow><mml:mi mathvariant="italic">α</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">i</mml:mi></mml:mrow></mml:msub><mml:mo stretchy="false">∈</mml:mo><mml:mo fence="true" stretchy="false">[</mml:mo><mml:mn>0</mml:mn><mml:mo mathvariant="normal">,</mml:mo><mml:mn>1</mml:mn><mml:mo fence="true" stretchy="false">]</mml:mo></mml:math>
<tex-math><![CDATA[${\alpha _{i}}\in [0,1]$]]></tex-math></alternatives></inline-formula>, <inline-formula id="j_infor443_ineq_106"><alternatives>
<mml:math><mml:mi mathvariant="italic">i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn><mml:mo mathvariant="normal">,</mml:mo><mml:mo>…</mml:mo><mml:mo mathvariant="normal">,</mml:mo><mml:mi mathvariant="italic">k</mml:mi></mml:math>
<tex-math><![CDATA[$i=1,\dots ,k$]]></tex-math></alternatives></inline-formula>; <italic>k</italic> is the number of interpolated images. Finally, we concatenate each interpolated pose vector <inline-formula id="j_infor443_ineq_107"><alternatives>
<mml:math><mml:msub><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mi mathvariant="italic">p</mml:mi></mml:mrow><mml:mo stretchy="false">˜</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi mathvariant="italic">i</mml:mi></mml:mrow></mml:msub></mml:math>
<tex-math><![CDATA[${\tilde{p}_{i}}$]]></tex-math></alternatives></inline-formula> with the extracted identity embedding vector <inline-formula id="j_infor443_ineq_108"><alternatives>
<mml:math><mml:msub><mml:mrow><mml:mi mathvariant="italic">a</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">S</mml:mi></mml:mrow></mml:msub></mml:math>
<tex-math><![CDATA[${a_{S}}$]]></tex-math></alternatives></inline-formula> and feed the combined vector into the generator <italic>G</italic> to synthesize an interpolated face image <inline-formula id="j_infor443_ineq_109"><alternatives>
<mml:math><mml:msub><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mi mathvariant="italic">x</mml:mi></mml:mrow><mml:mo stretchy="false">˜</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi mathvariant="italic">i</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mi mathvariant="italic">G</mml:mi><mml:mo mathvariant="normal" fence="true" stretchy="false">(</mml:mo><mml:msub><mml:mrow><mml:mi mathvariant="italic">a</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">S</mml:mi></mml:mrow></mml:msub><mml:mo mathvariant="normal">,</mml:mo><mml:msub><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mi mathvariant="italic">p</mml:mi></mml:mrow><mml:mo stretchy="false">˜</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi mathvariant="italic">i</mml:mi></mml:mrow></mml:msub><mml:mo mathvariant="normal" fence="true" stretchy="false">)</mml:mo></mml:math>
<tex-math><![CDATA[${\tilde{x}_{i}}=G({a_{S}},{\tilde{p}_{i}})$]]></tex-math></alternatives></inline-formula>. Figure <xref rid="j_infor443_fig_002">2</xref> presents the results of the face pose morphing using <inline-formula id="j_infor443_ineq_110"><alternatives>
<mml:math><mml:mi mathvariant="italic">k</mml:mi><mml:mo>=</mml:mo><mml:mn>10</mml:mn></mml:math>
<tex-math><![CDATA[$k=10$]]></tex-math></alternatives></inline-formula>, where each row shows how a face pose gradually morphs into the next one. The last column shows the landmark image of the driving face.</p>
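The interpolation of Eq. (14) is straightforward to sketch. The ordering of the interpolation coefficients is an assumption here; the paper only requires each coefficient to lie in [0, 1]:

```python
import torch

def interpolate_poses(p_s: torch.Tensor, p_d: torch.Tensor, k: int = 10):
    """Eq. (14): k pose vectors obtained by linear interpolation
    between the source pose p_S and the driving pose p_D."""
    alphas = torch.linspace(0.0, 1.0, k)
    return [a * p_s + (1.0 - a) * p_d for a in alphas]
```

Each interpolated vector is then concatenated with the identity embedding and fed to the generator, as described above.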
<fig id="j_infor443_fig_002">
<label>Fig. 2</label>
<caption>
<p>Interpolation of the pose latent space.</p>
</caption>
<graphic xlink:href="infor443_g002.jpg"/>
</fig>
</sec>
<sec id="j_infor443_s_018">
<label>5.4</label>
<title>Visual Data Augmentation Strategies</title>
<p>Traditional visual data augmentation methods alter the entire face image by transferring image pixel values to new positions or by shifting pixel colours to new values. These generic methods ignore high-level content such as head movement or the addition of a smile, so in this section we show the effectiveness of using our model as an alternative, face-specific augmentation method. The ability of the PFA-GAN model to perform controlled synthesis of a face image makes it possible to enlarge the volume of training or testing data by generating new face images with new poses. Thus, using our proposed model, an unlimited number of images with a great variety of face poses can be generated for each identity in the dataset. Assuming the original dataset is <italic>R</italic>, pose face augmentation can be represented by the following transformation: 
<disp-formula id="j_infor443_eq_015">
<label>(15)</label><alternatives>
<mml:math display="block"><mml:mtable displaystyle="true" columnalign="right"><mml:mtr><mml:mtd class="align-odd"><mml:mi mathvariant="italic">f</mml:mi><mml:mo>:</mml:mo><mml:mi mathvariant="italic">R</mml:mi><mml:mo stretchy="false">⟶</mml:mo><mml:mi mathvariant="italic">T</mml:mi><mml:mo mathvariant="normal">,</mml:mo></mml:mtd></mml:mtr></mml:mtable></mml:math>
<tex-math><![CDATA[\[ f:R\longrightarrow T,\]]]></tex-math></alternatives>
</disp-formula> 
where <italic>T</italic> is the augmented version of <italic>R</italic>. The dataset is then expanded as the union of the original dataset and the augmented one: 
<disp-formula id="j_infor443_eq_016">
<label>(16)</label><alternatives>
<mml:math display="block"><mml:mtable displaystyle="true" columnalign="right"><mml:mtr><mml:mtd class="align-odd"><mml:mover accent="true"><mml:mrow><mml:mi mathvariant="italic">R</mml:mi></mml:mrow><mml:mo>´</mml:mo></mml:mover><mml:mo>=</mml:mo><mml:mi mathvariant="italic">R</mml:mi><mml:mo>∪</mml:mo><mml:mi mathvariant="italic">T</mml:mi><mml:mo>.</mml:mo></mml:mtd></mml:mtr></mml:mtable></mml:math>
<tex-math><![CDATA[\[ \acute{R}=R\cup T.\]]]></tex-math></alternatives>
</disp-formula> 
We introduce three visual data augmentation strategies, each of which extends the original training dataset with an augmented dataset whose images differ from the originals in face pose to a different degree.</p>
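Under the assumption of PyTorch map-style datasets, the expansion of Eqs. (15)-(16) can be sketched as follows; the tensors are placeholders for the real and generated images:

```python
import torch
from torch.utils.data import ConcatDataset, TensorDataset

# Hypothetical stand-ins: R is the original dataset, T an augmented
# dataset produced by the generator under one of the strategies.
R = TensorDataset(torch.randn(100, 3, 8, 8))
T = TensorDataset(torch.randn(100, 3, 8, 8))

# Eq. (16): the expanded dataset is the union of R and T.
R_expanded = ConcatDataset([R, T])
```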
<list>
<list-item id="j_infor443_li_009">
<label>•</label>
<p>First augmentation strategy (Aug-S1). For each image in our dataset, we choose a random driving face image, and by following the interpolation technique described in Section <xref rid="j_infor443_s_017">5.3</xref>, we choose the interpolated pose vector <inline-formula id="j_infor443_ineq_111"><alternatives>
<mml:math><mml:msub><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mi mathvariant="italic">p</mml:mi></mml:mrow><mml:mo stretchy="false">˜</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mn>3</mml:mn></mml:mrow></mml:msub></mml:math>
<tex-math><![CDATA[${\tilde{p}_{3}}$]]></tex-math></alternatives></inline-formula>, which differs only slightly from the original pose, as the driving pose. Then we feed <inline-formula id="j_infor443_ineq_112"><alternatives>
<mml:math><mml:msub><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mi mathvariant="italic">p</mml:mi></mml:mrow><mml:mo stretchy="false">˜</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mn>3</mml:mn></mml:mrow></mml:msub></mml:math>
<tex-math><![CDATA[${\tilde{p}_{3}}$]]></tex-math></alternatives></inline-formula> along with <inline-formula id="j_infor443_ineq_113"><alternatives>
<mml:math><mml:msub><mml:mrow><mml:mi mathvariant="italic">a</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">S</mml:mi></mml:mrow></mml:msub></mml:math>
<tex-math><![CDATA[${a_{S}}$]]></tex-math></alternatives></inline-formula> to synthesize an augmented face image <inline-formula id="j_infor443_ineq_114"><alternatives>
<mml:math><mml:msub><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mi mathvariant="italic">x</mml:mi></mml:mrow><mml:mo stretchy="false">˜</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi mathvariant="italic">S</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mi mathvariant="italic">G</mml:mi><mml:mo mathvariant="normal" fence="true" stretchy="false">(</mml:mo><mml:msub><mml:mrow><mml:mi mathvariant="italic">a</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">S</mml:mi></mml:mrow></mml:msub><mml:mo mathvariant="normal">,</mml:mo><mml:msub><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mi mathvariant="italic">p</mml:mi></mml:mrow><mml:mo stretchy="false">˜</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mn>3</mml:mn></mml:mrow></mml:msub><mml:mo mathvariant="normal" fence="true" stretchy="false">)</mml:mo></mml:math>
<tex-math><![CDATA[${\tilde{x}_{S}}=G({a_{S}},{\tilde{p}_{3}})$]]></tex-math></alternatives></inline-formula> in the augmented dataset <inline-formula id="j_infor443_ineq_115"><alternatives>
<mml:math><mml:msub><mml:mrow><mml:mi mathvariant="italic">T</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub></mml:math>
<tex-math><![CDATA[${T_{1}}$]]></tex-math></alternatives></inline-formula>. Therefore, the dataset of this augmentation strategy will be: 
<disp-formula id="j_infor443_eq_017">
<label>(17)</label><alternatives>
<mml:math display="block"><mml:mtable displaystyle="true" columnalign="right"><mml:mtr><mml:mtd class="align-odd"><mml:msub><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mi mathvariant="italic">R</mml:mi></mml:mrow><mml:mo>´</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mi mathvariant="italic">R</mml:mi><mml:mo>∪</mml:mo><mml:msub><mml:mrow><mml:mi mathvariant="italic">T</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>.</mml:mo></mml:mtd></mml:mtr></mml:mtable></mml:math>
<tex-math><![CDATA[\[ {\acute{R}_{1}}=R\cup {T_{1}}.\]]]></tex-math></alternatives>
</disp-formula>
</p>
</list-item>
<list-item id="j_infor443_li_010">
<label>•</label>
<p>Second augmentation strategy (Aug-S2). Similar to the first augmentation strategy, we select the interpolated pose vector <inline-formula id="j_infor443_ineq_116"><alternatives>
<mml:math><mml:msub><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mi mathvariant="italic">p</mml:mi></mml:mrow><mml:mo stretchy="false">˜</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mn>6</mml:mn></mml:mrow></mml:msub></mml:math>
<tex-math><![CDATA[${\tilde{p}_{6}}$]]></tex-math></alternatives></inline-formula>, which differs from the original pose more than <inline-formula id="j_infor443_ineq_117"><alternatives>
<mml:math><mml:msub><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mi mathvariant="italic">p</mml:mi></mml:mrow><mml:mo stretchy="false">˜</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mn>3</mml:mn></mml:mrow></mml:msub></mml:math>
<tex-math><![CDATA[${\tilde{p}_{3}}$]]></tex-math></alternatives></inline-formula> does, to synthesize an augmented face image <inline-formula id="j_infor443_ineq_118"><alternatives>
<mml:math><mml:msub><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mi mathvariant="italic">x</mml:mi></mml:mrow><mml:mo stretchy="false">˜</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi mathvariant="italic">S</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mi mathvariant="italic">G</mml:mi><mml:mo mathvariant="normal" fence="true" stretchy="false">(</mml:mo><mml:msub><mml:mrow><mml:mi mathvariant="italic">a</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">S</mml:mi></mml:mrow></mml:msub><mml:mo mathvariant="normal">,</mml:mo><mml:msub><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mi mathvariant="italic">p</mml:mi></mml:mrow><mml:mo stretchy="false">˜</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mn>6</mml:mn></mml:mrow></mml:msub><mml:mo mathvariant="normal" fence="true" stretchy="false">)</mml:mo></mml:math>
<tex-math><![CDATA[${\tilde{x}_{S}}=G({a_{S}},{\tilde{p}_{6}})$]]></tex-math></alternatives></inline-formula>. Consequently, the dataset of the second augmentation strategy will be: 
<disp-formula id="j_infor443_eq_018">
<label>(18)</label><alternatives>
<mml:math display="block"><mml:mtable displaystyle="true" columnalign="right"><mml:mtr><mml:mtd class="align-odd"><mml:msub><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mi mathvariant="italic">R</mml:mi></mml:mrow><mml:mo>´</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mi mathvariant="italic">R</mml:mi><mml:mo>∪</mml:mo><mml:msub><mml:mrow><mml:mi mathvariant="italic">T</mml:mi></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub><mml:mo>.</mml:mo></mml:mtd></mml:mtr></mml:mtable></mml:math>
<tex-math><![CDATA[\[ {\acute{R}_{2}}=R\cup {T_{2}}.\]]]></tex-math></alternatives>
</disp-formula>
</p>
</list-item>
<list-item id="j_infor443_li_011">
<label>•</label>
<p>Third augmentation strategy (Aug-S3). Here the generated images may differ considerably from the originals in head pose and facial expression. To this end, for each source face image in our dataset we randomly select a driving image, extract its pose embedding vector <inline-formula id="j_infor443_ineq_119"><alternatives>
<mml:math><mml:msub><mml:mrow><mml:mi mathvariant="italic">p</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">r</mml:mi></mml:mrow></mml:msub></mml:math>
<tex-math><![CDATA[${p_{r}}$]]></tex-math></alternatives></inline-formula> and feed the combined vector <inline-formula id="j_infor443_ineq_120"><alternatives>
<mml:math><mml:mo fence="true" stretchy="false">[</mml:mo><mml:msub><mml:mrow><mml:mi mathvariant="italic">a</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">S</mml:mi></mml:mrow></mml:msub><mml:mo mathvariant="normal">,</mml:mo><mml:msub><mml:mrow><mml:mi mathvariant="italic">p</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">r</mml:mi></mml:mrow></mml:msub><mml:mo fence="true" stretchy="false">]</mml:mo></mml:math>
<tex-math><![CDATA[$[{a_{S}},{p_{r}}]$]]></tex-math></alternatives></inline-formula> to the generator. The dataset of this augmentation strategy will be: 
<disp-formula id="j_infor443_eq_019">
<label>(19)</label><alternatives>
<mml:math display="block"><mml:mtable displaystyle="true" columnalign="right"><mml:mtr><mml:mtd class="align-odd"><mml:msub><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mi mathvariant="italic">R</mml:mi></mml:mrow><mml:mo>´</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mn>3</mml:mn></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mi mathvariant="italic">R</mml:mi><mml:mo>∪</mml:mo><mml:msub><mml:mrow><mml:mi mathvariant="italic">T</mml:mi></mml:mrow><mml:mrow><mml:mn>3</mml:mn></mml:mrow></mml:msub><mml:mo>.</mml:mo></mml:mtd></mml:mtr></mml:mtable></mml:math>
<tex-math><![CDATA[\[ {\acute{R}_{3}}=R\cup {T_{3}}.\]]]></tex-math></alternatives>
</disp-formula> 
Figure <xref rid="j_infor443_fig_003">3</xref> shows examples of face images from the augmented datasets; the pose variation between them is clearly visible.</p>
</list-item>
</list>
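The three augmentation strategies above differ only in how the pose vector fed to the generator is chosen. A minimal sketch, assuming a generator callable `G(a, p)` and per-image appearance and pose embeddings as in the notation above; the helper `interpolate` and its coefficients are illustrative stand-ins for the interpolated vectors used in Aug-S1 and Aug-S2:

```python
import random

def interpolate(p_a, p_b, t):
    # linear interpolation between two pose embedding vectors
    return [(1 - t) * x + t * y for x, y in zip(p_a, p_b)]

def augment(G, samples, strategy):
    """Build the augmented set T_k for one of the three strategies.

    samples: list of dicts with keys 'a' (appearance embedding) and
             'p' (pose embedding); G(a, p) synthesizes a face image.
    """
    augmented = []
    poses = [s["p"] for s in samples]
    for s in samples:
        if strategy == "Aug-S1":    # small pose change (interpolated vector)
            p_new = interpolate(s["p"], random.choice(poses), t=0.3)
        elif strategy == "Aug-S2":  # larger pose change (interpolated vector)
            p_new = interpolate(s["p"], random.choice(poses), t=0.6)
        else:                       # Aug-S3: pose of a random driving image
            p_new = random.choice(poses)
        augmented.append(G(s["a"], p_new))
    return augmented
```

Each augmented dataset is then the union of the original dataset with the corresponding generated set, as in Eqs. (18) and (19).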
<fig id="j_infor443_fig_003">
<label>Fig. 3</label>
<caption>
<p>Face image examples from the original and augmented datasets. From left to right: the original dataset <italic>R</italic>, the augmented datasets <inline-formula id="j_infor443_ineq_121"><alternatives>
<mml:math><mml:msub><mml:mrow><mml:mi mathvariant="italic">T</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub></mml:math>
<tex-math><![CDATA[${T_{1}}$]]></tex-math></alternatives></inline-formula>, <inline-formula id="j_infor443_ineq_122"><alternatives>
<mml:math><mml:msub><mml:mrow><mml:mi mathvariant="italic">T</mml:mi></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub></mml:math>
<tex-math><![CDATA[${T_{2}}$]]></tex-math></alternatives></inline-formula>, <inline-formula id="j_infor443_ineq_123"><alternatives>
<mml:math><mml:msub><mml:mrow><mml:mi mathvariant="italic">T</mml:mi></mml:mrow><mml:mrow><mml:mn>3</mml:mn></mml:mrow></mml:msub></mml:math>
<tex-math><![CDATA[${T_{3}}$]]></tex-math></alternatives></inline-formula> in the second, third and fourth columns, respectively.</p>
</caption>
<graphic xlink:href="infor443_g003.jpg"/>
</fig>
</sec>
<sec id="j_infor443_s_019">
<label>5.5</label>
<title>Face Verification Task</title>
<p>In this subsection, we evaluate whether the augmented datasets improve the performance of the face verification task. In general, face verification involves the following steps: a convolutional neural network classifier is trained on a dataset and then used as a feature-extraction network to compute embedding vectors for a pair of face images from the testing datasets. Next, the two extracted vectors are passed to a distance function that calculates their similarity, and based on a threshold the function judges whether the two images show the same person. Two classifiers are used, <inline-formula id="j_infor443_ineq_124"><alternatives>
<mml:math><mml:msub><mml:mrow><mml:mi mathvariant="italic">C</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub></mml:math>
<tex-math><![CDATA[${C_{1}}$]]></tex-math></alternatives></inline-formula> and <inline-formula id="j_infor443_ineq_125"><alternatives>
<mml:math><mml:msub><mml:mrow><mml:mi mathvariant="italic">C</mml:mi></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub></mml:math>
<tex-math><![CDATA[${C_{2}}$]]></tex-math></alternatives></inline-formula>, with the backbones ResNet50 and ResNet101 chosen for <inline-formula id="j_infor443_ineq_126"><alternatives>
<mml:math><mml:msub><mml:mrow><mml:mi mathvariant="italic">C</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub></mml:math>
<tex-math><![CDATA[${C_{1}}$]]></tex-math></alternatives></inline-formula>, <inline-formula id="j_infor443_ineq_127"><alternatives>
<mml:math><mml:msub><mml:mrow><mml:mi mathvariant="italic">C</mml:mi></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub></mml:math>
<tex-math><![CDATA[${C_{2}}$]]></tex-math></alternatives></inline-formula>, respectively. Both classifiers use ArcFace (Deng <italic>et al.</italic>, <xref ref-type="bibr" rid="j_infor443_ref_006">2019</xref>) and the focal loss function. We use several datasets for face verification: LFW (Huang <italic>et al.</italic>, <xref ref-type="bibr" rid="j_infor443_ref_012">2007</xref>), CFP-FP (Sengupta <italic>et al.</italic>, <xref ref-type="bibr" rid="j_infor443_ref_019">2016</xref>), AgeDB (Moschoglou <italic>et al.</italic>, <xref ref-type="bibr" rid="j_infor443_ref_016">2017</xref>), CFP-FF (Sengupta <italic>et al.</italic>, <xref ref-type="bibr" rid="j_infor443_ref_019">2016</xref>) and VGGFace2-FP (Cao <italic>et al.</italic>, <xref ref-type="bibr" rid="j_infor443_ref_004">2018b</xref>). Apart from the widely used LFW dataset, we also report the performance of our augmentation model on the recent large-pose and large-age datasets CPLFW (Zheng and Deng, <xref ref-type="bibr" rid="j_infor443_ref_027">2018</xref>) and CALFW (Zheng <italic>et al.</italic>, <xref ref-type="bibr" rid="j_infor443_ref_028">2017</xref>). Table <xref rid="j_infor443_tab_001">1</xref> shows the statistics of both training and testing datasets in a verification scenario.</p>
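The verification procedure just described can be sketched as follows; this is a minimal illustration in which `embed` stands for the trained backbone used as a feature extractor, cosine similarity plays the role of the distance function, and the threshold value is arbitrary:

```python
import math

def cosine_similarity(u, v):
    # similarity between two embedding vectors, in [-1, 1]
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    return dot / (nu * nv)

def verify(embed, img_a, img_b, threshold=0.5):
    """Decide whether two face images show the same person.

    embed: feature-extraction network, e.g. a trained classifier
           backbone with its classification head removed.
    """
    e_a, e_b = embed(img_a), embed(img_b)
    return cosine_similarity(e_a, e_b) >= threshold
```

In practice the threshold is tuned on a validation split; the choice of distance function (cosine vs. Euclidean) depends on how the embeddings were trained.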
<table-wrap id="j_infor443_tab_001">
<label>Table 1</label>
<caption>
<p>Characteristics of the training and testing datasets.</p>
</caption>
<table>
<thead>
<tr>
<td style="vertical-align: top; text-align: left; border-top: solid thin; border-bottom: solid thin">Dataset</td>
<td style="vertical-align: top; text-align: left; border-top: solid thin; border-bottom: solid thin">Number of people</td>
<td style="vertical-align: top; text-align: left; border-top: solid thin; border-bottom: solid thin">Total images</td>
</tr>
</thead>
<tbody>
<tr>
<td style="vertical-align: top; text-align: left"><italic>R</italic></td>
<td style="vertical-align: top; text-align: left">529</td>
<td style="vertical-align: top; text-align: left">36000</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left"><inline-formula id="j_infor443_ineq_128"><alternatives>
<mml:math><mml:msub><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mi mathvariant="italic">R</mml:mi></mml:mrow><mml:mo>´</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub></mml:math>
<tex-math><![CDATA[${\acute{R}_{1}}$]]></tex-math></alternatives></inline-formula></td>
<td style="vertical-align: top; text-align: left">529</td>
<td style="vertical-align: top; text-align: left">72000</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left"><inline-formula id="j_infor443_ineq_129"><alternatives>
<mml:math><mml:msub><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mi mathvariant="italic">R</mml:mi></mml:mrow><mml:mo>´</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub></mml:math>
<tex-math><![CDATA[${\acute{R}_{2}}$]]></tex-math></alternatives></inline-formula></td>
<td style="vertical-align: top; text-align: left">529</td>
<td style="vertical-align: top; text-align: left">72000</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left"><inline-formula id="j_infor443_ineq_130"><alternatives>
<mml:math><mml:msub><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mi mathvariant="italic">R</mml:mi></mml:mrow><mml:mo>´</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mn>3</mml:mn></mml:mrow></mml:msub></mml:math>
<tex-math><![CDATA[${\acute{R}_{3}}$]]></tex-math></alternatives></inline-formula></td>
<td style="vertical-align: top; text-align: left">529</td>
<td style="vertical-align: top; text-align: left">72000</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">LFW</td>
<td style="vertical-align: top; text-align: left">5749</td>
<td style="vertical-align: top; text-align: left">13233</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">CFP-FP</td>
<td style="vertical-align: top; text-align: left">500</td>
<td style="vertical-align: top; text-align: left">2000</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">CFP-FF</td>
<td style="vertical-align: top; text-align: left">500</td>
<td style="vertical-align: top; text-align: left">5000</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">AgeDB</td>
<td style="vertical-align: top; text-align: left">570</td>
<td style="vertical-align: top; text-align: left">16488</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">CALFW</td>
<td style="vertical-align: top; text-align: left">4025</td>
<td style="vertical-align: top; text-align: left">12174</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">CPLFW</td>
<td style="vertical-align: top; text-align: left">3884</td>
<td style="vertical-align: top; text-align: left">11652</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left; border-bottom: solid thin">VGGFace2-FP</td>
<td style="vertical-align: top; text-align: left; border-bottom: solid thin">500</td>
<td style="vertical-align: top; text-align: left; border-bottom: solid thin">11000</td>
</tr>
</tbody>
</table>
</table-wrap>
<table-wrap id="j_infor443_tab_002">
<label>Table 2</label>
<caption>
<p>Verification accuracy after training the classifier <inline-formula id="j_infor443_ineq_131"><alternatives>
<mml:math><mml:msub><mml:mrow><mml:mi mathvariant="italic">C</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub></mml:math>
<tex-math><![CDATA[${C_{1}}$]]></tex-math></alternatives></inline-formula>.</p>
</caption>
<table>
<thead>
<tr>
<td style="vertical-align: top; text-align: left; border-top: solid thin; border-bottom: solid thin">Classifier</td>
<td style="vertical-align: top; text-align: left; border-top: solid thin; border-bottom: solid thin">Training dataset</td>
<td style="vertical-align: top; text-align: left; border-top: solid thin; border-bottom: solid thin">LFW</td>
<td style="vertical-align: top; text-align: left; border-top: solid thin; border-bottom: solid thin">CFP-FP</td>
<td style="vertical-align: top; text-align: left; border-top: solid thin; border-bottom: solid thin">CFP-FF</td>
<td style="vertical-align: top; text-align: left; border-top: solid thin; border-bottom: solid thin">AgeDB</td>
<td style="vertical-align: top; text-align: left; border-top: solid thin; border-bottom: solid thin">CALFW</td>
<td style="vertical-align: top; text-align: left; border-top: solid thin; border-bottom: solid thin">CPLFW</td>
<td style="vertical-align: top; text-align: left; border-top: solid thin; border-bottom: solid thin">VGGFace2-FP</td>
</tr>
</thead>
<tbody>
<tr>
<td style="vertical-align: top; text-align: left"><inline-formula id="j_infor443_ineq_132"><alternatives>
<mml:math><mml:msub><mml:mrow><mml:mi mathvariant="italic">C</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub></mml:math>
<tex-math><![CDATA[${C_{1}}$]]></tex-math></alternatives></inline-formula></td>
<td style="vertical-align: top; text-align: left"><italic>R</italic></td>
<td style="vertical-align: top; text-align: left">89.77</td>
<td style="vertical-align: top; text-align: left">78.39</td>
<td style="vertical-align: top; text-align: left">88.73</td>
<td style="vertical-align: top; text-align: left">69.23</td>
<td style="vertical-align: top; text-align: left">72.32</td>
<td style="vertical-align: top; text-align: left">70.15</td>
<td style="vertical-align: top; text-align: left">80.74</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left"><inline-formula id="j_infor443_ineq_133"><alternatives>
<mml:math><mml:msub><mml:mrow><mml:mi mathvariant="italic">C</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub></mml:math>
<tex-math><![CDATA[${C_{1}}$]]></tex-math></alternatives></inline-formula></td>
<td style="vertical-align: top; text-align: left"><inline-formula id="j_infor443_ineq_134"><alternatives>
<mml:math><mml:msub><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mi mathvariant="italic">R</mml:mi></mml:mrow><mml:mo>´</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub></mml:math>
<tex-math><![CDATA[${\acute{R}_{1}}$]]></tex-math></alternatives></inline-formula></td>
<td style="vertical-align: top; text-align: left">91.00</td>
<td style="vertical-align: top; text-align: left">80.04</td>
<td style="vertical-align: top; text-align: left">89.57</td>
<td style="vertical-align: top; text-align: left">70.17</td>
<td style="vertical-align: top; text-align: left">72.25</td>
<td style="vertical-align: top; text-align: left">70.72</td>
<td style="vertical-align: top; text-align: left">81.08</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left"><inline-formula id="j_infor443_ineq_135"><alternatives>
<mml:math><mml:msub><mml:mrow><mml:mi mathvariant="italic">C</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub></mml:math>
<tex-math><![CDATA[${C_{1}}$]]></tex-math></alternatives></inline-formula></td>
<td style="vertical-align: top; text-align: left"><inline-formula id="j_infor443_ineq_136"><alternatives>
<mml:math><mml:msub><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mi mathvariant="italic">R</mml:mi></mml:mrow><mml:mo>´</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub></mml:math>
<tex-math><![CDATA[${\acute{R}_{2}}$]]></tex-math></alternatives></inline-formula></td>
<td style="vertical-align: top; text-align: left">90.88</td>
<td style="vertical-align: top; text-align: left">80.34</td>
<td style="vertical-align: top; text-align: left"><bold>89.81</bold></td>
<td style="vertical-align: top; text-align: left">70.65</td>
<td style="vertical-align: top; text-align: left">71.80</td>
<td style="vertical-align: top; text-align: left"><bold>70.78</bold></td>
<td style="vertical-align: top; text-align: left"><bold>81.36</bold></td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left; border-bottom: solid thin"><inline-formula id="j_infor443_ineq_137"><alternatives>
<mml:math><mml:msub><mml:mrow><mml:mi mathvariant="italic">C</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub></mml:math>
<tex-math><![CDATA[${C_{1}}$]]></tex-math></alternatives></inline-formula></td>
<td style="vertical-align: top; text-align: left; border-bottom: solid thin"><inline-formula id="j_infor443_ineq_138"><alternatives>
<mml:math><mml:msub><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mi mathvariant="italic">R</mml:mi></mml:mrow><mml:mo>´</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mn>3</mml:mn></mml:mrow></mml:msub></mml:math>
<tex-math><![CDATA[${\acute{R}_{3}}$]]></tex-math></alternatives></inline-formula></td>
<td style="vertical-align: top; text-align: left; border-bottom: solid thin"><bold>91.53</bold></td>
<td style="vertical-align: top; text-align: left; border-bottom: solid thin"><bold>81.23</bold></td>
<td style="vertical-align: top; text-align: left; border-bottom: solid thin">89.70</td>
<td style="vertical-align: top; text-align: left; border-bottom: solid thin"><bold>71.20</bold></td>
<td style="vertical-align: top; text-align: left; border-bottom: solid thin"><bold>72.43</bold></td>
<td style="vertical-align: top; text-align: left; border-bottom: solid thin">70.28</td>
<td style="vertical-align: top; text-align: left; border-bottom: solid thin">81.16</td>
</tr>
</tbody>
</table>
</table-wrap>
<table-wrap id="j_infor443_tab_003">
<label>Table 3</label>
<caption>
<p>Verification accuracy after training the classifier <inline-formula id="j_infor443_ineq_139"><alternatives>
<mml:math><mml:msub><mml:mrow><mml:mi mathvariant="italic">C</mml:mi></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub></mml:math>
<tex-math><![CDATA[${C_{2}}$]]></tex-math></alternatives></inline-formula>.</p>
</caption>
<table>
<thead>
<tr>
<td style="vertical-align: top; text-align: left; border-top: solid thin; border-bottom: solid thin">Classifier</td>
<td style="vertical-align: top; text-align: left; border-top: solid thin; border-bottom: solid thin">Training dataset</td>
<td style="vertical-align: top; text-align: left; border-top: solid thin; border-bottom: solid thin">LFW</td>
<td style="vertical-align: top; text-align: left; border-top: solid thin; border-bottom: solid thin">CFP-FP</td>
<td style="vertical-align: top; text-align: left; border-top: solid thin; border-bottom: solid thin">CFP-FF</td>
<td style="vertical-align: top; text-align: left; border-top: solid thin; border-bottom: solid thin">AgeDB</td>
<td style="vertical-align: top; text-align: left; border-top: solid thin; border-bottom: solid thin">CALFW</td>
<td style="vertical-align: top; text-align: left; border-top: solid thin; border-bottom: solid thin">CPLFW</td>
<td style="vertical-align: top; text-align: left; border-top: solid thin; border-bottom: solid thin">VGGFace2-FP</td>
</tr>
</thead>
<tbody>
<tr>
<td style="vertical-align: top; text-align: left"><inline-formula id="j_infor443_ineq_140"><alternatives>
<mml:math><mml:msub><mml:mrow><mml:mi mathvariant="italic">C</mml:mi></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub></mml:math>
<tex-math><![CDATA[${C_{2}}$]]></tex-math></alternatives></inline-formula></td>
<td style="vertical-align: top; text-align: left"><italic>R</italic></td>
<td style="vertical-align: top; text-align: left">89.38</td>
<td style="vertical-align: top; text-align: left">77.97</td>
<td style="vertical-align: top; text-align: left">88.39</td>
<td style="vertical-align: top; text-align: left">68.30</td>
<td style="vertical-align: top; text-align: left">70.50</td>
<td style="vertical-align: top; text-align: left">69.90</td>
<td style="vertical-align: top; text-align: left">80.06</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left"><inline-formula id="j_infor443_ineq_141"><alternatives>
<mml:math><mml:msub><mml:mrow><mml:mi mathvariant="italic">C</mml:mi></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub></mml:math>
<tex-math><![CDATA[${C_{2}}$]]></tex-math></alternatives></inline-formula></td>
<td style="vertical-align: top; text-align: left"><inline-formula id="j_infor443_ineq_142"><alternatives>
<mml:math><mml:msub><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mi mathvariant="italic">R</mml:mi></mml:mrow><mml:mo>´</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub></mml:math>
<tex-math><![CDATA[${\acute{R}_{1}}$]]></tex-math></alternatives></inline-formula></td>
<td style="vertical-align: top; text-align: left">90.90</td>
<td style="vertical-align: top; text-align: left">80.13</td>
<td style="vertical-align: top; text-align: left">89.67</td>
<td style="vertical-align: top; text-align: left">70.38</td>
<td style="vertical-align: top; text-align: left">72.03</td>
<td style="vertical-align: top; text-align: left">70.90</td>
<td style="vertical-align: top; text-align: left">80.44</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left"><inline-formula id="j_infor443_ineq_143"><alternatives>
<mml:math><mml:msub><mml:mrow><mml:mi mathvariant="italic">C</mml:mi></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub></mml:math>
<tex-math><![CDATA[${C_{2}}$]]></tex-math></alternatives></inline-formula></td>
<td style="vertical-align: top; text-align: left"><inline-formula id="j_infor443_ineq_144"><alternatives>
<mml:math><mml:msub><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mi mathvariant="italic">R</mml:mi></mml:mrow><mml:mo>´</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub></mml:math>
<tex-math><![CDATA[${\acute{R}_{2}}$]]></tex-math></alternatives></inline-formula></td>
<td style="vertical-align: top; text-align: left">91.23</td>
<td style="vertical-align: top; text-align: left">81.17</td>
<td style="vertical-align: top; text-align: left">89.73</td>
<td style="vertical-align: top; text-align: left">69.23</td>
<td style="vertical-align: top; text-align: left">72.18</td>
<td style="vertical-align: top; text-align: left">71.42</td>
<td style="vertical-align: top; text-align: left">81.56</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left; border-bottom: solid thin"><inline-formula id="j_infor443_ineq_145"><alternatives>
<mml:math><mml:msub><mml:mrow><mml:mi mathvariant="italic">C</mml:mi></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub></mml:math>
<tex-math><![CDATA[${C_{2}}$]]></tex-math></alternatives></inline-formula></td>
<td style="vertical-align: top; text-align: left; border-bottom: solid thin"><inline-formula id="j_infor443_ineq_146"><alternatives>
<mml:math><mml:msub><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mi mathvariant="italic">R</mml:mi></mml:mrow><mml:mo>´</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mn>3</mml:mn></mml:mrow></mml:msub></mml:math>
<tex-math><![CDATA[${\acute{R}_{3}}$]]></tex-math></alternatives></inline-formula></td>
<td style="vertical-align: top; text-align: left; border-bottom: solid thin"><bold>91.68</bold></td>
<td style="vertical-align: top; text-align: left; border-bottom: solid thin"><bold>81.70</bold></td>
<td style="vertical-align: top; text-align: left; border-bottom: solid thin"><bold>89.77</bold></td>
<td style="vertical-align: top; text-align: left; border-bottom: solid thin"><bold>70.93</bold></td>
<td style="vertical-align: top; text-align: left; border-bottom: solid thin"><bold>72.67</bold></td>
<td style="vertical-align: top; text-align: left; border-bottom: solid thin"><bold>71.58</bold></td>
<td style="vertical-align: top; text-align: left; border-bottom: solid thin"><bold>81.64</bold></td>
</tr>
</tbody>
</table>
</table-wrap>
<p>We feed the augmented datasets to the classifiers <inline-formula id="j_infor443_ineq_147"><alternatives>
<mml:math><mml:mo mathvariant="normal" fence="true" stretchy="false">(</mml:mo><mml:msub><mml:mrow><mml:mi mathvariant="italic">C</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo mathvariant="normal">,</mml:mo><mml:msub><mml:mrow><mml:mi mathvariant="italic">C</mml:mi></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub><mml:mo mathvariant="normal" fence="true" stretchy="false">)</mml:mo></mml:math>
<tex-math><![CDATA[$({C_{1}},{C_{2}})$]]></tex-math></alternatives></inline-formula> for training, then we use the learned models to extract embedding vectors for each image in the testing datasets. The resulting verification accuracies, shown in Table <xref rid="j_infor443_tab_002">2</xref> and Table <xref rid="j_infor443_tab_003">3</xref>, make it clear that the verification accuracy is higher than on the dataset without augmentation. For instance, the verification accuracy on AgeDB increased from 69.23% to 71.20% when the CNN model (ResNet50) was trained on the augmented dataset (Aug-S3). Comparing the proposed augmentation strategies, the difference between the augmented datasets <inline-formula id="j_infor443_ineq_148"><alternatives>
<mml:math><mml:mo mathvariant="normal" fence="true" stretchy="false">(</mml:mo><mml:msub><mml:mrow><mml:mi mathvariant="italic">T</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo mathvariant="normal">,</mml:mo><mml:msub><mml:mrow><mml:mi mathvariant="italic">T</mml:mi></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub><mml:mo mathvariant="normal">,</mml:mo><mml:msub><mml:mrow><mml:mi mathvariant="italic">T</mml:mi></mml:mrow><mml:mrow><mml:mn>3</mml:mn></mml:mrow></mml:msub><mml:mo mathvariant="normal" fence="true" stretchy="false">)</mml:mo></mml:math>
<tex-math><![CDATA[$({T_{1}},{T_{2}},{T_{3}})$]]></tex-math></alternatives></inline-formula> is the degree to which the face pose is transformed relative to the original dataset: from a small change, as in the case of <inline-formula id="j_infor443_ineq_149"><alternatives>
<mml:math><mml:msub><mml:mrow><mml:mi mathvariant="italic">T</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub></mml:math>
<tex-math><![CDATA[${T_{1}}$]]></tex-math></alternatives></inline-formula>, to a fully random pose, as in the case of <inline-formula id="j_infor443_ineq_150"><alternatives>
<mml:math><mml:msub><mml:mrow><mml:mi mathvariant="italic">T</mml:mi></mml:mrow><mml:mrow><mml:mn>3</mml:mn></mml:mrow></mml:msub></mml:math>
<tex-math><![CDATA[${T_{3}}$]]></tex-math></alternatives></inline-formula>. Consequently, comparing the results across the tables shows that even a small change in the pose transformation improves face verification, while for random pose changes, as in the augmentation strategy Aug-S3, the best results are obtained with the deeper feature extractor ResNet101: on the publicly available CFP-FP dataset, the increase in verification accuracy reaches up to 4.5%.</p>
</sec>
</sec>
<sec id="j_infor443_s_020">
<label>6</label>
<title>Conclusion</title>
<p>In this paper, we proposed PFA-GAN, a self-supervised framework based on generative adversarial networks that controls the pose of a given face image using another face image while preserving the identity of the source image in the generated one. The framework makes no assumptions about the pose of the source images, since the proposed training method allows us to train all networks in a fully self-supervised setting on a large-scale unconstrained face image dataset. Finally, we use the trained model as a tool for visual data augmentation. PFA-GAN demonstrates the ability to synthesize photorealistic, identity-preserving faces with arbitrary poses, which improves face recognition performance. The face verification experiments confirm the effectiveness of the proposed framework for pose face augmentation, as all augmented datasets outperform the baseline. Furthermore, to the best of our knowledge, we are the first to train such a model on a large-scale unconstrained dataset of face images. One promising avenue for future work is to improve the network architecture by utilizing operations such as adaptive instance normalization (AdaIN) and to train our framework on datasets larger than ours.</p>
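Adaptive instance normalization, mentioned above as a direction for future work, replaces the channel statistics of content features with those of a style input. A minimal per-channel sketch; the pure-Python formulation is illustrative only, as in practice AdaIN operates on the feature maps of a network layer:

```python
def adain(content, style_mean, style_std, eps=1e-5):
    """Adaptive instance normalization for one feature channel.

    content: list of activations of the content input;
    style_mean, style_std: target statistics taken from the style input.
    """
    n = len(content)
    mean = sum(content) / n
    var = sum((x - mean) ** 2 for x in content) / n
    std = (var + eps) ** 0.5
    # normalize the content, then rescale and shift with the style statistics
    return [style_std * (x - mean) / std + style_mean for x in content]
```

Injecting the pose embedding through such normalization layers, instead of concatenating it with the appearance vector, is one way the generator architecture could be refined.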
</sec>
</body>
<back>
<ref-list id="j_infor443_reflist_001">
<title>References</title>
<ref id="j_infor443_ref_001">
<mixed-citation publication-type="other"><string-name><surname>Antoniou</surname>, <given-names>A.</given-names></string-name>, <string-name><surname>Storkey</surname>, <given-names>A.</given-names></string-name>, <string-name><surname>Edwards</surname>, <given-names>H.</given-names></string-name> (2018). <italic>Data Augmentation Generative Adversarial Networks</italic>. ICLR.</mixed-citation>
</ref>
<ref id="j_infor443_ref_002">
<mixed-citation publication-type="chapter"><string-name><surname>Blanz</surname>, <given-names>V.</given-names></string-name>, <string-name><surname>Vetter</surname>, <given-names>T.</given-names></string-name> (<year>1999</year>). <chapter-title>A morphable model for the synthesis of 3D faces</chapter-title>. In: <source>Proceedings of the 26th Annual Conference on Computer Graphics and Interactive Techniques, SIGGRAPH, 1999</source>.</mixed-citation>
</ref>
<ref id="j_infor443_ref_003">
<mixed-citation publication-type="other"><string-name><surname>Cao</surname>, <given-names>J.</given-names></string-name>, <string-name><surname>Hu</surname>, <given-names>Y.</given-names></string-name>, <string-name><surname>Yu</surname>, <given-names>B.</given-names></string-name>, <string-name><surname>He</surname>, <given-names>R.</given-names></string-name>, <string-name><surname>Sun</surname>, <given-names>Z.</given-names></string-name> (2018a). Load balanced GANs for multi-view face image synthesis. <ext-link ext-link-type="uri" xlink:href="http://arxiv.org/abs/1802.07447">arXiv:1802.07447</ext-link>.</mixed-citation>
</ref>
<ref id="j_infor443_ref_004">
<mixed-citation publication-type="chapter"><string-name><surname>Cao</surname>, <given-names>Q.</given-names></string-name>, <string-name><surname>Shen</surname>, <given-names>L.</given-names></string-name>, <string-name><surname>Xie</surname>, <given-names>W.</given-names></string-name>, <string-name><surname>Parkhi</surname>, <given-names>O.M.</given-names></string-name>, <string-name><surname>Zisserman</surname>, <given-names>A.</given-names></string-name> (<year>2018</year>b). <chapter-title>VGGFace2: a dataset for recognising faces across pose and age</chapter-title>. In: <source>Proceedings – 13th IEEE International Conference on Automatic Face and Gesture Recognition, FG 2018</source>.</mixed-citation>
</ref>
<ref id="j_infor443_ref_005">
<mixed-citation publication-type="other"><string-name><surname>Crispell</surname>, <given-names>D.</given-names></string-name>, <string-name><surname>Biris</surname>, <given-names>O.</given-names></string-name>, <string-name><surname>Crosswhite</surname>, <given-names>N.</given-names></string-name>, <string-name><surname>Byrne</surname>, <given-names>J.</given-names></string-name>, <string-name><surname>Mundy</surname>, <given-names>J.L.</given-names></string-name> (2017). Dataset augmentation for pose and lighting invariant face recognition. <ext-link ext-link-type="uri" xlink:href="http://arxiv.org/abs/1704.04326">arXiv:1704.04326 [cs.CV]</ext-link>.</mixed-citation>
</ref>
<ref id="j_infor443_ref_006">
<mixed-citation publication-type="chapter"><string-name><surname>Deng</surname>, <given-names>J.</given-names></string-name>, <string-name><surname>Guo</surname>, <given-names>J.</given-names></string-name>, <string-name><surname>Xue</surname>, <given-names>N.</given-names></string-name>, <string-name><surname>Zafeiriou</surname>, <given-names>S.</given-names></string-name> (<year>2019</year>). <chapter-title>ArcFace: additive angular margin loss for deep face recognition</chapter-title>. In: <source>Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition</source>.</mixed-citation>
</ref>
<ref id="j_infor443_ref_007">
<mixed-citation publication-type="journal"><string-name><surname>Farahani</surname>, <given-names>A.</given-names></string-name>, <string-name><surname>Mohseni</surname>, <given-names>H.</given-names></string-name> (<year>2019</year>). <article-title>Multi-pose face recognition using pairwise supervised dictionary learning</article-title>. <source>Informatica</source>, <volume>30</volume>, <fpage>647</fpage>–<lpage>670</lpage>.</mixed-citation>
</ref>
<ref id="j_infor443_ref_008">
<mixed-citation publication-type="chapter"><string-name><surname>Feng</surname>, <given-names>Z.H.</given-names></string-name>, <string-name><surname>Kittler</surname>, <given-names>J.</given-names></string-name>, <string-name><surname>Christmas</surname>, <given-names>W.</given-names></string-name>, <string-name><surname>Huber</surname>, <given-names>P.</given-names></string-name>, <string-name><surname>Wu</surname>, <given-names>X.J.</given-names></string-name> (<year>2017</year>). <chapter-title>Dynamic attention-controlled cascaded shape regression exploiting training data augmentation and fuzzy-set sample weighting</chapter-title>. In: <source>Proceedings – 30th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017</source>.</mixed-citation>
</ref>
<ref id="j_infor443_ref_009">
<mixed-citation publication-type="other"><string-name><surname>Guo</surname>, <given-names>Y.</given-names></string-name>, <string-name><surname>Zhang</surname>, <given-names>L.</given-names></string-name>, <string-name><surname>Hu</surname>, <given-names>Y.</given-names></string-name>, <string-name><surname>He</surname>, <given-names>X.</given-names></string-name>, <string-name><surname>Gao</surname>, <given-names>J.</given-names></string-name> (<year>2016</year>). MS-celeb-1M: a dataset and benchmark for large-scale face recognition. In: <italic>Lecture Notes in Computer Science</italic> (including subseries <italic>Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics</italic>).</mixed-citation>
</ref>
<ref id="j_infor443_ref_010">
<mixed-citation publication-type="other"><string-name><surname>Guo</surname>, <given-names>Y.</given-names></string-name>, <string-name><surname>Zhang</surname>, <given-names>J.</given-names></string-name>, <string-name><surname>Cai</surname>, <given-names>J.</given-names></string-name>, <string-name><surname>Jiang</surname>, <given-names>B.</given-names></string-name>, <string-name><surname>Zheng</surname>, <given-names>J.</given-names></string-name> (<year>2017</year>). 3DFaceNet: real-time dense face reconstruction via synthesizing photo-realistic face images. <ext-link ext-link-type="uri" xlink:href="http://arxiv.org/abs/1708.00980">arXiv:1708.00980</ext-link>.</mixed-citation>
</ref>
<ref id="j_infor443_ref_011">
<mixed-citation publication-type="chapter"><string-name><surname>He</surname>, <given-names>K.</given-names></string-name>, <string-name><surname>Zhang</surname>, <given-names>X.</given-names></string-name>, <string-name><surname>Ren</surname>, <given-names>S.</given-names></string-name>, <string-name><surname>Sun</surname>, <given-names>J.</given-names></string-name> (<year>2016</year>). <chapter-title>Deep residual learning for image recognition</chapter-title>. In: <source>Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition</source>.</mixed-citation>
</ref>
<ref id="j_infor443_ref_012">
<mixed-citation publication-type="other"><string-name><surname>Huang</surname>, <given-names>G.B.</given-names></string-name>, <string-name><surname>Ramesh</surname>, <given-names>M.</given-names></string-name>, <string-name><surname>Berg</surname>, <given-names>T.</given-names></string-name>, <string-name><surname>Learned-Miller</surname>, <given-names>E.</given-names></string-name> (<year>2007</year>). <italic>Labeled Faces in the Wild: A Database for Studying Face Recognition in Unconstrained Environments</italic>. Tech. Rep. 07-49, University of Massachusetts, Amherst.</mixed-citation>
</ref>
<ref id="j_infor443_ref_013">
<mixed-citation publication-type="chapter"><string-name><surname>Huang</surname>, <given-names>R.</given-names></string-name>, <string-name><surname>Zhang</surname>, <given-names>S.</given-names></string-name>, <string-name><surname>Li</surname>, <given-names>T.</given-names></string-name>, <string-name><surname>He</surname>, <given-names>R.</given-names></string-name> (<year>2017</year>). <chapter-title>Beyond face rotation: global and local perception GAN for photorealistic and identity preserving frontal view synthesis</chapter-title>. In: <source>Proceedings of the IEEE International Conference on Computer Vision</source>.</mixed-citation>
</ref>
<ref id="j_infor443_ref_014">
<mixed-citation publication-type="other"><string-name><surname>Johnson</surname>, <given-names>J.</given-names></string-name>, <string-name><surname>Alahi</surname>, <given-names>A.</given-names></string-name>, <string-name><surname>Fei-Fei</surname>, <given-names>L.</given-names></string-name> (<year>2016</year>). Perceptual losses for real-time style transfer and super-resolution. In: <italic>Lecture Notes in Computer Science</italic> (including subseries <italic>Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics</italic>).</mixed-citation>
</ref>
<ref id="j_infor443_ref_015">
<mixed-citation publication-type="chapter"><string-name><surname>Kingma</surname>, <given-names>D.P.</given-names></string-name>, <string-name><surname>Ba</surname>, <given-names>J.L.</given-names></string-name> (<year>2015</year>). <chapter-title>Adam: a method for stochastic optimization</chapter-title>. In: <source>3rd International Conference on Learning Representations, ICLR 2015 – Conference Track Proceedings</source>.</mixed-citation>
</ref>
<ref id="j_infor443_ref_016">
<mixed-citation publication-type="chapter"><string-name><surname>Moschoglou</surname>, <given-names>S.</given-names></string-name>, <string-name><surname>Papaioannou</surname>, <given-names>A.</given-names></string-name>, <string-name><surname>Sagonas</surname>, <given-names>C.</given-names></string-name>, <string-name><surname>Deng</surname>, <given-names>J.</given-names></string-name>, <string-name><surname>Kotsia</surname>, <given-names>I.</given-names></string-name>, <string-name><surname>Zafeiriou</surname>, <given-names>S.</given-names></string-name> (<year>2017</year>). <chapter-title>AgeDB: the first manually collected, in-the-wild age database</chapter-title>. In: <source>IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops</source>.</mixed-citation>
</ref>
<ref id="j_infor443_ref_017">
<mixed-citation publication-type="chapter"><string-name><surname>Parkhi</surname>, <given-names>O.M.</given-names></string-name>, <string-name><surname>Vedaldi</surname>, <given-names>A.</given-names></string-name>, <string-name><surname>Zisserman</surname>, <given-names>A.</given-names></string-name> (<year>2015</year>). <chapter-title>Deep face recognition</chapter-title>. In: <source>British Machine Vision Conference</source>, Vol. <volume>1</volume>, pp. <fpage>41.1</fpage>–<lpage>41.12</lpage>.</mixed-citation>
</ref>
<ref id="j_infor443_ref_018">
<mixed-citation publication-type="journal"><string-name><surname>Ribarić</surname>, <given-names>S.</given-names></string-name>, <string-name><surname>Fratrić</surname>, <given-names>I.</given-names></string-name>, <string-name><surname>Kiš</surname>, <given-names>K.</given-names></string-name> (<year>2008</year>). <article-title>A novel biometric personal verification system based on the combination of palmprints and faces</article-title>. <source>Informatica</source>, <volume>19</volume>(<issue>1</issue>), <fpage>81</fpage>–<lpage>100</lpage>.</mixed-citation>
</ref>
<ref id="j_infor443_ref_019">
<mixed-citation publication-type="chapter"><string-name><surname>Sengupta</surname>, <given-names>S.</given-names></string-name>, <string-name><surname>Chen</surname>, <given-names>J.C.</given-names></string-name>, <string-name><surname>Castillo</surname>, <given-names>C.</given-names></string-name>, <string-name><surname>Patel</surname>, <given-names>V.M.</given-names></string-name>, <string-name><surname>Chellappa</surname>, <given-names>R.</given-names></string-name>, <string-name><surname>Jacobs</surname>, <given-names>D.W.</given-names></string-name> (<year>2016</year>). <chapter-title>Frontal to profile face verification in the wild</chapter-title>. In: <source>2016 IEEE Winter Conference on Applications of Computer Vision, WACV 2016</source>.</mixed-citation>
</ref>
<ref id="j_infor443_ref_020">
<mixed-citation publication-type="chapter"><string-name><surname>Simonyan</surname>, <given-names>K.</given-names></string-name>, <string-name><surname>Zisserman</surname>, <given-names>A.</given-names></string-name> (<year>2015</year>). <chapter-title>Very deep convolutional networks for large-scale image recognition</chapter-title>. In: <source>3rd International Conference on Learning Representations, ICLR 2015 – Conference Track Proceedings</source>.</mixed-citation>
</ref>
<ref id="j_infor443_ref_021">
<mixed-citation publication-type="chapter"><string-name><surname>Tian</surname>, <given-names>Y.</given-names></string-name>, <string-name><surname>Peng</surname>, <given-names>X.</given-names></string-name>, <string-name><surname>Zhao</surname>, <given-names>L.</given-names></string-name>, <string-name><surname>Zhang</surname>, <given-names>S.</given-names></string-name>, <string-name><surname>Metaxas</surname>, <given-names>D.N.</given-names></string-name> (<year>2018</year>). <chapter-title>CR-GAN: learning complete representations for multi-view generation</chapter-title>. In: <source>IJCAI International Joint Conference on Artificial Intelligence</source>.</mixed-citation>
</ref>
<ref id="j_infor443_ref_022">
<mixed-citation publication-type="chapter"><string-name><surname>Tran</surname>, <given-names>L.</given-names></string-name>, <string-name><surname>Yin</surname>, <given-names>X.</given-names></string-name>, <string-name><surname>Liu</surname>, <given-names>X.</given-names></string-name> (<year>2017</year>). <chapter-title>Disentangled representation learning GAN for pose-invariant face recognition</chapter-title>. In: <source>Proceedings – 30th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017</source>.</mixed-citation>
</ref>
<ref id="j_infor443_ref_023">
<mixed-citation publication-type="chapter"><string-name><surname>Yin</surname>, <given-names>X.</given-names></string-name>, <string-name><surname>Yu</surname>, <given-names>X.</given-names></string-name>, <string-name><surname>Sohn</surname>, <given-names>K.</given-names></string-name>, <string-name><surname>Liu</surname>, <given-names>X.</given-names></string-name>, <string-name><surname>Chandraker</surname>, <given-names>M.</given-names></string-name> (<year>2017</year>). <chapter-title>Towards large-pose face frontalization in the wild</chapter-title>. In: <source>Proceedings of the IEEE International Conference on Computer Vision</source>.</mixed-citation>
</ref>
<ref id="j_infor443_ref_024">
<mixed-citation publication-type="other"><string-name><surname>Zeno</surname>, <given-names>B.</given-names></string-name>, <string-name><surname>Kalinovskiy</surname>, <given-names>I.</given-names></string-name>, <string-name><surname>Matveev</surname>, <given-names>Y.</given-names></string-name> (<year>2019a</year>). IP-GAN: learning identity and pose disentanglement in generative adversarial networks. In: <italic>Lecture Notes in Computer Science</italic> (including subseries <italic>Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics</italic>).</mixed-citation>
</ref>
<ref id="j_infor443_ref_025">
<mixed-citation publication-type="chapter"><string-name><surname>Zeno</surname>, <given-names>B.H.</given-names></string-name>, <string-name><surname>Kalinovskiy</surname>, <given-names>I.A.</given-names></string-name>, <string-name><surname>Matveev</surname>, <given-names>Y.N.</given-names></string-name> (<year>2019b</year>). <chapter-title>Identity preserving face synthesis using generative adversarial networks</chapter-title>. In: <source>ACM International Conference Proceeding Series</source>.</mixed-citation>
</ref>
<ref id="j_infor443_ref_026">
<mixed-citation publication-type="journal"><string-name><surname>Zhang</surname>, <given-names>K.</given-names></string-name>, <string-name><surname>Zhang</surname>, <given-names>Z.</given-names></string-name>, <string-name><surname>Li</surname>, <given-names>Z.</given-names></string-name>, <string-name><surname>Qiao</surname>, <given-names>Y.</given-names></string-name> (<year>2016</year>). <article-title>Joint face detection and alignment using multitask cascaded convolutional networks</article-title>. <source>IEEE Signal Processing Letters</source>, <volume>23</volume>(<issue>10</issue>), <fpage>1499</fpage>–<lpage>1503</lpage>.</mixed-citation>
</ref>
<ref id="j_infor443_ref_027">
<mixed-citation publication-type="other"><string-name><surname>Zheng</surname>, <given-names>T.</given-names></string-name>, <string-name><surname>Deng</surname>, <given-names>W.</given-names></string-name> (<year>2018</year>). Cross-pose LFW: a database for studying cross-pose face recognition in unconstrained environments.</mixed-citation>
</ref>
<ref id="j_infor443_ref_028">
<mixed-citation publication-type="other"><string-name><surname>Zheng</surname>, <given-names>T.</given-names></string-name>, <string-name><surname>Deng</surname>, <given-names>W.</given-names></string-name>, <string-name><surname>Hu</surname>, <given-names>J.</given-names></string-name> (<year>2017</year>). Cross-age LFW: a database for studying cross-age face recognition in unconstrained environments. <ext-link ext-link-type="uri" xlink:href="http://arxiv.org/abs/1708.08197">arXiv:1708.08197</ext-link>.</mixed-citation>
</ref>
<ref id="j_infor443_ref_029">
<mixed-citation publication-type="chapter"><string-name><surname>Zhu</surname>, <given-names>X.</given-names></string-name>, <string-name><surname>Lei</surname>, <given-names>Z.</given-names></string-name>, <string-name><surname>Liu</surname>, <given-names>X.</given-names></string-name>, <string-name><surname>Shi</surname>, <given-names>H.</given-names></string-name>, <string-name><surname>Li</surname>, <given-names>S.Z.</given-names></string-name> (<year>2016</year>). <chapter-title>Face alignment across large poses: a 3D solution</chapter-title>. In: <source>Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition</source>.</mixed-citation>
</ref>
</ref-list>
</back>
</article>
