<?xml version="1.0" encoding="utf-8"?><!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.0 20120330//EN" "JATS-journalpublishing1.dtd"><article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="research-article">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">INFORMATICA</journal-id>
<journal-title-group><journal-title>Informatica</journal-title></journal-title-group>
<issn pub-type="epub">1822-8844</issn>
<issn pub-type="ppub">0868-4952</issn>
<issn-l>0868-4952</issn-l>
<publisher>
<publisher-name>Vilnius University</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="publisher-id">INFO1236</article-id>
<article-id pub-id-type="doi">10.15388/Informatica.2019.222</article-id>
<article-categories><subj-group subj-group-type="heading">
<subject>Research Article</subject></subj-group></article-categories>
<title-group>
<article-title>Improving Statistical Machine Translation Quality Using Differential Evolution</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name><surname>Dugonik</surname><given-names>Jani</given-names></name><email xlink:href="jani.dugonik@um.si">jani.dugonik@um.si</email><xref ref-type="aff" rid="j_info1236_aff_001"/><xref ref-type="corresp" rid="cor1">∗</xref><bio>
<p><bold>J. Dugonik</bold> received his BSc and MSc in computer science from the University of Maribor, Maribor, Slovenia, in 2010 and 2013, respectively. He is currently a teaching assistant at the Faculty of Electrical Engineering and Computer Science, University of Maribor, Maribor, Slovenia. He has worked in the Laboratory for Computer Architecture and Programming Languages, University of Maribor, since 2011. Since 2017 he has been working in the Laboratory for Real-Time Systems. His research interests include evolutionary computing, optimization, natural language processing and deep learning.</p></bio>
</contrib>
<contrib contrib-type="author">
<name><surname>Bošković</surname><given-names>Borko</given-names></name><email xlink:href="borko.boskovic@um.si">borko.boskovic@um.si</email><xref ref-type="aff" rid="j_info1236_aff_001"/><bio>
<p><bold>B. Bošković</bold> received his BSc and PhD in computer science from the University of Maribor, Maribor, Slovenia, in 2004 and 2010, respectively. He is currently an assistant professor at the Faculty of Electrical Engineering and Computer Science, University of Maribor, Maribor, Slovenia. He has worked in the Laboratory for Computer Architecture and Programming Languages, University of Maribor, since 2000. His research interests include evolutionary computing, optimization, natural language processing and programming languages.</p></bio>
</contrib>
<contrib contrib-type="author">
<name><surname>Brest</surname><given-names>Janez</given-names></name><email xlink:href="janez.brest@um.si">janez.brest@um.si</email><xref ref-type="aff" rid="j_info1236_aff_001"/><bio>
<p><bold>J. Brest</bold> received his BSc, MSc, and PhD in computer science from the University of Maribor, Maribor, Slovenia, in 1995, 1998, and 2000, respectively. He has been with the Laboratory for Computer Architecture and Programming Languages, University of Maribor, since 1993. He is currently a full professor and head of the Laboratory for Computer Architecture and Programming Languages.</p></bio>
</contrib>
<contrib contrib-type="author">
<name><surname>Sepesy Maučec</surname><given-names>Mirjam</given-names></name><email xlink:href="mirjam.sepesy@um.si">mirjam.sepesy@um.si</email><xref ref-type="aff" rid="j_info1236_aff_001"/><bio>
<p><bold>M. Sepesy Maučec</bold> received her BSc and PhD in computer science from the Faculty of Electrical Engineering and Computer Science at the University of Maribor in 1996 and 2001, respectively. She is currently an associate professor at the same faculty. Her research interests include language modelling, statistical machine translation, computational linguistics and evolutionary computing.</p></bio>
</contrib>
<aff id="j_info1236_aff_001">Faculty of Electrical Engineering and Computer Science, <institution>University of Maribor</institution>, Koroška c. 46, 2000 Maribor, <country>Slovenia</country></aff>
</contrib-group>
<author-notes>
<corresp id="cor1"><label>∗</label>Corresponding author.</corresp>
</author-notes>
<pub-date pub-type="ppub"><year>2019</year></pub-date>
<pub-date pub-type="epub"><day>1</day><month>1</month><year>2019</year></pub-date><volume>30</volume><issue>4</issue><fpage>629</fpage><lpage>645</lpage>
<history>
<date date-type="received"><month>3</month><year>2018</year></date>
<date date-type="accepted"><month>6</month><year>2019</year></date>
</history>
<permissions><copyright-statement>© 2019 Vilnius University</copyright-statement><copyright-year>2019</copyright-year>
<license license-type="open-access" xlink:href="http://creativecommons.org/licenses/by/4.0/">
<license-p>Open access article under the <ext-link ext-link-type="uri" xlink:href="http://creativecommons.org/licenses/by/4.0/">CC BY</ext-link> license.</license-p></license></permissions>
<abstract>
<p>Machine Translation has become an important tool in overcoming the language barrier. The quality of its translations depends on the language pair and the methods used. The research presented in this paper builds on well-known standard methods for Statistical Machine Translation, which are extended with a newly proposed approach for optimizing the weights of the translation system components; better component weights improve the translation quality. Machine translation systems mostly translate to or from English and, in our research, English is paired with a Slavic language, Slovenian. In our experiment, we built two Statistical Machine Translation systems for the Slovenian-English language pair on the Acquis Communautaire corpus. Both systems were optimized using self-adaptive Differential Evolution and compared with related optimization methods. The results show an improvement in translation quality and are comparable to those of the related methods.</p>
</abstract>
<kwd-group>
<label>Key words</label>
<kwd>statistical machine translation</kwd>
<kwd>differential evolution</kwd>
<kwd>optimization</kwd>
</kwd-group>
<funding-group>
<award-group>
<funding-source xlink:href="https://doi.org/10.13039/501100004329">Slovenian Research Agency</funding-source>
<award-id>P2-0041</award-id>
<award-id>P2-0069</award-id>
</award-group>
<funding-statement>The authors acknowledge the financial support from the Slovenian Research Agency (Research Core Funding No. P2-0041 – Computer Systems, Methodologies, and Intelligent Services; P2-0069 – Advanced methods of interaction in telecommunication). </funding-statement>
</funding-group>
</article-meta>
</front>
<body>
<sec id="j_info1236_s_001">
<label>1</label>
<title>Introduction</title>
<p>Translation is a challenging and creative act. Machine Translation (MT) (Dorr <italic>et al.</italic>, <xref ref-type="bibr" rid="j_info1236_ref_017">1999</xref>; Bungum and Gambäck, <xref ref-type="bibr" rid="j_info1236_ref_008">2010</xref>) can ease the work of a translator, or even replace it by producing a rough translation or a draft that serves as an aid to translation. Nowadays, Statistical Machine Translation (SMT) (Lopez, <xref ref-type="bibr" rid="j_info1236_ref_029">1993</xref>; Specia, <xref ref-type="bibr" rid="j_info1236_ref_040">2010</xref>; Bungum and Gambäck, <xref ref-type="bibr" rid="j_info1236_ref_008">2010</xref>) is by far the most studied and used MT method (Albat, <xref ref-type="bibr" rid="j_info1236_ref_001">2007</xref>). SMT was based originally on single words, but has since progressed to the level of word sequences, called phrases. Currently, the most successful SMT approach is phrase-based translation (Specia, <xref ref-type="bibr" rid="j_info1236_ref_040">2010</xref>).</p>
<p>Translations in SMT are generated on the basis of statistical models, i.e. translation and language models, where different model weights yield different translations. The translation model translates words or phrases from the source language into the target language, and the language model ensures that the translated text is fluent. These models are trained independently of each other, so the problem becomes finding a set of weights that provides the best translation quality. This can be regarded as an optimization problem: optimization refers to the process of finding the optimal model weights, where the optimal weights are those which maximize the translation quality.</p>
<p>Translation quality is measured using translation error metrics; the Bilingual Evaluation Understudy (BLEU) (Papineni <italic>et al.</italic>, <xref ref-type="bibr" rid="j_info1236_ref_035">2002</xref>) metric is one of the more popular and inexpensive automated metrics, achieving a high correlation with human judgments of quality (Callison-Burch <italic>et al.</italic>, <xref ref-type="bibr" rid="j_info1236_ref_009">2006</xref>). It is worth noting that MT evaluation is a complex problem, and that metrics such as BLEU are not without criticism.</p>
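<p>For illustration, BLEU combines modified <italic>n</italic>-gram precisions with a brevity penalty. The following is a minimal, smoothed, sentence-level sketch in Python; it is our own simplification for exposition, not the exact corpus-level metric of Papineni <italic>et al.</italic> (2002):</p>

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Multiset of n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(candidate, reference, max_n=4):
    """Sentence-level BLEU sketch: geometric mean of modified n-gram
    precisions (add-one smoothed) times a brevity penalty."""
    cand, ref = candidate.split(), reference.split()
    log_prec = 0.0
    for n in range(1, max_n + 1):
        c_ngrams, r_ngrams = ngrams(cand, n), ngrams(ref, n)
        # clipped counts: a candidate n-gram is credited at most as many
        # times as it occurs in the reference
        overlap = sum(min(c, r_ngrams[g]) for g, c in c_ngrams.items())
        total = max(sum(c_ngrams.values()), 1)
        log_prec += math.log((overlap + 1) / (total + 1)) / max_n
    # brevity penalty punishes candidates shorter than the reference
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / max(len(cand), 1))
    return bp * math.exp(log_prec)
```

With the add-one smoothing above, an exact match still scores 1.0, while short or divergent candidates are penalized by both the precision terms and the brevity penalty.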
<p>The quality of MT depends on the language pair. For some language pairs, SMT brings good results, especially if the target language is English. The morphological richness of the languages has a direct impact on the quality. Significantly lower quality is obtained when translating from English into a morphologically rich language. Agglutinative target languages (e.g. Hungarian or Turkish) are even more problematic for statistical approaches. Although the approach proposed in this paper is general, our research was done on a difficult language pair in which one language is highly analytical (English) and the other morphologically rich (Slovenian) (Sepesy Maučec and Brest, <xref ref-type="bibr" rid="j_info1236_ref_038">2010</xref>).</p>
<p>Numerous algorithms exist for solving general real-valued optimization problems. One of them is the Differential Evolution (DE) algorithm (Storn and Price, <xref ref-type="bibr" rid="j_info1236_ref_042">1997</xref>; Price <italic>et al.</italic>, <xref ref-type="bibr" rid="j_info1236_ref_036">2005</xref>; Neri and Tirronen, <xref ref-type="bibr" rid="j_info1236_ref_031">2010</xref>; Das <italic>et al.</italic>, <xref ref-type="bibr" rid="j_info1236_ref_016">2016</xref>), a simple and effective algorithm for global optimization. It has proved to be efficient at solving different optimization problems involving real-valued variables which interact non-linearly with each other (Das and Suganthan, <xref ref-type="bibr" rid="j_info1236_ref_014">2011</xref>; Das <italic>et al.</italic>, <xref ref-type="bibr" rid="j_info1236_ref_015">2011</xref>; Zhou <italic>et al.</italic>, <xref ref-type="bibr" rid="j_info1236_ref_046">2011</xref>; Bošković <italic>et al.</italic>, <xref ref-type="bibr" rid="j_info1236_ref_005">2011</xref>; Glotić and Zamuda, <xref ref-type="bibr" rid="j_info1236_ref_021">2015</xref>; Mlakar <italic>et al.</italic>, <xref ref-type="bibr" rid="j_info1236_ref_030">2014</xref>; Bošković and Brest, <xref ref-type="bibr" rid="j_info1236_ref_004">2016</xref>). DE is an evolutionary algorithm in which each individual in the population is described as a vector; in our setting, a vector of model weights.</p>
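<p>The core DE loop of mutation, crossover, and selection can be sketched as follows. This is a generic DE/rand/1/bin illustration with typical textbook parameter values, not the exact configuration used in our experiments:</p>

```python
import random

def de_optimize(f, dim, bounds, np_=20, F=0.5, CR=0.9, gens=200, seed=1):
    """Sketch of classic DE/rand/1/bin (Storn and Price, 1997)
    minimizing f over a box-constrained real-valued search space."""
    rng = random.Random(seed)
    lo, hi = bounds
    pop = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(np_)]
    fit = [f(x) for x in pop]
    for _ in range(gens):
        for i in range(np_):
            # mutation: base vector plus scaled difference of two others
            a, b, c = rng.sample([j for j in range(np_) if j != i], 3)
            mutant = [pop[a][d] + F * (pop[b][d] - pop[c][d]) for d in range(dim)]
            # binomial crossover with one guaranteed mutant component
            jrand = rng.randrange(dim)
            trial = [mutant[d] if (rng.random() < CR or d == jrand) else pop[i][d]
                     for d in range(dim)]
            trial = [min(max(t, lo), hi) for t in trial]  # keep inside bounds
            ft = f(trial)
            if ft <= fit[i]:  # greedy one-to-one selection
                pop[i], fit[i] = trial, ft
    best = min(range(np_), key=fit.__getitem__)
    return pop[best], fit[best]
```

In the SMT setting, <italic>f</italic> would score a weight vector by decoding the tuning set and returning a (negated) BLEU value; here any real-valued objective works.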
<p>Currently, the most popular way to find optimal model weights is Minimum Error Rate Training (MERT) (Och, <xref ref-type="bibr" rid="j_info1236_ref_032">2003</xref>; Bertoldi <italic>et al.</italic>, <xref ref-type="bibr" rid="j_info1236_ref_002">2009</xref>). In this paper, we used the jDE (Brest <italic>et al.</italic>, <xref ref-type="bibr" rid="j_info1236_ref_006">2006a</xref>) algorithm, in which each individual has its own crossover rate and scale factor. The authors in Brest <italic>et al.</italic> (<xref ref-type="bibr" rid="j_info1236_ref_006">2006a</xref>) and Zhang and Sanderson (<xref ref-type="bibr" rid="j_info1236_ref_045">2009</xref>) observed through experiments that the efficiency of the DE algorithm improves when the control parameters respond to the evolution through a self-adapting mechanism. This enables the jDE algorithm to solve our problem more efficiently and reduces the number of main control parameters. Translations are then evaluated using the BLEU metric.</p>
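<p>The jDE self-adaptation rule can be sketched by letting each individual carry its own <italic>F</italic> and CR, re-sampled with small probabilities τ<sub>1</sub> and τ<sub>2</sub>; the values 0.1 follow Brest <italic>et al.</italic> (2006a), while the remaining settings here are illustrative:</p>

```python
import random

def jde_optimize(f, dim, bounds, np_=20, gens=200, tau1=0.1, tau2=0.1, seed=1):
    """Sketch of jDE (Brest et al., 2006a): DE/rand/1/bin where each
    individual carries its own F and CR; re-sampled parameter values
    survive only if the trial vector they produced wins selection."""
    rng = random.Random(seed)
    lo, hi = bounds
    pop = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(np_)]
    fit = [f(x) for x in pop]
    Fs, CRs = [0.5] * np_, [0.9] * np_
    for _ in range(gens):
        for i in range(np_):
            # self-adaptation: occasionally re-sample control parameters
            F = 0.1 + rng.random() * 0.9 if rng.random() < tau1 else Fs[i]
            CR = rng.random() if rng.random() < tau2 else CRs[i]
            a, b, c = rng.sample([j for j in range(np_) if j != i], 3)
            jrand = rng.randrange(dim)
            trial = [
                min(max(pop[a][d] + F * (pop[b][d] - pop[c][d]), lo), hi)
                if (rng.random() < CR or d == jrand) else pop[i][d]
                for d in range(dim)
            ]
            ft = f(trial)
            if ft <= fit[i]:
                # successful parameters are inherited by the survivor
                pop[i], fit[i], Fs[i], CRs[i] = trial, ft, F, CR
    best = min(range(np_), key=fit.__getitem__)
    return pop[best], fit[best]
```

Compared with plain DE, only the population size and stopping criterion remain as user-set control parameters; <italic>F</italic> and CR evolve with the population.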
<p>Recently, Evolutionary Algorithms have attracted increasing attention as a means of enhancing the performance of Natural Language Processing (NLP) (Bungum and Gambäck, <xref ref-type="bibr" rid="j_info1236_ref_008">2010</xref>) techniques. NLP is a field concerned with the interaction between computers and humans using natural language – spoken (Du Bois <italic>et al.</italic>, <xref ref-type="bibr" rid="j_info1236_ref_018">2005</xref>; Kasparaitis and Anbinderis, <xref ref-type="bibr" rid="j_info1236_ref_024">2014</xref>) and written (Koehn, <xref ref-type="bibr" rid="j_info1236_ref_026">2005</xref>; Steinberger <italic>et al.</italic>, <xref ref-type="bibr" rid="j_info1236_ref_041">2006</xref>). In this paper, the focus is on written language.</p>
<p>The SMT system should produce translations in a reasonable time, and some MT applications work almost in real time. Since training and optimization are both part of an offline process, with a static corpus and no time constraints, the training time is less critical. Once the SMT system is built, it is ready to use, and the actual translation time depends mostly on the size of the text to be translated.</p>
<p>The main goal of this paper is to find the optimal models’ weights. The work presented here differs from previous studies in several aspects. Firstly, this is the first study to use the jDE algorithm to optimize the models’ weights within SMT. We believe that the self-adaptive nature of the jDE algorithm, in comparison to the DE algorithm, improves the efficiency of the optimization. Secondly, evaluation is usually based on only one optimizer run (Koehn <italic>et al.</italic>, <xref ref-type="bibr" rid="j_info1236_ref_027">2009</xref>; Bojar <italic>et al.</italic>, <xref ref-type="bibr" rid="j_info1236_ref_003">2015</xref>). In this paper, each optimizer (MERT, MIRA, DE, and jDE) was run many times, and the results were compared statistically at the conventional significance level.</p>
<p>The remainder of the paper is organized as follows. Section <xref rid="j_info1236_s_002">2</xref> presents background and related work. Our experiment is described in Section <xref rid="j_info1236_s_005">3</xref>, and the results are presented in Section <xref rid="j_info1236_s_006">4</xref>, along with a statistical analysis using the MultEval (Clark <italic>et al.</italic>, <xref ref-type="bibr" rid="j_info1236_ref_013">2011</xref>) tool. Section <xref rid="j_info1236_s_010">5</xref> concludes the paper with a discussion of the obtained results and future work.</p>
</sec>
<sec id="j_info1236_s_002">
<label>2</label>
<title>Background</title>
<p>The availability of linear models and discriminative optimization algorithms has been a huge boon to SMT, allowing the field to move beyond the constraints of generative noisy-channel models (Och and Ney, <xref ref-type="bibr" rid="j_info1236_ref_034">2002</xref>). The ability to optimize these models according to an error metric has become a standard assumption in SMT, due to the widespread adoption of MERT. The problems with MERT can be addressed through the use of surrogate loss functions. The Margin Infused Relaxed Algorithm (MIRA) (Watanabe <italic>et al.</italic>, <xref ref-type="bibr" rid="j_info1236_ref_044">2007</xref>; Chiang <italic>et al.</italic>, <xref ref-type="bibr" rid="j_info1236_ref_011">2008</xref>, <xref ref-type="bibr" rid="j_info1236_ref_012">2009</xref>; Cherry and Foster, <xref ref-type="bibr" rid="j_info1236_ref_010">2012</xref>; Hasler <italic>et al.</italic>, <xref ref-type="bibr" rid="j_info1236_ref_022">2011</xref>) employs a structured hinge loss. To improve generalization, the average of all weights seen during learning is used on unseen data. Chiang <italic>et al.</italic> (<xref ref-type="bibr" rid="j_info1236_ref_011">2008</xref>) took advantage of MIRA by modifying each update to suit SMT better. Pairwise Ranking Optimization (PRO) (Hopkins and May, <xref ref-type="bibr" rid="j_info1236_ref_023">2011</xref>) aims to handle large feature sets inside the traditional MERT architecture. This architecture is desirable, as most groups have the infrastructure to <italic>n</italic>-best decode their tuning sets in parallel. A simple approach to using Evolutionary Algorithms in SMT was shown in our previous work (Dugonik <italic>et al.</italic>, <xref ref-type="bibr" rid="j_info1236_ref_019">2014</xref>).</p>
<p>SMT deals with mapping sentences in one natural language (the source) into another natural language (the target). Translation can be represented as a stochastic process, and there are many SMT variants, depending on how it is modelled. Commonly, the text is translated sentence by sentence. We want to find the best possible translation <inline-formula id="j_info1236_ineq_001"><alternatives>
<mml:math><mml:msup><mml:mrow><mml:mi mathvariant="italic">e</mml:mi></mml:mrow><mml:mrow><mml:mo>∗</mml:mo></mml:mrow></mml:msup></mml:math>
<tex-math><![CDATA[${e^{\ast }}$]]></tex-math></alternatives></inline-formula> out of all possible translations <italic>e</italic> for a given source sentence <italic>f</italic>. The system selects the translation with the highest probability <inline-formula id="j_info1236_ineq_002"><alternatives>
<mml:math><mml:mi mathvariant="italic">P</mml:mi><mml:mo mathvariant="normal" fence="true" stretchy="false">(</mml:mo><mml:mi mathvariant="italic">e</mml:mi><mml:mo stretchy="false">|</mml:mo><mml:mi mathvariant="italic">f</mml:mi><mml:mo mathvariant="normal" fence="true" stretchy="false">)</mml:mo></mml:math>
<tex-math><![CDATA[$P(e|f)$]]></tex-math></alternatives></inline-formula>. Applying the Bayes rule, the probability <inline-formula id="j_info1236_ineq_003"><alternatives>
<mml:math><mml:mi mathvariant="italic">P</mml:mi><mml:mo mathvariant="normal" fence="true" stretchy="false">(</mml:mo><mml:mi mathvariant="italic">e</mml:mi><mml:mo stretchy="false">|</mml:mo><mml:mi mathvariant="italic">f</mml:mi><mml:mo mathvariant="normal" fence="true" stretchy="false">)</mml:mo></mml:math>
<tex-math><![CDATA[$P(e|f)$]]></tex-math></alternatives></inline-formula> is decomposed into probabilities <inline-formula id="j_info1236_ineq_004"><alternatives>
<mml:math><mml:mi mathvariant="italic">P</mml:mi><mml:mo mathvariant="normal" fence="true" stretchy="false">(</mml:mo><mml:mi mathvariant="italic">f</mml:mi><mml:mo stretchy="false">|</mml:mo><mml:mi mathvariant="italic">e</mml:mi><mml:mo mathvariant="normal" fence="true" stretchy="false">)</mml:mo></mml:math>
<tex-math><![CDATA[$P(f|e)$]]></tex-math></alternatives></inline-formula>, <inline-formula id="j_info1236_ineq_005"><alternatives>
<mml:math><mml:mi mathvariant="italic">P</mml:mi><mml:mo mathvariant="normal" fence="true" stretchy="false">(</mml:mo><mml:mi mathvariant="italic">e</mml:mi><mml:mo mathvariant="normal" fence="true" stretchy="false">)</mml:mo></mml:math>
<tex-math><![CDATA[$P(e)$]]></tex-math></alternatives></inline-formula> and <inline-formula id="j_info1236_ineq_006"><alternatives>
<mml:math><mml:mi mathvariant="italic">P</mml:mi><mml:mo mathvariant="normal" fence="true" stretchy="false">(</mml:mo><mml:mi mathvariant="italic">f</mml:mi><mml:mo mathvariant="normal" fence="true" stretchy="false">)</mml:mo></mml:math>
<tex-math><![CDATA[$P(f)$]]></tex-math></alternatives></inline-formula>: 
<disp-formula id="j_info1236_eq_001">
<label>(1)</label><alternatives>
<mml:math display="block"><mml:mtable displaystyle="true"><mml:mtr><mml:mtd><mml:msup><mml:mrow><mml:mi mathvariant="italic">e</mml:mi></mml:mrow><mml:mrow><mml:mo>∗</mml:mo></mml:mrow></mml:msup><mml:mo>=</mml:mo><mml:msub><mml:mrow><mml:mtext>argmax</mml:mtext></mml:mrow><mml:mrow><mml:mi mathvariant="italic">e</mml:mi></mml:mrow></mml:msub><mml:mstyle displaystyle="true"><mml:mfrac><mml:mrow><mml:mi mathvariant="italic">P</mml:mi><mml:mo mathvariant="normal" fence="true" stretchy="false">(</mml:mo><mml:mi mathvariant="italic">f</mml:mi><mml:mo stretchy="false">|</mml:mo><mml:mi mathvariant="italic">e</mml:mi><mml:mo mathvariant="normal" fence="true" stretchy="false">)</mml:mo><mml:mo>·</mml:mo><mml:mi mathvariant="italic">P</mml:mi><mml:mo mathvariant="normal" fence="true" stretchy="false">(</mml:mo><mml:mi mathvariant="italic">e</mml:mi><mml:mo mathvariant="normal" fence="true" stretchy="false">)</mml:mo></mml:mrow><mml:mrow><mml:mi mathvariant="italic">P</mml:mi><mml:mo mathvariant="normal" fence="true" stretchy="false">(</mml:mo><mml:mi mathvariant="italic">f</mml:mi><mml:mo mathvariant="normal" fence="true" stretchy="false">)</mml:mo></mml:mrow></mml:mfrac></mml:mstyle><mml:mo>=</mml:mo><mml:msub><mml:mrow><mml:mtext>argmax</mml:mtext></mml:mrow><mml:mrow><mml:mi mathvariant="italic">e</mml:mi></mml:mrow></mml:msub><mml:mi mathvariant="italic">P</mml:mi><mml:mo mathvariant="normal" fence="true" stretchy="false">(</mml:mo><mml:mi mathvariant="italic">f</mml:mi><mml:mo stretchy="false">|</mml:mo><mml:mi mathvariant="italic">e</mml:mi><mml:mo mathvariant="normal" fence="true" stretchy="false">)</mml:mo><mml:mo>·</mml:mo><mml:mi mathvariant="italic">P</mml:mi><mml:mo mathvariant="normal" fence="true" stretchy="false">(</mml:mo><mml:mi mathvariant="italic">e</mml:mi><mml:mo mathvariant="normal" fence="true" stretchy="false">)</mml:mo><mml:mo>.</mml:mo></mml:mtd></mml:mtr></mml:mtable></mml:math>
<tex-math><![CDATA[\[ {e^{\ast }}={\text{argmax}_{e}}\frac{P(f|e)\cdot P(e)}{P(f)}={\text{argmax}_{e}}P(f|e)\cdot P(e).\]]]></tex-math></alternatives>
</disp-formula> 
The denominator <inline-formula id="j_info1236_ineq_007"><alternatives>
<mml:math><mml:mi mathvariant="italic">P</mml:mi><mml:mo mathvariant="normal" fence="true" stretchy="false">(</mml:mo><mml:mi mathvariant="italic">f</mml:mi><mml:mo mathvariant="normal" fence="true" stretchy="false">)</mml:mo></mml:math>
<tex-math><![CDATA[$P(f)$]]></tex-math></alternatives></inline-formula> does not influence argmax and can be disregarded.</p>
<p>This approach has three major aspects:</p>
<list>
<list-item id="j_info1236_li_001">
<label>•</label>
<p>Translation model <inline-formula id="j_info1236_ineq_008"><alternatives>
<mml:math><mml:mi mathvariant="italic">P</mml:mi><mml:mo mathvariant="normal" fence="true" stretchy="false">(</mml:mo><mml:mi mathvariant="italic">f</mml:mi><mml:mo stretchy="false">|</mml:mo><mml:mi mathvariant="italic">e</mml:mi><mml:mo mathvariant="normal" fence="true" stretchy="false">)</mml:mo></mml:math>
<tex-math><![CDATA[$P(f|e)$]]></tex-math></alternatives></inline-formula>: specifies the set of possible translations for some target sentence and assigns probabilities to these translations.</p>
</list-item>
<list-item id="j_info1236_li_002">
<label>•</label>
<p>Language model <inline-formula id="j_info1236_ineq_009"><alternatives>
<mml:math><mml:mi mathvariant="italic">P</mml:mi><mml:mo mathvariant="normal" fence="true" stretchy="false">(</mml:mo><mml:mi mathvariant="italic">e</mml:mi><mml:mo mathvariant="normal" fence="true" stretchy="false">)</mml:mo></mml:math>
<tex-math><![CDATA[$P(e)$]]></tex-math></alternatives></inline-formula>: models the fluency of the proposed target sentence and assigns distributions over strings (higher probabilities are assigned to sentences which are more representative of a natural language).</p>
</list-item>
<list-item id="j_info1236_li_003">
<label>•</label>
<p>Search process (argmax operation): this process is called decoding, and its job is to find possible target translations.</p>
</list-item>
</list>
<p>SMT systems usually decompose entire sentences into sequences of strings called phrases. These are not linguistic phrases, but phrases found in a corpus using statistical methods. The SMT system looks for general patterns (<italic>n</italic>-grams) which appear in everyday language. An <italic>n</italic>-gram is a contiguous sequence of <italic>n</italic> items from a given sequence of text or speech. In the phrase-based model, the source sentence <italic>f</italic> is broken down into <italic>I</italic> phrases <inline-formula id="j_info1236_ineq_010"><alternatives>
<mml:math><mml:mover accent="true"><mml:mrow><mml:msub><mml:mrow><mml:mi mathvariant="italic">f</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">i</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">¯</mml:mo></mml:mover></mml:math>
<tex-math><![CDATA[$\bar{{f_{i}}}$]]></tex-math></alternatives></inline-formula>, and each source phrase <inline-formula id="j_info1236_ineq_011"><alternatives>
<mml:math><mml:mover accent="true"><mml:mrow><mml:msub><mml:mrow><mml:mi mathvariant="italic">f</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">i</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">¯</mml:mo></mml:mover></mml:math>
<tex-math><![CDATA[$\bar{{f_{i}}}$]]></tex-math></alternatives></inline-formula> is translated into a target phrase <inline-formula id="j_info1236_ineq_012"><alternatives>
<mml:math><mml:mover accent="true"><mml:mrow><mml:msub><mml:mrow><mml:mi mathvariant="italic">e</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">i</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">¯</mml:mo></mml:mover></mml:math>
<tex-math><![CDATA[$\bar{{e_{i}}}$]]></tex-math></alternatives></inline-formula>. Translations produced in this way are often not fluent, hence the idea of introducing weights to scale the contribution of each model: 
<disp-formula id="j_info1236_eq_002">
<label>(2)</label><alternatives>
<mml:math display="block"><mml:mtable displaystyle="true"><mml:mtr><mml:mtd><mml:msup><mml:mrow><mml:mi mathvariant="italic">e</mml:mi></mml:mrow><mml:mrow><mml:mo>∗</mml:mo></mml:mrow></mml:msup><mml:mo>=</mml:mo><mml:msub><mml:mrow><mml:mtext>argmax</mml:mtext></mml:mrow><mml:mrow><mml:mi mathvariant="italic">e</mml:mi></mml:mrow></mml:msub>
<mml:munderover accentunder="false" accent="false"><mml:mrow><mml:mstyle displaystyle="true"><mml:mo largeop="true" movablelimits="false">∏</mml:mo></mml:mstyle></mml:mrow><mml:mrow><mml:mi mathvariant="italic">i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi mathvariant="italic">I</mml:mi></mml:mrow></mml:munderover><mml:mi mathvariant="italic">P</mml:mi><mml:msup><mml:mrow><mml:mo mathvariant="normal" fence="true" stretchy="false">(</mml:mo><mml:mover accent="true"><mml:mrow><mml:msub><mml:mrow><mml:mi mathvariant="italic">f</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">i</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">¯</mml:mo></mml:mover><mml:mo stretchy="false">|</mml:mo><mml:mover accent="true"><mml:mrow><mml:msub><mml:mrow><mml:mi mathvariant="italic">e</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">i</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">¯</mml:mo></mml:mover><mml:mo mathvariant="normal" fence="true" stretchy="false">)</mml:mo></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi mathvariant="italic">λ</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub></mml:mrow></mml:msup><mml:mo>·</mml:mo><mml:mi mathvariant="italic">P</mml:mi><mml:msup><mml:mrow><mml:mo mathvariant="normal" fence="true" stretchy="false">(</mml:mo><mml:mi mathvariant="italic">e</mml:mi><mml:mo mathvariant="normal" fence="true" stretchy="false">)</mml:mo></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi mathvariant="italic">λ</mml:mi></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub></mml:mrow></mml:msup><mml:mo>.</mml:mo></mml:mtd></mml:mtr></mml:mtable></mml:math>
<tex-math><![CDATA[\[ {e^{\ast }}={\text{argmax}_{e}}{\prod \limits_{i=1}^{I}}P{(\bar{{f_{i}}}|\bar{{e_{i}}})^{{\lambda _{1}}}}\cdot P{(e)^{{\lambda _{2}}}}.\]]]></tex-math></alternatives>
</disp-formula>
</p>
<p>We can generalize the setup of the SMT system to many different models, and we can scale the contribution of each of them: 
<disp-formula id="j_info1236_eq_003">
<label>(3)</label><alternatives>
<mml:math display="block"><mml:mtable displaystyle="true"><mml:mtr><mml:mtd><mml:msup><mml:mrow><mml:mi mathvariant="italic">e</mml:mi></mml:mrow><mml:mrow><mml:mo>∗</mml:mo></mml:mrow></mml:msup><mml:mo>=</mml:mo><mml:msub><mml:mrow><mml:mtext>argmax</mml:mtext></mml:mrow><mml:mrow><mml:mi mathvariant="italic">e</mml:mi></mml:mrow></mml:msub>
<mml:munderover accentunder="false" accent="false"><mml:mrow><mml:mstyle displaystyle="true"><mml:mo largeop="true" movablelimits="false">∏</mml:mo></mml:mstyle></mml:mrow><mml:mrow><mml:mi mathvariant="italic">i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi mathvariant="italic">r</mml:mi></mml:mrow></mml:munderover><mml:msub><mml:mrow><mml:mi mathvariant="italic">h</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">i</mml:mi></mml:mrow></mml:msub><mml:msup><mml:mrow><mml:mo mathvariant="normal" fence="true" stretchy="false">(</mml:mo><mml:mi mathvariant="italic">e</mml:mi><mml:mo mathvariant="normal">,</mml:mo><mml:mi mathvariant="italic">f</mml:mi><mml:mo mathvariant="normal" fence="true" stretchy="false">)</mml:mo></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi mathvariant="italic">λ</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">i</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:msup><mml:mo mathvariant="normal">,</mml:mo></mml:mtd></mml:mtr></mml:mtable></mml:math>
<tex-math><![CDATA[\[ {e^{\ast }}={\text{argmax}_{e}}{\prod \limits_{i=1}^{r}}{h_{i}}{(e,f)^{{\lambda _{i}}}},\]]]></tex-math></alternatives>
</disp-formula> 
where <inline-formula id="j_info1236_ineq_013"><alternatives>
<mml:math><mml:msub><mml:mrow><mml:mi mathvariant="italic">h</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo mathvariant="normal">,</mml:mo><mml:mo>…</mml:mo><mml:mo mathvariant="normal">,</mml:mo><mml:msub><mml:mrow><mml:mi mathvariant="italic">h</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">r</mml:mi></mml:mrow></mml:msub></mml:math>
<tex-math><![CDATA[${h_{1}},\dots ,{h_{r}}$]]></tex-math></alternatives></inline-formula> are the models of a search algorithm, e.g. translation model, language model, reordering model, word penalty, etc., <italic>r</italic> denotes the number of models, and <inline-formula id="j_info1236_ineq_014"><alternatives>
<mml:math><mml:msub><mml:mrow><mml:mi mathvariant="italic">λ</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo mathvariant="normal">,</mml:mo><mml:mo>…</mml:mo><mml:mo mathvariant="normal">,</mml:mo><mml:msub><mml:mrow><mml:mi mathvariant="italic">λ</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">r</mml:mi></mml:mrow></mml:msub></mml:math>
<tex-math><![CDATA[${\lambda _{1}},\dots ,{\lambda _{r}}$]]></tex-math></alternatives></inline-formula> are the models’ weights. The weights are scaling factors, optimized with respect to a loss function which evaluates the translation quality, for example, the BLEU evaluation metric. The models are trained separately and then combined, under the assumption that they are independent of each other; in reality, however, the contributions of the different models influence each other. The problem is to find the set of weights that provides the best translation quality: 
<disp-formula id="j_info1236_eq_004">
<label>(4)</label><alternatives>
<mml:math display="block"><mml:mtable displaystyle="true"><mml:mtr><mml:mtd><mml:msup><mml:mrow><mml:mi mathvariant="italic">e</mml:mi></mml:mrow><mml:mrow><mml:mo>∗</mml:mo></mml:mrow></mml:msup><mml:mo mathvariant="normal" fence="true" stretchy="false">(</mml:mo><mml:msub><mml:mrow><mml:mi mathvariant="italic">λ</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo mathvariant="normal">,</mml:mo><mml:mo>…</mml:mo><mml:mo mathvariant="normal">,</mml:mo><mml:msub><mml:mrow><mml:mi mathvariant="italic">λ</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">r</mml:mi></mml:mrow></mml:msub><mml:mo mathvariant="normal" fence="true" stretchy="false">)</mml:mo><mml:mo>=</mml:mo><mml:msub><mml:mrow><mml:mtext>argmax</mml:mtext></mml:mrow><mml:mrow><mml:mi mathvariant="italic">e</mml:mi></mml:mrow></mml:msub><mml:mo movablelimits="false">exp</mml:mo>
<mml:munderover accentunder="false" accent="false"><mml:mrow><mml:mstyle displaystyle="true"><mml:mo largeop="true" movablelimits="false">∑</mml:mo></mml:mstyle></mml:mrow><mml:mrow><mml:mi mathvariant="italic">i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi mathvariant="italic">r</mml:mi></mml:mrow></mml:munderover><mml:msub><mml:mrow><mml:mi mathvariant="italic">λ</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">i</mml:mi></mml:mrow></mml:msub><mml:mo>·</mml:mo><mml:mo movablelimits="false">log</mml:mo><mml:msub><mml:mrow><mml:mi mathvariant="italic">h</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">i</mml:mi></mml:mrow></mml:msub><mml:mo mathvariant="normal" fence="true" stretchy="false">(</mml:mo><mml:mi mathvariant="italic">e</mml:mi><mml:mo mathvariant="normal">,</mml:mo><mml:mi mathvariant="italic">f</mml:mi><mml:mo mathvariant="normal" fence="true" stretchy="false">)</mml:mo><mml:mo>.</mml:mo></mml:mtd></mml:mtr></mml:mtable></mml:math>
<tex-math><![CDATA[\[ {e^{\ast }}({\lambda _{1}},\dots ,{\lambda _{r}})={\text{argmax}_{e}}\exp {\sum \limits_{i=1}^{r}}{\lambda _{i}}\cdot \log {h_{i}}(e,f).\]]]></tex-math></alternatives>
</disp-formula>
</p>
<p>The space of possible weight settings is too large to explore exhaustively. Usually, a tuning set is used to optimize the weights. The simplest method is to try a large number of possible settings and pick the one that works best. Assuming we wish to optimize our decoder’s BLEU score, the natural learning objective is to find weights <inline-formula id="j_info1236_ineq_015"><alternatives>
<mml:math><mml:mi mathvariant="bold-italic">λ</mml:mi><mml:mo>=</mml:mo><mml:msub><mml:mrow><mml:mi mathvariant="bold-italic">λ</mml:mi></mml:mrow><mml:mrow><mml:mn mathvariant="bold">1</mml:mn></mml:mrow></mml:msub><mml:mo mathvariant="normal">,</mml:mo><mml:mo>…</mml:mo><mml:mo mathvariant="normal">,</mml:mo><mml:msub><mml:mrow><mml:mi mathvariant="bold-italic">λ</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="bold-italic">r</mml:mi></mml:mrow></mml:msub></mml:math>
<tex-math><![CDATA[$\boldsymbol{\lambda }={\boldsymbol{\lambda }_{1}},\dots ,{\boldsymbol{\lambda }_{\boldsymbol{r}}}$]]></tex-math></alternatives></inline-formula> such that the BLEU score is maximal.</p>
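<p>A minimal sketch of this brute-force baseline; the toy fitness function below merely stands in for a real decoder-plus-BLEU evaluation and is not part of any actual SMT toolkit:</p>

```python
import random

def random_search(fitness, D, trials=1000, lo=-1.0, hi=1.0, seed=1):
    """Baseline tuning: sample weight vectors uniformly at random, keep the best."""
    rng = random.Random(seed)
    best, best_fit = None, float("-inf")
    for _ in range(trials):
        w = [rng.uniform(lo, hi) for _ in range(D)]
        f = fitness(w)
        if f > best_fit:
            best, best_fit = w, f
    return best, best_fit

# Illustrative stand-in for "BLEU on the tuning set": peaks at w = (0.3, 0.3, 0.3)
def toy_fitness(w):
    return -sum((x - 0.3) ** 2 for x in w)

w_best, f_best = random_search(toy_fitness, D=3)
```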
<p>Optimization refers to the process of finding the optimal weights for this linear model where optimal weights are those which maximize the translation quality on the tuning set. During decoding, the decoder scores translations using a linear model. The features of this linear model are the probabilities of multiple models. Each feature contributes information over one aspect of the characteristics of a good translation, e.g. the language model ensures that the translation is more fluent. Each feature can be given a weight that sets its importance. We see the problem as an optimization problem that will be tackled using the jDE algorithm.</p>
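<p>The decision rule in Eq. (4) amounts to scoring each candidate translation with a weighted sum of log feature values; a small sketch, where the feature probabilities and weights are invented for illustration:</p>

```python
import math

def score(weights, features):
    """Log-linear model score: exp(sum_i lambda_i * log h_i(e, f))."""
    return math.exp(sum(w * math.log(h) for w, h in zip(weights, features)))

# Hypothetical feature probabilities h_i(e, f) for two candidate translations,
# e.g. translation model, language model, reordering model
cand_a = [0.4, 0.2, 0.5]
cand_b = [0.3, 0.6, 0.4]
weights = [1.0, 0.8, 0.5]

# The decoder picks the candidate with the highest model score
best = max([cand_a, cand_b], key=lambda h: score(weights, h))
```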
<sec id="j_info1236_s_003">
<label>2.1</label>
<title>Differential Evolution</title>
<p>The DE algorithm is a simple and effective population-based evolutionary algorithm for global optimization. It exploits the differences between individuals, which are computed with simple and fast arithmetic operations. The DE algorithm uses a population <bold>P</bold> of <italic>Np</italic> individuals, where each individual is represented as a <italic>D</italic>-dimensional vector. The elements of the vector are real numbers from specified intervals; these intervals and the dimension (<italic>D</italic>) are determined by the problem being solved. The following control parameters are specified by the user and affect the behaviour of the algorithm: 
<list>
<list-item id="j_info1236_li_004">
<label>•</label>
<p>mutation parameter (<italic>F</italic>),</p>
</list-item>
<list-item id="j_info1236_li_005">
<label>•</label>
<p>crossover parameter (<italic>Cr</italic>),</p>
</list-item>
<list-item id="j_info1236_li_006">
<label>•</label>
<p>population size (<italic>Np</italic>).</p>
</list-item>
</list> 
These parameters are fixed during the evolutionary process. If nothing is known about the problem, the initial population is chosen randomly: 
<disp-formula id="j_info1236_eq_005">
<label>(5)</label><alternatives>
<mml:math display="block"><mml:mtable displaystyle="true" columnalign="right left" columnspacing="0pt"><mml:mtr><mml:mtd class="align-odd"/><mml:mtd class="align-even"><mml:msub><mml:mrow><mml:mtext mathvariant="bold">P</mml:mtext></mml:mrow><mml:mrow><mml:mn>0</mml:mn></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mo fence="true" stretchy="false">{</mml:mo><mml:msub><mml:mrow><mml:mtext mathvariant="bold">x</mml:mtext></mml:mrow><mml:mrow><mml:mn>1</mml:mn><mml:mo mathvariant="normal">,</mml:mo><mml:mn>0</mml:mn></mml:mrow></mml:msub><mml:mo mathvariant="normal">,</mml:mo><mml:msub><mml:mrow><mml:mtext mathvariant="bold">x</mml:mtext></mml:mrow><mml:mrow><mml:mn>2</mml:mn><mml:mo mathvariant="normal">,</mml:mo><mml:mn>0</mml:mn></mml:mrow></mml:msub><mml:mo mathvariant="normal">,</mml:mo><mml:mo>…</mml:mo><mml:mo mathvariant="normal">,</mml:mo><mml:msub><mml:mrow><mml:mtext mathvariant="bold">x</mml:mtext></mml:mrow><mml:mrow><mml:mi mathvariant="italic">N</mml:mi><mml:mi mathvariant="italic">p</mml:mi><mml:mo mathvariant="normal">,</mml:mo><mml:mn>0</mml:mn></mml:mrow></mml:msub><mml:mo fence="true" stretchy="false">}</mml:mo><mml:mo mathvariant="normal">,</mml:mo></mml:mtd></mml:mtr><mml:mtr><mml:mtd class="align-odd"/><mml:mtd class="align-even"><mml:msub><mml:mrow><mml:mtext mathvariant="bold">x</mml:mtext></mml:mrow><mml:mrow><mml:mi mathvariant="italic">i</mml:mi><mml:mo mathvariant="normal">,</mml:mo><mml:mn>0</mml:mn></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mo fence="true" stretchy="false">{</mml:mo><mml:msub><mml:mrow><mml:mi mathvariant="italic">x</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn><mml:mo mathvariant="normal">,</mml:mo><mml:mi mathvariant="italic">i</mml:mi><mml:mo mathvariant="normal">,</mml:mo><mml:mn>0</mml:mn></mml:mrow></mml:msub><mml:mo mathvariant="normal">,</mml:mo><mml:msub><mml:mrow><mml:mi mathvariant="italic">x</mml:mi></mml:mrow><mml:mrow><mml:mn>2</mml:mn><mml:mo mathvariant="normal">,</mml:mo><mml:mi 
mathvariant="italic">i</mml:mi><mml:mo mathvariant="normal">,</mml:mo><mml:mn>0</mml:mn></mml:mrow></mml:msub><mml:mo mathvariant="normal">,</mml:mo><mml:mo>…</mml:mo><mml:mo mathvariant="normal">,</mml:mo><mml:msub><mml:mrow><mml:mi mathvariant="italic">x</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">D</mml:mi><mml:mo mathvariant="normal">,</mml:mo><mml:mi mathvariant="italic">i</mml:mi><mml:mo mathvariant="normal">,</mml:mo><mml:mn>0</mml:mn></mml:mrow></mml:msub><mml:mo fence="true" stretchy="false">}</mml:mo><mml:mo mathvariant="normal">,</mml:mo></mml:mtd></mml:mtr><mml:mtr><mml:mtd class="align-odd"/><mml:mtd class="align-even"><mml:msub><mml:mrow><mml:mi mathvariant="italic">x</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">j</mml:mi><mml:mo mathvariant="normal">,</mml:mo><mml:mi mathvariant="italic">i</mml:mi><mml:mo mathvariant="normal">,</mml:mo><mml:mn>0</mml:mn></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mi mathvariant="italic">rand</mml:mi><mml:mo mathvariant="normal" fence="true" stretchy="false">(</mml:mo><mml:mi mathvariant="italic">min</mml:mi><mml:mo mathvariant="normal">,</mml:mo><mml:mi mathvariant="italic">max</mml:mi><mml:mo mathvariant="normal" fence="true" stretchy="false">)</mml:mo><mml:mo mathvariant="normal">,</mml:mo></mml:mtd></mml:mtr><mml:mtr><mml:mtd class="align-odd"/><mml:mtd class="align-even"><mml:mi mathvariant="italic">i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn><mml:mo mathvariant="normal">,</mml:mo><mml:mn>2</mml:mn><mml:mo mathvariant="normal">,</mml:mo><mml:mo>…</mml:mo><mml:mo mathvariant="normal">,</mml:mo><mml:mtext mathvariant="italic">Np</mml:mtext><mml:mo>;</mml:mo><mml:mspace width="2.5pt"/><mml:mi mathvariant="italic">j</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn><mml:mo mathvariant="normal">,</mml:mo><mml:mn>2</mml:mn><mml:mo mathvariant="normal">,</mml:mo><mml:mo>…</mml:mo><mml:mo mathvariant="normal">,</mml:mo><mml:mtext mathvariant="italic">D</mml:mtext><mml:mo 
mathvariant="normal">,</mml:mo></mml:mtd></mml:mtr></mml:mtable></mml:math>
<tex-math><![CDATA[\[\begin{aligned}{}& {\textbf{P}_{0}}=\{{\textbf{x}_{1,0}},{\textbf{x}_{2,0}},\dots ,{\textbf{x}_{Np,0}}\},\\ {} & {\textbf{x}_{i,0}}=\{{x_{1,i,0}},{x_{2,i,0}},\dots ,{x_{D,i,0}}\},\\ {} & {x_{j,i,0}}=\mathit{rand}(\mathit{min},\mathit{max}),\\ {} & i=1,2,\dots ,\textit{Np};\hspace{2.5pt}j=1,2,\dots ,\textit{D},\end{aligned}\]]]></tex-math></alternatives>
</disp-formula> 
where <italic>min</italic> and <italic>max</italic> are the lower and upper bounds determined by the problem. The function <inline-formula id="j_info1236_ineq_016"><alternatives>
<mml:math><mml:mi mathvariant="italic">rand</mml:mi><mml:mo mathvariant="normal" fence="true" stretchy="false">(</mml:mo><mml:mi mathvariant="italic">min</mml:mi><mml:mo mathvariant="normal">,</mml:mo><mml:mi mathvariant="italic">max</mml:mi><mml:mo mathvariant="normal" fence="true" stretchy="false">)</mml:mo></mml:math>
<tex-math><![CDATA[$\mathit{rand}(\mathit{min},\mathit{max})$]]></tex-math></alternatives></inline-formula> returns a uniformly distributed random number within the range <inline-formula id="j_info1236_ineq_017"><alternatives>
<mml:math><mml:mo fence="true" stretchy="false">[</mml:mo><mml:mi mathvariant="italic">min</mml:mi><mml:mo mathvariant="normal">,</mml:mo><mml:mi mathvariant="italic">max</mml:mi><mml:mo mathvariant="normal" fence="true" stretchy="false">)</mml:mo></mml:math>
<tex-math><![CDATA[$[\mathit{min},\mathit{max})$]]></tex-math></alternatives></inline-formula>.</p>
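<p>The initialization in Eq. (5) takes only a few lines of code; the bounds are written as <italic>lo</italic> and <italic>hi</italic> here to avoid shadowing Python built-ins:</p>

```python
import random

def init_population(Np, D, lo, hi, seed=1):
    """Eq. (5): Np individuals, each a D-dimensional vector of uniform values in [lo, hi)."""
    rng = random.Random(seed)
    return [[rng.uniform(lo, hi) for _ in range(D)] for _ in range(Np)]

# e.g. 20 weight vectors for a model with 14 features
P0 = init_population(Np=20, D=14, lo=-1.0, hi=1.0)
```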
<p>The crucial idea behind the DE algorithm is its strategy for generating new individuals. Depending on the type of problem, we can choose among various DE strategies, which determine the mutation and crossover methods. The classic DE algorithm, shown in Algorithm <xref rid="j_info1236_fig_001">1</xref>, uses the <italic>rand</italic>/1/<italic>bin</italic> strategy. The algorithm generates a new individual <inline-formula id="j_info1236_ineq_018"><alternatives>
<mml:math><mml:msub><mml:mrow><mml:mi mathvariant="bold">m</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">i</mml:mi></mml:mrow></mml:msub></mml:math>
<tex-math><![CDATA[${\mathbf{m}_{i}}$]]></tex-math></alternatives></inline-formula> by adding a weighted difference vector between two individuals from the population to a third individual from the population: 
<disp-formula id="j_info1236_eq_006">
<label>(6)</label><alternatives>
<mml:math display="block"><mml:mtable displaystyle="true" columnalign="right left" columnspacing="0pt"><mml:mtr><mml:mtd class="align-odd"/><mml:mtd class="align-even"><mml:msub><mml:mrow><mml:mtext mathvariant="bold">m</mml:mtext></mml:mrow><mml:mrow><mml:mi mathvariant="italic">i</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:msub><mml:mrow><mml:mtext mathvariant="bold">x</mml:mtext></mml:mrow><mml:mrow><mml:mi mathvariant="italic">r</mml:mi><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>+</mml:mo><mml:mi mathvariant="italic">F</mml:mi><mml:mo>·</mml:mo><mml:mo mathvariant="normal" fence="true" stretchy="false">(</mml:mo><mml:msub><mml:mrow><mml:mtext mathvariant="bold">x</mml:mtext></mml:mrow><mml:mrow><mml:mi mathvariant="italic">r</mml:mi><mml:mn>2</mml:mn></mml:mrow></mml:msub><mml:mo>−</mml:mo><mml:msub><mml:mrow><mml:mtext mathvariant="bold">x</mml:mtext></mml:mrow><mml:mrow><mml:mi mathvariant="italic">r</mml:mi><mml:mn>3</mml:mn></mml:mrow></mml:msub><mml:mo mathvariant="normal" fence="true" stretchy="false">)</mml:mo><mml:mo mathvariant="normal">,</mml:mo></mml:mtd></mml:mtr><mml:mtr><mml:mtd class="align-odd"/><mml:mtd class="align-even"><mml:mi mathvariant="italic">r</mml:mi><mml:mn>1</mml:mn><mml:mo stretchy="false">≠</mml:mo><mml:mi mathvariant="italic">r</mml:mi><mml:mn>2</mml:mn><mml:mo stretchy="false">≠</mml:mo><mml:mi mathvariant="italic">r</mml:mi><mml:mn>3</mml:mn><mml:mo stretchy="false">≠</mml:mo><mml:mi mathvariant="italic">i</mml:mi><mml:mo>.</mml:mo></mml:mtd></mml:mtr></mml:mtable></mml:math>
<tex-math><![CDATA[\[\begin{aligned}{}& {\textbf{m}_{i}}={\textbf{x}_{r1}}+F\cdot ({\textbf{x}_{r2}}-{\textbf{x}_{r3}}),\\ {} & r1\ne r2\ne r3\ne i.\end{aligned}\]]]></tex-math></alternatives>
</disp-formula> 
The integers <inline-formula id="j_info1236_ineq_019"><alternatives>
<mml:math><mml:mi mathvariant="italic">r</mml:mi><mml:mn>1</mml:mn></mml:math>
<tex-math><![CDATA[$r1$]]></tex-math></alternatives></inline-formula>, <inline-formula id="j_info1236_ineq_020"><alternatives>
<mml:math><mml:mi mathvariant="italic">r</mml:mi><mml:mn>2</mml:mn></mml:math>
<tex-math><![CDATA[$r2$]]></tex-math></alternatives></inline-formula> and <inline-formula id="j_info1236_ineq_021"><alternatives>
<mml:math><mml:mi mathvariant="italic">r</mml:mi><mml:mn>3</mml:mn></mml:math>
<tex-math><![CDATA[$r3$]]></tex-math></alternatives></inline-formula> are chosen randomly from the set <inline-formula id="j_info1236_ineq_022"><alternatives>
<mml:math><mml:mo fence="true" stretchy="false">{</mml:mo><mml:mn>1</mml:mn><mml:mo mathvariant="normal">,</mml:mo><mml:mtext mathvariant="italic">Np</mml:mtext><mml:mo fence="true" stretchy="false">}</mml:mo></mml:math>
<tex-math><![CDATA[$\{1,\textit{Np}\}$]]></tex-math></alternatives></inline-formula> and are different from each other and from the current index <italic>i</italic>. <italic>F</italic> is a constant factor that controls the amplification of the differential variation. The mutant vector is then combined with the current vector through crossover to create the trial vector: 
<disp-formula id="j_info1236_eq_007">
<label>(7)</label><alternatives>
<mml:math display="block"><mml:mtable displaystyle="true" columnalign="right left" columnspacing="0pt"><mml:mtr><mml:mtd class="align-odd"><mml:msub><mml:mrow><mml:mi mathvariant="italic">c</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">j</mml:mi><mml:mo mathvariant="normal">,</mml:mo><mml:mi mathvariant="italic">i</mml:mi></mml:mrow></mml:msub></mml:mtd><mml:mtd class="align-even"><mml:mo>=</mml:mo><mml:mfenced separators="" open="{" close=""><mml:mrow><mml:mtable columnspacing="4.0pt" equalrows="false" columnlines="none" equalcolumns="false" columnalign="left left"><mml:mtr><mml:mtd class="array"><mml:msub><mml:mrow><mml:mi mathvariant="italic">m</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">j</mml:mi><mml:mo mathvariant="normal">,</mml:mo><mml:mi mathvariant="italic">i</mml:mi></mml:mrow></mml:msub><mml:mspace width="1em"/></mml:mtd><mml:mtd class="array"><mml:mtext>if</mml:mtext><mml:mspace width="2.5pt"/><mml:mo mathvariant="normal" fence="true" stretchy="false">(</mml:mo><mml:mi mathvariant="italic">rand</mml:mi><mml:mo mathvariant="normal" fence="true" stretchy="false">(</mml:mo><mml:mn>0</mml:mn><mml:mo mathvariant="normal">,</mml:mo><mml:mn>1</mml:mn><mml:mo mathvariant="normal" fence="true" stretchy="false">)</mml:mo><mml:mo>⩽</mml:mo><mml:mi mathvariant="italic">C</mml:mi><mml:mi mathvariant="italic">r</mml:mi><mml:mo mathvariant="normal" fence="true" stretchy="false">)</mml:mo><mml:mspace width="2.5pt"/><mml:mtext>or</mml:mtext><mml:mspace width="2.5pt"/><mml:mo mathvariant="normal" fence="true" stretchy="false">(</mml:mo><mml:mi mathvariant="italic">j</mml:mi><mml:mo>=</mml:mo><mml:mo>=</mml:mo><mml:mi mathvariant="italic">j</mml:mi><mml:mi mathvariant="italic">rand</mml:mi><mml:mo mathvariant="normal" fence="true" stretchy="false">)</mml:mo></mml:mtd></mml:mtr><mml:mtr><mml:mtd class="array"><mml:msub><mml:mrow><mml:mi mathvariant="italic">x</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">j</mml:mi><mml:mo 
mathvariant="normal">,</mml:mo><mml:mi mathvariant="italic">i</mml:mi></mml:mrow></mml:msub><mml:mspace width="1em"/></mml:mtd><mml:mtd class="array"><mml:mtext>otherwise</mml:mtext><mml:mo>.</mml:mo></mml:mtd></mml:mtr></mml:mtable></mml:mrow></mml:mfenced></mml:mtd></mml:mtr><mml:mtr><mml:mtd class="align-odd"><mml:mi mathvariant="italic">i</mml:mi></mml:mtd><mml:mtd class="align-even"><mml:mo>=</mml:mo><mml:mn>1</mml:mn><mml:mo mathvariant="normal">,</mml:mo><mml:mn>2</mml:mn><mml:mo mathvariant="normal">,</mml:mo><mml:mo>…</mml:mo><mml:mo mathvariant="normal">,</mml:mo><mml:mtext mathvariant="italic">Np</mml:mtext><mml:mo>;</mml:mo><mml:mspace width="2.5pt"/><mml:mi mathvariant="italic">j</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn><mml:mo mathvariant="normal">,</mml:mo><mml:mn>2</mml:mn><mml:mo mathvariant="normal">,</mml:mo><mml:mo>…</mml:mo><mml:mo mathvariant="normal">,</mml:mo><mml:mi mathvariant="italic">D</mml:mi><mml:mo>.</mml:mo></mml:mtd></mml:mtr></mml:mtable></mml:math>
<tex-math><![CDATA[\[\begin{aligned}{}{c_{j,i}}& =\left\{\begin{array}{l@{\hskip4.0pt}l}{m_{j,i}}\hspace{1em}& \text{if}\hspace{2.5pt}(\mathit{rand}(0,1)\leqslant Cr)\hspace{2.5pt}\text{or}\hspace{2.5pt}(j==j\mathit{rand})\\ {} {x_{j,i}}\hspace{1em}& \text{otherwise}.\end{array}\right.\\ {} i& =1,2,\dots ,\textit{Np};\hspace{2.5pt}j=1,2,\dots ,D.\end{aligned}\]]]></tex-math></alternatives>
</disp-formula> 
<italic>Cr</italic> is a crossover probability which controls the fraction of parameters that are copied from the mutant vector <inline-formula id="j_info1236_ineq_023"><alternatives>
<mml:math><mml:msub><mml:mrow><mml:mi mathvariant="bold">m</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">i</mml:mi></mml:mrow></mml:msub></mml:math>
<tex-math><![CDATA[${\mathbf{m}_{i}}$]]></tex-math></alternatives></inline-formula>. The function <inline-formula id="j_info1236_ineq_024"><alternatives>
<mml:math><mml:mtext mathvariant="italic">rand</mml:mtext><mml:mo mathvariant="normal" fence="true" stretchy="false">(</mml:mo><mml:mn>0</mml:mn><mml:mo mathvariant="normal">,</mml:mo><mml:mn>1</mml:mn><mml:mo mathvariant="normal" fence="true" stretchy="false">)</mml:mo></mml:math>
<tex-math><![CDATA[$\textit{rand}(0,1)$]]></tex-math></alternatives></inline-formula> returns a uniformly distributed random number within the range [0,1). The integer value <italic>jrand</italic> is the index of a randomly taken individual from the mutant vector <inline-formula id="j_info1236_ineq_025"><alternatives>
<mml:math><mml:msub><mml:mrow><mml:mi mathvariant="bold">m</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">i</mml:mi></mml:mrow></mml:msub></mml:math>
<tex-math><![CDATA[${\mathbf{m}_{i}}$]]></tex-math></alternatives></inline-formula> to ensure that the newly generated individual <inline-formula id="j_info1236_ineq_026"><alternatives>
<mml:math><mml:msub><mml:mrow><mml:mi mathvariant="bold">c</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">i</mml:mi></mml:mrow></mml:msub></mml:math>
<tex-math><![CDATA[${\mathbf{c}_{i}}$]]></tex-math></alternatives></inline-formula> does not duplicate <inline-formula id="j_info1236_ineq_027"><alternatives>
<mml:math><mml:msub><mml:mrow><mml:mi mathvariant="bold">x</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">i</mml:mi></mml:mrow></mml:msub></mml:math>
<tex-math><![CDATA[${\mathbf{x}_{i}}$]]></tex-math></alternatives></inline-formula>. If the newly created individual <inline-formula id="j_info1236_ineq_028"><alternatives>
<mml:math><mml:msub><mml:mrow><mml:mi mathvariant="bold">c</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">i</mml:mi></mml:mrow></mml:msub></mml:math>
<tex-math><![CDATA[${\mathbf{c}_{i}}$]]></tex-math></alternatives></inline-formula> yields a better fitness value than the current individual <inline-formula id="j_info1236_ineq_029"><alternatives>
<mml:math><mml:msub><mml:mrow><mml:mi mathvariant="bold">x</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">i</mml:mi></mml:mrow></mml:msub></mml:math>
<tex-math><![CDATA[${\mathbf{x}_{i}}$]]></tex-math></alternatives></inline-formula>, then <inline-formula id="j_info1236_ineq_030"><alternatives>
<mml:math><mml:msub><mml:mrow><mml:mi mathvariant="bold">c</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">i</mml:mi></mml:mrow></mml:msub></mml:math>
<tex-math><![CDATA[${\mathbf{c}_{i}}$]]></tex-math></alternatives></inline-formula> survives into the next generation. The process of mutation, crossover, and selection is repeated until the optimum is located or a prespecified termination criterion is satisfied. In addition, the best individual is recorded in every generation <italic>g</italic> in order to keep track of the progress made during the optimization process.</p>
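<p>Putting Eqs. (5)–(7) together, a minimal sketch of the classic DE/<italic>rand</italic>/1/<italic>bin</italic> loop, written here for maximization to match the BLEU objective; the default parameter values are illustrative:</p>

```python
import random

def de_rand_1_bin(fitness, D, Np=20, F=0.5, Cr=0.9, lo=-1.0, hi=1.0,
                  generations=100, seed=1):
    """Classic DE/rand/1/bin (cf. Eqs. (5)-(7)), maximizing `fitness`."""
    rng = random.Random(seed)
    # initialization, Eq. (5)
    pop = [[rng.uniform(lo, hi) for _ in range(D)] for _ in range(Np)]
    fit = [fitness(x) for x in pop]
    for _ in range(generations):
        for i in range(Np):
            # three mutually distinct indices, all different from i
            r1, r2, r3 = rng.sample([k for k in range(Np) if k != i], 3)
            # mutation, Eq. (6)
            m = [pop[r1][j] + F * (pop[r2][j] - pop[r3][j]) for j in range(D)]
            # binomial crossover, Eq. (7); jrand guarantees the trial differs from x_i
            jrand = rng.randrange(D)
            c = [m[j] if rng.random() <= Cr or j == jrand else pop[i][j]
                 for j in range(D)]
            # selection: the trial vector survives if it is at least as good
            fc = fitness(c)
            if fc >= fit[i]:
                pop[i], fit[i] = c, fc
    best = max(range(Np), key=lambda k: fit[k])
    return pop[best], fit[best]
```

On a smooth toy objective such as the negative squared distance to a target vector, this sketch converges within a few dozen generations.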
<fig id="j_info1236_fig_001">
<label>Algorithm 1</label>
<caption>
<p>The Differential Evolution Algorithm</p>
</caption>
<graphic xlink:href="info1236_g001.jpg"/>
</fig>
</sec>
<sec id="j_info1236_s_004">
<label>2.2</label>
<title>jDE Algorithm</title>
<p>The classic DE algorithm has three control parameters, <italic>F</italic>, <inline-formula id="j_info1236_ineq_031"><alternatives>
<mml:math><mml:mi mathvariant="italic">C</mml:mi><mml:mi mathvariant="italic">r</mml:mi></mml:math>
<tex-math><![CDATA[$Cr$]]></tex-math></alternatives></inline-formula> and <inline-formula id="j_info1236_ineq_032"><alternatives>
<mml:math><mml:mi mathvariant="italic">N</mml:mi><mml:mi mathvariant="italic">p</mml:mi></mml:math>
<tex-math><![CDATA[$Np$]]></tex-math></alternatives></inline-formula>, that are fixed during the evolution. Suitable values are usually problem-dependent, and allowing them to vary during the evolutionary process can improve the algorithm’s performance. For this purpose, the jDE algorithm (Brest <italic>et al.</italic>, <xref ref-type="bibr" rid="j_info1236_ref_006">2006a</xref>, <xref ref-type="bibr" rid="j_info1236_ref_007">2006b</xref>) recalculates two of the control parameters using the following equations: <disp-formula-group id="j_info1236_dg_001">
<disp-formula id="j_info1236_eq_008">
<label>(8)</label><alternatives>
<mml:math display="block"><mml:mtable displaystyle="true" columnalign="right left" columnspacing="0pt"><mml:mtr><mml:mtd class="align-odd"/><mml:mtd class="align-even"><mml:msub><mml:mrow><mml:mi mathvariant="italic">F</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">i</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mfenced separators="" open="{" close=""><mml:mrow><mml:mtable columnspacing="4.0pt" equalrows="false" columnlines="none" equalcolumns="false" columnalign="left left"><mml:mtr><mml:mtd class="array"><mml:msub><mml:mrow><mml:mi mathvariant="italic">F</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">min</mml:mi></mml:mrow></mml:msub><mml:mo>+</mml:mo><mml:msub><mml:mrow><mml:mi mathvariant="italic">F</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">max</mml:mi></mml:mrow></mml:msub><mml:mo>·</mml:mo><mml:mi mathvariant="italic">rand</mml:mi><mml:mo mathvariant="normal" fence="true" stretchy="false">(</mml:mo><mml:mo mathvariant="normal" fence="true" stretchy="false">)</mml:mo><mml:mspace width="1em"/></mml:mtd><mml:mtd class="array"><mml:mtext>if</mml:mtext><mml:mspace width="2.5pt"/><mml:mi mathvariant="italic">rand</mml:mi><mml:mo mathvariant="normal" fence="true" stretchy="false">(</mml:mo><mml:mo mathvariant="normal" fence="true" stretchy="false">)</mml:mo><mml:mo mathvariant="normal">&lt;</mml:mo><mml:msub><mml:mrow><mml:mi mathvariant="italic">τ</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub></mml:mtd></mml:mtr><mml:mtr><mml:mtd class="array"><mml:msub><mml:mrow><mml:mi mathvariant="italic">F</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">i</mml:mi></mml:mrow></mml:msub><mml:mspace width="1em"/></mml:mtd><mml:mtd class="array"><mml:mtext>otherwise,</mml:mtext></mml:mtd></mml:mtr></mml:mtable></mml:mrow></mml:mfenced></mml:mtd></mml:mtr></mml:mtable></mml:math>
<tex-math><![CDATA[\[\begin{aligned}{}& {F_{i}}=\left\{\begin{array}{l@{\hskip4.0pt}l}{F_{\mathit{min}}}+{F_{\mathit{max}}}\cdot \mathit{rand}()\hspace{1em}& \text{if}\hspace{2.5pt}\mathit{rand}()<{\tau _{1}}\\ {} {F_{i}}\hspace{1em}& \text{otherwise,}\end{array}\right.\end{aligned}\]]]></tex-math></alternatives>
</disp-formula>
<disp-formula id="j_info1236_eq_009">
<label>(9)</label><alternatives>
<mml:math display="block"><mml:mtable displaystyle="true" columnalign="right left" columnspacing="0pt"><mml:mtr><mml:mtd class="align-odd"/><mml:mtd class="align-even"><mml:msub><mml:mrow><mml:mi mathvariant="italic">Cr</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">i</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mfenced separators="" open="{" close=""><mml:mrow><mml:mtable columnspacing="4.0pt" equalrows="false" columnlines="none" equalcolumns="false" columnalign="left left"><mml:mtr><mml:mtd class="array"><mml:mi mathvariant="italic">rand</mml:mi><mml:mo mathvariant="normal" fence="true" stretchy="false">(</mml:mo><mml:mo mathvariant="normal" fence="true" stretchy="false">)</mml:mo><mml:mspace width="1em"/></mml:mtd><mml:mtd class="array"><mml:mtext>if</mml:mtext><mml:mspace width="2.5pt"/><mml:mi mathvariant="italic">rand</mml:mi><mml:mo mathvariant="normal" fence="true" stretchy="false">(</mml:mo><mml:mo mathvariant="normal" fence="true" stretchy="false">)</mml:mo><mml:mo mathvariant="normal">&lt;</mml:mo><mml:msub><mml:mrow><mml:mi mathvariant="italic">τ</mml:mi></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub></mml:mtd></mml:mtr><mml:mtr><mml:mtd class="array"><mml:msub><mml:mrow><mml:mi mathvariant="italic">Cr</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">i</mml:mi></mml:mrow></mml:msub><mml:mspace width="1em"/></mml:mtd><mml:mtd class="array"><mml:mtext>otherwise.</mml:mtext></mml:mtd></mml:mtr></mml:mtable></mml:mrow></mml:mfenced></mml:mtd></mml:mtr></mml:mtable></mml:math>
<tex-math><![CDATA[\[\begin{aligned}{}& {\mathit{Cr}_{i}}=\left\{\begin{array}{l@{\hskip4.0pt}l}\mathit{rand}()\hspace{1em}& \text{if}\hspace{2.5pt}\mathit{rand}()<{\tau _{2}}\\ {} {\mathit{Cr}_{i}}\hspace{1em}& \text{otherwise.}\end{array}\right.\end{aligned}\]]]></tex-math></alternatives>
</disp-formula>
</disp-formula-group> <inline-formula id="j_info1236_ineq_033"><alternatives>
<mml:math><mml:msub><mml:mrow><mml:mi mathvariant="italic">F</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">min</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mn>0.1</mml:mn></mml:math>
<tex-math><![CDATA[${F_{\mathit{min}}}=0.1$]]></tex-math></alternatives></inline-formula> and <inline-formula id="j_info1236_ineq_034"><alternatives>
<mml:math><mml:msub><mml:mrow><mml:mi mathvariant="italic">F</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">max</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mn>0.9</mml:mn></mml:math>
<tex-math><![CDATA[${F_{\mathit{max}}}=0.9$]]></tex-math></alternatives></inline-formula> determine the lower and upper bounds for the parameter <italic>F</italic>, and the function <inline-formula id="j_info1236_ineq_035"><alternatives>
<mml:math><mml:mi mathvariant="italic">rand</mml:mi><mml:mo mathvariant="normal" fence="true" stretchy="false">(</mml:mo><mml:mo mathvariant="normal" fence="true" stretchy="false">)</mml:mo></mml:math>
<tex-math><![CDATA[$\mathit{rand}()$]]></tex-math></alternatives></inline-formula> returns a uniform random value within the interval <inline-formula id="j_info1236_ineq_036"><alternatives>
<mml:math><mml:mo fence="true" stretchy="false">[</mml:mo><mml:mn>0</mml:mn><mml:mo mathvariant="normal">,</mml:mo><mml:mn>1</mml:mn><mml:mo fence="true" stretchy="false">]</mml:mo></mml:math>
<tex-math><![CDATA[$[0,1]$]]></tex-math></alternatives></inline-formula>. <inline-formula id="j_info1236_ineq_037"><alternatives>
<mml:math><mml:msub><mml:mrow><mml:mi mathvariant="italic">τ</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub></mml:math>
<tex-math><![CDATA[${\tau _{1}}$]]></tex-math></alternatives></inline-formula> and <inline-formula id="j_info1236_ineq_038"><alternatives>
<mml:math><mml:msub><mml:mrow><mml:mi mathvariant="italic">τ</mml:mi></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub></mml:math>
<tex-math><![CDATA[${\tau _{2}}$]]></tex-math></alternatives></inline-formula> represent probabilities for recalculating <italic>F</italic> and <inline-formula id="j_info1236_ineq_039"><alternatives>
<mml:math><mml:mi mathvariant="italic">C</mml:mi><mml:mi mathvariant="italic">r</mml:mi></mml:math>
<tex-math><![CDATA[$Cr$]]></tex-math></alternatives></inline-formula>.</p>
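<p>The self-adaptation rules in Eqs. (8) and (9) can be sketched per individual; the values τ<sub>1</sub> = τ<sub>2</sub> = 0.1 used below follow Brest <italic>et al.</italic> (2006a), while the function itself is only an illustrative fragment:</p>

```python
import random

F_MIN, F_MAX = 0.1, 0.9
TAU1, TAU2 = 0.1, 0.1  # probabilities of recalculating F and Cr

def jde_update(F_i, Cr_i, rng=random):
    """Eqs. (8)-(9): each individual carries its own F_i and Cr_i,
    which are occasionally resampled before its trial vector is produced."""
    if rng.random() < TAU1:
        F_i = F_MIN + F_MAX * rng.random()   # F_i in [0.1, 1.0)
    if rng.random() < TAU2:
        Cr_i = rng.random()                  # Cr_i in [0, 1)
    return F_i, Cr_i
```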
<fig id="j_info1236_fig_002">
<label>Fig. 1</label>
<caption>
<p>The optimization process for finding the best models’ weights using the jDE algorithm.</p>
</caption>
<graphic xlink:href="info1236_g002.jpg"/>
</fig>
<p>As seen in Fig. <xref rid="j_info1236_fig_002">1</xref>, the jDE algorithm has three inputs: the tuning set, the models, and the Moses decoder (Koehn <italic>et al.</italic>, <xref ref-type="bibr" rid="j_info1236_ref_028">2007</xref>). The output of the algorithm is the best set of model weights <inline-formula id="j_info1236_ineq_040"><alternatives>
<mml:math><mml:msup><mml:mrow><mml:mi mathvariant="bold-italic">λ</mml:mi></mml:mrow><mml:mrow><mml:mo>∗</mml:mo></mml:mrow></mml:msup></mml:math>
<tex-math><![CDATA[${\boldsymbol{\lambda }^{\ast }}$]]></tex-math></alternatives></inline-formula>. The tuning set consists of source sentences and their target (reference) sentences. The algorithm is population-based: the population <bold>P</bold> consists of individuals, where an individual <inline-formula id="j_info1236_ineq_041"><alternatives>
<mml:math><mml:mi mathvariant="bold-italic">λ</mml:mi></mml:math>
<tex-math><![CDATA[$\boldsymbol{\lambda }$]]></tex-math></alternatives></inline-formula> is represented by the vector of models’ weights: <inline-formula id="j_info1236_ineq_042"><alternatives>
<mml:math><mml:mi mathvariant="bold-italic">λ</mml:mi><mml:mo>=</mml:mo><mml:mo fence="true" stretchy="false">{</mml:mo><mml:msub><mml:mrow><mml:mi mathvariant="italic">λ</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo mathvariant="normal">,</mml:mo><mml:msub><mml:mrow><mml:mi mathvariant="italic">λ</mml:mi></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub><mml:mo mathvariant="normal">,</mml:mo><mml:mo>…</mml:mo><mml:mo mathvariant="normal">,</mml:mo><mml:msub><mml:mrow><mml:mi mathvariant="italic">λ</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">D</mml:mi></mml:mrow></mml:msub><mml:mo fence="true" stretchy="false">}</mml:mo></mml:math>
<tex-math><![CDATA[$\boldsymbol{\lambda }=\{{\lambda _{1}},{\lambda _{2}},\dots ,{\lambda _{D}}\}$]]></tex-math></alternatives></inline-formula>. In the initial population, weights in vectors are generated randomly between lower (<inline-formula id="j_info1236_ineq_043"><alternatives>
<mml:math><mml:mi mathvariant="italic">m</mml:mi><mml:mi mathvariant="italic">i</mml:mi><mml:mi mathvariant="italic">n</mml:mi></mml:math>
<tex-math><![CDATA[$min$]]></tex-math></alternatives></inline-formula>) and upper (<inline-formula id="j_info1236_ineq_044"><alternatives>
<mml:math><mml:mi mathvariant="italic">m</mml:mi><mml:mi mathvariant="italic">a</mml:mi><mml:mi mathvariant="italic">x</mml:mi></mml:math>
<tex-math><![CDATA[$max$]]></tex-math></alternatives></inline-formula>) bounds, and the scale factor <italic>F</italic> and crossover rate <inline-formula id="j_info1236_ineq_045"><alternatives>
<mml:math><mml:mi mathvariant="italic">C</mml:mi><mml:mi mathvariant="italic">r</mml:mi></mml:math>
<tex-math><![CDATA[$Cr$]]></tex-math></alternatives></inline-formula> are initially set to 0.5 and 0.9, respectively. For each vector in the population, the algorithm generates a new trial vector using mutation and crossover, recalculating <italic>F</italic> or <italic>Cr</italic> first if the corresponding conditions are met. During selection, the trial vector is compared to the current vector in the population, and survives to the next generation if it is better. To evaluate a vector, the SMT system translates the source sentences using the weights from this vector. The translated sentences are then compared with the target sentences from the tuning set using the BLEU metric, which returns a real number where a higher value implies greater similarity. The algorithm repeats this process until it reaches the maximum number of generations. In the last generation, the best individual is taken from the population and its weights are used for translation in real time.</p>
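<p>The fitness evaluation described above can be sketched as follows; <monospace>decode</monospace> and <monospace>bleu</monospace> are placeholders standing in for the actual Moses decoder invocation and a BLEU scorer, not real APIs:</p>

```python
def fitness(weights, tuning_src, tuning_ref, decode, bleu):
    """Score a weight vector: translate the tuning set with these weights,
    then measure BLEU of the output against the reference translations."""
    hypotheses = [decode(sentence, weights) for sentence in tuning_src]
    return bleu(hypotheses, tuning_ref)

# Toy stand-ins that only illustrate the call shape, not real translation:
# the "decoder" garbles its output when the first weight is non-positive,
# and the "metric" is exact-match accuracy instead of BLEU.
toy_decode = lambda s, w: s if w[0] > 0 else s[::-1]
toy_bleu = lambda hyps, refs: sum(h == r for h, r in zip(hyps, refs)) / len(refs)

src = ["dober dan", "hvala"]
ref = ["dober dan", "hvala"]
```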
</sec>
</sec>
<sec id="j_info1236_s_005">
<label>3</label>
<title>Experiment</title>
<p>In our experiment, we built two SMT systems for translating from Slovenian to English and vice versa. All of the code and data necessary to build an SMT system are publicly available: we used the open-source Moses toolkit (Koehn <italic>et al.</italic>, <xref ref-type="bibr" rid="j_info1236_ref_028">2007</xref>) and the freely available JRC-Acquis parallel corpus (Steinberger <italic>et al.</italic>, <xref ref-type="bibr" rid="j_info1236_ref_041">2006</xref>), which is used as a benchmark in the SMT community (Koehn <italic>et al.</italic>, <xref ref-type="bibr" rid="j_info1236_ref_027">2009</xref>).</p>
<p>The Moses toolkit is an open-source toolkit for SMT which contains the SMT decoder and a wide variety of tools for training, tuning and applying the system to many translation tasks. The SMT system is frequency-based, where frequencies are trained on translated texts that are preprocessed and collected into a parallel corpus. Parallel corpora vary tremendously in size. They exist for all European languages and for many other pairs, such as Mandarin to English, but most language pairs, for example Finnish to Irish, have far smaller parallel corpora available. This scarce availability of parallel corpora is one of the major challenges in SMT: because manual creation of a large parallel corpus is very costly in terms of effort and time, methods for creating parallel corpora automatically and efficiently are needed, and parallel corpora remain an active object of research interest.</p>
<table-wrap id="j_info1236_tab_001">
<label>Table 1</label>
<caption>
<p>The aligned and selected JRC ACQUIS corpus.</p>
</caption>
<table>
<thead>
<tr>
<td style="vertical-align: top; text-align: left; border-top: solid thin"/>
<td colspan="2" style="vertical-align: top; text-align: left; border-top: solid thin; border-bottom: solid thin">Aligned</td>
<td colspan="2" style="vertical-align: top; text-align: left; border-top: solid thin; border-bottom: solid thin">Selected</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left; border-bottom: solid thin"/>
<td style="vertical-align: top; text-align: left; border-bottom: solid thin">Slovenian</td>
<td style="vertical-align: top; text-align: left; border-bottom: solid thin">English</td>
<td style="vertical-align: top; text-align: left; border-bottom: solid thin">Slovenian</td>
<td style="vertical-align: top; text-align: left; border-bottom: solid thin">English</td>
</tr>
</thead>
<tbody>
<tr>
<td style="vertical-align: top; text-align: left">Sentences</td>
<td colspan="2" style="vertical-align: top; text-align: center">1,170,663</td>
<td colspan="2" style="vertical-align: top; text-align: center">700,000</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left; border-bottom: solid thin">Words</td>
<td style="vertical-align: top; text-align: left; border-bottom: solid thin">25,964,572</td>
<td style="vertical-align: top; text-align: left; border-bottom: solid thin">30,382,264</td>
<td style="vertical-align: top; text-align: left; border-bottom: solid thin">14,262,144</td>
<td style="vertical-align: top; text-align: left; border-bottom: solid thin">17,093,472</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>The JRC-Acquis corpus must not be seen as a legal reference corpus; its purpose is to provide a large parallel corpus of documents for (computational) linguistics research. To align the sentences in the source and target language texts automatically, we used the HunAlign aligner (Varga <italic>et al.</italic>, <xref ref-type="bibr" rid="j_info1236_ref_043">2005</xref>). After successful alignment, we selected 700,000 sentences, which were used in our experiment. The exact size of the corpora is shown in Table <xref rid="j_info1236_tab_001">1</xref>. The corpus was tokenized and lowercased, and sentences longer than 80 words were removed. It is important to obtain as representative a sample as possible. Since the translation quality of neighbouring sentences correlates positively, we chose sentences from different parts of the corpus to create the training and test sets.</p>
<p>The training set was divided further into training and tuning sets. Sentences shorter than 8 or longer than 60 words were removed from the tuning and test sets. The final sizes of all sets are shown in Table <xref rid="j_info1236_tab_002">2</xref>. The language model is estimated from a monolingual corpus, typically using relative frequency estimates which are then smoothed. For languages such as English, billions of words or more are typically used. Deploying such large models can pose significant engineering challenges, because the language model can easily be so large that it will not fit into the memory of conventional machines; moreover, the language model can be queried millions of times when translating sentences, which precludes storing it on disk.</p>
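The filtering steps described above can be sketched as follows. This is a hypothetical helper, not code from the experiment; in particular, the whitespace tokenizer and lowercasing are naive stand-ins for the Moses preprocessing scripts.

```python
def preprocess(pairs, min_len=1, max_len=80):
    """Filter a list of (source, target) sentence pairs: lowercase,
    whitespace-tokenize, and drop any pair in which either side falls
    outside [min_len, max_len] words."""
    kept = []
    for src, tgt in pairs:
        s, t = src.lower().split(), tgt.lower().split()
        if min_len <= len(s) <= max_len and min_len <= len(t) <= max_len:
            kept.append((" ".join(s), " ".join(t)))
    return kept
```

Under the settings in the text, the training set would be filtered with `max_len=80`, while the tuning and test sets would use `min_len=8, max_len=60`.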
<table-wrap id="j_info1236_tab_002">
<label>Table 2</label>
<caption>
<p>Divided JRC ACQUIS corpus.</p>
</caption>
<table>
<thead>
<tr>
<td style="vertical-align: top; text-align: left; border-top: solid thin"/>
<td colspan="6" style="vertical-align: top; text-align: left; border-top: solid thin; border-bottom: solid thin">Slovenian ↔ English</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left; border-bottom: solid thin"/>
<td colspan="2" style="vertical-align: top; text-align: left; border-bottom: solid thin">Train</td>
<td colspan="2" style="vertical-align: top; text-align: left; border-bottom: solid thin">Tuning</td>
<td colspan="2" style="vertical-align: top; text-align: left; border-bottom: solid thin">Test</td>
</tr>
</thead>
<tbody>
<tr>
<td style="vertical-align: top; text-align: left">Sentences</td>
<td colspan="2" style="vertical-align: top; text-align: center">560,133</td>
<td colspan="2" style="vertical-align: top; text-align: center">644</td>
<td colspan="2" style="vertical-align: top; text-align: center">1,987</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left; border-bottom: solid thin">Words</td>
<td style="vertical-align: top; text-align: left; border-bottom: solid thin">11,614,065</td>
<td style="vertical-align: top; text-align: left; border-bottom: solid thin">13,213,582</td>
<td style="vertical-align: top; text-align: left; border-bottom: solid thin">15,065</td>
<td style="vertical-align: top; text-align: left; border-bottom: solid thin">16,944</td>
<td style="vertical-align: top; text-align: left; border-bottom: solid thin">70,245</td>
<td style="vertical-align: top; text-align: left; border-bottom: solid thin">74,922</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>For each language (Slovenian and English) we built language and translation models. The language model was a 5-gram language model with improved Kneser-Ney smoothing using the IRST Language Modeling (IRSTLM) (Federico <italic>et al.</italic>, <xref ref-type="bibr" rid="j_info1236_ref_020">2008</xref>) toolkit. The translation models were built using <italic>grow-diag-final-and</italic> alignment from GIZA++ (Och and Ney, <xref ref-type="bibr" rid="j_info1236_ref_033">2000</xref>). We also extended both SMT systems with four advanced models: the distortion model, the lexicalized reordering model (<italic>msd-bidirectional-fe</italic> reordering), and the word and phrase penalty models. Each SMT system thus had six models and 14 weights:</p>
<list>
<list-item id="j_info1236_li_007">
<label>•</label>
<p>1 weight for the word penalty model (<inline-formula id="j_info1236_ineq_046"><alternatives>
<mml:math><mml:msub><mml:mrow><mml:mi mathvariant="italic">λ</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub></mml:math>
<tex-math><![CDATA[${\lambda _{1}}$]]></tex-math></alternatives></inline-formula>),</p>
</list-item>
<list-item id="j_info1236_li_008">
<label>•</label>
<p>1 weight for the phrase penalty model (<inline-formula id="j_info1236_ineq_047"><alternatives>
<mml:math><mml:msub><mml:mrow><mml:mi mathvariant="italic">λ</mml:mi></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub></mml:math>
<tex-math><![CDATA[${\lambda _{2}}$]]></tex-math></alternatives></inline-formula>),</p>
</list-item>
<list-item id="j_info1236_li_009">
<label>•</label>
<p>4 weights for the translation model (<inline-formula id="j_info1236_ineq_048"><alternatives>
<mml:math><mml:msub><mml:mrow><mml:mi mathvariant="italic">λ</mml:mi></mml:mrow><mml:mrow><mml:mn>3</mml:mn></mml:mrow></mml:msub><mml:mo mathvariant="normal">,</mml:mo><mml:msub><mml:mrow><mml:mi mathvariant="italic">λ</mml:mi></mml:mrow><mml:mrow><mml:mn>4</mml:mn></mml:mrow></mml:msub><mml:mo mathvariant="normal">,</mml:mo><mml:msub><mml:mrow><mml:mi mathvariant="italic">λ</mml:mi></mml:mrow><mml:mrow><mml:mn>5</mml:mn></mml:mrow></mml:msub><mml:mo mathvariant="normal">,</mml:mo><mml:msub><mml:mrow><mml:mi mathvariant="italic">λ</mml:mi></mml:mrow><mml:mrow><mml:mn>6</mml:mn></mml:mrow></mml:msub></mml:math>
<tex-math><![CDATA[${\lambda _{3}},{\lambda _{4}},{\lambda _{5}},{\lambda _{6}}$]]></tex-math></alternatives></inline-formula>),</p>
</list-item>
<list-item id="j_info1236_li_010">
<label>•</label>
<p>6 weights for the lexical reordering model (<inline-formula id="j_info1236_ineq_049"><alternatives>
<mml:math><mml:msub><mml:mrow><mml:mi mathvariant="italic">λ</mml:mi></mml:mrow><mml:mrow><mml:mn>7</mml:mn></mml:mrow></mml:msub><mml:mo mathvariant="normal">,</mml:mo><mml:msub><mml:mrow><mml:mi mathvariant="italic">λ</mml:mi></mml:mrow><mml:mrow><mml:mn>8</mml:mn></mml:mrow></mml:msub><mml:mo mathvariant="normal">,</mml:mo><mml:msub><mml:mrow><mml:mi mathvariant="italic">λ</mml:mi></mml:mrow><mml:mrow><mml:mn>9</mml:mn></mml:mrow></mml:msub><mml:mo mathvariant="normal">,</mml:mo><mml:msub><mml:mrow><mml:mi mathvariant="italic">λ</mml:mi></mml:mrow><mml:mrow><mml:mn>10</mml:mn></mml:mrow></mml:msub><mml:mo mathvariant="normal">,</mml:mo><mml:msub><mml:mrow><mml:mi mathvariant="italic">λ</mml:mi></mml:mrow><mml:mrow><mml:mn>11</mml:mn></mml:mrow></mml:msub><mml:mo mathvariant="normal">,</mml:mo><mml:msub><mml:mrow><mml:mi mathvariant="italic">λ</mml:mi></mml:mrow><mml:mrow><mml:mn>12</mml:mn></mml:mrow></mml:msub></mml:math>
<tex-math><![CDATA[${\lambda _{7}},{\lambda _{8}},{\lambda _{9}},{\lambda _{10}},{\lambda _{11}},{\lambda _{12}}$]]></tex-math></alternatives></inline-formula>),</p>
</list-item>
<list-item id="j_info1236_li_011">
<label>•</label>
<p>1 weight for the distortion model (<inline-formula id="j_info1236_ineq_050"><alternatives>
<mml:math><mml:msub><mml:mrow><mml:mi mathvariant="italic">λ</mml:mi></mml:mrow><mml:mrow><mml:mn>13</mml:mn></mml:mrow></mml:msub></mml:math>
<tex-math><![CDATA[${\lambda _{13}}$]]></tex-math></alternatives></inline-formula>), and</p>
</list-item>
<list-item id="j_info1236_li_012">
<label>•</label>
<p>1 weight for the language model (<inline-formula id="j_info1236_ineq_051"><alternatives>
<mml:math><mml:msub><mml:mrow><mml:mi mathvariant="italic">λ</mml:mi></mml:mrow><mml:mrow><mml:mn>14</mml:mn></mml:mrow></mml:msub></mml:math>
<tex-math><![CDATA[${\lambda _{14}}$]]></tex-math></alternatives></inline-formula>).</p>
</list-item>
</list>
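These 14 weights combine the models' feature functions in the standard phrase-based log-linear formulation (assumed here, following the usual formulation; the exact notation may differ from the paper's earlier sections). The decoder returns the highest-scoring target sentence $e$ for a source sentence $f$:

```latex
% Log-linear combination of the D = 14 feature functions h_i
% with weights \lambda_i:
\hat{e} = \operatorname*{arg\,max}_{e} \sum_{i=1}^{D} \lambda_{i}\, h_{i}(e, f)
```

Tuning the system therefore amounts to searching for the weight vector $\boldsymbol{\lambda}$ that maximizes the translation quality on the tuning set.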
<p>Translation quality is considered to be the correspondence between a machine translation and a professional human (reference) translation. There are many metrics, e.g. BLEU, Translation Error Rate (TER) (Snover <italic>et al.</italic>, <xref ref-type="bibr" rid="j_info1236_ref_039">2006</xref>), Word Error Rate (WER) (Saon <italic>et al.</italic>, <xref ref-type="bibr" rid="j_info1236_ref_037">2006</xref>), etc. The most popular metric in SMT is BLEU, because it is quick, inexpensive, language-independent, and was one of the first metrics to achieve a high correlation with human judgments. The central idea behind BLEU is that the closer a machine translation is to a reference translation, the better it is. The primary task in the BLEU metric is to compare the <italic>n</italic>-grams of the machine translation with the <italic>n</italic>-grams of the reference translation and count the number of matches, which are position-independent. The foundation of the BLEU metric is the modified <italic>n</italic>-gram precision measure, which captures two aspects of the translation: adequacy and fluency. The unigram scores account for how much of the information is retained (adequacy), while the longer <italic>n</italic>-gram scores account for the fluency of the translation, i.e. if the target language is English, to what extent it reads like “good” English. The BLEU metric’s output is a real-valued number between 0 and 1, indicating how similar the machine and reference translations are. Values closer to 1 represent more similar texts; however, few machine translations attain a score of 1, because that would require the machine translation to be identical to one of the reference translations.</p>
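The modified <italic>n</italic>-gram precision can be illustrated with a minimal single-reference, sentence-level sketch. Real BLEU implementations work at corpus level, smooth the counts, and support multiple references; the `bleu` helper below is an illustrative simplification, not the official scoring script.

```python
from collections import Counter
import math

def bleu(candidate, reference, max_n=4):
    """Sentence-level BLEU sketch: modified n-gram precision (counts
    clipped by the reference) combined as a geometric mean, times a
    brevity penalty for candidates shorter than the reference."""
    cand, ref = candidate.split(), reference.split()
    precisions = []
    for n in range(1, max_n + 1):
        cand_ngrams = Counter(tuple(cand[i:i + n]) for i in range(len(cand) - n + 1))
        ref_ngrams = Counter(tuple(ref[i:i + n]) for i in range(len(ref) - n + 1))
        # "modified" precision: clip each n-gram count by its reference count
        matched = sum(min(c, ref_ngrams[g]) for g, c in cand_ngrams.items())
        precisions.append(matched / max(sum(cand_ngrams.values()), 1))
    if min(precisions) == 0:  # unsmoothed: any empty precision zeroes the score
        return 0.0
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / max(len(cand), 1))
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)
```

Note that scores like 60.57 reported later are simply this 0-to-1 value scaled by 100, as is conventional in the SMT literature.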
</sec>
<sec id="j_info1236_s_006">
<label>4</label>
<title>Results and Discussion</title>
<sec id="j_info1236_s_007">
<label>4.1</label>
<title>Results</title>
<p>We performed the experiment in order to compare the jDE algorithm with the state-of-the-art methods MERT and MIRA, and with the DE algorithm. For each optimizer we performed 30 independent runs on the tuning set described earlier. To evaluate the optimizers, we used the test set, which consists of source and target language texts. The source language text was translated using the decoder and models with the non-optimized and optimized weights. The translated text was then compared with the target language text and evaluated using the BLEU metric, as seen in Fig. <xref rid="j_info1236_fig_003">2</xref>.</p>
<p>The comparison of the results of the SMT systems with jDE optimization against the SMT systems without optimization is shown in Table <xref rid="j_info1236_tab_003">3</xref>. The jDE algorithm achieved BLEU scores of 60.57 for the Slovenian to English SMT system and 51.95 for the English to Slovenian SMT system, followed by MERT, the DE algorithm, and MIRA. Note that an improvement of 2-3 BLEU points is usually hard to obtain; we discuss this further in Section <xref rid="j_info1236_s_008">4.2</xref>.</p>
<fig id="j_info1236_fig_003">
<label>Fig. 2</label>
<caption>
<p>Evaluating the translation quality using the test set.</p>
</caption>
<graphic xlink:href="info1236_g003.jpg"/>
</fig>
<table-wrap id="j_info1236_tab_003">
<label>Table 3</label>
<caption>
<p>Comparison between baseline system and the system optimized with the jDE algorithm on the test set.</p>
</caption>
<table>
<thead>
<tr>
<td style="vertical-align: top; text-align: left; border-top: solid thin"/>
<td colspan="2" style="vertical-align: top; text-align: left; border-top: solid thin; border-bottom: solid thin">BLEU ↑</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left; border-bottom: solid thin"/>
<td style="vertical-align: top; text-align: left; border-bottom: solid thin">Slovenian → English</td>
<td style="vertical-align: top; text-align: left; border-bottom: solid thin">English → Slovenian</td>
</tr>
</thead>
<tbody>
<tr>
<td style="vertical-align: top; text-align: left">Baseline (non-optimized)</td>
<td style="vertical-align: top; text-align: left">58.00</td>
<td style="vertical-align: top; text-align: left">50.47</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left; border-bottom: solid thin">Optimized with jDE</td>
<td style="vertical-align: top; text-align: left; border-bottom: solid thin"><bold>60.57</bold></td>
<td style="vertical-align: top; text-align: left; border-bottom: solid thin"><bold>51.95</bold></td>
</tr>
</tbody>
</table>
</table-wrap>
<p>We compared the newly proposed jDE optimizer with the other state-of-the-art optimizers. Table <xref rid="j_info1236_tab_004">4</xref> shows a comparison between MERT, MIRA, the DE algorithm, and the jDE algorithm. The <italic>best</italic> and <italic>mean</italic> BLEU scores were obtained from 30 optimizer runs. The jDE algorithm achieved the <italic>best</italic> BLEU scores in the case of the Slovenian to English SMT, while DE and jDE were the best performing algorithms for English to Slovenian translation. MERT and MIRA obtained worse results for SMT systems in both translation directions.</p>
<table-wrap id="j_info1236_tab_004">
<label>Table 4</label>
<caption>
<p>Best and mean BLEU score of 30 runs for the Slovenian ↔ English SMT systems.</p>
</caption>
<table>
<thead>
<tr>
<td style="vertical-align: middle; text-align: left; border-top: solid thin"/>
<td colspan="2" style="vertical-align: top; text-align: left; border-top: solid thin; border-bottom: solid thin">MERT</td>
<td colspan="2" style="vertical-align: top; text-align: left; border-top: solid thin; border-bottom: solid thin">MIRA</td>
<td colspan="2" style="vertical-align: top; text-align: left; border-top: solid thin; border-bottom: solid thin">DE</td>
<td colspan="2" style="vertical-align: top; text-align: left; border-top: solid thin; border-bottom: solid thin">jDE</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left; border-bottom: solid thin"/>
<td style="vertical-align: top; text-align: left; border-bottom: solid thin">Best</td>
<td style="vertical-align: top; text-align: left; border-bottom: solid thin">Mean</td>
<td style="vertical-align: top; text-align: left; border-bottom: solid thin">Best</td>
<td style="vertical-align: top; text-align: left; border-bottom: solid thin">Mean</td>
<td style="vertical-align: top; text-align: left; border-bottom: solid thin">Best</td>
<td style="vertical-align: top; text-align: left; border-bottom: solid thin">Mean</td>
<td style="vertical-align: top; text-align: left; border-bottom: solid thin">Best</td>
<td style="vertical-align: top; text-align: left; border-bottom: solid thin">Mean</td>
</tr>
</thead>
<tbody>
<tr>
<td style="vertical-align: top; text-align: left">Slovenian → English</td>
<td style="vertical-align: top; text-align: left">60.56</td>
<td style="vertical-align: top; text-align: left">60.04</td>
<td style="vertical-align: top; text-align: left">60.32</td>
<td style="vertical-align: top; text-align: left">60.06</td>
<td style="vertical-align: top; text-align: left">60.52</td>
<td style="vertical-align: top; text-align: left">59.70</td>
<td style="vertical-align: top; text-align: left"><bold>60.57</bold></td>
<td style="vertical-align: top; text-align: left"><bold>60.12</bold></td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left; border-bottom: solid thin">English → Slovenian</td>
<td style="vertical-align: top; text-align: left; border-bottom: solid thin">51.85</td>
<td style="vertical-align: top; text-align: left; border-bottom: solid thin">51.28</td>
<td style="vertical-align: top; text-align: left; border-bottom: solid thin">51.40</td>
<td style="vertical-align: top; text-align: left; border-bottom: solid thin">51.02</td>
<td style="vertical-align: top; text-align: left; border-bottom: solid thin">51.86</td>
<td style="vertical-align: top; text-align: left; border-bottom: solid thin"><bold>51.52</bold></td>
<td style="vertical-align: top; text-align: left; border-bottom: solid thin"><bold>51.95</bold></td>
<td style="vertical-align: top; text-align: left; border-bottom: solid thin">51.51</td>
</tr>
</tbody>
</table>
</table-wrap>
<fig id="j_info1236_fig_004">
<label>Fig. 3</label>
<caption>
<p>The best values of weights after the optimization for Slovenian-English SMT system using MERT, MIRA, DE and jDE.</p>
</caption>
<graphic xlink:href="info1236_g004.jpg"/>
</fig>
<p>The obtained values of the weights for the Slovenian-English SMT system for MERT, MIRA, DE and jDE are shown in Fig. <xref rid="j_info1236_fig_004">3</xref>. Although the BLEU scores differ little, some of the weights are very different. The phrase penalty weight was negative for MIRA and jDE, while for MERT and DE it was positive. For jDE, we can also notice that the phrase penalty, translation and language models contributed the most to the translation.</p>
<fig id="j_info1236_fig_005">
<label>Fig. 4</label>
<caption>
<p>The best values of weights after the optimization for the English-Slovenian SMT system using MERT, MIRA, DE and jDE.</p>
</caption>
<graphic xlink:href="info1236_g005.jpg"/>
</fig>
<p>The obtained values of the weights for the English-Slovenian SMT system for MERT, MIRA, DE and jDE are shown in Fig. <xref rid="j_info1236_fig_005">4</xref>. The fourth weight of the lexical reordering model was much higher with DE than with the other optimizers, and the weight of the phrase penalty model was much lower with DE than with the others. It can also be seen that, for all optimizers except DE, the word penalty model contributed the most to the translation.</p>
<p>Since the optimization process was part of offline training with a static corpus, described in Section <xref rid="j_info1236_s_005">3</xref>, and had no time constraints, the optimization time was not critical. Once the SMT system was built, the actual translation time was the same regardless of the optimizer: for a text of 15,000 words it took approximately 5 minutes on a single CPU (i5). As seen in Table <xref rid="j_info1236_tab_005">5</xref>, both DE and jDE used the same settings, with which they both made 750 evaluations; one evaluation took around 2.5 minutes on two CPUs (i5), resulting in a total optimization time of 1,875 minutes for each.</p>
<table-wrap id="j_info1236_tab_005">
<label>Table 5</label>
<caption>
<p>Tuning process statistics for the SMT systems using DE and jDE.</p>
</caption>
<table>
<thead>
<tr>
<td style="vertical-align: top; text-align: left; border-top: solid thin; border-bottom: solid thin"/>
<td style="vertical-align: top; text-align: left; border-top: solid thin; border-bottom: solid thin">DE</td>
<td style="vertical-align: top; text-align: left; border-top: solid thin; border-bottom: solid thin">jDE</td>
</tr>
</thead>
<tbody>
<tr>
<td style="vertical-align: top; text-align: left">Number of generations</td>
<td style="vertical-align: top; text-align: left">50</td>
<td style="vertical-align: top; text-align: left">50</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">Population size</td>
<td style="vertical-align: top; text-align: left">15</td>
<td style="vertical-align: top; text-align: left">15</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">Tuning set size [words]</td>
<td style="vertical-align: top; text-align: left">15,000</td>
<td style="vertical-align: top; text-align: left">15,000</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">Number of evaluations</td>
<td style="vertical-align: top; text-align: left">750</td>
<td style="vertical-align: top; text-align: left">750</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left; border-bottom: solid thin">Tuning time [min]</td>
<td style="vertical-align: top; text-align: left; border-bottom: solid thin">1,875.2</td>
<td style="vertical-align: top; text-align: left; border-bottom: solid thin">1,875.3</td>
</tr>
</tbody>
</table>
</table-wrap>
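The evaluation count and tuning time in Table 5 follow directly from these settings, as a quick back-of-the-envelope check shows (the per-evaluation time is the approximate 2.5 minutes reported above):

```python
generations, pop_size = 50, 15
minutes_per_eval = 2.5  # approximate decoding + BLEU time per vector (two i5 CPUs)

evaluations = generations * pop_size
total_minutes = evaluations * minutes_per_eval
print(evaluations, total_minutes)  # 750 1875.0  (roughly 31 hours per optimizer run)
```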
<p>The goal of the experimental testing was to assess the true translation quality of an SMT system on text from a certain domain. However, this is an abstract concept, because it would have to be computed over all possible sentences in that domain; in practice, we can only measure the performance of an SMT system on a specific sample. In our experiment, we compared an SMT system without optimization (baseline) with SMT systems optimized with MERT, MIRA, the DE algorithm, and the jDE algorithm. We translated the same test set and measured the translation quality using the BLEU metric. One important element of a solid experimental framework is a statistical significance test that allows us to judge whether a change in the score that comes from a change in the system truly reflects a change in overall translation quality (Koehn, <xref ref-type="bibr" rid="j_info1236_ref_025">2004</xref>). To measure the reliability of the conclusion that one system is better than another, i.e. that the difference in test scores is statistically significant, we used the MultEval tool (Clark <italic>et al.</italic>, <xref ref-type="bibr" rid="j_info1236_ref_013">2011</xref>), a recognized tool for MT significance testing within the field of SMT. MultEval takes translations from several optimizers and provides two popular metric scores (BLEU, TER), as well as standard deviations via bootstrap resampling and <italic>p</italic>-values via approximate randomization. With this, we can mitigate some of the risk of using unstable optimizers, and the tool is intended to help in evaluating the impact of in-house experimental variations on translation quality.</p>
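The approximate randomization idea behind such <italic>p</italic>-values can be sketched as follows. This is a simplified sketch over paired per-sentence scores; MultEval itself shuffles whole system outputs and recomputes corpus-level BLEU/TER, so the helper below only illustrates the principle.

```python
import random

def approximate_randomization(scores_a, scores_b, trials=10000, seed=0):
    """Two-sided approximate randomization test on paired per-sentence
    scores: randomly swap each pair between the two systems and count
    how often the shuffled mean difference is at least as large as the
    observed one. Small p-values suggest a real difference."""
    rng = random.Random(seed)
    n = len(scores_a)
    observed = abs(sum(scores_a) - sum(scores_b)) / n
    hits = 0
    for _ in range(trials):
        diff = 0.0
        for a, b in zip(scores_a, scores_b):
            if rng.random() < 0.5:
                a, b = b, a  # swap this sentence's scores between systems
            diff += a - b
        if abs(diff) / n >= observed:
            hits += 1
    # add-one smoothing keeps the p-value strictly positive
    return (hits + 1) / (trials + 1)
```

Identical score lists yield a p-value of 1 (no evidence of a difference), while consistently diverging lists drive the p-value toward 0.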
<p>The statistical comparison using the MultEval tool is shown in Tables <xref rid="j_info1236_tab_006">6</xref> and <xref rid="j_info1236_tab_007">7</xref>, where the jDE algorithm was used as the baseline. This means that we test whether MERT, MIRA and the DE algorithm differ statistically significantly from the jDE algorithm. Again, we can see that the jDE algorithm achieved the lowest TER scores for both SMT systems and the highest average BLEU score for the Slovenian-English system, while its average BLEU score for the English-Slovenian system was on a par with that of the DE algorithm.</p>
<table-wrap id="j_info1236_tab_006">
<label>Table 6</label>
<caption>
<p>Statistical test using MultEval for the Slovenian-English SMT system.</p>
</caption>
<table>
<thead>
<tr>
<td style="vertical-align: top; text-align: left; border-top: solid thin; border-bottom: solid thin">Metric</td>
<td style="vertical-align: top; text-align: left; border-top: solid thin; border-bottom: solid thin">Optimizer</td>
<td style="vertical-align: top; text-align: left; border-top: solid thin; border-bottom: solid thin">Avg</td>
<td style="vertical-align: top; text-align: left; border-top: solid thin; border-bottom: solid thin">Std</td>
<td style="vertical-align: top; text-align: left; border-top: solid thin; border-bottom: solid thin"><italic>p</italic>-value</td>
</tr>
</thead>
<tbody>
<tr>
<td style="vertical-align: top; text-align: left">BLEU ↑</td>
<td style="vertical-align: top; text-align: left">jDE</td>
<td style="vertical-align: top; text-align: left">60.12</td>
<td style="vertical-align: top; text-align: left">0.24</td>
<td style="vertical-align: top; text-align: left">–</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left"/>
<td style="vertical-align: top; text-align: left">MERT</td>
<td style="vertical-align: top; text-align: left">60.03</td>
<td style="vertical-align: top; text-align: left">0.25</td>
<td style="vertical-align: top; text-align: left">0.001</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left"/>
<td style="vertical-align: top; text-align: left">MIRA</td>
<td style="vertical-align: top; text-align: left">60.06</td>
<td style="vertical-align: top; text-align: left">0.19</td>
<td style="vertical-align: top; text-align: left">0.004</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left"/>
<td style="vertical-align: top; text-align: left">DE</td>
<td style="vertical-align: top; text-align: left">59.71</td>
<td style="vertical-align: top; text-align: left">0.49</td>
<td style="vertical-align: top; text-align: left">0.001</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">TER ↓</td>
<td style="vertical-align: top; text-align: left">jDE</td>
<td style="vertical-align: top; text-align: left">28.36</td>
<td style="vertical-align: top; text-align: left">0.20</td>
<td style="vertical-align: top; text-align: left">–</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left"/>
<td style="vertical-align: top; text-align: left">MERT</td>
<td style="vertical-align: top; text-align: left">28.53</td>
<td style="vertical-align: top; text-align: left">0.20</td>
<td style="vertical-align: top; text-align: left">0.001</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left"/>
<td style="vertical-align: top; text-align: left">MIRA</td>
<td style="vertical-align: top; text-align: left">28.51</td>
<td style="vertical-align: top; text-align: left">0.12</td>
<td style="vertical-align: top; text-align: left">0.001</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left; border-bottom: solid thin"/>
<td style="vertical-align: top; text-align: left; border-bottom: solid thin">DE</td>
<td style="vertical-align: top; text-align: left; border-bottom: solid thin">28.65</td>
<td style="vertical-align: top; text-align: left; border-bottom: solid thin">0.32</td>
<td style="vertical-align: top; text-align: left; border-bottom: solid thin">0.001</td>
</tr>
</tbody>
</table>
</table-wrap>
<table-wrap id="j_info1236_tab_007">
<label>Table 7</label>
<caption>
<p>Statistical test using MultEval for the English-Slovenian SMT system.</p>
</caption>
<table>
<thead>
<tr>
<td style="vertical-align: top; text-align: left; border-top: solid thin; border-bottom: solid thin">Metric</td>
<td style="vertical-align: top; text-align: left; border-top: solid thin; border-bottom: solid thin">Optimizer</td>
<td style="vertical-align: top; text-align: left; border-top: solid thin; border-bottom: solid thin">Avg</td>
<td style="vertical-align: top; text-align: left; border-top: solid thin; border-bottom: solid thin">Std</td>
<td style="vertical-align: top; text-align: left; border-top: solid thin; border-bottom: solid thin"><italic>p</italic>-value</td>
</tr>
</thead>
<tbody>
<tr>
<td style="vertical-align: top; text-align: left">BLEU ↑</td>
<td style="vertical-align: top; text-align: left">jDE</td>
<td style="vertical-align: top; text-align: left">51.51</td>
<td style="vertical-align: top; text-align: left">0.44</td>
<td style="vertical-align: top; text-align: left">–</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left"/>
<td style="vertical-align: top; text-align: left">MERT</td>
<td style="vertical-align: top; text-align: left">51.28</td>
<td style="vertical-align: top; text-align: left">0.48</td>
<td style="vertical-align: top; text-align: left">0.001</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left"/>
<td style="vertical-align: top; text-align: left">MIRA</td>
<td style="vertical-align: top; text-align: left">51.03</td>
<td style="vertical-align: top; text-align: left">0.29</td>
<td style="vertical-align: top; text-align: left">0.001</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left"/>
<td style="vertical-align: top; text-align: left">DE</td>
<td style="vertical-align: top; text-align: left">51.52</td>
<td style="vertical-align: top; text-align: left">0.32</td>
<td style="vertical-align: top; text-align: left">0.001</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">TER ↓</td>
<td style="vertical-align: top; text-align: left">jDE</td>
<td style="vertical-align: top; text-align: left">36.26</td>
<td style="vertical-align: top; text-align: left">0.45</td>
<td style="vertical-align: top; text-align: left">–</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left"/>
<td style="vertical-align: top; text-align: left">MERT</td>
<td style="vertical-align: top; text-align: left">36.57</td>
<td style="vertical-align: top; text-align: left">0.56</td>
<td style="vertical-align: top; text-align: left">0.001</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left"/>
<td style="vertical-align: top; text-align: left">MIRA</td>
<td style="vertical-align: top; text-align: left">36.61</td>
<td style="vertical-align: top; text-align: left">0.29</td>
<td style="vertical-align: top; text-align: left">0.001</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left; border-bottom: solid thin"/>
<td style="vertical-align: top; text-align: left; border-bottom: solid thin">DE</td>
<td style="vertical-align: top; text-align: left; border-bottom: solid thin">36.33</td>
<td style="vertical-align: top; text-align: left; border-bottom: solid thin">0.36</td>
<td style="vertical-align: top; text-align: left; border-bottom: solid thin">0.001</td>
</tr>
</tbody>
</table>
</table-wrap>
</sec>
<sec id="j_info1236_s_008">
<label>4.2</label>
<title>Discussion</title>
<p>According to Koehn (<xref ref-type="bibr" rid="j_info1236_ref_025">2004</xref>), it is difficult to evaluate translation quality properly, since it is not entirely clear what the focus of the evaluation should be. A good translation must, of course, capture the meaning of the source-language text, although the translator’s interpretation introduces differences in emphasis. At the same time, the output should be fluent, so that it reads easily. These two goals, adequacy and fluency, are the main criteria in MT evaluation. A human translator may be asked to assess the adequacy and fluency of the translation output, but this is a laborious and expensive task. It is therefore hard to interpret what a BLEU score of, for example, 60.57 means: the score is not intuitive, and it depends on the number of reference translations used. Table <xref rid="j_info1236_tab_008">8</xref> shows an example from our experiment of the differences between the reference translation and the outputs of the SMT systems with and without optimization. It is interesting to note that all system translations contain the partial translation “for the detection” (or “for detection”), corresponding to “pri odkrivanju” in the original text, while this part was not translated in the reference. We can also see that all the translations except the one produced with MIRA contain the determiner “the”. Differences in word order can be noticed as well.</p>
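Koehn (2004) assesses the significance of metric differences between systems with paired bootstrap resampling. The following sketch illustrates the idea only; the function name and the per-sentence scores are hypothetical, and real BLEU is a corpus-level metric, so each resample would in practice recompute corpus BLEU rather than sum per-sentence scores:

```python
import random

def paired_bootstrap(scores_a, scores_b, samples=1000, seed=42):
    """Paired bootstrap resampling (after Koehn, 2004): estimate how often
    system A beats system B on randomly resampled test sets.
    scores_a / scores_b are hypothetical per-sentence metric scores;
    with corpus BLEU, one would resample sentence indices and recompute
    the corpus score on each sample."""
    rng = random.Random(seed)
    n = len(scores_a)
    wins = 0
    for _ in range(samples):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        if sum(scores_a[i] for i in idx) > sum(scores_b[i] for i in idx):
            wins += 1
    return wins / samples  # fraction of resamples in which A outperforms B

# Hypothetical per-sentence scores for two systems:
a = [0.62, 0.55, 0.71, 0.48, 0.66]
b = [0.60, 0.50, 0.69, 0.47, 0.61]
print(paired_bootstrap(a, b))
```

A fraction close to 1.0 (conventionally above 0.95) is read as system A being significantly better than system B.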
<p>The authors in Koehn <italic>et al.</italic> (<xref ref-type="bibr" rid="j_info1236_ref_027">2009</xref>) used the same JRC-Acquis corpus and built SMT systems for 462 language pairs, including the Slovenian-English pair. Their reported BLEU score was 61 for Slovenian-to-English translation and 50.7 for English-to-Slovenian. The exact division of the corpus into training, tuning and test sets was not published, so we could not make a direct comparison.</p>
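To make the metric concrete, the following minimal sketch (not the implementation used in our evaluation; production toolkits add smoothing and corpus-level aggregation) computes an unsmoothed single-reference sentence BLEU, the geometric mean of clipped n-gram precisions times a brevity penalty (Papineni et al., 2002), for the reference and one system output from Table 8:

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Multiset of n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(hypothesis, reference, max_n=4):
    """Minimal single-reference sentence BLEU: geometric mean of clipped
    n-gram precisions (n = 1..max_n) times a brevity penalty."""
    hyp, ref = hypothesis.split(), reference.split()
    log_prec = 0.0
    for n in range(1, max_n + 1):
        h, r = ngrams(hyp, n), ngrams(ref, n)
        clipped = sum(min(c, r[g]) for g, c in h.items())  # clip by reference counts
        total = max(sum(h.values()), 1)
        if clipped == 0:
            return 0.0  # unsmoothed: any empty n-gram overlap zeroes the score
        log_prec += math.log(clipped / total) / max_n
    # brevity penalty: penalize hypotheses shorter than the reference
    bp = 1.0 if len(hyp) > len(ref) else math.exp(1 - len(ref) / max(len(hyp), 1))
    return bp * math.exp(log_prec)

ref = ("to certain community reference laboratories in the veterinary "
       "public health field of biological risks")
hyp = ("to certain community reference laboratories for the detection of "
       "biological risk in the veterinary public health field")
print(round(100 * bleu(hyp, ref), 2))
```

Even here, the extra material (“for the detection of”) and the reordered tail lower the n-gram overlap, illustrating why a single BLEU number is hard to interpret in isolation.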
<table-wrap id="j_info1236_tab_008">
<label>Table 8</label>
<caption>
<p>Example of one translation from Slovenian to English given by different systems.</p>
</caption>
<table>
<thead>
<tr>
<td style="vertical-align: top; text-align: left; border-top: solid thin; border-bottom: solid thin">System</td>
<td style="vertical-align: top; text-align: left; border-top: solid thin; border-bottom: solid thin">Translation</td>
</tr>
</thead>
<tbody>
<tr>
<td style="vertical-align: top; text-align: left">Original</td>
<td style="vertical-align: top; text-align: left">za nekatere referenčne laboratorije skupnosti pri odkrivanju bioloških tveganj na veterinarskem področju javnega zdravstvenega varstva</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">Reference</td>
<td style="vertical-align: top; text-align: left">to certain community reference laboratories in the veterinary public health field of biological risks</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">Baseline</td>
<td style="vertical-align: top; text-align: left">to certain community reference laboratories for the detection of biological risk in the veterinary field of public health protection</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">MERT</td>
<td style="vertical-align: top; text-align: left">to certain community reference laboratories for the detection of biological risk in the veterinary field of public health protection</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">MIRA</td>
<td style="vertical-align: top; text-align: left">to certain community reference laboratories for detection of biological risk in the veterinary public health field</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">DE</td>
<td style="vertical-align: top; text-align: left">to certain community reference laboratories for the detection of biological risk public health in the veterinary field</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left; border-bottom: solid thin">jDE</td>
<td style="vertical-align: top; text-align: left; border-bottom: solid thin">to certain community reference laboratories for the detection of biological risk in the veterinary public health field</td>
</tr>
</tbody>
</table>
</table-wrap>
</sec>
<sec id="j_info1236_s_009">
<label>4.3</label>
<title>Optimization Settings</title>
<p>The following settings were used for the optimization: <italic>D</italic> = 14, <italic>min</italic> = −1, <italic>max</italic> = 1, <italic>Np</italic> = 15, <italic>G</italic> = 50, <italic>F</italic> = 0.5, <italic>Cr</italic> = 0.9, <inline-formula id="j_info1236_ineq_052"><alternatives>
<mml:math><mml:msub><mml:mrow><mml:mi mathvariant="italic">τ</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub></mml:math>
<tex-math><![CDATA[${\tau _{1}}$]]></tex-math></alternatives></inline-formula> = 0.1, and <inline-formula id="j_info1236_ineq_053"><alternatives>
<mml:math><mml:msub><mml:mrow><mml:mi mathvariant="italic">τ</mml:mi></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub></mml:math>
<tex-math><![CDATA[${\tau _{2}}$]]></tex-math></alternatives></inline-formula> = 0.1. The values of <italic>F</italic> and <italic>Cr</italic> were set only for the initial population, and self-adapted during the evolution process. The maximum number of generations, <italic>G</italic>, was the stopping criterion for the algorithm.</p>
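A minimal sketch of how these settings drive the jDE self-adaptive mechanism (Brest et al., 2006a) is given below. The fitness function here is an illustrative stand-in: in our setting, each evaluation would decode the tuning set with the candidate weight vector and return the metric score (maximized), whereas this sketch minimizes a simple test function:

```python
import random

def jde(fitness, dim=14, lo=-1.0, hi=1.0, np_=15, gens=50, seed=1):
    """jDE sketch with the settings from this section: D=14, Np=15, G=50,
    F=0.5, Cr=0.9 in the initial population, tau1=tau2=0.1.
    F and Cr are stored per individual and self-adapt before each
    trial-vector creation; `fitness` is minimized."""
    rng = random.Random(seed)
    tau1 = tau2 = 0.1
    pop = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(np_)]
    F, Cr = [0.5] * np_, [0.9] * np_
    fit = [fitness(x) for x in pop]
    for _ in range(gens):
        for i in range(np_):
            # jDE self-adaptation rule for the control parameters
            Fi = 0.1 + rng.random() * 0.9 if rng.random() < tau1 else F[i]
            Cri = rng.random() if rng.random() < tau2 else Cr[i]
            # DE/rand/1 mutation with three distinct partners
            a, b, c = rng.sample([j for j in range(np_) if j != i], 3)
            # binomial crossover; jrand guarantees one mutated component
            jrand = rng.randrange(dim)
            trial = [pop[a][j] + Fi * (pop[b][j] - pop[c][j])
                     if (rng.random() < Cri or j == jrand) else pop[i][j]
                     for j in range(dim)]
            trial = [min(max(x, lo), hi) for x in trial]  # keep within [min, max]
            ft = fitness(trial)
            if ft <= fit[i]:  # greedy selection; successful F, Cr survive
                pop[i], fit[i], F[i], Cr[i] = trial, ft, Fi, Cri
    best = min(range(np_), key=fit.__getitem__)
    return pop[best], fit[best]

# Example: minimize the sphere function over [-1, 1]^14
best, val = jde(lambda x: sum(v * v for v in x))
```

Because successful (F, Cr) pairs are inherited only when the trial vector wins the selection, the control parameters adapt to the problem without manual tuning.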
</sec>
</sec>
<sec id="j_info1236_s_010">
<label>5</label>
<title>Conclusion</title>
<p>Weights in SMT systems can affect translation quality significantly. In this paper, two phrase-based SMT systems were built successfully, and their weights were optimized using the jDE algorithm. The systems were tested on an unseen set, and the results were comparable. However, based on extensive experiments, the jDE algorithm obtained better BLEU scores than the state-of-the-art algorithms MERT and MIRA, as well as the DE algorithm. The jDE algorithm achieved the <italic>best</italic> BLEU scores, 60.57 for the Slovenian-to-English SMT system and 51.95 for the English-to-Slovenian SMT system, followed by MERT, the DE algorithm and MIRA. These results confirm the promising characteristics of algorithms based on Differential Evolution and the good attributes of the jDE self-adaptive mechanism.</p>
<p>Recently, neural machine translation has emerged as a new paradigm in MT. It, too, has many parameters that could be optimized to yield better performance.</p>
</sec>
</body>
<back>
<ref-list id="j_info1236_reflist_001">
<title>References</title>
<ref id="j_info1236_ref_001">
<mixed-citation publication-type="other"><string-name><surname>Albat</surname>, <given-names>T.F.</given-names></string-name> (2007). US Patent 0185235. <italic>Systems and Methods for Automatically Estimating a Translation Time.</italic></mixed-citation>
</ref>
<ref id="j_info1236_ref_002">
<mixed-citation publication-type="other"><string-name><surname>Bertoldi</surname>, <given-names>N.</given-names></string-name>, <string-name><surname>Haddow</surname>, <given-names>B.</given-names></string-name>, <string-name><surname>Fouet</surname>, <given-names>J.-B.</given-names></string-name> (2009). Improved minimum error rate training in moses. <italic>ACL</italic>, 160–167.</mixed-citation>
</ref>
<ref id="j_info1236_ref_003">
<mixed-citation publication-type="chapter"><string-name><surname>Bojar</surname>, <given-names>O.</given-names></string-name>, <string-name><surname>Chatterjee</surname>, <given-names>R.</given-names></string-name>, <string-name><surname>Federmann</surname>, <given-names>C.</given-names></string-name>, <string-name><surname>Haddow</surname>, <given-names>B.</given-names></string-name>, <string-name><surname>Huck</surname>, <given-names>M.</given-names></string-name>, <string-name><surname>Hokamp</surname>, <given-names>C.</given-names></string-name>, <string-name><surname>Koehn</surname>, <given-names>P.</given-names></string-name>, <string-name><surname>Logacheva</surname>, <given-names>V.</given-names></string-name>, <string-name><surname>Monz</surname>, <given-names>C.</given-names></string-name>, <string-name><surname>Negri</surname>, <given-names>M.</given-names></string-name>, <string-name><surname>Post</surname>, <given-names>M.</given-names></string-name>, <string-name><surname>Scarton</surname>, <given-names>C.</given-names></string-name>, <string-name><surname>Specia</surname>, <given-names>L.</given-names></string-name>, <string-name><surname>Turchi</surname>, <given-names>M.</given-names></string-name> (<year>2015</year>). <chapter-title>Findings of the 2015 workshop on statistical machine translation</chapter-title>. In: <source>Proceedings of the Tenth Workshop on Statistical Machine Translation</source>, pp. <fpage>1</fpage>–<lpage>46</lpage>.</mixed-citation>
</ref>
<ref id="j_info1236_ref_004">
<mixed-citation publication-type="other"><string-name><surname>Bošković</surname>, <given-names>B.</given-names></string-name>, <string-name><surname>Brest</surname>, <given-names>J.</given-names></string-name> (2016). Differential evolution for protein folding optimization based on a three-dimensional AB off-lattice model. <italic>Journal of Molecular Modeling</italic>, 1–15.</mixed-citation>
</ref>
<ref id="j_info1236_ref_005">
<mixed-citation publication-type="other"><string-name><surname>Bošković</surname>, <given-names>B.</given-names></string-name>, <string-name><surname>Brest</surname>, <given-names>J.</given-names></string-name>, <string-name><surname>Zamuda</surname>, <given-names>A.</given-names></string-name>, <string-name><surname>Greiner</surname>, <given-names>S.</given-names></string-name>, <string-name><surname>Žumer</surname>, <given-names>V.</given-names></string-name> (2011). History mechanism supported differential evolution for chess evaluation function tuning. <italic>Soft Computing – A Fusion of Foundations, Methodologies and Applications</italic>, 667–682.</mixed-citation>
</ref>
<ref id="j_info1236_ref_006">
<mixed-citation publication-type="other"><string-name><surname>Brest</surname>, <given-names>J.</given-names></string-name>, <string-name><surname>Greiner</surname>, <given-names>S.</given-names></string-name>, <string-name><surname>Bošković</surname>, <given-names>B.</given-names></string-name>, <string-name><surname>Mernik</surname>, <given-names>M.</given-names></string-name>, <string-name><surname>Žumer</surname>, <given-names>V.</given-names></string-name> (2006a). Self-Adapting Control Parameters in Differential Evolution: A comparative study on numerical benchmark problems. <italic>IEEE Transactions on Evolutionary Computation</italic>, 646–657.</mixed-citation>
</ref>
<ref id="j_info1236_ref_007">
<mixed-citation publication-type="other"><string-name><surname>Brest</surname>, <given-names>J.</given-names></string-name>, <string-name><surname>Bošković</surname>, <given-names>B.</given-names></string-name>, <string-name><surname>Greiner</surname>, <given-names>S.</given-names></string-name>, <string-name><surname>Žumer</surname>, <given-names>V.</given-names></string-name>, <string-name><surname>Sepesy Maučec</surname>, <given-names>M.</given-names></string-name> (2006b). Performance comparison of self-adaptive and adaptive differential evolution algorithms. <italic>Soft Computing – A Fusion of Foundations, Methodologies and Applications</italic>, 617–629.</mixed-citation>
</ref>
<ref id="j_info1236_ref_008">
<mixed-citation publication-type="chapter"><string-name><surname>Bungum</surname>, <given-names>L.</given-names></string-name>, <string-name><surname>Gambäck</surname>, <given-names>B.</given-names></string-name> (<year>2010</year>). <chapter-title>Evolutionary algorithms in NLP</chapter-title>. In: <source>Norwegian Artificial Intelligence Symposium</source>, pp. <fpage>7</fpage>–<lpage>18</lpage>.</mixed-citation>
</ref>
<ref id="j_info1236_ref_009">
<mixed-citation publication-type="other"><string-name><surname>Callison-Burch</surname>, <given-names>C.</given-names></string-name>, <string-name><surname>Osborne</surname>, <given-names>M.</given-names></string-name>, <string-name><surname>Koehn</surname>, <given-names>P.</given-names></string-name> (2006). Re-evaluating the role of BLEU in machine translation research. <italic>EACL</italic>, 249–256.</mixed-citation>
</ref>
<ref id="j_info1236_ref_010">
<mixed-citation publication-type="chapter"><string-name><surname>Cherry</surname>, <given-names>C.</given-names></string-name>, <string-name><surname>Foster</surname>, <given-names>G.</given-names></string-name> (<year>2012</year>). <chapter-title>Batch tuning strategies for statistical machine translation</chapter-title>. In: <source>NAACL</source>.</mixed-citation>
</ref>
<ref id="j_info1236_ref_011">
<mixed-citation publication-type="chapter"><string-name><surname>Chiang</surname>, <given-names>D.</given-names></string-name>, <string-name><surname>Marton</surname>, <given-names>Y.</given-names></string-name>, <string-name><surname>Resnik</surname>, <given-names>P.</given-names></string-name> (<year>2008</year>). <chapter-title>Online large-margin training of syntactic and structural translation features</chapter-title>. In: <source>EMNLP</source>, pp. <fpage>224</fpage>–<lpage>233</lpage>.</mixed-citation>
</ref>
<ref id="j_info1236_ref_012">
<mixed-citation publication-type="chapter"><string-name><surname>Chiang</surname>, <given-names>D.</given-names></string-name>, <string-name><surname>Knight</surname>, <given-names>K.</given-names></string-name>, <string-name><surname>Wang</surname>, <given-names>W.</given-names></string-name> (<year>2009</year>). <chapter-title>11,001 new features for statistical machine translation</chapter-title>. In: <source>HLT–NAACL</source>, pp. <fpage>218</fpage>–<lpage>226</lpage>.</mixed-citation>
</ref>
<ref id="j_info1236_ref_013">
<mixed-citation publication-type="chapter"><string-name><surname>Clark</surname>, <given-names>J.</given-names></string-name>, <string-name><surname>Dyer</surname>, <given-names>C.</given-names></string-name>, <string-name><surname>Lavie</surname>, <given-names>A.</given-names></string-name>, <string-name><surname>Smith</surname>, <given-names>N.</given-names></string-name> (<year>2011</year>). <chapter-title>Better hypothesis testing for statistical machine translation: controlling for optimizer instability</chapter-title>. In: <source>Proceedings of the Association for Computational Lingustics</source>.</mixed-citation>
</ref>
<ref id="j_info1236_ref_014">
<mixed-citation publication-type="chapter"><string-name><surname>Das</surname>, <given-names>S.</given-names></string-name>, <string-name><surname>Suganthan</surname>, <given-names>P.N.</given-names></string-name> (<year>2011</year>). <chapter-title>Differential evolution: a survey of the state-of-the-art</chapter-title>. In: <source>IEEE Transactions on Evolutionary Computation</source>, pp. <fpage>27</fpage>–<lpage>54</lpage>.</mixed-citation>
</ref>
<ref id="j_info1236_ref_015">
<mixed-citation publication-type="other"><string-name><surname>Das</surname>, <given-names>S.</given-names></string-name>, <string-name><surname>Maity</surname>, <given-names>S.</given-names></string-name>, <string-name><surname>Qu</surname>, <given-names>B.-Y.</given-names></string-name>, <string-name><surname>Suganthan</surname>, <given-names>P.N.</given-names></string-name> (2011). Real-parameter evolutionary multimodal optimization – a survey of the state-of-the-art. <italic>Swarm and Evolutionary Computation</italic>, 71–88.</mixed-citation>
</ref>
<ref id="j_info1236_ref_016">
<mixed-citation publication-type="journal"><string-name><surname>Das</surname>, <given-names>S.</given-names></string-name>, <string-name><surname>Mullick</surname>, <given-names>S.S.</given-names></string-name>, <string-name><surname>Suganthan</surname>, <given-names>P.N.</given-names></string-name> (<year>2016</year>). <article-title>Recent advances in differential evolution – an updated survey</article-title>. <source>Swarm and Evolutionary Computation</source>, <volume>27</volume>, <fpage>1</fpage>–<lpage>30</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1016/j.swevo.2016.01.004" xlink:type="simple">https://doi.org/10.1016/j.swevo.2016.01.004</ext-link>.</mixed-citation>
</ref>
<ref id="j_info1236_ref_017">
<mixed-citation publication-type="journal"><string-name><surname>Dorr</surname>, <given-names>B.J.</given-names></string-name>, <string-name><surname>Jordan</surname>, <given-names>P.W.</given-names></string-name>, <string-name><surname>Benoit</surname>, <given-names>J.W.</given-names></string-name> (<year>1999</year>). <article-title>A survey of current paradigms in machine translation</article-title>. <source>Advances in Computers</source>, <volume>49</volume>, <fpage>1</fpage>–<lpage>68</lpage>.</mixed-citation>
</ref>
<ref id="j_info1236_ref_018">
<mixed-citation publication-type="chapter"><string-name><surname>Du Bois</surname>, <given-names>J.W.</given-names></string-name>, <string-name><surname>Chafe</surname>, <given-names>W.L.</given-names></string-name>, <string-name><surname>Meyer</surname>, <given-names>C.</given-names></string-name>, <string-name><surname>Thompson</surname>, <given-names>S.A.</given-names></string-name>, <string-name><surname>Englebretson</surname>, <given-names>R.</given-names></string-name>, <string-name><surname>Martey</surname>, <given-names>N.</given-names></string-name> (<year>2005</year>). <chapter-title>Santa Barbara corpus of spoken American English</chapter-title>. In: <source>Philadelphia: Linguistic Data Consortium</source>.</mixed-citation>
</ref>
<ref id="j_info1236_ref_019">
<mixed-citation publication-type="chapter"><string-name><surname>Dugonik</surname>, <given-names>J.</given-names></string-name>, <string-name><surname>Bošković</surname>, <given-names>B.</given-names></string-name>, <string-name><surname>Sepesy Maučec</surname>, <given-names>M.</given-names></string-name>, <string-name><surname>Brest</surname>, <given-names>J.</given-names></string-name> (<year>2014</year>). <chapter-title>The usage of differential evolution in a statistical machine translation</chapter-title>. In: <source>2014 IEEE Symposium on Differential Evolution, SDE 2014</source>, <conf-loc>Orlando, FL, USA, December 9–12</conf-loc>, <conf-date>2014</conf-date>, pp. <fpage>89</fpage>–<lpage>96</lpage>.</mixed-citation>
</ref>
<ref id="j_info1236_ref_020">
<mixed-citation publication-type="chapter"><string-name><surname>Federico</surname>, <given-names>M.</given-names></string-name>, <string-name><surname>Bertoldi</surname>, <given-names>N.</given-names></string-name>, <string-name><surname>Cettolo</surname>, <given-names>M.</given-names></string-name> (<year>2008</year>). <chapter-title>IRSTLM: an open source toolkit for handling large scale language models</chapter-title>. In: <source>INTERSPEECH 2008, 9th Annual Conference of the International Speech Communication Association</source>, pp. <fpage>1618</fpage>–<lpage>1621</lpage>.</mixed-citation>
</ref>
<ref id="j_info1236_ref_021">
<mixed-citation publication-type="other"><string-name><surname>Glotić</surname>, <given-names>A.</given-names></string-name>, <string-name><surname>Zamuda</surname>, <given-names>A.</given-names></string-name> (2015). Short-term combined economic and emission hydrothermal optimization by surrogate differential evolution. <italic>Applied Energy</italic>, 42–56.</mixed-citation>
</ref>
<ref id="j_info1236_ref_022">
<mixed-citation publication-type="other"><string-name><surname>Hasler</surname>, <given-names>E.</given-names></string-name>, <string-name><surname>Haddow</surname>, <given-names>B.</given-names></string-name>, <string-name><surname>Koehn</surname>, <given-names>P.</given-names></string-name> (2011). Margin infused relaxed algorithm for moses. <italic>The Prague Bulletin of Mathematical Linguistics</italic>, 69–78.</mixed-citation>
</ref>
<ref id="j_info1236_ref_023">
<mixed-citation publication-type="chapter"><string-name><surname>Hopkins</surname>, <given-names>M.</given-names></string-name>, <string-name><surname>May</surname>, <given-names>J.</given-names></string-name> (<year>2011</year>). <chapter-title>Tuning as ranking</chapter-title>. In: <source>EMNLP</source>, pp. <fpage>1352</fpage>–<lpage>1362</lpage>.</mixed-citation>
</ref>
<ref id="j_info1236_ref_024">
<mixed-citation publication-type="other"><string-name><surname>Kasparaitis</surname>, <given-names>P.</given-names></string-name>, <string-name><surname>Anbinderis</surname>, <given-names>T.</given-names></string-name> (2014). Building text corpus for unit selection synthesis. <italic>Informatica</italic>, 551–562.</mixed-citation>
</ref>
<ref id="j_info1236_ref_025">
<mixed-citation publication-type="chapter"><string-name><surname>Koehn</surname>, <given-names>P.</given-names></string-name> (<year>2004</year>). <chapter-title>Statistical significance tests for machine translation evaluation</chapter-title>. In: <source>Proceedings of EMNLP 2004</source>, pp. <fpage>388</fpage>–<lpage>395</lpage>.</mixed-citation>
</ref>
<ref id="j_info1236_ref_026">
<mixed-citation publication-type="chapter"><string-name><surname>Koehn</surname>, <given-names>P.</given-names></string-name> (<year>2005</year>). <chapter-title>Europarl: a parallel corpus for statistical machine translation</chapter-title>. In: <source>MT Summit 2005</source>.</mixed-citation>
</ref>
<ref id="j_info1236_ref_027">
<mixed-citation publication-type="chapter"><string-name><surname>Koehn</surname>, <given-names>P.</given-names></string-name>, <string-name><surname>Birch</surname>, <given-names>A.</given-names></string-name>, <string-name><surname>Steinberger</surname>, <given-names>R.</given-names></string-name> (<year>2009</year>). <chapter-title>462 machine translation systems for europe</chapter-title>. In: <source>MT Summit XII</source>.</mixed-citation>
</ref>
<ref id="j_info1236_ref_028">
<mixed-citation publication-type="chapter"><string-name><surname>Koehn</surname>, <given-names>P.</given-names></string-name>, <string-name><surname>Hoang</surname>, <given-names>H.</given-names></string-name>, <string-name><surname>Birch</surname>, <given-names>A.</given-names></string-name>, <string-name><surname>Callison-Burch</surname>, <given-names>C.</given-names></string-name>, <string-name><surname>Federico</surname>, <given-names>M.</given-names></string-name>, <string-name><surname>Bertoldi</surname>, <given-names>N.</given-names></string-name>, <string-name><surname>Cowan</surname>, <given-names>B.</given-names></string-name>, <string-name><surname>Shen</surname>, <given-names>W.</given-names></string-name>, <string-name><surname>Moran</surname>, <given-names>C.</given-names></string-name>, <string-name><surname>Zens</surname>, <given-names>R.</given-names></string-name>, <string-name><surname>Dyer</surname>, <given-names>C.J.</given-names></string-name>, <string-name><surname>Bojar</surname>, <given-names>O.</given-names></string-name>, <string-name><surname>Constantin</surname>, <given-names>A.</given-names></string-name>, <string-name><surname>Herbst</surname>, <given-names>E.</given-names></string-name> (<year>2007</year>). <chapter-title>Moses: Open source toolkit for statistical machine translation</chapter-title>. In: <source>ACL Demo and Poster Session</source>.</mixed-citation>
</ref>
<ref id="j_info1236_ref_029">
<mixed-citation publication-type="journal"><string-name><surname>Lopez</surname>, <given-names>A.</given-names></string-name> (<year>2008</year>). <article-title>Statistical machine translation</article-title>. <source>ACM Computing Surveys</source>, <volume>40</volume>(<issue>3</issue>), <fpage>1</fpage>–<lpage>49</lpage>.</mixed-citation>
</ref>
<ref id="j_info1236_ref_030">
<mixed-citation publication-type="chapter"><string-name><surname>Mlakar</surname>, <given-names>U.</given-names></string-name>, <string-name><surname>Brest</surname>, <given-names>J.</given-names></string-name>, <string-name><surname>Zamuda</surname>, <given-names>A.</given-names></string-name> (<year>2014</year>). <chapter-title>Differential evolution for self-adaptive triangular brushstrokes</chapter-title>. In: <source>BIOMA Workshop</source>, pp. <fpage>105</fpage>–<lpage>116</lpage>.</mixed-citation>
</ref>
<ref id="j_info1236_ref_031">
<mixed-citation publication-type="other"><string-name><surname>Neri</surname>, <given-names>F.</given-names></string-name>, <string-name><surname>Tirronen</surname>, <given-names>V.</given-names></string-name> (2010). Recent advances in differential evolution: a survey and experimental analysis. <italic>Artificial Intelligence Review</italic>, 61–106.</mixed-citation>
</ref>
<ref id="j_info1236_ref_032">
<mixed-citation publication-type="chapter"><string-name><surname>Och</surname>, <given-names>F.J.</given-names></string-name> (<year>2003</year>). <chapter-title>Minimum error rate training for statistical machine translation</chapter-title>. In: <source>ACL</source>, pp. <fpage>160</fpage>–<lpage>167</lpage>.</mixed-citation>
</ref>
<ref id="j_info1236_ref_033">
<mixed-citation publication-type="chapter"><string-name><surname>Och</surname>, <given-names>F.J.</given-names></string-name>, <string-name><surname>Ney</surname>, <given-names>H.</given-names></string-name> (<year>2000</year>). <chapter-title>Improved statistical alignment models</chapter-title>. In: <source>ACL</source>, pp. <fpage>440</fpage>–<lpage>447</lpage>.</mixed-citation>
</ref>
<ref id="j_info1236_ref_034">
<mixed-citation publication-type="chapter"><string-name><surname>Och</surname>, <given-names>F.J.</given-names></string-name>, <string-name><surname>Ney</surname>, <given-names>H.</given-names></string-name> (<year>2002</year>). <chapter-title>Discriminative training and maximum entropy models for statistical machine translation</chapter-title>. In: <source>ACL</source>, pp. <fpage>295</fpage>–<lpage>302</lpage>.</mixed-citation>
</ref>
<ref id="j_info1236_ref_035">
<mixed-citation publication-type="chapter"><string-name><surname>Papineni</surname>, <given-names>K.</given-names></string-name>, <string-name><surname>Roukos</surname>, <given-names>S.</given-names></string-name>, <string-name><surname>Ward</surname>, <given-names>T.</given-names></string-name>, <string-name><surname>Zhu</surname>, <given-names>W.-J.</given-names></string-name> (<year>2002</year>). <chapter-title>BLEU: a method for automatic evaluation of machine translation</chapter-title>. In: <source>ACL</source>, pp. <fpage>311</fpage>–<lpage>318</lpage>.</mixed-citation>
</ref>
<ref id="j_info1236_ref_036">
<mixed-citation publication-type="book"><string-name><surname>Price</surname>, <given-names>K.</given-names></string-name>, <string-name><surname>Storn</surname>, <given-names>R.</given-names></string-name>, <string-name><surname>Lampinen</surname>, <given-names>J.</given-names></string-name> (<year>2005</year>). <source>Differential Evolution, A Practical Approach to Global Optimization</source>. <publisher-name>Springer</publisher-name>.</mixed-citation>
</ref>
<ref id="j_info1236_ref_037">
<mixed-citation publication-type="chapter"><string-name><surname>Saon</surname>, <given-names>G.</given-names></string-name>, <string-name><surname>Ramabhadran</surname>, <given-names>B.</given-names></string-name>, <string-name><surname>Zweig</surname>, <given-names>G.</given-names></string-name> (<year>2006</year>). <chapter-title>On the effect of word error rate on automated quality monitoring</chapter-title>. In: <source>Proceedings of Spoken Language Technology Workshop</source>, pp. <fpage>106</fpage>–<lpage>109</lpage>.</mixed-citation>
</ref>
<ref id="j_info1236_ref_038">
<mixed-citation publication-type="journal"><string-name><surname>Sepesy Maučec</surname>, <given-names>M.</given-names></string-name>, <string-name><surname>Brest</surname>, <given-names>J.</given-names></string-name> (<year>2010</year>). <article-title>Reduction of morpho-syntactic features in statistical machine translation of highly inflective language</article-title>. <source>Informatica</source>, <fpage>95</fpage>–<lpage>116</lpage>.</mixed-citation>
</ref>
<ref id="j_info1236_ref_039">
<mixed-citation publication-type="chapter"><string-name><surname>Snover</surname>, <given-names>M.</given-names></string-name>, <string-name><surname>Dorr</surname>, <given-names>B.</given-names></string-name>, <string-name><surname>Schwartz</surname>, <given-names>R.</given-names></string-name>, <string-name><surname>Micciulla</surname>, <given-names>L.</given-names></string-name>, <string-name><surname>Makhoul</surname>, <given-names>J.</given-names></string-name> (<year>2006</year>). <chapter-title>A study of translation edit rate with targeted human annotation</chapter-title>. In: <source>Proceedings of Association for Machine Translation in the Americas</source>.</mixed-citation>
</ref>
<ref id="j_info1236_ref_040">
<mixed-citation publication-type="other"><string-name><surname>Specia</surname>, <given-names>L.</given-names></string-name> (2010). <italic>Fundamental and New Approaches to Statistical Machine Translation.</italic></mixed-citation>
</ref>
<ref id="j_info1236_ref_041">
<mixed-citation publication-type="chapter"><string-name><surname>Steinberger</surname>, <given-names>R.</given-names></string-name>, <string-name><surname>Pouliquen</surname>, <given-names>B.</given-names></string-name>, <string-name><surname>Widiger</surname>, <given-names>A.</given-names></string-name>, <string-name><surname>Ignat</surname>, <given-names>C.</given-names></string-name>, <string-name><surname>Erjavec</surname>, <given-names>T.</given-names></string-name>, <string-name><surname>Tufis</surname>, <given-names>D.</given-names></string-name>, <string-name><surname>Varga</surname>, <given-names>D.</given-names></string-name> (<year>2006</year>). <chapter-title>The JRC-acquis: a multilingual aligned parallel corpus with 20+ languages</chapter-title>. In: <source>LREC</source>.</mixed-citation>
</ref>
<ref id="j_info1236_ref_042">
<mixed-citation publication-type="other"><string-name><surname>Storn</surname>, <given-names>R.</given-names></string-name>, <string-name><surname>Price</surname>, <given-names>K.</given-names></string-name> (1997). Differential evolution – a simple and efficient heuristic for global optimisation over continuous spaces. <italic>Journal of Global Optimization</italic>, 341–359.</mixed-citation>
</ref>
<ref id="j_info1236_ref_043">
<mixed-citation publication-type="chapter"><string-name><surname>Varga</surname>, <given-names>D.</given-names></string-name>, <string-name><surname>Nemeth</surname>, <given-names>L.</given-names></string-name>, <string-name><surname>Halacsy</surname>, <given-names>P.</given-names></string-name>, <string-name><surname>Kornai</surname>, <given-names>A.</given-names></string-name>, <string-name><surname>Tron</surname>, <given-names>V.</given-names></string-name>, <string-name><surname>Nagy</surname>, <given-names>V.</given-names></string-name> (<year>2005</year>). <chapter-title>Parallel corpora for medium density languages</chapter-title>. In: <source>Proceedings of the RANLP 2005</source>, pp. <fpage>590</fpage>–<lpage>596</lpage>.</mixed-citation>
</ref>
<ref id="j_info1236_ref_044">
<mixed-citation publication-type="chapter"><string-name><surname>Watanabe</surname>, <given-names>T.</given-names></string-name>, <string-name><surname>Suzuki</surname>, <given-names>J.</given-names></string-name>, <string-name><surname>Tsukada</surname>, <given-names>H.</given-names></string-name>, <string-name><surname>Isozaki</surname>, <given-names>H.</given-names></string-name> (<year>2007</year>). <chapter-title>Online large-margin training for statistical machine translation</chapter-title>. In: <source>EMNLP–CoNLL</source>, pp. <fpage>764</fpage>–<lpage>773</lpage>.</mixed-citation>
</ref>
<ref id="j_info1236_ref_045">
<mixed-citation publication-type="other"><string-name><surname>Zhang</surname>, <given-names>J.</given-names></string-name>, <string-name><surname>Sanderson</surname>, <given-names>A.C.</given-names></string-name> (2009). JADE: adaptive differential evolution with optional external archive. <italic>IEEE Transactions on Evolutionary Computation</italic>, 945–958.</mixed-citation>
</ref>
<ref id="j_info1236_ref_046">
<mixed-citation publication-type="other"><string-name><surname>Zhou</surname>, <given-names>A.</given-names></string-name>, <string-name><surname>Qu</surname>, <given-names>B.-Y.</given-names></string-name>, <string-name><surname>Li</surname>, <given-names>H.</given-names></string-name>, <string-name><surname>Zhao</surname>, <given-names>S.-Z.</given-names></string-name>, <string-name><surname>Suganthan</surname>, <given-names>P.N.</given-names></string-name>, <string-name><surname>Zhang</surname>, <given-names>Q.</given-names></string-name> (2011). Multiobjective evolutionary algorithms: a survey of the state of the art. <italic>Swarm and Evolutionary Computation</italic>, 32–49.</mixed-citation>
</ref>
</ref-list>
</back>
</article>