<?xml version="1.0" encoding="utf-8"?><!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.0 20120330//EN" "JATS-journalpublishing1.dtd"><article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="research-article">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">INFORMATICA</journal-id>
<journal-title-group><journal-title>Informatica</journal-title></journal-title-group>
<issn pub-type="epub">1822-8844</issn>
<issn pub-type="ppub">0868-4952</issn>
<issn-l>0868-4952</issn-l>
<publisher>
<publisher-name>Vilnius University</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="publisher-id">INFOR404</article-id>
<article-id pub-id-type="doi">10.15388/20-INFOR404</article-id>
<article-categories><subj-group subj-group-type="heading">
<subject>Research Article</subject></subj-group></article-categories>
<title-group>
<article-title>Comparison of Classification Algorithms for Detection of Phishing Websites</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name><surname>Vaitkevicius</surname><given-names>Paulius</given-names></name><email xlink:href="paulius.vaitkevicius@mif.vu.lt">paulius.vaitkevicius@mif.vu.lt</email><xref ref-type="aff" rid="j_infor404_aff_001"/><xref ref-type="corresp" rid="cor1"/><bio>
<p><bold>P. Vaitkevicius</bold> is a doctoral student at Vilnius University, Institute of Data Science and Digital Technologies. His research interests include machine learning, artificial intelligence, cybersecurity, and natural language processing.</p></bio>
</contrib>
<contrib contrib-type="author">
<name><surname>Marcinkevicius</surname><given-names>Virginijus</given-names></name><xref ref-type="aff" rid="j_infor404_aff_001"/><bio>
<p><bold>V. Marcinkevicius</bold> received a doctoral degree (PhD) in computer science from Vytautas Magnus University in 2010. He has worked at Vilnius University, Institute of Data Science and Digital Technologies, since 2001, where he is currently a senior researcher and the head of the intelligent technologies research group. His research interests include machine learning, artificial intelligence, cybersecurity, and natural language processing. He is the author of more than 70 scientific publications. He is a member of the Lithuanian Computer Society and the Lithuanian Mathematical Society.</p></bio>
</contrib>
<aff id="j_infor404_aff_001"><institution>Vilnius University</institution>, Institute of Data Science and Digital Technologies, Akademijos str. 4, LT-08412 Vilnius, <country>Lithuania</country></aff>
</contrib-group>
<author-notes>
<corresp id="cor1"><label>*</label>Corresponding author. </corresp>
</author-notes>
<pub-date pub-type="ppub"><year>2020</year></pub-date>
<pub-date pub-type="epub"><day>23</day><month>3</month><year>2020</year></pub-date>
<volume>31</volume><issue>1</issue><fpage>143</fpage><lpage>160</lpage>
<history>
<date date-type="received"><month>9</month><year>2019</year></date>
<date date-type="accepted"><month>1</month><year>2020</year></date>
</history>
<permissions><copyright-statement>© 2020 Vilnius University</copyright-statement><copyright-year>2020</copyright-year><license license-type="open-access" xlink:href="http://creativecommons.org/licenses/by/4.0/"><license-p>Open access article under the <ext-link ext-link-type="uri" xlink:href="http://creativecommons.org/licenses/by/4.0/">CC BY</ext-link> license.</license-p></license></permissions>
<abstract>
<p>Phishing activities remain a persistent security threat, with global losses exceeding 2.7 billion USD in 2018, according to the FBI’s Internet Crime Complaint Center. Several generations of phishing website detection methods have been described in the literature. The oldest methods rely on manual blacklisting of known phishing websites’ URLs in centralized databases, but they cannot detect newly launched phishing websites. More recent studies have treated phishing website detection as a supervised machine learning problem on phishing datasets built from features extracted from phishing websites’ URLs. These studies have shown some classification algorithms performing better than others on differently designed datasets, but they have not identified the best classification algorithm for the phishing website detection problem in general. The purpose of this research is to compare classic supervised machine learning algorithms on all publicly available phishing datasets with predefined features and to identify the best-performing algorithm for phishing website detection, regardless of a specific dataset design. Eight widely used classification algorithms were configured in Python using the scikit-learn library and tested for classification accuracy on all publicly available phishing datasets. The classification algorithms were then ranked by accuracy on the different datasets using three ranking techniques, and the results were tested for statistically significant differences using Welch’s t-test. The comparison results presented in this paper show ensembles and neural networks outperforming other classical algorithms.</p>
</abstract>
<kwd-group>
<label>Key words</label>
<kwd>phishing detection</kwd>
<kwd>classification algorithms</kwd>
<kwd>phishing datasets</kwd>
</kwd-group>
</article-meta>
</front>
<body>
<sec id="j_infor404_s_001">
<label>1</label>
<title>Introduction</title>
<p>Phishing is a form of cybercrime that employs both social engineering and technical trickery to steal sensitive information, such as digital identity data, credit card data, login credentials, and other personal data, from unsuspecting users by masquerading as a trustworthy entity. For example, the victim receives an e-mail from an adversary with a threatening message, such as a possible bank or social media account termination or a fake alert about an illegal transaction (Lin Tan <italic>et al.</italic>, <xref ref-type="bibr" rid="j_infor404_ref_015">2016</xref>), directing the victim to a fraudulent website that mimics a legitimate one. The adversary can then use any information the victim enters on the phishing website to steal their identity or money (Whittaker <italic>et al.</italic>, <xref ref-type="bibr" rid="j_infor404_ref_036">2010</xref>).</p>
<p>Although many anti-phishing solutions exist, phishers continue to lure more and more victims. In 2018, the Anti-Phishing Working Group (APWG) reported as many as 785,920 unique phishing websites detected, a 69.5% increase over the five years of monitoring since 2014, when 463,750 unique phishing websites were detected (Anti-Phishing Working Group, <xref ref-type="bibr" rid="j_infor404_ref_002">2018</xref>). Global losses from phishing activities exceeded 2.7 billion USD in 2018, according to the FBI’s Internet Crime Complaint Center (Internet Crime Complaint Center, <xref ref-type="bibr" rid="j_infor404_ref_010">2019</xref>).</p>
<p>Deceptive phishing attacks are still so successful because, in essence, they are “human-to-human” assaults performed by professional adversaries who (i) are financially motivated, (ii) exploit the lack of awareness and computer illiteracy of ordinary Internet users (Adebowale <italic>et al.</italic>, <xref ref-type="bibr" rid="j_infor404_ref_001">2019</xref>), and (iii) learn from previous experience to improve their future attacks and lure new victims more successfully. Relying solely on their own efforts, ordinary Internet users therefore cannot keep up with new trends in phishing attacks or learn to distinguish a legitimate website’s URL from a malicious one.</p>
<p>To protect Internet users from such criminal assaults, automated techniques for recognizing phishing websites have been developed. The oldest approach involved manual blacklisting of known phishing websites’ URLs in centralized databases, which Internet browsers later used to alert users about possible threats. The drawback of blacklisting is that these databases do not include newly launched phishing websites and therefore do not protect against “zero-hour” attacks, as most phishing URLs are inserted into centralized databases only 12 hours after the first phishing attack (Jain and Gupta, <xref ref-type="bibr" rid="j_infor404_ref_011">2018a</xref>). More recent studies have approached phishing website detection as a supervised machine learning problem. Many authors have conducted experiments using various classification methods and different phishing datasets with predefined features (Chiew <italic>et al.</italic>, <xref ref-type="bibr" rid="j_infor404_ref_006">2019</xref>; Marchal <italic>et al.</italic>, <xref ref-type="bibr" rid="j_infor404_ref_019">2016</xref>; Sahoo <italic>et al.</italic>, <xref ref-type="bibr" rid="j_infor404_ref_022">2017</xref>).</p>
<p>The following open questions motivate our research:</p>
<list>
<list-item id="j_infor404_li_001">
<label>1.</label>
<p>State-of-the-art methods of phishing website detection report classification accuracy (the classification accuracy measure is described in Section <xref rid="j_infor404_s_017">3.4.1</xref>) above 99.50% and use different classification algorithms: ensembles (Gradient Boosting) (Marchal <italic>et al.</italic>, <xref ref-type="bibr" rid="j_infor404_ref_017">2017</xref>), statistical models (Logistic Regression) (Whittaker <italic>et al.</italic>, <xref ref-type="bibr" rid="j_infor404_ref_036">2010</xref>), probabilistic algorithms (Bayesian Network) (Xiang <italic>et al.</italic>, <xref ref-type="bibr" rid="j_infor404_ref_038">2011</xref>), and classification trees (C4.5) (Cui <italic>et al.</italic>, <xref ref-type="bibr" rid="j_infor404_ref_007">2018</xref>). There is no common agreement on which classification algorithm is the most accurate in phishing website prediction on datasets with predefined features (Chiew <italic>et al.</italic>, <xref ref-type="bibr" rid="j_infor404_ref_006">2019</xref>).</p>
</list-item>
<list-item id="j_infor404_li_002">
<label>2.</label>
<p>State-of-the-art methods report these high classification accuracies on highly imbalanced datasets with distinct minority and majority classes. Classification accuracy has low construct validity on datasets with disproportionate class distributions, because it is dominated by performance on the majority class. Doubts therefore remain as to whether these results stem from a dataset-dependent method design (Chiew <italic>et al.</italic>, <xref ref-type="bibr" rid="j_infor404_ref_006">2019</xref>) or whether the algorithms used in state-of-the-art research are genuinely superior to others.</p>
</list-item>
<list-item id="j_infor404_li_003">
<label>3.</label>
<p>To the best of our knowledge, no studies comparing the performance of classic classification algorithms on all publicly available phishing datasets with predefined features have been conducted to answer the questions above.</p>
</list-item>
</list>
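<p>The construct-validity concern in item 2 can be illustrated with a small sketch. Assuming a hypothetical test set of 990 phishing and 10 legitimate websites (phishing treated as the positive class), the Python code below compares two hypothetical classifiers with identical accuracy. Balanced accuracy, the mean of the per-class recalls, separates a classifier that never recognizes a legitimate site from one that recognizes half of them. The counts are invented for illustration and are not results from any cited study.</p>

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """Binary confusion matrix with phishing as the positive class."""
    tp: int  # phishing sites correctly flagged
    fn: int  # phishing sites missed
    tn: int  # legitimate sites correctly passed
    fp: int  # legitimate sites wrongly flagged

    def accuracy(self) -> float:
        return (self.tp + self.tn) / (self.tp + self.fn + self.tn + self.fp)

    def balanced_accuracy(self) -> float:
        # Average of per-class recall, so each class counts equally
        # no matter how many samples it has.
        tpr = self.tp / (self.tp + self.fn)  # recall on phishing sites
        tnr = self.tn / (self.tn + self.fp)  # recall on legitimate sites
        return (tpr + tnr) / 2


# Hypothetical test set: 990 phishing and 10 legitimate websites.
always_phishing = Confusion(tp=990, fn=0, tn=0, fp=10)  # labels everything phishing
mixed_errors = Confusion(tp=985, fn=5, tn=5, fp=5)      # makes errors on both classes

for name, cm in [("always phishing", always_phishing), ("mixed errors", mixed_errors)]:
    print(f"{name:16} accuracy={cm.accuracy():.3f} "
          f"balanced={cm.balanced_accuracy():.3f}")
```

<p>Both classifiers reach an accuracy of 0.99, yet the first one is useless for legitimate websites. For binary labels, scikit-learn’s <monospace>balanced_accuracy_score</monospace> computes the same quantity from label vectors; the hand-written version only makes the per-class averaging explicit.</p>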
<p>Therefore, the objective of this experimental research is to answer the research question: <italic>which classical classification algorithm is best for solving the phishing website detection problem on all publicly available datasets with predefined features?</italic></p>
<p>In this paper, we compare eight classic supervised machine learning algorithms of different types (for more details see Section <xref rid="j_infor404_s_011">3.2</xref>) on three publicly available phishing datasets with predefined features that the scientific community uses in experiments with classification algorithms (for more details on datasets see Section <xref rid="j_infor404_s_012">3.3</xref>).</p>
<p>We designed an experiment using the following algorithms (SVM is tested with three different kernels, so the eight algorithms yield ten configurations): 
<list>
<list-item id="j_infor404_li_004">
<label>1.</label>
<p>AdaBoost (Wang, <xref ref-type="bibr" rid="j_infor404_ref_035">2012</xref>),</p>
</list-item>
<list-item id="j_infor404_li_005">
<label>2.</label>
<p>Classification and Regression Tree (CART) (Breiman <italic>et al.</italic>, <xref ref-type="bibr" rid="j_infor404_ref_004">1984</xref>),</p>
</list-item>
<list-item id="j_infor404_li_006">
<label>3.</label>
<p>Gradient Tree Boosting (Friedman, <xref ref-type="bibr" rid="j_infor404_ref_009">2002</xref>),</p>
</list-item>
<list-item id="j_infor404_li_007">
<label>4.</label>
<p>k-Nearest Neighbours (Dudani, <xref ref-type="bibr" rid="j_infor404_ref_008">1976</xref>),</p>
</list-item>
<list-item id="j_infor404_li_008">
<label>5.</label>
<p>Multilayer Perceptron (MLP) with backpropagation (Widrow and Lehr, <xref ref-type="bibr" rid="j_infor404_ref_037">1990</xref>),</p>
</list-item>
<list-item id="j_infor404_li_009">
<label>6.</label>
<p>Naïve–Bayes (Lewis, <xref ref-type="bibr" rid="j_infor404_ref_014">1998</xref>),</p>
</list-item>
<list-item id="j_infor404_li_010">
<label>7.</label>
<p>Random Forest (Breiman, <xref ref-type="bibr" rid="j_infor404_ref_003">2001</xref>),</p>
</list-item>
<list-item id="j_infor404_li_011">
<label>8.</label>
<p>Support-Vector Machine (SVM) with linear kernel (Scholkopf and Smola, <xref ref-type="bibr" rid="j_infor404_ref_024">2001</xref>),</p>
</list-item>
<list-item id="j_infor404_li_012">
<label>9.</label>
<p>Support-Vector Machine with 1st degree polynomial kernel (Scholkopf and Smola, <xref ref-type="bibr" rid="j_infor404_ref_024">2001</xref>),</p>
</list-item>
<list-item id="j_infor404_li_013">
<label>10.</label>
<p>Support-Vector Machine with 2nd degree polynomial kernel (Scholkopf and Smola, <xref ref-type="bibr" rid="j_infor404_ref_024">2001</xref>).</p>
</list-item>
</list> 
We trained and tested all of these algorithms on each of the three datasets. We then ranked the algorithms by classification accuracy on the different datasets using three ranking techniques, testing the results for statistically significant differences with Welch’s t-test.</p>
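<p>Welch’s t-test compares the means of two samples without assuming that their variances are equal. As a minimal sketch, assuming that the samples being compared are per-run (for example, per-fold) accuracies of two classifiers on the same dataset, the statistic and its Welch–Satterthwaite degrees of freedom can be computed with the Python standard library alone. The function name and the accuracy values in the usage example are hypothetical and serve only as an illustration. In practice, <monospace>scipy.stats.ttest_ind(a, b, equal_var=False)</monospace> returns the same statistic together with a two-sided p-value.</p>

```python
import math
import statistics


def welch_t_test(a: list[float], b: list[float]) -> tuple[float, float]:
    """Return Welch's t statistic and Welch-Satterthwaite degrees of freedom.

    Unlike Student's t-test, Welch's version does not assume equal variances.
    Each sample needs at least two values (statistics.variance requires it).
    """
    va = statistics.variance(a) / len(a)  # squared standard error of mean(a)
    vb = statistics.variance(b) / len(b)  # squared standard error of mean(b)
    se2 = va + vb
    if se2 == 0:
        raise ValueError("both samples have zero variance")
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(se2)
    df = se2 ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Hypothetical 10-fold accuracies of two classifiers on one dataset.
forest = [0.971, 0.968, 0.974, 0.970, 0.966, 0.973, 0.969, 0.972, 0.975, 0.967]
bayes = [0.929, 0.935, 0.921, 0.931, 0.926, 0.938, 0.924, 0.930, 0.927, 0.933]
t, df = welch_t_test(forest, bayes)
print(f"t = {t:.2f}, df = {df:.1f}")
```

<p>The degrees of freedom are generally non-integer and shrink toward the size of the noisier sample, which is what makes the test robust when two classifiers differ in run-to-run variability.</p>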
<p>The rest of the paper is organized as follows. In Section <xref rid="j_infor404_s_002">2</xref>, we review related work. In Section <xref rid="j_infor404_s_006">3</xref>, we describe our research methodology. In Section <xref rid="j_infor404_s_022">4</xref>, we report our experimental results. We conclude the paper in Section <xref rid="j_infor404_s_023">5</xref>.</p>
</sec>
<sec id="j_infor404_s_002">
<label>2</label>
<title>Related Works</title>
<p>The scientific community has devoted considerable effort to the problem of phishing website detection. In general, approaches to this problem can be grouped into three categories: (i) blacklisting and heuristic-based approaches (more in Section <xref rid="j_infor404_s_003">2.1</xref>), (ii) supervised machine learning approaches (more in Section <xref rid="j_infor404_s_004">2.2</xref>), and (iii) deep learning approaches (more in Section <xref rid="j_infor404_s_005">2.3</xref>) (Sahoo <italic>et al.</italic>, <xref ref-type="bibr" rid="j_infor404_ref_022">2017</xref>).</p>
<sec id="j_infor404_s_003">
<label>2.1</label>
<title>Review of Blacklisting and Heuristics-Based Research</title>
<p>Although there are initiatives to use centralized phishing websites’ URLs blacklisting solutions (e.g., PhishTank, <xref ref-type="fn" rid="j_infor404_fn_001">1</xref> <fn id="j_infor404_fn_001"><label><sup>1</sup></label>
<p><uri>https://www.phishtank.com/</uri>.</p> </fn> Google Safe Browsing API <xref ref-type="fn" rid="j_infor404_fn_002">2</xref> <fn id="j_infor404_fn_002"><label><sup>2</sup></label>
<p><uri>https://developers.google.com/safe-browsing/</uri>.</p> </fn>), this method has proven insufficient: detecting and reporting a malicious URL takes time, whereas phishing websites have a very short lifespan (from a few hours to a few days) (Verma and Das, <xref ref-type="bibr" rid="j_infor404_ref_033">2017</xref>). The scientific community has therefore turned to new methods for detecting phishing websites’ URLs.</p>
<p>Heuristic approaches improve on blacklisting: the signatures of common attacks are identified and blacklisted for future use by Intrusion Detection Systems (Seifert <italic>et al.</italic>, <xref ref-type="bibr" rid="j_infor404_ref_025">2008</xref>). Heuristic methods supersede conventional blacklisting because they generalize better and can detect threats in new URLs, but they still cannot generalize to all types of new threats (Verma and Das, <xref ref-type="bibr" rid="j_infor404_ref_033">2017</xref>).</p>
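<p>The difference between the two approaches can be sketched in a few lines of Python. The blacklist entry, the example URLs and the red-flag rules below are all hypothetical; the rules are loosely modelled on URL features commonly found in phishing datasets (an IP address as host, an “@” symbol, URL length, subdomain count, missing HTTPS) and are not the signatures of any particular Intrusion Detection System. An exact-match blacklist misses a URL that has never been reported, while even crude heuristics can still flag it.</p>

```python
from ipaddress import ip_address
from urllib.parse import urlsplit

# Hypothetical exact-match blacklist standing in for a centralized database
# such as PhishTank (real services expose their own APIs and formats).
BLACKLIST = {"http://paypa1-login.example/verify"}


def blacklisted(url: str) -> bool:
    """Exact-match lookup: only URLs that were already reported are caught."""
    return url in BLACKLIST


def host_is_ip(host: str) -> bool:
    try:
        ip_address(host)
        return True
    except ValueError:
        return False


def heuristic_score(url: str) -> int:
    """Count hypothetical red flags derived from common URL features."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    flags = [
        host_is_ip(host),         # raw IP address instead of a domain name
        "@" in url,               # text before '@' in the authority is ignored
        len(url) > 75,            # unusually long URL
        host.count(".") > 3,      # many nested subdomains
        parts.scheme != "https",  # no TLS
    ]
    return sum(flags)


new_url = "http://paypal.com@192.0.2.10/login"  # never reported anywhere
print(blacklisted(new_url))
print(heuristic_score(new_url))
```

<p>In <monospace>http://paypal.com@192.0.2.10/login</monospace>, the text before “@” is user information, so the browser actually connects to the IP address; the blacklist lookup returns <monospace>False</monospace>, while three of the five rules fire. Fixed rules of this kind, however, only capture attack patterns someone has already anticipated.</p>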
</sec>
<sec id="j_infor404_s_004">
<label>2.2</label>
<title>Review of Supervised Machine Learning Based Research</title>
<p>Over the last decade, most machine learning approaches to the phishing website detection problem have applied supervised learning methods to phishing datasets with predefined features. Table <xref rid="j_infor404_tab_001">1</xref> summarizes the results reported by other authors over the last ten years. For each approach, it lists the authors and publication year, the classifier used, the dataset composition (numbers of phishing and legitimate websites), and the achieved classification accuracy. Results are sorted by accuracy from highest to lowest.</p>
<p>From this review, we can make the following observations:</p>
<list>
<list-item id="j_infor404_li_014">
<label>•</label>
<p>The two best approaches reached an accuracy of 99.9%.</p>
</list-item>
<list-item id="j_infor404_li_015">
<label>•</label>
<p>The 15 best approaches scored above 99.0% accuracy.</p>
</list-item>
<list-item id="j_infor404_li_016">
<label>•</label>
<p>The most popular algorithms among researchers are Random Forest (8 papers), Naïve–Bayes (7 papers), SVM (7 papers), C4.5 (7 papers <xref ref-type="fn" rid="j_infor404_fn_003">3</xref> <fn id="j_infor404_fn_003"><label><sup>3</sup></label>
<p>Including J48, which is WEKA’s class for generating pruned or unpruned C4.5 decision trees (<uri>http://weka.sourceforge.net/doc.dev/weka/classifiers/trees/J48.html</uri>).</p> </fn>), and Logistic Regression (6 papers).</p>
</list-item>
<list-item id="j_infor404_li_017">
<label>•</label>
<p>The five best approaches scored 99.49% or higher and were implemented using different types of classifiers: neural networks, regression, decision trees, ensembles, and Bayesian networks. No classification method or type of method prevails among the top results.</p>
</list-item>
<list-item id="j_infor404_li_018">
<label>•</label>
<p>The five best approaches use imbalanced datasets (in four of them, one class outnumbers the other by more than five to one). Evaluating classifier performance by accuracy alone is therefore inadequate and does not indicate how these classifiers would perform on more balanced datasets.</p>
</list-item>
</list>
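<p>The imbalance behind the last observation can be quantified directly from the dataset compositions in Table <xref rid="j_infor404_tab_001">1</xref>. The sketch below re-uses only the numbers reported in Table 1 and computes, for each of the five best approaches, the ratio between the larger and the smaller class and the accuracy of a trivial classifier that always predicts the larger class. For the most imbalanced datasets, the margin between the reported accuracy and this majority-class baseline is only about one percentage point. The baselines refer to whole datasets, whereas the cited studies may have evaluated on different train/test splits.</p>

```python
# Five best approaches in Table 1:
# (reference, classifier, #phishing, #legitimate, reported accuracy in %).
TOP5 = [
    ("Marchal et al., 2017", "Gradient Boosting", 100_000, 1_000, 99.90),
    ("Whittaker et al., 2010", "Logistic Regression", 16_967, 1_499_109, 99.90),
    ("Cui et al., 2018", "C4.5", 24_520, 138_925, 99.78),
    ("Xiang et al., 2011", "Bayesian Network", 8_118, 4_780, 99.60),
    ("Zhao and Hoi, 2013", "Classic Perceptron", 990_000, 10_000, 99.49),
]


def majority_baseline(n_phish: int, n_legit: int) -> float:
    """Accuracy of a trivial classifier that always predicts the larger class."""
    return max(n_phish, n_legit) / (n_phish + n_legit)


def imbalance_ratio(n_phish: int, n_legit: int) -> float:
    """Number of majority-class samples per minority-class sample."""
    return max(n_phish, n_legit) / min(n_phish, n_legit)


for ref, clf, n_phish, n_legit, reported in TOP5:
    baseline = 100 * majority_baseline(n_phish, n_legit)
    print(f"{ref:24} {clf:20} ratio {imbalance_ratio(n_phish, n_legit):7.2f}:1  "
          f"baseline {baseline:6.2f}%  reported {reported:6.2f}%  "
          f"margin {reported - baseline:5.2f} pp")
```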
<table-wrap id="j_infor404_tab_001">
<label>Table 1</label>
<caption>
<p>Classification approaches to the solution of the phishing websites detection problem.</p>
</caption>
<table>
<thead>
<tr>
<td style="vertical-align: top; text-align: left; border-top: solid thin; border-bottom: solid thin">Reference</td>
<td style="vertical-align: top; text-align: left; border-top: solid thin; border-bottom: solid thin">Classifier</td>
<td style="vertical-align: top; text-align: left; border-top: solid thin; border-bottom: solid thin"># Phishing sites</td>
<td style="vertical-align: top; text-align: left; border-top: solid thin; border-bottom: solid thin"># Legitimate sites</td>
<td style="vertical-align: top; text-align: left; border-top: solid thin; border-bottom: solid thin">Accuracy</td>
</tr>
</thead>
<tbody>
<tr>
<td style="vertical-align: top; text-align: left">(Marchal <italic>et al.</italic>, <xref ref-type="bibr" rid="j_infor404_ref_017">2017</xref>)</td>
<td style="vertical-align: top; text-align: left">Gradient Boosting</td>
<td style="vertical-align: top; text-align: left">100,000</td>
<td style="vertical-align: top; text-align: left">1,000</td>
<td style="vertical-align: top; text-align: left">99.90%</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">(Whittaker <italic>et al.</italic>, <xref ref-type="bibr" rid="j_infor404_ref_036">2010</xref>)</td>
<td style="vertical-align: top; text-align: left">Logistic Regression</td>
<td style="vertical-align: top; text-align: left">16,967</td>
<td style="vertical-align: top; text-align: left">1,499,109</td>
<td style="vertical-align: top; text-align: left">99.90%</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">(Cui <italic>et al.</italic>, <xref ref-type="bibr" rid="j_infor404_ref_007">2018</xref>)</td>
<td style="vertical-align: top; text-align: left">C4.5 <xref ref-type="fn" rid="j_infor404_fn_004">4</xref></td>
<td style="vertical-align: top; text-align: left">24,520</td>
<td style="vertical-align: top; text-align: left">138,925</td>
<td style="vertical-align: top; text-align: left">99.78%</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">(Xiang <italic>et al.</italic>, <xref ref-type="bibr" rid="j_infor404_ref_038">2011</xref>)</td>
<td style="vertical-align: top; text-align: left">Bayesian Network</td>
<td style="vertical-align: top; text-align: left">8,118</td>
<td style="vertical-align: top; text-align: left">4,780</td>
<td style="vertical-align: top; text-align: left">99.60%</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">(Zhao and Hoi, <xref ref-type="bibr" rid="j_infor404_ref_041">2013</xref>)</td>
<td style="vertical-align: top; text-align: left">Classic Perceptron</td>
<td style="vertical-align: top; text-align: left">990,000</td>
<td style="vertical-align: top; text-align: left">10,000</td>
<td style="vertical-align: top; text-align: left">99.49%</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">(Patil and Patil, <xref ref-type="bibr" rid="j_infor404_ref_020">2018</xref>)</td>
<td style="vertical-align: top; text-align: left">Random Forest</td>
<td style="vertical-align: top; text-align: left">26,041</td>
<td style="vertical-align: top; text-align: left">26,041</td>
<td style="vertical-align: top; text-align: left">99.44%</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">(Zhao and Hoi, <xref ref-type="bibr" rid="j_infor404_ref_041">2013</xref>)</td>
<td style="vertical-align: top; text-align: left">Label Efficient Perceptron</td>
<td style="vertical-align: top; text-align: left">990,000</td>
<td style="vertical-align: top; text-align: left">10,000</td>
<td style="vertical-align: top; text-align: left">99.41%</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">(Chen <italic>et al.</italic>, <xref ref-type="bibr" rid="j_infor404_ref_005">2014</xref>)</td>
<td style="vertical-align: top; text-align: left">Logistic Regression</td>
<td style="vertical-align: top; text-align: left">1,945</td>
<td style="vertical-align: top; text-align: left">404</td>
<td style="vertical-align: top; text-align: left">99.40%</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">(Cui <italic>et al.</italic>, <xref ref-type="bibr" rid="j_infor404_ref_007">2018</xref>)</td>
<td style="vertical-align: top; text-align: left">SVM</td>
<td style="vertical-align: top; text-align: left">24,520</td>
<td style="vertical-align: top; text-align: left">138,925</td>
<td style="vertical-align: top; text-align: left">99.39%</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">(Patil and Patil, <xref ref-type="bibr" rid="j_infor404_ref_020">2018</xref>)</td>
<td style="vertical-align: top; text-align: left">Fast Decision Tree Learner (REPTree)</td>
<td style="vertical-align: top; text-align: left">26,041</td>
<td style="vertical-align: top; text-align: left">26,041</td>
<td style="vertical-align: top; text-align: left">99.19%</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">(Zhao and Hoi, <xref ref-type="bibr" rid="j_infor404_ref_041">2013</xref>)</td>
<td style="vertical-align: top; text-align: left">Cost-sensitive Perceptron</td>
<td style="vertical-align: top; text-align: left">990,000</td>
<td style="vertical-align: top; text-align: left">10,000</td>
<td style="vertical-align: top; text-align: left">99.18%</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">(Patil and Patil, <xref ref-type="bibr" rid="j_infor404_ref_020">2018</xref>)</td>
<td style="vertical-align: top; text-align: left">CART <xref ref-type="fn" rid="j_infor404_fn_005">5</xref></td>
<td style="vertical-align: top; text-align: left">26,041</td>
<td style="vertical-align: top; text-align: left">26,041</td>
<td style="vertical-align: top; text-align: left">99.15%</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">(Jain and Gupta, <xref ref-type="bibr" rid="j_infor404_ref_012">2018b</xref>)</td>
<td style="vertical-align: top; text-align: left">Random Forest</td>
<td style="vertical-align: top; text-align: left">2,141</td>
<td style="vertical-align: top; text-align: left">1,918</td>
<td style="vertical-align: top; text-align: left">99.09%</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">(Patil and Patil, <xref ref-type="bibr" rid="j_infor404_ref_020">2018</xref>)</td>
<td style="vertical-align: top; text-align: left">J48 <xref ref-type="fn" rid="j_infor404_fn_006">6</xref></td>
<td style="vertical-align: top; text-align: left">26,041</td>
<td style="vertical-align: top; text-align: left">26,041</td>
<td style="vertical-align: top; text-align: left">99.03%</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">(Verma and Dyer, <xref ref-type="bibr" rid="j_infor404_ref_034">2015</xref>)</td>
<td style="vertical-align: top; text-align: left">J48</td>
<td style="vertical-align: top; text-align: left">11,271</td>
<td style="vertical-align: top; text-align: left">13,274</td>
<td style="vertical-align: top; text-align: left">99.01%</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">(Verma and Dyer, <xref ref-type="bibr" rid="j_infor404_ref_034">2015</xref>)</td>
<td style="vertical-align: top; text-align: left">PART <xref ref-type="fn" rid="j_infor404_fn_007">7</xref></td>
<td style="vertical-align: top; text-align: left">11,271</td>
<td style="vertical-align: top; text-align: left">13,274</td>
<td style="vertical-align: top; text-align: left">98.98%</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">(Verma and Dyer, <xref ref-type="bibr" rid="j_infor404_ref_034">2015</xref>)</td>
<td style="vertical-align: top; text-align: left">Random Forest</td>
<td style="vertical-align: top; text-align: left">11,271</td>
<td style="vertical-align: top; text-align: left">13,274</td>
<td style="vertical-align: top; text-align: left">98.88%</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">(Shirazi <italic>et al.</italic>, <xref ref-type="bibr" rid="j_infor404_ref_028">2018</xref>)</td>
<td style="vertical-align: top; text-align: left">Gradient Boosting</td>
<td style="vertical-align: top; text-align: left">1,000</td>
<td style="vertical-align: top; text-align: left">1,000</td>
<td style="vertical-align: top; text-align: left">98.78%</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">(Cui <italic>et al.</italic>, <xref ref-type="bibr" rid="j_infor404_ref_007">2018</xref>)</td>
<td style="vertical-align: top; text-align: left">Naïve–Bayes</td>
<td style="vertical-align: top; text-align: left">24,520</td>
<td style="vertical-align: top; text-align: left">138,925</td>
<td style="vertical-align: top; text-align: left">98.72%</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">(Cui <italic>et al.</italic>, <xref ref-type="bibr" rid="j_infor404_ref_007">2018</xref>)</td>
<td style="vertical-align: top; text-align: left">C4.5</td>
<td style="vertical-align: top; text-align: left">356,215</td>
<td style="vertical-align: top; text-align: left">2,953,700</td>
<td style="vertical-align: top; text-align: left">98.70%</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">(Patil and Patil, <xref ref-type="bibr" rid="j_infor404_ref_020">2018</xref>)</td>
<td style="vertical-align: top; text-align: left">Alternating Decision Tree</td>
<td style="vertical-align: top; text-align: left">26,041</td>
<td style="vertical-align: top; text-align: left">26,041</td>
<td style="vertical-align: top; text-align: left">98.48%</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">(Shirazi <italic>et al.</italic>, <xref ref-type="bibr" rid="j_infor404_ref_028">2018</xref>)</td>
<td style="vertical-align: top; text-align: left">SVM (Linear)</td>
<td style="vertical-align: top; text-align: left">1,000</td>
<td style="vertical-align: top; text-align: left">1,000</td>
<td style="vertical-align: top; text-align: left">98.46%</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">(Shirazi <italic>et al.</italic>, <xref ref-type="bibr" rid="j_infor404_ref_028">2018</xref>)</td>
<td style="vertical-align: top; text-align: left">CART</td>
<td style="vertical-align: top; text-align: left">1,000</td>
<td style="vertical-align: top; text-align: left">1,000</td>
<td style="vertical-align: top; text-align: left">98.42%</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">(Adebowale <italic>et al.</italic>, <xref ref-type="bibr" rid="j_infor404_ref_001">2019</xref>)</td>
<td style="vertical-align: top; text-align: left">Adaptive Neuro-Fuzzy Inference System</td>
<td style="vertical-align: top; text-align: left">6,843</td>
<td style="vertical-align: top; text-align: left">6,157</td>
<td style="vertical-align: top; text-align: left">98.30%</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">(Vanhoenshoven <italic>et al.</italic>, <xref ref-type="bibr" rid="j_infor404_ref_031">2016</xref>)</td>
<td style="vertical-align: top; text-align: left">Random Forest</td>
<td style="vertical-align: top; text-align: left">1,541,000</td>
<td style="vertical-align: top; text-align: left">759,000</td>
<td style="vertical-align: top; text-align: left">98.26%</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">(Jain and Gupta, <xref ref-type="bibr" rid="j_infor404_ref_012">2018b</xref>)</td>
<td style="vertical-align: top; text-align: left">Logistic Regression</td>
<td style="vertical-align: top; text-align: left">2,141</td>
<td style="vertical-align: top; text-align: left">1,918</td>
<td style="vertical-align: top; text-align: left">98.25%</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">(Patil and Patil, <xref ref-type="bibr" rid="j_infor404_ref_020">2018</xref>)</td>
<td style="vertical-align: top; text-align: left">Random Tree</td>
<td style="vertical-align: top; text-align: left">26,041</td>
<td style="vertical-align: top; text-align: left">26,041</td>
<td style="vertical-align: top; text-align: left">98.18%</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">(Shirazi <italic>et al.</italic>, <xref ref-type="bibr" rid="j_infor404_ref_028">2018</xref>)</td>
<td style="vertical-align: top; text-align: left">k-Nearest Neighbours</td>
<td style="vertical-align: top; text-align: left">1,000</td>
<td style="vertical-align: top; text-align: left">1,000</td>
<td style="vertical-align: top; text-align: left">98.05%</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">(Vanhoenshoven <italic>et al.</italic>, <xref ref-type="bibr" rid="j_infor404_ref_031">2016</xref>)</td>
<td style="vertical-align: top; text-align: left">Multi Layer Perceptron</td>
<td style="vertical-align: top; text-align: left">1,541,000</td>
<td style="vertical-align: top; text-align: left">759,000</td>
<td style="vertical-align: top; text-align: left">97.97%</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">(Verma and Dyer, <xref ref-type="bibr" rid="j_infor404_ref_034">2015</xref>)</td>
<td style="vertical-align: top; text-align: left">Logistic Regression</td>
<td style="vertical-align: top; text-align: left">11,271</td>
<td style="vertical-align: top; text-align: left">13,274</td>
<td style="vertical-align: top; text-align: left">97.70%</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">(Jain and Gupta, <xref ref-type="bibr" rid="j_infor404_ref_012">2018b</xref>)</td>
<td style="vertical-align: top; text-align: left">Naïve–Bayes</td>
<td style="vertical-align: top; text-align: left">2,141</td>
<td style="vertical-align: top; text-align: left">1,918</td>
<td style="vertical-align: top; text-align: left">97.59%</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">(Vanhoenshoven <italic>et al.</italic>, <xref ref-type="bibr" rid="j_infor404_ref_031">2016</xref>)</td>
<td style="vertical-align: top; text-align: left">k-Nearest Neighbours</td>
<td style="vertical-align: top; text-align: left">1,541,000</td>
<td style="vertical-align: top; text-align: left">759,000</td>
<td style="vertical-align: top; text-align: left">97.54%</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">(Shirazi <italic>et al.</italic>, <xref ref-type="bibr" rid="j_infor404_ref_028">2018</xref>)</td>
<td style="vertical-align: top; text-align: left">SVM (Gaussian)</td>
<td style="vertical-align: top; text-align: left">1,000</td>
<td style="vertical-align: top; text-align: left">1,000</td>
<td style="vertical-align: top; text-align: left">97.42%</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">(Vanhoenshoven <italic>et al.</italic>, <xref ref-type="bibr" rid="j_infor404_ref_031">2016</xref>)</td>
<td style="vertical-align: top; text-align: left">C5.0 <xref ref-type="fn" rid="j_infor404_fn_008">8</xref></td>
<td style="vertical-align: top; text-align: left">1,541,000</td>
<td style="vertical-align: top; text-align: left">759,000</td>
<td style="vertical-align: top; text-align: left">97.40%</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">(Karabatak and Mustafa, <xref ref-type="bibr" rid="j_infor404_ref_013">2018</xref>)</td>
<td style="vertical-align: top; text-align: left">Random Forest</td>
<td style="vertical-align: top; text-align: left">6,157</td>
<td style="vertical-align: top; text-align: left">4,898</td>
<td style="vertical-align: top; text-align: left">97.34%</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">(Vanhoenshoven <italic>et al.</italic>, <xref ref-type="bibr" rid="j_infor404_ref_031">2016</xref>)</td>
<td style="vertical-align: top; text-align: left">C4.5</td>
<td style="vertical-align: top; text-align: left">1,541,000</td>
<td style="vertical-align: top; text-align: left">759,000</td>
<td style="vertical-align: top; text-align: left">97.33%</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">(Vanhoenshoven <italic>et al.</italic>, <xref ref-type="bibr" rid="j_infor404_ref_031">2016</xref>)</td>
<td style="vertical-align: top; text-align: left">SVM</td>
<td style="vertical-align: top; text-align: left">1,541,000</td>
<td style="vertical-align: top; text-align: left">759,000</td>
<td style="vertical-align: top; text-align: left">97.11%</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">(Karabatak and Mustafa, <xref ref-type="bibr" rid="j_infor404_ref_013">2018</xref>)</td>
<td style="vertical-align: top; text-align: left">Multilayer Perceptron</td>
<td style="vertical-align: top; text-align: left">6,157</td>
<td style="vertical-align: top; text-align: left">4,898</td>
<td style="vertical-align: top; text-align: left">96.90%</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">(Karabatak and Mustafa, <xref ref-type="bibr" rid="j_infor404_ref_013">2018</xref>)</td>
<td style="vertical-align: top; text-align: left">Logistic Model Tree (LMT)</td>
<td style="vertical-align: top; text-align: left">6,157</td>
<td style="vertical-align: top; text-align: left">4,898</td>
<td style="vertical-align: top; text-align: left">96.87%</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">(Karabatak and Mustafa, <xref ref-type="bibr" rid="j_infor404_ref_013">2018</xref>)</td>
<td style="vertical-align: top; text-align: left">PART</td>
<td style="vertical-align: top; text-align: left">6,157</td>
<td style="vertical-align: top; text-align: left">4,898</td>
<td style="vertical-align: top; text-align: left">96.76%</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">(Karabatak and Mustafa, <xref ref-type="bibr" rid="j_infor404_ref_013">2018</xref>)</td>
<td style="vertical-align: top; text-align: left">ID3 <xref ref-type="fn" rid="j_infor404_fn_009">9</xref></td>
<td style="vertical-align: top; text-align: left">6,157</td>
<td style="vertical-align: top; text-align: left">4,898</td>
<td style="vertical-align: top; text-align: left">96.49%</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">(Zhao <italic>et al.</italic>, <xref ref-type="bibr" rid="j_infor404_ref_040">2019</xref>)</td>
<td style="vertical-align: top; text-align: left">Random Forest</td>
<td style="vertical-align: top; text-align: left">40,000</td>
<td style="vertical-align: top; text-align: left">150,000</td>
<td style="vertical-align: top; text-align: left">96.40%</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">(Karabatak and Mustafa, <xref ref-type="bibr" rid="j_infor404_ref_013">2018</xref>)</td>
<td style="vertical-align: top; text-align: left">Random Tree</td>
<td style="vertical-align: top; text-align: left">6,157</td>
<td style="vertical-align: top; text-align: left">4,898</td>
<td style="vertical-align: top; text-align: left">96.37%</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">(Chiew <italic>et al.</italic>, <xref ref-type="bibr" rid="j_infor404_ref_006">2019</xref>)</td>
<td style="vertical-align: top; text-align: left">Random Forest</td>
<td style="vertical-align: top; text-align: left">5,000</td>
<td style="vertical-align: top; text-align: left">5,000</td>
<td style="vertical-align: top; text-align: left">96.17%</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">(Jain and Gupta, <xref ref-type="bibr" rid="j_infor404_ref_012">2018b</xref>)</td>
<td style="vertical-align: top; text-align: left">SVM</td>
<td style="vertical-align: top; text-align: left">2,141</td>
<td style="vertical-align: top; text-align: left">1,918</td>
<td style="vertical-align: top; text-align: left">96.16%</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">(Vanhoenshoven <italic>et al.</italic>, <xref ref-type="bibr" rid="j_infor404_ref_031">2016</xref>)</td>
<td style="vertical-align: top; text-align: left">Naïve–Bayes</td>
<td style="vertical-align: top; text-align: left">1,541,000</td>
<td style="vertical-align: top; text-align: left">759,000</td>
<td style="vertical-align: top; text-align: left">95.98%</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">(Shirazi <italic>et al.</italic>, <xref ref-type="bibr" rid="j_infor404_ref_028">2018</xref>)</td>
<td style="vertical-align: top; text-align: left">Naïve–Bayes</td>
<td style="vertical-align: top; text-align: left">1,000</td>
<td style="vertical-align: top; text-align: left">1,000</td>
<td style="vertical-align: top; text-align: left">95.97%</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">(Karabatak and Mustafa, <xref ref-type="bibr" rid="j_infor404_ref_013">2018</xref>)</td>
<td style="vertical-align: top; text-align: left">J48</td>
<td style="vertical-align: top; text-align: left">6,157</td>
<td style="vertical-align: top; text-align: left">4,898</td>
<td style="vertical-align: top; text-align: left">95.87%</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">(Ma <italic>et al.</italic>, <xref ref-type="bibr" rid="j_infor404_ref_016">2009</xref>)</td>
<td style="vertical-align: top; text-align: left">Logistic Regression</td>
<td style="vertical-align: top; text-align: left">20,500</td>
<td style="vertical-align: top; text-align: left">15,000</td>
<td style="vertical-align: top; text-align: left">95.50%</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">(Karabatak and Mustafa, <xref ref-type="bibr" rid="j_infor404_ref_013">2018</xref>)</td>
<td style="vertical-align: top; text-align: left">JRip <xref ref-type="fn" rid="j_infor404_fn_010">10</xref></td>
<td style="vertical-align: top; text-align: left">6,157</td>
<td style="vertical-align: top; text-align: left">4,898</td>
<td style="vertical-align: top; text-align: left">95.01%</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">(Marchal <italic>et al.</italic>, <xref ref-type="bibr" rid="j_infor404_ref_018">2014</xref>)</td>
<td style="vertical-align: top; text-align: left">Random Forest</td>
<td style="vertical-align: top; text-align: left">48,009</td>
<td style="vertical-align: top; text-align: left">48,009</td>
<td style="vertical-align: top; text-align: left">94.91%</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">(Verma and Dyer, <xref ref-type="bibr" rid="j_infor404_ref_034">2015</xref>)</td>
<td style="vertical-align: top; text-align: left">SVM</td>
<td style="vertical-align: top; text-align: left">11,271</td>
<td style="vertical-align: top; text-align: left">13,274</td>
<td style="vertical-align: top; text-align: left">94.79%</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">(Chiew <italic>et al.</italic>, <xref ref-type="bibr" rid="j_infor404_ref_006">2019</xref>)</td>
<td style="vertical-align: top; text-align: left">C4.5</td>
<td style="vertical-align: top; text-align: left">5,000</td>
<td style="vertical-align: top; text-align: left">5,000</td>
<td style="vertical-align: top; text-align: left">94.37%</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">(Karabatak and Mustafa, <xref ref-type="bibr" rid="j_infor404_ref_013">2018</xref>)</td>
<td style="vertical-align: top; text-align: left">Randomizable Filtered Classifier</td>
<td style="vertical-align: top; text-align: left">6,157</td>
<td style="vertical-align: top; text-align: left">4,898</td>
<td style="vertical-align: top; text-align: left">94.21%</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">(Chiew <italic>et al.</italic>, <xref ref-type="bibr" rid="j_infor404_ref_006">2019</xref>)</td>
<td style="vertical-align: top; text-align: left">JRip</td>
<td style="vertical-align: top; text-align: left">5,000</td>
<td style="vertical-align: top; text-align: left">5,000</td>
<td style="vertical-align: top; text-align: left">94.17%</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">(Chiew <italic>et al.</italic>, <xref ref-type="bibr" rid="j_infor404_ref_006">2019</xref>)</td>
<td style="vertical-align: top; text-align: left">PART</td>
<td style="vertical-align: top; text-align: left">5,000</td>
<td style="vertical-align: top; text-align: left">5,000</td>
<td style="vertical-align: top; text-align: left">94.13%</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">(Zhang <italic>et al.</italic>, <xref ref-type="bibr" rid="j_infor404_ref_039">2017</xref>)</td>
<td style="vertical-align: top; text-align: left">Extreme Learning Machines (ELM)</td>
<td style="vertical-align: top; text-align: left">2,784</td>
<td style="vertical-align: top; text-align: left">3,121</td>
<td style="vertical-align: top; text-align: left">94.04%</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">(Karabatak and Mustafa, <xref ref-type="bibr" rid="j_infor404_ref_013">2018</xref>)</td>
<td style="vertical-align: top; text-align: left">Stochastic Gradient Descent</td>
<td style="vertical-align: top; text-align: left">6,157</td>
<td style="vertical-align: top; text-align: left">4,898</td>
<td style="vertical-align: top; text-align: left">93.95%</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">(Karabatak and Mustafa, <xref ref-type="bibr" rid="j_infor404_ref_013">2018</xref>)</td>
<td style="vertical-align: top; text-align: left">Naïve–Bayes</td>
<td style="vertical-align: top; text-align: left">6,157</td>
<td style="vertical-align: top; text-align: left">4,898</td>
<td style="vertical-align: top; text-align: left">93.39%</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">(Karabatak and Mustafa, <xref ref-type="bibr" rid="j_infor404_ref_013">2018</xref>)</td>
<td style="vertical-align: top; text-align: left">Bayesian Network</td>
<td style="vertical-align: top; text-align: left">6,157</td>
<td style="vertical-align: top; text-align: left">4,898</td>
<td style="vertical-align: top; text-align: left">92.98%</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">(Chiew <italic>et al.</italic>, <xref ref-type="bibr" rid="j_infor404_ref_006">2019</xref>)</td>
<td style="vertical-align: top; text-align: left">SVM</td>
<td style="vertical-align: top; text-align: left">5,000</td>
<td style="vertical-align: top; text-align: left">5,000</td>
<td style="vertical-align: top; text-align: left">92.20%</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">(Thomas <italic>et al.</italic>, <xref ref-type="bibr" rid="j_infor404_ref_030">2011</xref>)</td>
<td style="vertical-align: top; text-align: left">Logistic Regression</td>
<td style="vertical-align: top; text-align: left">500,000</td>
<td style="vertical-align: top; text-align: left">500,000</td>
<td style="vertical-align: top; text-align: left">90.78%</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">(Chiew <italic>et al.</italic>, <xref ref-type="bibr" rid="j_infor404_ref_006">2019</xref>)</td>
<td style="vertical-align: top; text-align: left">Naïve–Bayes</td>
<td style="vertical-align: top; text-align: left">5,000</td>
<td style="vertical-align: top; text-align: left">5,000</td>
<td style="vertical-align: top; text-align: left">84.10%</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left; border-bottom: solid thin">(Verma and Dyer, <xref ref-type="bibr" rid="j_infor404_ref_034">2015</xref>)</td>
<td style="vertical-align: top; text-align: left; border-bottom: solid thin">Naïve–Bayes</td>
<td style="vertical-align: top; text-align: left; border-bottom: solid thin">11,271</td>
<td style="vertical-align: top; text-align: left; border-bottom: solid thin">13,274</td>
<td style="vertical-align: top; text-align: left; border-bottom: solid thin">83.88%</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>
<fn id="j_infor404_fn_004"><label><sup>4</sup></label><p>C4.5 is a decision tree generation algorithm developed by Ross Quinlan.</p></fn>
<fn id="j_infor404_fn_005"><label><sup>5</sup></label><p>Classification and Regression Tree.</p></fn>
<fn id="j_infor404_fn_006"><label><sup>6</sup></label><p>WEKA’s class for generating a pruned or unpruned C4.5 decision tree.</p></fn>
<fn id="j_infor404_fn_007"><label><sup>7</sup></label><p>Rule-based learner that combines C4.5 trees and RIPPER learning.</p></fn>
<fn id="j_infor404_fn_008"><label><sup>8</sup></label><p>C5.0 is a decision tree generation algorithm developed by Ross Quinlan.</p></fn>
<fn id="j_infor404_fn_009"><label><sup>9</sup></label><p>ID3 (Iterative Dichotomiser 3) is a decision tree generation algorithm developed by Ross Quinlan.</p></fn>
<fn id="j_infor404_fn_010"><label><sup>10</sup></label><p>A propositional rule learner implementing Repeated Incremental Pruning to Produce Error Reduction (RIPPER), proposed by William W. Cohen.</p></fn></p>
</sec>
<sec id="j_infor404_s_005">
<label>2.3</label>
<title>Review of Deep Learning Based Research</title>
<p>In the past few years, the scientific community has introduced novel deep learning approaches to the phishing website detection problem. Zhao <italic>et al.</italic> demonstrated that a Gated Recurrent Unit (GRU) neural network, without the need for manual feature engineering, can classify malicious URLs with 98.5% accuracy on 240,000 phishing and 150,000 legitimate website URL samples (Zhao <italic>et al.</italic>, <xref ref-type="bibr" rid="j_infor404_ref_040">2019</xref>). Saxe and Berlin experimented with a Convolutional Neural Network (CNN) that automates feature design and extraction from generic raw character strings (malicious URLs, file paths, etc.), achieving 99.30% accuracy on 19,067,879 randomly sampled website URLs (Saxe and Berlin, <xref ref-type="bibr" rid="j_infor404_ref_023">2017</xref>). Vazhayil <italic>et al.</italic> performed a comparative study in which CNN and CNN Long Short-Term Memory (CNN-LSTM) deep learning networks achieved 98.7% and 98.9% accuracy, respectively, on 116,101 URL samples (Vazhayil <italic>et al.</italic>, <xref ref-type="bibr" rid="j_infor404_ref_032">2018</xref>). Selvaganapathy <italic>et al.</italic> implemented a method in which features are selected with a Greedy Multilayer Deep Belief Network (DBN) and binary classification is performed with Deep Neural Networks (DNN); the method classifies malicious URLs with 75.0% accuracy on 17,700 phishing and 10,000 legitimate website URL samples (Selvaganapathy <italic>et al.</italic>, <xref ref-type="bibr" rid="j_infor404_ref_026">2018</xref>).</p>
</sec>
</sec>
<sec id="j_infor404_s_006">
<label>3</label>
<title>Research Methodology</title>
<p>In this section, we describe our research methodology by defining:</p>
<list>
<list-item id="j_infor404_li_019">
<label>•</label>
<p>experimental design for our research (Section <xref rid="j_infor404_s_007">3.1</xref>),</p>
</list-item>
<list-item id="j_infor404_li_020">
<label>•</label>
<p>algorithms used in the experiment and grounds for algorithm selection (Section <xref rid="j_infor404_s_011">3.2</xref>),</p>
</list-item>
<list-item id="j_infor404_li_021">
<label>•</label>
<p>datasets used in the experiment (Section <xref rid="j_infor404_s_012">3.3</xref>),</p>
</list-item>
<list-item id="j_infor404_li_022">
<label>•</label>
<p>the metric (classification accuracy) and methods (e.g. T-test, ranking techniques) used in the experiment (Section <xref rid="j_infor404_s_016">3.4</xref>).</p>
</list-item>
</list>
<p>We discuss the validity of our results in Section <xref rid="j_infor404_s_021">3.5</xref>.</p>
<sec id="j_infor404_s_007">
<label>3.1</label>
<title>Experimental Design</title>
<p>In this subsection, we present the experimental design used to perform the experiment and answer the research question. The experiment was divided into three parts: (i) training the classifiers on each dataset, (ii) ranking the classifiers, and (iii) creating a unified classifier ranking.</p>
<sec id="j_infor404_s_008">
<label>3.1.1</label>
<title>Part I: Training the Classifiers</title>
<p>The objective of this part is to train all the classifiers from Section <xref rid="j_infor404_s_011">3.2</xref> on all the datasets from Section <xref rid="j_infor404_s_012">3.3</xref> to achieve their best possible classification accuracy, as defined in Section <xref rid="j_infor404_s_017">3.4.1</xref>, formula (<xref rid="j_infor404_eq_002">2</xref>). For every dataset and every classifier, we take the following steps: 
<list>
<list-item id="j_infor404_li_023">
<label>1.</label>
<p>Set up the classifier for a specific dataset in a Python environment using the Scikit Learn library (Pedregosa <italic>et al.</italic>, <xref ref-type="bibr" rid="j_infor404_ref_021">2011</xref>).</p>
</list-item>
<list-item id="j_infor404_li_024">
<label>2.</label>
<p>Manually select a set of hyper-parameters, referring to Scikit Learn’s user guide.</p>
</list-item>
<list-item id="j_infor404_li_025">
<label>3.</label>
<p>Train and test the classifier using Scikit Learn’s cross-validation (CV) function, with 30 stratified folds.</p>
</list-item>
<list-item id="j_infor404_li_026">
<label>4.</label>
<p>Plot learning curves.</p>
</list-item>
<list-item id="j_infor404_li_027">
<label>5.</label>
<p>Analyse learning curves and make a decision on tuning the hyper-parameters by answering the following questions: 
<list>
<list-item id="j_infor404_li_028">
<label>•</label>
<p>Is the algorithm learning from the training data or memorizing it? If the training curve is flat at 100%, the algorithm is not learning but memorizing the data. To solve this issue, we take actions such as reducing the number of weak learners in an ensemble, reducing the depth of the tree, or increasing the regularization parameter.</p>
</list-item>
<list-item id="j_infor404_li_029">
<label>•</label>
<p>Is the algorithm prone to overfitting (low bias, high variance) or underfitting (high bias, low variance), or does it learn “just right”? If the training and CV curves converge to a low accuracy with only a small gap between them, the algorithm is underfitting; if the gap between the curves is large, it is overfitting; if both curves converge to a high accuracy with a small gap, it learns “just right”. To solve the issue, we take actions to reduce high variance or high bias, e.g. (i) add more training examples, use a smaller set of features, increase the regularization parameter, etc. to reduce high variance, and (ii) use a bigger set of features, add polynomial features, increase the number of layers in the neural network, reduce the regularization parameter, etc. to reduce high bias.</p>
</list-item>
</list> 
If the decision is made to tune the hyper-parameters to avoid high bias or high variance, then we start over from Step 2; if not, we go to Step 6.</p>
</list-item>
<list-item id="j_infor404_li_030">
<label>6.</label>
<p>Perform a Shapiro–Wilk test, as described in Section <xref rid="j_infor404_s_016">3.4</xref>, formula (<xref rid="j_infor404_eq_005">4</xref>), to check whether the classifier’s 30 classification accuracies from the 30-fold CV are normally distributed. If they are not, take action to normalize the values.</p>
</list-item>
<list-item id="j_infor404_li_031">
<label>7.</label>
<p>Save the results for further actions.</p>
</list-item>
</list> 
We finish this part when all the classifiers are trained on all the datasets, and we have normally distributed sets of classification accuracies for each classifier on each dataset.</p>
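<p>The Part I loop for a single classifier on a single dataset can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors’ code: a synthetic dataset from scikit-learn’s make_classification stands in for the phishing datasets, GaussianNB stands in for any of the eight classifiers, and the diagnose helper with its thresholds (gap_tol, low_acc) is a hypothetical way of reading the learning curves in Step 5. The 0.05 significance level for the normality check in Step 6 is also an assumption.</p>

```python
# Sketch of Part I for one classifier on one dataset (not the authors' code).
import numpy as np
from scipy import stats
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, cross_val_score, learning_curve
from sklearn.naive_bayes import GaussianNB

# Assumption: synthetic data stands in for UCI-2015, UCI-2016 or MDP-2018.
X, y = make_classification(n_samples=900, n_features=20, n_informative=8,
                           weights=[0.45, 0.55], random_state=0)

# Steps 1-2: set up the classifier; hyper-parameters are left at defaults here.
clf = GaussianNB()

# Step 3: 30 stratified folds keep the class ratio of the full dataset in each fold.
cv = StratifiedKFold(n_splits=30, shuffle=True, random_state=0)
fold_acc = cross_val_score(clf, X, y, cv=cv, scoring="accuracy")

# Step 4: learning curve, i.e. mean training and CV accuracy for growing training sets.
sizes, train_scores, cv_scores = learning_curve(
    clf, X, y, cv=cv, scoring="accuracy", train_sizes=np.linspace(0.2, 1.0, 5))
train_curve = train_scores.mean(axis=1)
cv_curve = cv_scores.mean(axis=1)


def diagnose(train_curve, cv_curve, gap_tol=0.03, low_acc=0.90):
    """Hypothetical reading of Step 5; the thresholds are illustrative only."""
    if np.all(train_curve == 1.0):
        return "memorizing"                  # training curve flat at 100%
    gap = train_curve[-1] - cv_curve[-1]     # gap at the largest training size
    if gap > gap_tol:
        return "overfitting (high variance)"
    if low_acc > cv_curve[-1]:
        return "underfitting (high bias)"    # small gap but low accuracy
    return "just right"


# Step 6: Shapiro-Wilk test on the 30 fold accuracies (alpha = 0.05 assumed).
w_stat, p_value = stats.shapiro(fold_acc)
looks_normal = bool(p_value > 0.05)  # True means normality is not rejected

print(f"mean accuracy {fold_acc.mean():.4f}, verdict: {diagnose(train_curve, cv_curve)}")
print(f"Shapiro-Wilk W={w_stat:.3f}, p={p_value:.3f}, normal: {looks_normal}")
```

<p>In the procedure above, Steps 2 to 5 would be repeated with new hyper-parameters until the verdict is acceptable; the sketch runs them only once.</p>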
</sec>
<sec id="j_infor404_s_009">
<label>3.1.2</label>
<title>Part II: Ranking the Classifiers</title>
<p>The objective of this part is to rank all the classifiers by their classification results within each individual dataset.</p>
<p>For every dataset, we take the following steps:</p>
<list>
<list-item id="j_infor404_li_032">
<label>1.</label>
<p>Using Welch’s T-test, described in Section <xref rid="j_infor404_s_018">3.4.2</xref>, formula (<xref rid="j_infor404_eq_004">3</xref>), check for every possible pair of classifiers whether their classification results produced in Part I differ significantly. The test is applicable because the classification results are normally distributed (see Part I, Step 6).</p>
</list-item>
<list-item id="j_infor404_li_033">
<label>2.</label>
<p>Arrange all classifiers by their mean classification accuracy in descending order.</p>
</list-item>
<list-item id="j_infor404_li_034">
<label>3.</label>
<p>Assign each classifier three ranks, one for each of the ranking techniques described in Section <xref rid="j_infor404_s_020">3.4.4</xref>. <bold>Important notice:</bold> classifiers whose results have no statistically significant differences receive the same rank.</p>
</list-item>
<list-item id="j_infor404_li_035">
<label>4.</label>
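<p>For each ranking technique, award each classifier points according to its rank: a classifier in the first rank receives as many points as there are algorithms in the ranking, and each subsequent rank receives one point less, down to 1 point for the lowest possible rank. Points are calculated using formula (<xref rid="j_infor404_eq_001">1</xref>). 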
<disp-formula id="j_infor404_eq_001">
<label>(1)</label>
<alternatives><mml:math display="block"><mml:mtable displaystyle="true" columnalign="right"><mml:mtr><mml:mtd class="eqnarray-1"><mml:msub><mml:mrow><mml:mi mathvariant="italic">Points</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">i</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:msub><mml:mrow><mml:mi mathvariant="italic">N</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">methods</mml:mi></mml:mrow></mml:msub><mml:mo>−</mml:mo><mml:msub><mml:mrow><mml:mi mathvariant="italic">Rank</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">i</mml:mi></mml:mrow></mml:msub><mml:mo>+</mml:mo><mml:mn>1</mml:mn><mml:mo mathvariant="normal">,</mml:mo></mml:mtd></mml:mtr></mml:mtable></mml:math><tex-math><![CDATA[\[ {\mathit{Points}_{i}}={N_{\mathit{methods}}}-{\mathit{Rank}_{i}}+1,\]]]></tex-math></alternatives>
</disp-formula> 
where</p>
<list>
<list-item id="j_infor404_li_036">
<label>•</label>
<p><inline-formula id="j_infor404_ineq_001">
<alternatives><mml:math><mml:msub><mml:mrow><mml:mi mathvariant="italic">N</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">methods</mml:mi></mml:mrow></mml:msub></mml:math><tex-math><![CDATA[$
{N_{\mathit{methods}}}$]]></tex-math></alternatives></inline-formula> – number of algorithms participating in the ranking,</p>
</list-item>
<list-item id="j_infor404_li_037">
<label>•</label>
<p><inline-formula id="j_infor404_ineq_002">
<alternatives><mml:math><mml:msub><mml:mrow><mml:mi mathvariant="italic">Rank</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">i</mml:mi></mml:mrow></mml:msub></mml:math><tex-math><![CDATA[$
{\mathit{Rank}_{i}}$]]></tex-math></alternatives></inline-formula> – rank of the <inline-formula id="j_infor404_ineq_003">
<alternatives><mml:math><mml:msup><mml:mrow><mml:mi mathvariant="italic">i</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">th</mml:mi></mml:mrow></mml:msup></mml:math><tex-math><![CDATA[$
{i^{\mathit{th}}}$]]></tex-math></alternatives></inline-formula> algorithm.</p>
</list-item>
</list>
</list-item>
<list-item id="j_infor404_li_038">
<label>5.</label>
<p>Save the results for further actions.</p>
</list-item>
</list>
<p>We finish this part when all the classifiers have received ranks and ranking points from all ranking techniques on all datasets.</p>
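<p>Steps 1 to 4 of Part II can be sketched for one dataset with SciPy’s Welch test (ttest_ind with equal_var=False) and formula (1). The tie rule below is an assumption standing in for the ranking techniques of Section 3.4.4, which are not reproduced here: walking down the list sorted by mean accuracy, a classifier shares the rank of the previous one when the two do not differ significantly. The fold accuracies are hypothetical values chosen for illustration, not results of the study.</p>

```python
# Sketch of Part II for one dataset and one (hypothetical) ranking technique.
from itertools import combinations

import numpy as np
from scipy import stats


def welch_significant(a, b, alpha=0.05):
    """Welch's t-test (unequal variances): True if the mean accuracies differ at level alpha."""
    p = stats.ttest_ind(a, b, equal_var=False).pvalue
    return bool(alpha > p)


def pairwise_tests(results, alpha=0.05):
    """Step 1: test every possible pair of classifiers."""
    return {(a, b): welch_significant(results[a], results[b], alpha)
            for a, b in combinations(results, 2)}


def rank_and_points(results, alpha=0.05):
    """Steps 2-4 under an assumed tie rule; returns {classifier: (rank, points)}."""
    # Step 2: order classifiers by mean accuracy, best first.
    order = sorted(results, key=lambda name: np.mean(results[name]), reverse=True)
    n_methods = len(order)
    ranks = {order[0]: 1}
    # Step 3 (assumption): share the previous classifier's rank when the two are
    # not significantly different; otherwise move on to the next rank.
    for prev, cur in zip(order, order[1:]):
        tied = not welch_significant(results[prev], results[cur], alpha)
        ranks[cur] = ranks[prev] if tied else ranks[prev] + 1
    # Step 4, formula (1): Points_i = N_methods - Rank_i + 1.
    return {name: (ranks[name], n_methods - ranks[name] + 1) for name in order}


# Hypothetical fold accuracies (30 per classifier), not results from the study.
spread = np.linspace(-0.01, 0.01, 30)
results = {
    "Random Forest": 0.970 + spread,
    "Gradient Tree Boosting": 0.969 + spread[::-1],
    "Naive-Bayes": 0.900 + spread,
}
print(pairwise_tests(results))
print(rank_and_points(results))
```

<p>With these inputs, Random Forest and Gradient Tree Boosting share rank 1 and receive 3 points each, while Naive-Bayes is ranked 2 and receives 3 - 2 + 1 = 2 points.</p>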
</sec>
<sec id="j_infor404_s_010">
<label>3.1.3</label>
<title>Part III: Creating the Unified Classifier Ranking</title>
<p>The objective of this part is to summarize the performance of the selected classifiers on all datasets by creating a unified ranking. To do this, we sum, for each classifier, all the points it received on all datasets. Our experiment is complete after this part.</p>
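<p>Part III reduces to summing points per classifier. A minimal sketch, assuming the points from Part II are stored per (dataset, ranking technique) pair; the technique name and point values below are hypothetical (three classifiers, so ranks 1 to 3 give 3 to 1 points), and the alphabetical tie-break is an assumption added only to make the order deterministic.</p>

```python
# Sketch of Part III: unified ranking by summing points (not the authors' code).
from collections import Counter


def unified_ranking(points_by_run):
    """points_by_run maps a (dataset, ranking technique) pair to {classifier: points}.

    Returns (classifier, total points) pairs, highest total first; ties are
    broken alphabetically (an assumption, only for a stable order).
    """
    totals = Counter()
    for per_classifier in points_by_run.values():
        totals.update(per_classifier)  # Counter.update adds the points
    return sorted(totals.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical points for three classifiers and one ranking technique.
points = {
    ("UCI-2015", "technique A"): {"Random Forest": 3, "Gradient Tree Boosting": 2, "Naive-Bayes": 1},
    ("UCI-2016", "technique A"): {"Random Forest": 2, "Gradient Tree Boosting": 3, "Naive-Bayes": 1},
    ("MDP-2018", "technique A"): {"Random Forest": 3, "Gradient Tree Boosting": 2, "Naive-Bayes": 1},
}
for name, total in unified_ranking(points):
    print(name, total)
```

<p>This prints Random Forest 8, Gradient Tree Boosting 7, and Naive-Bayes 3.</p>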
</sec>
</sec>
<sec id="j_infor404_s_011">
<label>3.2</label>
<title>Algorithms</title>
<p>In the review of supervised machine learning approaches in Section <xref rid="j_infor404_s_004">2.2</xref>, we showed that the five best implementations employ classifiers from different types of supervised machine learning algorithms: neural networks, decision trees, ensembles, regression, and Bayesian classifiers. We also showed that the three most popular classifiers are Random Forest (8 papers), Naïve–Bayes (7 papers), and SVM (7 papers).</p>
<p>For our research, we built the set of algorithms consisting of: 
<list>
<list-item id="j_infor404_li_039">
<label>•</label>
<p>three most popular algorithms from the review of related works (Section <xref rid="j_infor404_s_004">2.2</xref>),</p>
</list-item>
<list-item id="j_infor404_li_040">
<label>•</label>
<p>five more algorithms from the Scikit Learn library, belonging to the best performing types of classifiers in the review of related works (Section <xref rid="j_infor404_s_004">2.2</xref>).</p>
</list-item>
</list> 
Because of the limited resources available for this research, we did not use every possible classical classification algorithm.</p>
<p>Accordingly, our experiment uses eight classic supervised machine learning algorithms: AdaBoost, Classification and Regression Tree, Gradient Tree Boosting, k-Nearest Neighbours, Multilayer Perceptron with backpropagation, Naïve–Bayes, Random Forest, and Support-Vector Machine.</p>
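<p>The eight algorithms map onto scikit-learn estimators, which is one way they could be set up in Step 1 of Part I. The sketch below uses library defaults rather than the manually tuned hyper-parameters of the experiment, and choosing the Gaussian variant of Naïve–Bayes is an assumption, since this section does not state which variant was used.</p>

```python
# The eight algorithms as scikit-learn estimators (a sketch with default settings).
from sklearn.datasets import make_classification
from sklearn.ensemble import (AdaBoostClassifier, GradientBoostingClassifier,
                              RandomForestClassifier)
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier


def build_classifiers(seed=0):
    """Map each algorithm name to an untuned scikit-learn estimator."""
    return {
        "AdaBoost": AdaBoostClassifier(random_state=seed),
        # scikit-learn's decision trees implement an optimized version of CART.
        "Classification and Regression Tree": DecisionTreeClassifier(random_state=seed),
        "Gradient Tree Boosting": GradientBoostingClassifier(random_state=seed),
        "k-Nearest Neighbours": KNeighborsClassifier(),
        # MLPClassifier is trained with backpropagation; max_iter raised to limit warnings.
        "Multilayer Perceptron": MLPClassifier(max_iter=1000, random_state=seed),
        # Assumption: Gaussian Naive-Bayes.
        "Naive-Bayes": GaussianNB(),
        "Random Forest": RandomForestClassifier(random_state=seed),
        "Support-Vector Machine": SVC(random_state=seed),
    }


# Smoke test on synthetic data (an assumption, not one of the phishing datasets).
X, y = make_classification(n_samples=300, n_features=10, random_state=0)
for name, clf in build_classifiers().items():
    print(f"{name}: training accuracy {clf.fit(X, y).score(X, y):.3f}")
```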
</sec>
<sec id="j_infor404_s_012">
<label>3.3</label>
<title>Datasets</title>
<p>In our experiment, we used three publicly available phishing website datasets with predefined features. To our knowledge, these are the only phishing datasets with predefined features that other researchers have made publicly available.</p>
<sec id="j_infor404_s_013">
<label>3.3.1</label>
<title>UCI-2015</title>
<p>The UCI-2015 dataset from the UCI repository <xref ref-type="fn" rid="j_infor404_fn_011">11</xref> <fn id="j_infor404_fn_011"><label><sup>11</sup></label>
<p><uri>https://archive.ics.uci.edu/ml/datasets/phishing+websites</uri>.</p> </fn> was donated in March 2015 by Mohammad, McCluskey (University of Huddersfield), and Thabtah (Canadian University of Dubai). This dataset contains 6,157 phishing and 4,898 legitimate website samples. A total of 30 features based on URL, DNS, HTML, JavaScript, and external statistics were extracted from these websites.</p>
</sec>
<sec id="j_infor404_s_014">
<label>3.3.2</label>
<title>UCI-2016</title>
<p>The UCI-2016 dataset from the UCI repository <xref ref-type="fn" rid="j_infor404_fn_012">12</xref> <fn id="j_infor404_fn_012"><label><sup>12</sup></label>
<p><uri>https://archive.ics.uci.edu/ml/datasets/Website+Phishing</uri>.</p> </fn> was contributed by Abdelhamid (Auckland Institute of Studies) in November 2016. This dataset contains 805 phishing and 548 legitimate website samples. A total of 9 features were extracted from these websites.</p>
</sec>
<sec id="j_infor404_s_015">
<label>3.3.3</label>
<title>MDP-2018</title>
<p>The MDP-2018 dataset from the Mendeley Data portal <xref ref-type="fn" rid="j_infor404_fn_013">13</xref> <fn id="j_infor404_fn_013"><label><sup>13</sup></label>
<p><uri>https://data.mendeley.com/datasets/h3cgnj8hft/1</uri>.</p> </fn> was published by Choon Lin Tan (Universiti Malaysia Sarawak) in March 2018. This balanced dataset contains 5,000 phishing and 5,000 legitimate website samples. A total of 48 features were extracted from these websites.</p>
</sec>
</sec>
<sec id="j_infor404_s_016">
<label>3.4</label>
<title>Measures and Methods</title>
<sec id="j_infor404_s_017">
<label>3.4.1</label>
<title>Classification Accuracy</title>
<p>In our experiment, classification accuracy is the proportion of all websites, phishing and legitimate, that are identified correctly. It is defined as follows: 
<disp-formula id="j_infor404_eq_002">
<label>(2)</label>
<alternatives><mml:math display="block"><mml:mtable displaystyle="true" columnalign="right"><mml:mtr><mml:mtd class="eqnarray-1"><mml:mi mathvariant="italic">ACCURACY</mml:mi><mml:mo>=</mml:mo><mml:mstyle displaystyle="true"><mml:mfrac><mml:mrow><mml:mi mathvariant="italic">TP</mml:mi><mml:mo>+</mml:mo><mml:mi mathvariant="italic">TN</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">TP</mml:mi><mml:mo>+</mml:mo><mml:mi mathvariant="italic">FP</mml:mi><mml:mo>+</mml:mo><mml:mi mathvariant="italic">TN</mml:mi><mml:mo>+</mml:mo><mml:mi mathvariant="italic">FN</mml:mi></mml:mrow></mml:mfrac></mml:mstyle><mml:mo mathvariant="normal">,</mml:mo></mml:mtd></mml:mtr></mml:mtable></mml:math><tex-math><![CDATA[\[ \mathit{ACCURACY}=\frac{\mathit{TP}+\mathit{TN}}{\mathit{TP}+\mathit{FP}+\mathit{TN}+\mathit{FN}},\]]]></tex-math></alternatives>
</disp-formula> 
where</p>
<list>
<list-item id="j_infor404_li_041">
<label>•</label>
<p><inline-formula id="j_infor404_ineq_004">
<alternatives><mml:math><mml:mi mathvariant="italic">TP</mml:mi></mml:math><tex-math><![CDATA[$
\mathit{TP}$]]></tex-math></alternatives></inline-formula> – number of phishing websites correctly detected as phishing (True Positive),</p>
</list-item>
<list-item id="j_infor404_li_042">
<label>•</label>
<p><inline-formula id="j_infor404_ineq_005">
<alternatives><mml:math><mml:mi mathvariant="italic">TN</mml:mi></mml:math><tex-math><![CDATA[$
\mathit{TN}$]]></tex-math></alternatives></inline-formula> – number of legitimate websites correctly detected as legitimate (True Negative),</p>
</list-item>
<list-item id="j_infor404_li_043">
<label>•</label>
<p><inline-formula id="j_infor404_ineq_006">
<alternatives><mml:math><mml:mi mathvariant="italic">FP</mml:mi></mml:math><tex-math><![CDATA[$
\mathit{FP}$]]></tex-math></alternatives></inline-formula> – number of legitimate websites incorrectly detected as phishing (False Positive),</p>
</list-item>
<list-item id="j_infor404_li_044">
<label>•</label>
<p><inline-formula id="j_infor404_ineq_007">
<alternatives><mml:math><mml:mi mathvariant="italic">FN</mml:mi></mml:math><tex-math><![CDATA[$
\mathit{FN}$]]></tex-math></alternatives></inline-formula> – number of phishing websites incorrectly detected as legitimate (False Negative).</p>
</list-item>
</list>
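<p>The short Python sketch below checks formula (2) by hand. It assumes the label convention 1 = phishing and 0 = legitimate and uses invented toy labels. It compares the hand-computed value with scikit-learn’s <monospace>accuracy_score</monospace>. It only illustrates the formula and is not the code used in our experiment.</p>
<preformat preformat-type="code">
```python
# Minimal sketch of formula (2): accuracy from confusion-matrix counts.
# Label convention (an assumption for this example): 1 = phishing, 0 = legitimate.
from sklearn.metrics import accuracy_score, confusion_matrix


def accuracy_from_counts(tp, tn, fp, fn):
    """Formula (2): correctly classified websites over all websites."""
    return (tp + tn) / (tp + fp + tn + fn)


# Toy labels, invented purely for illustration.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]

# With labels=[0, 1], scikit-learn's confusion matrix is [[TN, FP], [FN, TP]].
tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
acc = accuracy_from_counts(tp, tn, fp, fn)

# TP=3, TN=3, FP=1, FN=1, so accuracy = 6 / 8 = 0.75
print(tp, tn, fp, fn, acc, accuracy_score(y_true, y_pred))
```
</preformat>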
<p>We chose classification accuracy as our metric of classification quality for four reasons. (i) Most other researchers report the results of their experiments in terms of classification accuracy (see Section <xref rid="j_infor404_s_002">2</xref>), so using it keeps our results comparable with theirs. (ii) The datasets in our experiment have equal or nearly equal class distributions (there is no significant disparity between the numbers of positive and negative labels), so there are no majority and minority classes. (iii) We used the cross-validation function with the stratification option, which generates test sets that all have the same class distribution, or as close to it as possible. (iv) We do not directly compare classification accuracy across datasets and draw no conclusions from such comparisons; to distinguish the top classifiers we use ranking techniques (see Section <xref rid="j_infor404_s_020">3.4.4</xref>). Under these circumstances, classification accuracy is a useful, unbiased measure.</p>
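<p>Point (ii) can be illustrated with a trivial baseline that always predicts the majority class. The sketch below applies scikit-learn’s <monospace>DummyClassifier</monospace> to two hypothetical label vectors, one balanced and one with a 10/90 split. The imbalanced case is invented for illustration and does not correspond to any of our datasets.</p>
<preformat preformat-type="code">
```python
# Illustrative sketch: why accuracy is informative mainly when classes are balanced.
# A classifier that always predicts the most frequent label serves as a baseline.
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score


def majority_baseline_accuracy(y):
    """Accuracy of always predicting the most frequent label in y."""
    X = [[0]] * len(y)  # features are ignored by the dummy classifier
    clf = DummyClassifier(strategy="most_frequent").fit(X, y)
    return accuracy_score(y, clf.predict(X))


balanced = [1] * 50 + [0] * 50     # 50 % phishing, 50 % legitimate
imbalanced = [1] * 10 + [0] * 90   # 10 % phishing, 90 % legitimate (hypothetical)

print(majority_baseline_accuracy(balanced))    # 0.5
print(majority_baseline_accuracy(imbalanced))  # 0.9
```
</preformat>
<p>On balanced labels the baseline reaches only 0.5, so accuracy well above that reflects learned structure. On the 10/90 labels the same useless classifier already scores 0.9, which is why accuracy alone would be misleading there.</p>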
</sec>
<sec id="j_infor404_s_018">
<label>3.4.2</label>
<title>Welch’s T-Test</title>
<p>In our experiment, Welch’s T-test is used to determine whether the mean classification accuracies produced by any two classifiers on the same dataset differ significantly. The two-sample T-test for unpaired data with possibly unequal variances is defined as follows (Snedecor and Cochran, <xref ref-type="bibr" rid="j_infor404_ref_029">1989</xref>).</p>
<p>Let <inline-formula id="j_infor404_ineq_008">
<alternatives><mml:math><mml:msub><mml:mrow><mml:mi mathvariant="italic">X</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo mathvariant="normal">,</mml:mo><mml:mo>…</mml:mo><mml:mo mathvariant="normal">,</mml:mo><mml:msub><mml:mrow><mml:mi mathvariant="italic">X</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">n</mml:mi></mml:mrow></mml:msub></mml:math><tex-math><![CDATA[$
{X_{1}},\dots ,{X_{n}}$]]></tex-math></alternatives></inline-formula> and <inline-formula id="j_infor404_ineq_009">
<alternatives><mml:math><mml:msub><mml:mrow><mml:mi mathvariant="italic">Y</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo mathvariant="normal">,</mml:mo><mml:mo>…</mml:mo><mml:mo mathvariant="normal">,</mml:mo><mml:msub><mml:mrow><mml:mi mathvariant="italic">Y</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">m</mml:mi></mml:mrow></mml:msub></mml:math><tex-math><![CDATA[$
{Y_{1}},\dots ,{Y_{m}}$]]></tex-math></alternatives></inline-formula> be two independent samples from normal distributions, and <inline-formula id="j_infor404_ineq_010">
<alternatives><mml:math><mml:msub><mml:mrow><mml:mi mathvariant="italic">μ</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">x</mml:mi></mml:mrow></mml:msub></mml:math><tex-math><![CDATA[$
{\mu _{x}}$]]></tex-math></alternatives></inline-formula>, <inline-formula id="j_infor404_ineq_011">
<alternatives><mml:math><mml:msub><mml:mrow><mml:mi mathvariant="italic">μ</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">y</mml:mi></mml:mrow></mml:msub></mml:math><tex-math><![CDATA[$
{\mu _{y}}$]]></tex-math></alternatives></inline-formula> be the means of these distributions. Then the hypotheses to be tested are 
<disp-formula id="j_infor404_eq_003">
<alternatives><mml:math display="block"><mml:mtable displaystyle="true"><mml:mtr><mml:mtd><mml:msub><mml:mrow><mml:mi mathvariant="italic">H</mml:mi></mml:mrow><mml:mrow><mml:mn>0</mml:mn></mml:mrow></mml:msub><mml:mo>:</mml:mo><mml:msub><mml:mrow><mml:mi mathvariant="italic">μ</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">x</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:msub><mml:mrow><mml:mi mathvariant="italic">μ</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">y</mml:mi></mml:mrow></mml:msub><mml:mspace width="2.5pt"/><mml:mtext>vs.</mml:mtext><mml:mspace width="5pt"/><mml:msub><mml:mrow><mml:mi mathvariant="italic">H</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">A</mml:mi></mml:mrow></mml:msub><mml:mo>:</mml:mo><mml:msub><mml:mrow><mml:mi mathvariant="italic">μ</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">x</mml:mi></mml:mrow></mml:msub><mml:mo stretchy="false">≠</mml:mo><mml:msub><mml:mrow><mml:mi mathvariant="italic">μ</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">y</mml:mi></mml:mrow></mml:msub><mml:mo>.</mml:mo></mml:mtd></mml:mtr></mml:mtable></mml:math><tex-math><![CDATA[\[ {H_{0}}:{\mu _{x}}={\mu _{y}}\hspace{2.5pt}\text{vs.}\hspace{5pt}{H_{A}}:{\mu _{x}}\ne {\mu _{y}}.\]]]></tex-math></alternatives>
</disp-formula>
</p>
<p>The test statistic for testing the hypothesis is calculated as follows: 
<disp-formula id="j_infor404_eq_004">
<label>(3)</label>
<alternatives><mml:math display="block"><mml:mtable displaystyle="true" columnalign="right"><mml:mtr><mml:mtd class="eqnarray-1"><mml:mi mathvariant="italic">T</mml:mi><mml:mo>=</mml:mo><mml:mstyle displaystyle="true"><mml:mfrac><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mi mathvariant="italic">X</mml:mi></mml:mrow><mml:mo stretchy="false">¯</mml:mo></mml:mover><mml:mo>−</mml:mo><mml:mover accent="true"><mml:mrow><mml:mi mathvariant="italic">Y</mml:mi></mml:mrow><mml:mo stretchy="false">¯</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:msqrt><mml:mrow><mml:mstyle displaystyle="true"><mml:mfrac><mml:mrow><mml:msubsup><mml:mrow><mml:mi mathvariant="italic">S</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">x</mml:mi></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msubsup></mml:mrow><mml:mrow><mml:mi mathvariant="italic">n</mml:mi></mml:mrow></mml:mfrac></mml:mstyle><mml:mo>+</mml:mo><mml:mstyle displaystyle="true"><mml:mfrac><mml:mrow><mml:msubsup><mml:mrow><mml:mi mathvariant="italic">S</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">y</mml:mi></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msubsup></mml:mrow><mml:mrow><mml:mi mathvariant="italic">m</mml:mi></mml:mrow></mml:mfrac></mml:mstyle></mml:mrow></mml:msqrt></mml:mrow></mml:mfrac></mml:mstyle><mml:mo mathvariant="normal">,</mml:mo></mml:mtd></mml:mtr></mml:mtable></mml:math><tex-math><![CDATA[\[ T=\frac{\bar{X}-\bar{Y}}{\sqrt{\frac{{S_{x}^{2}}}{n}+\frac{{S_{y}^{2}}}{m}}},\]]]></tex-math></alternatives>
</disp-formula> 
where</p>
<list>
<list-item id="j_infor404_li_045">
<label>•</label>
<p><inline-formula id="j_infor404_ineq_012">
<alternatives><mml:math><mml:mover accent="true"><mml:mrow><mml:mi mathvariant="italic">X</mml:mi></mml:mrow><mml:mo stretchy="false">¯</mml:mo></mml:mover></mml:math><tex-math><![CDATA[$
\bar{X}$]]></tex-math></alternatives></inline-formula> and <inline-formula id="j_infor404_ineq_013">
<alternatives><mml:math><mml:mover accent="true"><mml:mrow><mml:mi mathvariant="italic">Y</mml:mi></mml:mrow><mml:mo stretchy="false">¯</mml:mo></mml:mover></mml:math><tex-math><![CDATA[$
\bar{Y}$]]></tex-math></alternatives></inline-formula> are the sample <inline-formula id="j_infor404_ineq_014">
<alternatives><mml:math><mml:msub><mml:mrow><mml:mi mathvariant="italic">X</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo mathvariant="normal">,</mml:mo><mml:mo>…</mml:mo><mml:mo mathvariant="normal">,</mml:mo><mml:msub><mml:mrow><mml:mi mathvariant="italic">X</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">n</mml:mi></mml:mrow></mml:msub></mml:math><tex-math><![CDATA[$
{X_{1}},\dots ,{X_{n}}$]]></tex-math></alternatives></inline-formula> and <inline-formula id="j_infor404_ineq_015">
<alternatives><mml:math><mml:msub><mml:mrow><mml:mi mathvariant="italic">Y</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo mathvariant="normal">,</mml:mo><mml:mo>…</mml:mo><mml:mo mathvariant="normal">,</mml:mo><mml:msub><mml:mrow><mml:mi mathvariant="italic">Y</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">m</mml:mi></mml:mrow></mml:msub></mml:math><tex-math><![CDATA[$
{Y_{1}},\dots ,{Y_{m}}$]]></tex-math></alternatives></inline-formula> means,</p>
</list-item>
<list-item id="j_infor404_li_046">
<label>•</label>
<p><italic>n</italic> and <italic>m</italic> are the sample <inline-formula id="j_infor404_ineq_016">
<alternatives><mml:math><mml:msub><mml:mrow><mml:mi mathvariant="italic">X</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo mathvariant="normal">,</mml:mo><mml:mo>…</mml:mo><mml:mo mathvariant="normal">,</mml:mo><mml:msub><mml:mrow><mml:mi mathvariant="italic">X</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">n</mml:mi></mml:mrow></mml:msub></mml:math><tex-math><![CDATA[$
{X_{1}},\dots ,{X_{n}}$]]></tex-math></alternatives></inline-formula> and <inline-formula id="j_infor404_ineq_017">
<alternatives><mml:math><mml:msub><mml:mrow><mml:mi mathvariant="italic">Y</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo mathvariant="normal">,</mml:mo><mml:mo>…</mml:mo><mml:mo mathvariant="normal">,</mml:mo><mml:msub><mml:mrow><mml:mi mathvariant="italic">Y</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">m</mml:mi></mml:mrow></mml:msub></mml:math><tex-math><![CDATA[$
{Y_{1}},\dots ,{Y_{m}}$]]></tex-math></alternatives></inline-formula> sizes,</p>
</list-item>
<list-item id="j_infor404_li_047">
<label>•</label>
<p><inline-formula id="j_infor404_ineq_018">
<alternatives><mml:math><mml:msubsup><mml:mrow><mml:mi mathvariant="italic">S</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">x</mml:mi></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msubsup></mml:math><tex-math><![CDATA[$
{S_{x}^{2}}$]]></tex-math></alternatives></inline-formula> and <inline-formula id="j_infor404_ineq_019">
<alternatives><mml:math><mml:msubsup><mml:mrow><mml:mi mathvariant="italic">S</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">y</mml:mi></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msubsup></mml:math><tex-math><![CDATA[$
{S_{y}^{2}}$]]></tex-math></alternatives></inline-formula> are the sample <inline-formula id="j_infor404_ineq_020">
<alternatives><mml:math><mml:msub><mml:mrow><mml:mi mathvariant="italic">X</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo mathvariant="normal">,</mml:mo><mml:mo>…</mml:mo><mml:mo mathvariant="normal">,</mml:mo><mml:msub><mml:mrow><mml:mi mathvariant="italic">X</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">n</mml:mi></mml:mrow></mml:msub></mml:math><tex-math><![CDATA[$
{X_{1}},\dots ,{X_{n}}$]]></tex-math></alternatives></inline-formula> and <inline-formula id="j_infor404_ineq_021">
<alternatives><mml:math><mml:msub><mml:mrow><mml:mi mathvariant="italic">Y</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo mathvariant="normal">,</mml:mo><mml:mo>…</mml:mo><mml:mo mathvariant="normal">,</mml:mo><mml:msub><mml:mrow><mml:mi mathvariant="italic">Y</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">m</mml:mi></mml:mrow></mml:msub></mml:math><tex-math><![CDATA[$
{Y_{1}},\dots ,{Y_{m}}$]]></tex-math></alternatives></inline-formula> variances.</p>
</list-item>
</list>
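<p>Formula (3) can be computed directly in Python, as in the minimal sketch below. The result is cross-checked against <monospace>scipy.stats.ttest_ind</monospace> with <monospace>equal_var=False</monospace>, which is scipy’s Welch variant. The per-fold accuracies are made-up numbers, and the sample variances use the unbiased estimator (<monospace>ddof=1</monospace>).</p>
<preformat preformat-type="code">
```python
# Sketch of formula (3): Welch's T statistic, cross-checked against scipy.
# The accuracy samples below are invented for illustration only.
import numpy as np
from scipy import stats

x = np.array([0.93, 0.95, 0.94, 0.96, 0.92, 0.95])  # classifier A, per-fold accuracy
y = np.array([0.90, 0.91, 0.93, 0.89, 0.92])        # classifier B, per-fold accuracy

n, m = len(x), len(y)
s2x = x.var(ddof=1)  # unbiased sample variance S_x^2
s2y = y.var(ddof=1)  # unbiased sample variance S_y^2
T = (x.mean() - y.mean()) / np.sqrt(s2x / n + s2y / m)

# equal_var=False selects Welch's (unequal-variance) test in scipy.
res = stats.ttest_ind(x, y, equal_var=False)
print(T, res.statistic)
```
</preformat>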
<p>We reject the <italic>null</italic> hypothesis <inline-formula id="j_infor404_ineq_022">
<alternatives><mml:math><mml:msub><mml:mrow><mml:mi mathvariant="italic">H</mml:mi></mml:mrow><mml:mrow><mml:mn>0</mml:mn></mml:mrow></mml:msub></mml:math><tex-math><![CDATA[$
{H_{0}}$]]></tex-math></alternatives></inline-formula> that the two means are equal if <inline-formula id="j_infor404_ineq_023">
<alternatives><mml:math><mml:mo stretchy="false">|</mml:mo><mml:mi mathvariant="italic">T</mml:mi><mml:mo stretchy="false">|</mml:mo><mml:mo mathvariant="normal">&gt;</mml:mo><mml:msub><mml:mrow><mml:mi mathvariant="italic">t</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn><mml:mo>−</mml:mo><mml:mi mathvariant="italic">α</mml:mi><mml:mo mathvariant="normal" stretchy="false">/</mml:mo><mml:mn>2</mml:mn><mml:mo mathvariant="normal">,</mml:mo><mml:mi mathvariant="italic">v</mml:mi></mml:mrow></mml:msub></mml:math><tex-math><![CDATA[$
|T|>{t_{1-\alpha /2,v}}$]]></tex-math></alternatives></inline-formula>, where <inline-formula id="j_infor404_ineq_024">
<alternatives><mml:math><mml:msub><mml:mrow><mml:mi mathvariant="italic">t</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn><mml:mo>−</mml:mo><mml:mi mathvariant="italic">α</mml:mi><mml:mo mathvariant="normal" stretchy="false">/</mml:mo><mml:mn>2</mml:mn><mml:mo mathvariant="normal">,</mml:mo><mml:mi mathvariant="italic">v</mml:mi></mml:mrow></mml:msub></mml:math><tex-math><![CDATA[$
{t_{1-\alpha /2,v}}$]]></tex-math></alternatives></inline-formula> is the critical value of the <italic>t</italic> distribution with <italic>v</italic> degrees of freedom (estimated with the Welch–Satterthwaite approximation) at our chosen significance level <inline-formula id="j_infor404_ineq_025">
<alternatives><mml:math><mml:mi mathvariant="italic">α</mml:mi><mml:mo>=</mml:mo><mml:mn>0.05</mml:mn></mml:math><tex-math><![CDATA[$
\alpha =0.05$]]></tex-math></alternatives></inline-formula>. Welch’s T-test assumes that both samples come from normal distributions. We used the <italic>scipy.stats</italic> package for Python to perform the T-test.</p>
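<p>The rejection rule can be sketched as follows. The degrees of freedom <italic>v</italic> are computed with the Welch–Satterthwaite approximation, which scipy’s Welch test also uses. The helper <monospace>welch_test</monospace> and the data are hypothetical and serve only as an illustration.</p>
<preformat preformat-type="code">
```python
# Sketch of the decision rule: reject H0 when |T| exceeds t_{1-alpha/2, v}.
# welch_test is a hypothetical helper written for this illustration.
import numpy as np
from scipy import stats


def welch_test(x, y, alpha=0.05):
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    vx = x.var(ddof=1) / len(x)  # S_x^2 / n
    vy = y.var(ddof=1) / len(y)  # S_y^2 / m
    T = (x.mean() - y.mean()) / np.sqrt(vx + vy)
    # Welch-Satterthwaite approximation of the degrees of freedom v.
    v = (vx + vy) ** 2 / (vx ** 2 / (len(x) - 1) + vy ** 2 / (len(y) - 1))
    t_crit = stats.t.ppf(1 - alpha / 2, v)  # two-sided critical value
    return T, v, t_crit, bool(abs(T) > t_crit)


x = [0.93, 0.95, 0.94, 0.96, 0.92, 0.95]  # made-up per-fold accuracies
y = [0.90, 0.91, 0.93, 0.89, 0.92]
T, v, t_crit, reject = welch_test(x, y)
p = stats.ttest_ind(x, y, equal_var=False).pvalue
print(T, v, t_crit, reject, p)
```
</preformat>
<p>Rejecting H<sub>0</sub> when |<italic>T</italic>| exceeds the critical value is equivalent to rejecting it when the p-value reported by scipy is below α.</p>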
</sec>
<sec id="j_infor404_s_019">
<label>3.4.3</label>
<title>Shapiro–Wilk Test</title>
<p>The Shapiro–Wilk test checks whether a sample comes from a normally distributed population (Shapiro and Wilk, <xref ref-type="bibr" rid="j_infor404_ref_027">1965</xref>). Its test statistic is defined as follows: 
<disp-formula id="j_infor404_eq_005">
<label>(4)</label>
<alternatives><mml:math display="block"><mml:mtable displaystyle="true" columnalign="right"><mml:mtr><mml:mtd class="eqnarray-1"><mml:mi mathvariant="italic">W</mml:mi><mml:mo>=</mml:mo><mml:mstyle displaystyle="true"><mml:mfrac><mml:mrow><mml:msup><mml:mrow><mml:mfenced separators="" open="(" close=")"><mml:mrow><mml:msubsup><mml:mrow><mml:mo largeop="false" movablelimits="false">∑</mml:mo></mml:mrow><mml:mrow><mml:mi mathvariant="italic">i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi mathvariant="italic">n</mml:mi></mml:mrow></mml:msubsup><mml:msub><mml:mrow><mml:mi mathvariant="italic">a</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">i</mml:mi></mml:mrow></mml:msub><mml:msub><mml:mrow><mml:mi mathvariant="italic">x</mml:mi></mml:mrow><mml:mrow><mml:mo mathvariant="normal" fence="true" stretchy="false">(</mml:mo><mml:mi mathvariant="italic">i</mml:mi><mml:mo mathvariant="normal" fence="true" stretchy="false">)</mml:mo></mml:mrow></mml:msub></mml:mrow></mml:mfenced></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msup></mml:mrow><mml:mrow><mml:msubsup><mml:mrow><mml:mo largeop="false" movablelimits="false">∑</mml:mo></mml:mrow><mml:mrow><mml:mi mathvariant="italic">i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi mathvariant="italic">n</mml:mi></mml:mrow></mml:msubsup><mml:msup><mml:mrow><mml:mfenced separators="" open="(" close=")"><mml:mrow><mml:msub><mml:mrow><mml:mi mathvariant="italic">x</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">i</mml:mi></mml:mrow></mml:msub><mml:mo>−</mml:mo><mml:mover accent="true"><mml:mrow><mml:mi mathvariant="italic">x</mml:mi></mml:mrow><mml:mo stretchy="false">¯</mml:mo></mml:mover></mml:mrow></mml:mfenced></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msup></mml:mrow></mml:mfrac></mml:mstyle><mml:mo mathvariant="normal">,</mml:mo></mml:mtd></mml:mtr></mml:mtable></mml:math><tex-math><![CDATA[\[ W=\frac{{\left({\textstyle\textstyle\sum 
_{i=1}^{n}}{a_{i}}{x_{(i)}}\right)^{2}}}{{\textstyle\textstyle\sum _{i=1}^{n}}{\left({x_{i}}-\bar{x}\right)^{2}}},\]]]></tex-math></alternatives>
</disp-formula> 
where</p>
<list>
<list-item id="j_infor404_li_048">
<label>•</label>
<p><inline-formula id="j_infor404_ineq_026">
<alternatives><mml:math><mml:msub><mml:mrow><mml:mi mathvariant="italic">x</mml:mi></mml:mrow><mml:mrow><mml:mo mathvariant="normal" fence="true" stretchy="false">(</mml:mo><mml:mi mathvariant="italic">i</mml:mi><mml:mo mathvariant="normal" fence="true" stretchy="false">)</mml:mo></mml:mrow></mml:msub></mml:math><tex-math><![CDATA[$
{x_{(i)}}$]]></tex-math></alternatives></inline-formula> are the ordered sample values, <inline-formula id="j_infor404_ineq_027">
<alternatives><mml:math><mml:msub><mml:mrow><mml:mi mathvariant="italic">x</mml:mi></mml:mrow><mml:mrow><mml:mo mathvariant="normal" fence="true" stretchy="false">(</mml:mo><mml:mn>1</mml:mn><mml:mo mathvariant="normal" fence="true" stretchy="false">)</mml:mo></mml:mrow></mml:msub></mml:math><tex-math><![CDATA[$
{x_{(1)}}$]]></tex-math></alternatives></inline-formula> being the smallest,</p>
</list-item>
<list-item id="j_infor404_li_049">
<label>•</label>
<p><inline-formula id="j_infor404_ineq_028">
<alternatives><mml:math><mml:msub><mml:mrow><mml:mi mathvariant="italic">a</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">i</mml:mi></mml:mrow></mml:msub></mml:math><tex-math><![CDATA[$
{a_{i}}$]]></tex-math></alternatives></inline-formula> are constants generated from the means, variances and covariances of the order statistics of a sample of size <italic>n</italic> from a standard normal distribution,</p>
</list-item>
<list-item id="j_infor404_li_050">
<label>•</label>
<p><inline-formula id="j_infor404_ineq_029">
<alternatives><mml:math><mml:mover accent="true"><mml:mrow><mml:mi mathvariant="italic">x</mml:mi></mml:mrow><mml:mo stretchy="false">¯</mml:mo></mml:mover></mml:math><tex-math><![CDATA[$
\bar{x}$]]></tex-math></alternatives></inline-formula> is the sample mean,</p>
</list-item>
<list-item id="j_infor404_li_051">
<label>•</label>
<p><italic>n</italic> is the sample size.</p>
</list-item>
</list>
<p>We reject the <italic>null</italic> hypothesis <inline-formula id="j_infor404_ineq_030">
<alternatives><mml:math><mml:msub><mml:mrow><mml:mi mathvariant="italic">H</mml:mi></mml:mrow><mml:mrow><mml:mn>0</mml:mn></mml:mrow></mml:msub></mml:math><tex-math><![CDATA[$
{H_{0}}$]]></tex-math></alternatives></inline-formula> that the sample comes from a normal distribution if <inline-formula id="j_infor404_ineq_031">
<alternatives><mml:math><mml:mi mathvariant="italic">W</mml:mi><mml:mo mathvariant="normal">&lt;</mml:mo><mml:msub><mml:mrow><mml:mi mathvariant="italic">W</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">α</mml:mi></mml:mrow></mml:msub></mml:math><tex-math><![CDATA[$
W<{W_{\alpha }}$]]></tex-math></alternatives></inline-formula>, where <inline-formula id="j_infor404_ineq_032">
<alternatives><mml:math><mml:msub><mml:mrow><mml:mi mathvariant="italic">W</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">α</mml:mi></mml:mrow></mml:msub></mml:math><tex-math><![CDATA[$
{W_{\alpha }}$]]></tex-math></alternatives></inline-formula> is the critical value at significance level <italic>α</italic>. We used the <italic>scipy.stats</italic> package for Python to perform the Shapiro–Wilk test.</p>
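<p>In practice, <monospace>scipy.stats.shapiro</monospace> returns the statistic <italic>W</italic> together with a p-value. Rejecting H<sub>0</sub> when the p-value falls below α is equivalent to the rule that <italic>W</italic> falls below <italic>W</italic><sub>α</sub>. The sketch below applies the test to synthetic per-fold accuracies: 30 values drawn from a normal distribution with a fixed seed, chosen for illustration only.</p>
<preformat preformat-type="code">
```python
# Sketch: checking normality of per-fold accuracies before applying Welch's T-test.
# The accuracies are synthetic; they are not results from our experiment.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
accuracies = rng.normal(loc=0.94, scale=0.01, size=30)  # 30 hypothetical folds

W, p_value = stats.shapiro(accuracies)
alpha = 0.05
# Fail to reject H0 (normality is plausible) when the p-value is at least alpha.
normal_plausible = p_value >= alpha
print(W, p_value, normal_plausible)
```
</preformat>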
</sec>
<sec id="j_infor404_s_020">
<label>3.4.4</label>
<title>Ranking Techniques</title>
<p>We used the following ranking techniques: 
<list>
<list-item id="j_infor404_li_052">
<label>1.</label>
<p>Standard Competition Ranking (SCR), where equal items get the same ranking number and a gap is then left in the ranking numbers, e.g. “1224” ranking.</p>
</list-item>
<list-item id="j_infor404_li_053">
<label>2.</label>
<p>Dense Ranking (DR), where equal items get the same ranking number and the next item gets the immediately following ranking number, e.g. “1223” ranking.</p>
</list-item>
<list-item id="j_infor404_li_054">
<label>3.</label>
<p>Fractional Ranking (FR), where equal items get the same ranking number, which is the mean of the ranks they would have under ordinal ranking, e.g. “1 2.5 2.5 4” ranking.</p>
</list-item>
</list>
</p>
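<p>The three ranking techniques correspond to the <monospace>min</monospace>, <monospace>dense</monospace> and <monospace>average</monospace> methods of <monospace>scipy.stats.rankdata</monospace>. The sketch below treats only exactly equal scores as ties. In our experiment, whether two classifiers are tied is decided by Welch’s T-test (Section <xref rid="j_infor404_s_018">3.4.2</xref>), which the sketch does not model. The accuracies are hypothetical.</p>
<preformat preformat-type="code">
```python
# Sketch of SCR, DR and FR with scipy.stats.rankdata.
# Assumption: ties are exactly equal scores; no statistical test is applied here.
from scipy.stats import rankdata

# Hypothetical mean accuracies of four classifiers (higher is better).
accuracies = [0.97, 0.95, 0.95, 0.93]

# rankdata ranks in ascending order, so negate to give rank 1 to the best accuracy.
neg = [-a for a in accuracies]
scr = rankdata(neg, method="min")      # Standard Competition Ranking: 1 2 2 4
dr = rankdata(neg, method="dense")     # Dense Ranking: 1 2 2 3
fr = rankdata(neg, method="average")   # Fractional Ranking: 1 2.5 2.5 4
print(scr, dr, fr)
```
</preformat>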
</sec>
</sec>
<sec id="j_infor404_s_021">
<label>3.5</label>
<title>Validity</title>
<p>In our experiment we used the classification accuracy measure described in Section <xref rid="j_infor404_s_017">3.4.1</xref>, formula (<xref rid="j_infor404_eq_002">2</xref>), and balanced datasets (see Section <xref rid="j_infor404_s_012">3.3</xref>). Classification accuracy has high construct validity on balanced datasets.</p>
<p>We used cross-validation with 30 stratified folds to evaluate classification accuracy. This estimates how well each model fits the data and how well it is likely to generalize to new data.</p>
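<p>The minimal sketch below runs 30-fold stratified cross-validation with scikit-learn’s <monospace>StratifiedKFold</monospace> and <monospace>cross_val_score</monospace>. Synthetic data from <monospace>make_classification</monospace> stand in for the phishing datasets. The CART settings are taken from the UCI-2015 column of Table <xref rid="j_infor404_tab_002">2</xref> purely as an example. Shuffling and the random seeds are assumptions of this sketch.</p>
<preformat preformat-type="code">
```python
# Sketch: 30-fold stratified cross-validation of one classifier with scikit-learn.
# The data are synthetic; none of the phishing datasets is loaded here.
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=600, n_features=20, weights=[0.5, 0.5], random_state=0)

cv = StratifiedKFold(n_splits=30, shuffle=True, random_state=0)
clf = DecisionTreeClassifier(criterion="entropy", max_depth=9, min_samples_leaf=2, random_state=0)
scores = cross_val_score(clf, X, y, cv=cv, scoring="accuracy")

# Stratification keeps the class proportion of every test fold close to that of y.
fold_ratios = [y[test].mean() for _, test in cv.split(X, y)]
print(scores.mean(), min(fold_ratios), max(fold_ratios))
```
</preformat>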
<p>Welch’s T-test was used to determine whether the mean classification accuracies produced by any two classifiers on the same dataset differ significantly. This prevented classifiers whose results had no statistically significant difference from being ranked differently.</p>
<p>We used three different ranking techniques to mitigate ranking bias, since different ranking techniques can produce different outcomes.</p>
<p>We provide the source code of our experiment to other researchers at <ext-link ext-link-type="uri" xlink:href="https://github.com/PauliusVaitkevicius/Exp001">https://github.com/PauliusVaitkevicius/Exp001</ext-link>.</p>
</sec>
</sec>
<sec id="j_infor404_s_022">
<label>4</label>
<title>Results</title>
<p>In this section we present our experiment results based on the research methodology described in Section <xref rid="j_infor404_s_006">3</xref>.</p>
<p>First, we configured the selected classification algorithms (described in Section <xref rid="j_infor404_s_011">3.2</xref>) for each dataset (described in Section <xref rid="j_infor404_s_012">3.3</xref>). We used the implementations of all selected algorithms from the scikit-learn library (version 0.20.1) for Python (version 3.7.1), which provides open-source tools for data mining and data analysis (Pedregosa <italic>et al.</italic>, <xref ref-type="bibr" rid="j_infor404_ref_021">2011</xref>). Next, we selected the best-fitting hyper-parameters for each algorithm on each dataset using 30-fold cross-validation, following our experimental design described in Section <xref rid="j_infor404_s_007">3.1</xref>. The selected hyper-parameters for each classifier are listed in Table <xref rid="j_infor404_tab_002">2</xref>. They differ across datasets because the hyper-parameter selection technique described in Section <xref rid="j_infor404_s_007">3.1</xref> Part I was applied to datasets with different designs and sizes.</p>
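<p>Hyper-parameter selection with 30-fold cross-validation could look like the sketch below, which uses scikit-learn’s <monospace>GridSearchCV</monospace> as a stand-in. It does not reproduce the exact procedure of Section <xref rid="j_infor404_s_007">3.1</xref> Part I. The search space and the synthetic data are hypothetical.</p>
<preformat preformat-type="code">
```python
# Sketch: choosing CART hyper-parameters with 30-fold cross-validated grid search.
# GridSearchCV is a stand-in; the selection procedure used in the study may differ.
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=600, n_features=20, random_state=0)  # synthetic

param_grid = {  # hypothetical search space
    "max_depth": [3, 5, 7, 9, 11],
    "criterion": ["gini", "entropy"],
    "min_samples_leaf": [1, 2, 4],
}
search = GridSearchCV(
    DecisionTreeClassifier(random_state=0),
    param_grid,
    scoring="accuracy",
    cv=StratifiedKFold(n_splits=30, shuffle=True, random_state=0),
)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```
</preformat>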
<table-wrap id="j_infor404_tab_002">
<label>Table 2</label>
<caption>
<p>Hyper-parameters used in the experiment.</p>
</caption>
<table>
<thead>
<tr>
<td style="vertical-align: top; text-align: left; border-top: solid thin; border-bottom: solid thin">Algorithm</td>
<td style="vertical-align: top; text-align: left; border-top: solid thin; border-bottom: solid thin">UCI-2015</td>
<td style="vertical-align: top; text-align: left; border-top: solid thin; border-bottom: solid thin">UCI-2016</td>
<td style="vertical-align: top; text-align: left; border-top: solid thin; border-bottom: solid thin">MDP-2018</td>
</tr>
</thead>
<tbody>
<tr>
<td style="vertical-align: top; text-align: left">AdaBoost</td>
<td style="vertical-align: top; text-align: left"><italic># of estimators</italic>: 200;</td>
<td style="vertical-align: top; text-align: left"><italic># of estimators</italic>: 50; <italic>Algorithm</italic>: SAMME;</td>
<td style="vertical-align: top; text-align: left"><italic># of estimators</italic>: 200;</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">CART</td>
<td style="vertical-align: top; text-align: left"><italic>Max tree depth</italic>: 9; <italic>Split evaluation criteria</italic>: entropy; <italic>Min samples at leaf node</italic>: 2;</td>
<td style="vertical-align: top; text-align: left"><italic>Max tree depth</italic>: 9; <italic>Split evaluation criteria</italic>: entropy; <italic>Min samples at leaf node</italic>: 2;</td>
<td style="vertical-align: top; text-align: left"><italic>Max tree depth</italic>: 5; <italic>Split evaluation criteria</italic>: entropy; <italic>Min samples at leaf node</italic>: 2;</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">Gradient Tree Boosting</td>
<td style="vertical-align: top; text-align: left"><italic>Max estimator depth</italic>: 1; <italic>Learning rate</italic>: 1;</td>
<td style="vertical-align: top; text-align: left"><italic>Min samples at leaf node</italic>: 2; <italic>Learning rate</italic>: 1; <italic># of estimators</italic>: 50;</td>
<td style="vertical-align: top; text-align: left"><italic>Max estimator depth</italic>: 1; <italic>Learning rate</italic>: 1;</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">k-Nearest Neighbours</td>
<td style="vertical-align: top; text-align: left"><italic>Number of neighbours:</italic> 5; <italic>Weights:</italic> uniform weights – all points in each neighbourhood are weighted equally; <italic>Algorithm:</italic> auto;</td>
<td style="vertical-align: top; text-align: left"><italic>Number of neighbours:</italic> 5; <italic>Weights:</italic> uniform weights – all points in each neighbourhood are weighted equally; <italic>Algorithm:</italic> auto;</td>
<td style="vertical-align: top; text-align: left"><italic>Number of neighbours:</italic> 5; <italic>Weights:</italic> uniform weights – all points in each neighbourhood are weighted equally; <italic>Algorithm:</italic> auto;</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">Naïve–Bayes</td>
<td style="vertical-align: top; text-align: left">Multivariate Bernoulli models;</td>
<td style="vertical-align: top; text-align: left">Multivariate Bernoulli models;</td>
<td style="vertical-align: top; text-align: left">Multivariate Bernoulli models;</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">Multilayer Perceptron</td>
<td style="vertical-align: top; text-align: left"><italic>Hidden layers</italic>: 30; <italic>Number of max iterations</italic>: 3000;</td>
<td style="vertical-align: top; text-align: left"><italic>Hidden layers</italic>: 150; <italic>Number of max iterations</italic>: 1000;</td>
<td style="vertical-align: top; text-align: left"><italic>Hidden layers</italic>: 100; <italic>Number of max iterations</italic>: 1000;</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">Random Forest</td>
<td style="vertical-align: top; text-align: left"><italic># of estimators</italic>: 7; <italic>Max tree depth</italic>: 11; <italic>Split evaluation criteria</italic>: entropy;</td>
<td style="vertical-align: top; text-align: left"><italic># of estimators</italic>: 7; <italic>Max tree depth</italic>: 8; <italic>Split evaluation criteria</italic>: entropy;</td>
<td style="vertical-align: top; text-align: left"><italic># of estimators</italic>: 7; <italic>Max tree depth</italic>: 11; <italic>Split evaluation criteria</italic>: entropy;</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">SVM with linear kernel</td>
<td style="vertical-align: top; text-align: left"><italic>Penalty parameter C:</italic> 1.0; <italic>Kernel:</italic> Linear;</td>
<td style="vertical-align: top; text-align: left"><italic>Penalty parameter C:</italic> 1.0; <italic>Kernel:</italic> Linear;</td>
<td style="vertical-align: top; text-align: left"><italic>Penalty parameter C:</italic> 1.0; <italic>Kernel:</italic> Linear;</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">SVM with 1st deg. pol. kernel</td>
<td style="vertical-align: top; text-align: left"><italic>Penalty parameter C:</italic> 1.0; <italic>Kernel:</italic> Polynomial; <italic>Degree of the polynomial kernel function:</italic> 1;</td>
<td style="vertical-align: top; text-align: left"><italic>Penalty parameter C:</italic> 1.0; <italic>Kernel:</italic> Polynomial; <italic>Degree of the polynomial kernel function:</italic> 1;</td>
<td style="vertical-align: top; text-align: left"><italic>Penalty parameter C:</italic> 1.0; <italic>Kernel:</italic> Polynomial; <italic>Degree of the polynomial kernel function:</italic> 1;</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left; border-bottom: solid thin">SVM with 2nd deg. pol. kernel</td>
<td style="vertical-align: top; text-align: left; border-bottom: solid thin"><italic>Penalty parameter C:</italic> 1.0; <italic>Kernel:</italic> Polynomial; <italic>Degree of the polynomial kernel function:</italic> 2;</td>
<td style="vertical-align: top; text-align: left; border-bottom: solid thin"><italic>Penalty parameter C:</italic> 1.0; <italic>Kernel:</italic> Polynomial; <italic>Degree of the polynomial kernel function:</italic> 2;</td>
<td style="vertical-align: top; text-align: left; border-bottom: solid thin"><italic>Penalty parameter C:</italic> 1.0; <italic>Kernel:</italic> Polynomial; <italic>Degree of the polynomial kernel function:</italic> 2;</td>
</tr>
</tbody>
</table>
</table-wrap>
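<p>The UCI-2015 column of Table 2 can be expressed as scikit-learn estimators roughly as below. The parameter names are scikit-learn’s, and the mapping from the table’s wording is an interpretation. Parameters not listed in Table 2 are left at their library defaults. Reading “Hidden layers: 30” as a single hidden layer of 30 neurons is an assumption.</p>
<preformat preformat-type="code">
```python
# Sketch: the UCI-2015 column of Table 2 as scikit-learn estimators.
# The mapping from the table's wording to scikit-learn parameters is an interpretation.
from sklearn.ensemble import AdaBoostClassifier, GradientBoostingClassifier, RandomForestClassifier
from sklearn.naive_bayes import BernoulliNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

uci_2015_models = {
    "AdaBoost": AdaBoostClassifier(n_estimators=200),
    "CART": DecisionTreeClassifier(max_depth=9, criterion="entropy", min_samples_leaf=2),
    "Gradient Tree Boosting": GradientBoostingClassifier(max_depth=1, learning_rate=1.0),
    "k-Nearest Neighbours": KNeighborsClassifier(n_neighbors=5, weights="uniform", algorithm="auto"),
    "Naïve–Bayes": BernoulliNB(),  # multivariate Bernoulli model
    # Assumption: "Hidden layers: 30" read as one hidden layer of 30 neurons.
    "Multilayer Perceptron": MLPClassifier(hidden_layer_sizes=(30,), max_iter=3000),
    "Random Forest": RandomForestClassifier(n_estimators=7, max_depth=11, criterion="entropy"),
    "SVM with linear kernel": SVC(C=1.0, kernel="linear"),
    "SVM with 1st deg. pol. kernel": SVC(C=1.0, kernel="poly", degree=1),
    "SVM with 2nd deg. pol. kernel": SVC(C=1.0, kernel="poly", degree=2),
}
```
</preformat>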
<p>Subsequently, we trained and tested all the classifiers chosen for this experiment on all the datasets. We measured classification performance by accuracy, i.e. the proportion of phishing and legitimate URLs that are classified correctly out of all URLs in the dataset, as described in Section <xref rid="j_infor404_s_017">3.4.1</xref>, formula (<xref rid="j_infor404_eq_002">2</xref>). Classification results are given in Table <xref rid="j_infor404_tab_003">3</xref>. By average accuracy, the initial results showed that the Gradient Tree Boosting algorithm performed best on the MDP-2018 and UCI-2016 datasets, and the Multilayer Perceptron with backpropagation performed best on the UCI-2015 dataset.</p>
<table-wrap id="j_infor404_tab_003">
<label>Table 3</label>
<caption>
<p>Classification results by average classification accuracy.</p>
</caption>
<table>
<thead>
<tr>
<td style="vertical-align: top; text-align: left; border-top: solid thin; border-bottom: solid thin">Algorithm</td>
<td style="vertical-align: top; text-align: left; border-top: solid thin; border-bottom: solid thin">UCI-2015</td>
<td style="vertical-align: top; text-align: left; border-top: solid thin; border-bottom: solid thin">UCI-2016</td>
<td style="vertical-align: top; text-align: left; border-top: solid thin; border-bottom: solid thin">MDP-2018</td>
</tr>
</thead>
<tbody>
<tr>
<td style="vertical-align: top; text-align: left">AdaBoost</td>
<td style="vertical-align: top; text-align: left">0.9352</td>
<td style="vertical-align: top; text-align: left">0.8495</td>
<td style="vertical-align: top; text-align: left">0.9728</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">CART</td>
<td style="vertical-align: top; text-align: left">0.9363</td>
<td style="vertical-align: top; text-align: left">0.8930</td>
<td style="vertical-align: top; text-align: left">0.9574</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">Gradient Tree Boosting</td>
<td style="vertical-align: top; text-align: left">0.9381</td>
<td style="vertical-align: top; text-align: left"><bold>0.9034</bold></td>
<td style="vertical-align: top; text-align: left"><bold>0.9742</bold></td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">k-Nearest Neighbours</td>
<td style="vertical-align: top; text-align: left">0.9481</td>
<td style="vertical-align: top; text-align: left">0.8641</td>
<td style="vertical-align: top; text-align: left">0.8564</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">Naïve–Bayes</td>
<td style="vertical-align: top; text-align: left">0.9057</td>
<td style="vertical-align: top; text-align: left">0.8225</td>
<td style="vertical-align: top; text-align: left">0.9177</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">Multilayer Perceptron</td>
<td style="vertical-align: top; text-align: left"><bold>0.9722</bold></td>
<td style="vertical-align: top; text-align: left">0.9028</td>
<td style="vertical-align: top; text-align: left">0.9671</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">Random Forest</td>
<td style="vertical-align: top; text-align: left">0.9525</td>
<td style="vertical-align: top; text-align: left">0.8916</td>
<td style="vertical-align: top; text-align: left">0.9715</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">SVM with linear kernel</td>
<td style="vertical-align: top; text-align: left">0.9271</td>
<td style="vertical-align: top; text-align: left">0.8365</td>
<td style="vertical-align: top; text-align: left">0.9422</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">SVM with 1st deg. pol. kernel</td>
<td style="vertical-align: top; text-align: left">0.9257</td>
<td style="vertical-align: top; text-align: left">0.8328</td>
<td style="vertical-align: top; text-align: left">0.9334</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left; border-bottom: solid thin">SVM with 2nd deg. pol. kernel</td>
<td style="vertical-align: top; text-align: left; border-bottom: solid thin">0.9388</td>
<td style="vertical-align: top; text-align: left; border-bottom: solid thin">0.7152</td>
<td style="vertical-align: top; text-align: left; border-bottom: solid thin">0.9549</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>Next, we compared all classification results pairwise within each dataset using Welch’s t-test, as described in Section <xref rid="j_infor404_s_018">3.4.2</xref>, formula (<xref rid="j_infor404_eq_004">3</xref>), to check whether their differences were statistically significant. We then ordered the classifiers by their performance on each dataset using three ranking techniques, as described in Section <xref rid="j_infor404_s_007">3.1</xref>: Standard Competition Ranking (SCR), Fractional Ranking (FR), and Dense Ranking (DR). Classifiers whose results had no statistically significant differences were given equal ranks. Each classifier then received points according to its rank, from 10 points for rank 1 down to 1 point for rank 10 (i.e. 11 minus the rank), so a fractional rank yields fractional points.</p>
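<p>Welch’s t-test does not assume that the two samples have equal variances. The sketch below uses only the Python standard library. It computes the t statistic, the Welch–Satterthwaite degrees of freedom, and a two-sided p-value. The p-value is obtained by integrating the Student t density numerically with Simpson’s rule. The per-fold accuracies and the 0.05 significance level are illustrative assumptions, not values from the experiment.</p>

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Return Welch's t statistic and the Welch-Satterthwaite degrees of freedom."""
    se2_a = variance(a) / len(a)  # squared standard error of sample a
    se2_b = variance(b) / len(b)  # squared standard error of sample b
    t = (mean(a) - mean(b)) / math.sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (len(a) - 1) + se2_b ** 2 / (len(b) - 1))
    return t, df


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def two_sided_p(t, df, steps=2000):
    """Two-sided p-value: 1 - 2 * (integral of the t density over [0, |t|]) by Simpson's rule."""
    h = abs(t) / steps  # steps must be even for Simpson's rule
    area = t_pdf(0.0, df) + t_pdf(abs(t), df)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return max(0.0, 1.0 - 2.0 * area * h / 3.0)


# Hypothetical per-fold accuracies of two classifiers on one dataset.
acc_a = [0.97, 0.96, 0.98, 0.97, 0.97]
acc_b = [0.95, 0.96, 0.95, 0.94, 0.96]
ALPHA = 0.05  # assumed significance level

t, df = welch_t(acc_a, acc_b)
p = two_sided_p(t, df)
print(f"t = {t:.3f}, df = {df:.2f}, p = {p:.4f}")
print("equal ranks" if p >= ALPHA else "significantly different")
```

<p>Where SciPy is available, scipy.stats.ttest_ind(a, b, equal_var=False) performs the same test and can replace the hand-written integration.</p>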
<p>Ranking results for the UCI-2015 dataset are presented in Table <xref rid="j_infor404_tab_004">4</xref>. Multilayer Perceptron ranks first under all three ranking techniques.</p>
<table-wrap id="j_infor404_tab_004">
<label>Table 4</label>
<caption>
<p>Classifier rankings on UCI-2015 dataset.</p>
</caption>
<table>
<thead>
<tr>
<td style="vertical-align: top; text-align: left; border-top: solid thin; border-bottom: solid thin">SCR rank</td>
<td style="vertical-align: top; text-align: left; border-top: solid thin; border-bottom: solid thin">FR rank</td>
<td style="vertical-align: top; text-align: left; border-top: solid thin; border-bottom: solid thin">DR rank</td>
<td style="vertical-align: top; text-align: left; border-top: solid thin; border-bottom: solid thin">Algorithm</td>
<td style="vertical-align: top; text-align: left; border-top: solid thin; border-bottom: solid thin">SCR points</td>
<td style="vertical-align: top; text-align: left; border-top: solid thin; border-bottom: solid thin">FR points</td>
<td style="vertical-align: top; text-align: left; border-top: solid thin; border-bottom: solid thin">DR points</td>
</tr>
</thead>
<tbody>
<tr>
<td style="vertical-align: top; text-align: left">1</td>
<td style="vertical-align: top; text-align: left">1</td>
<td style="vertical-align: top; text-align: left">1</td>
<td style="vertical-align: top; text-align: left">Multilayer Perceptron</td>
<td style="vertical-align: top; text-align: left">10</td>
<td style="vertical-align: top; text-align: left">10</td>
<td style="vertical-align: top; text-align: left">10</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">2</td>
<td style="vertical-align: top; text-align: left">4.5</td>
<td style="vertical-align: top; text-align: left">2</td>
<td style="vertical-align: top; text-align: left">Random Forest</td>
<td style="vertical-align: top; text-align: left">9</td>
<td style="vertical-align: top; text-align: left">6.5</td>
<td style="vertical-align: top; text-align: left">9</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">2</td>
<td style="vertical-align: top; text-align: left">4.5</td>
<td style="vertical-align: top; text-align: left">2</td>
<td style="vertical-align: top; text-align: left">k-Nearest Neighbours</td>
<td style="vertical-align: top; text-align: left">9</td>
<td style="vertical-align: top; text-align: left">6.5</td>
<td style="vertical-align: top; text-align: left">9</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">2</td>
<td style="vertical-align: top; text-align: left">4.5</td>
<td style="vertical-align: top; text-align: left">2</td>
<td style="vertical-align: top; text-align: left">SVM with 2nd deg. pol. kernel</td>
<td style="vertical-align: top; text-align: left">9</td>
<td style="vertical-align: top; text-align: left">6.5</td>
<td style="vertical-align: top; text-align: left">9</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">2</td>
<td style="vertical-align: top; text-align: left">4.5</td>
<td style="vertical-align: top; text-align: left">2</td>
<td style="vertical-align: top; text-align: left">Gradient Tree Boosting</td>
<td style="vertical-align: top; text-align: left">9</td>
<td style="vertical-align: top; text-align: left">6.5</td>
<td style="vertical-align: top; text-align: left">9</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">2</td>
<td style="vertical-align: top; text-align: left">4.5</td>
<td style="vertical-align: top; text-align: left">2</td>
<td style="vertical-align: top; text-align: left">CART</td>
<td style="vertical-align: top; text-align: left">9</td>
<td style="vertical-align: top; text-align: left">6.5</td>
<td style="vertical-align: top; text-align: left">9</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">2</td>
<td style="vertical-align: top; text-align: left">4.5</td>
<td style="vertical-align: top; text-align: left">2</td>
<td style="vertical-align: top; text-align: left">AdaBoost</td>
<td style="vertical-align: top; text-align: left">9</td>
<td style="vertical-align: top; text-align: left">6.5</td>
<td style="vertical-align: top; text-align: left">9</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">8</td>
<td style="vertical-align: top; text-align: left">8.5</td>
<td style="vertical-align: top; text-align: left">3</td>
<td style="vertical-align: top; text-align: left">SVM with linear kernel</td>
<td style="vertical-align: top; text-align: left">3</td>
<td style="vertical-align: top; text-align: left">2.5</td>
<td style="vertical-align: top; text-align: left">8</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">8</td>
<td style="vertical-align: top; text-align: left">8.5</td>
<td style="vertical-align: top; text-align: left">3</td>
<td style="vertical-align: top; text-align: left">SVM with 1st deg. pol. kernel</td>
<td style="vertical-align: top; text-align: left">3</td>
<td style="vertical-align: top; text-align: left">2.5</td>
<td style="vertical-align: top; text-align: left">8</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left; border-bottom: solid thin">10</td>
<td style="vertical-align: top; text-align: left; border-bottom: solid thin">10</td>
<td style="vertical-align: top; text-align: left; border-bottom: solid thin">4</td>
<td style="vertical-align: top; text-align: left; border-bottom: solid thin">Naïve–Bayes</td>
<td style="vertical-align: top; text-align: left; border-bottom: solid thin">1</td>
<td style="vertical-align: top; text-align: left; border-bottom: solid thin">1</td>
<td style="vertical-align: top; text-align: left; border-bottom: solid thin">7</td>
</tr>
</tbody>
</table>
</table-wrap>
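<p>The SCR, FR, and DR ranks and points in Table 4 can be reproduced from the tie groups with the short Python sketch below. It assumes that the groups of statistically indistinguishable classifiers are already known and ordered from best to worst. Forming those groups from the pairwise Welch tests is not shown. The function names are hypothetical.</p>

```python
def rank_groups(groups):
    """Assign (SCR, FR, DR) ranks to tie groups listed from best to worst.

    Classifiers inside one group showed no statistically significant
    difference, so they share a rank under every technique.
    """
    ranks = {}
    position = 1  # position of the first classifier of the current group
    for dense_rank, group in enumerate(groups, start=1):
        fractional_rank = position + (len(group) - 1) / 2  # mean of occupied positions
        for name in group:
            ranks[name] = (position, fractional_rank, dense_rank)
        position += len(group)
    return ranks


def points(rank, n_classifiers=10):
    """Convert a rank into points: rank 1 earns 10 points, rank 10 earns 1."""
    return n_classifiers + 1 - rank


# Tie groups for UCI-2015, read off Table 4.
uci_2015_groups = [
    ["Multilayer Perceptron"],
    ["Random Forest", "k-Nearest Neighbours", "SVM with 2nd deg. pol. kernel",
     "Gradient Tree Boosting", "CART", "AdaBoost"],
    ["SVM with linear kernel", "SVM with 1st deg. pol. kernel"],
    ["Naïve–Bayes"],
]

for name, (scr, fr, dr) in rank_groups(uci_2015_groups).items():
    print(f"{name:30} ranks {scr} / {fr} / {dr}   points {points(scr)} / {points(fr)} / {points(dr)}")
```

<p>For Random Forest, this prints ranks 2 / 4.5 / 2 and points 9 / 6.5 / 9, which matches Table 4. SCR gives a tied group the best position it occupies. FR averages the positions the group occupies. DR simply counts groups.</p>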
<p>Ranking results for the UCI-2016 dataset are presented in Table <xref rid="j_infor404_tab_005">5</xref>. Multilayer Perceptron, Gradient Tree Boosting, CART, and Random Forest share first place under all three ranking techniques, because the differences in their classification accuracy were not statistically significant.</p>
<table-wrap id="j_infor404_tab_005">
<label>Table 5</label>
<caption>
<p>Classifier rankings on UCI-2016 dataset.</p>
</caption>
<table>
<thead>
<tr>
<td style="vertical-align: top; text-align: left; border-top: solid thin; border-bottom: solid thin">SCR rank</td>
<td style="vertical-align: top; text-align: left; border-top: solid thin; border-bottom: solid thin">FR rank</td>
<td style="vertical-align: top; text-align: left; border-top: solid thin; border-bottom: solid thin">DR rank</td>
<td style="vertical-align: top; text-align: left; border-top: solid thin; border-bottom: solid thin">Algorithm</td>
<td style="vertical-align: top; text-align: left; border-top: solid thin; border-bottom: solid thin">SCR points</td>
<td style="vertical-align: top; text-align: left; border-top: solid thin; border-bottom: solid thin">FR points</td>
<td style="vertical-align: top; text-align: left; border-top: solid thin; border-bottom: solid thin">DR points</td>
</tr>
</thead>
<tbody>
<tr>
<td style="vertical-align: top; text-align: left">1</td>
<td style="vertical-align: top; text-align: left">2.5</td>
<td style="vertical-align: top; text-align: left">1</td>
<td style="vertical-align: top; text-align: left">Gradient Tree Boosting</td>
<td style="vertical-align: top; text-align: left">10</td>
<td style="vertical-align: top; text-align: left">8.5</td>
<td style="vertical-align: top; text-align: left">10</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">1</td>
<td style="vertical-align: top; text-align: left">2.5</td>
<td style="vertical-align: top; text-align: left">1</td>
<td style="vertical-align: top; text-align: left">Multilayer Perceptron</td>
<td style="vertical-align: top; text-align: left">10</td>
<td style="vertical-align: top; text-align: left">8.5</td>
<td style="vertical-align: top; text-align: left">10</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">1</td>
<td style="vertical-align: top; text-align: left">2.5</td>
<td style="vertical-align: top; text-align: left">1</td>
<td style="vertical-align: top; text-align: left">CART</td>
<td style="vertical-align: top; text-align: left">10</td>
<td style="vertical-align: top; text-align: left">8.5</td>
<td style="vertical-align: top; text-align: left">10</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">1</td>
<td style="vertical-align: top; text-align: left">2.5</td>
<td style="vertical-align: top; text-align: left">1</td>
<td style="vertical-align: top; text-align: left">Random Forest</td>
<td style="vertical-align: top; text-align: left">10</td>
<td style="vertical-align: top; text-align: left">8.5</td>
<td style="vertical-align: top; text-align: left">10</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">5</td>
<td style="vertical-align: top; text-align: left">7</td>
<td style="vertical-align: top; text-align: left">2</td>
<td style="vertical-align: top; text-align: left">k-Nearest Neighbours</td>
<td style="vertical-align: top; text-align: left">6</td>
<td style="vertical-align: top; text-align: left">4</td>
<td style="vertical-align: top; text-align: left">9</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">5</td>
<td style="vertical-align: top; text-align: left">7</td>
<td style="vertical-align: top; text-align: left">2</td>
<td style="vertical-align: top; text-align: left">AdaBoost</td>
<td style="vertical-align: top; text-align: left">6</td>
<td style="vertical-align: top; text-align: left">4</td>
<td style="vertical-align: top; text-align: left">9</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">5</td>
<td style="vertical-align: top; text-align: left">7</td>
<td style="vertical-align: top; text-align: left">2</td>
<td style="vertical-align: top; text-align: left">SVM with linear kernel</td>
<td style="vertical-align: top; text-align: left">6</td>
<td style="vertical-align: top; text-align: left">4</td>
<td style="vertical-align: top; text-align: left">9</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">5</td>
<td style="vertical-align: top; text-align: left">7</td>
<td style="vertical-align: top; text-align: left">2</td>
<td style="vertical-align: top; text-align: left">SVM with 1st deg. pol. kernel</td>
<td style="vertical-align: top; text-align: left">6</td>
<td style="vertical-align: top; text-align: left">4</td>
<td style="vertical-align: top; text-align: left">9</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">5</td>
<td style="vertical-align: top; text-align: left">7</td>
<td style="vertical-align: top; text-align: left">2</td>
<td style="vertical-align: top; text-align: left">Naïve–Bayes</td>
<td style="vertical-align: top; text-align: left">6</td>
<td style="vertical-align: top; text-align: left">4</td>
<td style="vertical-align: top; text-align: left">9</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left; border-bottom: solid thin">10</td>
<td style="vertical-align: top; text-align: left; border-bottom: solid thin">10</td>
<td style="vertical-align: top; text-align: left; border-bottom: solid thin">3</td>
<td style="vertical-align: top; text-align: left; border-bottom: solid thin">SVM with 2nd deg. pol. kernel</td>
<td style="vertical-align: top; text-align: left; border-bottom: solid thin">1</td>
<td style="vertical-align: top; text-align: left; border-bottom: solid thin">1</td>
<td style="vertical-align: top; text-align: left; border-bottom: solid thin">8</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>Lastly, ranking results for the MDP-2018 dataset are presented in Table <xref rid="j_infor404_tab_006">6</xref>. Gradient Tree Boosting, AdaBoost, and Random Forest share first place under all three ranking techniques.</p>
<table-wrap id="j_infor404_tab_006">
<label>Table 6</label>
<caption>
<p>Classifier rankings on MDP-2018 dataset.</p>
</caption>
<table>
<thead>
<tr>
<td style="vertical-align: top; text-align: left; border-top: solid thin; border-bottom: solid thin">SCR rank</td>
<td style="vertical-align: top; text-align: left; border-top: solid thin; border-bottom: solid thin">FR rank</td>
<td style="vertical-align: top; text-align: left; border-top: solid thin; border-bottom: solid thin">DR rank</td>
<td style="vertical-align: top; text-align: left; border-top: solid thin; border-bottom: solid thin">Algorithm</td>
<td style="vertical-align: top; text-align: left; border-top: solid thin; border-bottom: solid thin">SCR points</td>
<td style="vertical-align: top; text-align: left; border-top: solid thin; border-bottom: solid thin">FR points</td>
<td style="vertical-align: top; text-align: left; border-top: solid thin; border-bottom: solid thin">DR points</td>
</tr>
</thead>
<tbody>
<tr>
<td style="vertical-align: top; text-align: left">1</td>
<td style="vertical-align: top; text-align: left">2</td>
<td style="vertical-align: top; text-align: left">1</td>
<td style="vertical-align: top; text-align: left">Gradient Tree Boosting</td>
<td style="vertical-align: top; text-align: left">10</td>
<td style="vertical-align: top; text-align: left">9</td>
<td style="vertical-align: top; text-align: left">10</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">1</td>
<td style="vertical-align: top; text-align: left">2</td>
<td style="vertical-align: top; text-align: left">1</td>
<td style="vertical-align: top; text-align: left">AdaBoost</td>
<td style="vertical-align: top; text-align: left">10</td>
<td style="vertical-align: top; text-align: left">9</td>
<td style="vertical-align: top; text-align: left">10</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">1</td>
<td style="vertical-align: top; text-align: left">2</td>
<td style="vertical-align: top; text-align: left">1</td>
<td style="vertical-align: top; text-align: left">Random Forest</td>
<td style="vertical-align: top; text-align: left">10</td>
<td style="vertical-align: top; text-align: left">9</td>
<td style="vertical-align: top; text-align: left">10</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">4</td>
<td style="vertical-align: top; text-align: left">4</td>
<td style="vertical-align: top; text-align: left">2</td>
<td style="vertical-align: top; text-align: left">Multilayer Perceptron</td>
<td style="vertical-align: top; text-align: left">7</td>
<td style="vertical-align: top; text-align: left">7</td>
<td style="vertical-align: top; text-align: left">9</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">5</td>
<td style="vertical-align: top; text-align: left">5.5</td>
<td style="vertical-align: top; text-align: left">3</td>
<td style="vertical-align: top; text-align: left">CART</td>
<td style="vertical-align: top; text-align: left">6</td>
<td style="vertical-align: top; text-align: left">5.5</td>
<td style="vertical-align: top; text-align: left">8</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">5</td>
<td style="vertical-align: top; text-align: left">5.5</td>
<td style="vertical-align: top; text-align: left">3</td>
<td style="vertical-align: top; text-align: left">SVM with 2nd deg. pol. kernel</td>
<td style="vertical-align: top; text-align: left">6</td>
<td style="vertical-align: top; text-align: left">5.5</td>
<td style="vertical-align: top; text-align: left">8</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">7</td>
<td style="vertical-align: top; text-align: left">7.5</td>
<td style="vertical-align: top; text-align: left">4</td>
<td style="vertical-align: top; text-align: left">SVM with linear kernel</td>
<td style="vertical-align: top; text-align: left">4</td>
<td style="vertical-align: top; text-align: left">3.5</td>
<td style="vertical-align: top; text-align: left">7</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">7</td>
<td style="vertical-align: top; text-align: left">7.5</td>
<td style="vertical-align: top; text-align: left">4</td>
<td style="vertical-align: top; text-align: left">SVM with 1st deg. pol. kernel</td>
<td style="vertical-align: top; text-align: left">4</td>
<td style="vertical-align: top; text-align: left">3.5</td>
<td style="vertical-align: top; text-align: left">7</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">9</td>
<td style="vertical-align: top; text-align: left">9</td>
<td style="vertical-align: top; text-align: left">5</td>
<td style="vertical-align: top; text-align: left">Naïve–Bayes</td>
<td style="vertical-align: top; text-align: left">2</td>
<td style="vertical-align: top; text-align: left">2</td>
<td style="vertical-align: top; text-align: left">6</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left; border-bottom: solid thin">10</td>
<td style="vertical-align: top; text-align: left; border-bottom: solid thin">10</td>
<td style="vertical-align: top; text-align: left; border-bottom: solid thin">6</td>
<td style="vertical-align: top; text-align: left; border-bottom: solid thin">k-Nearest Neighbours</td>
<td style="vertical-align: top; text-align: left; border-bottom: solid thin">1</td>
<td style="vertical-align: top; text-align: left; border-bottom: solid thin">1</td>
<td style="vertical-align: top; text-align: left; border-bottom: solid thin">5</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>Finally, we computed the combined rankings in Table <xref rid="j_infor404_tab_007">7</xref> by summing the points each classifier scored on the three datasets. Different ranking techniques place different sets of algorithms in first place. With Standard Competition Ranking, Random Forest and Gradient Tree Boosting rank at the top. With Fractional Ranking, Multilayer Perceptron ranks at the top. With Dense Ranking, Random Forest, Multilayer Perceptron, and Gradient Tree Boosting share the top position. No single algorithm ranks at the top under all three ranking techniques.</p>
<table-wrap id="j_infor404_tab_007">
<label>Table 7</label>
<caption>
<p>Combined classifier rankings (sum of points over the three datasets).</p>
</caption>
<table>
<thead>
<tr>
<td style="vertical-align: top; text-align: left; border-top: solid thin; border-bottom: solid thin">Algorithm</td>
<td style="vertical-align: top; text-align: left; border-top: solid thin; border-bottom: solid thin">SCR points</td>
<td style="vertical-align: top; text-align: left; border-top: solid thin; border-bottom: solid thin">FR points</td>
<td style="vertical-align: top; text-align: left; border-top: solid thin; border-bottom: solid thin">DR points</td>
</tr>
</thead>
<tbody>
<tr>
<td style="vertical-align: top; text-align: left">Multilayer Perceptron</td>
<td style="vertical-align: top; text-align: left">27</td>
<td style="vertical-align: top; text-align: left">25.5</td>
<td style="vertical-align: top; text-align: left">29</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">Gradient Tree Boosting</td>
<td style="vertical-align: top; text-align: left">29</td>
<td style="vertical-align: top; text-align: left">24</td>
<td style="vertical-align: top; text-align: left">29</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">Random Forest</td>
<td style="vertical-align: top; text-align: left">29</td>
<td style="vertical-align: top; text-align: left">24</td>
<td style="vertical-align: top; text-align: left">29</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">AdaBoost</td>
<td style="vertical-align: top; text-align: left">25</td>
<td style="vertical-align: top; text-align: left">19.5</td>
<td style="vertical-align: top; text-align: left">28</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">CART</td>
<td style="vertical-align: top; text-align: left">25</td>
<td style="vertical-align: top; text-align: left">20.5</td>
<td style="vertical-align: top; text-align: left">27</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">SVM with 2nd deg. pol. kernel</td>
<td style="vertical-align: top; text-align: left">16</td>
<td style="vertical-align: top; text-align: left">13</td>
<td style="vertical-align: top; text-align: left">25</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">k-Nearest Neighbours</td>
<td style="vertical-align: top; text-align: left">16</td>
<td style="vertical-align: top; text-align: left">11.5</td>
<td style="vertical-align: top; text-align: left">23</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">SVM with linear kernel</td>
<td style="vertical-align: top; text-align: left">13</td>
<td style="vertical-align: top; text-align: left">10</td>
<td style="vertical-align: top; text-align: left">24</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">SVM with 1st deg. pol. kernel</td>
<td style="vertical-align: top; text-align: left">13</td>
<td style="vertical-align: top; text-align: left">10</td>
<td style="vertical-align: top; text-align: left">24</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left; border-bottom: solid thin">Naïve–Bayes</td>
<td style="vertical-align: top; text-align: left; border-bottom: solid thin">9</td>
<td style="vertical-align: top; text-align: left; border-bottom: solid thin">7</td>
<td style="vertical-align: top; text-align: left; border-bottom: solid thin">22</td>
</tr>
</tbody>
</table>
</table-wrap>
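<p>As a cross-check, the totals in Table 7 can be recomputed by summing the per-dataset points from Tables 4, 5, and 6. The minimal Python sketch below hard-codes those points in dataset order UCI-2015, UCI-2016, MDP-2018. The helper names are illustrative.</p>

```python
# (SCR, FR, DR) points per dataset, copied from Tables 4, 5 and 6.
POINTS = {
    "Multilayer Perceptron": [(10, 10, 10), (10, 8.5, 10), (7, 7, 9)],
    "Gradient Tree Boosting": [(9, 6.5, 9), (10, 8.5, 10), (10, 9, 10)],
    "Random Forest": [(9, 6.5, 9), (10, 8.5, 10), (10, 9, 10)],
    "AdaBoost": [(9, 6.5, 9), (6, 4, 9), (10, 9, 10)],
    "CART": [(9, 6.5, 9), (10, 8.5, 10), (6, 5.5, 8)],
    "SVM with 2nd deg. pol. kernel": [(9, 6.5, 9), (1, 1, 8), (6, 5.5, 8)],
    "k-Nearest Neighbours": [(9, 6.5, 9), (6, 4, 9), (1, 1, 5)],
    "SVM with linear kernel": [(3, 2.5, 8), (6, 4, 9), (4, 3.5, 7)],
    "SVM with 1st deg. pol. kernel": [(3, 2.5, 8), (6, 4, 9), (4, 3.5, 7)],
    "Naïve–Bayes": [(1, 1, 7), (6, 4, 9), (2, 2, 6)],
}


def combine(points_by_dataset):
    """Sum each classifier's (SCR, FR, DR) points over all datasets."""
    return {name: tuple(sum(column) for column in zip(*rows))
            for name, rows in points_by_dataset.items()}


def leaders(totals, technique):
    """Return, alphabetically, all classifiers with the highest total for one technique."""
    index = {"SCR": 0, "FR": 1, "DR": 2}[technique]
    best = max(total[index] for total in totals.values())
    return sorted(name for name, total in totals.items() if total[index] == best)


totals = combine(POINTS)
for technique in ("SCR", "FR", "DR"):
    print(technique, leaders(totals, technique))
```

<p>The leaders are Gradient Tree Boosting and Random Forest for SCR and Multilayer Perceptron alone for FR. For DR, all three of these classifiers share the lead.</p>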
</sec>
<sec id="j_infor404_s_023">
<label>5</label>
<title>Conclusions</title>
<p>In this paper, we provide an answer to our research question: <italic>which classical classification algorithm is best for solving the phishing websites detection problem, on all publicly available datasets with predefined features?</italic> From our research, we draw the following conclusions:</p>
<list>
<list-item id="j_infor404_li_055">
<label>1.</label>
<p>Neural networks (in our case, Multilayer Perceptron) and ensemble-type algorithms (Random Forest, Gradient Tree Boosting, and AdaBoost) perform best for solving the phishing website detection problem on the datasets used in the experiment.</p>
</list-item>
<list-item id="j_infor404_li_056">
<label>2.</label>
<p>Instance similarity-based and Bayesian classifiers, i.e. SVM, k-Nearest Neighbours, and Naïve–Bayes, perform the poorest for solving the phishing website detection problem, regardless of the specific dataset design.</p>
</list-item>
<list-item id="j_infor404_li_057">
<label>3.</label>
<p>The results in conclusions 1 and 2 agree with the general trends found in the related work review (Section <xref rid="j_infor404_s_004">2.2</xref>): the best classification results are achieved with neural networks, decision trees, and ensemble classification algorithms.</p>
</list-item>
<list-item id="j_infor404_li_058">
<label>4.</label>
<p>Classifiers reported in the related work review (Section <xref rid="j_infor404_s_004">2.2</xref>) to exceed 99.0% classification accuracy on highly unbalanced datasets, i.e. Random Forest, SVM, Perceptron, and CART, did not reach such high accuracy on any of the balanced datasets in our experiment.</p>
</list-item>
</list>
<p>In future work, hyper-parameter tuning could be automated with the Grid Search algorithm instead of relying on manual expert evaluation of hyper-parameters.</p>
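<p>A minimal sketch of exhaustive grid search in plain Python follows. It assumes a caller-supplied score function that would train a classifier with the given hyper-parameters and return, for example, its mean validation accuracy. The toy score function and the hyper-parameter names max_depth and n_estimators are hypothetical placeholders, not the settings used in this experiment.</p>

```python
from itertools import product


def grid_search(param_grid, score_fn):
    """Evaluate every combination in param_grid and return (best_params, best_score)."""
    names = sorted(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(params)  # e.g. mean validation accuracy
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_score(params):
    """Hypothetical stand-in for training and validating a classifier."""
    return -abs(params["max_depth"] - 8) - abs(params["n_estimators"] - 200) / 100


grid = {"max_depth": [4, 8, 16], "n_estimators": [100, 200, 400]}
print(grid_search(grid, toy_score))  # ({'max_depth': 8, 'n_estimators': 200}, 0.0)
```

<p>The number of evaluations grows as the product of the grid sizes, so the grids must stay small. In scikit-learn, sklearn.model_selection.GridSearchCV combines the same exhaustive search with cross-validation.</p>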
</sec>
</body>
<back>
<ref-list id="j_infor404_reflist_001">
<title>References</title>
<ref id="j_infor404_ref_001">
<mixed-citation publication-type="journal"><string-name><surname>Adebowale</surname>, <given-names>M.</given-names></string-name>, <string-name><surname>Lwin</surname>, <given-names>K.</given-names></string-name>, <string-name><surname>Sánchez</surname>, <given-names>E.</given-names></string-name>, <string-name><surname>Hossain</surname>, <given-names>M.</given-names></string-name> (<year>2019</year>). <article-title>Intelligent web-phishing detection and protection scheme using integrated features of images, frames and text</article-title>. <source>Expert Systems with Applications</source>, <volume>115</volume>, <fpage>300</fpage>–<lpage>313</lpage>.</mixed-citation>
</ref>
<ref id="j_infor404_ref_002">
<mixed-citation publication-type="other"><string-name><surname>Anti-Phishing Working Group</surname></string-name> (<year>2018</year>). Phishing Activity Trends Reports.</mixed-citation>
</ref>
<ref id="j_infor404_ref_003">
<mixed-citation publication-type="journal"><string-name><surname>Breiman</surname>, <given-names>L.</given-names></string-name> (<year>2001</year>). <article-title>Random forests</article-title>. <source>Machine Learning</source>, <volume>45</volume>(<issue>1</issue>), <fpage>5</fpage>–<lpage>32</lpage>.</mixed-citation>
</ref>
<ref id="j_infor404_ref_004">
<mixed-citation publication-type="book"><string-name><surname>Breiman</surname>, <given-names>L.</given-names></string-name>, <string-name><surname>Friedman</surname>, <given-names>J.H.</given-names></string-name>, <string-name><surname>Olshen</surname>, <given-names>R.A.</given-names></string-name>, <string-name><surname>Stone</surname>, <given-names>C.J.</given-names></string-name> (<year>1984</year>). <source>Classification and regression trees</source>. <publisher-name>Wadsworth International Group</publisher-name>, <publisher-loc>Belmont, CA</publisher-loc>, p. <fpage>432</fpage>.</mixed-citation>
</ref>
<ref id="j_infor404_ref_005">
<mixed-citation publication-type="journal"><string-name><surname>Chen</surname>, <given-names>T.C.</given-names></string-name>, <string-name><surname>Stepan</surname>, <given-names>T.</given-names></string-name>, <string-name><surname>Dick</surname>, <given-names>S.</given-names></string-name>, <string-name><surname>Miller</surname>, <given-names>J.</given-names></string-name> (<year>2014</year>). <article-title>An anti-phishing system employing diffused information</article-title>. <source>ACM Transactions on Information and System Security</source>, <volume>16</volume>(<issue>4</issue>), <fpage>1</fpage>–<lpage>31</lpage>.</mixed-citation>
</ref>
<ref id="j_infor404_ref_006">
<mixed-citation publication-type="journal"><string-name><surname>Chiew</surname>, <given-names>K.L.</given-names></string-name>, <string-name><surname>Tan</surname>, <given-names>C.L.</given-names></string-name>, <string-name><surname>Wong</surname>, <given-names>K.</given-names></string-name>, <string-name><surname>Yong</surname>, <given-names>K.S.</given-names></string-name>, <string-name><surname>Tiong</surname>, <given-names>W.K.</given-names></string-name> (<year>2019</year>). <article-title>A new hybrid ensemble feature selection framework for machine learning-based phishing detection system</article-title>. <source>Information Sciences</source>, <volume>484</volume>, <fpage>153</fpage>–<lpage>166</lpage>.</mixed-citation>
</ref>
<ref id="j_infor404_ref_007">
<mixed-citation publication-type="journal"><string-name><surname>Cui</surname>, <given-names>B.</given-names></string-name>, <string-name><surname>He</surname>, <given-names>S.</given-names></string-name>, <string-name><surname>Yao</surname>, <given-names>X.</given-names></string-name>, <string-name><surname>Shi</surname>, <given-names>P.</given-names></string-name> (<year>2018</year>). <article-title>Malicious URL detection with feature extraction based on machine learning</article-title>. <source>International Journal of High Performance Computing and Networking</source>, <volume>12</volume>(<issue>2</issue>), <fpage>166</fpage>.</mixed-citation>
</ref>
<ref id="j_infor404_ref_008">
<mixed-citation publication-type="journal"><string-name><surname>Dudani</surname>, <given-names>S.A.</given-names></string-name> (<year>1976</year>). <article-title>The distance-weighted k-nearest-neighbor rule</article-title>. <source>IEEE Transactions on Systems, Man, and Cybernetics</source>, <volume>SMC-6</volume>(<issue>4</issue>), <fpage>325</fpage>–<lpage>327</lpage>.</mixed-citation>
</ref>
<ref id="j_infor404_ref_009">
<mixed-citation publication-type="journal"><string-name><surname>Friedman</surname>, <given-names>J.H.</given-names></string-name> (<year>2002</year>). <article-title>Stochastic gradient boosting</article-title>. <source>Computational Statistics and Data Analysis</source>, <volume>38</volume>(<issue>4</issue>), <fpage>367</fpage>–<lpage>378</lpage>.</mixed-citation>
</ref>
<ref id="j_infor404_ref_010">
<mixed-citation publication-type="other"><string-name><surname>Internet Crime Complaint Center</surname></string-name> (<year>2019</year>). <italic>2018 Internet Crime Report</italic>. Tech. Rep., Internet Crime Complaint Center at the Federal Bureau of Investigation of the United States of America.</mixed-citation>
</ref>
<ref id="j_infor404_ref_011">
<mixed-citation publication-type="journal"><string-name><surname>Jain</surname>, <given-names>A.K.</given-names></string-name>, <string-name><surname>Gupta</surname>, <given-names>B.B.</given-names></string-name> (<year>2018</year>a). <article-title>A machine learning based approach for phishing detection using hyperlinks information</article-title>. <source>Journal of Ambient Intelligence and Humanized Computing</source>, <fpage>1</fpage>–<lpage>14</lpage>.</mixed-citation>
</ref>
<ref id="j_infor404_ref_012">
<mixed-citation publication-type="journal"><string-name><surname>Jain</surname>, <given-names>A.K.</given-names></string-name>, <string-name><surname>Gupta</surname>, <given-names>B.B.</given-names></string-name> (<year>2018</year>b). <article-title>Towards detection of phishing websites on client-side using machine learning based approach</article-title>. <source>Telecommunication Systems</source>, <volume>68</volume>(<issue>4</issue>), <fpage>687</fpage>–<lpage>700</lpage>.</mixed-citation>
</ref>
<ref id="j_infor404_ref_013">
<mixed-citation publication-type="chapter"><string-name><surname>Karabatak</surname>, <given-names>M.</given-names></string-name>, <string-name><surname>Mustafa</surname>, <given-names>T.</given-names></string-name> (<year>2018</year>). <chapter-title>Performance comparison of classifiers on reduced phishing website dataset</chapter-title>. In: <source>2018 6th International Symposium on Digital Forensic and Security (ISDFS)</source>. <publisher-name>IEEE</publisher-name>, pp. <fpage>1</fpage>–<lpage>5</lpage>.</mixed-citation>
</ref>
<ref id="j_infor404_ref_014">
<mixed-citation publication-type="chapter"><string-name><surname>Lewis</surname>, <given-names>D.D.</given-names></string-name> (<year>1998</year>). <chapter-title>Naive (Bayes) at forty: the independence assumption in information retrieval</chapter-title>. In: <source>ECML 1998: Machine Learning: ECML-98</source>. <publisher-name>Springer</publisher-name>, <publisher-loc>Berlin, Heidelberg</publisher-loc>, pp. <fpage>4</fpage>–<lpage>15</lpage>.</mixed-citation>
</ref>
<ref id="j_infor404_ref_015">
<mixed-citation publication-type="journal"><string-name><surname>Tan</surname>, <given-names>C.L.</given-names></string-name>, <string-name><surname>Chiew</surname>, <given-names>K.L.</given-names></string-name>, <string-name><surname>Wong</surname>, <given-names>K.S.</given-names></string-name>, <string-name><surname>Sze</surname>, <given-names>S.N.</given-names></string-name> (<year>2016</year>). <article-title>PhishWHO: phishing webpage detection via identity keywords extraction and target domain name finder</article-title>. <source>Decision Support Systems</source>, <volume>88</volume>, <fpage>18</fpage>–<lpage>27</lpage>.</mixed-citation>
</ref>
<ref id="j_infor404_ref_016">
<mixed-citation publication-type="chapter"><string-name><surname>Ma</surname>, <given-names>J.</given-names></string-name>, <string-name><surname>Saul</surname>, <given-names>L.K.</given-names></string-name>, <string-name><surname>Savage</surname>, <given-names>S.</given-names></string-name>, <string-name><surname>Voelker</surname>, <given-names>G.M.</given-names></string-name> (<year>2009</year>). <chapter-title>Beyond blacklists: learning to detect malicious web sites from suspicious URLs</chapter-title>. In: <source>Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining – KDD ’09</source>. <publisher-name>ACM Press</publisher-name>, <publisher-loc>New York, USA</publisher-loc>, p. <fpage>1245</fpage>.</mixed-citation>
</ref>
<ref id="j_infor404_ref_017">
<mixed-citation publication-type="journal"><string-name><surname>Marchal</surname>, <given-names>S.</given-names></string-name>, <string-name><surname>Armano</surname>, <given-names>G.</given-names></string-name>, <string-name><surname>Grondahl</surname>, <given-names>T.</given-names></string-name>, <string-name><surname>Saari</surname>, <given-names>K.</given-names></string-name>, <string-name><surname>Singh</surname>, <given-names>N.</given-names></string-name>, <string-name><surname>Asokan</surname>, <given-names>N.</given-names></string-name> (<year>2017</year>). <article-title>Off-the-hook: an efficient and usable client-side phishing prevention application</article-title>. <source>IEEE Transactions on Computers</source>, <volume>66</volume>(<issue>10</issue>), <fpage>1717</fpage>–<lpage>1733</lpage>.</mixed-citation>
</ref>
<ref id="j_infor404_ref_018">
<mixed-citation publication-type="journal"><string-name><surname>Marchal</surname>, <given-names>S.</given-names></string-name>, <string-name><surname>Francois</surname>, <given-names>J.</given-names></string-name>, <string-name><surname>State</surname>, <given-names>R.</given-names></string-name>, <string-name><surname>Engel</surname>, <given-names>T.</given-names></string-name> (<year>2014</year>). <article-title>PhishStorm: detecting phishing with streaming analytics</article-title>. <source>IEEE Transactions on Network and Service Management</source>, <volume>11</volume>(<issue>4</issue>), <fpage>458</fpage>–<lpage>471</lpage>.</mixed-citation>
</ref>
<ref id="j_infor404_ref_019">
<mixed-citation publication-type="chapter"><string-name><surname>Marchal</surname>, <given-names>S.</given-names></string-name>, <string-name><surname>Saari</surname>, <given-names>K.</given-names></string-name>, <string-name><surname>Singh</surname>, <given-names>N.</given-names></string-name>, <string-name><surname>Asokan</surname>, <given-names>N.</given-names></string-name> (<year>2016</year>). <chapter-title>Know your phish: novel techniques for detecting phishing sites and their targets</chapter-title>. In: <source>2016 IEEE 36th International Conference on Distributed Computing Systems (ICDCS)</source>. <publisher-name>IEEE</publisher-name>, pp. <fpage>323</fpage>–<lpage>333</lpage>.</mixed-citation>
</ref>
<ref id="j_infor404_ref_020">
<mixed-citation publication-type="journal"><string-name><surname>Patil</surname>, <given-names>D.R.</given-names></string-name>, <string-name><surname>Patil</surname>, <given-names>J.B.</given-names></string-name> (<year>2018</year>). <article-title>Malicious URLs detection using decision tree classifiers and majority voting technique</article-title>. <source>Cybernetics and Information Technologies</source>, <volume>18</volume>(<issue>1</issue>), <fpage>11</fpage>–<lpage>29</lpage>.</mixed-citation>
</ref>
<ref id="j_infor404_ref_021">
<mixed-citation publication-type="journal"><string-name><surname>Pedregosa</surname>, <given-names>F.</given-names></string-name>, <string-name><surname>Varoquaux</surname>, <given-names>G.</given-names></string-name>, <string-name><surname>Gramfort</surname>, <given-names>A.</given-names></string-name>, <string-name><surname>Michel</surname>, <given-names>V.</given-names></string-name>, <string-name><surname>Thirion</surname>, <given-names>B.</given-names></string-name>, <string-name><surname>Grisel</surname>, <given-names>O.</given-names></string-name>, <string-name><surname>Blondel</surname>, <given-names>M.</given-names></string-name>, <string-name><surname>Prettenhofer</surname>, <given-names>P.</given-names></string-name>, <string-name><surname>Weiss</surname>, <given-names>R.</given-names></string-name>, <string-name><surname>Dubourg</surname>, <given-names>V.</given-names></string-name>, <string-name><surname>Vanderplas</surname>, <given-names>J.</given-names></string-name>, <string-name><surname>Passos</surname>, <given-names>A.</given-names></string-name>, <string-name><surname>Cournapeau</surname>, <given-names>D.</given-names></string-name>, <string-name><surname>Brucher</surname>, <given-names>M.</given-names></string-name>, <string-name><surname>Perrot</surname>, <given-names>M.</given-names></string-name>, <string-name><surname>Duchesnay</surname>, <given-names>É.</given-names></string-name> (<year>2011</year>). <article-title>Scikit-learn: machine learning in Python</article-title>. <source>Journal of Machine Learning Research</source>, <volume>12</volume>, <fpage>2825</fpage>–<lpage>2830</lpage>.</mixed-citation>
</ref>
<ref id="j_infor404_ref_022">
<mixed-citation publication-type="book"><string-name><surname>Sahoo</surname>, <given-names>D.</given-names></string-name>, <string-name><surname>Liu</surname>, <given-names>C.</given-names></string-name>, <string-name><surname>Hoi</surname>, <given-names>S.C.H.</given-names></string-name> (<year>2017</year>). <source>Malicious URL Detection using Machine Learning: A Survey</source>.</mixed-citation>
</ref>
<ref id="j_infor404_ref_023">
<mixed-citation publication-type="other"><string-name><surname>Saxe</surname>, <given-names>J.</given-names></string-name>, <string-name><surname>Berlin</surname>, <given-names>K.</given-names></string-name> (<year>2017</year>). <article-title>eXpose: a character-level convolutional neural network with embeddings for detecting malicious URLs, file paths and registry keys</article-title>. arXiv preprint <ext-link ext-link-type="uri" xlink:href="http://arxiv.org/abs/arXiv:1702.08568">arXiv:1702.08568</ext-link>.</mixed-citation>
</ref>
<ref id="j_infor404_ref_024">
<mixed-citation publication-type="book"><string-name><surname>Scholkopf</surname>, <given-names>B.</given-names></string-name>, <string-name><surname>Smola</surname>, <given-names>A.J.</given-names></string-name> (<year>2001</year>). <source>Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond</source>. <publisher-name>MIT Press</publisher-name>.</mixed-citation>
</ref>
<ref id="j_infor404_ref_025">
<mixed-citation publication-type="chapter"><string-name><surname>Seifert</surname>, <given-names>C.</given-names></string-name>, <string-name><surname>Welch</surname>, <given-names>I.</given-names></string-name>, <string-name><surname>Komisarczuk</surname>, <given-names>P.</given-names></string-name> (<year>2008</year>). <chapter-title>Identification of malicious web pages with static heuristics</chapter-title>. In: <source>2008 Australasian Telecommunication Networks and Applications Conference</source>. <publisher-name>IEEE</publisher-name>, pp. <fpage>91</fpage>–<lpage>96</lpage>.</mixed-citation>
</ref>
<ref id="j_infor404_ref_026">
<mixed-citation publication-type="journal"><string-name><surname>Selvaganapathy</surname>, <given-names>S.</given-names></string-name>, <string-name><surname>Nivaashini</surname>, <given-names>M.</given-names></string-name>, <string-name><surname>Natarajan</surname>, <given-names>H.</given-names></string-name> (<year>2018</year>). <article-title>Deep belief network based detection and categorization of malicious URLs</article-title>. <source>Information Security Journal: A Global Perspective</source>, <volume>27</volume>(<issue>3</issue>), <fpage>145</fpage>–<lpage>161</lpage>.</mixed-citation>
</ref>
<ref id="j_infor404_ref_027">
<mixed-citation publication-type="journal"><string-name><surname>Shapiro</surname>, <given-names>S.S.</given-names></string-name>, <string-name><surname>Wilk</surname>, <given-names>M.B.</given-names></string-name> (<year>1965</year>). <article-title>An analysis of variance test for normality (complete samples)</article-title>. <source>Biometrika</source>, <volume>52</volume>(<issue>3–4</issue>), <fpage>591</fpage>–<lpage>611</lpage>.</mixed-citation>
</ref>
<ref id="j_infor404_ref_028">
<mixed-citation publication-type="chapter"><string-name><surname>Shirazi</surname>, <given-names>H.</given-names></string-name>, <string-name><surname>Bezawada</surname>, <given-names>B.</given-names></string-name>, <string-name><surname>Ray</surname>, <given-names>I.</given-names></string-name> (<year>2018</year>). <chapter-title>“Kn0w Thy Doma1n Name”</chapter-title>. In: <source>Proceedings of the 23rd ACM on Symposium on Access Control Models and Technologies – SACMAT ’18</source> Vol. <volume>18</volume>. <publisher-name>ACM Press</publisher-name>, <publisher-loc>New York, USA</publisher-loc>, pp. <fpage>69</fpage>–<lpage>75</lpage>.</mixed-citation>
</ref>
<ref id="j_infor404_ref_029">
<mixed-citation publication-type="book"><string-name><surname>Snedecor</surname>, <given-names>G.W.</given-names></string-name>, <string-name><surname>Cochran</surname>, <given-names>W.G.</given-names></string-name> (<year>1989</year>). <source>Statistical Methods</source>, <edition>8th</edition> ed. <publisher-name>Iowa State University Press</publisher-name>, <publisher-loc>Ames, Iowa</publisher-loc>.</mixed-citation>
</ref>
<ref id="j_infor404_ref_030">
<mixed-citation publication-type="chapter"><string-name><surname>Thomas</surname>, <given-names>K.</given-names></string-name>, <string-name><surname>Grier</surname>, <given-names>C.</given-names></string-name>, <string-name><surname>Ma</surname>, <given-names>J.</given-names></string-name>, <string-name><surname>Paxson</surname>, <given-names>V.</given-names></string-name>, <string-name><surname>Song</surname>, <given-names>D.</given-names></string-name> (<year>2011</year>). <chapter-title>Design and evaluation of a real-time URL spam filtering service</chapter-title>. In: <source>2011 IEEE Symposium on Security and Privacy</source>. <publisher-name>IEEE</publisher-name>, pp. <fpage>447</fpage>–<lpage>462</lpage>.</mixed-citation>
</ref>
<ref id="j_infor404_ref_031">
<mixed-citation publication-type="chapter"><string-name><surname>Vanhoenshoven</surname>, <given-names>F.</given-names></string-name>, <string-name><surname>Napoles</surname>, <given-names>G.</given-names></string-name>, <string-name><surname>Falcon</surname>, <given-names>R.</given-names></string-name>, <string-name><surname>Vanhoof</surname>, <given-names>K.</given-names></string-name>, <string-name><surname>Koppen</surname>, <given-names>M.</given-names></string-name> (<year>2016</year>). <chapter-title>Detecting malicious URLs using machine learning techniques</chapter-title>. In: <source>2016 IEEE Symposium Series on Computational Intelligence (SSCI)</source>. <publisher-name>IEEE</publisher-name>, pp. <fpage>1</fpage>–<lpage>8</lpage>.</mixed-citation>
</ref>
<ref id="j_infor404_ref_032">
<mixed-citation publication-type="chapter"><string-name><surname>Vazhayil</surname>, <given-names>A.</given-names></string-name>, <string-name><surname>Vinayakumar</surname>, <given-names>R.</given-names></string-name>, <string-name><surname>Soman</surname>, <given-names>K.</given-names></string-name> (<year>2018</year>). <chapter-title>Comparative study of the detection of malicious URLs using shallow and deep networks</chapter-title>. In: <source>2018 9th International Conference on Computing, Communication and Networking Technologies (ICCCNT)</source>. <publisher-name>IEEE</publisher-name>, pp. <fpage>1</fpage>–<lpage>6</lpage>.</mixed-citation>
</ref>
<ref id="j_infor404_ref_033">
<mixed-citation publication-type="chapter"><string-name><surname>Verma</surname>, <given-names>R.</given-names></string-name>, <string-name><surname>Das</surname>, <given-names>A.</given-names></string-name> (<year>2017</year>). <chapter-title>What’s in a URL</chapter-title>. In: <source>Proceedings of the 3rd ACM on International Workshop on Security and Privacy Analytics – IWSPA ’17</source>. <publisher-name>ACM Press</publisher-name>, <publisher-loc>New York, USA</publisher-loc>, pp. <fpage>55</fpage>–<lpage>63</lpage>.</mixed-citation>
</ref>
<ref id="j_infor404_ref_034">
<mixed-citation publication-type="chapter"><string-name><surname>Verma</surname>, <given-names>R.</given-names></string-name>, <string-name><surname>Dyer</surname>, <given-names>K.</given-names></string-name> (<year>2015</year>). <chapter-title>On the character of phishing URLs</chapter-title>. In: <source>Proceedings of the 5th ACM Conference on Data and Application Security and Privacy – CODASPY ’15</source>. <publisher-name>ACM Press</publisher-name>, <publisher-loc>New York, USA</publisher-loc>, pp. <fpage>111</fpage>–<lpage>122</lpage>.</mixed-citation>
</ref>
<ref id="j_infor404_ref_035">
<mixed-citation publication-type="journal"><string-name><surname>Wang</surname>, <given-names>R.</given-names></string-name> (<year>2012</year>). <article-title>AdaBoost for feature selection, classification and its relation with SVM, a review</article-title>. <source>Physics Procedia</source>, <volume>25</volume>, <fpage>800</fpage>–<lpage>807</lpage>.</mixed-citation>
</ref>
<ref id="j_infor404_ref_036">
<mixed-citation publication-type="chapter"><string-name><surname>Whittaker</surname>, <given-names>C.</given-names></string-name>, <string-name><surname>Ryner</surname>, <given-names>B.</given-names></string-name>, <string-name><surname>Nazif</surname>, <given-names>M.</given-names></string-name> (<year>2010</year>). <chapter-title>Large-scale automatic classification of phishing pages</chapter-title>. In: <source>The 17th Annual Network and Distributed System Security Symposium (NDSS ’10)</source>.</mixed-citation>
</ref>
<ref id="j_infor404_ref_037">
<mixed-citation publication-type="journal"><string-name><surname>Widrow</surname>, <given-names>B.</given-names></string-name>, <string-name><surname>Lehr</surname>, <given-names>M.A.</given-names></string-name> (<year>1990</year>). <article-title>30 years of adaptive neural networks: perceptron, madaline, and backpropagation</article-title>. <source>Proceedings of the IEEE</source>, <volume>78</volume>(<issue>9</issue>), <fpage>1415</fpage>–<lpage>1442</lpage>.</mixed-citation>
</ref>
<ref id="j_infor404_ref_038">
<mixed-citation publication-type="journal"><string-name><surname>Xiang</surname>, <given-names>G.</given-names></string-name>, <string-name><surname>Hong</surname>, <given-names>J.</given-names></string-name>, <string-name><surname>Rose</surname>, <given-names>C.P.</given-names></string-name>, <string-name><surname>Cranor</surname>, <given-names>L.</given-names></string-name> (<year>2011</year>). <article-title>CANTINA+: a feature-rich machine learning framework for detecting phishing web sites</article-title>. <source>ACM Transactions on Information and System Security</source>, <volume>14</volume>(<issue>2</issue>), <fpage>1</fpage>–<lpage>28</lpage>.</mixed-citation>
</ref>
<ref id="j_infor404_ref_039">
<mixed-citation publication-type="journal"><string-name><surname>Zhang</surname>, <given-names>W.</given-names></string-name>, <string-name><surname>Jiang</surname>, <given-names>Q.</given-names></string-name>, <string-name><surname>Chen</surname>, <given-names>L.</given-names></string-name>, <string-name><surname>Li</surname>, <given-names>C.</given-names></string-name> (<year>2017</year>). <article-title>Two-stage ELM for phishing Web pages detection using hybrid features</article-title>. <source>World Wide Web</source>, <volume>20</volume>(<issue>4</issue>), <fpage>797</fpage>–<lpage>813</lpage>.</mixed-citation>
</ref>
<ref id="j_infor404_ref_040">
<mixed-citation publication-type="chapter"><string-name><surname>Zhao</surname>, <given-names>J.</given-names></string-name>, <string-name><surname>Wang</surname>, <given-names>N.</given-names></string-name>, <string-name><surname>Ma</surname>, <given-names>Q.</given-names></string-name>, <string-name><surname>Cheng</surname>, <given-names>Z.</given-names></string-name> (<year>2019</year>). <chapter-title>Classifying malicious URLs using gated recurrent neural networks</chapter-title>. In: <source>International Conference on Innovative Mobile and Internet Services in Ubiquitous Computing</source>. <publisher-name>Springer</publisher-name>, pp. <fpage>385</fpage>–<lpage>394</lpage>.</mixed-citation>
</ref>
<ref id="j_infor404_ref_041">
<mixed-citation publication-type="chapter"><string-name><surname>Zhao</surname>, <given-names>P.</given-names></string-name>, <string-name><surname>Hoi</surname>, <given-names>S.C.</given-names></string-name> (<year>2013</year>). <chapter-title>Cost-sensitive online active learning with application to malicious URL detection</chapter-title>. In: <source>Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining – KDD ’13</source>. <publisher-name>ACM Press</publisher-name>, <publisher-loc>New York, USA</publisher-loc>, p. <fpage>919</fpage>.</mixed-citation>
</ref>
</ref-list>
</back>
</article>