Pub. online: 23 Mar 2020 | Type: Research Article | Open Access
Volume 31, Issue 1 (2020), pp. 143–160
Phishing remains a persistent security threat, with global losses exceeding 2.7 billion USD in 2018 according to the FBI’s Internet Crime Complaint Center. The literature describes several generations of phishing website detection methods. The oldest rely on manually blacklisting the URLs of known phishing websites in a centralized database, and so cannot detect newly launched phishing websites. More recent studies treat phishing website detection as a supervised machine learning problem on phishing datasets built from features extracted from phishing websites’ URLs. These studies have shown some classification algorithms performing better than others on differently designed datasets, but they have not identified the best classification algorithm for phishing website detection in general. The purpose of this research is to compare classic supervised machine learning algorithms on all publicly available phishing datasets with predefined features and to identify the best-performing algorithm for phishing website detection, regardless of a specific dataset design. Eight widely used classification algorithms were configured in Python using the scikit-learn library and tested for classification accuracy on all publicly available phishing datasets. The algorithms were then ranked by accuracy on the different datasets using three ranking techniques, and the results were tested for statistically significant differences using Welch’s t-test. The comparison results presented in this paper show ensembles and neural networks outperforming the other classical algorithms.
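As a rough illustration of the significance-testing step, the sketch below computes Welch's t statistic and the Welch-Satterthwaite degrees of freedom for two sets of accuracy scores. It is not the authors' code: the function name `welch_t` and the accuracy values are hypothetical and invented for illustration only, and converting the statistic to a p-value (which needs the t distribution) is left out.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Return Welch's t statistic and the Welch-Satterthwaite degrees of freedom."""
    # Squared standard error of each sample mean (sample variance / sample size).
    va = variance(a) / len(a)
    vb = variance(b) / len(b)
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Hypothetical per-fold accuracies of two classifiers (made-up numbers,
# not results from the paper).
acc_a = [0.96, 0.97, 0.95, 0.97, 0.96]
acc_b = [0.93, 0.94, 0.92, 0.95, 0.93]

t_stat, dof = welch_t(acc_a, acc_b)
print(f"t = {t_stat:.3f}, df = {dof:.2f}")
```

Unlike Student's t-test, this form does not assume that the two samples have equal variances, which is why the degrees of freedom are estimated from the data.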
Pub. online: 1 Jan 2012 | Type: Research Article | Open Access
Volume 23, Issue 3 (2012), pp. 335–355
Glaucoma is one of the most insidious eye diseases: patients do not feel its onset or progression. This article provides a brief overview of optic nerve parameterization methods and algorithms. Parameterization is itself an important task: it yields a unique description of the structure of the optic nerve disc, which can then be used in disease detection or in other studies that require a parametric estimate of the eye fundus pattern. So far, fully automated planimetric parameterization of the excavation from eye fundus images has not been investigated in detail in the scientific literature. In this article, the authors describe an automated excavation and parameterization algorithm and perform a correlation analysis of the parameters obtained by the automated and interactive techniques. The results are then compared with those produced by Optical Coherence Tomography and Heidelberg Retina Tomography. Finally, the article discusses how well glaucoma can be detected using the estimated parameters of the eye fundus structures obtained by the different parameterization techniques.
Pub. online: 1 Jan 2011 | Type: Research Article | Open Access
Volume 22, Issue 4 (2011), pp. 507–520
Classical visualization methods, including multidimensional scaling and its particular case, Sammon's mapping, encounter difficulties when analyzing large data sets. One possible way to solve the problem is to apply artificial neural networks. This paper presents the visualization of large data sets using the feed-forward neural network SAMANN. Its back-propagation-like learning rule was developed to allow a feed-forward artificial neural network to learn Sammon's mapping in an unsupervised way. In its initial form, SAMANN training is computationally expensive. In this paper, we identify conditions that reduce the computational cost of visualization, even for large data sets. We show that the original dimensionality of the data can be reduced to a lower one in a small number of iterations. Visualization results for real-world data sets are presented.
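For orientation, the quantity that Sammon's mapping minimizes can be sketched in a few lines. The code below computes the standard Sammon stress between a data set `X` and its low-dimensional image `Y`. It does not implement the SAMANN network itself, the function name `sammon_stress` and the sample points are illustrative assumptions, and the points in `X` are assumed to be pairwise distinct.

```python
from itertools import combinations
from math import dist


def sammon_stress(X, Y):
    """Sammon's stress between original points X and their projections Y."""
    num = 0.0    # sum of squared distance errors, each weighted by 1 / d_star
    denom = 0.0  # sum of the original pairwise distances
    for i, j in combinations(range(len(X)), 2):
        d_star = dist(X[i], X[j])  # distance in the original space (must be > 0)
        d = dist(Y[i], Y[j])       # distance in the projection space
        num += (d_star - d) ** 2 / d_star
        denom += d_star
    return num / denom


# Illustrative data: four 3-D points lying in the plane z = 0.
X = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0), (3.0, 1.0, 0.0)]
Y_exact = [(x, y) for x, y, _ in X]                 # preserves every distance
Y_shrunk = [(0.5 * x, 0.5 * y) for x, y, _ in X]    # halves every distance

print(sammon_stress(X, Y_exact))
print(sammon_stress(X, Y_shrunk))
```

A projection that preserves all pairwise distances has zero stress, and the weighting by the original distance makes errors on short distances count more than the same errors on long ones.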
Pub. online: 1 Jan 2007 | Type: Research Article | Open Access
Volume 18, Issue 2 (2007), pp. 187–202
In this paper, the relative multidimensional scaling method is investigated. The method is designed to visualize large multidimensional data sets. It applies multidimensional scaling (MDS) to a so-called basis vector set and then maps the remaining vectors of the analyzed data set. In the original relative MDS algorithm, the visualization process is divided into three steps: the set of basis vectors is constructed using the k-means clustering method; this set is projected onto the plane using the MDS algorithm; and the set of remaining data is visualized using the relative mapping algorithm. We propose a modification that differs from the original algorithm in the strategy for selecting the basis vectors. The experimental investigation has shown that the modification outperforms the original algorithm in both visualization quality and computational expense. The conditions under which relative MDS is more efficient than standard MDS are estimated.
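The third step, relative mapping, can be sketched as follows: the planar positions of the basis vectors stay fixed, and each remaining vector is placed so that its distances to the basis projections match its original distances to the basis vectors. This is a minimal sketch under stated assumptions, not the algorithm from the paper: it minimizes a plain squared-error stress by gradient descent, and the function name `relative_map`, the step size, and the iteration count are all hypothetical choices.

```python
from math import dist


def relative_map(x, basis_X, basis_Y, steps=500, lr=0.05):
    """Place one new vector x in the plane while basis projections stay fixed."""
    # Target distances from x to every basis vector in the original space.
    targets = [dist(x, b) for b in basis_X]
    # Start from the centroid of the fixed basis projections.
    y = [sum(p[k] for p in basis_Y) / len(basis_Y) for k in range(2)]
    for _ in range(steps):
        grad = [0.0, 0.0]
        for target, p in zip(targets, basis_Y):
            d = dist(y, p) or 1e-12  # guard against division by zero
            coef = 2.0 * (d - target) / d
            grad[0] += coef * (y[0] - p[0])
            grad[1] += coef * (y[1] - p[1])
        # Gradient step on sum_i (||y - p_i|| - target_i)^2.
        y = [y[0] - lr * grad[0], y[1] - lr * grad[1]]
    return y


# Illustrative basis: four 3-D vectors in the plane z = 0 and their
# (here trivially exact) planar projections.
basis_X = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (1.0, 1.0, 0.0)]
basis_Y = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]

y_new = relative_map((0.2, 0.7, 0.0), basis_X, basis_Y)
print(y_new)
```

Because each remaining vector is compared only with the basis vectors rather than with every other vector, the cost of placing one vector grows with the size of the basis set, not with the size of the whole data set.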