Since Morris and Thompson published the first paper on password security in 1979, strict password policies have been enforced to make sure users follow password rules. Many such policies require users to adopt a system-generated password. The objective of this paper is to analyse the effectiveness of strict password management policies with respect to how well users remember system-generated passwords of different textual types: plaintext strings, passphrases, and hybrid graphical-textual PsychoPass passwords. In an experiment, participants were assigned a random string, a passphrase, and a PsychoPass password and had to memorize them. Surprisingly, no one remembered either the random string or the passphrase, and only 10% of the participants remembered their PsychoPass password. Policies in which administrators let systems assign passwords to users are therefore not appropriate. Although PsychoPass passwords are easier to remember, the recall rate of any system-assigned password is below an acceptable level. The findings of this study show that system-assigned strong passwords are inappropriate and place an unacceptable memory burden on users.
Pub. online: 1 Jan 2014. Type: Research Article. Open Access.
Volume 25, Issue 3 (2014), pp. 385–399
Background: In the area of artificial learners, little research has been conducted on how to appropriately describe an artificial learner's (empirical) performance. The ideal description of a learning problem would be a functional dependency between the data, the learning algorithm's internal specifics, and its performance. Unfortunately, a general, restriction-free theory of the performance of arbitrary artificial learners has not yet been developed.
Objective: The objective of this paper is to investigate which function most appropriately describes the learning curve produced by the C4.5 algorithm.
Methods: The J48 implementation of the C4.5 algorithm was applied to datasets (n=121) from publicly available repositories (e.g. UCI) using stepwise k-fold cross-validation. First, four different functions (power, linear, logarithmic, exponential) were fitted to the measured error rates. Where the fit was statistically significant (n=86), we measured the average mean squared error and its rank for each function. A dependent-samples t-test was performed to test whether the differences between mean squared errors are significant, and Wilcoxon's signed-rank test was used to test whether the differences between ranks are significant.
Results: The decision tree's error rate can be successfully modeled by an exponential function. Across the 86 datasets, the exponential function was the best descriptor of the error-rate curve in 64 cases, the power function in 13, the logarithmic in 3, and the linear in 6. The average mean squared error across all datasets was 0.052954 for the exponential function, significantly different from the power function at P=0.001 and from the linear function at P<0.001. The results also show that the exponential function's rank differs significantly from the rank of every other model at any reasonable threshold (P<0.001).
Conclusion: Our findings are consistent with tests performed in the area of human cognitive performance, e.g. with the work of Heathcote et al. (2000), who observed that the exponential function best describes an individual learner. In our case we observed an individual learner (the C4.5 algorithm) on different tasks. The work can be used to forecast and model the future performance of C4.5 when not all data have been used, or when more data must be obtained to reach better accuracy.
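As a rough illustration of the curve-fitting step described in the Methods above, the following Python sketch fits the four candidate models to a learning curve and compares their mean squared errors. The model forms, starting parameters, and data points are illustrative assumptions, not the authors' implementation; the fitted exponential could then be extrapolated to larger sample sizes to forecast future performance, as suggested in the Conclusion.

```python
# Sketch: fit candidate learning-curve models to measured error rates and
# compare goodness of fit. Illustrative only; not the paper's original code.
import numpy as np
from scipy.optimize import curve_fit

def exponential(n, a, b, c):
    return a + b * np.exp(-c * n)      # error decays exponentially with data size

def power(n, a, b, c):
    return a + b * np.power(n, -c)     # power-law decay

def logarithmic(n, a, b):
    return a - b * np.log(n)

def linear(n, a, b):
    return a - b * n

# Hypothetical learning-curve data: training-set sizes and measured error rates.
sizes = np.array([100, 200, 400, 800, 1600, 3200], dtype=float)
errors = np.array([0.32, 0.25, 0.20, 0.17, 0.155, 0.15])

# Each model gets a rough initial guess so the optimizer converges.
models = {
    "exponential": (exponential, (0.1, 0.3, 0.001)),
    "power":       (power,       (0.1, 3.0, 0.5)),
    "logarithmic": (logarithmic, (0.5, 0.05)),
    "linear":      (linear,      (0.3, 0.0001)),
}

for name, (f, p0) in models.items():
    params, _ = curve_fit(f, sizes, errors, p0=p0, maxfev=20000)
    mse = np.mean((f(sizes, *params) - errors) ** 2)   # mean squared error of the fit
    print(f"{name:12s} MSE = {mse:.6f}")
```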
Pub. online: 1 Jan 2012. Type: Research Article. Open Access.
Volume 23, Issue 4 (2012), pp. 521–536
In supervised learning, the relationship between the available data and the performance (what is learnt) is not well understood. How much data to use, and when to stop the learning process, are the key questions.
In this paper, we present an approach for an early assessment of the extracted knowledge (classification models) in terms of performance (accuracy). The key questions are answered by detecting the point of convergence, i.e., the point where the classification model's performance no longer improves even when more data items are added to the learning set. As termination criteria for the learning process, we developed a set of equations for detecting this convergence that follow the basic principles of the learning curve. The developed solution was evaluated on real datasets. The results of the experiment show that the solution is well designed: the stopping criteria are not subject to local variance, and the convergence is detected where it has actually occurred.
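The abstract does not reproduce the paper's convergence equations, but the idea of a variance-robust stopping rule can be sketched as follows: training stops only once the accuracy gain over a small window of recent steps stays below a tolerance, so a single flat or noisy step does not trigger termination. The window size, tolerance, and data below are illustrative assumptions.

```python
# Illustrative convergence check on a learning curve (not the paper's equations):
# stop when every accuracy gain in the last `window` steps is below `tol`.
def has_converged(accuracies, window=3, tol=0.01):
    """Return True if the last `window` accuracy gains are all below `tol`."""
    if len(accuracies) <= window:
        return False
    recent = accuracies[-(window + 1):]
    gains = [b - a for a, b in zip(recent, recent[1:])]
    return all(g < tol for g in gains)

# Hypothetical accuracies measured on growing training-set sizes.
curve = [0.71, 0.78, 0.82, 0.845, 0.851, 0.852, 0.8523]
for step in range(1, len(curve) + 1):
    if has_converged(curve[:step]):
        print(f"convergence detected after step {step}")
        break
```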
Pub. online: 1 Jan 2007. Type: Research Article. Open Access.
Volume 18, Issue 3 (2007), pp. 343–362
One of the tasks of data mining is classification, which provides a mapping from attributes (observations) to pre-specified classes. Classification models are built from underlying data. In principle, models built with more data yield better results. However, the relationship between the available data and the performance is not well understood, except that the accuracy of a classification model shows diminishing improvements as a function of data size. In this paper, we present an approach for an early assessment of the extracted knowledge (classification models) in terms of performance (accuracy), based on the amount of data used. The assessment is based on observing the performance on smaller sample sizes. The solution is formally defined and evaluated in experiments that show the correctness and utility of the approach.
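The core measurement behind such an early assessment is a learning curve obtained from smaller samples. The following minimal sketch builds one with scikit-learn; the dataset and the decision-tree classifier are stand-ins chosen for illustration, not the models or data used in the paper.

```python
# Sketch: measure classifier accuracy on increasing fractions of the training
# data to obtain a learning curve for early performance assessment.
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

for frac in (0.1, 0.2, 0.4, 0.6, 0.8, 1.0):
    n = int(frac * len(X_train))
    model = DecisionTreeClassifier(random_state=0).fit(X_train[:n], y_train[:n])
    acc = model.score(X_test, y_test)               # accuracy on held-out data
    print(f"{n:5d} training items -> accuracy {acc:.3f}")
```

The resulting accuracy-versus-size curve is what the approach inspects to estimate how the model would perform if built on the full dataset.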
Pub. online: 1 Jan 2003. Type: Research Article. Open Access.
Volume 14, Issue 3 (2003), pp. 277–288
In this paper, we present an algorithm that can be applied to protect data before a data mining process takes place. Data mining, a part of the knowledge discovery process, is mainly about building models from data. We address the following question: can we protect the data and still allow the data modelling process to take place? We consider the case where the distributions of the original data values are preserved while the values themselves change, so that the resulting model is equivalent to the one built with the original data. The presented formal approach is especially useful when the knowledge discovery process is outsourced. The application of the algorithm is demonstrated through an example.
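One simple way to get the flavour of such a transformation (an assumption for illustration, not the paper's formal algorithm) is to replace each categorical value with an opaque code through a secret per-attribute bijection: value frequencies, and hence the distributions and the induced model structure, stay the same, while the outsourced party never sees the original values.

```python
# Illustrative sketch: consistently replace each attribute's values with opaque
# codes via a secret bijection, preserving value frequencies (distributions).
import random

def protect(records, attributes):
    """Return protected copies of `records` plus the secret per-attribute mappings."""
    mappings = {}
    for attr in attributes:
        values = sorted({r[attr] for r in records})
        shuffled = random.sample(values, len(values))            # random order
        mappings[attr] = {v: f"{attr}_{i}" for i, v in enumerate(shuffled)}
    protected = [{a: mappings[a][r[a]] for a in attributes} for r in records]
    return protected, mappings

records = [
    {"outlook": "sunny",    "wind": "weak",   "play": "no"},
    {"outlook": "rain",     "wind": "strong", "play": "no"},
    {"outlook": "sunny",    "wind": "strong", "play": "no"},
    {"outlook": "overcast", "wind": "weak",   "play": "yes"},
]
masked, secret = protect(records, ["outlook", "wind", "play"])
print(masked)   # same structure and frequencies, different value labels
```

A model built on the masked data can be translated back to the original vocabulary with the secret mappings, which is what makes the outsourcing scenario workable.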
Volume 16, Issue 2 (2005), pp. 295–312
Software size is an important attribute in software project planning. Several methods for software size estimation are available; most of them are based on function points. Albrecht introduced function points as a technology-independent method with its own software abstraction layer. However, it is difficult to apply the original abstraction elements to current technologies, so researchers have introduced additional rules and mappings for object-based solutions. In this paper several mapping strategies are discussed and compared. Based on the similarities among the compared mappings, a common mapping strategy is then defined. This mapping is tested on a reference application portfolio containing five applications. The aim of the test scenario is to evaluate the impact of diverse levels of detail in the class diagrams on software size measurement. Although the question of how to perform quality size measurements in object-oriented projects remains unanswered, the paper gives valuable information on the topic, supported by mathematics.
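For readers unfamiliar with function points, the sketch below shows the standard unadjusted function point computation using the usual Albrecht/IFPUG complexity weights. How class-diagram elements are mapped onto these component types is precisely what the compared strategies differ on, so the counts in the example are purely hypothetical.

```python
# Sketch: unadjusted function points (UFP) from component counts, using the
# standard Albrecht/IFPUG weights. Example counts are hypothetical.
WEIGHTS = {
    "EI":  {"low": 3, "avg": 4,  "high": 6},   # external inputs
    "EO":  {"low": 4, "avg": 5,  "high": 7},   # external outputs
    "EQ":  {"low": 3, "avg": 4,  "high": 6},   # external inquiries
    "ILF": {"low": 7, "avg": 10, "high": 15},  # internal logical files
    "EIF": {"low": 5, "avg": 7,  "high": 10},  # external interface files
}

def unadjusted_fp(counts):
    """counts: {component_type: {complexity: number_of_components}}"""
    return sum(WEIGHTS[t][c] * n
               for t, per_complexity in counts.items()
               for c, n in per_complexity.items())

# Hypothetical counts derived from one application's class diagram.
example = {"EI": {"low": 5, "avg": 2}, "EO": {"avg": 3}, "ILF": {"low": 4, "high": 1}}
print(unadjusted_fp(example))   # 5*3 + 2*4 + 3*5 + 4*7 + 1*15 = 81
```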