Since Morris and Thompson wrote the first paper on password security in 1979, strict password policies have been enforced to make sure users follow the rules on passwords. Many such policies require users to select and use a system-generated password. The objective of this paper is to analyse the effectiveness of strict password management policies with respect to how well users remember system-generated passwords of different textual types – plaintext strings, passphrases, and hybrid graphical-textual PsychoPass passwords. In an experiment, participants were assigned a random string, a passphrase, and a PsychoPass password and had to memorize them. Surprisingly, no participant remembered either the random string or the passphrase, and only 10% of the participants remembered their PsychoPass password. Policies under which administrators let systems assign passwords to users are not appropriate. Although PsychoPass passwords are easier to remember, the recall rate of any system-assigned password is below the acceptable level. The findings of this study indicate that system-assigned strong passwords are inappropriate and put an unacceptable memory burden on users.
Suppose you just bought a brand new car – on average you would spend a bit more than $36.000 in the USA (Buehler and Mrasek,
The following is a historical list of sample breaches originating from weak passwords or weak password management policies. The list is far from complete; it only gives a glimpse into the variety, scope and damage of password-related attacks:
In 1978, Stanley Rifkin obtained the electronic transfer code for the Security Pacific Bank and used the code to transfer $13 million from Security Pacific to his Swiss bank account (Tom,
In 1986, a group of German hackers penetrated dozens of military, government, and commercial computer systems by cracking passwords of legitimate users and system administrators. They were looking for military information that could be sold to the Soviet Union (Stoll,
In April 1994, two English teenagers penetrated several systems through the Air Force’s Rome (New York) Laboratory. Among other things, they obtained all of the data stored on the Korean Atomic Research Institute system and deposited it on Rome Lab’s system. Initially it was unclear whether the Korean systems belonged to North Korea or South Korea. The concern was that if it was North Korea, the North Koreans would think the logical transfer of the storage space was an intrusion by the US Air Force, which could be perceived as an aggressive act of war (USA,
In November 1988, Robert Morris, Jr., a student at Cornell University, created what later became known as the first computer worm distributed via the Internet. It contained a bug that caused it to propagate itself far faster than Morris intended. While no known alteration or destruction of data occurred, the program filled all available memory space on infected computers, bringing them to a grinding halt. The cost of clearing memory space and restarting systems was estimated at US$ 100 million. A key element of the Internet worm involved attempts to discover user passwords. It exploited the tendency of users to choose easy-to-remember passwords and used lists of words, including the standard online dictionary, name lists, and combinations of four-digit numbers, as potential passwords (Seeley,
In June 2005, hackers broke into CardSystems’ database. The company did not encrypt any of its users’ information. The names, account numbers, and verification codes of more than 40 million card holders were stolen and exposed (Krim and Barbaro,
An intrusion into TJX’s payment system took place in July 2005, but was not detected until mid-December 2006. Between 45,6 and 94 million credit and debit card numbers were stolen (Pereira,
In April 2011, the Sony PlayStation Network outage affected 77 million users, and the costs were estimated at more than US$171 million (Hachman,
The LinkedIn password leak in June 2012 exposed more than 6,5 million users (Kamp,
In April 2013, hackers obtained personal information of 50 million LivingSocial users (Acohido,
In 2017, the largest U.S. credit bureau, Equifax, suffered a breach that exposed the personal data of 143 million people, including Social Security numbers. It was among the worst breaches on record because of the amount of sensitive information stolen (Gressin,
A comprehensive list of breaches since 2005 can be found at Privacy Rights Clearinghouse (PRC,
A typical survey evaluating the generation and use of passwords revealed that users have several password uses and the average password has more than one application. Two thirds of passwords are designed around one’s personal characteristics, with most of the remainder relating to relatives, friends or lovers. Proper names and birthdays are the primary information used in constructing passwords, accounting for about half of all password constructions. Almost all respondents reuse passwords, and about two thirds of password uses are duplications. Passwords have been forgotten by a third of respondents, and over half keep a written record of them (Brown
It seems that nothing has been learnt and changed in the course of almost 50 years. Most researchers claim that users and their passwords are the weakest link (Adams and Sasse,
The most natural question is: why do we have so many password-related breaches? The answer is relatively simple: passwords need to be as long and as complex as possible to render guessing, dictionary and brute-force attacks prohibitively expensive and time-consuming; yet at the same time passwords need to be memorable and simple to support user experience.
One of the basic principles of security (Stallings,
The time required for breaking the password is all we can count on. Let us assume that the useful lifetime of a stored information is 60 years, which is a typical assumption for medical data (Brumen
When we come to 10 (or more) characters to remember, they exceed the capacity of human memory, where the well-known 7 ± 2 principle applies (Miller,
Starting from the findings of first research on users’ role in password security almost two decades ago (Adams and Sasse,
This work contributes to the understanding of the impact of strict password management policies on the usability and memorability of such passwords. Namely, previous research has predominantly dealt with memorability and/or usability of user-generated passwords, see e.g. Biddle
The rest of the paper is organized as follows: the next sub-section presents the state of the art in the field by review of related works and is followed by a presentation of PsychoPass method; the articulated research question concludes this introductory section. In Section
User authentication schemes are based on the following principles (or combinations thereof): “what you know”, “what you are” and “what you have” (Pfleeger and Pfleeger,
Authentication based on the “what you know” principle relies on passwords of two types: textual and graphical ones (Davis
The strongest passwords by far are those randomly selected, but they are at the same time the hardest to remember and thus subject to unsafe practices (Pfleeger and Pfleeger,
Textual password creation schemes.
Principle | Advantages | Disadvantages | Source |
Personal characteristics, e.g. birthdate, names, pets, addresses, etc. | Easy to remember | Easy to crack, easy to guess | (FIPS, |
Cognitive, a randomly selected set of personal questions which only an authorized user can answer correctly | High recall rate | Easy to guess by family and friends | (Brostoff, |
Pass-sentences and pass-phrases | Memorable, cracking resistant | Inappropriate for mobile use, inconvenient, useless for repeated use | (Brostoff, |
Randomly generated pronounceable passwords | Memorable, brute force cracking resistant | Vulnerable to a special dictionary attack | (Ganesan |
Mnemonic, a memorable phrase (e.g. first letters of a sentence) | Memorable, brute force cracking resistant | Vulnerable to a special dictionary attack | (Kuo |
Orthogonal to the works on different textual password generating and management schemes are contributions that deal with password metrics, principally meters that show users how strong their password might be (Bishop and Klein,
Complementary to our work are also contributions dealing with users’ compliance to different password creation policies (Adams and Sasse,
With respect to password management policies, the U.S. National Institute of Standards and Technology published a draft Guide to enterprise password management, publication NIST 800-118 (Scarfone and Souppaya,
A strict password management policy in the mentioned NIST publication using the above factors could be implemented as follows: (a) a minimum 8 characters; (b) type: at least upper and lower case plus one numeral or a special symbol and at least three of those; (c) composition restrictions: no biographic elements and no dictionary words; (d) frequency: password change frequency (at least every 12 months); (e) technical password management: no stored passwords allowed, only salted hashes, no password transmission over insecure networks; (f) management restrictions: password reuse not allowed, writing down of passwords not allowed, deriving passwords from other passwords is not allowed; and (g) password origin:
Many of these requirements are nowadays implemented in web pages and services: a password of at least 8 characters, with mixed upper and lowercase letters plus a numeral plus a special character, that does not appear in a dictionary. Most of the elements can be system-controlled by imposing a set of rules and measures, except element (c), where the system cannot verify whether a user has included her biographic elements in the password. The only way to control this is to use element (g): passwords are generated and assigned by a system. Yet the element on management restrictions (f), specifically the part that prohibits writing down passwords, relies completely on the user (Scarfone and Souppaya,
Despite the fact that the NIST 800-118 (Scarfone and Souppaya,
Here, we briefly present a hybrid method for generating textual passwords proposed by Cipresso and colleagues (Cipresso
“
A circle and a square on keyboard producing a strong password.
However, the improved PsychoPass method requires the use of the SHIFT and ALT-GR keys, and the keys in the sequence are not always adjacent to each other. Suppose that key #1 is pressed without SHIFT or ALT-GR, key #2 is pressed in combination with the SHIFT key, and so on, as given in Table
Password result of the visual representation from Fig.
Sequence #: | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 |
Shift (sh)/alt-gr (al) used | | sh | | al | | sh | sh | | sh | | al | | | sh |
Resulting character: | a | E | y | | | d | X | % | 7 | T | u | ] | 6 | h | J |
The password representing the circle and the square in Fig.
Password result of the visual representation from Fig.
Sequence #: | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 |
Shift (sh)/alt-gr (al) used | | sh | | al | | sh | sh | | sh | | al | | | sh |
Resulting character: | s | R | x | E | f | C | & | 8 | Z | i | h | 7 | j | K |
A representation of password »aEy|dX%7Tu]6hJ«.
A representation of password »sRxefC&8Zih7jK«.
The representations of passwords »aEy|dX%7Tu]6hJ« and »sRxefC&8Zih7jK« are shown in Fig.
The user thus memorizes a password based on its visual representation (action sequence) and additionally when to press SHIFT or ALT-GR.
It may seem that the password produced (e.g. »aEy|dX%7Tu]6hJ«) is totally random (with 75^14 = 178.179.480.135.440.826.416.015.625 ≈ 1,78E+26 different combinations, a brute-force attack would take some 5,6E+9 years), but in reality it is not so. The total number of different combinations using the improved PsychoPass method is
The research question is as follows: what is the impact of a strict password creation policy on the convenience and memorability of different system-assigned passwords? We expect that users will spend less time entering passphrases, followed by random and psychopass. In terms of memorability, we expect that approximately 25% of participants will remember their assigned passwords after one week (Zviran and Haga,
We conducted an experiment where a group of second year computer science students
Next, the system created a password by using concatenations of words and symbols and/or numbers (pass-phrase password). Here, each password was created by using a 6-letter word (mixed upper and lowercase letters), concatenated by two digits, and followed again by a 6-letter word, totaling 14 characters. The words were chosen randomly by the system from a custom built dictionary of Slovenian 6-lettered words which were in turn obtained from On-line dictionary of Slovenian Words (SASA,
Finally, a password was created by the system using the improved PsychoPass method (referred to as a psychopass password). The length of the password (
It can be noted that the strength of a random password is of one order of magnitude higher than the other two. However, 7-character random password would yield ∼1E+13 combinations, one order of magnitude lower. We decided for the 8-character password to have the length of the password higher and more comparable to 14 and 11 characters in pass-phrase and psychopass passwords, respectively. Additionally, length 8 is typical (Dell’Amico
Each consenting participant was assigned a username and an initial password, which were sent to her or him by email before the experiment began. The experiment itself first took place in a classroom, where the outline and purpose of the experiment were explained to the participants. They were also told that the passwords needed to be memorized not only for the day of the experiment but for a longer period, and that they should not write the passwords down; for this reason the participants had to put away bags, papers, pens and even mobile devices before entering the experiment room and for the entire duration of the experiment. After the presentation phase, the participants moved without their belongings to a computer room. In this way we ensured that the participants could not write down or otherwise store their assigned passwords. After the experiment the participants attended a lecture in a classroom, which kept them from their belongings for a further hour.
When a participant logged in to the experimental web page, the system displayed a randomly generated password. If the participant did not like the assigned password, an alternative was offered. In this way we emulated a strict password policy which does not allow a user to create her own weak password but lets her choose from several alternatives offered by the system. Once the password was accepted, the user repeatedly re-typed the assigned password for two minutes for the random and pass-phrase passwords, and for five minutes for the psychopass password. The time allowed for the repetitions was determined in the testing phase of the web page. The selected password was stored in a database together with the user’s details and additional data: the measured time needed to type the password and whether each re-typing of the password was successful.
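The pass-phrase construction described above (a 6-letter word in mixed case, two digits, and a second 6-letter word) can be sketched in a few lines. This is only an illustration under stated assumptions, not the application used in the experiment: the word list is a hypothetical placeholder for the custom dictionary, and the rule that each letter is independently upper- or lower-cased is an assumed reading of "mixed upper and lowercase letters".

```python
import secrets
import string

# Hypothetical placeholder for the custom dictionary of 6-letter words;
# the word list actually used in the study is not reproduced here.
WORDS = ["jabolk", "planet", "morska", "zvezda", "kolesa", "violin"]


def random_case(word: str) -> str:
    """Upper- or lower-case each letter at random (an assumed mixing rule)."""
    return "".join(
        ch.upper() if secrets.randbelow(2) else ch.lower() for ch in word
    )


def make_passphrase(words=WORDS) -> str:
    """Build word + two digits + word, 14 characters in total."""
    first = random_case(secrets.choice(words))
    second = random_case(secrets.choice(words))
    digits = "".join(secrets.choice(string.digits) for _ in range(2))
    return first + digits + second


# Each call produces a fresh candidate, so an alternative can be offered
# simply by calling the function again.
print(make_passphrase())
```

The `secrets` module is used instead of `random` because the values are meant to serve as credentials.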
The experiment continued one week later. This time we measured only whether a participant had remembered each of the assigned passwords. The participant had only three attempts to enter the password correctly (simulating a real-world lockout). If she or he did not remember it, the system displayed it for the user’s reference and recorded a failure.
The data from the experiment and its web page were collected in a database. For each user a login username and password were initially stored. Additionally, the time taken to enter each password was measured for all the participants. The measurement of time started with the first keystroke and ended when the ENTER key was pressed. The data on successful password recall was collected as well.
From the collected data we removed 5 users’ entries because they did not complete all three tests or they did not enter some of the passwords correctly at least once in the first phase. The final dataset contains data from 40 users.
First, we checked the usability of the passwords in terms of the time needed for input. We compared the times needed to enter the password at two points in the first part of the experiment: the first entry and the last entry. The first entry was recorded when participants first repeated the system-assigned password, and the last entry was recorded at the end of the 2- and 4-minute interval for random/passphrase and psychopass, respectively.
We expect that the mean times needed to enter a password at the beginning and at the end will significantly differ across the groups. At the beginning, we expect that it will be the easiest (shortest times) to enter a passphrase compared to the other two groups. At the end, we expect that cognitive-based methods (psychopass, passphrase) will require less time to enter the password compared to a randomly selected password.
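The primary experimental hypotheses are the following:
Hypothesis 1: H10: the mean times for the first time entering a random, passphrase and psychopass password are the same.
Alternative hypothesis 1: H1a: the mean times for the first time entering a random, passphrase and psychopass password are not all the same.
Hypothesis 2: H20: the mean times for the last time entering a random, passphrase and psychopass password are the same.
Alternative hypothesis 2: H2a: the mean times for the last time entering a random, passphrase and psychopass password are not all the same.
In case H1a holds (H10 is rejected) we shall test the following hypotheses, which are pairwise comparisons to see where the differences come from:
Hypothesis 1A-10: the mean times for the first time entering a random and psychopass password are the same.
Alternative hypothesis 1A-1a: the mean times for the first time entering a random and psychopass password are different.
Hypothesis 1A-20: the mean times for the first time entering a random and passphrase password are the same.
Alternative hypothesis 1A-2a: the mean times for the first time entering a random and passphrase password are different.
Hypothesis 1A-30: the mean times for the first time entering a passphrase and psychopass password are the same.
Alternative hypothesis 1A-3a: the mean times for the first time entering a passphrase and psychopass password are different.
In case H2a holds (H20 is rejected) we shall test the following hypotheses, which are pairwise comparisons to see where the differences come from:
Hypothesis 2A-10: the mean times for the last time entering a random and psychopass password are the same.
Alternative hypothesis 2A-1a: the mean times for the last time entering a random and psychopass password are different.
Hypothesis 2A-20: the mean times for the last time entering a random and passphrase password are the same.
Alternative hypothesis 2A-2a: the mean times for the last time entering a random and passphrase password are different.
Hypothesis 2A-30: the mean times for the last time entering a passphrase and psychopass password are the same.
Alternative hypothesis 2A-3a: the mean times for the last time entering a passphrase and psychopass password are different.
Second, we were interested in whether the recall rate at the second stage of the experiment is connected to the password type. Here, the hypothesis is as follows:
Hypothesis 3: H30: the recall rate is not associated with the password type.
Alternative hypothesis 3: H3a: the recall rate depends on the password type.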
The data sets containing measurements of time needed to enter a password for the first time and for the last time in a given time-frame for three different groups of measurements (group 1: random, group 2: passphrase, group 3: psychopass) were analysed using one-way ANOVA and independent samples
We used the Bonferroni correction to counteract the problem of multiple comparisons in
SPSS version 25 (IBM Corporation, Armonk, NY, USA) was used for analysis.
First, we calculated the descriptive statistics for the data obtained. The results are shown in Table
Descriptive statistics for experimental data, time to enter the password (in milliseconds).
Variable | Group | N | Mean | Std. deviation | Std. error | 95% CI for mean, lower bound | 95% CI for mean, upper bound | Minimum | Maximum |
first_time | random | 40 | 17387,25 | 13826,064 | 2186,093 | 12965,46 | 21809,04 | 3481 | 60866 |
first_time | passphrase | 40 | 16894,18 | 10169,751 | 1607,979 | 13641,73 | 20146,62 | 7680 | 67295 |
first_time | psychopass | 40 | 30327,85 | 14929,805 | 2360,609 | 25553,07 | 35102,63 | 7797 | 83947 |
first_time | Combined | 120 | 21536,43 | 14443,180 | 1318,476 | 18925,71 | 24147,14 | 3481 | 83947 |
last_time | random | 40 | 10010,18 | 16905,091 | 2672,930 | 4603,66 | 15416,69 | 2763 | 112728 |
last_time | passphrase | 40 | 11210,08 | 6557,797 | 1036,879 | 9112,79 | 13307,36 | 3919 | 31178 |
last_time | psychopass | 40 | 11086,38 | 7549,515 | 1193,683 | 8671,92 | 13500,83 | 3018 | 35659 |
last_time | Combined | 120 | 10768,88 | 11257,245 | 1027,641 | 8734,04 | 12803,71 | 2763 | 112728 |
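As a quick check on how the columns of the descriptive statistics relate to each other, the standard error and the 95% confidence interval can be recomputed from the mean, the standard deviation and the group size. The sketch below does this for the first entry of the random password. The critical value of Student's t for 39 degrees of freedom is hard-coded as an approximation (an assumption made to keep the example free of external libraries), and the decimal commas of the table are written as decimal points.

```python
import math

# Summary statistics (milliseconds) for the first entry of the random
# password, copied from the descriptive statistics table.
n = 40
mean = 17387.25
std_dev = 13826.064

# Two-sided 95% critical value of Student's t with n - 1 = 39 degrees of
# freedom, hard-coded (approximately 2.0227) to avoid a SciPy dependency.
T_CRIT_39 = 2.0227

std_error = std_dev / math.sqrt(n)
half_width = T_CRIT_39 * std_error
ci_lower = mean - half_width
ci_upper = mean + half_width

print(f"std. error = {std_error:.3f}")
print(f"95% CI     = [{ci_lower:.2f}, {ci_upper:.2f}]")
```

Up to rounding of the critical value, the result agrees with the tabulated standard error (2186,093) and interval bounds (12965,46 and 21809,04).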
Next, we tested for the differences in means of times needed to enter each password at the beginning and at the end of the experiment (tested for H10 and H20). We used the ANOVA test. The results are shown in Table
Results of ANOVA tests.
Variable | Source | Sum of squares | df | Mean square | F | Sig. |
first_time | Between Groups | 4642211670,950 | 2 | 2321105835,475 | 13,456 | |
first_time | Within Groups | 20181835238,375 | 117 | 172494318,277 | | |
first_time | Total | 24824046909,325 | 119 | | | |
last_time | Between Groups | 34843575,200 | 2 | 17421787,600 | 0,135 | 0,873 |
last_time | Within Groups | 15045497409,925 | 117 | 128593994,957 | | |
last_time | Total | 15080340985,125 | 119 | | | |
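The first_time rows of the ANOVA table can be reproduced from the group summaries alone, which makes the structure of the test easier to follow. The following is a minimal sketch, assuming only the group sizes, means and standard deviations reported in the descriptive statistics; the function name `one_way_anova` is introduced here for illustration and does not come from the study's SPSS analysis.

```python
# Group summaries for first_time (milliseconds), from the descriptive
# statistics table: (n, mean, standard deviation).
groups = {
    "random": (40, 17387.25, 13826.064),
    "passphrase": (40, 16894.18, 10169.751),
    "psychopass": (40, 30327.85, 14929.805),
}


def one_way_anova(summary):
    """Return (SS_between, SS_within, df_between, df_within, F)."""
    total_n = sum(n for n, _, _ in summary.values())
    grand_mean = sum(n * m for n, m, _ in summary.values()) / total_n
    # Between-groups variation: spread of group means around the grand mean.
    ss_between = sum(n * (m - grand_mean) ** 2 for n, m, _ in summary.values())
    # Within-groups variation: pooled spread inside each group.
    ss_within = sum((n - 1) * sd ** 2 for n, _, sd in summary.values())
    df_between = len(summary) - 1
    df_within = total_n - len(summary)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return ss_between, ss_within, df_between, df_within, f_stat


ss_b, ss_w, df_b, df_w, f_stat = one_way_anova(groups)
print(f"df = ({df_b}, {df_w}), F = {f_stat:.3f}")
```

Because the tabulated means are rounded to two decimals, the recomputed sums of squares differ from the table in the last few digits, while the F statistic agrees to three decimals.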
The results show that the hypothesis H10 needs to be rejected at
On the other hand, the hypothesis H20 cannot be rejected: the mean times to enter any password at the end of the first part of the experiment were not statistically significantly different from each other at
Since H10 was rejected, we tested the additional hypotheses (1A-10, 1A-20, and 1A-30) to see which pairs are comparable. As mentioned, multiple (=3) comparisons were performed, so Bonferroni adjustment was used. The results are given in Table
Results of multiple comparisons, Bonferroni adjusted.
Dependent variable | (I) group | (J) group | Mean difference (I-J) | Std. error | Sig.* | 95% CI lower bound | 95% CI upper bound |
first_time | random | passphrase | 493,075 | 2936,787 | 1,000 | −6640,05 | 7626,20 |
first_time | random | psychopass | −12940,600* | 2936,787 | | −20073,73 | −5807,47 |
first_time | passphrase | random | −493,075 | 2936,787 | 1,000 | −7626,20 | 6640,05 |
first_time | passphrase | psychopass | −13433,675* | 2936,787 | | −20566,80 | −6300,55 |
first_time | psychopass | random | 12940,600* | 2936,787 | | 5807,47 | 20073,73 |
first_time | psychopass | passphrase | 13433,675* | 2936,787 | | 6300,55 | 20566,80 |
* The mean difference is significant at the 0,05 (and at Bonferroni adjusted 0,0167) level.
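The quantities behind the pairwise comparisons can be recomputed by hand. The sketch below, offered as an illustration rather than a reproduction of the SPSS procedure, derives the Bonferroni-adjusted threshold for three comparisons and the standard error of a difference between two group means from the within-groups mean square of the ANOVA table. The variable names are introduced here for illustration.

```python
import math

alpha = 0.05
num_comparisons = 3
# Bonferroni adjustment: divide the threshold by the number of comparisons.
adjusted_alpha = alpha / num_comparisons  # about 0.0167

ms_within = 172494318.277  # within-groups mean square for first_time
n_per_group = 40

# Standard error of the difference between two group means.
std_error_diff = math.sqrt(ms_within * (1 / n_per_group + 1 / n_per_group))

# t statistic for the random vs. psychopass comparison (means in ms).
mean_diff = 17387.25 - 30327.85
t_stat = mean_diff / std_error_diff

print(f"adjusted alpha = {adjusted_alpha:.4f}")
print(f"std. error     = {std_error_diff:.3f}")
print(f"t statistic    = {t_stat:.3f}")
```

The standard error matches the tabulated 2936,787, which is the same for every pair because all groups have 40 participants.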
The results show that mean times to enter the first passphrase and psychopass passwords are statistically significantly different at any reasonable threshold. The same holds for the random-psychopass pair. The hypotheses 1A-10 and 1A-30 need to be rejected at
The second part of the experiment took place one week after the first part. Here, the participants were asked by the system to enter each of the three previously assigned passwords. The three-times-and-out system policy was enforced, meaning users had to be successful within three attempts. The results – how many participants (of total
The results of the second part of the experiment: remembered vs. not remembered by password type.
Outcome | Random | Passphrase | Psychopass |
Did not remember | 40 | 40 | 36 |
Did remember | 0 | 0 | 4 |
The results show that 10% of the participants who completed both parts of the experiment were able to remember their psychopass password after one week, but no one remembered the random or passphrase-based password. All of those who remembered their password succeeded only on the third try. This suggests that our measures to prevent participants from writing down passwords were successful: had a participant been able to write down the password, she would have entered it correctly on the first try, not on the third.
We checked whether the better recall of the psychopass passwords is due to chance or whether there is a systematic reason behind the ease of recall. The chi-square (
Results of Chi-Square test.
Statistic | Value | df | Asymp. sig. (2-sided) |
Pearson chi-square | 8,276 | 2 | |
Likelihood ratio | 9,068 | 2 | |
N of valid cases | 120 | | |
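Both statistics in the table follow directly from the counts of remembered and forgotten passwords, so they can be recomputed with the standard library alone. The sketch below is an illustration, not the SPSS procedure itself. It relies on the fact that, for exactly two degrees of freedom, the upper tail of the chi-square distribution is exp(-x/2), so no statistics package is needed for the p-value.

```python
import math

# Observed counts from the recall table; columns are
# random, passphrase, psychopass.
observed = [
    [40, 40, 36],  # did not remember
    [0, 0, 4],     # did remember
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
total = sum(row_totals)

pearson = 0.0
likelihood_ratio = 0.0
for i, row in enumerate(observed):
    for j, obs in enumerate(row):
        # Expected count under independence of outcome and password type.
        expected = row_totals[i] * col_totals[j] / total
        pearson += (obs - expected) ** 2 / expected
        if obs > 0:  # empty cells contribute nothing to the likelihood ratio
            likelihood_ratio += 2 * obs * math.log(obs / expected)

df = (len(observed) - 1) * (len(observed[0]) - 1)

# For exactly 2 degrees of freedom the chi-square upper tail is exp(-x / 2).
p_value = math.exp(-pearson / 2)

print(f"Pearson chi-square = {pearson:.3f}, df = {df}")
print(f"Likelihood ratio   = {likelihood_ratio:.3f}")
print(f"p-value (Pearson)  = {p_value:.4f}")
```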
We can see here that
Passwords are the Achilles’ heel of modern computing as they are mostly at users’ responsibility. The computer community has not made a very much needed shift in password management for almost 40 years. It seems nothing has changed since Robert Morris and Ken Thompson wrote the seminal paper on (UNIX) password security in 1979: the passwords are still the main method of authentication (Creese
It was observed that most common password creation policies remain vulnerable to off-line attacks and that external password creation policies need to be enforced (Weir
System and/or security administrators have tried to avoid weak user passwords by introducing very strict password management policies requiring users to pick and use a system-assigned password. This way they have (inadvertently?) subjected users to very high memory loads and, because users tend to write passwords down, exposed them to unacceptable security practices and risks.
We designed an experiment to test how such strict password management policies affect users’ ability to memorize their system-assigned passwords.
We first tested the times needed to enter a password produced by three different methods: random, passphrase and psychopass. At the beginning of the experiment, when users typed in the passwords for the first time, the easiest (and the fastest) password to enter was the passphrase, followed by random and psychopass, with mean times of 16894,18 ms, 17387,25 ms and 30327,85 ms (about 17, 17 and 30 seconds), respectively. The mean times of the passphrase and psychopass passwords, and of the random and psychopass passwords, are statistically significantly different at any reasonable threshold, while those of the random-passphrase pair are not. This finding partially confirms our expectations: the mean times did differ, and the passphrase was the easiest (fastest) to enter.
However, at the end of the entering (learning) phase, the mean times for entering the various passwords were not statistically significantly different from each other. This was a surprise, as it means that the users were on average able to enter the PsychoPass-generated password as quickly as the other two. An additional surprise was that the average time to enter any password was around 10 to 11 seconds, although the passwords were of different lengths: 8, 14, and 11 characters for random, passphrase and psychopass, respectively.
None of the participants remembered either the random string or the passphrase password. However, 4 participants out of 40 (10%) did remember their psychopass password. There is a statistically significant association (
As a side effect, we found several advantages of the PsychoPass method. First, the main advantage of the method seems to be the memorability of the password, yet this needs to be checked under more lax security policies. Second, a psychopass password looks like a randomly generated one, so attackers cannot recognize it as a psychopass password. Third, the passwords are currently resilient to dictionary attacks, as no dictionaries of such passwords are known to exist and the currently available ones are useless against them. Fourth, the method enables password reuse: the same visual pattern can produce several different passwords by just shifting the starting key. For each authentication service a user only needs to know the starting key for that service; the visual sequence is always the same. Thus, an attack that replays a compromised password on a different service would fail. Further research is needed to show the perceived benefits of the method in settings where users may create their own passwords.
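The reuse idea, keeping the visual pattern and moving only the starting key, can be illustrated with a small sketch. It rests on several assumptions that are not part of the original method description: a simplified QWERTZ layout with only the unshifted keys, SHIFT and ALT-GR modifiers ignored, wrap-around at the end of a row, and an inferred set of base keys for the example password »aEy|dX%7Tu]6hJ« (in particular, that "|" and "]" are typed on the W and G keys with ALT-GR).

```python
# Unshifted rows of a QWERTZ keyboard, left to right. The layout is an
# assumption made for illustration.
ROWS = ["1234567890", "qwertzuiop", "asdfghjkl", "yxcvbnm"]


def shift_keys(keys: str, offset: int = 1) -> str:
    """Move every key `offset` positions to the right within its own row."""
    shifted = []
    for key in keys:
        for row in ROWS:
            if key in row:
                # Wrapping around at the row end is an assumed convention.
                shifted.append(row[(row.index(key) + offset) % len(row)])
                break
        else:
            raise ValueError(f"key not on the modelled layout: {key!r}")
    return "".join(shifted)


# Assumed base keys (modifiers ignored) behind the first example password.
base_keys = "aeywdx57tug6hj"
print(shift_keys(base_keys))
```

Under these assumptions, shifting the pattern one key to the right yields the base keys of the second example password, »sRxefC&8Zih7jK«, with the modifiers left aside.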
It is true that the PsychoPass method performed better than the other two in terms of memorability and was just as good in terms of usability (speed of typing/entering), however, the results also show that the achieved threshold of 10% is way below expected and previously measured by Zviran and Haga (
Our findings raise a serious question about the applicability of strict password management policies that do not allow users to select their own passwords. It is true that system-assigned passwords are hard (or close to impossible) to break using brute force or dictionary attacks, but at the same time users forget them. An adversary who knows the details of the password management policy would simply not use brute force or dictionary attacks, but other available means (e.g. shoulder surfing, workplace browsing, garbage sifting, stealing of notes, etc.).
Implementing a password management policy is not an easy task. Users should not be considered an uneducated and ignorant enemy. In many cases system/security administrators can be their own worst enemies. Tightening restrictions in one area may open up a new hole in an unexpected place. A sound password management policy today needs to implement dictionary checking and also probabilistic checking (e.g. Markov-model based, grammar based, or a combination) to prevent weak passwords.
Additional thanks go to Mr. Renato Ivačič for his help in the experiment phase and in implementing the application support.