We use passwords just about everywhere in our daily lives. It’s difficult to think of an online service where we don’t have a need to enter some kind of credentials to access our content. DShield honeypots collect a variety of data, including passwords, that are submitted from SSH and telnet attacks.
Figure 1: Snapshot on 9/1/2023 of DShield submitted usernames and passwords 
The passwords in the above image are very common weak passwords. This is only a small sample of the passwords submitted to honeypots, and it made me curious whether the submitted passwords had any particular origin:
Default system passwords
Data breach passwords
Randomly generated passwords 
As a starting point, I compared the almost 250,000 unique passwords submitted to my honeypot with some publicly available sources:
HaveIBeenPwned Passwords 
Extracting Honeypot Passwords
There are many ways to get the passwords out of a DShield honeypot, especially if external logging of the cowrie data is set up. The method used in this case was to pull it out of the local JSON logs I regularly archive.
# read all cowrie JSON logs
# cat /logs/cowrie.json.*
# select logs with the .password key present
# jq 'select(.password)'
# query the value in the password key and return in raw format (without surrounding quotes)
# jq -r .password
# sort the values alphabetically
# sort
# return only unique values and output to a text file
# uniq > 2023-08-15_unique_passwords_raw.txt
cat /logs/cowrie.json.* | jq 'select(.password)' | jq -r .password | sort | uniq > 2023-08-15_unique_passwords_raw.txt
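For readers who prefer to stay in Python, the same extraction can be sketched roughly as follows. This is not the method used for this diary, only an illustration; it assumes the cowrie logs hold one JSON object per line, and the function names extract_passwords and read_log_lines are hypothetical.

```python
import glob
import json


def read_log_lines(pattern):
    """Yield every line from all files matching a glob pattern."""
    for path in sorted(glob.glob(pattern)):
        # errors="replace" keeps odd bytes from aborting the whole run
        with open(path, encoding="utf-8", errors="replace") as handle:
            yield from handle


def extract_passwords(lines):
    """Return the set of unique string values found under the 'password' key."""
    passwords = set()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip truncated or malformed lines
        if not isinstance(event, dict):
            continue
        password = event.get("password")
        if isinstance(password, str):  # keep string values only
            passwords.add(password)
    return passwords
```

Calling sorted(extract_passwords(read_log_lines("/logs/cowrie.json.*"))) would give a sorted, de-duplicated list comparable to the output of the shell pipeline.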
Comparing Password Data
The data available from the three sources came in different formats and needed to be converted for comparison.
SHA1 hash with frequency count
Since a hash cannot be reversed, the passwords submitted to the honeypot and the passwords in the RockYou list were hashed for comparison. This actually made the process easy since little processing was needed for the HaveIBeenPwned password list, which was around 36GB in size.
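A minimal sketch of that comparison is shown below. It assumes the HaveIBeenPwned download stores one uppercase SHA-1 hash per line followed by a colon and a frequency count, and it streams the file line by line instead of loading 36GB into memory. The names sha1_upper and match_against_hibp are hypothetical, and UTF-8 encoding of the honeypot passwords is an assumption.

```python
import hashlib


def sha1_upper(password, encoding="utf-8"):
    """Hash a password the way the breach list stores it: uppercase hex SHA-1."""
    return hashlib.sha1(password.encode(encoding)).hexdigest().upper()


def match_against_hibp(honeypot_hashes, hibp_lines):
    """Stream 'HASH:COUNT' lines and return {hash: count} for honeypot hashes found."""
    matches = {}
    for line in hibp_lines:
        digest, _, count = line.strip().partition(":")
        if digest in honeypot_hashes:  # set lookup, so each check is cheap
            matches[digest] = int(count) if count else 0
    return matches
```

In practice hibp_lines would be an open file handle on the downloaded list, and honeypot_hashes a set built by applying sha1_upper to each unique honeypot password.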
Seconds to process:
Total honeypot hashes:
Total HaveIBeenPwned hashes:
Total RockYou hashes:
RockYou Matched Hashes:
RockYou ONLY Matched Hashes:
HaveIBeenPwned Matched Hashes:
Percentage of honeypot passwords found in HaveIBeenPwned breach data:
Percentage of honeypot passwords found in RockYou data:
Percentage of honeypot passwords found ONLY in RockYou data:
Average processing pace:
481080.78 hashes per second
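The counts and percentages listed above can be derived from set operations. The sketch below is one way to do it and uses no real values; match_summary and its dictionary keys are hypothetical names, and "RockYou only" is assumed to mean honeypot hashes found in RockYou but not in HaveIBeenPwned.

```python
def match_summary(honeypot_hashes, hibp_matched, rockyou_hashes):
    """Summarize how many honeypot hashes appear in each source."""
    hibp_matched = honeypot_hashes & hibp_matched
    rockyou_matched = honeypot_hashes & rockyou_hashes
    # Found in RockYou but not in the HaveIBeenPwned data
    rockyou_only = rockyou_matched - hibp_matched
    total = len(honeypot_hashes)

    def pct(part):
        return 100.0 * len(part) / total if total else 0.0

    return {
        "total_honeypot": total,
        "hibp_matched": len(hibp_matched),
        "rockyou_matched": len(rockyou_matched),
        "rockyou_only_matched": len(rockyou_only),
        "pct_hibp": pct(hibp_matched),
        "pct_rockyou": pct(rockyou_matched),
        "pct_rockyou_only": pct(rockyou_only),
    }
```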
Something learned from this process was that checking membership in a Python set() is much faster than checking membership in a Python list. Nothing makes this more evident than processing a 36GB text file. Since the values were unique within each data set, a Python set() worked very well.
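The difference is easy to reproduce on a small scale. The sketch below times the same membership test against a list and a set; the sizes are arbitrary illustration values, time_membership is a hypothetical helper, and actual timings will vary by machine.

```python
import timeit


def time_membership(size=20_000, lookups=200):
    """Time 'needle in container' for a list and a set holding the same values."""
    values = [str(i) for i in range(size)]
    as_list = values
    as_set = set(values)
    needle = values[-1]  # worst case for the list: the last element
    list_seconds = timeit.timeit(lambda: needle in as_list, number=lookups)
    set_seconds = timeit.timeit(lambda: needle in as_set, number=lookups)
    return list_seconds, set_seconds


if __name__ == "__main__":
    list_seconds, set_seconds = time_membership()
    print(f"list: {list_seconds:.6f}s  set: {set_seconds:.6f}s")
```

A list has to be scanned element by element, while a set uses a hash lookup, so the gap grows with the size of the collection.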
Also, latin-1 strings were used with the RockYou list due to issues with attempting utf-8 encoding.
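The reason latin-1 avoids these issues is that it maps every possible byte to a character, so decoding never raises an error. The sketch below illustrates the idea; hash_wordlist_lines is a hypothetical helper, and re-encoding with latin-1 before hashing (so that the original bytes are hashed) is an assumption, not necessarily what was done for this diary.

```python
import hashlib


def hash_wordlist_lines(raw_lines, encoding="latin-1"):
    """Return uppercase SHA-1 hashes for each non-empty line of a wordlist.

    raw_lines is an iterable of bytes, for example a file opened in "rb" mode.
    """
    hashes = set()
    for raw in raw_lines:
        word = raw.rstrip(b"\r\n").decode(encoding)  # latin-1 never fails to decode
        if word:
            # Encoding back with the same codec reproduces the original bytes
            hashes.add(hashlib.sha1(word.encode(encoding)).hexdigest().upper())
    return hashes
```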
Having regularly reviewed cowrie attacks from the DShield honeypot, I knew that there were going to be some unusual results. Rather than filter those out ahead of time, I decided to look at the information visually by comparing password length frequencies.
Figure 2: Password length frequencies from honeypot submissions
The data shows that the most common password length is 8 characters, but there are a lot of passwords with much greater length and lower frequencies. The longest password that had a match in the HaveIBeenPwned data was 48 characters.
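Length frequencies like these can be tallied with a Counter. The sketch below is illustrative only; length_frequencies is a hypothetical name and the sample passwords are made up, not taken from the honeypot data.

```python
from collections import Counter


def length_frequencies(passwords):
    """Count how many passwords exist for each length."""
    return Counter(len(password) for password in passwords)


if __name__ == "__main__":
    sample = ["admin", "12345", "password", "a"]  # made-up sample values
    for length, count in sorted(length_frequencies(sample).items()):
        print(length, count)
```

The resulting mapping of length to count can be passed to any plotting library to produce a frequency chart.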
Figure 3: Longest password matching HaveIBeenPwned data was 48 characters in length
So, what are these longer passwords? In most cases, the data is most likely not a password, but another part of an attack such as a terminal command or even data meant to be sent to another protocol, such as HTTP.
Figure 4: Examples of data that were not likely meant for password submissions
As the passwords get longer, these commands stand out even more. When filtering out passwords longer than 48 characters, there is not a large difference in the match percentages. It turns out that there are only a few hundred of these passwords out of almost 250,000.
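A length filter of this kind can be sketched as below. The function match_rate is hypothetical, the 48-character cutoff is passed in as a parameter, and the breached argument stands in for whatever set of matched passwords is available.

```python
def match_rate(passwords, breached, max_length=None):
    """Percentage of passwords found in 'breached', optionally ignoring long entries."""
    kept = [p for p in passwords if max_length is None or len(p) <= max_length]
    if not kept:
        return 0.0
    matched = sum(1 for p in kept if p in breached)
    return 100.0 * matched / len(kept)
```

Comparing match_rate(passwords, breached) with match_rate(passwords, breached, max_length=48) shows how much the long, non-password entries affect the percentage.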
HaveIBeenPwned Matches
Passwords Without Matches
Approximately 2/3 of the passwords used to attack my honeypot were available in HaveIBeenPwned password data. What about the other 1/3 of the passwords? I pulled out one specific password example since it had no matches within the breach data used, but was also one of the top 20 passwords attempted this year.
Figure 5: Password example with no matches in breach data, but frequently seen
There are a variety of search results in Google when searching for this value. From the search results I was unable to find a source, but many of the results came from honeypot data. The password below the one identified also came up in a variety of articles. Within GitHub, that password was available in other honeypot data. This left me with some other questions:
How do write-ups about specific passwords impact those passwords being used in attacks?
How often is reported information security data used to perpetuate attacks?
What is the source of these other “unmatched” passwords? Are they generated or just from breach data not as freely available?
Password breach data is commonly used in credential stuffing attacks.
Use a password manager (could even be a notebook in a locked drawer)
Use unique passwords in combination with Multifactor Authentication (MFA)
Check sites like HaveIBeenPwned to see if your email has been part of a reported breach
Use password breach data to disallow the use of those passwords
If you find a password you use publicly available, change it
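As an illustration of using breach data to disallow passwords, a check at password-change time might look like the sketch below. It assumes a locally held set of uppercase SHA-1 hashes taken from breach data; is_breached is a hypothetical name, and a real deployment would also need a policy for what to tell the user.

```python
import hashlib


def is_breached(candidate, breached_hashes):
    """Return True if the candidate password's SHA-1 hash is in the breach set."""
    digest = hashlib.sha1(candidate.encode("utf-8")).hexdigest().upper()
    return digest in breached_hashes


if __name__ == "__main__":
    # Example set containing only the SHA-1 hash of the string "password"
    known = {"5BAA61E4C9B93F3F0682250B6CF8331B7EE68FD8"}
    print(is_breached("password", known))
```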
Jesse La Grew
(c) SANS Internet Storm Center. https://isc.sans.edu Creative Commons Attribution-Noncommercial 3.0 United States License.