Passwords 101 part 1: Cracking Password Hashes

This is the first in a series of blog posts I’ll be doing about passwords. You’ve probably heard about password leaks in the past; the LinkedIn breach was among the bigger ones. Although password leaks are generally unpleasant, they do offer an opportunity, and not only for hackers. What’s interesting here, from a security perspective, is that we suddenly have access to a huge database of passwords that people actually used. This dataset can teach us a lot about password behavior, and maybe show you the importance of choosing truly individual passwords.

I managed to get my hands on the complete dataset of the LinkedIn breach and decided this would be a great opportunity to analyze user password behavior, as well as to show how easy it is these days to crack passwords using free tools. In this blog post I will show you how we get from the raw data (the hashes) to some interesting statistics. I will be using a MySQL database to store my data, Python to handle and parse the large data files, and the Hashcat password cracking tool to crack the unsalted SHA-1 hashes from the LinkedIn breach, which happened in 2012 and surfaced in full in 2016.

Since exposing people’s passwords is a bit of a touchy subject, I would like to make it clear that I will not be linking the passwords to individual e-mail addresses. I will only be analyzing general behaviour.

About the breach

The LinkedIn breach itself dates from 2012. Hackers were able to breach the LinkedIn database and extract millions of passwords. Initially only 6.4 million unique hashes were posted on a Russian password cracking forum. LinkedIn took action and issued a password reset for all accounts that were believed to be compromised. In May 2016, however, it became clear that the hackers did not just take a subset of the database. They took everything. The password database that was put up for sale contained about 160 million entries. This corresponds to the LinkedIn user count during the second quarter of 2012. LinkedIn immediately started to issue password resets for all users who hadn’t reset their password since 2012.

So the passwords themselves cannot be used to log in to random LinkedIn accounts to, for example, replace the profile pictures with random internet cats, since all passwords were reset. There is however still the possibility that the passwords were used by the same people on different services. We now know for example, thanks to the hacker group OurMine, that Mark Zuckerberg himself used the password “dadada” for multiple websites including LinkedIn, Pinterest and Twitter. The hacker group used the data from the LinkedIn breach to get the hash and cracked it to reveal the rather simple password. We’re just hoping Mark will be using a slightly better password from now on. So if you were in the LinkedIn breach, make sure the password you used there is not used on any other account. Netflix for example already issued a precautionary password change, and TeamViewer reported a growing number of account take-overs due to external password compromises and password reuse.

Initial setup

Since I’m starting from the dataset containing SHA-1 hashes, this first post is about cracking the hashes and finding the passwords. One of my goals is to show you that you do not need a pile of fancy servers to crack passwords. One laptop is more than sufficient. I will show you step by step how I tackled this little project.

I used my laptop with the following specs:

  • Lenovo W550s
  • Windows 7 professional SP1 (64 bit)
  • Intel® Core™ i7-5500 CPU @ 2.40 GHz
  • 8,00 GB RAM
  • NVIDIA Quadro K620M (64 Bit @ 2000 MHz, 2048MB DDR3)

First I downloaded the entire dataset (a RAR archive) and extracted it (21 GB). The dataset consists of 1 SQL file, which contains a table mapping MEMBER_ID to MEMBER_PRIMARY_EMAIL, 9 text files which contain mappings from userID to a hashvalue, and 37 text files containing mappings from e-mail addresses to hashvalues. Since I’ll only focus on cracking hashes and analyzing the passwords used, I’ll only be needing the 9 text files with memberid and hashvalue for now.

MySQL Database

I figured I’d best insert the data into a MySQL database on my laptop. I wrote a small Python script to put the 9 files together in a favorable format (columns separated by \t, no string delimiters, lines terminated by \n). I then inserted everything into a MySQL database. I used this manual load-and-insert command instead of the GUI alternative because it allows me to be more specific about termination and escape characters, and it works a lot faster. If you’re looking to insert, for example, the e-mail addresses or plain text passwords into a database at some point, this way is highly recommended, since these things can contain any stupid random character, including your escape and delimiter characters.

LOAD DATA LOCAL INFILE 'path_to_file.csv'
INTO TABLE db.useridhash
FIELDS TERMINATED BY '\t' ESCAPED BY ''
LINES TERMINATED BY '\n';
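
The merge script itself is not shown in this post, but a minimal sketch of the idea could look like this. It assumes every input line has the form memberid:hash; the ":" separator and the function names are assumptions for illustration, not the actual layout of the dump.

```python
def to_tsv_line(raw_line, separator=":"):
    """Turn one 'memberid<separator>hash' line into a tab-separated line.

    Returns None for lines that do not contain the separator.
    The ':' default is an assumption about the input layout.
    """
    line = raw_line.rstrip("\r\n")
    if separator not in line:
        return None
    member_id, hash_value = line.split(separator, 1)
    return f"{member_id.strip()}\t{hash_value.strip()}\n"


def merge_files(input_paths, output_path):
    """Concatenate all input files into one tab-separated file."""
    written = 0
    # newline="" keeps "\n" as the line terminator, also on Windows.
    with open(output_path, "w", encoding="utf-8", newline="") as out:
        for path in input_paths:
            with open(path, encoding="utf-8", errors="replace") as src:
                for raw_line in src:
                    tsv_line = to_tsv_line(raw_line)
                    if tsv_line is not None:
                        out.write(tsv_line)
                        written += 1
    return written


print(repr(to_tsv_line("12345:7c4a8d09ca3762af61e59520943dc26494f8941b\n")))
```

Calling merge_files with the list of the 9 text files and the output path would then produce the single file that the LOAD DATA statement reads.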

A quick sort on most-used hashes shows that a lot of the entries had “xxx” instead of a hashvalue. This is probably due to the user using a Facebook login instead of a LinkedIn login to access the profile. In the following steps I do not take these entries into account, and will work with the remaining 117.205.873 entries.

Total number of entries in table useridhash: 166.033.990

Total number of entries with “xxx” as hashvalue: 48.828.117

Total number of useful hashvalues: 117.205.873

Now I can generate some quick stats on these hashvalues. I selected the top 100 most-used password hashes. A quick entry in online hashcracking tools (https://hashkiller.co.uk/sha1-decrypter.aspx) reveals that practically all top 100 passwords are easily cracked. To be honest, they do indeed not look secure at all.
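
The reason such lookups work is that an unsalted SHA-1 hash is fully determined by the password: the same password always gives the same hash, for every user. A small sketch with Python's hashlib shows this for two of the hashes from the top of the most-used list (the helper name sha1_hex and the third guess are just for illustration):

```python
import hashlib


def sha1_hex(password):
    """Return the unsalted SHA-1 hash of a password as lowercase hex."""
    return hashlib.sha1(password.encode("utf-8")).hexdigest()


# Two hashes taken from the top of the most-used list.
known_hashes = {
    "7c4a8d09ca3762af61e59520943dc26494f8941b",
    "5baa61e4c9b93f3f0682250b6cf8331b7ee68fd8",
}

for guess in ["123456", "password", "correct horse battery staple"]:
    status = "match" if sha1_hex(guess) in known_hashes else "no match"
    print(guess, sha1_hex(guess), status)
```

With a salt, each account would get a different hash for the same password, and neither counting identical hashes nor a precomputed lookup would work this easily.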

The top 5 most popular hashvalues (and passwords) are found in the table below. I put “xxx” at rank 0 for completeness. As you can see from the percentages, the most-used password (123456) is used by 0.64% of all accounts. This means that if you were to try 200 accounts with the password “123456”, you’d have a good chance of gaining access to one of them. The full list up to rank 49 is at the end of the article.

Rank Used by … accounts % of 117.205.873 Unsalted SHA1 hash value Plain text password
0 48.828.092 xxx
1 753.305 0.64% 7c4a8d09ca3762af61e59520943dc26494f8941b 123456
2 144.949 0.12% 7728240c80b6bfd450849405e8500d6d207783b6 linkedin
3 135.200 0.11% 5baa61e4c9b93f3f0682250b6cf8331b7ee68fd8 password
4 94.314 0.08% f7c3bc1d808e04732adf679965ccc34ca7ae3441 123456789
5 63.769 0.05% 7c222fb2927d828af22f592134e8932480637c0d 12345678
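
To put a number on that "good chance", here is a small worked example. It assumes the 200 accounts are picked at random and independently of each other, which is a simplification; the function name is made up for this sketch.

```python
def chance_of_at_least_one_hit(share, attempts):
    """Probability that at least one of `attempts` random accounts uses the password."""
    return 1 - (1 - share) ** attempts


accounts_with_123456 = 753_305
accounts_total = 117_205_873
share = accounts_with_123456 / accounts_total

print(f"share of accounts using 123456: {share:.2%}")
print(f"chance of a hit in 200 tries:   {chance_of_at_least_one_hit(share, 200):.0%}")
```

Under these assumptions the chance of getting into at least one of the 200 accounts comes out at a bit over 70%.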

The next step is to try and crack as many of these hashes as possible, so I can analyze the passwords. I calculated the number of distinct password hashes (not including “xxx”).

Number of distinct hashvalues: 61.829.178

61.829.178 hashes is a lot to handle. I tried feeding them all at once to hashcat, but there was insufficient memory to handle it all. I decided to select the most-used hashes and try to crack these first to see how far I’d get, figuring that rarely used hashes are probably also very hard to crack, and would take a long time. I could come back later and have a go at the least-used hashes as well. So I selected the most-used hashes, and then sorted them alphabetically, since this greatly improves the efficiency of hashcat.

I selected the top 50% (30.914.589) of most-used hashes, excluding the “xxx” hashes, to see how many accounts I could enter if I managed to crack them all.

SELECT SUM(count) FROM (SELECT hashvalue, COUNT(hashvalue) AS count FROM db.useridhash WHERE hashvalue != 'xxx' GROUP BY hashvalue ORDER BY count DESC LIMIT 30914589) AS T;

This resulted in 86.291.219 records, or 73% of all accounts. This seems sufficient to start with.

I exported this top 50% of most-used hashes (used by 73% of LinkedIn users) alphabetically:

SELECT hv FROM (SELECT hashvalue AS hv, COUNT(hashvalue) AS count FROM db.useridhash WHERE hashvalue != 'xxx' GROUP BY hashvalue ORDER BY count DESC LIMIT 30914589) AS T ORDER BY hv;
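
For readers who prefer to see the logic of the two queries outside SQL, the same selection can be sketched in Python on a toy list. The function name and the sample data are made up for illustration; the real table is far too large to handle this way on a laptop.

```python
from collections import Counter


def coverage_of_top(hash_values, fraction):
    """Share of accounts covered by the most-used `fraction` of distinct hashes."""
    counts = Counter(h for h in hash_values if h != "xxx")
    keep = int(len(counts) * fraction)
    covered = sum(count for _, count in counts.most_common(keep))
    return covered / sum(counts.values())


# Toy data: 4 distinct hashes spread over 10 accounts, plus one "xxx" entry.
sample = ["aa"] * 5 + ["bb"] * 3 + ["cc", "dd", "xxx"]

# The top 2 of 4 distinct hashes cover 8 of the 10 accounts.
print(coverage_of_top(sample, 0.5))
```

Because popular passwords are shared by many accounts, the most-used half of the distinct hashes covers much more than half of the accounts.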

Hashcat

To crack the passwords I used cudaHashcat 2.01. This is a hashcat version specifically for use with NVIDIA graphics cards. Hashcat is a free password cracking tool with a lot of functionality; you can find it online. During my research a new version of hashcat was released (3.0). This version can combine GPU and CPU power to give quicker results. As soon as 3.0 was released I switched to this newer version.

Brute Force

First I tried some general brute forcing. This is the most straightforward way of cracking hashes, but it does take a long time for longer passwords. Arranging the hashes alphabetically thankfully speeds up the process a bit. The following command checks all hashes against passwords of length 6, using all character sets (lowercase, uppercase, digits and symbols):

cudaHashcat64.exe --hash-type=100 inputfile.txt --attack-mode=3 ?a?a?a?a?a?a --outfile=cracked_six.txt --outfile-format=3 --remove

  •  --hash-type=100 : SHA-1 hashes
  •  inputfile.txt : file containing hashes
  •  --attack-mode=3 : use brute force
  •  ?a?a?a?a?a?a : mask definition, character set ?a (see below)
  •  --outfile=cracked_six.txt : output file
  •  --outfile-format=3 : output generated as “hash:password”
  •  --remove : remove hash from list if cracked

Hashcat allows you to define a mask where you specify how many characters the password you are guessing has and which characters can be used in each position. You can define your own character sets as well, for example with --custom-charset1=?l?u?d. You can then use this custom charset like any other charset (?1?1?1?1?1?1).

  • ?l = abcdefghijklmnopqrstuvwxyz
  • ?u = ABCDEFGHIJKLMNOPQRSTUVWXYZ
  • ?d = 0123456789
  • ?s = (space)!"#$%&'()*+,-./:;<=>?@[\]^_`{|}~
  • ?a = ?l?u?d?s
  • ?b = 0x00 – 0xff
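
The number of candidates a mask generates is simply the product of the character set sizes of its positions. The sketch below computes that for the masks used in this post. It is a simplified helper written for illustration, not part of hashcat: it only understands ?x placeholders, and it does not de-duplicate overlapping sets in a custom charset.

```python
import re

# Sizes of hashcat's built-in character sets, as listed above.
BUILTIN_SIZES = {"l": 26, "u": 26, "d": 10, "s": 33, "a": 95, "b": 256}


def charset_size(definition):
    """Number of characters in a charset definition such as '?l?u?d'."""
    return sum(BUILTIN_SIZES[name] for name in re.findall(r"\?(.)", definition))


def mask_keyspace(mask, custom=None):
    """Number of candidates for a mask; `custom` maps '1'..'4' to definitions."""
    custom = custom or {}
    total = 1
    for name in re.findall(r"\?(.)", mask):
        if name in custom:
            total *= charset_size(custom[name])
        else:
            total *= BUILTIN_SIZES[name]
    return total


print(mask_keyspace("?a?a?a?a?a?a"))                     # 95 ** 6
print(mask_keyspace("?1?1?1?1?1?1?1", {"1": "?l?u?d"}))  # 62 ** 7
print(mask_keyspace("?l?l?l?l?l?l?l?l"))                 # 26 ** 8
```

Every extra position multiplies the work by the size of its character set, which is why both length and character variety matter so much.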

The table below shows the results of some cracking attempts; the password length, mask and character sets used in hashcat, and the results with the time needed to complete the command. The images show the Hashcat output.

Length  Mask                           Comment              Amount cracked  % of total 30.914.589  Duration
6       ?a?a?a?a?a?a                   Full set             3.212.366       10,39%                 3h 43m
7       ?1?1?1?1?1?1?1 (?1 = ?l?u?d)   No symbols           3.402.681       11,01%                 8h 03m
8       ?l?l?l?l?l?l?l?l               Lowercase            1.722.602       5,57%                  1h 33m
8       ?u?u?u?u?u?u?u?u               Uppercase            47.850          0,15%                  0h 37m
7-12    ?d?d?d?d?d?d?d?d?d?d?d?d       Digits, incremental  715.629         2,31%                  1h 06m

I also started two more searches on 8 characters: one with uppercase and lowercase letters, and one with uppercase letters, lowercase letters and digits. I didn’t complete these searches because they would have taken up too much time for a relatively small yield.
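
A rough back-of-the-envelope estimate shows why. The sketch below assumes the laptop keeps the same speed as in the full 6-character run (all 95^6 candidates in 3h 43m). That is a simplification, since the real speed depends on the mask and on the number of hashes still in the list.

```python
def seconds(hours, minutes):
    """Convert a duration in hours and minutes to seconds."""
    return hours * 3600 + minutes * 60


# Assumed throughput: the full 6-character run (95 ** 6 candidates) took 3h 43m.
candidates_per_second = 95 ** 6 / seconds(3, 43)


def estimated_days(keyspace, rate=candidates_per_second):
    """Days needed to exhaust `keyspace` at `rate` candidates per second."""
    return keyspace / rate / 86_400


print(f"8 chars, upper + lower:          {estimated_days(52 ** 8):.0f} days")
print(f"8 chars, upper + lower + digits: {estimated_days(62 ** 8):.0f} days")
```

At that speed both searches run for weeks rather than hours.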

Dictionary Attack

My next approach is a dictionary attack. I will be using a widely used wordlist for this: rockyou.txt. This file comes from the 2009 RockYou password breach and contains English dictionary words as well as often-used passwords. When you use a dictionary attack you basically tell the cracking software to only calculate the hashes of the words in the dictionary file, and to match these against the hashes you want to crack. This type of attack tends to be more efficient since humans are more likely to use dictionary words than complicated and random series of letters, symbols and numbers.
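
In its simplest form, this is all a dictionary attack does. The sketch below is a plain Python illustration of the principle, not how hashcat works internally; the tiny wordlist and the all-zero placeholder hash are made up for the example.

```python
import hashlib


def dictionary_attack(target_hashes, words):
    """Hash every candidate word and return {hash: word} for the matches."""
    targets = set(target_hashes)
    cracked = {}
    for word in words:
        digest = hashlib.sha1(word.encode("utf-8")).hexdigest()
        if digest in targets:
            cracked[digest] = word
    return cracked


targets = [
    "7c4a8d09ca3762af61e59520943dc26494f8941b",  # rank 1 in the list
    "5baa61e4c9b93f3f0682250b6cf8331b7ee68fd8",  # rank 3 in the list
    "0000000000000000000000000000000000000000",  # placeholder, stays uncracked
]
wordlist = ["letmein", "password", "123456", "qwerty"]

for digest, word in dictionary_attack(targets, wordlist).items():
    print(f"{digest}:{word}")
```

The cost is one hash calculation per word, no matter how many target hashes there are, because the lookup in the set is practically free.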

cudaHashcat64.exe --hash-type=100 inputfile.txt --attack-mode=6 rockyou.txt ?a?a?a --outfile=cracked_rockyouaaa.txt --outfile-format=3 --remove

  •  --attack-mode=6 : hybrid attack, dictionary word followed by a mask (0 is a straight dictionary attack, 7 puts the mask in front of the word)
  •  rockyou.txt : dictionary file
  •  ?a?a?a : mask appended to each word, character set ?a (see above)
  •  --outfile=cracked_rockyouaaa.txt : output file
  •  --outfile-format=3 : output generated as “hash:password”
  •  --remove : remove hash from list if cracked

Besides the straight dictionary attack, we can also combine the dictionary with a “mask”. In that case hashcat adds some characters (defined by character set masks) before or after each dictionary word. As you can see from the results, dictionary attacks yield a whole bunch of new passwords with very little overall time needed. We notice that the yield is bigger for word+mask combinations than for mask+word combinations.

Dictionary  Comment   Mask          Amount cracked  % of 30.914.589  Duration
rockyou     straight  (none)        1.143.881       3,70%            0h 24m
rockyou     combined  [word]?a      1.311.356       4,24%            0h 30m
rockyou     combined  ?a[word]      333.302         0,11%            0h 10m
rockyou     combined  [word]?a?a    2.662.884       8,61%            1h 03m
rockyou     combined  [word]?a?a?a  1.923.981       6,22%            26h 59m
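
To make the word+mask and mask+word combinations concrete, here is a small generator that produces the same kind of candidates. It is only an illustration of the idea with made-up names; hashcat generates these on the GPU and far more efficiently.

```python
from itertools import product
import string

# The same 95 printable ASCII characters as hashcat's ?a set.
CHARSET_A = (string.ascii_lowercase + string.ascii_uppercase
             + string.digits + " " + string.punctuation)


def hybrid_candidates(words, charset, mask_length, word_first=True):
    """Yield every word combined with every `mask_length`-character string."""
    for word in words:
        for chars in product(charset, repeat=mask_length):
            extra = "".join(chars)
            yield word + extra if word_first else extra + word


words = ["summer", "linkedin"]
appended = list(hybrid_candidates(words, string.digits, 2))
print(len(appended), appended[:3])
print(next(hybrid_candidates(words, string.digits, 1, word_first=False)))
```

The number of candidates is the number of words times the charset size to the power of the mask length. With the full ?a set, every extra position multiplies the work by 95, which is why three appended characters take so much longer than two.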

Seeing that the dictionary file appended with 3 random characters already caused a processing time of more than 1 day, I decided to look into a different approach to find more. E-mail addresses and names are very often used as passwords. Since we have a list of all e-mail addresses (listed in the 1.sql file), we can easily create a new dictionary file. I wrote a little Python script to list all e-mail addresses, as well as the base of each address (the bit before “@”), and possible smaller pieces (separated by “.”, “-” or “_”). So if there is an e-mail address like “pleasecrack_mye-mailpassword@gmail.com”, I will be adding “pleasecrack”, “mailpassword”, “pleasecrack_mye-mailpassword” and “pleasecrack_mye-mailpassword@gmail.com” to my dictionary file. I’m leaving the short bits (such as “mye”) out since a LinkedIn password needs to have a minimum length of 6. 6 gigabytes of dictionary file later, I can throw “email_wordlist.txt” into hashcat and see how far this gets me.
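
The splitting rules described above can be sketched as follows. This is a reconstruction from the description, not the original script; the function name and the MIN_LENGTH constant are made up for the example.

```python
import re

MIN_LENGTH = 6  # shorter pieces are skipped, matching the minimum password length


def email_candidates(address):
    """Return the dictionary entries derived from one e-mail address."""
    address = address.strip()
    local_part = address.split("@", 1)[0]
    # Split the part before "@" on ".", "_" and "-".
    pieces = re.split(r"[._-]", local_part)
    candidates = {address, local_part, *pieces}
    return {c for c in candidates if len(c) >= MIN_LENGTH}


for entry in sorted(email_candidates("pleasecrack_mye-mailpassword@gmail.com")):
    print(entry)
```

Using a set per address removes duplicates within one address; duplicates across addresses would still have to be removed when writing the final wordlist.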

It doesn’t have the yield of the other dictionary file, but I’m still getting a whopping 438.065 user accounts that were using their e-mail address or something related to it as a password. If you scroll back up and compare this to the most-used passwords, using something similar to your e-mail address comes in second and is done by at least 0.37% of LinkedIn users.

To recap: I have cracked 17.170.864 hashed passwords by now, which equals 28% of all distinct hashes.

I’ll be inserting them into the database to check how many user accounts I could actually have hacked with this set. After insertion I get the following result:

71.255.593 user accounts have been cracked, which equals 60% of all accounts (with non-xxx hash values).
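
I did this check in the database, but the logic is simple enough to sketch in Python: read the “hash:password” lines that hashcat wrote, and count how many accounts use one of those hashes. The function names and the shortened hashes in the toy data are made up for illustration.

```python
from collections import Counter


def parse_outfile_line(line):
    """Split a 'hash:password' line; the password itself may contain ':'."""
    hash_value, _, password = line.rstrip("\r\n").partition(":")
    return hash_value, password


def cracked_account_share(outfile_lines, account_hashes):
    """Share of non-'xxx' accounts whose hash appears in the cracked output."""
    cracked = {parse_outfile_line(line)[0] for line in outfile_lines}
    counts = Counter(h for h in account_hashes if h != "xxx")
    hit = sum(count for h, count in counts.items() if h in cracked)
    return hit / sum(counts.values())


# Toy data with shortened, made-up hashes.
outfile = ["aa11:123456\n", "bb22:pass:word\n"]
accounts = ["aa11", "aa11", "aa11", "bb22", "cc33", "xxx"]

print(parse_outfile_line(outfile[1]))
print(cracked_account_share(outfile, accounts))
```

Splitting only on the first colon matters, because a cracked password can contain a colon itself.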

Extra notes on hashcat:

The new hashcat version allows you to use both GPU and CPU for the same task. The command has therefore changed a bit:

hashcat64.exe --hash-type=100 sortedFile.txt --attack-mode=3 ?a?a?a?a?a?a --outfile=cracked_6.txt --outfile-format=3 --potfile-disable --opencl-device-types 1,2 --opencl-devices 2,3 --remove

  • --opencl-device-types 1,2: Use both CPU and GPU devices
  • --opencl-devices 2,3: Only use these specified devices (I had to add this in some cases because the integrated graphics card on the motherboard could not handle the dataset and prevented the execution of the command, so I could only use the NVIDIA graphics card and my CPU)
  • --potfile-disable: this option was already available in the previous version. The potfile is the database in which hashcat stores the hashes it has already cracked. Since I was cracking distinct hashes, I did not need hashcat to check for this. Disabling it in this case probably led to quicker execution.

Some of the passwords found by hashcat with the rockyou.txt dictionary file contained non-ASCII characters. Hashcat outputs these in a hexadecimal format by default (e.g. $HEX[61c59f6bc4b16d64616e]).

You can however easily convert these hexadecimal values to UTF-8 characters with Python, where string is the hexadecimal part between the square brackets:

bytearray.fromhex(string).decode('utf-8')

$HEX[61c59f6bc4b16d64616e] = aşkımdan
$HEX[7365c3b16f7269746f] = señorito
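
When processing a whole output file, a small helper that strips the $HEX[...] wrapper and leaves normal passwords alone is handy. The sketch below assumes the bytes are UTF-8, which will not be true for every password in the wild, so invalid sequences are replaced instead of raising an error. The function name is made up for the example.

```python
def decode_plain(plain):
    """Decode hashcat's $HEX[...] notation to text; return other values unchanged."""
    if plain.startswith("$HEX[") and plain.endswith("]"):
        raw = bytes.fromhex(plain[5:-1])
        # Assumption: the password bytes are UTF-8; bad bytes become U+FFFD.
        return raw.decode("utf-8", errors="replace")
    return plain


for value in ["$HEX[61c59f6bc4b16d64616e]", "$HEX[7365c3b16f7269746f]", "sunshine"]:
    print(value, "=", decode_plain(value))
```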

Conclusion

Cracking unsalted password hashes is a breeze. The less complex the password, the quicker it will be cracked. Going over all possibilities for passwords with 6 characters took me no more than 4 hours. Going over all lowercase passwords with 8 characters took only about an hour and a half. Passwords based on your e-mail address are dead giveaways, and any password in the rockyou wordlist is quickly discovered as well. The hardest passwords to crack are those with characters from all sets (lowercase, uppercase, digits and symbols) and sufficient length (at least 8 is advised).

In less than 2 days (44h processing time) and with a normal laptop I was able to crack the passwords of 60% of the LinkedIn users who used a password to log in. If you know that the original owner (hacker) of the database has had 4 years, you can be pretty damn sure every single little password is out in the open somewhere. So if you’re still using that 2012 LinkedIn password somewhere, better change it; it’s probably compromised.

In the next post I’ll be diving deeper into the user behavior concerning passwords. Until then I’ll also be continuing to crack hashes to increase my usable data.

50 Most common hash values in LinkedIn breach

If you should find your favourite password in this list, you know at least 14.000 people agree that it’s a nice password. Unfortunately, you can also be pretty sure it’s not a very secure one.

Rank Used by … Accounts % of 117.205.873 Unsalted SHA1 hash value Plain text password
0 48.828.092 xxx
1 753.305 0,64% 7c4a8d09ca3762af61e59520943dc26494f8941b 123456
2 144.949 0,12% 7728240c80b6bfd450849405e8500d6d207783b6 linkedin
3 135.200 0,11% 5baa61e4c9b93f3f0682250b6cf8331b7ee68fd8 password
4 94.314 0,08% f7c3bc1d808e04732adf679965ccc34ca7ae3441 123456789
5 63.769 0,05% 7c222fb2927d828af22f592134e8932480637c0d 12345678
6 57.210 0,05% 3d4f2bf07dc1be38b20cd6e46949a1071f9d0e3d 111111
7 49.652 0,04% 20eabe5d64b0e216796e834f52d61fd0b70332fc 1234567
8 36.460 0,03% b1b3773a05c0ed0176787a4f1574ff0075f7521e qwerty
9 36.291 0,03% 8d6e34f987851aa599257d3831a1af040886842f sunshine
10 33.854 0,03% dd5fef9c1c1da1394d6d34b248c51be2ad740840 654321
11 32.490 0,03% c984aed014aec7623a54f0591da07a85fd4b762d 000000
12 28.824 0,02% 6367c48dd193d56ea7b0baad25b19455e529f5ee abc123
13 24.984 0,02% d8cd10b920dcbdb5163ca0185e402357bc27c265 charlie
14 22.888 0,02% 1411678a0b9e25ee2f7c8b2f7ac92b6a74b3f9c5 666666
15 22.666 0,02% ff539c96a2ed9f72a47a5e1c7d59e143ba1fba94 linked
16 21.826 0,02% 601f1889667efaebb33b8c12572835da3f027f78 123123
17 21.613 0,02% 019db0bfd5f85951cb46e4452e9642858c004155 maggie
18 20.172 0,02% 775bb961b81da1ca49217a48e533c832c337154a princess
19 20.086 0,02% 17b9e1c64588c7fa6419b4d29dc1f4426279ba01 michael
20 19.575 0,02% 01b307acba4f54f55aafc33bb06bbbf6ca803e9a 1234567890
21 19.076 0,02% ee8d8728f435fd550f83852aabab5234ce1da528 iloveyou
22 17.334 0,01% c0b137fe2d792459f26ff763cce44574a5b5ab03 welcome
23 17.134 0,01% 48058e0c99bf7d689ce71c360699a14ce2f99774 121212
24 16.990 0,01% 3d0f3b9ddcacec30c4008c5e030e6c13a478cb4f daniel
25 16.846 0,01% a2c901c8c6dea98958c219f6f2d038c44dc5d362 baseball
26 16.678 0,01% e38ad214943daad1d64c102faec29de4afe9da3d password1
27 16.397 0,01% 1999e4893f732ba38b948dbe8d34ed48cd54f058 buster
28 16.257 0,01% ed9d3d832af899035363a69fd53cd3be8f71501c shadow
29 16.221 0,01% b1f45ed147d6803ac1a2a91bdea1fab603f910a5 bailey
30 16.173 0,01% ab87d24bdc7452e55738deb5f868e1f16dea5ace monkey
31 16.165 0,01% 273a0c7bd3c679ba9a6f5d99078e36e85d02b952 222222
32 15.664 0,01% b7c40b9c66bc88d38a59e554c639d743e77f1b65 555555
33 15.513 0,01% 6420ed4d831b436d1e92d25605d18297296374e3 summer
34 15.111 0,01% 675dc611bafb0b7348dd3baf7e005b6916fb954d hannah
35 15.089 0,01% 1f8ac10f23c5b5bc1167bda84b833e5c057a77d2 abcdef
36 14.838 0,01% 891c5feef171da85aadd3fdb8130ba509b03f5ea chocolate
37 14.826 0,01% 9fd8de5fc2a7c2c0d469b2fff1afde4e5def37ba george
38 14.778 0,01% b7a875fc1ea228b9061041b7cec4bd3c52ab3ce3 letmein
39 14.772 0,01% 7ecfd8f97b4729c6ff0799b0b4d40f870083b461 freedom
40 14.726 0,01% 088e4a2e6f0c20048cd3e53c639c7092bffb8524 pakistan
41 14.617 0,01% 92119e2c63e9366acfefe818b50537a85577e2db ginger
42 14.577 0,01% 5f50a84c1fa3bcff146405017f36aec1a10a9e38 thomas
43 14.548 0,01% 53a5687cb26dc41f2ab4033e97e13adefd3740d6 success
44 14.456 0,01% f32157a45887e4fe5adc0b5198f7ec4920a526d7 harley
45 14.435 0,01% 31c7fd2e291eeee7451ad31168f87183e31b4b9d Linkedin
46 14.431 0,01% 7212a9e01329ea93a57f574bd9bf77695d5fdca4 michelle
47 14.415 0,01% 64356bcfae350c970263c1ce575185b289f7b836 pepper
48 14.396 0,01% fba9f1c9ae2a8afe7815c9cdd492512622a66302 777777
49 14.381 0,01% c60266a8adad2f8ee67d793b4fd3fd0ffd73cc61 computer
