Securing My Elephant Brain: Passwords

“There are only two types of companies, those that have been hacked and those that will be.”

Those are the words of Robert Mueller, the Director of the FBI, from his keynote address at RSA Conference 2012, an international computer security and cryptography conference. With that in mind, the security of My Elephant Brain has been a major focus of the last few weeks. While the project is still in its infancy, it’s important to build a stable, secure foundation. This is the first in a series of posts about the security considerations developers need to take into account, and how users can recognize when their security isn’t being taken seriously.

Customers will be trusting My Elephant Brain with two sets of content that need to be protected. One is the content they add in the shape of flash cards to be memorized. Photos with names and faces have a privacy value that needs to be protected to the utmost. But it’s the other set of content they give us, or any web application, that’s arguably more valuable: their account information, including usernames and passwords.

Because users often reuse username/password combinations on multiple sites, breaking into one system and uncovering its passwords could yield a treasure trove of personal data. It’s up to system developers to make sure that even if user data falls into the wrong hands, it’s unusable. To understand how to do that, it’s best to start at the most insecure end of the spectrum: plaintext passwords.

Plaintext Passwords

Most lay users assume that when they enter their username and password into a login form, the website simply looks up the stored password associated with that username and compares it with the one submitted in the form. Hopefully, this is not exactly what is happening. The problem with that process is that the password is stored in the database in ‘plaintext’, exactly as the user originally typed it. If that database ever falls into the wrong hands, every password in it is free to be misused without any effort on the attacker’s part. How can the average user tell if their password is being stored in plaintext? One surefire sign that it is stored in plaintext, or in some other recoverable form, is a website that emails you your original password when you ask to recover it. A third party should never be able to recover the original password you gave them. But if the website itself can’t recover your password, how does it log you in?

Hashed Passwords

We need some means of making an original password unrecoverable while still allowing legitimate users to log in. To do this we arrive at the second level of password security: hashing. “Hashing” means running known text through an algorithm that garbles the text in a way that cannot practically be reversed, but is reproducible every time.

hashing illustration

When a user registers for a service, their password is hashed and the hash is stored in the database. The next time the user gives the website their username and password to log in, the same hashing function is applied and the result is compared with the stored hash. If an attacker gains access to the password database they’ll be met with a series of gibberish that can’t be used against a user on other sites where the username/password combination has been reused.
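The register-then-compare flow described above can be sketched in a few lines of Python. This is only an illustration: the `users` dictionary stands in for a database, the function names are made up for this post, and SHA-256 is just one example of a hash function. A bare, unsalted hash like this is not how passwords should be stored in a real system.

```python
import hashlib

# Hypothetical in-memory "database": username -> stored password hash.
users = {}

def hash_password(password: str) -> str:
    """Return the SHA-256 hex digest of the password (unsalted, illustration only)."""
    return hashlib.sha256(password.encode("utf-8")).hexdigest()

def register(username: str, password: str) -> None:
    """Store only the hash, never the original password."""
    users[username] = hash_password(password)

def login(username: str, password: str) -> bool:
    """Hash the submitted password and compare it with the stored hash."""
    stored = users.get(username)
    return stored is not None and stored == hash_password(password)

register("alice", "correct horse")
print(users["alice"])                    # 64 hex characters of gibberish
print(login("alice", "correct horse"))   # True
print(login("alice", "wrong guess"))     # False
```

The same input always produces the same hash, which is exactly what makes the login comparison work.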

However, because a hash function will always produce the same value given the same input, it is possible to pre-compute the hashes of a long list of possible passwords and then compare them against a database of stolen hashes. If this set of pre-computed hashes is based on existing words, the process is known as a ‘dictionary attack’ because it assumes your password comes from known dictionary words. If attackers instead take the time to cover every possible combination of letters and numbers up to a chosen maximum length, the result is commonly called a rainbow table. (Strictly speaking, a rainbow table is a compressed form of such a lookup table that trades some extra computation for far less storage.) A handy way to remember the name is that the table covers the entire spectrum of passwords, much like a rainbow covers the spectrum of visible light.

On occasion, this can be valuable and used for legitimate purposes. If you’ve ever forgotten your Windows password, there are ways to run a rainbow table against the hashed password and discover a match. However, this technique can also be used for nefarious purposes. Indeed, this was a primary criticism of LinkedIn when their password database was compromised this year. They had hashed the passwords with a popular algorithm known as SHA-1 (Secure Hash Algorithm 1) but without any salt, so even the laziest attacker could download an existing rainbow table and recover many of the original passwords. To guard against this, we need to add an extra level of randomness.
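To make the pre-computation idea concrete, here is a toy dictionary attack against unsalted SHA-1 hashes. The word list, the usernames, and the “stolen” table are all invented for this sketch. Note that this is a plain lookup table; a true rainbow table uses a more compact structure, but the effect for the victim is the same.

```python
import hashlib

def sha1_hex(text: str) -> str:
    """Return the unsalted SHA-1 hex digest of the given text."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()

# Hypothetical word list; real attacks use lists with millions of entries.
wordlist = ["password", "letmein", "123456", "qwerty"]

# Pre-compute once: hash -> original word. This table can be reused
# against every unsalted SHA-1 database the attacker ever obtains.
lookup = {sha1_hex(word): word for word in wordlist}

# Hypothetical stolen table of username -> unsalted SHA-1 hash.
stolen = {
    "alice": sha1_hex("letmein"),
    "bob": sha1_hex("Tr0ub4dor&3"),
}

def crack(stolen_hashes: dict, table: dict) -> dict:
    """Look each stolen hash up in the pre-computed table (None if absent)."""
    return {user: table.get(digest) for user, digest in stolen_hashes.items()}

recovered = crack(stolen, lookup)
print(recovered)  # {'alice': 'letmein', 'bob': None}
```

Alice’s common password falls out instantly, while Bob’s survives only because it isn’t in the list.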

You Want Salt with That?

We could ask users to make their passwords 128 characters long, but they’d probably balk at that. Instead, developers can add that extra complexity without any additional demands on the user. This extra random complexity is known as a salt. Simply combine a randomly generated string with the user-submitted password, then hash the result.

salt and hash illustration

The salt is then saved alongside the hashed password in the database, and the next time the user tries to sign in, the salting and hashing process is repeated and the result is compared with the stored hash. Because every user gets a unique salt, an attacker would need a separate table for each salt, so the number of possible hashes is astronomical and pre-computing a rainbow table for an entire database is infeasible.
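Here is a minimal sketch of salting, assuming a 16-byte random salt prepended to the password before hashing. The record layout and function names are invented for illustration, and a single round of SHA-256 is used only to keep the example short.

```python
import hashlib
import hmac
import secrets

def hash_with_salt(password: str, salt: bytes) -> str:
    """Hash the salt followed by the password (single round, illustration only)."""
    return hashlib.sha256(salt + password.encode("utf-8")).hexdigest()

def create_record(password: str) -> dict:
    """Generate a fresh random salt and store it next to the resulting hash."""
    salt = secrets.token_bytes(16)
    return {"salt": salt, "hash": hash_with_salt(password, salt)}

def verify(password: str, record: dict) -> bool:
    """Repeat the salting and hashing with the stored salt, then compare."""
    candidate = hash_with_salt(password, record["salt"])
    return hmac.compare_digest(candidate, record["hash"])

# Two users who pick the same password still end up with different hashes.
alice = create_record("letmein")
bob = create_record("letmein")
print(alice["hash"] == bob["hash"])   # False
print(verify("letmein", alice))       # True
print(verify("letmeout", alice))      # False
```

The salt isn’t a secret. Its job is to make sure a table pre-computed for one user is worthless against the next.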

At least, it used to be.

The Current Art

Computers just keep getting faster. It’s a great feature of technology that we can generally rely on this year’s crop of silicon to be faster than last year’s. However, that brings with it problems for security. The history of cryptography is littered with algorithms that were secure in their time, but became obsolete when hardware fast enough to ‘brute force’ them became available. Brute forcing simply means trying every conceivable input until you end up with the desired result. This is in contrast to an ‘elegant’ attack that finds the answer much more directly. Graphics cards are especially suited to crunching numbers really fast, and high-end cards have been shown to crunch upwards of 700 million hashes per second.

So the current crop of password storing techniques relies on algorithms with names like PBKDF2 (out of RSA, the company behind the above-mentioned conference), bcrypt, and scrypt. They essentially repeat the salting and hashing function enough times that there’s a slight delay, say half to a full second, for a single password. This might not seem like much, but it makes brute-forcing a whole database of passwords impractical, while being tolerable for systems to log users in. The PBKDF2 specification, published in 2000, recommends a minimum of 1,000 iterations, but iOS 4+ devices go through 10,000 iterations before settling on a key.
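As a rough sketch of this repeated hashing, here is PBKDF2 via `hashlib.pbkdf2_hmac` from Python’s standard library. The iteration count simply mirrors the 10,000 figure mentioned above, and the record layout and function names are made up for this example. In a real application, tune the count to your hardware and lean on your framework’s built-in module rather than rolling your own.

```python
import hashlib
import hmac
import os

# Assumption for this sketch: mirrors the iOS figure cited in the post.
ITERATIONS = 10_000

def hash_password(password: str, iterations: int = ITERATIONS) -> dict:
    """Derive a key with PBKDF2-HMAC-SHA256 using a fresh random salt."""
    salt = os.urandom(16)
    key = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    # Store the iteration count too, so it can be raised later for new records.
    return {"salt": salt, "iterations": iterations, "key": key}

def verify_password(password: str, record: dict) -> bool:
    """Re-derive the key with the stored salt and iteration count, then compare."""
    candidate = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), record["salt"], record["iterations"]
    )
    return hmac.compare_digest(candidate, record["key"])

record = hash_password("correct horse")
print(verify_password("correct horse", record))  # True
print(verify_password("wrong guess", record))    # False
```

Every guess an attacker makes now costs thousands of hash computations instead of one, which is the whole point.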

What You Should Do as a Developer

The above is a gross oversimplification of the process. The truth is much more complex, and even slight mistakes can open vulnerabilities. Fortunately, most decent frameworks come with modules built by people who are passionate about and well trained in security. Use those modules and don’t try to reinvent the wheel. My framework of choice, ASP.NET MVC, uses PBKDF2 by default to store passwords.

What You Should Do as a User

No one will care about your security as much as you will. Use strong passwords (the longer the better) and stop repeating them. I was guilty of this until recently, but now I autogenerate my passwords in KeePass and use ChromePass to automatically fill in passwords on webpages. Not only do I have a secure password for each site, but I never forget a password anymore, because no password exists without first being generated in KeePass.