A slightly different way to generate strong, memorable passwords

It’s a good idea to avoid remembering passwords – the more passwords you need to remember, the more tempting it is to re-use them. Ideally you should use a password manager like Bitwarden to store almost all of your passwords.

But of course you can’t store your Bitwarden password in Bitwarden, so it’s still necessary to memorize at least that one password. You’ll probably also need to memorize your disk encryption password and your user account password for your computer. So how do you choose these passwords to be secure and memorable?

Entropy

If you’re interested in computer security, you have probably seen the “Correct Horse Battery Staple” XKCD comic. It advocates choosing four random common words as your passphrase, instead of doing the traditional password generation black magic, because these passwords can be harder to guess and easier to remember. It’s a good illustration of the concept of “entropy”: In the context of security, entropy is a measure of how difficult your password is to guess, and it depends on how you chose the password.

For example, If you choose your password by flipping a coin 10 times, and adding a “1” if you get heads or a “0” if you get tails, your password has 10 bits of entropy. That means that an attacker who knew your decision procedure would have to guess, on average, 2^(10-1) passwords in order to successfully guess the one you chose. Adding one bit of entropy (one coin flip) to your password doubles the number of guesses required. It’s a good idea to incorporate as many truly random choices into your password as possible, to ensure that it’s really unpredictable.

Diceware

A respected way to generate secure passwords is by using the Diceware word list. You sit there rolling dice for a few minutes to choose random words. If you want your password to have 128 bits of entropy (enough for essentially all purposes), you can generate one by randomly choosing ten words from the Diceware list. And if you roll a password with N words in it, you’ll end up with N * log2(6^5) bits of entropy.

I’ve used this method in the past, and it works well. But the passwords I get are often difficult to remember: The words don’t have any semantic structure, and it’s not always easy to come up with mnemonics for lists of ten words.

The Reordered Diceware method

Instead of choosing N words in order and trying to remember them in the order you rolled them, you might try reordering the words you roll, to give them a more memorable structure. Unfortunately you lose some bits when doing this, and it’s important to take that into account. Pessimistically, your password using this method could have as few as log2(6^5 choose N) bits of entropy, or 107 bits for 10 words. To guarantee more than 128 bits of entropy, now your password needs to contain 13 words. And you’re still stuck using all the words you rolled, which limits how memorable your password will be.

Reordered Diceware with discards

I’ve started generating passwords in the following way:

First, roll 15 Diceware words. Then, choose your favorite 13 of those 15 words, and discard the other 2. Put those 13 words into any order you want, with punctuation separating them. This password can be very memorable, because you had a lot of options for just how it should be put together. And while the precise entropy is difficult to calculate, I believe it is at least 127.75 bits no matter how you choose the discards and ordering. For more details about how to compute this lower bound for different password sizes, check out Marcello Herreshoff’s solution here.

In addition to being strong, I’ve found that passwords generated this way are quite memorable, compared to the pure Diceware method.

Port Knocking Better

I am not a security expert; I am just some guy. The usual caveats apply.

Generally, it is a good idea to have fewer ports open on your server. When you have ports open, you leak information about what services are running on your server, and any misconfigured or vulnerable programs listening on those ports can become targets.

Normally, a server must have open ports in order to do its job, at least when you don’t know in advance which addresses it needs to serve. Port knocking is a technique that allows you to have zero open ports on a server, while still allowing connections from trusted clients.

Port Knocking

In the simplest case, port knocking does what it sounds like: You disallow all connections to all ports on your server, using your favorite firewall. Then, you configure a simple daemon to watch for a particular sequence of “knocks”: packets sent to closed ports on the server. When it sees the appropriate sequence of knocks from some IP address, it tells your firewall to (temporarily) allow connections from that address, on whatever port your service is listening on (typically port 22, for ssh). This is like if your speakeasy had a “secret knock” that must be performed before the bouncer opens the door. Trusted clients know the secret knock, and so when connecting to your server they use the knock sequence before attempting to connect.

Avoiding Replays

As with real-life secret knocks, though, this technique is susceptible to replay attacks: if someone is able to hear the secret knock, they can repeat it easily. A person snooping on your traffic may be able to see that you sent a suspicious sequence of packets before successfully connecting, and then use the same sequence themselves.

Some knocking daemons, such as knockd, can mitigate this with a feature that lets you use “one time sequences”: A list of sequences that must be used in the appropriate order, and are then discarded. This works well when only one client needs to connect to the server: That client can keep track of the same list of knock sequences, and discard ones that it has used.

If multiple clients need to connect to the server, each client can get its own list.

But a problem with this approach is that one must generate sequence lists long enough to ensure that the client will be able to connect as many times as they wish. In principle this may not be too difficult; if you plan to connect fifty times a day for a hundred years, this amounts to less than 100 MiB of configuration. In practice I’m not entirely sure how knockd would deal with such a large list.

Using TOTP

An alternative would be to use a scheme like the TOTP system you (hopefully) use for two-factor authentication on your phone. In this scheme, the server and clients use the current time to agree on a knock sequence. The server gets a secret that’s shared with the client, and whenever the server sees a knock sequence corresponding to the current OTP value, it opens the port (and discards that value).

There are at least two implementations of this idea: one by Akshay Shah and Swapnil Kumbhar, and another by Javier Junquera Sánchez. Neither of these solutions looks very polished, but they exist as proofs-of-concept.

I haven’t looked hard at the code for either of these implementations, so I can’t recommend them. There are various pitfalls involved here; the most important is that the TOTP-generated sequence must be discarded after use, to avoid fast replay attacks. And one may want to use different TOTP keys for each client or user (to ensure that multiple clients aren’t racing for the same code if they need to connect at similar times). I suspect that the above implementations don’t deal nicely with at least one of these issues, though again I haven’t looked at them in detail.

Shah and Kumbhar’s solution above uses knockd’s one time sequences feature, but is forced to restart knockd to reload its configuration whenever the keys change. Ideally it would be great if knockd incorporated a TOTP-based configuration option, or had some other method of avoiding replays than long lists of knock sequences.

Some Property Testing Tricks

If I could only give one piece of advice to every programmer working today, it would be “No matter what language you’re using, no matter what you’re building, use property-based testing when things need to be correct.”

If you’re unfamiliar, I’d recommend reading the link above. The gist, though, is that when you test a function, instead of coming up with examples to check with unit tests, you come up with properties of the function that should be true for every input, and then a library generates example inputs (randomly and/or intelligently) that check whether those properties do in fact hold.

In the ideal world of program correctness, software would be written in languages with powerful type systems, à la CPDT, and you could express your program’s correctness at the type level. But, in my opinion, this approach isn’t yet ready for large software projects that need to get things done quickly: It takes too much overhead to get it really right, and it significantly slows down the process of changing your code.

Property-based testing, on the other hand, is almost no harder than unit testing. It’s also much more flexible, and much more likely to catch bugs that you wouldn’t have been able to predict. And, in my experience, it even helps to steer you in the direction of “robust tests”, rather than tests that will break as soon as your code changes.

I’ve recently been employing some simple but general-purpose property testing tricks, that might prove useful the next time you want to write some bug-free code.

My Side Project

My side project is a note-taking and flashcard app, inspired by Roam. The backend of the app uses a CRDT I’m designing, that (I think) solves an open problem with text-editing CRDTs. This CRDT absolutely must have certain properties to ensure that two people editing the same document will always be able to reconcile their changes. I have a sketch of a proof in my head that these properties are preserved, but it’s crucial that the implementation (as well as the conceptual objects) actually do preserve those properties.

Furthermore, there are some properties that I believe are true of my data structure, but that I haven’t even fully sketched a proof for, like “Do the user’s local actions always result in the intended state?” and “When multiple users are editing a document concurrently, is it impossible for them to end up with their edits interleaved in an unfortunate way?” Property testing gives me a way to be confident that these are true before spending hours trying to come up with a formal proof.

I’m writing the project in Rust, using burntsushi’s Quickcheck port as the testing framework.

Warping the Input Space

Property testing should give you confidence in your code, but that confidence should be proportional to “how much of the input space your tests have explored.” For very simple inputs, like a function that takes three booleans, Quickcheck can exhaust the entire input space easily. For more complicated inputs, like a function that renders some unicode text, your tests are much less likely to hit the rare edge cases, unless you are very strategic about how you generate your random examples.

One trick here is to use smaller types when possible. For example, say you have a function that searches for cycles in a directed graph, say a Map<A, Vec<A>>. Normally your graphs are indexed by strings. If you generate a graph by picking out random elements of your Map<String, Vec<String>>, you’ll spend most of your randomness on generating uninteresting graphs with lots of nodes pointing at nothing.

Instead, say you generate Map<u8, Vec<u8>>s. This is much more likely to give you interesting graph structures, since you’ll (at least occasionally) pick the same u8s at different places in your structure.

It might be even better to implement and generate elements of an even smaller type: say u3s. Note the tradeoff here: with u3s, your graphs will have at most 8 vertices (which limits the structures), but you’ll be verifying your code on a much larger subset of those graphs. There are certainly some bugs that only manifest on graphs with more than 8 vertices, but I’d wager that they’re much rarer than bugs that manifest on small graphs as well.

There are probably other tricks for warping the input space, to give you better coverage of the “tricky” parts, but “use smaller types” is quite useful on its own.

Another thing I’d really like to try in this general vicinity, but haven’t yet, is to write a tool that analyzes how much / which parts of the input space are being explored by your generators. (Just like the glorious code coverage tools of old! /s)

Generating Objects via Possible Histories

The other trick I’ve been enjoying involves objects that are difficult to generate, because the rules for which objects are “valid” are quite complicated. In my side project, generated elements of the CRDT type should be restricted to the elements that are actually possible to create through some sequence of user actions (including merging two CRDT elements together).

In cases like this, one option is to tell Quickcheck that invalid examples should be discard()ed. But this results in doing a lot of extra work to find valid examples, and in practice the library will give up after some number of discards.

A more workable solution is to generate a possible history: Implement a datatype to represent “possible states of the world”, and another to represent “events that might happen”, and then generate sequences of events to check that your property holds on the resulting world state. This lets you avoid ever having to call discard(), at the cost of a bit of extra code.

This trick is a little bit demanding: In order to be confident that you’re exploring the input space successfully, you need to be sure that your event datatype is really representing all of the things that can happen to your world state. One way to be pretty confident is by having your event type mirror your public API: For each operation that can be performed by an end-user, that operation should be possible to generate as an event.

Another downside of this approach (making it a last-resort in my toolkit) is that when you do find failures, your property testing library will (of course) show you the history that generated those failures. And unless you implement a helper that lets Quickcheck shrink your inputs (which lets it find the simplest possible input that causes a failure), you’ll spend a lot of time walking through the history to try to understand the problem. Other property testing frameworks might be better at the shrinking aspect here.

Other Thoughts About Property Testing

As you can tell, I’m hugely enthusiastic about property testing. Some other considerations that have occurred to me while working on this project:

Speed is important! I started my project using AltSysRq’s proptest library rather than Quickcheck, but discovered that it was nearly 10x slower on my project. That means 10x fewer example inputs! Its shrinking algorithm is much more sophisticated than Quickcheck’s, but in my opinion this is definitely not worth the cost in missed examples.
If your testing library is randomly generating examples, your confidence in your code grows with each test you run. I’ve considered setting up a beefy server somewhere just constantly running my test suite.
Despite being more than 20 years old, Property-based testing still feels like a “new” technology to me: There are some really great libraries out there, but I really feel that the entire software industry should be adopting it, and I suspect that tooling for property-based testing libraries could be hugely improved.

I hope these little tricks are useful to you – may Moore’s Law faithfully guide your PRNG!