At one time, at Apple, we did what is now called a Rainbow Table password attack. We pre-computed a lot of passwords and could just do a “lookup” on the table to find the decryption. At the time, we had 2 TB of data storage. Figuring out what it would take to pre-compute ALL possible passwords for the (then) fixed length cryptext, we realized it would take all our storage space and then some. Not going to work.
So we then proceeded to use a direct dictionary attack (again with some variations, like leading numbers and special characters on each end) on our password file. Why? To find which users had weak passwords. So make your password “holiday” and you would get an email saying “We have broken your password, please change it to something better” and describing what “better” meant. If it failed with the same password a little later, the next email would include what the password actually was. A third time and we’d change it for you ;-)
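That auditing scheme can be sketched in a few lines of Python. This is a toy, assuming things the text doesn’t specify: the function names and the variant list are mine, and real systems use salted, deliberately slow hashes rather than a bare SHA-256.

```python
import hashlib

def h(pw):
    # Stand-in for the real password hash; any one-way function
    # will do for the sketch. Real systems salt and slow this down.
    return hashlib.sha256(pw.encode()).hexdigest()

def variants(word):
    # The simple mutations described above: the bare word, leading
    # digits, and special characters on either end.
    yield word
    for d in "0123456789":
        yield d + word
    for s in "!@#$%":
        yield word + s
        yield s + word

def audit(shadow, wordlist):
    # shadow: {user: stored_hash}. Try each dictionary word and its
    # variants; report users whose stored hash matches a candidate.
    weak = {}
    for word in wordlist:
        for cand in variants(word):
            hv = h(cand)
            for user, stored in shadow.items():
                if stored == hv:
                    weak[user] = cand
    return weak
```

The point of the exercise is exactly the one above: finding the weak passwords before an attacker does, so the user can be told to pick something better.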
So I’ve known about the problem of very large Rainbow Tables for a very long time.
But disks keep getting bigger. Because of that, Unix & Linux went from visible cryptext in the password file to hidden cryptext, closing the “lookup the cryptext” door. (Called a “shadow password file”; we did that at Apple back about ’85 or so, plus or minus a year or two.)
But what about encrypted data communications links?
These are pretty much Diffie-Hellman / RSA based. The critical bit is an exponentiation, then modulus a large number that is the product of 2 hidden prime numbers. So on and off I’d wondered:
Could you pre-compute the product of all the known primes and store that as a Rainbow Table?
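The core operation, and the table-lookup idea, can both be shown in a toy Python sketch. The tiny primes and names here are mine for illustration; real moduli run to hundreds of digits.

```python
# A handful of tiny primes stands in for "all the known primes".
primes = [101, 103, 107, 109, 113]

# The hypothetical Rainbow-Table-like structure: every pairwise
# product mapped back to its two factors.
table = {p * q: (p, q) for i, p in enumerate(primes)
                       for q in primes[i + 1:]}

def break_modulus(n):
    # If n is in the precomputed table, the "hard" factoring
    # problem collapses to a dictionary lookup.
    return table.get(n)

# The critical operation described above: exponentiation mod n,
# where n is the product of two hidden primes (textbook RSA).
p, q = 101, 103
n = p * q                            # public modulus
e = 7                                # public exponent
d = pow(e, -1, (p - 1) * (q - 1))    # private exponent
m = 42
c = pow(m, e, n)                     # encrypt
assert pow(c, d, n) == m             # decrypt recovers the message
```

Anyone holding the table recovers p and q instantly, and with them the private exponent. The question is whether such a table could ever be stored.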
IFF, for example, we knew a Billion primes, that would be 10^9 x 10^9, or 10^18, products. A Terabyte disk is 10^12 bytes, so that is 1 million of them times the average byte length of a product. If that average were about 1 million bytes, then we’re talking 1 Trillion TB disks. In 2011 (a good half dozen years ago) IBM made a 120 petabyte storage system. A petabyte is 10^15 bytes, or 1000 TB.
So that system is about 10^17 bytes, or 120,000 TB. It would take about 10 of them to make a Million TB, then a million times that to reach the Trillion.
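The back-of-the-envelope numbers above can be reproduced in a few lines (the prime count and average product size are the same assumptions as in the text):

```python
# Reproducing the estimate above.
n_primes  = 10**9          # the hypothetical Billion known primes
products  = n_primes**2    # ~10^18 pairwise products (order of magnitude)
avg_bytes = 10**6          # assumed average stored size of one product
total     = products * avg_bytes   # 10^24 bytes for the whole table

tb       = 10**12                  # one Terabyte disk
ibm_2011 = 120 * 10**15            # the 120 PB IBM system, 1.2 x 10^17 bytes

print(total // tb)         # 10^12 -- the "1 Trillion TB disks" above
print(total // ibm_2011)   # ~8.3 million of the 2011 IBM systems
```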
OK, at that point even the NSA computer budget is having “issues”.
We’re saying that we’re about 10^7 short (or were, 2 Moore’s Law cycles ago). We could be closing in on “only” 10^5 short by now. Still large, but will that be “small enough” in another 20 years?… So, are those example guesses at the number of primes at all close?
A Stack Exchange Q&A thread has an interesting discussion of the question of just how many primes there are, and of about what lengths. I’m going to skip down to a couple of bits more germane to my question. The assertion is made that there are so many primes you can’t possibly store them all (never mind the products of them).
Nobody’s really keeping count.
Newly discovered large primes make the news, but primes in the range of, say, a few hundred digits are not something that anybody keeps track of. They are very easy to find — the computer that’s showing you this text is likely capable of finding at least several per second for you, and with overwhelming probability they will be primes nobody else has ever seen before.
There are very many hundred-digit primes to find. We could cover the Earth in harddisks full of distinct hundred-digit primes to a height of hundreds of meters, without even making a dent in the supply of hundred-digit primes.
This also raises the question of what it means that a prime is “known”. If I generate a dozen hundred-digit primes and they are forgotten after I close the window showing them, are these primes still “known”? If instead I print out one of them and save the copy in a safe without showing it to anybody, is that prime “known”? What if I cast it into the concrete foundation for my new house?
π(n) ≈ n / ln(n), where π(n) is the number of prime numbers less than n. The largest known prime, discovered in 2008, is 2^43,112,609−1, but if we put that in the place of n we get an answer so big that not even WolframAlpha can compute π(n)
(no need to click on it, because it doesn’t work).
However, we can still find an approximate answer thanks to the list of 10 largest known primes. The largest number for which WolframAlpha still works is currently ranking 3rd on that list and its value is 2^37,156,667−1
from which we get that there are approximately 7.853×10^11,185,263 (or 10^10^7.04865) primes smaller than 2^37,156,667−1, using the π(n) formula.
It is known that there are exactly 1,699,246,750,872,437,141,327,603
primes less than 10^26 but not all of these are known primes. See Douglas Staple’s thesis – Henry Jun 1 ’16 at 23:31
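The quoted claim that hundred-digit primes are trivial to generate is easy to demonstrate. A minimal sketch, using the standard Miller-Rabin probabilistic test (my implementation, stdlib only):

```python
import random

def is_probable_prime(n, rounds=40):
    # Miller-Rabin probabilistic primality test.
    if n < 2:
        return False
    for p in (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37):
        if n % p == 0:
            return n == p
    # Write n - 1 = d * 2^s with d odd.
    d, s = n - 1, 0
    while d % 2 == 0:
        d, s = d // 2, s + 1
    for _ in range(rounds):
        a = random.randrange(2, n - 1)
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(s - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False   # a witnesses that n is composite
    return True

def random_prime(digits=100):
    # Draw random odd numbers of the given length until one is prime.
    # By the prime number theorem about 1 in ln(10^100) ~ 230 candidates
    # (1 in ~115 odd ones) is prime, so this returns almost instantly.
    lo, hi = 10**(digits - 1), 10**digits
    while True:
        n = random.randrange(lo | 1, hi, 2)
        if is_probable_prime(n):
            return n
```

Which is the quoted point exactly: your own machine churns these out on demand, each one almost certainly never seen before.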
So, as a reasonable number to use, there are about 10^24 primes “computable”. Knock off a few orders of magnitude for the very very large, hard-to-compute end, and for the very very small, easy-to-factor end (like 3×5 = 15, so you are not going to use 15 as your product), and you are still left with the 10^21 to 10^18 range. Now the product of each pair of them is going to be an insanely large number: 10^36 to 10^42 products, of about 1/2 x 12,000,000 digits average length, or 6,000,000 digits each. So 6 x 10^6 bytes x about 10^36 to 10^42 products. That’s roughly 10^42 to 10^49 bytes.
Wikipedia says that the largest known prime number is 2^43,112,609−1 and it has 12,978,189 digits.
So now we have our answer: No. A Rainbow Table like attack is not possible. Not even close. Not even with petabyte storage systems. It would take between 10^25 and 10^32 of those largest-of-2011 storage systems.
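The arithmetic behind that range can be checked in a few lines, using the rough figures from above (10^18 to 10^21 usable primes, ~6 million digits, so ~6 MB, per product):

```python
bytes_per_product = 6 * 10**6      # ~6 million digits, ~6 MB each
ibm_2011 = 120 * 10**15            # the 120 PB system, in bytes

for n_primes in (10**18, 10**21):
    total = n_primes**2 * bytes_per_product   # bytes for all pairwise products
    print(f"{n_primes:.0e} primes -> {total:.0e} bytes, "
          f"{total / ibm_2011:.0e} storage systems")
```

Order of magnitude only, of course, but it lands squarely in the 10^25 to 10^32 systems range.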
It is remotely possible you could find some way to reduce the search space. Perhaps by examination of a given bit of software you might find the key generation is limited to keys of exactly 100 digits. Say, for example, it always started its “random” key generation at the 100 digit length, so you know it will find a prime before it hits 101 digits. Though it would be a bit of a rookie mistake to do that kind of thing. But say they did.
That could be used to reduce the search space dramatically. Would it be reduced to 10^15 bytes? Maybe… but I think you would need to be lucky. You would also need to precompute and load all those products before the utility of the code break was mooted by a software or bit length change. At that point you have a massive data center, full of expensive storage, and LOADS of compute engines trying to fill it. But ONLY if the particular version of software doing the encryption is known, it is known to only use one key length, and it doesn’t change much. I don’t see those conditions existing.
I’ve not looked at the key generation code myself, but were I writing it, I’d set a lower bound for the complexity needed, an upper bound based on the maximum tolerable time to compute, and pick a start point between those two via a call to random. (And the guys actually writing that code are more paranoid about this stuff than I am, and write better code…)
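A hypothetical sketch of that approach, where the function names and the bounds are mine, not from any real key generator:

```python
import random

def rookie_key_digits():
    # The mistake described above: always start the prime search at
    # exactly 100 digits, so every key lands just past that boundary
    # and an attacker need only precompute one key length.
    return 100

def better_key_digits(min_digits=100, max_digits=300):
    # Lower bound from the complexity needed, upper bound from the
    # maximum tolerable compute time, starting point picked at random
    # between them. Now the attacker must cover the whole range,
    # multiplying the size of any precomputed table enormously.
    return random.randrange(min_digits, max_digits + 1)
```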
What brought this on? Watching a series about code breaking that ended with RSA / Diffie-Hellman and the assertion that it was unbreakable. It reminded me I wanted to scratch this particular itch about what it would take to do a Rainbow Table attack.
FWIW, encryption guys never think something is always un-crackable. They just think it’s un-crackable for now, and set about looking for new and different ways to attack it.
It took about 250 years from the time of invention before Babbage cracked the then Unbreakable Cipher (the Vigenère cipher), so these things can take a while to mature and weaken. But code breakers never give up trying. That’s part of the charm of it ;-) Windmills, tilting, and all that… but sometimes the windmill topples. FWIW, Babbage didn’t disclose his crack; another fellow (Kasiski) got credit for cracking it many years later. It is thought this was because it gave the British an advantage in the Crimean War, as “everyone” used that cipher and believed it unbreakable. Thought to perhaps be the first example of military classification of an academic discovery about code breaking.