A novel approach to Prime Numbers

I’m not certain this is a novel approach, but I’m pretty sure it is. I’ve done a fairly large web search and not found anyone else doing this. (Then again, it’s a big web and any search done by a mere human will be too small for any certainty).

Why primes? Welll…..

Aside from the fact that they have fascinated Mathematicians for thousands of years… aside from my having a Math Award (cash from an outside agency) at High School graduation and seriously thinking to taking a math track in college… aside from me being a bit Aspe and tending to tease issues to The End and this having no clear end yet… aside from practical issues (like making more efficient database hash codes or making / cracking encryption)… aside from all that: They are just something I got started on decades back and have an interest in researching. It is one of the “infinite computes” problems of the world that can consume all possible compute power for the foreseeable future (and then some) so finding a way to make it work better would be of benefit.

OK, some background.

In High School I learned about the Sieve of Eratosthenes. It was somewhat fascinating as it is a regular process, but yields irregular results. I worked the Sieve by hand to somewhat more than required for the class ;-)

In college, I played a bit with programming to find Prime Numbers. Not to excess, but some. (Compute time was limited and charged by the hour then. Despite having “cracked a code” that let me “find” any amount of compute time I wanted, “it would be wrong” to use it for personal interests that would deprive other folks of their compute time. The Burroughs Mainframe was pretty much run flat out 24 x 7. So some minor playing, but not to excess. Oh, you want to know the Hack? Seems that they used any old free disk for swap space. It was not scrubbed on release. So in FORTRAN there was an “open for random read / write” file option. Do that, you got a chunk of ‘free space’ that might have been used as swap at some point. OK, open the file. Write record 1, write record 1,000,000… it was up to the programmer to “zero” the space in between on ‘first write’… I decided to read it instead. Search for “assword” and print the following 100 chars ;-)

Later, an early job was doing database work on the old HP3000 mini-computer. Using the Image and Query database / languages. They used the size of the database as a ‘seed’ to the hash table function. If the database was a prime number in size, the hash function was more efficient. So I tossed together a sort of ‘brute force’ Sieve and calculated the first 1,000,000 prime numbers. (Needed relatively large ones for typical database sizes). Printed this out on 120 char wide green stripe computer paper – two big ‘tabs’ of it about a foot tall – and never had to do more than a ‘lookup’ again when building new databases. Also made an online program to which you could give a ‘size’ and it would find the next prime larger than that. It was used by others in the company. That was done after the printout, and more or less obsoleted it..

Now Primes are more important for cryptology than for database hash tables. So most of the current Public Key Cryptography depends on the fact that it is very hard to factor a very large number that is made of Very Large Primes. You must do a very large number of ‘attempted divisions’ before you find out it’s a 35 digit prime number that ‘goes’… I have an idea about how to ‘break’ that encryption (involving a few peta-bytes of storage ;-) but it also would need either massive amounts of money (any TLA interested in funding me, just shoot me an Email ;-) or a clever way to find ALL the primes in a large block of size.

Most of the primes that are large, and found, are special subsets that fit in special areas of high probability of finding a prime. (A couple of methods…. The Prime Spiral being one of the more interesting ones…) But I want to find ALL the primes (as someone could be using any of them). Turns out that’s rather hard when you get to very large primes.

Because that’s hard, and because The Glory goes to the “Largest Prime” not to “The Most Complete Set”, most folks have concentrated on finding ways to get random primes that are very large. Not much goes into finding ALL the primes. Heck, even for me, I’ve mostly let it lay there as a ‘someday thing’.

There are lots of ways to find prime numbers, and more ways to prove what you found is in fact prime. I won’t go into them. What I’m interested in is the question of “where do you search?”. IFF you could identify that place most likely to contain the next bunch of primes (constraining the search to a smaller area) you can more easily find those primes. Basically, putting the lamp post next to where you lost the keys…

Some years back (first time I was on contract to Disney about 9? Years Ago) and was stuck in a hotel room in Florida, I had the old Compaq Laptop with me and not much else to do that night, so started it to finding primes. I then simply plotted them.

Why? Because that’s what Mr. Theil did. He was my Math Teacher in High School and had this thing about graphing. So when we’d start looking at something new, he’d put some kind of graph up. Sometime it was useless. Sometimes it explained all. He basically thought you always ought to “visualize the data” just to see if anything showed up. So that’s what I did.

What I noticed was that the Primes were a “Stochastic Parabaloid”. Basically a parabolic like shape, but with some ‘random chatter’ about the target value. (They ARE integers, after all ;-) So take a bit of graph paper. Take the value “1” (that some folks think is a prime, but lately has been ‘cast out’ by folks who have a pet theory that doesn’t work if 1 is a prime. For me, it’s a prime. It is “divisible by 1 and itself”. That those are both the same thing makes it a Very Special Prime IMHO. Not an outcast). At any rate, as it’s “Special”, plot it at zero (another special number). Now the next prime is two. Plot it at “1, 2” then move on to three at (2,3) and five at 3,5 and on it goes. When you look at that graph with a few dozen numbers on it, it looks rather like a parabolic curve.

So, gee, I thought, could it be that easy? No, it’s not… There is “stochastic jitter” about the expected value.

BUT, that means that it can serve very well as a ‘place to look’. One method is to find a decent parabolic formula fit to the data, then simply search both above and below that projected point. Odds are very high that you will find the next prime reasonably close to that projection. (I’ll leave it to others to do some MATLAB curve fit to the primes and see what’s a decent fit… I did my ‘curve fit’ by hand, so the automated fit will undoubtedly be better).

A different approach is to put “bounds” on the search. What curve means “you’ve likely gone too far!”.

10000 Prime Fit

10000 Prime Fit

That graph is a bit small and kind of hard to read, but it’s ‘for illustration only’ anyway. What it shows is two curves that lay above and below several samples of Primes. As I’ve “put away” my notebook “somewhere” and the Compaq is bootable, but also put away “somewhere”, I recreated some of the work in a spreadsheet in a crude way. So I typed in several dozens of primes and fit the curves by trial and error (mostly error ;-). They aren’t the same as the ones I originally did, but close.

What does this show?

Well, almost impossible to see are the ‘few dozen’ prime numbers near zero / one and up to a couple of hundred. Then there’s a gap, and I’ve put some more primes in the table at a prime value near 1020. There’s a TINY LITTLE DOT on the graph at that point. A similar dot at the very far right is primes about the 10,000th prime. The green “linear fit” line more or less obliterates them, but they are there.

What about the other lines? One is a ‘lower bound’ curve fit of a sort. The other is an ‘upper bound’ fit. The curve formula that fits best at low ordinal (counting position) of primes is not the same as what fits best at high count values. That mostly means I’m likely being way to simplistic in my formula and it needs a couple of more terms…

So since the graph is somewhat hard to read, I’m going to put some “table data” here. I’m not going to put in a table for the very low values They’ve already been found anyway. One other point to make is that green “linear fit” line. The primes are below it at the start, then likely move above it at the end (just after the last data on the chart which is an intercept). At very large values, a parabola approaches a linear shape. One could do a decent search just doing a ‘linear fit’ to the known data and then searching above that line going forward… As more data are found, do a new fit. Repeat.

For the parabolic fit, or for other quadratic fits to limit curves, a similar ‘recalculate’ will bring the bounds closer to the actual value of interest. So even though these formula will likely diverge at very large values, one could simply calculate a few thousand values, then ‘rebounds’ the curves. (That is, tune them to still be outside the new data, but not too far outside…)

Via this iterative approach, the method ought to be extensible to any degree. (Finding even more accurate curve fits would help, but is not essential).

So what are the formula as used in this graph? For the upper bound, it is:
OrdinalPrimeValue^1.245+2.45*OrdinalPrimeValue

SO what’s an “OrdinalPrimeValue”? Well, 2 is the “first prime” so it’s ordinal (count) value is 1. Next comes 3, so it’s Ordinal is 2. The 3rd prime is 5, then the fourth ordinal goes with 7. This method depends on knowing ALL the primes, otherwise you can’t ‘count’ them and don’t know the ordinal value…. So it can extend all the known primes, but can’t find ‘the biggest’ prime out in the far netherworld…

Up at the 10,000th prime, the ordinal value is 10,000 and the prime is 104729. The lower bound formula has a value of 104189 and the upper bound value is 119999. So this simple crude first cut would have a search of at most 15810 values to find the 10,000th prime. (a close curve fit, or a search from the ‘middle out’, or any of several other ‘directed search’ methods could be used to make this better. Oh, and a 30,000 core massively parallel processor machine would test all the values at once and have half it’s power left unused ;-) As each set found lets the bounds be better fit for the next batch, the ability to find primes ought to be roughly linear work with size (or a very slighly growing work value if the degree of ‘curve fit’ improvement isn’t linear).

The lower bound?
OrdinalPrimeValue^1.24+1.3*OrdinalPrimeValue-12

All in all, not all that hard.

(One trivial enhancement is to just plot the linear intercept trend line to the last found value and search above it first. Only on very rare occasions ought the next value be below it. The next enhancement is to search a “little bit” below the intercept value just to avoid surprises… then tuning ‘how much’ to search below based on experience.)

OK, some tables. There are four columns here. The first is the ordinal value, then the prime number, followed by the lower bound, then the upper bound.

Count   Prime   Low B   Up Bound
972	7649	6318	7625
973	7669	6326	7634
974	7673	6334	7644
975	7681	6341	7653
976	7687	6349	7662
977	7691	6357	7671
978	7699	6365	7680
979	7703	6372	7689
980	7717	6380	7699
981	7723	6388	7708
982	7727	6396	7717
983	7741	6404	7726
984	7753	6411	7735
985	7757	6419	7744
986	7759	6427	7754
987	7789	6435	7763
988	7793	6442	7772
989	7817	6450	7781
990	7823	6458	7790
991	7829	6466	7800
992	7841	6474	7809
993	7853	6481	7818
994	7873	6489	7827
995	7877	6497	7836
996	7879	6505	7846
997	7883	6513	7855
998	7901	6520	7864
999	7907	6528	7873
1000	7919	6536	7883

I had a ‘fit’ with closer match at this point, but it diverged at the 10,000 ordinal prime. If you look at this carefully you will find that the ‘upper bound’ is smaller than the prime. This was larger than the prime prior to the last change to the formula to fit out at the 10,000 end. ( only the coefficients were made slightly different). As an example of ‘post iteration’ it is interesting. I’d had a formula that was clearly outside the bounds, but now is inside with the ‘latest fit’… but as I’ve already ‘found’ these primes it’s not so important… Generally, I think an ‘iterative fit’ works best; but even here, a search of a few hundred values to find the next prime is not a very large compute load. I expect that at even larger values, the fit will become even tighter (as the parabolic shape becomes ever more linear…)

The earlier value had been:

OrdinalPrimeValue^1.26+2.25*OrdinalPrimeValue

So you can see how I iterated the value over time as more data further out were added. Again, a better automated curve fit to known values would be beneficial, but the real value is in iterative ‘fit’ as new data is added and that this is a target for new value searches; not a hard wall. So “search here first” ought to find the next value first; but when it doesn’t the value is very nearby…

The first few million primes have already been computed, so using a larger set of data would give an even tighter fit to the bounds. (Something to do “real soon now” ;-)

Finally, note that the ‘bounds’ are an indication of probability, not of certainty. This tells you where to start looking and gets you close to the next value. Searching a bit outside the lines on those occasions where no prime was found is unlikely to be a major compute cost issue. Even if you have computed overly tight ‘bounds’ on a further iteration of the fit.

Here’s the ‘fit’ out at the 10,000th prime point (post iteration):

9981	104549	103950	119727	
9982	104551	103962	119741	
9983	104561	103975	119756	
9984	104579	103987	119770	
9985	104593	104000	119784	
9986	104597	104013	119799	
9987	104623	104025	119813	
9988	104639	104038	119827	
9989	104651	104050	119842	
9990	104659	104063	119856	
9991	104677	104076	119870	
9992	104681	104088	119885	
9993	104683	104101	119899	
9994	104693	104113	119913	
9995	104701	104126	119928	
9996	104707	104139	119942	
9997	104711	104151	119956	
9998	104717	104164	119971	
9999	104723	104176	119985	
10000	104729	104189	119999	

You can see that it’s about 550 terms to test above the lower bound, and about 5,000 to the upper bound. Clearly a bit better fit to do ;-) Still, searching a ‘thousand scale’ area for the next prime is not a lot of compute work. Then, a few thousand primes more along, computing the ‘new bounds’ lets the process continue at about constant work load growth.

In Conclusion

That’s the idea and the method. Iterative curve fitting of upper and lower bounds. Probabilistic prediction of ‘which way to look’ and ‘how far’.

As I already have a few million prime numbers, I’ve not computed the next batch(es)… but if anyone needs them, it ought to be pretty easy to do with these methods. (Breaking current crypto techniques would likely take a few hundred thousand dollars and a year or three of work… Oh, and some ‘way cool’ computer gear ;-)

There’s more ‘stuff’ about primes than any sane human could read… but just a couple of pointers here:

The first 10,000 primes:

http://primes.utm.edu/lists/small/10000.txt

A Wolfram Alpha curve fitting page:

http://www.demonstrations.wolfram.com/InteractiveCurveFitting/\

The first 100,000 prime numbers:

http://www.digital.library.upenn.edu/webbin/gutbook/lookup?num=65

The first 50 Million Primes:

http://primes.utm.edu/lists/small/millions/

There’s a whole lot more math theory out there about primes. What I find frustrating is that it tends to fall into one of two camps. 1) Methods for showing a number is prime or variation on brute force finding primes. 2) Ways to find a very large prime but skipping over large areas of primes that are not yet known.

The whole area of ‘directed search’ tends to be ignored. IMHO there’s a very large opportunity to ‘put bounds on the search’ and find a whole lot of primes. Just not the biggest one… FWIW, I also ran into a formula that gave a large number of primes over one small span. It had variations with 4th power and 5th power functions. IMHO, that’s just a statistical quirk in that segment of the prime space. It ought not take a 5th power term to put bounds on primes….

OK, so that’s the “idea”. And it’s on my “someday” list of things to do to tighten it up a bit… but as I’ve got way more “things to do” than “time to do them”, I figured maybe someone else could benefit from the idea. The simple approach is to plot the first few million prime numbers, put a ‘best fit’ curve through them (automated codes for that exist now) and put ‘bounds’ curves on each side of them. At that point, searching each way from the best fit curve and out to the bounds (or just a touch beyond) ought to exhaustively find all primes with minimal computes. What to do with them is “for another day” ;-)

One can only wonder if someone else has thought of this. Pre-compute a whole lot of ‘products of primes’. Put them in a Very Large Lookup Table and then when it’s desired to ‘factor a very large number’ it is just a single ‘lookup’ in said peta-byte of data… not a very large compute workload at all…. As TB scale storage is on the order of $100 (or less…) and a single ‘lookup’ doesn’t take a lot of seek nor transfer speed, it is very do-able to precompute and store a very large number of such products into very slow and cheap storage. It’s the “size” that matters, not the transfer speed….

With that, I hope I’ve not just obsoleted a large amount of crypto gear ;-) I’m not too worried, as it likely would take a large budget to prove the thesis and those folks are already well funded and ‘further along’ than I am; so at most I’d be ‘spilling the beans’ to the rest of us… and that’s a good thing…

Peta-FLOP scale computers are ‘reasonable’ and ‘femto-byte’ storage is possible. Folks who can fund that scale of compute farm have a whole different way of looking at problems…. so “Why factor a large number when you can just do a lookup?” is a reasonable question at that scale…

I think i’ve shown that computing the data to fill that lookup-store is a reasonable problem to solve.

Subscribe to feed

Advertisement

About E.M.Smith

A technical managerial sort interested in things from Stonehenge to computer science. My present "hot buttons' are the mythology of Climate Change and ancient metrology; but things change...
This entry was posted in Science Bits, Tech Bits and tagged , , , . Bookmark the permalink.

12 Responses to A novel approach to Prime Numbers

  1. Jim Masterson says:

    Very interesting. I, too, have been trying to find a prime generation formula. I’ve been trying to curve fit an nth root or a very slow rising exponential. I hadn’t thought about using a parabola lying on its side. I’ll try it and see how it works.

    Thanks,
    Jim

  2. pouncer says:

    There was an old Isaac Asimov article about the largest useful number — called Skewee’s number, IIRC — that discussed something similar. There was apparently a distribution curve of the frequency of primes and a set “below” the curve and a set “above” the curve and Skewee demonstrated the point beyond which that distribution should “cross over”.

    Asimov’s point wasn’t about primes per se. He was writing about how Skewee’s number could count more space for more atomic particles for more units of time than could possibly exist in the life of the universe. But still …

    The bounded curve part sounds familiar and you might want to compare Skewee (sp?) to what ever else you’re googling.

  3. omanuel says:

    As I recall there is a “Chinese sieve” for finding prime numbers. Did you just rediscover it? Oliver

  4. philjourdan says:

    Fascinating read! Like you, I was always interested in Math. Unlike you, I actually pursued it as a degree for some time in college (the first 2.5 years). So I still enjoy reading about it and new ways to make things easy! I remember my early computing days (no, I did not think of your hack) when I was developing logarithmic sorts (when computing time was precious). That is when I realized that a good programmer (again in the days of costly computing) was a good mathematician!

  5. adolfogiurfa says:

    What does it show, in the real world of phenomena, a prime number?, what does it happen in those “points” or “places”?
    Let us see an octave 1,2,3,4,5,6,7,8,9 (the only real numbers which exist)
    There we have 1,3,5,7,9, precisely those places where those black keys on a piano keyboard, called “intervals” or “gaps” in an octave scale, in a scale of vibrations where a “quanta”, goes in or goes out, where the octave “feeds in” or “excretes”: 2+1=3, ratio: 2/3=0.666, etc., or where an octave of development crosses another:

  6. Gary says:

    My question is how do you “find the time” to do “all that you do”? Nice post, btw. Very readable even for math non-intuitives like me.

  7. E.M.Smith says:

    @Gary:

    Well, I don’t sleep all that much, and some projects get ‘thought about’ while I’m actually “doing” other things. So, for example, while making bread, I’m thinking about things like prime numbers, or what Linux release to try out… While posting comments, I’m likely booting / testing some Linux release (as shown in some of the comments) and may be pondering things like “Would 1/2 the Rye work in a bread?” while waiting for the login screen… Lots of “overlap”…

    But even then, some things end up being on ‘hold’ for way too long. I’ve got a half dozen nagging at me for more time, that I just don’t have, so they languish… Like that whole Church Of The Sacred Carbon where I’ve pondered the aspects more, but not done another posting… and writing a bit more fiction (that cries at me from neglect… but I had other things thrust to the front… like the Java and such bugs that meant I needed to build a more secure compute environment on a DIY basis, like it or not…) Thing like this topic, primes, are sometimes a summary of things done years ago. So not much time needed now, more a ‘brain dump’ from back then.

    Oh, and lots of coffee helps ;-)

    @Jim:

    I found some formulas on line that used 4th and 5th powers, but didn’t keep the links. I think “Prime number quadratic” and then either ‘formula’ or ‘equation’ would likely turn them up. There were intended to find ‘many primes in a row’ as opposed to ‘bound the space’ in an iterative way, though. Might be worth a look…

    http://mathworld.wolfram.com/Prime-GeneratingPolynomial.html

    has some other ones listed…

    @Pouncer:

    Thanks for the idea!

    http://mathworld.wolfram.com/SkewesNumber.html

    but it looks like it will take significant ‘think time’ to wrap head around it…

    @Oliver:

    Never heard of a “Chinese Sieve”… Hmmm…. (he does a web search…)

    https://terrytao.wordpress.com/2007/06/05/open-question-the-parity-problem-in-sieve-theory/

    Looks like the “Chinese remainder theorem” is the name for it? Don’t think it’s what I did…

    @PhilJourdan:

    When I was learning programming in college, there were three areas of study that got you a lot of computer time / computing related degree (there being no Computer Science Degree then):

    1) Accounting / business. Essentially a ‘degree in COBOL’. (Honorary mention of Economics where Econometrics was just starting up and you COULD take a lot of computer classes, which is what I did, and not COBOL either ;-)

    2) Engineering. Mostly in FORTRAN (honorary mention of ALGOL and some others). The first computer class I took.. (I was an Engineering Major for a year or two…. and tool ALGOL as my second computer class).

    3) Math. Where most “computer majors” got degrees. They got to play with all the fun languages too ;-)

    Yes, it REALLY helps to have a good grasp of math do do efficient programming… the symbolic thinking helps too…

    @Adolfo:

    Um, there’s a whole lot more ‘real numbers’ than just the decimal digits 1-9… First off, you need a zero…

    Oh, and in other bases, there are other ‘single digits’… so in HEX you need to add A B C D E F as well… But if you are in octal, does that mean that only 1, 2, 3, 4, 5, 6, 7 are “real numbers”?…

    I’d also point out that “octaves” are non-linear in frequency while digits are linear…

    (Personally, I like base 60 the most. It’s the most interesting… but rarely used for anything these days).

    FWIW, I’d assert that the Prime Number are the only “real” ones and that the composite numbers are fabrications made from them… but that’s just me ;-)

    I do agree that log paper and octave scales are more interesting than linear ones, but I’d not limit the world of interesting numbers to just a few integers. Heck, Phi is fascinating, and e and pi and… all much more ‘central’ to reality than a few integers…

  8. Jim Masterson says:

    As I look at the function now, a parabola won’t work. It looks like a combination of a logarithmic function and a decaying exponential. Curve fitting those functions is quite a chore.

    Jim

  9. adolfogiurfa says:

    @E.M.: Um, there’s a whole lot more ‘real numbers’ than just the decimal digits 1-9… First off, you need a zero…
    Of course….It all began with 1 (active force) and a Zero (passive force, vacuum) : Nature abhors vacuum, then yang (1) runs after ying (0).
    You must ponder that there are not such a thing as abstract numbers, they are always related to physical laws, and basically, to the movement (love-hate) of charges, generating two main primeval “waves” sine+cosine, such “umbilical chord” called “Birkeland Current”.

  10. Chuck Bradley says:

    The fundamental theorem of arithmetic (unique factorization) is simpler to state if 1 is not considered a prime. It is often convenient in algorithms to pretend 1 is a prime. There are relatively fast ways to test for primeness by a probability procedure with a user specified chance of error. The encrypters can make the primes longer much faster than the decrypters can make the tables you suggest, but it is an interesting idea. Thanks for entertaining and educational blog.

  11. E.M.Smith says:

    @Chuck Bradley:

    There’s another axis to the “speed of storage growth”: size of budget…. So if a Three Letter Agency has a $Billions budget, they can buy more storage and fill it faster than your $Thousands budget can find longer primes… and once a prime is known, well, the compute cost for it is zero. Keeping a ‘cannonical list of primes’ up to date is not a very large work load.

    So while what you said is quite true, I suspect there’s a ‘dollar axis’ to the problem…

    Oh, and glad you like the discussions!

    @Jim Masterson:

    Yeah, I called it a ‘parabaloid’ as it isn’t quite a parabola, and the dificulty of ‘curve fit’ for what it really is was what caused the idea of “incremental extension of fit”. Even though a parabola is NOT the formula, you can do a ‘close’ curve fit over a part of the problem space using a parabola. So then it just becomes “incremental extesion of the fit”. Compute a hundred primes, re-fit the curve, compute another hundred, re-fit, etc. Automated curve fit to a span of 1000 numbers ‘at the ends’ out to give a fairly small space to search. (In some cases this will not help. Some primes differ by 2 so linear search finds the second of them fast.) Still, the closer you get to the ‘real formula’ the further you can ‘predict’ on each iteration.

  12. jim2 says:

    Paraphrased from the book, “An Imaginary Tale The Story of sqrt(-1): First off, mathematicians centuries ago felt that negative numbers were “unphysical.” Along these lines, they also favored geometrical proofs for at least two reasons. One could literally see the solution and if a geometric proof could be found, it was definitely “physical.”
    Newton deduced the inverse-square law of gravity and the direction of that force from Kepler’s second and third laws and he used geometric proofs. This, even though he had developed calculus enough that he could have used it to provide proof. At the time, there was an uproar over the validity of the calculus and Newton did not want that to distract from his findings on gravitation.
    Later, Richard Feynman wanted to present Newton’s derivation, but he couldn’t understand some Newton’s proofs which were are based on the arcane properties of second-degree curves. So R.F. worked out his own geometrical proof and it was not easy.

Comments are closed.