Moore’s End, The Economist, The Absurdity and the Ecstasy

I don’t know how the same magazine can run both a great technical article and an utterly absurd one at the same time on the same topic. I was at Starbucks, having my Venti of Joe, and the network login screen shows you teasers. One was on “The Future Of Computing”, so I took a look.

Basically it says that we’re reaching the end of Moore’s Law (computing power doubles every 2 years) and that this is A Bad Thing with unknown consequences. OK, that’s been predicted a few thousand times so far… and so far has been wrong. (BUT there’s reason to believe “this time is different”, as we’re running out of atoms from which to make a smaller transistor…) But, but, the real absurdity is that they attribute Moore’s Law to “Central Planning”. No, honest, they do! Apparently they have never visited Silicon Valley, have no clue how cutthroat and proprietary it is, how it depends on non-planned “disruptive technology” to upset everyone else’s apple cart and make billions, and just how “entrepreneurial” the place is (and how antithetical entrepreneurial is to “central planning”)…

Speed isn’t everything

What will this mean in practice? Moore’s law was never a physical law, but a self-fulfilling prophecy—a triumph of central planning by which the technology industry co-ordinated and synchronised its actions. Its demise will make the rate of technological progress less predictable; there are likely to be bumps in the road as new performance-enhancing technologies arrive in fits and starts. But given that most people judge their computing devices on the availability of capabilities and features, rather than processing speed, it may not feel like much of a slowdown to consumers.

Mostly the article is a ‘bit of fluff’ overview of another article, linked in it, that does have some technical chops. This first one has the usual comparisons of other physical things improving at the rate of Moore’s Law: skyscrapers halfway to the moon, cars at a fraction of the speed of light… All physical absurdities, and all fantasies from the non-technical mind, founded in fluff and nonsense.

But that “triumph of central planning” is just Socialist Clap Trap Dogma divorced from reality. Intel, internally, did its own planning and pushing forward of Moore’s Law without any external “central planning”. AMD started as a ‘follow along’ competitor, and the rivalry between them was brutal. Future plans and directions were closely guarded corporate secrets. ARM had a bright idea of how to make money out of cheaper, not faster, and pushed in its own direction. Then there is the Silicon Graveyard of failed ideas and companies. I’ve worked at some of them. Amdahl was competing with IBM (prior to the PC revolution) and is now gone. It pushed tech in its own way, and entirely differently from Intel. Sun Microsystems, HP, and DEC all rode the mini-computer revolution with their own CPU designs and their own tech advances (and all in similar cutthroat competition, not “central planning”). Now only HP survives, having consumed DEC, Compaq, Tandem, and God Knows how many others. But HP sold its microprocessor advances to Intel as it became too hard to continue to compete on chips. Sun, sold to Oracle, “sort of survives”…

This place is as far from Central Planning as you can possibly get. Anyone remember VLIW, the Very Long Instruction Word computer? The nibble based computers? The Motorola CPUs (68000 and similar) and CP/M? How about Burroughs and their 48 bit word? (Sort of surviving as Unisys.) We’ve tried long words, short words, parallel processing, Big Iron single computers; Cray had “Bubbles”, the Cray-2, in a bath of Fluorinert to cool it. The list of things tried, and often eventually discarded, is long.

It is the extreme end of ignorant to call this Venture Capital Thunderdome Fight Club “central planning”. It has been exactly the opposite. I’ve lived it, so no one can tell me it is something it has NEVER been. (I have used, or worked for: Burroughs, Amdahl, IBM mainframes, Sun, HP – both their 3000 to 9000 series and their microprocessors – DEC Vaxen, CP/M, Unix of many colors, PC scale boxes, Apple through all generations of CPU and hardware / OS, Cray supercomputers, and likely a dozen others that I don’t have at my fingertips right now…) I’ve taught the History Of Computing at the Community College level. It was NEVER EVER “centrally planned”.

Then There’s The Good Article

The link goes to another Economist article, one with technical chops that is actually quite accurate.

It does an accurate job of laying out the progress of computing from the Intel 4004 (I started with the 8008), when Moore's Law began as a doubling every year, through the slightly later version that covers most of my career, a doubling every 1.5 years, to the present doubling every 2 years, and even that is starting to slip. We've hit the knee of the curve. They recognize this and explain why (present transistors are about 100 atoms across; you can't cut that in half too many more times and still have a working transistor… plus we've hit the point where each halving no longer reduces heat load nor improves speed. Oh Dear.)
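The effect of that slipping doubling period compounds dramatically. A quick sketch in Python (using the 4004's roughly 2,300 transistors as a 1971 starting point) shows what a 1-year versus a 2-year doubling period means over the same span:

```python
# Transistor growth under different Moore's Law doubling periods.
# Starting point: the Intel 4004 (1971), with ~2,300 transistors.

def transistors(start_count, years, doubling_period):
    """Project a transistor count forward under a fixed doubling period."""
    return start_count * 2 ** (years / doubling_period)

base = 2300  # Intel 4004, 1971

# 20 years at the original 1-year doubling vs. the later 2-year pace:
fast = transistors(base, 20, 1.0)   # ~2.4 billion
slow = transistors(base, 20, 2.0)   # ~2.4 million
print(f"1-year doubling after 20 years: {fast:,.0f}")
print(f"2-year doubling after 20 years: {slow:,.0f}")
```

Three orders of magnitude difference from nothing more than the doubling period. That is why "even that is starting to slip" matters.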

Then it gives an overview of some things that are not Moore's-Law based but will give some extended processing gains: more cores, a gaggle of distributed CPUs in a "cloud" (ignoring the ever smaller problem set that can solve, and the communications costs in time and money), and some exotic hopes and dreams like Quantum Computing.
“But Hope is not a strategy.” -E.M.Smith

Still, it's a good article and well worth reading as it clearly lays out why Intel is smacking into a wall and why computers are not going to be getting leaps and bounds faster and why it is time to go back and clean up 30 years of crappy inefficient software (that's where we can mine A LOT of CPU cycles fairly easily). It also hints at who's going to make buckets of money. Folks like NVIDIA with "specialized processors" and CUDA…

A child with a decent microscope could have counted the individual transistors of the 4004.

The transistors on the Skylake chips Intel makes today would flummox any such inspection. The chips themselves are ten times the size of the 4004, but at a spacing of just 14 nanometres (nm) their transistors are invisible, for they are far smaller than the wavelengths of light human eyes and microscopes use.
But now the computer industry is increasingly aware that the jig will soon be up. For some time, making transistors smaller has no longer been making them more energy-efficient; as a result, the operating speed of high-end chips has been on a plateau since the mid-2000s (see chart). And while the benefits of making things smaller have been decreasing, the costs have been rising. This is in large part because the components are approaching a fundamental limit of smallness: the atom. A Skylake transistor is around 100 atoms across, and the fewer atoms you have, the harder it becomes to store and manipulate electronic 1s and 0s.

They have a very nice chart showing that in about 2006 the clock speed topped out, and that the thermal design limit topped out too (so we are no longer getting reduced heat load with reduced transistor size). At this point, all that is left is shrinking the silicon, and at 100 atoms to the transistor, that has almost hit the wall already. Some time back I took the number of electrons used to store a “bit” of information and cut it in half in accordance with Moore’s Law. I don’t remember the exact date it came to, but it was somewhere around 2018 – 2020. At that point you can’t cut an electron in half. Similar things apply to making a transistor out of a half dozen atoms. Just not going to happen.
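Here is a back-of-the-envelope version of that calculation. The starting figure (~100,000 electrons per stored bit circa 1985) is an illustrative assumption, not the number from my original calculation, but the same halving-every-two-years arithmetic lands in the same window:

```python
import math

def one_electron_year(start_year, electrons_per_bit, halving_years=2.0):
    """Year at which halving electrons-per-bit every `halving_years`
    reaches a single electron -- the hard floor for charge storage."""
    halvings = math.log2(electrons_per_bit)
    return start_year + halvings * halving_years

# Illustrative assumption: ~100,000 electrons per stored bit circa 1985.
print(round(one_electron_year(1985, 100_000)))  # -> 2018
```

Move the assumed starting count up or down by a factor of four and the answer only shifts by four years: the exponential does all the work, which is exactly why the wall shows up on schedule no matter how you fudge the starting point.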

The article does a great job of listing the “issues” with present tech, the other kinds of tech solutions being explored, and why overall performance will continue to improve for a while ( I give it about a decade, maybe two if we get diligent about honing and polishing really good efficient software). BUT, it will not be the regular dependable “double from silicon every 1.5 years”, or even every 2 to 4 years.

Down nearer the bottom it gets into the bright ideas that might keep Moore’s Law running (while giving lip service to the formal definition, which is more stuff per unit area, then ignoring that and just going for more total performance…). The usual things like Gallium Arsenide and SiGe alloys. The unusual stuff like Quantum Computing, where we are almost able to demonstrate that it might actually work, maybe. On a good day. If you don’t look at it…

The one that gets me most, though, is the usual sop of hanging hope on parallel computing. More cores per system. More systems. The Cloud full of systems. All this ignores Amdahl’s Law and the fact that we’re already well into the parallelizing process. Each core is now pipelined, computing ahead possible solutions and throwing away the branches not taken. We have multiple threads per core. We have multiple cores per chip (4, then 8, then 16, now pushing 32…), and then boxes with increasing chips per box (what are we up to, 8 now?). How many parallel compute units is that already? A few dozen? Maybe a few hundred?

Amdahl’s Law bites at a few thousand parallel compute units

Look at that carefully. Realize that it takes a LOT of work to make a problem “parallel”, and many classes of problems cannot be made parallel at all. Then think about how many threads of parallel are already running on your multicore, multi-CPU box. How much can you really gain by having 1000 of them in The Cloud somewhere? For what class of problems? Only a vanishingly small class of embarrassingly parallel problems.

At 8 processors (2 CPU “chips” of 4 cores each) you have already gotten about the maximum performance out of any problem that is less than 50% suitable for parallel processing. That is a LOT of the workload of the world.

At 64 processors (8 CPU “chips” of 8 cores each) you have reached the limit for anything less than 75% parallel. We’re already at that point for many high end machines, and even some low end boards for “education” have 64 cores, though those are often slower and cheaper cores, so not at the compute limit. Give it a couple of years…

There are few problems at all more than 75% parallel.

In the realm approaching embarrassingly parallel, you get to the 90% parallel and 95% parallel problems. They can usefully use up to 512, and 2048 to 4096 cores, respectively. (There’s a reason most commercial Supercomputers today top out at 2k to 8k cores… though the Top 10 can run into the million range, but that’s more for a very narrow problem set and bragging rights than being generally usable.) At that point you are hitting the wall of parallel processing damned hard. What speedup do you get? 20 times. 2^4 = 16 while 2^5 is 32. So between 4 and 5 ‘doublings’. As that happens, most of the time most of those cores will be idle, since most of what we do is NOT highly parallel.
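Those ceilings fall straight out of Amdahl’s formula, and a few lines of Python make the point:

```python
def amdahl_speedup(p, n):
    """Amdahl's Law: speedup of a workload that is fraction `p`
    parallelizable, run on `n` processors."""
    return 1.0 / ((1.0 - p) + p / n)

# The ceiling as n -> infinity is 1 / (1 - p): 2x at 50% parallel,
# 4x at 75%, 10x at 90%, 20x at 95%.  Watch how little 4096 cores
# buys over 64 for anything short of embarrassingly parallel:
for p in (0.50, 0.75, 0.90, 0.95):
    ceiling = 1.0 / (1.0 - p)
    print(f"{p:.0%} parallel: ceiling {ceiling:>4.0f}x, "
          f"64 cores -> {amdahl_speedup(p, 64):.2f}x, "
          f"4096 cores -> {amdahl_speedup(p, 4096):.2f}x")
```

At 95% parallel and 4096 cores the speedup is about 19.9x, already kissing the 20x ceiling. That is the “4 to 5 doublings” above, and all the cores past a few thousand are bought for nothing.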

In effect, once we are commonly buying machines with 4 CPUs of 8 cores each (or so) we’re buying ever more hardware that does ever less that we care about. We can see that point already. Present systems are about as fast as the Average Person can use on the kinds of problems they care about.

Increasing clock rates has been the main way of making chips faster over the past 40 years. But since the middle of the past decade clock rates have barely budged.

Chipmakers have responded by using the extra transistors that came with shrinking to duplicate a chip’s existing circuitry. Such “multi-core” chips are, in effect, several processors in one, the idea being that lashing several slower chips together might give better results than relying on a single speedy one. Most modern desktop chips feature four, eight or even 16 cores.

But, as the industry has discovered, multi-core chips rapidly hit limits. “The consensus was that if we could keep doing that, if we could go to chips with 1,000 cores, everything would be fine,” says Doug Burger, an expert in chip design at Microsoft. But to get the best out of such chips, programmers have to break down tasks into smaller chunks that can be worked on simultaneously. “It turns out that’s really hard,” says Dr Burger. Indeed, for some mathematical tasks it is impossible.

Anyone thinking the Average Joe or Jane can use and benefit from a 1024 core machine has been smoking something that is illegal in most States… Then again, that was Microsoft talking…

Sure, you can load them up with things like searching for Golomb Rulers, Pi to a zillion digits, and expanding the table of prime numbers (see SETI and BOINC projects), but few of us really need to do that, or do protein folding, or run “climate models” in our day job. Essentially, the embarrassingly parallel problems are already being served by COW (Cluster Of Workstations), Supercomputer, and Grid Computing distributed platforms (see BOINC again, for example). The world of your desktop is increasingly dominated by the bits that can NOT be made parallel, in accordance with Amdahl’s Law. Adding more cores won’t fix that. Moving it to The Cloud won’t fix that.

Also remember that The Cloud, The Grid, BOINC, etc. etc. all end up running into data and process communications limits. As you get more cores, more widely spread, the communications costs and limits whack you.
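You can bolt a toy communication cost onto Amdahl’s formula to see the effect. The linear per-node overhead below is an illustrative assumption (real costs depend on topology and workload), but the shape of the result is the point: past some modest node count, adding nodes makes the job slower, not faster.

```python
def speedup_with_comms(p, n, comm_cost=0.001):
    """Amdahl-style speedup with a toy linear communication overhead:
    each added node costs `comm_cost` (as a fraction of single-node
    runtime) in coordination and data movement."""
    time = (1.0 - p) + p / n + comm_cost * n
    return 1.0 / time

# For a 95%-parallel job, find the node count with the best speedup:
best = max(range(1, 2001), key=lambda n: speedup_with_comms(0.95, n))
print(best)  # -> 31 with these toy numbers

# And at 2,000 nodes the "speedup" is below 1: slower than one box.
print(round(speedup_with_comms(0.95, 2000), 2))
```

With even a 0.1%-per-node overhead the sweet spot sits in the dozens of nodes, and by 2,000 nodes the communication cost has eaten the entire job. That is the whack.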

So, IMHO, the next decade will be the one where we start exploring a lot of those things we’ve ignored, and a lot of “new” ideas (some of them old ones, like Very Long Instruction Words) will start being floated. At this time, I think THE most immediately useful is the use of the Vector Unit (“graphics processor”) as a dedicated parallel math processor: those NVIDIA cards using CUDA. That will take on the load of most of the highly parallel and yet not too complex problems. Things like Virtual Reality and Machine Vision and ever better graphics and animation. Even some interesting compute problems like running climate models (some are already being ported to CUDA – NVIDIA’s platform for parallel math processing).

My “gut feeling” about it is that we’re more or less at the wall. There will be a decade of incremental improvements, but at the next doubling of hardware, it won’t be clocking faster and it won’t be better thermally, so it’s mostly just more cores or wider words. At 64 bit words, making the word wider doesn’t gain much (it is a problem similar to Amdahl’s Law… but the ‘bite’ comes between 32 and 64 bit words: you end up packing and unpacking a lot of bytes per word instead of just processing them…). So between the word length running low on benefit, the ‘more cores’ hitting Amdahl’s Law (or is that Amdahl’s Wall now?…), and ever fewer parallel problems not already being handled, the benefits will start to be very slim. (I expect most of them to come as lower cost, not better performance, so chip companies will see flat to lower revenue and profits…)

That article goes on to look at a whole lot of interesting “possibles”, the field of dreams for extending Moore’s Law a few more decades. Maybe one of them will make it, but for Silicon, the end is near. A huge jump, and a whole lot of $Billions, will be needed to move to something other than Silicon. And likely a decade or two.

Well worth reading that long list of the dead technologies of the future. In it, somewhere, is likely the one that will ‘make it’ and be in the computers of 2030. Some are truly fascinating ideas. Personally, I like the one with “blood”. It uses a fluid as both an internal “flow battery” (so external electricity isn’t needed eliminating all those power leads) and as liquid coolant (so you have better heat flow and can cram more into less space). Reminds me of a microscopic “Bubbles” Cray2 ;-)

IBM hopes to kill two birds with one stone by fitting its 3D chips with miniscule internal plumbing. Microfluidic channels will carry cooling liquid into the heart of the chip, removing heat from its entire volume at once. The firm has already tested the liquid-cooling technology with conventional, flat chips. The micro fluidic system could ultimately remove around a kilowatt of heat—about the same as the output of one bar of an electric heater—from a cubic centimetre of volume, says Bruno Michel, the head of the group (see Brain scan, previous page).

But the liquid will do more than cool the chips: it will deliver energy as well. Inspired by his background in biology, Dr Michel has dubbed the liquid “electronic blood”. If he can pull it off, it will do for computer chips what biological blood does for bodies: provide energy and regulate the temperature at the same time. Dr Michel’s idea is a variant of a flow battery, in which power is provided by two liquids that, meeting on either side of a membrane, produce electricity.

Doing that could effectively get you to 3-D chips and out of Flat World surface area heat limits. You would still have the size and number of cores limits (word size and Amdahl’s Law) but at least you could crank up the power and make the switches go a bit faster for a while.

They then wander off into the Idiot Of Things, oh, pardon, the “Internet Of Things”… Like I really want my toaster connected to the internet and hacked… Yeah, it might happen, but IMHO it is more Stupid Dream than really desired product. (i.e. most likely to ‘make it’ when forced on us by law rather than by actual product demand).

They are ignoring the exponential rise in Fab Cost with each halving of micron size, and the halving of PRICE/compute, and therefore profit, at each iteration. Those two curves will cross, and at that point, profit ends. It happened first for Supercomputers, and now they are made from thousands of Intel chips, not supersized custom CPUs. Then it hit mainframes, and Amdahl and others left the business. Then MiniComputers got hit, and DEC, Tandem, Sun, HP, etc. took it on the chin. Now “smart phones” and tablets are killing off the desktop PC profit. (Microsoft’s attempt to sell a tablet as a quasi desktop shows this absurdity in action…)

Soon there simply will be no money to spend on $16 Billion Fab Plants to make quad core CPUs that sell for 5¢ each… and making a 4096 core CPU that doesn’t do anything for most folks, to sell for $1000, just isn’t going to happen. At that point, we’re maybe able to still push Moore’s Law technically, but financially it will be at an end. With a present Fab at about $8 Billion, and my R.Pi Quad Core selling for $35 for the whole computer, we can already see that point from here…
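The crossing of those two curves can be sketched with a toy model, loosely in the spirit of Rock’s Law (the observation that fab cost doubles roughly every four years). Every number here is an illustrative assumption, not industry data; the point is only how few generations a doubling cost survives against shrinking revenue:

```python
def generations_until_unprofitable(fab_cost, gross_revenue,
                                   fab_growth=2.0, revenue_growth=0.8):
    """Count process generations until the fab costs more than the
    gross revenue available to pay for it.  All parameters are
    illustrative assumptions."""
    n = 0
    while fab_cost <= gross_revenue:
        fab_cost *= fab_growth           # each node: fab cost doubles...
        gross_revenue *= revenue_growth  # ...while revenue per node shrinks
        n += 1
    return n

# Assume an $8B fab today against $60B of revenue to amortize it:
print(generations_until_unprofitable(8, 60))  # -> 3 generations
```

Fiddle the growth rates however you like; as long as cost compounds up while revenue compounds down, the crossing arrives in a handful of generations. That is the financial end of Moore’s Law, whatever the physics allows.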

So something’s gotta give. Will it be GaAs chips at last? Or something more exotic? Who knows. But it can’t cost too much, or it can’t compete with “good enough dirt cheap” silicon chips. It will be interesting to watch the Venture Capital driven, highly competitive, totally not centrally planned knife fight that is Silicon Valley as this shakeout comes.


About E.M.Smith

A technical managerial sort interested in things from Stonehenge to computer science. My present “hot buttons” are the mythology of Climate Change and ancient metrology; but things change…
This entry was posted in Tech Bits. Bookmark the permalink.

11 Responses to Moore’s End, The Economist, The Absurdity and the Ecstasy

  1. Larry Ledwick says:

    That first article is obviously trying to spin info to support an agenda and is way out in the weeds, as you pointed out.

    As Moore’s law runs into physical barriers on semiconductor size, we will as you point out make our gains in other areas (for a while). You can only lower operating voltages so far to control heat load without running into signal noise issues. That internal cooling system sounds like it could be a big improvement on heat limited systems.

    Right now where I work, they are working hard in other areas. When dealing with big data, shifting to SSD disks has become important to reducing process times. The data read and write cycles are now a large fraction of the time involved in processing, as is the efficiency of the core supporting code like the database and its ability to handle queries and serve the results fast enough to make best use of SSD storage.

    As Cray did back when you were younger, the physical layout of the chips will focus more and more on the geometrical limitations of near-speed-of-light transfer of pulses.

    You can only move a bit (1 or 0) so fast and at some point minimum propagation distance is the only way you can cut transfer time. That leads me to suspect we will start to see spherical and cubical chips instead of flat form factor chips to minimize data transfer distance in the core and its memory caches.

    Each time we optimize one bottleneck, we reveal a new one somewhere else in the fetch-data, process-data, serve-data cycle. Especially in systems which work with “big data”, database efficiency is becoming king. They start worrying about the internals of the db and how it handles certain queries: is it faster to do this with one table structure or another? As in all code, there are often several ways to get to the desired output data, but some involve significantly more processor overhead than others, and now that overhead is beginning to matter.

    A few years ago when the processor spent 90% of its time waiting for a spinning disk to get the right sector under the head and to seek the head to the right track a lot of poor code design made no difference. Now with fast on chip memory caches and SSD storage, those wasted process cycles matter, especially in processes which are walking through tables with 100’s of millions of rows, and terabytes of data to parse.

    I think the wizard hat is going to be passed from the hardware chip guys to the code magicians who really really understand how an instruction is processed in their OS and data bases, and start tweaking instruction sets to minimize lost cycles due to poor code architecture.

    Like Admiral Grace Hopper used to say, nanoseconds matter. Physical distance between hardware nodes is now becoming important. We are beginning to see the high end stock market systems spending big money to shave microseconds off of time of flight from one exchange to another to gain an advantage to front run the market by milliseconds or microseconds. This sort of big dollar motivation will drive a new generation of switches, routers, and other systems to optimize for speed and not just reliability.

  2. BobN says:

    It is crazy to spend billions trying to improve processor speeds when system usage throws away so many cycles on wait states of unoptimized overheads. The bottleneck for most systems is the peripheral speeds. Just going to Flash allowed servers to greatly improve their IOPS, which translates to more throughput and fewer servers required to get the job done. A simple SSD solution brought this about and caused a big jump in performance in the market. This was a baby step to what is coming. By moving Flash close to the servers and operating on processor busses, another huge jump will be seen. With interfaces that support queueing and remote DMA, several orders of magnitude speed increases will be possible. In addition, new NVM memory is arriving that will offer 1000x speed increases over previous Flash.
    With memory architecture and bus changes, continued big jumps in server performance will be seen. These types of changes will allow several generations of system development where the systems keep meeting Moore’s Law. There are many things architecturally that are possible.

    The shift to system and architecture improvements will buy time for technology breakthroughs that will emerge and keep the pace of improvement on a pretty steady advance. Things like high-temperature superconduction and optical interfaces will allow for continued increases to the road map.

    My bet is that it does not flatten out but continues, at least at the system performance levels.

  3. John F. Hultquist says:

    We bought a VIC20 with a 6502A microprocessor and thus helped to push its sales over 1M.
    Now my computer is much more capable. However, we live at the end of a 10-mile-from-service DSL route — the first iteration of that arrived in 2008, and it too is now better than it was. Then there is the inside-the-house wireless. Hey, where did the printer go? And from restarts or sleep the #2 screen comes back with the wrong aspect ratio.
    Point is, there are many ways the computing experience can be improved. Those with big needs are going to have to find a solution to security (hackers were once friendly sorts, now thee be crooks).

    Regarding: “triumph of central planning”
    Maybe the writer had Al Gore in mind.

  4. Oh my goodness. IBM seem to have taken my joke seriously.

  5. E.M.Smith says:


    We already hit the speed of light problem back in 1985 or so. Our Cray XMP-48 was one of those C shaped jobs. It was that shape so the wires from one part to another were on the inside of the C and a bit shorter. It was one clock from edge to edge. It could not be made bigger without some part clocking and communicating late.

    The Cray2, to be faster, had to be made smaller, thus the approx 3 foot diameter, 6 inch thick wafer stack, but then it was too hot, so it got immersed in Fluorinert refrigerant… thus the nickname “Bubbles”. All dictated by speed of light and heat load. Now we are down to die-sized chips doing the same thing and faced with the same problems. For multicore boards and large chips, they have had to go to non-synchronized clocks edge to edge…
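    The clock-versus-light-speed arithmetic is simple enough to sketch (the ~105 MHz X-MP clock below is from memory, and real signals in wiring are slower than c, roughly half to two-thirds):

```python
C = 299_792_458.0  # speed of light in vacuum, m/s

def light_travel_cm(clock_hz):
    """Distance light covers in one clock period, in centimetres.
    An upper bound: signals in real wiring travel slower still."""
    return C / clock_hz * 100

print(f"Cray X-MP (~105 MHz): {light_travel_cm(105e6):.0f} cm per clock")
print(f"Modern 3 GHz core:    {light_travel_cm(3e9):.1f} cm per clock")
```

    At 3 GHz light itself only makes about 10 cm per tick, which is why a big box can no longer be synchronous edge to edge, and why the C shape and the dense wafer stack were forced moves.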

    Per I/O, the old IBM and Amdahl mainframes were not very fast CPUs, but what they had was several very fast high capacity disk data channels. Since business processing moved lots of data per unit of computing, they had fast performance for things like reading a billing record, adding this month consumption, and writing it back out again. Sorted sequential access data files can be very fast… Now everything gets loaded into databases with indexes and searches, not always the fastest…

    Per software: Absolutely it will be key. My brother-in-law was Ph.D. Computational Aeronautics at NASA. He shared with me an interesting graph. One line sloped upward from lower left to upper right (log scale). It was improvement in computational throughput from Moore’s Law. Another line did the same, ending well above it. Computational throughput improvement from improved algorithms and coding.

    My corollary to that was “Bad programming can consume all of Moore’s Law, and then some. -E.M.Smith”, which is why your Microsoft-filled PC today, with quad cores, GB of memory, and dramatically faster IO channels (faster than those old IBM mainframes), gives roughly the same user experience as 20 years ago, several Moore’s Law doublings later. That is also why I love Unix and, to a lesser degree, Linux. The code bloat there has been much less.

    To get that gain usable in future years will require tossing out a whole lot of Object Oriented and Virtual Machine wasted cycles and overhead. There are a few orders of magnitude of performance available just by going Old School… but the labor cost will be high.


    Better IO helps some classes of problems like business processing, others not so much. Calculation intensive things least of all. That didn’t matter so much when it all doubled fast. Now it will matter again as folks need solutions tuned to their problem class.


    The folks I know like to call White Hats “hackers” and black hats “system crackers”. I know, not what they do in media land… but we know better who we are…



  6. Steve Crook says:

    > and why it is time to go back and clean up 30 years of crappy inefficient software
    Yes, problem is we’ve got a generation (or two) of developers used to ever increasing memory and CPU power and a world devoted to ‘frameworks’. It’s going to be an uphill struggle.

    I know next to nothing about the design and operation of a CPU, but I wonder if there are similar issues in the hardware, inefficient design because it “doesn’t matter” when there will always be gains elsewhere, and time to market is paramount…

    There are signs that changes are afoot; the new Vulkan graphics API looks to be a step in the right direction, and a tacit admission that graphics cards are also nearing the point where we can no longer rely on ever increasing power…

  7. John Silver says:

    Look beyond von Neumann’s general purpose binary architecture.

  8. AlainCo says:

    The key to extending Moore’s Law is changing the problem.

    I was trained in microelectronics when submicron looked alien. It worked.
    Superscalar was a revolution.

    Why not try processor-in-memory, or cellular automata?

    A computer is not made to run programs, but to solve problems (today, by running a program).
    The Internet is already a huge network that solves huge problems, like informing me.

    I’ve seen matrix algebra change with parallel programming, and recently Monte-Carlo methods have replaced gradient methods, replacing Cholesky factorization…

    In the 90s “code coupling” got fashionable, and today “multi-physics” is on your desktop.

    The limit of simulation today is not in the computer but in the model, and finally in the data.
    Climate is a good example.

  9. Larry Ledwick says:

    This just showed up on Drudge.
    Does anyone else smell magical thinking and a funding scam?
    Like they are getting people to pay big bucks to listen to exotic unachievable fairy tales?

    Statements like the one below light up my warning board. More alarming, the attendees apparently found nothing in this observation worth commenting on.

    Within the next decade, he said, self-driving cars will eliminate all driving fatalities.

  10. Soronel Haetir says:

    I’m waiting for the computers made out of something other than matter and that occupy something other than space.

  11. RobL says:

    A couple of drivers for further big speed improvements: 8k or 16k, high (200–300 Hz) refresh rate graphics rendering for VR will become a must-have within a decade; currently mobile devices cannot even drive the relatively low-res Oculus Rift at the speeds desired to eliminate motion sickness caused by latency.

    Deep learning. It is rapidly scaling up and being applied to ever more problems. It is truly massively useful for solving robotics, translation, and machine vision problems, because solutions can be developed with so little costly human input. Nvidia has just released a 15 billion transistor, 600mm², 20 Tflop chip (it would have been the world’s fastest computer at the Millennium) specifically aimed at deep learning – and the demands are only growing, driven by things like self driving cars.

    So there is probably more impetus now for hardware improvement and computer power increases for consumers than there has been for 5-10 years (when desktops got mostly good enough to not bother upgrading further). I’d guess Intel and Nvidia are wetting themselves with excitement at profits to come.

Comments are closed.