What is enough ARM CPU speed?

Generally speaking, the AMD / Intel class of processors run rings around the ARM class. This is to be expected, as the Intel / AMD parts have a whole lot more silicon doing a whole lot more stuff, with a whole lot more power consumed in the process. They also cost a bundle. You are not going to get one of them for $5 and use it to make a $30 all-up computer board.

What makes an ARM different from an x86?

The ARM CPUs are still no slouch. ARM was traditionally a 32 bit RISC (Reduced Instruction Set Computer) design, at about the same level of design as the Sun SPARC chips of the 1980s or so, which were early RISC machines. But over the years RISC, too, has grown in complexity and added speed. The chips of today are quite fast, and with things like out-of-order execution and SIMD extensions I’m not so sure “reduced” is really accurate any more. They are no longer as simple as the original design goal of RISC.

The whole idea of RISC was to dramatically reduce the total number of transistors and gates needed for a given level of performance by removing rarely used instructions at the hardware level. One could get order-of-magnitude reductions in chip complexity that way, at a far smaller loss of compute speed; net a big win. But with the CPU silicon now an ever smaller part of total machine cost, and with speed gains slowing down, many of the more complicated design features are slowly being added into the traditionally RISC designs.

One simple example is the move to 64 bit machines. I remember the debates over RISC, and the assertion that it was “worth it” to use 32 bit over 16 bit as the design goal, but that 64 bit was way too much silicon for not enough gain. (Though a very strong case was made for 16 bit being ‘enough’ almost all the time and the better choice for RISC…) At 64 bits you are packing 8 characters per computer word, and you will spend a lot of time packing and unpacking words. It isn’t “twice as good as 32” by a long shot, since folks rarely need the “double precision math” where 64 bit shines, but they often need 8 bit chars. Yet silicon is now cheap enough that even low cost chips (like the ARM) are moving to 64 bit, despite doubling the silicon needed for datapaths and registers and all. A similar symptom of cheap silicon is the move to multiple computer cores: putting 2 or 4 computers in one chip package does not add much to the end cost in absolute dollars or euros.
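To make the packing point concrete, here is a small Python sketch (Python is just my pick for illustration, and the function names are made up) of stuffing 8 one-byte characters into a single 64-bit word, then shifting and masking to get one back out. Real ARM and x86 chips also have byte-wide load and store instructions, so how much of this work a given program actually does depends on the compiler and the code.

```python
# Illustration only: hypothetical helpers showing 8 chars living in one 64-bit word.

def pack_chars(text: str) -> int:
    """Pack up to 8 ASCII characters into one 64-bit integer (first char in the low byte)."""
    data = text.encode("ascii")
    if len(data) > 8:
        raise ValueError("a 64-bit word holds at most 8 bytes")
    word = 0
    for i, byte in enumerate(data):
        word |= byte << (8 * i)  # each character occupies its own 8-bit lane
    return word


def unpack_char(word: int, index: int) -> str:
    """Get one character back: shift it down to the low byte, then mask off the rest."""
    return chr((word >> (8 * index)) & 0xFF)


word = pack_chars("RISCchip")
print(hex(word))              # prints: 0x7069686343534952
print(unpack_char(word, 4))   # prints: c
```

Every single-character access costs a shift and a mask on top of the load, which is the “packing and unpacking” overhead in question.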

So an ARM is a RISC chip, but much grown in speed, with a word size that is large and getting larger, and with multiple computer “cores” (CPUs) in a single chip package becoming ever more common. (In fact, for most uses, you get the whole computer system in one package called a SOC, System On Chip, or SOM, System On Module. This matters a little when comparing things, since the same ‘cores’ may show up in different SOCs with small differences in performance; but mostly it matters in that the SOC part may be what gets named, and you need to find out ‘what cores does it use?’ to get the speed data right, especially on how it interacts with peripherals in terms of throughput and speed.)

Most of the time the Intel / AMD x86 chips (i386, i686, AMD64 / x86_64, or all the other specific design specifiers) come packaged on their own as one giant part with a massive number of pins and a large heat sink glued to the top, often with a dedicated fan. All that power suck is why ARM cores dominate mobile (battery limited) devices like phones and tablets (though Intel is trying to make low power mobile cores, it is hard with a CISC design). 12 billion ARM cores shipped in 2014 alone. That is more than one per person on the planet that year. More will be shipped this year, and, counted from about 2010, more than one per person on the planet has been shipped. That is a big market share.

There are also some SOC or SOM devices using older, smaller x86 cores. I mention this just to reduce the number of “But but but… what about this Intel SOC or low power…” protests. Yes, they exist. Other than some specific embedded systems and a few laptops, they don’t matter much. Worth watching for the future, but not very relevant for a ‘roll your own cheap box’. I’d buy an x86 based Chromebox or Chromebook and convert it to Linux first. (And, in fact, I did buy one.)

The whole point here is that, due to the low power used, the whole computer gets stuck in one small cheap package that doesn’t need things like fans. Mass produced, these run down into the single-digit dollars per package. That is what makes the “under $50” computer system possible.

The Cheap Seats Computers

So how fast is an ARM chip today in the run of the mill dirt cheap boxes, and what can you do with it?

For comparison, the SPARC chips of the early 1990s ran at about 50 MHz, reaching 200 MHz in the UltraSPARC at the end of the 1990s. Compare that with the 1 GHz common in ARM chips today: 5 to 20 times the clock rate (call it about 10 x), just from clock speed. Since the UltraSPARC was a fairly effective graphical workstation, one might expect “any old ARM will do” today. But it won’t. Software design has taken an inefficient path, and the software run today sucks up MIPS (Million Instructions Per Second), or “computes”, and network capacity to an astounding degree. So while a ‘first blush’ look might lead you to think any quad core ARM is plenty, in fact it depends on which core and how fast (and what software…)

To run a text based workstation for traditional server type uses, even the $25 Raspberry Pi Model A board can be enough. It is graphics that tend to chew up machines, animation in particular (or video). So if you want basic servers to do things like DNS (Domain Name System) lookups, DHCP address assignments, or even a Torrent server, any old Raspberry Pi will do. (I have a Model B+ with built-in Ethernet doing all those and more, and its CPU is mostly idle.) That is an ARM v6 CPU at 700 MHz.

For a full-on graphics workstation with web pages and animations, I have a Raspberry Pi Model 2 that is ‘barely enough’. That is a quad core ARM v7 CPU at 900 MHz. I’d rather have more. More on that below.

Sidebar On ARM Naming

The ARM family of chips has one of the most annoying naming conventions possible. Core designs have one set of numbers, the chips using them another set, and the SOC modules based on them yet another. Bigger does not always mean better, and the same digits may turn up in more than one series. Sigh. I’m going to try to make it a little simpler to sort out for those who don’t want to spend their life learning ARM arcana.

First step is just to let go of the notion that a bigger number is better or faster. Inside any one type of use, maybe, but you have to make sure you are not comparing two different uses…

Architectures:

Each core has a design, called an architecture. These have a ‘v’ number, like ARM v6 or ARM v7. They may have different instruction sets and different abilities. Mostly bigger is newer and better. Most of them are 32 bit; the newer announced ones are 64 bit. And there you get into an oddity: v8-A is 64 bit, v8.1-A is 64 bit, but v8-R is 32 bit… At the moment, though, the last two are in the future…

The ones you care about are generally v6 and v7. The v6 is the older, slower architecture used in the first Raspberry Pi models. It is no longer supported by folks like Ubuntu (so no Ubuntu on the R.Pi B+). The v7 is the bulk of what is shipping now; it is used in the R.Pi Model 2.

Easy, right?… Except…

Each architecture is embodied in specific core designs. These have their own number series (but without the v). So, for example, the very old ARMv3 architecture is used in the ARM6 and ARM7 cores. That v is highly important.

The v6 is used starting with the ARM11 core. It was also used for what was called the Cortex design (though with an M attached, as ARMv6-M), in the ARM Cortex-M0 and ARM Cortex-M1. So is it a 6, an 11, or a 0 or 1? Well, yes…. The number is a confounder, and ‘Cortex’ is quasi-meaningful but a distractor. You need to know v6 to know what operating system works, and you need to know ARM11 when looking at the chips on the boards.

Now for the real fun…

The v7 is used in a BUNCH of core designs. ALL of them are “Cortex”, so there is no ARMdigit; they are all ARM Cortex-letter_digit. The letter can be M, A, or R (and in each case the ‘architecture’ has the letter too… so ARMv7-M, ARMv7-A, ARMv7-R). The R is intended for Real Time, the M for microcontrollers (like in your car or dishwasher), and the A, which is the one you will most likely see, is intended for “Applications”. Sigh.

So most of the time, things like ARM Cortex-A12 will show up. That is an ARMv7 architecture. And an ARM11 is not even close to the same speed, ability, or instruction set (being an old v6 type…). But at least there is no ARM Cortex-A11, near as I can tell…

There exists a zoo of extensions, additions, special instructions, etc. listed at that link, for things like “NEON”, which lets you do “Single Instruction Multiple Data” or SIMD instructions (mostly useful for graphics, multimedia, and signal processing work), and “Thumb” instructions that let you pack your code as 16 bits rather than taking a whole word. All nice, but not things you need to get sucked into.

For now, what really matters is mostly just that the original R.Pi models are all v6, the new one is v7 (as is just about everything else in the small hobbyist ARM computer-on-a-board world), and many times, for comparison with other boards, you need to look at ARM Cortex-A (9 vs 12 vs 15 vs etc.) to know who’s a what.

Sigh. Again…
( I don’t think you could make it more of a naming mess if you tried.)
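One way to answer ‘what core is actually in this thing?’ on a running Linux board is to look at the “CPU part” field in /proc/cpuinfo rather than trusting the marketing name. Below is a minimal Python sketch of that idea. The small part-number table is my own partial list for cores designed by ARM itself (CPU implementer 0x41), not a complete or authoritative one, and the function name is hypothetical. It keys off the part number rather than the “CPU architecture” line because, on the original Raspberry Pi for example, that line commonly reads 7 even though the ARM11 core is a v6 type. Yet another naming gotcha…

```python
# Hypothetical, deliberately partial table of ARM Ltd. (implementer 0x41) "CPU part" IDs
# as shown in Linux /proc/cpuinfo. Other implementers reuse this number space, so
# a real tool should check the "CPU implementer" line too; this sketch does not.
ARM_PARTS = {
    0xB76: ("ARM1176", "ARMv6"),
    0xC07: ("Cortex-A7", "ARMv7-A"),
    0xC08: ("Cortex-A8", "ARMv7-A"),
    0xC09: ("Cortex-A9", "ARMv7-A"),
    0xC0F: ("Cortex-A15", "ARMv7-A"),
}


def identify_cores(cpuinfo_text: str) -> list[tuple[str, str]]:
    """Return (core name, architecture) for each 'CPU part' line in cpuinfo text."""
    cores = []
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "CPU part":
            part = int(value.strip(), 16)  # int() accepts the "0x" prefix with base 16
            cores.append(ARM_PARTS.get(part, (f"unknown part {part:#x}", "unknown")))
    return cores


# Sample text in the layout a Raspberry Pi 2 style board reports (one core shown).
sample = """processor\t: 0
model name\t: ARMv7 Processor rev 5 (v7l)
CPU implementer\t: 0x41
CPU architecture: 7
CPU part\t: 0xc07
"""
print(identify_cores(sample))  # prints: [('Cortex-A7', 'ARMv7-A')]

# On a real ARM Linux board you could read the live file instead:
# with open("/proc/cpuinfo") as f:
#     print(identify_cores(f.read()))
```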

Comparing Real World Experiences

I have 4 relevant boxes / boards to compare.

1) Raspberry Pi B+

Single core, v6 at 700 MHz with 512 MB memory.

2) Raspberry Pi Model 2

Quad core, v7 ARM Cortex-A7 at 900 MHz with 1 GB of memory.

3) Samsung Galaxy Note 10.1

Quad core, v7 ARM Cortex-A9 at 1.4 GHz with 2 GB of memory.

4) HP Chromebox CB1

For comparison with x86. Dual core Intel Celeron 2955U at 1.4 GHz with 2 GB memory (IIRC; it can be either 2 or 4 GB).

Note that this CPU lists at $132 per the wiki today. That is nearly 4 x the price of the entire $35 R.PiM2 board. So for the list price of just the CPU in this box, you could get three complete R.PiM2 boards: a total of 12 ARM Cortex-A7 cores, with all the trimmings and memory installed… But it is a valid touchstone for comparison to x86 alternatives.

The Subjective Experience

1) The original Raspberry Pi is painful to use in graphics mode. Yes, it can be done, and it is “livable” for occasional use (so I’m OK with using the more ‘tuned’ and limited browsers on it for short searches while doing maintenance on it as a server). It is not at all sufficient as a “daily driver” unless you are so dirt poor that you cannot pop the extra $6 or so for the Model 2.

Personally, while I’m happy to keep mine in service for “networked headless server” duties, I don’t see much reason for me to buy anything less than the Model 2 “going forward”.

2) The Raspberry Pi Model 2 is almost enough. I’m using it at the moment to make this posting and have basically ‘lived on it’ for the last few weeks. It is “the minimum” you can get by with as a “daily driver” desktop. It works, but some things slowly wear on you. For starters, a lot of the things you want to run use only one core. You generally don’t experience “4 x the performance” but more like 1.5 times the performance. Over time this might improve a little IF folks make more codes run in parallel. (And they actually gain performance, unlike this FORTRAN test.) But for now, Firefox-like browsers (IceApe / SeaMonkey) run in one core. I often get to sit and wait, watching the CPU “pegged” at 25% as IceApe does some page update, or whatever, maxing out one core. “Type ahead” lag can then be very noticeable. ‘Load delay’ on one page while another is ‘saving the edits’ on a posting is very “in your face”. Yeah, you can live with it. But ideal, it isn’t.

Personally, while I’m happy to have it, and will use it from time to time as a ‘daily driver’, I think a faster equivalent box for $15 more (or maybe less) will replace it in a few months. It will go on to other specific uses (an interesting minimal GIStemp box and maybe some other climate codes… an improved Dongle Pi… a ‘special purpose’ browser box… a Raspberry Tails box…), so it won’t be wasted. But use it for more than light browsing (and especially the very chatty, very inefficient WordPress blog edit process) and it is just a tiny bit too slow to live with on a months-long time scale. On video feeds, it has a bit of ‘jitter’ as it works to keep up. Yes, you can watch YouTube videos, but it isn’t the seamless, liquid experience you really want. Likely fine on slow links where they limit you anyway, or for very occasional use. Might even be “just dandy” with special purpose media software builds. Might even get there for the generic Raspbian / IceApe after a bit more code polishing. Would be easier to toss an extra $15 at the problem…

I would rate it as the “bottom acceptable” for a workstation. You want more, but can work with it.
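The “four cores but only about 1.5 times the speed” experience fits the standard Amdahl’s law model (my choice of model here, not something measured on the board): if only a fraction p of the work can be spread across n cores, the best overall speedup is 1 / ((1 - p) + p / n). A quick Python sketch, treating the 1.5x as a rough impression rather than a benchmark result:

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Amdahl's law: best-case overall speedup when only part of the work uses extra cores."""
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / cores)


def parallel_fraction_for(speedup: float, cores: int) -> float:
    """Invert Amdahl's law: what parallel fraction would give an observed speedup?"""
    # 1/S = (1 - p) + p/n  =>  p = (1 - 1/S) / (1 - 1/n)
    return (1.0 - 1.0 / speedup) / (1.0 - 1.0 / cores)


p = parallel_fraction_for(1.5, 4)  # assumed: ~1.5x observed on the 4-core R.PiM2
print(f"parallel fraction ~ {p:.2f}")                      # prints: parallel fraction ~ 0.44
print(f"speedup on 8 cores ~ {amdahl_speedup(p, 8):.2f}")  # prints: speedup on 8 cores ~ 1.64
```

Working backwards, 1.5x on 4 cores implies only about 44% of the work is running in parallel, and under that same assumption even 8 cores would only get to about 1.64x. The model ignores memory bandwidth, I/O waits and the like, so treat it as a rough bound, not a prediction.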

3) Samsung Galaxy Note 10.1. Really a different class of machine: at over $400 it isn’t cheap. I mostly have it listed for the CPU comparison it gives. As a “Quad Core ARM” it is ‘head to head’ with the R.PiM2, but with a slightly hotter core design and a lot more MHz.

It is liquid and smooth on videos, with good sound. It does web browsing just fine. Hard to compare the “chatty WordPress edit” experience, as input of anything on a tablet is painful. It has also clearly had some software tuning done to cut wasted background cycles and power. (On the R.PiM2 I can see some ‘idle’ tabs doing updates while I’m in this page; on the Samsung, using Firefox, I have 80 tabs open at present and NONE of them are active unless they are the foreground window. So some “software tuning” could make the R.Pi performance go up a lot…)

I find I’m very happy to use it for browsing the web and watching videos. I’d rank this level of performance as “Just Dandy”. That, to me, says that a single ARM v7 core at about 1.4 GHz is likely all you need, and, with at most a bit of software tuning, it can run a massive open-browser-window load and full motion video with good stereo sound. So I think this sets the ‘lower bound’ on “just dandy” CPU at about an ARM Cortex-A9 1.4 GHz quad core. That lets us say that a v7 at 1 GHz or better is needed for acceptable single core performance, with 1.4 GHz preferred. Anything above a Cortex-A9 is better.

4) The Chromebox. I’ve reviewed it here before. The short form is that it works Just Fine for video, sound, browsing, and editing WordPress postings. Plenty of speed, and I have never seen it struggle. The only “issue” I have with it is the straitjacket of limitations on what you can do with it that comes from the Google Apps “everything on the internet” model, and the complete lack of privacy from Google’s prying eyes (and through them the NSA / PRISM program).

In short, for the relatively low cost it has high performance and high speed. IMHO this demonstrates the degree to which 2 CISC x86 cores at 1.4 GHz can beat the pants off 4 ARM cores at 900 MHz. It is a very livable space for things like browsing, watching YouTube videos, and Netflix at full HD (IF your network can keep up… where the R.PiM2 wants less than HD to have smooth motion at all…). Having swapped between it and the R.PiM2 on and off over the last month, I’m now comfortable that I can “live without it” and will likely install Linux soon. That will give me the speed along with the flexibility I want. (Originally I bought it as an emergency replacement for the HP Laptop that died while ‘on the road’, with the intent that ‘someday’ it would become a Linux box, when the difficulty and risk of converting it (it is not straightforward) would not matter. That day is fast approaching.)

I like it. I use it. It is comfortable. It will make a fine Linux box. At $180 it isn’t what I’d call really cheap… I think it does show that, at the low end, a GHz-range x86 is likely “enough”. FWIW, the old Compaq Evo is single core IIRC and not that fast, and it does OK with things like video (but on the edge of not enough… HD can make it struggle). So somewhere between the Evo and this dual core box is the ‘cut off’ for x86 with present Linux software.

The Future

With that base of experience, I would predict that the following two boards set the lower bound of really livable and the upper bound of “likely more than you needed”.

https://en.wikipedia.org/wiki/Cubieboard

The Cubieboard is something of a knockoff of the Raspberry Pi, with more MHz, faster SATA I/O to a real disk, and other better interfaces to the rest of the world. For making a “roll your own” workstation with more than “barely acceptable” performance, I think this level of performance is what folks will want (or convert a Chromebox / Chromebook to Linux). The first system is likely the ‘lower bound’; the second is likely just fine.

The “Maybe fast enough” would be this ARM Cortex-A7 at 1 GHz. While it is only 2 cores, I rarely get over 2 going on the R.PiM2 anyway; more MHz (with present single core software) matters more. That it has WiFi and Bluetooth built in makes a media station even easier to build. Note that “Allwinner A20” is the SOC used to embody the ARM cores and more. Yes, another series of numbers to learn… Note also that this supports a SATA disk drive. The SD cards are OK, but for big data moves a real hard disk matters, and just being able to run real swap on it makes memory size issues evaporate. Then that 1000 Mb/s Gig-E Ethernet is way fast (and better if making a home brew Beowulf cluster supercomputer ;-) Each of the Cortex-A7 cores is roughly the same as a core in the R.PiM2, but a little faster on MHz. Basically this has a whole lot more usable I/O oriented features, but loses 2 cores that the R.PiM2 rarely uses in a browser / workstation at present anyway.

Cubietruck (Cubieboard3)

The third version has a new and larger PCB layout and features the following hardware:

SoC: Allwinner A20
CPU: ARM Cortex-A7 @ 1 GHz dual-core
GPU: Mali-400 MP2
display controller: unknown, supports HDMI 1080p, no LVDS support
2 GiB DDR3 @ 480 MHz
8 GB NAND flash built-in, 1x microSD slot, 1x SATA 2.0 port (2.5″ hard disk).
10/100/1000 RTL8211E Gigabit Ethernet
2x USB Host, 1x USB OTG, 1x CIR.
S/PDIF, headphone, VGA and HDMI audio out, mic and line-in via extended pins
Wi-Fi and Bluetooth on board with PCB antenna (Broadcom BCM4329/BCM40181)
54 extended pins including I²C, SPI
Dimensions: 11 cm × 8 cm

There is no LVDS support any longer. The RTL8211E NIC allows transfer rates up to 630–638 Mbit/s (sending while 5–10% idle) and 850–860 Mbit/s (receiving while 0–2% idle) when simultaneous TCP connections are established (testing was done utilising iperf with three clients against Cubietruck running Lubuntu)

To connect a 3.5″ HDD the necessary 12 V power can be delivered by a 3.5 inch HDD addon package which can be used to power the Cubietruck itself as well. Also new is the option to power the Cubietruck from LiPo batteries.

On my “someday lust list” is this one:

Cubieboard 4

On May 4, 2014 CubieTech announced the Cubieboard 4, the board is also known as CC-A80. It is based on an Allwinner A80 SoC (quad Cortex-A15, quad Cortex-A7 big.LITTLE), thereby replacing the Mali GPU with a PowerVR GPU. The board was officially released on 10 March 2015.

SoC: Allwinner A80
CPU: 4x Cortex-A15 and 4x Cortex-A7 implementing ARM big.LITTLE
GPU: PowerVR G6230 (Rogue)
video acceleration: A new generation of display engine that supports H.265, 4K resolution codec and 3-screen simultaneous output
display controller: unknown, supports:
microUSB 3.0 OTG

This is an 8 core board. There are 4 x Cortex-A7 CPUs and 4 x Cortex-A15 CPUs, and the chip is smart enough to fire up the higher performance cores only when you need them (that big.LITTLE thing). It also has an upscale GPU for even better graphics / video. I note in passing the USB 3.0, which is not on the Raspberry Pi and which would benefit me when doing data maintenance on a couple of TB USB disks…

http://www.amazon.com/Waveshare-Cubieboard-High-Performance-PC-Development/dp/B00T2YFQSG/ref=sr_1_2

That puts the price at about $140, which starts to make a Chromebook / Chromebox look cheaper once the rest of the package is figured in… It also claims that the A7 cores run at 1.3 GHz while the A15 cores run at 2 GHz. Hopefully that is correct.

But what is a Cortex-A15, and how does it compare with a Cortex-A7 or the Cortex-A9?

https://en.wikipedia.org/wiki/ARM_Cortex-A15

ARM has claimed that the Cortex A15 core is 40 percent more powerful than the Cortex-A9 core with the same number of cores at the same speed. The first A15 designs came out in the autumn of 2011, but products based on the chip did not reach the market until 2012.

I hate phrases like “more powerful”. Is that faster processing per clock? Wider data? Fancy instructions not used by any actual OS? But assuming it actually does reflect in real life somehow… Also note that while there are 8 cores, you may only get 4 of them going at one time, depending on how things are done. From that big.LITTLE link above:

ARM big.LITTLE is a heterogeneous computing architecture developed by ARM Holdings, coupling relatively slower, low-power processor cores (LITTLE) with relatively more powerful and power-hungry ones (big). Typically, only one “side” or the other will be active at once, but since all cores have access to the same memory areas, workload can be swapped from big to LITTLE and back on the fly. The intention is to create a multi-core processor that can adjust better to dynamic computing needs and use less power than clock scaling alone. ARM’s marketing material promises up to a 75% savings in power usage for some activities.

This is more about power savings than about 8 core speed.
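For the “only one side active at a time” style of switching in the quote above, a toy Python model shows the basic idea: run on the LITTLE cluster until the load gets high, jump to the big cluster, and drop back when the load falls off. The thresholds and names are invented for illustration; the real decisions are made by the kernel and firmware, and big.LITTLE also has modes where both clusters can run at once, which is what decides whether all 8 cores are ever busy together.

```python
# Toy model only: real big.LITTLE switching is done by the kernel / firmware,
# and these thresholds are invented for illustration.
UP_THRESHOLD = 0.80    # hypothetical: move to the big cluster above 80% load
DOWN_THRESHOLD = 0.30  # hypothetical: drop back to LITTLE below 30% load


def choose_cluster(current: str, load: float) -> str:
    """Pick 'big' or 'LITTLE' for the next interval; only one cluster runs at a time."""
    if current == "LITTLE" and load > UP_THRESHOLD:
        return "big"
    if current == "big" and load < DOWN_THRESHOLD:
        return "LITTLE"
    return current  # between the thresholds: stay put


cluster = "LITTLE"
history = []
for load in [0.10, 0.50, 0.95, 0.60, 0.20, 0.05]:
    cluster = choose_cluster(cluster, load)
    history.append(cluster)
print(history)  # prints: ['LITTLE', 'LITTLE', 'big', 'big', 'LITTLE', 'LITTLE']
```

The gap between the two thresholds (hysteresis) keeps it from flapping back and forth when the load hovers near a single value.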

https://en.wikipedia.org/wiki/Allwinner_Technology

Has specs on the Allwinner SOC designs.

Now my presumption (and the truth only becomes known once I’ve tested one of these) is that the 40% faster claim for the CPU has some truth in it, and that the faster 1.3 GHz clock will matter a lot. To the extent that is true, the Cubieboard 4 ought to make a fine general purpose graphics oriented workstation even for significant users. 1300/900 = about 1.45 times the performance per A7 core, and about 3 x per A15 core, compared to the R.PiM2; that would be enough to make the experience pretty livable, IMHO. IFF it is real and achievable.
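The arithmetic behind those ratios, as a small Python sketch. It rests on stated assumptions, not measurements: performance scales linearly with clock, ARM’s “40 percent” A15-over-A9 claim holds per clock, and an A7 counts as roughly equal to an A9 per clock. That last one flatters the A7 (the out-of-order A9 is generally reported as the faster core per clock), so if anything the A15 figure is on the low side.

```python
# Rough per-core comparison against the Raspberry Pi 2 (Cortex-A7 at 900 MHz).
# Assumptions: linear scaling with clock; ARM's "40 percent" A15-vs-A9 claim
# applies per clock; an A7 is treated as roughly equal to an A9 per clock.
PI2_MHZ = 900
A15_PER_CLOCK_GAIN = 1.40


def relative_to_pi2(mhz: float, per_clock_factor: float = 1.0) -> float:
    """Estimated single-core performance relative to one R.PiM2 core."""
    return (mhz / PI2_MHZ) * per_clock_factor


a7 = relative_to_pi2(1300)                       # A7 cores at the claimed 1.3 GHz
a15 = relative_to_pi2(2000, A15_PER_CLOCK_GAIN)  # A15 cores at the claimed 2 GHz
print(f"A7 @ 1.3 GHz: {a7:.2f}x, A15 @ 2 GHz: {a15:.2f}x")
# prints: A7 @ 1.3 GHz: 1.44x, A15 @ 2 GHz: 3.11x
```

Since most browser work in this post is single-core bound, the per-core figure is the one that matters most here, not the total across all 8 cores.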

I also suspect that better, more efficient software builds would do even more, but that’s my hobby horse to ride… most of the “industry” seems quite happy to throw away “computes” on Java craplettes and Java VMs inside an OS inside a VM inside a hypervisor inside hardware… and to use object oriented “load the kitchen sink” languages and libraries. IMHO it is quite likely that a well designed “load to RAM” live-CD type Linux would just fly on the Raspberry Pi Model 2, especially if built with better use of parallel threads. But if wishes were fishes…

https://en.wikipedia.org/wiki/Comparison_of_single-board_computers

Has an extensive and less subjective comparison of single board computer specs. Just watch out for the gotchas of software efficiency and how parallel the code is.

In Conclusion

Given that I already have the R.PiM2, and I’m not going to be tossing $140 at a Cubieboard 4 any time soon, my next step will be to try getting a “live-build to RAM” type Linux running on the R.PiM2 and see if some tuning can speed things up to “livable long term”.

After that, making the Chromebox dual boot with Linux is most likely next on the list. For now, the best “bang for the buck” secure desktop system is still to find an old (non-UEFI boot…) PC and replace Windows on it. I got a 64 bit AMD box running at 1.8 GHz for about $70 used. It makes a fairly fine and fast general purpose workstation, the only real annoyance being a loud fan (my bad for not noticing when I first saw it running…). Running one of these from a Live-CD type boot just flies, as the whole OS loads into RAM.

Over time, speeds increase. I’m quite certain that as soon as a year from now there will be very cheap A15 boards. It will likely take me longer than that to finish all the things I have planned with the hardware I already have.

Finally, for folks NOT doing WordPress page editing, you will likely not notice that the R.PiM2 is slow at all. For general browsing it’s been livable and sometimes nice. But for WordPress edits it’s a bit of a pain, since that process is VERY internet chatty and a heavy CPU user, as it constantly scans the text for spelling and does periodic saves. Just don’t expect to watch a Lady Gaga YouTube video in HD in IceApe under Raspbian…


About E.M.Smith

A technical managerial sort interested in things from Stonehenge to computer science. My present ‘hot buttons’ are the mythology of Climate Change and ancient metrology; but things change...
This entry was posted in Tech Bits.

3 Responses to What is enough ARM CPU speed?

  1. M Simon says:

    The one thing I do not like about the ARM is that it is tuned for “C”. One and one half stacks. Making it a two stack machine would be trivial. And no hit to speed and a trivial (near zero) silicon hit.

    “C” is just horrible in terms of code bloat. Not to mention that its idea of structures is quite complicated. And you are totally at the mercy of the compiler designers. I can’t tell you the number of times I have been hit with compiler gotchas in that language whenever I do “C” projects (you have to pay me).

    I’m using an ARM with 256K of Flash for hardware control (machine) stuff. And my Forth code just rattles around in that HUGE space.

    And “C” encourages badly written code. Because of its stack thrash from a call it is more clock efficient to write big modules with few calls. You are supposed to write small modules which call other small modules. The hit for a call in Forth is small. So you write small well tested modules. “C” complicates and multiplies the problems of testing because of its structure.

    When ever I have competed with a team doing “C” my Forth team gets the job done in about 1/10th the calendar time. It really is amazing to watch 99.9% of industry taking such a productivity hit. With the excuses: “Everybody does it that way” – “We have always done it that way.” – “Forth is so weird” – “Where would we be without type checking and casting?” – “There is no limit to what you can do to a set of bits”

    And one little added bit of “humor”. The Forth compiler is so small that we do not have a separate compiler running on a “BIG” machine and then load the code on the little machine. We put the compiler (16 K bytes) on the little machine and just feed it a text file. Or try stuff directly from the keyboard.

  2. p.g.sharrow says:

    @M Simon; The concept of Forth was first explained to me in 1987 and it seemed to me that it was the way to go, but, machine control has been the only applications that I have been aware of. At about the same time introduced to Linux 0.9 and X-Windows same problem, No applications that I needed, so I was stuck with mastering MSDOS 3 to 5 and WORKS, also ACAD in MSDOS. I am not a coder or computer geek, just a guy that has had to master the damn things to do my own work. Since you are a coder maybe you can show us how Forth can be used to advance this project…pg

  3. Pingback: RaTails – Draft High Level Steps | Musings from the Chiefio
