Every computer has things it does better, and things it does worse: handling integer math, floating point math, moving bytes (I/O speed), or doing character manipulation. So it goes. Old IBM 360 machines were actually slow processors, but had several high speed data channels to disk. IBM realized that the typical business transaction moved a large record, but only did math on one or a few fields. The old Cray did a block of 64 double precision math problems with one instruction, but was not so good at “scalar” problems or moving long records of bits. So trying to compare computers based on just one performance number is a bit daft.
You really MUST characterize the problem you are solving and then match that to the kind of computer prior to doing comparisons.
But I’m going to ignore that and look at Just One Number: the Dhrystone benchmark performance.
Dhrystone is a synthetic computing benchmark program developed in 1984 by Reinhold P. Weicker intended to be representative of system (integer) programming. The Dhrystone grew to become representative of general processor (CPU) performance. The name “Dhrystone” is a pun on a different benchmark algorithm called Whetstone.
With Dhrystone, Weicker gathered meta-data from a broad range of software, including programs written in FORTRAN, PL/1, SAL, ALGOL 68, and Pascal. He then characterized these programs in terms of various common constructs: procedure calls, pointer indirections, assignments, etc. From this he wrote the Dhrystone benchmark to correspond to a representative mix. Dhrystone was published in Ada, with the C version for Unix developed by Rick Richardson (“version 1.1”) greatly contributing to its popularity.
It isn’t ideal, but it’s good enough for most things and widely used. For a more complete set of benchmarks of many sorts, you can go wandering in the open benchmark site here:
Basically, what I’m going to do is look almost entirely at just those boards which have a tested production release of Devuan available. If folks don’t care about SystemD, then many more boards have a native Debian or Ubuntu port and the same exercise can be done for them. For now, for this look, I’m starting with the Devuan native boards. (IFF I don’t find anything I like, I can expand the search later).
These are all ARM CPU boards, and mostly v7 or v8 instruction sets (32 bit armhf or 64 bit arm64). But it isn’t just a 2 way split. Not all cores are created equal per MHz of clock.
A couple of decades back, many or most of these ‘tricks’ were reserved for high end CISC machines (Complex Instruction Set Computer). Now they are showing up in what is nominally a RISC (Reduced Instruction Set Computer) like the ARM. How many instructions can be decoded in parallel? How are the instructions “pipelined” so the next one is started while the last one isn’t done yet? Is hardware floating point fast, faster, 64 bit, 32 bit, or missing? Can a path be executed, then thrown away if not taken by a later test function? So all cores, even of what is nominally the same instruction set and architecture are not created equal. This page gives a rough Dhrystone factor for the various ARM chips along with some details about things like decode width, pipeline, Floating Point Unit (FPU) and out of order execution. At the far right is a “DMIPS/MHz” factor.
What I’m going to do is look up the chipset for each board, the CPU type, and the MHz, then “do the math”, for each of a bunch of boards that have a native Devuan available for download. I’m getting the chip set and MHz information from here:
The list of supported boards is from the Devuan site readme here:
I’m leaving out the Chromebooks as I’m not looking for a laptop right now. Similarly, I skipped the Allwinner tablet and the Nokia phone. I’ve also skipped the “Lamobo R1” just because I’ve never heard of it… and the Cubietruck Plus using the A83T was not in the comparison wiki. Note that many of these boards use the Allwinner A10 or A20 chips, so there are not that many distinct comparisons to make, really. Though other boards use the same H3 or Samsung chips at different MHz, so some variations creep back in.
Currently supported images:
* Acer Chromebook (chromeacer)
* Veyron/Rockchip Chromebook (chromeveyron)
* Nokia N900 (n900)
* Odroid XU (odroidxu)
* Raspberry Pi 0 and 1 (raspi1)
* Raspberry Pi 2 and 3 (raspi2)
* Raspberry Pi 3 64bit (raspi3)
Allwinner boards with mainline U-Boot and mainline Linux can be booted
using the sunxi image, and flashing the according u-boot blob found in
the u-boot directory here. The filenames are board-specific, but this
file is commonly known as “u-boot-sunxi-with-spl.bin”.
Currently supported Allwinner boards:
* Olimex OLinuXino Lime (A10)
* Olimex OLinuXino Lime (A20)
* Olimex OLinuXino Lime2 (A20)
* Olimex OLinuXino MICRO (A20)
* Banana Pi (A20)
* Banana Pro (A20)
* CHIP (R8)
* CHIP Pro (GR8)
* Cubieboard (A10)
* Cubieboard2 (A20)
* Cubietruck (A20)
* Cubieboard4 (A80)
* Cubietruck Plus (A83t)
* Lamobo R1 (A20)
* OrangePi2 (H3)
* OrangePi Lite (H3)
* OrangePi Plus (H3)
* OrangePi Zero (H2+)
* OrangePi (A20)
* OrangePi Mini (A20)
* Allwinner-based q8 touchscreen tablet (A33)
All the A10 boards run at the same MHz as each other, as do all the A20 boards, so there is only one line of data for each chip. The general layout of the comparison is “chip set”, core count and architecture type, MHz, scaling factor (DMIPS/MHz), then a relative performance number for that core type at that MHz, and a total for all the cores on the chip set. Finally, there is a list of the boards using that chip set, to make figuring out which board is an A20 easier…
The Rel. Perf. is good for comparing how fast a monolithic task completes in one core (like some browser tasks). It tends to indicate how the board feels in terms of response in use. The Total is better for indicating how much gets done on long fully loaded tasks like running models or doing BOINC. The A15 cores have a range of relative performance from 3.5 to 4. I’ve just used 4 in the posting to keep the chart manageable.
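| Chip Set | # x Type | MHz | Scale Factor | Rel Perf | Total Rel Perf | Boards |
|---|---|---|---|---|---|---|
| Allwinner A10 | 1 x A8 | 1000 | 2 | 2000 | 2000 | Olimex Lime, CubieBoard |
| Allwinner A20 | 2 x A7 | 1000 | 1.9 | 1900 | 3800 | Olimex Lime2, Olimex Micro, Banana Pi & Pro, CubieBoard 2, CubieTruck, Orange Pi & Mini |
| Allwinner A80 | 4 x A15 | 1300 | 4 | 5200 | 20800 | CubieBoard4 |
| | 4 x A7 | 1300 | 1.9 | 2470 | 9880 | |
| | | | | | 30680 | total for A80 |
| Allwinner H3 | 4 x A7 | 1536 | 1.9 | 2918 | 11672 | Orange Pi 2 & Plus |
| Allwinner H3 | 4 x A7 | 1200 | 1.9 | 2280 | 9120 | Orange Pi One & Lite |
| Allwinner H2+ | 4 x A7 | 1200 | 1.9 | 2280 | 9120 | Orange Pi Zero |
| Allwinner R8 | 1 x A8 | 1000 | 2 | 2000 | 2000 | C.H.I.P. |
| Broadcom 2835 | 1 x ARM11 | 700 | 1.25 | 875 | 875 | R.Pi B+ |
| Broadcom 2836 | 4 x A7 | 900 | 1.9 | 1710 | 6840 | R.Pi M2 |
| Broadcom 2837 | 4 x A53 | 1200 | 2.3 | 2760 | 11040 | R.Pi M3 |
| Samsung 5410 | 4 x A15 | 1700 | 4 | 6800 | 27200 | Odroid XU |
| | 4 x A7 | 1200 | 1.9 | 2280 | 9120 | |
| | | | | | 36320 | total for Samsung 5410 |

Not Running Devuan 1.0 native, but via Armbian “Uplift”:

| Chip Set | # x Type | MHz | Scale Factor | Rel Perf | Total Rel Perf | Boards |
|---|---|---|---|---|---|---|
| Amlogic 905 | 4 x A53 | 1500 | 2.3 | 3450 | 13800 | Odroid C2 |
| Amlogic 805 | 4 x A5 | 1500 | 1.57 | 2355 | 9420 | Odroid C1+ |
| Samsung 5422 | 4 x A15 | 2000 | 4 | 8000 | 32000 | Odroid XU4 |
| | 4 x A7 | 1400 | 1.9 | 2660 | 10640 | |
| | | | | | 42640 | total for Samsung 5422 |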
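For anyone wanting to redo “the math” for a board not in the chart, here is a minimal Python sketch of the arithmetic as I read it from the numbers above: single core Rel Perf is MHz times the scale factor (rounded to a whole number), and the Total is that figure times the core count, summed over both clusters on the big.LITTLE chips. The function names and the `CHIPS` dictionary are my own illustration; only the values come from the chart.

```python
def rel_perf(mhz, dmips_per_mhz):
    """Single-core relative performance: clock (MHz) times the scale factor."""
    return round(mhz * dmips_per_mhz)


def chip_total(clusters):
    """Sum of (cores * single-core Rel Perf) over every cluster on the chip."""
    return sum(cores * rel_perf(mhz, factor) for cores, mhz, factor in clusters)


# (cores, MHz, DMIPS/MHz) per cluster, values copied from the chart.
CHIPS = {
    "Allwinner A20": [(2, 1000, 1.9)],
    "Allwinner H3 @ 1536": [(4, 1536, 1.9)],
    "Allwinner A80": [(4, 1300, 4.0), (4, 1300, 1.9)],
    "Samsung 5422": [(4, 2000, 4.0), (4, 1400, 1.9)],
}

for name, clusters in CHIPS.items():
    best = max(rel_perf(mhz, factor) for _, mhz, factor in clusters)
    print(f"{name:22s} single core {best:5d}  total {chip_total(clusters):6d}")
```

Note that the Total is built from the already rounded single core figure, which is how the H3 at 1536 MHz comes out at 11672 rather than a hair higher.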
Now there you can see in one number just why that Odroid XU4 spent so much time doing nothing and was very crisp on web pages and such. The octo core chips are just monsters. More than 3 x the speed of the C2, and single core performance at about 3 x a Pi M3 core. To find some other board like it, running fairly nicely, well, that’s going to be hard, or expensive, or both.
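Those “3 x” claims are easy to check against the chart. A quick sketch, with the tuples being nothing more than the single core and Total figures copied from the rows above (the variable names are mine):

```python
# (best single-core Rel Perf, Total Rel Perf), copied from the chart.
xu4 = (8000, 42640)    # Odroid XU4, Samsung 5422
c2 = (3450, 13800)     # Odroid C2, Amlogic 905
pi_m3 = (2760, 11040)  # Raspberry Pi M3, Broadcom 2837

total_vs_c2 = xu4[1] / c2[1]      # whole-chip comparison
core_vs_pi = xu4[0] / pi_m3[0]    # fastest core vs fastest core

print(f"XU4 total vs C2 total:      {total_vs_c2:.2f}x")
print(f"XU4 A15 core vs Pi M3 core: {core_vs_pi:.2f}x")
```

So a bit over 3 x on the total, and a bit under 3 x per core.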
Still, all I need is “fast enough” really. Given the Pi M3 on editing WordPress pages (lots of bytes sent back and forth and a bit of typeahead), I’m likely to need something closer to 3000 or 4000 single core speed. That’s mostly CubieBoards and Odroids. (Or I give up on Devuan or accept an x86 or make some other compromise with my principles).
This is the same reason I bought the XU4 and the C2 in the first place. Oh Well.
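The “fast enough” cut can be sketched as a simple filter over the single core numbers. The 3000 threshold is the low end of my guess above, the figures are copied from the chart, and the dictionary and function names are just for illustration:

```python
# Best single-core Rel Perf per board, copied from the chart.
SINGLE_CORE = {
    "Olimex Lime (A10)": 2000,
    "CubieBoard2 (A20)": 1900,
    "CubieBoard4 (A80)": 5200,
    "Orange Pi Plus (H3)": 2918,
    "Raspberry Pi M3 (2837)": 2760,
    "Odroid XU (5410)": 6800,
    "Odroid C2 (905)": 3450,
    "Odroid XU4 (5422)": 8000,
}


def fast_enough(boards, threshold=3000):
    """Return the board names at or above the threshold, fastest first."""
    keep = [name for name, perf in boards.items() if perf >= threshold]
    return sorted(keep, key=lambda name: boards[name], reverse=True)


print(fast_enough(SINGLE_CORE))
```

With these numbers only the CubieBoard4 and the three Odroids make the cut; the fastest H3 boards land just under the line at 2918.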
I suspect I mostly just need to do my “postmortem” as planned and then work on getting some code working rather than worry about building more “stuff”. I’ve got enough rig for the early stages of work and I can easily use something else as my “daily driver”. But “when the time comes”, knowing how many DMarks / $$$ you get for any given board is going to be a key number. Then those Dhrystones get divided by Dollars and you have yet another very interesting number. All of which needs leavening with things like “Float vs Integer” performance and “can I use the GPU?” along with “Can I get out all the heat at full load 24 x 7?” (so “effective Dhrystones over hours”) and “Is the I/O fast enough to keep the CPUs fed?”…
But that is how you evaluate the “buy” decision on buying more computes. How much do the computes cost, how many can you get done before the machine breaks or becomes obsolete, how much work to keep it up and running right. This is just a first rough cut number, but it lets you rapidly dump some options as “uninteresting”.
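As a rough sketch of that “Dhrystones divided by Dollars” step, something like the following would do. Be warned that the prices and the `sustained_fraction` values (my stand-in for how much of the peak survives at full load 24 x 7) are made-up placeholders, not measurements or real quotes; only the two Total figures come from the chart. Plug in real numbers before drawing any conclusions.

```python
def dhrystones_per_dollar(total_rel_perf, price_usd, sustained_fraction=1.0):
    """Total Rel Perf per dollar, derated by the fraction sustained under load."""
    if price_usd <= 0:
        raise ValueError("price must be positive")
    if not 0.0 < sustained_fraction <= 1.0:
        raise ValueError("sustained_fraction must be in (0, 1]")
    return total_rel_perf * sustained_fraction / price_usd


# Totals are from the chart; prices and sustained fractions are HYPOTHETICAL.
candidates = {
    "big octo core": (42640, 80.0, 0.75),
    "small quad core": (13800, 40.0, 1.0),
}

for name, (total, price, sustained) in candidates.items():
    value = dhrystones_per_dollar(total, price, sustained)
    print(f"{name:16s} {value:7.2f} effective Rel Perf per dollar")
```

Even with invented inputs, the shape of the answer is the useful part: a cheap board that never throttles can close a lot of the gap on a monster that does.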