Some Benchmarks For SBCs

Since I’ve got them, I thought it might be interesting to do some benchmarks on my various Single Board Computers (SBCs) along with an old Pentium box for comparison. My subjective evaluation is that the old Compaq Evo is faster than a Raspberry Pi M3, but slower than an Odroid N2 or a Pine RockPro64, and maybe a bit slower than an Odroid XU4. The benchmarks will tell though.

As this will involve moving from board to board, and that means a lot of swapping hardware, rather than save all this in some text file I’m going to put the results in this posting. Once I have a couple in it, then I’ll publish it. Added boards will be done as updates here (though I’m not going to bother flagging them as updates).

The Test Software

UPDATE / NOTICE:

The behaviour of sysbench changed with the 1.0 release. I have re-done the benchmarks to move them to a common basis. This web page details how to change the behaviour back to the prior version:

https://github.com/akopytov/sysbench/issues/140

@coding-horror to get it to run using the previous behavior, you’ll want to add --time=0 to run for an unspecified time, and --events=10000 to run for 10k events (which is what previous versions had used as a limiter)

Nothing like having the fundamental nature of your benchmark program changed to mess up your comparisons…

The test software is Sysbench. This is installed with “apt-get install sysbench” as all of these are running some kind of Debian derivative (including Ubuntu and Devuan).

I’ve made a script in /usr/local/bin named “bench” to conduct the tests. It is a modified version of the examples at this site:
https://www.howtoforge.com/how-to-benchmark-your-system-cpu-file-io-mysql-with-sysbench

Mostly I’ve just cut down the number of iterations, as I suspect their values are way too large for the Raspberry Pi to complete in a reasonable period of time. Also, for the file I/O, as the Odroid N2 has a nearly full eMMC, I’ve made it much smaller so that I don’t fill up the root disk. This risks measuring the memory file cache a bit too much, but is likely representative of what people will experience day-to-day, as few of us sling more than 100 MB at a time. I have also not done the database benchmarks, as my first few attempts did not get them to run. I think the login method may have changed since that page was written.
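
One way to lean a bit less on the memory cache, without growing the test files, would be to flush the Linux page cache right before the file run. A minimal sketch (not part of my script, and it needs root):

sync                                # flush any dirty pages out to the device first
echo 3 > /proc/sys/vm/drop_caches   # drop page cache, dentries and inodes

I haven’t bothered with that here, since the cached case is closer to my day-to-day use anyway.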

Note that the “read $ANS” statements just cause the script to pause until you hit enter so you can read the results and then let it go to the next step.

Here’s the script (now updated to the new version). Note that the old version of sysbench barfs on the new --time and --events flags, so those must be left out on machines running it. That’s right, you can no longer run the same script to test two different computers:

root@odroid:~# bcat bench
echo
echo "First do a single core"
echo
time sysbench --test=cpu --time=0 --events=10000 --cpu-max-prime=20000 run
echo
echo -n "Done with test: "
read $ANS
echo
echo "Now doing multicore"
echo
time sysbench --test=cpu --time=0 --events=10000 --num-threads=6 --cpu-max-prime=20000 run
echo
echo -n "Done with test: "
read $ANS
echo
echo "Then some I/O"
echo
echo "Prep it"
echo
echo "File System Status?"
df /
echo
time sysbench --test=fileio --file-total-size=100M prepare
echo
echo "File System Status?"
df /
echo
echo
echo "Run Files Test"
echo
#time sysbench --test=fileio --file-total-size=150G --file-test-mode=rndrw --init-rng=on --max-time=300 --max-requests=0 run
echo "Using one the OS likes instead"
echo
time sysbench --test=fileio --file-total-size=100M --file-test-mode=rndrw --max-time=300 --max-requests=0 run
echo
echo "File System Status?"
df /
echo
echo
echo -n "Done with test: "
read $ANS
echo
echo "Then cleanup"
echo
time sysbench --test=fileio --file-total-size=100M cleanup
echo
echo "File System Status?"
df /
echo
echo
echo -n "Done with test: "
read $ANS
echo
echo "And the database.  First set-up, then run."
echo
sysbench --test=oltp --oltp-table-size=1000000 --db-driver=mysql --mysql-db=test --mysql-user=root --mysql-password=LetMeIn! prepare
echo
echo "Done with setup, so do the run"
echo
sysbench --test=oltp --oltp-table-size=1000000 --db-driver=mysql --mysql-db=test --mysql-user=root --mysql-password=LetMeIn! --max-time=60 --oltp-read-only=on --max-requests=0 --num-threads=6 run
echo
echo "And the cleanup"
echo
sysbench --test=oltp --db-driver=mysql --mysql-db=test --mysql-user=root --mysql-password=LetMeIn! cleanup
echo
echo "All this lifted from:"
echo " https://www.howtoforge.com/how-to-benchmark-your-system-cpu-file-io-mysql-with-sysbench "
echo "then modified a tiny bit"
echo
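
As an aside, if I wanted one script to cover both the old and new sysbench, a small probe could pick the flags automatically, since the old 0.4.x release simply rejects the new options. A rough sketch along those lines (untested on my boards, so treat it as a starting point):

# Probe: the old 0.4.x sysbench rejects --time / --events, the 1.0+ one accepts them.
if sysbench --test=cpu --cpu-max-prime=3 --time=0 --events=1 run >/dev/null 2>&1; then
    CPUFLAGS="--time=0 --events=10000"    # 1.0+: stop after 10k events, no time cap
else
    CPUFLAGS=""                           # 0.4.x: 10k events was already the default limiter
fi
time sysbench --test=cpu $CPUFLAGS --cpu-max-prime=20000 run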

Odroid N2

First up, and I expect to be the fastest, the Odroid N2. This is a brand new product that uses A73 cores. These are faster than the A72 cores in the RockPro64 and also make less heat, and it has a giant heat sink covering one whole side of the board. It is asserted that this board does not thermally throttle, and I believe it.

This is a hex core system with 4 big A73 cores and only 2 little A53 cores. It has 4 GB of memory, but memory ought not be limiting in these benchmarks. On this system I even ran them with the browser open, editing this posting, as nothing rolls to swap and the browser load on a single low page-weight page is negligible. It might amount to 1/4 of one core, and that would only show up on the 6 thread CPU benchmark, maybe.

Since I’m doing the I/O benchmark against the eMMC module, this doesn’t test the USB 3.0 performance. I’ll need to work up a different I/O test for that (after I figure out where sysbench puts the files it makes in an I/O test ;-) I’ll bold some of the more interesting bits. You will note I put the “time” command in front of the “sysbench” command. That returns elapsed time, user CPU time, and system CPU time. These ought to be very close to what is reported by sysbench, and I may remove them in future runs if they are just redundant.
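
As a quick check of the USB 3.0 channel itself, outside of sysbench, something like hdparm or dd against the drive would do. A rough sketch, where /dev/sda and /mnt/usb3 are just stand-ins for wherever the drive actually shows up:

sudo hdparm -t /dev/sda                                                  # buffered sequential read timing on the raw device
sudo dd if=/dev/zero of=/mnt/usb3/ddtest bs=1M count=512 oflag=direct    # sequential write, page cache bypassed
rm /mnt/usb3/ddtest

That only measures big sequential transfers, though, not the small random read / write mix sysbench is doing.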

UPDATE: These times have now been updated with the changes to run like the older version.

Single Core compute of primes:

root@odroid:~# bench

First do a single core

WARNING: the --test option is deprecated. You can pass a script name or path on the command line without any options.
sysbench 1.0.11 (using system LuaJIT 2.1.0-beta3)

Running the test with following options:
Number of threads: 1
Initializing random number generator from current time


Prime numbers limit: 20000

Initializing worker threads...

Threads started!

CPU speed:
    events per second:   689.44

General statistics:
    total time:                          14.5030s
    total number of events:              10000

Latency (ms):
         min:                                  1.44
         avg:                                  1.45
         max:                                  7.84
         95th percentile:                      1.47
         sum:                              14494.25

Threads fairness:
    events (avg/stddev):           10000.0000/0.00
    execution time (avg/stddev):   14.4942/0.00


real	0m14.573s
user	0m14.512s
sys	0m0.028s

Done with test:

Then Multi-core. For this I’ll set number of threads equal to the number of cores on each board to give an idea what the max ability of the board is if all the cores are fully loaded.

Now doing multicore

WARNING: the --test option is deprecated. You can pass a script name or path on the command line without any options.
WARNING: --num-threads is deprecated, use --threads instead
sysbench 1.0.11 (using system LuaJIT 2.1.0-beta3)

Running the test with following options:
Number of threads: 6
Initializing random number generator from current time


Prime numbers limit: 20000

Initializing worker threads...

Threads started!

CPU speed:
    events per second:  3446.58

General statistics: 
    total time:                          2.9000s
    total number of events:              10000

Latency (ms):
         min:                                  1.44
         avg:                                  1.74
         max:                                 17.46
         95th percentile:                      2.66
         sum:                              17353.10

Threads fairness:
    events (avg/stddev):           1666.6667/416.62
    execution time (avg/stddev):   2.8922/0.01


real	0m2.941s
user	0m17.092s
sys	0m0.012s

Done with test: 

It is a little strange to get used to things like the total elapsed time being about 3 seconds, but the total CPU time being about 17 seconds. That’s 17 CPU seconds spread over 6 cores, or a bit under 3 seconds per core.

Also note that on some systems with the right combination of 64 bit word size and an operating system that knows about the NEON GPU instructions, this particular benchmark can run in the GPU as well, and then you get very unexpected results where a given CPU is much slower than another, but finishes in 1/10th the time. That’s because the GPU was doing the math, not the CPU. Unfortunately, the benchmark doesn’t clue you in to that. I’ve seen one video (a benchmark on ExplainingComputers.com) where the presenter says “That can’t be right” when a board finished in 2 seconds what the per-CPU stats said ought to take 10 to 20. He just didn’t know it was using the GPU…
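
For what it’s worth, you can at least see whether the kernel advertises the NEON hardware on a given board by looking at the CPU feature flags. This only shows the hardware / kernel side, not whether a particular binary was compiled to use it:

grep -m1 -E 'Features|flags' /proc/cpuinfo   # 32 bit ARM lists "neon", 64 bit ARM lists "asimd" (the 64 bit NEON)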

And then the file I/O test:

Then some I/O

Prep it

WARNING: the --test option is deprecated. You can pass a script name or path on the command line without any options.
sysbench 1.0.11 (using system LuaJIT 2.1.0-beta3)

128 files, 800Kb each, 100Mb total
Creating files for the test...
Extra file open flags: 0
Creating file test_file.0
Creating file test_file.1
Creating file test_file.2
Creating file test_file.3
Creating file test_file.4
[...]
Creating file test_file.124
Creating file test_file.125
Creating file test_file.126
Creating file test_file.127
104857600 bytes written in 19.50 seconds (5.13 MiB/sec).

real	0m19.562s
user	0m0.032s
sys	0m0.856s

The actual file I/O test first failed with a statement that an option was no longer available, so I removed that option. This is noted in the script with the original form commented out and the new one active.

File System Status?
Filesystem     1K-blocks    Used Available Use% Mounted on
/dev/mmcblk0p2   7251432 6363236    583256  92% /


Run Files Test

Using one the OS likes instead

WARNING: the --test option is deprecated. You can pass a script name or path on the command line without any options.
WARNING: --max-time is deprecated, use --time instead
sysbench 1.0.11 (using system LuaJIT 2.1.0-beta3)

Running the test with following options:
Number of threads: 1
Initializing random number generator from current time


Extra file open flags: 0
128 files, 800KiB each
100MiB total file size
Block size 16KiB
Number of IO requests: 0
Read/Write ratio for combined random IO test: 1.50
Periodic FSYNC enabled, calling fsync() each 100 requests.
Calling fsync() at the end of test, Enabled.
Using synchronous I/O mode
Doing random r/w test
Initializing worker threads...

Threads started!


File operations:
    reads/s:                      261.19
    writes/s:                     174.13
    fsyncs/s:                     556.91

Throughput:
    read, MiB/s:         4.08
    written, MiB/s:      2.72

General statistics:
    total time:                          300.0005s
    total number of events:              297677

Latency (ms):
         min:                                  0.01
         avg:                                  1.00
         max:                                839.93
         95th percentile:                      1.67
         sum:                             298333.59

Threads fairness:
    events (avg/stddev):           297677.0000/0.00
    execution time (avg/stddev):   298.3336/0.00


real	5m0.058s
user	0m2.268s
sys	0m19.268s

File System Status?
Filesystem     1K-blocks    Used Available Use% Mounted on
/dev/mmcblk0p2   7251432 6363236    583256  92% /

I left in the --test option as it just complains (and was needed on some old OS images I tried a couple of years ago) but it looks like --init-rng is just lethal now.

Then cleanup

WARNING: the --test option is deprecated. You can pass a script name or path on the command line without any options.
sysbench 1.0.11 (using system LuaJIT 2.1.0-beta3)

Removing test files...

real	0m0.097s
user	0m0.020s
sys	0m0.056s

File System Status?
Filesystem     1K-blocks    Used Available Use% Mounted on
/dev/mmcblk0p2   7251432 6259016    687476  91% /

Raspberry Pi Model 3

These results really surprised me. It looks like Ubuntu on the Odroid XU4 doesn’t know how to use NEON while Devuan on the Raspberry Pi does. I’m not sure exactly how to interpret these numbers, but it did finish damn fast. Some of the values for “events” look quite low; then again I’m not real sure what an “event” is, so some small “Dig Here!” is needed to figure out how to interpret all this. As I recall it, the video also had Debian as fast (due to NEON) and Ubuntu slower (as it didn’t use the GPU / NEON instructions).

For the file I/O tests, I found that it was writing the files in my home directory, which is also where I was running it. I suspect you can change what I/O you are testing just by running it in different directories on different file systems. In this case, an older Seagate 500 GB USB 2.0 drive and an xfs file system.
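
Since the test files land in whatever directory you run from, pointing the test at a different disk is just a matter of cd-ing there first. Roughly (the directory name is only an example):

cd /SG500/xfs/benchtmp        # any directory on the disk you actually want to measure
sysbench --test=fileio --file-total-size=100M prepare
time sysbench --test=fileio --file-total-size=100M --file-test-mode=rndrw --max-time=300 --max-requests=0 run
sysbench --test=fileio --file-total-size=100M cleanup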

UPDATE: This first set of results is with the new script setting primes=20000.

chiefio@PiM3Devuan2:~$ bench

First do a single core

sysbench 0.4.12:  multi-threaded system evaluation benchmark

Running the test with following options:
Number of threads: 1

Doing CPU performance benchmark

Threads started!
Done.

Maximum prime number checked in CPU test: 20000


Test execution summary:
    total time:                          30.7532s
    total number of events:              10000
    total time taken by event execution: 30.7484
    per-request statistics:
         min:                                  3.05ms
         avg:                                  3.07ms
         max:                                  3.45ms
         approx.  95 percentile:               3.10ms

Threads fairness:
    events (avg/stddev):           10000.0000/0.00
    execution time (avg/stddev):   30.7484/0.00


real	0m30.768s
user	0m30.728s
sys	0m0.005s

Done with test: 

Now doing multicore

sysbench 0.4.12:  multi-threaded system evaluation benchmark

Running the test with following options:
Number of threads: 4

Doing CPU performance benchmark

Threads started!
Done.

Maximum prime number checked in CPU test: 20000


Test execution summary:
    total time:                          7.7866s
    total number of events:              10000
    total time taken by event execution: 31.1332
    per-request statistics:
         min:                                  3.05ms
         avg:                                  3.11ms
         max:                                 26.16ms
         approx.  95 percentile:               3.12ms

Threads fairness:
    events (avg/stddev):           2500.0000/13.91
    execution time (avg/stddev):   7.7833/0.00


real	0m7.802s
user	0m30.698s
sys	0m0.012s

Done with test:

Here’s the I/O portion:


Then some I/O

Prep it

File System Status?
Filesystem     1K-blocks      Used Available Use% Mounted on
/dev/sda3      393024000 193319136 199704864  50% /SG500/xfs


sysbench 0.4.12:  multi-threaded system evaluation benchmark

128 files, 800Kb each, 100Mb total
Creating files for the test...

real	0m11.013s
user	0m0.015s
sys	0m1.057s

File System Status?
Filesystem     1K-blocks      Used Available Use% Mounted on
/dev/sda3      393024000 193421572 199602428  50% /SG500/xfs


Run Files Test

Using one the OS likes instead

sysbench 0.4.12:  multi-threaded system evaluation benchmark

Running the test with following options:
Number of threads: 1

Extra file open flags: 0
128 files, 800Kb each
100Mb total file size
Block size 16Kb
Number of random requests for random IO: 0
Read/Write ratio for combined random IO test: 1.50
Periodic FSYNC enabled, calling fsync() each 100 requests.
Calling fsync() at the end of test, Enabled.
Using synchronous I/O mode
Doing random r/w test
Threads started!
Time limit exceeded, exiting...
Done.

Operations performed:  38160 Read, 25440 Write, 81282 Other = 144882 Total
Read 596.25Mb  Written 397.5Mb  Total transferred 993.75Mb  
(3.3122Mb/sec)
  211.98 Requests/sec executed

Test execution summary:
    total time:                          300.0313s
    total number of events:              63600
    total time taken by event execution: 3.4174
    per-request statistics:
         min:                                  0.01ms
         avg:                                  0.05ms
         max:                                  0.97ms
         approx.  95 percentile:               0.10ms

Threads fairness:
    events (avg/stddev):           63600.0000/0.00
    execution time (avg/stddev):   3.4174/0.00


real	5m0.048s
user	0m0.752s
sys	0m11.914s

Done with test: 

Then cleanup

sysbench 0.4.12:  multi-threaded system evaluation benchmark

Removing test files...

real	0m0.133s
user	0m0.007s
sys	0m0.126s

Done with test: 

Well, on to the next board. I’ll start editing out more of the “hand holding” statements from the script output in the interests of making things more readable from here on down, now that you know what’s happening around the bits that matter.

Odroid XU4

With 8 cores, and a high clock rate, this CPU is very fast. Yet since sysbench doesn’t use the NEON codes on 32 bit machines, and this one is 32 bit, you can see it is not as fast as the Pi at computing primes. I probably ought to break this out as a distinct posting about “Considerations in benchmarks”, but for now this will have to do.

Do realize that most of what the “average person” does is shoving text bytes around, and a pure math benchmark does not measure that. The XU4 is remarkably faster and more comfortable editing web pages than the Raspberry Pi M3, for example. For text manipulation, word size and GPU math don’t do much, while clock speed and cache do a lot.
I think I need to do another benchmark in another posting as well. Some code that doesn’t use the NEON instructions on any of them. I think I need to read the manual page on sysbench and see if it has other choices…
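
For the record, sysbench does have a few other test modes beyond cpu and fileio. Roughly, in the old 0.4.x syntax (I have not run these here):

sysbench --test=memory --memory-total-size=1G run    # memory read/write bandwidth
sysbench --test=threads --num-threads=64 run         # thread scheduling / switching overhead
sysbench --test=mutex --num-threads=16 run           # lock contention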

UPDATE: These results are for the new version with 20000 prime limit.

chiefio@odroidxu4:~$ bench

First do a single core

sysbench 0.4.12:  multi-threaded system evaluation benchmark

Running the test with following options:
Number of threads: 1

Doing CPU performance benchmark

Threads started!
Done.

Maximum prime number checked in CPU test: 20000


Test execution summary:
    total time:                          224.0713s
    total number of events:              10000
    total time taken by event execution: 224.0558
    per-request statistics:
         min:                                 22.33ms
         avg:                                 22.41ms
         max:                                 30.49ms
         approx.  95 percentile:              22.46ms

Threads fairness:
    events (avg/stddev):           10000.0000/0.00
    execution time (avg/stddev):   224.0558/0.00


real	3m44.362s
user	3m43.757s
sys	0m0.040s

Done with test: 

Now doing multicore

sysbench 0.4.12:  multi-threaded system evaluation benchmark

Running the test with following options:
Number of threads: 8

Doing CPU performance benchmark

Threads started!
Done.

Maximum prime number checked in CPU test: 20000


Test execution summary:
    total time:                          45.7562s
    total number of events:              10000
    total time taken by event execution: 365.9369
    per-request statistics:
         min:                                 22.48ms
         avg:                                 36.59ms
         max:                                 61.36ms
         approx.  95 percentile:              50.30ms

Threads fairness:
    events (avg/stddev):           1250.0000/328.64
    execution time (avg/stddev):   45.7421/0.01


real	0m45.772s
user	6m3.895s
sys	0m0.096s

The Odroid XU4 has a USB 3.0 port, and I plugged in a USB 3.0 drive for these tests. It IS faster, but not by much. No idea why. I used a Seagate 2 TB drive that is newer and ought to be faster than the 500 GB one on the R. Pi. But that’s why we do benchmarks: to find out what is really limiting.

Might there be some way to tune this up? Would adding more threads change it? Don’t know. I’ll need to play around with some settings and see what changes.
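
One easy thing to try on the thread question: the fileio test takes the same --num-threads flag as the cpu test, so a variant like this (which I have not run yet) would show whether more outstanding requests help on USB 3.0:

time sysbench --test=fileio --file-total-size=100M --file-test-mode=rndrw --num-threads=4 --max-time=300 --max-requests=0 run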

The big “takeaway” for me is just that the Pi M3 with a USB disk on it isn’t all that bad at I/O. Yeah, a USB 3.0 board is faster, but then other things start to limit.

Then some I/O

Prep it

sysbench 0.4.12:  multi-threaded system evaluation benchmark

128 files, 800Kb each, 100Mb total
Creating files for the test...

real	0m7.083s
user	0m0.009s
sys	0m1.139s


Run Files Test

Using one the OS likes instead

sysbench 0.4.12:  multi-threaded system evaluation benchmark

Running the test with following options:
Number of threads: 1

Extra file open flags: 0
128 files, 800Kb each
100Mb total file size
Block size 16Kb
Number of random requests for random IO: 0
Read/Write ratio for combined random IO test: 1.50
Periodic FSYNC enabled, calling fsync() each 100 requests.
Calling fsync() at the end of test, Enabled.
Using synchronous I/O mode
Doing random r/w test
Threads started!
Time limit exceeded, exiting...
Done.

Operations performed:  47160 Read, 31440 Write, 100583 Other = 179183 Total
Read 736.88Mb  Written 491.25Mb  Total transferred 1.1993Gb  
(4.0937Mb/sec)
  262.00 Requests/sec executed

Test execution summary:
    total time:                          300.0033s
    total number of events:              78600
    total time taken by event execution: 5.3348
    per-request statistics:
         min:                                  0.02ms
         avg:                                  0.07ms
         max:                                  1.14ms
         approx.  95 percentile:               0.09ms

Threads fairness:
    events (avg/stddev):           78600.0000/0.00
    execution time (avg/stddev):   5.3348/0.00


real	5m0.016s
user	0m1.368s
sys	0m18.610s

File System Status?
Filesystem     1K-blocks    Used Available Use% Mounted on
/dev/mmcblk1p1  30335916 3007948  26990180  11% /


Done with test: 

Then cleanup

sysbench 0.4.12:  multi-threaded system evaluation benchmark

Removing test files...

real	0m0.140s
user	0m0.004s
sys	0m0.134s

Done with test: 
chiefio@odroidxu4:~$

Well, that’s what I’ve got done for now. I’m going to post this, then spend some time looking at it, thinking about better ways to benchmark, and then do some more boards tomorrow. I also need to spend a bit of time learning how best to interpret these results. While I think I know what it means, sometimes benchmarks have complexities that need a ponder (and reading the man page more than once ;-)

RockPro64

This is also a hex core machine, but with 4x A72 / 2x A53 cores at 1.81 GHz / 1.54 GHz. It has 2 GB memory, so rarely to never uses swap space. The file I/O test was done to the uSD card, a fast Samsung card that works well. Eventually I’ll work up an “Apples to Apples” test with the same disk on USB 3.0 for each of the fast boards, but not today ;-)

This board is running a straight Armbian Ubuntu port based on Debian. It is a new port and a little buggy still. Simple things: htop gives CPU frequency for all 6 cores, but only shows “usage bars” for 4 of them. I’m sure that will change in a future update.

This one also has the “new” sysbench so gets the flags to make it run like the old one.

Here’s the results:

ems@rockpro64:~$ bench

First do a single core

WARNING: the --test option is deprecated. You can pass a script name or path on the command line without any options.
sysbench 1.0.11 (using system LuaJIT 2.1.0-beta3)

Running the test with following options:
Number of threads: 1
Initializing random number generator from current time


Prime numbers limit: 20000

Initializing worker threads...

Threads started!

CPU speed:
    events per second:   161.73

General statistics:
    total time:                          61.8253s
    total number of events:              10000

Latency (ms):
         min:                                  6.17
         avg:                                  6.18
         max:                                  6.51
         95th percentile:                      6.21
         sum:                              61805.32

Threads fairness:
    events (avg/stddev):           10000.0000/0.00
    execution time (avg/stddev):   61.8053/0.00


real	1m1.887s
user	1m1.836s
sys	0m0.028s

Done with test: 


Now doing multicore

WARNING: the --test option is deprecated. You can pass a script name or path on the command line without any options.
WARNING: --num-threads is deprecated, use --threads instead
sysbench 1.0.11 (using system LuaJIT 2.1.0-beta3)

Running the test with following options:
Number of threads: 6
Initializing random number generator from current time


Prime numbers limit: 20000

Initializing worker threads...

Threads started!

CPU speed:
    events per second:   634.26

General statistics:
    total time:                          15.7602s
    total number of events:              10000

Latency (ms):
         min:                                  6.17
         avg:                                  9.45
         max:                                 50.22
         95th percentile:                     18.28
         sum:                              94483.16

Threads fairness:
    events (avg/stddev):           1666.6667/107.80
    execution time (avg/stddev):   15.7472/0.01


real	0m15.818s
user	1m1.836s
sys	0m0.068s

Done with test:
Then some I/O

Prep it

File System Status?
Filesystem     1K-blocks     Used Available Use% Mounted on
/dev/mmcblk0p1  30398732 13525736  16525556  46% /

WARNING: the --test option is deprecated. You can pass a script name or path on the command line without any options.
sysbench 1.0.11 (using system LuaJIT 2.1.0-beta3)

128 files, 800Kb each, 100Mb total
Creating files for the test...
Extra file open flags: 0
Creating file test_file.0
Creating file test_file.1
Creating file test_file.2
Creating file test_file.3
Creating file test_file.4
[...]
Creating file test_file.124
Creating file test_file.125
Creating file test_file.126
Creating file test_file.127
104857600 bytes written in 26.80 seconds 
(3.73 MiB/sec).

real	0m26.854s
user	0m0.040s
sys	0m0.760s

File System Status?
Filesystem     1K-blocks     Used Available Use% Mounted on
/dev/mmcblk0p1  30398732 13628136  16423156  46% /


Run Files Test

Using one the OS likes instead

WARNING: the --test option is deprecated. You can pass a script name or path on the command line without any options.
WARNING: --max-time is deprecated, use --time instead
sysbench 1.0.11 (using system LuaJIT 2.1.0-beta3)

Running the test with following options:
Number of threads: 1
Initializing random number generator from current time


Extra file open flags: 0
128 files, 800KiB each
100MiB total file size
Block size 16KiB
Number of IO requests: 0
Read/Write ratio for combined random IO test: 1.50
Periodic FSYNC enabled, calling fsync() each 100 requests.
Calling fsync() at the end of test, Enabled.
Using synchronous I/O mode
Doing random r/w test
Initializing worker threads...

Threads started!


File operations:
    reads/s:                      390.15
    writes/s:                     260.10
    fsyncs/s:                     832.12

Throughput:
    read, MiB/s:                  6.10
    written, MiB/s:               4.06

General statistics:
    total time:                          300.0286s
    total number of events:              444764

Latency (ms):
         min:                                  0.01
         avg:                                  0.67
         max:                               1413.01
         95th percentile:                      2.26
         sum:                             298916.93

Threads fairness:
    events (avg/stddev):           444764.0000/0.00
    execution time (avg/stddev):   298.9169/0.00


real	5m0.101s
user	0m1.484s
sys	0m12.412s

File System Status?
Filesystem     1K-blocks     Used Available Use% Mounted on
/dev/mmcblk0p1  30398732 13628136  16423156  46% /


Done with test: 

Then cleanup

WARNING: the --test option is deprecated. You can pass a script name or path on the command line without any options.
sysbench 1.0.11 (using system LuaJIT 2.1.0-beta3)

Removing test files...

real	0m0.180s
user	0m0.036s
sys	0m0.140s

File System Status?
Filesystem     1K-blocks     Used Available Use% Mounted on
/dev/mmcblk0p1  30398732 13525748  16525544  46% /


Done with test:

All this lifted from:
 https://www.howtoforge.com/how-to-benchmark-your-system-cpu-file-io-mysql-with-sysbench 
then modified a tiny bit

ems@rockpro64:~$

So Damn Fast but not as fast as the Odroid N2, and clearly not using the NEON GPU instructions like the Raspberry Pi.

Odroid C2

This is a quad core A53 SBC running at 1.54 GHz with 2 GB of memory. It uses the 64 bit ARM v8 instruction set, and it looks like it uses the NEON GPU instructions as well. This is on the Armbian Debian operating system.

File I/O is to the uSD card, a fast Sandisk one. This run used the older sysbench code as it is sysbench 0.4.12.

ems@odroidc2:~$ bench

First do a single core

sysbench 0.4.12:  multi-threaded system evaluation benchmark

Running the test with following options:
Number of threads: 1

Doing CPU performance benchmark

Threads started!
Done.

Maximum prime number checked in CPU test: 20000


Test execution summary:
    total time:                          23.6381s
    total number of events:              10000
    total time taken by event execution: 23.6351
    per-request statistics:
         min:                                  2.35ms
         avg:                                  2.36ms
         max:                                  7.94ms
         approx.  95 percentile:               2.37ms

Threads fairness:
    events (avg/stddev):           10000.0000/0.00
    execution time (avg/stddev):   23.6351/0.00


real	0m23.658s
user	0m23.644s
sys	0m0.004s

Done with test: 

Now doing multicore

sysbench 0.4.12:  multi-threaded system evaluation benchmark

Running the test with following options:
Number of threads: 4

Doing CPU performance benchmark

Threads started!
Done.

Maximum prime number checked in CPU test: 20000


Test execution summary:
    total time:                          6.0685s
    total number of events:              10000
    total time taken by event execution: 24.2669
    per-request statistics:
         min:                                  2.35ms
         avg:                                  2.43ms
         max:                                 31.14ms
         approx.  95 percentile:               2.43ms

Threads fairness:
    events (avg/stddev):           2500.0000/37.24
    execution time (avg/stddev):   6.0667/0.00


real	0m6.081s
user	0m23.970s
sys	0m0.012s

Done with test: 

Then some I/O

Prep it

File System Status?
Filesystem     1K-blocks    Used Available Use% Mounted on
/dev/mmcblk1p1  13193136 4195920   8711444  33% /

sysbench 0.4.12:  multi-threaded system evaluation benchmark

128 files, 800Kb each, 100Mb total
Creating files for the test...

real	0m12.458s
user	0m0.023s
sys	0m0.990s

File System Status?
Filesystem     1K-blocks    Used Available Use% Mounted on
/dev/mmcblk1p1  13193136 4298320   8609044  34% /


Run Files Test

Using one the OS likes instead

sysbench 0.4.12:  multi-threaded system evaluation benchmark

Running the test with following options:
Number of threads: 1

Extra file open flags: 0
128 files, 800Kb each
100Mb total file size
Block size 16Kb
Number of random requests for random IO: 0
Read/Write ratio for combined random IO test: 1.50
Periodic FSYNC enabled, calling fsync() each 100 requests.
Calling fsync() at the end of test, Enabled.
Using synchronous I/O mode
Doing random r/w test
Threads started!
Time limit exceeded, exiting...
Done.

Operations performed:  133620 Read, 89080 Write, 284952 Other = 507652 Total
Read 2.0389Gb  Written 1.3593Gb  Total transferred 3.3981Gb  
(11.599Mb/sec)
  742.33 Requests/sec executed

Test execution summary:
    total time:                          300.0011s
    total number of events:              222700
    total time taken by event execution: 11.2828
    per-request statistics:
         min:                                  0.01ms
         avg:                                  0.05ms
         max:                                 14.28ms
         approx.  95 percentile:               0.09ms

Threads fairness:
    events (avg/stddev):           222700.0000/0.00
    execution time (avg/stddev):   11.2828/0.00


real	5m0.026s
user	0m2.366s
sys	0m21.125s

File System Status?
Filesystem     1K-blocks    Used Available Use% Mounted on
/dev/mmcblk1p1  13193136 4298320   8609044  34% /


Done with test: 

Then cleanup

sysbench 0.4.12:  multi-threaded system evaluation benchmark

Removing test files...

real	0m0.128s
user	0m0.012s
sys	0m0.115s

File System Status?
Filesystem     1K-blocks    Used Available Use% Mounted on
/dev/mmcblk1p1  13193136 4194480   8712884  33% /


Done with test: 

All this lifted from:
 https://www.howtoforge.com/how-to-benchmark-your-system-cpu-file-io-mysql-with-sysbench 
then modified a tiny bit

That I/O performance is way impressive. My guess would be that it is a mix of fast I/O channels, a fast uSD card, and lots of memory cache.

Pentium x86 Compaq Evo

While the BIOS says 4 core Pentium, I think this is really just a single core at 2.4 GHz and the BIOS is just saying it can handle up to 4 cores. It has 2 GB of memory and an old spinning disk in it. PCI bus, I think. It is from about 12 years ago. I could likely get that closer to real, but what’s the point? It shipped with XP Professional on it. It is a 32 bit x86 architecture.

I went ahead and did one test with 4 threads anyway. It was about 9 seconds faster, or roughly 8% faster, than a single thread. I think that confirms it’s only one core.
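
If I ever care to pin the core count down, the kernel’s own view of it is easy enough to get:

nproc                                  # number of usable CPUs
grep -c '^processor' /proc/cpuinfo     # count of processor entries
lscpu                                  # sockets / cores / threads breakdown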

Performance is very smooth, but “sedate” at times. It does give a good comparison for deciding how the SBCs stack up to “some old used x86 box” from a friend’s garage.

It is running a straight Devuan version of Debian, upgraded to the ASCII release (based on Debian 9).

chiefio@EVOdebian:~$ bench

First do a single core

sysbench 0.4.12:  multi-threaded system evaluation benchmark

Running the test with following options:
Number of threads: 1

Doing CPU performance benchmark

Threads started!
Done.

Maximum prime number checked in CPU test: 20000


Test execution summary:
    total time:                          113.0809s
    total number of events:              10000
    total time taken by event execution: 113.0577
    per-request statistics:
         min:                                  9.61ms
         avg:                                 11.31ms
         max:                                 84.86ms
         approx.  95 percentile:              18.86ms

Threads fairness:
    events (avg/stddev):           10000.0000/0.00
    execution time (avg/stddev):   113.0577/0.00


real	1m53.116s
user	1m39.232s
sys	0m0.092s

Done with test: 

Now doing multicore

sysbench 0.4.12:  multi-threaded system evaluation benchmark

Running the test with following options:
Number of threads: 4

Doing CPU performance benchmark

Threads started!
Done.

Maximum prime number checked in CPU test: 20000


Test execution summary:
    total time:                          103.8437s
    total number of events:              10000
    total time taken by event execution: 415.2937
    per-request statistics:
         min:                                 21.85ms
         avg:                                 41.53ms
         max:                                 86.14ms
         approx.  95 percentile:              55.82ms

Threads fairness:
    events (avg/stddev):           2500.0000/0.00
    execution time (avg/stddev):   103.8234/0.01


real	1m43.857s
user	1m38.760s
sys	0m0.032s

Done with test: 

Then some I/O

Prep it

File System Status?
Filesystem     1K-blocks    Used Available Use% Mounted on
/dev/sda3        5044188 1331740   3456212  28% /

sysbench 0.4.12:  multi-threaded system evaluation benchmark

128 files, 800Kb each, 100Mb total
Creating files for the test...

real	0m11.151s
user	0m0.008s
sys	0m0.644s

File System Status?
Filesystem     1K-blocks    Used Available Use% Mounted on
/dev/sda3        5044188 1434652   3353300  30% /


Run Files Test

Using one the OS likes instead

sysbench 0.4.12:  multi-threaded system evaluation benchmark

Running the test with following options:
Number of threads: 1

Extra file open flags: 0
128 files, 800Kb each
100Mb total file size
Block size 16Kb
Number of random requests for random IO: 0
Read/Write ratio for combined random IO test: 1.50
Periodic FSYNC enabled, calling fsync() each 100 requests.
Calling fsync() at the end of test, Enabled.
Using synchronous I/O mode
Doing random r/w test
Threads started!
Time limit exceeded, exiting...
Done.

Operations performed:  29040 Read, 19360 Write, 61828 Other = 110228 Total
Read 453.75Mb  Written 302.5Mb  Total transferred 756.25Mb  
(2.5208Mb/sec)
  161.33 Requests/sec executed

Test execution summary:
    total time:                          300.0096s
    total number of events:              48400
    total time taken by event execution: 1.6400
    per-request statistics:
         min:                                  0.01ms
         avg:                                  0.03ms
         max:                                 10.45ms
         approx.  95 percentile:               0.04ms

Threads fairness:
    events (avg/stddev):           48400.0000/0.00
    execution time (avg/stddev):   1.6400/0.00


real	5m0.027s
user	0m0.320s
sys	0m7.684s

File System Status?
Filesystem     1K-blocks    Used Available Use% Mounted on
/dev/sda3        5044188 1434652   3353300  30% /


Done with test: 

Then cleanup

sysbench 0.4.12:  multi-threaded system evaluation benchmark

Removing test files...

real	0m0.085s
user	0m0.004s
sys	0m0.048s

File System Status?
Filesystem     1K-blocks    Used Available Use% Mounted on
/dev/sda3        5044188 1331744   3456208  28% /


Done with test: 

All this lifted from:
 https://www.howtoforge.com/how-to-benchmark-your-system-cpu-file-io-mysql-with-sysbench 
then modified a tiny bit

All in all, a remarkably low performance. Even modest SBCs can outperform it. The I/O in particular was abysmal. Old disks are just not fast.

Orange Pi One

This is a seriously cheap board without a lot of ability, yet it is usable as a desktop with xfce for some light browsing (i.e. NOT having 20 tabs open, but just a couple, otherwise memory use hits a wall) and editing (like with LibreOffice). I had to use LibreOffice to copy / paste these results here as the xfce XTerm didn’t have scroll back. So I had this edit session AND LibreOffice Writer both open at once and it ran OK.

I did manage to accidentally launch a second copy of the Chromium browser (I didn’t notice one was open, as I’d minimized it and xfce puts the tabs at the top of the screen while I’m used to lxde putting them at the bottom). Well, with LibreOffice and 2 copies of Chromium AND a couple of terminal sessions open, I ran into some kind of memory limit and things slowed WAY down… Glacial, in fact. Now I had 2 GB of USB disk mounted as swap (normally it has something like 128 MB of uSD card) but I saw no swap usage. IIRC this system has “swappiness” set to basically “never if possible”, so it is likely a change of the swappiness setting and a reboot would fix that. Just quitting the 2nd browser copy fixed it for me ;-)
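
For reference, checking and changing swappiness is a one-liner each; 60 here is just an example value:

cat /proc/sys/vm/swappiness      # current setting (0 = avoid swap, 100 = swap eagerly)
sudo sysctl vm.swappiness=60     # change it for this boot
# add "vm.swappiness=60" to /etc/sysctl.conf to make it stick across reboots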

So, to the benchmark. I changed the script to comment out the “read $ANS” prompts and just ran it to a file with “bench > benchmarks”. That way there’s no need to work around the lack of scroll back and XTerm not letting me copy / paste. So the output will look a bit more sparse. Now this IS a slow chip, and it is NOT using NEON. It is even less than a Raspberry Pi M3 in that it is a v7 (32 bit) instruction set, not v8 (64 bit), and not using NEON makes the times “way slow” in comparison.

https://en.wikipedia.org/wiki/Comparison_of_single-board_computers

Lists it as an Allwinner H3, v7 32 bit at 1.2 GHz speed. So more like the old type Pi M2 in terms of speed and capacity. Then with only 512 MB of memory, so seriously less file cache available.

OTOH, I’ve spent more than the $14 purchase price on a Venti Mocha and pastry / snacks at Starbucks… and here I am editing this web page in Chromium using it…

Test was run with a fast Sandisk uSD card, so file system sloth is due to the board and lack of disk cache available with the limited memory. The OS is Armbian Ubuntu (so no NEON GPU computes). It has an older (pre 1.0) version of Sysbench, so run without the “do things the old way” flags.

With that, here’s the benchmark results:

First do a single core

sysbench 0.4.12:  multi-threaded system evaluation benchmark

Running the test with following options:
Number of threads: 1

Doing CPU performance benchmark

Threads started!
Done.

Maximum prime number checked in CPU test: 20000


Test execution summary:
    total time:                          472.7443s
    total number of events:              10000
    total time taken by event execution: 472.7063
    per-request statistics:
         min:                                 46.73ms
         avg:                                 47.27ms
         max:                                 78.28ms
         approx.  95 percentile:              49.37ms

Threads fairness:
    events (avg/stddev):           10000.0000/0.00
    execution time (avg/stddev):   472.7063/0.00

Yes, you read that right. Almost 8 minutes to do it single thread. (7.9 minutes if you do the math.) Now you see what NEON GPU math gives you on the R.Pi M3 and similar systems ;-)

Now doing multicore

sysbench 0.4.12:  multi-threaded system evaluation benchmark

Running the test with following options:
Number of threads: 4

Doing CPU performance benchmark

Threads started!
Done.

Maximum prime number checked in CPU test: 20000


Test execution summary:
    total time:                          130.6047s
    total number of events:              10000
    total time taken by event execution: 522.1139
    per-request statistics:
         min:                                 46.72ms
         avg:                                 52.21ms
         max:                                165.18ms
         approx.  95 percentile:              62.00ms

Threads fairness:
    events (avg/stddev):           2500.0000/9.70
    execution time (avg/stddev):   130.5285/0.02

So roughly 2 minutes, 10 seconds. A lot better multicore. Which also shows why I’m so interested in multi-core systems and code that runs multi-core. ;-0

Then some I/O
[...]
Run Files Test

Using one the OS likes instead

sysbench 0.4.12:  multi-threaded system evaluation benchmark

Running the test with following options:
Number of threads: 1

Extra file open flags: 0
128 files, 800Kb each
100Mb total file size
Block size 16Kb
Number of random requests for random IO: 0
Read/Write ratio for combined random IO test: 1.50
Periodic FSYNC enabled, calling fsync() each 100 requests.
Calling fsync() at the end of test, Enabled.
Using synchronous I/O mode
Doing random r/w test
Threads started!
Time limit exceeded, exiting...
Done.

Operations performed:  2220 Read, 1480 Write, 4611 Other = 8311 Total
Read 34.688Mb  Written 23.125Mb  Total transferred 57.812Mb  
(197.15Kb/sec)
   12.32 Requests/sec executed

Test execution summary:
    total time:                          300.2850s
    total number of events:              3700
    total time taken by event execution: 4.1506
    per-request statistics:
         min:                                  0.03ms
         avg:                                  1.12ms
         max:                                522.89ms
         approx.  95 percentile:               3.04ms

Threads fairness:
    events (avg/stddev):           3700.0000/0.00
    execution time (avg/stddev):   4.1506/0.00


File System Status?
Filesystem     1K-blocks    Used Available Use% Mounted on
/dev/mmcblk0p1   7484976 3512988   3866584  48% /

[...]
All this lifted from:
 https://www.howtoforge.com/how-to-benchmark-your-system-cpu-file-io-mysql-with-sysbench 
then modified a tiny bit

So why that “seriously slow I/O”? Most likely, as a guess, the other systems are actually NOT measuring uSD card or disk speeds so much as memory file system cache. Remember that the original page from which I modeled these tests used 150 GB of file space, and I cut it down to 100 MB; they specifically said the large size was to prevent cache from biasing the results. With 2 GB to 4 GB of memory, there’s plenty of room to cache 100 MB of files.

So I really ought to have a set of file benchmarks with that 100-200 GB of file size to actually test the file I/O part of the hardware and not the memory cache so much. OTOH, the reality is that for my actual use profile, I’m going to be interacting with less than 100 MB of “stuff” and less than 150 files at a time anyway. So this is more what I will actually experience 90%+ of the time. Only when doing a massive data transfer will I actually hit the underlying file system hardware sloth (to the extent it exists). So this one matters more to me. Yes “someday” I might go back and do the “slogging mondo huge files” benchmark… but not, I think, today ;-)
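
Short of going back to 100+ GB of test files, sysbench can also be told to open the files with direct I/O, which sidesteps the page cache. A possible middle ground that I have not tried here:

time sysbench --test=fileio --file-total-size=100M --file-test-mode=rndrw --file-extra-flags=direct --max-time=300 --max-requests=0 run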

OK, with this one done, time to put it back in service as my NAS server. IF file I/O is so slow, why use it for Network Attached Storage? Simples! The network is slower than the computer anyway. I’m only running a 100 Mb network, not GigE. I also rarely move big file volumes via the NAS. I’ll plug the disks directly into some fast board if I want to move multiple GB of stuff. The NAS is just so I can find some small files without having 8 TB of old crap on my desktop (and potentially exposed to the internet). Frankly, it is turned off 90%+ of the time just as a security measure. So I turn it on when I need to rummage in the archives, or when I need to toss some stuff into the archive. Even then, much of what is on it is from the internet, pulled down at less than 30 Mb/second and well suited to feeding into a slow NAS. I’ve been happy with the apparent speed.

The Rest

From here on down is just headings, in no particular order, for boards & systems I have that I’ll be testing as time and inclination permit. Some, like the Pi M2 and the Orange Pi, will need to be taken out of their present service to do the benchmark, and since shutting down that stack stops my DNS servers, NFS File Server, and the PiHole advertising blocker, it can only be done when those are not needed.

As I do various benchmarks, they will be moved up into the listings above.

Golly, only 3 left on the ToDo list.

Raspberry Pi Model 2 – original

Pine A64

Odroid C1


24 Responses to Some Benchmarks For SBCs

  1. CoRev says:

    Wow! I didn’t know the N-2 was shipping. Initially looks to be 50% faster than the XU4s. Now I await the Rockpro64 test to see the difference(s).

    Great job and help!

  2. E.M.Smith says:

    @CoRev:

    Trying to find why the N2 was taking so long and the number of “events” was so large: It looks like sysbench had a change of behaviour at the 1.0 release. One runs to a time limit, the other to an event count limit. For this cpu test, the “event” looks like it ought to be A prime number. It also looks like the Ubuntu version is limiting on time while Devuan is limiting on event count. Thus my puzzlement that the N2 had so many “events” for the same set limit of primes to do.

    https://stackoverflow.com/questions/44299402/sysbenchwhich-value-should-i-look-at-for-cpu-benchmarking

    Bottom line is I need to modify and rerun the benchmarks to get them stopping at the same total work.

    Since you are interested in it, I’ll move the RockPro64 up the queue… look for it in about 6 hours…

    BTW, I was watching Ameridroid when the notice swapped from “soon” to “in stock shipping” and just could not resist… so I’m likely in the first week of buyers…

  3. E.M.Smith says:

    I have redone the benchmarks with higher loads CPU and changed the “new” sysbench to match the old one that is on most of my platforms. The text has been updated..

    One fun bit so far is that, thanks to using GPU calculations (NEON instructions) on 64 bit Raspberry Pi SBCs, the Pi runs at about 1/2 the speed of the Odroid N2, which does 14 seconds single core and 3 seconds multicore (as Ubuntu seems to ignore NEON), and dramatically faster than the Odroid XU4, which takes 3 minutes 44 seconds single core and 45 seconds multicore; the Pi itself does 30 seconds single core and 7 seconds multicore.

    Nice illustration of why I’m so interested in GPU Math and multicore :-)

  4. E.M.Smith says:

    I’ve added the RockPro64. Single core 61 seconds, multicore 15.7 seconds.

    I/O is 6 MB/s read, 4 write. The N2 is 4 read, 2 write, but to eMMC instead of a uSD card, while the XU4 is 4 MB/s to USB 3.0 and the Pi reports 3 MB/second to a USB 2.0 disk.

    Note that the XU4 and Pi are running the older version of sysbench that looks like it just gives the average, not breaking it out by Read vs Write. So as a guess, the averages for the N2 would be 3 MB/s and the RockPro64 would be 5 MB/sec.

    I think I need to put a summary chart at the top after a couple of more boards are done. Also to do a set of tests that use uSD, eMMC if present, USB 2.0 disk, and USB 3.0 disk on each machine. Just to sort out what’s media, channel, and rest-of-board.

    Well, I’ll try to do another one later tonight.

  5. CoRev says:

    EM, was curious about the time on the RockPro64 until you mentioned the GPU issue again. My own experience between the Pi M3B and the RockPro64 is that the RP64 is much faster for everyday use. No time comparisons, though.

  6. E.M.Smith says:

    @CoRev:

    That is a growing complaint about the sysbench benchmark, that on some systems (notably some 64 bit ARM chips) it uses the NEON instructions for GPU math, and on others it doesn’t, so can give wildly different benchmarks than expected from the CPU alone.

    There are other benchmark codes available, and I’ve sometimes just knocked together some program of my own and used “time” on it. But as sysbench is sort of the standard, I figured I’d start with it.

    There’s also the point that using the GPU for math is really becoming an important part of computing. The Nvidia board is all about that, with either 128 or 256 “CUDA” cores for GPGPU (General Purpose GPU computing) use. Widely used in robotic vision, neural nets, and animation – increasingly used for general purpose scientific computing. So there’s an argument that if you can use the GPU folks want to know that, and what it adds to speed.

    So at some point I’m going to find a “CPU ONLY” no GPU benchmark, or make one, and do another round with that.

    But the sysbench numbers give a general idea of the max you can expect from a given board.

    I suppose I also ought to read the man page on sysbench and see if it has a flag to turn off GPU processing ;-)

    (But I’m interested in using the GPU, so there’s that ;-)

  7. CoRev says:

    You’re doing us all without the skill set a worthy service. Thanks!

  8. E.M.Smith says:

    @CoRev:

    It isn’t a very hard skill-set to learn. The only reason I have it is that I decided to “go there”. IIRC, “sysbench” was created about 2007. Prior to that nobody had “the skill-set” with it. Heck, I’m only marginally functional with it ( I’ve skimmed the manual page once and used a reference link found in a web search, then put that into a script and did a “surprise” debug when it had issues from a release change).

    So while I appreciate the thanks, and you are most welcome, do realize that the anyone can become “skilled” with this kind of stuff just by jumping in and accepting there will be a few lumps along the way. Most programmers are not smarter than anyone else, just more stubborn and more willing to try things new, in my experience at least.

    Frankly, that’s part of why I publish the scripts and code. So folks going “How the hell did he do that?” can see how, learn how, and know how. ’cause that’s how all of us did it. Looking at what someone did and “puzzling it out”.

    So, with that script above, anyone reading this page can run it on their computer and post the results here in a comment. I can’t own every computer out there ;-) even though I might try ;-) Then the more adventurous can read the manual page on sysbench and find some other option that’s interesting, try it, and post both that result and the “how to” about that option. That’s how all of “Open Source” computing works…

    As per the NEON / GPU stuff: That’s all very new. A couple of years ago it was in the strange and exotic category, with almost nobody even looking at it other than the folks doing graphics displays (as that is what the GPU is for, after all). Only in the last few years has the whole field of using it for General Purpose math blossomed (largely driven by the ending of Moore’s Law on the horizon, multi-core as the fix, and the demands of robotics / machine vision – then spreading into “Scientific Computing”). Pretty much everyone in the field is a “Nooby” at it or just barely above noob. I’m only looking at it because of the use in doing “Climate Models” on very small hardware and a passing interest in robots.

    Heck, the first time I saw a kernel compiled with GP NEON support was about 5 years ago, and it was hand rolled by a guy in the UK for experimental things. (I have not got around to doing a “roll your own” myself yet). Now it looks like Debian / Devuan have it compiled in, but Ubuntu is not setting the compiler flags to use it (yet…) so even there the diffusion isn’t complete just in the “Debian based” world. That, BTW, is why those above numbers can vary so much depending on OS. On my “todo” list is to install Ubuntu on the R.Pi and get non-NEON numbers for it. I have a chip in my archives that I ought to be able to just “pop in” and boot.

    So realize that whole NEON / GPU / “I know about Debian vs Ubuntu” is all stuff I learned in the last 5 years, often from obscure sources with a fair amount of time. Now you have it from one spot in just a few minutes. You have “caught up” a couple of years in just a few days. That, IMHO, is the only “Magic Sauce” involved in becoming a “computer guy”. Pick an area, do the swift “catch up” as it’s a lot faster, pick another area, repeat it. In about a year you will discover that you are the “local expert” on a bunch of things… then about a decade in you find you are on the leading edge of some things. Start with a ‘new area’ like GPU general purpose computing, you end up on that leading edge in about one year…

    So take the NVidia Nano. It’s got 128 “CUDA Cores” of GP GPU and basically a Rock64 class ARM SBC. It does, however, come with an amazing array of neural net, robotic vision, animation, etc. etc. examples on the NVidia support site. The whole CUDA compiler and support is only a few years old. They cost about $99. So for $99 and a few weeks “playing with it” on a daily basis, you can be on the cutting edge of modern computing as the “local expert” in CUDA GPGPU coding. I’ve thought of doing it… (I wanted to buy the “Jetson” board but it cost $200+ and was both hot and noisy so didn’t. Now the Nano fixes those issues and I’m considering it again… so this isn’t a hypothetical)

    So the point of all this is just to say “You have what it takes too”. It is much more about what you want than what it takes…

    Sidebar on Orange Pi: I’m posting this comment from an Orange Pi One SBC. That’s the $14 computer board with “only” 512 MB of memory. I’m running Armbian Ubuntu on it at the moment. Last night I ran the benchmark script on another chip (older release) but this morning I’m not sure where I put the results ;-) So I’ll likely just run it again over morning coffee ;-) FWIW, I find the little fella reasonably usable for web pages and such. It does struggle to play YouTube videos even at 360p in the “mini-player” format, so not for your media station ;-) (though after some fiddle I did get PulseAudio to actually let the music be heard…) But Chromium is doing well. Overall it’s a fun little computer for almost no money well suited to all sorts of mundane server tasks and the occasional xfce desktop / web browser use.

  9. E.M.Smith says:

    The Orange Pi One results are up now. With that, I’m getting off the OPi1 and going back to my regular desktop.

    It DOES work as a desktop and the browser is relatively nice (Chromium) but there are times it is just a little slower than I’d like, and if I open too many things / tabs it goes to memory thrash land and slow as molasses… That 512 MB of memory is more limiting than anything else. It’s fine… right up until you run out…

    Besides, it’s my NAS server and needs to get back to work! ;-)

  10. CoRev says:

    EM, due to your inspiration I have tried the Hardinfo benchmark on my AMD desktop, the RockPro64, and the Odroid XU4. Just using the CPU benchmark I got these results:
    AMD Desktop 3.06 seconds
    AMD A10-7800 Radeon R-7

    RockPro64 5.72 seconds
    4xARM Cortex-A53
    2xARM Cortex-A72

    Odroid XU4 4.83 seconds
    4xARM Cortex-A15r2p3
    4xARM Cortex-A7

    All machines are running Ubuntu 18.04.2 LTS (latest version for each chip set)

  11. CoRev says:

    EM, if you could run Hardinfo against the N1 I could compare with my results. The whole process should only take ~10 minutes. I did the following to load and execute:
    sudo apt install hardinfo
    sudo apt update
    sudo apt upgrade
    hardinfo

    In the left column of functions go down to the benchmark section and run CPU Blowfish

  12. E.M.Smith says:

    @CoRev:

    I certainly can (though I think you meant the N2 as the N1 is no more… and I have an N2).

    I’m booked for an event this afternoon, and it’s already after noon here, so it might end up being later tonight before I can get to it.

    From what you posted, it looks like “hardinfo” is NOT using the NEON instructions. I’d been wondering where to find a benchmark that is fair about that point ;-) so thanks for that!

    Oddly, on the R.Pi M3 / Devuan doing “apt-get install hardinfo” says “unable to locate package hardinfo”… so it may not be available in all the various OSs I have running.
    https://packages.debian.org/source/stretch/hardinfo
    implies it is a source package and must be built. I’ll look into that later today.
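    If it does turn out to be source-only there, the standard Debian-family build route is roughly this (a sketch; it assumes a deb-src line in /etc/apt/sources.list and that Devuan carries the build dependencies):

    apt-get install build-essential devscripts
    apt-get build-dep hardinfo        # pulls in whatever hardinfo needs to compile
    apt-get source hardinfo
    cd hardinfo-*/
    dpkg-buildpackage -us -uc -b      # build an unsigned binary package
    dpkg -i ../hardinfo_*.deb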

    From the “sysbench” man page:

    cpu
    The cpu is one of the most simple benchmarks in SysBench. In this
    mode each request consists in calculation of prime numbers up to a
    value specified by the --cpu-max-prime option. All calculations
    are performed using 64-bit integers.

    Two key points about that.
    1) Doesn’t measure floating point performance At All, and that matters to a lot of codes.
    2) Very few codes actually need 64 bit int math. Much of the world is 16 bit or 32 bit math.

    So it heavily penalizes 32 bit machines for a use case they will rarely see in real life, and it fails to penalize OS builds that are “soft float” and ignore the hard float hardware – despite that making a major difference to actual codes run (like the temperature processing I do…).
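    (If you want to check whether a given build is actually hard float, two quick tests on a Debian-family ARM box, offered as a sketch:)

    dpkg --print-architecture            # "armhf" = hard float ABI, "armel" = soft float
    readelf -A /bin/ls | grep -i vfp     # Tag_ABI_VFP_args present means FP arguments ride in VFP registers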

    So I’m not thrilled with sysbench. It is just “the first place folks start” and has widely published numbers.

    Then, just for amusement, here’s a link discussing some of the “issues” around using NEON codes in Linux / ARM compiles / kernel:
    https://www.mjmwired.net/kernel/Documentation/arm/kernel_mode_neon.txt
    which is why most folks have not taken on the work of using it, especially in kernel or major applications code that is expected to run on thousands of machine types.

    But as you saw above, if your build IS compiled with the NEON / VFP flags set, it can really scream on 64 bit integer math ;-)

  13. E.M.Smith says:

    Interesting… Using them side by side: the R.Pi M3 and the Compaq Evo from long, long ago, both on the Devuan 2.0 level of OS:

    My general impression is that they are about the same. This will be far different from the sysbench numbers above as that does 64 bit integer math, and I’m doing web browsers and file editors… BUT, they both tend to have about the same responsiveness and about the same “issues” where things pause or glitch. Like closing a browser with a lot of open tabs and it says “FOO is causing BAR to be slow, end FOO?” or something like that, and if you just wait a moment the “close down FireFox” gets to that window and it all ends. Or opening a tab with video in it and it takes a while to load up the video, and then longer for all the other window decorations YouTube shoves at you.

    Don’t know how to capture that in a benchmark (yet). But it just really is the case that they “feel about the same”, where the Odroid XU4 feels Very Fast in comparison (those 8 cores doing 32 bit math and 8 bit character manipulation) while the Orange Pi One feels significantly slower.

    So a 32 bit 2.4 GHz Intel single core with 2 GB memory acts about the same as a quad core 64 bit 1.2 GHz ARM chip with 1 GB memory… as long as you are not doing a lot of 64 bit math ;-)

    Oh, one other odd bit: FireFox on the Compaq consistently pegs the CPU at 100% use. Don’t know why. Maybe some window trying to run some script or “whatever”. I’ve seen FireFox do this if the dictionary isn’t quite what it wants, so I’ll go investigate that in a moment. But realize this perception is in the context of one core and it running 100%. Essentially it does a great job of taking keyboard interrupts so I don’t notice ;-)

  14. CoRev says:

    EM, yes, I did mean the N2. I of course got my copy of hardinfo from the Ubuntu libraries.

  15. E.M.Smith says:

    Well, I had a few minutes before I needed to get suited up, so set up the N2. Seems that hardinfo is in the Ubuntu libraries and it just installed like a champ.

    IF I’m reading the listed output correctly, this is named “unknown” CPU type and is ranked at 4.16 where “lower is better” and the things lower than it are
    3.49 AMD Phenom II x4 940 processor
    3.35 Intel Core 2 Quad core Q8300 @ 2.5 GHz
    3.24 Intel Core 2 Quad core Q9550 @ 2.83 GHz
    3.17 Intel Core 2 Quad core Q6700 @ 2.66 GHz

    Your reported value for the XU4 at 4.83 aligns with my feeling that it generally is just damn fast (except on a single threaded single core process) but not as fast as the RockPro64 or N2.

    So I think I’ve picked the right value out of that list (all the others look like Intel/AMD CPU types except for the one Power PC at the start of the list).

    Hopefully that’s what you were looking for.

  16. E.M.Smith says:

    Here’s what “Copy to Clipboard” gave me, so it looks like I got the right one as it is listed at the top with the right core / speed values.

    CPU Blowfish
    (Unknown) 2x 1896.00 MHz + 4x 1800.00 MHz 4.16
    Intel(R) Pentium(R) D CPU 3.00GHz 2x 3000.00 MHz 10.84
    Intel(R) Celeron(R) CPU540@ 1.86GHz 1x 1861.00 MHz 25.49
    AMD Turion(tm) 64 X2 TL-58 2x 800.00 MHz 12.58
    AMD Athlon(tm) 64 X2 Dual Core Processor 3800+ 2x 2200.00 MHz 10.71
    AMD Athlon(tm) 64 X2 Dual Core Processor 6400+ 2x 1000.00 MHz 6.42
    Intel(R) Core(TM)2 Extreme CPU X7900@ 2.80GHz 2x 2800.00 MHz 6.81
    Intel(R) Pentium(R) M processor 1.70GHz 1x 1700.00 MHz 28.28
    AMD Phenom(tm) 8650 Triple-Core Processor 3x 2300.00 MHz 5.96
    Intel(R) Pentium(R) 4 CPU 2.40GHz 1x 2400.00 MHz 24.57
    AMD Athlon(tm) 64 X2 Dual Core Processor 4400+ 2x 1000.00 MHz 7.92
    AMD Phenom(tm) II X4 940 Processor 4x 800.00 MHz 3.49
    Pentium III (Coppermine) 2x 999.00 MHz 17.81
    Intel(R) Pentium(R) 4 CPU 3.20GHz 2x 3200.00 MHz 12.06
    Intel(R) Pentium(R) 4 CPU 1500MHz 1x 1495.00 MHz 46.56
    AMD Turion(tm) 64 X2 Mobile Technology TL-58 2x 1900.00 MHz 10.10
    Intel(R) Celeron(R) M processor 1.40GHz 1x 1395.00 MHz 80.94
    Mobile AMD Athlon(tm) XP 2800+ 1x 1459.00 MHz 18.04
    Intel(R) Celeron(R) CPU420@ 1.60GHz 1x 1600.00 MHz 21.24
    AMD Phenom(tm) 9750 Quad-Core Processor 4x 1200.00 MHz 4.29
    Intel(R) Atom(TM) CPU N280 @ 1.66GHz 2x 1660.00 MHz 16.76
    AMD Athlon(tm) X2 Dual-Core QL-60 2x 1900.00 MHz 10.59
    Intel(R) Pentium(R) DualCPUT2370@ 1.73GHz 2x 1733.00 MHz 9.58
    AMD Athlon(tm) XP 1900+ 1x 1593.00 MHz 31.62
    AMD Athlon(tm) XP 1800+ 1x 1540.00 MHz 23.13
    Intel(R) Pentium(R) M processor 1.73GHz 1x 1730.00 MHz 19.66
    AMD Sempron(tm) Processor 3200+ 1x 1000.00 MHz 25.47
    VIA Esther processor 1500MHz 1x 1499.00 MHz 41.59
    Intel(R) Core(TM)2 Quad CPUQ6700@ 2.66GHz 4x 2669.00 MHz 3.17
    AMD Athlon(tm) 64 X2 Dual Core Processor 5000+ 2x 1000.00 MHz 7.49
    Intel(R) Xeon(R) CPU3040@ 1.86GHz 2x 1862.00 MHz 9.11
    Intel(R) Pentium(R) DualCPUT2310@ 1.46GHz 2x 1463.00 MHz 11.60
    Intel(R) Core(TM)2 Duo CPU P8400@ 2.26GHz 2x 2260.00 MHz 8.29
    AMD Turion(tm) X2 Dual-Core Mobile RM-74 2x 600.00 MHz 8.97
    Genuine Intel(R) CPU 575@ 2.00GHz 1x 1995.00 MHz 20.36
    AMD Turion(tm) 64 X2 Mobile Technology TL-62 2x 800.00 MHz 8.84
    AMD Turion(tm) 64 X2 Mobile Technology TL-56 2x 800.00 MHz 10.79
    Intel(R) Celeron(R) CPU 2.80GHz 1x 2793.00 MHz 40.87
    Intel(R) Pentium(R) 4 CPU 2.80GHz 2x 2800.00 MHz 11.70
    Genuine Intel(R) CPU2140@ 1.60GHz 2x 1595.00 MHz 10.67
    AMD Athlon(tm) 7750 Dual-Core Processor 2x 1350.00 MHz 7.45
    Intel(R) Core(TM)2 Duo CPU E8400@ 3.00GHz 2x 2997.00 MHz 5.54
    Intel(R) Core(TM)2 CPU T7400@ 2.16GHz 2x 2161.00 MHz 9.37
    Intel(R) Pentium(R) 4 CPU 2.53GHz 1x 2525.00 MHz 48.08
    Intel(R) Core(TM)2 Quad CPUQ9550@ 2.83GHz 4x 2830.00 MHz 3.24
    Intel(R) Core(TM)2 CPU6700@ 2.66GHz 2x 2667.00 MHz 6.90
    AMD Turion(tm) X2 Dual-Core Mobile RM-72 2x 500.00 MHz 11.86
    Intel(R) Core(TM)2 Duo CPU T9400@ 2.53GHz 2x 2530.00 MHz 6.76
    AMD Athlon(tm) 64 X2 Dual Core Processor 4800+ 2x 2512.00 MHz 8.73
    Intel(R) Celeron(R) M CPU520@ 1.60GHz 1x 1600.00 MHz 22.07
    Intel(R) Core(TM)2 Quad CPUQ8300@ 2.50GHz 4x 2497.00 MHz 3.35
    PowerPC 740/750 1x 280.00 MHz 172.82

  17. CoRev says:

    Thanks, EM. That’s what I was looking for. Looks like the N2 is close to a 2-3 generation old Intel/AMD desktop. What I’ve been doing is getting ready to replace my wife’s older Win10 desktop, and try to get her to accept the *NIX world.

    That latter may be the deal breaker.

  18. jim2 says:

    CoRev – I got my technologically backwards partner to switch from Windows. I picked Linux Ubuntu, then switched to Mint after the Unity desktop was introduced and adopted by Ubuntu. You can make Mint look and work a good bit like Windows – so the transition was (almost) painless for her.

  19. E.M.Smith says:

    https://itsfoss.com/windows-like-linux-distributions/

    If you are terribly addicted to working on a Windows machine (nobody said that was a bad thing), your productivity will not be hampered at all on Linux Mint.

    Why, you ask? The Linux Mint menu is located in the same position as the Windows Start menu, so when you move the cursor to the bottom left just like you did on Windows, you won’t be disappointed. LM has “activate on hover” for the Menu. So that’s a bonus. The applications are arranged under categories so finding applications in a new OS won’t be an issue.

    The Software Manager is a minimalistic and simple tool for finding and downloading additional software on Linux Mint. Since Linux Mint is based on Ubuntu, you’ll find a very large number of applications for all your computing needs. A simple browse and click will get you anything you need.

    That said, you will likely need to “roll your own” Mint on the N2 as the Ubuntu on it right now is a Mate desktop. It isn’t that hard to swap desktops, but it does take doing something and the newer the board the more you get to do.

    IMHO the Odroid XU4 has a lot more software choices and for most stuff folks do is still a lot of performance. Being v7 32 bit it has a lot more stuff already ported and running, and having been around a while helps too. Single core performance is not up to that of the RockPro64 or N2 (and FireFox benefits from the single core – so far – but newer releases are more parallel so it’s coming; but Chromium is already fully multi-core).

    So personal use experience is that the N2 with Ubuntu is (despite SystemD that I despise as an “experienced systems admin”) fairly robust, stable and working out of the box pretty well; along with the N2 being very much fast enough for anything I do.

    The RockPro64 was still a bit of “fighting me” on software (but that might be something that’s fixed by now, or might also be broken on the N2 and I haven’t tested it yet… Matplotlib in Python had broken graphs…) but still had “more than the one Ubuntu” that is presently available on the N2.

    Over time, both of those “hot boards” will get many OS choices and lots fewer bugs.

    Which brings me back to the Odroid XU4. It is what I usually use as my desktop SBC just because it has relatively mature software, lots and lots of cores, and most things do NOT care about 64 bit integer math. Single Core performance is still quite good, and for a whole lot of stuff, you never get all the cores busy at the same time anyway.

    What seems to make more difference than core speed is just a lot of fast memory: 2 GB or better. At 4 GB, it doesn’t even roll to swap on heavy use.

    So I suggest looking much more at how much memory you can get and at the OS maturity and choices than at raw CPU performance.
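    (An easy way to see whether a given box is rolling to swap under your normal workload, using nothing fancier than the standard tools:)

    free -h            # compare "available" memory against swap "used"
    vmstat 5 3         # nonzero si/so columns mean it is actively swapping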

    Personal Opinion:

    In general, I’m just not thrilled with the RockPro64 yet. It was more “issue laden” in the OS and use, even though it had more choices. There were choices, but not rock solid ones. The Odroid folks seem to be more in touch with real computing needs and what’s important. Things ship with BIG heat sinks instead of none, so you don’t have to glue one on after you find out it heat-limits to slow speeds without one. The OS is typically working right “out of the box”. They DO have a bright blue LED that flashes when the kernel is running, which is nice to let you know it is working, but a bit of a distraction in a dim room at night… but I’ve gotten used to it.

    Right now, I’m ambivalent between the Odroid XU4 and the N2 as my “hot box”. BUT, whenever something “just must work right” I find myself back on the Raspberry Pi M3. Something about millions of users finding every possible bug gets things fixed fairly fast.

    So I find myself jumping back to the R.PiM3 from lots of other boards / OSs… but almost never from the XU4…

    OK, all of that said, I intend to load up the GHCN Temperature Database onto the N2 and take it for an SQL test drive. IFF it is working right and making nice graphs, I’ll use it for Asia, Africa, and Europe. Why? Because there’s just so many countries in them I need something faster than the Pi or I’m going to get bench butt…

    If this thing continues to behave well and MariaDB / Python / matplotlib / etc. are all working rock solid, then it’s my new Daily Driver… If not, I’m back to the XU4 / Pi M3 combo.

    Oh, and remember that many *Nix programs are far more efficient than their MS counterparts, so you don’t need as much hardware for the same experience. Typically 2 or 3 revs back in performance is about the same speed in experience, IMHO. So look, buy one and give it a whirl. It’s all of $60 for the high end ones, so it’s not like you are breaking the bank. Call it one tank of gas and be done ;-)

    Oh, and a critical point is “Does she do a lot of video?”. If so, then you need a very hot SBC with working video & sound. No doubt about it. In that case, look up the “explainingcomputers.com” board evaluations on YouTube. He does a regular test of video performance.

    One other minor point: You could get one of those Atomic Pi boards and then choose to either boot windows or Linux “as you like it”. Having a dual (or tri or quad) boot machine is not a bad way to go for the transition…

  20. E.M.Smith says:

    Drat.

    Just did all the install and set up of MariaDB on the Odroid N2. THEN I decided to check on the database size before I loaded it all up. On another system I’d needed to put it on a different disk as “root” (/) was filling up. By default it goes into /var (that sort of makes sense I guess…) but when you build systems without a dozen “spare” GB on / that’s an issue.

    root@odroid:/WDTB/xfs# du -ms MariaDB/
    7670	MariaDB/
    

    Yeah, just shy of 8 GB.

    What have I got?

    root@odroid:/WDTB/xfs/MariaDB# df /
    Filesystem     1K-blocks    Used Available Use% Mounted on
    /dev/mmcblk0p2   7251432 5661548   1284944  82% /
    

    A bit over one GB free.

    So now I have the choices of:

    1) Not bother making Yet Another Example just to see how fast the N2 is in comparison.

    2) Using Yet Another Disk mounted onto it.

    3) Finding out if I can just point (via symbolic link) at this other copy. But that risks blowing it up if there were incompatible changes made or the N2 “has issues”; AND it means I don’t get to measure how fast it is on loading this stuff unless I blow away the old copy.

    Sigh.

    OK, of all those, I’m leaning strongly toward #2. I’m pretty sure I’ve got a disk with an empty partition that I can put on this system. Then again, the eMMC wasn’t looking all that fast anyway. At least not compared to USB 3.0 in the benchmarks. (I was hoping the random I/O avoiding head seeks might be interesting / faster…)

    Oh Well…

    Guess it is off to “dig in the disk drawer and see who / what is empty”….
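    For reference, once an empty partition turns up, pointing MariaDB at it goes roughly like this (a sketch; the /WDTB/xfs path is just illustrative and the exact .cnf file name is an assumption from a stock Debian/Ubuntu layout, so check yours first):

    systemctl stop mariadb                     # or "service mysql stop" on a non-systemd box
    mkdir -p /WDTB/xfs/mariadb-data            # wherever the new disk ends up mounted
    rsync -a /var/lib/mysql/ /WDTB/xfs/mariadb-data/
    chown -R mysql:mysql /WDTB/xfs/mariadb-data
    # then set "datadir = /WDTB/xfs/mariadb-data" in /etc/mysql/mariadb.conf.d/50-server.cnf
    # (on Ubuntu an AppArmor profile may also need to be told about the new path)
    systemctl start mariadb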

  21. E.M.Smith says:

    FWIW, while the Odroid N2 running Ubuntu is generally a very nice experience, I’ve had 2 unexplained crashes. It also looks like it (under MATE at least) tries to mount various hard disk partitions as /dev/usb0 for no good reason and even when there’s an /etc/fstab entry for that partition / LABEL set.

    It seems to be doing this for disk partitions that have an OS boot image on them. As one of my disks is “multiboot” on the R.Pi with Slackware, Gentoo, etc. partitions, that was a bit of an issue…. One of the crashes seems to have involved my doing a “du -ms *” in root (/) and it barfing on that /dev/usb0 that ought not to have been mounted anyway.
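    (Assuming it is MATE’s media-handling doing the auto-mounting, these are the knobs I’d try first. The gsettings keys are an assumption on my part; the fstab “noauto” trick is not, and the LABEL shown is just an example:)

    gsettings set org.mate.media-handling automount false
    gsettings set org.mate.media-handling automount-open false
    # or mark the partition noauto in /etc/fstab so nothing mounts it behind your back:
    #   LABEL=SlackRoot  /mnt/slackroot  ext4  noauto,defaults  0  2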

    This isn’t horrible. It is the sort of thing you get in the early stages of a new CPU / OS combo. Usually the worst of the bugs are gone inside 6 months, and by about the 1 year anniversary, it is a pretty stable product.

    (The other crash had no proximal cause I could identify)

    It IS a damn fast board (both single core and multi core) and running videos full screen doesn’t even manage to peg a core at 100%. Looked more like 2 cores about 50% each.

    I have also managed to install lxde (“apt-get install lxde”, then at the login panel click on the button to choose “lxde” from the login choices) so now it’s even more “comfortably the way I like it”.

    I did the full database build / load and it was VERY fast up until the point where it was making all the indexes (that’s a lot of disk head seeks…) and that’s when one of the crashes happened, as I was playing with disks in another terminal window and that usb0 thing bit me. Which means all the data / statistics from the load process evaporated too… Soooo…..

    Sometime later tonight or tomorrow I’ll drop all the tables, reload it all, and gather the statistics (again).

    In general, it is very nicely fast, but not dramatically more so. We’re talking like 15 minutes instead of a 20 minute process… IF I remember the number I saw, the dataset load process it applied to, etc. etc. So nice, but not earthshaking.

    OK, all that said, I’m packing it in on the N2 data load stuff for today and I’ll deal with it more “later”.

    I’ll be using it as my “Daily Driver” until further notice just to shake out where the “issues” are lurking.

  22. E.M.Smith says:

    Interesting oddities… And an answer…

    Devuan on the Compaq Evo does find “hardinfo” in the repositories and installs it. It says the box has a “Pentium(R) 4 processor”… yet “htop” only ever shows one core and the “sysbench” benchmarks are not 4x faster for multicore. I wondered what the heck was going on?

    I think the answer is in the spacing. It is saying “Pentium-4” (R) not “Pentium(R) 4 core”… and it has been so long since I thought about anything Pentium that the mind naturally (now) runs to 4 as core count not “era of antiquities” ;-)

    So I think that ’splains things. It’s a “Pentium-4” single core processor at 2.4 GHz. 512 KB cache and bogomips 4800 per “hardinfo”. Also with FPU, PAE, MMX, SSE, SSE2, and hyperthreading per the “processor” report.

    So anyway, I ran hardinfo / CPU / Blowfish and here’s the results:

    25.52  This Machine - 25.54
    26.19  Intel Celeron M processor 1.5GHz
    172.82 Power PC 740/750 (280 MHz)
    Results in seconds.  Lower is better.
    

    So about 1/6th as fast as the Odroid N2.

    I can believe that one 32 bit Intel core of that era is about the same as one hot 64 bit ARM core now at similar GHz and with the differences in memory speed and cache sizes then vs now.

    Interesting that this fairly big (in comparison) box is about the performance of a cheap SBC, and likely less than a $35 Atomic Pi, as that has 4 cores (though each at 1.4 GHz, so it takes two of them to match this one – but in total it ought to be twice as fast).

    Still, this box will play videos (at limited size / speed) and for general web browser stuff is just fine.

    I’ll have to check my other systems to see which of them have “hardinfo” available in their repositories…

    One Other BSD Sidebar:

    I have downloaded and tried both A64 FreeBSD images from RaspBSD that are supposedly for the A64 board. Neither one would boot. Didn’t even try. I was hoping to put a BSD on it and they claimed to have one. I’m presently downloading the NEMS “whole site monitor” software for it and we’ll see if that works. The board runs on Debian, but I don’t need “yet another R.Pi class board” that’s less well supported running the same OS at the same performance. (Yes, it is only a $17 board so not a big deal). I’m still hoping to find a more interesting use for it, but “we’ll see”. If nothing else, later on I’ll put Debian on it and run “the usual benchmarks”.

    That just leaves the Odroid C1 and the Pi M2 (original) to do. As they are in my stack of 5 boards Dogbone “Case”, and fiddling with it can cause the power connections to bounce my DNS server, I’ll only be doing them late at night….

  23. CoRev says:

    EM, thanks for the latest 2 entries. Good info.

    BTW, have you seen Watts latest NOAA & UHI find?

  24. E.M.Smith says:

    I did the whole database load process again. Details here:

    Some Notes on GHCN Database Using Odroid N2

    Bottom line is that it is about 3x as fast as the Pi Model 3 on these tasks, and about 4x as much total computes available if you run more processes in parallel. Lots more total processing power than needed for anything I’ve yet tried, and that is most of what I ever do. I can’t see me needing anything faster until maybe when I get some complex model code running.

    @CoRev:

    If you mean that article about night lows held higher by urban cement and such, yeah.

    I do feel a tiny twinge of Oh Duh about reading “Pentium 4 processor” and associating it as “Pentium (4 processor)” instead of “(Pentium-4) processor”…. Interesting how it has slowly gone from “fast box” to now being 1/6 of a cheap SBC over the years I’ve had it. I’m tempted to fire up my AMD 400 MHz White Box running Linux and benchmark it, just for giggles… The one with GIStemp on it from a decade ago… when the box was already a decade or two old…

    Well, time for me to get back to work making graphs :-)
