Interesting Benchmark Of SBC Boards

This video has a comparison of several boards of interest to me, and a surprising result. It isn’t a direct hardware comparison, as the boards are running different OS versions; but as the author points out, these are the OS releases most available and most likely to be used on each board.

He compares the Tinker Board, Raspberry Pi 3 B+, R. Pi Zero W, Rock64, RockPro64, and the Odroid XU4. While I’m not interested in the Tinker Board (another reviewer called it “broken by design”, as there is a huge voltage drop over the micro-USB power spigot when you attach anything to the board…), I do have both a Pi M3 and an Odroid XU4, plus I’m looking at the Arm A53 / A72 type Hexa Core and similar boards as the best alternative to the Odroid. So I’ve lusted after the RockPro64. (Which is why I was looking.)

From comments on the video:

Published on Sep 9, 2018

Odroid-XU4, Rock64, RockPro64, Tinker Board S, Raspberry Pi 3 B+ and Raspberry Pi Zero W ARM SBC group test. Video includes specification comparison tables, boot test, Sysbench CPU test, and GIMP complex filter test.

The test image used in the GIMP test can be downloaded here: https://www…

17 minutes:


It is surprising how well the Rock64 boards do in his benchmarks. I need to learn more about the RockChip offerings. They have several. Some are used in the Libre Boards, and there’s an Orange Pi using another chip.

It lists some of the various uses / chips. It will take a bit to figure out which of these is what, and their comparative speeds.

They point out the Pine64 is similar to / the same family as the Rock64 (I think they are made by the same company). The Rock64 uses the RK3328 (quad core A53 – 64 bit data path! – I think at 1.5 GHz).

The RockPro64 uses an RK3399 (Hexa core that has A72 & A53 in 6 (2+4) config. ) I don’t know the GHz. In general it does look like the overhead of swapping core types slows things down more than desirable…

The Tinker Board uses an RK3288 (quad core A17, 32 bit, 1.8 GHz.)

Firefly-Rk3288 (no surprise) uses the RK3288 (quad core A17 32 bit 1.8 GHz. )

Libre Board ROC-RK3328-CC uses the RK3328 (quad core A53 1.5 GHz – 64 bit data path!)

Orange Pi RK3399 uses the RK3399 (that has A72 & A53 in 6 (2+4) config. )

Radxa Rock uses the RK3188 (quad core A9 1.6 GHz.)

With each core type (A9, A17, A53, A72) having different word lengths, data path widths, and “computes per GHz”, figuring out the Bang / Buck on those guys is a bit complicated. While benchmarks work best, it can be pricey to buy them all to find out ;-) So some theoretical comparison will need doing instead. The comparative performance of the cores is generally known, so that can be computed; then it’s down to what the memory and connections on the board let it do. I think I’ll leave that for tomorrow ;-) Along with finding prices ;-)
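As a sketch of that theoretical comparison: the per-core “uplift” factors below are rough placeholders (the same ballpark figures discussed later in the comments), and the clocks and prices are assumptions for illustration, not benchmark results.

```python
# Rough "bang per buck" sketch.  Uplift factors are relative to a
# notional baseline core; clocks and prices are illustrative only.
UPLIFT = {"A7": 1.9, "A53": 2.3, "A15": 4.0, "A72": 14.0}

def board_score(clusters, ghz):
    """clusters: list of (core_type, count) pairs; crude single-clock model."""
    return sum(UPLIFT[c] * n for c, n in clusters) * ghz

rock64    = board_score([("A53", 4)], 1.5)              # RK3328
rockpro64 = board_score([("A72", 2), ("A53", 4)], 1.8)  # RK3399, clock assumed

for name, score, price in [("Rock64", rock64, 40), ("RockPro64", rockpro64, 60)]:
    print(f"{name}: score {score:.1f}, {score/price:.2f} per $")
```

Obviously a big.LITTLE chip doesn’t run every core at one clock, so treat this as back-of-envelope only.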

Just as an FYI, the Odroid XU4 has a 4 + 4 arrangement: 4 slower A7 cores and 4 faster A15 cores – but all 32 bit v7 instruction set and 32 bit data path. In my watching of the board’s CPU monitor, it rarely uses the A7 cores when there’s any real work to do; so anything of about the same performance as 4 x A15 @ 2 GHz ought to match it most of the time, but without the cost and complexity of an octo-core board. But to compare an A15 to an A17 or A53 means “some math required”… then there’s that issue of the data path width and memory performance… (why folks benchmark…)

But: The big takeaway for me was that the Rock64 was surprisingly high performance, though just why was a bit unclear. In theory the RockPro64 has a lot more “juice” with much more silicon available, but something is holding it back. Scheduler not up to the task, maybe? Or overhead of swapping CPU types eating up too much performance? Or is it that Ubuntu is a resource hog in comparison to Debian? Who knows… It could just be that the OS was compiled with NEON float (the CPU’s SIMD vector unit used as a math processor).

So I guess this posting mostly amounts to a big “I’ll be digging here!” on those chips and GHz and who’s a what for how many $$…


The Pi M3 is almost fast enough for a daily driver, but between needing to fiddle the clock to get it to run full speed, the “usual” heat sinks being too small, and the communications paths on board being limiting; well, it is just slow enough to be annoying after an hour or two.

The XU4 is a wonderful experience. The speed is a joy and I’m happy with it. Except software is a bit of a PITA. Not a lot of folks doing ports for it, and those that are done often put it last or late on the QA list – so often some bits are not tested or not working right out of the box… Then too, it essentially never uses all 8 cores, as they are in a “Big/little” arrangement: mostly you just use the fast 4, or are not doing enough for it to matter which core is being used.

So I’m thinking something with 4 faster cores and better data path design would likely be A Fine Desktop. That’s the idea anyway. 4 cores at about 1.8 GHz ought to do it (A53 or better). It’s not a big itch to scratch, just a minor annoyance. Then again, with boards costing less than 1 tank of gas (and sometimes 1/2 tank…) it’s not an expensive itch to scratch either…

Given that the OS types were not characterized as to how math was set up (software float; hardware float – coprocessor; hardware vector – NEON SIMD), that alone is a great big open issue that could account for the Rock64 beating the RockPro64. It is one of the reasons I’m thinking I need to “roll my own” OS: so I can set the math type. It can be a BIG win. Using NEON can give order of magnitude speedups on some problems, at the cost of a small precision loss that drops you from IEEE compliance. Unimportant to things like graphics and general use; but sometimes not acceptable for Engineering Design work or hard core Science, where using double precision is common and every bit matters.
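A quick stdlib-Python illustration of the kind of precision at stake when math drops from IEEE doubles down to single precision (as the 32-bit NEON path does):

```python
import struct

def to_float32(x):
    """Round-trip a Python float (a double) through IEEE-754 single precision."""
    return struct.unpack("f", struct.pack("f", x))[0]

x = 1.0 + 2**-24           # exactly representable as a double...
print(x == 1.0)            # False: the double keeps the low bit
print(to_float32(x) == 1.0)  # True: single precision rounds it away
```

For graphics nobody notices; for a long iterative engineering calculation those lost bits can accumulate.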

Many Linux distributions, especially early ones or those for unusual boards, will just set soft float, since it always works and then they don’t need to deal with just which math coprocessor is in use or how to program for it. A rude thing to do, but it is done. Lets you ship on time by skipping work that most folks will not know was skipped. Then too, the same software can run on more hardware types.

So finding a straight hardware comparison benchmark with the same software is likely to be in order.

I’ve looked into a “roll your own” OS before, enough to see how to set the flags for soft vs hard vs NEON floating point math. (Or a different vector unit – not all ARM cores have NEON, and I’m not yet familiar with which cores have NEON vs other vector instructions.)

It is one of THE big things worth playing with, to convert the General Purpose releases that are usually Soft Float into Performance releases fitted to your particular hardware.

IF I had to bet, I’d bet that the Debian release on the Rock64 was compiled with some kind of hard float, and the Raspberry Pi was done with soft float. Very few folks target NEON specifically, as it means chip-specific builds and accepting the precision trade-off… it’s mostly a hard core science geek thing to use it; and even then you must know what precision is needed for your work, and whether the math coprocessor is the better choice.

So I’d say that computing primes benefits from 64 bit math, and that most of the loss of speed in the Pi and Odroid is from the 32 bit OS on the Pi and the 32 bit hardware on the Odroid; perhaps also leveraged a bit by hardware 64 bit math vs software math. But it would need testing to know; and it might be compiler flags in the build.

So there you have it. A “hot board” but with the question of “Is it the board, or the OS build?” Then there’s just a lot of leg work to compare chips, speeds, bucks, and builds.

Even with that, though, the Rockchip boards are looking good.



About E.M.Smith

A technical managerial sort interested in things from Stonehenge to computer science. My present “hot buttons” are the mythology of Climate Change and ancient metrology; but things change...
This entry was posted in Tech Bits. Bookmark the permalink.

37 Responses to Interesting Benchmark Of SBC Boards

  1. CoRev says:

    E.M. a big thanks for finding that video. I too was thinking of upgrading and had considered a Rock64 (various sources TBD), and now have it at the top of the list. I’ll let you know which Rock I go for.

    I’m just tinkering with no real need, but my stable includes 2 Pi 2s, a Pi 3, and 2 XU4s. I’ve stabled the Pi 2s, and am going to cluster the XU4s and the Pi 3.

  2. Pingback: Interesting Benchmark Of SBC Boards –

  3. jim2 says:

    I was thinking it would be good to have some indication of what chip is doing what. Given the somewhat long duration of the tests, I bet an infrared camera could “see” if it is the CPU or GPU running. Regular web cams have an IR filter which in many cases can be removed. It doesn’t make a great IR camera, but it might suffice for this application.

    If nothing else, a simple IR thermometer might do the trick.

  4. E.M.Smith says:


    I’ve got 2 x PiM2 in my cluster. They are an OK addition to a cluster with a lot of small jobs. Not that much slower than the Pi M3 (though there are 2 different kinds of Pi M2 I think…). When doing distcc compiles, the Pi M2s happily contribute to throughput.

    I’ve also got an old Pi M1 (B+) doing regular duty as my DNS / Proxy server. My point? Even the low end machines can have a productive life ;-)


    I find it just fine to simply put my finger on a chip to see if it is hot, or not. But in this case that’s complicated by the fact that the FPU and GPU are on the same die in the same package as the CPU. It’s all one chip.

    Usually the “build” includes an indication of the kind of float math used in the name of it along with word length. The whole armel armhf arm64 naming thing.

    armhf = hardware floating point instructions + 32-bit instruction set. 64-bit ARM supports hardware floating point and NEON by default, so no need to specify a qualifier like ‘hf’. As mentioned below, RPi foundation hasn’t added support yet for 64-bit mode on the Pi3. – BitBank Jun 13 ’16 at 16:15
    Please note that Arch linux community division dedicated to ARM platform ( already has support for Aarch64 on Rpi3. You can download an image for Rpi3. – Amit Vujic Mar 8 ’17 at 1:02

    So just depending on how the math libraries were compiled you can get dramatic changes in the math throughput. In particular, was the Pi running a 32 bit OS (so doing many math steps to get to an answer instead of just one) vs the Rock64 OS built as arm64 by default doing NEON.
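    As a side note, one quick sanity check of what a given install is actually running (e.g. a 32 bit armhf userland vs a true 64 bit one) can be done from Python’s stdlib; these are just the usual `platform` / `struct` calls:

```python
import platform
import struct

machine = platform.machine()               # kernel's view: 'armv7l', 'aarch64', ...
userland_bits = struct.calcsize("P") * 8   # pointer width of THIS binary

print(f"machine: {machine}, userland: {userland_bits}-bit")
```

    This catches the mixed case of a 32 bit userland running on a 64 bit kernel, since `platform.machine()` reports the kernel while the pointer size reports the binary you are actually running.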

    For the XU4, it is all 32 bit hardware so for Double math (that primes calculation might well be written to use doubles) it too would take many steps instead of one hardware calculation step.

    That would also explain why it was only the primes calculation that showed the huge speed advantage, while the other processes, using more 32 bit math or not much math at all (booting), would be more as expected.

    That’s why I spend time looking at particular OS builds and futzing around with them. Because it makes big differences to some (limited) kinds of operations that just happen to be the math based ones I care about.

    Oh, and a sidebar on heat: Notice that both the Pi M3 and the Rock64 in the video are shown without heat sinks. So you might get a quick 7 second run, but then heat limit as the chip heated up. The Pi M3 since it ran longer would heat up more and heat limit more. I’d like to see pictures of the board “as run” to see the heat management in place. I know the Pi M3 needs a decent sized heat sink to get full computes out of it.

    Then the exact nature of the OS build. Is it named with armhf, armel, arm64, … I’m pretty sure that the Pi when running the 32 bit OS is running armhf but someone could have done an armel.

    Small CDs or USB sticks

    The following are image files. Choose your processor architecture below.

    amd64 arm64 armel armhf i386 mips mips64el mipsel ppc64el s390x

    As you can see, they offer the soft float (armel) along with the hard float 32 bit (armhf) and the 64 bit NEON / hard float (arm64) builds. All of them ought to run on the Pi M3.

    Which one was being run in the benchmark? I don’t know… but I think it was most likely armhf, so the Double math would take a lot more steps per Double calc. I doubt it was running armel anymore (but who knows), just because the guy doing the benchmarks “has some clue”. Then arm64 was still a bit young some months back, so I’m not running it on my cluster (yet…). In general use, for things like a browser, it was slower(!) – likely from the fact it was moving 2 x as much instruction & data per operation, and the Pi data path & memory isn’t very strong… so in general purpose computing it can actually be as much a detriment as a benefit.

    FWIW, there are also “Thumb” instructions of 16 bit width in the Arm architecture, specifically for the purpose of very small, very fast code (16 bit instruction encodings, that is, not 16 bit data).

    Oh, and just to make it even more fun: while your general build might be labeled with one float type overall…

    It is often the case that particular packages can be compiled with armhf set (or even armel) and you can have folks even coding bits in Thumb Instructions if desired. So it all, really, depends on how much optimization time the packager is willing to put into it…. When I discovered in early testing that FireFox-64-bit was very doggy and crashed, I was tempted to just substitute the armhf build into the Debian-64 I was testing. (I decided to “just wait” and run the 32 bit OS a while longer). So in fact even the end user sys-prog can make such swaps.

    In the end, I chose to swap the entire “Userland” and ran armhf-32 bit on top of an arm64 kernel build…

    It’s so common that some build systems are set up with a flag you set to easily override any given package build. But with something like 50,000 programs, folks usually ignore it and accept the default unless it actually breaks something.

    Since first ship, there’s now been a few years of experience with arm64, and it is oh so slowly getting (been?) tuned up along the way. So things like FireFox-64 work now. But how much “tuning” was in the Rock64 OS vs the R.Pi version? Depends on who tossed how much programmer time at it…

    Thus my interest in that whole armel armhf arm64 NEON thing… There’s a fair amount of speed to be gained in any given processor by adjustments and tweaks… but to get that you likely need to roll-your-own for the codes you want to use (compile them with the “right” flags set).

    Oh, and just to complicate things more: There are also build flags to do optimization (at the expense of much longer compile times) and run time checks (at the expense of slower running)… so depending on how those are set you can get swings of speed too. Why “using the same build of the same OS” matters in benchmarking and why he had the “Horror!” disclaimer up front ;-)

  5. jim2 says:

    The chip geometry is known, so IR imaging might still work. What part of the chip is hottest?

  6. jim2 says:

    And how does the heat image change over time, if the chip gets too hot and changes modes.

  7. E.M.Smith says:

    Oh, and worth a mention that the OS scheduler may be set up to use the “Big/little” feature of the RockPro64, in which case that Primes calculation might well be running ONLY in the 2 x A72 cores and ignoring the 4 x A53 cores that would be faster in a parallel group of 4. So one thing missing in the benchmark is the output of htop during operation showing how many threads running on how many cores of what type…

    It looks like the Devuan build on the XU4 lets you use all 8 cores at once (not running “Big/little”) and with process ‘stickage’, in that a process once started on an A7 core tends to stay there instead of migrating to the faster A15 cores. Why? Donno… Somebody made the decision that was “better” – but better for throughput, or better for ease of process control, or ease of programming? Did something similar happen on the RockPro64? Again, don’t know. Seeing the htop output during operation would be enlightening… But clearly the Debian derived OS ports are not fully using the “Big/little” architecture well – yet. It’s a relatively new idea, so “some assembly required” and much coffee needs application ;-)
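    Short of htop screenshots, a process can also be pinned to chosen cores from userland; here is a minimal Linux-only sketch (the core numbers are hypothetical – which IDs map to the “big” cluster varies by board):

```python
import os

if hasattr(os, "sched_getaffinity"):            # Linux only
    allowed = os.sched_getaffinity(0)
    print("may run on cores:", sorted(allowed))

    # Hypothetical: pin to cores 4-7, the "big" cluster on some boards.
    big = {c for c in (4, 5, 6, 7) if c in allowed}
    if big:
        os.sched_setaffinity(0, big)
        print("now pinned to:", sorted(os.sched_getaffinity(0)))
```

    The `taskset` command does the same from a shell, which is handy for re-running a benchmark on only the A72s, or only the A53s, and comparing.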

  8. E.M.Smith says:


    I think I’ve seen that done on an Intel CPU chip. But with the heat sink removed…

  9. E.M.Smith says:

    An interesting UPS board that attaches to an Odroid C1/C2 board for $46
    Has a built in battery and PC board to match the base SBC…

    RockPro64 for $60

    Pine A64 2 GB board (ought to be the same as the Rock64? but isn’t) for $37
    but the description says Allwinner chip at 1.2 GHz… so looks like there are old vs new board types to watch out for.

    Actual Rock64 board (low memory 1 GB option at $30 ! 2 GB for $40 and 4 GB for $50)
    has Rock chip at 1.5 GHz…

    My only relationship with Ameridroid is as a satisfied customer. I’ve bought 3 Odroid boards from them (with PSU & some case bits). All arrived fast and well packaged. Prices are better than Amazon.

  10. E.M.Smith says:

    The Wiki on comparative ARM cores was unenlightening about the A72 cores. But here:


    The Cortex-A72 delivers 3.5x the sustained performance in the smartphone power envelope over 2014 28nm Cortex-A15 processor designs. The processor features several major micro-architectural improvements which build on the current generation of Armv8-A cores. The enhancements in floating point, integer and memory performance improve the execution of every major class of workload.

    The processor is optimized for the 16nm FinFET process technology, enabling the Cortex-A72 to clock up to 2.5GHz in the mobile power envelope and leading to even higher total delivered performance.

    On top of these key performance improvements, the Cortex-A72 CPU also benefits from significantly lower power consumption. This improved efficiency combined with the 16nm FinFET process technology enables the Cortex-A72 processor to achieve a 75% power reduction in representative premium mobile workloads.

    So 3.5 x an A15. From here:

    I’d found the A15 was roughly a 4 x uplift from the base core in the Wiki and the Arm A7 was a 1.9 x while the A53 was a 2.3 x uplift. So that makes the A72 a 4 x 3.5 = 14 factor on that scale.

    Given 2 x 14 = 28 “base cores” of production from those 2 x A72 on the RockPro64 and the Rock64 is just 4 x 2.3 = 9.2 x there’s something else going on in the benchmark. I’d have to assert it is in the OS build choices. On raw computes, that RockPro64 ought to just rip.
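    Spelling out that arithmetic (same rough uplift factors as above, treated as exact for the sake of the comparison):

```python
# "Base core" units: A15 ~4x the baseline, A72 ~3.5x an A15, A53 ~2.3x.
a15 = 4.0
a72 = 3.5 * a15          # 14.0
a53 = 2.3

rockpro64_big = 2 * a72  # just the two A72s: 28 "base cores"
rock64        = 4 * a53  # 9.2 "base cores"

print(rockpro64_big, rock64, rockpro64_big / rock64)
```

    So on raw core uplift the RockPro64’s big cluster alone should be roughly 3 x the whole Rock64, before the 4 little A53s even join in.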

    So I guess it will come down to “Do you want to find / make a more optimized OS? Or do you just want fast 64 bit computes out of ‘the usual’ OS install?”

    I’ll need to do some more comparative theoretical speeds figuring, but after coffee ;-)

  11. E.M.Smith says:

    The download page has the Debian as:

    So as an arm64, it is going to be using SIMD (Single Instruction Multiple Data) NEON as well as FPU floating point.


    1.2 Ubuntu 18.04 Bionic

    1.2.1 LXDE Desktop aarch64 [microSD / eMMC Boot] [0.7.9]
    1.2.2 Minimal aarch64 [microSD / eMMC Boot] [0.7.9]
    1.2.3 Minimal armhf [microSD / eMMC Boot] [0.7.9]

    So IF the guy doing the testing didn’t like the LXDE desktop (and IIRC he said MATE?) then he could have chosen the armhf and missed out on the NEON SIMD advantage. (Or got the SW somewhere else and had no choice offered of 64 bit Ubuntu).

    I’d really like to know why it was only “roughly equal” to the Rock64… it’s acting more like it was doing 32 bit math on 2 cores and not NEON 64 bit math. Or perhaps this is the issue:

    While RockPro64 does not come with a heatsink by default, some sort of cooling solution is a must for Rockchip RK3399, as we’ve seen with Firefly-RK3399 that a fansink can make a big difference in terms of performance compared to a thin heatsink, so I can’t imagine what would happen if I run the board without any heatsink at all.

    I didn’t see a big ol’ heat sink in his photo… Running two very fast, hot cores at full bore will have them heat limit very fast… IF computes are heat limited out of the two boards, then it would make sense that they would have about the same finish time (and the darker, more radiative chip finishes first…)

    I think I need to look for a comparative benchmark using large heat sinks…

    In any case, the Rock64 looks like a fairly cheap way to get a nudge more performance than the PiM3 and I’d really benefit from the 2 GB “uplift”. (My Pi regularly shows a few hundred MB rolled out to swap space, and for file copies, it uses all the memory it can as file buffer space so more is faster).

    I’m not really seeing the personal need for a RockPro64 at the moment. Yeah, it would be fun to play with the A72 cores, but most of their added speed comes from “very deep predictive pipelines”, and folks are actively taking that out of the Linux kernel / patching it to fix the issues. (Potentially another issue: What patch level was the kernel at for the Ubuntu? Was it the 4.19 with patches, so slowed a lot?) It might benefit from waiting just a little while to see what 4.20 or 4.21+ does to A72 performance…

    I think I see “more digging” required and finding some benchmarks with heatsink and same OS for the Rock64 vs RockPro64.

  12. E.M.Smith says:

    That cnx-software link above is a real gem. Guy does video testing and more. Seems the Ubuntu / RockPro64 is not doing hardware video decode… all software:

    Video Playback in RockPro64 Board

    AIO3399-J did not come with any video player, and for good reasons since I did not manage to play any videos with hardware decoding in that board, although software decoding works up to 1080p.

    But in the case of RockPro64 Ubuntu 18.04 image, SMPlayer, YouTube Browser for SMPlayer, and mpv Media Player were already installed. So let’s play some videos.

    Sadly the video was very choppy, and as you can see SMPlayer relies on mpv Media Player with everything decoded by software.
    The video played in full screen mode, was very choppy, and from the error messages above it’s pretty clear hardware video decoding failed. People told me they manage to use rkmpv, but I failed to fix the issue in a reasonable amount of time. Anyway, I’m pretty sure this will be working in subsequent firmware releases, and rkmpv is the program to look into for hardware video playback.

    I’m no fan of Ubuntu as they tend toward the “fat and slow and WE are in charge, not you.” Exactly opposite my desired direction… So it’s possible this is working in some other release. But since Ubuntu is a variation on Debian, were it fixed in Debian it would likely be in Ubuntu…

    Since “jittery video” and recompositing artifacts on panel moves in a frame are the major annoyances I’m trying to fix right now (with the Odroid XU4), it looks like this board is not for me at this time.

    Sidebar on heat: Also down in that article, AFTER applying a very, very large heat sink, he still has heat limiting “issues” with the chip. This thing MUST have a very big heat sink to work fast. Just the lack of a heat sink in the benchmark video could cause this thing to heat limit right out of the gate and limp to the finish line… Explaining why 6 cores and WAaaayyyy more total computes pulled up last. 6 cores of “instant heat” at the start, then stuck in low gear to the end…

    So, onward to the question of Rock64 video performance…

  13. Sandy MCCLINTOCK says:

    Here is a series of SBCs that sit between a desktop and a low-cost Rock or Pi type board.
    I have to wait another month or two to test, but the sales pitch looks interesting ;)

    By the way did you see this ‘hardware virus’ that’s been around for 3 years but has been kept quiet by the TLA agencies. It is almost invisible and the version 2 sits inside the glass fibre mother board!

  14. E.M.Smith says:

    While the software issues noted above for the RockPro64 had me thinking “early port of software, not any hardware optimizing done” (as folks need to write / modify code to run on new hardware, so at first they get the CPU going and release it, then slowly over a year or two get the various “other processors” in the chip working – Video Core / GPU, FPU, etc…), it is nice to have confirmation; down near the bottom:

    Please Note:
    1. Pine64 is committed to supplying the ROCKPro64 board for at least 5 years until the year 2023.
    2. The ROCKPro64 is still in early stage development cycle, so the current batch is only suitable for developers and early adopters.

    That explains a lot. Young software not using all the hardware. No heat sink when it needs a giant one to cool 6 hot cores. Perhaps some firmware issues to work out (so, for example, poor distribution of threads to available cores or poor breaking a task into multiple threads. That kind of thing, tuning and polishing, comes way after “Did you get it to compile or maybe even run, at all?”).

    There were similar issues with the move to the Pi M2 where multi-thread to multi-core just was not in Raspbian. The first adopters had an OS tuned to run on one core. You could get more cores used with discrete commands (one / CPU) but things like FireFox or gcc would only load up one core. Now the distribution of work is much more multi-core.

    So it’s possible that Sysbench was running on just one core… not compiled for multicore and multithread… and they just got it running single thread… (as a guess of possible).

    In any case, I’ll wait a year for any Hexa-core board, thanks. A highly optimized quad core with a couple of tuning years behind it will work better… (Subject to change if someone shows they have all the hardware working now…)

  15. E.M.Smith says:


    Yeah, it came around in an earlier W.O.O.D. posting. A neat trick and why I give a BIG set of bonus points to boards made outside China. Why I like Odroid is they are from S. Korea. R.Pi is designed in the UK and folks play with it ‘low level’ so any buggery would likely be noticed by someone (possible to do it, but the payoff is likely very very low and the probability of being found out is very high.) Nice to have it being recognized as an issue, though.

    I’ll take a look at the boards / link… Thanks!

  16. E.M.Smith says:

    Looks like a heck of a hot board, but not the kind of board I’m interested in.

    I’ve settled on ARM chips as they are less TLA Influenced than Intel / AMD chips, I hate fans, and I’m not in the “few $hundred” price range. Also don’t do gaming so the integrated GPU is of only mild interest as a potential OpenGL compute engine. It IS still likely more Bang/$ than a gaggle of little Arm cores, and maybe ‘someday’ when I’ve got some code actually running that needs it, I’ll decide to make a farm of AMD-64 cores; but not for a couple of years at least… and by then there will be something newer and hotter.

  17. E.M.Smith says:

    Interesting chart of integer performance of ARM Cores:

    As the Odroid XU4 is A15 cores, that’s hard to beat. Only A57 and A72 beat it at a given GHz.

    Note that chart is “Dhrystone MIPS / MHz”, or roughly Vax 780s per MHz, meaning that the A15 core gives you about 4 MIPS, or about 4 Vaxen, per MHz: about 4000 MIPS or 4000 Vax 11/780 computers at a 1 GHz clock, so 8000 Vaxen per core at 2 GHz; with 4 such cores, that’s about 32,000 Vaxen in the Odroid XU4.

    Just makes me wonder where all that processor power goes… I ran a Vax 11 / 780 with logins for a couple of thousand people in about 1980. Somehow I don’t think I’d get 2000 simultaneous users on an XU4… Then again, with them running text terminals, maybe ;-)

    So, OK, those 2 x A72 cores in the RockPro64 ought to, alone, add up to 14 MIPS / MHz each, or 28,000 MIPS per core at 2 GHz – 56,000 MIPS for the pair. That makes it about 7/4 times (1.75 x) as fast as the XU4’s 4 x A15. Anything less than that is software not being in shape.
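    Doing the Dhrystone arithmetic explicitly (note that 4 DMIPS/MHz x 2000 MHz x 4 cores is 32,000, and 14 x 2000 x 2 is 56,000; the 7/4 ratio holds either way):

```python
def dmips(per_mhz, mhz, cores):
    """Total Dhrystone MIPS: roughly 'VAX 11/780 equivalents'."""
    return per_mhz * mhz * cores

xu4_big  = dmips(4.0, 2000, 4)    # 4 x A15 @ 2 GHz
rp64_big = dmips(14.0, 2000, 2)   # 2 x A72 @ 2 GHz

print(xu4_big, rp64_big, rp64_big / xu4_big)  # 32000.0 56000.0 1.75
```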

    Note again this is all INTEGER, so none of the floating point complexity enters the game (GPU, FPU, NEON). Integers are usually what matters for text tasks and general systems operations.

    So I’m left with the nagging belief that for the quad-A53 Rock64 to do so well on the floating point math benchmark, it must be doing SIMD processing using NEON; and for the RockPro64 to do so poorly, it must have “issues” in the software and heat management (no heatsink). That implies that with the same Debian on it that was on the Rock64, and with a big heat sink, it ought to go “way fast” for results.

    ALL the Rockchip 3399 chip users are very new products and it usually takes 6 months to a year for the software guys to catch up with the hardware advances… so “watch that space”. There are also several being touted with “Availability next year sometime”.. The Odroid-N1 is stated as being maybe next May or June… (at $110 or so – not cheap).

    So far my list of present or proposed RK-3399 boards is:

    Nano Pi M4
    Nano Pi NEO4
    Nano Pi PC-T4
    Odroid N1
    Libre Renegade (maybe…) and Renegade Elite (at $110 also)
    Rockchip RK3399 Sapphire
    Orange Pi RK3399 ($109 for the 2 GB version)
    RockPro64 (at $60-$100)

    Expect those prices to stay high for a year or two, then drop.

    FWIW, I saw one reference that claimed the Rock64 had an A17 core in it. Clearly it was a mistake, as that’s a 32 bit core. It has A53 cores at about 2.25 MIPS / MHz, or about 3375 MIPS per core at 1.5 GHz. Since there are 4 of them, that would be 13,500 total Dhrystone Integer MIPS (or Vaxen ;-). Given that I once spent something like $50,000 to get just one of them, and used at that… I’m tempted to get one of the Rock64 boards just to say I paid (given about $40) roughly 1/3 ¢ per Vax for 13,500 Vaxen ;-)
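    And the cents-per-Vax figure works out as claimed (board price assumed at $40 for the 2 GB model):

```python
rock64_dmips = 2.25 * 1500 * 4        # 4 x A53 @ 1.5 GHz -> 13500 "Vaxen"
price_usd = 40                        # assumed board price
cents_per_vax = price_usd * 100 / rock64_dmips

print(rock64_dmips, round(cents_per_vax, 2))  # 13500.0 0.3
```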

  18. CoRev says:

    E.M. I decided on and ordered the Pine Rock64 4 GB yesterday. If the Chinese can get any intel from my computing, they’re welcome to it. Board, power supply and the open case came to ~$69 directly from Pine. Now to wait and see how fast they deliver.

    I’m now in the process of taking one of my old AT power supplies and butchering it as a source for all these SBCs.

  19. jim2 says:

    Maybe you can try the Vax 11/780 OS.

  20. E.M.Smith says:


    Cute, very cute ;-)

    But that’s a BSD emulator for a Windows machine. It would be easier to dig out an old BSD source archive and just port it directly to the R. Pi, given that BSD already runs on it and isn’t particularly different from the BSD of old. (Part of why I like it so much: they took “If it ain’t broke, don’t fix it” to the extreme of “If it ain’t broke, leave it the Hell alone!”, and since it pretty much “just worked”, it pretty much didn’t change…) Which brings us to FreeBSD / NetBSD. Based on just that Version 7 BSD from the Vax Era. Every time I boot it, there’s a bit of nostalgia involved ;-)

    And the “Retro Computing” folks have already “gone there” 8-)

    Here’s a project that makes a PDP-11 clone out of a R. Pi and states it runs everything from V6 Unix, to Ultrix, to 2.11 BSD!

    The expensive bit was he had a custom injection moulded case made to match the PDP-11(!)

    FWIW, squirreled away in my archives somewhere is an Algol-68 Compiler matching my 2nd programming class… Then there are some folks making a Burroughs B-6700 clone and they have saved a copy of the OS (I think! – they were still trying to find a few bits like the OS side compiler last time I looked); so in theory I could make the environment where I learned to love computers…

    So yeah, retro computing, it’s a thing ;-)


    The SBCs are not a very attractive bugging target. That’s part of why I’m moving onto them. They also have a huge gaggle of highly technical folks looking them over really really closely so the odds of getting caught are very high. Finally, they don’t have a lot of room (physically or in terms of computes) to wedge in “hacks”. The hardware is open for inspection by everyone and the standard chip behaviour is clear. The firmware doesn’t have a whole lot in terms of “blobs” to exploit (and most of that has security hash values that are known so change is hard). The OS of choice is beyond their control (and in fact most vendors provide only modest OS support so lots of folks run code from someone else…)

    It just isn’t an attractive target and has near 100% probability any attempt will be discovered by someone eventually…

    Interesting idea on the PC Power Supply… I’ve got a half dozen of those (currently in machines headed for decommission / parts pile) and now that you remind me, it does have a nice regulated 5 VDC in it… Add a screw connector bar and make some cables with a GPIO pin connector on one end, screw lug on the other end, and you can power just about anything via GPIO (avoiding the limits of the silly uUSB connector with hard current limits before it overheats). With a big enough wire from PSU to power bar, you could even remote the PSU enough to not have much fan noise… (like in a box in the closet ;-)

  21. jim2 says:

    I know a guy that has the PDP-11 emulator. Looks pretty cool.

  22. Simon Derricutt says:

    EM – running long wires for your 5V risks interference and bad power supply. Might be better to run a 12V long wire feeding a DC-DC converter board close to the system. Wiring in to the GPIO looks good, providing that feeds to the ground plane (almost certainly) and the power-plane (maybe, since it depends on how they’ve routed it). Since each board is likely limited to around 10W total onboard, since that’s the limit of the connector they use, then an SMPS wall-wart at 12V and a few amps might be more efficient for half a dozen SBCs than the 200W or more multi-voltage PC power supplies, and somewhat easier and quicker to change out if it blows. Also likely no fan at all. A further advantage of the 12V-5V drops is that you can switch to run on batteries easily. Disadvantage is of course the extra boards needed and the associated failure rate, but the cost is of the order of a couple of dollars and the failure rate should be low.
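
    Simon’s point about wire losses is easy to check with a little arithmetic. A minimal sketch, where every number is an illustrative assumption (roughly 18 AWG wire at about 0.021 ohm per metre, a 3 metre one-way run, a 2 A load):

```shell
# Rough voltage-drop check for a long 5 V run.
# Drop = I * R_per_metre * length * 2 (current flows out and back).
awk -v amps=2 -v metres=3 -v ohm_per_m=0.021 \
  'BEGIN { printf "%.3f V drop\n", amps * ohm_per_m * metres * 2 }'
```

    At 5 V, a quarter-volt drop is about 5% of the supply and enough to trip a Pi’s brown-out detection; the same drop on a 12 V feed into a local DC-DC converter is harmless, which is the argument for the long 12 V run.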

  23. E.M.Smith says:


    Out in my garage is a parts bin with a bunch of TO-3 can linear regulators for 12 VDC and 5 VDC. Got them for free. Were I running a long wire, it would be fat and shielded…

    FWIW, the first electronics I ever built was a 5U4 tube based PSU w/ 5 VAC at God awful amps for tube heaters and 400+ VDC for plate volts. Made from an old TV transformer & tube. I have nostalgia for power supply units made from recycle ;-)

    In my office are 2 UPS boxes w/ 12 VDC batteries, should I bother to add a voltage regulator dropper circuit. But that’s less nostalgic than a chunk of my white box PC from 1980 living on driving Pis 8-)

  24. Simon Derricutt says:

    EM – I understand the nostalgia bit, and still have some old PC power supplies (and an 8086 system) that haven’t actually failed but just got stored. I’ve also got some of those TO3 voltage regulators, but I suspect I’ll never actually use them now. Modern stuff uses far less power, so doesn’t need as much cooling. As such, using a switch-mode dropper saves battery power if you ever need to go to batteries.

    Practical rather than nostalgic…. On the other hand, building the Pis into an Apple II case or some other 80s computer could be nice. Commodore PET? My Z80 box would have been way too large, since that had a couple of 8″ floppy drives in it. Maybe it’s about time I chucked out the CP/M manuals, too….

  25. CoRev says:

    E.M. yup, a PSU with 12+/-, 5+/-, and 3+/- volts out could be useful with all these little boxes we’re collecting.

  26. E.M.Smith says:


    The ones I have are from the National Semi ALIC (Advanced Linear) group and are pretty efficient. It did depend a little on how you made the rest of the circuitry, but it could be done without too much loss / waste of power. Besides, at 5 W needed it isn’t like there’s a lot of waste even at 50% ;-)

  27. E.M.Smith says:


    Seems folks put some money in my paypal donation site ;-)

    I just ordered both the Rock64 and the RockPro64. When they get here I’ll do my own set of benchmarks that are a bit more rigorous (with observation of cores used and explicit observation of the float type…) and post the results. Ought to be here in about 3 days.

    Thanks to all who have funded the “Tip Jar” and make such things possible!

    I’ll be able to do “head to head” comparisons of the Orange Pi One, Odroid C1, Odroid XU4, Rock64, RockPro64, Raspberry Pi M3 and if desired, the M2 as well. I still have a bit of coin left over so could spring for another board if it were a compelling idea ;-) (and under $75… otherwise the balance goes negative…)

    I’ve been pondering the benchmarks in the video and I think there are 4 confounders present.

    1) Different OS type and build. This can make a dramatic difference as it determines whether you are using hard float, soft float, or SIMD vector float (NEON – the CPU’s vector unit), and whether the word size is 32 bit or 64 bit on 64 bit machines.

    2) The time to boot the OS depends A Lot on what all software you are loading at boot time and that is NOT the same between builds. Then SystemD backgrounds some processes for parallel execution so when you get the login prompt it may not be done with all the booting tasks. Just enough to present a login prompt. It’s a lousy benchmark and it is variable with config choices. It can also depend more on “disk speed” than the board / SBC / CPU speed.

    3) The float math dependent benchmark was IN GENERAL in keeping with the expected relative performance of the boards, with the exception of the Rock64. It is my belief that it was using SIMD (Single Instruction Multiple Data, a.k.a. vector processing or NEON – all saying the same basic thing, that a lot of math was done in parallel in the vector unit). This implies that it was the software build that had SIMD enabled. I want to prove that, and see if I can extend it to the other boards.

    4) No reporting of heatsink type, status, or actual temperature during the test. No reporting of CPU frequency throttle action if any.

    Then there’s the simple fact that it would be a Very Good Thing to do specific benchmarks on One Core with Integer math, then Float math. Then multi-core. Now all that can take some time, but long term readers here will remember I did just such a comparison back on the Orange Pi One vs Raspberry Pi M3 when I first got it: and discovered that HEAT management was the biggest impact on performance. (And that the Pi M3 didn’t set the CPU freq to the spec but was set low to keep heat down.)
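
    For what it’s worth, a one-core run of that kind might look like the sketch below. The core numbering (4–7 as the big cores on the XU4) is an assumption to check against /proc/cpuinfo, and the parse helper is just one way to log sysbench 1.x scores:

```shell
# Sketch of a pinned one-core run (sysbench 1.x syntax; on the XU4 the
# big cores are assumed to be cpus 4-7 -- verify with /proc/cpuinfo):
#   taskset -c 4 sysbench cpu --threads=1 --cpu-max-prime=20000 run
# Helper to pull the score out of sysbench 1.x output for a log file:
parse_events_per_sec() {
  awk -F: '/events per second/ { gsub(/ /, "", $2); print $2 }'
}
# Demo of the helper against canned output:
printf 'CPU speed:\n    events per second:   345.67\n' | parse_events_per_sec
```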

    So I’ve got a pretty good benchmarking project to carry me through Christmas ;-)

  28. E.M.Smith says:

    FWIW, looking at all the “Usual Suspects” operating system choices that exist for the various boards I’ve got (and am getting) with an eye to being able to know how it was built with what compiler flags and all:

    I’m now doing a full system install / build of Gentoo on the XU4. I’ve reached the step that says “Update The System” (built in a chroot partition on top of an Armbian/Devuan system) and it says: “Note this will take a while (several hours; gcc takes about half the time). Be sure you have MAKEOPTS=”-j3″ as bigger values tend to saturate memory.”

    It IS taking several hours, but memory use does not seem to be that high. Htop does show 3 cores saturated with compiles and one core “busy” on and off with housekeeping, then the 4 lite cores doing occasional misc stuff. IMHO it ought to work with -j4 and then the 4 big cores could crank on the compiles.
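
    One way to pick a -j value is one job per core, capped by available RAM. The sketch below assumes roughly 512 MB per compile job, which is purely a rule of thumb (on a 2 GB XU4 it lands at -j4, in line with the guide’s -j3 advice):

```shell
# Pick a -j value for MAKEOPTS: one job per core, capped so each job
# gets roughly 512 MB of RAM (the 512 MB figure is an assumption; big
# C++ compiles may want more headroom per job).
cores=$(getconf _NPROCESSORS_ONLN)
mem_kb=$(awk '/^MemTotal:/ { print $2 }' /proc/meminfo)
mem_jobs=$(( mem_kb / (512 * 1024) ))
[ "$mem_jobs" -lt 1 ] && mem_jobs=1
jobs=$(( cores < mem_jobs ? cores : mem_jobs ))
echo "MAKEOPTS=\"-j${jobs}\""
```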

    A finger to the heat sink says it’s hot, but not too hot (i.e. not heat limiting); but this sucker is working hard! In the future it would make sense to explore their distributed compile model and use the 16 cores sitting in my cluster stack.

    I’m planning on using Gentoo for the cross-systems benchmarks since I’ll know that each system was specifically built on THAT hardware using all the features of that hardware (and I’ll know what those are).

    I’m also toying with the idea of swapping to Gentoo. It has a non-systemD option / default. It is a little bit of a Technical PITA as you MUST build it from source. We’ll see how the experience goes of building it for all these various boards and if it turns into a nightmare or not ;-)

  29. E.M.Smith says:

    Well, built the base userland. Now doing the kernel build (it involves a source download from Hardkernel for the XU4).

  30. E.M.Smith says:

    OK, well the guide I’m following uses genkernel in Gentoo, and an attempt to do
    emerge genkernel
    failed, so either I need to find how to do this long hand or how to get genkernel to actually install.

    In either case I’m kind of stuck (off script) at the moment and it’s time to go cook dinner (or at least get started figuring out what I’m going to make) so a good time to take a break.

    I’ll likely pick this up way late at night, or tomorrow. It will depend a bit on what happens as I browse “how to build a kernel under gentoo” pages ;-)

  31. E.M.Smith says:

    Having done a bit of R&D on it, I found I’ve laid out my SD card in an unacceptable way. That is, it will not be bootable. I put a 2 GB “swap partition” in the second partition, and the way Odroid does things, all the Uboot stuff goes up front (in a non-partition, non-readable area); then in partition 1 (mmcblk1p1) their web site says you will find a FAT partition with the /boot stuff, then in mmcblk1p2 an ext4 partition with the rest of the OS.

    I’m thinking that the boot process will not be happy when it looks in mmcblk1p2 and finds Linux Swap.

    So I need to tar off my Gentoo “userland” and then reflash my SD card with the boot blocks and without the swap partition.
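
    The tar-off step might look like the sketch below. The real-use paths in the comment (/mnt/gentoo and such) are examples only, and the demo does the same round trip on throwaway directories so it can be sanity checked anywhere:

```shell
# Round-trip sketch: tar the userland off, reflash, tar it back.
# Real use would be something like:
#   tar --numeric-owner -cpf /srv/gentoo-userland.tar -C /mnt/gentoo .
# Demo of the same round trip on throwaway directories:
src=$(mktemp -d); dst=$(mktemp -d)
echo 'gentoo test file' > "$src/etc-issue"
tar --numeric-owner -cpf "$src.tar" -C "$src" .
tar -xpf "$src.tar" -C "$dst"
cat "$dst/etc-issue"
```

    The --numeric-owner flag matters when the archive is restored from a different OS whose uid/name mappings differ from the one that made it.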

    One further complication: gparted reports my present operating SD card as having only one partition with everything in an EXT4 file system… I did find in the Odroid docs where it said you could use ext4 for partition 1 but with the caveat that then you could not edit boot.ini on Windows (a “feature” I really do not need) so it may be the current OS does just that.

    I also found some very nice docs on building a new kernel at the Hardkernel / Odroid site. So I think what I’m going to do is use their directions for the kernel build (but using the Gentoo chroot environment and tools so all the libraries used are Gentoo – it ought not matter but…), then copy all the kernel and boot bits off, tar off the userland, reflash / reformat the uSD card, and put it all back again. Then try to boot. IF that works (once it works?) then I’ll run through it all again “straight”, using my notes to make a scripted process.

    Lots of folks in SBC land seem to like cross builds from Intel PC land to SBCs. I’m FINE with a 25 minute kernel build (what Hardkernel says it will take on the XU4) and really like the idea of a self hosted build on the XU4. They do have a recipe for building the kernel on the XU4 directly, but it is for a generic Odroid Ubuntu set-up, not building Gentoo on it in a chroot; so “some assembly required” ;-)

    In summary: I now “have clue” on what to do to do it right. Typical: Do it once not quite right and with ‘recovery steps’ along the way; THEN write down how to do it straight through right the next time; THEN test your notes on a third build. This is sort of the typical “Rapid Prototyping” (Skunkworks) process we used on just about everything at Apple. You could sit in a room arguing with a committee for weeks (months?), or you could spend many days to weeks planning it all out and getting approval (only to find out in mid-project you didn’t know enough to plan it exactly right so need to revamp and get re-approval), or you could just jump in and “figure it out as you go along” and be done in one or two days. We’d choose the 1 or 2 days ;-) “Sure I don’t know how to do it and I’ve never done it before, but by tomorrow I’ll be the local expert.”

    FWIW, the “oddest” bit is learning Uboot. I know “it’s good for me”, but… There’s this bunch of stuff in what looks like “unused space” at the front of the uSD card (or eMMC) that is actually critical to the booting process. A couple of binary blobs from Samsung for the chip, the Uboot code, some other bits. All you can really do is brute force copy those bits. Somebody somewhere has some way to edit them, but it isn’t for the rest of us to know… So OK, for now I’ll just ‘dd’ them over… but for ultimate security it would be better to have a disassembled version of the binary blob looked over by folks with clue. It is almost certainly just some firmware and / or firmware patches to load the rest of the OS bits; but it is also a place where the Bad Guys could put unwanted things. Very unlikely, though. But for all the people who complain about the R.Pi having a binary blob that runs in the GPU as boot loader: I wonder if they realize others do that too, just where they don’t see it?

    OK, that’s my morning check in. Now back to work ;-)

  32. E.M.Smith says:

    Aaaannnd it looks like we have an answer to why Sysbench ran so fast on the Rock64:

    CPU performance with H5 compared to H3 is slightly higher at the same clockspeed but some workloads that benefit from either 64-bit or ARMv8 instruction set are significantly faster (eg. software making use of NEON instructions might perform almost twice as fast and the best example is the stupid ‘sysbench’ CPU pseudo benchmark which shows over 10 times better scores on the same hardware when compiled with ARMv8 settings).

    So your ARMv7 chips (32 bit) won’t do that. Then I’d guess that the Debian on the Rock64 was compiled with NEON on, and the Ubuntu on the RockPro64 was compiled with FPU support – or more likely was an ARMv7 version, and since it “just runs” on ARMv8 nobody bothered to recompile it for 64 bit (rather like my running v7 on all my stuff so it’s compatible in a cluster).
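
    A quick way to tell which case a given userland is in: check the ELF class byte of any system binary. (uname -m gives the kernel’s view – armv7l vs aarch64 – but the userland can still be 32 bit on a 64 bit kernel.) A sketch, using /bin/ls as the sample binary:

```shell
# Byte 4 of an ELF header is the class: 01 = 32-bit, 02 = 64-bit.
cls=$(od -An -tx1 -j4 -N1 /bin/ls | tr -d ' ')
case "$cls" in
  01) echo "32-bit userland" ;;
  02) echo "64-bit userland" ;;
  *)  echo "not an ELF binary?" ;;
esac
```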

  33. p.g.sharrow says:

    the ChiefIO says: “Sure I don’t know how to do it and I’ve never done it before, but by tomorrow I’ll be the local expert.” Lol, the story of my life! The way to learn to fly, FAST! Just jump off the cliff of ignorance and learn on the way down ;-) …pg

  34. E.M.Smith says:

    Well, that was interesting…

    I’ve built a kernel. No idea if it’s right or not… ;-)

    I’ve now got to try the “assemble a system” step. Mating boot loader, kernel, and userland into a uSD card. Then I’ll find out if it works or not.

    I checked some of the default configs and it included NEON, so one hopes this means it will be doing vector math in the NEON unit…

    If it boots at all ;-)
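
    For the record, checking a .config for NEON can be done with a grep; on a 32 bit ARM build the relevant symbols are CONFIG_NEON and CONFIG_KERNEL_MODE_NEON. The sketch demos the grep against a canned fragment; on a real build tree you would point it at the kernel’s .config instead:

```shell
# Confirm NEON support symbols in a kernel config.
# Demo uses a canned fragment; real use: grep ... /path/to/kernel/.config
cfg=$(mktemp)
printf 'CONFIG_NEON=y\nCONFIG_KERNEL_MODE_NEON=y\nCONFIG_SMP=y\n' > "$cfg"
grep -E '^CONFIG_(NEON|KERNEL_MODE_NEON)=y' "$cfg"
```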

    I did ‘make -j8’ for 8 jobs and got to watch about 25 minutes of all cores pegged. CPU temp rose from 47 C to 87-88 C. The ‘armbianmonitor -m’ command shows CPU frequencies too. Most of the time the BIG (A15) processors were at 1200 MHz but sometimes they hit 1700 MHz, once touching 1900 MHz. The A7 ‘little’ processors were mostly at 1500 MHz but did drop to 600 MHz when system load went low or all 8 could not be run at once.

    My take on this is that at absolutely 100% load, the passive heat sink isn’t quite enough and it frequency limits a few hundred MHz on the Big cores. (This was also found by others – that the XU4Q without a fan is 96% or so the performance of the one with a fan due to heat limiting).

    It drops to 600 MHz / 600 MHz at nearly no load when it goes to all idling.

    IMHO, this is a pretty good balance. IF I ever really want to leave it pegged at 100% for very long times, I can just point a desk fan at it. The rest of the time it’s silent. I like silent.

    I’m also attracted to the easy way the ‘menuconfig’ command lets me set all sorts of things in the kernel build. It’s an ncurses menu system (block of blue screens, check boxes in ( ) thing). I can see the potential of building custom kernels. Like systems where I know I’m going to run them on WiFi: build that into the kernel instead of into a kernel module. No more module-not-loaded problems ;-) Then there’s the ability to change the floating point choices – a nice way to do the same benchmark but against different parts of the hardware (soft float, FPU, NEON…). Only problem is there’s about 200 things I can set and I understand about a dozen of them… but I’m sure some web search time and time in the docs will bring enlightenment. Interesting to note several involve encryption of various things, like disks.

    Reaching the point of a userland build and a kernel build in a chroot is enough accomplishment for 24 to 36 hours to feel like I’ve accomplished something. :-}

    Well, back to “some assembly required” ;-)

  35. E.M.Smith says:

    Interesting, on bursts of work, the clock jumps up to 2 GHz on the Big cores. It hangs around 1.8 GHz to 1.9 GHz. Then when jobs run longer, it slowly works down to 1500 MHz and finally 1200 MHz. It seems to take about 30 seconds to start showing significant warm up, then a good while longer to reach heat limiting. A minute or two? That’s a lifetime of code execution at 1.8 GHz…

    Even with the big ‘ol heat sink, it would clearly like more cooling under any medium to large loads. Many cores over 4+ minutes.

    For typical workloads of a desktop, it’s got the way-fast kicker. It rarely loads up enough cores to get even close to heat limiting. Only with jobs that load up 4 to 8 cores at 100% does it start to notice heat at all. Interesting.
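
    The frequency watching can be scripted as a tiny helper in the style of ‘armbianmonitor -m’. This sketch takes the cpu sysfs root as an argument (real use would pass /sys/devices/system/cpu) so the demo can run against a fake tree:

```shell
# One sample of per-core frequency (in kHz) from cpufreq sysfs files.
sample_freqs() {
  for f in "$1"/cpu[0-9]*/cpufreq/scaling_cur_freq; do
    [ -r "$f" ] && cat "$f"
  done
}
# Demo on a fake sysfs tree (real use: sample_freqs /sys/devices/system/cpu):
root=$(mktemp -d)
mkdir -p "$root/cpu0/cpufreq" "$root/cpu4/cpufreq"
echo 600000  > "$root/cpu0/cpufreq/scaling_cur_freq"
echo 1800000 > "$root/cpu4/cpufreq/scaling_cur_freq"
sample_freqs "$root"
```

    Wrapped in a `while :; do … ; sleep 1; done` loop during a build, it shows the throttle-down from 2 GHz as the heat sink saturates.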

    FWIW, hit a small snag on making the initial ram disk. I’ve got Odroid Ubuntu directions and I’m running an Armbian / Devuan world – inside a Gentoo chroot… So I’m going to shut down and do some R&D into how Gentoo does it, and maybe figure out what I need to do next.

    In theory, I’ve got modules built and installed, kernel built and installed, and just need to do the ramdisk build. Then tidy up the uSD card layout… (dump stuff, reformat, reload stuff).

    For now I’m getting a bit fuzzy (woke up at about 5 AM for no good reason) so I think it’s time to veg with the TV, a beverage, and a laptop, learning how to make initrd in several different releases ;-)

  36. E.M.Smith says:

    This concludes the Rapid Prototyping phase of a Gentoo bringup.

    Last night I’d got everything done, I thought, and tried a boot. It failed on “could not find root device”. Being sleepy, I just left it for this morning. This morning I read up more on Uboot, and discovered that I’d not changed the UUID setting in /boot/boot.ini file. This is the parameter that says what disk to look for by its Universal ID value. A quick “file -s /dev/sda1” returned the needed value, an edit of boot.ini and I was good to go.
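
    That boot.ini edit can be scripted. The setenv line below is an assumed example modeled on Hardkernel-style images (check the real file), and in actual use the new UUID would come from something like `blkid -s UUID -o value /dev/mmcblk1p2` rather than being hard coded:

```shell
# Scripted boot.ini root-UUID update, demoed on a temp file.
ini=$(mktemp)
echo 'setenv bootrootfs "console=tty1 root=UUID=11111111-2222-3333-4444-555555555555 rootwait rw"' > "$ini"
new='aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee'   # real use: new=$(blkid -s UUID -o value /dev/mmcblk1p2)
sed -i "s/root=UUID=[0-9a-fA-F-]*/root=UUID=${new}/" "$ini"
grep -o "root=UUID=${new}" "$ini"
```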

    On a boot attempt, I was again bitten by the “2 incompatible ext4 file system types” as I’d made the file system on the R.Pi (Devuan 2.0) but the Armbian Uboot land didn’t understand that kind of ext4 so could not do an ‘fsck’ file systems check. I gave it the root password to “do maintenance” then just looked around. It was, in fact, my Gentoo.

    So, at this point, I’ve built from sources a Gentoo “Userland” and kernel. I’ve installed it onto a uSD card along with the Uboot / Trustcode / Initramfs etc from the Armbian chip. I’ve used the Armbian Uboot code (and ramdisk) to boot it. (Note to self: Someday do that whole “build Uboot from sources thing… but not needed right now). It booted and ran.

    OK, now I can go back and “do it all over again” following my notes and polishing them up, then post the “How To” (for your reference … and for mine in 3 years when I’ve forgotten how I did it ;-)

    First thing I’m going to try, just for grins, is an ext3 root file system. I’m just really not interested in using ext4 what with the incompatible versions and now the newest kernel having sporadic corruption on some specific hardware types TBD, though rare.

    Once I have “done it again following the script”, then I’m set to put Gentoo with a known set of build flags on all boards I have so that I can know exactly what each is doing in terms of float type math. Then I can also optimize each of them for max performance as desired.

    Then I’ll have a more useful set of benchmark information to publish.

    FWIW, the “thumbnail” how to:

    The first 4 MB of the SD card is the Master Boot Record and the Partition Table and Uboot and some Trustzone code from Samsung and a couple of other bits. Some of it, the Uboot parts, can be compiled from sources. Other bits, like the Partition Table (set by gparted or other disk formatter) and the binary blob from Samsung “just are”. I chose to copy the known working set from my Armbian/Devuan desktop chip with
    dd bs=10M if={Armbian chip carriers like /dev/sda} of={destination chip like /dev/sdb} count=4

    This works best if the two chips are the same size… or the From is smaller than the To. Mine were not: I copied a 32 GB chip onto a 16 GB chip. This caused some grief in gparted, where it complained I had a file system longer than the 16 GB chip… I just deleted the ext file system and it all reverted to “unformatted” (or so gparted thinks… that first 4 MB still matters…). Then I made an ext4 partition with 4 MB “unused” preceding it (to preserve all that boot code). At that point I could “tar” the Gentoo userland into that ext partition…

    What I’d missed was that the format of the ext partition changed the UUID so I needed to update that in /boot/boot.ini in the ext partition. OK, no big…
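
    The “copy the boot area to a file, restore it later” variant goes roughly like this sketch; the device names in the comments are examples only (triple-check before any real dd), and the demo does the same round trip on scratch files standing in for the card:

```shell
# Real use (device names are EXAMPLES -- triple-check before any dd!):
#   dd if=/dev/sdb of=bootarea.img bs=1M count=4
#   ...repartition, leaving the first 4 MB untouched...
#   dd if=bootarea.img of=/dev/sdb conv=notrunc
# Demo of the round trip on scratch files standing in for the card:
card=$(mktemp); img=$(mktemp)
dd if=/dev/urandom of="$card" bs=1024 count=64 2>/dev/null
dd if="$card" of="$img" bs=1024 count=8 2>/dev/null
head -c 8192 "$card" | cmp - "$img" && echo "boot area copy verified"
```

    The conv=notrunc on the restore matters: without it, dd to a regular file would truncate, and the habit protects the rest of the card image when scripting.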

    FWIW, this is not an ideal approach. The Armbian build / Uboot is using somewhat different versions of device drivers and other code in the ramdisk at boot (viz. the fsck fail). It is more of a “useful hack” to avoid learning “How To Build and Configure Uboot From Scratch”. I spent a good 8 to 10 hours on that and it’s a very complicated bit of kit. Essentially a mini-OS with loads of parameters and even its own file system (that initial ram disk…). It has more options than I care to think about. On the first boot attempt, when it failed to find /boot, I was dumped into its shell, and commands like ls and cat and such are there. A sort of micro-Linux world, but where all the parameters are different and all your commands slightly off or missing. Thus my decision to go with the hack for now.

    Now that I have a running Gentoo, I can follow their guide for how to make their boot environment as they like it. I’m no longer working from a chroot inside an Armbian with all that indirection and minor complexity to unravel.

    Just a reminder: Why Gentoo?

    It is a source build, so I will KNOW how the kernel and packages were built, with what flags set, what optimizations, on what hardware.

    It runs on EVERYTHING. Not all OSs were available on all the different board types. Gentoo is.

    It has a non-Systemd version by default and you can force it to stay that way with option setting. I don’t need systemD second guessing me on benchmark machines.

    It is fast. Any benchmarks are likely to be the best you can get. Compare Ubuntu, which is designed to have Eye Candy and is compiled to work in the most places possible as a fat distribution, even if that means compromising speed.

    IMHO that’s what bit the demo that started this thread. Most likely, an armhf v7 32 bit build was run on the v8 instruction set (64 bit) RockPro64 while the Debian on the Rock64 was compiled as v8 64 bit code. The v7 Ubuntu would “just work” on all the related 32 bit v7 machines as well as the 64 bit v8 machines, while avoiding any teething problems with things like 64 bit FireFox (that I ran into some time back on the R.Pi). Then, as a result, the Rock64 was using NEON parallel 64 bit vector math while the RockPro64 was using 32 bit FPU math at best.

    OK, enough of that. I’m going to tidy up my notes, make a test install onto ext3, and try to wrap this up today. FWIW, I actually copied the Uboot area and the Gentoo Userland to hard disk as file blobs rather than “chip to chip”. This lets me do things with just one chip carrier and it means I don’t have to save them off the chip again… nor recompile Gentoo from scratch if I don’t want to. So that ext3 test will be very fast. The write-up will reflect the use of a hard disk as intermediary storage spot.

  37. Pingback: New Toys! | Musings from the Chiefio

Comments are closed.