Odroid-C2 Performance and Order

One of my interests is maximal system performance per $ and per Watt. This comes directly out of my time running a Cray supercomputer site; it is a calculation you do constantly. By definition, a supercomputer is at the limit of available computes. (One common definition in the industry limits the term to the top few percent of performance at any one time, thus the competition to be on the Top 500 list.)

So when people talk about making a ‘supercomputer’ out of a stack of a few dozen Raspberry Pi boards they are being extraordinarily economical with the truth… in fact they are making a very very small ‘cluster computer’.

In supercomputing circles (the real ones…) it is very common to worry about heat load and power consumption. At the limit, they drive ‘what is possible’ as much as software design and methods do. It isn’t much good to build a supercomputer that melts if you run it full speed, or that costs more to build and run than the answers are worth.

https://en.wikipedia.org/wiki/Supercomputer

Energy usage and heat management
See also: Computer cooling and Green 500

A typical supercomputer consumes large amounts of electrical power, almost all of which is converted into heat, requiring cooling. For example, Tianhe-1A consumes 4.04 megawatts (MW) of electricity. The cost to power and cool the system can be significant, e.g. 4 MW at $0.10/kWh is $400 an hour or about $3.5 million per year.
[…]
The energy efficiency of computer systems is generally measured in terms of “FLOPS per watt”. In 2008, IBM’s Roadrunner operated at 376 MFLOPS/W. In November 2010, the Blue Gene/Q reached 1,684 MFLOPS/W. In June 2011 the top 2 spots on the Green 500 list were occupied by Blue Gene machines in New York (one achieving 2097 MFLOPS/W) with the DEGIMA cluster in Nagasaki placing third with 1375 MFLOPS/W.

Because copper wires can transfer energy into a supercomputer with much higher power densities than forced air or circulating refrigerants can remove waste heat, the ability of the cooling systems to remove waste heat is a limiting factor. As of 2015, many existing supercomputers have more infrastructure capacity than the actual peak demand of the machine – designers generally conservatively design the power and cooling infrastructure to handle more than the theoretical peak electrical power consumed by the supercomputer. Designs for future supercomputers are power-limited – the thermal design power of the supercomputer as a whole, the amount that the power and cooling infrastructure can handle, is somewhat more than the expected normal power consumption, but less than the theoretical peak power consumption of the electronic hardware.
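
The quoted arithmetic is worth checking for yourself, since $/kWh is the lever everything else turns on. Here is the same sum as a quick shell sanity check (plain bc, nothing exotic):

# 4 MW = 4000 kW; at $0.10/kWh, that is dollars per hour:
echo "4000 * 0.10" | bc      # 400.00  -> $400 an hour
# And per year (24 hours x 365 days):
echo "400 * 24 * 365" | bc   # 3504000 -> about $3.5 million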

So that’s why I keep looking at heat and cooling issues. And computes / Watt.

Because it matters. Even in my tiny little miniature ARM based cluster.

Sidebar: Aren’t all Supercomputers made with Intel or large custom chips?

Well, despite the Intel advertising, no. It changes over time. There’s a nice graph in that supercomputer wiki showing the changes.

Top500 Computers processor type over the years.

Whoever wins the $/compute and computes/Watt race rises to dominance over about 3 years (that’s the lifespan of a supercomputer as it goes from first in class to out of the race on $/compute and computes/Watt…). Interesting to note, Cray is making a supercomputer out of high end ARM chips…

https://www.top500.org/news/cray-to-deliver-arm-powered-supercomputer-to-uk-consortium/

Cray to Deliver ARM-Powered Supercomputer to UK Consortium
Michael Feldman | January 18, 2017 04:00 CET

Cray is going to build what looks to be the world’s first ARM-based supercomputer. The system, known as “Isambard,” will be the basis of a new UK-based HPC service that will offer the machine as a platform to support scientific research and to evaluate ARM technologies for high performance computing. Installation of Isambard is scheduled to begin in March and be up and running before the end of the year.

Prof Simon McIntosh-Smith, leader of the project and Professor of High Performance Computing at the University of Bristol, made a presentation about the upcoming system at the Mont-Blanc ARM event taking place at the Barcelona Supercomputing Centre (BSC) this week. “I think this is really exciting for a number of reasons,” McIntosh-Smith told TOP500 News. “It’s one of, if not the first, serious, large(ish)-scale ARMv8 64-bit production machines. And it’s the first time Cray has explicitly announced an ARMv8 product meant for more than just prototyping.”
[…]
Product or not, Isambard looks to be a formidable machine – probably on the order of tens of teraflops. Isambard will include over 10,000 64-bit ARMv8 cores, in addition to a smattering of x86 CPUs, Intel Knights Landing Xeon Phi processors, and NVIDIA P100 GPUs. The project’s rationale for this architectural diversity is to compare application performance across a range of processors on the same machine. From Cray’s perspective, such diversity fits neatly into its vision of a heterogeneous computing future. “Scientists have a growing choice of potential computer architectures to choose from, including new 64-bit ARM CPUs, graphics processors, and manycore CPUs from Intel,” said McIntosh-Smith.

So there are several things to note here. First, you can see why I’m looking at ARM chips instead of Intel. We are near an edge… Second, it’s clear from the last paragraph that “heterogeneous computing” is here and now. Even Cray is mixing architectures in a system box. (Or more accurately, boxes.) Finally, you can also see when your Pi Pile starts to reach real supercomputer scale… at about 10,000 cores, or 2,500 boards… so clear out the garage, get a bigger A/C, and remember you will need something with Gigabit Ethernet, not the Raspberry Pi’s 100 Mb, and a Giant Switch for those 2,500 boards to talk through…

Clearly I’m NOT making a supercomputer out of Pis… just a cluster with roughly the performance of a 1980 supercomputer, matching what was in use when the GISS models were first written…

Oh, and there is hope that the workload of any future effort at ‘porting’ the climate models to ARM chips will shrink a lot:

The UK’s Met Office is also a partner in the effort, since they want to evaluate Isambard’s ability to run its own weather and climate simulations. The rationale here is to see if these compute-heavy workloads can be supported on a more energy-efficient platform. These workloads are currently being run on their in-house 8-teraflop (peak) Cray XC40 supercomputer powered by x86-based Intel CPUs, specifically the 18-core Xeon E5-2695 v4 processors.

So it isn’t all that silly to be looking at ARM chips when the “models” run on Intel… because it is “Intel for now…”. But clearly my runs won’t finish in minutes or hours like theirs do. Mine will take days, weeks, or months. And be at reduced granularity. (Later I can look into that 2nd “garage” and the order for the other 2,496 Pi boards ;-)

Looking at Nanos

But those considerations still apply, even to my little cluster system. How many $/compute? How many computes/Watt? How much cooling and what interconnect speeds?

So to explore the low end of $/compute I looked at the Orange Pi One. I really wanted to look at the Orange Pi Zero for $9, but they were sold out… the One is essentially the same system, plus some added I/O bits and a larger memory option, so a reasonable test case. The result of that “look” was to discover that the Orange Pi One is grossly speed limited by a lousy heat removal system (i.e. none… no heat sink, and the tiny little CPU board gets way too hot trying to be one, so the cores get their MHz downgraded to sloth as needed) and that the build quality leaves much to be desired (in particular, the HDMI system is sucky).

So if Cheap / Compute is out, why not just buy another Raspberry Pi Model 3?

Well, that was “the plan”. Or more accurately, the default. I’ve still got one more slot in the Dogbone Case and figured I’d just buy another PiM3 and “move on”. BUT, as is my way, first you do a rapid cross-check of the nearest competition to see who has good $/compute and decent computes/Watt, and what thermal management is like. This was facilitated by a nice little comparison page at the DietPi site, which quickly confirmed that the “tiny board” systems have consistent heat management issues. It also showed the Odroid-C2 was pretty good across the board.

Further looking about found this interesting discussion page:

https://forum.armbian.com/index.php/topic/1580-nanopi-neo-air/

That was when I was thinking maybe a NanoPi would work better than the Orange Pi while still being incredibly small (and before finding the DietPi page with heat data…).

The discussion is long, but very interesting. Particularly where they look at heat limits to performance. See, the dinky boards are marketed to the IoT folks for their Internet Of Things (or Idiot Of Things, IMHO) use. Those folks do NOT expect continuous operation. They want a short fast spike of performance, then back to idle, and don’t care about heat load that much.

Posted 07 July 2016 – 06:46 PM

@Per-Mattias Nordkvist

I guess the hardware has some similarities with NanoPi M1. Armbian has a working image for my M1, so maybe if you are lucky :)

the board is very small, which is very nice, but the huge issue is the crap power design to feed the H3 SoC (the same as M1)

Either you need a very big heatsink OR you downclock the CPU as much as possible to limit overheating.

You get for what you pay :)

edit: currently my M1 with a copper heatsink, idling @ 240MHz has a temperature of 57°C
[…]
Posted 08 July 2016 – 08:41 AM
wildcat_paris, on 08 Jul 2016 – 07:27 AM, said:

so NanoPi M1 is crap and probably NanoPI Neo as well. (“you get for what you paid”)
Using a H3 SoC and using it at 50% because of low cost voltage management is madness, isn’t it?

Sorry, but we already know that it’s not the voltage regulator that makes the real difference regarding overheating (OPi One/Lite use exactly the same as NanoPi M1/NEO). Xunlong seems to use copper layers inside the PCB on Orange Pis to improve heat dissipation away from the SoC. I read some people complain about this since other PCB components get hot too (the SD card, for example). So depending on perspective, less heat dissipation through PCB design on the NanoPis can be considered both bug and feature.

Not all use cases require 100% CPU load on all cores, in fact when trying to use a H3 device as an IoT node you might want to limit max cpufreq to something like 600 MHz (when I did some tests with OPi PC then this setting ensured consumption never exceeding 2.5W while still being faster than most dual-core ARM SoCs when workloads are multithreaded)

Ok, we now see that the micro I[di]oT-targeted boards are made for intermittent use, and the design expects to thermal limit rapidly, then cool for a long while. OK, scratch the micro-boards, and anything without a decent heat sink, from the shopping list…
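
For reference, the ‘600 MHz cap’ trick from that quote is easy to reproduce on most Linux SBC kernels. A minimal sketch, assuming the standard cpufreq sysfs layout (vendor kernels vary, so confirm these paths exist on your board first):

# Cap every core at 600 MHz (the value is in kHz) to stay inside a small board's thermal budget:
for cpu in /sys/devices/system/cpu/cpu[0-9]*; do
  echo 600000 | sudo tee "$cpu/cpufreq/scaling_max_freq" > /dev/null
done
# Then see what the governor actually settles on under load:
cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq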

I may yet get one just to play with. They have a heat sink option where the heat sink is as big as the board. It is discussed on page three of that forum (with picture):

Posted 29 July 2016 – 08:41 AM

FriendlyArm released a NanoPI NEO specific heat sink:
[…]

I am the owner of a NanoPI NEO, using the modified NanoPI M1 Armbian image from this thread.

I am not an expert in SBC design. But this board gets very hot. The heat sink is the same size as the SBC! Does the SBC have a design problem, e.g. the power supply? Or is the CPU so powerful that you must fix the heat problem with an appropriate heat sink, or in software?

It looks like even with the heat sink, it doesn’t run cool… As I’m not that interested in a “The Fan Is Your Friend” design, my general idea is to move to more metal (larger board and larger heatsink).

Yet on page 4, with some adjusting of ‘fex’ settings, the temperature got down to acceptable levels:

Posted 06 August 2016 – 12:24 AM

this is on the nanopim1 image… I did update the nanopine.fex to the one linked to above.

If I modprobe sunxi_pwm I get a /dev/sunxi_pwm device and some sysfs interfaces but they don’t seem to match up with any documentation I can find.

root@nanopi-neo:~# ls
a.out  test.c
root@nanopi-neo:~# armbianmonitor -m
Stop monitoring using [ctrl]-[c]
Time        CPU    load %cpu %sys %usr %nice %io %irq   CPU
01:21:22: 1008MHz  0.16   0%   0%   0%   0%   0%   0%   52°C
01:21:27: 1152MHz  0.14   0%   0%   0%   0%   0%   0%   52°C
01:21:32:  240MHz  0.13   0%   0%   0%   0%   0%   0%   52°C
01:21:37:  240MHz  0.12   4%   1%   0%   0%   1%   0%   53°C
01:21:42:  240MHz  0.19   1%   1%   0%   0%   0%   0%   55°C
01:21:47: 1008MHz  0.31   1%   1%   0%   0%   0%   0%   53°C
01:21:52:  240MHz  0.28   1%   1%   0%   0%   0%   0%   53°C
01:21:58:  240MHz  0.26   4%   2%   1%   0%   0%   0%   53°C

But that is starting to look like work, and a Raspberry Pi M3 sized board with a heat sink on it isn’t that big on my desktop… ( I don’t need to embed it in a coffee pot…)

FWIW, lower down page 4 is a discussion of the board picking up RF interference if no console is attached and the need to attach a resistor to one pin to fix it…

From page 5, a set of tests with heat sinks. Nice graph in the original… Note that ALL the cases thermally throttle the CPU’s max performance…

From left to right:

NanoPi NEO/256 w/o heatsink lying flat on a table, SoC/DRAM on the lower PCB side so no airflow possible (~480 MHz average throttling)
NanoPi NEO/256 with FriendlyARM’s own heatsink operated vertically to let convection help somehow (~840 MHz average throttling)
NanoPi NEO/256 with tkaiser’s standard H3 heatsink operated vertically (~690 MHz average throttling)
OPi Lite with tkaiser’s standard H3 heatsink operated vertically (~900 MHz average throttling)
OPi PC with tkaiser’s standard H3 heatsink operated vertically (~980 MHz average throttling)

Using FA’s own heatsink is an improvement compared to cheap heatsinks both regarding heat dissipation as well as stability (FA’s heatsink is mounted perfectly and board + heatsink are ready for heavy vibrations). But as tests with OPi Lite and OPi PC show obviously PCB size and condition matter (copper layers inside PCB and the larger the size the better the heat dissipation — Orange Pi Plus 2E for example performs better under identical conditions most probably due to its larger PCB size)

In case you want to buy NEO (or NanoPi Air later — I still believe they share the form factor and heatsink) you better order FA’s heatsink too if you plan to operate the device under constant high load (which it is IMO not made for!). Regarding my ‘standard H3 heatsink’:

Thus my earlier statement on the Orange Pi thread about not being interested in the dinky boards anymore. It’s a heat load problem. Even with an added heat sink. FINE for very sporadic use. Pointless for “run compile for an hour” or “run model for a week”…

Further down we have:

But please don’t be surprised that performance numbers reported will be lower compared to other H3 devices. NEO uses a single bank DRAM configuration and DRAM clockspeed is way lower than on all other H3 boards. Therefore performance will be lower anyway but using cpuminer’s benchmark mode you might get the idea how different heatsink/cooling solutions ‘perform’. But to be honest: NEO is not made for performance anyway so better use the heatsink as a simple matter of precaution and forget about benchmarking this tiny board at all :)

In case anyone wants to build a HPC cluster with NEOs (weird to say the least ;) ). I prepared an archive some time ago to do reliability testing with Pine64 that contains already a script to collect cpuminer benchmark numbers and feeds them into RPi-Monitor template therefore the efficiency of the cooling approach in question can be measured/compared directly: […] (see the screenshot there to get the idea)

Here we see that even the memory design of the NanoPi NEO boards is compromised.

OK, it goes on like that for many pages. The key take-away? The small (dinky) IoT targeted boards have significant thermal issues AND are designed from the ground up for intermittent use at limited performance. So “Get a bigger board” unless you have that use case.

Want a ‘distcc’ cluster that has 40 nodes and finishes a kernel build in 5 minutes, so it doesn’t have enough time to get hot, then sits idle for half a day? Sure, it’s fine. Want to run a model for a week? Uhhh…
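
For anyone who hasn’t set one up: a distcc farm really is about two lines per node. A minimal sketch, with pi1 / pi2 / pi3 standing in for whatever your cluster members are called, the distcc package installed everywhere, and the subnet being my assumption:

# On each helper node: run the distcc daemon, allowing your LAN:
distccd --daemon --allow 192.168.1.0/24
# On the build master: list the helpers, then spread the compile across them:
export DISTCC_HOSTS="localhost pi1 pi2 pi3"
make -j16 CC="distcc gcc"    # roughly 4 jobs per quad-core node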

So that’s what started me looking at the bigger boards, and that Odroid family in particular. That they already have large heat sinks tells me they saw this already and built for it.

The Odroid-C2

I had a different benchmark page saved, but it seems to have gone walkies… so here’s this one:

http://www.cnx-software.com/2016/03/01/raspberry-pi-3-odroid-c2-and-pine-a64-development-boards-comparison/

It has less of the direct Odroid-C2 commentary than I’d wanted… This caught my eye, though:

Boards are likely to show similar performance in synthetic benchmarks, except ODROID-C2 which should show a significant lead. However, I could not find benchmarks for Pine A64 right now, and as we’ve seen this morning, Aarch64 improves performance significantly over Aarch32, so current benchmarks are likely to become invalid if/once Raspberry Pi 3 gets a 64-bit port. For example, Pine A64 is currently 15 times faster in the sysbench CPU benchmark (prime number computation) compared to Raspberry Pi 3, and it’s clearly not showing the true performance difference.

I’m presently running armhf on the Pi Model 3 (not arm64) so that it is binary compatible with the Pi M2 cluster members (and because the Devuan arm64 had a few more bugs on first test of it… but over time that will resolve). Especially during model runs, which use significant DOUBLE, i.e. 64 bit, math, that difference will start to matter a lot…
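
(As an aside: if you lose track of which build a given node is actually running, the check is quick; both commands are stock on any Debian-family system:)

uname -m                     # kernel architecture: armv7l (32 bit) vs aarch64 (64 bit)
dpkg --print-architecture    # userland build: armhf vs arm64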

The bottom line is that it sure looks like a better bang for the buck, and runs fast and well.
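
If you want to generate your own numbers, the sysbench prime test quoted above is a one-liner. A sketch; note the CLI changed between versions (sysbench 0.4.x uses the --test= form shown here, while 1.0+ spells it ‘sysbench cpu run’):

# Single-threaded prime computation up to 20000, the classic SBC comparison run:
sysbench --test=cpu --cpu-max-prime=20000 run
# Add --num-threads=4 to load all four cores instead.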

Found the other page:

https://www.jeffgeerling.com/blog/2016/review-odroid-c2-compared-raspberry-pi-3-and-orange-pi-plus

Jeff Geerling
Review: ODROID-C2, compared to Raspberry Pi 3 and Orange Pi Plus

March 24, 2016

tl;dr: The ODROID-C2 is a very solid competitor to the Raspberry Pi model 3 B, and is anywhere from 2-10x faster than the Pi 3, depending on the operation. The software and community support is nowhere near what you get with the Raspberry Pi, but it’s the best I’ve seen of all the Raspberry Pi clones I’ve tried.

[…]
Another primary competitor in the space is the ODROID, from Hardkernel. The original ODROID-C1 was already a decent platform, with a few more features and more RAM than the comparable Pi at the time. The ODROID-C2 was just announced in February, and for $39 (only $5 over the Pi 3 price tag) offers a few great features over the Pi 3 like:

2GHz quad core Cortex A53 processor (Pi 3 is clocked at 1.2 GHz)
Mali-450 GPU (Pi 3 has a VideoCore IV 3D GPU)
2 GB RAM (Pi 3 has 1 GB)
Gigabit Ethernet (Pi 3 has 10/100)
4K video support (Pi 3 supports HD… drivers/support are usually better for Pi though)
eMMC slot (Pi 3 doesn’t offer this option)
UHS-1 clocked microSD card slot (Pi 3 requires overclock to get this speed)
Official images for Ubuntu 16.04 MATE and Android (Pi 3 uses Raspbian, a Debian fork)

The Pi 3 added built-in Bluetooth and WiFi, which, depending on your use case, might make the price of the Pi 3 even more appealing solely based on a feature comparison.

For a desktop, the WiFi and Bluetooth might matter; in a cluster, as a node, not so much…

I’m not sure where he got the 2 GHz for the CPU clock, as what I’m seeing sold is 1.5 GHz. Perhaps an overclock? Early-model overambition?

One of the first major differences between the Pi 2/3 and the C2 is the massive heat sink that’s included with the ODROID-C2. Based on my observations with CPU temperatures on the Pi 3, the heat sink is a necessity to keep the processor cool at its fairly high 2 GHz clock. The board itself feels solid, and it feels like it was designed, assembled, and soldered slightly better than the larger Orange Pi Plus, on par with the Pi 3.

One extremely thoughtful feature is the ODROID-C2 board layout mimics the Pi B+/2/3 almost exactly; the largest components (e.g. LAN, USB, HDMI, OTG, GPIO, and even the screw holes for mounting!) are identically placed—meaning I can swap in an ODROID-C2 in most situations where I already have the proper mounts/cases for a Pi.

Here we see the heat issue addressed, and then the ‘drop-in replacement’ physical design. On the software side, it looks like you can pick an Ubuntu MATE desktop, or a more generic Debian (which implies an easy Devuan path…)

The official Ubuntu MATE environment is nice, but for better efficiency, I used the ODROBIAN Debian Jessie ARM64 image for ODROID-C2 instead. The download is only 89 MB compressed, and the expanded image is ~500 MB, making for an almost-instantaneous dd operation. There are some other images available via the community as well, but ODROBIAN seems to be the most reliable—plus it already has excellent documentation!
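
(For reference, writing such an image is a one-liner. A sketch, with a hypothetical file name, and /dev/sdX standing in for your card reader; verify the device with lsblk first, since dd will cheerfully overwrite the wrong disk:)

# Decompress on the fly and write to the SD card or eMMC module:
gunzip -c odrobian-jessie-c2.img.gz | sudo dd of=/dev/sdX bs=4M conv=fsync status=progress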

There follows a lot of specific benchmarks and power / compute measurements. Then the “bottom line” paragraphs:

The ODROID-C2 is a very solid competitor to the Raspberry Pi model 3 B, and is anywhere from 2-10x faster than the Pi 3, depending on the operation. Its network performance, CPU performance, and 2 GB of RAM are extremely strong selling points if you thirst for better throughput and smoother UIs when using it for desktop computing. The Mali GPU cores are fast enough to do almost anything you can do on the Raspberry Pi, and the (smaller, but dedicated) community surrounding the ODROID-C2 is quick to help if you run into issues.

The ability to easily install Android or Ubuntu MATE (or one of the community distros, like ODROBIAN) is a benefit, and instructions are readily available (more so than other clones).

So the bottom line is I bought one from their North American vendor, Ameridroid, where it cost me $42 plus some small shipping charge (something like $4). I added the eMMC option for $21, as that “other” evaluation said it made a big difference. This will also let me do a “how big is big” comparison of running the OS from eMMC vs SD card vs USB disk.

All up, the package was $70-something. This will likely become my new Desktop Daily Driver if it performs as advertised. The 1.5 GHz clock is a significant step up from 1.2 GHz (300 MHz is 25% of 1200 MHz, so a 25% uplift just there); then the double RAM size will eliminate swap entirely for anything I’ve been doing, while getting much closer to balance on “Amdahl’s Other Law”. Finally, the GigE Ethernet will be helpful “someday” when the rest of my stuff gets that speed… but for now it mostly just says the I/O subsystem can handle that speed, so will be less limiting. The only real disappointment is the USB 2.0 instead of 3.0 (I have 3.0 hubs and disks already…)

So between faster OS from eMMC, double the memory, and a 25% per CPU speed boost, and with a huge heat sink to prevent CPU throttling, it is likely “worth it”. Though with the eMMC at 1/2 the board cost, it will need to prove itself to me…

This board is the same form factor as the Raspberry Pi, so fits in the same case. IFF I need any future capacity expansion as the models start to run, having the equivalent of 5 cores of PiM3 speed per board, just from clock rate, will be handy. Should the eMMC prove “nice but ‘feh’ for a cluster node”, the price of a stack of Odroid-C2s is about 4 x $42 = $168 plus shipping, while the Pi Model 3 is about 4 x ($39 + $5 heat sink) = $176 from Amazon. So roughly a wash on price…

Since I was going to buy another PiM3 by default for the present stack, simply moving my desktop to the stack and putting an Odroid on the desktop is essentially a wash on costs. The software support reputation for it is also fairly good.

In Conclusion

So that’s what all I went through looking at “other boards” and why I ended up buying an Odroid-C2 as an evaluation unit.

Clearly the micro-boards don’t cut it for a personal cluster that sees any heavy use. Clearly also the $/compute and computes/Watt are better on the Odroid-C2 than on the Pi M3. The only real questions are just how much trouble it is to get Debian / Devuan working on it, and just how much faster it really is.

Oh, and I didn’t go for the higher-end Odroids just because I hate fans… but that they use a fan on the next step up says they are looking closely at heat load issues.

In any real final cluster build-out with a couple of dozen boards, and a real dedicated ethernet switch connecting them, that GigE will matter a lot too. As will the higher clock rate and the double memory. Memory is used to cache disk pages so it can make up for a limited interconnect speed or slower disks, to some extent.

It ought to arrive in about a week. “Watch this space” for an update when I fire it up and see what surprises might be hiding in it…


17 Responses to Odroid-C2 Performance and Order

  1. Dan_Kurt says:

    Slightly off topic. When will Apple abandon Intel chips for CPUs on the Macintosh and go to ARM chips or clusters using their proprietary ARM chips which keep getting more powerful every year while Intel chips are improving at a slower rate?

    Dan Kurt

  2. E.M.Smith says:

    @Dan Kurt:

    They are already using ARM in the phones and iPads:
    https://en.m.wikipedia.org/wiki/Apple_mobile_application_processors
    so it would be whenever the two performance curves cross or the lower power needs of ARM make the battery life longer by enough to matter.

    I’d guess a low end Mac maybe in a few years, but only the Apple Developers will know (and maybe Intel devs as they know what limits they are hitting and the devo roadmap.)

    IIRC ARM Holdings sold out to a Japanese company, so who knows how their devo will go forward. There’s a convergence happening, with ARM becoming ever more CISC and less RISC, while everyone goes to 64 bit, exhausting the word length gain, and optical limits are constraining feature size shrinkage… so multicore is the open path…

    My guess would be we hit the thermal and feature size wall in the next 5 years or so with Intel closer than ARM. Also there are few tricks left proprietary anymore as patents on the big bits are expired or expiring. I’ll be very surprised if 10 years from now any monocore systems are made and any systems without CISC, pipelines, GPU computing, and similar are sold. Essentially, it all goes commodity between now and about 2025.

    All that makes my best guess that about 5 years out there will be no real difference. Design would begin about two years before that, so design starts in about 2019. As a guess.

  3. Ian Macmillan says:

    How about putting the whole thing in an oil bath, with a circulating pump?

  4. spetzer86 says:

    Just curious, but I notice that the possibilities of a liquid cooling system are not even discussed. I assume that’s mainly a cost concern, but was wondering if you also perceived issues regarding reliability with the liquid systems?

  5. Soronel Haetir says:

    Re. Apple moving away from Intel for standalone computers: there is also the compatibility issue to consider. Apple has come under a great deal of criticism in the past for incompatible CPU generations. Instruction emulation only takes you so far.

    In that vein I believe that a different CPU architecture would have to be seen as clearly superior (and likely to remain that way for some time), and not just a momentary winner in a performance vs. price comparison before standalone computers are likely to switch again.

  6. E.M.Smith says:

    @spetzer86 & Ian:

    You can get very nice liquid filled heat pipe coolers for high end CPUs. They work very well. Didn’t mention them since they are not needed at this scale of heat load. Mostly used on the very high end multi-core damn fast Intel chips and by overclocking gamers.

    Being a nice metal pipe with decent machining, lined with a special absorbent material, and filled with an inert coolant, they are “not cheap”. So would likely cost more than the board… Looks like Amazon has them for about $25 to 35 …

    https://www.amazon.com/Cooler-Master-Hyper-Plus-RR-B10-212P-G1/dp/B002G1YPH0

    The Cray-2 was affectionately called “Bubbles” because the heat problem was solved by immersing the entire board set in a liquid bath of Fluorinert, a variety of perfluorocarbons… tended to have a lot of “tiny bubbles” in the fluid flow ;-) Fun to watch it run…

    So no, there are no problems with liquid cooling. Heck, our old Cray X-MP had a 4 inch water line connected to a water tower… that water ran through aluminum channels inside the CPU box, and the boards had thick copper cores to conduct the heat to that water channel. 750 kVA in the power feed, inches from a bunch of water channels in grounded metal… not even Fluorinert…

    But when you are talking pennies each for ARM cores fabbed onto a chip, or $100 for a very high end multicore Intel (and dropping…), adding another $30 to $50 for fancy cooling isn’t as effective as a $2 chunk of aluminum and doubling the cores…

    For the IoT guys, they are looking at bolting the cards to the case (with heat sink paste between – a kind of gooey liquid cooling…) as an essentially free heat sink.

    So yes, you basically got it right on the first swing. Cost issues. All the reliability stuff was worked out in the ’80s or earlier. Want to clock your 1.2 GHz R.Pi at 2 GHz? No problem, just add a $35 cooling system…. power in and frequency of clock determine compute speed AND heat load. Can’t get more of one without doing something about the other…

  7. E.M.Smith says:

    @Soronel:

    While what you say is quite true, things are different now than during the 68000 to PowerPC to Intel transitions.

    During the 68000 era, most of the key bits of the Mac OS were written in assembler (highly hand optimized too…) and conversion was very difficult. This was necessary to get the performance needed out of that chipset.

    During the PowerPC era, many of those 68000 bits were instruction emulated but persisted as they eventually were being re-written. Eventually they were left behind.

    Now, with the MacOS being essentially a Mach kernel Unix machine, and the base routines written in portable languages, it is much easier to just ‘recompile and go’. There will still be porting issues, but nothing like the 68000 transition. Also the chips are so fast now that there is no reason for hand crafted assembler (nor, IMHO, do they even bother all that much with optimizing C …)

    One story to illustrate:

    When I was at Apple, we ran what were essentially “internal trade fairs” for the Engineers. We would invite all sorts of vendors to show their workstations, hardware, interesting gear. In one case, we invited IBM (back when IBM was seen as the Great PC Satan by some) to show their RS6000? workstation.

    They asked us how to present themselves, being a bit unsure about us and our ‘culture’ – we advised them to ditch the suits and show up in hacker friendly clothing. Despite some questioning looks at each other, they took our advice. The result was an IBM booth with folks looking much more like Apple Folks, and very dressed down compared to the HP and DEC vendors…

    Well, one of our engineers started talking to one of THEIR engineers about their chipset and what was in it. He sat at the terminal and took a test drive… asked some tech questions… “Hang on a minute, I’ll be right back” and he ran off… Back in about 30 minutes with an 8mm tape (IIRC); they loaded it up. Hackity hack hack hackity hack… compile… In about 30 to 45 minutes they were running the full Macintosh look and feel interface on that RS6000 workstation…. Right there in front of an increasing crowd of onlookers. (Mostly Apple folks, but a vendor or two also…)

    Well, the ‘bottom line’ was that a group formed to explore things. This eventually led to the AIM Alliance, with a second effort in software that was the Taligent company; the hardware work product from the AIM consortium was the PowerPC chip family. FWIW, my group in Apple was developing an advanced chipset as well, and many of our discoveries about useful instructions were incorporated into that PowerPC development.

    The point?

    Even then, the Mac OS was highly portable as the instruction emulator was built and running. Yet exactly what level of devo was being run by that engineer is unknown. It might well have been the early versions of the Mach port. But whatever it was, it “ported” with trivial changes and about 30 minutes of work and a compile to a completely different chipset. (Though one with all the needed instructions and a familiar overall design).

    Once in a high level language, IF you write good clean code, most of the porting work is done by the compiler tool chain… And since *Nix runs on just about every architecture out there, that work is already done and largely reduces to finding the right compiler flag settings and maybe stamping out a few obscure bugs you didn’t know you had written…

    So has Apple got a lot of Intel dependencies? Are there things like “timing loops” tied to particulars of the chips? Optimizations that won’t work on an ARM chip? Given that they already have ARM phones and tablets, I doubt it. The Engineers at Apple just love to do ‘different’ and, were I still there, I’d have already tried a port of the Mac OS to an ARM tablet… and maybe even my phone… Just for “Giggles ‘N Grins”… Given that the “look and feel” strives to be identical, I’d not be surprised to find out they are the same OS with different flags set.

  8. Soronel Haetir says:

    I am not talking about OS support, that is the easy part as I see things. Instead I am talking about support for 3rd party programs, potentially from vendors that aren’t even in business now and thus are in no position to release an updated build.

    Heck, MS took heat in the move to win64 for finally dumping support for 16-bit code (and that, as I understand things, was dictated as much by what the x64 can do in 64-bit mode as by an actual desire to no longer support 16-bit legacy programs).

  9. Paul Hanlon says:

    There is another board, called the Parallella, that is in and around the Cubietruck price range (about $99). The blurb says it has the highest computes per Watt | $ around, although the architecture looks a bit esoteric. It uses a dual-core ARM Cortex-A9 tied to a 16 core RISC chip. You’ve probably seen it already, but here’s the link anyway

  10. E.M.Smith says:

    @Paul:

    yeah, I’ve lusted after one from time to time. It is sort of halfway between a cluster of general purpose computers and GPU computing. The added Epiphany cores have an interesting network (crossbar?) connecting them and connecting to memory. Loading a batch of small calculation chunks to each core would work well. But having each core ripping through different chunks of data would be problematic in some ways.

    In the end, I figured I could get 16 cores as 4 x cheap boards for about $40 and have more memory / core in the process… It’s that whole Amdahl’s Other Law thing…

    Basically, I could not come up with a use case that made sense for my mix of needs, while a small cluster of cheap boards has a clear set of uses (since all cores can work on all problems and have direct memory and networking… MIMD, Multiple Instruction Multiple Data…). Then the plunge of the price of the Cheap Boards down to $5 to $10 each drove it home. (Thus my playing with the Orange-Pi One… I’ve got it running headless right now as NFS server and web scraper and it’s doing fine at that job.) It was a proxy for the O.Pi Zero or NanoPi Neo that I was thinking would make an interesting MIMD cluster at about $40, vs the SIMD (some MIMD possible, I think…) nature of the Parallella. Then add in the “joy” of dealing with an entirely alien programming model and thin software support on the Parallella, and I just kind of didn’t want to spend the money on a time sink…

    It would still be fun to play with, but the reality is that 99+% of the time I’d be running a desktop on the ARM A9 and nothing on the Epiphany chip’s 16 cores. (I got the impression the 16 core chip was just a prototype on the way to 64 – that actually has some clear uses as SIMD strides or SISD vector equivalent – but when people stayed away from buying, they dropped the 64 that had more validity…) Just because the amount of time it would take to program any codes to use it effectively would be, er, ‘an issue’… Where on the Pi Stack I can just launch a typical Linux command, or a dozen of them, or run distcc or MPI based codes unchanged.

    Oh, and early reports found you needed to have a fan glued on if you used all the cores… Gee, where did I hear about heat management being important ;-) and I hate fans…

    So yeah, I’d love to have one just to play with; but that is all it would be to me. A toy for amusement and exploration.

    For “real work”, you get more total computes out of a standard Intel chip desktop machine. Not as fun, and certainly not very interesting; but for folks looking to ‘get it done’, that is the better model for most problems that would be put on a Parallella. For my Pi Stack, it has the feature that it can grow without limit (other than $ ;-) if needed. The Parallella model is fixed at 18 cores / board, and then you are back into a programming / work-distribution-between-boards problem… so doing MPI or similar anyway, but now complicated in that each work unit has to decompose into 18+ threads once MPI shared… somehow…

    One of the frustrations of a Tech Life: The world is full of very interesting computing platforms that don’t have a justification other than play… (Anyone else remember the Very Long Instruction Word machines?…) or in fancy circles “R&D” ;-)

  11. E.M.Smith says:

    Well… that was fun… sort of…

    The Odroid-C2 showed up today. A couple of quick comments… (This isn’t a full report, just a ‘got it to work’ bubble moment).

    The Good:

    I’m posting this from Chromium on the Odroid-C2.

    It is very very fast with the eMMC (no idea on with SD instead, yet)

    It handles multi-core very well (I’ve got chromium spread over 4 of them near as I can tell)

    The giant heat sink gets warm to the touch, but not OMG Hot, so a ‘right size’ I think. (More when I start doing stress testing with temp readings).

    It was a WHOLE LOT easier to get going than the Orange Pi One was.

    I have a working HDMI screen with only about 1 hour of “futzing about” to get it “pretty good”.

    It came with a nice set of instructions inside (not complete, but enough to point at more).

    The OS was pre-installed on the eMMC chip. Zero OS work to get it going. Just snap the tiny little chip into the bottom of the board.

    It can be powered from a ‘barrel’ (or round) plug, or via micro-USB. While the instructions said I’d need to ‘jumper’ the pins next to the micro-USB to use it for power, it came ‘pre-jumpered’ with one of those little plastic block things like those used on hard disks to select options.

    The not so good:

    An attempt to launch the installed Firefox failed.

    On first boot, the screen came up using a very skinny vertical slice of about 1/2 the screen width.

    While it was usable ‘enough’ to navigate around, it was a PITA and required squinting.

    The “change your monitor settings” tool under System: Control Center : Displays: shows your current setting, but doesn’t let you change anything.

    Editing /media/boot/boot.ini got it usable (though I still need to tune some things). I’m on a ViewSonic DVI monitor with a DVI to HDMI conversion cable. Likely the source of my “HDMI” issues with the Orange Pi as well. Those with real HDMI monitors are likely not going to have these issues. The current settings that work “OK” are:

    setenv m "1280x800p60hz"
    setenv m_bpp "32"
    setenv vout "dvi"
    

    This is ONLY for my old monitor; your monitor will be different. Also, my monitor does 1680 x 1050 on the Raspberry Pi, but setting that here gives me red line segments as the image ‘tears’, sort of…

    So some combination of things “isn’t quite right” yet. Is it video memory? Is it those settings in boot.ini? Don’t know at this point. Bottom line is that it is working and “OK”.

    Oh, and being Ubuntu MATE you get systemD, and you’d better like dull LIME Green. (What is it with OS guys and lime green… Maté as a green-tea-like drink makes sense, but DietPi was lime green too… Oh Well…) But for Ubuntu MATE fans, it’s “good to go”.

    Make sure you have your reading glasses and / or magnifying glass to install the eMMC… or get the person in the house with the smallest hands to do it. It has a ‘connector’ about the size of a rice grain. I was fearful of breaking something pushing on it while waiting for the ‘click’ sound. Visually lining it up was a challenge, but gave me the courage to push harder than I’d expected…

    OK, For Now

    Ok, that’s all for now. I’ll likely run it on Ubuntu for a few days just to get a good feel for it.

    When the site scrape of temp data finishes (running on Orange Pi at the moment), I’ll try these settings in the boot.ini of the Orange Pi and see if that recovers the monitor on it.

    The Odroid-C2 also has a micro-SD slot and can boot from both it and the eMMC, so I’m going to put another OS on SD and do speed comparisons.

    So far the speed enhancement is significant and noticed. Ubuntu is a bit of a fat pig on the Raspberry Pi, but is fast and smooth here. We’ll find out how much of that is the eMMC vs SD ‘later’…

    I need to find a Debian to install on it, and then do the Debian -> Devuan upgrade. IFF that works and gives me a nice experience, this will become my daily driver. (If I’m stuck with Ubuntu… well, we’ll see…)

    Well, that’s enough for now. As it stands, the first bring-up was in some ways easier than the Raspberry Pi (no need to download NOOBS and install an OS). I suspect that had I a real HDMI monitor made in the last half decade, even the monitor issues would be gone.

    Oh, and it has Really Nice LEDs on it. There’s a cheery yellow one on the ethernet, a bright Christmas red one on the power, and a delightful rich blue one that blinks if the monitor cable isn’t attached, and then occasionally blinks after that. I like the blue one best, but being right next to the red one, it sometimes gives the impression of a magenta mix in the peripheral vision. I really like magenta ;-)

  12. E.M.Smith says:

    Looking at the Odroid C2 next to the Pi M3, the aftermarket heatsink on the M3 is about 1/6 the area of the one on the O-C2. Now they ARE different SOC (System On Chip) sets, but both use ARMv8 cores, both have 4 of them, and both are 64 bit. Hmmm…

    I’ll be doing some “computes and temps” testing over this weekend. With the same FORTRAN high compute load on each, what core temp results?
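
    For the thermal half of that test you don’t even need the FORTRAN; any all-core burner will do. A minimal sketch (the thermal_zone0 path is an assumption – it varies by board and kernel, so check what your /sys/class/thermal exposes):

    # Peg every core, then log the SoC temperature every 10 seconds:
    for i in $(seq $(nproc)); do yes > /dev/null & done
    while sleep 10; do
      t=$(cat /sys/class/thermal/thermal_zone0/temp)   # usually millidegrees C
      echo "$(date +%T)  $((t / 1000))C"
    done
    # When finished, 'killall yes' releases the cores.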

    The efficiency of any given implementation can vary widely. A few more GHz can add a lot more heat, for example, and the core volts vs core freq. settings can give more heat at the same Hz and computes. Does the O-C2 really need all that extra heat sink? Or is it just giving a nice cool core temp? Does the R.Pi M3 overheat when pushed to the wall, even WITH the aftermarket glue on heat sink? Hmmm…. Things to answer before committing to any one of them as the prime compute board… (Though straight physics would argue that the much larger O-C2 heatsink would win even with “other issues” making a bit more heat… 6:1 is a BIG advantage…)

    Oh, one complication: I’m running an ARMv7 (32 bit) instruction set OS build on the M3 at the moment, so for any given float multiply it uses 32 bits of gates, while the O-C2 ought to be using 64 bits of gates and moving 2x as much memory. I need to verify that the Ubuntu is an arm64 build, but I think it is. So I need to find / make an arm64 OS build for the Pi M3 to get comparable bits / instruction, or put an armhf build on the O-C2.

    While it is a great advantage that the ARMv8 cores / instruction sets let them run ARMv7 instructions and run in 32 bit mode, it complicates the world when looking at comparisons. Any arm64 port of the OS will be young and have more bugs, so the incentive is to just compile and go with new firmware / boot code and an armhf OS build that is known clean and working. (Why the Pi M3 runs armhf instead of arm64.)

    Basically the comparison space, instead of being 2 (Pi M3 vs O-C2), becomes 2 squared:
    Pi M3 – ARMv7 (32 bit armhf OS build)
    Pi M3 – ARMv8 (64 bit arm64 build)
    O-C2 – ARMv7
    O-C2 – ARMv8
    which gives 6 comparison pairs. ( Pi-M3-v7 vs {Pi-M3-v8, O-C2-v7, O-C2-v8}, Pi-M3-v8 vs {O-C2-v7, O-C2-v8}, O-C2-v7 vs O-C2-v8 )

    Sidebar: That last bit is n!/(r!(n-r)!) = 4!/(2!·2!): 4x3x2x1 = 24, and 2x1 = 2, so 24/(2×2) = 6, for combinations without replacement / repetition. Yes, I actually liked my stats class… sick puppy, I know ;-)

    I doubt very much that I’ll do them all… as the Climate Models want a lot of double math, the ARMv8 64 bit builds are the ultimate goal. So the Pi-M3-v8 vs O-C2-v8 comparison matters most, but is also the least likely to be available (the Devuan arm64 build was a bit dodgy when I last tried it, and I don’t know if the O-C2 image is arm64 or not anyway), while Pi-M3-v7 vs O-C2-v7 is least interesting (but likely the most available). Then the v7 x v8 cross comparisons don’t really tell you much other than the OS limitations / impact on heat produced.

    Well, it will be an interesting ponder to figure out which comparisons are available and, of them, which have any real value…

    Oh, and FWIW, I found 2 sets of scripts for building a straight Debian for the O-C2. I’ll likely be using it as my target platform for a “roll my own” OS build. First, because it looks like I’ll need to do that to get Devuan (which is now my standard OS), and second because the Ubuntu build on the O-C2 is a bit low on the QA process / young in the build cycle. Firefox has “failure to launch”, and in WordPress dialog boxes (like entering a comment) there is an odd cursor / text ‘jitter’, like the cursor jumping back a space at semi-random times. I speculate it is an artifact of the spell check causing unintended cursor movement – or maybe a Chromium issue in the port. In any case, it’s a distraction and not something I care for. The hope is that a straight Debian / Devuan build fixes all that AND gets me a clean arm64 build that uses full 64 bit math. We’ll see how far I get this weekend.

  13. E.M.Smith says:

    Looks like the Odroid-C2 Ubuntu is an arm64 build:

    odroid.com/dokuwiki/doku.php?id=en:c2_building_kernel

    $ cp .config arch/arm64/configs/odroidc2_defconfig
    $ git add arch/arm64/configs/odroidc2_defconfig
    $ git commit -s -m "Change the kernel config file for …"
    

    So I’ll put the Devuan 64 bit build on the PiM3 and do that compare. The buggy bits I’ve seen in the arm64 build have been in browsers and graphics, not command line stuff, so likely fine for a backend compute engine use.

    Over time as wider use of arm64 builds sets in, the buggy bits will be rapidly found and fixed.

  14. E.M.Smith says:

    Well this is a pisser….

    Just did the bog-standard:

    apt-get update
    apt-get upgrade
    

    and on reboot it doesn’t. Just a brick.

    So now I get to go through the whole routine: find an OS image online, download it, put it on an SD card, boot, and configure. Why an SD card? Because the eMMC is on the bottom of the board and would require a disassembly of the dogbone stack (think screwdrivers…) and such to get it out where I could install an OS on it.

    Oh Well, that will have to wait a few days while I do other things…

    Had I had my wits more about me I’d have made a backup copy of the original shipped image to SD card prior to any update…
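
    For anyone following along, that backup really is one command, run before you ever type ‘apt-get upgrade’. A sketch, assuming the eMMC shows up as /dev/mmcblk0 (check with lsblk; through a USB reader it may be /dev/sdX instead):

    # Image the whole eMMC to a compressed file you can restore from later:
    sudo dd if=/dev/mmcblk0 bs=4M status=progress | gzip > c2-shipped-image.img.gz
    # Restore is just the reverse:
    # gunzip -c c2-shipped-image.img.gz | sudo dd of=/dev/mmcblk0 bs=4M conv=fsync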

  15. E.M.Smith says:

    Well, on full 100% x 4 cores (or really 400%, plus browser and more) for a few minutes, the Odroid not only is letting me type fast and easy (using Ubuntu MATE no less, not exactly a slim efficient OS), it isn’t getting all that hot, either:

    Hostname: odroid64
    CPU Frequency: 1536Mhz
    TEMP: 67
    

    then after a few minutes

    Hostname: odroid64
    CPU Frequency: 1536Mhz
    TEMP: 71
    

    And a bit later after loading and typing all this:

    Hostname: odroid64
    CPU Frequency: 1536Mhz
    TEMP: 73
    

    So it “takes a punch” very, very well. I’ll cross-post this back where I did the other heat / speed tests too. But for a higher end compute node, the Odroid-C2 has it. (Though I still need to get Devuan onto it… at present it is Ubuntu with SystemD.)

  16. E.M.Smith says:

    Well, this is sure a nice thing to find… There’s an OS version of Debian for Odroid called “Odrobian”. It already supports docker. (I ought to note that Docker runs on R.Pi as well):

    http://oph.mdrjr.net/odrobian/doc/docker.html

    What is Docker?

    Docker containers wrap up a piece of software in a complete filesystem that contains everything it needs to run: code, runtime, system tools, system libraries – anything you can install on a server. This guarantees that it will always run the same, regardless of the environment it is running in.

    Getting Started

    Docker is already available for you on ODROBIAN through our repository compiled specifically for Debian Jessie, you can install it simply on any edition:

    Desktop (MATE)
    Vanilla (PURE)
    You may want to prefer the Vanilla edition particularly for Docker, which is a very minimal Debian environment that will run perfectly for development and servers; otherwise you can get the Desktop edition if you are after simplicity.

    Installation

    Well, in both cases all you have to do:

    Make sure you are ROOT:
    odroid:$ sudo -s

    Installation:
    odroid:# apt-get update
    odroid:# apt-get install docker-odrobian
    Add User “odroid” to Docker Group

    Now, by default docker will only work with ROOT privileges; this is not a problem for our Vanilla edition as it’s already supposed to be used with it. However, if you are using the MATE edition (or the latter) through the main regular “odroid” user, just add it to the “docker” group.

    Add “odroid” user to Docker group:
    odroid:# usermod -aG docker odroid

    This way you can perform actions through Docker without using “sudo” command

    Test Docker

    This is how you can confirm that Docker is installed and working as expected:

    odroid:$ docker info
    odroid:$ docker -v

    Both commands should return feasible information.

    Image Containers

    Docker is now supporting ARMv7 image containers; you can easily pull (Debian, Ubuntu, Arch Linux, etc.) directly from their repository. You can even run multiple Linux environments with different sessions through your ODROID, working at the same time.

    Examples

    Let me put you on the right road if you are a new user; here is how you can pull the most famous Linux distributions through docker:

    Debian:
    odroid:$ docker pull armhf/debian

    Ubuntu:
    odroid:$ docker pull armhf/ubuntu

    Arch Linux:
    odroid:$ docker pull armv7/armhf-archlinux
    Find More Images

    Are you looking for even more? Here’s how you can find the ARMv7 compatible image containers on Docker:

    Search Docker Repository:
    odroid:$ docker search armhf-

    Pull Image:
    odroid:$ docker pull *

    (*) = Name of Image

    Not only does this solve the (small) risk of a library update messing with a canned application, it also lets you run armhf (and other…) binaries on an arm64 built system. Nice. Especially for a heterogeneous computing cluster.

    So, no surprise, I’m going to download Odrobian and see if I can make it even more complicated by doing a Devuan conversion (and then still have it all work…). IFF that ends up working well, I’ll likely pull together some of the upstream bits into a single build process. (Why? Because having a chain of Debian : Odroid : Odrobian : Devuan is likely to get bent on some update process, as each in order tries to patch the core Debian to its liking.)

    This is a bit of a relief, as I was not really looking forward to the build process of integrating Debian, with the Odroid-needed bits, with Devuan. Cutting it down to “just upgrade to Devuan” is a lot less time…

  17. E.M.Smith says:

    Well, after some adventures with disks and downloads on the Orange Pi scraper, I got back to “moving in” to the Odroid. I’ve dropped back to straight Debian (the Odrobian hybrid) from Devuan. I want to get a better feel for what all works in the base case before I charge off into “Army Of One” land, running a unique-in-the-world distribution / hardware combo. Devuan in this combo had some bits not working, like gparted and Firefox, that I use a lot.

    So far on Debian 8 “it all just works”. (Like Devuan on the Pi-M3.) This, being the hybrid image, is really an armhf 32 bit image with arm64 libraries also installed so arm64 applications can run. That means two important things to me:

    1) It is binary compatible with the Pi-2 and Pi3 boards I’m running in the rest of the cluster as they are all running armhf.

    2) More stuff runs. The armhf 32 bit build has had a lot more work on bug stomping and porting, so pretty much everything runs there. The arm64 builds are younger, and not everything has been ported to it yet.

    So far I’m quite happy with the Odroid-C2 overall. It has some quirks, though. It isn’t as happy with HDMI – DVI adapters, for example. (Most likely all the developers are running on straight HDMI and will not be buying an old monitor and adapter for testing…) It didn’t take the Devuan upgrade as gracefully as I’d hoped. (Likely my bad, as I just blasted it in ‘to see how it goes’ – a more thought-out upgrade / port would be better. Things like actually deciding beforehand what kernel to run, what modules are needed, etc. etc.)

    So the video isn’t as ‘crisp’ as on the Pi-M3 (which may also be that my manual settings were not the best… but then again, it made me do manual settings while the Pi-M3 is auto-and-go…) and I’m running down-res at 1440 x 800 (because 1600 x 900 puts a raster of red streaks everywhere, and 1920 x 1200 almost works right but has an occasional flicker / line that was driving me batty). OK, so I need to get better at determining the proper video modes to use…

    It is enough faster to notice. Especially on things like launching a browser or typing in a spell check heavy window. In Firefox, every word typed seems to cause a new spell check pass. As a page gets longer, this eventually bogs down the Pi M3 to unusable land until I turn off spell checking to finish the typing (then check at the end). I’ve not run into that yet on the Odroid. (At some text length it WILL hit for any processor, so that length is just enough beyond what I do regularly to not hit it…) That may be related to:

    The big win is the extra memory. With 2 GB, swap doesn’t happen unless doing something outrageous. I suspect that hitting swap is the big limit on performance for some of the things I’ve done on a “Desktop Pi”. Moving swap to a disk on the Pi-M3 seemed to help on heavy loads.

    So none of this changes the basic evaluation. Pi-M3 better for folks just starting out with Dinky-board computing and Linux-land; Odroid better for folks with some experience under their belt and a need for more power.

    Will the Odroid-C2 stay my Daily Driver? That depends. How much work to make Devuan go? The Systemd version of Debian works OK, but is a constant irritant as “everything you know is wrong” on where and how systems administration is done. I tire of looking up how to do things I’ve done for 30 years… As long as you don’t change things, no problem ;-) Also if Devuan came out with an official Odroid-C2 port, I’d be on it in a flash. They have one for the higher end Odroid board, but the two have different CPU chip sets in them and that may well matter (different drivers and settings and…)

    The good bit is that my environment (modulo screen precision) is now working correctly. Things like familiar wall-paper and the whole home directory and all. Accounts built, passwords changed, build script run. Which brings up the point that I’m not going to do an Odroid Build Script as it is substantially the same as the buildnew script I last posted. There are trivial differences in what is already installed by default and that’s about it. I mostly just ran the prior version and let it complain if something was already up to date.

    For me, the biggest benefit is just that I can run it with a clean 64 bit build of Ubuntu on one chip, or with a 32 bit build of Debian with another chip. (Ubuntu just being a gussied up Debian this is a fair comparison set). This will let me test the relative speed of sample FORTRAN code doing things seen in the GCM Model code. Especially since they are heavy in Double Precision math, having native 64 bit math “may matter”. So I’ll test it and see. At present, I’m building out the cluster as all armhf 32 bit (more on that another day…) for binary compatibility. This lets distributed code like MPI dispatch any job to any node and lets me avoid dealing with heterogeneous computing issues.
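
    A minimal sketch of what that buys you (Open MPI syntax; the host names and the model_kernel binary are placeholders): with every node on the same armhf userland, one binary runs anywhere, and the hostfile is the whole scheduling story:

    # hosts file, one line per node:
    #   pi1 slots=4
    #   pi2 slots=4
    #   odroid slots=4
    mpirun --hostfile hosts -np 12 ./model_kernel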

    BUT running 64 bit for double precision math sometimes is a big win. Oddly, sometimes it isn’t or can even cost you some. Why is interesting: The memory path may be the limiting factor. You fetch 2 x the bits per unit of math with 64 bit. On these little chips they often cheap out on the memory path and that ends up taking longer, so depending on just what data and math is being shoved around, going to 64 bit causes some benchmarks to take longer to complete. Go figure… So the only way to know is to benchmark it. (Theoretical evaluation can give clue, but not proof…)

    As of now, I’ve got my nice 64 bit vs 32 bit benchmark machine running. Yay! The Pi-M3 in theory could be used that way too, but the 64 bit Raspbian is very young and was still buggy last I tried it. Give things a few months to a year and it will be as clean as the 32 bit and just as populated with running programs… (A 64 bit CPU with 32 bit operation is a mixed blessing. It lets you run all the stuff you want as 32 bit while the 64 bit bugs get stomped, but it also means there is little pressure on code developers to make the arm64 work as “you can just run armhf”… but over time they get a round tuit…)

    In summary: As of now I’ve got a stack of 4 boards, all quad core, all running armhf. 2 x Pi-M2, 1 x Pi-M3 and one Odroid-C2. It is already configured for distcc and “distributed compiles” of C code, so things like kernel builds and OS builds will be quite fast. 16 cores with 5 GB memory is like that ;-) I have the GCM models unpacked and high-level assessed, and I’ve unpacked the input data for the Model 2 code (now I just need to put model and data together and figure out how to make it go…) That means next on the horizon is speed testing. (Performance testing.) I’ve now got three speeds of boards to test on, and as of now both 32 bit and 64 bit environments for testing. That means the only thing in the way is “doing it”. I can also figure out if running parallel armhf is faster than big-box 64 bit, and a few other odds and ends as well. Once that ‘run and assess’ is done, I’ll be able to say just what works best (and what works ‘well enough anyway’…)
