Pi Cluster Parallel Script First Fire

Today was spent “slaving over a hot computer”. Part of that was getting GNU parallel to work with a couple of commands.

I first mentioned the GNU parallel command here:

https://chiefio.wordpress.com/2017/11/10/parallel-scripts-clusters-easy-use/

That page includes some notes on how to install it and a video on basic operation. Today I decided to make a couple of scripts that would automatically spread work around to the cluster instead of me needing to type arcane command lines. I started by reviewing the tutorial and getting a basic example onto my desktop. That was fairly easy.

I chose a naming convention: any command I add to my $HOME/bin gets p_ as the first letters if it is a parallel command that just uses my local cores, and ps_ if it is a parallel-across-systems command spreading things over the whole cluster. In this way, I know a command is going to use parallel just from typing the name.

The example in the video used gzip and gunzip. Since I’ll often need to gzip or gunzip whole directories full of stuff, I thought I’d start with a p_gzip and a p_gunzip. They worked relatively well and somewhat easily.

Here’s the p_gzip text:

parallel -j0 --eta --progress gzip {}

It can be used by doing an ls to list the files you want zipped and sending their names into standard in, like:

ls junk* | p_gzip 

The command says to do things in parallel; the -j0 says use all cores available (in this case 4 on the Pi M3), so it does 4 gzips at a time; --eta and --progress give me status on progress and an ETA to completion if it takes long; and it runs the gzip command on each name that comes in on standard in.
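For reference, here is the same one-liner written out with each flag commented (the shebang and the comments are my additions; the command itself is unchanged):

#!/bin/sh
# p_gzip: gzip every file name arriving on standard in, in parallel
#   -j0         run as many jobs at once as there are CPU cores
#   --eta       print an estimated time to completion
#   --progress  print per-computer progress as jobs run
#   {}          replaced by each file name read from standard in
parallel -j0 --eta --progress gzip {}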

When I first ran it, I thought I’d failed. I was staring at my monitor screen and saw no increase in activity. I expected to see the 4 cores load up. (That’s when I added some of the status information). Doing an “ls” showed all the files were gzipped. What? Yeah, it finished fast. I’m working in a directory where I’d copied all my log files. Guess it wasn’t enough load.

chiefio@DevuanPiM2:~/pp$ ls | wc -l
64
chiefio@DevuanPiM2:~/pp$ du -ks .
572

64 files, compressed size 572 kB.

I then ran it with “time” to see just how long it was taking:

chiefio@DevuanPiM2:~/pp$ time ls * | p_gzip

Computers / CPU cores / Max jobs to run
1:local / 4 / 64

Computer:jobs running/jobs completed/%of started jobs/Average seconds to complete
ETA: 0s 0left 0.00avg  local:0/64/100%/0.0s 

real	0m1.671s
user	0m1.510s
sys	0m0.590s

Oh. 1 2/3 seconds. No wonder I didn’t see a blip on the load… Notice the top status report. 1 computer, the local one (i.e. my desktop) with 4 cores running 64 jobs. As all of them together took only about 1.7 seconds, each individual job was too small to show up on the htop monitor. I need bigger files to compress ;-)

Clearly from this example it is Real Easy to use parallel to load up all the cores on your local computer for the various tasks that are annoying to systems admin types, but need doing.

Here’s the equivalent parallel gunzip:

ls *.gz | parallel -j0 --eta --progress gunzip 

For this one I chose to put the “ls” inside the script itself. Just gunzip anything that has a .gz suffix in a given directory. How long did it take to unzip all those 64 files?

chiefio@DevuanPiM2:~/pp$ time p_gunzip

Computers / CPU cores / Max jobs to run
1:local / 4 / 64

Computer:jobs running/jobs completed/%of started jobs/Average seconds to complete
ETA: 0s 0left 0.00avg  local:0/64/100%/0.0s 

real	0m1.735s
user	0m1.210s
sys	0m0.770s

About the same. Notice the “ETA:” line. 0 seconds to complete, 0 left to do. By the time it got set up for the first ETA report, it was done. 64 jobs and 100% done on the local machine.
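(As an aside, the same ${1-default} shell parameter trick I use in p_load below could make p_gunzip take an optional directory argument, defaulting to the current one. An untested sketch, not the script as written above:)

# gunzip everything ending in .gz in the directory given as $1 (default: here)
ls ${1-.}/*.gz | parallel -j0 --eta --progress gunzip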

Clearly I needed something that would load up a core so I could see it actually doing something. I have such a “do nothing” load script, named load.

while true
do
    ANS="foo"
done

All it does is repeatedly stuff “foo” into the variable ANS forever in a loop. Locks up a CPU Core at 100% very nicely.

chiefio@DevuanPiM2:~/pp$ bcat p_load
ls ${1-d*} | parallel -j0 --eta load 

This, run in my parallel program testing directory, is doing an ls of all the files starting with a ‘d’ (that’s about 20 of them with names like daemon.log.1) as a cheap hack to let me vary the number of invocations of the “load” script. Works a champ, and here’s the display it gives when run:

chiefio@DevuanPiM2:~/pp$ p_load

Computers / CPU cores / Max jobs to run
1:local / 4 / 20

Computer:jobs running/jobs completed/%of started jobs/Average seconds to complete

Note that it never completes and so can't really estimate when it will complete. A dirty trick to play on --eta ;-)

1 computer, the local one, with 4 cores in use, and 20 instances of “load” pegging it at 100% loaded.
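(For what it's worth, the file-name hack isn't required: parallel will take any list of lines on standard in as job arguments, so something like this ought to fire off exactly 20 copies of load just as well. Untested sketch:)

# 20 dummy "arguments" just to get 20 jobs; load ignores them
seq 20 | parallel -j0 --eta load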

Great! I thought, now let’s just bring in the rest of the cluster.

It Is Much More Complicated To Cluster Parallel

While not all that hard for an old hand at Linux / Unix, it is, IMHO, more complicated than it needs to be for simple things (while letting you do all sorts of complicated things no doubt…)

When I ran my first test case I got something like 64 prompts for my login / password. Ooops.

First off, the underlying transport to the distributed cluster nodes is SSH so you must have it installed AND have a shared key set to let you log in without password prompting. Oh Joy. I got to go through the generate a key pair and copy it to each machine in the cluster exercise. Gee, that would be a great thing for a parallel automated task… oh, right…

Here’s a helpful page on how to do that:

https://www.ostechnix.com/configure-passwordless-ssh-login-linux/

It has you install openssh. Debian / Devuan already has ssh for me so I just skipped that step and dropped down to:

Generate SSH keypair on local system

ssh-keygen creates a keypair, private and public keys. The private key should be kept secret. You shouldn’t disclose it to anyone else. And, the public key should be shared with the remote systems that you want to access via ssh.

Important note: Do not generate keypair as root, as only root would be able to use those keys. Create key pairs as normal user.

Run the following command in local system (Ubuntu in my case) to generate the keypair.

ssh-keygen

So on my “headend” machine, logged in as “me” (chiefio) I did that ssh-keygen command. Easy.

THEN, I got to do this next step 5 times for the 5 other computers presently in the cluster… Easy but tedious.

Now, we have created the key pair in the client system. Now, copy the SSH public key to the remote system that you want to access via SSH.

In my case, I want to access my remote system which is running with CentOS. So, I ran the following command to copy my public key to the remote system:

ssh-copy-id ostechnix@192.168.43.150

As my login was “chiefio” and my machines are in a 10.x.x.x network, I got to do something more like:

ssh-copy-id chiefio@10.168.168.141
ssh-copy-id chiefio@10.168.168.142
ssh-copy-id chiefio@10.168.168.143
ssh-copy-id chiefio@10.168.168.144
ssh-copy-id chiefio@10.168.168.145

All from the headend machine.

I could likely have put that in a script-lette but it was a one-off so I just did it long hand. I could likely have also used the hostnames, but just followed the model shamelessly out of sloth.
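Had I bothered with the script-lette, it would have been something like this (a sketch only; I just typed the five commands by hand):

# push my public key to each headless node in turn
for ip in 141 142 143 144 145
do
    ssh-copy-id chiefio@10.168.168.$ip
done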

That, then, let me do a password free login from headend to all the headless nodes over SSH. Oh Joy. This time for sure!

Well, I launched the parallel system version of my load command. Work blips showed up on the various machines as they prepared to be infinitely loaded forever, and then it was done. WT?… There were also a fair number of error messages. My first try was to just put a copy of “load” in $HOME/bin on each machine, figuring that since it logs in as me, the script would be in the search path. Nope. It kept telling me “cannot find command load” and just ending.

Well, cutting to the end, it’s got ‘fiddly bits’ that need a good fiddle. Probably to protect you from stray programs of the same name but different content, or perhaps to make it ‘easier’ so you never have to install things on the headless nodes, the parallel command has its own way of distributing the program / script to be run. I also got some nags from SSH about wanting the maximum simultaneous connections made larger (that I’ve not done yet), some from Perl about “locale” being unset (so it uses the default that I want anyway), and some more housekeeping clean-up to do. But it DOES run and it DOES work.

The “feel the other systems and set up to distribute work” seems to take a few seconds, so anything needing less than 30 seconds to run on your local box probably will not benefit from a cluster command. Use this only for things that take long enough locally to bother you enough to write and debug a command / script.

Here’s the final result:

ls ${1-d*} | parallel -j0 --eta -S:,h1,h2,h3,h4,h5 --basefile /home/chiefio/bin/load '/home/chiefio/bin/load' 

Again, I’m feeding it a set of bogus input just to stimulate a number of jobs being farmed out. This goes into the pipe into parallel, with -j0 for jobs = cores on any given computer. The --eta gives us a useless ETA on something that will never end. Then the interesting bit: the -S option lists the systems to run this command upon. The “:” is special and means “localhost” (in this case the ‘headend’ machine), followed by a comma separated list of other computers. For me, headless1 has an alias of h1, headless2 an alias of h2, etc. So I’ve listed my 5 headless nodes.

Now, the “trick” is that you must tell it where your “script” or command is located. That’s the --basefile option, which points at my bin directory and the command named “load”. The --basefile option can also be used to share data files around. Next, in single quotes, you put the actual command and arguments to execute. As load takes no arguments, it’s the same path as the --basefile listing, but in single quotes. I’ve not yet gotten into the whole argument passing thing. That will be for tomorrow and making a parallel systems version of gzip and gunzip.
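If the man page reads the way I think it does, a ps_gzip ought to end up looking something like the sketch below: --trc is shorthand for transfer each input file to the remote node, return the named result, and clean up the remote copies afterwards. Untested, and not necessarily what tomorrow’s script will be:

# ps_gzip (sketch): gzip file names fed in on standard in, spread over the cluster
parallel -j0 --eta --progress -S:,h1,h2,h3,h4,h5 --trc {}.gz gzip {}

It would be used just like p_gzip, e.g. ls junk* | ps_gzip.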

For now, this proves I’ve got the cluster running parallel scripts and paves the way for me to do things like a set of parallel FORTRAN compiles. While there is a dedicated command for parallel C compiles (“distcc”), I’ve not found an equal for FORTRAN. I have distcc on each node of the cluster, so parallel C compiles are ready to go (but having different libraries might be an issue – identical systems, releases, and libraries are best). Hopefully FORTRAN will be a bit forgiving on that front. We’ll see (eventually).

To the extent that parallel FORTRAN compiles require the same library and OS images, I can just use the 4 boards (16 cores) that are all Devuan matched R.Pi systems, leaving the Odroid C1 and Orange Pi One out of it. At least until a properly matched Devuan is available for them. Just changing the -S system list is all it takes.
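As a first stab at those FORTRAN compiles, something like this ought to farm a pile of them out to the matched boards, shipping each source file over and bringing the object file back. An untested sketch; the h1,h2,h3 list is just a stand-in for whichever aliases the matched Devuan R.Pi nodes end up being:

# compile each .f90 file on some node in the list, return the matching .o file
ls *.f90 | parallel -j0 --eta -S:,h1,h2,h3 --trc {.}.o gfortran -c -o {.}.o {}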

Oh, here’s the result of running my ps_load command and loading up all 6 systems. A couple only had 3 cores in use, but that’s because I had more cores than files starting with the letter d ;-)

chiefio@DevuanPiM2:~/pp$ ps_load
parallel: Warning: ssh to h2 only allows for 15 simultaneous logins.
You may raise this by changing /etc/ssh/sshd_config:MaxStartup on h2.
Using only 14 connections to avoid race conditions.
parallel: Warning: ssh to h4 only allows for 17 simultaneous logins.
You may raise this by changing /etc/ssh/sshd_config:MaxStartup on h4.
Using only 16 connections to avoid race conditions.
parallel: Warning: ssh to h3 only allows for 14 simultaneous logins.
You may raise this by changing /etc/ssh/sshd_config:MaxStartup on h3.
Using only 13 connections to avoid race conditions.
parallel: Warning: ssh to h1 only allows for 17 simultaneous logins.
You may raise this by changing /etc/ssh/sshd_config:MaxStartup on h1.
Using only 16 connections to avoid race conditions.
parallel: Warning: ssh to h5 only allows for 16 simultaneous logins.
You may raise this by changing /etc/ssh/sshd_config:MaxStartup on h5.
Using only 15 connections to avoid race conditions.

So I need to go to each of those headless systems and raise the SSH logins limit, or set parallel to ask for fewer. It still runs, just nags you.
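For the record, the fix it is asking for would be something like this on each headless node (a sketch only, not done here yet; the directive is spelled MaxStartups and sshd uses the first value it finds in the file):

# in /etc/ssh/sshd_config on each node, set something like:
#     MaxStartups 64
# then restart sshd (sysvinit style on Devuan):
sudo /etc/init.d/ssh restart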

Warning: Permanently added 'h4,10.168.168.144' (ECDSA) to the list of known hosts.
Warning: Permanently added 'h1,10.168.168.141' (ED25519) to the list of known hosts.
Warning: Permanently added 'h5,10.168.168.145' (ECDSA) to the list of known hosts.
Warning: Permanently added 'h2,10.168.168.142' (ED25519) to the list of known hosts.
Warning: Permanently added 'h3,10.168.168.143' (ECDSA) to the list of known hosts.
Warning: Permanently added 'h1,10.168.168.141' (ED25519) to the list of known hosts.

Since I’m frequently changing host boards, I set the file where this “permanent” list is kept to be a link to /dev/null, so it nags me about this each time, but I don’t have to remove old identities to let new ones be added. IF the cluster stabilizes, I’ll put it back to normal and these will go away.
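For anyone wanting to copy that dodge, it amounts to something like this on the headend:

# make ssh's "permanent" known hosts list a black hole
ln -sf /dev/null ~/.ssh/known_hosts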

perl: warning: Setting locale failed.
perl: warning: Please check that your locale settings:
	LANGUAGE = (unset),
	LC_ALL = (unset),
	LANG = "en_US.UTF-8"
    are supported and installed on your system.
perl: warning: Falling back to the standard locale ("C").
Warning: Permanently added 'h2,10.16.16.42' (ED25519) to the list of known hosts.
perl: warning: Setting locale failed.
perl: warning: Please check that your locale settings:
	LANGUAGE = (unset),
	LC_ALL = (unset),
	LANG = "en_US.UTF-8"
    are supported and installed on your system.
perl: warning: Falling back to the standard locale ("C").

So Perl on a couple of the boards needs to have ‘locale’ fixed. Maybe someday… I did it on a couple of others. Really a stupid Perl behaviour.
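(For reference, the usual Debian-family fix is a one-time locale reconfigure on each nagging node. A sketch; there are other ways to do it:)

# generate and set en_US.UTF-8 via the interactive menu
sudo dpkg-reconfigure locales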

Warning: Permanently added 'h3,10.168.168.143' (ECDSA) to the list of known hosts.
Warning: Permanently added 'h4,10.168.168.144' (ECDSA) to the list of known hosts.
Warning: Permanently added 'h5,10.168.168.145' (ECDSA) to the list of known hosts.

Computers / CPU cores / Max jobs to run
1:local / 4 / 20
2:h1 / 4 / 16
3:h2 / 4 / 14
4:h3 / 4 / 13
5:h4 / 4 / 16
6:h5 / 5 / 15

Computer:jobs running/jobs completed/%of started jobs

And there you can see the 6 computers, each with 4 cores, and what max number of jobs each can get. I don’t know why h5 claims to have 5 cores. It’s an Orange Pi One, with Armbian, so maybe it’s quirky… Ran fine in any case, just might get assigned more work units than it ought to get.

In Conclusion

The way distributing scripts / commands is handled is a bit convoluted. Passing parameters and file names is a PITA (partly because there are 3 or 4 ways you can do it, to try pleasing everyone, and that just makes it 3 or 4 times as much to learn and 9 to 16 times as many disconnects between what you read and what the example shows… Oh Well.)

The good bit is I now have model parallel scripted commands for both machine-local and cluster-distributed execution, and they work. Now, one task at a time, I can model off them to make a local parallel or cluster parallel script to distribute the work, as appropriate. I’ll need to learn some more options as I do that, but those will be “as needed / where needed”.

Over time, as I run into things that are slow, I can spend the time while they run making a parallel scripted version of them. For now: gzip, gunzip, xz, unxz, perhaps some data move / preen tasks, and making FORTRAN modules in a distributed way. Certainly testing them on different data sets. I also need to integrate a Real Disk NFS data store for cluster shared data. That can cut down on some of the headend-to-headless data copies, but at the cost of an NFS server bottleneck. I also need to find the power supply for my 100 Mb network switch, since running the cluster through the old crusty 10 Mb hub it is on now will certainly make “network” the hang-up point.
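The xz versions, for instance, ought to be nothing more than the gzip ones with the command swapped out (untested sketches):

# p_xz (sketch): xz each file name fed in on standard in
parallel -j0 --eta --progress xz {}

# p_unxz (sketch): unxz everything here with a .xz suffix
ls *.xz | parallel -j0 --eta --progress unxz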

But now that I’ve got things for it to actually DO, I’ll be motivated to fix those things as they give me a bit of a pinch ;-)

The other thing I’ll end up doing is building more cluster control / management tools. Doing a log-in to 6 systems every time you want to shut it down will get old quick ;-) Not bad when playing with 2 to 4 boards at a time, but a PITA as a normal operations process.

With that, I’m signing off for the evening. I’ve got a cluster that works. I can distribute just about any Linux command or script to 24 cores as it stands and another 12 if I bother to configure them too. Now it’s just putting it to work. One script and one task at a time…


27 Responses to Pi Cluster Parallel Script First Fire

  1. jim2 says:

    CIO goes hard core!

  2. jim2 says:

    Must be nice to be retired :)

  3. E.M.Smith says:

    Well, this was remarkably easy…

    I added the other Orange Pi (the one that’s also my NFS file server) to the cluster; that whole ssh keys thing again, though it was already on that network. I also added gfortran to it (it already had parallel and distcc), plus mounted /LVM on each machine in the cluster (all but one already having it in the /etc/hosts file; later I’ll remove the ‘noauto’ flag so it just mounts at boot time).

    On one card, the new Orange Pi (h5, not Opi the fileserver), I had to install nfs-client and add the /etc/hosts entry for /LVM. Guess I’ve not fully configured it with my standard build installs 8-}

    Then added a /LVM/parallel directory to work in, copied my batch of test (log) files over, and made sure they were all un-compressed. The script to do that compression was made and tested. As they all share the same /LVM/parallel file space, no need to tell it to copy files and copy back results.

    chiefio@DevuanPiM2:/LVM/parallel$ bcat pn_gzip
    parallel -j0 --eta --progress  -S:,h1,h2,h3,h4,h5,Opi gzip {}
    

    I’ve adopted the naming convention of pn_ for those scripts that expect to work on the nfs mounted file system (as that takes a different command argument set than passing files)

    I tested this with:

    chiefio@DevuanPiM2:/LVM/parallel$ ls
    alternatives.log    daemon.log.2  dmesg.3         messages    user.log
    alternatives.log.1  daemon.log.3  dmesg.4         messages.1  user.log.1
    alternatives.log.2  daemon.log.4  dpkg.log        messages.2  user.log.2
    auth.log            debug         dpkg.log.1      messages.3  user.log.3
    auth.log.1          debug.1       dpkg.log.2      messages.4  user.log.4
    auth.log.2          debug.2       faillog         syslog      wtmp
    auth.log.3          debug.3       fontconfig.log  syslog.1    wtmp.1
    auth.log.4          debug.4       kern.log        syslog.2    Xorg.0.log
    bootstrap.log       distccd.log   kern.log.1      syslog.3    Xorg.0.log.old
    btmp                dmesg         kern.log.2      syslog.4    Xorg.1.log
    btmp.1              dmesg.0       kern.log.3      syslog.5    Xorg.1.log.old
    daemon.log          dmesg.1       kern.log.4      syslog.6    Xorg.2.log
    daemon.log.1        dmesg.2       lastlog         syslog.7
    

    and it worked dandy. Slower than just doing it locally as the test files are so small that the added set-up overhead of spreading them around dominates, but it worked nicely.

    chiefio@DevuanPiM2:/LVM/parallel$ ls /LVM/parallel/* | pn_gzip
    
    [leaving out the same nag messages noted above in the article -EMS]
    
    Computers / CPU cores / Max jobs to run
    1:local / 4 / 64
    2:Opi / 5 / 39
    3:h1 / 4 / 36
    4:h2 / 4 / 37
    5:h3 / 4 / 45
    6:h4 / 4 / 39
    7:h5 / 5 / 41
    
    Computer:jobs running/jobs completed/%of started jobs
    ETA: 0s 0left 0.19avg  local:0/9 Opi:0/9 h1:0/9 h2:0/9 h3:0/9 h4:0/9 h5:0/10
    
    chiefio@DevuanPiM2:/LVM/parallel$ ls
    alternatives.log.1.gz  debug.1.gz      fontconfig.log.gz  syslog.5.gz
    alternatives.log.2.gz  debug.2.gz      kern.log.1.gz      syslog.6.gz
    alternatives.log.gz    debug.3.gz      kern.log.2.gz      syslog.7.gz
    auth.log.1.gz          debug.4.gz      kern.log.3.gz      syslog.gz
    auth.log.2.gz          debug.gz        kern.log.4.gz      user.log.1.gz
    auth.log.3.gz          distccd.log.gz  kern.log.gz        user.log.2.gz
    auth.log.4.gz          dmesg.0.gz      lastlog.gz         user.log.3.gz
    auth.log.gz            dmesg.1.gz      messages.1.gz      user.log.4.gz
    bootstrap.log.gz       dmesg.2.gz      messages.2.gz      user.log.gz
    btmp.1.gz              dmesg.3.gz      messages.3.gz      wtmp.1.gz
    btmp.gz                dmesg.4.gz      messages.4.gz      wtmp.gz
    daemon.log.1.gz        dmesg.gz        messages.gz        Xorg.0.log.gz
    daemon.log.2.gz        dpkg.log.1.gz   syslog.1.gz        Xorg.0.log.old.gz
    daemon.log.3.gz        dpkg.log.2.gz   syslog.2.gz        Xorg.1.log.gz
    daemon.log.4.gz        dpkg.log.gz     syslog.3.gz        Xorg.1.log.old.gz
    daemon.log.gz          faillog.gz      syslog.4.gz        Xorg.2.log.gz
    

    1/5 second for the average job. It ran 9 each on most of the cluster and 10 on the second Orange Pi (h5, the one not doing nfs service), likely because it claims one more core than it really has, and the Orange Pi that is doing nfs service would be slower on the pure compute tasks as it’s taking all the I/O load too (so it could not snap up the extra job…). 7 boards x 9 jobs each = 63, plus the one left over that h5 took, making the 64 total that matches the files to be gzipped.

    Well, surprisingly easy and it lets me dodge that whole copy / return business on data files and results.

    I’ll now go back and start addressing some of those ‘sanitation’ error messages and see if I can get most of them to go away. It’s only a minor annoyance, but it is ‘untidy’.

    Oh, and I AM going to make versions of the command that do a copy / return just so I can test relative speed and performance. True, all of them will be sucky at the moment as I have THE worst possible network connecting all these boards (a 10 Mb HUB, so packets fighting over a single shared 10 Mb pipe for all machines). I need to make that 100 Mb switched, so each board gets a dedicated 100 Mb (matching their network spigot speed) and NOT shared. This is fine for development / testing (other than performance testing) but not for production… So maybe tomorrow I’ll go digging through my junk boxes looking for where that PSU went ;-)

    Since most of my actual large data lumps are on the NFS file store, this lets me handle most of them rather directly and in a clear way and without a lot of read / copy / return parameters and such. It also means much less “bit wear” on the SD cards as everything will be happening to Real Disk ™ on the file server.

    Well, that’s one big lump out of the way. On to parameter passing / return copy example… and that systems management smoothing ;-)

  4. p.g.sharrow says:

    This is a timely subject as my grandson and I were discussing this paralleling last night. I am glad to hear that things are going so well with the Pi format SBCs, and that they can be mixed….pg

  5. E.M.Smith says:

    Interesting… The machine that gets the “extra” job seems a bit random (so my thesis about why is bogus). Here’s the results of the pn_gunzip version:

    chiefio@DevuanPiM2:/LVM/parallel$ bcat pn_gunzip
    parallel -j0 --eta --progress  -S:,h1,h2,h3,h4,h5,Opi gunzip {}
    

    Basically the same program layout but with “gunzip” instead of “gzip”. The result?

    Computers / CPU cores / Max jobs to run
    1:local / 4 / 64
    2:Opi / 5 / 41
    3:h1 / 4 / 38
    4:h2 / 4 / 37
    5:h3 / 4 / 43
    6:h4 / 4 / 37
    7:h5 / 5 / 39
    
    Computer:jobs running/jobs completed/%of started jobs
    ETA: 0s 0left 0.19avg  local:0/9 Opi:0/9 h1:0/9 h2:0/10 h3:0/9 h4:0/9 h5:0/9
    

    The “spare” job went to h2 instead. A Pi M2 board.

    Well, I’m not going to worry about it now. It’s just some scheduler choice made by the developer on how to allot jobs.

    The nifty thing is that now I have bulk gzip and gunzip commands customized for the NFS file server where most of that data is stored, and another set customized to use all the cores on my local machine for when I’m doing things just on it.

    Those two cases are going to be the big majority of all my use cases. There is unlikely to be much on a local disk (in the cluster) that is not on the NFS disk. I only have 2 USB hubs ATM; one is on the NFS file server while the other is on the XU4 “daily driver”, as the USB 3.0 on it is a bit dodgy (software issue, I think, so hopefully fixed in a future update), which leaves only 1 USB 2.0 port for kb, mouse, disk, …

    Which means that unless I get a Real Disk + USB Hub for ‘headend’, I’m not going to be doing anything bigger than the SD card that’s “local” and not on the NFS server… Maybe “someday”, but not for months at a minimum.

    Well, back to the keyboard and that “polish work”… and trying to figure out what other commands I use regularly that could use pn_ and ps_ and p_ versions… Fortran compiles. Various compression / decompression commands. Maybe some search or sort things? Hmmm…

  6. Another Ian says:

    E.M.

    Look out, or in the current climate the PC mob might be after that “gunzip” term

  7. Another Ian says:

    E.M.

    In keeping with the comment above

    “In case You Forgot For A Minute That The Left Is Completely Insane”

    “Student investigated after allegedly saying a maths symbol looked like a gun”

    https://realclimatescience.com/2018/02/in-case-you-forgot-for-a-minute-that-the-left-is-completely-insane/

  8. E.M.Smith says:

    Well, this is an interesting one.

    Something I regularly do is tote up how much disk space some particular directory is using. I have a little command that does most of it in an automatic and standardized way. Named “DU”.

    du -BMB -s * .[a-z,A-Z]* | sort -rn > 1DU_`date +%Y%b%d` &
    

    This does a summary of total disk space used, in MB, on every file, plus every file whose name starts with a “.” (so normally not seen) followed by a letter. That then gets sent to sort to make a reverse (biggest first) listing, which goes into a file starting with a 1 (so it sorts to the top of ls listings) and with today’s date appended, so listings from different days can accumulate. This is all launched as a background task to stay out of my way (the “&”) as it is often long and slow on the I/O.

    Here’s the nfs parallel form:

    chiefio@DevuanPiM2:/LVM/data.giss.nasa.gov$ bcat pn_DU
    ls -d $1/* $1/.[a-z,A_Z]* | parallel --eta --progress -Sh1,h2,h3,h4,h5,Opi 'du -BMB -s {}' | sort -rn > $1/1DU_`date +%Y%b%d` 
    

    I put the “ls” inside this command and you pass in an NFS file server directory path. That list of file names gets fed to parallel, which ONLY farms it out to the headless nodes (no “:” in the -S list). Their results go into a pipe to a sort command, with the output landing in the same directory ($1 parameter) as above.

    Tested it and it worked. Much faster than I expected. Usually these run very slow and clog up my desktop machine. As this is just a “housekeeping” chore and I use the files later, typically when cleaning up, I don’t really care how long it takes but don’t like it cluttering up the main machine. As it is pretty much universally I/O bound, it also ties up anything else I’m trying to run that does I/O to “disk”. By launching this off to the other systems in the cluster, they can have their I/O systems cluttered a bit. It also spreads the I/O action around so those systems can still be fast on other things. The only potential bog point really is the NFS server, and it would be getting whacked anyway.

    So this is an example of distributing the I/O action more than the CPU, and having the headless nodes do all the grunt work without slowing my desktop machine at all. I just need to remember to only feed it a directory from the /LVM nfs server pool, as it doesn’t do any copy / return stuff but does call remote systems. Oh, and this one runs “as me” so it only works on directories owned “by me”. I need to make the same arrangements for one that runs as root, for those file systems not owned by me.
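    A root flavor might be nothing more than the same pipeline with root@ logins in the -S list (a pure sketch; it assumes root ssh keys get pushed to each node first, and the local ls and the final sort / redirect would still run as me):

    ls -d $1/* $1/.[a-z,A-Z]* | parallel --eta --progress -Sroot@h1,root@h2,root@h3,root@h4,root@h5,root@Opi 'du -BMB -s {}' | sort -rn > $1/1DU_`date +%Y%b%d`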

    chiefio@DevuanPiM2:/LVM/data.giss.nasa.gov$ pn_DU /LVM/data.giss.nasa.gov 
    ls: cannot access /LVM/data.giss.nasa.gov/.[a-z,A_Z]*: No such file or directory
    

    That “cannot access” warning is common, as most directories do not have any files starting with “.” (the dot being what makes them invisible), so that half of the pattern matches nothing.

    Computers / CPU cores / Max jobs to run
    1:Opi / 5 / 5
    2:h1 / 4 / 4
    3:h2 / 4 / 4
    4:h3 / 4 / 4
    5:h4 / 4 / 4
    6:h5 / 5 / 5
    
    Computer:jobs running/jobs completed/%of started jobs
    Opi:5/0/19% h1:4/0/15% h2:4/0/15% h3:4/0/15% h4:4/0/15% h5:5/0/19%
    Computer:jobs running/jobs completed
    ETA: 0s 0left 0.04avg  Opi:0/5 h1:0/4 h2:0/4 h3:0/4 h4:0/4 h5:0/50
    

    Again we see that the Orange Pi boards are asking for one more job than cores.

    The result:

    chiefio@DevuanPiM2:/LVM/data.giss.nasa.gov$ cat 1DU_2018Feb23 
    14636MB	/LVM/data.giss.nasa.gov/impacts
    1028MB	/LVM/data.giss.nasa.gov/modelforce
    414MB	/LVM/data.giss.nasa.gov/modelE
    367MB	/LVM/data.giss.nasa.gov/swing2
    158MB	/LVM/data.giss.nasa.gov/gistemp
    51MB	/LVM/data.giss.nasa.gov/mineralfrac
    23MB	/LVM/data.giss.nasa.gov/ch4_fung
    22MB	/LVM/data.giss.nasa.gov/dust_tegen
    9MB	/LVM/data.giss.nasa.gov/csci
    7MB	/LVM/data.giss.nasa.gov/seawifs
    7MB	/LVM/data.giss.nasa.gov/efficacy
    6MB	/LVM/data.giss.nasa.gov/stormtracks
    3MB	/LVM/data.giss.nasa.gov/mcrates
    2MB	/LVM/data.giss.nasa.gov/rsp_air
    2MB	/LVM/data.giss.nasa.gov/cassini
    1MB	/LVM/data.giss.nasa.gov/sageii
    1MB	/LVM/data.giss.nasa.gov/precip_dai
    1MB	/LVM/data.giss.nasa.gov/precip_cru
    1MB	/LVM/data.giss.nasa.gov/o18data
    1MB	/LVM/data.giss.nasa.gov/landuse
    1MB	/LVM/data.giss.nasa.gov/js
    1MB	/LVM/data.giss.nasa.gov/index.html
    1MB	/LVM/data.giss.nasa.gov/imbalance
    1MB	/LVM/data.giss.nasa.gov/gistemp.1.html
    1MB	/LVM/data.giss.nasa.gov/co2_fung
    0MB	/LVM/data.giss.nasa.gov/1DU_2018Feb23
    

    Which looks to me like it worked.

  9. E.M.Smith says:

    @Another Ian:

    Sigh. Really? SAY the word “gun” and get your home raided by cops? What about the fact that in most places it is still legal for a kid to own toy guns and say the word gun…

    “Yes, Sergeant, we need to raid him! He has a zip-gun. Lots of them it seems. Keeps talking about a cluster of gunzips.” (HUMOR!!!! For the Humorless Left and Dumb In Blue).

  10. jim2 says:

    It’s pretty awesome you can set up a parallel computing system with just some shell scripts. I wrote a few of those in the ’90’s and use Linux at home, but I know there’s a lot I don’t know about the rather massive shell script libraries. At any rate, my hat’s off to you for that one.

  11. E.M.Smith says:

    @Jim2:

    The “magic” is all in the parallel command. I’m just reading the man page and using it. So the h/t ought to go to the guys who wrote parallel. It seems to be built on Perl and ssh.

    But yeah, it is a very neat tool. As LOTS of the stuff I do runs from scripts, it’s a big step up to put that on a micro-cluster. Figure I can now have about 24 processes, all running in a dedicated core, and doing something I want done. Nice!

    FWIW, my next step up (along with just making more of my common command scripts parallel) is to start whacking on parallel FORTRAN programming. Starting with co-array FORTRAN. Then adding a message passing interface layer. Finally ending at multiple instances across cores if needed.

    Typical parallel programming stuff. First make your basic code as efficient an approach as possible. Then use parallel features of the language. Then add message passing as it splits into multiple images. Then add multiple instances working different parts of the problem space (if possible). Somewhere in there use GPU processing as possible. And so it goes.

    The problem must be amenable to decomposition into sub-units, but many things are. In theory, one could make a climate model where each grid cell was a multi-core SBC and just exchanged physical information with the adjoining neighbor cells. Grid cells on a grid network…

    FWIW, one of the interesting bits for me is that in these various runs, the $16 OPi boards have done well. Not driven enough to heat limit (yet), but being ‘good enough’ compute modules in these real world examples. Though, as noted, the cluster is network limited at present so comparative core performance isn’t really valid yet. BUT, that makes it about $4 / core for OK “computes”. $400 getting a 100 core cluster… $4k for a 1000 core machine.

    I know… network, problem decomposition, full use heat limiting. All that has to be sorted right first. Still, it’s pretty clear that even a very modest budget can make a nice cluster for an average Joe to use, play with, learn from.

  12. Larry Ledwick says:

    Hmmm let’s see how many grid cells on the earth?
    How many Rpi systems would you need to have a core for each grid cell?
    You would also need some extra cores for message handling and general system overhead. Way beyond your and my budget, but a dedicated climate modeling cluster would be possible.

    All you need is the budget of the NSA

    /sarc

  13. p.g.sharrow says:

    An SBC to handle 1 cell or 10 makes no difference. Each board doing its thing, working together as a fully integrated system, sounds like a solution for real world computing problems. Gigaflops on a one-track test is great for bragging rights, but the real measurement is the cost of energy and equipment for computed solutions of real world problems.
    If multicore SBCs can become a cluster and the clusters can be networked, both in hardware and software, then any scale up becomes doable…pg

  14. E.M.Smith says:

    @Larry:

    Last gen models have 8000 grid cells ( 2000 R.Pi ) and GIStemp was raised to 16k a couple of years back. So 4000 R.Pi boards. Call it $40 ea. w/ power supply. $160,000 so more like one year of the local university salary for the professor, or 3 mos. salary of the chancellor…. or the rounding error in the NSA budget.

    Using Orange Pi One, it would be $25 with power supply and good heat sinks, so only $100,000 or about the janitor fully burdened rate for a year, or one trip by Pelosi back to a S.F. Pride Parade…

    I think that with bulk purchase and using the even smaller boards with a common PSU bus, I could get it down to about $40,000 but that ignores the grid network costs that start to dominate then…

    (Yeah, I saw the sarc; … but ruthlessly driving costs down is “what I do”…)

    In the real world, I’d be more likely to use octo-core systems to reduce network costs by 1/2 and because their $/compute is less. Call it 2000 XU4 boards at $65 ea w/PSU or $130,000 so even less than the PiM3 cost, then 1/2 the network gear needed, and the cores are at a higher clock rate while 1/2 of them are A15 cores so much faster too!

    Oh, and actual data rate per second between cores is needed to spec the interconnect. IFF the cores are mostly playing with local data and only rarely sending something to a neighbor, slower cheaper networking can be used. Clusters of hubs on a switch backbone. If data is frequent and latency an issue, then a big expensive switch is needed and 4k nodes of switch is not cheap… (but as the Pi is only 100 Mb and that is obsolescent in networks, likely available for little cost at the local e-junk reseller.) An interesting option with the Pi M3 is using WiFi built into the boards. Have some of them be access points to the rest. Don’t know how many nodes are possible on one A.P., but call it 100. Then you just need a 40 port switch (really it would be 48) with 40 of the boards plugged in and being WiFi access points to 100 of the rest each. Cheap and reasonably fast IF data exchanges are intermittent and bursty not continuous and large.

    FWIW, I’m intending to play with a small subset version of that Pi Cluster idea just to learn from it. I’ve got 28 cores in the mix right now. The XU4 can be added with moving one wire to that network giving 36 cores. So I’m one more 4 core board away from a 1/400 scale model of the “toy system of 4k boards or 16k cores”. That is enough to assess performance, issues, costs, etc. etc. Right now I have 3 boards in the cluster that can be WiFi connected and I have 2 x WiFi dongles available. That’s enough to test traffic effects from using WiFi as the backbone network.

    I do need to emphasize that this IS a toy system. For real work, one would want to use faster boards with way faster cores and Gb ethernet switched backbone at least. In many “cluster systems”, the network costs, latency, and bandwidth limits bite first and hardest. Amdahl’s Law has not been repealed and nothing is as effective as a Damn Big CPU talking to real local memory. Every step away from that is compromising. Data exchange becomes more remote and slower. Networking is orders of magnitude slower than direct memory access speeds. Latency causes CPU idle time. etc. etc. Only “cheap” pushes that direction (out of the “Fast, good, cheap – pick any two.” truism…)

    My hope is that a reasonably scaleable proof of concept model can be made to run showing just what is needed to make a “just big enough” home climate model cluster. Something where I can run one “simulation” in a week or so. Perhaps at reduced resolution (grid cell count). Then, perhaps with some optimizing and tuning, it could reach the point where a $10,000 budget would be enough for “our side” to run some contra-models and poke at “their side’s” foibles. That cost would be attainable for many folks…

    But “we’ll see”. I’m into it about $400 and still need to get the network upgraded and add another board or two. So at the 40 cores level I’m running about $10 / core. Near that $160,000 / 16,000 cost point; but at a 4/1600 = 1/400 scale. That makes it about another $1200 to reach 1/100 scale. But I think I can reach some conclusions even at this scale. Plus, at this point, making some model software run at all is more valuable than more hardware not doing anything.

    Meaning my next important task is that FORTRAN coarray lesson, the message passing FORTRAN layer, and some actual sample routines doing something. (So I’ll likely take a part of one of the models and port just it. – Say the mass flow or the radiant flux. Essentially a grid of something like 400 cells and with only one parameter being modeled.) Once that is working, I can layer on the rest of the complexity pretty fast and measure the performance parameters at each step. But all that depends on “One Guy” getting the code to go…

    Well, I’ve now got the platform to be worth the effort. I can spread a task over 6 boards. I can measure what is faster and what is not. I’ve got 4 major board types to test / characterize. The whole cluster is demonstrated to work.

    I’m a bit worried that the answer will be “these boards suck” (which from a pure competitive performance POV, they do – don’t expect a $40 board to do what a $4000 one does.) My first tests of MPICH FORTRAN were not encouraging. No increase in speed at all. I’m hopeful that the coarray FORTRAN will be better and that perhaps there’s some performance tricks I just didn’t know in my first test. But it really is possible that Debian is just not tuned to do multi-core nor high performance cluster things well. That’s one of the “issues” in High Performance Computing. It is very easy to lose any potential gains in overhead, so tuning the overhead parts is important; but a generic OS is often tuned for single desktop “glitz” and not HPC backend speed over all else.

    But I’m getting ahead of my self. While it is possible that I’d need to drop back on the OS choice to one of the cluster oriented distributions, there’s no evidence of that yet. Similarly, while it is possible the lousy network design of the R.Pi (low speed, shared with USB “disk” I/O) is limiting, there’s also no evidence of that yet either. Until I’ve got a sample case running and it is clearly saturated on network with idle CPU cycles, that is an unproven worry about nothing. Those are things to be tested on this test / toy cluster.

    Besides, by the time I have a real model running, there will be 16 and 32 core boards for $1 each with Gb network built in ;-)

  15. kneel63 says:

    For massively parallel, compute intensive stuff, is there any advantage to getting boards with 2 ethernet and creating a tree/hierarchy of clusters? So where you require some sub-section of a massive data file to do a “sub area” it is cached at one level up and farmed out to cores the next level down, most “next door” cells you might want data from are “local” and not clogging the network at every node, just a “few” in the local cluster.
    IF your network is the bottleneck, then maybe that needs to be paralleled as well. IOW, a cluster of 100+ core “master”, that is really cluster of several boards, and several masters also clustered, rinse and repeat while there is advantage. Maybe “cross connects” if that helps. Yes, very unconventional networking/clustering, but for the task at hand (climate models), may be worth it if it scales better, or scales more “easily”. Need to tune that at several levels of course – best core/master ratio, where data xfer becomes a bottleneck etc.
    Just thinking out loud…

  16. E.M.Smith says:

    @kneel63:

    Right idea.

    There are whole books written on the network topology considerations in cluster computing. Various classes of computer cluster are defined by their connection methods. Endless considerations of star vs ring vs tree vs matrix vs…

    So yeah, any given problem type is best served by a particular network speed and graph. In practice, folks just try to get the generally fastest most interconnected network they can as rarely can you configure your network “best for each class of problem” you run.

    At one extreme end is the COW. Cluster Of Workstation. Just on whatever old network exists. Things like SETI At Home running over the internet. That kind of stuff. VERY embarrassingly parallel problems with infrequent and relatively small data transfers.

    At the other extreme are things like the Parallela Board with an on-chip matrix interconnect between all the cores / memory in the Epiphany chip. Folks try to arrange “any CPU to any memory” matrix connections. In practice there’s usually some kind of hierarchy and the need to occasionally ask your “neighbor” processor to please ask his neighbor to hand over some data…

    In between are the things like a Beowulf with a fast Switch Backplane. Any board to any board in one switched connection. All kinds of “switched fabric” faster things exist if you have enough money to throw at it.
    https://en.wikipedia.org/wiki/InfiniBand

    InfiniBand (abbreviated IB) is a computer-networking communications standard used in high-performance computing that features very high throughput and very low latency. It is used for data interconnect both among and within computers. InfiniBand is also used as either a direct or switched interconnect between servers and storage systems, as well as an interconnect between storage systems.

    As of 2014 it was the most commonly used interconnect in supercomputers. Mellanox and Intel manufacture InfiniBand host bus adapters and network switches, and in February 2016 it was reported that Oracle Corporation had engineered its own Infiniband switch units and server adapter chips for use in its own product lines and by third parties. Mellanox IB cards are available for Solaris, RHEL, SLES, Windows, HP-UX, VMware ESX, and AIX. It is designed to be scalable and uses a switched fabric network topology.

    As an interconnect, IB competes with Ethernet, Fibre Channel, and proprietary technologies such as Intel Omni-Path.

    As a “Home Gamer” I’m not in that infiniband league nor Fibre Channel. So I’m looking at a switched Ethernet.

    Most of the time, folks in the Ethernet tranche of performance try to get as many switched ports as possible all on one switch (so all computers are direct switched to each other). For systems up to 256 nodes, that isn’t too hard. It can be expensive. The really big CISCO chassis can take something like a dozen boards of 24? connections each. Prepare to pay $Thousands…

    For folks in the cheaper seats, you put as many ports as possible per switch, and then connect those via (hopefully) only one big switch above them. As 48 port switches are not that hard to find, we’re talking a 48 x 48 size for only a one deep switched tree network. 2304 nodes. But buying 49 switches of 48 ports each, even at the “used and abused” store for last gen tech, is going to be measured in the low $Thousands.

    Ok, on to work flow… Now comes the fun of “Do I try to optimize or not?” Generally not. Folks just dump in their program and if it completes “soon enough” never look. IFF they need to speed it up (i.e. it won’t be done in time for going home early…) then they profile the code, look for slow nodes, and eventually get around to asking about network speed and latency issues. Only latency matters to the switch tree (well, and IF you have managed to saturate total bandwidth on any ports).

    Then someone might look to structure their code so that things on the first 48 nodes talked to each other more than with those over on the next 48 nodes. Depending on how your cluster “dispatches” jobs, you may or may not have any control ability over which image runs on what computer. But it’s worth a look. IF it’s a problem.

    For Me:

    As I’m running on real honest to gosh junk (most of it inherited from site shutdowns or bought cheap at the junko shop) of no more than 100 Mb for my FAST switch, I’ve got a very very easy upgrade path whenever I need to “go there”. Go to the local junk shop and buy somebody’s “disposed” 48 port GigE switch. (IF really desperate, buy a new one from Fry’s or Amazon. For unmanaged ones, they are pretty cheap now). These often show up in quantity at the local junk shops. Why? Folks go out of business and it’s the dumpster or sell it to Weird Stuff & similar. The guy doing the shutdown is just told “Get Rid Of It Now!” and nobody cares about the price. Or a new IT guy wants a new toy so the shop gets “upgraded” and the old stuff (sometimes due to no longer having warranty or support but working fine) ends up at the junk / clearance store.

    As an example, I have the 8 port 100 Mb version of this and I’m looking for the power brick. This is the 16 port unmanaged (meaning no software, just a dumb switch) Netgear Gigabit switch. More than I’m likely to need for a very very long time. (Figure I’m doing XU4 for performance from here on out. I’ve got 8 boards total at present for my 36 cores. So 16 x 8 more cores = 128. So I’d need to be adding 128 more cores of boards before I filled up this switch.)

    It is selling for $55 right now on sale. Not going to break the bank.

    So for the foreseeable future, I’m just not going to be big enough to need to worry about a two level deep switch topology… Even longer if I got a 48 port switch instead (about the same price at the junk store, or maybe cheaper…). That would be 384 more cores using XU4 boards. At present, I can’t even load up my main 20 cores of boards unless I use dummy loads…

    But yes, you are correct in your instincts that as you need more speed, you go for more network and structure it as needed. Until you are running enough to need things like Infiniband, that typically means GigE (or similar) switches (any port to any port at full speed NOT shared) with as many ports as possible on one switch, only resorting to “uplink to another switch” when you can’t get a big enough switch backplane. Oh, and for the uplink, you can use Fibre for more speed or gang multiple Ethernets together. For that matter, you can gang the Ethernet from one board for more speed to it (often used on things like file servers) or “dual home” (or “multi home”) them. So a heavily used file server may have several GigE outlets onto several switches for direct delivery of data.

    Just a matter of applying money…

    For my toy system, as the Pi has at most 100 Mb (and in real world use more like 50 Mb) and that will be less than 100% used, I’m not going to be worried about the network until I’ve got more than 16 boards worth of cluster and most of them are 1 Gb Ethernet. That’s a ways off… Year or two at least.

  17. E.M.Smith says:

    @Jim2:

    Interesting stuff. Switches are expensive (or the ONE I could find, out of Switzerland). Then I’d need an Ethernet / POF adapter / interface gizmo for each board. Likely to cost more than the board…

    I think I’ll stick with GigE Switches until they start putting optical connectors on the boards…

  18. p.g.sharrow says:

    @EMSmith; the “kid” is 21 and a gamer computer geek. I have been nudging him toward the Pi based system almost as long as you, but he seems to still prefer Intel. But there is hope! He asked for our little 4 board Raspi stack to play with, so there is some interest there, and from time to time I point him in your direction.
    We have lots of cast off compute stuff to play with and the Pi form factor stuff is cheap.
    As for me, I’m an old farmer that has used desktop computers for 30 years. I know enough to be able to follow what you are working on, barely ;-) I can see where things are headed, but not much help in getting there.
    I am looking forward to attempts to utilize the wifi capabilities of these boards for networking them. In the long run this may be a very important part of some of the connectivity / communication needs.

    This blog is the most useful news site on the internet, as well as being entertaining, and my favorite place to hit every time I sit down ..pg

  19. Logically, the way you’d connect a set of processor boards for a simulation of the globe would be a geodesic sphere, where each board is connected to the 6 (or occasionally 5) nearest neighbours. At each simulation cycle, each board would only need to pass data to its neighbours on what conditions it’s feeding across the borders of its “local patch” as to air volume, momentum, humidity, clouds, variation with height, etc., and to receive from its neighbours the equivalent data in to run the next cycle. Each board would also need to have a link to a single point for loading in the initial data and feeding out its current situation. Within each board’s “patch” you can of course subdivide into smaller geodesic patches to get better resolution. This should cut down the communication requirements since the datafiles between boards should be fairly small and so Ethernet speeds aren’t even needed – I2C may be sufficient and USB more than fast enough. Setting initial conditions and reading the data out will of course be a heavy I/O requirement, depending on the granularity needed, but this is all to and from central storage so a simple hub may be enough rather than a switch. Once you’ve started the sim, then all that should be needed is a synchronising signal so the time-steps don’t get out of step. Each board will only be talking to its 6 (or 5) neighbours and you won’t need central storage again except for reading out the results.

    This sounds like a specific board would be more useful, with enough processor and memory onboard with enough serial I/O ports to do the job too. It’ll need a time-tick port, too, to signal it’s finished and waiting for the next cycle to begin, and those signals from all boards would be ANDed to produce the signal to start the next time-tick. If you want more processing power, simply increase the number of boards in the array (and spend a long time getting the physical wires plugged in to get the logical geodesic array). This appears like it won’t need a distributed Fortran, since each processor will only be dealing with its own patch of air and looking at what comes into it and what goes out of it, and mainly talking to only another 6 or 5 boards.

    Wishful thinking….

    On the other hand, the sorts of array processors used to mine Bitcoin are very competent and have a lot of processors for the money. Since it seems fairly likely that the Bitcoin bubble will burst pretty soon, in a year or so maybe those unusable mining boxes will be on sale for some ridiculously-cheap price, given that they aren’t really useful for much else in a normal computing environment. Maybe CGI rendering, but there’ll be a limited market for that, and unless someone reprograms them for some virtual reality app I see the mining machines as being surplus to requirements and thus almost given away. You probably wouldn’t need that many of them to achieve the 4000 core target. They eat around 1.5kW as far as I’ve seen, so electricity costs are significant. Maybe OK in the Yukon or Iceland, not so good in Florida or California.

  20. kneel63 says:

    “So for the foreseeable future, I’m just not going to be big enough to need to worry about a two level deep switch topology…”

    While this is fine for a more general purpose “super cluster”, with the emphasis on climate modelling, my thought was that a hierarchy would allow you to parallelise the data download as well – cache input data at level 1 machines, and the level 2 machines under them can download without a direct link to the actual machine holding the file required. So 8 level 1 machines are all downloading to each of their 8 level 2 machines at the same time (sequentially – so 8 downloads from “server”, then 8 each from level 1 to level 2, should be faster than 64 downloads from the central server). Useful for “big I/O data” jobs perhaps.
    Then again, I suppose you are not too worried if it takes a little longer, so no big deal – at least, until you decide you really do need to have 4K cores to do it “soon enough”. :-) Then 3 levels of 16/level gives you 4K, with data transfer happening at 256 downloads at the same time…

  21. David says:

    Offtopic:

    This is an answer to your thread «For deep security, use ARM, avoid Intel & AMD processors». I am answering here because the original thread is closed to answers.

    You stated «I am waiting for a Linux on a clean open source CPU, but it doesn’t exist yet», and well, now we have that open source CPU: a 64 bit RISC-V CPU/SOC (Freedom U540 SOC) from SiFive. SiFive has presented HiFive, the first RISC-V Linux capable board with the first Linux-ready RISC-V Chip: https://www.sifive.com/products/hifive-unleashed/

    Of course, first development boards won’t be cheap: the boards start at $999 and will ship at the end of June 2018. Early Access boards will be available March 2018.

    I hope when a future bigger production arrives there will be a similar board but much less expensive, although it won’t have 8 GB of RAM of course.

    On the other hand, this development board doesn’t have video-out so, if you want to use it as a personal computer, you will need to add a video option (PCIe, USB or custom interface for example). I am thinking about using a Pi Zero (without wifi or BT) as a video module (the RPi, booting with a custom OS, will only receive commands from the main computer and can’t communicate with the outside world).

  22. E.M.Smith says:

    @David:

    Nice to know!

    You can be sure someone will fab up a board with video on it and prices will drop.

    At a $kilobuck I’m unlikely to be an early adopter, but I’d pop a few $Hundred for one. Usually that happens one design cycle after the first adopters get their designing done… or about a year.

    Looking at the board, I notice there is no GPU. Video will be done by the main CPU or you get to build an outboard video processor…. As video is lots of small calculations done best in a GPU, I’d lean toward a small adjunct board with a built in GPU. Getting it to work well with this board might be an issue… then again, I suppose one could just make it a dedicated X-Windows server with this board as the client. Would just need to get X running on it and route all login sessions via the front end board and fixed X session protocol. (Think of the outboard system as the equivalent of a stupid slow PC running X windows on a backend Sun or similar server ala. 1990s model…)

  23. E.M.Smith says:

    Oh, and a note per “My Network Switch”:

    I found the power supply… moved the boards to it, plugged it in… nothing.

    Seems the power supply is dead, per my voltmeter… So I need to buy a 5 VDC 3 Amp power supply with the right plug on it… I’ve gone through my whole box of PSUs and not one matches. I do have one at that V & A but the plug is smaller. Sigh.

    The first circuit I ever made was a 5 VDC power supply with a 5U4 tube to power tube filaments. Now I find myself wishing I still had it… Somehow it would be curiously attractive to have a set of computers and their network running off a Tube PSU ;-) “Somewhere” in the garage I have a bunch of surplus 5 VDC voltage regulators in TO-3? packages. I’m tempted to take a DIY approach to the problem, but … that would mean cleaning out the garage ;-)

  24. jim2 says:

    I re-abused a computer switching power supply for my “electronics lab.” Got good current @ 3.3, 5, and 12 volts. But you probably don’t want something that big with a fan :)

  25. EM – I hesitated to suggest it, but it’s possible to use scissors to cut the wanted connector off the dead supply and solder it to the working PSU…. Even better to use an intermediate (standard) plug/socket to rejoin, so that you can switch the working ends if needed.
