Just a bit of a personal Woo-Hoo! moment. I just now got Julia to stop sending me stupid error messages about the ‘world age’ being different (though I’m not sure exactly how ;-) and I’ve run the skeleton Cell Climate Model top level flow of control on 6 processes (threads, sort of) on the Odroid N2.
One unexpected problem was the difficulty in slowing the processing down enough to see the cores load up…
So, OK, this code is UGLY and still has a lot of commented out bits I was playing with to see what caused which broken thing, so not laughing, OK? (Well, at least not too much…)
I had to resort to “stubs” to test things. Seems attempting to call a function from inside the same file where you wrote it can cause world age differences…
So first up, the stub to run the spitting out of tasks bit:
ems@OdroidN2:~/Cmodel/ZZTest$ cat stubRCells.jl

include("RCells.jl")
RCells()
#sleep(30)
Note the “sleep(30)” is commented out. Briefly used when I was playing around with the @async macro, to delay the finish of the stub so the child tasks could complete (not yet up on locking and such…)
This basically says “bring in the RCells.jl program, then run the RCells() function from it.”
Then, the thing that does spit out the tasks.
ems@OdroidN2:~/Cmodel/ZZTest$ cat RCells.jl

#
# Climate Model via Cells
# 16 Dec 2020 - E.M.Smith
#
using Distributed;
include("Rcell.jl")
addprocs(4)

# @everywhere function RCells()
function RCells()
    println(" ")
    println("This is Cells - It launches per cell model runs ")
    println(" ")
    global celllist = (1:3)
    println(celllist)
    for year in 1:2
        for month in 3:4
            for day in 5:6
#               @sync begin
#               @parallel for cell in celllist
                @sync for cell in celllist
                    println("Spawn Cell ",cell," in year ", year," in month ", month," on day ", day)
                    @time Dcell(cell)
                end
#               end
            end
        end
    end
end
#sleep(30)
“using Distributed” pulls in the Julia library with all the parallel code stuff, so you can farm work out to multiple processes. Rcell.jl (note lower case c and singular) is the main worker program that gets spit out for each cell. We pull that code in to make it available.
“addprocs(4)” says to use 4 new processes in which to run things (default is 1). I don’t know if it wins, or the “julia -p 6” wins. Testing / tuning “soon” ;-)
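My current understanding (worth verifying when I get to the testing / tuning) is that neither one “wins”: addprocs() adds workers on top of whatever “julia -p N” already started. A quick sketch to check:

```julia
# Sketch: counting processes before and after addprocs().
# Assumption to verify: addprocs() is additive on top of "julia -p N",
# rather than replacing the -p workers.
using Distributed

before = nprocs()        # master process plus any -p workers
addprocs(4)              # should add 4 more worker processes
after  = nprocs()

println("before: ", before, "  after: ", after)
println("worker ids: ", workers())
```

Run it once plain and once with -p set and compare the counts.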
This first ran as an @everywhere function, but when I backed that out it still ran, so it isn’t needed. @everywhere says to make this function available to all worker processes, but since it is just the top level “out spitter”, that isn’t necessary.
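To see what @everywhere actually buys you, here’s a tiny sketch (the hello function name is just a made-up example): it defines the function on the master AND on every worker, so remote calls can find it. Without it, the remote call fails with an UndefVarError on the worker.

```julia
# Sketch of what @everywhere does: define hello() on the master and on
# every worker process. (hello is a made-up example name.)
using Distributed
addprocs(2)

@everywhere hello(id) = "hello from worker $id"

# Run hello() on each worker and fetch the results back:
results = [remotecall_fetch(hello, w, w) for w in workers()]
println(results)
```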
The function has the for loops cut back from HUGE to dinky so the test output is manageable. I used unique integers for each loop just to make it easier to see what was working right.
I was playing with @parallel, @sync, and @async. For @async, if the top level ends before the workers are done, they get scrapped (thus the sleep timer). @sync worked fine. I’ve not done anything with @parallel (renamed @distributed in Julia 1.0, I believe) yet and need to re-read exactly what it does.
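A minimal sketch of the scrapping behavior as I understand it: @async returns immediately, so without something holding the program open the tasks die with it, while @sync blocks until every task it lexically encloses has finished. It also shows the tasks really do run concurrently:

```julia
# Sketch: @sync waits for the @async tasks it encloses, so nothing gets
# scrapped early. Three 1-second sleeps run concurrently, so the whole
# block takes about 1 second of wall time, not 3.
t0 = time()
@sync for i in 1:3
    @async sleep(1)
end
elapsed = time() - t0
println("elapsed: ", elapsed)   # roughly 1 second
```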
Then, you can see I spit out a message saying what it’s doing in that loop cycle, and then with the @time measure how long a cell takes as I invoke the cell itself. I still need to get all the parameter passing stuff in there, but at least now things cycle.
You invoke it with a statement as to number of processes available to run at one time:
time julia -p 7 stubRCells.jl
Here’s the end of the run statistics on time used:
After all of that, I'm done with Tropospheric for a while.
  0.163094 seconds (94 allocations: 61.038 MiB, 17.92% gc time)

real    0m21.130s
user    0m57.412s
sys     0m6.604s
The 0.163094 etc. is from a @time macro inside one of the Julia programs, for just that bit. The “time” in front of the julia command at the Linux level gives us the summary for the whole run at 21 seconds. Note we used 57 CPU seconds in that 21 seconds. Parallel is good ;-)
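As a rough sanity check on “Parallel is good”: user + sys CPU time divided by wall time gives the average number of cores kept busy over the run.

```julia
# Rough parallel-efficiency arithmetic from the run statistics above:
user = 57.412
sys  = 6.604
real = 21.130

avg_cores = (user + sys) / real
println("average cores busy: ", avg_cores)   # about 3 cores' worth
```

Not bad, given most of the run is serial startup / compile and print time.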
The Rcell.jl does the work, one per cell. What I think got rid of the “world age” conflict was putting the include() statements just after using Distributed, with @everywhere in front of them. I’m going to try commenting out “using Distributed” as it ought to work based on the calling program (I think, maybe). Then also the @everywhere (possibly)…
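For reference, here’s a minimal reproduction of the world age problem as I understand it, along with the standard escape hatch Base.invokelatest (the same call that’s commented out in Dcell below). The newfn name is made up for the example:

```julia
# Sketch of a "world age" error: @eval defines newfn() in a newer
# "world" than the one caller() is already running in, so a direct call
# fails with a MethodError ("method too new"). Base.invokelatest() does
# the method lookup in the latest world instead, so it works.
function caller()
    @eval newfn() = 42
    # newfn()                    # would throw the world-age MethodError
    return Base.invokelatest(newfn)
end

println(caller())   # 42
```

Which is probably why loading everything up front with @everywhere (so all methods exist before any task starts running) made the errors go away.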
I briefly thought of playing with this as a “module” instead of a function, so there is a commented out “module” definition line. (Playing for later ;-) Then there’s some commented out stuff from earlier trials.
ems@OdroidN2:~/Cmodel/ZZTest$ cat Rcell.jl

#
# Rcell.jl
# Climate Model via Cells
# 16 Dec 2020 - E.M.Smith
#
using Distributed

@everywhere include("Surface.jl")
@everywhere include("SubSurface.jl")
@everywhere include("AirLayers.jl")

function Dcell(cellno)
#module Dcell
    println("You have a cell! ",cellno)
# Commenting out Surface() fixes world age for that one fn. Why happening?
#    Surface()
# This didn't change anything
#    Base.invokelatest(Surface())
    Surface()
    SubSurface()
    AirLayers()
end
So now that works, and it properly calls Surface, SubSurface, and AirLayers, which call all their functions as well.
I briefly tried telling things to sleep to get different completion times:
ems@OdroidN2:~/Cmodel/ZZTest$ grep sleep *

RCells.jl:#sleep(30)
Stratos.jl:#    sleep(rand(0:2))
SubSurface.jl:#    sleep(rand(0:4))
Surface.jl:#    sleep(rand(0:4))
Tropos.jl:#    sleep(rand(0:8))
Which it did, but without any clue as to CPUs in use. So I added some workload. I also removed printed blank lines in the output as I was getting a LOT in testing ;-) You can also see the prior run “sleep(foo)” commented out as load was being added.
ems@OdroidN2:~/Cmodel/ZZTest$ cat Surface.jl

#
# Climate Model via Cells
# 16 Dec 2020 - E.M.Smith
#
function Surface()
#    println(" ")
    println("You have reached Surface Physics. ")
#    sleep(rand(0:4))
#    println(" ")
    A=rand(1:100,1000,1000)
    A * 2.4
end

# Debugging self call. Works.
#Surface()
The odd thing is the “self call” works when run single process, but seems to cause some kind of world age issue when run distributed. Concurrency and multiple instances can be weird.
Adding math like that in 4 places only barely slowed it down. Create the array A and stuff it with random integers between 1 and 100, in a 1000 x 1000 array, so 1 million elements. Then multiply each element by 2.4 (and take an int to float conversion hit too).
I tried bumping that up to 10000 x 10000 but it turns out that 100,000,000 x 8 bytes for 64 bit is just a shade under a GB, then having a half dozen of those… well, I hit a memory / swap wall ;-}
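The back-of-envelope math on that wall, for the record:

```julia
# Back-of-envelope memory check on the 10000 x 10000 attempt:
elems = 10_000 * 10_000            # 100 million elements
bytes = elems * sizeof(Int64)      # 8 bytes each for 64 bit ints
gib   = bytes / 1024^3
println(gib, " GiB per array")     # just under 1 GiB each
```

And a half dozen of those on a 4 GB board, plus the float copies from the * 2.4, is exactly the kind of thing that sends you to swap.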
So added the math work.
So now, over THAT particular hump, I’m at the point where I need to start filling in this “toy” framework with actual physics. One Lump At A Time.
I’m going to start with adding parameter passing (a bit trickier than usual, as Julia really loves to make everything local to functions and inside loops). So a bit of work just to figure out where your variable FOO turned into a 2nd FOO that doesn’t hold what you put into it, because it is local (again) where you wanted to work on FOO #1’s data…
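A sketch of that FOO-turns-into-a-second-FOO business, as I understand Julia’s scoping: in a script, a top level for loop gets its own scope, so you need global to write to the outer FOO (FOO here is just the example name).

```julia
# Sketch of Julia's scoping gotcha in a script: the for loop is a new
# (soft) scope, so assigning to FOO inside it creates a second, local
# FOO unless you declare it global.
FOO = 0
for i in 1:3
    global FOO    # without this, FOO += i errors or warns, by version
    FOO += i
end
println(FOO)      # 6, the outer FOO, because of the global declaration
```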
Then it will be Sun Time. So I’ll work on “build world” to make an actual world, and then put sun into each panel (world not turning yet, just doing trig work). Then set it in motion in an airless dry world of more or less hexagonal facets jumping one hour at a time…
Then add radiative physics in the airless world.
After that, it will be just one step at a time adding substance to each of the other chunks called (AND working out all the interprocess communications…)
FWIW, run time for this was still somewhat dominated by Julia set up / compile time, which is why I had to add those array math bits to get it slow enough. You get a big load on all cores for about 2 seconds as it has a bit of a think, then it starts cranking on work product. It took 4 of those Array / Math blocks to slow it down enough to get decent CPU use observable, and even then it looks like it finishes the math and hangs around a while as the print to screen happens.
ems@OdroidN2:~/Cmodel/ZZTest$ grep "A=rand" *

Stratos.jl:    A=rand(1:100,1000,1000)
SubSurface.jl:    A=rand(1:100,1000,1000)
Surface.jl:    A=rand(1:100,1000,1000)
Tropos.jl:    A=rand(1:100,1000,1000)
So, with that, I’m off to bed. (Always head to bed after a big success at coding. If you decide to “try one more thing”, before you know it it will be 6 am and the spouse is asking “Have you been up all night again?” Why? Because you are SURE it is just one more little “one line fix” and you will have it!!! And it isn’t.)