Graph Of Global Thermometers

Well, I’ve managed to make a rather better graph this time than last time. And via simpler machinations too ;-)

Scatter Diagram of GHCN Stations LON vs LAT with Python

Big Empty in the center of South America and Africa. Thin in the Outback of Australia. North Asia a bit off. Canada with a big hole in the center. Alaska moth eaten. The poles sparse and the seas worse. Interesting visualization of thermometer density.

My first graph was modeled on an example that stuffed an array with two values then plotted the array. I wanted a plot of Longitude vs Latitude… and somewhere along the way thought I ought to be able to address individual data items in the plot call itself, skip the array. It worked.

In the prior posting is my trail of tears getting data loaded, dealing with the way Debian has you install Python in several parts, and then making a crummy first graph. I think I’m getting the hang of it now ;-)

If you’ll remember, I’d loaded the data (from a CSV file) into a Pandas Dataframe named “df”. Here’s all it took to make this graph, after I had a bit of a think about it all:

>>> plt.scatter(df["LON"],df["LAT"])


I’m going to play around with adjusting the dot size smaller (so things are not just a blob as the big spots blend) and add some labels and such. Cut the range back to the actual limits of the data. That ought to keep me busy the rest of the evening. But this graph is good enough for you to see that there’s nearly nothing at sea or at the poles.

UPDATE – With Labels

Here’s one with some titles / legends added. The controls in the “dot” are many so I’ve not adjusted it yet.

GHCN v3.3 Thermometers w/headings

Title, labels, and limits are all set at the Python prompt via functions. At one point I typed something that seemed to block access to the function, and then the Python interpreter crashed out when I told it to plot… so I got to re-enter and reload things too:

root@odroidxu4:/SG2/ext/chiefio/SQL/v3# python3
Python 3.4.2 (default, Sep 26 2018, 05:38:50) 
[GCC 4.9.2] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import matplotlib.pylab as plt
>>> import pandas as pd
>>> import numpy as np
>>> plt.title('GHCN Global Thermometers v3.3')

>>> plt.xlim(-180,180)
(-180, 180)
>>> plt.ylim(-90,90)
(-90, 90)
>>> plt.xlabel('Longitude')

>>> plt.ylabel('Latitude')

>>> df = pd.read_csv('invent.csv')
>>> plt.scatter(df["LON"],df["LAT"])


I think I’ll leave playing with the “spot” for tomorrow… I keep finding things where folks were playing just a bit too much like:

rng = np.random.RandomState(0)
for marker in ['o', '.', ',', 'x', '+', 'v', '^', '<', '>', 's', 'd']:
    plt.plot(rng.rand(5), rng.rand(5), marker,
             label="marker='{0}'".format(marker))
plt.legend(numpoints=1)
plt.xlim(0, 1.8)



Well I couldn’t resist trying one more thing… so here it is with smaller dots. Now you can see the real density inside the continents.

Code change is just adding an s=x argument for size:

>>> plt.scatter(df["LON"],df["LAT"],s=1)

GHCN v3.3 Thermometers Small Dot

Essentially we have lots of data from the USA, Germany, China, Turkey, Japan, South Korea and the South of Australia. Everywhere else, a bit thin.


An area-proportional sine projection, as discussed down in the comments. A bit hard to see the edges of the world, but that’s for another day. Just happy that it is now “area correct”.

GHCN v3.3 Sine Projection area correct thermometers



About E.M.Smith

A technical managerial sort interested in things from Stonehenge to computer science. My present "hot buttons" are the mythology of Climate Change and ancient metrology; but things change...
This entry was posted in AGW Science and Background, Global Warming General, NCDC - GHCN Issues, Tech Bits. Bookmark the permalink.

74 Responses to Graph Of Global Thermometers

  1. jdseanjd says:

    Any thoughts on UAH vs radiosondes vs RSS?
    John Doran.

  2. rms says:

    Well done.

  3. E.M.Smith says:


    All the really good tech is too new, with too short a record, to be usable.
    All the long old records cover too sparse a collection area, with too-variable instruments, to be more than poorly usable.
    Splicing data together is horrible in calorimetry, so that’s guaranteed to give bad results.

    I think that about covers the problems….

  4. E.M.Smith says:


    Thanks! Glad you liked it.

    I may bitch and moan about things in a tool that I find a bother, but that doesn’t stop me from learning how to use it… and using it for what it is good at.

  5. rms says:

    So that you can focus on the data analysis (and not on the computer setup), invest a few minutes to install and use Anaconda. Most (all?) of the libraries you need (pandas, matplotlib, scipy, etc.) are there. You can specify which version of Python you want (I see you are using an older version). Eliminates a lot of “bother”.

  6. E.M.Smith says:


    Thanks for the “tip”. I’ve been pondering what environment I ought to choose… FWIW, that’s the newest Python on this system build. The ARM support often runs a bit late… Maybe there’s an Anaconda package… I’ll have to “go fish” and see.

    FWIW, I’d also be interested in suggestions about a REPL or IDE for Python. There seem to be several and I’m really not interested in another round of “search time”. Or is Anaconda a working environment complete bundle?

    2nd FWIW: Now that I can get data in, graph it, and adjust the graphs: next up on my task list is integrating the MySQL database. The CSV thing is just a short term kludge on one file to get me over the hump to “able to make graphs”. So it has done its job.

    I think I’ll take a couple of days to ponder what to graph from the temperature portion of things. That will also require a table join to get “Inventory” data connected to temperature items. Maybe time to get back to the DBA stuff for a couple of days…

  7. rms says:

    Re version of Python: with Anaconda you can install a number of them. I have Python 2.7, as my stuff is still at that version. I’m planning to upgrade to Python 3 this summer; I can’t afford to break things now–no time. I’m migrating to 3 very slowly as I come across things, occasionally running stuff under Python 3 to see how much breaks–too much for me to attend to now. I guess I would suggest installing the version of Python you know works on your computer, but also try the “latest” in a separate Anaconda environment. My hunch is it will work, but what do I know… if it doesn’t work, then delete that Anaconda environment. I’m running all my stuff on Mac, but when I get time I want to make it all work in Anaconda on my Ubuntu virtual env so I can pass it all on to my successor. (So much to do, with so little time.)

    In addition to helping you with “bother”, it avoids the risk of messing up your system which may well depend on a certain version of Python (as does Mac OSX) to do things.

    Re IDE, I don’t use one. Just an editor and a terminal window. But my understanding is that Jupyter and its notebook is the way to go and what “most” (whatever that means) use. That is *also* included with Anaconda (see the pattern here?). I have experimented with Jupyter. I get what they are trying to do and it looks good. I was thinking that when I got around to doing what you are doing (analysing climate data), I’d use Jupyter.

    Even if you don’t like the Anaconda setup (but I can’t think of any basis for not liking it, since it eliminates so much “bother”), you can try it and then just get rid of all of it.

    You may run across something not in Anaconda. Then use “pip” inside Anaconda. Works, but Anaconda has article(s) on the issues (of which I’ve run across none) of mixing pip installs. I have dozens of packages installed and only a handful of “pip” installed ones.

    I’m keen to learn more about pandas as I’m really a data junkie, not a programmer. Time shortage again prevents real progress. I have gotten a lot out of reading Wes McKinney’s (inventor of Pandas) book “Python for Data Analysis”.

  8. rms says:

    Oh, and I should have mentioned the IPython shell, which is older and very popular. Again, I don’t use it, as finger memory doesn’t need it, but you might find it useful. McKinney prefers it and covers it in his book. Also in Anaconda.

  9. Heber Rizzo says:

    Very well done.
    I suppose that a map like this at sea at a given moment would be very difficult, maybe impossible, to make, isn’t it?

  10. rms says:

    Probably simple by filtering the time series data for the “given moment” in time of interest. Or make a movie and show the change of where thermometers are as time passes.

    Another thing that might be of interest is to get data on urban areas by latitude and longitude and count number of thermometers by urban or not urban … as indicative of risk of Urban Heat Island effect on the data. Not sure where data on urban areas is available but maybe somebody might know.
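    The filtering idea above can be sketched in a few lines of pandas. The column names here are invented; the real GHCN inventory file has no per-year column, so a year field would need to be joined in from the temperature records first:

    ```python
    import pandas as pd

    # Hypothetical station table; LON/LAT mimic the inventory file, YEAR is
    # a made-up column standing in for a join against the temperature data.
    df = pd.DataFrame({"LON": [10.0, 20.0, 30.0],
                       "LAT": [45.0, 50.0, 55.0],
                       "YEAR": [1970, 1990, 1990]})

    # Keep only the stations reporting at the "given moment":
    snapshot = df[df["YEAR"] == 1990]
    ```

    The same boolean-mask filter, run once per year, would give the frames for the movie idea.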

  11. A C Osborn says:

    Really good display considering it is only your 2nd & 3rd runs.
    Of course a really big version with the same dot size should also show how sparse the data actually is.
    As rms says a movie type action of working through the plot by date (year) would show how the data has changed and been changed over the years.

    ps after all your hard work any chance you will make the database available to others?

  12. Steven Fraser says:

    Hm. The Negative space of the Indian Ocean area kinda looks like a Pokemon Pikachu, (apparently hangs with Nixon.). Also interesting… the number of stations in central Africa. Seems I recall that monthly ‘record temp’ charts from NASA show records in south-central Africa where the accompanying charts have ‘no data’. Your chart seems to contradict the ‘no data’ claim.

    Also of some interest… Vertical scale goes to 100. North of the pole?!

  13. Steven Fraser says:

    @EM: what’s with the nearly horizontal line of sites in northern Canada?

  14. rms says:

    RE “Vertical scale goes to 100. North of the pole?!” … it’s just that he’s plotting numbers and the data stops at 90 deg, but Matplotlib made a “pleasing” scale to 100. I don’t see any dots above 90 deg.

  15. jim2 says:

    I went through the trail of tears with R. Of course, there is still a ton I don’t know about it, as there is a huge number of libraries. R graphics are awesome, but still R-cane like everything else in R. R also uses a dataframe, which is its primary data structure. One can use one program to create the data files and R to visualize them. I use RStudio.

    Of course, there are probably many easy-to-use graphics programs. Anaconda looks good as it can include just about any program you like.

  16. jim2 says:

    When it comes to “fixing up” missing records, it seems a good approach would be to define areas with similar weather, then use those to correct/in fill data missing from thermometers in those areas. To me this makes more sense than using arbitrary thermometers 1000 km away or whatever GISTEMP uses.

    But in the final analysis, the data is crap when it comes to climatology purposes, and whatever adjustments are applied probably don’t make much difference.

  17. Hifast says:

    @Steven Fraser:

    “@EM: what’s with the nearly horizontal line of sites in northern Canada?”

    That’s the former DEW line, now North Warning System radar sites.

  18. Pingback: Graph Of Global Thermometers – Climate Collections

  19. E.M.Smith says:


    Was thinking about the time axis….

    A movie would be best, but that will be a while… Many stations are a short record, like 30 years, so you would see stations wink in and out. What I think I can do soon is either color by years or size by duration of record. Or just different frames by decade….

    This data includes the ship spots. But at some point figuring out what Hadley SST sources look like might be good.

    FWIW, the data are public and I’ve published my “step by step” with code so anyone can follow and do what I’ve done. If needed, I suppose I could “dropbox” an assembled version (after some cleanup ;-)


    As there is an urban flag, I was thinking a map of just rural. Then maybe one with rural dots blue, urban red.

    @Steven F.:

    See the second update where I set the size to 90 x 180. Python seems to prefer some whitespace around the data by default.

    Remember that this graph is “station ever existed” so shows about 7000 dots, yet in 2015 or so there were only about 1200 active after The Great Dying Of Thermometers. So often the temps were recorded during a colonial era, not during wars, then badly post revolutions.


    There is no way you can “fix up” data for calorimetry. It is notorious for giving errors if not done with clean and complete data. IMHO, that IS what Global Warming is. Adjustment errors.

    Congrats on the R skill! On my someday list, but so far no joy…


    Thanks for that! I was wondering too :-)

  20. PJW says:


    Great work. The over-contribution of small parts of the globe to the GHCN dataset is actually understated on this map, since the Mercator projection dramatically distorts (enlarges) areas the further from the equator you go. I assume the Python library was written to use the Mercator projection. It would be interesting to see the same dataset graphed onto a more cartographically correct map. There are several, but all of them dramatically distort the poles, except the AuthaGraph. I doubt a library exists to plot on this projection, but the thermometer locations at the poles would end up closer together, while the areas of the US, Europe, Canada, and China would shrink in size.

  21. John F. Hultquist says:

    Good Morning,

    A C Osborn Wrote . . .
    “Of course a really big version with the same dot size should also show how sparse the data actually is.

    There would still be the problem of map exaggeration.
    Using the projection, often the Mercator, that seems to be the default with Python(?) and many world visualizations, the area “expands” as one moves toward the Poles. This assumes a centered map with the standard parallel at the Equator.
    This is often described thusly, Alaska takes as much area on the map as Brazil, when Brazil’s area is nearly five times that of Alaska.
    The mathematical relationship is a function of the angle of the Latitude: secant squared (sec = reciprocal of the cosine).

    Where EM wrote “Canada with a big hole in the center.”, that is both true, and also a misrepresentation, because Canada is shown larger than its true size relative to Brazil.

    I wonder whether the “dot size” could be adjusted (made larger) using the proper function, thus visually filling in some of the white area.

    I have an old fashioned globe (actually 2, one physical and one political) by my desk. These show their age as names and boundaries change, but are still helpful. I still use a watch with hands, too. And, yes, I’ve been called an old curmudgeon.

    Regarding “urban”
    I think one of the USA government agencies used night time light signatures to estimate urban size. I’ve not seen this mentioned for about 5 years, but it should be easy to find. Easy to use? Maybe not.

  22. PJW says:

    Turns out matplotlib supports a number of different projections.
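    For example, matplotlib ships 'aitoff', 'hammer', 'lambert', and 'mollweide' projection axes out of the box; they expect coordinates in radians. A minimal sketch with invented station positions (the Agg backend is just so it renders off-screen):

    ```python
    import matplotlib
    matplotlib.use("Agg")          # render off-screen
    import matplotlib.pyplot as plt
    import numpy as np

    # Made-up station positions, converted to radians as the projection wants:
    lon = np.radians([-120.0, 10.0, 140.0])
    lat = np.radians([37.0, 51.0, -33.0])

    ax = plt.subplot(projection="mollweide")
    ax.scatter(lon, lat, s=1)
    ax.grid(True)
    ```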

  23. E.M.Smith says:

    @John H.F.:

    Yes, the dot size can change with a computed field. I’ve seen an example but can’t do it yet. Maybe tomorrow :-)
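    A minimal sketch of the computed-size idea (the stations are invented, and whether 1/cos(latitude) is the right visual weighting is a separate question; it just grows the dot area roughly as the plain LON/LAT layout stretches map area toward the poles):

    ```python
    import matplotlib
    matplotlib.use("Agg")          # render off-screen
    import matplotlib.pyplot as plt
    import numpy as np

    # Invented stations; size is a computed field fed to scatter's s= argument.
    lon = np.array([-120.0, 10.0, 140.0])
    lat = np.array([37.0, 51.0, 70.0])
    size = 1.0 / np.cos(np.radians(lat))   # grows toward the poles

    plt.scatter(lon, lat, s=size)
    ```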

    There is a nightlight field in the Inventory data so I already have it. What to do with it is a different question….

    I’m entering that problem space that is the goal of a new ability… deciding what to do with it now as the potential manifests to consume months of time… so first ideas expand, then reality constraints narrow the plans…

    What to visualize? Decisions decisions….

  24. E.M.Smith says:

    FWIW, I’ve seen another site with 3d map onto globe for GHCN 4, so on my todo list is learn that tool too… It is a warmista site so I’m keeping a low profile there.

    One could also keep the rectangular LAT LONG borders above, but adjust x and y to a different projection. Visually it would distort but area would be right. Could also do 2 x polar views. Lots of choices…

  25. cdquarles says:

    @John H.,
    Well, GISS does use urban ‘night lights’ or did do that. I think others do, too; yet a catch is light can increase even though population didn’t; given that night light is a function of size and economic health plus light source change over time, in addition to population change.

  26. Larry Ledwick says:

    Yes, for example, look at a central Texas night light high resolution image and you will see a huge number of spots in sparsely populated areas; with a little zoom-in you can verify those lights are yard lights at oil drilling pads and pumping stations, and each represents between 0 and maybe 5 people, instead of a shopping center or similar commercial lighting.

    Same goes for commercial strips like those areas that have miles of car dealerships lit up like a carnival all night long.

  27. I wonder why Turkey?

  28. E.M.Smith says:

    Contemplation is a wonderful thing….

    When I get time, I want to make one graph with only Baseline stations, the other with only current. Then the intersection is what is real, the rest is fantasy comparing Apples to Oranges….

  29. E.M.Smith says:


    For a while I was thinking “places with big US military presence” then thought maybe NATO (but Spain is a bit thin). Perhaps residual cultural influence from the Roman / Byzantine empire? Guess we would need to ask the Turks.

  30. ossqss says:

    @EM, are you going to do some Kriging to fill in those holes? ;-)

  31. E.M.Smith says:


    Snicker ;-)

    Wonder what the word is for REMOVING bogus infill non-data and highlighting the holes?… perhaps dekrigging? Or derigging…

  32. DaveH says:

    If you want ocean temperatures, Project Argo covers this planet:
    Scroll down for the position of the 3,900 floats in the last 30 days.

  33. John F. Hultquist says:

    The ‘night lights’ to urban and then UHI has become even more complicated because of “light pollution” and the shielding of some of the sources.

    As EM indicates, thinking of things to do is a lot less time consuming than doing them.
    [Time: For my wife, we are about to have an initial consultation with a shoulder-replacement team at the University of Washington medical complex, north side of Seattle. We live 2 hours away and Seattle is an unfriendly place regarding traffic. Assuming this operation happens, she is not supposed to drive for 4 weeks. Car time is about to expand bigly!]

  34. ej says:

    Gridded population data here:
    Global Rural-Urban Mapping Project (GRUMP), v1

  35. jim2 says:

    One nice thing about R is there are tons of help for the novice. Here is an animated spherical globe map.

  36. E.M.Smith says:

    So, Jim2, was that volunteering to be the globe plotter?….

    Or just showing off to motivate me?

  37. jim2 says:

    My dance card is full, or I would jump in. But the link does show the code.

    I don’t think maps should be an issue because we know the surface area of various regions. A spherical globe map probably isn’t necessary or even desirable.

  38. rms says:

    The first version of his graph was simple x-y linear (no projections involved).

  39. nickreality65 says:

    Just in case you missed earlier missives, allow me to summarize this science yet again.

    1) 288 K – 255 K = 33 C warmer with the atmosphere is rubbish. 288 K is a WAG pulled from WMO’s butt. NOAA/Trenberth use 289 K. The 255 K is a theoretical S-B temperature calculation for a 240 W/m^2 ToA (w/ atmosphere!!) ASR/OLR balance (1,368/4 *.7) based on a 30% albedo.

    By definition no atmosphere includes no clouds, no water vapor, no oceans, no vegetation, no ice, no snow, and an albedo perhaps much like the moon’s 0.15. 70% of the lit side would always be above freezing, 100% for weeks due to the seasonal tilt, not that it matters since there would be no water to freeze.

    Without the atmosphere the earth will get 20% to 40% more kJ/h depending on its naked albedo. That means a solar wind 20 to 30 C hotter w/o an atmosphere not 33 C colder. The atmosphere is like that reflective panel behind a car’s windshield.

    2) The 396 W/m^2 upwelling ideal BB LWIR that powers the RGHE is, as demonstrated by experiment, not possible. If this upwelling energy does not work – none of RGHE works.

    3) The 333 W/m^2 up/down/”back” GHG energy loop is thermodynamic nonsense, i.e. it’s calculated energy appearing out of nowhere, a 100% efficient perpetual energy loop, energy from cold to hot without work. (396 – 333 = 63) “net” radiation is thermodynamic nonsense.

    4) 1) + 2) + 3) = 0 RGHE & 0 GHG warming & 0 man caused climate change.

    I’ve got the science. If you have some anti-science, BRING IT!!

    Nick Schroeder, BSME CU ’78, CO PE 22774

    P.S. According to NOAA the current rate of sea level rise is 3 mm/y. That’s not even a foot per century.

    P.P.S According to JAXA and DMI the sea ice and ice cap volumes have not deviated significantly from decades of natural variability.

  40. Steven Fraser says:

    @EM: I am not able to volunteer for the coding, but I can offer some suggestions on (what i think may be) useful animations if you should choose to try.

    You mentioned above that the current rendering is an ‘ever existed’ one, which is 3-dimensional data (Long/Lat/time) folded into 2D (Long/Lat). What might be instructive is a representation where ticks of monthly time show up as frames, and sites appear and disappear on a frame-by-frame basis. With each available frame (consider it a sample for discussion purposes) the average temp of the frame’s readings is calculated and shown, and perhaps mean/median and mode as well. Across the bottom of the display, a vertical-scale temp graph point is shown to represent the frame. Horizontal scale ticks are semi-decades. Putting the sequence of frames in motion, each frame shows the available stations for the frame, the calc’ed averages and stats, and the plotted avg temp. Roll frames…

    Additional refinements, in no particular order:
    1) Calculate the average temp of the entire selected data set, and show the temp plots in relationship to it.
    2) Using geography of the stations, assign them to select-able continents, oceans, and latitude zones, and do runs using just those locations, including the avg temp line optionally.
    3) Using timestamps, constrain runs to particular year ranges.
    4) Color the station dots on a temp scale
    5) Make the frame size variable, and select-able for high temp, low temp and average.
    6) Make the display rate of the frames variable in 10ths of a second.
    7) Pause/step/run/FForward/FReverse controls.
    8) Anomaly rather than absolute temp scale.

    In places with high station count density, the combination of the refinements should allow display of propagating fronts, motion of air masses of different temps, and other fun stuff.

    Just some ideas.

  41. tom0mason says:

    Great work E.M.,
    The only problem I see is that the map you use does not really show the true area of each country.
    For instance the USA, or Europe, or Australia can fit within the Antarctic land area but South America is a bit larger than the Antarctic land area, and Africa is about 2x the Antarctic land area.

    From this I see that African stations are in reality very sparse on the ground.

  42. Larry Ledwick says:

    @Steven Fraser says:

    What might be instructive is a representation where ticks of monthly time show up as frames, and sites appear and disappear on a frame-by-frame basis. With each available frame (consider it a sample for discussion purposes) the average temp of the frame’s readings are calculated and shown, and perhaps mean/median and mode as well.

    I think borrowing the method used by Bill Warner for his 1400 year battle map of Islamic conquest might fit that idea: have the stations appear as white dots and then, as they age, shift to another color with each decade shift, ending at dark gray or something like that.

    Bill Warners battle map video clip

    Also select groups by altitude or proximity to large oceans or lakes or by population density might also show some interesting trends.

  43. E.M.Smith says:


    Yeah, that’s the problem of a simple LAT vs LONG “projection”. To really fix it requires some equal area projection or a globe. I think I could just multiply LONG by cos(LAT) and get an area adjustment, but it would lose all appearance of a recognizable map. (Essentially everything above, for example, 80 deg would be in 0.17364818 of the width of the graph near the top, narrowing to 0 at the top. It would look funny.) I think this is roughly it, though the formula is a bit different (I think because it allows for some meridian other than 0):

    x = (LONG - LONG_meridian) * cos(LAT)

    So if your meridian is zero it becomes: x = LONG * cos(LAT)

    It would be easy to do, even if strange.
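    A sketch of that adjustment with pandas and numpy (the stations here are invented stand-ins for invent.csv; numpy’s vectorized cos sidesteps the single-float limitation of math.cos):

    ```python
    import matplotlib
    matplotlib.use("Agg")          # render off-screen
    import matplotlib.pyplot as plt
    import numpy as np
    import pandas as pd

    # Invented rows standing in for the real invent.csv:
    df = pd.DataFrame({"LON": [-120.0, 10.0, 140.0],
                       "LAT": [37.0, 51.0, -33.0]})

    # Sinusoidal (equal-area) projection, meridian at 0:
    x = df["LON"] * np.cos(np.radians(df["LAT"]))
    plt.scatter(x, df["LAT"], s=1)
    ```

    High-latitude longitudes get squeezed toward the center line, which is exactly the “funny looking” but area-correct shape.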


    It is also possible to have dot sizes & shapes change and transparency change. It is work, but doable. ( I have the “how to” roughly learned so with “consult reference examples” and error correct iterations could do it now. After a few dozen uses it will be like nothing…)

    So one I was thinking of was plotting the “Baseline” stations as 50% transparent pink X, then plot the last decade of stations as dark dots. Only the soft X with a dark dot in the middle is a “real” comparison to baseline. Everything else is an apples to oranges…

    That kind of thing can be generalized to some extent (limited by how many ‘overlays’ become a muddy mess) so you could have dot, x, circle, triangle for each of, say, 30 year periods and cover 120 years. As GHCN is roughly 200 years of “worthwhile” data, that’s getting close (and covering all the parts that really matter)
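    The baseline-vs-recent overlay idea is just two scatter calls on the same axes. A sketch with invented coordinates (the real version would pull the two station sets from the database):

    ```python
    import matplotlib
    matplotlib.use("Agg")          # render off-screen
    import matplotlib.pyplot as plt
    import numpy as np

    # Invented coordinates standing in for baseline-era and last-decade stations:
    base_lon, base_lat = np.array([10.0, 20.0, 30.0]), np.array([45.0, 50.0, 55.0])
    new_lon, new_lat = np.array([10.0, 140.0]), np.array([45.0, -33.0])

    # 50% transparent pink X for baseline, dark dots for the last decade;
    # only an X with a dot in the middle is a real baseline comparison.
    plt.scatter(base_lon, base_lat, marker="x", color="pink", alpha=0.5,
                label="baseline")
    plt.scatter(new_lon, new_lat, marker=".", color="black",
                label="last decade")
    plt.legend()
    ```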

    See some examples here:

    These same libraries ought to work in Julia (hint hint ;-) then I’d be encouraged to use it along with Python… Though I do find it interesting to observe how, after a week of “learning the language & MySQL” part time, on the first real graph: ideas for variations expand the workload to weeks more that could be done with just that latest data view (map of world)… It is interesting that the capacity granted by learning the tool causes an exponential increase in the work function desired… And people are worried about automation taking their jobs… it just changes them into other jobs…

    Oh, and one more thing:

    Having played with this for a little bit, I need to abandon my original intent. I started out thinking I’d just load all like data into like structures for GHCN v1, v2, v3, v4 etc. then “compare and contrast”. With that partly done, I was looking at “reconciliation” – basically mapping them onto ONE key (so making country uniform when it is different between v1 and (v2, v3). I’m now thinking I need to place v1, v2, and v3 each in a unique table (more normalized) to make things like individually mapping each one easier, but then use a JOIN when I want to find differences… Essentially swap VERSION from a field to a Table_Name attribute and have separate tables. Basically a choice of “Do I filter for one Version on every report & graph? Or: Do I JOIN versions for compare & contrast?”

    There’s also a choice of “do both”. Since data load seems to take nearly nothing, and having multiple schema is trivial, I can also just use different loads for different purposes.

    But what is clear is that the original vision of “put them in one big table then work it” is not ideal.

    But that is what makes the rapid prototype method good. I’ve done a load of GHCN V3 data, and both v1 and (v2, v3) country codes / inventory; so basically not much. To change to multiple tables by version is still a low time-cost activity. It’s about a 20 minute job to dump those, remove the redundant fields from the table schema, recreate the tables and reload them. Then I can still do the original goal step of “compare and contrast” but via JOIN rather than a simple table scan; while making all the other graphing and analysis for any given version simpler and cleaner.
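    The split-tables-then-JOIN decision can also be prototyped cheaply in pandas before touching the MySQL schema. A sketch with invented station IDs; merge with indicator=True flags which version(s) each station appears in, which is the compare-and-contrast:

    ```python
    import pandas as pd

    # One table per GHCN version, with invented station IDs:
    v1 = pd.DataFrame({"ID": [101, 102], "NAME": ["ALPHA", "BETA"]})
    v3 = pd.DataFrame({"ID": [102, 103], "NAME": ["BETA", "GAMMA"]})

    # Outer join; the _merge column reports left_only / right_only / both.
    both = v1.merge(v3, on="ID", how="outer", indicator=True)
    ```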

    Well, I need to talk about it less and do it more for a little while ;-) so y’all knock it around while I do some coding ;-) Back as soon as I have something interesting…

  44. E.M.Smith says:

    FWIW it looks like Python cos() works in radians, so I’d need to convert. From the math module docs:

    math.cos(x)
        Return the cosine of x radians.

    9.2.4. Angular conversion

    math.degrees(x)
        Convert angle x from radians to degrees.
    math.radians(x)
        Convert angle x from degrees to radians.

  45. Larry Ledwick says:

    Hmmm interesting.

    The windows environment is just such a kludge I am not going to mess with it much, although it will be handy for a high precision calculator that can process normal algebraic formulas.

    I got the sysadmin to install Julia on one of the test servers today so I can try some things to use it for work functions.

    I will probably set up a virtual box here on the windows machine at home, to run Julia in a Linux environment but my learning curve is very steep right now, so don’t expect a lot of progress in the short term.

    For practical purposes I am starting from scratch. Unlike you I have not done much coding or troubleshooting of code, I just beat ksh into submission occasionally for simple tasks at work.

    The reason I am looking at Julia is to see if it is a better option for my needs than Perl. There are times the things I want to do with scripts, bump into the limitations of ksh and bash scripts.

  46. E.M.Smith says:


    OK, despite the manual page saying “math” is always available, you must first import it…

    >>> plt.scatter(df["LON"],math.degrees(math.cos(math.radians(df["LAT"]))))
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    NameError: name 'math' is not defined

    Then, for reasons beyond me, it doesn’t like a DataFrame field as input to a math function even when that object is a float:

    >>> import math
    >>> plt.scatter(df["LON"],math.degrees(math.cos(math.radians(df["LAT"]))))
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/usr/lib/python3/dist-packages/pandas/core/", line 69, in wrapper
        "cannot convert the series to {0}".format(str(converter)))
    TypeError: cannot convert the series to <class 'float'>
    >>> df["LAT"]
    0     36.93
    1     36.83
    2     36.72
    3     36.52
    4     36.80
    5     36.72
    ...
    7275    66.00
    7276    30.00
    7277    50.00
    7278    47.00
    7279    34.00
    Name: LAT, Length: 7280, dtype: float64
    >>> RLAT=math.radians(df["LAT"])
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/usr/lib/python3/dist-packages/pandas/core/", line 69, in wrapper
        "cannot convert the series to {0}".format(str(converter)))
    TypeError: cannot convert the series to  <class 'float'>

    For a “dynamic type” language it sure is picky about types…

    So, OK, I’ve got to learn some new trick or other. Maybe stuff the dataframe field into a float array and hand that to the math functions? Whatever…

    For the non-programmers: This is the kind of stuff that Programmers Do all the time. It’s a bother and it’s largely a waste of time, but it is what the “art” consists of. Finding the “rules” and “limits” to some language or library function, then finding the tools to get past them. That I’m a Noob to Python means I get to learn a lot of these. Colloquially “bouncing off the walls” as you run into them and learn the various “tricks” and idioms that the “old hands” in that language all do without thinking about it. Designing a new language is largely based on putting into your new language facilities to get rid of the need for various common ‘work arounds’ of such issues. No language has yet succeeded at eliminating them…

    I’m putting in comments showing me “bouncing off a wall” for 2 reasons, really. First, so folks who don’t program (much) can see what it really is like. Unlike the norm of only posting the end state when everything is worked out and perfect. (Managers complaining about how long it takes to write a program take note…) The second is so folks who do program can see that if they chose to do some of this in these languages/tools; this is the “Here there be Dragons!” spot. So, for example, you already know you need to import the math library and that math.cos() “has issues” using a dataframe field…

    Well, and sort of a 3rd: If anyone already knows how to “fix it”, holler ;-) Not expecting that, but if you happen to know you could cut short the joy of watching me bounce off walls by giving a pointer ;-)

  47. rms says:

    No way I can try to replicate here, nor have I ever done this sort of computation, but a few thoughts.
    – The error message is trying to print the value of the field “converter” and there appears to be nothing there. Is this a clue? Maybe look at the source in /usr/lib/python3/dist-packages/pandas/core/ and see what “converter” is expecting to be?

    – Does that give any useful advice?

    Out of ideas on this one.

  48. E.M.Smith says:


    If you already write ksh / bash / sh scripts, then you already have all the needed programming skills. It’s just another language…

    FWIW, for most of the years between about 1990 and 2005, most of my programming was shell scripts of one sort or another. SysAdmin is a LOT of that. I did manage some projects in other languages (including C++ and Java) but, other than reading a few pages of folks’ code to get a feel for them, didn’t write any.

    IMHO it isn’t correct to use “just” with “shell programmer”. It is in fact an incredibly flexible and interesting language. The basic idea is very similar to FORTH where you have a “dictionary” (your bin directory) where you define new words (commands) that can use any other already existing words. This is a very powerful concept that, IMHO, is superior to library based extensions of a language. “All the usual stuff” is just on your search path (so no ‘import foo’) and any additions just show up. Then, being interpreted, much of it is open to inspection / learning.

    On ANY system where I first get an account, there are a couple of “trivial” things I toss into my bin directory. The first is named “allow”. Trivial, and largely just because I don’t like typing “SHIFT +” in the “chmod +x [foo]” command. Oh, and “bcat” cats or prints out any command in the bin directory:

    root@odroidxu4:/SG2/ext/chiefio/SQL/bin# bcat allow
    chmod +x $*
    root@odroidxu4:/SG2/ext/chiefio/SQL/bin# bcat bcat
    cd /usr/local/bin
    cat  $1

    Then I have “cmd” that makes new commands:

    root@odroidxu4:/SG2/ext/chiefio/SQL/bin# bcat cmd
    cd /usr/local/bin
    vi $1
    allow $1

    Note how “cmd” uses “allow”…

    There’s a dozen more in my normal “bootstrap me” set. But this “each building on the other” is what makes it a fast and pleasant environment for me. I can, just by saying “cmd frog”, create a new command named ‘frog’ that incorporates any of my prior stuff into it. No need to worry about where it goes, type long path names, remember to chmod it to add eXecute permissions. I use that more than anything else… Everything I do gets a command where the specifics get built and debugged.

    root@odroidxu4:/SG2/ext/chiefio/SQL/bin# bcat fort
    vi $HOME/fortran/$1
    gfortran $HOME/fortran/$1

    Lets me say “fort invent.f” and it will toss me into the editor to write, then when I exit the editor, compile it, and if there are errors, I can just do “!!” (repeat last command) to cycle again. (I often use ${2-/path/name} for the place in scripts so by doing “fort invent.f ./v3/bin” I could change where things go, but it isn’t in this particular version)

    At its core, this threaded interpreted command reuse along with pipes is the core of what makes *Nix so flexible, useful, and powerful. That ought never be “just” anything… IMHO. In fact, I often chafe at there being no such facilities inside other “environments” for more “advanced” languages…

    So all the “usual” things are in script languages. Assignments, substitutions, macros (of a sort), control methods for flow of control, input, output. They are largely complete and functional languages. The use of ../bin/… is very much a “subroutine library” with “passed parameters”. Just a lot of the “bother” of compilers and binary modules is rinsed out ;-)

    Translating those skills to any other language is not hard. Mostly, IMHO, getting over the fact that they are less flexible and offer less opportunities for “dictionary building”…

  49. E.M.Smith says:


    Thanks for the ideas. I’m not “expecting” anything of anyone. If someone happens to know, great. If not, well, I’m expecting that I’m going to do all my own debugging anyway, so no loss / no worries. Anything given is a gift and gravy, nothing is expected of others.

  50. rms says:

    I know. Offered in same spirit! (I find someone over my shoulder is useful also!)

  51. E.M.Smith says:

    The core question I see is “Why is it trying to do a type conversion on a float64 anyway?”. Is it float vs float64? That ought not happen in a dynamically typed language. Is it a limit of the input to the function? (That df[“LAT”] is being passed in and it thinks “LAT” is a string literal, and doesn’t understand that whole df dataframe reference bit?) I think that’s the most likely… (but I am also likely wrong… more walls to explore ;-)

    So FWIW, my first approach will just be to try and suck a value out of the df and into a float and see if I can make it work at all… Clearly it works with a literal:

    >>> math.cos(2)

    So I’m just going to build up from there, one brick at a time, until the error returns.

    >>> math.cos(math.degrees(45))


  52. E.M.Smith says:
    >>> math.degrees(math.cos(math.radians(45)))

    Works and looks more or less right… and I have math.degrees where it really belongs unlike the prior example…

    while this fails:

    >>> math.degrees(math.cos(math.radians(df["LAT"])))
    Traceback (most recent call last):
      File "&lt:stdin>", line 1, in <module>
      File "/usr/lib/python3/dist-packages/pandas/core/", line 69, in wrapper
        "cannot convert the series to {0}".format(str(converter)))
    TypeError: cannot convert the series to <class 'float'>

    So it’s something about the passing in of a df bit. As my best guess ATM…

    >>> FLAT=df["LAT"]
    >>> FLAT
    0     36.93
    1     36.83
    2     36.72
           ...
    7277    50.00
    7278    47.00
    7279    34.00
    Name: LAT, Length: 7280, dtype: float64

    But despite referencing FLAT to see what’s in it, at the bottom it calls it LAT, so I’m wondering if this is one of those “assignment of a pointer, not the values” issues… and this fails:

    >>> math.cos(math.radians(FLAT))
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/usr/lib/python3/dist-packages/pandas/core/", line 69, in wrapper
        "cannot convert the series to {0}".format(str(converter)))
    TypeError: cannot convert the series to <class 'float'>

    The other option is that maybe math.radians() can’t take an array and being passed an “array by reference” causes it to barf when it tries to do a type conversion on the array name / reference…

    Ah, the joys of working with a library function without the technical manuals for the library…
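    As an aside on that “pointer not the values” hunch: a pandas Series carries its column name along as a .name attribute, and plain assignment just hands you a reference to the same named object. A minimal sketch (made-up values standing in for the GHCN data):

    ```python
    import pandas as pd

    s = pd.Series([36.93, 36.83, 36.72], name="LAT")  # stand-in for df["LAT"]
    t = s              # plain assignment: same object, not a copy
    print(t.name)      # still "LAT" -- the name travels with the Series
    print(t is s)      # True
    u = s.copy()       # an independent copy (still named "LAT")
    print(u is s)      # False
    ```

    Which would explain why FLAT still prints “Name: LAT” at the bottom, even though nothing got copied wrong.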

  53. rms says:

    See the link above about sending values vs. arrays to math functions, with the suggestion of using numpy.

  54. E.M.Smith says:

    Ah, I think this is a clue:

    >>> TEST=np.zeros(10)
    >>> type(TEST)
    <class 'numpy.ndarray'>
    >>> math.cos(TEST)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: only length-1 arrays can be converted to Python scalars

    So it is explicitly expecting a scalar, and even a known-to-be-float array barfs, but with a better error message.

    Then this discussion:

    points out that the “feature” in Python of putting heterogeneous things in a variable array makes them really just all be “lists” and so not what you think of as an array of floats… (which is why I thought to try that numpy array approach).

    For some unfathomable reason, Python calls arrays “lists”. The “everyone-knows-what-this-is-called-so-we’re-going-to-call-it-something-else” school of language design. It’s a particularly bad choice of name, since it looks like a linked list rather than an array.

    @Glenn Maynard: probably because in C-like languages arrays are fixed length while Python lists are not. It’s more like STL vector in C++ or ArrayList in Java.

    It’s called a list, because it’s a list. [A(), 1, ‘Foo’, u’öäöäö’, 67L, 5.6]. A list. An array is “an arrangement of items at equally spaced addresses in computer memory” (wikipedia).

    Nothing about the universally-understood term “array” suggests a fixed length or anything about the content; those are just limitations of C’s particular implementation of arrays. Python lists are equally spaced (pointers to objects, internally), or else __getitem__ wouldn’t be O(1).

    @Glenn, from : “the elements of an array data structure are required to have the same size” (true for Python’s arrays, not true for Python lists) and “set of valid index tuples and the addresses of the elements (and hence the element addressing formula) are usually fixed while the array is in use” (not true in Python for either list or array).

    I find the use of wiki as an authoritative source for Computer Science Jargon laughable, but the discussion did enlighten me as to why this returned ‘type’ as ‘list’:

    >>> TEST=[1,2,45,45,5]
    >>> type(TEST)
    <class 'list'>

    So basically I think I’m being “bit” by the interaction of “list processing” with “only give me ONE float” function spec.

    OK… So I likely need to do something to NOT depend on that nifty feature of handling the whole list as an item non-procedurally and instead do an equivalent of FOR I = 1,7280 DO Floatvar=df[LAT of I] COSLAT=math.cos(Floatvar) ENDDO; ENDFOR; (or whatever that pseudocode turns into in Python…)

    Guess it’s time to learn / practice those looping control structures in Python… and indexing a dataframe…
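    For the record, that pseudocode comes out roughly like this in Python (a sketch with a few stand-in latitudes; math.cos gets handed one plain float per pass, so it has nothing to gripe about):

    ```python
    import math
    import pandas as pd

    df = pd.DataFrame({"LAT": [36.93, 45.00, -20.50]})  # stand-in for the GHCN frame

    coslat = []
    for lat in df["LAT"]:                    # iterate the Series one float at a time
        coslat.append(math.cos(math.radians(lat)))
    df["COSLAT"] = coslat                    # stuff the results back in as a new column
    print(df["COSLAT"])
    ```

    Slower than letting the library chew the whole column at once, but it sidesteps the “only give me ONE float” limit.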

  55. E.M.Smith says:


    Yes. They were talking about math.log not math.cos but it does look like the limitation is consistent.

    The “go fish in numpy” is perhaps useful. From what I’ve read about it (not much) it says it is array math extensions to Python; so IF it has an arraycos() function that might do it. Or the “old school” direct iteration I described just prior.

    I guess it comes down to which I want to learn more, the numpy library extensions or the Python flow control operators ;-) Maybe I’ll do both, as this IS mostly a “learning experience” with the “work product” mostly just a side effect. (in a few weeks to months that will shift to product focused, learning as needed).

  56. rms says:

    Numpy and Scipy may well become your friends.

  57. E.M.Smith says:

    Pointer to scipy and a search for “numpy cos” yields The Answer (I think…)


    numpy.cos(x, /, out=None, *, where=True, casting='same_kind', order='K', dtype=None, subok=True[, signature, extobj]) = <ufunc 'cos'>

    Cosine element-wise.

    Parameters

    x : array_like

    Input array in radians.
    out : ndarray, None, or tuple of ndarray and None, optional

    A location into which the result is stored. If provided, it must have a shape that the inputs broadcast to. If not provided or None, a freshly-allocated array is returned. A tuple (possible only as a keyword argument) must have length equal to the number of outputs.
    where : array_like, optional

    Values of True indicate to calculate the ufunc at that position, values of False indicate to leave the value in the output alone.

    For other keyword-only arguments, see the ufunc docs.


    Returns

    y : ndarray

    The corresponding cosine values. This is a scalar if x is a scalar.

    So there IS a numpy.cos function that handles the whole array. Now I just need to find a numpy version of radians and degrees ;-)
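    It turns out numpy has those too: numpy.radians() and numpy.degrees() exist, and like numpy.cos they work element-wise on a whole array (or a dataframe column). A quick sketch with made-up latitudes:

    ```python
    import numpy as np

    lat = np.array([0.0, 45.0, 90.0])   # stand-in latitudes in degrees
    r = np.radians(lat)                 # element-wise degrees -> radians
    c = np.cos(r)                       # element-wise cosine, no complaints
    back = np.degrees(r)                # and back to degrees again
    print(c)
    ```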

  58. E.M.Smith says:

    Oh Boy! Actually having the documentation to know what I’m doing ;-)

    OK, so still “some assembly required” and no doubt a few more walls to bounce off of, but at least I know 1) What is the problem? and 2) Here there be answers.

  59. E.M.Smith says:

    Again with the import… but it looks like it works:

    >>> PLAT=df["LAT"]*numpy.cos(numpy.radians(df["LAT"]))
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    NameError: name 'numpy' is not defined
    >>> import numpy
    >>> PLAT=df["LAT"]*numpy.cos(numpy.radians(df["LAT"]))
    >>> PLAT
    0     29.520740
    1     29.479381
    2     29.433540
    3     29.349268
    4     29.466914
    5     29.433540
    6     29.328014
           ...
    7272    31.959975
    7273    31.819805
    7274    31.044425
    7275    26.844618
    7276    25.980762
    7277    32.139380
    7278    32.053923
    7279    28.187277
    Name: LAT, Length: 7280, dtype: float64

    Despite still calling it “LAT” at the bottom, note the values are different from those listed above.

    Guess I need to try plotting it and see what I get…

  60. E.M.Smith says:

    OK, I had the formula for LAT vs LON being multiplied by the COS function a bit off. Here’s what made the correct graph (which I will add to the listing above… it does look kind of neat and is of that sine type)

    >>> PLON=df["LON"]*numpy.cos(numpy.radians(df["LAT"]))
    >>> plt.scatter(PLON, df["LAT"],s=1)

    So one takes the LAT in the dataframe, converts it to radians, hands that to cosine function then uses that to scale the LONgitude…
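    Putting that together with the title / label / dot-size bits, the whole recipe looks something like this sketch (a couple of stand-in stations instead of the full GHCN frame; the Agg backend just lets it run without a display and write a file instead of popping a window):

    ```python
    import numpy as np
    import pandas as pd
    import matplotlib
    matplotlib.use("Agg")               # headless backend: save to a file, no X needed
    import matplotlib.pyplot as plt

    df = pd.DataFrame({"LAT": [36.93, -20.50], "LON": [10.00, -60.00]})  # stand-in stations

    plon = df["LON"] * np.cos(np.radians(df["LAT"]))  # shrink longitude toward the poles
    plt.scatter(plon, df["LAT"], s=1)                 # s=1 keeps dense areas from blobbing
    plt.title("GHCN Thermometer Locations")
    plt.xlabel("Longitude (cos LAT scaled)")
    plt.ylabel("Latitude")
    plt.savefig("stations.png")
    ```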

  61. E.M.Smith says:

    into each glow of success a little gripe must fall…

    So I set out to create a border by stuffing all the “LONG” 180 / -180 values with a cos adjusted spot that I was going to color… only to be “reminded” that you can’t just do “for y in range(-90,90)” and then use y as an array index. The doc says that in Python all arrays start at 0. A negative number counts back from the long end (so in an array of 100 elements, address -10 gives element 90 – remember that zero, so 100 ends at 99… and folks wonder why I like “The American Way” of arrays starting at 1, where element 100 is number 100…) BUT I was forever corrupted by my Algol immersion at a young age… Where you can choose any start and end you want. So an array that starts at -90 and goes to 90 is Just Fine.


    I know, not the end of the world. I’ve dealt with it before in other languages. So I need to go from 0 to 179 and then from 0 to 359 and stuff the values in, and then pull them back out again as the correct coordinates and…

    Or just find some other way that doesn’t do a procedural iterative stuffing of an array… Maybe some kind of dataframe stuffing or…

    Or maybe I’ll just “let it all go” until tomorrow. After all, I’ve made an area proportional correct map and maybe that’s enough for one evening… making a border for the world and decorating it with axis labels and all can wait…

    The good news is that I did get nested FOR loops to work… ;-)
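    The usual dodge for the missing Algol-style bounds is to carry the offset yourself: index = latitude + 90, so -90..90 maps onto a 181-slot list indexed 0..180. A sketch:

    ```python
    # Map latitude -90..90 onto list indexes 0..180 by adding 90 going in
    border = [0.0] * 181
    for y in range(-90, 91):        # range() stops short, so 91 to include +90
        border[y + 90] = float(y)   # offset by 90 going in...
    # ...and subtract 90 coming back out: border[i] holds latitude i - 90
    print(border[0], border[90], border[180])
    ```

    Not as tidy as declaring an array [-90:90], but it is the standard trick in 0-based languages.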

  62. Simon Derricutt says:

    EM – the “bouncing off the walls” with a new language was something I thought was just me not being smart enough when trying to get something working in a new language. I frequently come across that “you can’t get there from here” situation, and have to go dive into the manuals to find out where I need to be before I can get there. That’s largely why I used to write stuff in assembler because it did exactly what I asked of it and didn’t have surprises, and I had a library of functions that also did exactly what they were supposed to after a few years of that, and they just needed to be put in and connected. Most of the effort thus went into writing the new functions needed. Any new subroutines ended up being put into the “library” with comments saying what comes in, what goes out, and what it does in the meantime. Main problem with assembler is that you’re limited to one architecture. On the other hand, it does run very quickly relative to writing in C, by around an order of magnitude based on my experience of what the bosses said was possible and what was actually achieved.

    It seems the majority of the effort in understanding a new language is to get your head around the logic of the people who developed it, and their choices don’t always seem logical. In your examples above, the machine code has to get the latitude data from the database and pass it sequentially through the subroutines that read it as a sequence of bytes, convert that into float format, convert that into a cosine, multiply that by longitude, put that result into an array, then cycle through the output array and plot it. Piece of piss, really, and the effort goes into getting the syntax right so that the interpreter/compiler doesn’t barf or do what you don’t expect it to. The Python version of that fits in one line, whereas the assembler version might be more lines of calling subroutines and a few pages of subroutines (and programmer production rate is a few pages per day (normally 1 page) no matter what language is used, so a compact language will please the accountants). In the end it’s going to be bits in registers and in memory locations, and simple operations on those. It’s nice if a language doesn’t try to cover up those basic mechanics of what’s actually happening. Still, that does seem to be where people try to head with OO programming. Likely when you’ve internalised the logic of the language you can write programs a lot faster, so programmer productivity goes up – but you need a faster/bigger machine to run it ever after because the nested subroutines have a lot of overhead in passing the data between them.

    There’s a problem of whether you pay for something up-front or later on. Using a high-level language costs less up front and costs more each time you run it. Doesn’t just apply to programming, though. I once took an autorouted board with 6 layers and around 6000 vias (including buried microvias) and spent a couple of weeks hand-laying it down to 4 layers, 3000 vias of minimum size 10 thou, and all through-vias. Lower technology needed to make the board and it was a bit less than half the cost per bare board. That also reduced the number of bare-board failures, too, and increased the reliability over time, and meant we weren’t limited to one supplier who could actually make the bare board. It’s a bit odd that the engineers who initially produced that board were pleased with the result because it was high-tech and pushed the boundaries of what was possible, whereas that’s not really what you should aim for unless it’s the only way you can get a result. Still, autorouting cost them only around a day of machine time where most of the effort was an hour or so of human time. Maybe that’s also a pointer that what seems logical to someone else may not seem logical to me, and hence the difficulties I have with a new programming language.

    It’s useful to see the process from another point of view, and to realise it’s not just me.

  63. Pingback: W.O.O.D 5 February 2019 | Musings from the Chiefio

  64. E.M.Smith says:


    It isn’t just you. It’s every programmer I’ve ever known or taught. (I taught at a local Community College for a few years.) Everyone explores the limits of the new language and discovers their expectations are not always met. Glad I can help confirm that for you 8-)

    FWIW, I always liked assembly language too. It isn’t hard at all (for me anyway). You do need a machine in mind, but that’s not hard either, really. Yeah, it is a lot of lines, but each line is only a dozen or so CHAR long. Never really understood the folks who thought it was hard or a problem to maintain.

    FWIW, one of my first programs was for a MITS Altair 8800, in assembly. Hand loaded as binary via the panel switches as one of the first tests of the box post construction from the kit:

    Started at mem zero, copied itself incrementally to the top of mem then halted as it exited the memory space. IIRC it was about 10 lines…

    I wonder if anyone writes “Hand Assembly” for the various byte code VMs out there…

  65. Larry Ledwick says:

    Quick summary of Julia as it compares to other commonly used languages for mathematical programming.

    Click to access julia%20intro%20slides.pdf

    Note the difference in quick sort speed between Julia, Python, and R, slide 9 of 19.

  66. E.M.Smith says:


    I like the graph of relative speed too. With FORTRAN being THE fastest on almost everything. Then folks wonder why so much supercomputer stuff is still coded in FORTRAN… Because when you are costing $2000 / hour to run, saving a few hours matters!

    That Julia gives those numbers a very good challenge is very interesting to me. That it is natively parallel and the ARM FORTRAN does crappy parallel implementations says it might just be THE fastest thing on my Arm Stack for a good long while and well suited to what I’m trying to do. (Parallel math heavy codes on a stack of cheap SBCs). That it is already running scaled to 1000s of CPUs says it will have no problem on a gaggle of such boards.

    Then, that all the library stuff I’m learning for use with Python is portable, and that I can ‘bust out’ to a chunk of C code or FORTRAN if for some reason I need it, just gravy over the whole thing.

    I figure about a month, maybe two, I’ll have enough done with Python Pandas & Graphs that I can take on my first shot at Julia. (Maybe less if I get some spare time or Python tosses a log in my path ;-)

  67. Larry Ledwick says:

    That bugaboo of finding references which match your logic/learning style is one of the things about programming languages that I often have issues with.

    Some authors simply are talking Greek to me, just jargon word salad that does not say anything to a beginner; others assume you are already fully competent in other languages and use references to those languages to explain features. That is useless if you have never addressed that issue before. A very few approach it as if you are a first-time learner of this specific language, and actually talk in conversational English. The Sams books tended to fit my learning style better than most, as they used a step by step modular approach to common tasks.

    For example, the book that helped me most with learning basic Unix commands is just a dictionary style book listing all the major commands, with a simple short paragraph statement of what each command does and a syntax reference for the most common usage. It was a great reference to quickly find out how to get there from here, or at least know what commands to dig into in the man page. Sometimes its human-readable English summary made more sense than the man page.

    I simply scanned through the book and read through the summaries of all the commands so I had a base knowledge of what commands were available then started playing with the 10-15 most needed commands until I got the tools scripts I needed to write for my job to work. I never got into the more obscure features – being a pragmatist, – – – I need to do xyz – – – Oh this command does Y and this command does X now all I have to do is figure out how to do Z and glue them together.

    If anyone knows where I can find a canonical list of all the commands that Julia recognizes, I would appreciate a link to it. That would allow a quick scan of the list to see if a familiar command is supported and save a bit of time.
    A web search starts off with a few Julia programming language links but quickly branches out into endless links to Julia Child’s books or some such. (Unfortunate to use a common first name as the name of the language, in that respect.)

  68. jim2 says:

    LL – may not be exactly what you have in mind, but …

    Click to access Julia-cheatsheet.pdf

    You have to follow a couple of links to get to the info, but it’s pretty complete …

  69. Larry Ledwick says:

    Yes, already got those, and so far they are as close as I have come to what I am looking for. Looking for the Julia language equivalent to O’Reilly’s Unix in a Nutshell or the Sybex Unix Desk Reference.
    Problem is that Julia is just too new for some of that stuff to be available yet.

    The man page equivalent in Julia is the help function, which unfortunately requires you to ask for help on a specific command, but like man pages it does list related commands.
    If they had an index listing of all the supported commands that would be a first step.


    help?> read
    search: read read! readdir readlink readline readuntil readlines readchomp readbytes! readavailable ReadOnlyMemoryError isready

    read(io::IO, T)

    Read a single value of type T from io, in canonical binary representation.

    read(io::IO, String)

    Read the entirety of io, as a String.


    julia> io = IOBuffer("JuliaLang is a GitHub organization");

    julia> read(io, Char)
    ‘J’: ASCII/Unicode U+004a (category Lu: Letter, uppercase)

    julia> io = IOBuffer("JuliaLang is a GitHub organization");

    julia> read(io, String)
    "JuliaLang is a GitHub organization"


    read(filename::AbstractString, args…)

    Open a file and read its contents. args is passed to read: this is equivalent to open(io->read(io, args…), filename).

    read(filename::AbstractString, String)

    Read the entire contents of a file as a string.


    read(s::IO, nb=typemax(Int))

    Read at most nb bytes from s, returning a Vector{UInt8} of the bytes read.


    read(s::IOStream, nb::Integer; all=true)

    Read at most nb bytes from s, returning a Vector{UInt8} of the bytes read.

    If all is true (the default), this function will block repeatedly trying to read all requested bytes, until an error or
    end-of-file occurs. If all is false, at most one read call is performed, and the amount of data returned is device-dependent.
    Note that not all stream types support the all option.



    read(command::Cmd)

    Run command and return the resulting output as an array of bytes.


    read(command::Cmd, String)

    Run command and return the resulting output as a String.


    So to find out if a command you are familiar with is supported you need to run

    help?> (command name)

  70. Pingback: GHCN v3.3 vs v4 – Top Level Entry Point | Musings from the Chiefio

Comments are closed.