A GHCN Temperature Database Install Script

As I’ve had to do this at least thrice now, and was doing it again to get the Ordoid XU4 back up to speed, I decided to consolidate the “what to do” into a script. This is a little less than polished as there are variations needed depending on the OS you are using and it really ought to check first to see if an application is installed before installing it. Yet apt-get is smart enough to tell you if you already have the newest version and do nothing so that part is fine anyeay.

There’s also a manual step inside the database that could possibley be automated, but as it is uniqe to each user and all of one line anyway, what’s the point again?…

Since I use FORTRAN for the data pre steps, it starts with installing (assuring it is installed…) the gfortran compiler. Then it goes on to install MariaDB, Python3 (that is often there already), and a bunch of detailed stuff. I’m pretty sure this is complete, but as I “shake down” the present installation I’ll update this script if needed. When I ran it, Python3 was already installed and it didn’t need the Raspberry Pi specific “intall libmariadbclient-dev” and just said it could not be found.

root@odroidxu4:/SG2/xfs/chiefio# cat builddb 
echo
echo "A Script to install MariaDB -mysql replacement- along with the"
echo "needed Python3 stuff do do the temperature database stuff"
echo "I've developed"

echo
echo apt-get install gfortran
echo

apt-get install gfortran

echo
echo apt-get install mariadb-server
echo

apt-get install mariadb-server

echo
echo "Now do the security set-up"
echo
echo "mysql_secure_installation"
echo

mysql_secure_installation

echo
echo "Does it still have the root Unix socket login block?"
echo
echo "su 'your id' and find out"
echo "mysql -u root -p"
echo

su chiefio
mysql -u root -p

echo
echo "If so, then use another user.  Su to root, then login w/o password"
echo "or reset the use of the socket like:"
echo "$ sudo mysql -u root"
echo 
echo "[mysql] use mysql;"
echo "[mysql] update user set plugin='' where User='root';"
echo "[mysql] flush privileges;"
echo "[mysql] \q"
echo "And more here:"
echo "https://chiefio.wordpress.com/2019/03/08/mariadb-mysql-login-failure-new-install-fix/#comment-109342"
echo 
echo "what I do is just add 'me' as a user inside the DB and grant access:"
echo "So like:"
echo "MariaDB [(none)]> CREATE DATABASE temps;"
echo "Query OK, 1 row affected (0.28 sec)"
echo "MariaDB [(none)]> GRANT ALL ON temps.* to chiefio@localhost"
echo "    -> IDENTIFIED BY 'LetMeIn!';"
echo "Query OK, 0 rows affected (0.21 sec)"
echo 

mysql -u root

echo
echo "Make sure that you have python3 - it ought to already be there"
echo
echo apt-get install python3
echo

apt-get install python3

echo
echo "Some OS releses requried idle3 to match python3"
echo "Others just use idle.  So here we try to install both"

echo
echo apt-get install idle3
echo

apt-get install idle3

echo
echo apt-get install idle
echo

apt-get install idle

echo "On the Odroid XU4 with Armbian / Devuan, this was NOT needed"
echo "It is included here as I needed it on the Raspberry Pi  OS version"
echo
echo apt-get install libmariadbclient-dev
echo

apt-get install libmariadbclient-dev

echo
echo apt-get install python-pip
echo

apt-get install python-pip

echo
echo apt-get install python3-pip
echo

apt-get install python3-pip

echo
echo pip install mysql-connector
echo

pip install mysql-connector

echo
echo pip3 install mysql-connector
echo

pip3 install mysql-connector

echo
echo apt-get install python3-pandas
echo

apt-get install python3-pandas

echo
echo apt-get install python3-matplotlib
echo

apt-get install python3-matplotlib

echo
echo "That ought to do it.  But different OS versions"
echo "Often have different things included in the dependency"
echo "tree.  You may find other bits need explicit inclusion"
echo "or you might find some of these are not needed"
echo
echo "I've left out most of the straight Python 2.x install" 
echo "stuff as I've basically moved everything to Python3"
echo "If you want to use Python 2.x then most of the install "
echo "python3 stuff needs a version with the 3 removed."
echo

Here’s the manual step (described in comments in the script) run after the install. I do a “show databases” so you can see that the temps temperature stuff isn’t there yet. I’ve bolded the bit that needs doing.

chiefio@odroidxu4:~$ mysql -u root -p 
Enter password: 
Welcome to the MariaDB monitor.  Commands end with ; or \g.
Your MariaDB connection id is 37
Server version: 10.0.38-MariaDB-0+deb8u1 (Debian)

Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

MariaDB [(none)]> show databases;
+--------------------+
| Database           |
+--------------------+
| information_schema |
| mysql              |
| performance_schema |
+--------------------+
3 rows in set (0.01 sec)

MariaDB [(none)]> CREATE DATABASE temps;
Query OK, 1 row affected (0.00 sec)

MariaDB [(none)]> GRANT ALL ON temps.* TO chiefio@localhost
    -> IDENTIFIED BY 'LetMeIn!';
Query OK, 0 rows affected (0.00 sec)

MariaDB [(none)]> exit
Bye

I have a command set up named “mystart” that launches the database with my normal login:

mysql -u chiefio -p --local-infile temps

Here you can see me login as me, and then load one of the table descriptions. This pretty much shows that the whole database part is installed OK. I’ll do a Python run next (after all the data loading finishes) and verify it works (so everyting there is installed too).

chiefio@odroidxu4:/SG/xfs/chiefio/SQL$ mystart
Enter password: 
Welcome to the MariaDB monitor.  Commands end with ; or \g.
Your MariaDB connection id is 39
Server version: 10.0.38-MariaDB-0+deb8u1 (Debian)

Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

MariaDB [temps]> source tables/continent
Query OK, 0 rows affected (0.65 sec)

MariaDB [temps]> DESCRIBE continent;
+-------------+----------+------+-----+---------+-------+
| Field       | Type     | Null | Key | Default | Extra |
+-------------+----------+------+-----+---------+-------+
| version     | char(5)  | NO   |     | NULL    |       |
| ascension   | char(10) | YES  |     | NULL    |       |
| region      | char(1)  | NO   |     | NULL    |       |
| region_name | char(25) | YES  |     | NULL    |       |
+-------------+----------+------+-----+---------+-------+
4 rows in set (0.01 sec)

I’ve also made at script that sucks in the other table making scripts. You would need to change the path to match whatever you used:

chiefio@odroidxu4:/SG/xfs/chiefio/SQL/tables$ cat tables
source /SG/xfs/chiefio/SQL/tables/continent
source /SG/xfs/chiefio/SQL/tables/country
source /SG/xfs/chiefio/SQL/tables/invent3
source /SG/xfs/chiefio/SQL/tables/invent4
source /SG/xfs/chiefio/SQL/tables/temps3
source /SG/xfs/chiefio/SQL/tables/temps4
source /SG/xfs/chiefio/SQL/tables/mstats3
source /SG/xfs/chiefio/SQL/tables/mstats4
source /SG/xfs/chiefio/SQL/tables/anom3
source /SG/xfs/chiefio/SQL/tables/anom4

Here’s the run result:

MariaDB [temps]> source tables/tables
Query OK, 0 rows affected (0.45 sec)

Query OK, 0 rows affected (0.23 sec)

Query OK, 0 rows affected (0.23 sec)

Query OK, 0 rows affected (0.24 sec)

Query OK, 0 rows affected (0.24 sec)

Query OK, 0 rows affected (0.30 sec)

Query OK, 0 rows affected (0.23 sec)

Query OK, 0 rows affected (0.26 sec)

Query OK, 0 rows affected (1.09 sec)

Query OK, 0 rows affected (0.26 sec)

MariaDB [temps]> 

So that’s a fairly fast way to do the set-up steps.

I’ve mounted the same old source data disk so I don’t need to do the FORTRAN steps again (formatting for load) and instead can just load it up from the pre-made files.

So this script ought to do all the heavy lifting:

chiefio@odroidxu4:/SG/xfs/chiefio/SQL/bin$ cat LoadTables 
source /SG/xfs/chiefio/SQL/bin/lcontinent.sql
source /SG/xfs/chiefio/SQL/bin/LCxu4.sql
source /SG/xfs/chiefio/SQL/bin/l3xu.sql
source /SG/xfs/chiefio/SQL/bin/l4xu.sql
source /SG/xfs/chiefio/SQL/bin/ltemps3xu.sql
source /SG/xfs/chiefio/SQL/bin/ltemps4xu.sql
source /SG/xfs/chiefio/SQL/bin/lmstats3.sql
source /SG/xfs/chiefio/SQL/bin/lmstats4.sql
source /SG/xfs/chiefio/SQL/bin/lanom3.sql
source /SG/xfs/chiefio/SQL/bin/lanom4.sql
source /SG/xfs/chiefio/SQL/tables/mkindexs

As this might run a very long time… I’m going to launch it and do something else for a while. All the programs are the same as prior versions with only the path name to the disk changed.

When it’s done, I’ll update here with the resutlts. I think I have all the steps in the right order…

The lmstats that loads the monthly statistics needs the other tables loaded first to work. Then the lanom versions that create the anomaly tables depends on both the prior data and the statistics tables, so comes after that. Then I have a script that makes indexes on the tables typically used for reporting. It is likely that it would be faster to split that into one step that makes indeses on the temps tables prior to making the anomaly tables, but it is what it is and I’m not that interested in splitting it up. (It was originally done as a ‘glue on’ after the database was already there. It ought to be redone to make the creation step more efficient. Maybe someday ;-)

If all this works right, I’ll be about 3/4 of the way to an “end to end build” process (instead of a dozen “go fish” postings scattered in the past). “Someday” I’ll assemble a full end-to-end process (including the FORTRAN steps that are done once). Or maybe just links to those steps would be enough…

So back some time after lunch ;-)

Update!

Doing the data load I got to correct a couple of typos and spelling errors. Then did a drop table on everything (with a new script to dump them all named “dropall’). Now I’m running a new data load run. Here’s the thing that dumps all of it just via “source tables/dropall” at the MariaDB prompt:

chiefio@odroidxu4:/SG/xfs/chiefio/SQL/tables$ cat dropall 
DROP TABLE continent;
DROP TABLE country;
DROP TABLE invent3;
DROP TABLE invent4;
DROP TABLE temps3;
DROP TABLE temps4;
DROP TABLE anom3;
DROP TABLE anom4;
DROP TABLE mstats3;
DROP TABLE mstats4;

Then I did a fresh “source tables/tables” to recreate them all fresh, and started the load. The first 8 very fast lines are loading the 8 continent descriptions of one line each. Then we have the 240 country records being loaded into the “countries” table. I already know that one warning is one country with a name one character too long, so ignoring it for now.

Next are the two loads of “inventory” thermometer data at 7200 rows / 2.4 seconds and 27,361 in 2.78 seconds. Then things start to slow down.

The v3 temperature records are loaded in 3 min 21 seconds, and the v4 records take 11 minutes. After that, it is on to the “inside the database” processes of creating the statistics tables and the anomaly tables. The v3 statistics in 1 minute and the v4 in almost 9. The anomalies can run about 1/2 an hour, so I’ll update those steps as they finish.

MariaDB [temps]> source bin/LoadTables
Query OK, 1 row affected (0.08 sec)

Query OK, 1 row affected (0.07 sec)

Query OK, 1 row affected (0.10 sec)

Query OK, 1 row affected (0.08 sec)

Query OK, 1 row affected (0.06 sec)

Query OK, 1 row affected (0.05 sec)

Query OK, 1 row affected (0.03 sec)

Query OK, 1 row affected (0.04 sec)

Query OK, 240 rows affected, 1 warning (0.13 sec)    
Records: 240  Deleted: 0  Skipped: 0  Warnings: 1

Query OK, 7280 rows affected (2.41 sec)              
Records: 7280  Deleted: 0  Skipped: 0  Warnings: 0

Query OK, 27361 rows affected (2.78 sec)             
Records: 27361  Deleted: 0  Skipped: 0  Warnings: 0

Query OK, 5554200 rows affected (3 min 21.68 sec)    
Records: 5554200  Deleted: 0  Skipped: 0  Warnings: 0

Query OK, 16809258 rows affected, 50 warnings (11 min 5.58 sec)
Records: 16809258  Deleted: 0  Skipped: 0  Warnings: 50

Query OK, 87360 rows affected (1 min 13.62 sec)
Records: 87360  Duplicates: 0  Warnings: 0

Query OK, 328327 rows affected (8 min 44.38 sec)
Records: 328327  Duplicates: 0  Warnings: 0

Query OK, 15448107 rows affected, 65535 warnings (12 min 40.75 sec)
Records: 15448107  Duplicates: 0  Warnings: 15444656

The anomalies table loads completed in 8 and 12 minutes for v3 and v4. Not bad. The XU4 was using roughly 1.5 of the slower processors for this. The faster “big” processers get some action, but not much. Bursting to 1/3 or 1/5 from time to time. I think it is largely disk limited, as it is reading one patch of disk and writing another, so takin head seeks.

Making the indexes went way fast in comparision to the Raspberry Pi M3. About 4 minuts per index x 8 indexes for about 1/2 an hour:

Query OK, 0 rows affected (3 min 40.53 sec)         
Records: 0  Duplicates: 0  Warnings: 0

Query OK, 0 rows affected (3 min 47.08 sec)         
Records: 0  Duplicates: 0  Warnings: 0

Query OK, 0 rows affected (3 min 11.12 sec)         
Records: 0  Duplicates: 0  Warnings: 0

Query OK, 0 rows affected (3 min 19.97 sec)         
Records: 0  Duplicates: 0  Warnings: 0

Query OK, 0 rows affected (4 min 13.83 sec)         
Records: 0  Duplicates: 0  Warnings: 0

Query OK, 0 rows affected (3 min 34.63 sec)         
Records: 0  Duplicates: 0  Warnings: 0

Query OK, 0 rows affected (4 min 8.58 sec)          
Records: 0  Duplicates: 0  Warnings: 0

Query OK, 0 rows affected (4 min 8.93 sec)          
Records: 0  Duplicates: 0  Warnings: 0

So there you have it. A “soup to nuts” re-install and rebuild of the database (minus the Fortran data formatting steps that are “do once and save the result”). Not too bad, really.

I’ve run one of the graph making Python3 programs and it did in fact make the expected graph, so looks like everything works OK again.

Subscribe to feed

Advertisements

About E.M.Smith

A technical managerial sort interested in things from Stonehenge to computer science. My present "hot buttons' are the mythology of Climate Change and ancient metrology; but things change...
This entry was posted in NCDC - GHCN Issues and tagged , , , , , , . Bookmark the permalink.

3 Responses to A GHCN Temperature Database Install Script

  1. M Simon says:

    I have come a little way since my comment on the EV thread last June.

    Some photonics questions

    What are the photon absorption bands of CO2?
    What are the photon absorption bands of water vapor?
    What is the overlap?
    What does it mean?

    It means that water vapor (WV) is by far the predominant “greenhouse gas”. If that is the case on a planet 70% covered by water, then variations in CO2 are not causing warming. Worse. Variations in greenhouse gasses are not causing climate change.

    Water vapor (WV) and CO2 absorb/radiate in the same bands. (Except for 15um where there is not much energy.) There is 100 times as much WV as CO2.

    We have a water problem.

    Study: CO2 NOT causing climate change. The link leads to a review of the paper.
    According to the paper’s findings climate changes are due to other physical phenomena – not carbon dioxide – and such changes have always taken place and will continue to do so despite the recent claims at the UN’s Paris climate summit (COP21) to ‘limit’ global warming to two degrees.

  2. M Simon says:

    I think my comment was redirected or in moderation.

    [Reply: Was in spam, but no idea why. Nothing in it looks spammy to me. Perhaps warmista sites marking those URLs as “spam” since they don’t like it? -E.M.S.]

  3. E.M.Smith says:

    There’s a lot to be said for mature reliable software…

    So after all that, and getting the XU4 back up and installed and all, that “old issue” of not being able to plot the year across the bottom returned. Seems it isn’t just what is in which column of the fetch, it’s a question of which matplotlib maturity you are running.

    The one on the XU4 Armbian/Wheezy doesn’t like it. The one on Devuan 2.0 (Debian Ascii) does and works well. Basically, something about plotting a load of DECIMAL values on a vertical axis causes it to barf and throw an exception and whatever it was is fixed in Ascii, so fixed in Devuan 2.0 release.

    OK….

    Well, the XU4 was mostly using about 1.5 CPU anyway and that was of only modestly faster clock cycle / same A53 type cores, so not a huge speed gain. Enough that I’d like to use it, but no so much that it’s a PITA to drop back to the Pi. Which I have done.

    The above effort (build scripted) is still very valuable as it will let me evaluate other boards very quickly now. Basically in about an hour I can toss the whole she-bang onto any board, run some tests, and see what issues there are / measure performance.

    As time (and ambition) permits I’m going to try some of the other boards. The Odroid C1 (that’s also a v7 32 bit instruction set but slightly faster clock), the Odroid C2 (64 bit cores and 1.5 GHz clock), etc.

    Unfortunately, as is often the case, the “Hot Board” that is new (or even older but the software maturity / stability hasn’t caught up due to small user / coder base) is more of a “has issues” experience while the “slow but popular” board has most of the issues ironed out due to millions of folks pushing the bug tracking system.

    If I had to speculate it would be that the folks doing the port figure nobody would ever ever use a small “embedded” system type SBC for a Data Base Server, so as long as the code compiled they just moved on to doing QA on things they cared about. A “newer version” would require more porting work and it will be “in queue” after all the other “important” stuff is done… Might even be in the Ascii based port of the XU4 OS, which would be great if it worked, but it wouldn’t install right for me… So I’ll wait a few more months and try to do the upgrade install fresh again.

    Until then the XU4 is a great browser station. Just not suited to making the graphs I want to make.

    Oh Well….

    Might be interesting to dust off the old Pentium PC and put Devuan on it… I’m not worried about chip risks in something behind two firewalls and you just know all the code was ported / patched / running on the PC release… I find it an unpleasant thought, but as a dedicated DB processing box not too offensive ;-)

    FWIW I’ve just finished making the 7 Python programs to make the region comparison of v3 vs v4 graphs and have the Region 1 (Africa) graph on the screen. In a couple of hours I’ll have all of them posted in a new posting for comparison.. Yes, Africa also shows the “change of history” we saw in the Region 5 sample… It also has the “narrow waist” in the baseline period ( I think that matters…) and the hockey stick end after about 1995 (when the electronic thermometers and airports dominate).

Anything to say?

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.