GIStemp STEP1_v2_to_bdb

The Python program v2_to_bdb.py

Here is the listing. The explanation will follow below the === bar, some weeks or months from now.

   
 
#! /usr/bin/python

import sys, struct, string, bsddb
import stationstring

BAD = 999.9
IBAD = 9999

def fill_dbm(f, dbm, info, ids, sources):
    line = f.readline()
    ids = []
    while line:
        lines = []
        id = line[:12]
        lines.append(line)
##          print id
        ids.append(id)
        last_id = id
        line = f.readline()
        while line:
            id = line[:12]
            if id != last_id:
                break
            lines.append(line)
            line = f.readline()
        data, begin = stationstring.from_lines(lines)
        st_id = last_id[:-1]
        dict = info[st_id]
        dict['begin'] = begin
        if sources.has_key(last_id):
            dict['source'] = sources[last_id]
        else:
            dict['source'] = 'UNKNOWN'
        mystring = stationstring.serialize(dict, data)
        dbm[last_id] = mystring
    dbm['IDS'] = string.join(ids, ' ')
    dbm['IBAD'] = str(IBAD)
    dbm.close()

def split_title(line):
    template = '11sx30sx6sx7s5s5sc5s2s2s2s2sc2s16sx'
    keys = ('id', 'name', 'lat', 'lon', 'elevs',
            'elevg', 'pop', 'ipop', 'topo', 'stveg',
            'stloc', 'iloc', 'airstn', 'itowndis', 'grveg')
    try:
        tpl = struct.unpack(template, line)
    except struct.error:
        tpl = struct.unpack(template, line[:101])
    ell = len(tpl)
    hash = {}
    for n in range(ell):
        key = keys[n]
        if key == 'id':
            id = tpl[n]
            continue
        value = tpl[n]
        value = string.lstrip(value)
        value = string.rstrip(value)
        hash[key] = value
    return id, hash

def get_sources():
    sources = {}
    fs = {}
    fs['MCDW'] = open('mcdw.tbl', 'r')
    fs['USHCN'] = open('ushcn.tbl', 'r')
    fs['SUMOFDAY'] = open('sumofday.tbl', 'r')
    for source, f in fs.items():
        line = f.readline()
        while line:
            x, id, rec_no = string.split(line)
            id = id + rec_no
            sources[id] = source
            line = f.readline()
        f.close()
    return sources

def get_info():
    f = open('v2.inv', 'r')
    print "reading v2.inv"
    title = f.readline()
    info = {}
    while title:
        id, hash = split_title(title)
        info[id] = hash
        title = f.readline()
    ids = info.keys()
    ids.sort()
    return info, ids

def main():
    infile_name = sys.argv[1]
    bdb_name = infile_name + '.bdb'

    f = open(infile_name, 'r')
    print "reading " + infile_name
    info, ids = get_info()
    sources = get_sources()
    dbm = bsddb.hashopen(bdb_name, 'n')
    print "writing " + bdb_name
    fill_dbm(f, dbm, info, ids, sources)

if __name__ == '__main__':
    main()


=========================================================

The analysis of this program will have to wait a fair while, as I’m still finishing another step. It looks like a well-done library of utilities, so I’ll likely leave it to the end.

About E.M.Smith

A technical managerial sort interested in things from Stonehenge to computer science. My present "hot buttons" are the mythology of Climate Change and ancient metrology; but things change...

2 Responses to GIStemp STEP1_v2_to_bdb

  1. Peter O'Neill says:

    I have just added the following comment to the gistemp-step0-input-files page:

    “The *.tbl files: sumofday.tbl and ushcn.tbl are very similar in format and seem to be a list of an odd station ID and regular station ID followed by a single digit flag. Active flag? While mcdw.tbl has what looks like two regular station IDs and a flag byte.”

    In STEP0 only ushcn.tbl is used, to translate those “odd” station IDs to “regular” station IDs, and that “single digit flag” is not used.

    In STEP1 the three *.tbl files are read in get_sources(), found in v2_to_bdb.py, and that “single digit flag” turns out to be an offset which is combined with the station ID to select the appropriate raw data set from among multiple data sets for that station.

    For example, from mcdw.tbl:
    103060355 10160355000 2 -> 101603550002
    103060360 10160360000 3 -> 101603600003
    103060390 10160390000 5 -> 101603900005
    103060402 10160402000 3 -> 101604020003

    Note that these offsets are all zero for ushcn.tbl
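
The 12-character keys on the right of the arrows above have the same shape as the keys fill_dbm takes from the first 12 characters of each data line. As a rough Python 3 sketch (not the original Python 2 code), the grouping and source lookup might look like the following. The input lines and the sources table are made-up placeholders, and group_records is a hypothetical helper name, not part of GIStemp.

```python
from itertools import groupby

# Hypothetical input: a 12-character record key followed by placeholder payload.
lines = [
    "1016035500021990 payload",
    "1016035500021991 payload",
    "1016036000031990 payload",
]

# Hypothetical source table, keyed the way get_sources() keys it
# (11-digit station ID plus the 1-digit offset).
sources = {"101603550002": "MCDW"}

def group_records(lines, sources):
    """Group consecutive lines that share the same first 12 characters."""
    records = {}
    for key, group in groupby(lines, key=lambda line: line[:12]):
        records[key] = {
            "station": key[:-1],                    # 11-character station ID
            "source": sources.get(key, "UNKNOWN"),  # same fallback as the listing
            "n_lines": len(list(group)),
        }
    return records

records = group_records(lines, sources)
```

Like the loop in the listing, this assumes all lines for one record are adjacent in the input file.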

    See the comment I will now add to the gistemp-step1_v2_to_bdb page for a further comment on this in the context of the code shown there:
    —————————————————-
    Be careful if rewriting the Python code in another language:

    def get_sources():
        sources = {}
        fs = {}
        fs['MCDW'] = open('mcdw.tbl', 'r')
        fs['USHCN'] = open('ushcn.tbl', 'r')
        fs['SUMOFDAY'] = open('sumofday.tbl', 'r')
        for source, f in fs.items():
            line = f.readline()
            while line:
                x, id, rec_no = string.split(line)
                id = id + rec_no
                sources[id] = source
                line = f.readline()
            f.close()
        return sources

    The line “id = id + rec_no” above concatenates two strings, one of 11 digits and one a single digit, to form a 12 digit string. I made the mistake initially of converting the strings to integer and adding, getting of course an 11 digit result, which did not then provide a valid key for use in fill_dbm(f, dbm, info, ids, sources)
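
The difference between the two readings can be checked directly. This is a small Python 3 sketch using the first mcdw.tbl line quoted above; source_key_from_line is a hypothetical helper name, not something in v2_to_bdb.py.

```python
def source_key_from_line(line):
    """Build the 12-character key by string concatenation, as get_sources() does."""
    _, station_id, rec_no = line.split()
    return station_id + rec_no  # string + string, not a numeric sum

line = "103060355 10160355000 2"

key = source_key_from_line(line)

# The mistaken port: convert both fields to integers and add them.
wrong = str(int("10160355000") + int("2"))
```

Here key is the 12-character string "101603550002", while wrong is the 11-character string "10160355002", which cannot match any key built from the first 12 characters of a data line.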
    —————————————————-
    And I have a question on the code in this file:

    info, ids = get_info()
    sources = get_sources()
    dbm = bsddb.hashopen(bdb_name, 'n')
    print "writing " + bdb_name
    fill_dbm(f, dbm, info, ids, sources)

    get_info() returns the sorted info keys as ids, and ids is then passed to fill_dbm. But fill_dbm starts:

    def fill_dbm(f, dbm, info, ids, sources):
        line = f.readline()
        ids = []

    and this appears to me to indicate that the ids values passed in are simply discarded. I am not so familiar with Python (I’m rewriting everything in C#), so am I misunderstanding something about Python here?
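
On the question above: in Python, assigning to a parameter name inside a function only rebinds the local name, so the reading given there looks right as far as the listing shows. A minimal Python 3 sketch of that behavior follows; fill is a hypothetical stand-in for fill_dbm, reduced to the two relevant lines.

```python
def fill(ids):
    # Rebinds the local name to a brand-new list; the caller's list
    # is neither used nor modified from this point on.
    ids = []
    ids.append("x")
    return ids

passed = ["a", "b"]
returned = fill(passed)
```

After the call, passed is still ["a", "b"] and returned is a separate list, so whatever the caller passed in as ids has no effect on the result.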

  2. E.M.Smith says:

    Unfortunately, I do not speak Python. I can sort of follow it, but I’m an old plain “C” guy who also does FORTRAN, so a C sharp guy is way ahead of me… ( I can also do ALGOL, grandad to Pascal that I can also write, after a fashion, and a few other languages; but Python is not among them…)

    With luck, others who do speak Python will also pass by and add comments. I’d like this to be a community effort, rather than just a lone wolf like me, baying at the moon…
