I’ve been following the recent Darwin Airport thread on WUWT with some interest and thought that it was about time I did some auditing of the instrumental temperature record myself as several others (like GG, RomanM, EMSmith, hpx83 etc) have done recently.
As well as being a Physicist (educated to degree level), I also have significant database design and programming skills and have built several large decision support/data warehousing systems during the course of my career so far. I therefore thought the best place to start would be to follow hpx83’s (Savecapitalism’s) example and load the GHCN dataset into a relational database management system (RDBMS) and start to query the data.
My first port of call was therefore to the data and after searching various blogs I made my way to the download location for NOAA’s GHCN version 2 dataset.
My next step was to follow the excellent documentation provided in the ‘GHCN – Global Historical Climate Network’ thread on E M Smith’s ChiefIO blog.
Having familiarised myself with the GHCN data files, I was ready to write some code to import the data from the various GHCN download files into a series of normalised and indexed tables. Because the GHCN data is stored in a series of non-normalised ‘flat files’, this was a non-trivial task. My next step was to have a good look at the data by running some simple ‘counting queries’ on the records stored in the various tables in the database.
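A minimal sketch of this kind of import is given here, using Python and SQLite purely for illustration. The table name `monthly_mean`, its column names and the helper functions are hypothetical, not the schema actually used for this post. The fixed-width layout (3-digit country code, 5-digit WMO code, 3-digit modifier, 1-digit duplicate number, 4-digit year, then twelve 5-character monthly values in tenths of a degree C with -9999 for missing) is my reading of the GHCN v2 format and should be checked against NOAA’s README. The sample record is synthetic, and the country code 501 is an assumption.

```python
import sqlite3

MISSING = -9999  # assumed sentinel for a missing monthly value


def parse_v2_mean_line(line):
    """Split one fixed-width v2.mean style record into its fields."""
    country = line[0:3]
    wmo = line[3:8]
    modifier = line[8:11]
    duplicate = int(line[11])
    year = int(line[12:16])
    temps = []
    for m in range(12):
        raw = int(line[16 + 5 * m: 21 + 5 * m])
        # Values are assumed to be stored in tenths of a degree C
        temps.append(None if raw == MISSING else raw / 10.0)
    return country, wmo, modifier, duplicate, year, temps


def load_lines(conn, lines):
    """Normalise each 12-month record into one row per month."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS monthly_mean (
               country TEXT, wmo TEXT, modifier TEXT,
               duplicate INTEGER, year INTEGER, month INTEGER,
               temp_c REAL,
               PRIMARY KEY (country, wmo, modifier, duplicate, year, month))"""
    )
    rows = []
    for line in lines:
        country, wmo, modifier, duplicate, year, temps = parse_v2_mean_line(line)
        for month, temp in enumerate(temps, start=1):
            if temp is not None:  # skip missing months
                rows.append((country, wmo, modifier, duplicate, year, month, temp))
    conn.executemany("INSERT INTO monthly_mean VALUES (?,?,?,?,?,?,?)", rows)
    conn.execute(
        "CREATE INDEX IF NOT EXISTS idx_wmo ON monthly_mean (wmo, duplicate, year)"
    )
    conn.commit()


# Synthetic record: the monthly values are made up, with June set to missing.
values = [283, 281, 284, 286, 270, -9999, 248, 262, 279, 291, 293, 289]
sample = "501" + "94120" + "000" + "0" + "1990" + "".join(f"{v:5d}" for v in values)

conn = sqlite3.connect(":memory:")
load_lines(conn, [sample])
count = conn.execute(
    "SELECT COUNT(*) FROM monthly_mean WHERE wmo = '94120'"
).fetchone()[0]
print(count)  # 11 rows: twelve months minus the one missing value
```

Storing one row per month rather than one row per year is what makes the later counting and filtering queries straightforward.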
One of the first things I noticed was the rather (at least initially) confusing ‘station inventory’ data and its relationship to the raw and adjusted mean temperature data. Each record in the raw (v2.mean) and adjusted (v2.mean_adj) temperature data files has a field that is referred to in the GHCN documentation as the ‘station modifier code’. In fact this number, in combination with the WMO station code, identifies a unique ‘station’ record in the GHCN ‘station inventory’ file (v2.temperature.inv).
So there are often several ‘stations’ in the station inventory file for a given WMO station code. In effect these station modifier codes represent the different ‘stations’ that make up the station record for a given WMO station. They appear to represent the various ‘station moves’ that are often mentioned in the many threads on various blogs on this subject. In addition to the station modifier codes there are also often ‘duplicate numbers’ in the raw and adjusted data files, which indicate that there can be several (sometimes overlapping) data series for a given WMO station code/modifier code combination.
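The relationship between WMO codes and modifier codes can be explored with a simple counting query. The sketch below assumes a hypothetical `station_inv` table keyed on (WMO code, modifier code). Apart from the Darwin entry (94120, modifier 000), the rows are invented placeholders rather than real inventory records.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE station_inv (
           wmo TEXT, modifier TEXT, name TEXT,
           PRIMARY KEY (wmo, modifier))"""
)
conn.executemany(
    "INSERT INTO station_inv VALUES (?,?,?)",
    [
        ("94120", "000", "DARWIN AIRPORT"),
        ("11111", "000", "PLACEHOLDER SITE A"),  # invented rows
        ("11111", "001", "PLACEHOLDER SITE B"),
        ("11111", "002", "PLACEHOLDER SITE C"),
    ],
)

# How many inventory 'stations' share each WMO station code?
stations_per_wmo = conn.execute(
    """SELECT wmo, COUNT(*) AS n_stations
       FROM station_inv
       GROUP BY wmo
       ORDER BY n_stations DESC, wmo"""
).fetchall()
print(stations_per_wmo)  # [('11111', 3), ('94120', 1)]
```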
If you want to familiarise yourself with a set of data it is often a good idea to see if you can reproduce someone else’s analysis that has been carried out using the same data. I decided therefore to use Willis E’s WUWT Darwin airport thread as an example and attempt to reproduce his results.
As you’ll see shortly, it proved to be a very useful exercise that then led me on to look into the GHCN data in much more detail. First thing to note: in the GHCN dataset Darwin airport has a 5-digit WMO station code of 94120 and, according to NOAA, it has only one station modifier code, 000. This would seem to imply that there have been no station moves at Darwin (but see later). Darwin has five different raw data series, each corresponding to a different ‘duplicate no’ (0, 1, 2, 3 and 4). For its adjusted data there are three ‘duplicate no’ series (0, 1 and 2).
This implies that the raw data Series 3 and 4 have for some reason been dropped from NOAA’s GHCN adjustment analysis. Presumably this is because of the fact that they have 20 years (Series 3) and 9 years (Series 4) of data only while Series 0,1 and 2 have 110, 69 and 42 respectively. Similarly the number of data points in the adjusted data for Series 0, 1 and 2 are 110, 69 and 42 respectively i.e. they are an exact match to the no. of points for each of the equivalent raw series. At this point it is worth charting the raw and adjusted temperature data series for Darwin as Willis E. did.
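The series counts quoted above are the kind of result a ‘counting query’ produces. The following sketch assumes two hypothetical tables, `raw_mean` and `adj_mean`, with one row per (WMO code, duplicate number, year). Only the per-series counts are taken from the text; the year values themselves are arbitrary placeholders.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
for table in ("raw_mean", "adj_mean"):
    conn.execute(f"CREATE TABLE {table} (wmo TEXT, duplicate INTEGER, year INTEGER)")

# Years of data per duplicate series, as quoted in the text.
raw_counts = {0: 110, 1: 69, 2: 42, 3: 20, 4: 9}
adj_counts = {0: 110, 1: 69, 2: 42}

for table, counts in (("raw_mean", raw_counts), ("adj_mean", adj_counts)):
    for dup, n_years in counts.items():
        # Placeholder years: only the number of rows matters here.
        conn.executemany(
            f"INSERT INTO {table} VALUES (?,?,?)",
            [("94120", dup, 1900 + i) for i in range(n_years)],
        )

# Compare raw and adjusted coverage for each duplicate series.
rows = conn.execute(
    """SELECT r.duplicate, r.n_raw, COALESCE(a.n_adj, 0) AS n_adj
       FROM (SELECT duplicate, COUNT(DISTINCT year) AS n_raw
             FROM raw_mean WHERE wmo = ? GROUP BY duplicate) AS r
       LEFT JOIN (SELECT duplicate, COUNT(DISTINCT year) AS n_adj
                  FROM adj_mean WHERE wmo = ? GROUP BY duplicate) AS a
         ON a.duplicate = r.duplicate
       ORDER BY r.duplicate""",
    ("94120", "94120"),
).fetchall()

# Raw series with no adjusted counterpart.
dropped = [dup for dup, n_raw, n_adj in rows if n_adj == 0]
print(rows)
print(dropped)  # [3, 4]
```

The LEFT JOIN keeps every raw series in the result, so any series missing from the adjusted data shows up with a count of zero rather than silently disappearing.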
Figure 1 – Darwin raw mean temp. series
Figure 2 – Darwin adj. mean temp. series
As can be seen from Figure 2, it appears that NOAA have chosen to adjust only Series 0 and 2. For some reason almost no adjustments are made to the data for Series 1, which includes the most recent data. Why should NOAA adjust only the older data and not the most recent data? It is also informative to see the effect (as Willis E did) that the adjustments have on the raw data for Series 0 alone by looking at the following chart.
Figure 3 – Darwin Series 0 raw/adjusted mean temps
This is certainly a very significant adjustment, one that results in a warming slope of +6 deg. C/century in the adjusted Series 0 data post 1941, as highlighted by Willis E in his Figure 8 on the Darwin airport thread on WUWT.
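A warming slope of this kind is normally obtained from an ordinary least-squares fit of temperature against year. The sketch below shows the arithmetic under that assumption; it is not necessarily the exact method used by Willis E. The function name `trend_per_century` is hypothetical, and the input series is synthetic, built with a known slope of 0.02 deg. C per year so that the result can be checked. It is not Darwin data.

```python
def trend_per_century(years, temps):
    """Ordinary least-squares slope of temps against years, in deg C per century."""
    n = len(years)
    mean_year = sum(years) / n
    mean_temp = sum(temps) / n
    sxy = sum((y - mean_year) * (t - mean_temp) for y, t in zip(years, temps))
    sxx = sum((y - mean_year) ** 2 for y in years)
    return 100.0 * sxy / sxx  # slope per year scaled to per century


# Synthetic annual means rising at exactly 0.02 deg C per year.
years = list(range(1941, 1991))
temps = [27.0 + 0.02 * (y - 1941) for y in years]

slope = trend_per_century(years, temps)
print(round(slope, 6))  # 2.0
```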
The following conclusions may be drawn from my analysis, in which I’ve attempted to reproduce Willis E’s analysis of the NOAA GHCN adjustments for Darwin.
1. It is informative to load the NOAA GHCN data into an RDBMS, as the GHCN adjustment data can then be usefully analysed to look for trends in the data for all and not just selected WMO stations.
2. Once all the data has been loaded and appropriately indexed, it is then very easy to filter and export the data for selected WMO stations so that the GHCN data can be analysed in further detail and charted using third party software packages e.g. MS Excel.
3. After exporting all the data for Darwin, it has been possible to independently reproduce all the charts produced by Willis Eschenbach on his WUWT Darwin thread and to confirm his conclusion that the adjustments made to the raw Darwin data by NOAA cannot be physically justified.
Is Darwin a special case? Has NOAA applied similar physically unjustifiable adjustments to the raw data for any other WMO stations? See my next thread on NOAA’s physically unjustifiable GHCN adjustments.