By KevinUK and Verity Jones
Zeke Hausfather’s first look at the new NCDC GHCN v3 beta dataset immediately plots some comparisons of V3 with V2 and shows just how little change (according to Zeke) the new dataset brings to the graph of Mean Global Land Surface Temperature Anomaly vs year. It’s therefore now time to have at this new GHCN dataset in some detail, even if it’s only ‘beta’ at this stage.
Station Inventory data
OK, I’ve now downloaded the NCDC GHCN V3 beta dataset and have started by looking at the data in the ‘unadjusted’ station inventory file ghcnm.v3.0.0-beta1.20100917.qcu.inv, comparing it to the GHCN V2 equivalent v2temperature.inv. It looks like Zeke has not done any basic station counts on this file, as I make it that there are exactly 7280 records in both the V3 and V2 station inventory files. That would appear to contradict Zeke’s statement ‘Version 3 added about 500 new stations (> 1000 post-2006), so no huge new data update quite yet’, as I can’t see any evidence that any new stations have been added to the station inventory file (unless some of those in V2 have been replaced by exactly the same no. of new stations in V3, which doesn’t look to be the case).
I’ve also done a cross tabulation query of the no. of stations (records in the station inventory file) grouped by country/country code, and again from what I can see each country has exactly the same no. of stations (records) in the V3 station inventory file as in the V2 station inventory file. For example, there are 1921 ‘UNITED STATES OF AMERICA’ stations in both files and 847 ‘CANADA’ stations in both files.
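For anyone wanting to repeat the record count and the cross tabulation, here is a minimal Python sketch. It assumes (and this is my assumption, not something checked against the Readme) that in both inventory files the first 11 characters of each record are the station ID and the first 3 of those are the country code. The function names and the sample records are made up purely for illustration.

```python
from collections import Counter


def count_by_country(lines):
    """Count inventory records per country code.

    Assumption: the country code is the first 3 characters of each record.
    Blank lines are ignored.
    """
    return Counter(line[:3] for line in lines if line.strip())


def country_differences(v2_lines, v3_lines):
    """Return {country_code: (v2_count, v3_count)} for codes whose counts differ."""
    v2 = count_by_country(v2_lines)
    v3 = count_by_country(v3_lines)
    return {
        code: (v2.get(code, 0), v3.get(code, 0))
        for code in sorted(set(v2) | set(v3))
        if v2.get(code, 0) != v3.get(code, 0)
    }


# Made-up sample records, NOT real inventory lines.
v2_sample = ["42570000001 EXAMPLE A", "40370000002 EXAMPLE B"]
v3_sample = ["42500000001 EXAMPLE A", "40370000002 EXAMPLE B"]

print(len(v2_sample), len(v3_sample))               # total records per file
print(country_differences(v2_sample, v3_sample))    # empty dict if counts match
```

With the real files you would pass in the lines read from each inventory file instead of the sample lists. An empty result means every country has the same number of records in both files, which is what I found.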
One thing I have also noticed is that for about 2/3 of the US (country code 425) stations, the WMO station code/imod combination (which represents a unique station record in the V2 station inventory file) appears to have been replaced. The new ‘Station IDs’ are not in the 70,000 range (as they all are in the V2 file) but instead look like they’ve come from the USHCN V2 dataset.
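The ID comparison itself is just a set difference. The sketch below again assumes that the first 11 characters of a record are the station ID and that US records start with 425. The function name and the sample IDs are hypothetical.

```python
def id_changes(v2_lines, v3_lines, country_code="425"):
    """Compare station IDs for one country between two inventory files.

    Assumption: the station ID is the first 11 characters of each record.
    Returns the IDs found only in V2, only in V3, and in both.
    """
    v2_ids = {line[:11] for line in v2_lines if line.startswith(country_code)}
    v3_ids = {line[:11] for line in v3_lines if line.startswith(country_code)}
    return {
        "only_v2": sorted(v2_ids - v3_ids),
        "only_v3": sorted(v3_ids - v2_ids),
        "common": sorted(v2_ids & v3_ids),
    }


# Made-up sample records, NOT real inventory lines.
v2_us = ["42572000001 EXAMPLE A", "42572000002 EXAMPLE B", "40371000001 EXAMPLE C"]
v3_us = ["42500010001 EXAMPLE A", "42572000002 EXAMPLE B", "40371000001 EXAMPLE C"]

changes = id_changes(v2_us, v3_us)
print(changes["only_v2"])   # V2 IDs that no longer appear in V3
print(changes["only_v3"])   # V3 IDs that were not in V2
```

The share of US stations with a replaced ID is then the number of "only_v2" entries divided by the total number of US records in the V2 file.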
For now I’m therefore going to assume that the differences between V2 and V3 in the no. of stations by year chart posted by Zeke on Lucia’s Blackboard are due to additions of further monthly average temperature data for EXISTING stations in the station inventory file, and NOT due to additional stations being added to the GHCN dataset as Zeke’s statements seem to imply.
Now if you are reading this you’ve probably already worked out that I’m looking to compare the changes/additions made to the GHCN V3 beta dataset on an individual station basis, as my main interest is in looking at how the changes/additions to the dataset have affected the warming/cooling trends for individual stations. I’m particularly interested to see whether or not NCDC have made any significant changes to how they adjust raw data for individual stations, as a great many of the individual station V2 adjustments have no physically justifiable explanation IMO. Let’s see if things have remained much the same, or for that matter have gotten even worse in this respect, in going from V2 to V3. I somehow doubt that things have improved, but let’s wait and see.
I suspect it won’t be long now before Willis E has an updated thread on Darwin adjustments up on WUWT. You never know, I might even be able to beat him to the punch.
I’ve also noticed that for most of the US stations the latitude/longitude coordinates in the V3 station inventory file use 5 decimal places, as opposed to the 2 decimal places used in the V2 file. Sadly, it still looks like the rest of the world (ROW) stations use only 2 decimal places for their lat/long values.
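A quick way to tally coordinate precision is to count the digits after the decimal point in each latitude string. In this sketch the caller supplies (station ID, latitude string) pairs, because the column positions of the latitude field differ between the two files and I have not reproduced them here. The function names and sample values are made up.

```python
from collections import Counter


def decimal_places(token):
    """Number of digits after the decimal point in a coordinate string."""
    token = token.strip()
    if "." not in token:
        return 0
    return len(token.split(".", 1)[1])


def precision_tally(coords):
    """Tally decimal places for US (IDs starting 425) vs rest of world (ROW).

    coords: iterable of (station_id, latitude_string) pairs.
    Returns a Counter keyed by (group, decimal_places).
    """
    tally = Counter()
    for station_id, lat in coords:
        group = "US" if station_id.startswith("425") else "ROW"
        tally[(group, decimal_places(lat))] += 1
    return tally


# Made-up sample values, NOT real inventory data.
sample = [
    ("42500000001", "31.05830"),
    ("42500000002", "40.12"),
    ("10100000001", "36.93"),
]
print(precision_tally(sample))
```

Note that this counts digits as written, so a value such as 31.05000 counts as 5 decimal places even though the trailing zeros may carry no extra precision.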
The Readme file also seems to indicate that they’ve done something quite different when it comes to quality control (new MFlag and QFlag) and handling duplicate series for the same station (new SFlag). But more of that in Part 2.