GHCN V3 Beta: Part 1 – A First Look at Station Inventory data

By KevinUK and Verity Jones

Zeke Hausfather’s first look at the new NCDC GHCN v3 beta dataset immediately plots some comparisons of V3 with V2 and shows just how little change (according to Zeke) the new dataset brings to the graph of Mean Global Land Surface Temperature Anomaly vs year. It’s therefore now time to have a look at this new GHCN dataset in some detail, even if it’s only ‘beta’ at this stage.

Station Inventory data

OK, I’ve now downloaded the NCDC GHCN V3 beta dataset and have started by looking at the data in the ‘unadjusted’ station inventory file ghcnm.v3.0.0-beta1.20100917.qcu.inv, comparing it to the GHCN V2 equivalent v2temperature.inv. It looks like Zeke has not done any basic station counts on this file, as I make it that there are exactly 7280 records in both the V3 and V2 station inventory files. That would appear to contradict Zeke’s statement ‘Version 3 added about 500 new stations (> 1000 post-2006), so no huge new data update quite yet’, as I can’t see any evidence that any new stations have been added to the station inventory file (unless some of those in V2 have been replaced by exactly the same no. of new stations in V3, which doesn’t look to be the case).
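For anyone wanting to repeat the record count, here is a minimal Python sketch. It is not the query used for the figures above. It assumes each inventory file holds one station per line with an 11-character station ID at the start of the line, and the function names and sample lines are made up for illustration.

```python
def station_ids(lines):
    """Return the set of station IDs found in an inventory file's lines.

    Assumption: one station per line, ID in the first 11 characters.
    """
    return {line[:11] for line in lines if line.strip()}


def compare_inventories(v2_lines, v3_lines):
    """Count distinct station IDs in each inventory and list IDs unique to either."""
    v2_ids = station_ids(v2_lines)
    v3_ids = station_ids(v3_lines)
    return {
        "v2_count": len(v2_ids),
        "v3_count": len(v3_ids),
        "only_in_v2": sorted(v2_ids - v3_ids),
        "only_in_v3": sorted(v3_ids - v2_ids),
    }


# Made-up sample lines, for illustration only (not real inventory records).
v2_sample = ["10160355000 STATION A", "42572201000 STATION B"]
v3_sample = ["10160355000 STATION A", "42500012345 STATION B"]
result = compare_inventories(v2_sample, v3_sample)
print(result["v2_count"], result["v3_count"])
```

To run it against the real files, read each one into a list of lines first (for example with `open(path).read().splitlines()`). Equal counts with non-empty ‘only in’ lists would point to IDs being swapped rather than stations being added.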

I’ve also done a cross tabulation query of the no. of stations (records in the station inventory file) grouped by country/country code, and again from what I can see each country has exactly the same no. of stations (records) in the V3 station inventory file as in the V2 station inventory file. For example, there are 1921 ‘UNITED STATES OF AMERICA’ stations in both files and 847 ‘CANADA’ stations in both files.
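The per-country comparison can be sketched in a few lines of Python. This is only an illustration of the idea, assuming the country code is the first 3 characters of each station ID (as with the 425 code for the US); the function names are hypothetical and mapping codes to country names is left out.

```python
from collections import Counter


def stations_per_country(lines):
    """Count inventory records per country code.

    Assumption: the country code is the first 3 characters of each line.
    """
    return Counter(line[:3] for line in lines if line.strip())


def count_differences(v2_lines, v3_lines):
    """Return {country_code: (v2_count, v3_count)} for codes whose counts differ."""
    v2 = stations_per_country(v2_lines)
    v3 = stations_per_country(v3_lines)
    return {
        code: (v2[code], v3[code])
        for code in sorted(set(v2) | set(v3))
        if v2[code] != v3[code]
    }
```

An empty result means every country code has the same number of records in both files, which is what the cross tabulation above found.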

One thing I have also noticed is that for about 2/3 of the US (country code 425) stations, the WMO station code/imod combination (which represents a unique station record in the V2 station inventory file) appears to have been replaced with ‘Station IDs’ that are not in the 70,000 range (as they all are in the V2 file) but instead look like they’ve come from the USHCN V2 dataset.
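A rough way to put a number on this is sketched below in Python. It rests on two assumptions that are not spelled out above: that the 11-character ID is laid out as a 3-digit country code, then a 5-digit station number, then a 3-digit modifier, and that ‘the 70,000 range’ means 70000 to 79999. The function names are made up for the example.

```python
def outside_70000_range(station_id):
    """True if characters 4-8 of the ID are not a number in 70000-79999.

    Assumption: ID = 3-digit country code + 5-digit station number + 3-digit modifier.
    """
    number = station_id[3:8]
    return not (number.isdigit() and 70000 <= int(number) <= 79999)


def fraction_outside(lines, country_code="425"):
    """Fraction of a country's station IDs whose number is outside 70000-79999."""
    ids = [line[:11] for line in lines if line.startswith(country_code)]
    if not ids:
        return 0.0
    return sum(outside_70000_range(sid) for sid in ids) / len(ids)
```

Run over the V3 inventory lines, a value near 0.67 would match the ‘about 2/3’ estimate; run over the V2 lines it should come out at 0 if all the US IDs really are in the 70,000 range.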

For now I’m therefore going to assume that the differences between V2 and V3 in the ‘no. of stations by year’ chart posted by Zeke on Lucia’s Blackboard are due to additions of further monthly average temperature data for EXISTING stations in the station inventory file and NOT due to additional stations being added to the GHCN dataset, as Zeke’s statements seem to imply.

Now if you are reading this you’ve probably already worked out that I’m looking to compare the changes/additions made to the GHCN v3 beta dataset on an individual station basis, as my main interest is in looking at how the changes/additions to the dataset have affected the warming/cooling trends for individual stations. I’m particularly interested to see whether or not NCDC have made any significant changes to how they adjust raw data for individual stations, as a great many of the individual station V2 adjustments have no physically justifiable explanation IMO. Let’s see if things have remained much the same or, for that matter, have gotten even worse in this respect in going from V2 to V3. I somehow doubt that things have improved, but let’s wait and see.

I suspect it won’t be long now before Willis E has an updated thread on Darwin adjustments up on WUWT. You never know, I might even be able to beat him to the punch.

Latitude/Longitude data

I’ve also noticed that for most of the US stations the latitude/longitude coordinates in the V3 station inventory file use 5 decimal places for the lat/long values, as opposed to the 2 decimal places used in the V2 file. Sadly it still looks like the rest of the world (ROW) stations use only 2 decimal places for their lat/long values.
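Checking the precision of the coordinates is a matter of counting the digits after the decimal point as they are written in the file. The Python sketch below does that. The column positions used as defaults (latitude in columns 13-20, longitude in columns 22-30 of each V3 inventory line) are an assumption and should be checked against the Readme, and the function names are made up for the example.

```python
def decimal_places(field):
    """Number of digits after the decimal point in a numeric text field."""
    text = field.strip()
    if "." not in text:
        return 0
    return len(text.split(".", 1)[1])


def latlon_precision(line, lat_cols=(12, 20), lon_cols=(21, 30)):
    """Return (latitude decimals, longitude decimals) for one inventory line.

    Assumption: zero-based slices 12:20 and 21:30 hold latitude and longitude.
    """
    lat = line[lat_cols[0]:lat_cols[1]]
    lon = line[lon_cols[0]:lon_cols[1]]
    return decimal_places(lat), decimal_places(lon)
```

Trailing zeros are counted as written, so this measures how many places the file carries, not how accurate the coordinates actually are.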

The Readme file also seems to indicate that they’ve done something quite different when it comes to quality control (new MFlag and QFlag) and handling duplicate series for the same station (new SFlag). But more of that in Part 2.


3 Responses to GHCN V3 Beta: Part 1 – A First Look at Station Inventory data

  1. Pingback: Climate Blog and News Recap: 2010 09 24 « The Whiteboard

  2. Steven Mosher says:

    If you look at Zeke’s slide you see that he doesn’t indicate that they add stations.
    The station count is a calculation that Zeke and I both do. It means valid stations,
    or stations with enough years in the 1961-90 period.

    we’ve known about V3 for some time and that it would not contain more stations beyond the 7280, but would recover some data.

    v3.1 due out in 2011 adds more stations. But the way we “count stations” the file
    ( not the inventory) does contain more stations.

    And more stations won’t change the answer, not 5000, not 10000, not 20000

    Because the problem is not the sampling. the problem is not anomalies, or gridding or any of that stuff.

    the problem is uncertainty in adjustments ( see slide 24) and UHI.

    There is a huge waste of effort on all the tangential issues, which irks me to no end.

  3. KevinUK says:


    Thanks for your comment and, while I have the chance just now, apologies for implying that you are an ‘apologist for CAGW scientists’. I know you are not! I should have put your name before Nick S’s (that’s Professor Nick Stokes BSc, MSc and PhD to us mere mortals who I’m talking about by the way) as my somewhat ‘snide’ comment was meant for him and not for you.

    I concede your point about the station counts, but Zeke’s statement on Lucia’s Blackboard is nonetheless ambiguous and implies that more stations have been added to the station inventory when thus far none have been. As always, it is easy to get the wording wrong and for one’s meaning to be misinterpreted as a result.

    As you may have seen by now, I’ve processed all the GHCN v3 beta data for all bar the US stations and have put up some interactive DIY Flash based maps on my main climate data web site (

    Verity and I will be doing a series of threads on DITC in which we look at the differences between the GHCN V2 and V3 beta. The first one will be a revisit to the Top 30 ‘Cooling turned into Warming’ and ‘Warming turned into Cooling’ lists I did for the GHCN V2 dataset earlier in 2010.

    It’s gratifying to see that NCDC are taking notice of ‘non traditional scientific sources (non peer reviewed)’ like ourselves, as I’ve already seen that several of the stations that were on my V2 lists are no longer on the V3 lists, e.g. Darwin airport, Edson, Alberta and Mayo, Yukon Territory. I’m sure Willis E would have liked some form of acknowledgement given the impact his seminal Darwin thread had on WUWT, but never mind, as Steve M would say ‘This is climate science’.

    And also, just so you know, I agree with you on ‘the problem is uncertainty in adjustments ( see slide 24) and UHI.’ but disagree with you on ‘There is a huge waste of effort on all the tangential issues….’.

    I don’t think what bloggers like Verity and I (and others like EMS and JeffID to name just two) do is a ‘huge waste of effort’ and it would seem based on their recent actions neither do NCDC or even the UK Met Office. Both organisations seem quite content for us to ‘shine a light’ on the issues with the datasets, they just don’t want to directly and openly acknowledge our contribution.
