One of the most shocking things about examining the GHCN data that goes into global climate models has been the inconsistency of the data. Not only is there loss of stations, but within each set of station data, there may be considerable loss of monthly data. This post asks – how bad is this? (answer – much worse than I thought – see the last graph).
GISS stated methods and QC
The methods used by NASA GISS for the calculation of the global average temperature using the GIStemp programme can be found here. Basically, deriving station annual mean temperatures relies on first calculating the long-term monthly averages of the data. These are then used to derive the monthly, seasonal and annual anomaly values. For Mactan, for example, the long-term monthly average for January is 26.92 degC, which means the anomaly value for January 2009 was 27.2 – 26.92 = 0.28 degC. NASA says:
“The trick was to find the anomalies first and then compute the absolute values from the anomalies: Whereas the absolute monthly and seasonal temperatures may have a definite seasonal cycle, the monthly and seasonal anomalies do not; hence whereas a seasonal mean may be totally distorted if we leave out the warmest or coldest month, seasonal anomalies are less impacted by dropping any monthly anomaly.”
Really? (Hmm, there’s that word ‘trick’ again). Well, I worked through the calculations for Mactan and I have to say I was convinced – the anomaly calculation actually does a good job of filling in for any missing data. And it makes some sense to do this – to maximise the data that is there and avoid large gaps. But then I thought – the temperature variations in Mactan are small. The annual average temperature for the station is 28.01 degC and the seasonal averages vary from 27.03 to 28.38 degC. The temperature plot for the reporting period (1974-2009) (Figure 2) also shows a relatively flat trend.
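The anomaly-first calculation NASA describes can be sketched as follows. This is a minimal illustration with made-up numbers standing in for a tropical station like Mactan (not real GHCN values): the absolute annual mean is biased when a month is missing, while the mean of the monthly anomalies barely moves, because each surviving anomaly is already near zero.

```python
import statistics

# Hypothetical long-term monthly means for a low-variance tropical
# station, and one year's monthly readings (illustrative numbers only).
longterm = [26.9, 27.0, 27.4, 28.1, 28.4, 28.3,
            28.0, 27.9, 27.8, 27.7, 27.5, 27.1]
year     = [27.2, 27.1, 27.6, 28.3, 28.6, None,   # June missing
            28.1, 28.0, 27.9, 27.8, 27.6, 27.2]

# Absolute annual mean: dropping a warm month pulls the result cool.
present = [t for t in year if t is not None]
abs_mean = statistics.mean(present)

# Anomaly-first: each month's departure from its own long-term mean,
# so a missing month just drops one near-zero term from the average.
anoms = [t - m for t, m in zip(year, longterm) if t is not None]
annual_anomaly = statistics.mean(anoms)

print(f"absolute mean (11 months): {abs_mean:.2f} degC")
print(f"annual anomaly (11 months): {annual_anomaly:+.2f} degC")
```

The seasonal cycle is what distorts the absolute mean; once it is subtracted out, the remaining anomalies have no cycle to distort, which is the whole point of the "trick".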
A quick eye cast over the values of the temperature anomalies for Mactan showed that most of the monthly variation is small, less than +/- 0.5 degC off the monthly mean, with exceptional months exceeding +/- 1.0 degC. The highest monthly anomaly was +1.87 degC (March, 1985). But what would happen in a station with large variations?
The annual temperature record for Jiuquan, China (Figure 3) cannot show the annual temperature variation in this region, which exceeds 25 degC across the average year. The overall annual average is 7.79 degC, but the seasonal averages vary from -7.06 (DJF) to 21.19 (JJA) degC. The plot does show clearly that there was a strong cooling trend at Jiuquan from 1941 to 1968, followed by warming from 1970 to the present.
Now I should say at this point that there is very little missing data in the Jiuquan record; six individual months over the record, with no more than one month missing in any one of the six affected years. But what if there were? Does the greater variation in temperatures make a difference?
Well, it has been quite instructive playing with the data. Taking out any one month of data in the Jiuquan record can affect the annual anomaly quite significantly. I was surprised. Removing any one month in Summer (June/July/Aug) can affect the anomaly value for that year by +/- 0.03-0.08 degC on average, and by up to +/- 0.18 degC at maximum, but removing any Winter (Dec/Jan/Feb) month can result in a change in annual anomaly of +/- 0.2-0.3 degC (up to a maximum of 0.6 degC) for that year. Repeating this for Mactan, the maximum differences I observed were +/- 0.10 and 0.15 degC for Summer and Winter respectively. So Jiuquan (and by implication cooler stations like it with large temperature variations) can be very sensitive to missing values, even when calculating anomaly values rather than absolute temperatures.
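The single-month-removal test above can be sketched like this. The numbers are synthetic stand-ins for a continental station with a large seasonal cycle, loosely in the spirit of Jiuquan (the real GHCN values are not reproduced here): drop each month in turn, recompute the annual anomaly, and see how far it shifts. A month that was far from its long-term mean (the cold February here) moves the annual figure substantially when removed.

```python
import statistics

# Synthetic long-term monthly means with a large seasonal cycle
# (illustrative numbers, not Jiuquan's actual record).
longterm = [-9.0, -5.0, 2.0, 10.0, 17.0, 21.0,
            23.0, 21.5, 15.0, 8.0, -1.0, -7.0]
# One year's monthly means, with an unusually cold February.
year = [-8.5, -9.5, 2.5, 10.2, 17.3, 21.4,
        23.2, 21.8, 15.1, 8.3, -0.8, -6.5]

def annual_anomaly(values, baseline):
    """Mean of the available monthly anomalies (missing months skipped)."""
    anoms = [v - b for v, b in zip(values, baseline) if v is not None]
    return statistics.mean(anoms)

full = annual_anomaly(year, longterm)

# Remove each month in turn and measure the shift in the annual anomaly.
for m in range(12):
    trial = year[:m] + [None] + year[m + 1:]
    shift = annual_anomaly(trial, longterm) - full
    print(f"month {m + 1:2d} removed: shift {shift:+.3f} degC")
```

Removing an unremarkable month barely moves the annual anomaly, but removing the anomalous February shifts it by several tenths of a degree, which is exactly the Winter sensitivity described above.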
Why do I think this is important? Simply this: when you have a missing month, the ‘filling in’ by using average anomaly values is just WRONG. Look at the Jiuquan February record. The temperatures are all over the place. If January is warm, there is no guarantee that February will be warm too. So with all those missing values we are creating even more uncertainty in the data record by spreading the existing data to cover those months – averaging the data. And the main point is this – we know that Winter warming has played a major part in warming the global average temperature, and part of that has been fewer extreme lows, but the cooling/warming cycle apparent in Jiuquan’s record is far from unique (see Mapping Global Warming for examples of maps of worldwide warming, cooling and warming cycles). So, if we are now entering a cooling cycle (negative PDO/AMO etc.) with more extreme lows, and we miss them through missing months, the record will be warmer than it should be (conversely, extreme warm months may be missed and the record will be cooler than actual).
So what else to do when there are missing months? Well, I believe BOM (the Australian Bureau of Meteorology) does not compute an Annual Mean Temperature for years with even one missing month of data; however, I have been unable to find a specific reference to this on the BOM site [if some kind soul can point me to it in comments I’ll update with a link]. This is also the QC applied by my collaborator Kevin, who is responsible for the wonderful maps I linked to above. Kevin has quantified the missing data (Fig. 5) and it is quite shocking:
[Update 8th March 2010. Just realised I’d missed something E.M. Smith had picked up in the NASA FOIA emails release – he quotes a couple of emails and it seems NASA is concerned about infill after all!]