[I was tempted to call this "Post Normal Distributions" for reasons that will become apparent; however, it wasn't entirely appropriate.]
Guest post by Charles Duncan
Anthony Watts’ excellent www.surfacestations.org website raises questions over the extent to which station changes (such as the addition of nearby pavement and air-conditioning units) affect the US, and by implication, global, temperature datasets.
I don’t pretend to understand the intricacies of the homogenisation processes used by the big players, but for some time I’ve had doubts over how well they compensate for these station changes, and have wanted to identify potential bias myself.
I’m not up to speed with statistical packages such as R, and am therefore constrained by the limitations of Excel; having said that, the newer versions can handle the whole GHCNv3 dataset in one bite. For each station I can readily calculate the annual delta T (‘anomaly’) by comparing its annual (December to November) average temperature to the previous year.
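For readers without Excel, the year-on-year calculation described above can be sketched in Python. This is only an illustration, not the spreadsheet actually used. The input layout (a dict keyed by calendar year and month), the function names, and the rule that a year counts only when all twelve months are present are assumptions, and the sample temperatures are invented.

```python
from statistics import mean

def met_year(year, month):
    """Assign December to the following year, so each 'year' runs December to November."""
    return year + 1 if month == 12 else year

def annual_means(monthly):
    """monthly: dict mapping (calendar_year, month) -> temperature in °C.
    Returns {year: mean temperature}, keeping only years with all 12 months (assumed rule)."""
    buckets = {}
    for (year, month), temp in monthly.items():
        buckets.setdefault(met_year(year, month), []).append(temp)
    return {y: mean(v) for y, v in buckets.items() if len(v) == 12}

def annual_deltas(means):
    """Year-on-year change; only defined where the previous year is also available."""
    return {y: means[y] - means[y - 1] for y in means if y - 1 in means}

# Invented station: Dec 1908 to Nov 1909 at 10.0 °C, Dec 1909 to Nov 1910 at 10.5 °C.
monthly = {}
for y, temp in ((1909, 10.0), (1910, 10.5)):
    monthly[(y - 1, 12)] = temp
    for m in range(1, 12):
        monthly[(y, m)] = temp

means = annual_means(monthly)
deltas = annual_deltas(means)
print(deltas)
```

Only 1910 receives a delta here, because 1909 has no complete preceding year to compare against.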
One way of looking for sources of potential bias is to look at the distribution of the anomalies of all stations in each year, which I initially assumed would be ‘normal’:
However, some meteorological effects, such as the blocking event the UK experienced in November, are regional. For the most part these anomalies are balanced; if one part of the globe is cold, another is hot. In November it was southern Greenland and north-eastern Canada that were hot.
If one had a uniform coverage of weather stations around the globe, the distribution for 2010 might then look something like this:
The stations that are unaffected are represented by the green line, with cold Europe shown in blue and hot Greenland and NE Canada in red. The resulting distribution is shown by the black line. Whilst not ‘normal’, the distribution is still symmetrical.
Unfortunately, stations are not uniformly spaced around the globe, with many more in Europe than Greenland and NE Canada. This introduces a potential error in any one year, but averaged over the life of the relevant cycle the asymmetry should balance out.
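The effect of uneven coverage on the two averages can be illustrated with a toy mixture. Everything in it is an assumption made for illustration: the station counts, the regional offsets of ±3 °C, the unit standard deviation, and the helper `quantile_sample`, which places points at evenly spaced quantiles of a normal curve so that the result is repeatable rather than random.

```python
from statistics import NormalDist, mean, median

def quantile_sample(mu, sigma, n):
    """n points at evenly spaced quantiles of a normal curve (deterministic stand-in for stations)."""
    dist = NormalDist(mu, sigma)
    return [dist.inv_cdf((i + 0.5) / n) for i in range(n)]

def mixture(n_unaffected, n_cold, n_hot, offset=3.0, sigma=1.0):
    """Combine unaffected stations with a cold region and a hot region (all values hypothetical)."""
    return (quantile_sample(0.0, sigma, n_unaffected)
            + quantile_sample(-offset, sigma, n_cold)
            + quantile_sample(+offset, sigma, n_hot))

uniform = mixture(800, 100, 100)  # equal numbers of cold and hot stations
uneven = mixture(800, 180, 20)    # many more stations in the cold region

for name, data in (("uniform", uniform), ("uneven", uneven)):
    print(name, round(mean(data), 3), round(median(data), 3))
```

With equal numbers of cold and hot stations the mean and median coincide. With more cold stations than hot, the mean is pulled further towards the cold side than the median is.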
An easy way to quantify asymmetry is to compare the different types of average. Statisticians commonly use three: the mean, the median and the mode (the ‘modal average’). In a ‘normal’ distribution (indeed any symmetrical distribution with a single peak) these three averages are the same.
- The mean is the most widely used average; in a school test, the mean result would be the total of all the marks awarded divided by the number of students.
- The median is the ‘one in the middle’. In a class of 21 pupils it would be the mark that the 11th in the class got; he had ten pupils better, and ten pupils worse, than him.
- The modal average is the most commonly occurring number – the peak of the distribution curve. If you were to make clothing for sale it would be the size that would fit the most people.
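The three averages can be checked on a small example with Python's standard `statistics` module. The 21 marks below are invented for the school-test analogy and are not taken from any dataset.

```python
from statistics import mean, median, mode

# Invented marks for a class of 21 pupils, sorted from lowest to highest.
marks = [35, 42, 48, 50, 55, 55, 58, 60, 62, 62, 62,
         65, 66, 70, 72, 75, 78, 80, 85, 90, 95]

class_mean = mean(marks)      # total of all marks divided by the number of pupils
class_median = median(marks)  # the mark of the 11th pupil of 21
class_mode = mode(marks)      # the most commonly occurring mark

print(class_mean, class_median, class_mode)
```

In this made-up class the mean sits above the median and the mode, because the high marks are spread out further above the middle than the low marks are below it.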
With fewer than 2,000 stations each year it is difficult to see the modal average; there just aren’t enough data points to get a sensible result. There are, of course, enough to calculate the mean and median. The median is less affected by a subset of the data shifting. In the class analogy, the 11th pupil in a class of 21 will still be the 11th even if everyone brighter than him gets 100%. Only pupils changing from below to above him (or vice versa) affect his position in class.
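The class analogy can be tried out directly. In this sketch the marks are invented, and the ten pupils above the middle one are all given 100% to see how each average responds.

```python
from statistics import mean, median

# Invented marks for a class of 21 pupils, sorted from lowest to highest.
marks = [35, 42, 48, 50, 55, 55, 58, 60, 62, 62, 62,
         65, 66, 70, 72, 75, 78, 80, 85, 90, 95]

middle = len(marks) // 2  # index of the 11th pupil
# Everyone above the 11th pupil now scores 100; the rest are unchanged.
shifted = marks[:middle + 1] + [100] * (len(marks) - middle - 1)

print("before:", round(mean(marks), 2), median(marks))
print("after: ", round(mean(shifted), 2), median(shifted))
```

The median does not move, because nobody crossed the middle pupil, while the mean rises with the shifted subset.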
The anomaly for 1917 is almost a perfect ‘normal’ distribution. To help see asymmetries I have also plotted normal distributions with the same standard deviation as the annual data, but with means of the mean, median and modal values:
The mean anomaly for 1917 was -0.623°C, and the median was -0.658°C. No particular significance should be placed on the fact that these happen to be negative numbers – the mean anomaly for 1910, for example, was 0.0304°C, and the median was 0.0300°C – the point to note is that there is only a very small difference between the mean and the median.
In contrast, in 1983, the mean was 0.61°C and the median 0.4°C, a difference of 0.21°C:
What is surprising, however, is that this shape suggests it is just a subset that is warming. What we actually see is that the shape of the left-hand side seems to be preserved. If the planet as a whole is warming, would we not expect either the whole curve to shift to the right, or for the mean to shift to the right and the whole curve to flatten?
Summing the mean deltas (anomalies) for each year gives us the familiar-shaped anomaly graph as reported by the major climate websites, and is shown in red in the chart below. For comparison, the summed medians and modal averages are shown in blue and orange respectively, with the difference between the mean and modal shown in green:
The two anomalies track well from 1900 to 1940 (yes, there is a slight step around 1925, but this corrects itself by 1940). However, from about 1940 onwards there is a divergence (shown in green) that grows until about 1990.
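The summing step and the divergence can be sketched as a running total. This is a minimal illustration under stated assumptions rather than the spreadsheet behind the chart: the function name and input layout are hypothetical, the station deltas are invented, and the divergence is taken here as summed mean minus summed median, leaving out the modal average because it needs far more data points.

```python
from itertools import accumulate
from statistics import mean, median

def cumulative_series(deltas_by_year):
    """deltas_by_year: dict mapping year -> list of station deltas (°C) for that year.
    Returns the years, the running sums of the annual mean and median deltas,
    and the difference between the two running sums."""
    years = sorted(deltas_by_year)
    cum_mean = list(accumulate(mean(deltas_by_year[y]) for y in years))
    cum_median = list(accumulate(median(deltas_by_year[y]) for y in years))
    divergence = [m - d for m, d in zip(cum_mean, cum_median)]
    return years, cum_mean, cum_median, divergence

# Invented deltas for three stations; in 1982 one station jumps while the others barely move.
example = {
    1981: [-0.2, 0.0, 0.2],
    1982: [-0.1, 0.1, 0.9],
    1983: [-0.3, -0.1, 0.1],
}

years, cum_mean, cum_median, divergence = cumulative_series(example)
for row in zip(years, cum_mean, cum_median, divergence):
    print(row)
```

Because the totals are cumulative, a single skewed year leaves a gap between the two series that persists in later years unless an opposite skew cancels it.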
What is interesting is that the skewed distribution is present in both cooling (1940 to 1970) and warming (1970 to 1990) phases, and is therefore not a manifestation of the warming per se.
If the divergence does represent change to a proportion of stations and/or their surroundings, then it would appear that the 1940 to 1970 cooling has in part been masked by such changes and was more severe than the mean suggests.
The decline since 1990 may be a manifestation of the loss of stations that happened around that time, and deserves further investigation.