[I was tempted to call this “Post Normal Distributions” for reasons that will become apparent, however it wasn’t entirely appropriate. ]
Guest post by Charles Duncan
Anthony Watts’ excellent www.surfacestations.org website raises questions over the extent to which station changes (such as the addition of nearby pavement and air-conditioning units) affect the US, and by implication, global, temperature datasets.
I don’t pretend to understand the intricacies of the homogenisation processes used by the big players, but for some time I’ve had doubts over how well they compensate for these station changes, and have wanted to identify potential bias myself.
I’m not up to speed with statistical packages such as R, and am therefore constrained by the limitations of Excel; having said that, the newer versions can handle the whole GHCNv3 dataset in one bite. For each station I can readily calculate the annual delta T (‘anomaly’) by comparing its annual (December to November) average temperature to the previous year.
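The same calculation is easy to script outside Excel. Here is a minimal Python sketch of the annual delta T described above; the input layout (a dict of station monthly means) is invented for illustration and is not the actual GHCNv3 record format:

```python
# Sketch of the year-on-year "anomaly" described above: for each station,
# average the December-November year, then difference consecutive years.
# The input layout is hypothetical, not the real GHCNv3 format.

def climate_year_means(monthly):
    """monthly: dict mapping (year, month) -> temperature (deg C).
    Returns dict mapping climate-year -> mean of Dec (prev year) to Nov."""
    means = {}
    years = sorted({y for (y, m) in monthly})
    for y in years:
        # December of the previous year plus January-November of this year
        months = [(y - 1, 12)] + [(y, m) for m in range(1, 12)]
        temps = [monthly[k] for k in months if k in monthly]
        if len(temps) == 12:          # require a complete climate year
            means[y] = sum(temps) / 12.0
    return means

def annual_deltas(monthly):
    """Delta T for each year relative to the previous year."""
    means = climate_year_means(monthly)
    return {y: means[y] - means[y - 1]
            for y in means if (y - 1) in means}

# Tiny worked example: a station warming steadily by 0.5 degC per year
monthly = {(y, m): 10.0 + 0.5 * (y - 2000)
           for y in range(1999, 2003) for m in range(1, 13)}
print(annual_deltas(monthly))
```

Note the first possible climate year is dropped because its previous December is missing, just as a real December-to-November year needs data from two calendar years.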
One way of looking for sources of potential bias is to look at the distribution of the anomalies of all stations in each year, which I initially assumed would be ‘normal’:
However some meteorological effects, such as the blocking event the UK experienced in November, are regional. For the most part these anomalies balance out; if one part of the globe is cold, another is hot. In November it was southern Greenland and north-eastern Canada that were hot.
If one had a uniform coverage of weather stations around the globe, the distribution for 2010 might then look something like this:
The stations that are unaffected are represented by the green line, with cold Europe shown in blue and hot Greenland and NE Canada in red. The resulting distribution is shown by the black line. Whilst not ‘normal’, the distribution is still symmetrical.
Unfortunately, stations are not uniformly spaced around the globe, with many more in Europe than Greenland and NE Canada. This introduces a potential error in any one year, but averaged over the life of the relevant cycle the asymmetry should balance out.
An easy way to quantify asymmetry is to compare the different types of average. Statisticians use three types of average: mean, median and mode. In a ‘normal’ distribution (indeed any symmetrical distribution) all three are the same.
- The mean is the most widely used average; in a school test, the mean result would be the total of all the marks awarded divided by the number of students.
- The median is the ‘one in the middle’. In a class of 21 pupils it would be the mark that the 11th pupil got; he has ten pupils better and ten pupils worse than him.
- The modal average is the most commonly occurring number – the peak of the distribution curve. If you were to make clothing for sale it would be the size that would fit the most people.
With fewer than 2,000 stations each year it is difficult to see the modal average; there just aren’t enough data points to get a sensible result. There are, of course, enough to calculate the mean and median. The median is less affected by a subset of the data shifting: in the class analogy, the 11th pupil in a class of 21 will still be 11th even if everyone brighter than him gets 100%. Only pupils moving from below him to above him (or vice versa) affect his position in class.
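The class analogy is easy to check directly. This short sketch (the marks are invented) shows the median staying put when everyone above the middle pupil is pushed to 100%, while the mean jumps:

```python
import statistics

# Invented marks for a class of 21 pupils, sorted for clarity
marks = [35, 40, 42, 45, 48, 50, 52, 55, 58, 60, 62,
         65, 68, 70, 72, 75, 78, 80, 85, 90, 95]

print(statistics.mean(marks), statistics.median(marks))   # median is 62

# Push everyone above the median pupil up to 100%:
# the median (the 11th pupil's mark) is unchanged, but the mean shifts.
inflated = [m if m <= 62 else 100 for m in marks]
print(statistics.mean(inflated), statistics.median(inflated))
```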
The anomaly distribution for 1917 is almost perfectly ‘normal’. To help show asymmetries I have also plotted normal distributions with the same standard deviation as the annual data, but centred on the mean, median and modal values:
The mean anomaly for 1917 was -0.623°C, and the median was -0.658°C. No particular significance should be placed on the fact that these happen to be negative numbers – the mean anomaly for 1910, for example, was 0.0304°C, and the median was 0.0300°C – the point to note is that there is only a very small difference between the mean and the median.
In contrast, in 1983, the mean was 0.61°C and the median 0.4°C, a difference of 0.21°C:
The 1983 example looks quite like the chart below, made up of two normal distributions in the ratio 3:1 and 1.5°C apart; here the mean is 0.31°C and the median is 0.21°C:
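The two-population picture is easy to simulate. Assuming two normal populations in a 3:1 ratio, 1.5°C apart, with a spread of about 0.7°C (the standard deviation is my guess, chosen to roughly match the chart; the post does not state it), the mean lands well to the right of the median:

```python
import numpy as np

rng = np.random.default_rng(42)
sigma = 0.7          # assumed spread; chosen to roughly match the chart
n = 400_000

# 3:1 mixture, 1.5 degC apart, centred so the combined mean is ~0.31 degC
bulk   = rng.normal(-0.065, sigma, 3 * n // 4)   # unaffected stations
warmed = rng.normal( 1.435, sigma, n // 4)       # warmed subset (+1.5)
mix = np.concatenate([bulk, warmed])

print(f"mean   = {mix.mean():.2f}")      # close to 0.31
print(f"median = {np.median(mix):.2f}")  # close to 0.21, below the mean
```

The median sits nearer the bulk of unaffected stations, which is exactly why a warming subset drags the mean away from the median.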
A chart this shape could be explained by a subset of the stations having warming over time, as would be the case with a change in the station’s environment.
What is surprising, however, is that this shape suggests it is just a subset that is warming: the shape of the left-hand side seems to be preserved. If the planet as a whole were warming, would we not expect either the whole curve to shift to the right, or the mean to shift to the right and the whole curve to flatten?
Summing the mean deltas (anomalies) for each year gives us the familiar-shaped anomaly graph reported by the major climate websites, shown in red in the chart below. For comparison, the summed medians and modal averages are shown in blue and orange respectively, with the difference between the mean and modal shown in green:
The two anomalies track well from 1900 to 1940 (yes, there is a slight step around 1925, but this corrects itself by 1940). However from about 1940 onwards there is a divergence (shown in green) that grows until about 1990.
What is interesting is that the skewed distribution is present in both cooling (1940 to 1970) and warming (1970 to 1990) phases, and is therefore not a manifestation of the warming per se.
If the divergence does represent change to a proportion of stations and/or their surroundings, then it would appear that the 1940 to 1970 cooling has in part been masked by such changes and was more severe than the mean suggests.
The decline since 1990 may be a manifestation of the loss of stations that happened around that time, and deserves further investigation.
Charles, that was a very nice article, thank you.
I have great difficulty with the notion of a ‘global’ temperature, I don’t think it is possible to have such a thing unless you have sufficient stations in the first place and these same ones are used all the time in calculating anomalies.
I think it is also problematic to believe we can parse historic temperatures to hundredths of a degree. Before the introduction of digital weather stations in the late 1980s, temperatures taken at individual stations were often highly flawed due to instrument inaccuracy or inadequate observer methodology. Putting them through a computer in the manner that GISS or CRU do doesn’t make them any more accurate.
As can be seen with the surface station project, even modern stations can be highly suspect. My best guess is that the world has been generally warming since 1607 but there are numerous counter cyclical cooling trends such as those Verity and I observed in the article ‘in search of cooling trends.’
I think your observation that the cooling from 1940 to 1970 has been ‘masked’ is probably spot on. Hubert Lamb was convinced of a coming ice age and often made references to the cooling period you identify as being the proof of it.
This is a very nice demonstration of the corruptive influence of UHI-affected records. That’s why vetting of intact station records, rather than the indiscriminate manufacture of anomaly sausages, is indispensable for reliable scientific estimates of the course of global temperatures.
Another explanation for the drift in the ‘mean minus median’ in your last figure might be the expansion (then contraction) of the number of stations in the network. You hint that the recent decline may be due to this; maybe the increase up to the mid-80s is also explained by it. It could be that the absolute number of thermometers that came online was greater in the warmer parts of the world than in the cooler parts. It certainly looks like the polar regions are under-represented even now. It would also make sense that the growth in thermometer numbers has been concentrated towards the warmer, tropical and sub-tropical regions.
Maybe some analysis of station number by latitudinal bands might throw light on that.
Isn’t this one of the reasons that gridding of the temperature record is performed? To try to limit the effect of changing thermometer placement.
It works the opposite way, at least in the way GISS does gridding. First the grid mean anomaly is determined by what they think the anomaly would be at the exact center of the grid. To determine that they use every station within 1200 km of that center point, and the closer a station is to the center, the more “weight” it carries in the calculation.
Now here is the problem with gridding, and something its proponents overlook: the fewer stations within 1200 km of that center point you start with, the bigger the effect losing stations has on the mean. To put it into perspective, if you have a grid box near where I live on the East Coast of the US, centered near Washington DC, you will find well over 20 stations (closer to 50 in real life) within 1200 km of it. So if you lose 3 stations in that grid in the last 4 years, that will not affect the mean much. However, if you only ever had 4 stations within 1200 km of that center point and you lose 3 of them over the last 4 years, the lone remaining station’s annual and monthly means become, by default, the grid means.

Now to complicate it even more, let’s say that lone remaining station is on the coast of the Atlantic Ocean, with the Gulf Stream just about 90 miles offshore. Guess what that does to temperatures year round? It moderates them: cooler in the summer and warmer in the winter than the stations that used to be further inland. So in this theoretical case you will artificially flatten your trend after losing the other inland stations, thus introducing a bias.
The same thing happens if you move stations, change the environment around them or even change instruments. The more stations that make up the grid mean historically, and don’t change, the more they dampen the effect of any changes to a few stations. When very few make up the grid mean, the changes have a much greater effect.
This doesn’t even get into GISS making up grid means where they have no data because there are no stations in that grid in the first place (think of the Arctic). Or that instrument changes in the past were often undocumented or poorly documented: the records state that the max or min gauge was replaced, but not the type or model of the replacement, so you have no clue what the error range of the actual gauge is.
If you go and look at scanned copies of the old (circa 1900) US COOP station paper records you will find that they are all in either whole degrees or at most tenths of a degree. You never see them in hundredths of a degree because the instruments back then were not that accurate. Liquid-in-Glass (LIG) gauges today are for the most part only rated to half-a-degree accuracy; Platinum Resistance Thermometers (PRTs) are accurate to hundredths of a degree and are what you find in automated systems. So once you start swapping those around you can get all kinds of changes in trend.
“Another explanation for the drift in the ‘mean minus median’ in your last figure might be due to the expansion (then contraction) of the number of stations in the network.”
This seems logical, however, the whole thing about using anomalies is that they are SUPPOSED to be independent of the actual temperature of the station. I do think it is possible there is some effect of this and your suggestion of looking at latitude bands is a good one.
I too think station moves (and how they are handled), and homogenisation have a lot to answer for.
Well, you suggested latitudinal bands. Since the Arctic and Antarctic show the largest anomalies, we thought that might be a place to start and, whaddya know, it shows something interesting – well, I think so anyway. We’ve a bit more digging to do; then it’ll either turn out to be a ‘hmm’ and I’ll post the graph here, or it’ll turn into a post in its own right.
Like HR I thought that the whole idea of “gridding” was to give each square the same weight regardless of how many thermometers may lurk within its boundaries.
Then I read boballab’s comment and confusion takes over.
Here are a couple of problems. Firstly, most of the earth’s surface is covered by water with almost no thermometers until recently. How can you grid data over long periods of time when there is little to be had?
Likewise, in high latitudes there are very few thermometers so we rely on James Hansen’s assurance that it does not matter because anomalies don’t vary much over distances of 1,000 km or more. If that is so why don’t we retire 90% of the army of thermometers in the low latitudes? Maybe when it comes to surface stations we should prefer quality over quantity.
Given the unsatisfactory nature of surface station records it seems inevitable that the future belongs to satellite measurements.
You are not the only one who gets fooled by that point about gridding and weights. Anyone can see how changing the number of stations that make up the grid anomaly changes the value of the anomaly, just by looking at the data used in the GISTEMP Mapmaker program. In it you have a choice of 1200 km or 250 km “smoothing”; basically that sets the radius from the grid center a station needs to be within to be included in the calculation. It also mimics station loss: when you decrease the radius from 1200 km to 250 km you have, for mathematical purposes, just lost all the stations between 250 km and 1200 km.
Here is an example taken from GISTEMP:
We will use the grid cell centered on 83N by 63W or, as the data from GISS puts it, -63 by 83. For some reason GISS goes against normal convention and gives the longitude first, followed by the latitude, in the gridded data. This cell contains only a single station in the entire history of the GISS record: Alert, Canada. Every other station is over 250 km away, with the closest being Eureka:
YEAR   1200 km radius   250 km radius   (anomalies, °C; x = no data)
1947 3.0455 x
1948 1.5131 x
1949 -0.3284 x
1950 -0.512 x
1951 -0.1485 -0.2284
1952 0.1464 -0.1325
1953 0.8444 0.8591
1954 0.9805 1.0925
1955 0.3627 0.2383
1956 -0.0972 -0.2492
1957 0.1697 0.1716
1958 0.664 0.7425
1959 0.4527 0.4633
Now notice that at the 250 km radius the data goes back only to 1951, but at the 1200 km radius it goes back to 1947. The reason is that there is one station within 1200 km of the grid center that dates back to 1947, and that station’s anomaly became, by default, the anomaly for that grid. Basically, in 1947 that station was given 100% weight; from 1951 it was given very little weight, since the Alert station within 250 km of the grid center had opened. That changing of weight is an artificial bias that warps the grid’s trend. Now imagine how many other grids have similar weighting problems.
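The Alert situation can be mimicked with a toy weighting calculation. Assume (hypothetically) one far station reporting from 1947 and a near station opening in 1951, with linear distance weights normalised to sum to 1; the far station carries 100% of the weight until the near one opens, then almost none:

```python
# Toy illustration of the Alert example: a single far station carries 100%
# of the grid weight until a near station opens, then the weights flip.
# Distances are invented for illustration; this is not GISS code.

def weights(distances, radius_km=1200.0):
    """Linear 1 - d/r weights, normalised to sum to 1."""
    raw = [max(0.0, 1.0 - d / radius_km) for d in distances]
    total = sum(raw)
    return [w / total for w in raw]

# 1947: only the far station (say 1000 km out) reports -> weight 1.0
print(weights([1000.0]))

# 1951 onward: a station opens 50 km from the center and dominates
print(weights([1000.0, 50.0]))
```

The step change in weights, not any change in climate, is what alters the grid series at the join.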
A couple of years ago I thought I understood how GISS temperatures were worked out. I then realised it was far more complex and confusing than I first thought. To this day I have no clear idea as to how data is collected and ends up as a definitive figure and it is therefore very difficult to comment on it.
So for those that do understand how the data is collected, what the rest of us need is a clear route map of how you get from A to Z-in other words how is ‘raw’ data collected and turned into the figures that would be pored over by the worlds climate scientists?
I agree with galloping camel; gridding is just another opportunity to introduce errors. In an earlier post https://diggingintheclay.wordpress.com/2011/01/04/which-was-the-warmest-decade/ I estimated that only about 20% of 5° lat x 5° long cells have any stations; it’s like trying to make sense of a jigsaw with only one quarter of the pieces…
Satellites aren’t perfect, but I believe them a whole lot more than surface stations.
Pingback: No Average Year | Digging in the Clay
Pingback: Week in Review: April 2, 2011 | Climate Etc.