The Shape of the Data
How many thermometers do you need to report accurately on climate change in a country the size of Canada? Remember, that is 3.5 million square miles – or 6.7% of the land area of the earth – spanning latitudes from 45N to 85N. If I said 500, would that sound about right? As long as they are representative of changes in the whole area and track the changing climate, perhaps we could manage with fewer – half that, perhaps? A third? Ten percent?
With over 600 individual temperature series, and more than 540 combined series with records of more than 20 years, the thermometer record in Canada peaked around 1975 (see map, left), but it has since been decimated by station dropout.
By 2009 there are fewer than 30 locations reporting temperature that are used by the Global Historical Climatology Network (GHCN) prepared by the U.S. National Climatic Data Center (NCDC); this data is also used as the input to NASA’s GIStemp program.
You can see the locations of the stations on the map (left) and the most obvious ‘hole’ is the lack of stations above latitude 60N. Yukon, Nunavut and Northwest Territories make up 39% of Canada, but between them have only four stations: Dawson and Whitehorse (Yukon), Eureka and Coral Harbour (Nunavut).
However, much of what is strange about Canada’s temperature record in GHCN is not immediately obvious. E.M. Smith’s marathon effort of examining records by country produced a “hair graph” for Canada using his dT method. Here is a simplified version of it:
This shows both the gradual increase in the number of temperature stations included in the GHCN record and the change in the data over time. Note how, when the number of thermometers in the station record falls off a cliff in 1990, there is a massive increase in the rate of change dT. It just suddenly takes off. It is also worth looking at E.M. Smith’s original graph (here) as he also plots the monthly data (the “hair”); in 1990 Canada gets a ‘haircut’ – suddenly the temperature record has less variability. This dT method is useful for looking at the data, although the data is not weighted in any way in this analysis.
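For readers who want the flavour of the calculation, here is my simplified reading of a dT-style measure – this is a sketch, not E.M. Smith’s actual code, and the station data in it is invented purely for illustration. The idea is that each reading is compared with the last valid reading for the same calendar month, so missing months don’t poison the average:

```python
# Sketch of a simplified dT calculation (my reading of the approach;
# all numbers below are made up for illustration).

def station_dT(monthly):
    """monthly: dict mapping (year, month) -> temperature, or None if missing.
    Returns dict mapping year -> mean dT across the months available that year,
    where each month's dT is the change since the last valid reading
    for that same calendar month."""
    last_seen = {}   # month -> last valid temperature seen for that month
    per_year = {}    # year -> list of same-month deltas
    for (year, month) in sorted(monthly):
        value = monthly[(year, month)]
        if value is None:
            continue
        if month in last_seen:
            per_year.setdefault(year, []).append(value - last_seen[month])
        last_seen[month] = value
    return {y: sum(d) / len(d) for y, d in per_year.items()}

# Two invented winters, each month warming by 0.5 deg year on year:
demo = {(1990, 1): -20.0, (1990, 2): -18.0,
        (1991, 1): -19.5, (1991, 2): -17.5}
print(station_dT(demo))   # {1991: 0.5}
```

Averaging these per-station dT values across all reporting stations gives a “rate of change” series whose variability naturally shrinks when stations drop out – hence the 1990 ‘haircut’.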
Now it’s not just one sudden loss of stations. There is a gradual rise in the station count, but some of the stations are very short-lived and drop out after 20-30 years of reporting (GIStemp has a cut-off of 20 years, so stations reporting for any period shorter than this will not be used in the final GIStemp output). You can see the effect of this in the graph below – a few stations are dropped each year, particularly in the 70s, then that sudden loss of almost 200 stations after 1989.
Note that the graph also reports whether the stations have a warming or cooling trend (having the data in a database [TEKtemp] is very useful). Deriving a trend can be a quick and dirty way of seeing ‘shape’ in the data, but a trend can be very sensitive to the start and end years used and can be misleading. For example, 1940-1970 is generally acknowledged as a period when global temperatures fell, and records ending within or shortly after this period may be disproportionately affected by it. On the other hand, it is worth noting that a large proportion of the records that dropped out in 1989/90 had an overall cooling trend. Their loss from the record leaves behind those with a warming trend. Is that what allows the dT to ‘take off’ after 1990? (It is worth reading what E.M. Smith says about “The Reveal” here.) So just how important are those stations that had a cooling trend up to 1989/90? Were they in any way representative of their locality at the time? Would they still be showing a continuing cooling trend now? Of this, more later.
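That sensitivity to start and end years is easy to demonstrate with made-up numbers. The series below is purely illustrative – a station that cools through 1940-1970 and warms afterwards – and the fitted trend flips sign depending on which window you choose:

```python
# Illustration (invented data) of how sensitive a simple linear trend
# is to the choice of start and end years.

def trend_per_decade(years, temps):
    """Ordinary least-squares slope, in degrees per decade."""
    n = len(years)
    my = sum(years) / n
    mt = sum(temps) / n
    num = sum((y - my) * (t - mt) for y, t in zip(years, temps))
    den = sum((y - my) ** 2 for y in years)
    return 10.0 * num / den

years = list(range(1940, 1991))
# Cooling at 0.02 deg/yr to 1970, then warming at 0.03 deg/yr (illustrative):
temps = [-0.02 * (y - 1940) if y <= 1970 else
         -0.6 + 0.03 * (y - 1970) for y in years]

print(round(trend_per_decade(years[:31], temps[:31]), 2))  # 1940-1970: -0.2
print(round(trend_per_decade(years[30:], temps[30:]), 2))  # 1970-1990:  0.3
print(round(trend_per_decade(years, temps), 2))            # full record: -0.02
```

A record that ends in 1975 would be labelled ‘cooling’; the same station continuing to 1990 would look near-flat; a post-1970 start would make it ‘warming’. That is why I treat these trend labels as a rough sorting tool only.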
This climate stuff is challenging me to come up with alternative ways of looking at the shape of the data. Scanning the datasets, it looked as if there were a lot of short-lived records for Canada, but how to show this visually? One of the graphs I came up with is this one, which plots the year when a station begins reporting against the length of time it reports in the record:
I find this quite revealing. The GIStemp use of the records in GHCN v2.mean starts in 1880, and many of the stations that commence in the period 1880-1900 are still active until 1989/90, but only four of these survive in the current set for 2009. Does this matter? On the other hand many stations report for a short period of less than 40 years. It is notable that a lot of short-lived stations start reporting between 1950 and 1970 only to drop out rapidly after 20-30 years. What value do these add to the record?
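The points behind that graph are simple to derive. The sketch below shows the idea with an invented data structure (each station mapped to its years of data – real GHCN v2.mean parsing is more involved, and the station records here are made up), including the GIStemp 20-year cut-off:

```python
# Sketch: derive (start year, record length) points per station,
# applying the GIStemp 20-year cut-off. Station data is invented.

stations = {
    "LongLived": list(range(1897, 1990)),  # starts 1897, drops out 1989/90
    "ShortA":    list(range(1955, 1978)),  # 23 years: just survives the cut-off
    "ShortB":    list(range(1962, 1975)),  # 13 years: excluded by GIStemp
}

points = []
for name, years in stations.items():
    start = years[0]
    length = years[-1] - years[0] + 1
    if length >= 20:   # GIStemp ignores records shorter than 20 years
        points.append((start, length))

print(sorted(points))   # [(1897, 93), (1955, 23)]
```

Plotting start year on one axis and length on the other makes the cluster of 1950-1970 starters with 20-30 year lifespans stand out immediately.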
I thought the proportion of short-lived stations did not seem normal and, compared with the rest of the world, it is not:
The dataset for Canada seems to have proportionally fewer long-lived stations and more short-lived ones than in the complete global GHCN dataset. This strikes me as odd. Why so many short-lived stations in Canada? I mean why bother including them if you only need a small number as representative of the whole country?
And what of the Adjustments?
We know there have to be adjustments. The purpose of these in GIStemp is for Urban Heat Island correction and increased homogeneity. This is how the GIStemp documentation explains it:
“The goal of the homogenization effort is to avoid any impact (warming or cooling) of the changing environment that some stations experienced by changing the long term trend of any non-rural station to match the long term trend of their rural neighbors,..”
Correction for any warming due to the growth of an urban area warms the older part of the record, rather than adjusting current temperatures. This reduces the slope of the graph and decreases the warming trend.
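A toy example makes this backdated style of correction concrete. All the numbers here are invented: a station showing an apparent 0.3 deg C/decade rise has a supposed 0.01 deg C/yr urban signal removed by warming the past, with zero adjustment pinned at the most recent year:

```python
# Toy example (invented numbers): correcting UHI by warming the older
# part of the record, leaving the most recent value unchanged.

def trend_per_decade(years, temps):
    """Ordinary least-squares slope, in degrees per decade."""
    n = len(years)
    my, mt = sum(years) / n, sum(temps) / n
    num = sum((y - my) * (t - mt) for y, t in zip(years, temps))
    return 10.0 * num / sum((y - my) ** 2 for y in years)

years = list(range(1950, 2000))
raw = [0.03 * (y - 1950) for y in years]   # apparent rise of 0.3 deg/decade

# Remove a supposed 0.01 deg/yr urban signal: the further back in time,
# the larger the upward adjustment; zero adjustment in the final year.
adjusted = [t + 0.01 * (years[-1] - y) for y, t in zip(years, raw)]

print(round(trend_per_decade(years, raw), 2))       # 0.3
print(round(trend_per_decade(years, adjusted), 2))  # 0.2
```

Current temperatures are untouched; the past is lifted, and the trend falls from 0.3 to 0.2 deg C/decade. An adjustment in the other direction – cooling the past – would steepen the trend instead, which matters for what follows.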
“If no such neighbors exist or the overlap of the rural combination and the non-rural record is less than 20 years, the station is completely dropped; if the rural records are shorter, part of the non-rural record is dropped.”
So, adjustment can cause truncation of the station data if the adjusting rural record is shorter, and this can affect trend (which I have been looking at). However, there is very little truncation in the Canadian record – lots of rural stations (<10,000 population) are present to adjust the urban records. Yet a lot of rural stations are adjusted as well, which was a surprise to me. What’s happening here – increasing the homogeneity? OK, a few examples (from TEKtemp). The table below lists the most extreme adjustments – either increasing or decreasing the trend of the station data (thumbnails of the graphs are below):
In just those eight stations there are five rural stations that get major adjustment, presumably because other rural stations in the local area tell a different story. Well, if you have two relatively close rural stations, one cooling and one warming, unless you examine them in great detail, how can you say which one is representative of the area? Hmm, I might come back to that.
Here is the overall shape of the adjustments (graph right). Note I have highlighted the adjustments that increase the warming trend in the data. There are a lot of very small adjustments – in fact 279 stations have either no adjustment or one that makes only a very minor difference to the slope (between -0.01 and +0.01 deg C / decade) – but there are 78 that have an adjustment of at least 0.05 deg C / decade (0.5 deg C / century). That is more than 10% of all the stations in the Canadian record, and remember we are looking at a GLOBAL increase of only slightly more than this over the last century. Now I know I’m not a climate scientist and perhaps I’m just being stupid, but I really can’t see how or why you can validly make adjustments to data that cause an increase in the warming trend of, say, 2 degrees C per century [comments please…].
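For what it’s worth, the counting behind those figures is just a binning of each station’s trend change (adjusted minus raw, in deg C/decade). The sketch below shows the classification I used – the delta values in it are invented, not the real TEKtemp numbers:

```python
# Sketch of the binning: classify each station's adjustment by how much
# it changes the trend, in deg C / decade. Deltas here are invented.

def classify(delta_per_decade):
    if abs(delta_per_decade) < 0.01:
        return "minor"      # no adjustment, or a negligible slope change
    if abs(delta_per_decade) >= 0.05:
        return "major"      # at least 0.5 deg C / century
    return "moderate"

deltas = [0.0, 0.004, -0.008, 0.02, 0.06, -0.07, 0.12]   # illustrative only
counts = {}
for d in deltas:
    counts[classify(d)] = counts.get(classify(d), 0) + 1
print(counts)   # {'minor': 3, 'moderate': 1, 'major': 3}
```

Keeping the sign of the delta as well (positive = warming the trend) is what lets the graph highlight the trend-increasing adjustments separately.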
The effects? Well, while Toronto gets an appropriate adjustment for urban growth, others, such as Prince Albert, Saskatchewan, do not. Other stations in Canada seem to suffer this ‘wrong way’ adjustment too, but some of these have already been corrected with the reworking of the data following GISS’ updated nightlights adjustment (more here).
In summary, it looks as if the shape of the climate data for Canada differs somewhat from many of the other areas I have looked at (but not yet written about). The ‘oddness’ includes: an overabundance of short-lived stations reporting into the dataset; lots of gaps and missing years (which I did not cover, yet); quite a lot of wrong-way adjustment; and a shifting base of stations with drop-out that reduces the current numbers to less than 5% of all stations. I have started to look at the data that is available from Environment Canada and this will be the focus of Part 2 (when I can get to it).