Temperature stations: how many have data adjusted?

A previous post looked at how the number of stations used in reporting climatic temperature data through the NOAA GHCN database has varied since 1880. Here, before starting to examine the effects of adjustment, I’m simply looking at how much of the data is adjusted. It’s a lot, actually. I’m not going to do too much discussion here; I really just want to let the graphs speak for themselves.

Figure 1. Number and breakdown by WMO Region of stations for which there is [A] raw data and [B] adjusted data in the NOAA/GHCN database.

Figure 1 shows how the total number of stations is divided by WMO region for both ‘raw’ and adjusted data over the period 1880 to 2009. It is probably easier to look at Figure 2 to see how much of the data is adjusted. On a percentage basis, more than 70% of the data is adjusted in some way up to 1990, when there is not only a massive drop-off in the number of stations, but also a drop in adjustment. After 2006, we are not only left with about 800 stations reporting temperature to the database from across the globe, but the percentage of those adjusted in some way falls to 20-30%. Why? Is it just ‘high quality’ stations that remain?
Figure 2. Graph showing the total number of stations in the NOAA/GHCN database and the percentage (black line) that are adjusted, by year.
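
For anyone wanting to reproduce the counts behind Figure 2, a minimal sketch (Python) might look like the following. It assumes the classic GHCN v2 fixed-width layout – an 11-character station ID, a duplicate digit, a 4-character year, then twelve 5-character monthly values – and the file names v2.mean / v2.mean.adj quoted from the NCDC documentation in the comments below; a station counts as ‘adjusted’ in a year if it appears in the adjusted file for that year.

```python
# Sketch: per-year station counts in GHCN v2 raw vs adjusted files.
from collections import defaultdict

def stations_by_year(path):
    """Map year -> set of 11-character station IDs reporting in that year."""
    years = defaultdict(set)
    with open(path) as f:
        for line in f:
            station, year = line[:11], int(line[12:16])
            years[year].add(station)
    return years

raw = stations_by_year("v2.mean")
adj = stations_by_year("v2.mean.adj")

for year in sorted(raw):
    n_raw = len(raw[year])
    n_adj = len(adj.get(year, set()) & raw[year])  # adjusted stations also in raw
    print(year, n_raw, n_adj, round(100.0 * n_adj / n_raw, 1))
```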

Looking at the adjustments by region is quite revealing too. In Figure 3 there is quite a bit of variability in the percentage adjustments by region. Antarctica starts with 100% adjustment because data from Base Orcadas (60S) is modified and pasted into the record before any stations south of 64S began reporting data in 1945. Data from Europe, Asia and the South-West Pacific (Australia, New Zealand, Indonesia, Malaysia etc.) are adjusted most, whilst African data is adjusted least.
Figure 3. Regional percentages of adjusted data: numbers of adjusted stations for each region
are expressed as a percentage of the total number of stations.
The data for individual regions are perhaps easier to see in Figure 4. What intrigues me is why the adjustment is so variable by region (Africa vs Asia); the reasons may come to light when we examine the adjustment of rural vs urban data. And then, what is up with 1990? Why that sudden drop-off? (I must go and look at some station metadata.)
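
It is straightforward to extend the sketch above to the regional split of Figure 3. The mapping below is an assumption – the first digit of the GHCN country code appears to group stations roughly by WMO region, but this is worth checking against the GHCN country code file:

```python
# Sketch: regional breakdown of the adjusted percentage for one year,
# keyed on the first digit of each station's GHCN country code.
REGION = {"1": "Africa", "2": "Asia", "3": "South America",
          "4": "North & Central America", "5": "South-West Pacific",
          "6": "Europe", "7": "Antarctica"}

def region_counts(years, year):
    """Count stations per region for one year."""
    counts = {}
    for sid in years.get(year, set()):
        name = REGION.get(sid[0], "other")
        counts[name] = counts.get(name, 0) + 1
    return counts

raw_counts = region_counts(raw, 1950)  # 'raw' and 'adj' from the sketch above
adj_counts = region_counts(adj, 1950)
for name in sorted(raw_counts):
    pct = 100.0 * adj_counts.get(name, 0) / raw_counts[name]
    print(f"{name}: {raw_counts[name]} stations, {pct:.0f}% adjusted")
```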

A guest post today at Die Klimazwiebel from Reinhard Böhm of the Central Institute for Meteorology and Geodynamics (ZAMG) in Austria discusses the need for adjustment (homogenization). He is concerned that access to unadjusted data can result in its ‘misuse’ by those who do not understand the inherent biases that require adjustment. What he says is important and I agree with much of his reasoning. He then says:

“I can advise everyone to use original data only for controlling the quality or the respective homogenization attempts but not for analysis itself if the goal is a timeframe of 20 years or more – a length usually necessary to gain statistically significance at the given high frequent variability of climate.”

Just one thing to point out here: there are adjustments and adjustments. The NOAA GHCN ‘raw’ data is already adjusted (for time of observation, station history etc.). There is then a further set of homogenisation done by either GHCN or GISS, and these adjustments have been the focus of our analysis. [Update 23rd Jan. After checking NCDC documentation here I can see I was wrong – v2.mean is ‘raw’ data.]
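
To see what that further homogenisation actually does to an individual record, a sketch along these lines pulls the same station from v2.mean and v2.mean.adj and differences them. The station ID here is a placeholder, and the values follow the GHCN convention of tenths of a degree C with -9999 for missing months:

```python
# Sketch: net adjustment applied to one station (adjusted minus raw).
def read_station(path, station_id):
    """year -> twelve monthly values in 0.1 degC (None where missing)."""
    series = {}
    with open(path) as f:
        for line in f:
            if line.startswith(station_id):  # duplicate records overwrite here
                year = int(line[12:16])
                vals = [int(line[16 + 5 * i:21 + 5 * i]) for i in range(12)]
                series[year] = [v if v != -9999 else None for v in vals]
    return series

raw_s = read_station("v2.mean", "10160355000")      # placeholder station ID
adj_s = read_station("v2.mean.adj", "10160355000")
for year in sorted(set(raw_s) & set(adj_s)):
    deltas = [a - r for r, a in zip(raw_s[year], adj_s[year])
              if r is not None and a is not None]
    if deltas:
        print(year, round(0.1 * sum(deltas) / len(deltas), 2), "degC mean delta")
```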

Adjustments are an integral part of temperature station data and climate analysis, but they should be necessary and appropriate. So far in our analysis we have found a lot of adjustment that seems to be neither, or at least it is not clear how some of the adjustments we see can be justified as either. However, our approach allows us to isolate sub-sections of data very rapidly for comparison and analysis. So far what we have found is interesting… (but you’ll still have to be patient, Andy).


5 Responses to Temperature stations: how many have data adjusted?

  1. TheSkyIsFalling says:

    I am fascinated to see where all this goes. Compliments to you for your work.

  2. Anonymous says:

    I can think of a possible statistical test. Adjustments and real variability should be statistically independent, shouldn’t they? That means the variance of the RAW data series should be the sum of the variances of the ADJUSTED (supposed to be “real”) data and the adjustments. If adjustments are made “in some direction”, that shouldn’t be true any more. I don’t know how to incorporate adjustments made by simply deleting data, but I’m not really an expert in statistics – do you have an idea?
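
    A toy sketch of the check being proposed, with synthetic data (Python/numpy assumed): if the adjustments really are independent of the ‘real’ series, the raw variance should come out as Var(adjusted) + Var(adjustments) and the covariance between the two should be near zero, whereas a clearly non-zero covariance on real station data would point to directional adjustment.

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    adjusted = rng.normal(10.0, 1.0, 120)  # stand-in for the 'real' series
    d = rng.normal(0.0, 0.3, 120)          # adjustments, independent by construction
    raw = adjusted - d                     # raw = adjusted minus adjustment

    print(raw.var(), adjusted.var() + d.var())  # nearly equal when independent
    print(np.cov(adjusted, d)[0, 1])            # near zero here; test on real data
    ```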

  3. kuhnkat says:

    Sorry to be so late, but I notice you changed your mind: v2 is RAW instead of adjusted. I followed the link you provided and it seems to describe how the data is homogenized and manipulated to clean it up and make it usable. Isn’t that considered adjustment?

    • Verity Jones says:

      The pdf says: “There are 3 raw temperature datasets (monthly maximum, monthly minimum, and monthly mean) and 3 adjusted datasets. Each of the 3 datasets has a corresponding station inventory file. There is also one country code file containing a 3 digit id for each country in this dataset.
      The three raw data files are:
      v2.mean; v2.max; v2.min
      The versions of these data sets that have data which were adjusted to account for various non-climatic inhomogeneities are:
      v2.mean.adj; v2.max.adj; v2.min.adj”

      I always thought that the v2.mean ‘raw’ data was still adjusted for time of observation and station moves, but this paper doesn’t say so. The homogenisation (adjustment for homogeneity with ‘nearby’ stations) kicks in in the adjusted data set.
