A previous post looked at how the number of stations used in reporting climatic temperature data through the NOAA GHCN database has varied since 1880. Here, before starting to examine the effects of adjustment, I’m simply looking at how much of the data is adjusted. It’s a lot, actually. I’m not going to do much discussion here; I just want to let the graphs speak for themselves.
[Figures: percentage of stations that are adjusted (black line), by year; adjusted stations are expressed as a percentage of the total number of stations.]
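For anyone wanting to reproduce this kind of count, here is a minimal sketch of the idea in Python (not the code actually used for the plots). It assumes the usual GHCN v2 fixed-width layout (an 11-character station id plus a one-character duplicate flag, a 4-digit year, then twelve 5-character monthly values), and it treats a station as “adjusted” in a given year simply if it also appears in the adjusted file for that year; the file paths are placeholders and the column slices may need tweaking for your copy of the files.

# Rough sketch: for each year, count how many stations report in the raw
# file (v2.mean) and how many of those also appear in the adjusted file
# (v2.mean.adj), then express the latter as a percentage.
# Assumed layout: 11-char station id, 1-char duplicate flag, 4-char year,
# then twelve 5-char monthly values (tenths of deg C, -9999 = missing).
from collections import defaultdict

def stations_by_year(path):
    """Return {year: set of station ids} from a GHCN v2-style file."""
    by_year = defaultdict(set)
    with open(path) as f:
        for line in f:
            station = line[0:11]      # station id (duplicate flag dropped)
            year = int(line[12:16])
            by_year[year].add(station)
    return by_year

raw = stations_by_year("v2.mean")       # placeholder path
adj = stations_by_year("v2.mean.adj")   # placeholder path

for year in sorted(raw):
    n_raw = len(raw[year])
    n_adj = len(raw[year] & adj.get(year, set()))
    pct = 100.0 * n_adj / n_raw if n_raw else 0.0
    print(year, n_raw, n_adj, round(pct, 1))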
A guest post today at Die Klimazwiebel from Reinhard Böhm of the Central Institute for Meteorology and Geodynamics (ZAMG) in Austria discusses the need for adjustment (homogenization). He is concerned that access to unadjusted data can result in its ‘misuse’ by those who do not understand the inherent biases that require adjustment. What he says is important, and I agree with much of his reasoning. He then says:
“I can advise everyone to use original data only for controlling the quality or the respective homogenization attempts but not for analysis itself if the goal is a timeframe of 20 years or more – a length usually necessary to gain statistically significance at the given high frequent variability of climate.”
Just one thing to point out here: there are adjustments and adjustments. The NOAA GHCN ‘Raw’ data is already adjusted (for time of observation, station history, etc.). There is then a further set of homogenisation done by either GHCN or GISS, and these adjustments have been the focus of our analysis. [Update 23rd Jan: After checking the NCDC documentation here, I can see I was wrong – v2.mean is ‘raw’ data.]
Adjustments are an integral part of temperature station data and climate analysis, but they should be necessary and appropriate. So far in our analysis we have found a lot of adjustment that seems to be neither, or at least it is not clear how some of the adjustments we see can be justified as either. However, our approach allows us to isolate sub-sections of data very rapidly for comparison and analysis. So far what we have found is interesting… (but you’ll still have to be patient, Andy).
I am fascinated to see where all this goes. Compliments to you for your work.
I can think of a possible statistical test. Adjustments and real variability should be statistically independent, shouldn’t they? That would mean the variance of the RAW data series should be the sum of the variances of the ADJUSTED (supposed to be “real”) data and of the adjustments. If adjustments are made “in some direction”, that shouldn’t be true anymore. I don’t know how to incorporate adjustments made by simply suppressing data, but I’m not really an expert in statistics. Do you have an idea?
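One rough way to illustrate that check: since raw = adjusted + adjustment, the identity Var(raw) = Var(adjusted) + Var(adjustment) holds exactly when the adjustments are uncorrelated with the adjusted series, so the covariance term measures any “direction” in the adjustments. The sketch below is only an illustration with made-up numbers; the arrays stand in for matched raw/adjusted records, and records suppressed by the adjustment would have to be dropped from both series before comparing.

# Rough illustration: if raw = adjusted + adjustment, then
# Var(raw) = Var(adjusted) + Var(adjustment) + 2*Cov(adjusted, adjustment).
# If adjustments were independent of the real variability, the covariance
# term should be near zero. All numbers below are made up.
import numpy as np

def variance_check(raw, adjusted):
    """raw, adjusted: paired 1-D arrays (records deleted by adjustment removed)."""
    raw = np.asarray(raw, dtype=float)
    adjusted = np.asarray(adjusted, dtype=float)
    adjustment = raw - adjusted
    cov = np.cov(adjusted, adjustment, ddof=0)[0, 1]
    return {
        "var(raw)": raw.var(),
        "var(adjusted) + var(adjustment)": adjusted.var() + adjustment.var(),
        "2*cov(adjusted, adjustment)": 2.0 * cov,
    }

rng = np.random.default_rng(0)
adjusted = rng.normal(size=1000)               # stand-in for the 'real' series
adjustment = rng.normal(scale=0.3, size=1000)  # adjustments independent of it
print(variance_check(adjusted + adjustment, adjusted))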
Sorry to be so late, but I notice you changed your mind: that v2 is RAW instead of adjusted. I followed the link you provided, and it seems to describe how the data is homogenized and manipulated to clean it up and make it usable. Isn’t that considered adjustment?
The pdf says: “There are 3 raw temperature datasets (monthly maximum, monthly minimum, and monthly mean) and 3 adjusted datasets. Each of the 3 datasets has a corresponding station inventory file. There is also one country code file containing a 3 digit id for each country in this dataset.
The three raw data files are:
v2.mean; v2.max; v2.min
The versions of these data sets that have data which were adjusted to account for various non-climatic inhomogeneities are:
v2.mean.adj; v2.max.adj; v2.min.adj”
I always thought that the v2.mean ‘raw’ data was still adjusted for time of observation and station moves, but this paper doesn’t say so. The homogenisation (adjustment for homogeneity with ‘nearby’ stations) only comes in with the adjusted data set.
Thank you, apparently I didn’t read it all!! 8>)