For anyone stumbling on this new…the Harry_read_me text file is 300-odd pages of commentary from a programmer at the Hadley CRU at the center of the Climategate leaks. This is one of the documents to make sense of in understanding the ‘issues’ at Hadley.
Having read this off and on in the last few days, I have taken a different approach while travelling today and used search terms as many have done with the emails. The term ‘synthetic’ threw up some intriguing hits.
Now there may be a perfectly innocent explanation for this (you see Prof Jones et al – this is why we like transparency). I have only the vaguest of gleaned knowledge on programming and I expect to be proved wrong on this, but here is what I found and what questions it has brought up in my mind (my ‘take’ on this is at the bottom).
“There is another problem: the values (vj – temperatures?) are anomalies, wheras the ‘public’ .grim files are actual values. So Tim’s explanations (in _READ_ME.txt) are incorrect..”
“Had a hunt and found an identically-named temperature database file which did include normals lines at the start of every station. How handy – naming two different files with exactly the same name and relying on their location to differentiate! Aaarrgghh!!”
vj – up to 16 then seems to deal with getting the temperature processing to run (anomdtb), which it does with a little modification.
vj – problems start with precipitation
vj – searched file using search function for ‘synthetic’
vj – point 31. details problems with synchronisation of temperature databases.
vj – Point 34. repeated in 35. – The program ‘says’:
I will create a set of absolute grids from a set of anomaly grids (in .glo format)….
“Lots of screen output, and lots of files. A set of synthetic grids in ‘syngrid_frs/’ as requested,… Having read the program it looks as though the latter files are absolutes, whereas the former are anomalies.”
vj – next a lot of references to ‘synthetic cloud data’
vj – I think this next section is describing that ‘synthetics’ make up the initial early years and are then gradually infiltrated and superseded by the increasing quantities of ‘real’ data as the years progress:
“Now it gets tough. The current model for a secondary is that it is derived from one or more primaries, plus their normals, plus the normals for the secondary.”
“The IDL secondary generators do not allow ‘genuine‘ secondary data to be incorporated. This would have been ideal, as the gradual increase in observations would have gradually taken precedence over the primary-derived synthetics.”
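To picture what ‘taking precedence’ could mean, here is a minimal sketch in Python. It is my own illustration, not CRU code: the function name, the use of `None` for a cell with no observation, and all the numbers are assumptions. It simply uses the observed value wherever one exists and falls back on the synthetic value otherwise, so the observed share grows as coverage improves.

```python
def merge_observed_and_synthetic(observed, synthetic):
    """Prefer the observed value in each cell; fall back on the synthetic one.

    observed  -- list of floats, with None where no observation exists (assumed convention)
    synthetic -- list of floats, one per cell, same length as observed
    Returns the merged values and the fraction of cells that came from observations.
    """
    if len(observed) != len(synthetic) or not observed:
        raise ValueError("need two non-empty lists of equal length")
    merged = []
    n_observed = 0
    for obs, syn in zip(observed, synthetic):
        if obs is not None:
            merged.append(obs)
            n_observed += 1
        else:
            merged.append(syn)
    return merged, n_observed / len(merged)


# Hypothetical values for four cells in an early year and a later year.
synthetic = [0.2, -0.1, 0.4, 0.0]
early_year = [None, None, 0.5, None]   # one cell observed
later_year = [0.3, None, 0.5, -0.2]    # three cells observed

merged_early, share_early = merge_observed_and_synthetic(early_year, synthetic)
merged_later, share_later = merge_observed_and_synthetic(later_year, synthetic)
print(share_early, share_later)
```

In this toy case the observed share rises from a quarter of the cells to three quarters, which is the ‘gradual’ takeover the quote seems to be wishing for.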
“Not particularly good – the bulk of the data being recent, less than half had valid normals (anomdtb calculates normals on the fly, on a per-month basis). However, this isn’t so much of a problem as the plan is to screen it for valid station contributions anyway.”
“Then.. wait a minute! I checked back, and sure enough, quick_interp_tdm.pro DOES allow both synthetic and ‘real’ data to be included in the gridding.”
WELCOME TO THE DATABASE UPDATER
“So we will try the unaltered rd0 process on vap. It should be the same; a mix of synthetic and observed.”
“Though I already know what I’m going to find, don’t I? Because glo2abs isn’t going to do anything unusual, it just adds the normal and there you go. So if the absolutes are very similar, the anomalies will be, too.. hmm. Well, I *suppose* I could try producing two more copies of the output files – one with just synthetic data and one with just observed data? It’s only a couple of re-runs of the quick_interp_tdm2.pro IDL routine..”
vj – two runs produced and compared:
“Oh, ***. What is going on? Are we data sparse and just looking at the climatology? How can a synthetic dataset derived from tmp and dtr produce the same statistics as an ‘real’ dataset derived from observations?
Let’s be logical. Here are the two ‘separated’ gridding runs:”
“Those anomalies are mighty tiny, given that the absolutes are three-digit integers! Hardly surprising they’re not really appearing on the radar when added to normals typically two orders of magnitude higher! Even with the *10 in the glo2abs prog, we’re still looking at values around 0.06.”
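The arithmetic behind that remark can be sketched in a few lines of Python. This is an illustration under assumptions, not the glo2abs program itself: I am assuming the step amounts to normal + 10 × anomaly (the quotes mention only ‘adds the normal’ and ‘the *10’), and every number below is invented to match the magnitudes described.

```python
def anomalies_to_absolutes(anomalies, normals, scale=10.0):
    """Add scaled anomalies to normals, cell by cell (assumed form of the step)."""
    if len(anomalies) != len(normals):
        raise ValueError("anomalies and normals must have the same length")
    return [normal + scale * anom for anom, normal in zip(anomalies, normals)]


def max_relative_difference(xs, ys):
    """Largest |x - y| / |y| over paired values (ys must be non-zero)."""
    return max(abs(x - y) / abs(y) for x, y in zip(xs, ys))


# Hypothetical three-digit normals and two unrelated sets of tiny anomalies.
normals = [600.0, 720.0, 850.0]
run_a = [0.006, -0.004, 0.005]
run_b = [-0.005, 0.006, -0.003]

abs_a = anomalies_to_absolutes(run_a, normals)
abs_b = anomalies_to_absolutes(run_b, normals)

# The anomalies disagree in sign, yet the absolutes differ by well under 0.1%.
print(max_relative_difference(abs_a, abs_b))
```

With normals this large, two quite different anomaly fields give absolutes that look almost identical, so matching statistics on the absolutes say little about whether the anomalies agree.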
“Looked at the observed anomalies (output from anomdtb.f90) – here the anomalies are larger! Between -5 and +5, roughly, which is what I’m used to seeing in .txt files.” (vj – I guess this would be -0.5 to +0.5C)
“I’m not actually convinced that the ‘country box’ approach is much cop. Better to examine each land cell and automagically mark any with excessions? Say 5 SD to begin with. Could then be extra clever and pull the relevant stations and find the source of the excession?”
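The per-cell check suggested there might look something like the Python sketch below. It is only a guess at the idea: the function name, the dictionaries keyed by (row, column), and the sample values are all hypothetical, and I am assuming an ‘excession’ means a new value more than 5 standard deviations from that cell’s own history.

```python
from statistics import mean, stdev


def find_excessions(history_by_cell, latest_by_cell, n_sd=5.0):
    """Return the cells whose latest value lies more than n_sd SDs from their history."""
    flagged = []
    for cell, history in history_by_cell.items():
        if len(history) < 2:
            continue  # a sample SD needs at least two values
        mu = mean(history)
        sd = stdev(history)
        if sd == 0:
            continue  # a flat history gives no scale to judge against
        if abs(latest_by_cell[cell] - mu) > n_sd * sd:
            flagged.append(cell)
    return flagged


# Hypothetical land cells keyed by (row, column), with invented values.
history_by_cell = {
    (10, 20): [1.0, 1.2, 0.9, 1.1, 1.0],
    (10, 21): [2.0, 2.1, 1.9, 2.0, 2.0],
}
latest_by_cell = {(10, 20): 1.1, (10, 21): 9.0}

flagged = find_excessions(history_by_cell, latest_by_cell)
print(flagged)
```

Once a cell is flagged, the ‘extra clever’ step would be to look up which stations contribute to it, which this sketch does not attempt.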
“Data sources – the observed/synthetic split for secondary parameters.”
I kinda got fed up after this. My thoughts are:
‘Harry’, it seems, got the temperature side to work relatively quickly, but ran into problems with ‘cloud’ and other bits, because this is not just a temperature model but a climate model, to which cloud/sun and precipitation have been added. A surprise to ‘Harry’ (and to me) was that the programme produced anomalies first (+ or – deviations from a mean) rather than actual values, and it seemed to do this using ‘synthetic cloud data’ to which the actual data was added afterwards. This is not unusual for a model. But this set me thinking – what if each part of the model was derived like this?
I’m more familiar with NASA’s GISTEMP model, which seems to start with actual temperatures and modify them by homogenisation at both the individual and gridbox level. At the end the Hadley ocean data is blended in to give the familiar warming maps and warming graphs.
Starting with synthetic data makes sense when you have peripatetic climate stations and regions with sparse data for which coverage improves over the years. Yet such a model is both very open to accidental bias and easier to ‘fit’ to ‘conformity’ should the need arise. You’ll note I choose my words carefully. (That is aside from any technical problems in actually splicing in new stations or data updates from stations – don’t I know about trying to work out station locations – but more on that eventually.)
I was also sent a link (by Tony Brown) to this blog and this file which I initially puzzled over, but I now think I have an insight into what the graphs say – and it is more than just regional temperature….
I’ll try to get to that real soon. Happy Thanksgiving Folks.