Another new manuscript of ours is out for review, this time on ESDD. The topic is as the title suggests. This work grew out of our trips to Hamburg and later Stockholm, though it wasn’t really the original purpose of our collaboration. However, we were already working with a simple climate model and the 20th century temperature record when the Cox et al paper appeared (previous blogs here, here, here), so it seemed like an interesting and relevant diversion. Although the Cox et al paper was concerned with emergent constraints, this new manuscript doesn’t really have any connection to this one I blogged earlier, though it is partly for the reasons explained in that post that I have presented the plots with S on the x-axis.

A fundamental point about emergent constraints, which I believe is basically agreed upon by everyone, is that it’s not enough to demonstrate a correlation between something you can measure and something you want to predict; you also have to present a reasonable argument for why you expect this relationship to exist. With something like 10^6 variables to choose from in your GCM output (and an unlimited range of functions and combinations thereof), it is inevitable that correlations will exist, even in totally random data. So we can only reasonably claim that a relationship has predictive value if it has a theoretical foundation.

The use of variability (we are talking about the year-to-year variation in global mean temperature here, after any trend has been removed) to predict sensitivity has a rather chequered history. Steve Schwartz tried and failed to do this. Perhaps the clearest demonstration of this failure was that the relationship he postulated for the climate system (founded on a very simple energy balance argument) did not work for the climate models. Cox et al sidestepped this pitfall by the simple technique of presenting a relationship which had been derived directly from the ensemble of CMIP models, so by construction it worked for these. They also gave a reasonable-looking theoretical backing for the relationship, based on an analysis of a very simple energy balance model. So on the face of it, it looked reasonable enough. Plenty of people had their doubts though, as I’ve documented in the links above.

Rather than explore the emergent constraint aspect in more detail, we chose to approach the problem from a more fundamental perspective: what can we actually hope to learn from variability? We used the paradigm of idealised “perfect model” experiments, which lets us establish very clear limits to what can be learnt. The model we used is more or less the standard two-layer energy balance model of Winton, Held and others that has been widely adopted, but with a random noise term (after Hasselmann) added to the upper layer to simulate internal variability:
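The equations below use the notation that is common for this model. The symbols are our choice and may not match the paper's. $T$ and $T_o$ are the upper- and deep-layer temperature anomalies, and $C$ and $C_o$ are their heat capacities. $\lambda$ is the feedback parameter, so that $S = F_{2\times}/\lambda$ for a doubled-CO2 forcing $F_{2\times}$. $\gamma$ is the ocean mixing parameter, $F(t)$ is the external forcing and $\varepsilon(t)$ is the random noise term. With these symbols the model reads roughly:

$$C\,\frac{dT}{dt} = F(t) - \lambda T - \gamma\,(T - T_o) + \varepsilon(t)$$

$$C_o\,\frac{dT_o}{dt} = \gamma\,(T - T_o)$$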

The single-layer model that Cox et al used in their theoretical analysis is also recovered when the ocean mixing parameter γ is set to zero. So the basic question we are addressing is: how accurately can we diagnose the sensitivity of this energy balance model from an analysis of the variability of its output?
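Below is a minimal sketch of how such a model can be integrated to produce synthetic temperature series. It assumes forward-Euler steps of one year, Gaussian white noise on the upper layer and purely illustrative parameter values (`C`, `C_o`, `gamma`, `F2x`, `noise_sd`). These are hypothetical choices, not the calibration used in the paper. Setting `gamma=0` gives the single-layer model.

```python
import numpy as np

def simulate_ebm(S, n_members=1, n_years=150, spin_up=300, C=8.0, C_o=100.0,
                 gamma=0.7, F2x=3.7, noise_sd=0.5, forcing=None, seed=None):
    """Integrate a two-layer energy balance model with white noise added to
    the upper layer, using forward-Euler steps of one year.

    Returns an array of shape (n_members, n_years) of upper-layer temperature
    anomalies (K). gamma=0 gives the single-layer model. All default
    parameter values are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    lam = F2x / S                                   # feedback parameter, W m^-2 K^-1
    F = np.zeros(n_years) if forcing is None else np.asarray(forcing, dtype=float)
    F = np.concatenate([np.zeros(spin_up), F])      # unforced spin-up comes first
    T = np.zeros(n_members)                         # upper (mixed) layer anomaly
    T_o = np.zeros(n_members)                       # deep-ocean anomaly
    out = np.empty((n_members, F.size))
    for i, f in enumerate(F):
        noise = noise_sd * rng.standard_normal(n_members)
        heat_down = gamma * (T - T_o)               # heat flux into the deep ocean
        # Tuple assignment: both updates use the previous year's T and T_o
        T, T_o = T + (f - lam * T - heat_down + noise) / C, T_o + heat_down / C_o
        out[:, i] = T
    return out[:, spin_up:]
```

For example, `simulate_ebm(3.0, n_members=100, seed=1)` gives 100 unforced 150-year replicates. Explicit Euler is used only for brevity and stays stable for these parameter values.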

Firstly, we can explore the relationship (in this model) between sensitivity S and the function of variability which Cox et al called ψ.
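For reference, Cox et al define ψ = σ_T/√(−ln α₁). Here σ_T is the standard deviation and α₁ is the lag-one autocorrelation of the detrended annual global mean temperature. The sketch below computes it for a single series. It assumes a single linear trend is removed over the whole record (no windowing) and returns NaN where α₁ falls outside (0, 1), since the formula is undefined there.

```python
import numpy as np

def psi(T):
    """psi = sigma_T / sqrt(-ln alpha_1) for one annual series, after removing
    a least-squares linear trend. Returns nan if the lag-1 autocorrelation is
    not in (0, 1), where the formula is undefined."""
    T = np.asarray(T, dtype=float)
    t = np.arange(T.size)
    resid = T - np.polyval(np.polyfit(t, T, 1), t)      # detrended anomalies
    sigma = resid.std()
    alpha1 = np.corrcoef(resid[:-1], resid[1:])[0, 1]   # lag-1 autocorrelation
    if not 0.0 < alpha1 < 1.0:
        return float("nan")
    return sigma / np.sqrt(-np.log(alpha1))
```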

Focussing first on the fat grey dots: these represent the expected value of ψ from an unforced (i.e., driven entirely by internal variability) simulation of the single-layer energy balance model that Cox et al used as the theoretical foundation for their analysis. And just as they claimed, these points lie on a straight line. So far so good.
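The straight line can be checked analytically. Suppose the noise in the single-layer model is treated as continuous white noise of amplitude $q$. This is a simplifying assumption, with $\xi(t)$ below denoting unit white noise. The model is then an Ornstein-Uhlenbeck process. The standard results for its variance and its autocorrelation at lag $\Delta t$ (one year) give:

$$C\,\frac{dT}{dt} = -\lambda T + q\,\xi(t), \qquad \sigma_T^2 = \frac{q^2}{2\lambda C}, \qquad \alpha_1 = e^{-\lambda \Delta t / C}$$

$$\psi = \frac{\sigma_T}{\sqrt{-\ln \alpha_1}} = \sqrt{\frac{q^2}{2\lambda C}}\,\sqrt{\frac{C}{\lambda\,\Delta t}} = \frac{q}{\lambda\sqrt{2\,\Delta t}} = \frac{q}{F_{2\times}\sqrt{2\,\Delta t}}\,S$$

The heat capacity $C$ cancels, and with $S = F_{2\times}/\lambda$, ψ is simply proportional to S. Once $\gamma > 0$, this simple cancellation no longer holds.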

But…

It is well known that the single-layer model does a pretty shabby job of representing GCM behaviour during the transient warming over the 20th century, and the two-layer version of the energy balance model gives vastly superior results for only a small increase in complexity. (This is partly why the Schwartz approach failed.) Repeating our analysis with the two-layer version of the model, we get the black dots, where the relationship is clearly nonlinear. This model was in fact considered by the Cox group in a follow-up paper (Williamson et al), in which they argued that it still displayed a near-linear relationship between S and ψ over the range of interest spanned by GCMs. That’s true enough, as the red line overlying the plot shows (I fitted that by hand to the 4 points in the 2-5C range), but there’s also a clear divergence from this relationship for larger values of S.

And moreover…

The vertical lines through each dot are error bars. These are the ±2 standard deviation ranges of the values of ψ obtained from a large sample of simulations, each 150 years long (a generous estimate of the length of the observational record we have to work with). It is very noticeable that the error bars grow substantially with S. This, together with the curvature in the S-ψ relationship, means that it is quite easy for a model with a very high sensitivity to generate a time series with a moderate ψ value. The obvious consequence is that if you see a time series with a moderate ψ value, you can’t be sure that the model that generated it did not have a high sensitivity.
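The error bars can be reproduced in outline by Monte Carlo. The steps are: simulate many unforced 150-year runs of the two-layer model for each S, compute ψ for each run, and report the mean and ±2 standard deviations. The sketch below is our own illustration rather than the paper's code. It assumes forward-Euler annual steps, made-up parameter values (`F2X`, `C`, `C_O`, `GAMMA`, `NOISE_SD`), a single linear detrend per run and a plain lag-one autocorrelation estimator.

```python
import numpy as np

# Illustrative parameter values (assumptions, not the paper's calibration)
F2X, C, C_O, GAMMA, NOISE_SD = 3.7, 8.0, 100.0, 0.7, 0.5

def simulate(S, n_members, n_years=150, spin_up=300, gamma=GAMMA, seed=0):
    """Unforced two-layer EBM ensemble, forward Euler with annual steps."""
    rng = np.random.default_rng(seed)
    lam = F2X / S
    T, T_o = np.zeros(n_members), np.zeros(n_members)
    out = np.empty((n_members, n_years))
    for i in range(spin_up + n_years):
        flux = gamma * (T - T_o)                        # heat into the deep ocean
        noise = NOISE_SD * rng.standard_normal(n_members)
        T, T_o = T + (-lam * T - flux + noise) / C, T_o + flux / C_O
        if i >= spin_up:
            out[:, i - spin_up] = T
    return out

def psi_rows(T):
    """psi = sigma / sqrt(-ln alpha_1) for each row, after linear detrending."""
    t = np.arange(T.shape[1])
    slope, intercept = np.polyfit(t, T.T, 1)            # one fit per ensemble member
    resid = T - (slope[:, None] * t + intercept[:, None])
    alpha1 = (resid[:, :-1] * resid[:, 1:]).sum(axis=1) / (resid ** 2).sum(axis=1)
    psi = np.full(T.shape[0], np.nan)                   # undefined unless 0 < alpha1 < 1
    ok = (alpha1 > 0) & (alpha1 < 1)
    psi[ok] = resid[ok].std(axis=1) / np.sqrt(-np.log(alpha1[ok]))
    return psi

# Same seed for every S, so each S sees the same noise sequences
for S in (1.0, 2.5, 5.0, 10.0):
    p = psi_rows(simulate(S, n_members=2000))
    print(f"S = {S:4.1f} C: mean psi = {np.nanmean(p):.3f}, 2 sd = {2 * np.nanstd(p):.3f}")
```

Passing `gamma=0.0` to `simulate` gives the single-layer counterpart of the same calculation.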

We can use calculations of this type to generate the likelihood function p(ψ|S), which can be thought of as a horizontal slice through the above graph at a fixed value of ψ, and turn the handle of the Bayesian engine to generate posterior pdfs for sensitivity, based on having observed a given value of ψ. This is what the next plot shows, where the different colours of the solid lines refer to calculations which assumed observed values for ψ of 0.05, 0.1, 0.15 and 0.2 respectively.

These values correspond to the *expected* value of ψ you get with a sensitivity of around 1, 2.5, 5 and 10C respectively. So you can see from the cyan line that if you observe ψ = 0.1, which corresponds to a best-estimate sensitivity of 2.5C in this experiment, you still can’t be very confident that the true value wasn’t a lot higher. It is only when you get a really small value of ψ that the sensitivity is tightly constrained (to be close to 1C in the case ψ = 0.05, shown by the solid dark blue line).

The 4 solid lines correspond to the case where only S is uncertain and all other model parameters are precisely known. In the more realistic case where other model parameters such as ocean heat uptake are also somewhat uncertain, the solid blue line turns into the dotted line, and now even the low-sensitivity case has significant uncertainty on the high side.
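The Bayesian step for the solid-line case, where only S is uncertain, can be sketched on a grid. The sketch assumes a uniform prior and an observed ψ known exactly. It estimates the likelihood p(ψ_obs|S) with a Gaussian kernel density over simulated ψ values for each S, using a normal-reference rule-of-thumb bandwidth. The KDE is our choice for illustration, not necessarily what the paper did. The samples in the usage lines are made up purely to exercise the function: a hypothetical mean of 0.04 S and spread of 0.01 S, mimicking a spread that grows with S.

```python
import numpy as np

def posterior_S(S_grid, psi_samples, psi_obs):
    """Posterior density of S on S_grid given an observed psi, uniform prior.

    psi_samples[k] is an array of simulated psi values for S = S_grid[k].
    The likelihood p(psi_obs | S) is approximated by a Gaussian kernel
    density estimate of those samples (rule-of-thumb bandwidth).
    """
    S_grid = np.asarray(S_grid, dtype=float)
    like = np.empty(S_grid.size)
    for k, samples in enumerate(psi_samples):
        x = np.asarray(samples, dtype=float)
        x = x[np.isfinite(x)]
        h = 1.06 * x.std() * x.size ** (-0.2)          # kernel bandwidth
        z = (psi_obs - x) / h
        like[k] = np.exp(-0.5 * z ** 2).mean() / (h * np.sqrt(2 * np.pi))
    # Uniform prior: posterior is the likelihood renormalised (trapezoid rule)
    area = np.sum(0.5 * (like[1:] + like[:-1]) * np.diff(S_grid))
    return like / area

# Purely synthetic stand-in for simulated psi: mean and spread both grow with S
rng = np.random.default_rng(0)
S_grid = np.arange(0.5, 10.001, 0.05)
samples = [rng.normal(0.04 * S, 0.01 * S, size=4000) for S in S_grid]
post = posterior_S(S_grid, samples, psi_obs=0.1)
```

In practice the `samples` list would come from model simulations such as the Monte Carlo runs described above.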

It is also very noticeable that these posterior pdfs are strongly skewed, with a longer right-hand tail than left-hand (apart from the artificial truncation at 10C). This could be predicted directly from the first plot, where the large increase in uncertainty and the flattening of the S-ψ relationship mean that ψ has much less discriminatory power at high values of S. Incidentally, the prior used for S in all these experiments was uniform, so the likelihood has the same shape as the plotted curves. The likelihood is therefore itself skewed, which means the skew is an intrinsic property of the underlying model rather than an artefact of some funny Bayesian sleight-of-hand. The ordinary least squares approach of a standard emergent constraint analysis can’t account for this skew and can only generate a symmetric bell curve.
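For contrast, here is a minimal sketch of a generic OLS emergent constraint. It regresses S on ψ across an ensemble of models and reads off the predictive distribution at the observed ψ. This is a textbook version that ignores observational error in ψ, not necessarily the exact procedure Cox et al followed, and the function name is hypothetical.

```python
import numpy as np

def ols_constraint(psi_models, S_models, psi_obs):
    """Generic OLS emergent constraint: regress S on psi across models and
    return the mean and standard deviation of the predictive distribution
    for S at the observed psi (observational error in psi is ignored here)."""
    x = np.asarray(psi_models, dtype=float)
    y = np.asarray(S_models, dtype=float)
    n = x.size
    slope, intercept = np.polyfit(x, y, 1)
    resid = y - (intercept + slope * x)
    s2 = np.sum(resid ** 2) / (n - 2)                   # residual variance
    sxx = np.sum((x - x.mean()) ** 2)
    # Standard OLS prediction variance for a new observation at psi_obs
    sd = np.sqrt(s2 * (1 + 1 / n + (psi_obs - x.mean()) ** 2 / sxx))
    return intercept + slope * psi_obs, sd
```

Whatever the ensemble looks like, the output is just a location and a scale. The result is a distribution symmetric about the regression line (strictly a Student t with n − 2 degrees of freedom rather than a Gaussian), so the skew in the likelihood cannot appear.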

One thing that had been nagging away at me was the fact that we actually have a full time series of annual temperatures to play with, and there might be a better way of analysing them than just calculating the ψ statistic. So we also did some calculations which used the exact likelihood of the full time series, p({Ti}|S), where {Ti}, i = 1…n, is the entire time series of temperature anomalies. I think this is a modest novelty of our paper: no one else that I know of has done this calculation before, at least not in quite this experimental setting. The experiments below assume that we have perfect observations with no uncertainty, over a period of 150 years with no external forcing. Each simulation with the model generates a different sequence of internal variability, so we plotted the results from 20 replicates for each sensitivity value tested. The colours are as before, representing S = 1, 2.5 and 5C respectively. These results give an exact answer to the question of what it is possible to learn from the full time series of annual temperatures in the case of no external forcing.

So depending on the true value of S, you might occasionally get a reasonably tight constraint if you are lucky, but unless S is rather low, this isn’t likely. These calculations again ignore all uncertainties other than S and assume we have a perfect model, which some might think just a touch on the optimistic side…
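For a linear Gaussian model like this one, the exact likelihood p({Ti}|S) can be evaluated with a Kalman filter that treats the deep-ocean temperature as an unobserved state. The sketch below is our own illustration, not the code used in the paper. It assumes:

- a forward-Euler annual discretisation;
- perfect observations of the upper-layer temperature;
- no forcing;
- a stationary initial state;
- made-up parameter values (`F2X`, `C`, `C_O`, `NOISE_SD`).

```python
import numpy as np

# Illustrative parameter values (assumptions, not the paper's calibration)
F2X, C, C_O, NOISE_SD = 3.7, 8.0, 100.0, 0.5

def transition(S, gamma):
    """Annual forward-Euler transition matrix A and noise covariance Q for
    the state (T, T_o) of the unforced two-layer model."""
    lam = F2X / S
    A = np.array([[1 - (lam + gamma) / C, gamma / C],
                  [gamma / C_O, 1 - gamma / C_O]])
    Q = np.diag([(NOISE_SD / C) ** 2, 0.0])   # noise enters the upper layer only
    return A, Q

def stationary_cov(A, Q, n_doublings=60):
    """Solve P = A P A^T + Q with the doubling algorithm (needs a stable A)."""
    P, Ak = Q.copy(), A.copy()
    for _ in range(n_doublings):
        P = P + Ak @ P @ Ak.T
        Ak = Ak @ Ak
    return P

def log_likelihood(T, S, gamma=0.7):
    """Exact log p({T_i} | S) for an unforced series of upper-layer annual
    temperatures observed without error; T_o is an unobserved state."""
    A, Q = transition(S, gamma)
    m, P = np.zeros(2), stationary_cov(A, Q)   # start from the stationary state
    ll = 0.0
    for y in T:
        s = P[0, 0]                            # predictive variance of T
        v = y - m[0]                           # innovation
        ll -= 0.5 * (np.log(2 * np.pi * s) + v * v / s)
        K = P[:, 0] / s                        # Kalman gain for an exact observation
        m, P = m + K * v, P - np.outer(K, P[0, :])
        m, P = A @ m, A @ P @ A.T + Q          # propagate to the next year
    return ll

def simulate(S, n_years, gamma=0.7, spin_up=1000, seed=0):
    """Draw one unforced series from the same discretised model."""
    rng = np.random.default_rng(seed)
    A, _ = transition(S, gamma)
    x, out = np.zeros(2), np.empty(spin_up + n_years)
    for i in range(out.size):
        x = A @ x + np.array([NOISE_SD / C * rng.standard_normal(), 0.0])
        out[i] = x[0]
    return out[spin_up:]

# Posterior for S on a grid (uniform prior) from one 150-year synthetic record
S_grid = np.arange(0.5, 10.01, 0.1)
T_obs = simulate(2.5, 150, seed=42)
ll = np.array([log_likelihood(T_obs, S) for S in S_grid])
post = np.exp(ll - ll.max())
post /= post.sum() * 0.1                       # normalise (grid spacing 0.1)
```

Repeating the last block with different `seed` values gives replicate posteriors of the kind described above. With `gamma=0` the filter reduces to the closed-form AR(1) likelihood.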

So much for internal variability. We don’t have a period of time in the historical record in which there was no external forcing anyway, so maybe that was a bit academic. In fact some of the comments on the Cox paper argued (and Cox et al acknowledged in their reply) that the forced response might be affecting their calculation of ψ, so we also considered transient simulations of the 20th century and implemented the windowed detrending method that they had (originally) argued removed the majority of the forced response. The S-ψ relationship in that case becomes:

where this time the grey and black dots and bars relate not to the one- and two-layer models, but to whether S alone is uncertain or whether other parameters besides S are also considered uncertain. The crosses are results from a bunch of CMIP5 models that I had lying around, not precisely the same set that Cox et al used but significantly overlapping with it. Rather than using just one simulation per model, this plot includes all the ensemble members I had, roughly 90 model runs in total from about 25 models. There appears to be a vague compatibility between the GCM results and the simple energy balance model, but the GCMs don’t show the same flattening off or wide spread at high sensitivity values. Incidentally, the set of GCM results plotted here doesn’t fit a straight line anywhere near as closely as the set Cox et al used. It’s not at all obvious to me why this is the case, and I suspect they just got lucky with the particular set of models they had, combined with the specific choices they made in their analysis.

So it’s no surprise that we get very similar results when looking at detrended variability arising from the forced 20th century simulations. I won’t bore you with more pictures as this post is already rather long. The same general principles apply.
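The windowed detrending can be sketched as follows. The steps are: linearly detrend each moving window separately, compute ψ within each window, and average over windows. The details here are our assumptions for illustration: the default window of 55 years (our understanding of the length Cox et al originally used), averaging ψ rather than σ and α₁ separately, and the plain lag-one autocorrelation estimator.

```python
import numpy as np

def psi_windowed(T, window=55):
    """Mean of psi over all moving windows of `window` years, each window
    linearly detrended on its own. Returns nan if no window gives a valid
    psi (for example when the series is shorter than one window)."""
    T = np.asarray(T, dtype=float)
    t = np.arange(window)
    vals = []
    for start in range(T.size - window + 1):
        seg = T[start:start + window]
        resid = seg - np.polyval(np.polyfit(t, seg, 1), t)      # local detrend
        alpha1 = np.sum(resid[:-1] * resid[1:]) / np.sum(resid ** 2)
        if 0 < alpha1 < 1:                                      # psi defined only here
            vals.append(resid.std() / np.sqrt(-np.log(alpha1)))
    return float(np.mean(vals)) if vals else float("nan")
```

Any trend that is linear over the whole record is removed exactly in every window. A smooth but curved forced response is only partly removed, though less of it leaks into ψ than with a single linear fit over the full record.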

The conclusion is that the theory Cox et al used to justify their emergent constraint analysis actually refutes their use of a linear fit by ordinary least squares, because the relationship between S and ψ is significantly nonlinear and heteroscedastic (meaning the uncertainties are not constant but vary strongly with S). The upshot is that any constraint generated from ψ (or, more generally, any constraint derived from internal or forced variability) is necessarily going to be skewed, with a tail to high values of S. However, variability does still have the potential to be somewhat informative about S and shouldn’t be ignored completely, as many analyses based on the long-term trend automatically do.