# On the meaning of independence in climate science

At long last jules and I have managed to submit the written version of a talk that I have given (bits of) no fewer than four times over the last few years (at NCAR, UKMO/HC, Schloss Ringberg and EGU). It had to get re-written several times and sit at the back of my mind on long bike rides and runs before it became acceptably coherent (to us, at least). I’m curious as to what people will think of it – it seemed to go down ok at the talks but sometimes it’s hard to tell…

The topic is “independence” as it pertains both to the understanding of ensembles such as CMIPn and to constraints on climate sensitivity. Our main point overall is that if you want to talk about independence in any context, you really need to present a mathematical/statistical formalisation that relates directly to the standard probabilistic definition: events A and B are independent iff P(A,B) = P(A)P(B). That is, the probability of A and B is the probability of A multiplied by the probability of B. This generalises to conditional independence: events A and B are conditionally independent given S iff P(A,B|S) = P(A|S)P(B|S). A more practically useful (but mathematically equivalent, provided P(B,S) > 0) formulation is that A and B are conditionally independent given S iff P(A|S) = P(A|B,S); that is, the conditional probability of A given both S and B is the same as the conditional probability of A given S alone.

What this means in practice to an individual researcher is that, starting from their (probabilistic) prediction of A given knowledge of S, conditional independence of A and B rests on whether knowledge of B, in addition to S, does or does not change their prediction of A. While this is no more than elementary probability theory, it seems to be an intuitively attractive way of addressing the question, e.g. in the case where A and B are observational constraints on the equilibrium climate sensitivity S.

A point we also make in the paper is that these P()s are fundamentally subjective things, just as much as a Bayesian prior is. There is no way of validating what observations should be seen in the case where S takes a value different from the real world; this counterfactual can only really exist in our heads and not in reality. In practice we often use models for this, which are themselves subjective creations, and in the paper we present an example to show how independence and non-independence of constraints can be investigated in the context of a toy model.
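The definitions above can be checked mechanically on a small discrete example. The sketch below is a hypothetical toy of our own devising, not the toy model in the paper: the three values of S, their prior probabilities and the likelihoods of each constraint "reading high" are all invented for illustration. The joint distribution is built so that A and B are conditionally independent given S, and the code confirms that P(A|S) = P(A|B,S) for every S. It also shows that conditional independence given S does not imply unconditional independence: because both constraints depend on S, learning B still shifts the prediction of A when S is unknown.

```python
import math
from itertools import product

# Hypothetical toy numbers (not from the paper): a discretised prior on the
# equilibrium climate sensitivity S (in K), and the probability that each of two
# observational constraints, A and B, "reads high" (value 1) given S.
PRIOR_S = {2.0: 0.3, 3.0: 0.4, 4.5: 0.3}
P_A_HIGH = {2.0: 0.2, 3.0: 0.5, 4.5: 0.8}
P_B_HIGH = {2.0: 0.1, 3.0: 0.4, 4.5: 0.9}


def bern(p, x):
    """P(X = x) for a 0/1 variable with P(X = 1) = p."""
    return p if x == 1 else 1.0 - p


# Joint P(S, A, B), constructed so that A and B are conditionally independent given S.
JOINT = {
    (s, a, b): PRIOR_S[s] * bern(P_A_HIGH[s], a) * bern(P_B_HIGH[s], b)
    for s, a, b in product(PRIOR_S, (0, 1), (0, 1))
}


def prob(event):
    """P(event), where an event is a predicate on (s, a, b)."""
    return sum(p for (s, a, b), p in JOINT.items() if event(s, a, b))


def both(e1, e2):
    """The event 'e1 and e2'."""
    return lambda s, a, b: e1(s, a, b) and e2(s, a, b)


def cond(event, given):
    """P(event | given) = P(event, given) / P(given)."""
    return prob(both(event, given)) / prob(given)


def A(s, a, b):
    """Constraint A reads high."""
    return a == 1


def B(s, a, b):
    """Constraint B reads high."""
    return b == 1


def at(s0):
    """The event S = s0."""
    return lambda s, a, b: s == s0


# Conditional independence given S: P(A|S) equals P(A|B,S) for every value of S.
for s0 in PRIOR_S:
    lhs, rhs = cond(A, at(s0)), cond(A, both(B, at(s0)))
    print(f"S={s0}: P(A|S)={lhs:.3f}  P(A|B,S)={rhs:.3f}  equal={math.isclose(lhs, rhs)}")

# Without conditioning on S, B is informative about A, since both depend on S.
print(f"P(A)={prob(A):.3f}  P(A|B)={cond(A, B):.3f}")
```

With these made-up numbers P(A) is 0.5 while P(A|B) is about 0.66, so a researcher who does not know S would (correctly) treat the two constraints as dependent, even though they are independent once S is given.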

As for the question of model independence, the notation may be easier to interpret if we change the symbols and write something like P(M1|T) = P(M1|M2,T). This equation asserts that the models M1 and M2 are conditionally independent given the truth T. This is essentially the foundation of the truth-centred approach, and it would be great if it were true, but many analyses of the models in CMIP ensembles have clearly shown that it is not reasonable. An alternative conditioning, which we think is more relevant and interesting, asks whether models are independent conditional on the distribution of models. We illustrate how this does seem to encapsulate much of the discussion of model similarity: according to a fairly straightforward analysis of model similarity, models from different research centres seem independent, whereas pairs of models from the same research centre do not.
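The contrast between conditioning on the truth and conditioning on the model distribution can be illustrated with a hypothetical hierarchical Gaussian toy. This is our own illustrative assumption, not the similarity analysis in the paper: each model output is the ensemble mean (the truth plus a bias shared by the whole ensemble), plus an offset shared by models from the same research centre, plus a model-specific error. The variance values are invented. Since everything is jointly Gaussian, zero correlation here is equivalent to independence, so the simulated correlations show directly which pairs are conditionally independent.

```python
import math
import random

# Hypothetical variance components (illustrative assumptions only).
VAR_SHARED = 1.0  # bias common to the whole ensemble (offset of its mean from the truth)
VAR_CENTRE = 0.5  # offset shared by models from the same research centre
VAR_MODEL = 0.5   # idiosyncratic model-specific error


def corr(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def simulate(n_draws, condition_on_ensemble_mean, seed=0):
    """Draw (M1, M2, M3) repeatedly with the truth T fixed at 0.

    M1 and M2 come from the same (hypothetical) centre, M3 from another.
    If condition_on_ensemble_mean is True, the ensemble mean mu is held fixed
    across draws (conditioning on the model distribution); otherwise
    mu = T + d with a fresh shared bias d each draw (conditioning only on T).
    Returns (same-centre correlation, different-centre correlation).
    """
    rng = random.Random(seed)
    fixed_mu = 0.7  # arbitrary fixed ensemble mean for the conditional case
    m1, m2, m3 = [], [], []
    for _ in range(n_draws):
        if condition_on_ensemble_mean:
            mu = fixed_mu
        else:
            mu = 0.0 + rng.gauss(0.0, math.sqrt(VAR_SHARED))  # T = 0 plus shared bias
        centre_a = rng.gauss(0.0, math.sqrt(VAR_CENTRE))
        centre_b = rng.gauss(0.0, math.sqrt(VAR_CENTRE))
        m1.append(mu + centre_a + rng.gauss(0.0, math.sqrt(VAR_MODEL)))
        m2.append(mu + centre_a + rng.gauss(0.0, math.sqrt(VAR_MODEL)))
        m3.append(mu + centre_b + rng.gauss(0.0, math.sqrt(VAR_MODEL)))
    return corr(m1, m2), corr(m1, m3)


for label, flag in [("given T", False), ("given model distribution", True)]:
    same, diff = simulate(20000, flag)
    print(f"{label}: same-centre corr={same:.2f}, different-centre corr={diff:.2f}")
```

Analytically, in this toy the correlations given T are 0.75 (same centre) and 0.5 (different centres), so no pair is conditionally independent of the other given the truth. Given the ensemble mean they are 0.5 and 0, so models from different centres are independent while models from the same centre are not; the simulated values should land close to these.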

It is quite possible – likely, even – that others will be able to improve on how we’ve tried to define independence, but our point was really to argue that in principle we must use a mathematical foundation in order to make any meaningful progress, and also to observe that mathematical definitions do exist which seem to match at least some real-world usage reasonably well.