OK, it’s answer time for these questions. First, a little background. This is the paper, or rather, here it is to download. The questions were asked of over 100 psychology researchers and 400 students and virtually none of them got all the answers right, with more wrong than right answers overall.
The questions were modelled on a paper by Gigerenzer who had done a similar investigation into the misinterpretation of p-values arising in null hypothesis significance testing. Confidence intervals are often recommended as an improvement over p-values, but as this research shows, they are just as prone to misinterpretation.
Some of my commenters argued that one or two of the questions were a bit unclear or otherwise unsatisfactory, but the instructions were quite clear and the point was not whether one might think the statement probably right, but whether it could be deduced as correct from the stated experimental result. I do have my own doubts about statement 5, as I suspect that some scientists would assert that “We can be 95% confident” is exactly synonymous with “I have a 95% confidence interval”. That’s a confidence trick, of course, but that’s what confidence intervals are anyway. No untrained member of the public could ever guess what a confidence interval is.
Anyway, the answer, for those who have not yet guessed, is that all of the statements were false, broadly speaking because they were making probabilistic statements about the parameter of interest, which simply cannot be deduced from a frequentist confidence interval. Under repetition of an experiment, 95% of confidence intervals will contain the parameter of interest (assuming they are correctly constructed and all auxiliary hypotheses are true) but that doesn’t mean that, ONCE YOU HAVE CREATED A SPECIFIC INTERVAL, the parameter has a 95% probability of lying in that specific range.
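For anyone who wants to see the "under repetition" claim in action, here is a minimal sketch in Python. It is my own illustration rather than anything from the paper, and it assumes the simplest textbook case: normally distributed data with a known standard deviation, so the interval is just the sample mean plus or minus 1.96 standard errors. The numbers (`true_mean`, `sigma`, `n`, `trials`) are arbitrary choices for the demonstration.

```python
import math
import random


def coverage(true_mean=10.0, sigma=2.0, n=25, trials=20000, seed=1):
    """Fraction of repeated experiments whose 95% interval contains true_mean.

    Assumes normal data with KNOWN sigma (an illustrative simplification).
    """
    rng = random.Random(seed)
    z = 1.959964  # two-sided 95% quantile of the standard normal
    half_width = z * sigma / math.sqrt(n)
    hits = 0
    for _ in range(trials):
        # One "experiment": draw n observations and form the interval.
        sample_mean = sum(rng.gauss(true_mean, sigma) for _ in range(n)) / n
        if sample_mean - half_width <= true_mean <= sample_mean + half_width:
            hits += 1
    return hits / trials


cov = coverage()
print(f"long-run coverage: {cov:.3f}")
```

The printed frequency should sit close to 0.95. Note what it is a property of: the procedure, averaged over many repetitions. Nothing in the code assigns a probability to the parameter given one particular interval.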
In reading around the topic, I found one paper which had an example which is similar to my own favourite. We can generate valid confidence intervals for an unknown parameter with the following procedure: with probability 0.95, say “the whole number line”, otherwise say “the empty set”. If you repeat this many times, the long-run coverage frequency tends to 0.95, as 95% of the intervals do include the true parameter value. However, for any given interval, we can state with absolute certainty whether the parameter is inside or outside it, so we will never be able to say, once we have generated an interval, that there is 95% probability that the parameter lies inside that interval.
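The procedure is simple enough to write down directly. The following is a rough sketch of it in Python; the function names and the choice of `None` to represent the empty set are mine, and the "unknown" parameter `theta` is fixed at an arbitrary value purely so that coverage can be checked.

```python
import math
import random


def trivial_interval(rng):
    """With probability 0.95 return the whole number line, else the empty set."""
    if rng.random() < 0.95:
        return (-math.inf, math.inf)
    return None  # stands in for the empty set


def contains(interval, theta):
    """True if theta lies in the interval; the empty set contains nothing."""
    return interval is not None and interval[0] < theta < interval[1]


rng = random.Random(2)
theta = 3.7  # arbitrary value; the procedure never looks at it
intervals = [trivial_interval(rng) for _ in range(100000)]
freq = sum(contains(iv, theta) for iv in intervals) / len(intervals)
print(f"long-run coverage: {freq:.3f}")
```

The long-run frequency comes out near 0.95, so this is a valid 95% confidence procedure, and yet every individual interval it produces either certainly contains the parameter or certainly does not.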
(Someone is now going to raise the issue of Schrödinger’s interval, where the interval is calculated automatically, and sealed in a box. Yes, in this situation we can place 95% probability on that specific interval containing the parameter, but it’s not the situation we usually have where someone has published a confidence interval, and it’s not the situation in the quiz).
And how about my readers? These questions were asked on both blogs (here and here) and also on twitter, gleaning a handful of replies in all places. Votes here and on twitter were majority wrong (and no-one got them all right). Interestingly, all three of the commenters on the Empty Blog were basically correct, though two of them gave slightly ambiguous replies; I think their intent was right. Maybe it helps that I’ve been going on about this for years there 🙂