New manuscript! Now up on github too.

Title as post. Yes, this is us dipping our toes into epidemiology. Turns out that calibrating a simple model with observational data is much the same whether it’s paleoclimate or epidemics: the maths and the methods carry straight over. In fact this one is a particularly easy one as the model is embarrassingly linear (once you take the logarithm of the time series). I’ve been posting my analyses on Twitter and the other blog, but since this is a real paper with words and figures and references and stuff, it can go here too (plus, I can upload a pdf here unlike blogspot).

We have been doing a very straightforward MCMC calibration of a simple SEIR model (equivalent of energy balance box model in climate science, pretty much). The basic concept is to use the model to invert the time series of reported deaths back through the time series of underlying infections in order to discover the model parameters such as the famous reproductive rate R. It’s actually rather simple and I am still bemused by the fact that none of the experts (in the UK at least) are doing this. I mean what on earth are mathematical epidemiologists actually for, if not this sort of thing? They should have been all over this like a rash. The exponential trend in the data is a key diagnostic of the epidemic and the experts didn’t even bother with the most elementary calibration of this in their predictions that our entire policy is based on. It’s absolutely nuts. It’s as if someone ran a simulation with a climate model and presented the prediction without any basic check on whether it reproduced the recent warming. You’d get laughed out of the room if you tried that at any conference I was attending. By me if no-one else (no, really, I wouldn’t be the only one).
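For anyone who wants to see the idea in miniature, here is a rough Python sketch of this kind of calibration. It is not the code from the paper. It assumes a daily-step SEIR model, a flat prior on R, Gaussian errors on the log of the daily counts, and a plain random-walk Metropolis sampler. The latent and infectious periods, the population, the 1% scaling from infections to deaths and the noise level are all illustrative placeholders, and it skips the delay between infection and death entirely.

```python
import numpy as np

def seir_daily_infections(r0, days, latent=3.0, infectious=4.0,
                          pop=6.7e7, seed_inf=10.0):
    """Daily-step SEIR model; returns new infections (S -> E) per day."""
    s, e, i = pop - seed_inf, 0.0, seed_inf
    beta = r0 / infectious
    new_inf = np.empty(days)
    for t in range(days):
        new_e = beta * s * i / pop   # S -> E
        new_i = e / latent           # E -> I
        new_r = i / infectious       # I -> R
        s -= new_e
        e += new_e - new_i
        i += new_i - new_r
        new_inf[t] = new_e
    return new_inf

def log_post(r0, obs_log, sigma=0.2):
    """Flat prior on (0.5, 6) plus Gaussian likelihood in log space."""
    if not 0.5 < r0 < 6.0:
        return -np.inf
    model_log = np.log(0.01 * seir_daily_infections(r0, len(obs_log)))
    return -0.5 * np.sum((obs_log - model_log) ** 2) / sigma ** 2

def metropolis(obs_log, n_iter=4000, step=0.05, start=2.0, seed=0):
    """Random-walk Metropolis sampler for the single parameter R."""
    rng = np.random.default_rng(seed)
    chain = np.empty(n_iter)
    cur, cur_lp = start, log_post(start, obs_log)
    for k in range(n_iter):
        prop = cur + step * rng.normal()
        prop_lp = log_post(prop, obs_log)
        if np.log(rng.uniform()) < prop_lp - cur_lp:
            cur, cur_lp = prop, prop_lp
        chain[k] = cur
    return chain

# Synthetic "observations": model output at a known R plus log-normal noise.
true_r0 = 3.0
noise_rng = np.random.default_rng(1)
obs_log = (np.log(0.01 * seir_daily_infections(true_r0, 40))
           + 0.2 * noise_rng.normal(size=40))

chain = metropolis(obs_log)
posterior = chain[1000:]             # discard burn-in
print(f"posterior mean R = {posterior.mean():.2f}, sd = {posterior.std():.3f}")
```

Because the synthetic data come from the same model with a known R, the chain should settle close to that value. Handling a change in R when controls are imposed would mean adding a second parameter and a change date, which this sketch does not attempt.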

Anyway, the basic result is that the method works like a charm and we can reliably deduce the changes in R due to imposed controls, and it looks increasingly clear that it’s been less than 1 in the UK for several weeks now, while the experts are still talking about the peak being a couple of weeks away. The whole experience is just…so strange.

Anyway, I did try talking politely to some of the experts but just got brushed off which may partly explain the tone in the manuscript. Or maybe that’s just me 🙂

The paper has been submitted to medrxiv but who knows what they will make of it. My experience when I have poked my nose into other people’s fields has not usually been a very encouraging one so I’m half expecting them to reject it anyway. So be it.

Here is today’s forecast to encourage you to read the paper.

Not sure if you are using the right data? The daily announced deaths are not the daily deaths. They are a catch-up of all non-announced deaths going back as far as necessary. The data for England is easily available here: https://www.england.nhs.uk/statistics/statistical-work-areas/covid-19-daily-deaths/

It shouldn’t make too much difference, but the actual data is available, just not daily.

Yes thanks I’m aware of this issue. So long as the reporting protocol is broadly consistent (eg, most deaths announced actually happened 1-3 days ago), it doesn’t affect estimates of the growth *rate* (though it does mean the current true epidemic size will be significantly underestimated in a growth phase as today’s true deaths will be bigger than the daily report). The problem with using the true date of death data – which I agree is better in principle – is that it is not yet complete for the last few days, so these have to be ignored and we would have to wait longer to get a reliable estimate of Rt. (I suppose with care it might be possible to bias-correct the data for the last few days, but I think this would be a big correction and subject to a lot of uncertainty.)
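A quick numerical check of that argument, as a sketch only. The growth rate and the 1-3 day reporting weights below are made-up numbers for illustration, not estimates from the NHS data.

```python
import numpy as np

days = np.arange(30)
growth = 0.2                          # assumed true daily growth rate
true_deaths = 5.0 * np.exp(growth * days)

# Hypothetical reporting protocol: each day's announcement is a fixed mix
# of deaths that actually occurred 1, 2 and 3 days earlier.
weights = {1: 0.5, 2: 0.3, 3: 0.2}
reported = np.zeros_like(true_deaths)
for lag, w in weights.items():
    reported[lag:] += w * true_deaths[:-lag]

valid = slice(3, None)                # skip the first days, where the mix is incomplete
slope_true = np.polyfit(days[valid], np.log(true_deaths[valid]), 1)[0]
slope_reported = np.polyfit(days[valid], np.log(reported[valid]), 1)[0]
ratio = reported[-1] / true_deaths[-1]

print(f"growth rate from true deaths:     {slope_true:.4f}")
print(f"growth rate from reported deaths: {slope_reported:.4f}")
print(f"latest report / latest true deaths: {ratio:.2f}")
```

With a fixed delay mix, the reported series is just the true series multiplied by a constant, so the slope of the log is unchanged while the latest report falls short of the latest true count. If the reporting protocol changes over time, the constant changes too and the slope is no longer protected.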

It may be interesting at some point in the future to go back and do a simulation using the best corrected data to see how different things were from how we think they are now though!

How is this better than

Which uses MCMC calibration on intervention dates, using more data at higher resolution, and a more sophisticated disease model?

It’s an interesting question and I’m not entirely sure why our approach is better, but empirically it clearly is as can be seen from looking at the reliability of the actual forecasts they are producing with this system. I suspect it is because the extra structure they are imposing to pool analyses between countries means that they can’t actually fit the different effects in each one properly. I also wonder if the statistical model might be failing to represent the time lags in the true dynamics properly, but don’t really know if that is the case.

Hi from New Zealand. It would be interesting to do a hindcast/forecast on our little islands’ trajectory, particularly as, with a hard lock-down, we are busily foobarring our economy for (so far) 10 deaths, all in the over-60s, out of a population of 5 million.

https://github.com/folkehelseinstituttet/spread

https://www.fhi.no/sv/smittsomme-sykdommer/corona/koronavirus-modellering/

Hi Dr. Annan, I’m curious about the specifics of the function used to calculate deaths from a vector of infectious cases in your model since it is only mentioned briefly in your paper (& not really discussed in the paper from Ferguson et al). Do you have a reference for the formula used? (or, can you explain the reasoning behind it?).

It’s outlined more clearly in this other IC report:

https://www.imperial.ac.uk/mrc-global-infectious-disease-analysis/covid-19/report-13-europe-npi-impact/

see their appendix.
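As a rough illustration of the general idea (a sketch, not the report's or the paper's actual code): expected daily deaths can be written as the daily new infections convolved with an infection-to-death delay distribution and scaled by a fatality ratio. The gamma shape, its mean and spread, and the 1% fatality ratio below are placeholders rather than the values in the report, and the function names are made up for this example.

```python
import numpy as np

def discretised_gamma(mean, cv, n_days):
    """Daily delay probabilities from a gamma density, normalised to sum to 1."""
    shape = 1.0 / cv ** 2
    scale = mean / shape
    t = np.arange(n_days) + 0.5               # midpoint of each day
    dens = t ** (shape - 1.0) * np.exp(-t / scale)
    return dens / dens.sum()

def expected_deaths(new_infections, ifr, delay_pmf):
    """Deaths on day t = ifr * sum over tau of infections[tau] * delay_pmf[t - tau]."""
    full = np.convolve(new_infections, delay_pmf)
    return ifr * full[: len(new_infections)]

# Illustrative numbers only: a single pulse of 1000 infections on day 0.
delay_pmf = discretised_gamma(mean=23.0, cv=0.4, n_days=60)
infections = np.zeros(60)
infections[0] = 1000.0
deaths = expected_deaths(infections, ifr=0.01, delay_pmf=delay_pmf)

print(f"total expected deaths: {deaths.sum():.1f}")
print(f"day with most expected deaths: {int(deaths.argmax())}")
```

The pulse makes the behaviour easy to read: the deaths curve is the delay distribution scaled by infections times the fatality ratio. With a real infection series, each day's deaths are a weighted sum over infections from the preceding weeks.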

Thank you! The report appendix had a very good explanation.