Which is to say, there are only estimates, not certainties. JL
John Timmer reports in ars technica:
Why have models produced so many different numbers, and why have the numbers seemingly changed so often? Models don't just have to be adjusted for the disease; they have to handle our own behavior, and there are a lot of options. Different approaches will produce different numbers. It's not a question of whether the numbers are right or wrong, or even whether they're useful; the key question is whether they're appropriate for a specific use. Over time, the data has improved, giving researchers a wealth of choices. So they've updated and re-run their models and ended up with different results.
One of the least expected aspects of 2020 has been the fact that epidemiological models have become both front-page news and a political football. Public health officials have consulted with epidemiological modelers for decades as they've attempted to handle diseases ranging from HIV to the seasonal flu. Before 2020, it had been rare for the role these models play to be recognized outside of this small circle of health policymakers.

Some of that tradition hasn't changed with the SARS-CoV-2 pandemic. International bodies, individual countries, most states, and even some cities have worked with modelers to try to shape policy responses to the threat of COVID-19. But some other aspects of epidemiological modeling life clearly have changed. The models, some of which produce eye-catching estimates of fatalities, have driven headlines in addition to policy responses. And those policy responses have ended up being far more controversial than anyone might have expected heading into the pandemic.

With the severity of COVID-19, it's no surprise that there has been increased scrutiny of epidemiological models. Models have become yet another aspect of life embroiled in political controversy. And it's fair for the public to ask why different models—or even the same model run a few days apart—can produce dramatically different estimates of future fatalities.

What's much less fair is that the models and the scientists behind them have come under attack by people who don't understand why these different numbers are an expected outcome of the modeling process. And it's downright unfortunate that these attacks are often politically motivated—driven by a focus on whether the numbers are convenient from a partisan perspective.

So why have models produced so many different numbers, and why have the numbers seemingly changed so often? There's no simple answer to those questions. But that's only because there are a lot of pretty simple answers.
There are different models

The fact that we refer to "models" (plural) should indicate that there's more than a single software package that we just drop a few numbers into. Instead, many researchers have developed models, motivated by the desire to solve different problems or because they felt that a different approach would produce more accurate numbers. Almost all of these are based on a simple premise: diseases are spread when humans come into contact with one another, so the model has to account for a combination of these contacts and the disease's properties.

The disease's properties tend to be things like what percentage of contacts results in the transfer of an infection, how long a person remains infectious, the disease's incubation period, and so on. These considerations will vary from disease to disease. HIV, for example, is primarily transferred through activities like intercourse and sharing needles, so it spreads far less readily than the flu, which can be spread when two people simply share the same space.

Other diseases, like malaria and dengue fever, involve an intermediate host for spread, so a model that focuses on direct person-to-person interactions won't be sufficient. A completely different approach to modeling—one that takes into account things like mosquito control—may be required for these diseases.

In any case, the models don't just have to be adjusted for the disease; they have to handle our own behavior, as well. And there are a lot of options here. The Imperial College model that helped drive policy in the US and UK early in the pandemic is incredibly sophisticated, taking into account things like average classroom and office sizes to estimate likely transmission opportunities. Other models have used cellphone data to inform contact estimates. Still others may take much simpler approaches to estimating human contact, trading a bit of precision for the ability to perform multiple model runs quickly.

Naturally, the different approaches will produce different numbers. It's not a question of whether the numbers are necessarily right or wrong; it's not even a question of whether they're useful or not. The key question is whether they're appropriate for a specific use.
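To make that premise concrete, here is a minimal sketch of the kind of compartmental (SEIR-style) calculation many of these models build on: people move from susceptible to exposed to infectious to recovered, at rates set by contact frequency and the disease's properties. This is only an illustration, not any published model; the function name seir_run and every parameter value below are invented for the example.

```python
# Minimal SEIR-style compartmental model: a sketch of the general approach,
# not any specific published model. All parameter values are illustrative.

def seir_run(population, days, contacts_per_day, p_transmit,
             incubation_days, infectious_days, initial_infected=10):
    """Simulate a simple SEIR model with daily time steps."""
    S = population - initial_infected
    E = 0.0
    I = float(initial_infected)
    R = 0.0
    history = []
    for _ in range(days):
        # New exposures depend on contact frequency, the transmission
        # probability per contact, and the share of people currently infectious.
        new_exposed = contacts_per_day * p_transmit * S * I / population
        new_infectious = E / incubation_days   # E -> I after the incubation period
        new_recovered = I / infectious_days    # I -> R after the infectious period
        S -= new_exposed
        E += new_exposed - new_infectious
        I += new_infectious - new_recovered
        R += new_recovered
        history.append((S, E, I, R))
    return history

# Illustrative run: 1 million people, 180 days, made-up disease properties.
trajectory = seir_run(population=1_000_000, days=180,
                      contacts_per_day=10, p_transmit=0.03,
                      incubation_days=5, infectious_days=7)
peak_infectious = max(day[2] for day in trajectory)
print(f"Peak simultaneous infections: {peak_infectious:,.0f}")
```

Real models layer far more structure on top of this (age groups, households, classrooms, geography), but the core bookkeeping looks much like the above.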
Different things go into the models

We mentioned just above that the models need to have the properties of the disease supplied. But unlike the flu, we simply don't have definitive numbers for a recently emerged pathogen like SARS-CoV-2. We know people are infectious in advance of the onset of symptoms, but how far in advance? How long do they remain infectious? How long after infection do they start experiencing symptoms?

For now, we at least have estimates for all of these numbers. In fact, we have more than one estimate for modelers to choose from. Should they take the numbers from something like a cruise ship, where the small, contained population can help provide a degree of precision to the estimates? Do they take numbers from a country like South Korea, where contact tracing was done efficiently? That gives us a good sense of what transmission looks like in a mobile population, but South Korea also managed to isolate cases effectively, making it a poor model for many other countries. Finally, data from a country like Italy may provide some good overall estimates of the disease's progression, but that data will suffer from limited overall testing and a likely undercount of the total fatalities.

There are logical cases to be made for using any of these numbers, and researchers can reasonably disagree over which properties are "best" to feed into their models. But again, the different choices will almost certainly produce somewhat different numbers.
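As a rough illustration of how much the choice of source data can matter, here is the toy seir_run sketch from above run with two invented parameter sets standing in for estimates drawn from different sources. The labels and numbers are made up for the example; they are not actual cruise-ship or South Korean estimates.

```python
# Two hypothetical parameter sets for the same disease, as might be derived
# from different data sources. Values are invented for illustration only.
param_sets = {
    "source A (contained outbreak data)":  dict(p_transmit=0.025, incubation_days=5, infectious_days=6),
    "source B (traced community spread)":  dict(p_transmit=0.035, incubation_days=4, infectious_days=8),
}

for label, params in param_sets.items():
    run = seir_run(population=1_000_000, days=180, contacts_per_day=10, **params)
    # Everyone who is no longer susceptible was infected at some point.
    total_ever_infected = 1_000_000 - run[-1][0]
    print(f"{label}: ~{total_ever_infected:,.0f} infected over 180 days")
```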
These things change

At the start of the pandemic, when the first models were being built, we didn't necessarily have a good grip on any of these numbers. Korea was just beginning its contact tracing, cruise ships were just starting to run into problems, and most of the data on the virus's spread came out of a few hospitals in China. Over time, the data we have has improved dramatically, giving researchers a wealth of choices. So, naturally, they've updated and re-run their models and almost certainly ended up with different results.

That's science working as it should, making improvements as better data comes in.

And it's not just the disease itself that changes the equation. For instance, an early model we looked at simply assumed that compliance with social distancing rules would end up being 75 percent. Not an unreasonable assumption, and one that was perhaps based on experience with earlier pandemics. But now we know how different political authorities define essential jobs, which aren't subject to isolation rules. We've tracked how well people comply with distancing using cell phone data. We can make far more informed estimates of the compliance value and use those in the model. Similar things are true for the properties of SARS-CoV-2, as well.

Lots of factors like these will change the estimates that the models produce, but ideally, the changes are making the models more accurately reflect reality.
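A behavioral assumption like compliance can enter such a model as something as simple as a scaling factor on contacts. Reusing the toy seir_run sketch above, with made-up figures (the 75 percent echoes the assumption mentioned in the article; the "measured" value and the contact reductions are invented):

```python
# Sketch of how a compliance estimate might scale contact rates under
# distancing rules. All numbers here are illustrative, not real measurements.
baseline_contacts = 10

for label, compliance in [("assumed compliance", 0.75), ("measured compliance", 0.62)]:
    # Compliant people cut their contacts sharply; non-compliant people do not.
    effective_contacts = baseline_contacts * (compliance * 0.3 + (1 - compliance) * 1.0)
    run = seir_run(population=1_000_000, days=180,
                   contacts_per_day=effective_contacts,
                   p_transmit=0.03, incubation_days=5, infectious_days=7)
    print(f"{label} ({compliance:.0%}): peak infections "
          f"{max(day[2] for day in run):,.0f}")
```

Swapping in a measured compliance value instead of an assumed one changes the projection, which is exactly the kind of revision that shows up as "the model's numbers changed."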
Models are used to ask different questions

So if the models are increasingly reflecting reality, why do some of them—sometimes using the same underlying code—produce such radically different numbers? To answer that, you have to think about what the models end up being used for.

It's important to run models with realistic assumptions in order to validate that they can accurately reflect reality. But it can often be valuable to run them with unrealistic assumptions to help understand how to craft responses. For example, early COVID-19 models were often run with no policy or public responses to the pandemic factored in. That kind of model is very unlikely to reflect reality, but it can be useful to understand a worst-case scenario. After all, if the worst case had been a relatively small number of deaths, that would have drastically altered the policy responses that were chosen.

Similarly, different model runs can be used to discriminate among reasonable situations, such as alternate policy responses. What happens to our hospital capacity if we shut schools? If we shut down public transit? What if we do both? All of those are the sorts of things that can be explored using epidemiological models.

And in fact, these ideas were explored. The Imperial College model mentioned above was run once with no policy responses, and then additional runs selected a variety of policy responses: closing schools, family-level quarantine of those infected, social distancing of the elderly population, and so on. Then, even more runs were made with different combinations of restrictions in order to identify the combination that provided the greatest benefit with the fewest restrictions.

Running all those models necessarily produced very different estimates of outcomes like total infections and deaths each time a model was run. That's not a problem with the models—in fact, if the different variations hadn't produced different outcomes, it would probably be an indication that there was something wrong with the model. Critically, it's precisely because these model runs produce different outcomes that epidemiologists can provide valuable information and help provide a rational basis for policy.
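In the same toy framework, a policy comparison amounts to running the model once per scenario and comparing the outcomes. This is nothing like the Imperial College model's level of detail; each intervention below is reduced to an invented cut in average contacts, purely to illustrate the workflow.

```python
# Toy scenario comparison in the spirit of (but far simpler than) the policy
# runs described above. Each intervention is represented only as a crude
# multiplier on average contacts; the multipliers are invented for illustration.
scenarios = {
    "no intervention":          1.00,
    "close schools":            0.80,
    "shut public transit":      0.85,
    "schools + transit":        0.70,
    "broad social distancing":  0.45,
}

results = {}
for name, contact_multiplier in scenarios.items():
    run = seir_run(population=1_000_000, days=180,
                   contacts_per_day=10 * contact_multiplier,
                   p_transmit=0.03, incubation_days=5, infectious_days=7)
    results[name] = max(day[2] for day in run)   # peak simultaneous infections

for name, peak in sorted(results.items(), key=lambda kv: kv[1]):
    print(f"{name:>25}: peak ~{peak:,.0f}")
```

The point of such a comparison is not any single number but the ranking and relative size of the outcomes, which is how model runs like these feed into policy choices.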
Models vs. cheap shots

If none of these details sound unreasonable, that's the point. All of the work so far, even when it produced radically different numbers, was done for solid scientific and policy reasons. And while it may be easy to use these extremely divergent numbers to take cheap shots at the models and the scientists who build them, the reality is that they're part of science operating as it should.

As Ars' Scott Johnson recently described, models are a way of doing experiments; they allow us to use the known rules that govern a system to find out what happens under conditions we could not otherwise create due to physical or ethical issues. In doing so, they let us put those rules to the test against real-world outcomes. Lots of factors go into estimating what those rules are, and we'll often test a variety of conditions. These differences mean that the models will produce different numbers.

What scientists look for out of this process is a growing consensus. When models that are structured differently try out the same set of conditions and start producing similar outcomes, then scientists will start feeling confident that their models have the basic outlines of the system right: they've chosen the right properties for the virus, they have an accurate take on the human behavioral responses, and so on. If their models instead consistently produce different results, then it's an opportunity to look at the models more carefully to see what details drive the differences in outcomes.

What's a non-scientist to do while that consensus is still emerging? When you see different numbers produced by the models, focus on the important questions: what were those models looking at, how do the conditions they were testing differ, and what assumptions went into them? It's not enough to know they're different numbers—it's important to have a sense of why they're different.

And remember that while any given model run shouldn't be viewed as the final word on what reality will look like, relying on a model will be a lot better than trying to set policy without any idea of what the outcome might be.