Bad Graphs

Lots of people these days are making graphs about how the number of cases of COVID-19 is growing in a particular location. Since the data is messy, people are having a hard time extrapolating into the future from these graphs. Well, epidemiologists and other experts probably know what they are doing, but the rest of us are just squinting at the graphs and trying to figure out what “flattening the curve” should look like.

There is a physicist with a viral YouTube video in which he promotes a new way of graphing this data. Can we still say “viral” in this situation, or has that become insensitive? Normally I despise all videos on the internet (for various reasons), but I watched this one because my smart friends seemed to like it, and to like the graph that the physicist was hawking.

I’m not linking to his video because I am petty.

I’m also too lazy to draw some graphs of my own, so this is going to be all math and words. Sorry not sorry.

In any event, the physicist is drawing a graph that he says doesn’t include time, but it is actually a parametric graph in which time is the parameter. The \(x\)-axis of his graph is the log of the total number of cases, and the \(y\)-axis is the log of the number of new cases. This is already obscure enough that most people don’t really understand the underlying ideas. Even people in quantitative fields might not realize that this use of a log-log plot means there are some shenanigans going on and that we should be wary of this graph.

Now, he admits that he’s not actually graphing the log of the number of new cases on the \(y\)-axis, because that was too noisy. Instead, he is graphing the log of the number of new cases per day, averaged over the past week. Those of us in the math biz should recognize “the number of new cases per day” as “the rate of change of the number of cases,” which is to say “the derivative.”
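
To make the “derivative” connection concrete, here is a minimal Python sketch, assuming a made-up list of cumulative case counts (one per day); the function names are hypothetical. The daily differences telescope, so the week-long average of new cases on day \(t\) is just \((f(t) - f(t-7))/7\), a finite-difference approximation of the derivative.

```python
def daily_new_cases(cumulative):
    """New cases on each day after the first, from cumulative counts."""
    return [b - a for a, b in zip(cumulative, cumulative[1:])]


def weekly_average_new_cases(cumulative, window=7):
    """Average daily new cases over a trailing window.

    Entry i is the average for day i + window (an index into `cumulative`).
    """
    new = daily_new_cases(cumulative)
    return [sum(new[i:i + window]) / window for i in range(len(new) - window + 1)]


# Hypothetical cumulative counts for ten days (made-up numbers).
cumulative = [100, 120, 150, 185, 230, 280, 345, 420, 510, 620]
averages = weekly_average_new_cases(cumulative)

# Telescoping: summing 7 daily differences leaves f(t) - f(t - 7).
for i, avg in enumerate(averages):
    t = i + 7
    assert avg == (cumulative[t] - cumulative[t - 7]) / 7
```

So the averaging smooths the noise, but what ends up on the \(y\)-axis is still a (difference-quotient) estimate of \(f'(t)\).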

So to recap, we have some function \(f(t)\) that represents the total number of cases as of day \(t\), and we are drawing a parametric curve of the form \((x(t), y(t))\), where \(x(t) = \log(f(t))\) and \(y(t) = \log(f'(t))\).
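
Here is a sketch of that parametric curve in Python, assuming we are handed \(f\) and \(f'\) as ordinary functions. The helper name `parametric_points` and the cubic \(f\) are made up for illustration; the point is that \(t\) goes in, but only \(x\) and \(y\) come out. I’m using the natural log; switching bases rescales both axes by the same factor, so it doesn’t change any slopes.

```python
import math
from typing import Callable, List, Tuple


def parametric_points(
    f: Callable[[float], float],
    f_prime: Callable[[float], float],
    days: List[float],
) -> List[Tuple[float, float]]:
    """Return (x(t), y(t)) = (log f(t), log f'(t)) for each day t.

    The day t is used to compute each point but is not itself plotted.
    """
    return [(math.log(f(t)), math.log(f_prime(t))) for t in days]


# Hypothetical example: cubic (not exponential) growth, chosen only for illustration.
def f(t):
    return 100 * (1 + t) ** 3


def f_prime(t):
    return 300 * (1 + t) ** 2


points = parametric_points(f, f_prime, [0, 1, 2, 3])
```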

Since we’re in the exponential growth part of the pandemic, we can approximate \(f(t) = Ce^{kt}\). We also have \(f'(t) = Cke^{kt}\). So plugging these into the physicist’s parametric equations, we are plotting points of the form \[(\log(Ce^{kt}), \log(Cke^{kt})),\] which simplifies to \[(\log(C) + kt, \log(C) + \log(k) + kt).\] We can eliminate the parameter \(t\), which makes our plot the much more comprehensible \(y = x + \log(k)\).
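
A quick numerical sanity check of that algebra, with made-up values \(C = 50\) and \(k = 0.2\): every point should satisfy \(y - x = \log(k)\). The second check swaps in the 7-day average of new cases for \(f'(t)\), as the physicist does. Since \((f(t) - f(t-7))/7 = f(t)(1 - e^{-7k})/7\), the points still land on a line of slope 1, just with intercept \(\log((1 - e^{-7k})/7)\) instead of \(\log(k)\).

```python
import math

# Hypothetical parameters for the exponential model f(t) = C * e^(k t).
C = 50.0
k = 0.2


def f(t):
    return C * math.exp(k * t)


def f_prime(t):
    return C * k * math.exp(k * t)


points = [(math.log(f(t)), math.log(f_prime(t))) for t in range(40)]

# Every point should satisfy y - x = log(k), i.e. lie on y = x + log(k).
offsets = [y - x for x, y in points]
assert all(abs(d - math.log(k)) < 1e-9 for d in offsets)

# The slope between the first and last points is 1 (up to rounding).
(x0, y0), (x1, y1) = points[0], points[-1]
slope = (y1 - y0) / (x1 - x0)

# Same check with the 7-day average of new cases in place of f'(t):
# (f(t) - f(t - 7)) / 7 = f(t) * (1 - e^(-7k)) / 7, so the offset is still constant.
weekly = [(math.log(f(t)), math.log((f(t) - f(t - 7)) / 7)) for t in range(7, 40)]
expected = math.log((1 - math.exp(-7 * k)) / 7)
assert all(abs((y - x) - expected) < 1e-9 for x, y in weekly)
```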

This is why the plots from so many countries ended up on the same line: if the underlying model is exponential, this log-log plot is going to give you a line with slope 1 and intercept \(\log(k)\). But given the scale of the plot (and the noise in the data), you are not going to see much difference in the value of \(\log(k)\) over the time scales we are looking at.
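
To get a feel for how small these shifts are, here is a sketch that converts a few hypothetical doubling times \(T\) (in days) into growth rates \(k = \ln(2)/T\) and reports the intercept assuming base-10 axes, which is a guess about how the plot is scaled. With base-10 logs on both axes the line is \(y = x + \log_{10}(k)\).

```python
import math

# Hypothetical doubling times (in days) to compare; not taken from any real data.
doubling_times = [2, 3, 4, 6]

intercepts = {}
for T in doubling_times:
    k = math.log(2) / T              # daily growth rate for f(t) = C e^(k t)
    intercepts[T] = math.log10(k)    # the line is y = x + log10(k) on base-10 axes
    print(f"doubling every {T} days: k = {k:.3f}, intercept = {intercepts[T]:.3f}")
```

Halving \(k\) (say, going from doubling every 3 days to doubling every 6 days) lowers the line by \(\log_{10}(2) \approx 0.3\), less than a third of a decade, on axes that can span several decades of cases.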

In plain language, this physicist’s graph cannot give us any information about whether we are flattening the curve. It can make it really clear once we’re no longer following an exponential model, but it does not make it easy to see how the parameters of an exponential model change over time.
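
Here is a sketch of the one thing the graph does show well, under an assumption the argument above doesn’t make: suppose growth slows along a hypothetical logistic curve \(f(t) = L/(1 + e^{-k(t - t_0)})\), with made-up values of \(L\), \(k\), and \(t_0\). Then \(f'(t) = kf(t)(1 - f(t)/L)\), so the vertical distance below the exponential line \(y = x + \log(k)\) is \(\log(1 - f(t)/L)\): essentially zero early on, then plunging as the case count approaches \(L\).

```python
import math

# Hypothetical logistic model: f(t) = L / (1 + e^(-k (t - t0))).
L, k, t0 = 100_000.0, 0.2, 60.0


def f(t):
    return L / (1 + math.exp(-k * (t - t0)))


def f_prime(t):
    # Derivative of the logistic curve: f'(t) = k * f(t) * (1 - f(t) / L).
    return k * f(t) * (1 - f(t) / L)


# Vertical distance from the exponential line y = x + log(k):
# y - x - log(k) = log(1 - f(t) / L), near 0 early and very negative late.
distances = {}
for t in [0, 30, 60, 90]:
    x, y = math.log(f(t)), math.log(f_prime(t))
    distances[t] = y - x - math.log(k)
    print(f"day {t:3d}: x = {x:6.2f}, distance from line = {distances[t]:7.3f}")
```

Note that the early points sit right on the line, so the plot looks like pure exponential growth until the slowdown is already well under way.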