There was an article in the local paper about the spread of COVID-19 in the county. The paper has a paywall, so you might or might not be able to read the article. It was posted several days ago, and Letters To The Editor are no longer A Thing (having been replaced by comment sections where everyone argues), so my thoughts about this article are going to have to go in my blog.

The gist of the story is the claim that the number of cases of COVID-19 is growing the fastest in ZIP code 92154, an upside-down-U-shaped ZIP code right on the Mexican border. This is true, in one sense, because this is the ZIP code in San Diego County that has the most COVID-19 cases overall. You need to have, on average, the highest rate of cases per day in order to end up with the highest number of cases.

The paper notes:

To determine average growth rates, The San Diego Union-Tribune first calculated a change in cases between each day from March 31 through May 31, and then averaged these totals for an overall daily growth rate.

Assuming that I am understanding what they mean here, this is equivalent to taking the number of cases on May 31, subtracting the number of cases on March 31, and then dividing by 61, because all of the day-to-day changes in between cancel out when you add them up. They are not taking into account any changes in the rate of growth over the course of the pandemic. They are also assuming that growth is linear, which is reasonable because it has been fairly linear since the middle of April. I would have been happier if they had calculated growth rates over shorter time intervals. I haven’t yet built out the analysis to look at growth rates over time, but the data that I’ve seen suggests that a naive model of “92154 is growing the fastest” does not tell the whole story. Once I run all the data, I’ll have a better story to tell, but mine would probably be “Things were pretty bad near the border, especially in San Ysidro, but all the other places that you think of as ‘poor’ or ‘Mexican’ or ‘immigrants’ are starting to catch up pretty quickly.” Not as catchy a hot take, I know. I’m planning on changing my “new cases GIF” to show a moving average rather than the daily data, so we’ll see what that shows.
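Here is a small sketch of why the two calculations agree, assuming I have read the paper's method correctly. The case counts are made up purely for illustration (they are not the county's data), and the function names are my own.

```python
from datetime import date

def average_daily_growth(cumulative):
    """Average of the day-to-day changes in a cumulative case series."""
    changes = [later - earlier for earlier, later in zip(cumulative, cumulative[1:])]
    return sum(changes) / len(changes)

def endpoint_growth(cumulative):
    """(Last count minus first count) divided by the number of daily steps."""
    return (cumulative[-1] - cumulative[0]) / (len(cumulative) - 1)

# Hypothetical cumulative counts for six consecutive days (NOT real data).
counts = [10, 12, 19, 19, 30, 42]

# The intermediate days cancel out, so both approaches give the same answer.
print(average_daily_growth(counts), endpoint_growth(counts))

# Number of daily steps between March 31 and May 31, 2020.
days_in_window = (date(2020, 5, 31) - date(2020, 3, 31)).days
print(days_in_window)
```

However bumpy the days in the middle are, only the first and last counts affect the result, which is why this kind of average hides any change in the pace of growth.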

One thing that the newspaper’s analysis entirely ignores is population. The population of 92154 is 81,645. On the day that the paper did its reporting (June 3), there were 707 reported cases in that ZIP code. That means that 86.6 out of every 10,000 people in the ZIP code had contracted the disease. However, this ZIP code surrounds San Ysidro (92173), which had 339 cases out of a population of 27,741, or a rate of 122.2 cases per 10,000 people. You can see this by going to my COVID-19 tracking site and setting the map to June 3.
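The per-capita arithmetic is simple enough to check in a few lines. This is just a sketch using the June 3 case counts and populations quoted above; the helper name is my own.

```python
def cases_per_10k(cases, population):
    """Reported cases per 10,000 residents."""
    return cases / population * 10_000

# June 3 figures quoted in the text above.
rate_92154 = cases_per_10k(707, 81_645)   # ZIP code surrounding San Ysidro
rate_92173 = cases_per_10k(339, 27_741)   # San Ysidro

print(round(rate_92154, 1), round(rate_92173, 1))
```

92154 has more than twice as many cases as San Ysidro, but once you divide by population, San Ysidro has the higher rate.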

By neglecting how the rate of growth changed over time (what we in the math biz would call the second derivative) as well as the relative populations, the newspaper article misses the fact that San Ysidro and its surrounding ZIP code had roughly the same number of cases per capita in early May (in fact, San Ysidro had a slightly lower rate) but that San Ysidro now has significantly more cases per capita. Based on this information, I would say that COVID-19 is spreading faster in San Ysidro than it is in nearby areas.

I know that the local newspaper can’t get too sophisticated in terms of the math that it uses in articles for the general public, but I would have liked for them to acknowledge that the growth rate changes over time and that the impact on a ZIP code depends on both the number of cases and the population.